At the recent Search Engine Strategies conference in freezing Chicago, many participants asked us Googlers about duplicate content. The topic has many nuances and has caused some confusion, so we'd like to help set the record straight.
What is duplicate content?
Duplicate content generally refers to substantial blocks of content, within one site or across sites, that are identical or appreciably similar to each other. In most cases it is not intentional, or at least not malicious. For example, a forum may generate stripped-down versions of its pages for mobile phone screens, and a store may display (and sometimes even link to) the same item through many different URLs. Sometimes, however, a site copies large amounts of content from other sites in an attempt to manipulate search engine rankings or to attract more traffic from popular or long-tail queries.
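One common way to make "identical or appreciably similar" measurable is to break each page into overlapping word sequences (shingles) and compare the two sets with Jaccard similarity. The minimal sketch below illustrates that textbook technique. It is an assumption for illustration only, not a description of how Google detects duplicates, and the shingle size k=3 and the sample texts are arbitrary choices.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similarity(text_a: str, text_b: str, k: int = 3) -> float:
    """Shingle-based similarity of two texts, from 0.0 (disjoint) to 1.0 (same)."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))

if __name__ == "__main__":
    desktop = "Shorty George dance shoes, black leather, sizes 6 to 12. Free shipping."
    mobile = "Shorty George dance shoes, black leather, sizes 6 to 12."
    rules = "Forum rules: be kind, stay on topic, and search before posting."
    print(f"{similarity(desktop, mobile):.2f}")  # 0.80
    print(f"{similarity(desktop, rules):.2f}")   # 0.00
```

A score near 1.0 means two pages share most of their wording, as with a store page and its stripped-down mobile copy; unrelated pages score near 0.0.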
What is not duplicate content?
Although Google offers a handy translation feature, our algorithms will not treat the same article written in English and in Spanish as duplicate content. Likewise, you shouldn't worry about short snippets of your articles quoted on other sites being flagged as duplicate content.
Why does Google care about duplicate content?
Google users typically want to see a diverse range of original content. Imagine how annoying it would be to find substantially the same content repeated throughout a set of search results. Webmasters, for their part, don't want Google to show a complex URL such as example.com/contentredir?value=shorty-george&lang=en; they prefer a clean URL such as example.com/en/shorty-george.htm.
What does Google do about it?
When crawling and serving search results, Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has both a regular and a printer-friendly version of each article, and neither version is blocked in robots.txt or with a noindex meta tag, Google will choose just one version to show users. In the rare cases where duplicate content appears intended to manipulate our rankings and deceive users, we will also make appropriate adjustments to the indexing and ranking of the sites involved. Still, we prefer to focus on filtering rather than on ranking adjustments, so in the vast majority of cases the worst outcome for a webmaster is that the less-preferred version of a page appears in our index.
How can Webmasters proactively address duplicate content issues?
Block appropriately: Rather than letting our algorithms decide which version of a document is "best", you may wish to help guide Google to your preferred version. For example, if you don't want us to index the printer-friendly versions of your site's articles, disallow those directories in your robots.txt file, or use pattern-matching rules (such as the * and $ wildcards Googlebot supports) to block them.
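To check a directory-based rule before deploying it, you can run candidate URLs through Python's standard-library `urllib.robotparser`. This is a minimal sketch that assumes, hypothetically, that printer-friendly copies live under `/print/`. Note that the stdlib parser only does simple prefix matching and does not implement Googlebot's `*` and `$` wildcard extensions, so it can only verify plain directory rules like this one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes printer-friendly copies live under /print/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /print/
"""

def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def is_crawlable(url: str, agent: str = "Googlebot", robots_txt: str = ROBOTS_TXT) -> bool:
    """True if the given robots.txt allows `agent` to crawl `url` (prefix rules only)."""
    return make_parser(robots_txt).can_fetch(agent, url)

if __name__ == "__main__":
    for url in ("https://example.com/articles/shorty-george.htm",
                "https://example.com/print/shorty-george.htm"):
        print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```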
Use 301s: If you have restructured your site, use 301 (permanent) redirects in your .htaccess file to send users, Googlebot, and other search engine spiders to the new URLs.
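The advice above refers to Apache's .htaccess, but the underlying HTTP behavior is the same on any server: answer the old URL with status 301 and a `Location` header naming the new one. The minimal sketch below shows this in a plain Python WSGI app. It assumes a hypothetical legacy scheme modeled on the example URLs in "Why does Google care about duplicate content?", mapping `/contentredir?value=NAME&lang=LANG` to `/LANG/NAME.htm`; `app` and `request` are illustrative names, not part of any framework.

```python
import re
from urllib.parse import parse_qs

# Only simple slugs are accepted, so a crafted query cannot redirect off-site.
SAFE = re.compile(r"[a-z0-9-]+")

def app(environ, start_response):
    """Minimal WSGI app: 301-redirect legacy query-string URLs, serve the rest."""
    path = environ.get("PATH_INFO", "/")
    if path == "/contentredir":
        params = parse_qs(environ.get("QUERY_STRING", ""))
        value = params.get("value", [""])[0]
        lang = params.get("lang", ["en"])[0]
        if SAFE.fullmatch(value) and SAFE.fullmatch(lang):
            # 301 Moved Permanently: clients and crawlers should use the new URL.
            start_response("301 Moved Permanently", [("Location", f"/{lang}/{value}.htm")])
            return [b""]
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found\n"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Page at {path}\n".encode("utf-8")]

def request(path, query=""):
    """Call the app directly (no server) and return (status, headers)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    app({"PATH_INFO": path, "QUERY_STRING": query}, start_response)
    return captured["status"], captured["headers"]

if __name__ == "__main__":
    print(request("/contentredir", "value=shorty-george&lang=en"))
```

Validating the parameters matters here: without it, a value such as `lang=/evil.com` would produce a `//evil.com/...` Location, which browsers treat as an off-site redirect.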
Be consistent: Keep your internal linking consistent; don't link to /page/, /page, and /page/index.htm interchangeably.
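One way to enforce consistent internal links is to pass every link your templates emit through a single normalization function. Which form counts as canonical is your choice; this minimal sketch assumes trailing-slash directory URLs (`/page/`), treats `index.htm` and `index.html` as a directory's default document, and assumes that a last path segment without a dot is a directory. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_FILES = {"index.htm", "index.html"}

def canonical_path(path: str) -> str:
    """Map /page, /page/ and /page/index.htm to the single form /page/."""
    if not path:
        return "/"
    segments = path.split("/")
    last = segments[-1]
    if last in INDEX_FILES:
        segments[-1] = ""        # "/page/index.htm" -> "/page/"
    elif last and "." not in last:
        segments.append("")      # "/page" -> "/page/" (assumed to be a directory)
    return "/".join(segments)

def canonical_link(url: str) -> str:
    """Normalize the path of an absolute or site-relative URL, keeping the rest."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=canonical_path(parts.path)))

if __name__ == "__main__":
    for link in ("/page", "/page/", "/page/index.htm", "/en/shorty-george.htm"):
        print(link, "->", canonical_link(link))
```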
Use TLDs: To help us serve the most appropriate version of a document, use country-specific top-level domains wherever possible. Google can tell more reliably that example.de hosts Germany-focused content than it can from a URL such as example.com/de or de.example.com.
Syndicate carefully: If you syndicate your content to other sites, make sure each syndicated article includes a link back to your original article. Note, though, that even then Google will always show the (unblocked) version we consider most appropriate for a given query, which may or may not be the version you prefer.
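If you syndicate to partners, a quick automated check can confirm that each copy actually links back to your original. The minimal sketch below uses Python's standard `html.parser` to collect `<a href>` values from a syndicated page's HTML. It assumes an exact match on the original URL (no normalization of trailing slashes or hosts), and `LinkCollector` and `links_back` are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def links_back(syndicated_html: str, original_url: str) -> bool:
    """True if the syndicated HTML contains a link to exactly original_url."""
    collector = LinkCollector()
    collector.feed(syndicated_html)
    collector.close()
    return original_url in collector.hrefs

if __name__ == "__main__":
    copy = '<p>Reprinted from <a href="https://example.com/en/shorty-george.htm">Example</a>.</p>'
    print(links_back(copy, "https://example.com/en/shorty-george.htm"))
```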
Use the preferred domain feature in Google Webmaster Tools: If other sites link to your site using both the www and the non-www versions of your URLs, you can use Webmaster Tools to tell us which version you want indexed.
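The preferred domain itself is set in the Webmaster Tools interface, not in code. As a complement on your own side, you can keep the links and redirects your site generates pointing at one host. This minimal sketch assumes, hypothetically, that `www.example.com` is the preferred host and `example.com` the alternate; it preserves the scheme, port, path, and query.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"  # hypothetical: the host you chose as preferred
ALTERNATE_HOSTS = {"example.com"}   # hypothetical: hosts that should map to it

def preferred_url(url: str) -> str:
    """Rewrite URLs on an alternate host to the preferred host; leave others alone."""
    parts = urlsplit(url)
    if parts.hostname in ALTERNATE_HOSTS:
        netloc = PREFERRED_HOST if parts.port is None else f"{PREFERRED_HOST}:{parts.port}"
        return urlunsplit(parts._replace(netloc=netloc))
    return url

if __name__ == "__main__":
    print(preferred_url("http://example.com/en/shorty-george.htm"))
```

The same function could supply the `Location` value for a 301 redirect from the non-preferred host.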
Minimize boilerplate repetition: Take a copyright notice as an example. Rather than placing a lengthy notice at the bottom of every page, create a dedicated page containing the full copyright statement, and put only a very brief summary with a link to that page at the bottom of each page.
Avoid publishing stubs: Users don't like seeing pages with no real content, so avoid empty placeholder pages where possible. A real estate site, for example, should not publish (or should at least block) pages with zero reviews or with no property listings. That way, users (and Googlebot) won't face a zillion pages announcing "Here is a great list of rentals in [city name] you won't want to miss..." with no actual listings on them.
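When pages are generated from a database, the "at least block" option can be automated: if a page has nothing to list, emit the standard `<meta name="robots" content="noindex">` tag so search engines leave it out of their index. This minimal sketch assumes server-side rendering of a city listings page; `render_city_page` and its markup are hypothetical.

```python
from html import escape

def render_city_page(city: str, listings: list[str]) -> str:
    """Render a rentals page; mark it noindex when there is nothing to list."""
    head = ['<meta charset="utf-8">', f"<title>Rentals in {escape(city)}</title>"]
    if not listings:
        # No real content: ask search engines not to index this stub page.
        head.append('<meta name="robots" content="noindex">')
        body = f"<p>There are currently no rentals listed in {escape(city)}.</p>"
    else:
        items = "".join(f"<li>{escape(item)}</li>" for item in listings)
        body = f"<h1>Rentals in {escape(city)}</h1><ul>{items}</ul>"
    return f"<!DOCTYPE html><html><head>{''.join(head)}</head><body>{body}</body></html>"

if __name__ == "__main__":
    print(render_city_page("Springfield", []))
```

Skipping publication entirely (for example, returning a 404 for empty cities) is the stricter alternative the tip also mentions.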
Know your content management system: Make sure you understand how your site displays its content, especially if it includes a blog, a forum, or a similar system, since these often show the same content in multiple formats.
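A simple way to see how often your CMS repeats itself is to fingerprint the main text of each URL and group URLs whose fingerprints match. This minimal sketch assumes you have already extracted each page's main text (without navigation or templates) into a dict. It ignores case and punctuation before hashing with SHA-256, so it finds exact duplicates only, not near-duplicates. The URLs and texts are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(text: str) -> str:
    """Hash of the text with case and punctuation ignored."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Return groups (sorted lists) of URLs whose main text is the same."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

if __name__ == "__main__":
    pages = {
        "/blog/2006/12/tango-tips": "Five tips for tango beginners.",
        "/blog/category/dance/": "Five tips for tango beginners!",
        "/blog/feed/": "FIVE tips for tango beginners",
        "/blog/about/": "About this blog.",
    }
    print(duplicate_groups(pages))
```

Each group it reports is a candidate for blocking, a 301 redirect, or consistent linking.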
Don't worry, be happy: Don't fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, they are very unlikely to have a negative impact on your site's presence in Google. If a case is truly intolerable, you are welcome to file a DMCA request to claim ownership of your content, and we will deal with the rogue site.
In short, a general awareness of duplicate content issues and a few minutes of thoughtful preventive maintenance on your site will not only help you, but also help us provide users with unique and relevant content.