Duplicate Content: Why It Happens and How to Fix It
What is duplicate content?
Duplicate content, or "DC" for short, is identical or nearly identical content that appears on more than one website or URL. A distinction is made between internal duplicate content (within one site) and external duplicate content (across different sites). Identical text passages on different URLs can be enough for search engines to treat the pages as duplicates.
Why is duplicate content a problem?
Search engines want to show their users the best possible content, and the results in the SERPs should be diverse. Identical or similar content offers users less added value, so user satisfaction decreases. Duplicate content can therefore lead to ranking problems. In addition, Google wants to use its crawl budget as efficiently as possible, since computing power costs the company money.
When is duplicate content a problem?
Internal duplicate content can become a problem, for example, if several URLs on your website compete for the same keyword or keywords. This is called keyword cannibalization: Google cannot decide which of the URLs should rank, so several of your URLs share weaker placements for one keyword.
Does Google penalize duplicate content?
There is no direct penalty for internal duplicate content; the damage is that keyword cannibalization prevents you from building up good rankings. Google can, however, impose penalties for external duplicate content such as:
- Scraper sites: websites that automatically copy content one-to-one and do not offer the user any added value
- Spun content: scraped content that is automatically reworded to look unique, which is referred to as article spinning
- Doorway pages: so-called bridge pages, which also count as a black hat method
Examples of internal duplicate content
Duplicate content can arise on your own site through:
- A page that can be reached via several URLs, e.g. with or without www, via http and https, or with different filters/parameters
- Pages that are also available as a print version
- Incorrect pagination
- A separate mobile version of the website with the same content (not a responsive design)
- Tag overviews or filter overviews
- Upper and lower case in URLs
- New pages created without redirecting the old ones
- Development environments
- Internal search results pages
- URLs with and without a trailing slash "/"
- One and the same product or article assigned to different categories and reachable under different URLs
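Several of the causes above (www, http/https, upper and lower case, trailing slash, parameters) come down to one page being reachable under several URL spellings. The following is a minimal Python sketch of how such variants could be grouped, not a definitive implementation. The parameter names in IGNORED_PARAMS and the example.com URLs are hypothetical, and lowercasing the path assumes your server treats paths as case-insensitive.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize_url(url: str) -> str:
    """Map URL variants of one page to a single comparable form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Assumption: upper/lower case and a trailing slash address the same page.
    path = parts.path.lower().rstrip("/") or "/"
    # Drop ignored parameters and sort the rest for a stable order.
    query = urlencode(sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    ))
    # Compare everything as https and ignore the fragment.
    return urlunsplit(("https", host, path, query, ""))


def group_duplicates(urls):
    """Return only those normalized URLs that are reached by more than one variant."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


crawled = [
    "http://www.example.com/shoes",
    "https://example.com/Shoes/",
    "https://example.com/shoes?utm_source=newsletter",
    "https://example.com/hats",
]
for canonical, variants in group_duplicates(crawled).items():
    print(canonical, "<-", variants)
```

Each group with more than one member is a candidate for a redirect or a canonical tag pointing to a single preferred URL.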
Examples of external duplicate content
External duplicate content can arise across different websites, e.g. through:
- Product descriptions from manufacturers that are copied one-to-one
- Stolen content or posts
- Content that is imported through a feed (RSS)
- Content scraping and duplication through article spinning
- Distribution of press releases
- Content that is shared or reused in affiliate marketing
- Content partnerships that publish the same content on other sites
- The same content used on different country versions of a site
Other classifications of duplicate content
Exact Duplicate Content
The content, and possibly also the design, exists in exactly the same form on different URLs.
Partial Duplicate Content
A large part of the content is identical and can be accessed on different URLs.
Near Duplicate Content
The content is not copied one-to-one, but it says the same thing as on other URLs or has only been slightly changed in parts.
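One common way to put a number on "exact", "partial" and "near" is to compare overlapping word sequences (shingles) with the Jaccard similarity. The sketch below illustrates that technique only. It is not how Google classifies pages, and the threshold of 0.5 is an arbitrary assumption chosen for the example.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles that both texts have in common (0.0 to 1.0)."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def classify(a: str, b: str, near_threshold: float = 0.5) -> str:
    """Rough label; `near_threshold` is a hypothetical cut-off."""
    score = jaccard(a, b)
    if score == 1.0:
        return "exact (ignoring case and punctuation)"
    if score >= near_threshold:
        return "near or partial"
    return "distinct"


print(classify("The quick brown fox jumps", "The quick brown fox sleeps"))
```

Lower thresholds flag more pairs for manual review, higher ones only catch close copies.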
How do you find duplicate content?
A simple way to find external duplicate content is to search for a distinctive phrase from your text, enclosed in quotation marks.
To find internal duplicate content, you can use a site: query.
Checking for duplicate content with tools
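Both kinds of query can be assembled as plain strings. The following sketch shows one possible shape. The domain example.com and the sample phrase are placeholders, and combining site: with a quoted phrase is an assumption about how you might narrow the internal check.

```python
from urllib.parse import quote_plus


def exact_phrase_query(phrase: str) -> str:
    """Quoted phrase: finds pages anywhere that contain this exact wording."""
    return f'"{phrase}"'


def site_query(domain: str, phrase: str) -> str:
    """Same phrase, restricted to one domain with the site: operator."""
    return f'site:{domain} "{phrase}"'


def search_url(query: str) -> str:
    """Turn a query into a Google search URL for pasting into a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


phrase = "hand-stitched leather shoes from our workshop"  # placeholder text
print(exact_phrase_query(phrase))
print(site_query("example.com", phrase))
print(search_url(site_query("example.com", phrase)))
```

If the site: query returns more than one URL for a phrase that should be unique, those URLs are worth checking for internal duplicates.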
- copyscape.com
You can use the free Copyscape tool to check a website or URL for duplicates.
In Google Search Console, review the "excluded" URLs to learn which pages Google considers duplicates. The URL Inspection tool then shows which pages Google treats as equivalent.
Other tools for checking duplicate content:
- screamingfrog.co.uk
- smallseotools.com
- duplichecker.com
What can you do about duplicate content?
- If content has been stolen from you, first contact the webmaster of the site and ask for the content to be deleted or changed. If you cannot reach the webmaster, you have the option of submitting a spam report to Google (Report spam, phishing, or malware). If the content stays online, you can have a lawyer pursue the matter on the basis of your copyright.
- Pay attention to the structure of your website. It should be as unambiguous as possible, i.e. decide whether your site should be accessible with or without www. If you use SSL encryption, it is advisable to make the pages accessible only under https. Content and categories should be clearly structured. If you offer a product in different variants, make sure that the product ideally exists under only one URL.
- Use canonical tags to make it clear to search engines which page is the original or relevant page for the content
- Make sure the internal linking is consistent
- Control the crawling of your site with robots.txt, for example to keep crawlers out of test environments or categories that should not appear in search. Note that robots.txt blocks crawling, not indexing; to keep a crawlable page out of the index, use the noindex meta tag. The nofollow meta tag is a further option
- Set up 301 redirects from the old URL to the new URL when you replace content, and do the same for a website relaunch
- Minimize recurring text modules, e.g. in the footer or on category pages
- Use hreflang annotations, especially in the DACH region (Germany, Austria, Switzerland), but also for international targeting in general
- Create thematically unambiguous pages, with the best possible all-encompassing content on a topic
- Do not copy content or simply rewrite content
- When you publish new content on pages that are not crawled very often, you have the option of submitting those pages for indexing
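Several of the measures above can be sketched in a few lines of Python. The example below checks a robots.txt draft with the standard library's urllib.robotparser, renders canonical and hreflang link tags, and looks up a 301 redirect. The paths /staging/ and /search, the redirect map, and all example.com URLs are hypothetical. Treat it as an illustration under those assumptions, not a finished setup.

```python
from html import escape
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of a test environment
# and out of internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /search
"""

# Hypothetical map of retired paths to their replacements.
REDIRECTS = {"/old-shoes": "/shoes/"}


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt text lets `agent` crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def canonical_tag(url: str) -> str:
    """Link tag that names the preferred URL for a piece of content."""
    return f'<link rel="canonical" href="{escape(url)}">'


def hreflang_tags(variants: dict) -> list:
    """One alternate link tag per language/country variant of a page."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in sorted(variants.items())
    ]


def redirect_for(path: str):
    """Return (301, new_path) for a retired path, otherwise None."""
    target = REDIRECTS.get(path)
    return (301, target) if target else None


print(is_crawlable(ROBOTS_TXT, "https://example.com/staging/index.html"))
print(canonical_tag("https://example.com/shoes/"))
for tag in hreflang_tags({
    "de-DE": "https://example.com/de/schuhe/",
    "de-AT": "https://example.com/at/schuhe/",
    "de-CH": "https://example.com/ch/schuhe/",
}):
    print(tag)
print(redirect_for("/old-shoes"))
```

The generated tags belong in the head section of the respective pages. How the redirect is actually served depends on your web server or CMS.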
Conclusion
Even though Google says it tries to recognize and filter duplicate content automatically, you should treat this topic as an important part of search engine optimization (SEO). Duplicates that offer no added value satisfy neither users nor search engines. The crawl budget should be used sensibly so that crawling focuses on the pages that matter for your rankings.