(Reading time: 3 – 5 minutes)
“Duplicate content? What is it and why should I care?”
An excellent question, and one not easily answered from a technical point of view.
For example, suppose every one of your blog posts is totally original and published only once, but each is filed under two categories, both of which are indexed by a search engine. You may get dinged for having “duplicate content.”
Sounds stupid, I know, but there are reasons, good reasons, which you need to know about.
In a perfect world, everyone sitting down at their computer would write excellent, all-original blog posts and articles. In the real world, search engines have trouble distinguishing unique content, and can’t reliably judge the quality of the material.
So the search engines get “gamed” by web pages that only appear to contain useful content.
It’s important that search engines be able to distinguish duplicate content. If they couldn’t, the web would disintegrate into pure advertising spam blogs stuffed with keywords, with the same “highly ranked” web page propagating its way through the search engine index. Spammers would have every incentive to stuff pages full of keywords, then leverage the resulting traffic to capture higher and higher search rankings by copying those pages web-wide. This would suck. The web would become useless. Tragedy of the commons and all that.
Clearly, duplicate content must be dealt with in some way.
Here are a few of the techniques I’ve read about that help explain how search engines handle duplicate content.
IP address matching
Google evidently splits the difference by “punishing” duplicate content delivered from the same IP address. This is a pretty good solution. Many modern hosting companies allow “addon” domains to be served from subdomains, all of which (of course) run from the same IP address as the main hosting account. If you want to serve identical content, you need to pay a little more money to get another IP address.
This IP address policy may also reduce the search ranking (SERP) performance of a group of unrelated websites all served from the same address. I’ve read that you should keep all the content served from a particular IP address “on topic” so that you aren’t “punished” as a spammer, but I don’t know how to evaluate whether this is true. Comments, links, and more information are definitely welcome!
Multiple categories, yes or no?
I recently spent an afternoon discussing this with a business partner, citing Matt Cutts’s talk at WordCamp, and was really embarrassed when my partner showed me that Matt uses multiple categories on his blog!
Using the nofollow attribute with multiple categories: Evidently, with clever use of the “nofollow” attribute in your links, you can file a post in multiple categories while having it indexed under only one, so it doesn’t suffer duplicate content penalties. My recommendation, if you’re just starting out or you want to keep your thinking confined to what you know best (your blog’s subject matter): pick smart categories that define topics, use only one category for each blog post, and don’t worry about it.
When Google returns a note saying that some very similar pages were omitted from the results, it’s essentially “punishing” that collection of pages.
More information on duplicate content on the web
Here’s a good discussion about duplicate content from Searching Solutions, where the author, Justin Smith, makes a distinction between “onsite” and “offsite” duplicate content. Onsite duplication is what happens when pages are indexed multiple times, as discussed above. Unfortunately, when I read the article, I didn’t find any discussion of offsite duplicate content.
In the end, all the discussion in the world cannot answer the question definitively. Only Google can say with certainty exactly how it deals with duplicate content. Everyone else’s opinion is just that: opinion. Find someone whose opinion seems accurate, and pay attention to their advice.
Would you like more? Send me a letter...

{ 2 comments }
If there is duplicated content… before Google applies a penalty, it first looks at the site’s age. That way, the search engine can tell who posted first…
Nice SEO site.
I normally don’t approve comments without a real name, and you actually landed in my spam queue, which, for some very odd reason, I read through this morning. Usually I just delete all spam without looking at it.
But your comment indicates you read the post… I’m curious whether you have any links to back up your assertion that Google looks up site age before penalizing. If anyone else has this info or a link, please comment!