Rel="canonical" - does it actually fix duplicate content issues?
All of the major search engines recently agreed on an element to address the problems webmasters experience with duplicate content. I feel their pain, and while I dislike proprietary features (since they don't address the underlying problem), it seems like it might be a useful tool in certain circumstances.
Certainly, there are times when duplicates are present and either cannot or will not be fixed (for various reasons). If we can move some of the burden from developers onto others tasked with SEO, then that can't be a bad thing.
By all accounts, the new element should work as desired. Essentially, search engines have decided to use the <link> element (you've likely seen it used for including external CSS), placed in the page's <head>, to allow a webmaster to identify the definitive URL for an item of content (the "canonical" URL). Usage is straightforward, as below:
<link rel="canonical" href="http://www.example.com/some-url" />
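To check which canonical URL a page actually declares, something like the following rough sketch may help. It uses Python's standard html.parser module; the names CanonicalFinder and find_canonical, and the sample page, are my own inventions for illustration and not part of any search engine's tooling.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical_urls.append(attr_map["href"])


def find_canonical(html_text):
    """Return the first canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical_urls[0] if finder.canonical_urls else None


# Hypothetical sample page, reusing the example URL from above.
page = """<html><head>
<title>Some page</title>
<link rel="stylesheet" href="/style.css" />
<link rel="canonical" href="http://www.example.com/some-url" />
</head><body></body></html>"""

print(find_canonical(page))
```

This only reports what the page claims as its canonical URL; it says nothing about whether a search engine has honoured it.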
Google's blog post about the element gives more information, including encouraging noises like:
It ... helps to make sure that properties such as link popularity are consolidated to your preferred version.
Incidentally, I think the wording above may have changed slightly since it was first published, or I misremember the quote. Anyhow, what I'd like to see as a result of using the element is:
- Only the canonical URL in SERPs
- Link juice and other ranking signals transferred to the URL that remains in SERPs
Google also provide a live example:
wikia.com helped us as a trusted tester. For example, you'll notice that the source code on the URL http://starwars.wikia.com/wiki/Nelvana_Limited specifies its rel="canonical" as: http://starwars.wikia.com/wiki/Nelvana.
The two URLs are nearly identical to each other, except that Nelvana_Limited, the first URL, contains a brief message near its heading. It's a good example of using this feature. With rel="canonical", properties of the two URLs are consolidated in our index and search results display wikia.com's intended version.
But some time after the announcement, I still see the non-canonical URL (the one from the example above) in results: http://www.google.com/search?q=site%3Awikia.com+nelvana+limited
For the sceptical out there, you don't need to include the word "limited" in the search either. Try http://www.google.com/search?q=site%3Astarwars.wikia.com+nelvana&filter=0, for example.
Perhaps this is just a quirk, but it certainly doesn't inspire confidence in me. I think this new element is going to need (at best) some time to "bed in" before it's a good choice for SEOs who like tried and tested methods. Personally, I'll stick to the HTTP spec.
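If "the HTTP spec" is taken to mean a permanent (301) redirect from the duplicate URL to the preferred one (my reading, as the post doesn't spell it out), the idea might be sketched roughly as below. The CANONICAL_URLS mapping, the respond function and the paths in it are all hypothetical, and a real site would do this in its web server or framework configuration rather than in a bare function.

```python
# Hypothetical mapping of duplicate paths to their preferred absolute URL.
CANONICAL_URLS = {
    "/some-url/index.html": "http://www.example.com/some-url",
}


def respond(path):
    """Return (status, headers) for a request path.

    Known duplicates get a 301 with a Location header pointing at the
    preferred URL; everything else is served normally with a 200.
    """
    target = CANONICAL_URLS.get(path)
    if target is not None:
        return 301, {"Location": target}
    return 200, {}


print(respond("/some-url/index.html"))
print(respond("/some-url"))
```

Unlike the <link> hint, a redirect means the duplicate URL never serves content at all, so there is nothing left for a search engine to choose between.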
There's also forum discussion over at WebmasterWorld: Search Engines Agree on "Canonical tag"
Update 25/02/09: Google have finally dropped the non-canonical URL.