Meta muddle at Google
'nofollow' [NOFOLLOW] is the latest idea to get everyone excited -- although not necessarily for the right reasons. It's the addition of a special value for the HTML rel attribute on links in documents, and it's used to tell crawlers not to traverse those links.
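To make the crawler's side of this concrete, here is a minimal sketch in Python of how a spider might separate the links it will traverse from the ones marked nofollow. The class name LinkCollector, the function links_to_crawl and the example URLs are all made up for illustration; this is not how Google's crawler is implemented, only one plausible reading of the proposal.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: sorts <a href> links by whether rel contains nofollow."""

    def __init__(self):
        super().__init__()
        self.follow = []   # links the crawler would traverse
        self.skipped = []  # links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of values; compare case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.skipped.append(href)
        else:
            self.follow.append(href)


def links_to_crawl(html):
    """Return (links to follow, links skipped) for one page of HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.follow, collector.skipped


# Made-up page: one link from the author, one from a comment.
page = (
    '<p>Post: <a href="http://example.org/article">article</a></p>'
    '<p>Comment: <a href="http://example.com/pills" rel="nofollow">pills</a></p>'
)
follow, skipped = links_to_crawl(page)
print(follow)
print(skipped)
```

Note that the skipped link is still there in the document; the crawler has simply chosen not to use it.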
If you haven't seen this yet, you may well be asking the obvious question -- why would a crawler not want to follow the link? What's the point of being a crawler that doesn't crawl (or a spider that doesn't spide)?
Well, one reason might be that the crawler is having trouble keeping up with the number of pages that are being added to the web every day in the form of blogs and news items, and so it may simply want to limit the number of links it has to follow [KO]. Another reason could be that the crawler doesn't like the fact that the accuracy of its search results is skewed by spammers adding comments to other people's blogs that contain links to their sites, which has the effect of bumping the spammers up the search rankings.
Google's motivation is of course the second reason ... and it seems at first sight fair enough. But when you take a step back, you have to ask whether we're really that bothered about the inaccuracies of search results when people search for "pharmaceutical", "discount" and "all weekend"? And if you take another step back, you have to ask further why Google is giving such weight to these links anyway? Is it really possible to skew the search results this way, and if it is, whatever happened to their world-famous algorithms?
However, if that were all there was to the problem, then I wouldn't bother writing anything here, since this blog is about things related to Internet Applications. The reason I've looked into this further is that the proposed solution is actually pretty bad.
The proposal is that a new value for rel is used on anchors, and its presence tells a spider not to navigate the link. All links added as comments to blogs will have this attribute added automatically by the blogging software, which means that if you were to spam the comments section of someone's blog, you wouldn't benefit from it. (The value suggested for the rel attribute is nofollow.)
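On the blogging-software side, the idea amounts to something like the following Python sketch. The function render_comment, the markup it produces and the sample values are assumptions made up for illustration and are not taken from any real blogging package; the sketch also treats the comment body as plain text, so the only link is the commenter's signature link.

```python
from html import escape


def render_comment(author, author_url, body):
    """Hypothetical renderer: the commenter's link always gets rel="nofollow"."""
    # Escape everything the commenter supplied before placing it in markup.
    link = '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(author_url, quote=True), escape(author)
    )
    return '<div class="comment"><p>{}</p><p>Posted by {}</p></div>'.format(
        escape(body), link
    )


# Made-up comment data.
print(render_comment("Ann", "http://example.com/cats", "Me too!"))
```

The commenter has no say in the matter: the attribute is added by the generator, not by whoever wrote the link.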
The problem with this whole proposal is that it changes the meaning of the rel attribute -- or if we were to slip into OO terminology for a moment, it overloads the attribute. The whole point of @rel is to define a relationship between two documents. The whole point of nofollow is to say that there is no relationship between two documents. It's like using a special value of the title element to say that there is no title, or a special value of the style attribute to say that there is no style -- it's just plain wrong. You cannot say that the relationship between two documents is that a search engine should ignore the relationship between those two documents!
So the relationship exists, whether we like it or not, and all we're really talking about is that Google doesn't want to make use of that relationship. That's fair enough, but that is an issue for indexing software [ALGORITHMS], and not HTML. Otherwise, where does it stop? Will the next move be that we have to say which paragraphs of text should be indexed? After all, if the criterion for which techniques should be used to mark up our documents is whether they help or hinder the accuracy of search engines, then hadn't we better protect our paragraphs and divs from spam content too? In other words, will we have to change our mark-up again next week because every blog has Google AdWords that display listings for poker and cheap body-enhancing tablets -- and Google's algorithms are so confused by spam comments that they just can't work out what the page is really supposed to be about?
So in my view this is a wrongly conceived idea that can only really get worse -- and in the context of XHTML 2.0 [XHTML2], where the whole metadata story is really brought to the fore, it's pretty worrying. In XHTML 2.0 we've brought meta, link and rel right up front as a way of making the clickable web truly semantic. This proposal sets us back years.
Oh, and if you were wondering whether Google Suggest could be done in XForms, the answer is ... well I can't believe you even had to ask!
[GS] http://labs.google.com/suggest/faq.html
[NOFOLLOW] http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
[KO] Blog noise achieves Google KO, The Register, October 14th, 2003
[ALGORITHMS] OS X Panther Discussion, Infothought, October 14th, 2003
[XHTML2] http://www.w3.org/TR/xhtml2/

5 Comments:
The attribute isn't being overloaded. You're just confusing the symbol and the meaning of the symbol. Human annotations always have this problem, so it's something Google is used to, and you'll have to get used to it too.
rel=nofollow, seen from a semantic web perspective, means that the person who controls the generator doesn't believe there's any relationship between this document and the linked document, but they allowed the link to be created anyway, most likely by someone else. Exciting stuff. You can translate it as "3rd party" instead of "Do not follow" if you like; to the software they're both just strings.
If we weren't storing the links inside the documents this wouldn't matter, because the author's links would be clearly separate. Indeed Google would probably (ask them) be quite happy to accept an XHTML attribute which clarified when some parts of a document are written by a third party. But rel=nofollow is here today, actually reducing the incentive to spam today. And not just on Google.
Many of the anti-nofollow posts I've seen say "throw everything else out of the window, yes, the baby too, but don't use an HTML attribute in the generator software, that's just _wrong_". Some of them are frank enough to admit their interest: they're building complicated "anti-spammer" software that relies on JavaScript, on expecting your visitors to be ready, able and willing to do image processing for you, or on half-smart keyword matching. The last thing they would want is for the spammers to just decide it's Not Worth It and Go Away, because then who'll validate the importance of their tools? Others pontificate for a few paragraphs about the purity of the semantic web (evidently visitors from some other planet, with some other semantic web) before linking to their ... anti-spam JavaScript software.
The remaining objections to nofollow seem to come from people who believe this (and everything else that happens in their day) is a conspiracy against bloggers. They believe that when they write "me too" in a comment on a popular XHTML blog this inherently makes their web site (which is about cats) linked from the signature of the comment, into important metadata about XHTML. A Google search for "cats" should (according to this twisted logic) assume that the XHTML blog is recommending them as an authoritative source of information about cats.
My expectation is that this leads to a marked reduction in comment spam (but in the blog world each individual spammed will still react as though the sky fell on them) and people find some interesting things by spidering the (now labelled) comment links and comparing them to author links, especially with a time dimension (are your readers ahead of you, always writing about things you haven't seen, or following you, often unknowingly repeating what you wrote six months earlier...)