A Guide to Robots Exclusion Protocol

Google's Prashanth Koppula wrote a ready-to-bookmark post over at the official Webmaster Tools blog, covering the many Robots Exclusion Protocol (REP) directives and the various ways each can be implemented. Here's a rundown of the directives discussed and where each one goes:

Directives for the robots.txt file:
  • Disallow
  • Allow
  • $ Wildcard
  • * Wildcard
  • Sitemaps location
Meta tags for insertion into HTML:
  • NOINDEX
  • NOFOLLOW
  • NOSNIPPET
  • NOARCHIVE
  • NOODP
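To make these concrete, here's a hypothetical example of each kind (the paths, URLs, and combinations below are my own inventions for illustration, not from the post). First, a robots.txt combining the file-level directives:

```
User-agent: *
# Block a directory outright
Disallow: /tmp/
# Allow re-opens a path under a broader Disallow
Allow: /tmp/public/
# '*' matches any run of characters; '$' anchors the rule to the end of the URL
Disallow: /*.pdf$
# Point crawlers at the sitemap
Sitemap: http://www.example.com/sitemap.xml
```

And the meta tags, which go in a page's head section:

```html
<head>
  <!-- Don't show a snippet, a cached copy, or the ODP description -->
  <meta name="robots" content="NOSNIPPET, NOARCHIVE, NOODP">
</head>
```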
Of special note are the two different wildcard uses; the post links to usage examples for each. One additional fun bit is the explanation of NOARCHIVE, which the post describes as "Do not make available to users a copy of the page from the Search Engine cache." Contrast that with "Do not cache the page," which I believe is most people's idea of the tag's effect. I love little semantic hooks like that.
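The wildcard distinction is easiest to see in code. Here's a rough Python sketch of how a crawler might turn those path rules into matches (rule_matches is my own name, and this ignores real-world details like percent-encoding and rule precedence):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check a robots.txt path rule against a URL path.

    '*' matches any sequence of characters; a trailing '$'
    anchors the rule to the end of the path. Rules without
    '$' are plain prefix matches.
    """
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(pattern, path) is not None

# '*' wildcard: a rule can match anywhere within a path
rule_matches("/*/private/", "/en/private/page.html")   # True
# '$' wildcard: block only URLs that *end* in .pdf
rule_matches("/*.pdf$", "/docs/report.pdf")            # True
rule_matches("/*.pdf$", "/docs/report.pdf?x=1")        # False
# Without '$', rules are prefix matches
rule_matches("/tmp", "/tmp/file.html")                 # True
```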

The post notes that the directives above are observed by Google, Yahoo, and MSN/Live, which is a nice bonus. In addition, the post discusses some directives that only Google honors, such as UNAVAILABLE_AFTER (which I discussed about a year ago), NOIMAGEINDEX, and NOTRANSLATE.
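For what it's worth, these Google-only directives go in a googlebot-targeted meta tag; as I recall, UNAVAILABLE_AFTER takes an RFC 850-style date after which the page should drop out of results (the date here is invented):

```html
<meta name="googlebot" content="unavailable_after: 31-Dec-2008 23:59:59 EST">
```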
