A Guide to Robots Exclusion Protocol

Google's Prashanth Koppula wrote a ready-to-bookmark post over at the official Webmaster Tools blog, covering the many Robots Exclusion Protocol (REP) directives and the various ways each can be implemented. Here's a rundown of the directives discussed and where each one goes:

Directives for the robots.txt file:
  • Disallow
  • Allow
  • $ Wildcard
  • * Wildcard
  • Sitemaps location
Meta tags for insertion into HTML:
  • NOINDEX
  • NOFOLLOW
  • NOSNIPPET
  • NOARCHIVE
  • NOODP
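To make these concrete, here's a hypothetical example of each kind (the paths, URLs, and combinations below are my own inventions for illustration, not from the post). First, a robots.txt combining the file-level directives:

```
User-agent: *
# Block a directory outright
Disallow: /tmp/
# Allow re-opens a path under a broader Disallow
Allow: /tmp/public/
# '*' matches any run of characters; '$' anchors the rule to the end of the URL
Disallow: /*.pdf$
# Point crawlers at the sitemap
Sitemap: http://www.example.com/sitemap.xml
```

And the meta tags, which go in a page's head section:

```html
<head>
  <!-- Don't show a snippet, a cached copy, or the ODP description -->
  <meta name="robots" content="NOSNIPPET, NOARCHIVE, NOODP">
</head>
```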
Of special note are the two different wildcard uses; the post links to usage examples for each. One additional fun bit is the explanation of NOARCHIVE, which the post describes as "Do not make available to users a copy of the page from the Search Engine cache." Contrast that with "Do not cache the page," which I believe is most people's idea of the tag's effect. I love little semantic hooks like that.
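The wildcard distinction is easiest to see in code. Here's a rough Python sketch of how a crawler might turn those path rules into matches (rule_matches is my own name, and this ignores real-world details like percent-encoding and rule precedence):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check a robots.txt path rule against a URL path.

    '*' matches any sequence of characters; a trailing '$'
    anchors the rule to the end of the path. Rules without
    '$' are plain prefix matches.
    """
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(pattern, path) is not None

# '*' wildcard: a rule can match anywhere within a path
rule_matches("/*/private/", "/en/private/page.html")   # True
# '$' wildcard: block only URLs that *end* in .pdf
rule_matches("/*.pdf$", "/docs/report.pdf")            # True
rule_matches("/*.pdf$", "/docs/report.pdf?x=1")        # False
# Without '$', rules are prefix matches
rule_matches("/tmp", "/tmp/file.html")                 # True
```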

The post notes that the directives above are observed by Google, Yahoo, and MSN/Live, which is a nice bonus. In addition, the post discusses some directives that only Google honors, such as UNAVAILABLE_AFTER (which I discussed about a year ago), NOIMAGEINDEX, and NOTRANSLATE.
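For what it's worth, these Google-only directives go in a googlebot-targeted meta tag; as I recall, UNAVAILABLE_AFTER takes an RFC 850-style date after which the page should drop out of results (the date here is invented):

```html
<meta name="googlebot" content="unavailable_after: 31-Dec-2008 23:59:59 EST">
```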
