Jason Kottke noted a significant change in the “robots.txt” file for whitehouse.gov. “Robots.txt” is a file that tells web crawlers (Google’s indexing program, for example) what to index and what not to index.
The old whitehouse.gov site gave web crawlers 2,400 lines of instructions about what not to include. The new whitehouse.gov site gives two lines of instruction:
User-agent: *
Disallow: /includes/
This tells web crawlers not to index the “includes” directory on the site; everything else is fair game.
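To see what those two lines mean in practice, here is a minimal sketch using Python’s standard-library robots.txt parser; the specific page URLs are illustrative examples, not pages confirmed to exist on the site.

from urllib.robotparser import RobotFileParser

# Feed the new whitehouse.gov rules to the parser.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /includes/",
])

# Anything under /includes/ is off-limits; everything else may be crawled.
print(rp.can_fetch("*", "http://www.whitehouse.gov/blog/"))              # True
print(rp.can_fetch("*", "http://www.whitehouse.gov/includes/nav.html"))  # False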
Shelly says
Third-party content is also licensed under a Creative Commons Attribution license (unless otherwise specified), which I think is symbolic of the difference in the two administrations’ tones. (Obviously, original content is not protected by copyright, as it’s a work of the US government.)
Justin says
For those not aware, an include file is simply a reusable piece of code, like a navigation bar or logo, that gets used over and over on other pages in the site. Includes are mostly a matter of expedience for web developers and other behind-the-scenes wizardry (a short sketch of the idea follows this comment).
The best part is that this effectively means the entire site can be fully searched. Nothing is hidden from Google and other search engines.
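To make that concrete, here is a small, hypothetical sketch in Python (whitehouse.gov’s actual server-side setup isn’t specified here) of what an include accomplishes: a fragment written once and reused on every page.

# Hypothetical example: one shared navigation fragment, reused by every page.
NAV_INCLUDE = '<div id="nav"><a href="/">Home</a> <a href="/blog/">Blog</a></div>'

def render_page(title, body):
    # Each page pulls in the same fragment instead of duplicating the markup.
    return "<html><head><title>%s</title></head><body>%s%s</body></html>" % (title, NAV_INCLUDE, body)

print(render_page("Blog", "<p>Latest posts...</p>"))
print(render_page("About", "<p>About this site...</p>"))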
Doug says
Thanks for that information on the “include.” I ran a brief search trying to get a layman’s explanation but was unsuccessful. I suspected it was there for good reason but wouldn’t be terribly interesting to the Googling public.