In 2007, I created four research sites to test robot (web crawler) behavior. The sites used "real" content, i.e., quotations from noncopyrighted, classic English-language works, so that they would not appear to be simply "spam." The four sites were split between dot-com and dot-edu domains: CRATE.com and ODUCRATE.com on a commercial server, and Blanche-00.edu and Blanche-02.edu on servers at ODU. The sites were created and organized in a way that makes it simple to visualize robots' paths through them. A quick preview of Google's traversal of one of the sites is shown in the animated GIF below:
March of the Googlebots
The "March of the Googlebots" is an animated view of Google's robots crawling one of the experimental sites. Each blue X represents a "GET" request. Each red X represents a conditional GET request, i.e., Google asking whether the page has changed since its last visit. The spread of gray in the background shows the links that have already been visited, so the background becomes fully gray once all links have been retrieved by Google at least once. Notice that Google continues to revisit various links. The animation covers late February 2007 through September 2007. We collected data for a year; more graphs and a discussion of our findings can be read in our March 2008 D-Lib Magazine article, Site Design Impact on Robots. Information on other experiments in this vein can be found on the Publications page.
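
For readers unfamiliar with the difference between the two request types in the animation, here is a minimal Python sketch of how a server might answer a plain GET versus a conditional GET. It is an illustration only, not the code used in the experiment: the function name `status_for_get` and the sample dates are hypothetical, and the sketch assumes the crawler sends the standard HTTP `If-Modified-Since` header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def status_for_get(last_modified, if_modified_since=None):
    """Return the HTTP status a server might send for a GET request.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: value of the If-Modified-Since header, or None
    for a plain (unconditional) GET.
    """
    if if_modified_since is None:
        return 200  # plain GET: always send the full page

    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: treat as a plain GET

    if since.tzinfo is None:
        # A "-0000" zone parses as naive; treat it as UTC (assumption).
        since = since.replace(tzinfo=timezone.utc)

    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the crawler's copy is still current
    return 200  # page changed since the crawler's last visit


if __name__ == "__main__":
    # Hypothetical timestamps chosen to fall inside the crawl period.
    page_changed = datetime(2007, 2, 28, 12, 0, 0, tzinfo=timezone.utc)
    last_visit = format_datetime(page_changed, usegmt=True)

    print(status_for_get(page_changed))              # plain GET
    print(status_for_get(page_changed, last_visit))  # conditional GET
```

In this sketch, a conditional GET costs the server only a short 304 response when the page is unchanged, which is one reason a crawler revisiting known links may prefer it.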
