Research Overview

CRATE
Two problems complicate web site preservation: the inability to confidently enumerate all of a site's resources (the counting problem) and the lack of sufficiently detailed information about each resource (the representation problem). My dissertation research focuses on this area, specifically on how the web server itself can provide both a resource and its preservation-related metadata in a single HTTP GET response. The idea resembles a museum moving an object from display to storage: the object goes into a crate together with whatever curation information is required. CRATE (not an acronym) is both the process and the resulting complex object. I am implementing the CRATE model by redesigning the architecture of mod_oai, an Apache module capable of building these complex objects. For more information, see the CRATE section of the publications page.

Lazy Preservation: The Gray Web

Search engines such as Google, Yahoo, and MSN often cache the pages they visit. This is particularly helpful when a page is temporarily unavailable: you can still view its content by clicking on the cached version. Congressman Foley's web site, for example, was quickly reconstructed by a number of people after the infamous site was removed from his server. I've christened this underlying web infrastructure the Gray Web. But how much of a site is actually cached? If I woke up one morning to find that my web hosting company had gone out of business, could I recover my web site's content from search engines' cached pages? The Lazy Preservation project examines how practical this approach is for disaster recovery of a web site. For more information, see the Gray Web section of the publications page.

Digital Archives

One of history's great tragedies was the burning of the library at Alexandria. Most of its works were irreplaceable, one-of-a-kind, and reputedly several centuries old.
Similar tragedies have occurred in recent times, when modern libraries housing unique ancient manuscripts burned to the ground. What about our digital libraries? Because of copyright and other intellectual-property restrictions on reproduction, some works could be impossible to replace if the digital media holding them were destroyed or damaged. Backing up these collections takes money and effort. Can we use existing infrastructure such as Usenet as a replication tool? Sponsored by an NSF grant, the Digital Archiving Project examines the feasibility of this approach for web sites. See the Digital Architecture section of the publications page for more information.

Impact Factors

Every year the ISI publishes the Journal Citation Reports (JCR), which evaluates the "impact factor" of the top research journals around the world. Essentially, it counts how often other researchers cite articles in a particular journal. The more frequently a journal's articles are cited, the higher its impact factor and the higher it ranks. The same kind of citation counting is used to gauge individual researchers and the impact their work has on the community. Such rankings often figure in decisions about a professor's tenure, promotion, pay increases, and so on. The web appears to be affecting the validity of this approach to measuring research impact. For more information, see the Impact Factors section of the publications page.
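As commonly defined for the JCR, a journal's two-year impact factor for year Y is the number of citations received during Y by items the journal published in Y-1 and Y-2, divided by the number of citable items it published in those two years. The sketch below shows only that arithmetic; the function name `two_year_impact_factor` and the numbers in the example are hypothetical illustrations, not real JCR data or any official tool.

```python
def two_year_impact_factor(citations_to_prev_two_years: int,
                           citable_items_prev_two_years: int) -> float:
    """Hypothetical helper: two-year impact factor for a year Y.

    citations_to_prev_two_years: citations received during year Y to items
        the journal published in years Y-1 and Y-2.
    citable_items_prev_two_years: number of citable items (e.g. articles,
        reviews) the journal published in years Y-1 and Y-2.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("journal must have at least one citable item")
    # Average number of citations per recent citable item.
    return citations_to_prev_two_years / citable_items_prev_two_years


# Hypothetical journal: 200 citable items in the two prior years,
# cited 500 times during the current year.
print(two_year_impact_factor(500, 200))  # 2.5
```

Because the score is an average over a journal's recent items, it says nothing directly about any single article or author, which is one reason extending it to rank individual researchers is contentious.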
© Joan A. Smith 2008