Unearthing defunct and outdated Web pages

By IKU KAWACHI

Wanting to look at older versions of a Web page to see what changes have been made to its content, or at a once-working Web page that is now defunct, is not uncommon, especially in online reporting. Many of us are probably already familiar with Google’s Cached Links feature, which gives access to the most recent version of a Web page that Google saved as its servers crawled the Web. To look at the cached version, all one needs to do is click the “Cached” link placed under most search results.

But what if we need to look at an even older version of a Web page, determine when a page was last updated, or even plot the changes that have been made across a given time period? The Internet Archive, a non-profit digital library founded in 1996 that offers “free universal access to books, movies & music,” can often provide a solution. While its extensive selection of archived multimedia content is worth perusing, too, its most intriguing (and best-known) feature is the Wayback Machine, an archive of 150 billion Web pages that can be accessed through a search engine and works in much the same way as Google’s Cached Links. The Wayback Machine is much more comprehensive, though, in that it dates each cached version and places an asterisk next to versions that include changes in content. Popular Web sites, such as CNN.com and Blogger, have thousands of cached pages in the Wayback Machine’s database.

Entering the URL for The New York Times, for example, brings up 1,316 results, the oldest of which is dated Nov. 12, 1996. Clicking on one of the numerous versions dated Sept. 11, 2001, reveals some of The Times’ earliest articles on the terrorist attacks on New York and Washington, D.C., that took 3,000 lives.
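For readers who would rather script such a lookup than type into the search box, here is a minimal sketch in Python. It assumes the Wayback Machine's public address pattern, in which a date stamp and the original URL are appended to `https://web.archive.org/web/`, and in which an asterisk in place of the date stamp lists all captures. The function names `snapshot_url` and `calendar_url` are made up for this illustration, and the code only builds addresses; it does not contact the archive.

```python
from datetime import date

# Assumed base address of the Wayback Machine's archived-page URLs.
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(page_url: str, day: date) -> str:
    """Build an address asking for a capture of page_url on (or near) the given day."""
    stamp = day.strftime("%Y%m%d")  # e.g. 2001-09-11 becomes "20010911"
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


def calendar_url(page_url: str) -> str:
    """Build an address for the overview of all captures of page_url."""
    return f"{WAYBACK_PREFIX}/*/{page_url}"


if __name__ == "__main__":
    # The New York Times home page on the date mentioned above.
    print(snapshot_url("http://www.nytimes.com/", date(2001, 9, 11)))
    print(calendar_url("http://www.nytimes.com/"))
```

Pasting either printed address into a browser should lead to the archive's closest match, if one exists for that page and date.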

The site even allows one to select and compare two different versions of the same page, highlighting the differences much as the “Track Changes” feature in Microsoft Word does. The Wayback Machine is a robust, if not particularly sophisticated, tool that can prove exceedingly valuable to students and journalists alike looking to uncover versions of Web pages that are now “hidden” from the public.
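The same kind of comparison can be approximated offline once the text of two versions has been saved. The sketch below uses `difflib` from Python's standard library; it is not the Wayback Machine's own method, and the function name `changed_lines` and the sample text are invented for illustration.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return lines removed ("- ") from the old text or added ("+ ") in the new text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Hypothetical text of two saved versions of the same page.
    older = "Headline A\nBody text\nFooter"
    newer = "Headline B\nBody text\nFooter"
    for line in changed_lines(older, newer):
        print(line)
```

Lines that begin with a minus sign appear only in the older version, and lines that begin with a plus sign appear only in the newer one.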
