
Links for Web Archiving

marigolds, front garden, home, Falmouth, Virginia, US

Possible Applications to Try:

Information about Website compression:

Tools:

Web tool that examines the size gains of website compression. Downloadable Website Analyser
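The kind of measurement such a tool reports can be approximated locally. The sketch below is only an illustration, not the analyser linked above: it assumes gzip is the compression in question, and the function name and the sample page are made up for the example.

```python
import gzip


def compression_savings(body: bytes, level: int = 6) -> dict:
    """Compare the raw size of a response body with its gzip-compressed size."""
    compressed = gzip.compress(body, compresslevel=level)
    original = len(body)
    # Guard against dividing by zero for an empty body.
    saved = 0.0 if original == 0 else 1 - len(compressed) / original
    return {
        "original_bytes": original,
        "gzip_bytes": len(compressed),
        "saved_fraction": saved,
    }


# Hypothetical sample page: repetitive markup, which gzip handles well.
sample_html = (
    "<html><body>" + "<p>Web archiving links</p>" * 200 + "</body></html>"
).encode("utf-8")

report = compression_savings(sample_html)
print(report)
```

Real pages are less repetitive than this sample, so the fraction saved will usually be smaller; the point is only to show what "size gains" means in practice.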

http://webarchivist.org/resources.htm

Below: Some annotated text taken from Web Archiving Resources, Office for Information Systems, Harvard University Libraries

Harvesting Services

ArchiveIt

A subscription harvesting service provided by the Internet Archive. Through a web based interface, users can capture, catalogue and archive their institution’s own web site or build additional collections, and then search and browse the collection when complete.
http://www.archive-it.org/

Harvesting Software

Open source harvesting software

Combine Harvesting Robot
http://www.lub.lu.se/combine/
Harvesting and indexing software written in Perl and C++, released under the GPL license. Once used (and possibly still used) by Swedish, Danish and Austrian archives. It is unclear whether it is still actively developed.

GNU Wget
http://www.gnu.org/software/wget/wget.html
A non-interactive command-line tool under the GPL license that can be used from scripts and other programs.
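As a rough illustration of using Wget from a script, the sketch below builds a mirroring command and runs it with Python's subprocess module. The helper names, the example URL and the destination directory are assumptions made for this example; the flags themselves are standard Wget options, but the right combination depends on the site being harvested.

```python
import shutil
import subprocess


def build_wget_command(url: str, dest_dir: str, wait_seconds: int = 1) -> list:
    """Build an argument list for mirroring one site with GNU Wget."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--page-requisites",   # also fetch images, CSS and other page assets
        "--convert-links",     # rewrite links so the copy browses offline
        "--no-parent",         # stay below the starting directory
        f"--wait={wait_seconds}",           # pause between requests
        f"--directory-prefix={dest_dir}",   # where to store the copy
        url,
    ]


def mirror_site(url: str, dest_dir: str) -> int:
    """Run the mirror and return Wget's exit code."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    return subprocess.run(build_wget_command(url, dest_dir), check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call mirror_site() to actually download.
    print(" ".join(build_wget_command("http://example.org/", "mirror")))
```

Keeping the command builder separate from the function that runs it makes it easy to inspect or log the exact command before any harvesting starts.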

Heritrix, Internet Archive and Nordic National Libraries
http://crawler.archive.org/
A robust web archiving harvester under the LGPL license. Has very flexible means to configure and control the harvest. Designed to be extensible by writing new Java modules. Configurable through a web interface. This work is sponsored by the IIPC (International Internet Preservation Consortium).

HTTrack
http://www.httrack.com/
An offline browser under the GPL license that can be used from a graphical interface or the command line.

Nalanda iVia Focused Crawler (NIFC)

http://ivia.ucr.edu/projects/Nalanda/

Designed to find Web resources with the same topic as a seed set of known resources. NIFC was created by Dr. Soumen Chakrabarti at the Indian Institute of Technology (Bombay), and further developed in collaboration with the iVia team.

Nedlib Harvester, Center for Scientific Computing – the Finnish IT Center for Science
http://www.csc.fi/sovellus/nedlib/
Developed as a part of the Nedlib project funded by the European Union. Written in C and dependent on the MySQL database. No longer supported or developed.

Commercial harvesting software

Internet Researcher, Zylox Software
http://www.zylox.com/
A Windows-only offline browsing tool with a graphical interface.

Offline Explorer and Mass Downloader, MetaProducts Software Corporation
http://www.metaproducts.com/mp/mpProducts_List.asp
Various offline browsing tools for Windows. The MetaProducts Offline Explorer Pro 2.1 is used by DACHS (Digital Archive for Chinese Studies) - http://www.sino.uni-heidelberg.de/dachs/

RafaBot, Spadix Software
http://www.spadixbd.com/rafabot/
A Windows-only offline browsing tool with a graphical interface. As well as supplying it with a list of URLs, you can give it search terms and RafaBot will use search engines to download all matching web sites.

SuperBot, Sparkleware
http://www.sparkleware.com/dl.html
A simple Windows-only offline browsing tool with a graphical interface.

SurfSaver, askSam Systems
http://www.surfsaver.com/
An add-on to Microsoft Internet Explorer.

Teleport Webspiders
http://www.tenmax.com/teleport/home.htm
Various sophisticated Windows-only versions with different interfaces (graphical, console, scriptable) and feature sets.

WebCopier, MaximumSoft Corp.
http://www.maximumsoft.com/index.html
An offline browsing tool with a graphical interface in multiple versions for different operating systems and performance/feature level.

Discovery, Display and Access Software

ARC Access Tools
http://archive-access.sourceforge.net/
Internet Archive’s list of tools for processing and accessing content in ARC files.

Kea

http://www.nzdl.org/Kea/

A GPLed tool for automatic keyword extraction from text documents. Originally written in a combination of Perl, C and Java; now available in an all-Java version. From the New Zealand Digital Library at the University of Waikato, New Zealand.

libiViaMetadata

http://ivia.ucr.edu/manuals/libiViaMetadata/current/

A GPLed C++ library for assigning descriptive metadata to web files. Developed under the iVia Project. Includes the PhraseRate program which is described at http://ivia.ucr.edu/projects/PhraseRate/

NutchWAX (Nutch + Web Archive eXtensions), Internet Archive and Nordic National Libraries
http://archive-access.sourceforge.net/projects/nutch/gettingstarted.html
A tool for indexing and searching web archives. Currently works only with the ARC format (http://www.archive.org/web/researcher/ArcFileFormat.php). Implemented as a Java servlet. Parsers can be added to handle different formats, e.g. xpdf for PDF files. This work is sponsored by the IIPC (International Internet Preservation Consortium).

Wayback, Internet Archive

http://archive-access.sourceforge.net/projects/wayback/

The open source version of the Internet Archive’s proprietary search and display interface, the “Wayback Machine” (listed next).

Wayback Machine, Internet Archive
http://www.archive.org/web/web.php
A proprietary interface to the Internet Archive’s huge collection of web pages archived from 1996 to the present.
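Archived copies in the Wayback Machine are addressed by a capture timestamp followed by the original URL. The sketch below builds such an address; the web.archive.org prefix and the 14-digit timestamp layout are stated here as assumptions about the public replay URLs rather than something taken from the listing above, and the function name is made up for the example.

```python
from datetime import datetime

# Assumed replay prefix; not part of the original listing.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(target_url: str, when: datetime) -> str:
    """Build a replay URL asking for the capture closest to `when`."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{timestamp}/{target_url}"


print(wayback_url("http://www.archive-it.org/", datetime(2009, 3, 26)))
```

If no capture exists at exactly that moment, the service is expected to redirect to the nearest one, so the timestamp works as a hint rather than an exact key.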

WERA (Web Archive Access), Internet Archive and National Library of Norway
http://nwa.nb.no/
An archive viewer application that gives Internet Archive Wayback Machine-like access to web archive collections, along with full text search and easy navigation between different versions of a web page. WERA is based on, and replaces, the NwaToolset. It uses the NutchWAX search engine and is written in PHP and Java. This work is sponsored by the IIPC (International Internet Preservation Consortium).

General Web Archiving Suites

Software that is more of a system of web archiving tools rather than individual applications

DataFountains

http://ivia.ucr.edu/manuals/DataFountains/1.0.0/

A tool for discovering, harvesting and describing web resources. Developed under the iVia Project.

PANDAS (PANDORA Digital Archiving System), National Library of Australia
http://pandora.nla.gov.au/pandas.html
Tools for controlling the harvest, conducting quality assurance checking, initiating archiving processes, managing the metadata including access restrictions, and producing management reports. Uses the HTTrack harvester. PANDAS was created to enable very selective harvesting and is not intended for large-scale automated harvests. The developers of this software are re-engineering PANDAS to use IIPC tools like Heritrix and WERA, and to be better integrated with their digital repository.

WebArchivist Software Suite, SUNY Institute of Technology and University of Washington
http://www.webarchivist.org/resources.htm
Tools for entering metadata, searching, analyzing and displaying archived sites. The software isn’t licensed yet but according to the product’s website the plan is to make this software available to other organizations. Used for the Library of Congress’ Election 2002 (http://lcweb4.loc.gov/elect2002/) and September 11 (http://september11.archive.org/) web archives as well as the Asian Tsunami Web Archive (http://tsunami.archive.org/).


A Place to Bury Strangers links for 2009-03-26

Waterfalls, Virginia, US

  • The headliner was A Place To Bury Strangers, who completely rocked the place. Though they were not an official SXSW band, they were one of the best bands I saw during the whole week. Towards the end of their set guitarist/vocalist Oliver Ackermann slammed his guitar on the stage and then proceeded to rip all of the strings off of it, which left a few people with their mouths open.

A Place to Bury Strangers links for 2009-03-25

Waterfalls, Virginia, US


A Place to Bury Strangers links for 2009-03-24

Waterfalls, Virginia, US


Coding and Security, Amazon Server, & A Place to Bury Strangers links for 2009-03-23

Waterfalls, Virginia, US

  • While most developers are proficient in several languages, today’s economic climate coupled with advances in technology has meant that oftentimes developers need to pick up a new language quickly. And although most developers are typically fluent in the security issues surrounding their specific languages and do their best to ensure that the code they produce is secure, security vulnerabilities in new language environments may not be as well understood.   

    Enter Fortify, a software security company that has organized security issues by both vulnerability category and by language so developers can easily ascertain the types of errors that have an impact on security.

  • Dave Winer yesterday announced EC2 for Poets, a step-by-step guide to help you create a server on Amazon’s EC2. His how-to is so easy to understand that we had our own server up and running within the hour. Sure, it may not seem like much that this fairly uninteresting page is sitting out there somewhere, but for this writer, it was an amazing coup.
    (tags: server amazon)
  • So the story – the long version – goes like this: A while back, we dropped our name in one of those enter-to-win free-European-vacation sweepstakes boxes at Ralph’s, and craziest thing – like, what are the odds, y’know? – we won! So all we had to do was figure out when to go, and that was when we noticed that A Place To Bury Strangers was touring Europe the first 2 weeks of April. Coincidence? Perhaps. So we decided to follow ’em around – after all, it’s spring break for us, and Widespread Panic’s not touring this year. But the kicker is that That Very Place To Bury Strangers actually took note of our travel plans, and asked us to play with them! Weird, wild stuff.
    (tags: aptbs)
  • Sitting in the internet shack at the Fader Fort, just after a nice set from The Strange Boys. The best band I have seen so far this week is A Place To Bury Strangers, and a close second is Here We Go Magic, who showed up late to the Austinist party at Mohawk earlier today but still had enough time to impress.
    (tags: aptbs austin tx)

Dilbert links for Object Oriented Analysis and Design class, links for 2009-03-19

Waterfalls, Virginia, US
Dilbert links for the Object Oriented Analysis and Design class, as we get going on the group project. I thought these might be entertaining and informative.


Future of the Web, Skywave, & A Place to Bury Strangers links for 2009-03-18

remodeling construction, home, Falmouth, Virginia, US

    Future of the Web 

  • Future of World Wide Web
    (tags: video web)
  • Skywave

  • While all you neu-shoegazers were still wearing Garanimals, I was scooping up Skywave seven inches without reading the price tag. Ackermann went on to help form A Place to Bury Strangers, which is alright, I guess, but Skywave had the songs. “Don’t Say Slow” is not at all Skywave’s only high point. Good luck finding their output (all of it was self-released, I believe) unless you live in the Fredericksburg, Va area, where they were based and didn’t seem to care to leave much.
    (tags: skywave)
  • A Place to Bury Strangers

  • Sunday brought one of the more interesting acts to be included on the bill, Brooklyn’s A Place to Bury Strangers, the self-proclaimed loudest band in Brooklyn and deliverers of total sonic annihilation. I caught the band during their last visit here in Austin, and the show was a complete sensory overload, with throbbing strobe lights, more smoke than a Dead show, and a volume rarely experienced at a live show these days. Singer/guitarist Oliver Ackermann once again led the trio through a ferocious set that attacked the auditory senses, slowly working himself into a frenzy that resulted in broken guitar strings while slinging the axe around on stage. Sonic annihilation that was absolutely a grand closure to a great weekend of music.

Stalking Death

Stalking Death was the first of Kate Flora’s books that I read.

This is a “Thea Kozak Mystery” and, like the others in the series, it traces the thoughts and actions of a protagonist who appears in several of Flora’s novels. Thea Kozak isn’t a private detective or police officer, but is a member of a consulting firm that works with private schools in New England that are facing a crisis. In this case the school’s headmaster wants to expel a student who he claims has been a problem. The student claims she is being stalked. A murder is committed along the way, Thea is physically threatened and attacked, and the plot twists through a backdrop of deception, hatred, and violence. Still, it is written in a way that keeps you involved, wanting to know how the story unfolds.

I’ve read several of her books since this one, and have others on my dresser to get to next. I am glad to have found her stories. Building them in a series based on a main character, such as Thea Kozak, allows Kate Flora to keep the reader involved in a story that spans several works, and the experience is enhanced if you’ve read several of the books in a series.


Love Disk & A Place to Bury Strangers links for 2009-03-17

spring snow, home, Falmouth, Virginia, US

  • Backed by a solid rhythm section, frontman Oliver Ackermann is free to unleash the full potential of his own Death by Audio effects without compromising the appeal of the songs. However, the presence of a single guitarist means that several of A Place To Bury Strangers’ songs resemble each other in their stripped-down linearity. This record isn’t so much a concise album as a pell-mell collection of songs written during different time periods, and this is painfully apparent in the tremendous shift in style from one track to another. Expanding into a four-piece band could open new avenues of sonic experimentation to this group.
    (tags: aptbs review)
  • She talked to two bands about the mid-March festival down in Austin: first, A Place to Bury Strangers, a band composed of a drummer, a bassist, a guitarist and a gigantic wall of noise and feedback. Then, she chatted with one-half of Ra Ra Riot, a six-member group that features strings and catchy melodies. They offered up what they’re excited about (mostly free stuff), who they’re excited to see and what not to pack.
    (tags: aptbs sxsw)
  • Oliver Ackermann, founder of Death By Audio, has created Total Sonic Annihilation, an experimental effects pedal that is unlike any other that has been produced in the past.
  • One of the greatest examples of interactive media I’ve ever seen came burned onto a CD-ROM bundled with an annual awards issue of ID magazine. Love Disc 95, the work of RISD graduates Paul Kim and Karl Ackermann, was a series of stream-of-consciousness mini games of clickety-click bliss, navigated with a little yappity dog avatar (there is a website, but unfortunately it appears unloved and abandoned).
    (tags: karl lovedisk)

Jobs & Wozniak, Wiki selection and use, & A Place to Bury Strangers links for 2009-03-16

begonias and violets in a sunny window, home, Falmouth, Virginia, US
