At Libraries, Taking the (Really) Long View

Storing digital data is becoming more essential to the work of librarians, who are trying to think in terms of the next 100 years -- a virtual eternity in computer time.
July 23, 2008

One of the benefits of digitally encoded content is that it can't deteriorate. With files that consist of 1's and 0's, there are no pages to turn yellow or brittle, tape to demagnetize or bindings to snap. In theory, that would be a boon to libraries that devote boundless resources to preserving old documents, ancient texts and even videos recorded in Betamax.

But as libraries shift more of their resources to holdings that either originate as digital or become digital through scanning, it's become clear that just because something lives in the virtual stacks doesn't mean it will be around forever. Anyone who's ever suffered through a hard drive crash (or tried futilely to save a scratched DVD) has faced the inherent physical limitations of digital storage. Now librarians are having to do the same as they determine how digital holdings fit into their central mission: preserving works so that they can be accessed not just today, not just tomorrow, but indefinitely.

And for anyone who's also worked through a mere "upgrade" in file formats or e-mail clients, it's probably not a stretch to assert that in computer time, 10 years might as well be infinity. What does that make 100?

So, in a literal race against time -- but one with a perpetually receding deadline -- librarians from research universities and other institutions around the world are collaborating to tackle a whole host of problems that so far have no satisfactory solution. They include hardware complexities, such as constructing storage devices that continuously monitor and repair data while remaining easily scalable; redundancy measures, such as distributing and duplicating data across storage devices and even across the country; universal standards, such as formats that could conceivably remain readable in the distant future; and interfaces, such as open software protocols that manage digital holdings and make them accessible to the public.

Some of the solutions are still in development, while others are piecemeal. Various institutions are trying different approaches, and some corporations are competing with one another while others collaborate on open-source approaches.

“For the most part, they’re all untested. None of the solutions have withstood the test of time yet,” said Michael Witt, an assistant professor of library science and interdisciplinary research librarian at Purdue University.

Coming Down to Earth

If worries about digital preservation seem premature, or overly pessimistic about an eventual solution, it's worth comparing how well traditional holdings have been restored with how comparable digital records have fared. In 1976, NASA's Viking landers sent back reams of data from Mars, where they were scouring for possible evidence of extraterrestrial life. Unfortunately for scientists, the magnetic tapes used for storage became brittle and nearly unusable even after the space agency made considerable efforts to keep them in a properly controlled environment. Beyond the physical obstacles, moreover, scientists in the late 1990s found that they couldn't read the data format anyway -- and they had to crack open the original (analog) printouts and retype the data.

That experience, recounted in a 2006 report from Britain's Digital Preservation Coalition, was one of several that helped to jump-start a movement among librarians, information technology specialists and others concerned with the real possibility that much of today's digital material is not only in flux but in danger of being lost in the ether altogether.

"The state of things is that we’re in the digital dark ages right now," Witt said. "We’re losing a ton of valuable information that is electronic because of the transient nature of the Internet and of storage technology and how people use it."

Tom Cramer, the associate director of digital library systems and services at Stanford University, said that NASA's inadvertent discovery -- that even machine-produced data can be lost to the environment or obsolescence -- echoes his own experience. Closer to home, Stanford's library was tasked with helping the Monterey Jazz Festival preserve its historical recordings from decades ago. Out of hundreds of tapes taken from nearly 40 years of recording history, Cramer said, only one couldn't be recovered. But audio from a digital format the festival began using in the 1990s wasn't as reliable: out of scores of those tapes, covering about six years, six were damaged beyond recovery.

So digital preservation encompasses not only the problem of reliable storage and recovery but also the questions of how to finance it, how to manage it and how to make such systems sustainable over the long run. For that to happen, though, enough institutions have to participate. The British report, "Mind the Gap," found that although a slight majority of respondents in the United Kingdom said they had an institutional commitment to addressing the issue, only 20 percent said there was enough funding to tackle it, a third said there were "clear responsibilities" for handling it, and only 18 percent said there was a strategy for digital preservation at all.

Still, Stanford has been one of the pioneers in developing solutions to digital preservation, especially through its Silicon Valley ties to Sun Microsystems, which last year set up the Sun Preservation and Archiving Special Interest Group, or PASIG, to bring together leaders in research libraries, universities and the government to periodically meet and collaborate on digital archiving issues.

"We are trying to meet the needs of the evolving 'cybrarian' community that is grappling with storage and data management, workflow and high-level architecture trends in the area of preservation and archiving," said Art Pasquinelli, Sun's education market strategist, in the initial announcement.

One project Cramer has been working on is the Stanford Digital Repository, which he said currently hosts geospatial data as well as content from other scholarly sources. The SDR, according to its Web site, provides "a trusted environment for long-term digital information storage and preservation activities."

As the project's description implies, the trust issue is an important one for librarians. The fragility of partnering with companies was reinforced last month when Microsoft announced that it would discontinue its Live Search Books project that helped research universities scan books and journals to be accessed digitally. For many librarians, it was a signal -- or a reminder -- that corporate partnerships, while in many cases helpful financially, can raise questions not only of ownership, but of reliability over the short term (let alone the long view).

"I wouldn’t rely on [corporate sponsorships] as the sole source for digitizing and preserving and providing access to my materials. I think it’s very dangerous to go down that road both for reasons of the integrity of the information, any kind of ethical ... issues that may arise," said Sarah Houghton-Jan, a blogger and the digital futures manager at the San Jose Public Library, which is run in partnership with San Jose State University.

As a result, many developers have instead been taking the open-source route, collaborating and building on each other's code. Already, there are three established "repository" packages -- software that manages, organizes and allows access to online materials. Fedora, one of the major ones, has about 130 registered institutions and logged about 25,000 downloads over the past 12 to 18 months, said Sandy Payette, a researcher at Cornell University and the executive director of the foundation that supports the software. (The other popular repository solutions are DSpace and EPrints.)

"Some of the principles and elements of open source software communities really reinforce ... the principles of digital preservation," Cramer said, noting that "[y]ou don’t want any black boxes ... because when someone starts taking your content and modifying it in ways that aren’t apparent to you, you’re kind of at the whims" of the company you're working with.

The Biggest Obstacle?

Technology aside, however, the biggest obstacle to a universal, agreed-upon solution might sound familiar: "The biggest challenge is actually related to human beings," said Witt. Libraries need to acknowledge the problem they face and work it into their management structure.

Already, he said, libraries are starting to hire "digital preservation officers."

"But really, if you're going to have some assurance from an institutional standpoint that someone is stewarding these objects … [there's] a human resources issue."

Houghton-Jan summed up the daunting task facing libraries like this: "The clarity is that there is no set course, and that things are very much in the air. It’s nice to have clear uncertainty at the very least, I guess."

