MLA Forum
Vol. III, Issue 2, July 14, 2004

Preservation of Digital Information

By Nandita Mani, Librarian I, Shiffman Medical Library, Wayne State University Libraries, nmani@med.wayne.edu

Abstract

Digital preservation is of utmost importance to our mission of preserving a record of our culture. It is therefore imperative that the library profession be cognizant of the central issues surrounding digital preservation. Identifying optimal preservation methods and establishing standards are at the core of this mission. With further research and the implementation of national guidelines, the goal of long-term preservation of digital information can be attained.

 

We are in the midst of a digital era in which vast amounts of information exist in digital format. Will today’s digital information be here tomorrow, a year from now, or even 100 years from now? Are we on the verge of permanently losing our electronic information because of a lack of preservation efforts and standards? Understanding the issues and limitations we face in preserving digital information will enable us to develop a plan of action for our digitization efforts.

Digital preservation is defined as ‘the means of taking steps to ensure the longevity of electronic documents. It applies to documents that are both “born digital” and stored on-line or to the products of analog-to-digital conversion’ (Bullock, 1999). Because technology is continuously evolving at both the software and hardware levels, the method employed to preserve digital information will affect the kind of access to that information available in the future. Information from all eras needs to remain accessible, in its entirety; it is therefore essential that key elements be identified and put in place to ensure that today’s information will exist tomorrow and for years to come. Preservation in this sense is not just about saving information and its physical form. The concept of preservation has evolved over time, and its focus is now on ensuring access to information. If information cannot be accessed, the data is of virtually no use to anyone, and the attempt to preserve it has failed.

It was once thought that technology would make digital information indestructible and always retrievable; this is now known to be a false assumption. Media such as clay tablets and paper were widely expected to deteriorate over time and could not be depended upon, but many did not consider that digital information would face the same problems of preservation and long-term accessibility. For paper-based materials, the main concern has been finding methods to reduce the acidity of paper so that it can last indefinitely. Access to paper-based information posed less of a dilemma than access to digital information, because paper-based materials were normally bound as books and placed in libraries for patrons to use physically; anyone with access to a library had access to them. By assuming that all digital records are safe and not allocating enough resources to their long-term preservation, information professionals face a new threat to the accessibility of information.

Preservation techniques have a long history. When a paper item such as a book is preserved, all of its aspects are preserved together: its content, layout, format and physical appearance. The same does not automatically hold true for digital objects. Bullock (1999) notes that when preserving digital objects, the goal is to preserve the object as a whole, including its physical appearance, content, presentation, functionality, authenticity, provenance, context and the ability to locate it over time. In this context, it is essential to identify and evaluate other preservation techniques that have gained momentum, one of which is hard copy. Hard copy is often seen as a useful preservation technique because it simply involves printing the digital document and keeping it in a file folder or other physical holder for later use. The problem with hard copy is this: how does this form of preservation account for the interactive features that many digital documents possess? Audio files, images and streaming video within a digital document cannot be preserved by simply printing it. For this reason, hard copy is normally seen as a good secondary or tertiary preservation method: it gives access to the basic text of a document, so the document has in a sense been ‘preserved’ and can be located or consulted if the original digital version is lost or corrupted.

We live in an electronic world that changes before our eyes every day. What is considered new today may be considered outdated within six months, a phenomenon known as technological obsolescence. When reflecting on technological obsolescence, one must consider the storage medium on which data is held, since it has a great impact on the life expectancy of digital information. If the storage medium is in poor physical condition and has not been cared for over time, the longevity of the data it holds has already been compromised.

Several types of data storage ‘holders’ are available, such as magnetic tapes and optical disks. Regarding these holders, we must ask ourselves: Will the recording and playback features of these media remain as effective over time? Can vendors assure us that their products will work with older versions of software or hardware? Will their products be interoperable with competing hardware or software available at the time? These are some of the questions that arise when we consider technological obsolescence and its effect on digital media.

Regarding the lifespan of digital media, it has been said that most magnetic tapes can last 10 to 30 years (Hedstrom, 1995), although the 30-year figure is attainable only when the tapes are kept at optimal temperature and humidity levels. Optical disks are another popular medium. Whether they can last 100 or 200 years is still unknown, because many scientists reporting these figures say that longevity depends on environmental factors such as dust, water damage, humidity and temperature. Can it be assumed that all precautions will consistently be taken to ensure their long-term survival? Most likely not; it is not possible to control every environmental factor that can affect the preservation of optical disks or magnetic tapes. The next step, therefore, is to examine the tools used to save information.

According to Rothenberg (1998), examining the medium on which data are stored is essential, but it is equally important to examine the software packages used to house digital information, because these packages in turn depend on specific hardware and operating systems.

Any technical solution to the digital preservation problem needs to provide access to information for an indefinite amount of time. Successfully preserving digital information means preserving the look and feel of the original object: audio, video, images and text should all be preserved as they exist in their original form. Developing a long-term solution is essential if digital information is to be preserved efficiently and effectively. Every preservation technique will require some modification over time, but the key is that the modification remain minor. The chosen technique should allow new documents to be added without error and without risk of data loss or corruption.

Emulation

Is emulation the answer to this problem? According to Arms (1999), emulation is the replication of a computing system in order to process programs and data from an earlier system that is no longer available. If emulation were used as a primary preservation technique, libraries would have to decide what to emulate. Emulating an operating system makes it possible to run older applications in newer environments. For example, emulators for operating systems such as Unix or Windows XP could be helpful once those systems are no longer in use. These popular operating systems are currently in use, and although any given version may be short-lived, many software packages can run only on these particular systems.

On the question of what should be emulated, many supporters of emulation believe that the hardware platform should be emulated, because a single platform emulator can then serve many digital objects, whereas a software emulator may work with only a few. Emulating a hardware platform is thus often described as a one-to-many relationship: one item (the hardware platform) supports many digital objects. While it is important to consider what should be emulated, it is equally important to know what must be put in place to locate or retrieve a digital object over time. Jeff Rothenberg, who has written extensively on emulation, makes the point that the emulation approach should also include techniques for saving the metadata needed to find, access and recreate digital documents. With such detailed metadata, digital documents can be located with ease.
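As a rough illustration of Rothenberg’s point, the Python sketch below keeps, alongside each preserved object, a small metadata record describing how to find it and which platform must be emulated to render it. The record fields, the `find_by_platform` helper and the catalog entries are all hypothetical, invented for illustration; they are not drawn from any metadata standard or from Rothenberg’s own proposals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationRecord:
    """Hypothetical metadata kept with a preserved digital object."""
    identifier: str       # used to locate the object over time
    title: str
    original_format: str  # format the object was created in
    platform: str         # hardware/OS that must be emulated to render it
    emulator_notes: str   # how to recreate the original environment


def find_by_platform(records: list[PreservationRecord], platform: str) -> list[PreservationRecord]:
    """Return every record whose object depends on the given platform."""
    return [r for r in records if r.platform == platform]


# Invented example entries, for illustration only.
catalog = [
    PreservationRecord("obj-001", "Survey results", "WordPerfect 5.1",
                       "MS-DOS on x86", "Run under a DOS emulator"),
    PreservationRecord("obj-002", "Interactive map", "BBC BASIC program",
                       "BBC Micro", "Run under a BBC Micro emulator"),
]

for record in find_by_platform(catalog, "BBC Micro"):
    print(record.identifier, record.title)
```

The `find_by_platform` lookup also reflects the one-to-many argument above: a single platform emulator serves every object whose record names that platform.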

Emulation still appears to be largely theoretical. However, a project named CAMiLEON has been exploring it in practice. CAMiLEON (Creative Archiving at Michigan & Leeds: Emulating the Old on the New) was formed jointly by the University of Michigan and the University of Leeds in England. Its goal is to explore methods for the long-term preservation of digital records that retain the original functionality of the digital object. A further goal was to investigate technology emulation specifically as a long-term strategy for preserving and providing access to digital information. The project works with material from the 1970s and 1980s to show that one can move from an old system to a new one. CAMiLEON also aimed to evaluate publicly available emulators, to conduct test cases from both technical and user perspectives (using Apple II and BBC Micro computers), and to investigate the use of emulation for the BBC Domesday Project. The project also conducted user trials comparing original systems with emulations of those systems, and pursued a cost-benefit analysis of emulation compared with other digital preservation strategies (Granger, 2000).

Migration

Emulation is one method for long-term digital preservation; another, increasingly popular strategy is migration. Migration is a set of organized tasks designed to achieve the periodic transfer of digital materials from one generation of computer technology to a subsequent generation (Waters & Garrett, 1996). According to Wheatley (2001), migration can be broken down into minimum preservation, minimum migration, preservation migration, recreation, human conversion migration and automatic conversion migration. Minimum preservation refers to keeping a copy of the byte-streams that make up the original object. Minimum migration removes all special formatting and structure from a document and keeps only the basic text (ASCII characters), for the sole purpose of viewing the raw data.
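The difference between Wheatley’s two simplest strategies can be sketched in Python. Here `minimum_preservation` stores an exact copy of the byte-stream and, as an added assumption not part of Wheatley’s description, confirms the copy with a SHA-256 checksum; `minimum_migration` assumes the source document is HTML and reduces it to plain ASCII text. Both function names and the sample document are hypothetical.

```python
import hashlib
import tempfile
from html.parser import HTMLParser
from pathlib import Path


def minimum_preservation(original: bytes, destination: Path) -> str:
    """Store an exact copy of the byte-stream and return its SHA-256 digest."""
    destination.write_bytes(original)
    expected = hashlib.sha256(original).hexdigest()
    # Assumption: a checksum confirms the stored copy is bit-for-bit identical.
    if hashlib.sha256(destination.read_bytes()).hexdigest() != expected:
        raise OSError(f"copy at {destination} does not match the original")
    return expected


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document and ignores all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.parts.append(data.strip())


def minimum_migration(html_document: str) -> str:
    """Remove all formatting and keep only the basic ASCII text."""
    extractor = _TextExtractor()
    extractor.feed(html_document)
    extractor.close()
    # Text runs are joined with single spaces; non-ASCII characters are dropped.
    text = " ".join(extractor.parts)
    return text.encode("ascii", errors="ignore").decode("ascii")


doc = "<html><body><h1>Annual Report</h1><p><b>Café</b> budget: 100 units</p></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    digest = minimum_preservation(doc.encode("utf-8"), Path(tmp) / "report.html")
    print("stored copy verified:", digest[:12])

print(minimum_migration(doc))
```

The migrated text comes out as `Annual Report Caf budget: 100 units`: the heading, bold styling and the accented character are all gone, while the preserved byte-stream still contains everything but remains readable only with software that understands the original format.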

Preservation migration, according to Wheatley (2001), has three categories: basic, annotated and complex. In basic preservation migration, screenshots of the software in use are taken; in annotated preservation migration, textual descriptions and annotations are added to describe the data; and complex preservation migration captures even more descriptive information about the original object. Recreation, the fourth approach in Wheatley’s list, consists of recoding the digital object by hand, which can be a lengthy process. For example, retyping a Word document in current software and adding formatting and special details to match the original could take a great deal of time. Recreation seems feasible for small objects, but for more complex software objects it may become a problem.

Migration tends to be considered the best choice for short- and medium-term preservation and is widely used by many organizations. However, some do not believe that relying on migration is the optimal choice for the long-term preservation of digital information. One such critic is Jeff Rothenberg, who states that migration is ‘labor-intensive, time consuming, expensive, and is prone to data loss and corruption’ (Rothenberg, 1998). Several issues must be identified to evaluate migration thoroughly. With migration, not all information can be copied into other formats, and the exact presentation and functionality of a document may not be preserved. Successive migrations can also lead to data loss or corruption.
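A toy Python illustration of why successive migrations can compound loss: each hypothetical “migration” below moves text into a format with a more or less limited character set, and a single lossy step permanently discards information that no later migration can restore. The formats and function names are invented for illustration; real migrations involve far richer formats than character encodings.

```python
from typing import Callable

Migration = Callable[[str], str]


def to_latin1(text: str) -> str:
    """Hypothetical migration into a format that only supports Latin-1."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def to_ascii(text: str) -> str:
    """Hypothetical migration into a format that only supports ASCII."""
    return text.encode("ascii", errors="replace").decode("ascii")


def to_utf8(text: str) -> str:
    """Hypothetical migration into a format that supports all of Unicode."""
    return text.encode("utf-8").decode("utf-8")


def migrate(text: str, steps: list[Migration]) -> str:
    """Apply each migration in order, as successive technology generations would."""
    for step in steps:
        text = step(text)
    return text


original = "Café price: 5 €"
result = migrate(original, [to_latin1, to_ascii, to_utf8])
print(result)
print("identical to original:", result == original)
```

The final result is `Caf? price: 5 ?`: the euro sign was lost in the first step and the accented letter in the second, and the last migration to a fully capable format cannot bring either back.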

Cost of Preservation

The cost of migration depends on the type of migration performed. Minimum migration, for example, will be relatively inexpensive, since it requires minimal technical work. Automated complex migration, on the other hand, will cost more, because the data must be migrated again whenever the current format becomes obsolete, so costs recur over time. Clearly, many issues need to be addressed before migration can be used as the primary method for preserving digital information.

Whenever preservation processes such as migration or emulation are discussed, the question of cost arises. Will one approach be more expensive than the other, and are there ways to cut costs in some parts of the preservation process? The cost of long-term digital preservation remains largely unknown, and the key phrase is ‘long-term’. The technique chosen is one variable that will determine how much is spent over the course of preservation. Many other factors, such as the cost of maintaining hardware platforms and software packages, hiring technical staff, licensing agreements, vendor fees and data storage facilities, cannot be forecast, yet all of them will affect the cost of preservation.

Technology is changing rapidly, and so are its associated expenses; ten to 20 years from now, fees are likely to differ greatly from today’s. Therefore, although preservation carries tremendous costs, its results should not be measured solely in dollars. Whatever the cost, it will only be greater if digital preservation efforts are not continued. Information is invaluable, and one cannot place a dollar amount on its significance or worth: without a record of our culture, this information cannot be transmitted over time, and the record of our collective existence will not be documented for the future.

Preservation Standards

Another concern is that few established digital preservation standards exist. Many view digital preservation as a project in motion with no guidelines for librarians to follow, even though there is a desire and need for the preservation process to begin. The goal is uniformity: Library A and Library B should use the same preservation techniques so that, 20 or 30 years from now, today’s efforts prove reliable and digital information remains accessible. For this reason, national standards need to be implemented so that librarians and their counterparts across the nation understand what the preservation process requires, and so that the techniques used are consistent. Otherwise, relying on inconsistent preservation methods could bring a rude awakening in the future.

Many libraries are not starting large-scale digital preservation projects until a specific technique is developed and declared the optimal method for digital preservation. Consistency among participating libraries seems to be of utmost importance given the current state of the preservation process. If some libraries use migration as their primary technique while others use refreshing or emulation, how will anyone ensure that, once these processes are complete, the look and feel of the original object has been maintained without data loss or corruption? Preservation takes time, money, technical expertise and resources, so it is no wonder that many librarians are waiting for a standard before investing enormous amounts of time and money in digital preservation projects.

Policies and standards need to be implemented so that long-term digital preservation projects can be started with confidence. If librarians know that digital information is being preserved in the same manner across the nation, using the same technology and technical resources, preservation efforts will be consistent and librarians can feel secure knowing that no one is alone in this huge venture.

Recognizing that national standards are needed is the first step in facilitating the digital preservation process. The next step is determining what the list of requirements for librarians starting preservation projects should contain. First, the primary preservation method should be specified. For example, a standard might state that emulation should be used for digital documents that contain audio or image resources; such a statement names both the main preservation technique and the types of digital documents that require it. Second, all software and hardware requirements should be listed. If emulation requires a computer with a 2 GHz processor and a backup and recovery server running Windows NT, this needs to be stated. Finally, all required technical resources should be identified. If a library starting an emulation or migration project will need a computer programmer or network analyst to back up the data or carry out the emulation or migration, the standard should say so.

Thinking and planning ahead means accepting that the technology realm changes around us every day. Migration and emulation are the two leading preservation methods discussed here, and with national standards in place, migration appears crucial for preserving simpler data objects, while emulation seems the better method for preserving complex objects that incorporate software elements. Both need further testing to determine whether they can meet our goal of long-term access to and preservation of digital information. Although it seems unlikely that these techniques can provide complete and thorough long-term preservation of data, library and technology professionals must keep refining current techniques and developing new ones.

Clearly, it is vital to place great emphasis on the long-term preservation of digital information. Records of the past must be preserved and accessible if we are to understand the present and future; if access to this information is lost, knowledge of historical events and issues will be limited. Preserving information has always been a way to preserve a culture and transmit it over time. Preservation was practiced in ancient civilizations and remains a goal in all cultures today. The way information is preserved may change over time, but the goal remains the same: to ensure the long-term accessibility of information so that it can be transmitted from one generation to the next. Today’s world is full of information, and it is hoped that, through technical preservation techniques such as emulation and migration, this information will be accessible 100 years from now.

Librarians must continue to work together, and with their technical counterparts, to find solutions for preserving digital information. This is a cooperative effort in which librarians from all environments and information technology specialists must come together to identify the key issues and challenges in the preservation process. Without a team effort, preservation will not be efficient and effective. The survival of information from the past, present and future lies in our hands; therefore, the goal of long-term preservation of digital information must be kept at the forefront of our minds.

References

Arms, W.Y. (1999). Digital libraries. Cambridge, MA: MIT Press.

Berger, M. (1999). Digitization for preservation and access: A case study. Library Hi Tech, 17 (2), 146-151.

Bullock, A. (1999). Preservation of digital information: Issues and current status. Retrieved January 26, 2004 from http://www.nlc-bnc.ca/publications/1/p1-259-e.html.

Cleveland, G. (1998). Digital libraries: Definitions, issues and challenges. Retrieved January 25, 2004 from http://www.ifla.org/VI/5/op/udtop8/udtop8.htm.

Conway, P. (1996). Preservation in the digital world. Retrieved January 26, 2004 from http://www.clir.org/pubs/reports/conway2/index.html.

Digital Preservation Coalition. (2002). Digital preservation. Retrieved January 26, 2004 from http://www.dpconline.org/graphics/digpres/stratoverview.html.

Gilheaney, S. (1998). Preserving information forever and a call for emulators. Retrieved March 15, 2003 from http://www.archivebuilders.com/aba010.html.

Granger, S. (2000). Emulation as a digital preservation strategy. Retrieved January 27, 2004 from http://www.dlib.org/dlib/october00/granger/10granger.html.

Hedstrom, M. (1995). Digital preservation: A time bomb for digital libraries. Retrieved January 26, 2004 from http://www.uky.edu/~kiernan/DL/hedstrom.html.

Hedstrom, M. & Montgomery, S. (1998). Digital preservation needs and requirements in RLG member institutions. Retrieved January 26, 2004 from http://www.rlg.org/preserv/digpres.html.

Hoffmann, M. (2002). OS emulation. Retrieved January 27, 2004 from http://www.kearney.net/%7Emhoffman/softwindows98_review.html.

Holdsworth, D. & Wheatley, P. (2001). Emulation, preservation, and abstraction. Retrieved January 27, 2004 from http://www.rlg.org/preserv/diginews/diginews5-4.html#feature2.

Howell, A.G. (2000). Perfect one day—digital the next: Challenges in preserving digital information. Australian Academic and Research Libraries, 31 (4), 121-141.

Knutson, L. (1998). The challenges of preservation in a digital library environment. Current Studies in Librarianship, 22, 56-71.

Kuny, T. (1998). The digital dark ages? Challenges in the preservation of electronic information. International News, 17, 8-13.

Marcum, D. (1996). The preservation of digital information. The Journal of Academic Librarianship, 22, 451-454.

National Library of Australia. (2003). Digital preservation strategies. Retrieved January 27, 2004 from http://www.nla.gov.au/padi/topics/19.html.

Research Libraries Group. (1995). Preserving digital information: Final report and recommendations. Retrieved January 25, 2004 from http://www.rlg.org/ArchTF/tfadi.index.htm.

Ross, S. & Gow, A. (1999). Digital archaeology: Rescuing neglected and damaged data resources. Retrieved January 26, 2004 from http://www.ukoln.ac.uk/services/elib/papers/supporting/pdf/p2.pdf.

Rothenberg, J. (1995). Ensuring the longevity of digital documents. Scientific American, 272, 42-47.

Rothenberg, J. (1998). Avoiding technological quicksand: Finding a viable technical foundation for digital preservation. Retrieved January 26, 2004 from http://www.clir.org/pubs/reports/rothenberg/contents.html.

Russell, K. (1999). Digital preservation: Ensuring access to digital materials into the future. Retrieved January 26, 2004 from http://www.leeds.ac.uk/cedars/Chapter.htm.

Wadham, R.L. (1999). Digital preservation. Library Mosaics, 10 (5), 19.

Waters, D. & Garrett, J. (1996). Preserving digital information: Report of the task force on archiving of digital information. Retrieved January 26, 2004 from http://www.rlg.org/ArchTF/tfadi.index.htm.

Wheatley, P. (2001). Migration – A CAMiLEON discussion paper. Retrieved January 26, 2004 from http://www.ariadne.ac.uk/issue29/camileon/.