Ecoinformatics Conference Service, Environmental Information Management 2008

Archival Data Formats - Archivists and Users

Dave Rugg

Last modified: 2008-08-21


Choosing data file formats for a data archive offers an opportunity to influence the archive's operating costs. This is true even for the most common types of data: numbers and text. Operational aspects that format choices can affect include archive maintenance, archive growth, delivery of data to customers, and the creation of value-added products. There is usually a trade-off between minimizing the cost of initial processing for a data set submitted to the archive and creating an archival data package designed for long-term persistence. A second trade-off arises between the format best suited to the archive and the format best suited to the archive's customers. The variety of data sets adds further complexity to the archivist's planning: some data sets are small and others large, and customer demand varies from one data set to the next. No single data file format seems likely to meet all of these needs satisfactorily. Instead, the archivist should aim to keep the set of file formats used in the archive as small as possible. Fortunately, the information technology community has developed the eXtensible Markup Language (XML), a format that can help us achieve a high degree of compactness in that set.