Building the Framework for a Virtual Data Center for Ecology and the Environmental Sciences
William Michener
Last modified: 2008-08-21
Abstract
Data centers (also referred to as data archives or data repositories) have been created to preserve data and explanatory documentation (i.e., metadata), support discovery of data by searching (e.g., by location, time, taxa, keywords), enable data access, and sometimes support data processing and integration with other data. In addition to their core functions, some data centers provide other services such as research and development, help-desk support, training, and outreach. Data centers are vital to science because they can provide secure and permanent repositories for the data and information that are legacies of the scientific enterprise, and they can facilitate new research and synthesis efforts.
Despite the proliferation of data centers throughout science, the discovery, acquisition, and integration of the disparate data needed to address the grand environmental challenges are exceptionally difficult, time-consuming, and expensive to achieve. Reasons for this include insufficient metadata, heterogeneity of data and metadata standards, lack of interoperability solutions across data centers, organizational and funding instabilities, and a poorly developed scientific culture of data sharing and data stewardship.
Because many key needs are not presently being met, we propose a new type of organization--a virtual data center--that can bind together existing data centers and provide seamless and straightforward discovery and access to the broad array of data, information, and analytical resources needed to address current and emerging scientific challenges. We present the steps involved in forming such a center, including the principles that should guide its organization, required functionality, opportunities for leveraging existing cyberinfrastructure, and potential funding mechanisms. This poster highlights results from a series of data workshops hosted by the Ecological Society of America and supported by the NSF, as well as three proposed implementation efforts (Dryad, INTEROP, and DataNet).