Integrating Ecological Data: Notes from the Grasslands ANPP Data Integration Project
Last modified: 2008-08-21
Trends in annual aboveground net primary productivity (ANPP) at regional and global scales are an important indicator of how ecosystem structure and function vary across spatial and temporal gradients in a changing world. Ecologists are interested in cross-site or large-scale integration and analysis of annual ANPP values, but are often hindered by the lack of standard data collection methodologies, data management practices, and detailed metadata documentation across sites. The Grasslands ANPP Data Integration (GDI) project has brought together experts in ecology, information management, and computer science to address the challenges of integrating ANPP data. Together, we have created a centralized database of annual ANPP data and metadata from five national and international Long Term Ecological Research (LTER) grassland sites. The database stores ANPP data at the level of granularity appropriate to each site, but standardizes vegetation species codes and sampling location metadata to facilitate cross-site comparison. This approach matters to local ecologists and information managers because no data are lost, and the data can still be aggregated to the level of granularity required for statistically valid cross-site analysis. The GDI database facilitates transformation, integration, and exploration of site-specific ANPP data, as well as preliminary cross-site statistical analyses and synthetic research. The GDI team has created processes and tools that will enable future warehousing of ANPP data by streamlining data insertion, update, and integration, as well as the documentation of standard metadata and species information. This paper describes the GDI data model, the data transformation and integration techniques, and the quality assurance standards. It also presents lessons learned that may apply to other ecological and scientific data integration efforts.