Ecoinformatics Conference Service, Environmental Information Management 2008

Improving metadata search efficiency by enabling semantic queries

Chad W Berkley, Shawn Bowers, Matthew Jones, Mark Schildhauer

Last modified: 2008-08-21


Increasing amounts of digital ecological data are becoming available (e.g., over 15,000 datasets in the Knowledge Network for Biocomplexity alone), making it critically important to improve techniques for precisely locating and delivering relevant information when scientists search these resources. Semantic technologies hold the promise of enabling powerful "smart" search of online data archives. Here we describe how we are constructing semantic search features within the Metacat XML database system, which is used by many ecological research sites around the world to archive their data using a standardized metadata format. The prototype semantic search system in Metacat uses a set of OWL-DL ontologies whose concepts are linked to specific features and attributes of the Metacat data holdings through an XML-based annotation language. Queries are then resolved by a free, widely available reasoning engine, which uses the ontological structure to return effective search results. We have designed Metacat to store and access ontologies alongside the datasets and their associated annotations and metadata, so that any Metacat installation can make use of semantic queries. As data repositories continue to grow, these tools will be instrumental in helping scientists locate and interpret data for their research needs.
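To make the idea of annotation-based semantic search concrete, the following is a minimal sketch in Python, not the Metacat implementation. It assumes a toy ontology expressed only as subclass links and a toy table of annotations; every concept name, dataset identifier, and function name here is hypothetical. A simple transitive subclass lookup stands in for the OWL-DL reasoning engine, which supports far richer inference than this.

```python
from collections import defaultdict

# Hypothetical toy ontology: each concept maps to its direct superclasses.
SUBCLASS_OF = {
    "DryBiomass": ["Biomass"],
    "WetBiomass": ["Biomass"],
    "Biomass": ["Measurement"],
    "AirTemperature": ["Temperature"],
    "Temperature": ["Measurement"],
}

# Hypothetical annotations: (dataset id, attribute name) -> ontology concept.
ANNOTATIONS = {
    ("dataset.1", "dry_wt"): "DryBiomass",
    ("dataset.2", "biomass_g"): "Biomass",
    ("dataset.3", "temp_c"): "AirTemperature",
}


def subsumed_by(concept, ontology):
    """Return the concept plus every concept that is transitively a subclass of it."""
    # Invert the child -> parents map into parent -> children.
    children = defaultdict(list)
    for child, parents in ontology.items():
        for parent in parents:
            children[parent].append(child)

    # Depth-first walk down the subclass hierarchy.
    seen, stack = {concept}, [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def semantic_search(concept, ontology, annotations):
    """Return annotated attributes whose concept is the query concept or a subclass of it."""
    matches = subsumed_by(concept, ontology)
    return sorted(key for key, annotated in annotations.items() if annotated in matches)


if __name__ == "__main__":
    # A query for "Biomass" also finds the attribute annotated as "DryBiomass".
    for dataset_id, attribute in semantic_search("Biomass", SUBCLASS_OF, ANNOTATIONS):
        print(dataset_id, attribute)
```

In this sketch, a plain keyword search for "biomass" would miss the `dry_wt` attribute, whereas the query over annotations finds it because `DryBiomass` is declared a subclass of `Biomass`.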