April 10th, 2013

Improving Data Discovery for DataONE

Line Pouchard

Information Scientist, Scientific Data Group
Oak Ridge National Laboratory

Earth and climate systems scientists produce and consume large amounts of diverse, multi-scale data that are kept in numerous institutional data centers, each with its own mission and policies. DataONE is a federation that provides access, curation, preservation, and specialized tools to ensure the discovery, re-use, and provenance of these data. Data discovery in DataONE is based on XML metadata documents representing datasets. Each federated member (a so-called DataONE Member Node) provides keywords to DataONE. The DataONE search interface, OneMercury, presents the metadata documents as search results organized in facets, including authors, originating data centers, location of experiment, and keywords. DataONE presents rich metadata and data links as search results to a scientist looking for datasets to address interdisciplinary problems in the Earth sciences. In this talk we will present two approaches to improving data discovery in DataONE. In the first approach, we deployed a local instance of the Stanford NCBO BioPortal, populated it with ontologies relevant to the Earth sciences, and obtained expanded search results as well as context for search keywords. In the second approach, we focused on expanding the DataONE keyword index by applying topic modeling to DataONE metadata documents to suggest new keywords.

About Line Pouchard

Dr. Line Pouchard is an Information Scientist in the Scientific Data Group at Oak Ridge National Laboratory, US Department of Energy. She is a co-founder of, and an active participant in, the DataONE Integration and Semantics Working Group. Her recent work includes implementing semantic technologies to improve data discovery in the Earth and atmospheric sciences for the NSF-sponsored DataONE. Her long-term research interests have focused on ontologies and the implementation of frameworks for scientific applications of interest to the Departments of Energy and Defense. These interests have been applied to the scientific domains of climate and Earth sciences, fusion, medical modeling, and homeland security. She is an active participant in several leading ORNL efforts contributing to other agencies, including the NSF-sponsored Remote Data Visualization and Analytics.