Bio-Informatics Information Integration

Georgia State University:

Vijay Vaishnavi
vvaishnavi@gsu.edu

Art Vandenberg
avandenberg@gsu.edu

Susmita Datta

Roop Gaurav Singh
rg_singh@yahoo.com

Abstract

As more advanced DNA sequencing technologies are developed, genome research projects generate an enormous amount of data, growing exponentially and doubling every 12-18 months. Increasingly, biomedical research must contend with multi-terabyte and even petabyte (1000 TB) datasets that present challenges to researchers. Such datasets can be major assets to the biomedical community, depending on how efficiently researchers can manage not only transmission, sharing, and access among collaborators but also an understanding of the metadata semantics that allows interoperability. Currently each scientist accesses and uses only a small portion of this data, mainly because it is physically impossible to see all relevant information, given the heterogeneity present in the metadata and the practical difficulties of resolving these metadata disparities across geographically distributed data sources. The current state of data management issues and requirements for Bio-informatics is well documented by [Jagadish and Olken, 2003].

We have been working on a novel approach to mitigate the "essential" heterogeneity problem, while relying on available tools to address accidental heterogeneity. The goal is to provide a uniform, seamless interface that facilitates a scientist's access to the growing number of heterogeneous, independently compiled bioinformatics data sources. This goal corresponds to the "idealized" system envisioned by Jagadish and Olken "that actively identifies data sources of interest, automatically overcomes syntactic and semantic heterogeneities wherever it discovers them, and provides transparent declarative, optimized query access over all sources."

We have done preliminary work in the domain of directory metadata: developing an architecture model, demonstrating the feasibility of our approach through experimental validation of clustering algorithms, and implementing a prototype (Semantic Facilitator™℠). We have also articulated the problems of integrating semantically heterogeneous entities, the research approaches devoted to those problems, and the challenges facing web-enabled virtual communities. This poster paper summarizes this work and our intended research on information integration of Bio-informatics sources.

Acknowledgement: This material is based in part upon work supported by the National Science Foundation under Grant No. ANI-0123937 and Grant No. ITR-0312636. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Topic revision: r1 - 15 Sep 2008 - 18:11:28 - SaravanarajDuraisamy
 