Integrating Genomic Data to Build Mechanistic Networks for Genes and Small Molecules
The recent surge of interest in mining biological data has spurred the aggregation of genomic data from various model organisms into numerous databases. We aim to integrate the data available for one of the better-studied organisms, Saccharomyces cerevisiae (yeast), and to build mechanistic networks that capture interactions between proteins and compounds (small molecules) so that we can accurately predict drug targets. To that end, we apply machine learning algorithms to the data to create the interaction network: a graph in which nodes represent proteins or molecules and each edge carries the probability that the two nodes it connects interact. Our two-step integration process, in which we first predict protein-protein interaction networks for various interaction types and then use these networks to predict protein-compound interaction networks, will provide detailed insight into how pathway-level knowledge can be leveraged to predict interactions at the small-molecule level.
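The graph representation and the two-step flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `InteractionNetwork`, the function `predict_compound_targets`, the noisy-OR scoring rule, and all node names and probabilities are hypothetical placeholders, not the machine learning method or the data used in this work.

```python
from collections import defaultdict


class InteractionNetwork:
    """Undirected graph whose edge weights are interaction probabilities."""

    def __init__(self):
        self._adj = defaultdict(dict)

    def add_edge(self, a, b, prob):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        self._adj[a][b] = prob
        self._adj[b][a] = prob

    def prob(self, a, b):
        # Missing edges are treated as probability 0.
        return self._adj.get(a, {}).get(b, 0.0)

    def neighbors(self, node):
        return dict(self._adj.get(node, {}))


def predict_compound_targets(ppi, known_targets):
    """Step two (assumed rule): score new protein-compound edges with a
    noisy-OR over the PPI neighbors of each compound's known targets."""
    pci = InteractionNetwork()
    for compound, targets in known_targets.items():
        # miss[p] = probability that no known target supports compound-p
        miss = defaultdict(lambda: 1.0)
        for target, p_ct in targets.items():
            for neighbor, p_pp in ppi.neighbors(target).items():
                if neighbor not in targets:
                    miss[neighbor] *= 1.0 - p_ct * p_pp
        for target, p_ct in targets.items():
            pci.add_edge(compound, target, p_ct)
        for protein, m in miss.items():
            pci.add_edge(compound, protein, 1.0 - m)
    return pci


# Step one would normally come from a trained classifier; here the
# protein-protein probabilities are made-up placeholder values.
ppi = InteractionNetwork()
ppi.add_edge("P1", "P2", 0.8)
ppi.add_edge("P1", "P3", 0.5)
ppi.add_edge("P2", "P3", 0.4)

# Hypothetical prior knowledge: compound "drugA" binds P1 with probability 0.9.
pci = predict_compound_targets(ppi, {"drugA": {"P1": 0.9}})
for protein in ("P1", "P2", "P3"):
    print(protein, round(pci.prob("drugA", protein), 3))
```

In this toy setup, proteins that are strongly linked to a known target inherit a higher predicted probability of interacting with the compound, which is one simple way pathway-level structure could inform small-molecule predictions.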
Recent years have also seen an explosion in plant genomics, as the difficulties inherent in sequencing and functionally analyzing these biologically and economically significant organisms have been overcome. Arabidopsis thaliana, a versatile model organism, offers an opportunity to evaluate the predictive power of biological network inference for plant functional genomics. We provide a compendium of functional relationship networks for A. thaliana, built by integrating over 60 microarray, physical and genetic interaction, and literature curation datasets. The compendium includes tissue-specific, biological process-specific, and developmental stage-specific networks, each predicting relationships particular to an individual biological context. These networks enable the rapid investigation of uncharacterized genes in specific tissues and developmental stages of interest, and they summarize a very large collection of A. thaliana data for biological examination. We found validation in the literature for many of our predicted networks, including those involved in disease resistance, root hair patterning, and auxin homeostasis.