Over the past decade, gene expression microarray data has become one of the most important resources available to biologists for understanding molecular processes and mechanisms on a whole-genome scale. Microarray data provides a window into the inner workings of transcription, a process vital to cellular maintenance, development, biological regulation, and disease progression. Microarray data is being generated at an exponentially increasing rate for a wide variety of organisms, yet there is a severe lack of methods designed to make use of the vast amount of data already available. In my work, I explore several techniques for meaningfully harnessing large-scale collections of microarray data. These techniques serve two goals: giving biologists a greater ability to explore data repositories, and mining these repositories computationally to discover novel biology.

First, I will discuss techniques for visualization-based analysis of microarray data at the scale of individual datasets. These techniques include incorporating statistical measures into visualization schemes and using alternative views of the data to gain a broader picture. Second, I will focus on novel methods that allow users to view multiple datasets simultaneously, with the goal of providing a larger context within which to understand individual datasets. These techniques include developing multi-dataset visualization methods and making use of new technologies such as very large format display devices. Third, researchers need effective search and analysis techniques to guide them through large-scale repositories and make effective use of them. To meet this need, I will present a user-driven search algorithm designed both to quickly locate relevant datasets in a collection and then to identify novel genes related to the user's query. This technique is useful as a stand-alone search and exploration method, can be incorporated into visualization systems, and can be used to predict novel functions for genes. I will discuss how we have successfully used this approach to discover novel biology, including directing a large-scale experimental investigation of S. cerevisiae mitochondrial organization.

The combination of visualization-based analysis methods and exploratory algorithms such as those presented here is vital to future systems biology research. As data collections continue to grow and new forms of data are generated, it will become increasingly important to develop methods that allow experts to intelligently sift through the available information and make new discoveries.