Visual Exploration and Analysis of Gene Expression Regulation

An organism's DNA encodes all of the RNA and protein molecules required to construct its cells. Yet a complete description of the DNA sequence of an organism, be it the few million nucleotides of a bacterium or the few billion nucleotides of a human, no more enables us to reconstruct the organism than a list of English words enables us to reconstruct a play by Shakespeare (Alberts et al., Molecular Biology of the Cell, 2002).

How much of a protein is produced within a cell is determined by the regulation of the expression of individual genes. The information on how much of a protein should be produced is encoded in regulatory DNA, which is interspersed with the protein-coding DNA in so-called non-coding regions. These regions bind protein molecules that are present in the cell and thereby regulate how much of a specific protein is actually produced, which in turn determines the function of the cell.

The study of gene expression regulation is important because it can give clues to the function of a gene and indicate which genes are involved in which types of processes. It can also serve as a diagnostic tool. For example, one can determine which type of cancer a tumor sample belongs to, because different cancer types have distinct expression patterns. This in turn makes targeted therapy possible. Other applications include the study of drug effects and personalized medicine.

Traditionally, gene expression analysis has been dominated by statistics. However, the Caleydo Team believes that visual analysis and exploration can provide additional insights into the data and can help to discover features that are lost in a purely statistical approach.

Currently Caleydo supports two types of visual gene expression analysis: Parallel Coordinates and Heat Maps.

Parallel Coordinates

Parallel coordinates are a well-established method for visualizing multi-dimensional data. However, they have rarely been used for the visualization of gene expression regulation. Our implementation of parallel coordinates aims to be especially usable and explicitly targets the needs of gene expression analysis.

Among the key features are:

  • Different brushes (selection tools)
  • Occlusion prevention techniques
  • Axes and polylines can be exchanged, thus supporting different use cases
  • Moving, removing, and duplicating axes
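
Two of the features listed above, brushing and the exchange of axes and polylines, can be illustrated with a minimal sketch. This is not Caleydo's implementation; the data layout (one polyline per gene, one axis per experiment), the gene and experiment names, and the function names `brush` and `swap_axes_and_polylines` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy data: each gene is one polyline, each experiment one axis.
experiments: List[str] = ["exp1", "exp2", "exp3"]
expression: Dict[str, List[float]] = {
    "geneA": [0.1, 0.8, 0.5],
    "geneB": [0.7, 0.9, 0.2],
    "geneC": [0.4, 0.3, 0.6],
}


def brush(data: Dict[str, List[float]], axis: int,
          low: float, high: float) -> List[str]:
    """Select the polylines whose value on the given axis lies in [low, high]."""
    return [name for name, values in data.items()
            if low <= values[axis] <= high]


def swap_axes_and_polylines(data: Dict[str, List[float]],
                            axis_names: List[str]) -> Dict[str, List[float]]:
    """Transpose the data so that former axes become polylines and vice versa."""
    polyline_names = list(data)
    return {
        axis: [data[name][i] for name in polyline_names]
        for i, axis in enumerate(axis_names)
    }


# Brush the second axis (index 1) for highly expressed genes.
selected = brush(expression, axis=1, low=0.75, high=1.0)

# After the exchange, each experiment is a polyline over the gene axes.
per_experiment = swap_axes_and_polylines(expression, experiments)
```

In this sketch a brush is a simple range test on a single axis; other brush types would differ only in the selection predicate.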

The main contribution of the Caleydo Framework is the seamless integration of its different parts. The parallel coordinates provide several features to support such an integration; see Integration of Multiple Visualizations for more information.

Heat Maps

A heat map is a visualization in which the magnitude of a value is mapped to a color. The values are arranged in rows and columns according to their meaning. Heat maps are the standard way of visualizing gene expression regulation. Caleydo currently uses heat maps mainly to visualize contextual information; the clustering of elements that usually accompanies heat maps is therefore not yet implemented.

Features of the heat map include:

  • Freely selectable color mapping
  • Exchange of rows and columns to support different use cases
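
The idea of a freely selectable color mapping can be sketched as a linear interpolation between two user-chosen colors. This is only an illustration under stated assumptions, not the mapping Caleydo uses: the function name `map_value_to_color`, the RGB tuple representation, and the default green-to-red scheme are all hypothetical.

```python
from typing import Tuple

Color = Tuple[int, int, int]  # RGB components in the range 0..255


def map_value_to_color(value: float, vmin: float, vmax: float,
                       low_color: Color = (0, 255, 0),
                       high_color: Color = (255, 0, 0)) -> Color:
    """Map a value to a color by linear interpolation between two colors.

    Values outside [vmin, vmax] are clamped to the nearest end of the range.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Normalize to [0, 1] and clamp.
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    r, g, b = (round(lo + t * (hi - lo))
               for lo, hi in zip(low_color, high_color))
    return (r, g, b)


# A small matrix of hypothetical expression values (rows: genes, columns: experiments).
matrix = [[0.0, 1.0], [0.25, 0.75]]
colors = [[map_value_to_color(v, 0.0, 1.0) for v in row] for row in matrix]
```

Passing different `low_color` and `high_color` arguments changes the scheme without touching the rest of the drawing code, which is the point of keeping the mapping selectable.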

Again, the integration of the heat map with other parts is one of the areas the Caleydo project focuses on.

Integration of Multiple Visualizations

In the Caleydo Project, we use the InfoVis paradigm of multiple linked views and take it to the next level with the Link Bucket. The Link Bucket is a concept that arranges 2D visualizations in a 2.5D environment, thereby allowing a user to browse several visualizations in context with each other.

Visual Links, which are lines drawn between selected elements, make it easy to identify related elements in the different representations. The Link Bucket thereby connects gene expression information with pathways in a non-occluding, highly informative, and interactive way.

The gene expression visualizations, such as the heat map, are aware of their current environment: they display only as much information as necessary and thereby avoid visual clutter. For example, only gene expression data that is relevant in the context of the currently loaded pathways is displayed.
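
The context-dependent filtering described above can be sketched as a set membership test. The data structures, the gene and pathway names, and the function name `filter_to_pathway_context` are assumptions for illustration only and do not reflect Caleydo's internal data model.

```python
from typing import Dict, List, Set


def filter_to_pathway_context(
    expression: Dict[str, List[float]],
    loaded_pathways: Dict[str, Set[str]],
) -> Dict[str, List[float]]:
    """Keep only expression data for genes that occur in a loaded pathway."""
    # Union of the gene sets of all currently loaded pathways.
    relevant: Set[str] = set().union(*loaded_pathways.values())
    return {gene: values for gene, values in expression.items()
            if gene in relevant}


# Hypothetical example data.
expression = {
    "TP53": [0.2, 0.9],
    "MYC": [0.6, 0.4],
    "GAPDH": [0.5, 0.5],
}
loaded_pathways = {
    "pathway 1": {"TP53", "MDM2"},
    "pathway 2": {"MYC"},
}

visible = filter_to_pathway_context(expression, loaded_pathways)
```

With no pathways loaded, the sketch shows nothing; a real view might instead fall back to displaying all data.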

For a detailed exploration of one view, the Link Bucket allows the user to zoom into the center view, thereby revealing all the interaction features available in the stand-alone visualizations.

More information on navigation in the Link Bucket and on loading new visualizations is available in the pathways section.

Publications

  • Alexander Lex:
    Master's Thesis: Exploration of Gene Expression Data in a Visually Linked Environment
    Supervision: Prof. Dieter Schmalstieg, June 2008.


website maintained by Marc Streit and Alexander Lex
last updated on 2010-02-17