Help: Analyzing gene expression regulation data

Parallel Coordinates

Introduction

Parallel coordinates visualize multidimensional data by drawing lines between a set of axes. Each line represents one entity, typically a gene. Each axis represents a dimension, typically an experiment. When one gene is viewed in the parallel coordinates, all its expression values for the experiments are visible simultaneously. By drawing multiple lines at the same time, trends in a dataset can be identified. In the screenshot, a dense (dark) area around the bottom shows that many genes have expression values in this area.
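As a rough illustration of how one gene becomes a polyline, the following Python sketch maps the expression values of a single gene to vertices on evenly spaced axes. It is not Caleydo's implementation; the function name, the shared value range for all axes and the unit-sized drawing area are assumptions made for the example.

```python
def polyline_vertices(values, axis_min, axis_max, width=1.0, height=1.0):
    """Return (x, y) vertices for one entity drawn across evenly spaced axes.

    Assumes all axes share the value range [axis_min, axis_max].
    """
    n = len(values)
    span = axis_max - axis_min
    vertices = []
    for i, v in enumerate(values):
        # Axis i sits at an evenly spaced horizontal position.
        x = width * i / (n - 1) if n > 1 else 0.0
        # The value is scaled linearly onto the height of the axis.
        y = height * (v - axis_min) / span
        vertices.append((x, y))
    return vertices


# Hypothetical expression values of one gene in three experiments.
gene = [0.0, 2.0, 4.0]
print(polyline_vertices(gene, axis_min=0.0, axis_max=4.0))
# prints [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```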

In parallel coordinates you can select the polylines as well as the axes. Selections and some brief information are shown in the Info Area.

Rearranging axes

To make an accurate comparison between two experiments, it is often necessary to place them side by side. It is therefore important to be able to rearrange and duplicate axes, and to remove experiments that are not needed.

The parallel coordinates use a drop below each axis, shown in the picture on the right, to achieve this. By default the drop is shown in a small, simplified version. Once the mouse is placed over it, it changes to a version containing three buttons. Clicking the left button duplicates the associated axis. Clicking the right button removes the axis. Dragging the central button to either side moves the axis along with it and thereby rearranges the axes. Dragging axes around can make the spacing between them uneven. To balance the spacing again, click the icon for resetting the axis spacing in the toolbar (shown on the left).

Filtering

The parallel coordinates support three types of filters: a one-dimensional filter, a global filter and an angular filter.

The one-dimensional filter allows you to remove all genes whose values on an axis are smaller and/or larger than a specified value. To activate the filter for a particular axis, click the small drop on top of the axis (shown on the right). This makes the filter appear, as shown on the left.

To change the height of the filter, click the black labels with the caption and drag them up or down. This adjusts the size of the filter and thereby adds or removes polylines. Dragging the body of the filter moves the top and the bottom simultaneously.

To remove the filter, click the handle to the right of the top label.

The global filter works exactly the same way, but has different consequences. To activate the global filter, click the drop at the tip of the line on the far left of the parallel coordinates. Again, the filter appears.
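The effect of the one-dimensional filter can be sketched in a few lines of Python. This is only an illustration under assumptions: the data layout (a dictionary of gene names to value lists), the function name and the inclusive bounds are hypothetical and not taken from Caleydo.

```python
def one_dimensional_filter(genes, axis, lower, upper):
    """Keep only genes whose value on the given axis lies in [lower, upper].

    Inclusive bounds are an assumption made for this sketch.
    """
    return {name: values for name, values in genes.items()
            if lower <= values[axis] <= upper}


# Hypothetical expression values of three genes in three experiments.
genes = {
    "geneA": [1.0, 2.5, 3.0],
    "geneB": [0.2, 5.0, 1.0],
    "geneC": [2.0, 2.0, 2.0],
}

# Filter on the second experiment (index 1), keeping values from 1.5 to 3.0.
kept = one_dimensional_filter(genes, axis=1, lower=1.5, upper=3.0)
print(sorted(kept))  # prints ['geneA', 'geneC']
```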

This filter allows you to remove all lines that never leave a specified corridor. This helps to remove genes that are neither over- nor underexpressed in any experiment. It removes all polylines that don't leave the spanned region in any of the experimental conditions currently visible. So, if you set this filter to span the region between 1 and 3, all genes that only have expression values between 1 and 3 are removed. If even one value for a gene is smaller or larger, the gene is not removed.

The angular filter can be used to identify discontinuities between two experiments. It allows you to remove all lines whose slope is not similar to that of a master line. First click the icon for the angular brush in the toolbar (shown on the left). Then click a polyline between two axes, where it has the slope you would like to use as the basis for your filter. Once you have clicked the polyline, the brush appears, as shown on the right.
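The rule for the global filter can be expressed as a short Python sketch, using the corridor between 1 and 3 from the example above. The data layout and function name are hypothetical, and treating the corridor bounds as inclusive is an assumption.

```python
def global_filter(genes, lower, upper):
    """Remove genes whose values all stay inside the corridor [lower, upper].

    A gene is kept as soon as at least one value leaves the corridor.
    """
    return {name: values for name, values in genes.items()
            if any(v < lower or v > upper for v in values)}


# Hypothetical expression values of three genes in three experiments.
genes = {
    "geneA": [1.0, 2.5, 3.0],  # never leaves the corridor: removed
    "geneB": [0.2, 5.0, 1.0],  # leaves the corridor twice: kept
    "geneC": [2.0, 2.0, 3.5],  # leaves the corridor once: kept
}

remaining = global_filter(genes, lower=1.0, upper=3.0)
print(sorted(remaining))  # prints ['geneB', 'geneC']
```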

By dragging the legs up or down, you can adjust the level of tolerance for matching the slope of other polylines.
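One way to think about the angular filter is as a comparison of segment angles between two neighboring axes. The Python sketch below illustrates that idea only; the unit distance between axes, the tolerance given in degrees and all names are assumptions, and Caleydo may compute the match differently.

```python
import math


def segment_angle(values, left_axis, axis_distance=1.0):
    """Angle (in radians) of the polyline segment starting at left_axis."""
    rise = values[left_axis + 1] - values[left_axis]
    return math.atan2(rise, axis_distance)


def angular_filter(genes, master, left_axis, tolerance_deg):
    """Keep genes whose segment angle is within the tolerance of the master line."""
    master_angle = segment_angle(genes[master], left_axis)
    tolerance = math.radians(tolerance_deg)
    return {name: values for name, values in genes.items()
            if abs(segment_angle(values, left_axis) - master_angle) <= tolerance}


# Hypothetical expression values of three genes in two experiments.
genes = {
    "geneA": [1.0, 2.0],  # master line, rising
    "geneB": [1.0, 2.1],  # rises at a similar angle
    "geneC": [2.0, 1.0],  # falls, so it is filtered out
}

similar = angular_filter(genes, master="geneA", left_axis=0, tolerance_deg=5.0)
print(sorted(similar))  # prints ['geneA', 'geneB']
```

Widening the tolerance corresponds to dragging the legs of the brush further apart.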

To clear selections and filters, press the clear selections icon. This removes all filters as well as all mouse-over and clicked selections you previously applied.

The save filters feature allows you to actually remove all items that are currently filtered out. This is useful if you handle a lot of data and want to continue working with a subset. Also, if you want to export your filtered data, you have to press this button first. The same is true for swapping dimensions and bookmarking genes, which are explained later.

Bookmarking

By clicking the bookmark icon, all elements that are not currently filtered out are added to the bookmark bar, which appears on the right once it is filled with content. Note, however, that you can bookmark at most 20 items at a time, so you have to adjust your filters until no more than 20 items are shown.

If you want to bookmark genes individually, right-click a gene to open the context menu and bookmark it from there.

The bookmarks are synchronized with the bucket. Any element that appears in this bookmark bar is also visible in the bucket's bookmark bar.

Resetting the View

By clicking the reset button, the view is reset to its initial state. All rearranged axes are restored and all removed polylines are added again.

Swapping Dimensions

When using parallel coordinates, it is possible to show the data in two different ways. In our case we have experiments and genes. Both can be the axes or the polylines, but exchanging them produces radically different results. Showing genes as axes is desirable when differences between the experiments are of interest. One example is the comparison of gene expression patterns from carcinogenic tissue with those from healthy tissue. However, when discontinuities of gene expression over several experiments are of interest, it is beneficial to show the experiments as axes. As an example, consider being interested in the different expression values of a small number of genes in many different experimental conditions.

In Caleydo you can switch between what the parallel coordinates treat as axes and what as polylines. In the left picture you see 13 experiments treated as axes. The number of genes has been reduced to 6. Clicking the switch dimensions button produces the picture on the right. The 6 genes that were polylines in the left picture are now axes in the right one, and vice versa.

This feature is also particularly useful if you only have a few genes, but many experiments. Then you can switch dimensions immediately and analyze the differences between the experiments at a glance.
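In terms of the underlying data table, swapping dimensions amounts to a transposition: what was a row (polyline) becomes a column (axis). The Python sketch below shows this on a tiny hypothetical table; it illustrates the concept and says nothing about how Caleydo stores its data.

```python
def swap_dimensions(table):
    """Transpose a table given as a list of rows, so rows become columns."""
    return [list(column) for column in zip(*table)]


# Hypothetical table: 2 genes (rows, drawn as polylines) x 3 experiments (axes).
genes_as_polylines = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# After swapping: 3 experiments (polylines) x 2 genes (axes).
experiments_as_polylines = swap_dimensions(genes_as_polylines)
print(experiments_as_polylines)
# prints [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```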

Heat Map

The heat map is a visualization that maps the magnitude of a value to a color. One gene corresponds to one row, and one experiment to one column.

A heat map is especially useful if the underlying data is clustered. Clustering gives the ordering of elements meaning: elements close to each other have similar properties (and therefore also similar colors).
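The mapping from a value to a color can be sketched as a linear interpolation between two colors. The blue-to-red scale, the clamping of out-of-range values and the function name are assumptions chosen for this Python example; the color scheme Caleydo actually uses may differ.

```python
def value_to_color(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Linearly interpolate an RGB color for a value clamped to [vmin, vmax]."""
    clamped = min(max(value, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


print(value_to_color(0.0, 0.0, 10.0))   # prints (0, 0, 255)
print(value_to_color(10.0, 0.0, 10.0))  # prints (255, 0, 0)
print(value_to_color(42.0, 0.0, 10.0))  # out of range, clamped: prints (255, 0, 0)
```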

The heat map in Caleydo makes use of this. Its three levels (two for smaller amounts of data) allow you to browse thousands of values. On the left side, a thin overview bar shows all values. In this overview trends are visible. The overview bar can contain up to 30,000 values.

The second level shows an enlarged version of the area selected in the overview bar. This level contains up to 1,000 values. Here trends and differences are clearly visible.

The final level contains fewer than 100 values. Genes and experiments are labeled and all detail is visible.

The width of the second level can be adjusted by clicking the arrow button between the second and third level. To select which part of the overview should be shown in the detailed levels, simply click the region.

The amount of data to be shown in the last level can be adjusted by dragging the handle bars shown on the right. Dragging the grey area between the handles lets you scroll through the overview smoothly.

Selections are handled the same way as in all other views. Additionally, selections are indicated on the first and second level by colored bars or lines. When a selection is triggered by another linked view, the second and third level jump directly to the selected element. In case of multiple selections, the heat map jumps to the first, and the others are indicated in the second and third level.

Clustering

Caleydo currently supports four clustering algorithms, two partitional (where no relation between the separate clusters is known) and two hierarchical (which build a complete tree of relations):

  • A tree clusterer, using similarities (hierarchical),
  • affinity propagation clustering, which usually delivers the best results (partitional),
  • KMeans clustering, as implemented by WEKA (partitional), and
  • Cobweb, as implemented by WEKA (hierarchical).

All algorithms can be used to cluster either the genes, the experiments or both (bi-clustering).

Two different distance measures are available: the Euclidean distance and the Pearson correlation.
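For reference, the two measures can be written out in plain Python as follows. This is a textbook sketch with hypothetical function names, not the code used by Caleydo or WEKA. Note that the Pearson correlation is a similarity (1 for perfectly correlated profiles), and how it is converted into a distance inside Caleydo is not specified here.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    variance_a = sum((x - mean_a) ** 2 for x in a)
    variance_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(variance_a * variance_b)


# Two hypothetical profiles with the same shape but a different scale.
profile_1 = [1.0, 2.0, 3.0]
profile_2 = [2.0, 4.0, 6.0]

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # prints 5.0
print(pearson_correlation(profile_1, profile_2))   # prints 1.0
```

The example shows why the choice matters: the two profiles are far apart in Euclidean terms but perfectly correlated, so the two measures can lead to different clusters.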

Affinity propagation lets you choose a factor between 1 and 10, which influences the number of clusters returned. KMeans requires you to specify the number of clusters.

When you have chosen the clustering algorithm and the parameters, press OK. A status bar will pop up, displaying the progress of the algorithm. Depending on the algorithm and the size of the data this process can take a while. Once clustering is complete, the heat map will show the updated ordering of the elements.

Tabular Data Viewer

The tabular data viewer is a table containing the original values. It is synchronized with the rest of the system and always jumps to the currently active value.

The gene short name and the RefSeq ID are always shown in the two leftmost columns. Furthermore, you can delete columns by pressing the button at the bottom of the column.




website maintained by Marc Streit and Alexander Lex
last updated on 2010-07-29