Help: Analyzing gene expression regulation data
Parallel Coordinates
Introduction
Parallel coordinates visualize multidimensional data by drawing lines
between a set of axes. Each line represents one entity, typically a
gene, and each axis represents a dimension, typically an experiment.
When one gene is viewed in the parallel coordinates, all its expression
values for the experiments are visible simultaneously. By drawing
multiple lines at the same time, trends in a dataset can be identified.
In the screenshot, a dense (dark) area around the bottom shows that many
genes have expression values in this area.
In parallel coordinates you can select the polylines as well as the axes. Selections and some brief information are shown in the Info Area.
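As a rough illustration of the idea (not Caleydo's actual implementation), the sketch below turns each gene into a polyline with one vertex per experiment axis. The gene names, the values and the `to_polylines` helper are made up for this example.

```python
def to_polylines(expression, axis_spacing=1.0):
    """Return {gene: [(x, y), ...]} with one vertex per experiment axis.

    `expression` maps a gene name to its list of expression values, one
    value per experiment. Axis i is placed at x = i * axis_spacing.
    """
    polylines = {}
    for gene, values in expression.items():
        polylines[gene] = [(i * axis_spacing, v) for i, v in enumerate(values)]
    return polylines


# Hypothetical data: three genes measured in four experiments.
expression = {
    "geneA": [1.0, 2.5, 2.0, 0.5],
    "geneB": [0.8, 0.9, 1.1, 1.0],
    "geneC": [3.0, 1.0, 4.0, 2.0],
}
polylines = to_polylines(expression)
print(polylines["geneA"])
```

Each polyline crosses every axis exactly once, which is why all expression values of a gene are visible at the same time.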
Rearranging axes
To make an accurate comparison between two different experiments, it is
often necessary to place them side by side. It is therefore important to
be able to rearrange and duplicate experiments, and possibly to remove
unneeded ones.
The parallel coordinates use a drop below each axis, shown in the
picture on the right, to achieve this. By default the drop is shown in a
small, simplified version. Once the mouse is placed over it, it changes
to a version containing three buttons. Clicking the left button
duplicates the associated axis. Clicking the right button removes the
axis. Dragging the central button to either side drags the axis along
and thereby rearranges it. When axes are dragged around, the spacing
between them can become uneven. To balance the spacing again, click the
icon shown on the left in the toolbar.
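What balancing the spacing amounts to can be sketched as follows. This is only an assumption about the behavior described above: the axes keep their left-to-right order and are spread evenly between the outermost positions. The `balance_spacing` helper and the positions are hypothetical.

```python
def balance_spacing(axis_positions):
    """Spread axes evenly between the leftmost and rightmost position.

    The left-to-right order of the axes is preserved. The result is
    indexed like the input: result[i] is the new x position of axis i.
    """
    count = len(axis_positions)
    if count < 2:
        return list(axis_positions)
    lo, hi = min(axis_positions), max(axis_positions)
    step = (hi - lo) / (count - 1)
    # Rank the axes by their current x position.
    order = sorted(range(count), key=lambda i: axis_positions[i])
    balanced = [0.0] * count
    for rank, index in enumerate(order):
        balanced[index] = lo + rank * step
    return balanced


# Hypothetical x positions after some dragging: uneven gaps.
positions = [0.0, 0.5, 0.7, 3.0]
balanced = balance_spacing(positions)
print(balanced)
```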
Filtering
The parallel coordinates support three types of filters: a
one-dimensional filter, a global filter and an angular filter.
The one-dimensional filter allows you to remove all genes whose values
on one axis are smaller and/or larger than a specified value. To
activate the filter (also referred to as a brush) for a particular axis,
click the small drop on top of the axis (shown on the right). This makes
the filter appear, as shown on the left.
To change the height of the filter, click the black labels with the caption and drag them up or down. This adjusts the size of the filter and thereby adds or removes polylines. Dragging the body of the filter moves the top and the bottom simultaneously.
To remove the brush, click the handle to the right of the top label.
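The effect of the one-dimensional filter can be sketched in a few lines. This is a minimal illustration, assuming that values exactly on the filter boundary are kept; the `one_dimensional_filter` helper and the data are hypothetical and not part of Caleydo.

```python
def one_dimensional_filter(expression, axis, lower=None, upper=None):
    """Keep genes whose value on one axis lies within [lower, upper].

    Pass None for `lower` or `upper` to leave that side unbounded.
    Assumption: boundary values themselves are kept.
    """
    kept = {}
    for gene, values in expression.items():
        value = values[axis]
        if lower is not None and value < lower:
            continue
        if upper is not None and value > upper:
            continue
        kept[gene] = values
    return kept


# Hypothetical data: three genes, two experiments.
expression = {
    "geneA": [0.2, 1.5],
    "geneB": [2.4, 1.0],
    "geneC": [4.1, 0.3],
}
kept = one_dimensional_filter(expression, axis=0, lower=1.0, upper=3.0)
print(sorted(kept))  # ['geneB']
```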
The global filter works exactly the same way, but has different consequences. To activate the global filter, click the drop at the tip of the line on the far left of the parallel coordinates. Again, the filter appears.
This filter allows you to remove all lines that never leave a specified corridor. This helps to remove genes that are neither over- nor underexpressed in any experiment: the filter removes all polylines that don't leave the spanned region in any of the experimental conditions currently visible. So, if you set this filter to span the region between 1 and 3, all genes that only have expression values between 1 and 3 are removed. If even one value for a gene is smaller or bigger, the gene is not removed.
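The rule for the global filter can be written down compactly. The sketch below uses the corridor between 1 and 3 from the example above; the `global_filter` helper and the gene data are hypothetical, and treating the corridor boundaries as inside the corridor is an assumption.

```python
def global_filter(expression, lower, upper):
    """Remove genes whose values all stay inside [lower, upper].

    A gene is kept as soon as at least one of its values is smaller
    than `lower` or bigger than `upper`.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if any(v < lower or v > upper for v in values)
    }


# Hypothetical data: three genes, three visible experiments.
expression = {
    "geneA": [1.5, 2.0, 2.9],  # never leaves the corridor: removed
    "geneB": [1.5, 2.0, 3.4],  # one value above 3: kept
    "geneC": [0.5, 2.0, 2.5],  # one value below 1: kept
}
kept = global_filter(expression, lower=1.0, upper=3.0)
print(sorted(kept))  # ['geneB', 'geneC']
```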
The angular filter can be used to identify discontinuities between two
experiments. It allows you to remove all lines whose slope is not
similar to the slope of a master line. First click the icon for the
angular brush in the toolbar (shown on the left). Then click a polyline
between two axes, where it has the slope you would like to use as the
basis for your filter. Once you have clicked the polyline, the brush
appears, as shown on the right.
By dragging the legs up or down, you can adjust the level of tolerance for matching the slope of other polylines.
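One way to think about the angular filter is as a comparison of segment angles between two neighboring axes. The sketch below is an assumption about how such a comparison could work, not a description of Caleydo's code: the `segment_angle` and `angular_filter` helpers, the tolerance in degrees and the axis spacing are all hypothetical.

```python
import math


def segment_angle(values, left_axis, axis_spacing=1.0):
    """Angle (in radians) of the polyline segment right of `left_axis`."""
    rise = values[left_axis + 1] - values[left_axis]
    return math.atan2(rise, axis_spacing)


def angular_filter(expression, master, left_axis, tolerance_deg,
                   axis_spacing=1.0):
    """Keep genes whose segment angle is close to the master line's."""
    reference = segment_angle(expression[master], left_axis, axis_spacing)
    tolerance = math.radians(tolerance_deg)
    return {
        gene: values
        for gene, values in expression.items()
        if abs(segment_angle(values, left_axis, axis_spacing) - reference)
        <= tolerance
    }


# Hypothetical data: three genes, two experiments.
expression = {
    "geneA": [1.0, 2.0],  # master line, rising
    "geneB": [0.5, 1.4],  # rising at a similar slope
    "geneC": [2.0, 1.0],  # falling
}
kept = angular_filter(expression, master="geneA", left_axis=0,
                      tolerance_deg=10.0)
print(sorted(kept))  # ['geneA', 'geneB']
```

Widening the tolerance corresponds to dragging the legs of the brush further apart, so more polylines match.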
To clear selections and filters, press the clear selections icon. This
removes all filters as well as all mouse-over and clicked selections you
previously applied.
The save filters feature allows you to actually remove all items that
are currently filtered out. This is useful if you handle a lot of data
and want to continue working with a subset. Also, if you want to export
your filtered data, you have to press this button first. The same is
true for swapping dimensions and bookmarking genes, which are explained
later.
Bookmarking
When you click the bookmark icon, all elements that are not currently
filtered out are added to the bookmark bar, which appears on the right
once it is filled with content. Note, however, that you can only
bookmark 20 items or fewer, so you have to modify your filters to show
no more than 20 items.
If you want to bookmark genes individually, right-click a gene to open the context menu and bookmark it from there.
The bookmarks are synchronized with the Bucket. Any element that appears in this bookmark bar is also visible in the Bucket's bookmark bar.
Resetting the View
Clicking the reset button resets the view to its initial state. All
rearranged axes are restored and all removed polylines are added again.
Swapping Dimensions
When using parallel coordinates, it is possible to show the data in two
different ways. In our case we have experiments and genes. Either can
serve as the axes or as the polylines, but exchanging them produces
radically different results. Showing genes as axes is desirable when
differences between the experiments are of interest. One example is the
comparison of gene expression patterns from cancerous tissue with those
from healthy tissue. However, when discontinuities of gene expression
over several experiments are of interest, it is beneficial to show the
experiments as axes. As an example, consider being interested in the
different expression values of a small number of genes in many different
experimental conditions.
In Caleydo you can switch between what the parallel coordinates treat as axes and what as polylines. In the left picture you see 13 experiments treated as axes. The number of genes has been reduced to 6. Clicking the switch dimensions button produces the picture on the right. The 6 genes which were polylines in the left picture are now axes in the right one, and vice versa.
This feature is also particularly useful if you only have a few genes but many experiments. Then you can switch dimensions immediately and analyze the differences of the experiments at a glance.
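In terms of the underlying data, swapping dimensions corresponds to transposing the expression table. The sketch below illustrates this with a small made-up table; the `swap_dimensions` helper is hypothetical and only shows the principle.

```python
def swap_dimensions(table):
    """Transpose a table given as a list of rows.

    If each input row is one gene (one polyline) with one value per
    experiment (axis), each output row is one experiment with one value
    per gene.
    """
    return [list(column) for column in zip(*table)]


# Hypothetical data: 2 genes (rows) measured in 3 experiments (columns).
genes_as_polylines = [
    [1.0, 2.0, 3.0],  # geneA
    [4.0, 5.0, 6.0],  # geneB
]
experiments_as_polylines = swap_dimensions(genes_as_polylines)
print(experiments_as_polylines)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

Swapping twice returns the original table, just as clicking the button twice returns to the original picture.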
Heat Map
The
heat map is a visualization that maps a color to the magnitude of a
value. One gene corresponds to one row, and one experiment to one
column.
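The mapping from a value to a color can be sketched as a linear interpolation between two colors. The blue-to-red scale, the clamping and the `value_to_color` helper below are assumptions chosen for illustration; the color scheme Caleydo actually uses may differ.

```python
def value_to_color(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Map a value to an RGB color between `low` and `high`.

    Values outside [vmin, vmax] are clamped to the nearest end of the
    scale. If vmin equals vmax, the middle of the scale is used.
    """
    if vmax == vmin:
        t = 0.5
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


# Hypothetical table: one row per gene, one column per experiment.
table = [
    [0.0, 10.0],
    [10.0, 0.0],
]
colors = [[value_to_color(v, 0.0, 10.0) for v in row] for row in table]
print(colors[0])  # [(0, 0, 255), (255, 0, 0)]
```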
A heat map is especially useful if the underlying data is clustered. Clustering gives the ordering of elements meaning: elements close to each other have similar properties (and therefore also similar colors).
The heat map in Caleydo makes use of this. Its three levels (two for smaller amounts of data) allow you to browse thousands of values. On the left side, a thin overview bar shows all values. In this overview trends are visible. The overview bar can contain up to 30,000 values.
The second level shows an enlarged version of the area selected in the overview bar. This level contains up to 1,000 values. Here trends and differences are clearly visible.
The final level contains fewer than 100 values. Genes and experiments are labeled and all detail is visible.
The
width of the second level can be adjusted by clicking the arrow button
between the second and third level. To select which part of the overview
should be shown in the detailed levels simply click the region.
The amount of data to be shown in the last level can be adjusted by dragging the handle bars shown on the right. Dragging the grey area between the handles lets you scroll through the overview smoothly.
Selections are handled the same way as in all other views. Additionally, indications of selections are also shown on the first and second level, by displaying colored bars and lines, respectively. When a selection is triggered by another linked view, the second and third levels jump directly to the selected element. In case of multiple selections, the heat map jumps to the first, and the others are indicated in the second and third level.
Clustering
Caleydo currently supports four clustering algorithms, two partitional
(where no relation between the separate clusters is known) and two
hierarchical (which build a complete tree of relations):
- A tree clusterer, using similarities (hierarchical),
- affinity propagation clustering, which usually delivers the best results (partitional),
- KMeans clustering, as implemented by WEKA (partitional), and
- Cobweb, as implemented by WEKA (hierarchical).
All algorithms can be used to cluster either the genes, the experiments or both (bi-clustering).
Two different distance measures are available: the Euclidean distance and the Pearson correlation.
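For reference, the two measures can be computed as shown in the following sketch. These are the standard textbook definitions; the helper names and the example vectors are hypothetical, and the code says nothing about how Caleydo turns the Pearson correlation into a distance internally.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equally long value vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation coefficient, between -1 and 1.

    Raises ZeroDivisionError if one of the vectors is constant.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(spread_a * spread_b)


# Hypothetical expression profiles of two genes over three experiments.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]
distance = euclidean_distance(gene_a, gene_b)
correlation = pearson_correlation(gene_a, gene_b)
print(distance, correlation)
```

The example shows why the choice matters: the two profiles have the same shape, so their correlation is 1, although their Euclidean distance is not 0.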
Affinity propagation lets you choose a factor between 1 and 10, which influences the number of clusters returned. KMeans requires you to specify the number of clusters.
When you have chosen the clustering algorithm and the parameters, press OK. A status bar will pop up, displaying the progress of the algorithm. Depending on the algorithm and the size of the data this process can take a while. Once clustering is complete, the heat map will show the updated ordering of the elements.
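How a clustering result leads to a new ordering can be illustrated with a small sketch. It assumes that the algorithm has assigned one cluster label to each row; the `order_by_cluster` helper, the gene names and the labels are made up for this example.

```python
def order_by_cluster(labels):
    """Return row indices sorted so that rows of one cluster are adjacent.

    `labels[i]` is the cluster label of row i. Sorting is stable, so rows
    within one cluster keep their original relative order.
    """
    return sorted(range(len(labels)), key=lambda i: labels[i])


# Hypothetical result: four genes assigned to two clusters.
genes = ["geneA", "geneB", "geneC", "geneD"]
labels = [1, 0, 1, 0]
ordering = order_by_cluster(labels)
reordered = [genes[i] for i in ordering]
print(reordered)  # ['geneB', 'geneD', 'geneA', 'geneC']
```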
Tabular Data Viewer

The
tabular data viewer is a table containing the original values. It is
synchronized with the rest of the system and always jumps to the
currently active value.
The gene short name and the RefSeq ID are always shown in the two leftmost columns. Furthermore you can delete columns by pressing the button at the bottom of the column.
Website maintained by Marc Streit and Alexander Lex
Last updated on 2010-07-29