Help: Starting Up
Prerequisites
Caleydo visualizes thousands of data points, which makes it a resource-intensive application that requires reasonably recent hardware. The system requirements vary widely with the amount of data you want to load and visualize simultaneously. For a detailed discussion of the scalability of Caleydo, please refer to the Scalability section at the end of this page. We recommend the following hardware and software configuration:
- A graphics card with at least 64 MB RAM is recommended.
- Up-to-date graphics card drivers.
- Currently only Sun Java 6 is supported. Download and install the latest Java Runtime Environment. OpenJDK is not supported.
- Supported operating systems: Windows and Linux. Mac operating systems are not supported (though Caleydo should theoretically run on them).
- A mouse with a mouse wheel.
- For the first start only: a broadband Internet connection.
Installation
Webstart
Java Webstart allows you to run Caleydo without installing it. All files are downloaded to your computer automatically, and Caleydo starts as soon as the download is complete.
To run Caleydo as a webstart application, make sure you have Java installed and then simply click the large web start icon on the home or the downloads page.
The application now downloads to your local hard drive and then executes. Note that you do not need to have administrator privileges to do that.
Using web start ensures that you always have the most current version of Caleydo.
Manual Installation
To manually install Caleydo, download the compressed file appropriate for your operating system from the download page. Extract the archive, enter the created Caleydo folder and run the executable (caleydo.exe on Windows).
Startup
Upon startup you are presented with choices for different execution modes:
- Genetic Analysis and
- General Data Analysis
While the former is tailored to the analysis of microarray gene expression data and pathways, the latter can load any kind of CSV file. Pathways are not available in the general data analysis mode.
The genetic analysis page currently lets you choose between three options:
- Start with sample gene expression data. This starts Caleydo with a set of clustered gene expression data.
- Start with randomly generated sample gene expression data. This starts Caleydo with a set of random data.
- Load data from file. This lets you load your own microarray data, as long as it adheres to the criteria specified in Data Format.
Finally, a check box lets you choose whether you want to load pathways for the analysis (which makes startup slower) or not.
In General Data Analysis mode, the only option currently available is to load comma-separated data.
First Run, Pathway Fetching
When you start Caleydo for the first time in Genetic Analysis mode with pathways enabled (see Startup), you will be asked to fetch the pathways from BioCarta and KEGG. Click Fetch pathways to start the process. Caleydo will then load the pathways from the databases and store them on your local computer.
The download will not exceed 30 MB, but it consists of more than 1000 files.
The data is stored in your user's home directory in a folder called .caleydo.
When the download is complete, click Finish. You need to do this only once, but you can repeat it manually via the preferences to update the pathway database.
If you update Caleydo from a previous version, you may need to fetch the pathways again.
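If you want to check whether the pathway data has already been fetched, you can look at the .caleydo folder yourself. The following is a minimal Python sketch, not part of Caleydo: the helper name summarize_folder is hypothetical, and it simply counts the files and adds up their sizes. Note that the folder may contain other files besides pathways, so the numbers are only a rough indication.

```python
from pathlib import Path


def summarize_folder(folder):
    """Return (number of files, total size in bytes) for a folder tree."""
    files = [path for path in Path(folder).rglob("*") if path.is_file()]
    return len(files), sum(path.stat().st_size for path in files)


if __name__ == "__main__":
    # Location taken from the text above: a folder called .caleydo
    # in the user's home directory.
    data_dir = Path.home() / ".caleydo"
    if data_dir.is_dir():
        count, size = summarize_folder(data_dir)
        print(f"{count} files, {size / 1_000_000:.1f} MB in {data_dir}")
    else:
        print(f"{data_dir} not found; pathways have probably not been fetched yet")
```

After a complete fetch you would expect more than 1000 files and a total size of at most about 30 MB.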
Data Format
To analyze gene expression data in Caleydo, the data currently has to be provided in a specific way:
- File Format: Comma-separated files (as can be exported from, for example, Microsoft Excel) with the extension *.csv or *.txt are supported.
- Delimiters: Any kind of delimiter is possible. The most common are TAB and semicolon.
- Header: The file may contain a header, which can be ignored upon import.
- Identifier: The first column in the file has to contain NCBI Accession numbers with the prefix NM. While it is legal to have rows without an accession number in the file, the values in those rows will not be available during the analysis.
- Columns: The file may contain columns with other data, such as gene short names. Those columns have to be deselected upon import.
- Values: Any real value is legal. The decimal symbol is the period (.), not the comma (,). 0.3445 is a legal value, 0,3445 is not. Missing values are legal. They can either be left blank or be specified as NaN (not a number).
- Column captions: The row before the actual values has to contain column captions (this would be the first line in the file if no header is present). It is legal (but unwise) to leave them blank.
- Length: All columns have to be of equal length.
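To catch problems before you import a file, you can check it against the criteria above. The following Python sketch is only an illustration and not part of Caleydo: the function name check_expression_file and its parameters are hypothetical, and it assumes that every column except the first contains values (columns with text, such as gene short names, would be reported as illegal here, although in Caleydo you would simply deselect them).

```python
import csv
import io


def check_expression_file(text, delimiter="\t", ignored_header_lines=0):
    """Return a list of human-readable problems found in the file content."""
    problems = []
    lines = text.splitlines()[ignored_header_lines:]
    rows = list(csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter))
    if len(rows) < 2:
        return ["file needs a caption row and at least one data row"]

    captions, data = rows[0], rows[1:]
    for number, row in enumerate(data, start=1):
        # All columns must be of equal length, so every row needs
        # as many fields as the caption row.
        if len(row) != len(captions):
            problems.append(
                f"row {number}: expected {len(captions)} fields, found {len(row)}"
            )
            continue
        # Rows without an NM accession number are legal but unavailable.
        if not row[0].startswith("NM"):
            problems.append(
                f"row {number}: no NM accession number, row will be unavailable"
            )
        for caption, value in zip(captions[1:], row[1:]):
            value = value.strip()
            if value == "" or value == "NaN":
                continue  # missing values are legal
            try:
                float(value)  # rejects a decimal comma such as 0,3445
            except ValueError:
                problems.append(
                    f"row {number}, column {caption!r}: {value!r} is not a legal value"
                )
    return problems
```

For example, calling check_expression_file(open("data.csv").read(), delimiter=";") on a semicolon-delimited file returns an empty list if no problems were found.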
Data Loading
To select a data file, click the Choose data file button. A file dialog opens in which you can navigate to your .csv or .txt file.
Once the file is loaded, you can see a preview of your data in the table. If your file uses a delimiter other than TAB, you have to specify it; the preview is then updated automatically.
If you have a header in your file, adjust the number of ignored lines by editing the Ignore lines in header text field.
Data can be logarithmized (log10 and log2 are available). This filter leaves the original data untouched, so you can still see the original values in the visualizations. However, the visualizations themselves, for example diagrams, are drawn from the logarithmized data.
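The idea of keeping the original values while displaying logarithmized ones can be sketched in a few lines of Python. This is an illustration only, not Caleydo's implementation: the function name logarithmize is hypothetical, and turning zero or negative values into NaN is an assumption, since this page does not say how Caleydo treats them.

```python
import math


def logarithmize(values, base=2):
    """Return a log-transformed copy; the input list is left untouched."""
    if base not in (2, 10):
        raise ValueError("only log2 and log10 are offered")
    log = math.log2 if base == 2 else math.log10
    result = []
    for value in values:
        if math.isnan(value) or value <= 0:
            # Assumption: values without a defined logarithm become NaN.
            result.append(math.nan)
        else:
            result.append(log(value))
    return result


raw = [1.0, 8.0, 100.0]
shown = logarithmize(raw, base=2)  # used for drawing
# raw still holds the original values and can be shown alongside.
```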
The min and max options are disabled by default. If you want to clip your data to a certain range, use min and max to do that. In the visualizations, all values greater than max are displayed as max, and all values smaller than min are displayed as min.
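The effect of the min and max options corresponds to the following Python sketch. The function name clip and its parameters are hypothetical and only illustrate the rule described above; passing None stands for a disabled option.

```python
def clip(values, minimum=None, maximum=None):
    """Return a copy of values limited to the range [minimum, maximum]."""
    clipped = []
    for value in values:
        if maximum is not None and value > maximum:
            value = maximum
        elif minimum is not None and value < minimum:
            value = minimum
        # NaN compares false in both tests and is passed through unchanged.
        clipped.append(value)
    return clipped
```

With minimum=0.0 and maximum=10.0, the values -5.0, 0.5 and 12.0 would be displayed as 0.0, 0.5 and 10.0.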
If you have data that you don't want to import, uncheck the checkbox on top of the column. You have to uncheck every column that contains text or other non-numeric data.
Now click Finish and Caleydo will begin to load the data.
Scalability
The performance of Caleydo varies with the hardware you use, the type of your data and the size of the data.
Loading large data files, the mappings and the pathway graphs needed for genetic pathway analysis takes some time at startup. We have experienced long startup times on notebook hard drives with a rotation speed of 5400 RPM.
To reduce startup time, uncheck the Load Kegg and Biocarta pathway data box if you don't need pathways; this should make startup about 30% faster.
General data analysis mode is faster on startup, because it does not need to load mappings for the different gene identifiers.
Caleydo uses about 200MB of RAM. Make sure that you have sufficient RAM available, since paging (the use of your hard-drive as virtual RAM) makes the application very slow. Try to close other applications if this happens.
Since Caleydo uses 3D graphics extensively, good graphics hardware is essential. At least 64 MB of dedicated video RAM is recommended. We recommend NVidia graphics cards, since all developers use them and they are therefore the best tested. ATI and Intel cards should also work, but are tested less frequently.
A major issue is the quality of the graphics card drivers. We have seen many problems solved once our users updated their graphics card drivers to the latest versions.
If you have a slow graphics card, you can reduce the number of samples shown in the parallel coordinates. The default value is 1000 (File > Preferences > General). With a value of 200, the parallel coordinates can still provide meaningful results while demanding much less of the graphics hardware.
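To get a feeling for what such a limit means for your own data, you can draw a subset of rows yourself. The Python sketch below is an illustration only: the function name sample_rows is hypothetical, and random selection is an assumption, since this page does not describe how Caleydo chooses the samples it shows.

```python
import random


def sample_rows(rows, limit=1000, seed=0):
    """Return at most `limit` rows, chosen at random without repetition."""
    if len(rows) <= limit:
        return list(rows)
    # A fixed seed makes the selection reproducible between runs.
    return random.Random(seed).sample(rows, limit)
```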
CPU performance matters mainly for clustering, where good hardware can speed up the process significantly.
Generally, to explore a whole genome set with about 40 experiments, such as the sample dataset, we recommend a computer built no earlier than 2006, with a hard drive with at least 7500 RPM, at least 2 GB of RAM and 128 MB of VRAM on an NVidia GeForce 7 or an ATI Radeon R500.
Smaller datasets can be explored with significantly less recent hardware.
website maintained by Marc Streit and Alexander Lex
last updated on 2010-02-17