2.3 Loading a CAGE result bed file
2.4 Looking at the Correlation table
3.2 Where can I find the source code?
3.4 Could you implement this new feature? | I want to report a bug.
3.5 I have uploaded a CAGE result bed file. How do I remove it?
3.6 What happens to the files I upload?
To access the application, click on the Use Application button on the top bar. The step-by-step instructions for using the application are below.
HeatCAGEseq runs some fairly intensive tasks on both the server and client sides. It therefore needs a reasonably recent browser on a reasonably fast computer, tablet or phone. Here are some general suggestions concerning performance:
The first step is to select a dataset to work with. At the moment, there are 2 datasets available: CAGE experiments from FANTOM5 in human (hg19) and mouse (mm9).
You can now upload a CAGE result bed file. Note that this is not mandatory, and you can jump to section 2.5 if you simply want to browse the selected dataset. The application accepts a tab-delimited text file in bed format with at least six columns: the first column should be the chromosome name (chr1, chr2, chrX, etc.), the second the genomic region start coordinate, the third the genomic region end coordinate, the fourth the region name (not used, but required), the fifth a CAGE expression value, such as RPM values or tag counts, and the sixth the strand information. Any additional columns will be ignored. Please make sure to untick the My bed file contains a header (does first line of the file contains column name?) option if your file does not contain a header. You can also fill in the Name of your experiment field, which will modify the label of your experiment.
HeatCAGEseq computes Pearson's correlation coefficient between experiments after transforming the expression values with log10(expression value + 1). Do not upload log-scaled values, as the application applies this transformation itself.
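The transformation and correlation described above can be illustrated with a minimal sketch. It assumes that both experiments have already been summarised over the same set of regions (how the application matches regions is not described here), and the expression values are made up for illustration.

```python
import math

def log_scale(values):
    """Apply log10(x + 1) to each raw expression value."""
    return [math.log10(v + 1.0) for v in values]

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical raw expression values for the same five regions
# in an uploaded experiment and in one experiment of the dataset.
uploaded = [458.12, 25.03, 1.23, 45.3, 8.70]
reference = [390.0, 30.5, 0.0, 60.2, 5.1]

r = pearson(log_scale(uploaded), log_scale(reference))
print(f"Pearson correlation on log10(x + 1) values: {r:.3f}")
```

Because the "+ 1" keeps regions with zero expression defined after the log, raw values (RPM or tag counts) can be passed in directly.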
The maximum file size you can upload is 10 Mb. If your bed file is larger than that, try keeping only its first six columns to reduce the file size. If the file is still bigger than 10 Mb after removing the non-essential columns, contact us (replace at with @).
Once the file is uploaded, a subtle progress bar will appear at the top of the page, and a quick description of the ongoing steps can be found at the top right of the page. It should take about a minute to process 100,000 regions. The My bed file tab will display a view of the uploaded file.
The easiest way to remove a bed file without uploading a new one is to return to the main page and open a new HeatCAGEseq window.
First lines of your file should look like this:
chr | start | end | name | rpm | strand |
---|---|---|---|---|---|
chr2 | 25486325 | 25486487 | CAGEpeak_1 | 458.12 | + |
chr6 | 5896321 | 5896380 | CAGEpeak_2 | 25.03 | + |
chr6 | 223541 | 223602 | CAGEpeak_3 | 1.23 | - |
chr17 | 5012035 | 5012100 | CAGEpeak_4 | 45.3 | + |
chr21 | 960032 | 960098 | CAGEpeak_5 | 8.70 | - |
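Before uploading, it can help to check that each line of a file follows this layout. The sketch below is only an illustration: the function name is hypothetical, the checks are assumptions based on the column description above rather than the application's actual validation rules, and the accepted strand symbols follow the general BED convention.

```python
def parse_bed_line(line):
    """Parse one tab-delimited line into the six required fields.

    Raises ValueError if the line does not match the six-column layout.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, found {len(fields)}")
    # Columns beyond the sixth are ignored.
    chrom, start, end, name, score, strand = fields[:6]
    start, end, score = int(start), int(end), float(score)
    if start < 0 or end < start:
        raise ValueError(f"invalid coordinates: start={start}, end={end}")
    if strand not in ("+", "-", "."):
        raise ValueError(f"invalid strand: {strand!r}")
    return chrom, start, end, name, score, strand

# First data line of the example above, written with real tab characters.
record = parse_bed_line("chr2\t25486325\t25486487\tCAGEpeak_1\t458.12\t+\n")
print(record)
```

A header line such as the one shown above would fail the integer conversion, which is a quick way to notice whether the header option needs to stay ticked.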
You can download this example file. It is a CAGE experiment from mouse liver cells, kindly provided by Dr. Carninci and colleagues. Use the mouse dataset, and untick the "My peak file contains a header" option. It corresponds to the WT1 sample from GEO dataset GSE60982 supporting this study. It is quite easy to rebuild: go to the GEO page and download the GSE60982_Expression.mm9.HCC.49096peaks.txt.gz file. Extract the archive, replace the fifth column with the values contained in the thirteenth column, then delete every column but the first six. The file is now ready to be uploaded to HeatCAGEseq.
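The rebuild steps described above can be sketched as follows. This assumes the GEO file is tab-delimited; the function names and the output file name are hypothetical. The sketch reads the gzip archive directly instead of extracting it first, and it does not treat the first line specially, so if the downloaded file starts with a header line, that line would need to be removed separately.

```python
import gzip

def rebuild_rows(lines):
    """Yield six-column bed rows from lines of the GEO expression table.

    The fifth column is replaced by the thirteenth column, then only the
    first six columns are kept. Lines with fewer than 13 columns are skipped.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 13:
            continue
        fields[4] = fields[12]
        yield "\t".join(fields[:6])

def rebuild_example(src_path, dst_path):
    """Read the gzip-compressed GEO table and write the six-column bed file."""
    with gzip.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        for row in rebuild_rows(src):
            dst.write(row + "\n")

# Usage, once the GEO file has been downloaded to the working directory
# (the output name is a placeholder):
# rebuild_example("GSE60982_Expression.mm9.HCC.49096peaks.txt.gz", "WT1_example.bed")
```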
If you uploaded a bed file, the Correlation table tab should now display a three-column table. The first column contains the names of the experiments in the selected dataset, and the second column contains the correlation coefficient between the uploaded experiment and the corresponding experiment in the dataset. The third column contains scaled correlation values (discussed in section 2.10); this column is not visible by default. At the bottom of the page, you will find a Save as tab delimited .txt download button, allowing you to download a copy of the correlation table as a tab-delimited text file. Such files can be read by many spreadsheet programs, including Microsoft Excel.
You can copy an experiment name from the correlation table and paste it into the Search field of the Samples metadata tab to see the metadata available for that experiment.
The heatmap can take up to a minute to be displayed, and a bit longer if you are importing a new peak file or switching datasets.
The static heatmap tab will display a clustered heatmap representation of the correlation matrix: each row and each column represents a single experiment. The colour legend can be found below the heatmap: white represents a correlation coefficient of 0, red 0.5, and black 1. Several options are available in the side bar panel on the left, under 3 - Plot customisation. In order of appearance:
If fewer than three experiments from the dataset match your criteria, the heatmap will not be displayed. You can look for available experiments in the Samples metadata tab.
The uploaded experiment is not affected by the filters and will always be displayed in the heatmap.
If you see an error message: Figure margins too large, try reducing the size of the margin as well as the size of the labels.
The Save as png, Save as pdf and Save as svg buttons can be found below the heatmap to export the image in those formats. The Export data as tab delimited .txt button exports the heatmap data as a text file. To download the colour key, right click on it and select Save Image As...
The responsive heatmap tab displays an interactive plot provided through the plot.ly API. It can take about a minute to be displayed on a powerful computer running a recent version of Firefox, Chrome, Safari, Internet Explorer or Edge. It shows the same heatmap as the Static heatmap presented in the previous section, without the side dendrograms (trees). Most of the options are the same as for the Static heatmap, so please check section 2.5 for more information. Additional options are available on the responsive heatmap itself:
If the Highlight my experiment in the heatmap option is enabled, the z value of the highlighted cells will be offset by 2: a correlation value of 0.75 will have a z-value of 2.75.
The Tree tab will display only the dendrogram (or tree) from the experiment clustering. The options are mostly similar to those for the static heatmap; please refer to section 2.5. The Highlight my experiment in the heatmap option will not highlight your experiment in the dendrogram.
If you see a red, unfriendly error message: Figure margins too large, try reducing the size of the margin as well as the name size.
The Pairwise plot tab will display a scatter plot of two selected experiments where each dot represents a CAGE peak. Under the Plot customisation section on the left of the page, you can:
The Spearman correlation coefficient is independent of scaling, as the scaling methods proposed here do not change the order of the data. All log transformations of the data give the same Pearson correlation coefficient, because a log in one base is a constant multiple (a linear transformation) of the log in any other base.
Further below are buttons to download the scatter plot as png, pdf or svg (recommended), as well as to export a tabulated text file containing the data used.
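Both statements can be checked numerically. The sketch below uses made-up expression values and compares a log10(x + 1) transformation with a log2(x + 1) transformation; these two transformations are chosen for illustration and are not necessarily the scaling options offered in the application.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical raw expression values for five peaks in two experiments.
x = [458.12, 25.03, 1.23, 45.3, 8.70]
y = [390.0, 60.2, 0.0, 30.5, 5.1]

log10_x, log10_y = [math.log10(v + 1) for v in x], [math.log10(v + 1) for v in y]
log2_x, log2_y = [math.log2(v + 1) for v in x], [math.log2(v + 1) for v in y]

# Spearman is unchanged by the log transformation (the order is preserved).
print(spearman(x, y), spearman(log10_x, log10_y))
# Pearson is the same in base 10 and base 2, up to floating-point rounding.
print(pearson(log10_x, log10_y), pearson(log2_x, log2_y))
```

Note that Pearson's coefficient on the raw values generally differs from the one on log-transformed values; only the choice of log base is irrelevant.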
The sample metadata tab will display metadata information (experiment name, cell type, url of the original data, etc.) for the selected dataset. The table is sortable and searchable, and can be downloaded as a tab-delimited txt file using the save button below the table.
You can copy an experiment name from the correlation table and paste it into the Search field of the Samples metadata tab to show all the metadata we have for that experiment.
Sometimes, the maximum correlation of a user peak file with any experiment in the dataset can be quite low. In some cases, when the user is confident that the top hits are relevant, this may be evidence of a strong "batch effect" that could reflect an artefact of the library preparation method or differences in the bioinformatic pipeline. The low correlation values might bias the clustering due to Long Branch Attraction. We provide an option to correct for this bias by selecting the Linear scaling method in the Uploaded experiment correlation correction option. The correlation values will be linearly up-scaled so that the maximum correlation value becomes equal to the value of the Maximum expected correlation value for linear scaling correction slider (default 0.95). The resulting transformed correlation values can be found in the scaledCorrelation column of the Correlation table.
The linear scaling of correlation values does not change their ordering; it only rescales them (i.e. the third most correlated experiment without scaling will still be third with scaling).
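A minimal sketch of such a correction is given below. It assumes that "linear" means multiplying every correlation value by a single factor, which is one possible reading of the description above and not necessarily the application's exact formula. The function name and the input values are hypothetical.

```python
def linear_scale(correlations, target_max=0.95):
    """Multiply all correlations by one factor so the largest equals target_max.

    Assumption: the scaling is a pure multiplication, without an offset.
    """
    current_max = max(correlations)
    if current_max <= 0:
        raise ValueError("the maximum correlation must be positive to be rescaled")
    factor = target_max / current_max
    return [c * factor for c in correlations]

# Hypothetical correlations between an uploaded experiment and three experiments.
raw = [0.30, 0.45, 0.15]
scaled = linear_scale(raw)

# The ranking of the experiments is the same before and after scaling.
order_before = sorted(range(len(raw)), key=lambda i: raw[i], reverse=True)
order_after = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
print(scaled, order_before == order_after)
```

Because the factor is positive, the ranking of experiments is preserved, which matches the behaviour described above.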
Dataset | Organism | Number of experiments | Download date | Data from |
---|---|---|---|---|
FANTOM5 | human | 1058 | 2015-11-04 | FANTOM5 |
FANTOM5 | mouse | 490 | 2015-11-04 | FANTOM5 |
If you would like us to add a dataset, or to update an existing one, please contact us (replace at with @). A well-curated dataset can be implemented or updated within a working day.
Please cite this paper: Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data
The source code is available on GitHub.
Please contact us (replace at with @). A well curated dataset can be implemented / updated within a working day.
Please contact us (replace at with @). We will be very pleased to consider implementing any feature that will improve the usability of this application.
To remove a bed file without uploading a new one, the simplest method is to open a new HeatCAGEseq session by going Back to the main page. Refreshing the HeatCAGEseq page may give slightly erratic results. You can always replace the file with another bed file by clicking the Browse button again.
Uploaded files are stored in a temporary folder on our server. They are automatically deleted once the R session expires (i.e. when you close the window).
HeatCAGEseq is a part of Heat*Seq, an attempt to make genome-wide comparison of high throughput sequencing experiments easier. It was developed by Guillaume Devailly, Anna Mantsoki and Anagha Joshi at the Roslin Institute, and funded by the Biotechnology and Biological Sciences Research Council. It uses R shiny, plot.ly, and various CRAN and Bioconductor packages, and datasets from FANTOM5. Sources are available on GitHub.