Heat*seq

HeatRNAseq


1 Contents

2 Instructions

2.1 Remarks on performance

2.2 Selecting a dataset

2.3 Loading an expression file

2.4 Looking at the Correlation table

2.5 Static heatmap

2.6 Responsive heatmap

2.7 Tree

2.8 Pairwise plot

2.9 Sample metadata

2.10 Correlation correction

2.11 About the datasets

3 FAQ

3.1 How to cite?

3.2 Where can I find the source code?

3.3 Can you add this dataset on HeatRNAseq? One of your datasets does not seem up to date, can you update it?

3.4 Could you implement this new feature? | I want to report a bug.

3.5 I have uploaded an expression file. How do I remove it?

3.6 What happens to the files I upload?

4 About

2 Instructions

To access the application, click on the Use Application button on the top bar. The step-by-step instructions for using the application are below.

2.1 Remarks on performance

HeatRNAseq runs some quite intensive tasks on both server and client sides. It thus needs a reasonably recent browser on a reasonably fast computer, tablet or phone. Here are some general suggestions concerning performance:

  • Most of the tasks are queued: changing parameters during loading does not stop current tasks but puts the new ones on hold until all previous tasks are completed. As a direct consequence, some buttons might look non-responsive because the application is running a task in the background (shown by the loading bar on top of the webpage). We suggest waiting until current tasks are fully executed before changing parameters.
  • One particularly intensive task is the interactive (Responsive) heatmap. We recommend using the static heatmap first while changing options and filters, before switching to the responsive heatmap.
  • If the application shows a grey filter, it means the connection with the server was lost, sorry. :( Nothing can be done but re-launching the application. Refreshing the page usually keeps options as they were, while opening a new web page will start the application with default values.

2.2 Selecting a dataset

The first step is to select a dataset to work with. At the moment, there are 9 datasets available: RNA-seq from ENCODE in human and mouse, from Bgee in human and mouse, from BLUEPRINT in human, from Roadmap epigenomics in human, from GTEx in human (available as either a summary dataset or a complete dataset), and from FlyBase in drosophila.

2.3 Loading an expression file

You can now upload an expression file. Note that this is not mandatory, and you can jump to section 2.5 if you simply want to browse the selected dataset. The application accepts a two-column tab-delimited text file: the first column should be the Ensembl gene id (ENSG00000134046, ENSMUSG00000024513), or the FlyBase gene id for drosophila (FBgn0000003), and the second should be the expression value. We recommend using Transcripts per million (TPM) or Fragments per kilobase per million (FPKM) metrics. Any additional column will not be considered. Please make sure to untick the My expression file contains a header (does first line of the file contains column name?) option if your file does not contain a header. You can also fill in the Name of your experiment field, which will modify the label of your experiment.

HeatRNAseq computes Pearson's correlation coefficient between experiments, after scaling them using log10(expression value + 1). Don't upload log-scaled values as the application will do it itself.
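As a rough illustration of the computation described above, the following Python sketch applies log10(expression value + 1) to two expression vectors and then computes Pearson's correlation coefficient between them. The function names (log_scale, pearson) and the toy values are illustrative assumptions; the application itself runs in R and its actual code may differ.

```python
import math

def log_scale(values):
    """Apply log10(expression value + 1) to each value."""
    return [math.log10(v + 1) for v in values]

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up TPM values for the same five genes in two experiments.
uploaded = [120.12, 0.0, 85.24, 0.54, 42.0]
reference = [98.7, 0.2, 110.0, 1.3, 35.5]

r = pearson(log_scale(uploaded), log_scale(reference))
print(round(r, 3))
```

Uploading values that are already log-scaled would amount to applying the transformation twice, which is why raw TPM or FPKM values are expected.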

The maximum size you can upload is 10 MB. If your expression file is larger than that, try keeping only its first two columns to reduce the file size. If the file is still bigger than 10 MB after removing the non-essential columns, contact us (replace at with @).
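One possible way to trim a wide table down to its first two columns is sketched below in Python. It assumes the gene id and the expression value already sit in columns 1 and 2; the function name keep_first_two_columns and the extra columns in the demonstration are hypothetical.

```python
import csv
import io

def keep_first_two_columns(src, dst):
    """Copy a tab-delimited table from src to dst, keeping columns 1 and 2."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        if len(row) >= 2:  # skip blank or malformed lines
            writer.writerow(row[:2])

# In-memory demonstration with two extra, made-up columns.
wide = io.StringIO(
    "geneID\ttpm\tlength\tbiotype\n"
    "ENSG00000134046\t120.12\t2345\tprotein_coding\n"
    "ENSG00000141644\t0\t1890\tprotein_coding\n"
)
slim = io.StringIO()
keep_first_two_columns(wide, slim)
print(slim.getvalue())
```

With real files, the same function can be given two file handles opened with open(path, newline="") for reading and open(path, "w", newline="") for writing.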

Once the file is uploaded, a subtle progress bar will appear on top of the page, and a quick description of the on-going steps can be found on the top right of the page. It should take less than a minute. The My expression file tab will display the uploaded gene expression table.

The easiest way to remove an expression file without uploading a new one is to return to the main page and open a new HeatRNAseq window.

First lines of your file should look like this:

geneID tpm
ENSG00000134046 120.12
ENSG00000141644 0
ENSG00000169057 85.24
ENSG00000174282 0.54
ENSG00000187098 42

You can download this example file. It is a mouse brain RNA-seq experiment, kindly provided by Dr. Arbeitman. Use a mouse dataset (for example Bgee mouse), and leave the My expression file contains a header option ticked. The file corresponds to the NP1071 sample from GEO dataset GSE70732 supporting this study, and it is quite easy to rebuild. From the bottom of the GEO page, download the GSE70732_Mouse_all_regions_FPKM_data_all_genes.xlsx.gz file. Extract the archive, and open the xlsx file in Excel. Keep the first column, delete all expression columns but the last one (named FPKM_NP1071_GTGGCC_R1), and then save the table as a tab-delimited text file.

2.4 Looking at the Correlation table

If you have uploaded an expression file, the Correlation table tab should now display a three-column table. The first column contains the names of the experiments in the selected dataset, and the second column contains the correlation coefficients between the uploaded file and the corresponding experiment in the dataset. A third column contains scaled correlation values (discussed in section 2.10); this column is not visible by default. At the bottom of the page, you will find a Save as tab delimited .txt download button, allowing you to download a copy of the correlation table as a tab-delimited text file. The file can be read by many spreadsheet programs, including Microsoft Excel.
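To make the content of the correlation table more concrete, here is a minimal Python sketch that builds a comparable table from toy data. It assumes that experiments are compared on the gene ids they share with the uploaded file and that rows are sorted by decreasing correlation; neither detail is confirmed by this page, and the function name correlation_table, the experiment names and the values are made up.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_table(uploaded, dataset):
    """Return (experiment name, correlation) rows, most correlated first.

    uploaded: dict mapping gene id to expression value.
    dataset:  dict mapping experiment name to a dict of the same shape.
    """
    rows = []
    for name, profile in dataset.items():
        shared = sorted(set(uploaded) & set(profile))  # assumption: shared genes only
        x = [math.log10(uploaded[g] + 1) for g in shared]
        y = [math.log10(profile[g] + 1) for g in shared]
        rows.append((name, pearson(x, y)))
    return sorted(rows, key=lambda row: row[1], reverse=True)

uploaded = {"ENSG00000134046": 100.0, "ENSG00000141644": 0.0,
            "ENSG00000169057": 50.0, "ENSG00000174282": 5.0}
dataset = {
    "liver": {"ENSG00000134046": 2.0, "ENSG00000141644": 80.0,
              "ENSG00000169057": 3.0, "ENSG00000174282": 40.0},
    "brain": {"ENSG00000134046": 90.0, "ENSG00000141644": 1.0,
              "ENSG00000169057": 60.0, "ENSG00000174282": 4.0},
}

table = correlation_table(uploaded, dataset)
for name, value in table:
    print(f"{name}\t{value:.3f}")
```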

You can copy an experiment name from the correlation table and paste it into the Search field of the Samples metadata tab to see the metadata available for that experiment.

2.5 Static heatmap

The heatmap can take up to a minute to be displayed, and a bit longer if you are importing a new expression file or switching datasets.

The static heatmap tab will display a clustered heatmap representation of the correlation matrix: each row and each column represents a single experiment. The colour legend can be found below the heatmap: white represents a correlation coefficient of 0, red a coefficient of 0.5, and black a coefficient of 1. Several options are available on the side bar panel on the left, 3 - Plot customization. In the order of appearance:

  • The Highlight my experiment in the heatmap tick box. When checked, if you uploaded an expression file, the row and column corresponding to your experiment will be highlighted: a 0 correlation value will be displayed in yellow, a 0.5 value in bright green, and a 1 correlation value still in black. This can help you locate your experiment in crowded heatmaps.
  • One or several subset options: You can restrict the experiments to be displayed using those fields. They may vary a bit between datasets. You can select experiments done in one or several cell types. Leaving the field empty will select all experiments.

If fewer than three experiments from the dataset match your criteria, the heatmap will not be displayed. You can look for available experiments in the Samples metadata tab.

The uploaded expression file is not affected by the filters and will always be displayed in the heatmap.

  • The function of the Uploaded experiment correlation correction field and the Maximum expected correlation value for linear scaling correction slider is explained in section 2.10.
  • The Advanced clustering options button will reveal the Distance calculation and Clustering method options. The selected values are passed as the method parameter of the R dist() and hclust() functions. A special case is the 1 - Pearson's correlation coefficient distance method: instead of using the dist() function on correlation values, it uses 1 - correlation as a measure of distance between experiments (a correlation of 0.75 will give a distance of 0.25).
  • Label and margin size can be set manually: the Sample name size slider will modify the label size, and the Sample name margin slider will modify the size of the plot margin. Two scrolling menus let you show or hide the row and/or column dendrograms and the sample labels.
  • The Customise colour button will reveal items for changing the colour key used for the heatmap. Four threshold values and associated colours can be specified (by default, 0.25 is blue, 0.5 is white, 0.75 is red, and 1 is black). Values need to be strictly increasing. To apply changes, click the Apply colour changes button.
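The difference between the two kinds of distance mentioned in the clustering options can be sketched in Python as follows. This is only an illustration: it assumes that dist() is applied to the rows of the correlation matrix with the Euclidean method, the function names and the toy matrix are made up, and the hierarchical clustering step performed by hclust() is not reproduced.

```python
import math

# Toy correlation matrix between three experiments (symmetric, diagonal of 1).
names = ["A", "B", "C"]
cor = [
    [1.00, 0.75, 0.20],
    [0.75, 1.00, 0.30],
    [0.20, 0.30, 1.00],
]

def one_minus_correlation(matrix):
    """Distance taken directly as 1 - correlation."""
    return [[1 - value for value in row] for row in matrix]

def euclidean_between_rows(matrix):
    """Euclidean distance between the rows of the correlation matrix."""
    n = len(matrix)
    return [[math.dist(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]

d_cor = one_minus_correlation(cor)
d_euc = euclidean_between_rows(cor)

print(d_cor[0][1])            # distance between A and B from 1 - correlation
print(round(d_euc[0][1], 4))  # distance between A and B from their row profiles
```

The first measure depends only on how the two experiments correlate with each other, while the second compares how each of them correlates with every experiment displayed.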

If you see an error message: Figure margins too large, try reducing the size of the margin as well as the size of the labels.

The Save as png, Save as pdf and Save as svg buttons below the heatmap export the image in those formats. The Export data as tab delimited .txt button exports the heatmap data as a text file. To download the colour key, right-click on it and select Save Image As...

2.6 Responsive heatmap

The responsive heatmap tab displays an interactive plot provided through the plot.ly API. It can take about a minute to be displayed on a powerful computer running a recent version of the Firefox, Chrome, Safari, Internet Explorer or Edge web browsers. It shows the same heatmap as the Static heatmap presented in the previous section, without the side dendrograms (trees). Most of the options are the same as for the Static heatmap, so please check section 2.5 for more information. Additional options are available on the responsive heatmap itself:

  • Mouse over information: when hovering the mouse over the heatmap, a text box will appear, giving you the name of the experiment in the x-axis, the name of the experiment in the y-axis, and their correlation value (z).

If the Highlight my experiment in the heatmap option is enabled, the z value of the highlighted cells will be offset by 2: a correlation value of 0.75 will have a z-value of 2.75.

  • One can zoom in on the heatmap by clicking and dragging the mouse to define a rectangle. To zoom out, double-click on the heatmap or click the Reset axis or Autoscale buttons on the top left corner of the heatmap.
  • After selecting the Pan button on the top left corner of the heatmap, dragging the mouse will pan the view, which can be useful once you have zoomed in on a specific part of the heatmap.
  • The Download plot as a png button seems non-functional (blame plot.ly). Please use the one under the Static heatmap or look at the next step.
  • The Save and edit plot in the cloud button will send the data to plot.ly where, after loading, one can play with the many tools offered by plot.ly, such as exporting a JSON version of the data, saving the plot as a png, changing the theme (and notably the colour scale), etc.
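When the Highlight my experiment in the heatmap option is enabled, the hover text of highlighted cells shows the offset z value rather than the correlation itself. The small Python helper below shows how such a value can be read back; the name decode_z is hypothetical and the helper is not part of the application.

```python
HIGHLIGHT_OFFSET = 2  # offset applied to highlighted cells, as described above

def decode_z(z):
    """Return (correlation, highlighted) for a z value read from the hover box.

    Correlations never exceed 1, so any larger z value must carry the offset.
    A highlighted correlation of exactly -1 would give z = 1 and cannot be
    told apart from a plain correlation of 1 by this rule.
    """
    if z > 1:
        return z - HIGHLIGHT_OFFSET, True
    return z, False

print(decode_z(2.75))
print(decode_z(0.4))
```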

2.7 Tree

The Tree tab will display only the dendrogram (or Tree) from the experiment clustering. Options are mostly similar to those for the static heatmap; please refer to section 2.5. The Highlight my experiment in the heatmap option will not highlight your experiment in the dendrogram.

If you see a red, unfriendly error message: Figure margins too large, try reducing the size of the margin as well as the name size.

2.8 Pairwise plot

The Pairwise plot tab will display a scatter plot of two selected experiments where each dot represents a gene. Under the 3 - Plot customization section on the left of the page, you can:

  • choose experiment 1 and experiment 2.
  • choose the plot type, between XY (where x-axis will be expression level in experiment 1, and y-axis expression level in experiment 2) and MA (where x-axis will be mean expression level, and y-axis expression level in experiment 1 minus expression level in experiment 2).
  • choose a different scaling to apply to the expression data. By default, if e is the expression value, HeatRNAseq applies a log scale, log10(e + 1), before computing the Pearson correlation coefficient. You can select a different scaling here to see its impact on the Pearson correlation coefficient.
  • add or remove a linear regression line (blue).
  • add or remove a guide line (red), which will be y=x in XY plot, and y=0 in MA plot.

Below the scatter plot, the Pearson and Spearman correlation coefficients between the two experiments are displayed, computed after scaling.

The Spearman correlation coefficient is independent of scaling, as the scaling methods proposed here do not change the order of the data. All log transformations of the data give the same Pearson correlation coefficient, because a logarithm in one base is a constant multiple of the logarithm in any other base.
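The Python sketch below illustrates the points made in this section on toy data: how XY and MA coordinates can be derived, why the Spearman coefficient does not change with scaling, and why the log base does not affect the Pearson coefficient. It assumes the MA coordinates are computed on the scaled values, which this page does not state explicitly; all names and values are made up.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up expression values for five genes in two experiments.
exp1 = [120.12, 0.0, 85.24, 0.54, 42.0]
exp2 = [98.7, 0.2, 110.0, 1.3, 35.5]

log10_1 = [math.log10(e + 1) for e in exp1]
log10_2 = [math.log10(e + 1) for e in exp2]
log2_1 = [math.log2(e + 1) for e in exp1]
log2_2 = [math.log2(e + 1) for e in exp2]

# XY plot: x is experiment 1, y is experiment 2.
xy_points = list(zip(log10_1, log10_2))
# MA plot: x is the mean of both experiments, y is experiment 1 minus experiment 2.
ma_points = [((a + b) / 2, a - b) for a, b in zip(log10_1, log10_2)]

print(spearman(exp1, exp2), spearman(log10_1, log10_2))  # identical
print(pearson(log10_1, log10_2), pearson(log2_1, log2_2))  # equal up to rounding
print(pearson(exp1, exp2))  # differs: no scaling applied
```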

Further below are buttons to download the scatter plot as png, pdf or svg (recommended), as well as to export a tabulated text file containing the data used.

2.9 Sample metadata

The sample metadata tab will display metadata information (experiment name, cell type, url of the original data, etc.) for the selected dataset. The table is sortable and searchable, and can be downloaded as a tab-delimited txt file using the save button below the table.

You can copy an experiment name from the correlation table and paste it into the Search field of the Samples metadata tab to show all the metadata we have for that experiment.

2.10 Correlation correction

Sometimes, the maximum correlation of an uploaded expression file with any experiment in the dataset can be quite low. In some cases, when the user is confident that the top hits are relevant, this may be evidence of a strong "batch effect" that could reflect an artefact of the library preparation method, the quantification method or RNA integrity. The low correlation values might bias the clustering due to Long Branch Attraction. We provide an option to correct for this bias using the Linear scaling method in the Uploaded experiment correlation correction option. The correlation values will be linearly up-scaled so that the maximum correlation value becomes equal to the value of the Maximum expected correlation value for linear scaling correction slider (default 0.95). The resulting transformed correlation values can be obtained from the scaledCorrelation column of the Correlation table.

The linear scaling of correlation values does not change their ordering; it only rescales them (e.g. the third most correlated experiment without scaling will still be third with scaling).
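A minimal Python sketch of such a linear scaling is given below. It assumes a purely multiplicative correction, in which every correlation is multiplied by the same factor so that the largest one reaches the slider value; the exact formula used by the application is not given on this page, and the function name linear_scaling and the values are made up.

```python
def linear_scaling(correlations, max_expected=0.95):
    """Multiply all correlations by one factor so the largest equals max_expected.

    Assumption: the correction is a simple multiplication. Cases where the
    largest correlation is zero or negative are rejected, since no positive
    factor could bring it up to max_expected.
    """
    top = max(correlations)
    if top <= 0:
        raise ValueError("the largest correlation must be positive")
    factor = max_expected / top
    return [c * factor for c in correlations]

# Made-up correlations between an uploaded file and four experiments.
raw = [0.40, 0.62, 0.55, 0.31]
scaled = linear_scaling(raw)
print([round(c, 3) for c in scaled])
```

Because every value is multiplied by the same positive factor, the ranking of the experiments is unchanged.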

2.11 About the datasets

Dataset              Organism    Number of experiments  Download date  Data source
Bgee                 human       77                     2015-06-01     Bgee ftp
Bgee                 mouse       109                    2016-01-06     Bgee ftp
Blueprint epigenome  human       163                    2016-02-08     EBI ftp
ENCODE               human       302                    2016-05-09     ENCODE data portal
ENCODE               mouse       192                    2016-01-22     ENCODE data portal
Roadmap Epigenomics  human       57                     2016-03-04     Roadmap ftp
GTEx (summary)       human       53                     2016-04-13     GTEx website
GTEx (all samples)   human       8555                   2016-04-13     GTEx website
FlyBase              drosophila  124                    2016-03-15     FlyBase ftp

If you would like us to add a dataset, or to update an existing one, please contact us (replace at with @). A well curated dataset can be implemented / updated within a working day.

GTEx (summary) data contains median gene expression values for all samples from the same tissue. GTEx (all samples) data contains over 8000 samples, too many samples to be displayed as a single heatmap, but subsets of experiments can be displayed. One can still calculate the correlation of any expression file with the entire GTEx dataset in less than 5 minutes by selecting the Correlation table tab.
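The relation between the two GTEx datasets can be pictured with the Python sketch below, which computes a per-tissue median for each gene from individual samples. It is only an illustration of what a median summary is, not the procedure used to produce the GTEx (summary) data; the function name summarise_by_tissue and the values are made up.

```python
from collections import defaultdict
from statistics import median

def summarise_by_tissue(samples):
    """Median expression of each gene across the samples of each tissue.

    samples: list of (tissue, profile) pairs, where profile maps a gene id
    to its expression value in one sample.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for tissue, profile in samples:
        for gene, value in profile.items():
            grouped[tissue][gene].append(value)
    return {
        tissue: {gene: median(values) for gene, values in genes.items()}
        for tissue, genes in grouped.items()
    }

# Made-up samples: three from brain, two from liver.
samples = [
    ("brain", {"ENSG00000134046": 10.0, "ENSG00000141644": 0.0}),
    ("brain", {"ENSG00000134046": 30.0, "ENSG00000141644": 2.0}),
    ("brain", {"ENSG00000134046": 20.0, "ENSG00000141644": 1.0}),
    ("liver", {"ENSG00000134046": 4.0, "ENSG00000141644": 50.0}),
    ("liver", {"ENSG00000134046": 6.0, "ENSG00000141644": 70.0}),
]

summary = summarise_by_tissue(samples)
print(summary["brain"])
print(summary["liver"])
```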

3 FAQ

3.1 How to cite?

Please cite this paper: Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data

3.2 Where can I find the source code?

The source code is available on GitHub.

3.3 Can you add this dataset on HeatRNAseq? One of your datasets does not seem up to date, can you update it?

Please contact us (replace at with @). A well curated dataset can be implemented / updated within a working day.

3.4 Could you implement this new feature? | I want to report a bug.

Please contact us (replace at with @). We will be very pleased to consider implementing any feature that will improve the usability of this application.

3.5 I have uploaded an expression file. How do I remove it?

To remove an expression file without uploading a new one, the simplest method is to open a new HeatRNAseq session by going back to the main page. Refreshing the HeatRNAseq page may result in a slightly erratic outcome. You can always replace the file with any other expression file by clicking the Browse button again.

3.6 What happens to the files I upload?

Uploaded files are stored in a temporary folder on our server. They are automatically deleted once the R session expires (i.e. when you close the window).

4 About

HeatRNAseq is a part of Heat*Seq, an attempt to make genome-wide comparison of high throughput sequencing experiments easier. It was developed by Guillaume Devailly, Anna Mantsoki and Anagha Joshi at the Roslin Institute, and funded by the Biotechnology and Biological Sciences Research Council. It uses R shiny, plot.ly, and various CRAN and Bioconductor packages, and datasets from ENCODE, Bgee, BLUEPRINT, Roadmap epigenomics, GTEx, and FlyBase. Sources are available on GitHub.