    Search both the TCGA-GBM and TCGA-LGG datasets by gene and compare results between high- and low-grade gliomas using both gene and protein expression data.
    All RNA-seq data in this module, for both cancer studies, were downloaded from TCGA via the GDC data portal; the PANCAN32 L4 RPPA (reverse phase protein array) protein expression data were downloaded from the TCPA (The Cancer Proteome Atlas) portal.
    The Genes module gives access to tools for comparative analysis of one or more genes of interest between two TCGA brain cancer studies: TCGA-GBM vs TCGA-LGG.

    It has three main panels: the gene search panel, the participant panel, and the experiment panel. The gene search panel presents options to search genes by identifier, either gene symbol or Entrez Gene ID. The experiment panel displays the kinds of experimental data currently available for visualization. For now these consist of gene expression and protein expression data. Gene expression data come from TCGA (Illumina platforms) and protein expression data from TCPA (RPPA platform). Once options have been selected in each of the panels, pressing the "Search" button initiates a search and displays the search results.
    • Searching genes by identifier
      Genes can be searched by "Symbol" (the official HUGO symbol of the gene, or any known synonym), by "ID", or by "Locus tag". For example:
      Gene ID: 7157, 1956
      Symbol: TP53, EGFR
      Lists of terms can be entered manually in the text area or uploaded from a text file. The terms in a list can be separated by whitespace, commas, or semicolons.
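As an illustration of the separator rules described above, the following is a minimal sketch of how a term list might be tokenized. The function names `parse_gene_terms` and `classify_term` are hypothetical, and both the removal of duplicate terms and the rule "all digits means Gene ID" are assumptions made for the example; the portal's own parsing is not documented here.

```python
import re


def parse_gene_terms(text):
    """Split a raw search string into terms, keeping first-seen order."""
    # Terms may be separated by whitespace, commas, or semicolons.
    tokens = re.split(r"[\s,;]+", text.strip())
    seen = set()
    terms = []
    for tok in tokens:
        # Skip empty tokens; dropping duplicates is an assumption of this sketch.
        if tok and tok not in seen:
            seen.add(tok)
            terms.append(tok)
    return terms


def classify_term(term):
    """Guess the identifier type: all-digit terms are treated as Gene IDs (assumption)."""
    return "id" if term.isdigit() else "symbol"


terms = parse_gene_terms("TP53, EGFR;7157\n1956  TP53")
print(terms)
print([classify_term(t) for t in terms])
```

The same function could be applied to the contents of an uploaded text file, since a file is read as a single string with the same separators.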
    • Displaying a summary of experimental data associated with selected genes
      Once the gene search term list has been entered and the desired expression data options have been selected, pressing the "Search" button displays a summary of the experimental data, both gene and protein expression, associated with the genes of interest. The following screen shows the results obtained from filling in the search form as shown in Genes above.

      First, the table shows a summary of the gene expression data, followed by the protein data in a similar format. Each summary includes the gene ID, the official symbol, the chosen platform, the number of samples arrayed, the antibody (for protein results), and a quick overview of the expression distribution characteristics. For example, for gene EGFR in TCGA-GBM/TCGA-LGG, the mean of the distribution is 127.87/54.97, and the standard deviation is 181.65/123. To give an idea of how close to normal the distribution is, the means of the two tails (the areas outside the interquartile range of a normal distribution, [-0.675sd, 0.675sd]) and their sizes (the numbers of samples in the Down and Up ranges) are also displayed.
      Second, the table contains links to external gene information from the NCBI Gene database, which can be useful for obtaining more information about the gene, as well as disambiguating in cases when you search by gene synonym and you do not recognize the search term in the displayed results.
      Third, the table displays the visualization options available to you at this point. Performing single-gene survival analysis is accessed by pressing the image under the header "Plots" for the gene of interest. When the results indicate availability of data for two or more genes (as is the case in the example above), visualization options also include Displaying heatmap clustering of gene expression data for selected genes, which is accessed by pressing the heatmap image for the platform of choice under the table.
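The distribution overview described above (mean, standard deviation, and the sizes and means of the Down and Up tails) can be sketched as follows. This is an illustration only: the function name `summarize_expression` is hypothetical, the interval is assumed to be centered on the sample mean, the sample (n-1) standard deviation is assumed, and the input values are made up rather than taken from TCGA.

```python
from statistics import mean, stdev


def summarize_expression(values, k=0.675):
    """Summarize one gene's expression values in one study."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (assumption)
    # Assumed reading of [-0.675sd, 0.675sd]: an interval centered on the mean.
    low, high = m - k * sd, m + k * sd
    down = [v for v in values if v < low]   # lower tail
    up = [v for v in values if v > high]    # upper tail
    return {
        "n": len(values),
        "mean": m,
        "sd": sd,
        "down_n": len(down),
        "down_mean": mean(down) if down else None,
        "up_n": len(up),
        "up_mean": mean(up) if up else None,
    }


# Made-up values with one large outlier, for illustration only.
summary = summarize_expression([1.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 8.0, 30.0])
for key, value in summary.items():
    print(key, value)
```

For normally distributed data each tail would hold about a quarter of the samples, so tail sizes far from that, or tail means that are very unequal in distance from the overall mean, suggest a skewed distribution.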
    • Performing single-gene survival analysis
      The table for Displaying a summary of experimental data associated with selected genes contains a column called "Plots", which lists for each gene links for visualization of data associated with the gene and platform in question. Pressing the link associated with "EGFR" brings up a new window whose lower half displays survival analysis graphs:

      The row of graphs contains 3 plots: the first is a Kaplan-Meier survival curve comparison for samples classified according to their cancer subtype. Note that this graph is not gene-specific. The next two plots show survival analysis for each cancer subtype group (TCGA-GBM and TCGA-LGG). In each case the samples are further stratified according to gene expression levels. The default options display analysis for stratification of samples into two groups: those with (EGFR) expression levels smaller than the median over the subgroup, and those with higher than median expression levels. A p-value for the significance of difference between the two resulting curves is also displayed.
      The survival analysis can be customized using the options shown in the form. For example, instead of splitting the samples at the median expression level, one may choose to compare samples with extreme expression of the gene of interest (say, the lowest quartile versus the highest quartile). To rerun the analysis with new parameters, make the desired choices and press the "Remodel" button.
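The stratified survival comparison described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the module's actual implementation: the names `stratify`, `kaplan_meier`, and `logrank_p` are hypothetical, the p-value is assumed to come from a two-group log-rank test (the page does not name the test), samples exactly at a cut point are left out of both groups, and the expression values, follow-up times, and event flags are made up.

```python
import math
from statistics import median, quantiles


def stratify(expr, mode="median"):
    """Label each sample 'low', 'high', or None (excluded from both groups)."""
    if mode == "median":
        cut_low = cut_high = median(expr)
    elif mode == "quartile":
        cut_low, _, cut_high = quantiles(expr, n=4)
    else:
        raise ValueError("mode must be 'median' or 'quartile'")
    return ["low" if v < cut_low else "high" if v > cut_high else None
            for v in expr]


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time (event=1, censored=0)."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test p-value (chi-square with 1 degree of freedom)."""
    all_obs = list(zip(times_a + times_b, events_a + events_b))
    o_minus_e = var = 0.0
    for t in sorted({t for t, e in all_obs if e}):
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d = sum(1 for x, e in all_obs if x == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    # Survival function of chi-square(1) at x equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(o_minus_e ** 2 / var / 2))


# Made-up data: expression, follow-up time, and event flag per sample.
expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
times = [30, 28, 25, 20, 12, 9, 6, 3]
events = [0, 1, 0, 1, 1, 1, 1, 1]
labels = stratify(expr, mode="median")
low_t = [t for t, g in zip(times, labels) if g == "low"]
low_e = [e for e, g in zip(events, labels) if g == "low"]
high_t = [t for t, g in zip(times, labels) if g == "high"]
high_e = [e for e, g in zip(events, labels) if g == "high"]
print(kaplan_meier(high_t, high_e))
print(logrank_p(low_t, low_e, high_t, high_e))
```

Passing `mode="quartile"` to `stratify` corresponds to the extreme-expression comparison: only samples below the first quartile or above the third quartile are kept, and the rest receive `None`.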
    • Displaying heatmap clustering of gene expression data for selected genes
      For gene searches that result in multiple hits, an option is made available to display a heatmap in which the gene expression profiles are clustered according to their similarity. This is presented in the form of a heatmap icon following the table Displaying a summary of experimental data associated with selected genes. Clicking the icon in our example displays the following screen.

      The samples (columns of the heatmap) are annotated in two ways: first, according to survival status; and second, according to cancer subtype.
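To illustrate what clustering gene expression profiles "according to their similarity" can mean, here is a small sketch that orders genes so that similar profiles end up adjacent, as they would in the rows of a clustered heatmap. The distance measure (1 minus Pearson correlation) and the average-linkage agglomerative method are assumptions, since this page does not state which method the module uses. The function names, the gene label GENE_X, and all expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_order(profiles):
    """Average-linkage agglomerative clustering; returns gene names in leaf order."""
    names = list(profiles)
    # Pairwise correlation distance between every pair of genes.
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average distance over all cross-cluster pairs.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Made-up expression values across four samples, for illustration only.
profiles = {
    "EGFR": [1.0, 2.0, 3.0, 4.0],
    "TP53": [4.0, 3.0, 2.0, 1.0],
    "GENE_X": [2.0, 4.0, 6.0, 9.0],
}
print(cluster_order(profiles))
```

In this toy input EGFR and GENE_X rise together across samples, so they are merged first and appear next to each other in the resulting order, while TP53, which falls across the same samples, is placed apart from them.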
  3. Frequently Asked Questions

Type                 Data Source  Platform        Cancer  # Samples  Level
Expression-Genes     TCGA         Illumina HiSeq  GBM     173        3
Expression-Genes     TCGA         Illumina HiSeq  LGG     529        3
Expression-Proteins  TCPA         RPPA            GBM     205        4
Expression-Proteins  TCPA         RPPA            LGG     427        4