GBM Bio Discovery Portal

Overview
Genes
Frequently Asked Questions
- Why is my gene search producing no results?
- Who can I contact with technical questions or difficulties?

Overview
Search the TCGA-GBM and TCGA-LGG datasets by gene and compare gene and protein expression results between high-grade and low-grade gliomas.
All RNA-seq data in this module, for both cancer studies, were downloaded from TCGA via the GDC data portal. The PANCAN32 L4 RPPA (reverse phase protein array) protein expression data were downloaded from the TCPA (The Cancer Proteome Atlas) portal.

Genes
The Genes module gives access to tools for a comparative analysis of gene(s) of interest between two TCGA brain cancer studies: TCGA-GBM vs TCGA-LGG.

It has three main panels: the gene search panel, the participant panel, and the experiment panel. The gene search panel presents options to search genes by identifier, either gene symbol or Entrez gene ID. The experiment panel displays the kinds of experimental data currently available for visualization. For now these consist of gene expression and protein expression data. Gene expression data came from TCGA on Illumina platforms and protein expression data from TCPA on RPPA platforms. Once options have been selected in each of the panels, pressing the "Search" button runs the search and displays the results.

Searching genes by identifier
Genes can be searched by "Symbol" (the official HUGO symbol of the gene, or any known synonym), by "ID" (the Entrez gene ID), or by "Locus tag."

Gene ID:	7157, 1956
Symbol:	TP53, EGFR

Lists of terms can be entered either manually in the text area, or uploaded from a text file. The terms in a list can be separated by any space character, commas, or semicolons.
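
The separator rule above can be illustrated with a short sketch. This is not the portal's own parser; the function name `parse_gene_terms` is hypothetical, and the sketch only assumes that any run of whitespace, commas, or semicolons counts as a single delimiter.

```python
import re

def parse_gene_terms(text):
    """Split a raw search string into terms (hypothetical helper).

    Assumption: any run of whitespace, commas, or semicolons is one delimiter.
    """
    terms = re.split(r"[\s,;]+", text.strip())
    # Drop empty strings left behind by leading or trailing separators.
    return [t for t in terms if t]

print(parse_gene_terms("TP53, EGFR;PTEN\nIDH1"))
# → ['TP53', 'EGFR', 'PTEN', 'IDH1']
```

The same function works for a list read from an uploaded text file, since line breaks are treated as whitespace.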

Displaying a summary of experimental data associated with selected genes
Once the gene search term list has been entered and the desired expression data options have been selected, pressing the "Search" button displays a summary of the gene and protein expression data associated with the genes of interest. The following screen shows the results obtained from filling in the search form as shown in Genes above.

First, the table shows a summary of the gene expression data, followed by the protein data in a similar format. Each summary includes the gene ID, the official symbol, the chosen platform, the number of samples arrayed, the antibody (for protein results), and a quick overview of the characteristics of the expression distribution. For example, for gene EGFR in TCGA-GBM/TCGA-LGG, the mean of the distribution is 127.87/54.97 and the standard deviation is 181.65/123. To give an idea of how close to normal the distribution is, the table also displays the means of the two tails (the areas outside the range [-0.675 sd, 0.675 sd], which corresponds to the interquartile range of a normal distribution) and their sizes (the numbers of samples in the Down and Up ranges).
Second, the table contains links to external gene information from the NCBI Gene database, which can be useful for obtaining more information about the gene, as well as disambiguating in cases when you search by gene synonym and you do not recognize the search term in the displayed results.
Third, the table displays the visualization options available to you at this point. For each gene, the option Performing single-gene survival analysis is accessed by pressing the image under the header "Plots" for the gene of interest. When the results indicate availability of data for two or more genes (as is the case in the example above), visualization options also include Displaying heatmap clustering of gene expression data for selected genes, which is accessed by pressing the heatmap image for the platform of choice under the table.
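
The distribution overview in the summary table (mean, standard deviation, and the Down and Up tails) can be reproduced roughly as follows. This is a sketch under stated assumptions, not the portal's code: the function name `summarize_expression` is hypothetical, the tail cutoffs are assumed to be centered on the sample mean, and the sample standard deviation is used.

```python
from statistics import mean, stdev

def summarize_expression(values, k=0.675):
    """Summarize one gene's expression values (hypothetical helper).

    Assumptions: cutoffs are mean - k*sd and mean + k*sd, and sd is the
    sample standard deviation. Needs at least two values.
    """
    m = mean(values)
    sd = stdev(values)
    low, high = m - k * sd, m + k * sd
    down = [v for v in values if v < low]   # samples in the lower tail
    up = [v for v in values if v > high]    # samples in the upper tail
    return {
        "mean": m,
        "sd": sd,
        "down_n": len(down),
        "down_mean": mean(down) if down else None,
        "up_n": len(up),
        "up_mean": mean(up) if up else None,
    }

# Illustrative values only, not portal data.
summary = summarize_expression([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary["down_n"], summary["up_n"])
```

For a roughly normal distribution, each tail should hold about a quarter of the samples, so strongly unequal Down and Up counts suggest a skewed distribution.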
Performing single-gene survival analysis
The table for Displaying a summary of experimental data associated with selected genes contains a column called "Plots", which lists for each gene links for visualization of data associated with the gene and platform in question. Pressing the link associated with "EGFR" brings up a new window whose lower half displays survival analysis graphs:

The row of graphs contains 3 plots: the first is a Kaplan-Meier survival curve comparison for samples classified according to their cancer subtype. Note that this graph is not gene-specific. The next two plots show survival analysis for each cancer subtype group (TCGA-GBM and TCGA-LGG). In each case the samples are further stratified according to gene expression levels. The default options display analysis for stratification of samples into two groups: those with (EGFR) expression levels smaller than the median over the subgroup, and those with higher than median expression levels. A p-value for the significance of difference between the two resulting curves is also displayed.
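
As a rough illustration of the default analysis described above, the sketch below splits samples at the median expression and computes a Kaplan-Meier curve for one group. It is a minimal sketch, not the portal's implementation: the function names and the toy data are hypothetical, samples exactly at the median are left out of both groups, and the p-value calculation is omitted because the test used by the portal is not stated here.

```python
from statistics import median

def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with a death.

    events[i] is 1 if a death was observed at times[i], 0 if censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Gather every sample that shares this time point.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def split_by_median(expression):
    """Return (low, high) sample indices relative to the median."""
    cut = median(expression)
    low = [i for i, x in enumerate(expression) if x < cut]
    high = [i for i, x in enumerate(expression) if x > cut]
    return low, high

# Toy data for illustration only (not portal data).
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
times = [5, 8, 12, 3, 4, 9]
events = [1, 0, 1, 1, 1, 0]
low, high = split_by_median(expression)
print(kaplan_meier([times[i] for i in high], [events[i] for i in high]))
```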
The survival analysis can be customized using the options shown in the form. Instead of splitting the samples at the median expression level, you may choose to compare samples with extreme expression of the gene of interest (for example, the lowest quartile versus the highest quartile). To rerun the analysis with new parameters, make the desired choices and press the "Remodel" button.
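
The quartile comparison mentioned above can be sketched as follows. The function name `stratify_by_quartiles` is hypothetical, and the quartile definition (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption; the portal may compute its cut points differently.

```python
from statistics import quantiles

def stratify_by_quartiles(expression):
    """Return (low, high) sample indices for the extreme quartiles.

    Assumption: cut points come from statistics.quantiles with n=4.
    Samples between the first and third quartile are excluded.
    """
    q1, _, q3 = quantiles(expression, n=4)
    low = [i for i, x in enumerate(expression) if x <= q1]
    high = [i for i, x in enumerate(expression) if x >= q3]
    return low, high

# Illustrative values only, not portal data.
low, high = stratify_by_quartiles([5, 1, 8, 2, 7, 3, 6, 4])
print(low, high)
```

Comparing only the extreme groups discards the middle half of the samples, so each survival curve rests on fewer patients than in the median split.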
Displaying heatmap clustering of gene expression data for selected genes
For gene searches that result in multiple hits, an option is available to display a heatmap that clusters the gene expression profiles according to their similarity. It appears as a heatmap icon following the table in Displaying a summary of experimental data associated with selected genes. Clicking the icon in our example displays the following screen.

The samples (columns of the heatmap) are annotated in two ways: first, according to survival status, and second, according to cancer subtype.
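
To show what clustering genes by similarity can mean in practice, here is a minimal sketch that orders genes by average-linkage agglomerative clustering on the distance 1 - r, where r is the Pearson correlation. The portal's actual distance metric and linkage are not stated, so both are assumptions, as are the function names and the toy profiles.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_order(profiles):
    """Return gene names in the order produced by successive merges.

    profiles maps a gene name to its expression values across samples.
    Assumed method: average linkage with distance 1 - Pearson r.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all cross-cluster gene pairs.
                d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

# Toy profiles for illustration only (not portal data).
toy = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [2, 4, 6, 9]}
print(cluster_order(toy))
```

Genes with similar profiles end up next to each other in the returned order, which is what makes blocks of co-expressed genes visible in a heatmap.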

Frequently Asked Questions
- Why is my gene search producing no results?
  There are several possibilities for why a gene search may produce no results.
  - First, make sure that you have selected the correct identifier type (Symbol, ID). Recall that the default selection is "Symbol."
  - Second, your gene of interest may not be featured in one or both of the TCGA studies currently in the database.
- Who can I contact with technical questions or difficulties?
  For any technical questions or difficulties, please contact us.

Type	Data Source	Platform	Cancer	# Samples	Level
Expression-Genes	TCGA	Illumina HiSeq	GBM	173	3
Expression-Genes	TCGA	Illumina HiSeq	LGG	529	3
Expression-Proteins	TCPA	RPPA	GBM	205	4
Expression-Proteins	TCPA	RPPA	LGG	427	4
