Enrichment analysis
What is enrichment analysis?
Gene set enrichment analysis (GSEA) is a computational method used to evaluate the association of a list of differentially expressed genes with a collection of predefined gene sets, where each gene set has a specific biological meaning. If the differentially expressed genes are significantly enriched in a gene set, one concludes that the corresponding biological meaning (e.g. a biological process or a pathway) is significantly affected.
Enrichment analysis helps uncover biologically relevant patterns in large-scale omics data. It provides insights into the collective behaviour of functionally related genes and reveals potential biological processes associated with the observed changes in gene expression.
Enrichment analysis methods rely on a statistical test to determine whether a predefined gene set is significantly enriched. As input, they usually require either a list of significantly differentially expressed genes or a list of all measured genes with their fold changes, together with a collection of predefined gene sets. Gene sets can be obtained from various sources, such as pathway databases or curated collections of genes associated with specific functions. The methods and input data used in PanHunter are described below.
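As a minimal sketch of the count-based input, assuming hypothetical gene identifiers and treating all tested features as the background (universe), the following Python function tabulates how a list of differentially expressed genes overlaps one predefined gene set. The resulting 2x2 table is the usual input to an overrepresentation test such as Fisher’s test; the function name and data are illustrative only, not part of PanHunter.

```python
def contingency_table(de_genes, gene_set, universe):
    """Return the 2x2 table [[a, b], [c, d]] for one gene set.

    a: DE genes in the set        b: DE genes not in the set
    c: non-DE genes in the set    d: non-DE genes not in the set
    Only genes present in the universe (all tested features) are counted.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    in_set = set(gene_set) & universe
    a = len(de & in_set)
    b = len(de - in_set)
    c = len(in_set - de)
    d = len(universe - de - in_set)
    return [[a, b], [c, d]]


# Hypothetical example data
universe = [f"G{i}" for i in range(1, 21)]    # 20 tested features
de_genes = ["G1", "G2", "G3", "G4", "G10"]    # significantly regulated features
gene_set = ["G1", "G2", "G3", "G5", "G6"]     # one predefined gene set
print(contingency_table(de_genes, gene_set, universe))
```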
Enrichment analysis in PanHunter
Enrichment analyses in PanHunter are performed during the comparison calculation. After differential expression analysis is performed and the newly generated comparison is successfully saved in PanHunter, post-processing steps are performed automatically. The full list of post-processing steps can be found here. Gene Ontology (GO) enrichment and pathway enrichment are among these steps. Additionally, enrichment analysis can be performed in the Enrichment Visualization app.
Algorithms and methods used for enrichment in PanHunter are detailed below.
Comparison post-processing
Gene Ontology (GO) Enrichment
GO term enrichment in PanHunter uses the evoGO package, developed by Evotec International GmbH, in conjunction with the weight algorithm. With the weight option, the evoGO algorithm down-weights the genes attributed to a term in all of its ancestor terms if the term is more significant than the corresponding ancestor. This approach reduces redundancy in the resulting list of significantly enriched terms. Full documentation on the evoGO package is available here.
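The idea of down-weighting can be illustrated with a deliberately simplified toy example. This is not evoGO’s implementation: the weighting factor (ratio of the term’s p-value to the ancestor’s p-value), the function names and the GO terms are hypothetical, and the real weight algorithm additionally recomputes term significance with the updated weights while traversing the GO graph.

```python
def downweight_ancestors(term, ancestors, genes, pvals, weights):
    """Toy rule (hypothetical, not evoGO): if `term` is more significant than an
    ancestor, shrink the weight of the shared genes in that ancestor."""
    for anc in ancestors.get(term, []):
        if pvals[term] < pvals[anc]:
            factor = pvals[term] / pvals[anc]   # < 1 because term is more significant
            for g in genes[term] & genes[anc]:
                weights[anc][g] *= factor


def weighted_significant(term, genes, weights, significant):
    """Weighted number of significant genes annotated to `term`."""
    return sum(weights[term][g] for g in genes[term] & significant)


# Hypothetical GO terms: child -> parent -> root
genes = {
    "child": {"G1", "G2", "G3"},
    "parent": {f"G{i}" for i in range(1, 7)},
    "root": {f"G{i}" for i in range(1, 11)},
}
ancestors = {"child": ["parent", "root"]}
pvals = {"child": 0.001, "parent": 0.01, "root": 0.2}
significant = {"G1", "G2", "G3", "G4"}
# every gene starts with weight 1.0 in every term it is annotated to
weights = {t: {g: 1.0 for g in gs} for t, gs in genes.items()}

downweight_ancestors("child", ancestors, genes, pvals, weights)
for t in ("child", "parent", "root"):
    print(t, round(weighted_significant(t, genes, weights, significant), 3))
```

Because the child term is more significant than its ancestors, G1, G2 and G3 contribute only a fraction of a count to the parent and root terms, so the ancestors are less likely to be reported merely because they contain the child’s genes.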
Correction for multiple testing is performed with the Benjamini-Hochberg method (described here), and the resulting FDRs are reported.
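For reference, the Benjamini-Hochberg adjustment can be sketched in plain Python. It follows the standard step-up definition (the adjusted value for the i-th smallest p-value is the minimum over j >= i of p(j) * m / j, capped at 1); it is an illustration, not PanHunter’s code. R’s `p.adjust(method = "BH")` and statsmodels’ `multipletests(method="fdr_bh")` provide the same adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDRs) in the original order."""
    m = len(pvalues)
    # walk from the largest p-value to the smallest
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                       # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min          # enforces monotonicity across ranks
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```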
For the calculation, PanHunter uses the following input data:
- The latest version of the Gene Ontology from the GO Consortium website and GO term annotations from the Ensembl database.
- List of features generated by differential expression analysis and reported in the comparison.
The results of the post-processing analyses for a specific comparison can be found and explored in the TopTables app, as described here.
Pathway Enrichment
Pathway enrichment analysis uses Fisher’s test, the Wilcoxon test and the Kolmogorov-Smirnov test. Both raw p-values and p-values adjusted for multiple testing (FDRs) are reported; the adjustment uses the Benjamini-Hochberg method.
For the calculation, PanHunter uses the following input data:
- The latest version of human pathways from WikiPathways, an open platform and database for creating, curating, and sharing biological pathways.
- List of features generated by differential expression analysis and reported in the comparison.
The results of the post-processing analyses for a specific comparison can be found and explored in the TopTables app.
What is the difference between Fisher’s test, the Wilcoxon test and the Kolmogorov-Smirnov test?
Fisher’s exact test is used for dichotomous variables (for example, significantly regulated or not, and member of a pathway or not) and, because it computes exact p-values, is also suitable for small sample sizes.
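A minimal sketch of the count-based test, assuming a one-sided (over-representation) alternative, which the text above does not specify: the p-value is the hypergeometric probability of observing at least as many significant features in the gene set as were actually seen. The function name and example table are illustrative; `scipy.stats.fisher_exact(table, alternative="greater")` computes the same quantity.

```python
from math import comb


def fisher_greater(table):
    """One-sided Fisher's exact test (over-representation).

    table = [[a, b], [c, d]] with a = significant features in the gene set,
    b = significant features outside it, c = non-significant features in the
    set, d = the remaining features. Returns P(X >= a) under the
    hypergeometric null distribution.
    """
    (a, b), (c, d) = table
    n_total = a + b + c + d
    n_in_set = a + c          # features belonging to the gene set
    n_sig = a + b             # significantly regulated features
    upper = min(n_in_set, n_sig)
    numerator = sum(comb(n_in_set, k) * comb(n_total - n_in_set, n_sig - k)
                    for k in range(a, upper + 1))
    return numerator / comb(n_total, n_sig)


p = fisher_greater([[3, 2], [2, 13]])
print(f"{p:.4f}")
```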
The Wilcoxon test and the Kolmogorov-Smirnov (KS) test are used for continuous or ordinal data. The main difference between them is that the Wilcoxon test is mainly sensitive to differences in the location of the distributions (such as the median), while the KS test compares the entire distributions and therefore also detects differences in their shape (such as variance or skewness).
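The difference in sensitivity can be illustrated with a small pure-Python sketch, assuming the two-sample Wilcoxon rank-sum (Mann-Whitney U) variant with a normal-approximation p-value (no tie or continuity correction) and the two-sample KS statistic D. The values are made up: in the first case one sample is shifted, in the second both samples share the same median but differ in spread. Library implementations include `scipy.stats.mannwhitneyu` and `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Mann-Whitney U for x vs y and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, erfc(abs(z) / sqrt(2))


def ks_statistic(x, y):
    """Two-sample KS statistic D = max |F_x(t) - F_y(t)| over observed t."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    # integer arithmetic avoids rounding in the ECDF differences
    diff = max(abs(bisect_right(xs, t) * n2 - bisect_right(ys, t) * n1)
               for t in xs + ys)
    return diff / (n1 * n2)


shifted = ([1.2, 0.8, 1.5, 2.0], [0.1, -0.3, 0.4, 0.0, -0.5, 0.2])
spread = ([-2, -1, 0, 1, 2], [-0.2, -0.1, 0, 0.1, 0.2])
for name, (x, y) in {"shift": shifted, "spread": spread}.items():
    u, p = wilcoxon_rank_sum(x, y)
    print(name, "U =", u, "p =", round(p, 3), "D =", ks_statistic(x, y))
```

For the shifted samples, U reaches its maximum (24, i.e. 4 * 6) and D = 1.0. For the samples that differ only in spread, U equals its null expectation of 12.5 (p = 1), while D = 0.4 still registers the difference in shape.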
Furthermore, Fisher’s test assesses the overrepresentation of a specific pathway based on the number of significantly regulated features, whereas the Wilcoxon and KS tests perform the assessment on the log fold change (logFC) values themselves.
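A minimal sketch of how one comparison can feed both kinds of test, assuming a hypothetical FDR cutoff of 0.05 for calling a feature significantly regulated (the cutoff used by PanHunter is not stated here) and hypothetical feature IDs. The count-based input is a 2x2 table, while the logFC-based input is two vectors of log fold changes, inside and outside the pathway; pathway members that were not measured are ignored.

```python
# Hypothetical comparison results: feature -> (logFC, FDR)
comparison = {
    "G1": (2.1, 0.001), "G2": (1.4, 0.01), "G3": (-0.2, 0.60),
    "G4": (0.9, 0.03), "G5": (0.1, 0.90), "G6": (-1.8, 0.002),
    "G7": (0.3, 0.40), "G8": (-0.1, 0.75),
}
pathway = {"G1", "G2", "G3", "G9"}   # G9 was not measured in the comparison


def split_inputs(comparison, pathway, fdr_cutoff=0.05):
    """Derive the inputs of the count-based and logFC-based tests."""
    measured = pathway & set(comparison)
    significant = {g for g, (_, fdr) in comparison.items() if fdr < fdr_cutoff}
    # Fisher's test: counts of significant / non-significant features
    a = len(significant & measured)
    b = len(significant - measured)
    c = len(measured - significant)
    d = len(comparison) - a - b - c
    # Wilcoxon / KS tests: logFC values inside vs outside the pathway
    in_fc = [comparison[g][0] for g in sorted(measured)]
    out_fc = [fc for g, (fc, _) in sorted(comparison.items()) if g not in measured]
    return [[a, b], [c, d]], in_fc, out_fc


print(split_inputs(comparison, pathway))
```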
Enrichment Visualization App
Although enrichment analysis is performed together with the other post-processing steps during the comparison calculation, the dedicated Enrichment Visualization App provides users with additional options to customize and visualize gene set enrichment analysis outputs. To learn more about using the app, see the detailed documentation on the Enrichment Visualization App available here.
Enrichment with the specified gene set collections is performed with Fisher’s test, which reports p-values. The exception is GO term enrichment analysis, which uses the evoGO package in combination with the weight method, as described here.
In addition, the app applies the Bonferroni correction, which reports the family-wise error rate (FWER), and the Benjamini-Hochberg and Benjamini-Yekutieli methods, which report false discovery rates (FDRs).
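The two additional corrections can be sketched using their standard definitions: Bonferroni multiplies each p-value by the number of tests m (capped at 1), and Benjamini-Yekutieli is the Benjamini-Hochberg step-up procedure scaled by c(m) = 1 + 1/2 + ... + 1/m, which keeps the FDR controlled under arbitrary dependence between tests. Function names and example p-values are illustrative, not taken from the app.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values (FWER control)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_yekutieli(pvalues):
    """BY-adjusted p-values: BH step-up scaled by the harmonic number c(m)."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                     # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(pvals))
print(benjamini_yekutieli(pvals))
```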
The output of the enrichment analysis and different visualization options available within this app are described in the app documentation.