Differential gene expression (DEG) analysis
Purpose of DEG analysis
DEG analysis aims to identify features (e.g. genes, proteins) with altered expression levels across different experimental conditions or sample groups, providing valuable insights into biological processes affected by specific contexts (e.g. disease, treatment, or developmental stage).
DEG methods in PanHunter
PanHunter offers two DEG methods: DESeq2 and limma. Both are widely adopted, well-validated tools for studying feature expression patterns and regulatory mechanisms.
DESeq2
How does DESeq2 work?
DESeq2 operates through the following steps:
- Data Preparation: DESeq2 takes raw (unnormalized) count data and models them using a negative binomial distribution. Differences in sequencing depth and library composition between samples are accounted for by size factors estimated with the median-of-ratios method.
- Variance Stabilization: DESeq2 stabilizes the dispersion estimate of each feature by sharing information between features with similar expression levels. Using an empirical Bayes approach, the feature-wise estimates are shrunk toward a trend fitted across all features. The strength of the shrinkage depends on how closely the individual estimates follow that trend and on the sample size.
- Statistical Models: DESeq2 employs generalized linear models to account for multiple factors (e.g. batch effects) influencing expression profiles.
- Differential Expression Testing: DESeq2 compares feature expression levels between conditions (e.g. treatment vs. control) and identifies features that show significant differences. Wald tests are used to test for significance.
- Results: DESeq2 provides lists of differentially expressed features along with statistical significance (adjusted p-values) for robust downstream interpretation.
For more information, see the detailed documentation - DESeq2: Differential gene expression analysis based on the negative binomial distribution.
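The median-of-ratios normalization mentioned in the data preparation step can be illustrated with a minimal sketch. It is written in plain Python for readability and is not DESeq2's actual code (DESeq2 is an R package); the function name `size_factors` and the toy counts are assumptions made for this example.

```python
import math
import statistics


def size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios idea.

    counts: list of features, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Use only features with non-zero counts in every sample; the geometric
    # mean is taken on the log scale and is undefined for zeros.
    log_rows = [[math.log(c) for c in row] for row in counts if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no feature has non-zero counts in all samples")
    # Log of the geometric mean per feature: the pseudo-reference sample.
    log_ref = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Median log-ratio of sample j to the pseudo-reference.
        log_ratios = [row[j] - ref for row, ref in zip(log_rows, log_ref)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


# Toy data (made up): sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(counts)
# Dividing each count by its sample's size factor makes samples comparable.
normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
```

In this toy data the size factor of the second sample is twice that of the first, so the normalized counts of the first three features agree between samples. Taking the median keeps a few strongly changed features from dominating the estimate.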
limma
How does limma work?
limma (Linear Models for Microarray Data) is a versatile tool originally designed for analyzing gene expression data from microarrays. It is now also widely used for other high-throughput data, such as RNA-Seq and proteomics. Here’s how it operates:
- Normalization: Limma takes log-transformed, normalized expression data as input and assumes that the values are approximately normally distributed. Raw counts can also be analyzed with limma after the voom transformation.
- Variance Stabilization: Limma borrows information across all features to moderate the residual variances with an empirical Bayes approach. This means that the variance estimate of each single feature is shrunk toward a common value estimated from all features.
- Statistical Models: It employs feature-wise linear models to account for multiple factors. This allows differential analysis across multiple contrasts.
- Differential Expression Testing: Limma compares feature expression levels between conditions (e.g. treatment vs. control) and identifies features that show significant differences. Moderated t-statistics are used to assess significance, and the resulting p-values are adjusted for multiple testing.
- Results: Limma provides lists of differentially expressed features along with statistical significance for robust interpretation. Estimated standard errors can also be extracted from the results of limma analyses.
For more information, see the detailed documentation - limma.
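The variance moderation described above can be sketched as a weighted average of a feature's own residual variance and a prior variance, each weighted by its degrees of freedom. The sketch below is plain Python rather than limma's R code. The prior values are hypothetical placeholders: limma estimates them from the variances of all features, and that estimation step is not shown here.

```python
def moderated_variance(s2, df_residual, s2_prior, df_prior):
    """Shrink a feature-wise residual variance toward a prior variance.

    The result is a weighted average of the two variances, weighted by
    their degrees of freedom.
    """
    return (df_prior * s2_prior + df_residual * s2) / (df_prior + df_residual)


# Hypothetical prior; limma estimates these two values from all features.
S2_PRIOR, DF_PRIOR = 0.25, 4.0

# Made-up residual variances for three features, each with 4 residual df.
raw = {"geneA": 0.04, "geneB": 0.25, "geneC": 1.00}
post = {g: moderated_variance(s2, 4.0, S2_PRIOR, DF_PRIOR) for g, s2 in raw.items()}
```

With these numbers the very small variance of `geneA` is pulled up and the large variance of `geneC` is pulled down, while `geneB`, which already equals the prior, is unchanged. This is what makes the moderated t-statistics less sensitive to features whose variance is underestimated by chance in small experiments.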
How does PanHunter calculate standard error (SE)?
Because limma does not report the standard error (SE) directly, PanHunter computes a matrix containing the standard error for each coefficient and feature, as suggested by Gordon Smyth. For more information, see this discussion - Standard error and effect size from Limma.
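The formula commonly cited from that discussion multiplies the square root of the moderated (posterior) variance of each feature by the unscaled standard deviation of each coefficient; in limma's fitted model object these are stored as `s2.post` and `stdev.unscaled`. The following plain-Python sketch illustrates the arithmetic only. That PanHunter applies exactly this formula is an assumption, and the function name and all numbers are invented for the example.

```python
import math


def standard_errors(s2_post, stdev_unscaled):
    """Return an SE matrix (features x coefficients).

    s2_post: posterior (moderated) variance per feature.
    stdev_unscaled: per feature, the unscaled standard deviation of each
        coefficient (it depends on the design matrix only).
    """
    return [[math.sqrt(s2) * u for u in row] for s2, row in zip(s2_post, stdev_unscaled)]


# Made-up example: 2 features, coefficients (intercept, group effect),
# 3 control vs. 3 treated samples.
s2_post = [0.145, 0.625]
unscaled = [math.sqrt(1 / 3), math.sqrt(1 / 3 + 1 / 3)]
stdev_unscaled = [unscaled, unscaled]
coefficients = [[5.0, 1.2], [7.5, -0.4]]

se = standard_errors(s2_post, stdev_unscaled)
# Moderated t-statistic: coefficient divided by its standard error.
t_stats = [[b / s for b, s in zip(b_row, s_row)] for b_row, s_row in zip(coefficients, se)]
```

Dividing a coefficient by its SE recovers the moderated t-statistic, which is a convenient consistency check when comparing the SE matrix against limma's reported statistics.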
How to choose between DESeq2 and limma for DEG analysis?
DESeq2 is robust for small sample sizes and is designed for count-based data only (e.g. RNA-Seq data). Limma works with continuous values and is therefore also applicable to proteomics data. Limma requires fewer computational resources and may be the better option for large transcriptomics datasets.
In PanHunter, DESeq2 is selected by default for transcriptomics data, and limma for proteomics and microarray data. The DEG analysis method can be customized in the New Comparison app according to scientific needs.