General Plot interpretation

This section explains how to interpret the plots belonging to the apps provided by the PanHunter.

For more detailed information on navigating through these plots , please go to Plot Navigation Panel documentation page

Sample QC App

Alignment Stats tab

The Alignment Stats tab summarizes key statistics from the read alignment process, helping evaluate sequencing success and mapping quality.

Interpretation

The alignment metrics help determine:

sequencing success
mapping accuracy
suitability of samples for downstream analysis.

Alignment Stats plot

For instance, the above example depicts the following metrics:

Number of Input Reads: This plot shows the distribution of total sequencing reads generated for each sample.

Higher read counts generally provide better transcriptome coverage.
The density curve highlights the most common read count values.
Samples with much lower read counts than the majority may indicate sequencing or library preparation issues.

Uniquely Mapped Reads Percent: This metric shows the percentage of reads that align uniquely to one genomic location.

High percentages are desirable, indicating reliable mapping.
Low values may indicate:
- contamination
- poor read quality
- incomplete reference genome
- repetitive sequences.

Reads Mapped to Multiple Loci Percent: This plot shows the percentage of reads mapping to multiple genomic locations. This can occur when reads originate from:

repetitive regions
paralogous genes
homologous sequences.
Lower percentages are generally better.
Very high multi-mapping rates may reduce confidence in gene quantification.

Read Distribution tab

The Read Distribution tab shows how sequencing reads are distributed across different genomic features.

Genomic Feature Categories

The X-axis displays genomic regions such as:
- Upstream regions (Up.10kb, Up.5kb, Up.1kb)
- 5′ UTR
- CDS (coding sequences)
- Introns
- 3′ UTR
- Downstream regions
The Y-axis shows read counts normalized per kilobase of genomic feature.

Read Distribution plot

Plot Type: Users can visualize distributions using:

Violin plot: shows full distribution shape
Box plot: shows median and quartiles
Points: displays individual sample values.

Coloring Options: This option groups samples according to metadata fields (e.g., Plate, CRank, Batch, etc). Important note: Only categories with 10 or fewer unique values in the current dataset subset can be used for coloring. If no such category exists, coloring will not be available.

Total Counts Option: By default, the plot shows counts normalized per kilobase of genomic features. When the Show total percentages option is selected, the plot instead displays the distribution of total counts (percentage) across genomic features.

Interpretation

This plot helps determine:

whether reads are mostly located in coding regions (expected in RNA-seq),
whether unusual read distributions occur in introns or intergenic regions, which may indicate contamination or library issues.

Gene Body Coverage tab

The Gene Body Coverage plot evaluates how evenly reads cover the length of genes.

Axes

X-axis: Gene Length (%). The gene position is represented from:
- 0% = 5′ end
- 100% = 3′ end
Y-axis: Shows normalized read coverage across gene positions.

Gene Body Coverage plot

Interpretation

A relatively flat curve indicates uniform sequencing coverage across the gene.
3′ bias: increased coverage toward the 3′ end (often due to poly-A capture methods)
5′ bias: indicates higher coverage near the 5′ end
Uneven curves between samples: shows potential library preparation differences.

Biotype tab

The Biotype plot shows the percentage of reads assigned to different gene biotypes, such as:

protein-coding genes
long non-coding RNA (lncRNA)
pseudogenes

Note: Plot type and coloring options behave the same as in the Read Distribution tab.

Biotype plot

Interpretation

This visualization helps confirm that reads map primarily to expected gene classes. Typical RNA-seq datasets show:

the majority of reads in protein-coding genes
smaller proportions in lncRNA or pseudogenes

Unexpected distributions may suggest annotation issues or contamination.

Mitochondrial tab

The Mitochondrial tab evaluates the proportion of reads mapping to mitochondrial sequences.

Three boxplots are displayed:

Mitochondrial plot

Non-Mitochondrial Reads: This plot depicts the percentage of reads mapping to nuclear genes. High values indicate good RNA-seq data quality.

Mitochondrial Reads: This plot shows the percentage of reads mapping to mitochondrial genes. High mitochondrial content may indicate:

cell stress
degraded RNA
low-quality samples

Spike-In Transcripts: This plot displays reads mapping to spike-in control RNAs.

Spike-ins are artificial RNA molecules added during library preparation and used as technical controls.
Consistent spike-in levels across samples indicate stable sequencing performance.

plateRNA-Seq tab

The plateRNA-Seq tab provides several visualizations for plate-based RNA-sequencing experiments.

Samples / Conditions

Samples and Conditions plot

Plot Settings: Users can configure the following options:

Boxplot X-Axis: Defines the grouping variable (e.g., treatment).
Boxplot Y-Axis: Helps selecting metrics such as Unmapped reads, Multimapped reads, Uniquely mapped reads, etc to display:
Boxplot Coloring: Colors samples according to metadata categories such as treatment, sequencing provider, plate position, etc

Interpretation

These boxplots help identify:

variability between treatments
batch effects
outlier samples with abnormal read counts.

Library Size and UMI Dedup

This scatter plot shows expression levels of a selected gene across samples.

Library Size and UMI Dedup plot

Interpretation

This visualization helps identify:

expression variability across samples
differences between experimental conditions
potential outliers.

Clusters of points may indicate groups of samples with similar gene expression patterns.

Plate / Library

This heatmap displays sequencing metrics arranged according to the physical plate layout.

Plate and Library plot

Plot Elements:

Rows represent plate rows.
Columns represent plate columns.
Cell color intensity indicates the value of a selected metric (e.g., mapped reads).

Interpretation

This visualization helps identify spatial artifacts, such as:

edge effects
plate position bias
systematic technical variation across wells.

For more detailed information regarding the app, please go to SampleQC app documentation page

Transcriptomics QC App

Alignments tab

The Alignment Stats tab summarizes key statistics from the read alignment process, helping evaluate sequencing success and mapping quality.

Interpretation

The alignment metrics help determine:

sequencing success
mapping accuracy
suitability of samples for downstream analysis.

Alignment plots

For instance, the above example depicts the following metrics:

Number of Input Reads: This plot shows the distribution of total sequencing reads generated for each sample.

Higher read counts generally provide better transcriptome coverage.
The density curve highlights the most common read count values.
Samples with much lower read counts than the majority may indicate sequencing or library preparation issues.

Uniquely Mapped Reads Percent: This metric shows the percentage of reads that align uniquely to one genomic location.

High percentages are desirable, indicating reliable mapping.
Low values may indicate:
- contamination
- poor read quality
- incomplete reference genome
- repetitive sequences.

Reads Mapped to Multiple Loci Percent: This plot shows the percentage of reads mapping to multiple genomic locations. This can occur when reads originate from:

repetitive regions
paralogous genes
homologous sequences.
Lower percentages are generally better.
Very high multi-mapping rates may reduce confidence in gene quantification.

Read Distribution tab

The Read Distribution tab shows how sequencing reads are distributed across different genomic features.

Genomic Feature Categories

The X-axis displays genomic regions such as:
- Upstream regions (Up.10kb, Up.5kb, Up.1kb)
- 5′ UTR
- CDS (coding sequences)
- Introns
- 3′ UTR
- Downstream regions
The Y-axis shows read counts normalized per kilobase of genomic feature.

Absolute Value: Displays absolute counts.

Read distribution plot absolute value

Relative Value: Displays proportional values (e.g., % of reads per category).

Read distribution plot relative value

Interpretation

This plot helps determine:

whether reads are mostly located in coding regions (expected in RNA-seq),
whether unusual read distributions occur in introns or intergenic regions, which may indicate contamination or library issues.

Gene Body Coverage tab

The Gene Body Coverage plot evaluates how evenly reads cover the length of genes.

Axes

X-axis: Gene Length (%). The gene position is represented from:
- 0% = 5′ end
- 100% = 3′ end
Y-axis: Shows normalized read coverage across gene positions.

Gene body coverage plot

Interpretation

A relatively flat curve indicates uniform sequencing coverage across the gene.
3′ bias: increased coverage toward the 3′ end (often due to poly-A capture methods)
5′ bias: indicates higher coverage near the 5′ end
Uneven curves between samples: shows potential library preparation differences.

Biotype tab

The Biotype plot shows the percentage of reads assigned to different gene biotypes, such as:

protein-coding genes
long non-coding RNA (lncRNA)
pseudogenes

Note: Plot type and coloring options behave the same as in the Read Distribution tab.

Biotype plot

Interpretation

This visualization helps confirm that reads map primarily to expected gene classes. Typical RNA-seq datasets show:

the majority of reads in protein-coding genes
smaller proportions in lncRNA or pseudogenes

Unexpected distributions may suggest annotation issues or contamination.

Mitochondrial tab

The Mitochondrial tab evaluates the proportion of reads mapping to mitochondrial sequences.

Three boxplots are displayed:

Mitochondrial plot

Non-Mitochondrial Reads: This plot depicts the percentage of reads mapping to nuclear genes. High values indicate good RNA-seq data quality.

Mitochondrial Reads: This plot shows the percentage of reads mapping to mitochondrial genes. High mitochondrial content may indicate:

cell stress
degraded RNA
low-quality samples

Spike-In Transcripts: This plot displays reads mapping to spike-in control RNAs.

Spike-ins are artificial RNA molecules added during library preparation and used as technical controls.
Consistent spike-in levels across samples indicate stable sequencing performance.

Plates tab

This tab provides a heatmap displays sequencing metrics arranged according to the physical plate layout.

Plates plot example

Plot Elements:

Rows represent plate rows.
Columns represent plate columns.
Cell color intensity indicates the value of a selected metric (e.g., mapped reads).

Interpretation

This visualization helps identify spatial artifacts, such as:

edge effects
plate position bias
systematic technical variation across wells.

For more detailed information regarding the app, please go to Transcriptomics QC app documentation page

Proteomics QC App

Contaminants tab

The Contaminants tab helps identify unwanted proteins (e.g., keratin, albumin, or experimental carryover)

Transformation Options: The Transformation dropdown determines how protein intensities are displayed:

Log-transformed label-free quantification values (LFQ) Intensity: Default for most downstream comparisons because it corrects for technical variability.
Summed Intensity: Sum of peptide intensities per protein. It is useful for raw abundance comparisons
Summed Intensity Scaled: Scaled version of summed intensity. It is useful for cross-sample normalization
Summed Intensity Normalized: Normalized across samples. It removes systematic biases
Intensity-based absolute quantification (iBAQ): It approximates protein abundance
LFQ Intensity Normalized: LFQ values normalized across samples

Contaminants plot

Contaminant Abundance Across Samples

This plot shows the abundance of known contaminant proteins across samples.

Interpretation

Each bar represents a sample
The y-axis shows contaminant abundance
Samples highlighted as outliers indicate unusually high contamination.
Good data: Contaminant levels are low and consistent across samples
Potential issue: A few samples show significantly higher contaminant intensity. This may indicate:
- sample preparation contamination
- handling errors
- carryover between injections

Well-Specific Quantity Rank

This visualization maps contaminant abundance to the physical plate layout to detect spatial contamination patterns.

Well-Specific Quantity Rank plot

Interpretation

Plate layout with rows (A–H) and columns
Each circle represents a well/sample.
Hovering over the circle dispays the Sample ID information.
Color scale indicates contaminant abundance rank
Dark red → lower rank
Blue → higher rank
Uniform color distribution → no spatial contamination
Clusters of similar colors → localized contamination
Edge or row patterns → pipetting or plate handling effects
Single high-intensity wells → sample-specific contamination
If spatial patterns are detected:
- Review pipetting procedures
- Check plate handling
- Inspect reagent distribution

Precursors Identified by Injection Order

This scatter plot helps detect instrument drift during LC-MS runs.

Axes and color:

X-axis: protein intensity
Y-axis: number of detected precursors
Point color: injection order

Precursors Identified by Injection Order plot

Interpretation

Healthy experiment:
- Points appear randomly scattered.
- Hovering over the points dispays the Sample ID information.
Possible instrument drift: Gradual trend from low to high intensity across injection order. This may indicate:
- column degradation
- instrument sensitivity changes
- spray instability

Quantity Ranking of Contaminants

This plot ranks samples by contaminant abundance. Hovering over the points in the graph displays:

Sample ID
UniProt ID
Contaminant
Rank

Quantity Ranking of Contaminants plot

Interpretation

Consistent ranks across samples = stable contamination levels
Samples with much higher rank = contamination problem

Quantification tab

This section summarizes protein identification coverage across samples.

Feature Counts

Bar Plot: This plot displays how many protein groups were quantified in each sample. It can be observed as both bar and box plot.

Axes

X-axis: samples
Y-axis: number of proteins detected

Feature Counts Bar plot

Interpretation

Good dataset: Similar counts across all samples
Possible issues: Some samples detect much fewer proteins. This may indicate:
- poor sample loading
- LC-MS failure
- digestion issues

Box Plot: This plot helps assess the distribution of quantified features across samples and conditions, and evaluate consistency in detection depth.

Feature Counts Box plot

Axes

X-axis: Groups (e.g., DMSO vs FT671 or other conditions)
Y-axis: Number of protein groups (features)
Each box represents the distribution of feature counts across samples within a group

Interpretation

Each box includes:

Median Feature Counts (center line): Depicts the typical number of features detected.
- Similar medians across groups → comparable data completeness
- Higher median in one group → deeper proteome coverage
Box (IQR): Represents middle 50% of samples
- Narrow boxes (small IQR) → consistent feature detection across samples
- Wide boxes → variability in sample quality or acquisition
- Overlapping boxes → similar performance between conditions
- Shifted boxes → systematic differences (biological or technical)
Whiskers: Shows range of values (excluding outliers)
- Short whiskers → stable dataset
- Long whiskers → some samples differ significantly
Outliers (if present): Denotes samples with unusually high/low feature counts. Samples far from the box:
- May indicate technical issues, missing values, or sample preparation differences
- Should be investigated for quality control

Note: When to Use Box Plot vs Bar Plot

Bar Plot: Comparing absolute counts per sample
Box Plot: Understanding distribution, variability, and consistency

Shared Features

This plot shows how many proteins are detected across samples.

Shared Features plot

Interpretation

*High overlap indicates:

strong reproducibility
consistent protein identification
Low overlap may suggest:
- inconsistent sample preparation
- unstable mass spectrometry runs

Feature Presence Across Samples

This line plot shows how many proteins are detected in a certain percentage of samples.

Feature Presence Across Samples plot

Interpretation

Good dataset: many proteins detected in most samples
Poor reproducibility: many proteins detected only sporadically

Intensities Tab

The Intensities tab provides tools to assess signal distribution, variability, and reproducibility of protein intensities across samples. These plots are essential for evaluating data quality, detecting technical bias, and ensuring consistency across conditions.

Intensity Distribution

This plot helps in evaluating the distribution of protein intensities across samples and check for consistency between conditions.

Plot Options

Plot type:
- Line plot (density or count-based curves)
- Box plot (summary statistics per sample)
Y-axis scale:
- Density → normalized distribution
- Count → raw frequency of intensities

Line Plot

Interpretation

X-axis: Intensity values (e.g., LFQ intensity)
Y-axis: Density or count
Each line represents a sample or condition
Overlapping curves across samples → good normalization and consistency
Shifted curves (left/right) → systematic differences in intensity (batch effect or normalization issue)
Wider curves → higher variability within a sample
Multiple peaks → possible subpopulations or mixed signal distributions

Density vs Count

Intensity Distribution Line Density plot

Density: best for comparing shapes across samples

Intensity Distribution Line Count plot

Count: useful for understanding absolute data distribution

Box Plot

Intensity Distribution Box plot

Interpretation

Each box represents a sample or group
It displays:
- Median (center line)
- Interquartile range (box)
- Spread/outliers (whiskers and points)
Similar medians across samples → well-normalized data
Large differences in medians → systematic bias
Wide boxes → high variability
Outliers → extreme intensity values (may require filtering)

Coefficient of Variation

To measure reproducibility and variability of protein intensities across samples.

CV = standard deviation / mean
Lower CV → more consistent measurements

Plot Options

Plot type:
- Histogram / density plot
- Box plot
Axis scale:
- Linear → standard view
- Log → better visualization of wide CV ranges

Histogram Plot

Interpretation

X-axis: Coefficient of variation
Y-axis: Density or frequency
Separate curves for each condition
Peak near low CV (e.g., < 0.2) → high reproducibility
Long right tail → presence of highly variable proteins
Shift toward higher CV values → increased technical or biological variability
Tighter distributions → more reliable dataset

Linear vs Log Scale

Coefficient of Variation Histogram Linear plot

Linear scale: clear view of low CV region

Coefficient of Variation Histogram Log plot

Log scale: better visibility of high variability proteins

Box Plot

Coefficient of Variation Box plot

Interpretation

Distribution of CV values per sample/group
Lower median CV → better reproducibility
Narrow boxes → consistent variability
Wide spread → unstable measurements
Outliers → highly variable proteins

Feature Count with CV < Cutoff

This plot quantifies how many proteins are reliably measured (low variability) in each sample or condition.

Feature Count with CV

Interpretation

Bars represent samples or conditions
Bars are divided into:
- CV < cutoff (e.g., 0.2) → reliable proteins
- CV ≥ cutoff → variable proteins
Higher proportion of CV < cutoff → better data quality
Lower counts of reliable proteins → potential issues with measurement consistency
Differences between conditions → condition-specific variability

Sample Similarity Tab

The Sample Similarity tab is designed to evaluate how similar or different your samples are based on their protein intensity profiles. It provides key tools for assessing data structure, clustering, batch effects, and overall data quality.

Principal Component Analysis (PCA)

This plot helps to reduce high-dimensional proteomics data into a few components and visualize sample relationships and clustering patterns.

Principal Component Analysis

Interpretation

Points: Individual samples
Colors: Grouping variable (e.g., Treatment: DMSO vs FT671)
Axes (PC1, PC2): Principal components explaining variance
Percentage values: Variance explained by each component
Optional overlays:
- Polygons: Group boundaries
- Loadings (arrows): Features (proteins) driving separation

Sample Clustering:

Tight clusters within a group → high reproducibility
Clear separation between groups → strong biological differences
Overlap between groups → weak or no separation

Variance Explained:

PC1 (e.g., 28%) = largest source of variation
PC2 (e.g., 18%) = second largest
Higher percentages → more meaningful separation

Polygons (Group Envelopes):

Show spread of samples within each group
Large polygons → higher variability
Compact polygons → consistent samples

Loadings (Feature Contributions):

Arrows indicate proteins contributing to separation
Interpretation:
- Longer arrows → stronger influence
- Direction of arrow → associated with specific sample group
- Proteins pointing toward a cluster → enriched in that group

Key Insights from PCA

Detect batch effects or outliers
Evaluate treatment effects
Identify drivers of biological variation

Correlation Heatmap

This plot helps to quantify and visualize pairwise similarity between samples using correlation metrics (e.g., Pearson).

Correlation Heatmap

Interpretation

Rows and columns: Samples
Each cell: Correlation coefficient between two samples
Color scale:
- Darker → higher similarity
- Lighter → lower similarity

Diagonal Line

Always highest correlation (sample vs itself)
Serves as a reference

Within-Group Similarity

Consistently high correlation within a condition → good reproducibility
Uniform color blocks → stable dataset

Between-Group Comparison: Compare correlation between:

Same condition → expected high
Different conditions → may be lower

Outliers: Samples with:

Lower correlation across all others → potential outliers
Distinct color patterns → technical issues or biological anomalies

Batch Effects

Visible as blocks or stripes across unrelated samples
Suggest systematic variation unrelated to biology

For more detailed information regarding the app, please go to ProteomicsQC app documentation page

Patient Data App

Graphics tab

The Graphics tab in the Patient Data App allows users to visualize clinical variables and molecular data from the patient dataset. These plots help explore relationships between patient characteristics, disease features, and biological measurements.

By selecting different variables, users can generate plots that reveal patterns, distributions, or associations within the dataset.

The visualization automatically adapts depending on whether the selected variables are categorical or numerical.

Categorical Variable Distribution (Bar Plot): When two categorical variables are selected, the Graphics tab generates a bar plot.

Categorical Variable Distribution (Bar Plot)

_For instance, the above depicted example denotes the following components:

First variable: tissue_or_organ_of_origin
Second variable: histological_type
X-axis: Primary categorical variable (e.g., tissue or organ of origin).
Y-axis: Number of patients or samples (count).
Colored bars: Each color represents a category of the second variable (e.g., histological type).
Legend: Displays the categories corresponding to the bar colors.
p-value (top of plot): Indicates the statistical significance of the association between the variables.

Interpretation

This plot helps users determine:

whether certain histological tumor types occur more frequently in specific tissues
how patient samples are distributed across categories
whether there is a statistically significant relationship between variables

A small p-value (e.g., 5e-04) suggests an association is statistically significant.

Proportion Plot

When the Proportion option is selected, the bar plot is converted into a stacked proportion chart.

Categorical Variable Distribution with Proportion(Bar Plot)

_The above depicted example denotes the following components:

X-axis: Primary categorical variable.
Y-axis: Proportion (0–1 scale) representing the relative distribution of categories.
Stacked segments Each segment represents the fraction of samples belonging to each category.

Interpretation

This plot shows relative distributions rather than absolute counts.

It allows users to determine:

which histological types dominate within each tissue category
how category proportions differ between groups
whether certain histological types are over-represented.

This visualization is particularly useful when group sizes differ substantially.

Continuous Variable Comparison (Box Plot): When the second variable is continuous, the Graphics tab generates a box plot.

Continuous Variable Comparison (Box Plot)

_For instance, the above depicted example denotes the following components:

First variable: tumor_stage
Second variable: days_to_death
X-axis: Categorical grouping variable (e.g., tumor stage).
Y-axis: Continuous variable (e.g., days to death).
Center line: median value
Box edges: interquartile range (25–75%)
Whiskers: range excluding outliers
Points: outliers.
Numbers above each box: Number of patients in that category.
p-value (top of plot): Indicates whether differences between groups are statistically significant.

Interpretation

This plot helps identify:

whether survival time differs between tumor stages
which stages show greater variability in patient outcomes
whether there is a statistical relationship between stage and survival

For example:

higher median survival in early stages may suggest better prognosis
wide boxes indicate high variability in patient outcomes.

Log2 Transform (Box Plot): If the log2 transform option is enabled, the continuous variable is displayed on a logarithmic scale.

Log transformation helps in:

stabilize variance
reduce the impact of extreme values
make skewed distributions easier to interpret

Continuous Variable Comparison with log2 transform(Box Plot)

Interpretation

After log transformation:

differences between groups may become clearer
large outliers have less influence
distributions appear more comparable across groups.

Types of Variables Supported: PanHunter supports multiple categories of variables that can be visualized in the Graphics panel.

Patient Demographics such as

gender
race
ethnicity
age
weight
height

These variables help describe population characteristics and identify demographic patterns.

Clinical and Pathological Variables such as

tissue or organ of origin
histologic type
tumor stage
morphology
cancer status
radiation therapy.

These variables describe the clinical characteristics of the disease.

Cancer Staging Variables such as

ICD-O-3 site
Masaoka stage
TNM staging categories
anatomic neoplasm subdivision.

These variables provide detailed information about tumor progression and disease severity.

Sample Processing Variables such as

dataset
sample name
definition
FFPE status.

These variables describe how the samples were collected and processed, which can influence molecular analyses.

Outcome Variables such as

vital status
days to death
days to last follow-up

These variables are important for survival analysis and prognosis studies.

Biological Data: Variables such as OmicsFeature1 or sampleTable1 represent molecular measurements such as:

gene expression
protein levels
mutation status.

These data allow integration of clinical and molecular information.

For more detailed information regarding the app, please go to Patient Data app documentation page

New Comparison app

Feature Selection Plot

The Feature Selection Plot visualizes and ranks features (genes or proteins) based on their variance across the dataset. The main purpose of this plot is to help determine how many features should be retained for downstream analyses.

Interpretation

Each bar represents a feature ranked by variance.
Users should look for a point where the bar heights drop sharply. This point is commonly referred to as the “elbow.”
Features beyond the elbow contribute less to explaining the overall variance and are often less informative.

Feature selection plot

For example, in transcriptomic analyses:

Each feature corresponds to a gene.
Selecting features up to the elbow ensures that most of the variability in gene expression across samples or conditions is captured.
This selection is crucial for downstream “comparisons” (formerly top tables) of differentially expressed genes, allowing analyses to focus on genes that show the most meaningful differences between experimental groups.

Sample Clustering Plot

The Sample Clustering panel displays a dimension reduction plot that summarizes relationships between samples in a two-dimensional space.

Interpretation

Each dot corresponds to a single sample.
The position of each dot reflects the entire feature expression profile (genes or proteins) of that sample, not individual features.
The plot summarizes high-dimensional data into two dimensions based on variance across all measured features.

Sample clustering plot

Samples that cluster closely together have highly similar feature expression profiles.
When samples are grouped by attributes such as tissue type, distinct clusters may emerge.
These clusters can reveal:
- Tissue-specific expression patterns
- Potential biomarkers
- Biological differences related to conditions, treatments, or disease states

Principal Component Analysis

Density Curve

The Density Curve visualizes the distribution of feature expression levels within samples.

Plot Description

Each curve represents one sample.
Up to 100 samples can be displayed simultaneously for clarity.
Color and symbol settings match those used in the Sample Clustering dimension reduction plot.

Axes

X-axis:
- Transcriptomics Analysis: log10 Counts Per Million (CPM)
- Proteomics Analysis: log2 Intensity
Y-axis:
- Feature density, representing the proportion of genes or proteins (depending on the type of analysis) within a given expression range.
The above is an example plot belonging to the proteomics analysis

Interpretation

High-density regions indicate many features with similar expression levels.
Low-density regions indicate fewer features at those expression levels.
The plot provides an overview of expression distributions and helps identify:
- Global expression trends
- Outliers
- Unusual or sample-specific patterns
Expected Distributions:
- Transcriptomic datasets typically show:
  - Many genes with low expression (left side of the plot)
  - Fewer genes with very high expression (right side of the plot)
- Proteomic datasets usually show:
  - Fewer proteins with extremely low or extremely high abundance

For more detailed information regarding the app, please go to New Comparison app documentation page

Comparisons app

Compare Comparisons tab

The Compare Comparisons tab in the Comparisons app allows users to compare the results of up to four comparisons simultaneously using a heatmap.

Plot Description

The heatmap is a two-dimensional visualization where:
- Rows represent individual features (genes or proteins).
- Columns represent different comparisons.
Each cell displays the log fold change (logFC) value of a feature in a given comparison.
- logFC values are calculated on a base-2 logarithmic scale.

Interpretation

Negative logFC values indicate downregulation of a feature in that comparison.
Positive logFC values indicate upregulation.
By default:
- Red represents downregulation.
- Blue represents upregulation.
The intensity of the color reflects the magnitude of the change:
- Darker red = stronger downregulation
- Darker blue = stronger upregulation
The color scheme can be customized by clicking on the color scale in the upper-left corner of the heatmap.

Compare Comparison heatmap

Dendrograms displayed next to the heatmap illustrate clustering based on similarity in regulation patterns:
- The row dendrogram groups features that show similar regulation behavior across comparisons.
- The column dendrogram groups comparisons that show similar regulation profiles across features.
Clustering is based on the overall regulation pattern, not solely on whether features are up- or downregulated.
Heatmap interpretation depends strongly on the selected feature list, such as:
- Features associated with a specific Gene Ontology (GO) term
- Features involved in a particular biological pathway
The heatmap displays only the selected features and their regulation across the chosen comparisons.
Features with both positive and negative logFC values may appear within the same cluster.
This can occur when features share similar regulation patterns across comparisons, even if their direction of change differs.
For example, genes within the same biological pathway may act as activators or repressors and be regulated in opposite directions, yet still cluster together due to similar overall regulation profiles.

For more detailed information regarding the app, please go to Comparisons app documentation page

MA Plots App

The MA Plots app provides multiple visualization tools to explore expression changes, statistical significance, and feature behavior across comparisons and samples. The app is organized into two main sections: Plots and Detailed Plots.

Plots Tab

Volcano Plot

The Volcano Plot visualizes the relationship between the magnitude of expression changes and their statistical relevance between two conditions. PanHunter provides two variants of this plot, differing in how statistical information is displayed on the y-axis.

Variant 1: log2 Fold Change vs. Statistical Significance (FDR) - This variant emphasizes biological relevance combined with statistical confidence.

Volcano Plot: log2 Fold Change vs. Statistical Significance (FDR)

Axes

X-axis: log2 fold change
- Positive values indicate upregulation
- Negative values indicate downregulation
Y-axis: −log10 False Discovery Rate (FDR), representing the statistical significance of the observed changes

Features appearing high on the y-axis and far from zero on the x-axis show both strong expression changes and high statistical confidence and are therefore often of greatest biological interest.

Interpretation

Dark red: significant downregulation
Orange: downregulation
Light blue: upregulation
Dark blue: significant upregulation
Gray: no significant change

Variant 2: log2 Fold Change vs. Standard Error - This variant highlights the reliability and variability of expression estimates rather than significance thresholds.

Volcano Plot: log2 Fold Change vs. Statistical Significance (FDR)

Axes

X-axis: log2 fold change
Y-axis: standard error of the log2 fold change

Interpretation

Features with large fold changes and low standard error indicate robust and consistent regulation across samples.
Features with high standard error suggest greater variability, which may arise from biological heterogeneity or technical noise.

This variant is particularly useful for:

Assessing the stability of differential expression estimates
Identifying features with strong but variable responses
Complementing FDR-based interpretation by focusing on confidence in effect size estimation rather than hypothesis testing alone

MA Plot

The MA Plot displays the relationship between overall expression level and expression change.

Axes

X-axis (A): log average abundance
Y-axis (M): log ratio of expression changes (log fold change)

This visualization is well suited to identify features with significant up- or downregulation while considering their overall expression level, helping to detect expression-dependent effects and systematic biases.

Log2 Fold Change: Comparison 1 vs. Comparison 2

Log2 Fold Change Plot

This scatter plot compares feature-wise log2 fold changes between two comparisons.

Axes

X-axis: log2 fold change in Comparison 1
Y-axis: log2 fold change in Comparison 2

Interpretation

Blue: consistent regulation across both comparisons
Red: inconsistent regulation between comparisons

This plot enables a direct assessment of concordant and divergent expression responses across experimental conditions.

Abundance: Comparison 1 vs. Comparison 2

Abundance Plot

This scatter plot compares feature abundance between two comparisons.

Axes: Both axes represent log abundance values

Interpretation

Features with higher expression in one comparison versus the other
Overall trends in feature expression across datasets

It is particularly useful for identifying highly expressed features that may play key biological roles.

Detailed Plots Tab

The Detailed Plots tab provides an in-depth view of the expression pattern of a user-selected gene across samples.

Detailed Plot

Axes

Samples are grouped into separate distributions based on the contrast factor of the selected comparison (e.g. age, tissue type, treatment, or technical parameters such as sequencing platform).
X-axis: values of the selected contrast variable
Y-axis: expression levels of the selected gene, shown as individual data points

Interpretation

Clusters of points indicate consistent expression within a subgroup.
A wide spread of points suggests heterogeneity.
Outliers or distinct groupings may reflect biological variation or technical artefacts.

This plot helps identify how gene expression varies with external factors or internal conditions and can reveal patterns that warrant further biological investigation or suggest potential regulatory mechanisms.

For more detailed information regarding the app, please go to MA Plots app documentation page

Gene Info App

The Plot tab in the Gene Info App allows users to visualize the expression of a selected gene across different samples and experimental conditions. The plot helps explore how gene expression changes in response to treatments, compounds, or other metadata variables.

Plot Types: The Gene Info app supports several visualization formats:

Bar Graph displays the average expression level of the gene for each group, with error bars indicating variability across samples.
Boxplot shows the distribution of expression values for each group, including:
- median
- interquartile range
- potential outliers
Boxplot + Jitter combines the boxplot with individual sample points to show both the distribution and the underlying data.
Scatterplot displays each sample as a point, allowing users to observe individual expression measurements.
Jitter Plot similar to a scatterplot but spreads overlapping points horizontally to make dense data easier to interpret.

Interpretation

The plot displays normalized gene expression levels for the selected gene across samples. This visualization allows users to:

compare gene expression across multiple conditions
identify conditions that increase or decrease gene expression
observe variability across biological replicates
detect potential outliers or unusual responses

For more detailed information regarding the app, please go to Gene Info app documentation page

Gene Comparison App

Feature Abundance

The Feature Abundance tab in the Gene Comparison app provides a Gene Expression Heatmap, which visualizes the abundance of selected genes across multiple samples. This visualization helps users explore expression patterns, identify similarities between samples, and detect groups of genes with similar behavior.

Axes: The heatmap displays gene expression values in a matrix format:

Rows represent genes (features).
Columns represent samples.

Interpretation

Each cell in the heatmap corresponds to the expression level of a gene in a specific sample. Color intensity indicates the abundance level of the gene.

Darker colors represent higher expression levels.
Lighter colors indicate lower expression levels.

This visual representation allows users to quickly detect patterns of gene expression across samples.

Gene expression heatmap

The heatmap includes hierarchical clustering, which groups genes and samples based on similarity in expression patterns.

Row dendrogram: groups genes with similar expression profiles.
Column dendrogram: groups samples that show similar gene expression patterns.

These clusters help identify potential biological relationships, such as groups of genes involved in the same pathway or samples sharing similar biological conditions.

The Feature Abundance heatmap enables users to:

identify genes with similar expression patterns
detect groups of samples with related expression profiles
examine how gene abundance varies across experimental conditions
explore relationships between gene sets and sample metadata

For more detailed information regarding the app, please go to Gene Comparison app documentation page

Gene Clustering App

Feature Heatmap

The Feature Heatmap tab in the Gene Clustering app visualizes the expression patterns of genes that show similar behavior to a selected feature (gene) across samples. This heatmap helps users identify groups of genes with correlated expression patterns, which may indicate shared biological functions, regulatory mechanisms, or pathway involvement.

Interpretation

The heatmap displays expression values in a matrix:

Rows represent genes (features).
Columns represent samples.

Each cell represents the expression level of a gene in a specific sample.

Feature heatmap

Color intensity indicates expression abundance:

Darker green shades represent higher gene expression.
Lighter shades represent lower expression levels.

This visual format enables users to quickly observe patterns of gene activity across all samples.

The heatmap enables users to:

identify genes with expression patterns similar to the selected feature
detect groups of co-expressed genes
observe sample-specific expression patterns
explore potential biological pathways or regulatory mechanisms

Genes with similar color patterns across samples are expected to have correlated expression behavior, which may indicate shared biological functions or regulatory relationships.

For more detailed information regarding the app, please go to Gene Clustering app documentation page

Pathway Mapping App

The Graph tab in the Pathway Mapping App visualizes biological pathways and highlights genes from the dataset that are associated with the selected pathway. This view helps users understand how gene expression changes relate to known biological processes and molecular interactions.

Interpretation

The graphic represents a biological pathway diagram. In the example shown, the pathway illustrates the Adipogenesis process, which describes the transition from preadipocytes to differentiated adipocytes.

The pathway diagram contains:

biological stages (e.g., preadipocyte → adipocyte differentiation)
functional gene groups (e.g., transcription factors, growth factors, markers)
individual genes involved in the pathway
Each box represents a gene or molecular component involved in the pathway.

Pathway Mapping Graph

Genes within the pathway are color-coded based on their expression behavior in the dataset.
Color highlighting helps identify genes that meet the selected statistical and expression thresholds. For example:
- Highlighted genes indicate features that satisfy the filtering criteria (e.g., significant differential expression).
- Non-highlighted genes remain in a neutral color and represent pathway genes that do not meet the selected thresholds in the current dataset.
This allows users to quickly observe which parts of the pathway are active or affected under the studied conditions.

The pathway visualization helps users understand how gene expression changes affect biological processes. By examining highlighted genes, users can:

identify key regulatory genes involved in the pathway
observe which pathway components are affected
detect clusters of activated or suppressed genes
explore potential mechanisms underlying experimental conditions

For more detailed information regarding the app, please go to Pathway Mapping app documentation page

Network Visualization App

The Network Visualization App displays relationships between genes or molecular features using an interactive network graph. This visualization helps users explore how genes are functionally connected, identify important regulatory hubs, and understand the structure of biological interaction networks.

Network Structure: The network consists of nodes and edges.
- Nodes (Circles): Each node represents a gene in the dataset.
- Genes of particular interest are displayed as larger circles, making them easier to identify within the network.
- The color of each node reflects the gene’s log fold change (logFC) value:
  - Blue indicates upregulated genes
  - Red indicates downregulated genes
- This color coding helps users quickly recognize genes whose expression is increased or decreased in the dataset.
- Nodes often form clusters or groups within the network. These clusters represent genes that are highly interconnected. Clusters may correspond to:
  - biological pathways
  - functional modules
  - co-regulated gene groups

Genes located at the center of dense clusters are often key regulators or hub genes.

Edges (Lines): Lines connecting nodes represent relationships or interactions between genes. These connections may reflect:
- regulatory relationships
- co-expression patterns
- protein–protein interactions
- functional associations
Genes connected by edges are likely to participate in related biological processes.

Network Visualization graph

Interpretation

This network visualization helps users:

identify genes that interact with many others (hub genes)
explore clusters of functionally related genes
examine expression changes across interacting genes
understand how genes are organized into biological modules

For example:

A large, centrally connected node may indicate a key regulatory gene.
A cluster of interconnected genes may represent a shared biological pathway or functional process.

For more detailed information regarding the app, please go to Network Visualization app documentation page

Signature Visualization App

The Signature Visualization app allows users to explore how different conditions influence gene expression. The app presents the results using two complementary visualizations:

Signature Plot: This plot displays individual genes as dots, allowing users to visually assess how gene expression changes under a selected condition.

Interpretation

Each dot represents a single gene.
The position and color of the dot indicate the direction and magnitude of expression changes.
Blue dots represent upregulated genes, meaning their expression levels increased after the selected condition or treatment.
Red dots represent downregulated genes, indicating decreased expression levels.
The plot uses log fold change (logFC) values to position genes:
- Genes located on the left side of the plot have negative logFC values.
- Genes located on the right side have positive logFC values.
In some cases, the dot color may not match the expected direction of the logFC value. For example:
- A blue dot on the left side (negative logFC) indicates that the gene is upregulated compared to a previously lower baseline expression level.
- A red dot on the right side (positive logFC) indicates downregulation relative to the baseline, despite having a positive logFC value.
This representation highlights how gene expression levels change relative to their baseline state.

The signature plot helps identify genes that are most affected by a selected condition, such as treatment with a specific compound.

For example, users may explore gene sets derived from resources like CMAP2 drug meta signatures. In such cases, the plot illustrates how genes respond to particular treatments or biological perturbations.

Signature Visualization Plot

Comparison Plot: This plot summarizes the overall gene expression trend using a bar chart.

Interpretation

The bars represent the number of significantly regulated genes with the width normalized to their total number:
- The blue bar indicates the number of upregulated genes.
- The orange bar represents the number of downregulated genes.
The length of each bar corresponds to the count of genes showing significant regulation. This visualization provides a quick overview of the dominant gene expression response.
- A longer blue bar suggests that more genes are upregulated.
- A longer orange bar indicates that downregulation is more prevalent.

Note: Users can also flip the plot, displaying signature genes as bars and comparison genes as dots. This view can be particularly useful for CMAP signatures, where the signature gene set represents a predefined perturbation response. Showing the signature genes as bars highlights their overall regulation pattern, while the dots represent how genes in the comparison dataset align with that signature, making it easier to visually assess whether the dataset shows a similar or opposite transcriptional response to the CMAP perturbation.

By comparing the gene expression signature from a selected specific comparison against the profile from the signature database, e.g., for a particular compound (e.g., 1,4-chrysenequinone), users can quickly compare the gene expression signatures of both. By default, the signature from the comparison is shown as the orange and blue bar while the signature elements of the here compound is shown via a scatter plot above the bar. If the correlation (same genes show up- and downregulation in both signatures) is high, this is evidence that both conditions follow the same expression regulation mechanisms.

For more detailed information regarding the app, please go to Signature Visualization app documentation page

Enrichment Visualization app

2D Plot

The 2D Enrichment Plot visualizes relationships between enriched gene sets using two advanced dimensionality reduction techniques: t-SNE (t-Distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection).

Below is an example t-SNE plot of comparison study conditions DCM and Healthy

2D Plot

Axes: The X-axis and Y-axis represent the two dimensions generated by the selected embedding method, which positions gene sets based on similarity in their enrichment patterns.

Interpretation

Gene sets that appear close together share similar enrichment characteristics, while those farther apart show more distinct behavior. Clusters of points can therefore indicate groups of biologically related pathways or processes that respond similarly under the studied condition.
The size of each dot reflects the statistical significance of the enrichment for that gene set.
- Larger dots indicate stronger statistical support for the enrichment result.
- This suggests that the enrichment is less likely to be caused by random variation in the dataset.
The color of each dot represents the overall direction and magnitude of regulation within the gene set, calculated using the signedLog2 value of the total significant log fold change (logFC).
- Red tones generally indicate gene sets dominated by downregulated genes
- Blue tones generally indicate gene sets dominated by upregulated genes
- Stronger color intensity reflects a greater magnitude of regulation

The 2D enrichment plot helps users identify biological pathways or processes that are strongly associated with the studied condition. By examining clusters, dot sizes, and color patterns, users can:

detect groups of related biological pathways
identify strongly co-regulated gene sets
generate hypotheses about affected biological mechanisms

This visualization helps bridge the gap between gene-level results and gene-set-level biological interpretation.

Barplot

The Barplot tab provides a detailed overview of gene regulation within enriched gene sets.

Each bar represents a gene set and shows the number of upregulated and downregulated genes contributing to the enrichment. The plot also displays the gene set ID and name, helping users quickly identify the pathways or functional categories involved.

Barplot

By default, when more than 40 enriched gene sets are detected, the plot displays the top 40 gene sets.

For more detailed information regarding the app, please go to Enrichment Visualization app documentation page