Data

1: General widgets
2: General Plot interpretation
3: Data Quality Control

3.1: Quality Control for Bulk RNA-Seq data

3.1.1: Starting quality control
3.1.2: Deeper look into SampleQC
3.1.3: QC clustering
3.1.4: Finalizing the Quality Control

4: Data Administration

4.1: PanHunter Preprocessing

4.1.1: Reference genomes

4.2: Studies overview
4.3: Data upload
4.4: Updating data
4.5: Deleting studies

1 - General widgets

This section describes all the general panels used in the abundance apps.

Sample Selector

Samples can be selected with the help of the “Sample selector” panel located on the left side of the interface. This panel provides a variety of filters, called “Modifiers”, that narrow the entire catalogue of samples in your project down to the ones you are interested in. For QC purposes it is usually advisable to start with one complete study to get an overview.

To filter the samples, add as many modifiers as required from the drop-down menu (see below). A modifier with an empty value field selects all available data and has no effect. A modifier can be applied to any column in the sample table, provided the value type of the column matches the type of modifier. Select the values that should be included in or excluded from the selection.

Note: For further information on modifier functionality, check the modifier tooltip.

Modifiers

A modifier is a filter or enrichment applied to the table columns resulting from the last applied modifier. It consists of the following functions:

Sample selector

Filter Study is a “mandatory filter” to select and load your studies of interest. Please click on the field below “Values” to select the studies from a drop-down menu.

Modifiers

Add a new modifier from the drop-down below and confirm the addition by clicking on the "+" button.

Filter categ. filters the samples according to categorical variables such as “Tissue, Sex, Compound, etc.” The values in the selected column of the table are offered for selection in the values field.
Filter num. filters the samples according to numerical variables. A slider appears that spans the minimum and maximum numeric values in the column.
Join columns combines two or more categorical variables into one. For example, selecting the categories sex (male, female) and tissue (liver, brain, heart) results in male_liver, male_brain, male_heart, female_liver, female_brain, and female_heart, which can then be used for further filtering or as analysis options within the apps. The modifier produces a new column by joining the values of the columns selected in the combine field with underscores, and names the new column with the given label.
Column binning divides samples into groups according to a specified numeric variable. It produces a new labeled column with bin IDs, based on the selected number of bins. It currently provides two modes and works only on columns with numerical values.
Enrich adds additional information to your sample table, such as “QC Data, Patient data and Custom annotations”.
Additionally, you can toggle between Include and Exclude to keep the specified values in or out of your selection.
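The way modifiers chain, each one operating on the table produced by the previous one, can be illustrated with a minimal Python sketch. This is not PanHunter code: the function names (`filter_categ`, `filter_num`, `join_columns`), the sample table, and its column names are all hypothetical and only mirror the behavior described above.

```python
# Illustrative sketch only: hypothetical helpers that mimic the described
# modifier behavior on a small in-memory sample table.

def filter_categ(samples, column, values, include=True):
    """Keep (include=True) or drop (include=False) samples whose value in
    `column` is one of `values`. An empty `values` list changes nothing."""
    if not values:
        return list(samples)
    wanted = set(values)
    return [s for s in samples if (s[column] in wanted) == include]

def filter_num(samples, column, low, high):
    """Keep samples whose numeric value in `column` lies within [low, high]."""
    return [s for s in samples if low <= s[column] <= high]

def join_columns(samples, columns, label):
    """Add a new column `label` that joins the selected columns with '_'."""
    return [{**s, label: "_".join(str(s[c]) for c in columns)} for s in samples]

# Hypothetical sample table.
samples = [
    {"sample_id": "S1", "sex": "male", "tissue": "liver", "age": 34},
    {"sample_id": "S2", "sex": "female", "tissue": "brain", "age": 51},
    {"sample_id": "S3", "sex": "female", "tissue": "heart", "age": 47},
    {"sample_id": "S4", "sex": "male", "tissue": "brain", "age": 62},
]

# Each step works on the result of the previous one.
selected = filter_categ(samples, "tissue", ["heart"], include=False)
selected = filter_num(selected, "age", 30, 60)
selected = join_columns(selected, ["sex", "tissue"], "sex_tissue")

print([s["sex_tissue"] for s in selected])  # ['male_liver', 'female_brain']
```

Because every step receives the output of the previous step, the order in which modifiers are added matters: a joined column can only be filtered on after the join has been applied.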

Modifier options

Load, save, and delete your current set of modifications with the help of the “settings” button. You can save your modifier selection under a name of your choice in the “Type a name” box.

The settings also offer the option to Return to your initial set of modifications and to Switch between the “globe” and the “person” icon, which symbolize global (project-wide) and local (current user only) scope. The former saves and loads modifier sets shared by all users, the latter those of the current user. This also allows you to reuse the same sample selection in all abundance-based apps.
Show excluded samples displays samples that are marked as excluded. Note: In QC apps the default setting is that all the samples including the excluded ones are shown. For all other apps the default is that excluded samples are not shown.
Instant mode can be enabled to automatically apply changes as they are made. Disable this option to manually apply changes by clicking the apply button. The double tick icon on the “Apply changes” button indicates that there are unsaved changes waiting to be applied.

This panel provides users with options to navigate through the plots in the apps.

Plot Type lets you select plot types to visualize your data:

Boxplot – Summarizes the distribution of data with median, quartiles, and whiskers.
Jitter Plot – Displays individual data points for better visibility of variation.
Violin Plot – Combines boxplot with kernel density to show data distribution.

Plot Options: The Plot type option offers Boxplot, Jitter Plot, and Violin Plot for visualizing sequencing quality metrics.

With the Group By or Sort By option, users can group their visualization according to metadata variables such as “Plate, Tissue, Timepoint, etc.”
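To make the grouping concrete, the following Python sketch groups a QC metric by a metadata variable and computes the values a boxplot summarizes (median and quartiles). It is an illustration under stated assumptions, not the app's implementation: the helper names, the sample table, and the quartile method (the default of `statistics.quantiles`) are assumptions, and the app may compute quartiles differently.

```python
from collections import defaultdict
from statistics import median, quantiles

def group_values(samples, group_by, metric):
    """Collect the metric values of all samples per metadata group."""
    groups = defaultdict(list)
    for s in samples:
        groups[s[group_by]].append(s[metric])
    return dict(groups)

def box_summary(values):
    """Return the numbers a boxplot is drawn from (needs >= 2 values)."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }

# Hypothetical samples with a Q30 value and a Tissue annotation.
samples = [
    {"sample_id": "S1", "Tissue": "liver", "Q30": 91.0},
    {"sample_id": "S2", "Tissue": "liver", "Q30": 93.0},
    {"sample_id": "S3", "Tissue": "liver", "Q30": 95.0},
    {"sample_id": "S4", "Tissue": "brain", "Q30": 88.0},
    {"sample_id": "S5", "Tissue": "brain", "Q30": 90.0},
    {"sample_id": "S6", "Tissue": "brain", "Q30": 86.0},
]

grouped = group_values(samples, group_by="Tissue", metric="Q30")
summaries = {group: box_summary(vals) for group, vals in sorted(grouped.items())}

for group, summary in summaries.items():
    print(group, summary["median"])
```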

A settings panel allows users to select or deselect the QC parameters to display in the plots. Currently, only Q30 is available for the “Sequence” tab, but additional parameters may be added in the future.

Quality Indicators lets you choose which alignment parameters to display in the plots (e.g., Number of input reads, Uniquely mapped reads, Average mapped length).

Plot Design and Features:

Each QC parameter is displayed in a separate plot to ensure proper visualization of thresholds.
Y-axis: Represents the parameter values (e.g., Q30). Scales are parameter-specific and automatically adjusted based on thresholds, with extra spacing for clarity.
X-axis: Displays sample groups based on metadata categories (e.g., Tissue, Timepoint, Concentration).
All samples are displayed simultaneously. Click and drag over a region of the plot to zoom in for a more detailed view.
Users can reset the axes using the small “home” icon near the plot.

Background Color Coding:

Displays thresholds (if defined in the Threshold Manager) with a semi-transparent color scale for easy interpretation.
Each plot includes a header with the parameter name.
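As a rough illustration of how a value can be assigned to a threshold band, the Python sketch below maps a QC value to a category using sorted band boundaries. The boundaries, the category labels ("fail", "warn", "pass"), and the function name are hypothetical; the actual categories and limits are whatever has been defined in the Threshold Manager.

```python
from bisect import bisect_right

def threshold_category(value, boundaries, labels):
    """Return the label of the band that `value` falls into.

    `boundaries` must be sorted in ascending order and `labels` must have
    exactly one more entry than `boundaries`. A value equal to a boundary
    is assigned to the band above it.
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    return labels[bisect_right(boundaries, value)]

# Hypothetical Q30 thresholds, for illustration only.
Q30_BOUNDARIES = [70.0, 85.0]
Q30_LABELS = ["fail", "warn", "pass"]

for q30 in (55.0, 70.0, 92.3):
    print(q30, threshold_category(q30, Q30_BOUNDARIES, Q30_LABELS))
```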

Legend and Hover Details:

Threshold categories are displayed on hover.
Metadata group information and highlighted samples are also explained via hover tooltips.

Download Options

The plot can be downloaded using the “camera” icon near the plot.
A single click allows users to download all plots together for reporting or documentation purposes.

Stats Table panel

This panel provides a detailed tabular view of the statistical values for the parameters of each app, together with key sample identifiers such as “Study, Sample ID, SeqFile (FASTQ file name)”.

On clicking the “hamburger icon” above the table, you will be provided with the following options:

Columns: You can select and deselect the columns you want to explore.
Download CSV: You can download the table in CSV format.
Download XLSX: You can download the table in Excel format.
Copy filtered rows: You can copy only the rows that remain after applying the table filter, for further analysis.

2 - General Plot interpretation

This section explains how to interpret the plots in the apps provided by PanHunter.

For more detailed information on navigating through these plots, please go to the Plot Navigation Panel documentation page.

Sample QC App

Alignment Stats tab

The Alignment Stats tab summarizes key statistics from the read alignment process, helping evaluate sequencing success and mapping quality.

Interpretation

The alignment metrics help determine:

sequencing success
mapping accuracy
suitability of samples for downstream analysis.

Alignment Stats plot

For instance, the above example depicts the following metrics:

Number of Input Reads: This plot shows the distribution of total sequencing reads generated for each sample.

Higher read counts generally provide better transcriptome coverage.
The density curve highlights the most common read count values.
Samples with much lower read counts than the majority may indicate sequencing or library preparation issues.

Uniquely Mapped Reads Percent: This metric shows the percentage of reads that align uniquely to one genomic location.

High percentages are desirable, indicating reliable mapping.
Low values may indicate:
- contamination
- poor read quality
- incomplete reference genome
- repetitive sequences.

Reads Mapped to Multiple Loci Percent: This plot shows the percentage of reads mapping to multiple genomic locations. This can occur when reads originate from:

repetitive regions
paralogous genes
homologous sequences.
Lower percentages are generally better.
Very high multi-mapping rates may reduce confidence in gene quantification.
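
The relationship between these metrics can be sketched in a few lines of Python. This is an illustration, not the app's code: the function names, the read counts, and the rule for flagging low-depth samples (less than half of the median number of input reads) are assumptions chosen for the example.

```python
from statistics import median

def alignment_percentages(input_reads, uniquely_mapped, multi_mapped):
    """Express uniquely and multi-mapped reads as a percentage of input reads."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return {
        "uniquely_mapped_pct": 100.0 * uniquely_mapped / input_reads,
        "multi_mapped_pct": 100.0 * multi_mapped / input_reads,
    }

def flag_low_depth(input_reads_by_sample, fraction_of_median=0.5):
    """Return samples whose input read count is far below the majority.

    The cutoff (a fraction of the median) is an arbitrary example value.
    """
    cutoff = fraction_of_median * median(input_reads_by_sample.values())
    return sorted(
        name for name, reads in input_reads_by_sample.items() if reads < cutoff
    )

# Hypothetical read counts.
input_reads = {"S1": 20_000_000, "S2": 22_000_000, "S3": 6_000_000, "S4": 19_000_000}

print(alignment_percentages(20_000_000, 17_000_000, 2_000_000))
# {'uniquely_mapped_pct': 85.0, 'multi_mapped_pct': 10.0}
print(flag_low_depth(input_reads))  # ['S3']
```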

Read Distribution tab

The Read Distribution tab shows how sequencing reads are distributed across different genomic features.

Genomic Feature Categories

The X-axis displays genomic regions such as:
- Upstream regions (Up.10kb, Up.5kb, Up.1kb)
- 5′ UTR
- CDS (coding sequences)
- Introns
- 3′ UTR
- Downstream regions
The Y-axis shows read counts normalized per kilobase of genomic feature.

Read Distribution plot

Plot Type: Users can visualize distributions using:

Violin plot: shows full distribution shape
Box plot: shows median and quartiles
Points: displays individual sample values.

Coloring Options: This option groups samples according to metadata fields (e.g., Plate, CRank, Batch). Important note: Only categories with 10 or fewer unique values in the current dataset subset can be used for coloring. If no such category exists, coloring will not be available.
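
The eligibility rule for coloring can be expressed as a short Python sketch. The function name and the sample table are hypothetical; only the limit of 10 unique values is taken from the note above.

```python
def colorable_columns(samples, max_unique=10):
    """Return the metadata columns with at most `max_unique` distinct values
    in the given subset of samples."""
    distinct = {}
    for sample in samples:
        for column, value in sample.items():
            distinct.setdefault(column, set()).add(value)
    return sorted(col for col, vals in distinct.items() if len(vals) <= max_unique)

# Hypothetical subset: 12 samples, 2 plates, 3 batches.
samples = [
    {"sample_id": f"S{i}", "Plate": f"P{i % 2}", "Batch": f"B{i % 3}"}
    for i in range(12)
]

print(colorable_columns(samples))  # ['Batch', 'Plate']
```

In this example `sample_id` has 12 distinct values and is therefore not offered for coloring. Because the rule is evaluated on the current subset, narrowing the sample selection can make additional categories available.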

Total Counts Option: By default, the plot shows counts normalized per kilobase of genomic features. When the Show total percentages option is selected, the plot instead displays the distribution of total counts (percentage) across genomic features.
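
The difference between the two views can be shown with a small Python sketch. The read counts and feature lengths are made-up numbers, and the function names are hypothetical; the sketch only assumes that "per kilobase" means dividing the reads assigned to a feature by the total length of that feature in kilobases.

```python
def per_kilobase(read_counts, feature_lengths_bp):
    """Reads per kilobase of each genomic feature."""
    return {
        feature: read_counts[feature] / (feature_lengths_bp[feature] / 1000.0)
        for feature in read_counts
    }

def total_percentages(read_counts):
    """Share of all assigned reads that falls into each genomic feature."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any feature")
    return {feature: 100.0 * n / total for feature, n in read_counts.items()}

# Hypothetical values for one sample.
read_counts = {"CDS": 8000, "Introns": 1500, "3'UTR": 500}
feature_lengths_bp = {"CDS": 2000, "Introns": 30000, "3'UTR": 1000}

print(per_kilobase(read_counts, feature_lengths_bp))
# {'CDS': 4000.0, 'Introns': 50.0, "3'UTR": 500.0}
print(total_percentages(read_counts))
# {'CDS': 80.0, 'Introns': 15.0, "3'UTR": 5.0}
```

In this example introns receive three times as many reads as the 3′ UTR in absolute terms, but far fewer per kilobase, because introns cover much more sequence. This is why the two views can rank features differently.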

Interpretation

This plot helps determine:

whether reads are mostly located in coding regions (expected in RNA-seq),
whether unusual read distributions occur in introns or intergenic regions, which may indicate contamination or library issues.

Gene Body Coverage tab

The Gene Body Coverage plot evaluates how evenly reads cover the length of genes.

Axes

X-axis: Gene Length (%). The gene position is represented from:
- 0% = 5′ end
- 100% = 3′ end
Y-axis: Shows normalized read coverage across gene positions.

Gene Body Coverage plot

Interpretation

A relatively flat curve indicates uniform sequencing coverage across the gene.
3′ bias: increased coverage toward the 3′ end (often due to poly-A capture methods).
5′ bias: increased coverage toward the 5′ end.
Uneven curves between samples: potential library preparation differences.
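
One simple way to put a number on these patterns is to compare the mean coverage at the two ends of the gene body. The Python sketch below is a heuristic for illustration only and is not how the app evaluates coverage: the window size (first and last 20% of the gene) and the tolerance factor (1.5) are arbitrary assumptions.

```python
def coverage_bias(coverage, window=0.2, tolerance=1.5):
    """Classify a gene body coverage curve.

    `coverage` holds normalized coverage values ordered from the 5' end
    (index 0) to the 3' end (last index).
    """
    k = max(1, int(len(coverage) * window))
    five_prime = sum(coverage[:k]) / k
    three_prime = sum(coverage[-k:]) / k
    if five_prime == 0 and three_prime == 0:
        return "no coverage"
    if three_prime > tolerance * five_prime:
        return "3' bias"
    if five_prime > tolerance * three_prime:
        return "5' bias"
    return "roughly uniform"

# Hypothetical curves sampled at 100 positions along the gene body.
flat = [1.0] * 100
rising = [i / 99 for i in range(100)]      # coverage increases toward 3' end
falling = list(reversed(rising))           # coverage increases toward 5' end

print(coverage_bias(flat))     # roughly uniform
print(coverage_bias(rising))   # 3' bias
print(coverage_bias(falling))  # 5' bias
```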

Biotype tab

The Biotype plot shows the percentage of reads assigned to different gene biotypes, such as:

protein-coding genes
long non-coding RNA (lncRNA)
pseudogenes

Note: Plot type and coloring options behave the same as in the Read Distribution tab.

Biotype plot

Interpretation

This visualization helps confirm that reads map primarily to expected gene classes. Typical RNA-seq datasets show:

the majority of reads in protein-coding genes
smaller proportions in lncRNA or pseudogenes

Unexpected distributions may suggest annotation issues or contamination.

Mitochondrial tab

The Mitochondrial tab evaluates the proportion of reads mapping to mitochondrial sequences.

Three boxplots are displayed:

Mitochondrial plot

Non-Mitochondrial Reads: This plot depicts the percentage of reads mapping to nuclear genes. High values indicate good RNA-seq data quality.

Mitochondrial Reads: This plot shows the percentage of reads mapping to mitochondrial genes. High mitochondrial content may indicate:

cell stress
degraded RNA
low-quality samples

Spike-In Transcripts: This plot displays reads mapping to spike-in control RNAs.

Spike-ins are artificial RNA molecules added during library preparation and used as technical controls.
Consistent spike-in levels across samples indicate stable sequencing performance.

plateRNA-Seq tab

The plateRNA-Seq tab provides several visualizations for plate-based RNA-sequencing experiments.

Samples / Conditions

Samples and Conditions plot

Plot Settings: Users can configure the following options:

Boxplot X-Axis: Defines the grouping variable (e.g., treatment).
Boxplot Y-Axis: Selects the metric to display, such as Unmapped reads, Multimapped reads, or Uniquely mapped reads.
Boxplot Coloring: Colors samples according to metadata categories such as treatment, sequencing provider, or plate position.

Interpretation

These boxplots help identify:

variability between treatments
batch effects
outlier samples with abnormal read counts.

Library Size and UMI Dedup

This scatter plot shows expression levels of a selected gene across samples.

Library Size and UMI Dedup plot

Interpretation

This visualization helps identify:

expression variability across samples
differences between experimental conditions
potential outliers.

Clusters of points may indicate groups of samples with similar gene expression patterns.

Plate / Library

This heatmap displays sequencing metrics arranged according to the physical plate layout.

Plate and Library plot

Plot Elements:

Rows represent plate rows.
Columns represent plate columns.
Cell color intensity indicates the value of a selected metric (e.g., mapped reads).

Interpretation

This visualization helps identify spatial artifacts, such as:

edge effects
plate position bias
systematic technical variation across wells.
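
The idea behind the plate view, and a basic check for edge effects, can be sketched in Python as follows. Everything here is an assumption made for the example: the well naming scheme (row letter plus column number, such as "B7"), the 96-well layout (8 rows by 12 columns), the helper names, and the metric values.

```python
from statistics import mean

def well_to_position(well):
    """Convert a well ID such as 'B7' to zero-based (row, column) indices."""
    row = ord(well[0].upper()) - ord("A")
    col = int(well[1:]) - 1
    return row, col

def plate_grid(values_by_well, n_rows=8, n_cols=12):
    """Arrange per-well values in the physical plate layout (None = empty)."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for well, value in values_by_well.items():
        row, col = well_to_position(well)
        grid[row][col] = value
    return grid

def edge_vs_interior(grid):
    """Mean metric value of edge wells and of interior wells.

    Assumes both groups contain at least one filled well.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    edge, interior = [], []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None:
                continue
            on_edge = r in (0, n_rows - 1) or c in (0, n_cols - 1)
            (edge if on_edge else interior).append(value)
    return mean(edge), mean(interior)

# Hypothetical plate in which the edge wells yield fewer mapped reads.
values = {}
for r in range(8):
    for c in range(12):
        well = f"{chr(ord('A') + r)}{c + 1}"
        on_edge = r in (0, 7) or c in (0, 11)
        values[well] = 600_000.0 if on_edge else 1_000_000.0

grid = plate_grid(values)
edge_mean, interior_mean = edge_vs_interior(grid)
print(edge_mean, interior_mean)  # 600000.0 1000000.0
```

A clear gap between the two means, as in this constructed example, is the numeric counterpart of the darker or lighter border that an edge effect produces in the heatmap.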

For more detailed information regarding the app, please go to the SampleQC app documentation page.

Transcriptomics QC App

Alignments tab

The Alignments tab summarizes key statistics from the read alignment process, helping evaluate sequencing success and mapping quality.

Interpretation

The alignment metrics help determine:

sequencing success
mapping accuracy
suitability of samples for downstream analysis.

Alignment plots

For instance, the above example depicts the following metrics:

Number of Input Reads: This plot shows the distribution of total sequencing reads generated for each sample.

Higher read counts generally provide better transcriptome coverage.
The density curve highlights the most common read count values.
Samples with much lower read counts than the majority may indicate sequencing or library preparation issues.

Uniquely Mapped Reads Percent: This metric shows the percentage of reads that align uniquely to one genomic location.

High percentages are desirable, indicating reliable mapping.
Low values may indicate:
- contamination
- poor read quality
- incomplete reference genome
- repetitive sequences.

Reads Mapped to Multiple Loci Percent: This plot shows the percentage of reads mapping to multiple genomic locations. This can occur when reads originate from:

repetitive regions
paralogous genes
homologous sequences.
Lower percentages are generally better.
Very high multi-mapping rates may reduce confidence in gene quantification.