TranscriptomicsQC

The Transcriptomics QC application is used to inspect QC statistics for bulk and plate-based RNA-Seq data. Samples can be selected using the Sample selector panel on the left side of the interface.

Generate QC Report

Generate Report

The “Generate QC Report” option lets users generate standardized QC reports based on the selected metadata. These reports provide a structured overview of quality control metrics and are stored directly in the “Data Overview app” for easy access and review. Currently, reports can only be generated for bulk RNA-Seq data. Use the following options to generate a QC report:

Generate Report options

Steps to generate a report

  1. Metadata Selection
  • X Variable: Choose a metadata column from the dropdown to use on the x-axis of the plots in the report (e.g., Study, SampleID, Protocol). This defines how samples are grouped in the report.
  • Color Variable: Choose a metadata column to color-code the samples in the plots (e.g., Species, Age, Sex). This helps to visually separate categories.
  2. Optional Report Information
  • Evotec Number (optional): Enter an internal project or reference number. This is useful for tracking and documentation across teams.
  • Report File Name (optional): Enter a custom name for the report file. If left empty, a default name is assigned automatically.
  3. Create Report
  • After filling in the options above, click the Create Report button to generate the QC report.

Threshold Manager

Threshold manager

The Threshold Manager allows users to set and adjust thresholds for all QC parameters included in an analysis (see details below). These thresholds help users quickly evaluate data quality by applying color-coded indicators across the application.

Threshold manager options

Selecting a parameter in the Threshold Manager automatically includes it in the visualizations and general QC indicators. Thresholds can be customized for all parameters, including those that are not in the current study but are part of a loaded template. Once applied, changes are reflected in all relevant QC plots.

Parameters are grouped by QC step (e.g., Sequencing, Alignment). Each group can be expanded or collapsed for better focus.

  • Flexible Threshold Settings:
    • Add, remove, or modify thresholds for any parameter by clicking the “pen” button.
    • Switch between one-sided, two-sided, or more complex thresholds with multiple color ranges.
    • Set up to 10 threshold points (a maximum of 11 color segments).
    • Select “Apply” to apply the changes.

Threshold manager settings

  • Template Management:
    • “Load” existing templates (your own, or templates shared by others).
    • Delete the loaded template with the “bin” button next to the load option.
    • Update and overwrite your templates using the “Save” button.
    • Use the “Save As” button to save your template as a new one, and enter its name in the “Preset name” field. Tick the box next to “Publish to Project” to save the template to your project and make it accessible to other users.

Threshold manager save as settings

  • Reset Option: You can untick and clear all thresholds at any time with “Reset all”.
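To illustrate how threshold points relate to color segments, the Python sketch below places a value into one of the n + 1 segments created by n threshold points (one point gives a one-sided threshold, two points a two-sided one). It is a hypothetical illustration, not the application's code: the function name `segment_index` and the rule that a value lying exactly on a threshold point falls into the upper segment are assumptions.

```python
from bisect import bisect_right

MAX_THRESHOLD_POINTS = 10  # Threshold Manager limit: 10 points, 11 segments


def segment_index(value: float, thresholds: list[float]) -> int:
    """Return the 0-based color segment that `value` falls into.

    n threshold points split the value range into n + 1 segments: one point
    gives a one-sided threshold, two points a two-sided one. A value exactly
    on a point is placed in the segment above it (an assumption).
    """
    if len(thresholds) > MAX_THRESHOLD_POINTS:
        raise ValueError("at most 10 threshold points are supported")
    return bisect_right(sorted(thresholds), value)


print(segment_index(85.0, [70.0]))        # 1 (above a one-sided cutoff)
print(segment_index(50.0, [20.0, 80.0]))  # 1 (inside a two-sided range)
```

Mapping segment indices to colors (for example, green only for the middle segment of a two-sided threshold) is a separate step that this sketch leaves out.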

Thresholds for QC Metrics

Users must provide thresholds for a set of standard quality control (QC) parameters. These thresholds are dynamic and depend on the underlying data. They are pre-filled with the values defined in the Threshold Manager, which users can overwrite. The thresholds define acceptable data quality ranges; values that fall outside them are flagged in the report.

  • % Uniquely Mapped Reads (1 and optional 2nd value):
    • Minimum percentage of reads that map uniquely to the reference genome.
    • A second value can be provided to define a stricter or alternative cutoff.
  • % Protein Coding (1 and optional 2nd value):
    • Minimum proportion of reads mapping to protein-coding regions.
    • Ensures that sequencing captures biologically meaningful transcripts.
  • % Biotype ‘Mt_rRNA’ (1 and optional 2nd value):
    • Maximum acceptable proportion of reads mapping to mitochondrial rRNA.
    • High values may indicate contamination or poor library preparation.
  • % Biotype ‘lncRNA’ (1 and optional 2nd value):
    • Proportion of reads mapping to long non-coding RNAs.
    • Helps assess the balance of coding vs. non-coding transcript coverage.
  • Number of Input Reads (1 and optional 2nd value):
    • Minimum number of reads per sample required for downstream analysis.
    • A second threshold can define a stricter cutoff.
  • % Mapped to Exons (1 and optional 2nd value):
    • Proportion of reads mapping to exonic regions.
    • Indicates sequencing efficiency and relevance to functional transcriptome analysis.
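One way to picture how a first value and an optional second value could flag samples is the hypothetical sketch below. It assumes that the second value is a stricter cutoff producing a warning rather than a failure, and that each metric is either a minimum (e.g., % Uniquely Mapped Reads) or a maximum (e.g., % Biotype ‘Mt_rRNA’). The class and function names and the example cutoffs are made up for illustration and are not the report's actual defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricThreshold:
    """Hypothetical report threshold for one QC metric.

    primary: required cutoff (the first value).
    strict:  optional second, stricter cutoff (assumed semantics).
    minimum: True if higher values are better (e.g. % uniquely mapped reads),
             False if lower values are better (e.g. % Mt_rRNA).
    """
    primary: float
    strict: Optional[float] = None
    minimum: bool = True


def flag(value: float, t: MetricThreshold) -> str:
    """Return "fail", "warn" or "pass" for one sample value."""
    def meets(cutoff: float) -> bool:
        return value >= cutoff if t.minimum else value <= cutoff

    if not meets(t.primary):
        return "fail"  # outside the required range: flagged in the report
    if t.strict is not None and not meets(t.strict):
        return "warn"  # passes the first value but not the stricter second one
    return "pass"


uniquely_mapped = MetricThreshold(primary=60.0, strict=80.0)
mt_rrna = MetricThreshold(primary=10.0, strict=5.0, minimum=False)
print(flag(72.5, uniquely_mapped))  # warn
print(flag(3.1, mt_rrna))  # pass
```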

Overview tab


The Overview Tab provides a comprehensive summary of the selected studies and their quality control (QC) results. It is designed to give users an immediate understanding of the dataset’s quality, enabling quick detection of potential issues before proceeding with further, in-depth analysis.

Details

General Study Information

The first section of the Overview Tab displays key metadata for the selected study or studies, including:

  • Study Name(s)
  • Platform and Protocol
  • Number of Plates and Number of Samples (displays actual counts for filtered data, e.g., “x of y selected”)
  • Date Added (Usually, the date when pre-processing was completed; if unavailable, the integration date is shown)

Information about the pre-processing is also displayed, including:

  • Aligner version
  • Genome GTF file name
  • Genome fasta file name
  • Samtools version
  • Gene Body Coverage version
  • Read Distribution version

The system supports multiple studies from the same platform. When multiple studies are selected, all relevant details are displayed in the same section, with the ability to link each study to its corresponding plates and samples.

General Indicators

General Indicators summarized view

  1. Overall Summary (Pie Chart)
    • Provides a strict summary of QC status across all samples and QC steps combined.
    • If a sample fails any QC step, it is classified as “Out of Target” in the overall summary.
    • The pie chart displays the total number of samples in the center, and hovering over individual sections reveals sample counts and percentages.
    • Below the pie chart, an “Exclude critical samples” option lets users exclude critical samples from further analysis. This option is available to the Data Manager (who can add data and sample table information) and the Project Lead (who can assign Data Managers and add/remove users from a project).
  2. Summarized View (Bar Charts per QC Step)
    • Each bar chart represents a QC step (e.g. Sequencing, Alignment).
    • Samples are categorized by threshold level and displayed on a color scale (up to 10 levels, from red → green). Note: each sample is always assigned the lowest level it reaches across the parameters of that QC step.
    • Hovering over a category provides additional details, including sample counts and percentages.

General Indicators detailed view
  3. Detailed View (Bar Chart per Parameter)
    • On toggling the “Show Details” slider, the individual QC parameters selected in the Threshold Manager are displayed.
    • Parameters are grouped under their respective QC steps.
    • Uses the same color-coding and interaction principles as the summarized view.
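The strict aggregation rules above can be summarized in a short sketch. It assumes that threshold levels are integers with 0 as the worst (red) level, that a QC step takes the lowest level of its parameters, and that a sample counts as “Out of Target” overall as soon as any step falls below a chosen on-target level. The names `step_level`, `overall_status`, and `on_target_level` are hypothetical.

```python
def step_level(param_levels: dict[str, int]) -> int:
    """A QC step takes the lowest (worst) level among its parameters."""
    return min(param_levels.values())


def overall_status(step_levels: dict[str, int], on_target_level: int) -> str:
    """Strict summary: any step below on_target_level fails the sample."""
    if all(level >= on_target_level for level in step_levels.values()):
        return "On Target"
    return "Out of Target"


# Levels for one sample: 0 = red (worst) ... 2 = green (best), assumed scale.
sample = {
    "Sequencing": {"Q30": 2, "Input reads": 1},
    "Alignment": {"% Uniquely mapped": 2},
}
steps = {step: step_level(params) for step, params in sample.items()}
print(steps)  # {'Sequencing': 1, 'Alignment': 2}
print(overall_status(steps, on_target_level=2))  # Out of Target
```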

Threshold Legend

The color legend dynamically adapts to the thresholds defined in the Threshold Manager:

  • Two groups: Red = Off Target, Green = On Target
  • More than two groups: a color gradient from Red through Yellow to Green is used, giving a gradual Red → Green scale for more granular categorization of up to ten groups.
  • If no threshold has been defined for a specific QC step, the corresponding section will appear greyed out for clarity.
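A legend like this can be generated by interpolating between color anchors. The sketch below returns one hex color per group on a red → yellow → green scale, so two groups give red and green and three groups give red, yellow, and green. The specific RGB values and the linear interpolation are assumptions for illustration, not the application's actual palette.

```python
def legend_colors(n_groups: int) -> list[str]:
    """Return n_groups hex colors on a red -> yellow -> green scale.

    The RGB anchors are illustrative assumptions, not the app's palette.
    """
    if n_groups < 2:
        raise ValueError("a legend needs at least two groups")
    red, yellow, green = (220, 50, 47), (230, 200, 40), (40, 160, 70)

    def lerp(a, b, t):
        # Linear interpolation between two RGB tuples, t in [0, 1].
        return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

    colors = []
    for i in range(n_groups):
        pos = i / (n_groups - 1)  # 0.0 = worst group (red), 1.0 = best (green)
        if pos <= 0.5:
            rgb = lerp(red, yellow, pos * 2)
        else:
            rgb = lerp(yellow, green, (pos - 0.5) * 2)
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


print(legend_colors(2))  # ['#dc322f', '#28a046']
print(legend_colors(3))  # ['#dc322f', '#e6c828', '#28a046']
```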

Critical/Highlighted Samples

The heatmap provides a detailed view of sample performance across all QC parameters:

Critical/Highlighted Samples plot

See the Plot Navigation panel for more information on how to navigate the plot.

Sequencing tab


The Sequencing tab provides visual and tabular representations of sequencing quality metrics, helping users quickly assess data integrity and detect potential issues.

Sequencing plots

See the Plot Navigation panel for more information on how to navigate the plot.

Sequencing Stats Table

Below the plots, a table view provides detailed QC values for all parameters in the “Sequencing” tab. Currently, it provides the following data:

Sequencing stats table

See the Stats Table panel for more information on its features.

The following columns are unique to the “Sequencing” tab:

  • Q30 value
  • Link to the FastQC report (HTML file generated during pre-processing)
  • Key identifiers for proper sample linkage: Study, Sample ID, SeqFile (FASTQ file name)
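Assuming the Q30 value follows its common definition (the percentage of base calls with a Phred quality score of at least 30), it could be computed from FASTQ quality strings as in the sketch below. The Phred+33 offset (Sanger/Illumina 1.8+ encoding) and the helper names `quality_lines` and `q30_percent` are assumptions; this is not the pre-processing pipeline's implementation.

```python
from typing import Iterable, Iterator


def quality_lines(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of every 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def q30_percent(qualities: Iterable[str], offset: int = 33) -> float:
    """Percentage of base calls with Phred quality >= 30 (Phred+33 assumed)."""
    total = 0
    passing = 0
    for qual in qualities:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                passing += 1
    return 100.0 * passing / total if total else 0.0


record = ["@read1\n", "ACGT\n", "+\n", "II55\n"]
print(q30_percent(quality_lines(record)))  # 50.0
```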

Alignments tab


The Alignment tab provides visual and tabular representations of alignment statistics for your selected samples. This helps you evaluate mapping quality and detect potential issues.

Alignment Plots panel

This section displays alignment-related parameters as plots. These plots show the distribution of the selected statistics across all samples or FASTQ files. If you work with a multiplexing protocol such as ScreenSeq, you can view the statistics at sample level as well as at file level. Hovering over a data point displays the exact value for that sample.

Alignment plots

See the Plot Navigation panel for more information on how to navigate the plot.

Alignment Stats Table

See the Stats Table panel for more information on its features.

Alignment stats table

Read distribution tab


The Read Distribution tab provides visual and tabular insights into how reads are distributed across different genomic regions (e.g., exonic, intronic, intergenic regions) for the selected samples (or FASTQ files, in the case of a multiplexing protocol such as ScreenSeq). This helps you assess sequencing quality and detect potential biases in library preparation or alignment.

Read Distribution Plots panel

This section displays plots for parameters belonging to the “Read Distribution” tab. Each parameter is plotted separately to ensure proper threshold visualization.

See the Plot Navigation panel for more information on how to navigate the plot.

In addition to the plot navigation options, a Switch Data Type option lets users toggle the same plot between two views:

  • Absolute Value – Displays absolute counts.

Read distribution plot absolute value

  • Relative Value – Displays proportional values (e.g., % of reads per category).

Read distribution plot relative value

The plots also offer:

  • Legend: Explains metadata groupings and threshold categories (shown on hover).

  • Hover Tooltips: Show exact parameter values for each sample.
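The relationship between the two data types can be sketched as a simple conversion from absolute counts to percentages. This minimal Python example assumes that the denominator is the sum of the displayed categories; the application may use a different denominator, and the function name is hypothetical.

```python
def to_relative(counts: dict[str, int]) -> dict[str, float]:
    """Convert absolute read counts per region to % of reads per category.

    Assumes the denominator is the sum of the listed categories.
    """
    total = sum(counts.values())
    if total == 0:
        return {region: 0.0 for region in counts}
    return {region: 100.0 * n / total for region, n in counts.items()}


absolute = {"exonic": 750, "intronic": 200, "intergenic": 50}
print(to_relative(absolute))  # {'exonic': 75.0, 'intronic': 20.0, 'intergenic': 5.0}
```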

Read Distribution Table

See the Stats Table panel for more information on its features.

Read distribution stats table

Gene body coverage tab


The Gene Body Coverage tab provides a visual representation of the read coverage across the entire length of genes, helping users assess uniformity in RNA-Seq data. This analysis is essential for detecting 5’ or 3’ biases that may indicate RNA degradation or library preparation issues.

Gene Body Coverage Plots panel

This section displays a line plot showing the normalized coverage of sequencing reads along the length of the gene for each FASTQ file.
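As a sketch of what “normalized coverage” typically means: coverage is summed in a fixed number of bins along each gene body (100 percentile bins are a common choice, as in RSeQC's geneBody_coverage), ordered from the 5' to the 3' end, and then scaled to a common range so that files with different sequencing depths can share one plot. The min-max scaling to 0..1 shown here, the bin orientation, and the function name are assumptions about this application.

```python
def normalize_coverage(coverage: list[float]) -> list[float]:
    """Min-max scale per-bin gene body coverage to the range 0..1.

    `coverage` holds summed read coverage per percentile bin along the gene
    body, from 5' (first bin) to 3' (last bin).
    """
    lo, hi = min(coverage), max(coverage)
    if hi == lo:
        # Perfectly flat coverage: no shape to scale, return zeros.
        return [0.0] * len(coverage)
    return [(c - lo) / (hi - lo) for c in coverage]


print(normalize_coverage([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
```

After scaling, a curve that rises steadily toward the last bins points to the kind of 3' bias described above.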

Gene body coverage plot

See the Plot Navigation panel for more information on how to navigate the plot.

Biotype tab


The Biotype tab provides an overview of the distribution of different RNA biotypes within your samples, allowing you to assess sample composition and identify any unexpected biases.
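Biotype percentages such as % Protein Coding, % Biotype ‘Mt_rRNA’, and % Biotype ‘lncRNA’ can be understood as per-biotype shares of counted reads. The sketch below computes such shares from per-gene counts and a gene-to-biotype lookup (which could, for instance, be taken from the GTF `gene_biotype` or `gene_type` attribute). The input format, the gene names, and the function name are hypothetical.

```python
from collections import defaultdict


def biotype_percentages(gene_counts: dict[str, int],
                        gene_biotype: dict[str, str]) -> dict[str, float]:
    """Sum reads per biotype and express each as % of all counted reads.

    Genes missing from the biotype lookup are grouped under "unknown".
    """
    totals = defaultdict(int)
    for gene, count in gene_counts.items():
        totals[gene_biotype.get(gene, "unknown")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {bt: 100.0 * n / grand_total for bt, n in totals.items()}


counts = {"GENE_A": 800, "GENE_B": 150, "GENE_C": 50}
biotypes = {"GENE_A": "protein_coding", "GENE_B": "lncRNA", "GENE_C": "Mt_rRNA"}
print(biotype_percentages(counts, biotypes))
# {'protein_coding': 80.0, 'lncRNA': 15.0, 'Mt_rRNA': 5.0}
```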

Biotype Plots panel

This section displays interactive plots for visualizing QC parameters related to RNA biotype distribution.

See the Plot Navigation panel for more information on how to navigate the plot.

Biotype plot

Biotype Stats Table

Biotype stats table

See the Stats Table panel for more information on its features.

Mitochondrial tab


The Mitochondrial tab provides a visual overview of the proportion of reads mapped to mitochondrial genes, non-mitochondrial genes, and spike-in transcripts. This allows users to assess sample quality and identify potential technical artifacts or contamination.
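A minimal sketch of the three proportions, assuming mitochondrial genes can be recognized by an “MT-” symbol prefix (as in human gene symbols; mouse “mt-” symbols are matched by comparing case-insensitively) and spike-ins by an “ERCC-” prefix. These prefixes and the function name are hypothetical and depend on the annotation and spike-in kit actually used.

```python
def mito_fractions(gene_counts: dict[str, int],
                   mito_prefix: str = "MT-",
                   spike_prefix: str = "ERCC-") -> dict[str, float]:
    """Split counted reads into mitochondrial, non-mitochondrial and spike-in %.

    Prefixes are compared case-insensitively (so mouse "mt-Co1" also matches
    "MT-"); both default prefixes are assumptions about the annotation.
    """
    groups = {"mitochondrial": 0, "non_mitochondrial": 0, "spike_in": 0}
    for gene, n in gene_counts.items():
        name = gene.upper()
        if name.startswith(spike_prefix.upper()):
            groups["spike_in"] += n
        elif name.startswith(mito_prefix.upper()):
            groups["mitochondrial"] += n
        else:
            groups["non_mitochondrial"] += n
    total = sum(groups.values())
    if total == 0:
        return {group: 0.0 for group in groups}
    return {group: 100.0 * n / total for group, n in groups.items()}


counts = {"MT-CO1": 300, "ACTB": 600, "ERCC-00002": 100}
print(mito_fractions(counts))
# {'mitochondrial': 30.0, 'non_mitochondrial': 60.0, 'spike_in': 10.0}
```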

Mitochondrial Plots panel

See the Plot Navigation panel for more information on how to navigate the plot.

Mitochondrial plot

Mitochondrial Stats Table

Mitochondrial stats table

See the Stats Table panel for more information on its features.

Plates tab


The Plates tab uses a heatmap to show how your data are distributed across the plate(s), enabling quick assessment of sample distribution, QC metrics, and metadata across multiple plates. Well information must be available for this view.

The heatmap represents the plate layout:

  • Y-axis: Plate rows
  • X-axis: Plate columns

Each cell corresponds to a sample, and coloring and/or text provides additional information.
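The plate layout can be reproduced from well identifiers. The sketch below assumes wells are named with a row letter and a column number (e.g., “A01”, “H12”) and lays per-well values out as a rows × columns grid, defaulting to a 96-well plate (8 × 12; a 384-well plate would be 16 × 24). The names `well_to_index` and `plate_matrix` are hypothetical, and wells without data are simply left as `None`.

```python
import re

WELL_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)$")


def well_to_index(well: str) -> tuple[int, int]:
    """Convert a well ID such as "A01" or "H12" to 0-based (row, column)."""
    match = WELL_PATTERN.match(well.strip())
    if match is None:
        raise ValueError(f"not a well ID: {well!r}")
    letters, digits = match.groups()
    row = 0
    for ch in letters.upper():  # A..Z, then AA, AB, ... (base-26 style)
        row = row * 26 + (ord(ch) - ord("A") + 1)
    return row - 1, int(digits) - 1


def plate_matrix(values: dict[str, float], n_rows: int = 8, n_cols: int = 12):
    """Lay per-well values out as rows (Y-axis) x columns (X-axis).

    Defaults to a 96-well plate; wells without a value stay None.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for well, value in values.items():
        row, col = well_to_index(well)
        grid[row][col] = value
    return grid


grid = plate_matrix({"A01": 1.0, "B03": 2.5})
print(grid[1][2])  # 2.5
```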

The following options are available to users for adjusting the visualization:

  • Plates: Select the number of plates to display. At the top right of the panel, there is an option to download the plate count data in Excel format.

Plates plot example

  • Color: Provides a list of variables by which to color your plates for analysis.

The “Color by” option lets users choose from the following variables:

  1. Sample metadata: Select any available metadata variable from the Select Metadata list, such as “Study”, “SampleID”, or “Species”, and filter the display accordingly.

Plates plot sample metadata

  2. Feature Intensities: Display the counts of selected features.

Plates plot feature intensities

  3. QC Statistics: All QC parameters available at well level can be visualized.

Plates plot qc statistics

  • Display Text Labels: Edit the text labels using the following options:
    • “None”: no text labels are shown.
    • “Same as color”: the text labels follow the same parameter as the colors.
    • “Add another parameter”: the text labels follow a different parameter, which you can choose by following the same instructions as for color.

See the Plot Navigation panel for more information on how to navigate the plot.