Dataset Analysis using Pipelines

Overview

Quark gives researchers access to pre-configured bioinformatics pipelines that can be discovered and launched directly from the platform. These pipelines cover multi-omics analysis workflows, including DNA sequencing, RNA sequencing, single-cell RNA (scRNA) sequencing, and protein folding.

On Quark, a scientific pipeline is a containerised workflow of bioinformatics tools for multi-omics data analysis. Every tool needed for a given analysis is chained together and packaged within the pipeline, so researchers can run complex workflows without managing infrastructure or software dependencies.

Pipelines are accessible through the Launchpad, which serves as a centralised hub for discovering, configuring, and executing analysis workflows.

Prerequisites

Before running a pipeline, ensure you have:

Logged in to your Trusted Research Environment (TRE) with your registered credentials.
Uploaded the required input data files (e.g., sample datasheets, FASTQ files). See Dataset Analysis using Workstations — Managing Workstation Files for guidance on file management.
Reviewed the pipeline's input requirements and supported file formats.
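
Part of the format review in the last prerequisite can be automated locally before upload. The following is a minimal Python sketch, assuming a CSV sample sheet with the column layout used by nf-core/rnaseq (sample, fastq_1, fastq_2, strandedness). The columns a given Quark pipeline actually expects are listed on its About tab and may differ. The function name validate_samplesheet and the file names are illustrative only.

```python
import csv
import io

# Assumed column layout, modelled on the nf-core/rnaseq sample sheet.
# The pipeline's About tab is the authority on the real columns.
REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2", "strandedness")
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def validate_samplesheet(handle):
    """Return a list of human-readable problems found in a CSV sample sheet."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row["sample"] or "").strip():
            problems.append(f"line {line_no}: empty sample name")
        fastq_1 = (row["fastq_1"] or "").strip()
        if not fastq_1.endswith(FASTQ_SUFFIXES):
            problems.append(f"line {line_no}: fastq_1 is not a gzipped FASTQ path")
        fastq_2 = (row["fastq_2"] or "").strip()  # empty for single-end data
        if fastq_2 and not fastq_2.endswith(FASTQ_SUFFIXES):
            problems.append(f"line {line_no}: fastq_2 is not a gzipped FASTQ path")
    return problems


if __name__ == "__main__":
    # Illustrative sheet: the second sample points at a non-FASTQ file.
    example = io.StringIO(
        "sample,fastq_1,fastq_2,strandedness\n"
        "ctrl_rep1,ctrl_rep1_R1.fastq.gz,ctrl_rep1_R2.fastq.gz,auto\n"
        "treated_rep1,treated_rep1_R1.txt,,auto\n"
    )
    for problem in validate_samplesheet(example):
        print(problem)
```

An empty list means the sheet passed these checks. It does not guarantee that the pipeline will accept the sheet.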

Discovering Pipelines on the Launchpad

The Launchpad displays all available pipelines that can be run on the platform. To access the Launchpad:

Select Pipelines from the Navigation Menu on the left.
Click on the Launchpad tab.

Pipelines Launchpad listing available workflows

Searching for a Pipeline

Use the Search bar at the top of the Launchpad to find a specific pipeline by name or keyword. For example:

Search for sarek or dna to find genomics pipelines for DNA Sequencing analysis.
Search for rnaseq to find RNA Sequencing pipelines.

Filtering Pipelines

Researchers can narrow the list of available pipelines using the Filter options:

Type — Filter by pipeline framework (e.g., nf-core, AWS HealthOmics).
Category — Filter by multi-omics workflow category (e.g., Genomics, Proteomics, Transcriptomics).

Sorting Results

Use the Sort By drop-down to order the pipeline results by:

Name — Alphabetical order.
Last Release — Most recent version first.
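
The way the three controls combine can be pictured as search, then filter, then sort over the pipeline list. The Python sketch below is a toy model only: PipelineEntry, narrow, the keyword lists, the example-proteomics entry, and the release dates are hypothetical placeholders. Nothing here reflects Quark's actual implementation or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PipelineEntry:
    """Hypothetical stand-in for one Launchpad entry; not a Quark data model."""
    name: str
    framework: str       # the "Type" filter, e.g. "nf-core"
    category: str        # the "Category" filter, e.g. "Genomics"
    last_release: date   # placeholder dates, not real release dates
    keywords: tuple = ()


def narrow(entries, keyword="", type_=None, category=None, sort_by="name"):
    """Apply search, then filters, then sorting, mirroring the Launchpad controls."""
    keyword = keyword.lower()
    hits = [
        e for e in entries
        if (keyword in e.name.lower()
            or any(keyword in k.lower() for k in e.keywords))
        and (type_ is None or e.framework == type_)
        and (category is None or e.category == category)
    ]
    if sort_by == "last_release":
        # Most recent release first.
        return sorted(hits, key=lambda e: e.last_release, reverse=True)
    # Default: alphabetical by name.
    return sorted(hits, key=lambda e: e.name.lower())


# Made-up catalogue for illustration only.
CATALOGUE = [
    PipelineEntry("sarek", "nf-core", "Genomics", date(2024, 1, 1), ("dna",)),
    PipelineEntry("rnaseq", "nf-core", "Transcriptomics", date(2024, 2, 1), ("rna",)),
    PipelineEntry("example-proteomics", "AWS HealthOmics", "Proteomics", date(2024, 3, 1)),
]

if __name__ == "__main__":
    for entry in narrow(CATALOGUE, keyword="dna"):
        print(entry.name)
    for entry in narrow(CATALOGUE, type_="nf-core", sort_by="last_release"):
        print(entry.name, entry.last_release)
```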

Running a Pipeline

Once you have identified the pipeline you wish to run, follow these steps to configure and execute it.

Step 1: Select the Pipeline

From the Launchpad, click on the pipeline you wish to run.
Click the Run button. This opens the pipeline's configuration dashboard.

Step 2: Review Pipeline Requirements

The pipeline dashboard opens on the About tab, which provides:

A description of the pipeline and its intended use.
A list of all Input Parameters or fields required to run the pipeline.
The types of input files supported and their prescribed formats.

Pipeline About page with input requirements

Important: Review the About tab carefully before proceeding. Ensure your input data files are in the correct format as specified by the pipeline.
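
For FASTQ inputs, a quick local sanity check can catch truncated or malformed files before they are uploaded. The sketch below assumes the common four-line FASTQ layout (header starting with @, sequence, separator starting with +, quality string of the same length as the sequence) and gzip compression. It does not handle multi-line FASTQ records, and it does not replace the format rules on the About tab. The function names are illustrative.

```python
import gzip
import io
from itertools import islice


def check_fastq_records(handle, max_records=1000):
    """Check the first records of a text-mode FASTQ stream; return problems found."""
    problems = []
    for index in range(max_records):
        # Read the next four lines, which should form one record.
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            break  # clean end of file
        if len(record) < 4:
            problems.append(f"record {index + 1}: truncated (expected 4 lines)")
            break
        header, sequence, separator, quality = record
        if not header.startswith("@"):
            problems.append(f"record {index + 1}: header does not start with '@'")
        if not separator.startswith("+"):
            problems.append(f"record {index + 1}: third line does not start with '+'")
        if len(sequence) != len(quality):
            problems.append(f"record {index + 1}: sequence and quality lengths differ")
    return problems


def check_fastq_gz(path, max_records=1000):
    """Open a gzip-compressed FASTQ file and check its first records."""
    with gzip.open(path, "rt") as handle:
        return check_fastq_records(handle, max_records)


if __name__ == "__main__":
    # In-memory example: the second record has a short quality string.
    example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nIII\n")
    for problem in check_fastq_records(example):
        print(problem)
```

Only the first max_records records are inspected, which keeps the check fast on large files but can miss corruption later in the file.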

Step 3: Configure Input Parameters

Navigate to the Run Pipeline tab.
Enter a name for this pipeline run.
Upload your sample datasheets and input data files in the format prescribed on the About tab.
Configure additional input parameters from the respective drop-down menus (e.g., reference genome, analysis options).

Pipeline Run configuration with input parameters

Step 4: Review and Execute

Click Review to verify all configured input parameters, including uploaded files, input settings, and output configurations.
Confirm that all parameters are correct.
Click Run to execute the pipeline.

Pipeline review and run confirmation

The pipeline will begin execution. You can monitor its progress from the Runs tab.

Viewing Pipeline Results

Once a pipeline run is complete, researchers can retrieve and visualise the results.

Retrieving Run Results

Select Pipelines from the Navigation Menu and click the Runs tab.
Use the Search bar to find your pipeline run by name.
Check the Status column. Once the status is marked as Complete, click on the run to open its details.

Pipeline Runs tab with completed run

The run details dashboard displays the following tabs:

Tab	Description
Summary	Displays runtime, cost, and a link to View Samples under the Analytics sub-heading (available once the run is complete).
Results	Allows researchers to download individual output files (e.g., MultiQC reports, variant annotation files, variant calling files).
Input	Summarises the input parameters provided at the time of launching the run.
Log	Provides a timeline for each tool deployed during the pipeline's execution.
About	Gives an overview of the pipeline, its input requirements, and output formats.
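
Once output files have been downloaded from the Results tab, a short local summary can confirm that a file contains what was expected. The sketch below assumes the variant calling output is a standard VCF file (tab-separated text whose header lines start with #), optionally gzip-compressed. Whether a given pipeline produces VCF is stated on its About tab. The function names are illustrative.

```python
import gzip
from collections import Counter


def count_variants_by_chromosome(lines):
    """Count data records per CHROM in an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the column header
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated field
        counts[chrom] += 1
    return counts


def count_variants_in_file(path):
    """Count variants in a plain or gzip-compressed VCF file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_variants_by_chromosome(handle)


if __name__ == "__main__":
    # Tiny in-memory VCF for illustration; the records are made up.
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
        "chr1\t200\t.\tC\tT\t60\tPASS\t.\n",
        "chr2\t300\t.\tG\tA\t55\tPASS\t.\n",
    ]
    for chrom, total in sorted(count_variants_by_chromosome(example).items()):
        print(chrom, total)
```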

Visualising Results with Vizapp

Vizapp is Quark's interface for visualising the results of secondary data analyses. It requires no coding or bioinformatics expertise.

To access Vizapp:

Navigate to the Summary tab of your completed pipeline run.
Click the View Samples link under the Analytics sub-heading. This opens the Vizapp dashboard.

Pipeline Summary with View Samples link

The Vizapp dashboard includes:

Reports — Links directly to the pipeline's workflow management system report (e.g., a Nextflow report) with details on resource allocation, project directory path, pipeline version, and other run logistics.
MultiQC — Aggregates all samples and associated metadata, enabling data retrieval and quality assessment across all samples in the run.

Vizapp dashboard with Reports and MultiQC

What's Next

Download your pipeline output files — See Downloading Data Results.
Perform further analysis in a workstation — See Dataset Analysis using Workstations.