Importing a Nextflow Pipeline into Quark

This guide provides step-by-step instructions for onboarding a Nextflow pipeline.

It is designed to help bioinformaticians configure the technical architecture and expose a user-friendly interface on Quark for the rest of the team.


Overview

Quark transforms command-line Nextflow pipelines into accessible applications by mapping your Git repository to Quark.

This walkthrough uses nf-core/bactmap as a reference, though the flow applies to any Nextflow-based repository.

Before You Start

Ensure you have the following components ready:

  • Repository Access: HTTPS URL for public repos; SSH URL for private repos.
  • Revision: The specific branch, tag, commit SHA, or HEAD you want to deploy.
  • Entry Point: The primary execution file (usually main.nf).
  • Reference Datasets: Any static data (e.g., GRCh38, adapter lists) must already exist in Quark.
  • User Inputs: A list of variables from your nextflow.config that should be exposed as UI fields.

For private repos, Quark must have clone access. If an import fails at the clone stage, verify that your Git credentials or SSH keys are correctly configured within Quark.
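As a concrete sketch, the User Inputs you list typically correspond to entries in your `nextflow.config` like these (parameter names below are illustrative, not taken from nf-core/bactmap):

```groovy
// Illustrative nextflow.config fragment; each entry below could be
// exposed as a UI field during import. Names are hypothetical.
params {
    input         = null   // sample sheet (CSV): File parameter
    reference     = null   // reference FASTA: File parameter
    min_depth     = 10     // Integer parameter
    skip_trimming = false  // Boolean toggle
}
```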


Step-by-Step Instructions

Start the Import

  1. Navigate to the My Pipelines tab on your dashboard.
  2. Click the Import Pipeline button in the top-right corner.
  3. In the Select Pipeline Type window, choose Nextflow and click Continue.


Step 1: General Pipeline Details

Define how the pipeline is identified and categorized for your team.

  • Name: Provide a short, unique name for the pipeline.
  • Summary: Provide a one-sentence description of the biological objective.

  • Badge: Defaults to "Nextflow."
  • About (Optional): Upload a Markdown file for a longer overview.
    • Bioinformatician Tip: Use this file to list specific container versions, tool citations, and expected output structures, helping wet-lab users interpret their data.

  • Category: Choose the appropriate match (e.g., Genomics, Transcriptomics).

  • Tags: Add metadata tags used to filter pipelines. Provide at least one key/value pair (e.g., pipeline: nf-core/bactmap, organism: bacteria).

  • Review and click Next.

Step 2: Pipeline Source

Configure where Quark clones the code from.

  • Repository: A short label for this specific import (e.g., bactmap-v1).
  • Source: Defaults to Git.
  • Repository URL:
    • Public: https://github.com/my-org/my-repo.git
    • Private: git@github.com:my-org/my-repo.git

  • Revision: Set to HEAD, a branch name, a tag (e.g., 1.0.0), or a specific commit SHA.
  • Entry Point: The file Nextflow runs first (commonly main.nf).

  • Nextflow Version: Select the version required by your pipeline. Matching your local development version is recommended for consistency. Ensure the selected version is compatible with your pipeline's syntax (e.g., ensuring DSL2 support if using recent nf-core templates).

  • Review and click Next.
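For reference, the Entry Point is simply the script Nextflow executes first. A minimal DSL2 `main.nf` sketch (the module and process names are placeholders, not part of any real pipeline):

```groovy
// Illustrative DSL2 entry point; ALIGN is a hypothetical process.
nextflow.enable.dsl = 2

include { ALIGN } from './modules/align'   // hypothetical module path

workflow {
    reads_ch = Channel.fromPath(params.input)
    ALIGN(reads_ch)
}
```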

Step 3: Mount Required Datasets

Mounting is used for stable, pre-existing datasets that the pipeline requires for every run. You can connect either object-store (S3/Azure/GCP) or filesystem (NFS/Lustre) datasets to your pipeline.

  • Click Add New Dataset Mount.

  • Search for and select the required dataset (e.g., refdata-grch38, or an object-store dataset such as refdata-1).

  • Repeat for all necessary files by clicking Add New Dataset Mount.
  • If you select a filesystem dataset, you will be prompted to add one or more directories of the dataset as mount paths. Ensure these paths match the hard-coded paths in your Nextflow scripts if you aren't using parameters for reference files.

  • Once all datasets required for the pipeline are mounted, click Next.
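If your scripts hard-code reference paths rather than exposing them as parameters, those paths must match the dataset's mount point. A hypothetical sketch, assuming refdata-1 is mounted at `/mnt/refdata-1`:

```groovy
// Hypothetical: this path must match the mount point configured in Quark.
params.reference = '/mnt/refdata-1/GRCh38/genome.fasta'
```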

Step 4: Define Pipeline Parameters

Map your Nextflow parameters to UI elements: for every params.<name> in your code, create a corresponding field. This creates the form users will fill out at runtime.

For each parameter, define:

  • Name: Must match the variable name in your Nextflow script.
  • Type: String, Integer, Float, Boolean, or File.

| Type | Technical Behavior | Use Case |
| --- | --- | --- |
| Boolean | Maps to a binary flag. In Nextflow, "On" passes `--param true` and "Off" passes `--param false`. | `--skip_trimming`, `--save_intermediates` |
| String | Passes a text string. | Specific genome IDs or sample names |
| Integer/Float | Passes numeric values. Quark validates that the input is a number before launching. | `--max_cpus`, `--min_depth`, `--threshold` |
| File | Handles the staging of data into the Nextflow work directory. | FASTQ files, sample sheets (CSV) |

  • Help Text: Write a clear prompt to guide the user (e.g., "Enter the minimum depth for variant calling").
  • Check Optional Field, or Hide Field as required for the parameter.
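The mapping from UI values to command-line flags described above can be sketched as follows. This is an illustration of the behavior, not Quark's actual implementation, and `to_cli_args` is a hypothetical helper:

```python
def to_cli_args(params: dict) -> list[str]:
    """Illustrative mapping of UI parameter values to Nextflow CLI flags."""
    args = []
    for name, value in params.items():
        if isinstance(value, bool):
            # Booleans map to a binary flag: --param true / --param false
            args += [f"--{name}", str(value).lower()]
        else:
            # Strings, numbers, and file paths are passed through as text
            args += [f"--{name}", str(value)]
    return args

print(to_cli_args({"skip_trimming": True, "min_depth": 10}))
# -> ['--skip_trimming', 'true', '--min_depth', '10']
```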

File Parameters: Upload vs. Mounted Data

If Type = File, choose how the user provides data:

  • Browse: The user uploads a file from their local machine. Set Supported File Types (e.g., .csv or .fastq.gz) for validation.

  • Directory Only: The user selects from a dataset already mounted to the pipeline in Quark.

Scalar Parameters: Input vs. Dropdown

For parameters with String, Float, or Integer Type, you will be prompted to specify a Field Type. Choose based on whether users should type their own value (Input) or select from a list of allowed values that you specify (Dropdown).

  • Input: The user types a value manually.
  • Dropdown: The user chooses from a restricted list of allowed values (prevents typos). For example, if a pipeline only supports BWA or Bowtie2, a dropdown prevents the user from entering an unsupported aligner.

  • Choose Boolean to create a simple toggle (e.g., --skip_trimming).
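The restriction a Dropdown enforces in the UI can also be guarded inside the pipeline itself. A hedged Nextflow sketch, assuming a hypothetical `aligner` parameter:

```groovy
// Illustrative guard in main.nf: fail fast on unsupported aligner values.
def allowed = ['bwa', 'bowtie2']
if (!allowed.contains(params.aligner)) {
    error "Unsupported aligner '${params.aligner}'. Allowed: ${allowed.join(', ')}"
}
```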

Example parameter inputs for nf-core/bactmap

  • For the nf-core/bactmap example, your first parameter may be:

    • Name: Input CSV
    • Type: File

  • Select Add New Parameter to add a reference parameter.

    • Name: Reference
    • Type: File
    • Browse or Directory Only: Directory Only
    • Dataset: Name of your attached dataset (e.g. refdata-1)
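With the two parameters above, the launch command Quark assembles would look roughly like this (file paths are illustrative):

```
nextflow run main.nf --input samples.csv --reference /mnt/refdata-1/reference.fasta
```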

Conditional Parameters (Optional)

Configure fields to appear only when a specific condition is met. (Use one parameter to enable/disable another parameter.)

  • Example: Only show the "Phred Score" input if the "Quality Trimming" (Boolean) toggle is enabled. Or, for instance, if a user selects Type: Paired-end, you can trigger a second File input field for Read 2.

  • Review all parameters and click Next.

Step 5: Advanced Validation (Env + Args)

Use this section to define how Quark validates the inputs before the run starts.

  • Env: Define environment variables required for validation. Set keys like NXF_DEBUG or specific API tokens required during the initialization phase.
  • Args: Map UI parameters to the argument names your validation logic expects.
  • Example: Map Input CSV (UI) → input (Nextflow).

  • Nextflow Config: Optionally paste a nextflow.config fragment that acts as a `-c` override. For example, set resource profiles (e.g., process.executor = 'awsbatch') or hard-code parameters you don't want the wet-lab team to change.
  • Review and click Next.
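For instance, an override fragment pasted into Nextflow Config might look like this (the values are illustrative):

```groovy
// Illustrative -c override: pin the executor and a parameter that
// wet-lab users should not change.
process.executor = 'awsbatch'
params.min_depth = 10
```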

Step 6: Visualization App

Attach a viewer so that wet-lab scientists can interpret results directly in Quark.

  • Click Add New Visualization App.

  • Choose the App Name (e.g., IGV for genomic alignment browsing).

  • Set a Display Name, which will become a tab on the results page or the "Vizapp" dashboard in Quark.


Step 7: Review and Submit

Check your settings for accuracy. Once satisfied, click Submit.

Your pipeline will now appear under My Pipelines.

Next Step: Version and Publish your pipeline.

Troubleshooting

| Problem | Cause | Resolution |
| --- | --- | --- |
| Import fails at "Clone" | Authentication | Check the URL format. For private repos, ensure Quark's SSH key is added to your Git provider. |
| Pipeline fails immediately | Entry point / version | Confirm `main.nf` is in the repository root and that the selected Nextflow version is compatible with your code. |
| Input data not found | Pathing / mounts | Ensure "Directory Only" parameters point to the correct mounted dataset. |
| Parameters missing in UI | UI configuration | Check whether "Hide Field" is toggled or a "Conditional Parameter" rule is hiding the field. |
| Validation failure | Arg mapping | Ensure the Args names exactly match the `params` names in your `nextflow.config`. |