
DNA Sarek Run

Step 1: Launch Quark

Open quark.invisibl.io and click on the Quark button, which opens a new tab.

Quark Page

The landing page will display a Dashboard that gives statistics about jobs run on the platform and their respective costs.

Landing Page

Click on the Launchpad tab displayed next to Runs.

Launchpad

Step 2: Access the Sarek Pipeline

The Launchpad page will show all templates available on the Quark platform.

Launchpad

Navigate to the Search bar and type the name of the required pipeline, e.g. sarek.

Sarek Run

Step 3: Select Pipeline Version

The UI will show the Sarek pipelines.

Template

Select the required pipeline version by clicking the small yellow box at the top right corner of the search result.

Quark Page

Select version 3.5.1 and click Run.

Step 4: Fill Run Parameters

A blank template loads on the right side of the page.

Template

Fill in the fields specified in the template. Parameters marked Mandatory below are required to start the job run.

Example of a Blank Template:

Template

Name (Mandatory): <Job run Name>

Parameters:

  • Input CSV (Mandatory): Takes a CSV file as input. The samplesheet must follow the nf-core format. Select Files -> My files -> directory containing the samplesheet -> Samplesheet.csv

    More details may be found here: Sarek: Usage

  • Tools (Mandatory): A comma-separated list of tools, filled in as required. Tools currently supported on Quark include vep, strelka, and haplotypecaller.

  • split_fastq (Mandatory): The number of reads in each split of a FastQ file (integer). Set to 0 to turn off splitting. The default value used is 10000000.
  • intervals (optional): Tick when using WES. Takes a target BED file (for whole exome or targeted sequencing) or an intervals file.
  • IGenome Reference (Mandatory): Select igenome-reference-refdata.
  • Genome (Mandatory): Select the reference genome.
  • Caches for annotation (optional):
    • Snpeff: Select the path from the UI.
    • vep: Select the top-level directory from vep_cache2. Do NOT select a specific path inside the top-level directory. The pipeline automatically detects the organism and build from the inputs.

Template Vep

  • enable analytics (optional): Select this to trigger analytics after the job run. If it is not selected, the data will not be indexed and cannot be used for cohort analysis.
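The Tools and split_fastq fields take plain text values, so it can help to assemble and check them before typing them into the template. The following is a minimal Python sketch and is not part of Quark: the helper names format_tools and check_split_fastq are hypothetical, and KNOWN_TOOLS contains only the three tools named in this guide (the platform may support more).

```python
# Hypothetical helpers for preparing template values; Quark does not provide these.
# Only the three tools named in this guide are listed here (an assumption).
KNOWN_TOOLS = {"vep", "strelka", "haplotypecaller"}


def format_tools(tools):
    """Return a comma-separated Tools value with no spaces or duplicates."""
    cleaned = []
    for tool in tools:
        name = tool.strip().lower()
        if not name:
            continue  # ignore empty entries
        if name not in KNOWN_TOOLS:
            raise ValueError(f"tool not listed in this guide: {name}")
        if name not in cleaned:
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one tool is required")
    return ",".join(cleaned)


def check_split_fastq(value):
    """Return the value if it is a non-negative integer (0 turns splitting off)."""
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError("split_fastq must be a non-negative integer")
    return value


if __name__ == "__main__":
    print(format_tools(["strelka", " VEP ", "haplotypecaller"]))
    print(check_split_fastq(10000000))
```

The printed Tools string can be pasted directly into the Tools field.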

An example of a filled template is shown below:

Template

Step 5: Review and Run the Sarek Pipeline

Click Review and review the parameters.

Review Page

To submit the Job, click Submit. If any changes are required, click Edit.

Creating a Samplesheet

The samplesheet can be created in any text editor or spreadsheet application (Notepad, Excel, Notepad++, vim, etc.), but the file must be saved with the .csv extension.

(For this documentation we will use MS Excel to easily visualize different columns.)

As per nf-core standards, some columns are mandatory in the CSV file.

In our current example, we are using a Normal vs Tumour sample pair.

| Patient | Sex | Status | Sample | Lane | fastq_1 | fastq_2 |
| --- | --- | --- | --- | --- | --- | --- |
| p1 | NA | 0 | HG008_N | lane_1 | s3://quark-demo-platform-data/artifacts/mpsdimamay-stlukes-com-ph/slmc/uploads/sarek/HG008-N_Illumina_R1.fastq.gz | s3://quark-demo-platform-data/artifacts/mpsdimamay-stlukes-com-ph/slmc/uploads/sarek/HG008-N_Illumina_R2.fastq.gz |
| p1 | NA | 1 | HG008_T | lane_1 | s3://quark-demo-platform-data/artifacts/mpsdimamay-stlukes-com-ph/slmc/uploads/sarek/HG008-T_Illumina_R1.fastq.gz | s3://quark-demo-platform-data/artifacts/mpsdimamay-stlukes-com-ph/slmc/uploads/sarek/HG008-T_Illumina_R2.fastq.gz |

  • Patient: This column should contain the patient ID. Since we are comparing Normal vs Tumour samples, the patient ID must be the same in both rows. (This is a requirement for Strelka.)
  • Sex: If the sex of the patient is known, it can be filled in as male or female. In the example above it is left as NA.
  • Status: This can be either 0 or 1. Normal = 0, Tumour = 1.
  • Sample: This column should contain an ID for each sample (HG008_N and HG008_T in the example above).
  • Lane: This column should contain the lane number from the sequencing experiment.
  • fastq_1: Location of the forward read file.
  • fastq_2: Location of the reverse read file.
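As an alternative to a spreadsheet, the same two-row samplesheet can be generated with a short script. This is a minimal sketch using Python's standard csv module. The bucket and prefix in BASE are hypothetical placeholders, and the header names are written in lowercase as in the nf-core/sarek usage documentation; adjust them if your setup expects a different form.

```python
import csv
import io

# Column order as described above; lowercase header names are an assumption
# based on the nf-core/sarek usage documentation.
COLUMNS = ["patient", "sex", "status", "sample", "lane", "fastq_1", "fastq_2"]

# Hypothetical bucket and prefix; replace with the paths copied from My Files.
BASE = "s3://example-bucket/uploads/sarek"

ROWS = [
    {"patient": "p1", "sex": "NA", "status": 0, "sample": "HG008_N",
     "lane": "lane_1",
     "fastq_1": f"{BASE}/HG008-N_Illumina_R1.fastq.gz",
     "fastq_2": f"{BASE}/HG008-N_Illumina_R2.fastq.gz"},
    {"patient": "p1", "sex": "NA", "status": 1, "sample": "HG008_T",
     "lane": "lane_1",
     "fastq_1": f"{BASE}/HG008-T_Illumina_R1.fastq.gz",
     "fastq_2": f"{BASE}/HG008-T_Illumina_R2.fastq.gz"},
]


def samplesheet_text(rows):
    """Return the samplesheet as CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(samplesheet_text(ROWS), end="")
```

To save the result, write the returned text to a file with the .csv extension, for example with open("samplesheet.csv", "w", newline="").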

Getting the location/S3 Path for the fastq files

  • Step 1: Click on “My Files” from the side bar.

My Files

  • Step 2: Navigate to the required directory/folder.

My Files

  • Step 3: Locate the file within the directory/folder, click the 3-dot icon, and then click Copy Path.

My Files

  • Step 4: Paste the copied path in the relevant column of the samplesheet file.

My Files
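A pasted path is easy to truncate or pad with stray spaces, so a quick check before saving can help. The sketch below assumes that copied paths are s3:// URIs ending in .fastq.gz or .fq.gz, as in the example samplesheet; the function name check_fastq_path is hypothetical and not part of Quark.

```python
from urllib.parse import urlparse


def check_fastq_path(path):
    """Return (bucket, key) for an s3:// FASTQ path, or raise ValueError."""
    path = path.strip()  # drop whitespace picked up while copying
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// path: {path!r}")
    key = parsed.path.lstrip("/")
    # Assumption: reads are gzip-compressed FASTQ files, as in the example.
    if not key.endswith((".fastq.gz", ".fq.gz")):
        raise ValueError(f"not a gzipped FASTQ file: {path!r}")
    return parsed.netloc, key


if __name__ == "__main__":
    # Hypothetical path used only for illustration.
    print(check_fastq_path(" s3://example-bucket/uploads/sample_R1.fastq.gz "))
```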

Once all the required columns have been filled in the samplesheet, save it as a .csv file.
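Before uploading, the saved file can be checked against the rules described in this section. The following is a minimal sketch, not an official validator: validate_samplesheet is a hypothetical name, header names are compared case-insensitively, and the tumour/normal pairing check reflects the Normal vs Tumour scenario used in this guide rather than a general pipeline rule.

```python
import csv
import io

REQUIRED = ["patient", "sex", "status", "sample", "lane", "fastq_1", "fastq_2"]


def validate_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = [name.strip().lower() for name in (reader.fieldnames or [])]
    problems = [f"missing column: {col}" for col in REQUIRED if col not in header]
    if problems:
        return problems

    statuses = {}  # patient ID -> set of status values seen
    # The header is line 1, so data rows start at line 2.
    for number, raw in enumerate(reader, start=2):
        row = {k.strip().lower(): (v or "").strip() for k, v in raw.items() if k}
        for col in ("patient", "sample", "lane", "fastq_1", "fastq_2"):
            if not row.get(col):
                problems.append(f"line {number}: {col} is empty")
        if row.get("status") not in {"0", "1"}:
            problems.append(f"line {number}: status must be 0 or 1")
        statuses.setdefault(row.get("patient", ""), set()).add(row.get("status"))

    # Scenario-specific check: every tumour (1) needs a normal (0) with the same patient ID.
    for patient, seen in statuses.items():
        if "1" in seen and "0" not in seen:
            problems.append(f"patient {patient}: tumour sample has no matching normal")
    return problems


if __name__ == "__main__":
    with open("samplesheet.csv", newline="") as handle:
        for problem in validate_samplesheet(handle.read()):
            print(problem)
```

An empty result means none of the checks above failed; it does not guarantee that the pipeline will accept the file.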