DNA Sarek Run
Step 1: Launch Quark
Open quark.invisibl.io and click on the Quark button, which opens a new tab.
The landing page will display a Dashboard that gives statistics about jobs run on the platform and their respective costs.
Click on the Launchpad tab displayed next to Runs.
Step 2: Access the Sarek Pipeline
The Launchpad page will show all templates available on the Quark platform.
Navigate to the Search bar and type the name of the required pipeline, e.g. sarek.
Step 3: Select Pipeline Version
The UI will show the Sarek pipelines.
Select the required pipeline version by clicking the small yellow box at the top right corner of the search result.
Select version 3.5.1 and click Run.
Step 4: Fill Run Parameters
A blank template loads on the right side of the page.
Fill in the fields specified in the template. Parameters marked Mandatory below are required to start the job run.
Example of a Blank Template:
Name (Mandatory): <Job run name>
Parameters:
- Input CSV (Mandatory): Takes a CSV file as input. The samplesheet format follows the nf-core specification. Select Files -> My files -> directory where the samplesheet is present -> Samplesheet.csv. More details may be found here: Sarek: Usage
- Tools (Mandatory) (fill as required): This should be a comma-separated list. Some of the tools currently supported on Quark include vep, strelka, haplotypecaller.
- split_fastq (Mandatory): Specify how many reads each split of a FastQ file contains (integer). Set 0 to turn off splitting. We use 10000000 as the default value.
- intervals (optional): Tick when using WES. It takes a target BED file (for whole exome or targeted sequencing) or an intervals file.
- IGenome Reference (Mandatory): Select the igenome-reference-refdata
- Genome (Mandatory): Select the reference genome
- Caches for annotation (optional)
  - Snpeff: Select the path from the UI.
  - vep: Select the top-level directory from vep_cache2. Do NOT select a specific path inside the top-level directory. The pipeline will automatically detect the organism and build, based on the inputs.
- enable analytics (optional): Select this if analytics should be triggered after the job run. If it is not selected, the data will not be indexed and cannot be used for cohort analysis.
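The Tools and split_fastq rules above (comma-separated tool names, a non-negative integer with 0 meaning no splitting) can be checked before filling in the form. The following is a minimal sketch only: `build_run_parameters` is a hypothetical helper, not part of Quark or Sarek, and the dictionary keys simply echo the field names used in this guide.

```python
# Hypothetical helper for illustration; not part of Quark or Sarek.
DEFAULT_SPLIT_FASTQ = 10_000_000  # default value quoted in this guide


def build_run_parameters(input_csv, tools, split_fastq=DEFAULT_SPLIT_FASTQ):
    """Return the three mandatory text/number fields after basic sanity checks."""
    if not input_csv.lower().endswith(".csv"):
        raise ValueError("Input CSV must have a .csv extension")

    # Normalise tool names and drop empty entries before joining.
    cleaned = [t.strip().lower() for t in tools if t.strip()]
    if not cleaned:
        raise ValueError("At least one tool is required")

    if isinstance(split_fastq, bool) or not isinstance(split_fastq, int) or split_fastq < 0:
        raise ValueError("split_fastq must be a non-negative integer (0 turns splitting off)")

    return {
        "input": input_csv,
        "tools": ",".join(cleaned),  # comma separated, no spaces
        "split_fastq": split_fastq,
    }


params = build_run_parameters("Samplesheet.csv", ["strelka", " VEP "])
print(params["tools"])
```

The value printed for `tools` is the string that would be typed into the Tools field.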
An example of a filled template is shown below:
Step 5: Review and Run the Sarek Pipeline
Click Review and review the parameters.
To submit the Job, click Submit. If any changes are required, click Edit.
Creating a Samplesheet
The samplesheet can be created in any text editor or spreadsheet application (Notepad, Excel, Notepad++, vim, etc.), but the file must be saved with the .csv extension.
(For this documentation we will use MS Excel to easily visualize different columns.)
As per nf-core standards, some columns are mandatory in the CSV file.
In our current example, we are using a Normal vs Tumour sample.
Patient | Sex | Status | Sample | Lane | fastq_1 | fastq_2 |
---|---|---|---|---|---|---|
p1 | NA | 0 | HG008_N | lane_1 | s3://quark-demo-platform-data/artifacts/mpsdimamay-stlukes-com-ph/slmc/uploads/sarek/HG008-N_Illumina_R1.fastq.gz | s3://quark-demo-platform-data/artifacts/mpsdimamay-stlukes-com-ph/slmc/uploads/sarek/HG008-N_Illumina_R2.fastq.gz |
p1 | NA | 1 | HG008_T | lane_1 | s3://quark-demo-platform-data/artifacts/mpsdimamay-stlukes-com-ph/slmc/uploads/sarek/HG008-T_Illumina_R1.fastq.gz | s3://quark-demo-platform-data/artifacts/mpsdimamay-stlukes-com-ph/slmc/uploads/sarek/HG008-T_Illumina_R2.fastq.gz |
- Patient: This column should contain patient IDs. Since we are comparing Normal vs Tumour samples, the Patient ID must be the same for both rows. (This is a requirement for Strelka.)
- Sex: If the sex of the patient is known, it can be filled in as male or female; otherwise use NA, as in the example above.
- Status: This can be either 0 or 1. Normal = 0, Tumour = 1
- Lane: This column should include the lane number from the sequencing experiment.
- fastq_1: Location of the forward read file.
- fastq_2: Location of the reverse read file.
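For readers who prefer to generate the file rather than type it into a spreadsheet, the columns described above can be written with Python's standard `csv` module. This is a sketch under assumptions: the column names are written in lowercase, as the nf-core Sarek usage documentation spells them (the table above capitalises some of them), the `s3://example-bucket/...` locations are placeholders to be replaced with paths copied from My Files, and `write_samplesheet` is a hypothetical helper name.

```python
import csv

# Lowercase column names as used in the nf-core Sarek usage docs (assumption).
COLUMNS = ["patient", "sex", "status", "sample", "lane", "fastq_1", "fastq_2"]

# Placeholder paths; replace with the real locations copied from My Files.
rows = [
    {"patient": "p1", "sex": "NA", "status": 0, "sample": "HG008_N", "lane": "lane_1",
     "fastq_1": "s3://example-bucket/sarek/normal_R1.fastq.gz",
     "fastq_2": "s3://example-bucket/sarek/normal_R2.fastq.gz"},
    {"patient": "p1", "sex": "NA", "status": 1, "sample": "HG008_T", "lane": "lane_1",
     "fastq_1": "s3://example-bucket/sarek/tumour_R1.fastq.gz",
     "fastq_2": "s3://example-bucket/sarek/tumour_R2.fastq.gz"},
]


def write_samplesheet(path, sample_rows):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(sample_rows)


write_samplesheet("samplesheet.csv", rows)
```

Both rows share the patient ID `p1`, with status 0 for the normal sample and 1 for the tumour sample, matching the example table.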
Getting the location/S3 Path for the fastq files
- Step 1: Click on “My Files” from the side bar.
- Step 2: Navigate to the required directory/folder.
- Step 3: Locate the file within the directory/folder, and click on the 3-dot icon. Click Copy Path.
- Step 4: Paste the copied path in the relevant column of the samplesheet file.
Once all the required columns have been filled in the samplesheet, save it as a .csv file.
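Before uploading, the saved file can be checked against the rules in this section: status is 0 or 1, both FastQ columns are filled, and a tumour row has a normal row under the same patient ID. The sketch below is illustrative only; `check_samplesheet` is a hypothetical helper, it is not the validation that Sarek or Quark performs, and it treats column names case-insensitively as an assumption.

```python
import csv
import io
from collections import defaultdict


def check_samplesheet(handle):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    statuses_by_patient = defaultdict(set)

    # Data rows start on line 2 because line 1 is the header.
    for line_no, raw in enumerate(csv.DictReader(handle), start=2):
        row = {k.strip().lower(): (v or "").strip() for k, v in raw.items() if k}
        if row.get("status") not in {"0", "1"}:
            problems.append(f"line {line_no}: status must be 0 or 1")
        for column in ("fastq_1", "fastq_2"):
            if not row.get(column):
                problems.append(f"line {line_no}: {column} is empty")
        statuses_by_patient[row.get("patient", "")].add(row.get("status"))

    # A Normal vs Tumour comparison needs both statuses under one patient ID.
    for patient, statuses in statuses_by_patient.items():
        if "1" in statuses and "0" not in statuses:
            problems.append(f"patient {patient}: tumour sample has no matching normal")
    return problems


# Example with mismatched patient IDs (file names are placeholders).
example = io.StringIO(
    "patient,sex,status,sample,lane,fastq_1,fastq_2\n"
    "p1,NA,0,HG008_N,lane_1,n_R1.fastq.gz,n_R2.fastq.gz\n"
    "p2,NA,1,HG008_T,lane_1,t_R1.fastq.gz,t_R2.fastq.gz\n"
)
problems = check_samplesheet(example)
for problem in problems:
    print(problem)
```

To check a real file, pass an open file handle instead of the `io.StringIO` example, for instance `check_samplesheet(open("samplesheet.csv", newline=""))`.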