Concepts

This section introduces the concepts and terminology frequently used on Quark.

Scientific Pipelines

On Quark, a scientific pipeline refers to a containerised workflow of bioinformatics tools for multi-omics data analysis.

For example, the Nextflow pipeline nf-core/sarek analyses DNA sequencing data. Users can upload their DNA sequencing sample data to the pipeline, along with a reference genome, to obtain secondary and tertiary analysis results. All the bioinformatics tools needed for the analysis are stitched together and containerised within the pipeline.

Users can access ready-to-run pipelines on Quark by selecting Pipelines from the Navigation Pane and clicking Launchpad.

More details on how to access and launch Scientific Pipelines may be found in the User Guide under Pipelines.

Workstations

A Trusted Research Environment (TRE) is a secure digital environment that rigorously controls the flow of data at all times. On Quark, users collaborate securely on projects through an access-limited TRE called a Workstation.

Data cannot leave the TRE without explicit permission from the Project Administrator. TREs add a layer of security that enables fine-grained access control over data assets. They also provide full remote access to data without compromising data security and protection standards.

A Workstation requires prior approval and authorisation from the Project Administrator, so that data flow is controlled and monitored at all times.

The Project Administrator and Workstation Users can audit, log, and control Workstation usage, enabling real-time data governance and cost visibility.

Workstation capabilities include file management, request approval workflows, resource monitoring, lifecycle management, cost tracking, and event logging.

More details on how to request Workstations on Quark may be found in the User Guide under Workstations.

More details on the administration of TREs, including how TRE Administrators create Workstation Templates, may be found in the Administrator Guide.

Data Management

Quark integrates with Amazon Web Services (AWS) to streamline data management tasks such as storing, handling, and accessing data.

On Quark, data is stored in Datasets, Data Locations, and My Files.

Datasets

Datasets contain the reference data needed to run multi-omics pipelines on Quark.

For example, the nf-core/sarek pipeline requires a reference genome dataset, while the AlphaFold pipeline requires PDB indices. These are available as Datasets on Quark.

All reference data is managed internally through Datasets and is available at both the Organisation and Workstation Project levels.

More details on Datasets may be found in the Administrator Guide under Datasets.

Data Locations

Data Locations host internal sequencing data, such as raw FASTQ files, from multiple sources, including Amazon S3, Amazon EFS, Amazon FSx, and NFS.

This data is available at both the Organisation and Workstation Project levels.

My Files

Users can store their own data in My Files, and can upload and download the data they use to run pipelines on Quark.

My Files provides users with a centralised location for managing files within their workspace. Users can create folders, upload files from various sources (local system, HTTP, S3, SRA/ENA, and SFTP), and monitor the progress of their uploads.

More details on how to upload data to My Files may be found in the User Guide under My Files.