Building a Cohort

Overview

A cohort on Quark is a subset of a larger dataset that meets the specific phenotypic and/or genotypic criteria required by a research study. Building a cohort gives researchers access to de-identified, person-level data tailored to their project requirements.

The cohort-building process on Quark prioritises data discovery without exposure: researchers can validate the viability of their cohorts using aggregate statistics and visualisations, without compromising patient privacy at any stage.

Quark's cohort data is stored as standardised tables mapped to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), ensuring that the data is structured, retrievable, and ready for analysis from Day 1.
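
As a rough illustration of what "standardised tables" means here, the sketch below models three OMOP CDM tables as Python dataclasses. The column names follow the public OMOP CDM specification, but only a few columns per table are shown, the rows are made up, and the condition and drug concept IDs are placeholders. How Quark itself exposes these tables is not covered here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    person_id: int
    year_of_birth: int
    gender_concept_id: int


@dataclass(frozen=True)
class ConditionOccurrence:
    condition_occurrence_id: int
    person_id: int  # links back to Person.person_id
    condition_concept_id: int
    condition_start_date: date


@dataclass(frozen=True)
class DrugExposure:
    drug_exposure_id: int
    person_id: int  # links back to Person.person_id
    drug_concept_id: int
    drug_exposure_start_date: date
    drug_exposure_end_date: Optional[date]


# Made-up rows. 8532 is the standard OMOP concept for "FEMALE";
# 1001 and 2001 are placeholder concept IDs, not real vocabulary entries.
persons = [Person(person_id=64, year_of_birth=1985, gender_concept_id=8532)]
conditions = [
    ConditionOccurrence(1, 64, 1001, date(2020, 2, 20)),
]
drugs = [
    DrugExposure(1, 64, 2001, date(2020, 3, 1), date(2020, 3, 30)),
]


def conditions_for(person_id: int) -> List[ConditionOccurrence]:
    """Return every condition row that belongs to one person."""
    return [c for c in conditions if c.person_id == person_id]


print(len(conditions_for(64)))
```

Because every clinical table carries a person_id, records from different tables can be joined back to the same de-identified person.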

Prerequisites

Before building a cohort, ensure you have:

Logged in to your TRE with registered credentials.
Explored the Data Catalog to understand which datasets are available and relevant to your study. See Discovering Data on Quark for guidance.

Cohort Building Workflow

The following steps outline the typical user workflow for building a cohort:

Step	Description
1. Create a Cohort	Define your cohort's phenotypic and/or genotypic criteria using the Query Builder.
2. Visualise the Cohort	Review your cohort's attributes using charts and visualisations that provide a metadata overview. Refine criteria until the resulting attributes meet your project's requirements.
3. Request Cohort Access	Once you are satisfied with the cohort, complete and submit the access request form to the Data Access Committee. See Requesting Data Access.

Step 1: Creating a Cohort

The Query Builder provides a point-and-click interface for defining cohort criteria, so no programming or database query knowledge is required.

Accessing the Query Builder

Within the TRE, select Datasets from the Navigation Menu on the left.
Select the Cohorts tab at the top of the page.
Click the Add Cohort button on the top-right corner of the page.

Cohorts tab with Add Cohort button

This opens the Query Builder interface.

Defining Cohort Criteria

The Query Builder enables researchers to construct their cohort using filtering criteria provided in drop-down menus. The available filtering options include:

Data attributes: Year of Birth, Gender, Race, Ethnicity, Condition, Drug, and others.
Operators: comparison operators (=, !=, >, >=, <, <=), logical operators (AND, OR), and others.
Value ranges: Select or type specific values from the drop-down options.

To define a criterion:

Select a Field from the drop-down menu (e.g., Year of Birth).
Select an Operator from the drop-down menu (e.g., >).
Select a Value from the drop-down menu, or type one in (e.g., 1980).

Query Builder with a single criterion defined
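
Conceptually, each criterion row is a Field, Operator, Value triple that is tested against a record. The sketch below shows that idea in Python. The Criterion class, the OPERATORS table, and the dictionary record format are hypothetical names introduced for illustration; they are not part of Quark, which handles this through the interface.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Map each comparison symbol from the Operator drop-down to a Python function.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Criterion:
    """One Query Builder row (hypothetical representation)."""
    field_name: str
    op: str
    value: Any

    def matches(self, record: Dict[str, Any]) -> bool:
        # A record that lacks the field cannot satisfy the criterion.
        if self.field_name not in record:
            return False
        return OPERATORS[self.op](record[self.field_name], self.value)


born_after_1980 = Criterion("Year of Birth", ">", 1980)
print(born_after_1980.matches({"Year of Birth": 1985}))  # True
print(born_after_1980.matches({"Year of Birth": 1980}))  # False
```

Note that > is strict: a patient born in 1980 does not match; use >= to include that year.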

Adding Multiple Criteria

To refine your cohort with further criteria:

Click the + Add New button below the existing criterion row.
Define the new criterion using the Field, Operator, and Value drop-downs.

For example, a multi-criteria cohort might include:

Field	Operator	Value
Year of Birth	`>`	`1980`
Conditions	`=`	`Atrial Fibrillation`
Drug	`=`	`metoprolol tartrate 25 MG Oral Tablet`

Query Builder with multiple criteria

Tip: When selecting values for the Conditions or Drug fields, begin typing the condition or drug name, then select the appropriate entry from the drop-down suggestions.
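
To make the effect of stacking criteria concrete, the sketch below filters a few made-up patient records with the three example rows above. It assumes the rows are combined with AND, and that = on a multi-valued field such as Conditions means "any recorded value equals this"; neither behaviour is confirmed here, and the function names and record layout are hypothetical. Only a subset of the operators is included.

```python
import operator
from typing import Any, Dict, List, Tuple

COMPARATORS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

CriterionRow = Tuple[str, str, Any]  # (Field, Operator, Value)


def row_matches(patient: Dict[str, Any], field: str, op: str, value: Any) -> bool:
    actual = patient.get(field)
    if actual is None:
        return False
    if isinstance(actual, (list, tuple, set)):
        # Assumption: a multi-valued field matches if any of its values does.
        return any(COMPARATORS[op](item, value) for item in actual)
    return COMPARATORS[op](actual, value)


def in_cohort(patient: Dict[str, Any], criteria: List[CriterionRow]) -> bool:
    # Assumption: every row must hold (rows are joined with AND).
    return all(row_matches(patient, f, op, v) for f, op, v in criteria)


criteria: List[CriterionRow] = [
    ("Year of Birth", ">", 1980),
    ("Conditions", "=", "Atrial Fibrillation"),
    ("Drug", "=", "metoprolol tartrate 25 MG Oral Tablet"),
]

METOPROLOL = "metoprolol tartrate 25 MG Oral Tablet"
patients = [  # made-up records
    {"Person ID": 1, "Year of Birth": 1985,
     "Conditions": ["Atrial Fibrillation"], "Drug": [METOPROLOL]},
    {"Person ID": 2, "Year of Birth": 1975,
     "Conditions": ["Atrial Fibrillation"], "Drug": [METOPROLOL]},
    {"Person ID": 3, "Year of Birth": 1990,
     "Conditions": ["Hypertension"], "Drug": [METOPROLOL]},
]

cohort_ids = [p["Person ID"] for p in patients if in_cohort(p, criteria)]
print(cohort_ids)  # [1]
```

Each added row can only narrow an AND-combined cohort, which is why the cohort size typically shrinks as criteria are added.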

Running the Search

Once all criteria have been defined:

Review each row in the Query Builder to confirm accuracy.
Click the Search button at the bottom-right corner of the Query Builder.

The search retrieves the records that match the entered criteria from all available datasets.

Step 2: Visualising the Cohort

Running a search generates a Cohort Summary dashboard that summarises all retrieved data matching the defined criteria. This dashboard enables researchers to visualise and validate their cohort before requesting access.

Cohort Summary Dashboard

The Cohort Summary includes:

Search Query string — Displays the query criteria used to generate the cohort, allowing for easy review and modification.
Aggregate statistics — Shows key demographic and clinical metrics for the cohort.
Dynamic charts and visualisations — Provides graphical representations of the cohort's distribution across various attributes.

Cohort Summary dashboard with charts and statistics

Reviewing Cohort Attributes

Researchers may review the following cohort attributes within the dashboard:

Age Distribution — A chart showing the age distribution of patients in the cohort (ages shown are recorded at the time of first drug administration).
Gender Distribution — Breakdown of the cohort population by gender.
Condition Prevalence — Distribution of medical conditions within the cohort.
Drug Exposure — Summary of drug prescriptions or exposures within the cohort.

Age Distribution chart within Cohort Summary
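
The aggregates behind charts like these can be pictured as simple counts over the cohort. The sketch below derives an age distribution, with age taken at the first drug administration as noted above, and a gender breakdown from made-up records. The field names, the ten-year banding, and the use of year of birth alone (which makes ages approximate) are assumptions for illustration, not a description of how the dashboard computes its figures.

```python
from collections import Counter
from datetime import date
from typing import Any, Dict, List

patients: List[Dict[str, Any]] = [  # made-up records
    {"person_id": 1, "year_of_birth": 1985, "gender": "Female",
     "first_drug_date": date(2020, 3, 1)},
    {"person_id": 2, "year_of_birth": 1995, "gender": "Male",
     "first_drug_date": date(2021, 7, 15)},
    {"person_id": 3, "year_of_birth": 1982, "gender": "Female",
     "first_drug_date": date(2019, 1, 10)},
]


def age_at_first_drug(patient: Dict[str, Any]) -> int:
    # Approximate: only the year of birth is used, so the result can be off by one.
    return patient["first_drug_date"].year - patient["year_of_birth"]


def age_band(age: int, width: int = 10) -> str:
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


age_distribution = Counter(age_band(age_at_first_drug(p)) for p in patients)
gender_distribution = Counter(p["gender"] for p in patients)

print(dict(age_distribution))
print(dict(gender_distribution))
```

Only counts leave this computation; no individual record appears in the output, which is the sense in which aggregate views support discovery without exposure.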

Refining the Cohort

If the resulting cohort does not meet your project's requirements, you may refine the query:

Click the Modify Query icon next to the Search Query string at the top of the dashboard.
Adjust the existing criteria, or add/remove criteria using the Query Builder.
Click Search again to regenerate the Cohort Summary with updated results.
Review the updated metrics and visualisations. Repeat until the cohort size and characteristics are satisfactory.

Exploring Approved Cohort Dashboards

Once a cohort access request has been approved by the Data Access Committee, researchers gain access to additional dashboards that provide deeper insight into the cohort's data. These dashboards are accessible from the Cohorts tab under Datasets.

Cohort Dashboard

Clicking on an approved cohort opens a dashboard similar to the Cohort Summary generated during the search phase. This dashboard includes the same aggregate statistics and visualisations, along with additional features:

Kaplan-Meier Survival Curve (KM Curve) — Displayed when scrolling down within the dashboard. This curve provides survival analysis insights for the cohort population.

Kaplan-Meier survival curve within Cohort Dashboard

Person Table — A table listing anonymised person-level records. Researchers can search for specific records by typing a Person ID in the search field.
Specimen Table — A table listing specimen-level records. To view this table, click the drop-down in the top-right corner of the table area and select Specimen.

Person and Specimen table toggle
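
For readers unfamiliar with the Kaplan-Meier curve mentioned above, the sketch below implements the standard product-limit estimator in plain Python: at each event time, the running survival probability is multiplied by one minus the fraction of at-risk patients who had the event. The input data is made up, and this is a textbook illustration rather than the code the dashboard uses; which event and time origin the dashboard plots is not stated here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], observed: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    durations: follow-up time per patient.
    observed:  True if the event occurred at that time, False if censored.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Patients who had the event exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Made-up follow-up times (e.g., months); False marks a censored patient.
durations = [5, 8, 8, 12, 15]
observed = [True, True, False, True, False]

for time, prob in kaplan_meier(durations, observed):
    print(f"t={time}: S(t)={prob:.2f}")
```

With this data the estimate steps down to 0.80, 0.60, and 0.30 at times 5, 8, and 12. Censored patients never trigger a step, but they do count in the at-risk denominator until they leave observation.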

Person Dashboard

The Person Dashboard displays the anonymised medical history for an individual patient within the cohort. To access a Person Dashboard:

Navigate to the Person Table within the Cohort Dashboard.
Type a Person ID into the search field within the Person Id column (e.g., 64).
Click on the Person ID number in the results to open the Person Dashboard.

Person Dashboard with medical history

The Person Dashboard includes the following sections:

Medical History

Displays the patient's anonymised medical history, including a Sankey Chart that visualises the progression of conditions and treatments over time.

Person Dashboard Sankey Chart
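
A Sankey chart is drawn from a list of links, each connecting one state to the next. As a minimal sketch of that data shape, the code below orders one patient's made-up events by date and counts the transitions between consecutive events. The event labels and the sankey_links helper are hypothetical; how the Person Dashboard derives its chart is not described here.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

# Made-up dated events (conditions and treatments) for one patient.
events: List[Tuple[date, str]] = [
    (date(2020, 3, 1), "Atrial Fibrillation"),
    (date(2019, 1, 10), "Hypertension"),
    (date(2020, 3, 5), "metoprolol"),
]


def sankey_links(dated_events: List[Tuple[date, str]]) -> Counter:
    """Count (source, target) transitions between consecutive events."""
    ordered = [label for _, label in sorted(dated_events)]
    return Counter(zip(ordered, ordered[1:]))


for (source, target), weight in sankey_links(events).items():
    print(f"{source} -> {target} (weight {weight})")
```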

Drug Exposure

Researchers can toggle between two views to examine the patient's drug exposure timeline:

Plot View — A graphical timeline showing drug exposure periods.
Table View — A tabular listing of drug exposure records, including drug name, start date, and end date.

Person Dashboard Drug Exposure plot view

Person Dashboard Drug Exposure table view
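
The Table View columns (drug name, start date, end date) are enough to derive the exposure periods that the Plot View draws. The sketch below computes the length of each period from made-up rows. The DrugExposureRow type, the inclusive day count, and the treatment of a missing end date as "ongoing up to a reference date" are assumptions for illustration only.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class DrugExposureRow(NamedTuple):
    drug_name: str
    start_date: date
    end_date: Optional[date]  # assumption: None means no end date recorded


def exposure_days(row: DrugExposureRow, as_of: date) -> int:
    """Length of an exposure in days, counting both the start and end day."""
    end = row.end_date if row.end_date is not None else as_of
    return (end - row.start_date).days + 1


rows: List[DrugExposureRow] = [  # made-up records
    DrugExposureRow("hypothetical drug B", date(2021, 1, 1), None),
    DrugExposureRow("metoprolol tartrate 25 MG Oral Tablet",
                    date(2020, 3, 1), date(2020, 3, 30)),
]

reference_date = date(2021, 1, 10)
# Sort by start date so the rows read as a timeline.
for row in sorted(rows, key=lambda r: r.start_date):
    print(row.drug_name, exposure_days(row, as_of=reference_date))
```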

Specimen Data

The Specimen tab within the Person Dashboard provides a summary of genomic variants found in the patient's sample.

Person Dashboard Specimen tab

Specimen Dashboard

The Specimen Dashboard provides an in-depth view of genomic variant data associated with a specific specimen. To access the Specimen Dashboard:

Navigate to the Cohort Dashboard and scroll to the table section at the bottom.
Click the drop-down and select Specimen.
Type a Specimen ID into the search field (e.g., 39).
Click on the Specimen ID number in the results to open the Specimen Dashboard.

Specimen Dashboard overview

The Specimen Dashboard includes the following information:

Clinical Significance — Classification of the clinical significance of variants.
Variant Class — Types of variants identified (e.g., SNP, insertion, deletion).
Impact — Predicted functional impact of each variant.
Consequence — The consequence type for each variant (e.g., missense, synonymous).
Variant Occurrence Table — A detailed table listing all variant occurrences within the specimen, including genomic coordinates, allele frequencies, and annotation details.

Specimen Dashboard variant occurrence table

What's Next

Once your cohort is defined and validated:

Request access to your cohort or dataset — See Requesting Data Access.
Set up a workstation to begin analysis — See Dataset Analysis using Workstations.