Building a Cohort

Overview

A cohort defines a subset of a dataset that meets the specific phenotypic and/or genotypic criteria required by your research study. The Query Builder provides an interface for constructing cohort criteria without requiring knowledge of database query languages.

Once approved, cohorts serve a dual purpose: they provide access to de-identified, person-level data for analysis, and they can be passed directly as inputs to pipeline runs.

Navigation: Select Datasets from the left-hand navigation pane. The section opens on the Cohorts tab by default. To browse source datasets first, see Browsing the Catalog.

The Cohorts Tab

The Cohorts tab displays all cohorts you have saved or requested, each represented as a card.

Cohorts Screen

Click a card to view its composition, edit the definition, or use it as an input when launching a pipeline run. Use the search bar to find an existing cohort by name.

Cohort Building Workflow

Step | Description
1. Create a Cohort | Define your cohort's criteria using the Query Builder.
2. Visualise the Cohort | Review aggregate statistics and visualisations. Refine criteria until the cohort meets your study requirements.
3. Request Cohort Access | Submit an access request to your DS Administrator for review (see Requesting Data Access).

Step 1: Creating a Cohort

Accessing the Query Builder

  1. Select Datasets from the navigation menu.
  2. Navigate to the Cohorts tab.
  3. Click Add Cohort in the top-right corner.

Cohorts tab with Add Cohort button

This opens the Query Builder interface.

Query Builder interface

Defining Cohort Criteria

The Query Builder constructs cohorts using a row-based filtering interface. For each criterion, define:

  • Field — the data attribute to filter on (e.g., Year of Birth, Gender, Condition, Drug).
  • Operator — the comparison to apply (e.g., =, !=, >, >=, <, <=).
  • Value — the specific value or range to filter for (e.g., 1980, Atrial Fibrillation).

Query Builder with a single criterion defined

Adding Multiple Criteria

Click + Add Query Block below the existing criterion row to add another filter. Each added criterion further narrows the cohort toward the population your study requires. For example:

Field | Operator | Value
Year of Birth | > | 1980
Conditions | = | Atrial Fibrillation
Drug | = | metoprolol tartrate
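
For readers who think in code, the Python sketch below models each Query Builder row as a (field, operator, value) triple and applies the three example rows above to a few made-up records. It is an illustration only, not the platform's implementation. The record layout, the `Criterion` class, the `build_cohort` helper, and the assumption that rows are combined with logical AND (a record must satisfy every row) are hypothetical and are not confirmed by this guide.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable

# The comparison operators offered by the Query Builder, mapped to Python functions.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Criterion:
    """One Query Builder row: a field, an operator, and a value (hypothetical model)."""
    field: str
    op: str
    value: Any

    def matches(self, record: dict[str, Any]) -> bool:
        actual = record.get(self.field)
        if actual is None:
            return False  # assumption: a record lacking the field never matches
        return OPERATORS[self.op](actual, self.value)


def build_cohort(records: list[dict[str, Any]], criteria: list[Criterion]) -> list[dict[str, Any]]:
    """Keep records that satisfy every criterion (AND semantics are an assumption)."""
    return [r for r in records if all(c.matches(r) for c in criteria)]


# The three example rows from the table above.
criteria = [
    Criterion("Year of Birth", ">", 1980),
    Criterion("Conditions", "=", "Atrial Fibrillation"),
    Criterion("Drug", "=", "metoprolol tartrate"),
]

# Made-up records, flattened to one value per field for simplicity.
records = [
    {"id": 1, "Year of Birth": 1985, "Conditions": "Atrial Fibrillation", "Drug": "metoprolol tartrate"},
    {"id": 2, "Year of Birth": 1975, "Conditions": "Atrial Fibrillation", "Drug": "metoprolol tartrate"},
    {"id": 3, "Year of Birth": 1990, "Conditions": "Hypertension", "Drug": "metoprolol tartrate"},
]

cohort = build_cohort(records, criteria)
print([r["id"] for r in cohort])  # prints [1]
```

Under these assumptions only record 1 remains: record 2 fails the Year of Birth row and record 3 fails the Conditions row, which is why each added row can only keep the cohort the same size or shrink it.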

Query Builder with multiple criteria

Query Builder Filled

Tip: When entering values for Conditions or Drug fields, begin typing the name and select the appropriate entry from the drop-down suggestions.

Once all criteria are defined:

  1. Review each row to confirm accuracy.
  2. Click Search at the bottom-right of the Query Builder.

The search retrieves matching records from all datasets available to you.

Step 2: Visualising the Cohort

Running a search generates a Cohort Summary dashboard. Use this to validate your cohort before requesting access.

Cohort Summary Dashboard

The Cohort Summary includes:

  • Search query string — the criteria used to generate the cohort, displayed for easy review and modification.
  • Aggregate statistics — key demographic and clinical metrics for the retrieved population.
  • Dynamic charts — graphical distributions across cohort attributes.

Cohort Summary dashboard with charts and statistics

Reviewing Cohort Attributes

The summary dashboard covers:

  • Age Distribution — patient ages in the cohort, recorded at the time of first drug administration.
  • Gender Distribution — breakdown by gender.
  • Condition Prevalence — distribution of medical conditions.
  • Drug Exposure — summary of drug prescriptions or exposures.
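
The kind of aggregation behind these four views can be sketched in Python. This is a conceptual illustration over made-up records, not the platform's code or schema: the field names (`age_at_first_drug`, `gender`, `conditions`, `drugs`), the 10-year age bands, and the `summarise_cohort` helper are all assumptions chosen for the example.

```python
from collections import Counter
from typing import Any


def age_band(age: int, width: int = 10) -> str:
    """Label the band an age falls into, e.g. 34 -> '30-39' (band width is an assumption)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def summarise_cohort(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Count records per age band, gender, condition, and drug."""
    return {
        "size": len(records),
        "age_distribution": Counter(age_band(r["age_at_first_drug"]) for r in records),
        "gender_distribution": Counter(r["gender"] for r in records),
        "condition_prevalence": Counter(c for r in records for c in r["conditions"]),
        "drug_exposure": Counter(d for r in records for d in r["drugs"]),
    }


# Made-up, de-identified style records (hypothetical layout).
records = [
    {"age_at_first_drug": 34, "gender": "F",
     "conditions": ["Atrial Fibrillation"], "drugs": ["metoprolol tartrate"]},
    {"age_at_first_drug": 38, "gender": "M",
     "conditions": ["Atrial Fibrillation", "Hypertension"], "drugs": ["metoprolol tartrate"]},
    {"age_at_first_drug": 41, "gender": "F",
     "conditions": ["Atrial Fibrillation"], "drugs": ["metoprolol tartrate"]},
]

summary = summarise_cohort(records)
print(summary["size"])                    # prints 3
print(dict(summary["age_distribution"]))  # prints {'30-39': 2, '40-49': 1}
```

Reading the counts this way mirrors how the dashboard is meant to be used: if a band, gender, or condition is unexpectedly sparse, the criteria likely need adjusting before access is requested.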

Age Distribution chart within Cohort Summary

Refining the Cohort

If the cohort does not yet meet your requirements:

  1. Click the Modify Query icon next to the search query string at the top of the dashboard.
  2. Adjust, add, or remove criteria in the Query Builder.
  3. Click Search to regenerate the summary.
  4. Review the updated results. Repeat until the cohort size and composition are satisfactory.
  5. When the cohort is satisfactory, proceed to Request Cohort Access (step 3 of the workflow).

What's Next