Datasets

Overview

The Datasets section gives the DS Administrator visibility into all cohorts that researchers have requested for their projects, and the full catalog of datasets registered on the platform.

This section is central to data governance. The DS Administrator can monitor what sensitive data is being surfaced, review the status and history of cohort access, revoke access where appropriate, and browse the underlying dataset catalog that cohorts are built from.

Datasets is organised into two tabs: Cohorts and Catalog. The section opens on the Cohorts tab by default.

Navigation: Select Datasets from the left-hand navigation pane.


Catalog Tab

Select the Catalog tab to view the full list of datasets registered on the platform.

Screenshot: Catalog tab showing dataset cards with name, description, tags, and last updated date

Each card shows:

Field | Description
Name | The dataset's identifier, with a copy icon for copying it to the clipboard (for example, to paste into pipeline inputs).
Description | A plain-language summary of the dataset's contents and scope.
Tags | Key-value metadata describing the dataset (e.g., type: genomics, condition: lung cancer, nsclc, gene: EGFR). Tags vary by dataset and reflect the dataset's catalog metadata.
Last Updated | The date the dataset entry was last updated, shown with a clock icon.

Use the search bar to find a dataset by name or description. Once every entry has been loaded, the footer shows the message "All catalog items loaded".
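
One way to picture the search is as a case-insensitive substring match over each card's name and description. The sketch below is an illustration only: the `DatasetCard` type, its field names, the sample cards, and the matching rule are all assumptions, and the platform's actual search logic may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetCard:
    # Hypothetical record mirroring the fields shown on a catalog card.
    name: str
    description: str
    tags: dict[str, str] = field(default_factory=dict)


def search_catalog(cards, query):
    """Return cards whose name or description contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return list(cards)  # An empty query leaves the list unfiltered.
    return [
        card
        for card in cards
        if needle in card.name.lower() or needle in card.description.lower()
    ]


# Made-up sample cards, for illustration only.
cards = [
    DatasetCard("impact", "Genomic profiling of lung cancer patients", {"type": "genomics"}),
    DatasetCard("registry", "Longitudinal clinical registry", {"type": "clinical"}),
]
matches = search_catalog(cards, "lung")
```

Here `matches` contains only the `impact` card, because "lung" appears in its description. Tags are deliberately not searched in this sketch, since the text above mentions only name and description.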

Note: The DS Administrator can view dataset catalog entries and their summaries. Publishing new datasets to the platform is the responsibility of the Infrastructure Administrator.

Dataset Summary Dashboard

Click on any dataset card to open its Cohort Summary dashboard. For a catalog-level dataset, this displays:

  • The dataset name, with a copy icon, and its description.
  • Number of Persons and Total Records aggregate tiles.
  • Distribution charts for Gender, Race, Ethnicity, and Top Conditions across the entire dataset.

Screenshot: Dataset summary dashboard showing aggregate statistics and distribution charts for a full dataset

This view gives the DS Administrator a high-level picture of a dataset's scope and population — useful for assessing its relevance to active projects before users build cohorts from it.


Cohorts Tab

The Cohorts tab lists every cohort that has been requested across your projects, displayed as cards.

Screenshot: Cohorts tab showing a grid of cohort cards with status badges, requesting user, date, and expiry information

Searching and Filtering

Use the search bar to find a cohort by name. Use the status dropdown (defaults to All) to filter the list to a specific cohort status.

Reading a Cohort Card

Each card shows:

Field | Description
Name | The cohort's name, with a copy icon to copy it to the clipboard.
Status badge | The current status of the cohort access request (see Cohort Statuses below).
Description | A summary of the cohort's composition, where provided.
Requesting User | The user who submitted the cohort request.
Created Date | The date the cohort request was created.
Expiry Tag | Indicates how much longer the cohort's access grant remains valid: Expires in X day(s), Expires today, or Expired.
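
The three expiry labels can be thought of as a function of the grant's expiry date and the current date. The following is a minimal sketch under assumptions: the function name `expiry_tag` is hypothetical, and it assumes the grant is still valid on the expiry date itself, which the text above does not confirm.

```python
from datetime import date


def expiry_tag(expiry: date, today: date) -> str:
    """Return the label a cohort card would show for a given expiry date."""
    days_left = (expiry - today).days
    if days_left < 0:
        return "Expired"
    if days_left == 0:
        # Assumption: access remains valid on the expiry date itself.
        return "Expires today"
    return f"Expires in {days_left} day(s)"


today = date(2024, 1, 10)  # Fixed date so the example is reproducible.
labels = [
    expiry_tag(date(2024, 1, 15), today),
    expiry_tag(date(2024, 1, 10), today),
    expiry_tag(date(2024, 1, 9), today),
]
# labels == ["Expires in 5 day(s)", "Expires today", "Expired"]
```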

Tip: The icon next to the cohort name indicates the type of access request — a single-user icon typically represents a request for an entire dataset, while a group icon represents a derived cohort built from a subset of a dataset.

Cohort Statuses

Status | Description
Pending Approval | The request has been submitted and is awaiting review.
Approved | The request has been approved and the requesting user has access to the cohort data.
Revoked | Access to the cohort has been manually revoked by an administrator.
Expired | The cohort's access grant has passed its expiry date.

Card Actions

Each cohort card has one or two action icons in the bottom-right corner:

Icon | Action | Available When
Timeline | Open the Access Timeline panel for this cohort. | Always
Revoke (red, crossed-out file) | Revoke the user's access to this cohort. | Approved or Expired
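
The availability rules in the table above can be expressed as a small lookup. This is a sketch only: the `CohortStatus` enum and the `card_actions` function are hypothetical names introduced for illustration, not part of the platform.

```python
from enum import Enum


class CohortStatus(Enum):
    PENDING_APPROVAL = "Pending Approval"
    APPROVED = "Approved"
    REVOKED = "Revoked"
    EXPIRED = "Expired"


# Statuses for which the Revoke icon is shown, per the table above.
REVOCABLE = {CohortStatus.APPROVED, CohortStatus.EXPIRED}


def card_actions(status: CohortStatus) -> list[str]:
    """Return the action icons available on a cohort card with this status."""
    actions = ["Timeline"]  # The Timeline icon is always available.
    if status in REVOCABLE:
        actions.append("Revoke")
    return actions


actions_by_status = {status.value: card_actions(status) for status in CohortStatus}
```

With these rules, cards in Pending Approval or Revoked status show only the Timeline icon, which matches the "one or two action icons" described above.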

Access Timeline

Click the Timeline icon on any cohort card to open the Access Timeline panel on the right side of the screen.

Screenshot: Access Timeline panel showing a reverse-chronological list of approval events with user and timestamp

The panel shows a reverse-chronological history of status changes for the cohort, including:

  • Approved — when the request was approved, and by whom.
  • Approval Requested — when the request was originally submitted, and by whom.

Each entry shows the user responsible for the action and the exact timestamp. Use the search field at the top of the panel to filter the timeline if it contains a long history.

Revoking Cohort Access

If a user no longer requires access to a cohort — for example, their project has concluded, or a governance concern has been identified — the DS Administrator can revoke access directly.

To revoke access:

  1. Locate the cohort card with Approved or Expired status.
  2. Click the red Revoke icon in the bottom-right corner of the card.
  3. Confirm the action when prompted.

Screenshot: Cohort card with the Revoke cohort access tooltip visible on the red revoke icon

Once revoked:

  • The cohort's status badge updates to Revoked.
  • The revocation is recorded in the Access Timeline for the cohort.
  • The requesting user loses access to the cohort's underlying data.

Note: Revoking access does not delete the cohort definition itself — it only removes the user's access to the underlying data. The cohort remains visible in the Cohorts tab for audit purposes.
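
The effects of a revocation described above (status change, a new timeline entry, and the cohort record being kept) can be modelled roughly as follows. Everything here is hypothetical: the `Cohort` and `TimelineEvent` types, the `revoke_access` function, the "Revoked" event label, and the sample names are assumptions made for illustration, not the platform's data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TimelineEvent:
    action: str
    user: str
    timestamp: datetime


@dataclass
class Cohort:
    name: str
    status: str
    timeline: list = field(default_factory=list)  # Newest event first.


def revoke_access(cohort, admin, now=None):
    """Mark a cohort as revoked and record the event; the cohort itself is kept."""
    if cohort.status not in ("Approved", "Expired"):
        raise ValueError(f"Cannot revoke a cohort with status {cohort.status!r}")
    cohort.status = "Revoked"
    event = TimelineEvent("Revoked", admin, now or datetime.now(timezone.utc))
    cohort.timeline.insert(0, event)  # Keep the timeline reverse-chronological.
    return cohort


# Made-up cohort and user names, for illustration only.
cohort = Cohort(
    name="example-cohort",
    status="Approved",
    timeline=[
        TimelineEvent("Approved", "ds_admin", datetime(2024, 1, 2, tzinfo=timezone.utc)),
        TimelineEvent("Approval Requested", "researcher_a", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ],
)
revoke_access(cohort, "ds_admin", now=datetime(2024, 2, 1, tzinfo=timezone.utc))
```

After the call, the cohort object still exists with status `Revoked`, and its timeline holds three entries with the revocation first.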


Approved Cohort Summary

Click on any approved cohort card (other than its action icons) to open the Cohort Summary dashboard for that cohort.

Screenshot: Cohort Summary dashboard for an approved cohort showing Top Drugs, Age Plot, KM Survival chart, and Cohort Table

The dashboard displays:

  • The cohort name, with a copy icon.
  • Status badges, including the cohort's Status (e.g., Approved, Revoked), Auto Updates (On/Off — indicates whether the cohort's results refresh automatically as the underlying dataset is updated), and its expiry tag.
  • The Search Query used to define the cohort (e.g., Dataset = impact).
  • The requesting user.
  • A Timeline icon to open the Access Timeline directly from this view.

Aggregate Statistics

Below the header, summary tiles show the Number of Persons and Total Records in the requested cohort. The dashboard also includes a set of distribution charts that visualise the composition of the cohort, such as:

  • Year of Birth
  • Gender Distribution
  • Race Distribution
  • Ethnicity Distribution
  • Top Conditions
  • Top Drugs (the most prevalent drug exposures in the cohort)

Age Plot and KM Survival

For approved cohorts with active data access, two additional interactive charts are available:

  • Age Plot — age distribution filterable by Type (e.g., Drug, Condition) and Value (e.g., a specific drug name).
  • KM Survival — a Kaplan-Meier survival curve, filterable by the same Type and Value selectors, with Upper CI, Lower CI, and All series. Use the download icon to export the chart.
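
For readers unfamiliar with the KM Survival chart, the sketch below shows the standard Kaplan-Meier (product-limit) estimator on a tiny made-up sample. It is not the platform's implementation, which is not described here. The function name `kaplan_meier` and the sample values are assumptions, and the Upper CI and Lower CI series are not computed.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each person.
    events: True if the event was observed, False if the person was censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # People censored at time t still count as at risk for events at t.
        observed = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up times (for example, in months) and event indicators.
durations = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(durations, events)
```

For this sample the estimated survival probability steps down to roughly 0.8 at time 2, 0.6 at time 3, and 0.3 at time 5. The censored observations at times 3 and 8 reduce the number at risk without producing a step.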

Cohort Table

For approved cohorts, a Person Table lists individual person-level records that were not previously available to the user.

Each column has a search field to filter the table. Use the Person / Specimen dropdown to switch the table between person-level and specimen-level views. Use the filter and download icons to refine and export the table data.

Note: For cohorts with Revoked status, the Cohort Summary dashboard may display its charts and tables as empty placeholders, reflecting that the underlying data is no longer accessible.


What's Next

  • Metadata — The quality and consistency of metadata directly affects how discoverable and queryable datasets and cohorts are. Configure validation rules to improve data findability.
  • Requests — New cohort and dataset access requests are reviewed and actioned here. The Cohorts tab in Datasets reflects the resulting status and provides ongoing oversight, including revocation.
  • Audit Logs — Review access events for datasets and cohorts, including approvals and revocations, to support compliance and data governance reporting.