Publishing a Dataset

Overview

Publishing a dataset makes it visible in the TRE Data Catalog, where researchers can browse its summary and submit data access requests. Datasets must be uploaded to a connected cloud storage location before they can be published — this guide assumes the upload step is complete.

Steps to Publish a Dataset

  1. Navigate to Datasets in the left-hand navigation pane.

    The Datasets dashboard displays all datasets previously published, along with their status and connected cloud accounts.

    Screenshot: Datasets dashboard listing previously published datasets with status and connected cloud accounts

  2. Click Publish Dataset in the top-right corner of the dashboard.

  3. In the Publish Dataset window, fill in the following details:

    Screenshot: Publish Dataset form showing Name, Summary, Tags, Cloud Account, and Dataset fields

    • Name (mandatory) — A unique, human-readable name for the dataset as it will appear in the Data Catalog.
    • Summary (mandatory) — A short description of the dataset's contents, intended audience, and any relevant context researchers should know before requesting access.
    • Tags (optional) — Add searchable metadata as Key and Value pairs (for example, disease: oncology or cohort-size: 5000). Tags improve discoverability in the catalog.
    • Cloud Account — Select the cloud account where the dataset is stored, from the accounts registered under your platform's infrastructure configuration (for example, your organisation's primary account, or a public registry such as AWS iGenomes for reference genomes and shared resources).
    • Dataset — Once a Cloud Account is connected, this dropdown is populated with datasets available in that account. Select the dataset you want to publish.
    • Data Access Committee (DAC) — Select one or more administrators from the dropdown list. These individuals will be responsible for reviewing and approving (or rejecting) any data access requests submitted by TRE users for this dataset.

    Screenshot: Publish Dataset form with the Data Access Committee dropdown selected

  4. Review your entries and click Create.

    Screenshot: Final review screen before publishing the dataset

    The dataset now appears on the Datasets dashboard and becomes discoverable to researchers in the TRE Data Catalog.

  5. To review the dataset's attributes, click the published dataset in the dashboard. This opens the Dataset Summary dashboard, which shows visualisations of the demographics and aggregate statistics of the uploaded dataset. This is the same summary view researchers see when evaluating the dataset. You also have access to the underlying record-level data (Person and Specimen tables), which remains hidden from researchers until the Data Access Committee approves their request.

    Screenshot: Dataset Summary dashboard showing demographic charts and aggregate statistics

    Screenshot: Dataset Summary dashboard, additional view
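As a pre-submission checklist, the fields from step 3 can be sketched as a small data model. This is an illustration only and not a platform API: the class name `PublishDatasetForm`, the function `validate`, the field names, and the sample values are all hypothetical, and the guide does not describe any programmatic interface for publishing.

```python
from dataclasses import dataclass, field


@dataclass
class PublishDatasetForm:
    """Hypothetical model of the Publish Dataset form fields (not a real API)."""
    name: str                 # mandatory, shown in the Data Catalog
    summary: str              # mandatory, short description for researchers
    cloud_account: str        # account where the dataset is stored
    dataset: str              # chosen from the connected account's datasets
    dac_members: list[str]    # Data Access Committee reviewers
    tags: dict[str, str] = field(default_factory=dict)  # optional Key/Value pairs


def validate(form: PublishDatasetForm) -> list[str]:
    """Return a list of problems; an empty list means the form looks complete."""
    problems = []
    if not form.name.strip():
        problems.append("Name is mandatory")
    if not form.summary.strip():
        problems.append("Summary is mandatory")
    if not form.cloud_account:
        # The Dataset dropdown is only populated once an account is connected.
        problems.append("Select a Cloud Account before choosing a Dataset")
    elif not form.dataset:
        problems.append("Select a Dataset from the connected Cloud Account")
    if not form.dac_members:
        problems.append("Assign at least one Data Access Committee member")
    return problems


# Illustrative values only; the tags reuse the examples from step 3.
form = PublishDatasetForm(
    name="Example oncology cohort",
    summary="De-identified oncology records for cohort discovery.",
    cloud_account="primary-account",
    dataset="example-dataset",
    dac_members=["admin-1"],
    tags={"disease": "oncology", "cohort-size": "5000"},
)
print(validate(form))
```

An empty list from `validate` corresponds to a form that is ready for the Create button in step 4.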

Additional Notes for Datasets

  • The Dataset dropdown remains disabled until a Cloud Account is successfully connected.
  • At least one administrator must be assigned to the Data Access Committee — access requests cannot be processed without an assigned reviewer.
  • Dataset tables (such as Person tables) are not visible to researchers before they're granted access by the Data Access Committee. Summary statistics and visualisations remain viewable so that researchers can create data subsets or cohorts that match their project requirements.
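The visibility rule in the last note can be sketched as a simple two-tier check. This is a minimal illustration, assuming a hypothetical `RequestStatus` enum and `visible_views` function; the status names are invented for the example and the platform's actual access-control implementation is not described in this guide.

```python
from enum import Enum


class RequestStatus(Enum):
    """Hypothetical states of a researcher's data access request."""
    NONE = "none"          # no request submitted
    PENDING = "pending"    # awaiting Data Access Committee review
    APPROVED = "approved"
    REJECTED = "rejected"


# Views named in this guide; grouping them into two sets is an illustration.
SUMMARY_VIEWS = {"summary statistics", "visualisations"}
RECORD_LEVEL_VIEWS = {"Person table", "Specimen table"}


def visible_views(status: RequestStatus) -> set[str]:
    """Return the views a researcher can see for a published dataset."""
    views = set(SUMMARY_VIEWS)  # always viewable, so cohorts can be planned
    if status is RequestStatus.APPROVED:
        views |= RECORD_LEVEL_VIEWS  # unlocked only after DAC approval
    return views


print(sorted(visible_views(RequestStatus.PENDING)))
print(sorted(visible_views(RequestStatus.APPROVED)))
```

In this sketch, every status other than approved yields the summary tier only, which mirrors the note that record-level tables stay hidden until the Data Access Committee grants access.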

What's Next

  • Workstation Templates — configure workstation templates that researchers can select and launch once they have data access.
  • Managing Cohort Access — once a dataset is published, ds-admins and admins oversee the cohort and access requests built against it.