Discovering Data on Quark

Overview

Quark's Trusted Research Environment provides researchers with a Data Catalog — a browsable collection of clinicogenomic data assets available on the platform. The Data Catalog enables researchers to explore, evaluate, and qualify datasets before submitting a data access request, ensuring that time and resources are invested only in datasets that are relevant to a project's requirements.

Quark's TREs are designed to keep patient data confidential at all times while keeping it Findable, Accessible, Interoperable, and Reusable for research, in accordance with the FAIR data principles. To support this, the Data Catalog gives researchers aggregate dataset statistics before access is granted, so they can understand a dataset's scope without individual-level data being exposed.

Browsing the Data Catalog

The Data Catalog displays all datasets that are available to you on the platform. To browse the catalog:

Log in to your TRE using your registered credentials.
Select Datasets from the Navigation Menu on the left.
Navigate to the Catalog tab. A list of available datasets will be displayed.

Data Catalog listing available datasets

If you are looking for a specific dataset uploaded by your Project Administrator, enter its name in the Search bar at the top of the catalog.

Viewing the Dataset Summary Dashboard

Each dataset in the catalog has an associated Dataset Summary dashboard. The dashboard presents aggregate statistics and visualisations of how the data is distributed, so researchers can judge how relevant a dataset is to their project requirements without individual-level data being exposed.

To view a Dataset Summary:

Under the Catalog tab, select a dataset of interest from the list.
This opens the Dataset Summary dashboard, which displays the metadata overview of the dataset.

Dataset Summary dashboard showing aggregate statistics

What the Dataset Summary Includes

The Dataset Summary dashboard provides charts and metrics that illustrate the size and distribution of a dataset. Typical metadata fields displayed include:

Year of Birth — Distribution of patient birth years within the dataset.
Gender — Breakdown of patient gender demographics.
Race — Racial composition of the dataset population.
Ethnicity — Ethnic composition of the dataset population.
Conditions — Prevalence of medical conditions within the dataset.
Drugs — Distribution of drug prescriptions or exposures.

Hover over individual visualisations to view additional details and exact counts.

Hovering over Dataset Summary visualisations for details

Evaluating Dataset Feasibility

The charts and metrics in a Dataset Summary let researchers check whether a potential cohort is feasible before submitting an access request. Confirming up front that the target dataset contains a viable population saves time and resources.

Key questions the Dataset Summary can help answer:

Does the dataset contain a sufficient number of patients matching the study's demographic criteria?
Are the relevant medical conditions and drug exposures represented in the dataset?
Is the dataset large enough to produce statistically meaningful results?
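
The questions above can be worked through by hand, but a small script makes the reasoning explicit. The sketch below is illustrative only: it assumes the counts have been transcribed manually from a Dataset Summary, and `SummaryCounts`, `cohort_upper_bound`, `may_be_feasible` and all of the numbers are hypothetical and not part of the Quark platform.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SummaryCounts:
    """Aggregate counts copied by hand from a Dataset Summary (hypothetical values)."""
    total_patients: int
    by_gender: Dict[str, int]
    by_condition: Dict[str, int]


def cohort_upper_bound(summary, gender, condition):
    """Largest cohort the aggregates allow for one gender and one condition.

    Aggregates do not show how the two groups overlap, so the real cohort
    can be smaller, but never larger, than the smaller of the two counts.
    """
    return min(
        summary.by_gender.get(gender, 0),
        summary.by_condition.get(condition, 0),
    )


def may_be_feasible(summary, gender, condition, min_patients):
    """True if the aggregates do not rule out a cohort of min_patients."""
    return cohort_upper_bound(summary, gender, condition) >= min_patients


# Hypothetical counts for illustration only.
summary = SummaryCounts(
    total_patients=1000,
    by_gender={"Female": 520, "Male": 480},
    by_condition={"Type 2 diabetes": 140},
)
bound = cohort_upper_bound(summary, "Female", "Type 2 diabetes")
```

A `True` result from `may_be_feasible` means only that the cohort is not ruled out. The actual overlap between criteria can be confirmed only after access is granted.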

Understanding OMOP CDM

Quark uses the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) to enable a systematic data management and analysis workflow. Under the OMOP CDM, raw clinical data is transformed into standardised structures (tables) and standardised content (vocabularies such as SNOMED CT and RxNorm).

This standardisation provides the following benefits:

Day 1 Research Analysis — Since complex clinical data is available in a standardised, retrievable format, researchers can begin their analysis immediately upon gaining access.
Reproducibility — By leveraging OMOP CDM, researchers can easily reproduce their analytical workflows across different workstations and environments.
Federated Research and Analysis — Researchers may run federated studies across multiple TREs using a single codebase to extract data.
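
The "single codebase" idea behind federated analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how Quark runs federated studies: two in-memory SQLite databases stand in for two TREs, `make_site` and `count_patients_with_condition` are hypothetical helpers, and the rows are toy data. The table and column names (`condition_occurrence`, `person_id`, `condition_concept_id`) follow the OMOP CDM.

```python
import sqlite3

# Concept ID used for illustration (Type 2 diabetes mellitus in the OMOP
# standard vocabulary); confirm any ID against the vocabulary tables provided.
T2DM_CONCEPT_ID = 201826


def make_site(rows):
    """Build an in-memory stand-in for one site's condition_occurrence table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "person_id INTEGER, condition_concept_id INTEGER)"
    )
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?)", rows)
    return conn


def count_patients_with_condition(conn, concept_id):
    """Count distinct patients with a condition; only this aggregate is returned."""
    cur = conn.execute(
        "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (concept_id,),
    )
    return cur.fetchone()[0]


# Two hypothetical sites with toy rows; 316866 is a different condition concept.
site_a = make_site([(1, 201826), (1, 201826), (2, 201826), (3, 316866)])
site_b = make_site([(10, 201826), (11, 316866)])

# The same function runs unchanged at each site because the schema is shared.
site_counts = {
    "site_a": count_patients_with_condition(site_a, T2DM_CONCEPT_ID),
    "site_b": count_patients_with_condition(site_b, T2DM_CONCEPT_ID),
}
total = sum(site_counts.values())
```

Because both sites use the same table structure and concept IDs, the query needs no site-specific changes, and only the per-site counts are combined.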

Combined with the TRE's access controls, OMOP standardisation lets researchers work with their data effectively without compromising patient privacy, which facilitates secure and collaborative research.

On Quark, OMOP-standardised tables become accessible to researchers once their data access request is approved. These tables contain de-identified person-level data.
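
To give a sense of what working with these tables can look like, the sketch below joins two OMOP CDM tables, `person` and `drug_exposure`, to list the de-identified patients exposed to a given drug. It rests on several assumptions: SQLite stands in for whatever database engine the TRE provides, the rows are toy data, `persons_exposed_to` is a hypothetical helper, and the concept IDs are illustrative and should be checked against the vocabulary tables in your environment.

```python
import sqlite3

# Illustrative drug concept ID (metformin in the OMOP standard vocabulary).
METFORMIN_CONCEPT_ID = 1503297

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth INTEGER
    );
    CREATE TABLE drug_exposure (
        drug_exposure_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        drug_concept_id INTEGER
    );
    -- Toy rows: 8532 and 8507 are the OMOP gender concepts for female and male.
    INSERT INTO person VALUES (1, 8532, 1960), (2, 8507, 1975), (3, 8532, 1982);
    INSERT INTO drug_exposure VALUES
        (100, 1, 1503297), (101, 1, 1503297), (102, 3, 1503297), (103, 2, 1112807);
    """
)


def persons_exposed_to(conn, drug_concept_id):
    """Return one (person_id, gender_concept_id, year_of_birth) row per exposed patient."""
    return conn.execute(
        "SELECT DISTINCT p.person_id, p.gender_concept_id, p.year_of_birth "
        "FROM person AS p "
        "JOIN drug_exposure AS d ON d.person_id = p.person_id "
        "WHERE d.drug_concept_id = ? "
        "ORDER BY p.person_id",
        (drug_concept_id,),
    ).fetchall()


exposed = persons_exposed_to(conn, METFORMIN_CONCEPT_ID)
```

`DISTINCT` keeps a patient with several exposure records from appearing more than once in the result.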

Further Reading: Researchers may find more information on how the OMOP CDM is used to facilitate data retrieval and analysis in the OMOP CDM documentation.

What's Next

Once you have explored the Data Catalog and identified datasets that are relevant to your research:

Request access to an entire dataset — See Requesting Data Access to submit a dataset access request.
Build a custom cohort — See Building a Cohort to define a subset of data using the Query Builder.