Discovering Data on Quark
Overview
Quark's Trusted Research Environment (TRE) provides researchers with a Data Catalog: a browsable collection of clinicogenomic data assets available on the platform. The Data Catalog enables researchers to explore, evaluate, and qualify datasets before submitting a data access request, so that time and resources are invested only in datasets that are relevant to a project's requirements.
Quark's TREs are designed to keep patient data confidential at all times while keeping it Findable, Accessible, Interoperable, and Reusable for research, in accordance with the FAIR data principles. To support this, the Data Catalog gives researchers access to aggregate dataset statistics before any data access request is approved, so they can understand a dataset's scope without individual-level data being exposed.
Browsing the Data Catalog
The Data Catalog displays all datasets that are available to you on the platform. To browse the catalog:
- Log in to your TRE using your registered credentials.
- Select Datasets from the Navigation Menu on the left.
- Navigate to the Catalog tab. A list of available datasets will be displayed.

If you are looking for a specific dataset uploaded by your Project Administrator, use the Search bar at the top of the catalog to type in the name of the target dataset.
Viewing the Dataset Summary Dashboard
Each dataset in the catalog has an associated Dataset Summary dashboard. The dashboard presents aggregate statistics and visualisations of how the data is distributed, so researchers can judge how relevant a dataset is to their project requirements without individual-level data being exposed.
To view a Dataset Summary:
- Under the Catalog tab, select a dataset of interest from the list.
- This opens the Dataset Summary dashboard, which displays the metadata overview of the dataset.

What the Dataset Summary Includes
The Dataset Summary dashboard provides charts and metrics that illustrate the size and distribution of a dataset. Typical metadata fields displayed include:
- Year of Birth — Distribution of patient birth years within the dataset.
- Gender — Breakdown of patient gender demographics.
- Race — Racial composition of the dataset population.
- Ethnicity — Ethnic composition of the dataset population.
- Conditions — Prevalence of medical conditions within the dataset.
- Drugs — Distribution of drug prescriptions or exposures.
Hover over individual visualisations to view additional details and exact counts.

Evaluating Dataset Feasibility
The size and distribution metrics in a Dataset Summary let researchers check whether a potential cohort is feasible before submitting an access request. Confirming up front that the target dataset contains a viable population saves time and resources.
Key questions the Dataset Summary can help answer:
- Does the dataset contain a sufficient number of patients matching the study's demographic criteria?
- Are the relevant medical conditions and drug exposures represented in the dataset?
- Is the dataset large enough to produce statistically meaningful results?
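The questions above can be turned into a simple checklist. The following is a minimal sketch, not a Quark feature: the class, the function, and all counts are hypothetical, and the counts are assumed to have been read by hand from a Dataset Summary dashboard.

```python
from dataclasses import dataclass


@dataclass
class FeasibilityCriterion:
    """One study requirement compared against a dashboard count (hypothetical helper)."""
    label: str      # human-readable description of the requirement
    observed: int   # aggregate count read from the Dataset Summary
    required: int   # minimum count the study protocol needs

    @property
    def met(self) -> bool:
        return self.observed >= self.required


def assess_feasibility(criteria):
    """Return (overall_ok, labels of criteria that fall short)."""
    unmet = [c.label for c in criteria if not c.met]
    return (not unmet, unmet)


# Hypothetical counts; replace with the figures shown on the dashboard.
criteria = [
    FeasibilityCriterion("Female patients born 1950-1970", observed=1840, required=500),
    FeasibilityCriterion("Patients with the target condition", observed=320, required=400),
    FeasibilityCriterion("Patients exposed to the target drug", observed=950, required=300),
]

feasible, unmet = assess_feasibility(criteria)
print("Feasible:", feasible)
for label in unmet:
    print("Shortfall:", label)
```

Because each dashboard count describes one field at a time, it is only an upper bound on any cohort that combines several criteria. A check like this can rule a dataset out early, but it cannot confirm the size of the combined cohort.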
Understanding OMOP CDM
Quark uses the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) to enable a systematic data management and analysis workflow. Under OMOP, raw clinical data is transformed into standardised structures (tables) and standardised content (vocabularies such as SNOMED and RxNorm).
This standardisation provides the following benefits:
- Day 1 Research Analysis — Since complex clinical data is available in a standardised, retrievable format, researchers can begin their analysis immediately upon gaining access.
- Reproducibility — By leveraging OMOP CDM, researchers can easily reproduce their analytical workflows across different workstations and environments.
- Federated Research and Analysis — Researchers may run federated studies across multiple TREs using a single codebase to extract data.
Because every dataset shares the same structure, researchers can work with their data effectively without compromising patient privacy, which supports secure and collaborative research.
On Quark, OMOP-standardised tables become accessible to researchers once their data access request is approved. These tables contain de-identified person-level data.
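To illustrate what working with OMOP-standardised tables can look like, the sketch below counts distinct patients with a given condition, grouped by gender. It rests on several assumptions: an in-memory SQLite database stands in for the approved dataset (how Quark exposes the tables to a workstation is not described here), the function name is hypothetical, and only a few columns of the OMOP person and condition_occurrence tables are created. The condition concept IDs are illustrative values and should be confirmed against the OMOP vocabulary before use.

```python
import sqlite3


def count_patients_with_condition(conn, condition_concept_id):
    """Count distinct persons with a condition, keyed by gender concept ID."""
    query = """
        SELECT p.gender_concept_id, COUNT(DISTINCT p.person_id) AS n_patients
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ?
        GROUP BY p.gender_concept_id
        ORDER BY p.gender_concept_id
    """
    return dict(conn.execute(query, (condition_concept_id,)).fetchall())


# Stand-in database with a reduced set of OMOP columns and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER
    );
    -- 8507 and 8532 are the standard OMOP gender concepts for male and female.
    INSERT INTO person VALUES (1, 8507, 1960), (2, 8532, 1972), (3, 8532, 1985);
    -- Person 2 has two records of the same condition and is counted once.
    INSERT INTO condition_occurrence VALUES
        (10, 1, 201826), (11, 2, 201826), (12, 2, 201826), (13, 3, 316866);
""")

counts = count_patients_with_condition(conn, 201826)
print(counts)
```

Since the query refers only to standard OMOP table and column names, the same function could in principle be run unchanged against any OMOP-conformant database, which is what makes a single codebase reusable across environments.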
Further Reading: Researchers may find more information on how the OMOP CDM is used to facilitate data retrieval and analysis in the OMOP CDM documentation.
What's Next
Once you have explored the Data Catalog and identified datasets that are relevant to your research:
- Request access to an entire dataset — See Requesting Data Access to submit a dataset access request.
- Build a custom cohort — See Building a Cohort to define a subset of data using the Query Builder.