Infrastructure Administrator Guide

Role Overview

The Infrastructure Administrator (Infra Admin) is responsible for provisioning and maintaining the underlying infrastructure that powers the Quark platform — configuring the cloud accounts, compute environments, clusters, datasets, and workstation templates that all other roles depend on.

Navigation: When you are logged in as an Infrastructure Administrator, the left-hand navigation pane provides access to all sections listed below.

Menu Item	Description
Datasets	Publish datasets to the platform catalog, manage their metadata and access committees, and review the aggregate statistics of published data.
Computes	Create and manage compute configurations — cluster, instance types, capacity type, and workload permissions — that back pipelines, workstations, and other platform workloads.
Providers	Connect and manage cloud provider accounts (e.g., AWS) and version control integrations (e.g., Git via SSH key) that the platform draws on for compute and storage.
Domains	Configure the network domains used by the platform, including ingress and routing rules for platform services.
Workstation Templates	Define reusable workstation configurations — operating system, instance specifications, AMI, VPC, and subnet — that researchers can select and launch on demand.
Clusters	Create and manage the Kubernetes clusters (e.g., Amazon EKS) that underpin platform workloads, including node pools, load balancers, and observability configuration.
Cloud Resources	View and manage the cloud resources provisioned by the platform, providing visibility into the infrastructure footprint across accounts and regions.
Reference Data	Manage shared reference datasets — such as reference genomes — that pipelines and workstations draw on without requiring per-user copies.
Settings	Configure platform-wide infrastructure settings, including budget alert thresholds, enforcement behaviour, and other global parameters.

Core Responsibilities

Cloud and Cluster Provisioning

The Infrastructure Administrator is responsible for connecting the cloud accounts the platform runs on and maintaining the clusters that execute platform workloads. This includes setting up Providers (cloud and Git integrations), creating Clusters (Kubernetes infrastructure), and configuring Domains (network routing).

Compute Configuration

Every pipeline run, workstation session, and visualisation app is backed by a compute configuration. The Infrastructure Administrator creates these configurations in Computes, defining which cluster, instance types, capacity type (on-demand or spot), and workload categories are permitted. Compute configurations are scoped to specific projects, so researchers see and consume only what has been allocated to their projects.
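To illustrate how project scoping narrows what researchers see, the following Python sketch models a compute configuration as plain data. The class, field names, and sample values are hypothetical and are not the platform's API or schema; they only mirror the attributes described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Tuple


class CapacityType(Enum):
    ON_DEMAND = "on-demand"
    SPOT = "spot"


@dataclass(frozen=True)
class ComputeConfig:
    """Hypothetical model of a compute configuration (illustrative field names)."""
    name: str
    cluster: str
    instance_types: Tuple[str, ...]
    capacity_type: CapacityType
    workload_categories: FrozenSet[str]  # e.g. "pipeline", "workstation"
    projects: FrozenSet[str]             # projects this configuration is scoped to


def visible_computes(configs: List[ComputeConfig], project: str, workload: str) -> List[ComputeConfig]:
    """Return the configurations a project may use for one workload category."""
    return [
        c for c in configs
        if project in c.projects and workload in c.workload_categories
    ]


# Sample data: names and values are made up for the example.
configs = [
    ComputeConfig(
        name="pipelines-spot",
        cluster="cluster-a",
        instance_types=("m5.xlarge",),
        capacity_type=CapacityType.SPOT,
        workload_categories=frozenset({"pipeline"}),
        projects=frozenset({"project-alpha"}),
    ),
    ComputeConfig(
        name="workstations",
        cluster="cluster-a",
        instance_types=("m5.2xlarge",),
        capacity_type=CapacityType.ON_DEMAND,
        workload_categories=frozenset({"workstation", "visualisation"}),
        projects=frozenset({"project-alpha", "project-beta"}),
    ),
]

names = [c.name for c in visible_computes(configs, "project-beta", "workstation")]
print(names)
```

In this sketch, a researcher in project-beta sees only the "workstations" configuration, because the spot pipeline configuration is scoped to project-alpha alone.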

Dataset Publishing

Datasets are first uploaded to cloud storage outside the platform, then published through the Datasets section to make them discoverable in the catalog. The Infrastructure Administrator controls which datasets appear on the platform and can review their aggregate statistics.

Workstation Template Management

Researchers launch workstations by selecting from a menu of templates. The Infrastructure Administrator creates and maintains these templates in Workstation Templates — configuring the operating system, AMI, instance specifications, VPC, subnet, and project scope for each. Well-designed templates reduce researcher friction and enforce infrastructure standards consistently.
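A template can be thought of as a small record that is checked before it is offered to researchers. The Python sketch below is a hypothetical pre-flight check, not the platform's own validation: the class and field names are assumptions, and the only rules applied are the standard AWS identifier prefixes (ami-, vpc-, subnet-) and the requirement that a template has a project scope.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class WorkstationTemplate:
    """Hypothetical model of a workstation template (illustrative field names)."""
    name: str
    operating_system: str
    instance_type: str
    ami_id: str
    vpc_id: str
    subnet_id: str
    projects: Tuple[str, ...]


def template_problems(t: WorkstationTemplate) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    checks = (
        ("AMI", t.ami_id, "ami-"),
        ("VPC", t.vpc_id, "vpc-"),
        ("subnet", t.subnet_id, "subnet-"),
    )
    for label, value, prefix in checks:
        if not value.startswith(prefix):
            problems.append(f"{label} ID {value!r} should start with {prefix!r}")
    if not t.projects:
        problems.append("template is not scoped to any project")
    return problems


# Sample data: all identifiers below are placeholders.
good = WorkstationTemplate(
    name="linux-small",
    operating_system="Ubuntu 22.04",
    instance_type="m5.xlarge",
    ami_id="ami-0123456789abcdef0",
    vpc_id="vpc-0123456789abcdef0",
    subnet_id="subnet-0123456789abcdef0",
    projects=("project-alpha",),
)
bad = replace(good, ami_id="image-1", projects=())

print(template_problems(good))
print(template_problems(bad))
```

Running a check of this kind before saving a template is one way to catch a mistyped identifier or a missing project scope early, rather than when a researcher tries to launch a workstation.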

Reference Data and Shared Resources

Shared reference data — such as reference genomes used across multiple pipeline runs — is managed via Reference Data. Centralising these resources avoids redundant copies, reduces storage cost, and ensures all pipelines use a consistent, approved version.

How the Infrastructure Administrator Role Fits with Other Roles

The Administrator depends on the Infrastructure Administrator to make compute, clusters, and datasets available before the Administrator can create projects, allocate budgets, and onboard users.
The DS Administrator consumes the infrastructure the Infra Admin provisions — using compute configurations to back workstations, and selecting workstation templates when provisioning environments for researchers.
Researchers interact with infrastructure only indirectly, through the workstations they launch, the pipelines they run, and the datasets they query. The quality and suitability of the infrastructure provisioned here directly determine their experience.

Suggested Starting Point

1. Connect cloud accounts and Git integrations (Providers)
2. Create the underlying cluster infrastructure (Clusters)
3. Configure network domains (Domains)
4. Create compute configurations for the projects that will run (Computes)
5. Publish the datasets researchers need to access (Datasets)
6. Create workstation templates appropriate for each research use case (Workstation Templates)
7. Add any shared reference data the pipelines rely on (Reference Data)
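The suggested order can also be tracked as a simple checklist. The Python sketch below is a hypothetical helper, not a platform feature: it lists the sections in the order given above and reports the first one not yet completed.

```python
from typing import Iterable, Optional

# Sections in the suggested order from this guide.
SETUP_ORDER = [
    "Providers",
    "Clusters",
    "Domains",
    "Computes",
    "Datasets",
    "Workstation Templates",
    "Reference Data",
]


def next_section(completed: Iterable[str]) -> Optional[str]:
    """Return the earliest section not yet completed, or None when all are done."""
    done = set(completed)
    for section in SETUP_ORDER:
        if section not in done:
            return section
    return None


print(next_section(["Providers", "Clusters"]))  # Domains
```

Because the helper always returns the earliest unfinished section, a skipped step such as Providers is surfaced before any later one.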