The dataset
The first open-access, harmonized multi-machine benchmark for fusion equilibrium — curated for Thomson-diagnostic availability, feature completeness, and EFIT-reconstruction quality.
The challenge releases 9,113 DIII-D shots and 2,416 MAST shots
(~11,500 total, ~35 GB), each filtered so that the inputs and EFIT ground truth a model needs are
present and of reconstruction quality. The full corpus is hosted on Hugging Face; the
datasets library supports streaming.
Every shot is one row of a Parquet file. Each time series and profile is a nested array
inside that single row — so df['efit_psirz'].iloc[0] is a list of 2D flux grids, not a
column of scalars. Always take .iloc[0], then index into the nested array.
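As a minimal sketch of that access pattern, the snippet below builds a synthetic one-row frame in memory and unpacks it. The column names efit_psirz and efit_times come from the text; the shapes, the values, and the to_dense helper are illustrative assumptions, not part of the starter kit.

```python
import numpy as np
import pandas as pd


def to_dense(cell):
    """Turn a nested cell (lists, or object arrays of arrays) into one float array."""
    arr = np.asarray(cell)
    if arr.dtype == object:
        # Some Parquet readers return arrays of arrays; stack them level by level.
        arr = np.stack([to_dense(item) for item in arr])
    return arr.astype(np.float64)


# Synthetic stand-in for one shot: a single row whose cells hold nested arrays.
n_slices = 4
psirz = np.random.default_rng(0).normal(size=(n_slices, 65, 65))
df = pd.DataFrame({
    "efit_times": [np.linspace(1.0, 2.0, n_slices).tolist()],
    "efit_psirz": [psirz.tolist()],
})

# Take the single row first, then index into the nested array.
times = to_dense(df["efit_times"].iloc[0])   # shape (n_slices,)
psi = to_dense(df["efit_psirz"].iloc[0])     # shape (n_slices, 65, 65)
first_grid = psi[0]                          # one 2D flux map
```

Converting once to a dense array keeps later slicing cheap. On a real shot file, the same two .iloc[0] lines would apply after reading the Parquet file into a DataFrame.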
Six sample shots (3 DIII-D, 3 MAST) ship in the starter kit so you can explore the schema immediately, before the full release.
What differs between DIII-D and MAST
The differences below are load-bearing for any code touching the data.
| | DIII-D (conventional) | MAST (spherical) |
|---|---|---|
| Type | Conventional tokamak (D-shaped) | Spherical tokamak (low aspect ratio) |
| Facility | DIII-D · General Atomics, San Diego | MAST · UKAEA Culham, UK |
| Shots (challenge set) | 9,113 | 2,416 |
| Target flux grid | 65 × 65 | 65 × 129 (~50% NaN central column) |
| Shaping coils | 18 F-coils (F1A–F9B) + ECOILA, bcoil | 10 P-coils (P2L–P6U) + sol, tf, efps |
| Magnetics time base | Per-signal (~49k samples) | Single shared (~15k samples) |
| Typical flux value | ≈ −0.25 V·s/rad | ≈ +0.05 V·s/rad |
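Two rows of this table affect evaluation code directly: the grid shape differs per machine, and roughly half of each MAST grid is NaN. The sketch below shows one way to score a prediction on the valid cells only. The masked_mse helper, the GRID_SHAPE lookup, and the position of the NaN region in the toy grid are assumptions for illustration; the real NaN layout should be read from the data.

```python
import numpy as np

# Target grid shapes as listed in the table above.
GRID_SHAPE = {"DIII-D": (65, 65), "MAST": (65, 129)}


def masked_mse(pred, target):
    """Mean squared error over the cells where the target is finite."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    valid = np.isfinite(target)
    if not valid.any():
        raise ValueError("target has no finite cells")
    diff = pred[valid] - target[valid]
    return float(np.mean(diff ** 2))


# Toy MAST-shaped target; the NaN block position here is made up.
target = np.full(GRID_SHAPE["MAST"], 0.05)
target[:32, :] = np.nan
pred = np.full(GRID_SHAPE["MAST"], 0.06)

naive = np.mean((pred - target) ** 2)   # NaN propagates through a plain mean
err = masked_mse(pred, target)          # finite, computed on valid cells only
```

A plain mean over the grid returns NaN as soon as one cell is NaN, which is why the mask is taken from the target rather than from the prediction.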
Signals & targets
Target — EFIT ψ(R,Z)
Predict this: efit_psirz, a sequence of poloidal flux maps (one per efit_times slice). This is the ground
truth your model reconstructs, kept exactly as EFIT produced it.
Magnetics — coil currents
Input: dsep X-point gap.
Sampled at tens of kHz on per-signal (DIII-D) or shared (MAST) time bases.
Thomson scattering
Input: Te (eV) and density ne (m⁻³) profiles, on their own time bases.
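Because magnetics and Thomson signals sit on their own time bases while the target has one flux map per efit_times slice, inputs usually need to be brought onto the target's time axis. The sketch below does this with linear interpolation. That choice, the resample_to helper, and the synthetic signal are assumptions for illustration, not the challenge's prescribed method.

```python
import numpy as np


def resample_to(target_times, signal_times, signal_values):
    """Linearly interpolate one input signal onto target_times."""
    t = np.asarray(signal_times, dtype=np.float64)
    v = np.asarray(signal_values, dtype=np.float64)
    order = np.argsort(t)  # np.interp requires increasing sample points
    return np.interp(target_times, t[order], v[order])


# Synthetic example: a fast input signal and a sparse EFIT time axis.
efit_times = np.linspace(1.0, 2.0, 5)
coil_t = np.linspace(0.0, 3.0, 3001)
coil_v = 2.0 * coil_t + 1.0

# Only the input moves; the target stays on efit_times untouched.
aligned = resample_to(efit_times, coil_t, coil_v)
```

np.interp holds the edge value for times outside the sampled range, so slices that fall before or after a signal's coverage may be better dropped than extrapolated.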
Modeling rules the data treats as load-bearing
Align time bases — but only the inputs
Split by shot, never by timestep
Normalize your inputs
Compress the target
NaN by design — not missing data.
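The "split by shot, never by timestep" rule can be sketched as follows: partition the shot identifiers first, then assign every time slice of a shot to that shot's side. The split_by_shot helper, the validation fraction, and the identifiers below are made up for illustration; the identifiers only follow the naming pattern used by the dataset.

```python
import random


def split_by_shot(shot_ids, val_fraction=0.2, seed=0):
    """Return disjoint (train, val) sets of shot identifiers."""
    unique = sorted(set(shot_ids))        # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = max(1, round(len(unique) * val_fraction))
    return set(unique[n_val:]), set(unique[:n_val])


# Hypothetical identifiers, one entry per time slice (shots repeat).
slice_shots = [f"DIII-D_{900000 + i // 3}" for i in range(24)]
slice_shots += [f"MAST_{90000 + i // 3}" for i in range(6)]

train_shots, val_shots = split_by_shot(slice_shots)

# Every slice inherits the side of its shot, so no shot straddles the split.
train_idx = [i for i, s in enumerate(slice_shots) if s in train_shots]
val_idx = [i for i, s in enumerate(slice_shots) if s in val_shots]
```

Neighbouring time slices within a shot are highly correlated, so splitting at the slice level would leak near-duplicates of training samples into validation.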
Open by design
License
Format
datasets.
Identifiers: DIII-D_182494, MAST_25607.