Testing
Testing philosophy
SPECTRAX-GK enforces high coverage on critical solver modules and requires physics-based checks for each numerical component. The test suite is designed to be:
pedagogic: each test explains the concept being validated
deterministic: no stochastic outcomes or tolerance drift
future-proof: targeted at invariants and well-posed regressions
Current testing target
The package-wide target is 95% coverage, but the coverage number is a guardrail rather than the scientific objective. New tests should be accepted because they protect one of the following contracts:
an implemented equation or reduced physical limit;
a numerical method, convergence rate, or conservation/free-energy identity;
a geometry, normalization, or diagnostic convention;
a benchmark artifact and its documented fit/window policy;
an autodiff contract checked against finite differences, tangent tests, or an adjoint consistency relation;
a regression for a bug found in parity, restart, runtime, plotting, or geometry-adapter work.
Long reference-code runs and office/GPU comparisons should not be hidden inside the default local suite. They should live behind explicit manifests or CI/manual lanes so local tests remain fast enough for routine development.
The refactor branch also carries a machine-readable validation/coverage
manifest at tools/validation_coverage_manifest.toml. It is checked by
tools/check_validation_coverage_manifest.py and maps each critical module
to reference anchors, physics contracts, numerical contracts, fast tests,
tracked artifacts, and next tests. This is the working guardrail for reaching
95% package-wide coverage without adding shallow tests that do not validate the
implemented physics or numerics.
The manifest now has two levels of coverage ownership:
direct [[modules]] rows for public, high-risk, or actively refactored surfaces that need their own contracts and artifact traceability;
owned_modules entries for smaller implementation modules whose fast-test responsibility is intentionally carried by a direct row.
The checker inventories src/spectraxgk and fails if a package module is not
directly listed, owned by a listed row, or explicitly excluded as package
plumbing such as __init__.py or version metadata. This makes source
extractions fail fast until the coverage owner, fast tests, and next-test debt
are declared. New manifest tests for this policy should stay cheap and live in
tests/test_validation_coverage_manifest.py or
tests/test_refactor_coverage_*.py.
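The ownership inventory described above can be sketched as follows. This is a minimal illustration, not the logic of tools/check_validation_coverage_manifest.py: the helper name unowned_modules, its arguments, and the exclusion list are assumptions made for the example.

```python
from pathlib import Path


def unowned_modules(package_dir, direct, owned, excluded_names=("__init__.py",)):
    """List package modules with no declared coverage owner (hypothetical helper)."""
    package_dir = Path(package_dir)
    declared = set(direct) | set(owned)
    missing = []
    for path in sorted(package_dir.rglob("*.py")):
        rel = path.relative_to(package_dir).as_posix()
        # Package plumbing is excluded; everything else needs an owner.
        if path.name in excluded_names or rel in declared:
            continue
        missing.append(rel)
    return missing
```

A checker built this way would fail whenever the returned list is non-empty, which is what makes a fresh source extraction fail fast until its owner is declared.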
Manifest paths are intentionally concrete. fast_tests and
artifact_paths must name files, not directories or placeholder buckets, and
list fields must not repeat the same module, test, artifact, contract, or next
test. The optional Cobertura XML pass also rejects duplicate measured entries
for the same package module so coverage enforcement cannot depend on whichever
duplicate XML row happened to be parsed last.
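As a rough sketch of the concreteness rules above, a row validator could look like the following. Only the field names fast_tests and artifact_paths come from the manifest description; the function row_problems and the dictionary layout of a row are assumptions for illustration.

```python
from pathlib import Path


def row_problems(row, repo_root):
    """Return human-readable problems for one manifest row (hypothetical helper)."""
    problems = []
    root = Path(repo_root)
    for field in ("fast_tests", "artifact_paths"):
        for entry in row.get(field, []):
            # Entries must name concrete files, not directories or buckets.
            if not (root / entry).is_file():
                problems.append(f"{field}: {entry!r} is not a file")
    for field, values in row.items():
        # List fields must not repeat the same entry.
        if isinstance(values, list) and len(set(values)) != len(values):
            problems.append(f"{field}: duplicate entries")
    return problems
```

Returning a list of problems instead of raising on the first one lets a checker report every defect in a row in a single pass.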
The wide CI matrix also feeds the manifest checker with coverage-wide.xml.
That pass enforces the declared package-wide coverage target and writes the
measured summary to docs/_static/validation_coverage_manifest_summary.json.
Module-level rows in that summary are a debt map: they identify direct and
owned modules below their row target, but release blocking remains tied to the
package-wide gate unless the CI command is explicitly upgraded to
--enforce-module-coverage.
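The difference between the package-wide gate and the module-level debt map can be illustrated with a small sketch. The input layout (covered lines, total lines, row target per module) and the function name coverage_summary are assumptions, not the format of the tracked summary JSON.

```python
def coverage_summary(modules, package_target=0.95):
    """Summarize coverage; modules maps name -> (covered, total, row_target).

    Hypothetical layout for illustration only.
    """
    covered = sum(c for c, _, _ in modules.values())
    total = sum(t for _, t, _ in modules.values())
    package_rate = covered / total if total else 1.0
    # Modules below their own row target are debt, not release blockers.
    debt = sorted(
        name for name, (c, t, target) in modules.items() if t and c / t < target
    )
    return {
        "package_rate": package_rate,
        "package_passed": package_rate >= package_target,
        "module_debt": debt,
    }
```

With this split, a package can pass the 95% gate while still listing individual modules as debt, which matches the release policy described above.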
Optional external-backend artifact builders that require local vmec_jax or
booz_xform_jax checkouts are kept out of the default package-wide coverage
denominator when the public CI cannot install or execute those repositories.
Their fast contracts are still covered by mocked backend tests and low-level
geometry/numerics tests, while the real physics claims are validated by the
tracked JSON/PDF artifact gates documented below. This avoids treating
unavailable optional backends as missing unit coverage while preserving the
requirement that every differentiable-geometry claim has an explicit
finite-difference or parity artifact.
Test categories
Basis tests: orthonormality and recurrence checks.
Operator tests: Hermite ladder streaming and mode extraction.
Benchmark tests: loading reference data and growth-rate fitting.
Physics sanity checks: conservation properties under simplified limits.
Response-function tests: zonal-flow residuals, GAM damping, and late-time envelopes.
Spectral tests: fluctuation spectra and windowed nonlinear statistics.
Autodiff tests: tangent, finite-difference, and inverse/UQ consistency.
Unit tests (numerical invariants)
Representative unit checks include:
Hermite/Laguerre ladder identities: spectraxgk.linear.apply_hermite_v(), spectraxgk.linear.apply_laguerre_x().
Quasineutrality consistency: spectraxgk.linear.quasineutrality_phi().
Streaming term validation: spectraxgk.linear.grad_z_periodic(), spectraxgk.linear.streaming_term().
Growth-rate fitting windows: spectraxgk.analysis.select_fit_window(), spectraxgk.analysis.fit_growth_rate_auto().
Grid construction and normalization: spectraxgk.grids.build_spectral_grid().
Normalization contract consistency: spectraxgk.normalization.get_normalization_contract(), spectraxgk.normalization.apply_diagnostic_normalization().
Modular RHS equivalence: spectraxgk.linear.linear_terms_to_term_config(), spectraxgk.terms.assemble_rhs_cached(), spectraxgk.linear.linear_rhs_cached().
These tests live in tests/test_linear.py, tests/test_grids.py,
tests/test_normalization.py, and tests/test_terms_assembly.py, and are
designed to fail deterministically if a discretization, assembly path, or
normalization changes.
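To illustrate what a deterministic basis invariant looks like, the sketch below checks orthonormality and the ladder identity for probabilists' Hermite polynomials normalized by sqrt(n!), using only NumPy. The normalization is an assumption chosen for the example; the package's own basis convention and the apply_hermite_v() signature are not reproduced here.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as H


def normalized_hermite(x, n_modes):
    """Columns are h_n(x) = He_n(x) / sqrt(n!) for n = 0 .. n_modes - 1."""
    norms = np.sqrt([float(math.factorial(n)) for n in range(n_modes)])
    return H.hermevander(x, n_modes - 1) / norms


def gram_matrix(n_modes):
    """Gram matrix of h_n under the Gaussian weight exp(-x^2/2)/sqrt(2*pi)."""
    # n_modes Gauss nodes integrate polynomials up to degree 2*n_modes - 1 exactly.
    x, w = H.hermegauss(n_modes)
    basis = normalized_hermite(x, n_modes)
    return basis.T @ (w[:, None] * basis) / math.sqrt(2.0 * math.pi)


def ladder_residual(x, n_modes):
    """Max violation of x h_n = sqrt(n+1) h_{n+1} + sqrt(n) h_{n-1}, n = 1 .. n_modes-1."""
    h = normalized_hermite(x, n_modes + 1)
    n = np.arange(1, n_modes)
    lhs = x[:, None] * h[:, 1:n_modes]
    rhs = np.sqrt(n + 1) * h[:, 2:n_modes + 1] + np.sqrt(n) * h[:, 0:n_modes - 1]
    return float(np.max(np.abs(lhs - rhs)))
```

Both checks have no random inputs and a tolerance set only by floating-point rounding, so a failure points at a changed discretization or normalization rather than at noise.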
Physics regression tests
The physics-focused tests exercise reduced or symmetry limits that should remain invariant across refactors:
Term toggles: spectraxgk.linear.LinearTerms switches individual operator components without changing the equation structure.
Mirror/curvature activation: nonzero drift terms create nonzero response when streaming and drive are turned off.
Diamagnetic drive structure: the energy-weighted drive produces a nonzero response when gradients are enabled and vanishes at \(k_y=0\).
Normalization scaling: rho_star rescales the cached \(k_y\) values exactly.
End-cap damping: the linked-boundary taper only affects \(k_y>0\) modes and vanishes when damp_ends_amp = 0.
These checks are in tests/test_linear.py and are meant to be future-proof
physics invariants.
Benchmark regression tests
Benchmark regression tests validate the Cyclone base case reference dataset and growth-rate extraction pipeline:
Loading the reference CSV via spectraxgk.benchmarks.load_cyclone_reference().
Running short linear scans via spectraxgk.benchmarks.run_cyclone_linear() and spectraxgk.benchmarks.run_cyclone_scan().
Reduced ky regression with tightened tolerances on the field-aligned grid.
These tests live in tests/test_benchmarks.py and tests/test_full_operator.py.
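A growth-rate extraction regression can be exercised against a synthetic mode with known answers. The sketch below is a generic log-amplitude fit, not the implementation of fit_growth_rate_auto(); the function name and the sign convention phi ~ exp((gamma - i*omega) t) are assumptions.

```python
import numpy as np


def fit_growth_rate(t, phi, tmin, tmax):
    """Fit (gamma, omega) assuming phi ~ exp((gamma - 1j*omega) * t) in the window.

    Hypothetical helper; the sign convention is an assumption.
    """
    t = np.asarray(t, dtype=float)
    phi = np.asarray(phi, dtype=complex)
    mask = (t >= tmin) & (t <= tmax)
    # log|phi| is linear in t with slope gamma.
    gamma, _ = np.polyfit(t[mask], np.log(np.abs(phi[mask])), 1)
    # The unwrapped phase is linear in t with slope -omega.
    phase_slope, _ = np.polyfit(t[mask], np.unwrap(np.angle(phi[mask])), 1)
    return float(gamma), float(-phase_slope)
```

Feeding it an exact exponential such as np.exp((0.1 - 0.5j) * t) should return the input gamma and omega to rounding error, which gives a deterministic check on the fitting path that is independent of any solver.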
Literature-anchored response and spectrum tests
The next research-facing additions should follow the published benchmark observables rather than inventing repo-local metrics:
Rosenbluth-Hinton / GAM response in shaped tokamaks: use the shaped benchmark conventions summarized by Merlo et al. to track residual levels and GAM damping alongside the linear shaping scan.
W7-X zonal-flow response: use the stella/GENE W7-X benchmark conventions for residual level and damping envelope.
W7-X fluctuation spectra: follow the W7-X Doppler-reflectometry comparison work for density and zonal-flow frequency spectra. The current closed artifact is a simulation-spectrum diagnostic; experimental transfer functions remain outside the release claim.
Electromagnetic stellarator verification: adopt a heavy-electron electromagnetic lane before realistic-mass claims, following the GENE-3D verification pattern.
These should be implemented as reproducible, script-owned figure/artifact lanes, not as ad hoc notebooks.
The first reusable tooling for this lane now exists:
spectraxgk.zonal_validation.reference_residual_table()
spectraxgk.zonal_validation.tail_trace_metrics()
tools/plot_zonal_flow_response.py
tools/plot_zonal_flow_response_from_output.py
tools/generate_miller_zonal_response_pilot.py
tools/generate_w7x_zonal_response_panel.py
tools/plot_w7x_zonal_contract_audit.py
tools/plot_w7x_zonal_moment_tail_audit.py
tools/plot_w7x_zonal_closure_ladder.py
tools/write_w7x_zonal_closure_sweep.py
tools/plot_w7x_zonal_state_convention_audit.py
tools/plot_w7x_zonal_recurrence_sweep.py
tools/plot_w7x_fluctuation_spectrum_panel.py
The gate-report helpers are intentionally small and JSON-ready. They should be
used by manuscript refresh scripts so every reported artifact has the same
observable, reference, absolute/relative tolerance, and pass/fail convention.
The companion coverage manifest should be updated when a new gate helper,
artifact script, or refactor extraction changes module ownership or test
responsibility.
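The shape of such a gate report can be sketched as a plain dictionary. The field names and the pass rule (absolute error no larger than atol + rtol * |reference|) are assumptions for illustration; the package's gate helpers may use a different schema.

```python
import json
import math


def _finite_or_none(value):
    # Strict JSON has no NaN/Infinity, so nonfinite values become None (null).
    return value if math.isfinite(value) else None


def gate_report(observable, value, reference, atol=0.0, rtol=0.0):
    """Build a JSON-ready pass/fail record (hypothetical schema)."""
    abs_err = abs(value - reference)
    rel_err = abs_err / abs(reference) if reference != 0 else math.nan
    passed = math.isfinite(abs_err) and abs_err <= atol + rtol * abs(reference)
    return {
        "observable": observable,
        "value": _finite_or_none(value),
        "reference": reference,
        "atol": atol,
        "rtol": rtol,
        "abs_error": _finite_or_none(abs_err),
        "rel_error": _finite_or_none(rel_err),
        "passed": passed,
    }


# Illustrative call; the 5% relative tolerance is a made-up example value.
report = gate_report("rh_residual", 0.192, 0.19, rtol=0.05)
serialized = json.dumps(report, allow_nan=False)
```

Failing closed on nonfinite errors means a diverged run can never pass a gate by accident.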
tools/generate_miller_zonal_response_pilot.py now writes the first such
gate report into its JSON metadata for the residual, GAM frequency, and signed
GAM growth/damping comparison against the Merlo Case-III paper-scale read-off.
tools/generate_kbm_reference_overlay.py writes the same gate structure for
the raw KBM eigenfunction overlay, using a strict overlap/relative-L2 policy.
The current refreshed KBM overlay passes that policy with overlap 0.999985
and relative L^2 mismatch 0.00721 against the frozen GX raw mode.
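One common way to define the two raw-mode metrics is sketched below. These definitions are assumptions: the overlap is the normalized magnitude of the complex inner product, and the relative L^2 mismatch is computed after fitting the best complex scale factor. The tracked tool may align or normalize the eigenfunctions differently, so this sketch is not expected to reproduce the quoted numbers.

```python
import numpy as np


def overlap(a, b):
    """|<a, b>| / (||a|| ||b||); insensitive to complex amplitude and phase."""
    return float(abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b)))


def relative_l2_mismatch(candidate, reference):
    """||c * candidate - reference|| / ||reference|| with the least-squares scale c."""
    # np.vdot conjugates its first argument, giving the least-squares solution.
    c = np.vdot(candidate, reference) / np.vdot(candidate, candidate)
    return float(np.linalg.norm(c * candidate - reference) / np.linalg.norm(reference))
```

Because linear eigenfunctions are defined only up to a complex constant, both metrics are built to ignore that constant before comparing shapes.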
tools/generate_w7x_reference_overlay.py applies the same raw-mode policy to
the imported W7-X linear benchmark at k_y rho_i = 0.3. It refreshes the
frozen finite GX raw-mode bundle when a matching .big.nc file is supplied
and writes docs/_static/w7x_eigenfunction_reference_overlay_ky0p3000.png
plus JSON/CSV companions. The current artifact passes with overlap
0.9999999994 and relative L^2 mismatch 3.33e-5.
tools/compare_gx_nonlinear_diagnostics.py --summary-json now emits a
matching gate report for nonlinear diagnostic comparison figures, using the
window mean relative mismatch as the scalar acceptance metric. The summary
writer now accepts case/source labels, explicit tmin/tmax windows, and
writes strict JSON, replacing nonfinite absolute-gate relative errors with
null. The tracked release-window summaries cover Cyclone, Cyclone Miller,
KBM, HSX, and W7-X. The older short Cyclone diagnostic remains available as an
exploratory startup/resolved-spectrum audit, but it is not counted in the
release-gate index.
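The windowed comparison and the strict-JSON rule can be sketched together. The pointwise definition of the relative mismatch and the helper names are assumptions; only the key names mean_rel_abs and max_rel_abs and the replacement of nonfinite values with null follow the text.

```python
import json
import math

import numpy as np


def window_rel_stats(t, candidate, reference, tmin, tmax):
    """Mean and max of |candidate - reference| / |reference| on tmin <= t <= tmax."""
    t = np.asarray(t, dtype=float)
    mask = (t >= tmin) & (t <= tmax)
    if not mask.any():
        raise ValueError("analysis window contains no samples")
    cand = np.asarray(candidate, dtype=float)[mask]
    ref = np.asarray(reference, dtype=float)[mask]
    with np.errstate(divide="ignore", invalid="ignore"):
        rel = np.abs(cand - ref) / np.abs(ref)
    return {"mean_rel_abs": float(np.mean(rel)), "max_rel_abs": float(np.max(rel))}


def strict_json(payload):
    """Serialize to strict JSON, replacing nonfinite floats with null."""
    def clean(value):
        if isinstance(value, dict):
            return {key: clean(item) for key, item in value.items()}
        if isinstance(value, (list, tuple)):
            return [clean(item) for item in value]
        if isinstance(value, float) and not math.isfinite(value):
            return None
        return value
    return json.dumps(clean(payload), allow_nan=False)
```

Passing allow_nan=False makes the serializer raise if a nonfinite value slips past the cleaning step, so the output is always parseable by strict JSON readers.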
Observed-order and branch-continuity gate helpers are also available so
velocity-space convergence panels and branch-followed scan tables can use the
same JSON-ready acceptance convention.
tools/generate_observed_order_gate.py is the generic no-rerun path for
CSV-backed convergence studies: it reads either an explicit step column or a
resolution column, writes an observed-order JSON gate report, and can generate
a log-log convergence figure. The tracked Cyclone velocity-space convergence
artifact lives at docs/_static/cyclone_resolution_observed_order.json and
docs/_static/cyclone_resolution_observed_order.png. It uses an office/GPU
ky=0.30 time-path sweep through (Nl,Nm)=(4,8),(6,12),(12,24),(16,32)
with tmax=150 and passes the strict pairwise-order and final-error gates.
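The pairwise observed order behind such a gate is the standard estimate p = log(e_i / e_{i+1}) / log(h_i / h_{i+1}). The sketch below assumes that a resolution column N maps to a step h = 1/N; that mapping and the helper names are illustrative, not the tool's interface.

```python
import math


def steps_from_resolution(resolutions):
    """Convert a resolution column to step sizes, assuming h = 1 / N."""
    return [1.0 / n for n in resolutions]


def observed_orders(steps, errors):
    """Pairwise observed convergence orders between successive refinement levels."""
    orders = []
    for i in range(len(steps) - 1):
        error_ratio = errors[i] / errors[i + 1]
        step_ratio = steps[i] / steps[i + 1]
        orders.append(math.log(error_ratio) / math.log(step_ratio))
    return orders
```

For errors that scale exactly as h^2, every pairwise order equals 2, so a strict pairwise gate can compare each entry against a documented minimum order.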
tools/compare_gx_kbm.py --branch-summary-json wires that convention into
the KBM branch-following workflow by summarizing adjacent gamma/omega
jumps and successive eigenfunction-overlap continuity for the selected branch.
tools/generate_kbm_branch_gate_summary.py provides the corresponding
no-rerun artifact path: it reads the existing selected KBM candidate table and
writes docs/_static/kbm_branch_gate_summary.json with the same strict gate
schema. The current continuity-first selected branch passes the adjacent
growth/frequency jump and successive-overlap gates.
tools/make_validation_gate_index.py scans tracked JSON metadata and writes
docs/_static/validation_gate_index.json, .csv, and .png so the docs
always have one compact pass/open view of the currently materialized release
validation gates. The current JSON index has 14/14 tracked reports passing.
Exploratory diagnostics can set gate_index_include=false
to remain documented without being treated as release blockers.
tools/plot_nonlinear_window_statistics.py provides the companion
manuscript-facing statistics panel for the nonlinear GX comparison gates by
plotting the per-diagnostic mean_rel_abs and max_rel_abs values from
those same tracked JSON summaries.
tools/plot_nonlinear_feasibility_pilot.py is the analogous tool for new
finite nonlinear pilots that do not yet have a reference comparison or
production-resolution convergence gate. It writes PNG/PDF/JSON/CSV artifacts
with explicit claim_level and promotion_gate.passed = false metadata,
so exploratory external-VMEC runs can be documented without being promoted to
transport validation claims.
tools/plot_external_vmec_nonlinear_convergence_gate.py is the promotion
gate for those pilots once at least two grid levels exist. It replays the
pilot JSON/CSV traces, compares common and least-trending late windows,
requires enough samples, bounds relative heat-flux trend and coefficient of
variation, and finally checks pairwise grid-refined heat-flux agreement. The
tracked CTH-like external-VMEC artifact intentionally fails this gate and sets
gate_index_include=false because it is a research-planning negative result,
not a release-blocking validation gate.
tools/write_external_vmec_holdout_configs.py is the reproducibility
companion for that lane. It writes the fixed-step nonlinear TOMLs and restart
copy commands for the standard two-grid external-VMEC holdout ladder, e.g.
t = 150 initial runs followed by t = 250 restart continuations at
48x48x32 and 64x64x40. The script does not promote any data by itself;
the resulting traces must still pass the convergence gate above before they can
enter quasilinear calibration reports or optimization studies. For the
production nonlinear optimization evidence lane the same generator also accepts
--seed-variant and --dt-variant entries. Those options write explicit
[metadata] blocks and variant-specific filenames so seed and timestep
replicate windows can be launched on the office GPUs, extracted with the same
transport-window protocol, and checked by
tools/check_nonlinear_window_ensemble_readiness.py before any
absolute-flux or turbulent-flux optimization wording can be considered.
For external-VMEC replicate campaigns,
tools/build_external_vmec_replicate_ensemble.py is the reproducible
NetCDF-to-evidence wrapper: it extracts heat-flux traces from finished
*.out.nc files, writes the transport-window summaries and convergence
reports, runs the readiness and ensemble gates, and produces the documentation
figure used by the manuscript ledger.
Before those files enter the ensemble builder, run
tools/check_nonlinear_runtime_outputs.py on every produced *.out.nc.
That gate verifies the grouped NetCDF contains Grids/time and the requested
heat-flux diagnostic, checks finite monotone time samples, enforces optional
tmin/tmax coverage, and fails closed for restart-only or metadata-only
artifacts. It is the first campaign-level smoke check after a long office GPU
batch exits with rc=0.
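The time-axis part of that gate can be illustrated on a plain array, leaving out the NetCDF reading. The function name, the messages, and the choice of strict monotonicity are assumptions for the sketch.

```python
import numpy as np


def time_axis_problems(time, tmin=None, tmax=None):
    """Return problems found in a time axis (hypothetical helper)."""
    time = np.asarray(time, dtype=float)
    if time.size == 0:
        # Restart-only or metadata-only artifacts carry no samples.
        return ["no time samples"]
    problems = []
    if not np.all(np.isfinite(time)):
        problems.append("nonfinite time samples")
    if np.any(np.diff(time) <= 0):
        problems.append("time is not strictly increasing")
    if tmin is not None and time[0] > tmin:
        problems.append("coverage starts after tmin")
    if tmax is not None and time[-1] < tmax:
        problems.append("coverage ends before tmax")
    return problems
```

An empty list means the axis is usable; any entry should block the file from entering the ensemble builder.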
tools/check_production_nonlinear_optimization_guard.py then consumes those
replicated long-window ensembles together with the reduced optimization and
startup finite-difference artifacts. It is the fail-closed check that allows
release-safe scoped wording while blocking production nonlinear turbulent-flux
optimization promotion until optimized equilibria have replicated
post-transient transport-window audits.
For actual nonlinear turbulence-gradient promotion, use
tools/write_vmec_boundary_perturbation_inputs.py when the perturbation is a
VMEC boundary coefficient. It writes the matched input.* files and records
the exact vmec_jax commands needed to create the three real re-equilibrated
wout files. Then use
tools/write_nonlinear_turbulence_gradient_campaign.py to write the matched
baseline/plus/minus VMEC launch ladders and replay commands. The campaign
writer rejects missing files, duplicate resolved paths, and byte-identical VMEC
contents unless --allow-identical-vmec-content is explicitly used for a
plumbing-only smoke test; production evidence therefore requires real
wout files. The generated TOMLs are restart-ladder segments: a final
t=900 config only advances the last segment unless the earlier restart
artifacts have been seeded. The manifest therefore records
direct_full_horizon_launch_commands for one-shot final-horizon campaigns
and an output_gate_command that must pass before ensemble evidence is built.
For the direct one-shot route, launch the recorded commands with
tools/run_nonlinear_gradient_direct_campaign.py instead of an ad-hoc shell
loop. The launcher reads the manifest, assigns one worker per listed GPU, writes
per-task logs and a status JSON, supports --skip-existing for safe restarts,
and keeps the command provenance identical to the manifest.
Then use
tools/build_nonlinear_turbulence_gradient_fd_gate.py after the matched
baseline/plus_delta/minus_delta ensembles finish. The builder writes
the central finite-difference gradient sidecar and checks response resolution,
forward/backward asymmetry, subtraction conditioning, propagated uncertainty,
and the uncertainty gates on all three replicated nonlinear windows.
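A central finite-difference sidecar of this kind can be sketched from three window means and their standard errors. The definitions used here for fd_asymmetry_rel and gradient_uncertainty_rel are assumptions chosen for illustration; the builder's actual formulas and thresholds are not reproduced.

```python
import math


def central_fd(f_minus, f_base, f_plus, delta, sem_minus=0.0, sem_plus=0.0):
    """Central finite difference with simple locality and uncertainty metrics.

    Metric definitions are illustrative assumptions, not the tool's formulas.
    """
    gradient = (f_plus - f_minus) / (2.0 * delta)
    forward = (f_plus - f_base) / delta
    backward = (f_base - f_minus) / delta
    # Independent errors on the two outer states propagate in quadrature.
    sigma = math.hypot(sem_plus, sem_minus) / (2.0 * delta)
    scale = abs(gradient)
    return {
        "gradient": gradient,
        "fd_asymmetry_rel": abs(forward - backward) / scale if scale else math.inf,
        "gradient_uncertainty_rel": sigma / scale if scale else math.inf,
    }
```

A vanishing central difference yields infinite relative metrics, so an unresolved response fails closed instead of passing with a zero denominator.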
The tracked optimized-QA/ESS ZBS(1,0) example is deliberately kept as a
fail-closed regression: the real vmec_jax re-equilibrated t=[450,900]
baseline/plus/minus ensembles pass their replicated transport-window gates and
the initial three-replicate central finite difference is local, but
gradient_uncertainty_rel = 0.655, so the artifact does not promote a
turbulence-gradient claim. A seed-5 follow-up for the same ZBS(1,0)
bracket also remains blocked: the response fraction weakens to about 0.037,
gradient_uncertainty_rel rises to about 1.18, and fd_asymmetry_rel
is about 0.520. The companion RBC(1,1) and ZBS(1,1) controls fail
the locality/asymmetry gates. The central-FD artifact now includes
diagnostic-only paired-replicate rows when matching seed or timestep labels are
available; these rows are useful for identifying sign reversals or weak
responses, but they do not relax the production gates. A future passing
artifact must satisfy both uncertainty and locality thresholds without
weakening either threshold.
For future perturbation refreshes, keep each coefficient/amplitude in a
distinct artifact slug such as
docs/_static/qa_ess_zbs10_rel5_nonlinear_gradient_zbs_1_0_central_fd_gradient_gate.*.
Do not promote new prose until
tools/check_nonlinear_turbulence_gradient_evidence.py reports
passed = true and the JSON sidecar sets
nonlinear_turbulence_gradient_gate = true. Until then, describe the result
as a bounded production-candidate finite-difference audit, not as a nonlinear
turbulence-gradient claim.
The current QA/ESS composite profile-direction follow-up demonstrates this
policy. The targeted plus_delta cross variants seed22_dt0p05,
seed32_dt0p04, and seed33_dt0p05 completed and all six plus-state
outputs passed the runtime-output gate. The extended plus ensemble still fails
the spread gate with mean_rel_spread = 0.166 against the 0.15 limit,
and the central finite-difference artifact remains blocked by
fd_asymmetry_rel = 2.84 and gradient_uncertainty_rel = 1.22. That
artifact is tracked as
docs/_static/qa_ess_descent_profile_rel2_nonlinear_gradient_plus_delta_followup_central_fd_gradient_gate.json.
It is a regression target for the fail-closed workflow and a design input for
the next campaign, not promotion evidence.
tools/rank_nonlinear_turbulence_gradient_candidates.py is the companion
planning utility for failed candidates. It ranks completed central-FD artifacts
by response, locality, conditioning, and propagated uncertainty margins, writes
a fail-closed JSON summary, and recommends whether the next campaign should add
replicas, shrink a bracket, or move to an overdetermined
least-squares/profile-gradient design. The current tracked ranking artifact is
docs/_static/nonlinear_turbulence_gradient_candidate_ranking.json and is
not itself promotion evidence.
tools/summarize_nonlinear_gradient_bracket_sweep.py is the next
same-control locality utility. It consumes one or more central-FD JSON
artifacts for the same control at different perturbation amplitudes, writes
JSON/CSV/PNG sidecars plus an optional PDF, and decides whether to promote an already passing
bracket, shrink/enlarge the amplitude, add statistical power, or abandon the
single-control direction. It also reads the diagnostic-only paired-replicate
rows when present. If those same-seed rows show sign reversals or large paired
uncertainty, the utility explicitly recommends not spending more GPU time on
more replicas at that same bracket. When resolved central finite differences
change sign across nearby amplitudes, it also fails the campaign-planning
recommendation and points toward a new locality sweep or smoother composite control. The
tracked RBC(1,1) 5%/8% result,
docs/_static/qa_ess_rbc11_bracket_sweep.json, is a same-control negative
audit: response is resolved at both amplitudes, but finite-difference
asymmetry grows with amplitude, so the correct next action is a smaller
locality sweep or an overdetermined profile-gradient control.
tools/write_overdetermined_nonlinear_gradient_campaign.py implements that
next launch-contract step. It writes multiple matched boundary-control VMEC
perturbation manifests from one baseline input, records the per-control
nonlinear campaign commands, and writes the final candidate-ranking command.
The tracked QA/ESS profile-gradient launch plan is
docs/_static/qa_ess_overdetermined_nonlinear_gradient_campaign_plan.json.
Use tools/check_overdetermined_nonlinear_gradient_campaign.py to turn that
multi-control launch plan into a machine-readable status artifact and
tools/run_overdetermined_nonlinear_gradient_campaign.py to run all nested
long-window tasks through one shared CPU/GPU worker queue. The checker must
remain fail-closed until the VMEC states, nonlinear runtime outputs, ensemble
gates, central finite-difference gates, and candidate ranking all exist and
pass. Runtime outputs are only counted complete when their recorded
Grids/time coverage reaches the campaign analysis-window endpoint, so
in-progress NetCDF files cannot accidentally promote a result.
After the long runtime queue completes,
tools/postprocess_overdetermined_nonlinear_gradient_campaign.py runs the
per-control output gates, ensemble gates, central finite-difference gates,
candidate ranking, and final fail-closed status check in one reproducible
sequence.
The completed QA/ESS overdetermined campaign is intentionally tracked as a
negative gate result: all 27 full-horizon nonlinear outputs pass the runtime
coverage checks, but no control passes every production central-FD gate. The
best candidate is RBC(1,1) with resolved response and bounded locality, but
gradient_uncertainty_rel = 0.559 remains above the 0.5 promotion gate.
The status artifact
docs/_static/qa_ess_overdetermined_nonlinear_gradient_campaign_status.json
therefore reports complete runtime coverage and zero promoted controls. This is
a regression target for the fail-closed workflow and a design input for future
variance-reduction or smaller-bracket campaigns, not a nonlinear turbulence
gradient validation claim.
tools/write_vmec_boundary_profile_perturbation_inputs.py is the companion
for a single smoother composite direction. It perturbs several VMEC boundary
coefficients together, normalizes the finite-difference scalar by the Euclidean
norm of the coefficient-change vector, and writes the same
baseline/plus/minus VMEC launch contract. The tracked
docs/_static/qa_ess_descent_profile_direction_rel2_manifest.json uses the
current QA/ESS long-window evidence signs to define a 2% descent-oriented
ZBS(1,1), ZBS(1,0), RBC(1,1) direction. This is still a launch
artifact; promotion requires the resulting re-equilibrated VMEC files and
long-window nonlinear FD gate.
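The normalization of a composite direction can be sketched as a central difference divided by twice the Euclidean norm of the coefficient-change vector. The function name and the dictionary of coefficient changes are hypothetical, and the factor of two assumes symmetric plus/minus perturbations.

```python
import math


def directional_fd(f_minus, f_plus, delta_coeffs):
    """Central difference per unit Euclidean length of the coefficient change.

    delta_coeffs maps a coefficient label to its applied change (hypothetical layout).
    """
    norm = math.sqrt(sum(change * change for change in delta_coeffs.values()))
    if norm == 0.0:
        raise ValueError("coefficient-change vector is zero")
    return (f_plus - f_minus) / (2.0 * norm)
```

Normalizing by the vector norm makes gradients from different composite directions comparable even when they perturb different numbers of coefficients.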
After a detached office campaign finishes, run
tools/run_nonlinear_gradient_manifest_postprocess.py on the generated
gradient_campaign_manifest.json rather than replaying individual commands
by hand. With --require-outputs it fails before post-processing if any
expected *.out.nc file is missing; otherwise it runs the output gates,
baseline/plus/minus replicated ensemble builders, the central-FD gate, and the
final nonlinear-gradient evidence check in dependency order. Use
--allow-blocked only when collecting a failure artifact for diagnosis; a
promotion run should keep the default fail-closed behavior.
If that central-FD gate is blocked by a replicated state, run
tools/summarize_nonlinear_replicate_spread.py on the baseline, plus, and
minus ensemble JSON files before launching more nonlinear simulations. The
tool enriches the ensemble rows with seed/timestep labels and convergence
statistics, writes JSON/CSV/PNG sidecars, and classifies whether the failed
state is seed-limited, timestep-limited, mixed seed/timestep spread, or missing
metadata. The current QA/ESS composite profile-direction diagnostic is
docs/_static/qa_ess_descent_profile_rel2_replicate_spread_diagnostic.json:
the plus state is a mixed seed/timestep failure, so the next GPU campaign must
disambiguate timestep sensitivity or shrink the bracket rather than adding
blind replicas.
tools/write_nonlinear_replicate_followup_campaign.py turns that diagnostic
back into a minimal run list. It reads the original
gradient_campaign_manifest.json and the spread diagnostic, infers the seed
and timestep metadata from the already-generated TOMLs, and writes only the
cross variants needed to disambiguate the failed state. For the current QA/ESS
profile-direction audit, the tracked launch artifact is
docs/_static/qa_ess_descent_profile_rel2_plus_delta_replicate_followup_plan.json;
it selects seed22_dt0p05, seed32_dt0p04, and seed33_dt0p05 for the
plus_delta state. After those three GPU runs finish, rebuild the plus
ensemble with the added outputs, rerun
tools/summarize_nonlinear_replicate_spread.py, and only then rerun the
central-FD/evidence gates.
tools/write_optimized_equilibrium_transport_configs.py is the production
optimization companion for that final audit. Given a concrete post-optimization
wout*.nc file, it writes the t=250,350,450,700 fixed-step nonlinear
ladder on the release n64 grid, two seed replicates, one timestep
replicate, restart-copy commands, and the exact
tools/build_external_vmec_replicate_ensemble.py plus
tools/check_production_nonlinear_optimization_guard.py commands needed
after the runs finish. This wrapper is a launch contract only: the production
optimization claim remains blocked until the generated t=[350,700] ensemble
actually passes finite-flux, running-window, block/SEM, replicate-spread, and
optimized-equilibrium marker gates.
tools/prepare_external_vmec_holdout_from_screen.py is the selector that
feeds that generator. It reads the tracked linear candidate screen, skips
excluded or already-audited cases, resolves the chosen VMEC file from the local
vmec_jax checkout, and writes the next bounded holdout ladder plus a JSON
selection summary. This removes another manual step from the external-VMEC
nonlinear campaign and makes office reruns deterministic.
tools/build_external_vmec_holdout_runbook.py is stricter than a positive
growth-rate sorter. It requires a configurable minimum screened growth rate
(gamma >= 0.02 by default) before writing nonlinear launch commands. This
keeps near-marginal branches in the manuscript evidence chain as linear/QI
feasibility data without silently promoting them to expensive nonlinear
transport holdout campaigns.
tools/build_qi_branch_refinement_gate.py is the focused companion for that
near-marginal QI evidence. It checks finite low-k_y branch rows, contiguous
positive support, optional Krylov consistency, and the same nonlinear-launch
growth threshold. A failed launch-growth subgate is a useful documented result,
not a release failure, because it prevents QI feasibility scans from being
misread as transport validation.
tools/write_w7x_zonal_closure_sweep.py is the analogous reproducibility
companion for the open W7-X zonal-response lane. It writes a manifest of
single-k_x closure probes for the paper-facing test-4 contract, separated
by operator family: baseline, constant-Hermite, |k_z|-weighted Hermite,
mixed Laguerre-Hermite, Laguerre-only, and isotropic hypercollision variants.
The manifest includes the exact
tools/generate_w7x_zonal_response_panel.py launch commands plus the
companion tools/plot_w7x_zonal_closure_ladder.py command needed to refresh
the bounded closure audit after the remote runs complete. Each launch command
writes a case-local panel.png and the final ladder command writes
w7x_zonal_closure_ladder_full.{png,json,csv}, preventing exploratory
office runs from overwriting the frozen documentation figure before the
candidate passes the residual, late-envelope, and moment-tail screens.
tools/check_quasilinear_calibration_inputs.py is the corresponding
calibration-admission guard. It scans quasilinear train/holdout reports and
requires every non-audit nonlinear artifact to match a passed nonlinear gate.
This makes validation provenance executable: finite-but-unconverged pilots can
be documented in the docs, but they cannot silently become calibration or
optimization data. The public CI runs this audit during the docs/packaging
job, and the fast test suite checks the current tracked train/holdout reports
against the same gate index.
tools/check_quasilinear_promotion_guardrails.py is the higher-level
absolute-flux promotion guard. It scans the tracked quasilinear reports plus
the claim-scope docs, fails if a promoted report lacks train/holdout points,
finite nonlinear window statistics, a passed holdout gate, or calibration
policy metadata, and writes
docs/_static/quasilinear_promotion_guardrails.json with a normal
gate_report for the validation index. This is not a runtime/TOML
absolute-flux predictor; it is a fast metadata and wording guard that prevents
overclaiming current diagnostics.
The model-development figure scripts for saturation-rule sweeps,
shape-aware saturation, and uncertainty-aware candidate scoring also validate
their nonlinear summary inputs by default and serialize an input_validation
block into the tracked JSON artifacts.
The diagnostics stream now also carries Diagnostics/Phi_zonal_mode_kxt, a
signed complex zonal-potential history reduced over z with the same volume
weights used elsewhere. That is the primitive to use for manuscript-grade
Rosenbluth-Hinton / GAM work. Diagnostics/Phi2_zonal_t remains useful as a
zonal-energy proxy for intermediate checks, but it is no longer the target
observable for the final paper lane.
The first case-specific shaped-Miller pilot for this lane is now reproducible
through examples/benchmarks/runtime_miller_zonal_response.toml and
tools/generate_miller_zonal_response_pilot.py. Its frozen artifact lives in
docs/_static/miller_zonal_response_pilot.png. The current frozen artifact
is pinned to Merlo et al. Case III: adiabatic electrons, zero gradients,
k_xρ_i≈0.05, k_y=0, and an initial ion-density perturbation. It uses
Nz=32, Nl=4, Nm=24, dt=0.005, and runs to t≈60 through the
same checkpoint-capable artifact writer used by long nonlinear runs. Using the
Rosenbluth-Hinton convention phi(t -> infinity) / phi(0) gives a residual
of about 0.192 against the Merlo Case-III figure read-off of about
0.19. The shipped extraction now follows the paper convention more
closely: positive and negative extrema of the signed residual-subtracted trace
are fit separately over a common pre-recurrence window, and the GAM frequency
is extracted from the instantaneous phase of that same window via a Hilbert
analytic signal. With the current t≈30 pre-recurrence window the artifact
gives ω_GAM R0 / v_i≈2.20 and γ_GAM R0 / v_i≈-0.176, both close to
the Merlo figure read-off. The explicit remaining follow-up item is the
long-time recurrence visible in finite moment runs, rather than the
benchmark-scale residual/frequency/damping gate itself.
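The residual and GAM extraction can be illustrated on a synthetic trace. The sketch below is a simplification under stated assumptions: it builds the analytic signal with a plain NumPy FFT, estimates the residual from a tail mean, and fits one envelope slope, rather than fitting positive and negative extrema separately as the shipped extraction does. The function names, the fit window, and the synthetic parameters are illustrative choices, not values read from the tracked artifact.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (discrete Hilbert-transform construction)."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def fit_gam(t, phi, t_fit=(2.0, 15.0), tail_fraction=0.2):
    """Return (residual, omega, gamma) for a signed zonal trace."""
    phi_norm = phi / phi[0]                       # phi(t) / phi(0) convention
    n_tail = max(1, int(tail_fraction * t.size))
    residual = float(np.mean(phi_norm[-n_tail:]))  # tail-mean estimate (assumption)
    z = analytic_signal(phi_norm - residual)
    mask = (t >= t_fit[0]) & (t <= t_fit[1])
    phase = np.unwrap(np.angle(z[mask]))
    omega = abs(np.polyfit(t[mask], phase, 1)[0])               # phase slope
    gamma = np.polyfit(t[mask], np.log(np.abs(z[mask])), 1)[0]  # envelope slope
    return residual, omega, gamma

# Synthetic damped oscillation around a residual, in arbitrary time units.
t = np.linspace(0.0, 30.0, 3001)
phi = 0.19 + 0.81 * np.exp(-0.176 * t) * np.cos(2.2 * t)
residual, omega, gamma = fit_gam(t, phi)
```

On this synthetic input the fit should recover the residual, frequency, and damping that were put in, to within the edge effects of the finite window. It says nothing about the recurrence behaviour of real finite-moment runs.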
An additional recurrence audit now brackets the numerical trade-off more
explicitly: raising the Hermite resolution from Nm=24 to Nm=28 at fixed Nl=4 lowers the
late-time recurrence ratio from about 0.60 to about 0.54 and brings
ω_GAM R0 / v_i nearly onto the Merlo read-off, but it also pushes the
damping to roughly γ_GAM R0 / v_i≈-0.192, which is more damped than the
paper-scale target near -0.17. A minimal hypercollisions_const ladder
through 10^{-4} is effectively inert for this case, while 10^{-3}
only lowers the recurrence ratio to roughly 0.589 and still does not beat
the clean higher-moment run. The shipped artifact therefore remains on the
Nm=24, Nl=4 baseline until the long-time recurrence can be reduced
without moving the benchmark-scale damping gate.
The next literature lane now has a dedicated runtime contract as well:
examples/benchmarks/runtime_w7x_zonal_response_vmec.toml and
tools/generate_w7x_zonal_response_panel.py define the W7-X high-mirror
bean-tube zonal-flow relaxation benchmark from the stella/GENE paper. The
tool sweeps k_x rho_i over [0.05, 0.07, 0.10, 0.30]. The runtime
contract seeds the published electrostatic-potential perturbation with
init_field = "phi" and a Gaussian profile, while the panel extracts the
unweighted signed line-average diagnostic Phi_zonal_line_kxt. The paper
text states that the line-average trace is normalized to its value at t=0;
the caption also mentions the maximum value, but the source figure is clipped
at the initial point. The paper-facing default is therefore
--initial-normalization=line_first and --time-scale=1. The init_amp
normalization and non-unit time-scale options are retained as explicit audits,
not as the validation contract. The default early-time fit-window cap is an
explicit analysis policy chosen to isolate the initial GAM before the slower
stellarator-specific oscillation. The generator forces a periodic radial box
for this k_y=0 zonal response so the selected k_x rho_i values match
the published test-4 targets exactly; this avoids the linked-boundary
aspect-ratio override that is appropriate for drift-wave flux-tube runs but
wrong for this radial zonal scan.
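The line_first normalization policy can be sketched as follows, assuming a real signed line-average trace. The helper name normalize_line_first and its arguments are hypothetical and only mirror the meaning of the --initial-normalization=line_first and --time-scale=1 defaults described above.

```python
import numpy as np

def normalize_line_first(t, trace, time_scale=1.0):
    """Normalize a signed trace by its first nonzero sample and rescale time."""
    t = np.asarray(t, dtype=float)
    trace = np.asarray(trace, dtype=float)
    nonzero = np.flatnonzero(trace != 0.0)
    if nonzero.size == 0:
        raise ValueError("trace has no nonzero sample to normalize by")
    first = trace[nonzero[0]]
    return t * time_scale, trace / first

# Toy trace whose first stored sample is zero.
t_out, normalized = normalize_line_first([0.0, 1.0, 2.0], [0.0, 0.5, -0.25])
```

Dividing by the first nonzero sample keeps the sign of later samples, so a residual that changes sign stays visible after normalization.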
The current frozen VMEC-backed artifact lives at
docs/_static/w7x_zonal_response_panel.png with strict JSON metadata at
docs/_static/w7x_zonal_response_panel.json. The tracked combined trace CSV
docs/_static/w7x_zonal_response_panel.traces.csv is written next to the
figure so comparison and audit scripts can be rerun without office-only
per-k_x directories. It is a long-window run: k_x rho_i=0.05 reaches
t≈3460 and the other three wavelengths reach t≈1980. After the
paper-faithful line-first normalization, the late residuals are about
0.0189, 0.137, 0.0938, and 0.526 for k_x rho_i = 0.05,
0.07, 0.10, and 0.30.
tools/digitize_w7x_zonal_reference.py now extracts the stella/GENE Fig. 11
main traces and inset residual levels from the arXiv source figs/ZF.pdf.
The resulting reference artifacts are
docs/_static/w7x_zonal_reference_digitized.csv,
docs/_static/w7x_zonal_reference_digitized_residuals.csv,
docs/_static/w7x_zonal_reference_digitized.json, and
docs/_static/w7x_zonal_reference_digitized.png. The comparison contract is
implemented in tools/compare_w7x_zonal_reference.py and materialized at
docs/_static/w7x_zonal_reference_compare.png with JSON metadata in
docs/_static/w7x_zonal_reference_compare.json. The current long-window
artifact passes the time-coverage gate for all four wavelengths, but the
residual gate only passes at k_x rho_i=0.05 and the late-envelope gate
fails by orders of magnitude. A previous init_amp-normalized audit happened
to pass the residual gate at all four wavelengths, but that comparison is no
longer treated as a validation result because it does not follow the
normalization stated in the paper text. A later gaussian_width=4 probe matched the clipped apparent
initial level of Fig. 11 better than the tracked width-1 profile, but the
source figure shows that the apparent 0.8 start is a plot-limit artifact,
not a reliable normalization target. The tracked TOML therefore keeps
gaussian_width=1, matching the source expression exp[-(z-z0)^2].
The runtime path now has three safeguards for this lane. First, strided nonlinear
diagnostics always retain the final step, so long traces do not silently stop
one stride before the intended horizon. Second, checkpointed artifact
generation validates each chunk for non-finite diagnostics, state, and fields
before writing or continuing. This makes high-moment W7-X recurrence sweeps
fail fast instead of running thousands of extra steps after a NaN. Third,
default VMEC/eik cache outputs are reused when valid and generated through a
unique temporary netCDF followed by atomic replacement, so parallel W7-X
validation sweeps cannot observe or corrupt a partially written geometry file.
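The three safeguards can be sketched in isolation. This is a minimal illustration with hypothetical helper names (strided_indices, check_chunk_finite, atomic_write_bytes), assuming zero-based step indices and a plain byte payload in place of a netCDF file. It is not the runtime implementation.

```python
import os
import tempfile
import numpy as np

def strided_indices(n_steps, stride):
    """Saved step indices for a given stride, always keeping the final step."""
    idx = list(range(0, n_steps, stride))
    if idx[-1] != n_steps - 1:
        idx.append(n_steps - 1)
    return idx

def check_chunk_finite(**arrays):
    """Raise before writing or continuing if any chunk array is non-finite."""
    for name, arr in arrays.items():
        if not np.all(np.isfinite(arr)):
            raise FloatingPointError(f"non-finite values in {name}")

def atomic_write_bytes(path, payload):
    """Write to a unique temporary file, then atomically replace the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        os.replace(tmp, path)  # readers see the old file or the new one, never a partial one
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

saved = strided_indices(n_steps=10, stride=4)
```

Creating the temporary file in the target directory keeps os.replace on a single filesystem, which is what makes the replacement atomic on POSIX systems.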
A bounded
k_x rho_i=0.07, Nl=16, Nm=64, dt=0.05 probe remained finite to
t≈200 and a post-fix t≈50 rerun verified nonzero signed line-average
diagnostics through the retained final sample. A separate external-restart
artifact bug was then isolated to double-condensing already-active kx/ky
diagnostic axes when appending loaded history. The writer now accepts either
full spectral axes or already-active GX output axes, and a W7-X VMEC external
resume smoke verified nonzero Phi_zonal_line_kxt and
Phi_zonal_mode_kxt throughout the appended tail. A higher-moment follow-up
with Nl=16, Nm=64, dt=0.05 then restart-continued the
k_x rho_i=0.07 trace to t≈100 with finite diagnostics and nonzero
signed line/mode samples across the post-restart tail. A full four-wavelength
refresh at the same moment resolution also reached t≈100 with finite,
nonzero signed traces for every target k_x rho_i. A width-4 full-window
low-moment audit reached the digitized windows but flipped the residual sign at
k_x rho_i=0.07, 0.10, and 0.30. The remaining open item is
therefore not restart diagnostic continuity; it is the W7-X zonal damping,
closure, and velocity-space recurrence behavior under the paper-facing
line-first normalization.
tools/plot_w7x_zonal_contract_audit.py turns the same tracked CSV/JSON
artifacts into docs/_static/w7x_zonal_contract_audit.png. That panel is a
publication-facing diagnostic of the open mismatch rather than a release gate;
its JSON metadata has gate_index_include=false so the validation index does
not count it as closed.
tools/plot_w7x_zonal_moment_tail_audit.py adds a no-rerun velocity-space
audit at docs/_static/w7x_zonal_moment_tail_audit.png. It shows that the
long Nl=8, Nm=32 traces have large late normalized-trace standard
deviations and non-negligible final high-Hermite/high-Laguerre free-energy
fractions. The existing Nl=16, Nm=64, t≈100 audit lowers the early
trace standard deviation but already carries a large high-Hermite tail, so the
next closure experiment should be a bounded moment/closure or recurrence
control sweep, not a change to the paper normalization.
tools/plot_w7x_zonal_closure_ladder.py makes that bounded sweep explicit
for k_x rho_i=0.07 in
docs/_static/w7x_zonal_closure_ladder_kx070.png. The ladder separates
closure families one knob at a time under the paper-facing initializer and
line-average observable. The refreshed office-GPU ladder covers baseline,
constant Hermite, k_z-weighted Hermite, mixed Laguerre-Hermite,
Laguerre-only, and isotropic hypercollision variants at 0.01 and
0.03. The best early-window trace error is the isotropic nu_hyper=0.01
case with mean absolute error 0.2755 versus baseline 0.2861, but its
late-window standard-deviation ratio is 4.25 versus baseline 4.10 and
therefore worsens the recurrence/envelope metric. Laguerre-only and mixed
Laguerre-Hermite closures show the same pattern: strong tail suppression with
no simultaneous improvement of trace error and late envelope. The ladder is
therefore a documented negative result for these bounded closure families, not
a hidden validation setting.
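The two ladder metrics can be sketched as below. The exact definitions used by the plotting tool are not stated here, so this assumes a mean absolute error against the reference interpolated onto the run's time grid over the common window, and a ratio of standard deviations over a late window. The name trace_metrics and the late_start argument are hypothetical.

```python
import numpy as np

def trace_metrics(t, trace, t_ref, ref, late_start):
    """Return (mean absolute error, late-window std ratio) versus a reference."""
    t = np.asarray(t, dtype=float)
    trace = np.asarray(trace, dtype=float)
    t_max = min(t[-1], t_ref[-1])
    common = t <= t_max                       # compare only where both traces exist
    ref_on_t = np.interp(t[common], t_ref, ref)
    mae = float(np.mean(np.abs(trace[common] - ref_on_t)))
    late = t[common] >= late_start
    std_ratio = float(np.std(trace[common][late]) / np.std(ref_on_t[late]))
    return mae, std_ratio

# Toy reference and a candidate with an added late oscillation.
t = np.linspace(0.0, 100.0, 1001)
reference = 0.1 + 0.9 * np.exp(-0.05 * t) * np.cos(0.5 * t)
candidate = reference + 0.05 * np.sin(0.3 * t)
mae, std_ratio = trace_metrics(t, candidate, t, reference, late_start=60.0)
```

Reporting the two numbers separately matters because a closure can lower one while raising the other, which is the pattern the ladder documents.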
tools/plot_w7x_zonal_state_convention_audit.py closes the state-level
initializer and observable convention layer for the same paper-facing setup.
At k_x rho_i=0.07, Nl=16, and Nm=64, the recovered Gaussian
potential has relative L2 error 1.85e-6, off-target spectral potential
content is zero to the reported precision, and the signed line-average and
volume-average helper diagnostics agree with manual reductions to about
2e-16. The line-first initial level is 0.28209 init_amp while the
volume-weighted level is 0.28450 init_amp; that explicit difference is why
the paper-facing observable must remain Phi_zonal_line_kxt normalized by
its first nonzero sample.
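The difference between the two initial levels can be reproduced with a short calculation. The sketch assumes a uniform parallel grid on [-π, π), z0=0, and gaussian_width=1. Under those assumptions the unweighted line average of exp[-(z-z0)^2] is close to 1/(2√π) ≈ 0.28209, consistent with the quoted line-first level. The Jacobian below is a made-up weight, so the weighted value illustrates only that the two reductions differ. It does not reproduce the W7-X figure of 0.28450.

```python
import numpy as np

nz = 64
z = np.linspace(-np.pi, np.pi, nz, endpoint=False)  # assumed parallel grid
phi0 = np.exp(-(z - 0.0) ** 2)                      # Gaussian profile, width 1, z0 = 0

# Unweighted signed line average (the Phi_zonal_line_kxt-style reduction).
line_average = float(np.mean(phi0))

# Volume-weighted average with a hypothetical Jacobian (not W7-X geometry).
jacobian = 1.0 + 0.1 * np.cos(z)
volume_average = float(np.sum(jacobian * phi0) / np.sum(jacobian))
```

Any weight that is not constant along z shifts the initial level, so the normalization constant depends on which reduction is used.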
tools/plot_w7x_zonal_recurrence_sweep.py then performs the bounded
recurrence sweep requested for the paper lane without changing initializer or
normalization conventions. Moment resolution and closure source are varied
separately at k_x rho_i=0.07 over the common t v_t/a <= 100 window.
The no-closure rows give mean absolute reference errors 0.295 for
Nl=8,Nm=32, 0.276 for Nl=12,Nm=48, and 0.283 for
Nl=16,Nm=64. At fixed Nl=16,Nm=64, constant-source closure suppresses
the final Hermite-tail fraction from 0.388 to 0.062 but worsens the
trace mean absolute error to 0.291; the k_z-weighted closure remains
close to no closure. This separates the remaining recurrence/closure problem
from a state-convention error.
The newest constant-hypercollision follow-up keeps the paper-facing
normalization and compares nu_hyper_m=0.01 and 0.03 at
Nl=16,Nm=64 to t v_t/a=100. Increasing nu_hyper_m lowers the final
Hermite-tail fraction from 0.220 to 0.099 and lowers the free-energy
ratio from 0.759 to 0.600, but the mean trace error remains
0.289 and the late-window standard deviation remains more than four times
the digitized reference. The W7-X zonal lane therefore remains a physical
closure/recurrence problem, not a normalization problem and not a simple
constant-damping fix.
The mixed Laguerre-Hermite closure audit then tests the best bounded closure
candidate under a moment-resolution increase. At Nl=16,Nm=64 and
dt=0.05, the mixed closure gives mean absolute trace error 0.2753 and
late-window standard-deviation ratio 4.24. Raising the resolution to
Nl=24,Nm=96 requires dt=0.025 for a finite run; it lowers the
late-window standard-deviation ratio slightly to 4.11 and further reduces
the Hermite/Laguerre tail fractions, but the trace error remains 0.2768.
The more aggressive Nl=32,Nm=128 run still becomes non-finite by
t v_t/a≈10 even at dt=0.025. This separates a real high-moment
time-step limitation from the larger physical result: the current mixed
closure does not converge toward the digitized W7-X trace in a way that can be
promoted as validation.
tools/generate_w7x_zonal_response_panel.py now exposes explicit
--nu-hyper, --nu-hyper-l, --nu-hyper-m, --nu-hyper-lm,
--p-hyper-*, --hypercollisions-const, --hypercollisions-kz,
--enable-hypercollisions, and --gaussian-width overrides so future
closure probes can be launched from the tracked benchmark tool rather than from
unrecorded local TOML edits. Non-unit Gaussian widths remain initializer
audits, not validation defaults.
W7-X high-mirror bean-tube zonal-flow response for the stella/GENE test-4
target k_x rho_i values. The response is normalized to the first
nonzero line-average sample, following the paper text. The red dashed line
is the late-window residual estimate and the shaded band is the common
initial-GAM extraction window.
Digitized stella/GENE reference traces from the W7-X benchmark paper’s Fig. 11. The horizontal lines are residual levels read from the figure insets and are the reference targets for the next long-window SPECTRAX zonal-response gate.
Current W7-X zonal comparison gate. Time coverage passes for all four wavelengths, but the paper-normalized residuals and late-window envelopes remain open validation issues.
Publication-facing audit of the open W7-X test-4 zonal-response lane. The top row separates residual and late-envelope discrepancies; the bottom row overlays representative paper-normalized traces against the digitized stella/GENE mean. This figure is intended to localize the remaining velocity-space recurrence / closure problem, not to claim validation closure.
Velocity-space tail audit for existing W7-X test-4 outputs. The long
Nl=8, Nm=32 traces have large late normalized-trace variance and
visible Hermite/Laguerre tail content. The short Nl=16, Nm=64 run
reduces the early trace envelope but does not by itself close the
long-window recurrence question.
Bounded closure ladder for k_x rho_i=0.07. Constant Hermite,
k_z-weighted Hermite, mixed Laguerre-Hermite, Laguerre-only, and
isotropic hypercollision families are compared with the no-closure baseline.
Some variants reduce mean trace error or velocity-space tails, but none
improves the trace and late-envelope recurrence metrics together.
State-level W7-X test-4 convention audit. The runtime path recovers the paper Gaussian potential initializer, selects only the requested zonal spectral mode, and verifies that the signed line-average and volume-weighted zonal observables are intentionally distinct but internally consistent.
Bounded W7-X test-4 recurrence sweep at k_x rho_i=0.07. The left trace
panel varies moment resolution with no closure; the right trace panel varies
closure source at fixed high resolution. The bottom panels show that tail
suppression alone does not yet close the literature-trace mismatch.
Constant-Hermite-hypercollision follow-up for k_x rho_i=0.07. Stronger
constant damping reduces Hermite-tail and free-energy metrics but does not
reduce the long-window trace error or recurrence envelope enough to match
the digitized stella/GENE reference. This is a documented negative result
that motivates a more physical closure/operator study.
Mixed Laguerre-Hermite closure resolution audit for k_x rho_i=0.07. The
Nl=24,Nm=96 run is finite only with the smaller dt=0.025 and lowers
the late-window variability modestly, but it does not improve the trace
error relative to Nl=16,Nm=64. The omitted Nl=32,Nm=128 point is a
tracked non-finite result under the same closure family, so this remains an
open physics/numerics lane rather than a closed W7-X zonal validation.
Diffrax and nonlinear smoke tests
Diffrax integration and the nonlinear driver are exercised with fast smoke tests:
tests/test_diffrax_integrators.py runs explicit and IMEX diffrax solvers on tiny grids.
tests/test_diffrax_integrators_core.py hardens branch coverage for diffrax helper paths (solver selection, save modes, streaming fits, IMEX branches, parallelization, and validation errors).
tests/test_linear_krylov_core.py hardens matrix-free Krylov internals (mode-family targeting, shift-invert preconditioner selection, fallback policy, and dominant eigenpair wrappers).
tests/test_example_smoke.py verifies the config-driven runner (diffrax enabled) and a short nonlinear scan through the assembled E×B nonlinear bracket.
tests/test_nonlinear_exb.py exercises the nonlinear bracket sign, real-FFT path, flutter coupling, scalar/precomputed gyroaverage paths, and EM component accounting. The targeted nonlinear-term tranche covers the pseudo-spectral bracket and electromagnetic decomposition branches without launching benchmark-size turbulence runs.
tests/test_nonlinear_helpers_extra.py locks the higher-level nonlinear diagnostic contracts: Hermitian real-FFT projection, signed-mode masks, explicit Runge-Kutta variants, fixed-mode frequency extraction, collision splitting, and IMEX nonlinear terms.
tests/test_runtime_config.py and tests/test_runtime_runner.py verify unified runtime TOML loading and case-agnostic linear runs (Cyclone/ETG/KBM) through the same solver path.
tests/test_runtime_config.py also locks the public nonlinear stellarator runtime contract, including the absence of adaptive-step truncation caps and the presence of default tools_out/... artifact paths for W7-X and HSX.
Parallelization identity gates
Independent scan and ensemble parallelization is tested before it is used for performance claims:
tests/test_parallel.py locks the batch_map/ky_scan_batches helper semantics, including deterministic padding, one-device fallback, and pytree outputs used by UQ and sensitivity workflows.
tests/test_velocity_sharding.py locks the GX-inspired species/Hermite velocity-decomposition planner. These tests verify load balance metadata, Hermite ghost-exchange flags, and field-reduction axes before any production shard_map implementation can use that layout. The same test file also covers the full-array Hermite-neighbor reference and one-device fallback for the communication kernel.
tests/test_sharded_integrators.py locks the sharded linear RK2 wrapper in both no-sharding and explicit-sharding modes using a mocked RHS and mocked pjit. It also locks the fixed-step nonlinear state-sharded wrapper, including final-state-only profiling mode and the config-runner route through TimeConfig.state_sharding. These are numerical-identity and control-flow gates, not speedup claims.
tests/test_nonlinear_domain_parallel.py and tests/test_nonlinear_spectral_communication_gate.py lock the diagnostic nonlinear decomposition gates. The first covers one-cell halo chunks for a bounded local stencil. The second covers split/reassemble spectral layout identity for FFT round trip, pseudo-spectral bracket, and field-solve layout. Both fail closed and carry no production routing or speedup claim.
tests/test_generate_parallel_ky_scan_gate.py tests the artifact writer for the real Cyclone k_y-batch gate.
tests/test_parallel_artifact_contracts.py locks the tracked large-run scaling artifacts themselves. It requires the performance and validation manifests to list the CPU/GPU split artifacts, verifies serial numerical identity for independent k_y and quasilinear/UQ rows, checks that nonlinear whole-state sharding embeds per-device profiler/profile payloads, and fails if docs detach speedup wording from the current artifact set.
tools/generate_parallel_ky_scan_gate.py runs the actual linear solver serially and with fixed-shape k_y batching, then writes docs/_static/parallel_ky_scan_gate.{png,pdf,csv,json}. The JSON gate requires numerical identity for growth rate and frequency; the speedup value is reported separately for engineering tracking.
tools/generate_logical_cpu_parallel_scan_gate.py exercises RuntimeParallelConfig and batch_map over logical CPU devices with a structured JAX-native scan output. Its artifact docs/_static/logical_cpu_parallel_scan_gate.{png,pdf,csv,json} is an API identity gate, not a gyrokinetic physics benchmark.
tools/generate_hermite_exchange_gate.py runs the first actual jax.shard_map communication-kernel gate for nearest-neighbor Hermite ghost exchange and writes docs/_static/hermite_exchange_gate.{png,pdf,csv,json}. This is a prerequisite for production velocity-space decomposition, but it is not a nonlinear runtime speedup claim.
tools/generate_velocity_field_reduce_gate.py runs the matching jax.shard_map field-reduction gate with lax.psum over the Hermite mesh and writes docs/_static/velocity_field_reduce_gate.{png,pdf,csv,json}. Its tolerance is a float32 communication/reduction-tree tolerance, not a physics acceptance tolerance.
tools/generate_electrostatic_field_reduce_gate.py applies that reduction pattern to the production electrostatic quasineutrality density moment and writes docs/_static/electrostatic_field_reduce_gate.{png,pdf,csv,json}. It is currently scoped to single-species periodic electrostatic cases.
tools/generate_hermite_streaming_ladder_gate.py combines the Hermite exchange with the actual sqrt(m+1)/sqrt(m) streaming-ladder coefficients and writes docs/_static/hermite_streaming_ladder_gate.{png,pdf,csv,json}. This is the last isolated communication/coefficient gate before a linear streaming microkernel can be wired.
tools/generate_electrostatic_drift_gate.py gates the single-species periodic electrostatic mirror and curvature/grad-B drift slices against the production linear RHS. It uses offset-1 and offset-2 Hermite exchanges and writes docs/_static/electrostatic_drift_gate.{png,pdf,csv,json}.
tools/generate_electrostatic_diamagnetic_gate.py gates the single-species periodic electrostatic diamagnetic drive against the production diamagnetic-only linear RHS. It uses the Hermite-sharded electrostatic field reduction plus local m=0 and m=2 drive masks and writes docs/_static/electrostatic_diamagnetic_gate.{png,pdf,csv,json}.
tools/generate_periodic_streaming_microkernel_gate.py adds the periodic spectral parallel derivative and compares the shard-map path directly against spectraxgk.terms.operators.streaming_term. Its artifact docs/_static/periodic_streaming_microkernel_gate.{png,pdf,csv,json} gates the first opt-in linear streaming microkernel before full RHS wiring.
tools/generate_linear_rhs_streaming_gate.py routes the same sharded periodic streaming kernel through production linear_rhs_cached with all non-streaming terms and electromagnetic channels disabled. Its artifact docs/_static/linear_rhs_streaming_gate.{png,pdf,csv,json} is the first full-call-graph linear-RHS identity gate for velocity-space streaming.
tools/generate_linear_rhs_streaming_electrostatic_gate.py repeats that gate with an m=0 density perturbation and nonzero electrostatic phi. Its artifact docs/_static/linear_rhs_streaming_electrostatic_gate.{png,pdf,csv,json} gates the field-reduction-to-streaming call graph for the current single-species periodic electrostatic route.
tools/generate_linear_rhs_electrostatic_slices_gate.py compares the composed opt-in backend="electrostatic_linear_slices" route against serial linear_rhs_cached with streaming, mirror, curvature, grad-B, and diamagnetic drive enabled. Its artifact docs/_static/linear_rhs_electrostatic_slices_gate.{png,pdf,csv,json} is the current single-species periodic electrostatic linear-RHS identity gate for velocity-space parallelization.
tools/profile_linear_rhs_parallel_slices.py times that same composed route on a larger bounded CPU workload and writes docs/_static/linear_rhs_parallel_slices_profile.{png,pdf,csv,json}. The tracked profile is explicitly an engineering artifact, not a publication speedup claim; it uses a Hermite-heavy workload and a float32 reduction-order tolerance so the stricter composed identity gate remains the release correctness check. The office GPU companion artifact docs/_static/linear_rhs_parallel_slices_profile_gpu.{png,pdf,csv,json} is currently a negative performance baseline: it passes identity but is much slower than the single-GPU serial JIT path.
tools/profile_nonlinear_sharding.py runs a bounded fixed-step nonlinear serial-vs-sharded final-state comparison and writes docs/_static/nonlinear_sharding_profile.json locally and docs/_static/nonlinear_sharding_profile_office_gpu.json for the two-GPU office run. The release-gated nonlinear axes are auto/ky and kx; z-axis FFT sharding remains an exploratory domain-decomposition lane and must pass its own identity gate before it can be exposed as a runtime option. This keeps nonlinear state-sharding work profiler-backed while preventing unsupported runtime claims from entering the README.
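The idea behind the Hermite exchange and streaming-ladder identity gates can be shown with a NumPy stand-in. This is a sketch under assumptions: it uses the normalized Hermite ladder sqrt(m+1) g[m+1] + sqrt(m) g[m-1] with zero closure beyond the last moment, and it emulates the ghost exchange with array slicing instead of jax.shard_map. The function names are illustrative and are not the gate tools' API.

```python
import numpy as np

def ladder_reference(g):
    """Full-array ladder: sqrt(m+1) * g[m+1] + sqrt(m) * g[m-1], zero closure."""
    nm = g.shape[0]
    m = np.arange(nm).reshape((nm,) + (1,) * (g.ndim - 1))
    up = np.zeros_like(g)
    down = np.zeros_like(g)
    up[:-1] = g[1:]      # neighbor m+1
    down[1:] = g[:-1]    # neighbor m-1
    return np.sqrt(m + 1) * up + np.sqrt(m) * down

def ladder_sharded(g, n_shards):
    """Same ladder computed per Hermite shard with one ghost cell on each side."""
    nm = g.shape[0]
    if nm % n_shards:
        raise ValueError("Hermite axis must divide evenly across shards")
    size = nm // n_shards
    zero = np.zeros_like(g[:1])
    pieces = []
    for s in range(n_shards):
        lo, hi = s * size, (s + 1) * size
        left = g[lo - 1:lo] if s > 0 else zero             # ghost from lower shard
        right = g[hi:hi + 1] if s < n_shards - 1 else zero  # ghost from upper shard
        padded = np.concatenate([left, g[lo:hi], right], axis=0)
        m = np.arange(lo, hi).reshape((size,) + (1,) * (g.ndim - 1))
        pieces.append(np.sqrt(m + 1) * padded[2:] + np.sqrt(m) * padded[:-2])
    return np.concatenate(pieces, axis=0)

rng = np.random.default_rng(0)
g = rng.normal(size=(8, 5)) + 1j * rng.normal(size=(8, 5))
identity_ok = all(
    np.array_equal(ladder_sharded(g, n), ladder_reference(g)) for n in (1, 2, 4)
)
```

Because each element goes through the same multiply and add in both paths, the comparison can be exact. A gate of this kind checks identity only and does not say anything about speedup.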
Nonlinear parity snapshots
Recent GX parity spot checks are tracked outside the automated test suite:
Cyclone nonlinear short replay: the GX cyclone_salpha_short.in replay (dt=0.05, t_max=5, collisions off, diagnostics stride 1) now uses the explicit short-reference runtime contract in examples/nonlinear/axisymmetric/runtime_cyclone_nonlinear_short.toml. The main short-run drift turned out to be configuration-level: the replay needed p_hyper = 2 and no end damping to match the public GX short input. With that contract restored, the tracked comparison improves to mean_rel_abs(Wphi) ~= 2.11e-1 and mean_rel_abs(HeatFlux) ~= 2.51e-1. The resolved audit remains in docs/_static/nonlinear_cyclone_short_resolved_audit_t5.{png,csv}, where Wphi_kyst is still the dominant residual mismatch.
Secondary (kh01a): the tracked secondary comparison now uses a dense real GX run (kh01a_shortdense.out.nc, 10 samples in omega_kxkyt) and the rebuilt secondary_gx_out_compare.csv. The comparison helper now uses the GX file horizon automatically in out-nc mode, so it no longer mixes a short GX replay with a t_max = 100 SPECTRAX stage-2 run. On the matched short window, growth rates match tightly (max rel_gamma ~= 1.87e-4) and the non-zonal omega modes also close tightly (rel_omega ~= 3.23e-4 and 9.92e-4 on the k_y = 0.1 sidebands). The only large relative omega values left are the effectively zero-frequency k_y = 0 sidebands, where the absolute mismatch stays O(1e-6).
W7-X nonlinear (t≈200): the refreshed long-window NetCDF-backed comparison now closes at mean_rel_abs(Phi2) ~= 9.74e-2, mean_rel_abs(Wg) ~= 3.20e-2, mean_rel_abs(Wphi) ~= 3.02e-2, mean_rel_abs(HeatFlux) ~= 4.53e-2.
W7-X fluctuation spectrum: tools/plot_w7x_fluctuation_spectrum_panel.py reuses the same gated nonlinear NetCDF artifact and writes docs/_static/w7x_fluctuation_spectrum_panel.{png,pdf,json,csv}. The JSON records the time window, dominant nonzonal k_y, dominant heat-flux k_y, dominant zonal k_x, and claim_level. This is a reproducible simulation diagnostic and explicitly not a Doppler-reflectometry transfer-function validation.
W7-X/TEM extension status: tools/build_w7x_tem_extension_status.py reads the W7-X fluctuation panel plus the current TEM branch audit and writes docs/_static/w7x_tem_extension_status.{png,pdf,json,csv}. It closes only the simulation-spectrum estimator. tools/build_tem_branch_parity_audit.py writes docs/_static/tem_branch_parity_audit.{png,pdf,json,csv} from the tracked TEM mismatch table. TEM linear parity remains open with maximum absolute relative growth-rate mismatch about 4.25, maximum absolute relative frequency mismatch about 3.3 when near-zero reference denominators are excluded, one growth-rate sign mismatch, three frequency sign mismatches, and an inverted frequency-branch rank ordering (Spearman ≈ -0.986). Because this reference is a provisional literature digitization rather than a direct case dump, the audit blocks broad TEM claims but is not a standalone tuning target. W7-X multi-alpha, multi-surface, and kinetic-electron nonlinear windows remain unstarted.
HSX nonlinear (t = 50): the refreshed comparison closes at mean_rel_abs(Wg) ~= 2.75e-2, mean_rel_abs(Wphi) ~= 3.61e-2, mean_rel_abs(HeatFlux) ~= 2.91e-2.
KBM nonlinear (t = 100): the refreshed long-window comparison closes at roughly 9.3e-3 mean-relative error across Wg/Wphi/Wapar/HeatFlux/ParticleFlux.
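The mean_rel_abs figures quoted above are time-averaged relative errors. The exact definition used by the comparison tools is not given here, so the sketch below assumes the mean over samples of |test - ref| / |ref| on a shared time grid, with a small floor to avoid division by zero. The function signature and the floor value are assumptions.

```python
import numpy as np

def mean_rel_abs(test, ref, floor=1e-30):
    """Mean over samples of |test - ref| / max(|ref|, floor)."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(test - ref) / np.maximum(np.abs(ref), floor)))

# Toy diagnostic history on a shared time grid.
reference = [1.0, 2.0, 4.0]
candidate = [1.1, 1.8, 4.0]
score = mean_rel_abs(candidate, reference)
```

With this definition a score of about 3e-2 means the diagnostic differs from the reference by about three percent on average over the compared window.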
W7-X nonlinear fluctuation-spectrum diagnostic from the gated t≈200
VMEC-backed run. The panel summarizes resolved simulation spectra and is
intentionally scoped below an experimental Doppler-reflectometry comparison.
Executable TEM branch audit. The growth-rate and frequency branches fail
simultaneously, with the frequency branch ordered oppositely to the
digitized reference over the tracked low-k_y interval.
Executable status of the W7-X fluctuation/TEM extension lane. The released simulation-spectrum diagnostic is closed, but TEM linear parity, alpha/surface-resolved W7-X scans, and kinetic-electron nonlinear windows remain open before broad W7-X/TEM validation claims.
Linear physics checks
Before nonlinear validation, we exercise linear physics checks grounded in published benchmarks and trend tests:
ITG/Cyclone base case: reproduce the standard Cyclone base case growth rates and frequencies across a reduced ky scan. [Dimits00] [Lin99]
GX term-by-term audit: use the term-dump tooling to compare SPECTRAX-GK streaming and linear-kernel RHS components against GX for a single Cyclone state (see tools/dump_rhs_terms.py and tools/compare_gx_rhs_terms.py).
GX nonlinear term audit (KBM/Cyclone): compare nonlinear derivative, bracket, electromagnetic split, and total RHS dumps using tools/compare_gx_nonlinear_terms.py. The tool supports GX dump folders with nl_apar.bin/nl_bpar.bin and can infer shape metadata when rhs_terms_shape.txt is absent.
ETG linear instability: verify that growth rates remain positive across reduced electron-scale gradients and that the real frequency follows the electron diamagnetic direction. [Dorland00] [Jenko00]
KBM beta scan: verify the transition between ITG-like and KBM branches in a fixed-k_y beta sweep against the tracked benchmark reference and exact-diagnostic audits.
Running tests
pytest
Benchmark reproducibility stack
The public CI and the tracked benchmark atlas are currently validated against a tested numerical stack:
jax>=0.8,<0.9
jaxlib>=0.8,<0.9
numpy>=2.3,<2.4
diffrax>=0.7,<0.8
equinox>=0.13,<0.14
This is not a claim that newer releases are unsupported. It is a statement
about benchmark reproducibility. Near-marginal or branch-sensitive lanes such
as TEM, ETG runtime scans, and some imported-linear stellarator cases can move
materially under newer JAX/NumPy combinations even when the code still runs.
When investigating parity regressions, reproduce the issue on the tested stack
first before changing solver logic.
For runtime-example parity reproduction across recent precision-policy changes,
also set JAX_ENABLE_X64=1. Default precision can be faster, but it can still
shift parity-sensitive linear example outputs.
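Before chasing a parity regression, it can help to confirm that the local environment sits inside the tested stack. The sketch below is a hypothetical helper, not a shipped tool. It compares installed versions against the pins listed above with a deliberately simple numeric comparison that ignores pre-release and local version suffixes.

```python
import re
from importlib import metadata

# Pins copied from the tested-stack list above: (inclusive lower, exclusive upper).
TESTED_STACK = {
    "jax": ("0.8", "0.9"),
    "jaxlib": ("0.8", "0.9"),
    "numpy": ("2.3", "2.4"),
    "diffrax": ("0.7", "0.8"),
    "equinox": ("0.13", "0.14"),
}

def version_tuple(version):
    """Leading numeric parts of up to three dot-separated components."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def in_tested_range(version, lower, upper):
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)

def report_stack():
    """Map package name -> (installed version or None, inside tested range)."""
    rows = {}
    for name, (lower, upper) in TESTED_STACK.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            rows[name] = (None, False)
            continue
        rows[name] = (installed, in_tested_range(installed, lower, upper))
    return rows

stack_report = report_stack()
```

A package outside its range is not necessarily broken. It only means a moved benchmark should first be reproduced on the tested stack.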
Stress-matrix parity gates
In addition to unit/regression tests, SPECTRAX-GK includes a small set of “stress-matrix” gates meant to catch parity regressions early (before tracked benchmark figures move):
Restart parity: tests/test_restart_gate.py verifies that a nonlinear run resumed from a compatible restart reproduces the same final state as a continuous run. This now covers both the raw binary state path and the nonlinear *.restart.nc bundle path, together with append-on-restart history preservation in *.out.nc.
CPU/GPU short-window parity (optional): tests/test_device_parity_gate.py compares a short nonlinear trajectory norm on CPU vs GPU. Enable explicitly:
SPECTRAXGK_DEVICE_PARITY=1 pytest -q tests/test_device_parity_gate.py
VMEC roundtrip determinism (optional): tests/test_vmec_roundtrip_gate.py regenerates an *.eik.nc from a provided VMEC file twice and asserts the imported geometry arrays are bitwise identical. Enable explicitly:
SPECTRAXGK_VMEC_FILE=/path/to/wout.nc pytest -q tests/test_vmec_roundtrip_gate.py
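The restart-parity contract can be illustrated with a toy fixed-step integrator. This sketch assumes a deterministic midpoint (RK2) step on a small nonlinear ODE and a raw-bytes checkpoint. It stands in for the idea behind the gate and does not use the SPECTRAX-GK restart format or solver.

```python
import numpy as np

def rhs(y):
    """Toy nonlinear right-hand side (hypothetical, for illustration only)."""
    return -y + 0.1 * y ** 2

def step(state, dt):
    """One explicit midpoint (RK2) step."""
    mid = state + 0.5 * dt * rhs(state)
    return state + dt * rhs(mid)

def run(state, n_steps, dt):
    for _ in range(n_steps):
        state = step(state, dt)
    return state

y0 = np.array([1.0, 0.5, -0.25])
dt = 0.01

continuous = run(y0, 100, dt)

# Stop after 40 steps, round-trip the state through raw bytes, then resume.
checkpoint = run(y0, 40, dt)
restored = np.frombuffer(checkpoint.tobytes(), dtype=checkpoint.dtype).copy()
resumed = run(restored, 60, dt)

restart_identical = bool(np.array_equal(continuous, resumed))
```

With a fixed step and a lossless state round trip, the resumed run should match the continuous run bit for bit. Any difference points at state that was not saved or restored.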
For developer workflows that require local reference benchmark NetCDFs or dump artifacts, use:
tools/run_gx_linear_stress_matrix.py (KAW, Cyclone kinetic electrons, KBM Miller)
tools/run_imported_linear_targeted_audit.py (generic per-ky targeted imported-linear wrapper)
tools/compare_gx_imported_window.py (exact imported-linear one-window replay against reference diag_state dumps)
tools/run_kbm_lowky_extractor_audit.py (direct cached-trajectory KBM low-ky extractor audit)
tools/run_exact_state_audit.py (manifest-driven wrapper around the exact-state audit tools)
tools/plot_w7x_exact_state_audit.py (no-rerun W7-X exact-state convention audit panel)
tools/run_restart_parity_gate.py (manifest-driven nonlinear restart/continuation parity gate)
tools/run_device_parity_gate.py (manifest-driven CPU/GPU short-window parity gate)
tools/run_vmec_roundtrip_gate.py (manifest-driven VMEC vmec -> eik.nc determinism gate)
The current full-GK nonlinear ETG lane is now explicitly tracked as a pilot
runtime contract via
examples/nonlinear/axisymmetric/runtime_etg_nonlinear.toml. That lane is
separate from the reduced cETG solver and should be used for future
GX-backed nonlinear ETG parity work.
For ETG nonlinear audit runs, use dense short-window overrides first:
JAX_ENABLE_X64=1 spectrax-gk examples/nonlinear/axisymmetric/runtime_etg_nonlinear.toml \
--steps 10 \
--sample-stride 1 \
--diagnostics-stride 1
This lane is currently expensive enough that short persisted windows are the right first diagnostic step before attempting long production horizons.
The ETG short-window startup mismatch was traced to the GX input contract, not
the nonlinear ETG operator. GX reads init_single from [Expert] rather
than [Initialization], so the audited GX pilot was actually using the
Gaussian startup branch. The shipped runtime ETG pilot now matches that
contract with gaussian_init = true, init_single = false,
Lx = 1.25, and GX-style kz hypercollisions. On the matched
Nx=10, Ny=22, ntheta=16, Nl=4, Nm=4, dt=1e-4,
t_max=0.001 pilot, the refreshed short-window comparison lands at
mean_rel_abs(Wg) ~= 1.31e-2 and mean_rel_abs(Wphi) ~= 5.18e-3, with
the final heat-flux point within a few percent of GX.
The targeted imported-linear wrapper and the underlying
compare_gx_imported_linear.py comparator now support two important controls
for honest stress-lane scoring without changing the default full-window
behavior:
--sample-step-stride: subsample the saved diagnostic sample indices before scoring.
--max-samples: truncate scoring to the first N selected samples.
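The combined effect of the two controls can be sketched as below. This is an illustration under the assumption that the stride is applied first and the truncation second, as the option descriptions suggest; select_samples is a hypothetical helper, not the comparator's internal function.

```python
def select_samples(sample_indices, step_stride=1, max_samples=None):
    """Subsample by stride, then keep only the first max_samples entries."""
    if step_stride < 1:
        raise ValueError("step_stride must be >= 1")
    selected = list(sample_indices)[::step_stride]
    if max_samples is not None:
        selected = selected[:max_samples]
    return selected


# Ten saved samples, every third one, at most two scored.
print(select_samples(range(10), step_stride=3, max_samples=2))  # [0, 3]
```

With both options left at their defaults, every saved sample is scored, which corresponds to the unchanged full-window behavior.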
The lower-level comparator also supports --cache-dir plus --reuse-cache
to persist per-ky trajectory/result arrays (gamma, omega,
Wg, Wphi, Wapar) as compressed .npz files keyed by the actual
reference file, geometry file, reference input, selected ky, Hermite/Laguerre
resolution, mode selector, and sample-window contract. This makes the
stress-lane tooling incremental instead of rerunning a full lane every time.
It now also writes absolute diagnostic-error columns and the reference
|gamma| / |omega| scales alongside the relative metrics. That matters
for near-marginal imported-linear stellarator lanes such as HSX, where
mean_rel_gamma can look large simply because the reference growth rate is
close to zero even while the absolute growth-rate mismatch and the field-energy
diagnostics remain small.
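A small numerical sketch shows why the absolute columns matter near marginality. The metric definitions here are assumptions made for illustration (mean of pointwise absolute and relative errors); the comparator's exact formulas may differ.

```python
import numpy as np


def growth_rate_errors(gamma, gamma_ref):
    """Absolute and relative growth-rate errors plus the reference scale."""
    gamma = np.asarray(gamma, dtype=float)
    gamma_ref = np.asarray(gamma_ref, dtype=float)
    abs_err = np.abs(gamma - gamma_ref)
    ref_scale = np.abs(gamma_ref)
    rel_err = abs_err / ref_scale  # diverges as the reference approaches zero
    return {
        "mean_abs": float(abs_err.mean()),
        "mean_rel": float(rel_err.mean()),
        "ref_scale": float(ref_scale.mean()),
    }


# Near-marginal case: the absolute mismatch is ~2e-4, yet the relative
# error is of order one because the reference growth rate is tiny.
errors = growth_rate_errors([3e-4, 4e-4], [1e-4, 2e-4])
print(errors)
```

Reading mean_rel alongside mean_abs and ref_scale makes it clear when a large relative number reflects a small denominator rather than a real mismatch.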
For VMEC-backed exact-state audits, the runtime bridge now prefers a local
booz_xform_jax checkout and injects a temporary booz_xform compatibility
shim only into the external geometry-helper subprocess. This preserves the
audited reference workflow while avoiding a host-level dependency on the original booz_xform
Python package.
The bridge auto-discovers booz_xform_jax from
BOOZ_XFORM_JAX_PATH / SPECTRAX_BOOZ_XFORM_JAX_PATH or from a checkout placed
next to the SPECTRAX-GK workspace. When a specific
Python environment is needed for the helper subprocesses, set
geometry.gx_python in the runtime TOML. On office, the normal audited
path is:
export BOOZ_XFORM_JAX_PATH=/path/to/booz_xform_jax
export SPECTRAX_VENV_PYTHON=/path/to/venv/bin/python
export SPECTRAX_OFFICE_ROOT=/path/to/SPECTRAX-GK
W7X_VMEC_FILE=/path/to/wout_w7x.nc \
HSX_VMEC_FILE=/path/to/wout_HSX_QHS_vac.nc \
"$SPECTRAX_VENV_PYTHON" tools/run_exact_state_audit.py \
--manifest tools/exact_state_lanes.office.toml \
--outdir tools_out/exact_state_audit_office
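The auto-discovery described above can be pictured with the sketch below. It is not the bridge's implementation: the precedence between the two environment variables, the sibling directory name, and the function name discover_booz_xform_jax are all assumptions.

```python
import os
from pathlib import Path

# Assumed lookup order; the real bridge may prioritize differently.
ENV_VARS = ("BOOZ_XFORM_JAX_PATH", "SPECTRAX_BOOZ_XFORM_JAX_PATH")


def discover_booz_xform_jax(workspace, environ=None):
    """Return a checkout directory from the environment or next to workspace."""
    environ = os.environ if environ is None else environ
    for var in ENV_VARS:
        value = environ.get(var)
        if value and Path(value).is_dir():
            return Path(value)
    # Fallback: a checkout placed next to the workspace directory.
    sibling = Path(workspace).resolve().parent / "booz_xform_jax"
    if sibling.is_dir():
        return sibling
    return None


print(discover_booz_xform_jax(Path.cwd()))
```

Returning None rather than raising leaves the caller free to decide whether a missing checkout is an error for the lane being run.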
The tracked office manifest now pins these audit lanes to
JAX_PLATFORMS=cpu. These are parity/reference jobs, not performance runs,
and CPU pinning avoids spurious GPU RESOURCE_EXHAUSTED failures when
booz_xform_jax or grid-default assembly would otherwise grab a busy device.
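For a script that launches such audit jobs itself, the pinning can be sketched as below. JAX_PLATFORMS is the JAX environment variable named above; cpu_pinned_env is a hypothetical helper and not part of the audit tools.

```python
import os
import subprocess
import sys


def cpu_pinned_env(base_env=None):
    """Copy an environment and pin JAX to the CPU backend."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAX_PLATFORMS"] = "cpu"
    return env


# The variable must be in place before the child process initializes JAX,
# so it is passed through the subprocess environment.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['JAX_PLATFORMS'])"],
    env=cpu_pinned_env(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # cpu
```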
The restart/continuation gate uses the same environment model and should be
run against the tracked nonlinear lanes with PYTHONPATH set to the source
tree so the office venv does not pick up a stale installed package:
PYTHONPATH="$SPECTRAX_OFFICE_ROOT/src" \
"$SPECTRAX_VENV_PYTHON" tools/run_restart_parity_gate.py \
--manifest tools/restart_gate_lanes.office.toml \
--outdir tools_out/restart_parity_office
The current office exact-state manifest now includes:
startup audits for Cyclone, KBM, W7-X, and HSX
late dumped-state audits for Cyclone Miller, Cyclone runtime, W7-X, and KBM
The tracked W7-X exact-state convention panel is generated by
tools/plot_w7x_exact_state_audit.py from the office W7-X startup and
late diagnostic-state dumps. It closes the VMEC geometry, Fourier-grid,
fieldsolve, and scalar-diagnostic convention layer against GX with a
1e-4 pointwise relative-error gate: startup g_state/phi are below
7.4e-7, late kperp2/fluxfac/kx/ky/phi arrays have
maximum finite relative error 4.62e-5 with phi RMS relative error
3.77e-7, and late scalar diagnostics are below 1.8e-7. This panel is
not a replacement for the open W7-X zonal-response literature lane; it rules
out the geometry/diagnostic convention layer as the source of that separate
recurrence/damping-envelope mismatch.
W7-X nonlinear exact-state convention audit. Startup state, late dumped geometry/field arrays, and re-evaluated scalar diagnostics are compared directly against GX dumps from the same VMEC equilibrium and nonlinear runtime contract.
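The pointwise gate quoted above can be sketched as follows. The restriction to finite entries mirrors the "maximum finite relative error" wording; the RMS definition (norm of the difference over norm of the reference) and the function name are assumptions, not the audit tool's code.

```python
import numpy as np


def relative_error_gate(test, ref, tol=1e-4):
    """Return (passed, max finite pointwise rel. error, RMS rel. error)."""
    test = np.asarray(test)
    ref = np.asarray(ref)
    diff = np.abs(test - ref)
    with np.errstate(divide="ignore", invalid="ignore"):
        rel = diff / np.abs(ref)
    finite = rel[np.isfinite(rel)]  # drops points where the reference is zero
    max_rel = float(finite.max()) if finite.size else 0.0
    ref_norm = float(np.linalg.norm(ref))
    rms_rel = float(np.linalg.norm(diff)) / ref_norm if ref_norm > 0 else 0.0
    return max_rel <= tol, max_rel, rms_rel


ref = np.array([1.0, 0.0, 2.0])
test = np.array([1.00001, 0.0, 2.0])
passed, max_rel, rms_rel = relative_error_gate(test, ref)
print(passed, max_rel, rms_rel)
```

Because np.abs works on complex arrays, the same check applies to Fourier-space fields such as phi without modification.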
For KBM specifically, the startup audit, late dumped-state audit, nonlinear term replay, and first RK4 partial-step replay now all close on the shipped nonlinear config for the current release pass. The remaining KBM work is therefore future long-window cleanup rather than a blocking startup-state, diagnostic-reconstruction, or first-step assembly mismatch.
The device-parity gate now has audited office manifests for one tokamak and
one stellarator lane, both requiring stable nonzero outputs rather than the
older zero-norm smoke probe:
PYTHONPATH="$SPECTRAX_OFFICE_ROOT/src" \
"$SPECTRAX_VENV_PYTHON" tools/run_device_parity_gate.py \
--manifest tools/device_parity_lanes.office.toml \
--outdir tools_out/device_parity_office
The VMEC roundtrip gate uses the same manifest pattern and currently covers the tracked W7-X and HSX VMEC lanes:
PYTHONPATH="$SPECTRAX_OFFICE_ROOT/src" \
"$SPECTRAX_VENV_PYTHON" tools/run_vmec_roundtrip_gate.py \
--manifest tools/vmec_roundtrip_lanes.office.toml \
--outdir tools_out/vmec_roundtrip_office
If the helper must be forced to another interpreter, set geometry.gx_python
in the runtime TOML used by the audit and rerun the same command. The old
environment-variable override is no longer documented because the preferred
path is the internal booz_xform_jax backend.
CI split: fast PR vs manual full
CI is split into three tiers to keep pull requests fast while preserving full physics rigor:
Fast PR/push tier: the quick-test matrix runs mypy and targeted test subsets across fundamentals, release artifacts, linear core, runtime, nonlinear, and parallel/autodiff contracts. This catches solver and dtype regressions quickly.
Wide coverage tier: CI runs the 48 top-level coverage shards as a matrix, uploads the per-shard coverage.py data, then combines the artifacts in one final wide-coverage check that enforces the package-wide >=95% target. The same helper, tools/run_wide_coverage_gate.py, is used locally and in CI so the threshold is not weakened when the job is parallelized. Each shard has its own timeout so a single slow validation slice cannot become an unbounded release job. The combine step also requires labeled coverage data for every CI shard and writes coverage-wide-shard-manifest.json before refreshing the package-wide Codecov flag. Optional VMEC/Boozer artifact builders remain validated by their tracked offline artifact gates and mocked CI contracts, not by importing unavailable external repositories in the public coverage job.
Manual full tier: full pytest suite plus strict coverage gates: spectraxgk.terms >= 90% and per-module core gates for linear_krylov.py and diffrax_integrators.py.
This keeps iteration latency low for development and still enforces complete coverage and regression checks on demand without relying on scheduled runners.
For bounded local feedback, use the per-file runner:
python tools/run_tests_fast.py
It enforces both a per-file timeout and a whole-run timeout of 300 seconds by
default, then reports any remaining files as not_run(total_timeout) instead
of leaving orphaned pytest children. Use --total-timeout 0 only for an
explicit full sequential local pass.
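The two-level timeout policy can be sketched as below. This is not the code of tools/run_tests_fast.py: the function name and the passed/failed/timeout labels are hypothetical, and only the not_run(total_timeout) status and the meaning of a zero total timeout come from the description above.

```python
import subprocess
import sys
import time


def run_with_budgets(commands, per_file_timeout, total_timeout=300.0):
    """Run each command with a per-file limit inside a whole-run budget."""
    start = time.monotonic()
    results = {}
    for name, argv in commands.items():
        budget = per_file_timeout
        if total_timeout > 0:  # 0 disables the whole-run budget
            remaining = total_timeout - (time.monotonic() - start)
            if remaining <= 0:
                results[name] = "not_run(total_timeout)"
                continue
            budget = min(per_file_timeout, remaining)
        try:
            # On timeout, subprocess.run kills the child and waits for it.
            proc = subprocess.run(argv, timeout=budget)
            results[name] = "passed" if proc.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            results[name] = "timeout"
    return results


demo = {
    "quick": [sys.executable, "-c", "pass"],
    "slow": [sys.executable, "-c", "import time; time.sleep(60)"],
    "late": [sys.executable, "-c", "pass"],
}
print(run_with_budgets(demo, per_file_timeout=30.0, total_timeout=5.0))
```

Note that subprocess.run only terminates the direct child; a runner that must also reap grandchildren would need process groups or a similar mechanism.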
The same wide gate can be run locally in one process with:
python tools/run_wide_coverage_gate.py \
--shards 48 \
--timeout 300 \
--fail-under 95 \
--pytest-arg=-o \
--pytest-arg=addopts= \
--pytest-arg=-m \
--pytest-arg="not slow"
On local machines where every pytest process must stay below the five-minute
release timeout, run one shard at a time and combine afterward. This is the
same data-flow used by CI, except CI runs the --only-shard jobs in
parallel and downloads the resulting coverage artifacts before the
--combine-only gate:
python -m coverage erase
for shard in $(seq 1 48); do
python tools/run_wide_coverage_gate.py \
--shards 48 \
--timeout 300 \
--only-shard "${shard}" \
--keep-existing-coverage \
--skip-combine \
--pytest-arg=-o \
--pytest-arg=addopts= \
--pytest-arg=-m \
--pytest-arg="not slow"
done
python tools/run_wide_coverage_gate.py \
--shards 48 \
--combine-only \
--fail-under 95 \
--pytest-arg=-o \
--pytest-arg=addopts= \
--pytest-arg=-m \
--pytest-arg="not slow"
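For drivers written in Python rather than shell, the same erase, per-shard, and combine sequence can be assembled as argument lists. The sketch below only rebuilds the commands shown above, using the flags exactly as documented; wide_gate_commands is a hypothetical helper.

```python
import sys

# Forwarded pytest options, identical to the shell invocations.
PYTEST_ARGS = [
    "--pytest-arg=-o",
    "--pytest-arg=addopts=",
    "--pytest-arg=-m",
    "--pytest-arg=not slow",
]


def wide_gate_commands(shards=48, timeout=300, fail_under=95, python=sys.executable):
    """Return the erase, per-shard, and combine-only commands in order."""
    tool = [python, "tools/run_wide_coverage_gate.py", "--shards", str(shards)]
    commands = [[python, "-m", "coverage", "erase"]]
    for shard in range(1, shards + 1):
        commands.append(
            tool
            + ["--timeout", str(timeout), "--only-shard", str(shard)]
            + ["--keep-existing-coverage", "--skip-combine"]
            + PYTEST_ARGS
        )
    commands.append(tool + ["--combine-only", "--fail-under", str(fail_under)] + PYTEST_ARGS)
    return commands


commands = wide_gate_commands()
print(len(commands))  # 50: one erase, 48 shards, one combine
```

Each entry can be passed to subprocess.run(cmd, check=True) from the repository root; passing lists avoids shell quoting of the "not slow" marker expression.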
Core modular coverage gate
To keep the modular RHS path future-proof, CI also enforces a dedicated
coverage gate for spectraxgk.terms:
pytest -q tests/test_terms_assembly.py \
tests/test_terms_operators.py \
tests/test_terms_fields.py \
tests/test_terms_integrators.py \
tests/test_terms_validation.py \
--maxfail=1 --disable-warnings \
--cov=src/spectraxgk/terms \
--cov-fail-under=90
This guard ensures term-wise kernels, field solves, custom-VJP behavior, and assembly plumbing stay highly covered while the rest of the benchmark and cross-code harness keeps evolving.
Core solver coverage gates
CI also enforces dedicated per-module thresholds for the two linear solver engines that are most likely to regress during algorithm work:
spectraxgk.linear_krylov (matrix-free Arnoldi/shift-invert path)
spectraxgk.diffrax_integrators (explicit/IMEX/implicit diffrax path)
The gate runs focused tests and checks each module from coverage-core.xml:
pytest -q tests/test_linear_krylov_core.py \
tests/test_diffrax_integrators.py \
tests/test_diffrax_integrators_core.py \
--maxfail=1 --disable-warnings \
--cov=src/spectraxgk \
--cov-report=xml:coverage-core.xml
Both modules are required to stay at or above 90% line coverage in CI.
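A per-module check against the XML report can be sketched as below. It assumes the Cobertura-style layout that coverage.py writes, with class elements carrying filename and line-rate attributes; check_core_gate is a hypothetical helper and the CI job may implement the check differently.

```python
import xml.etree.ElementTree as ET


def check_core_gate(xml_text, module_files, threshold=0.90):
    """Return (passed, line rates by file name, missing file names)."""
    root = ET.fromstring(xml_text)
    rates = {}
    for cls in root.iter("class"):
        # Compare on the base file name so the source prefix does not matter.
        base = cls.get("filename", "").replace("\\", "/").rsplit("/", 1)[-1]
        if base in module_files:
            rates[base] = float(cls.get("line-rate", "0"))
    missing = [name for name in module_files if name not in rates]
    passed = not missing and all(rate >= threshold for rate in rates.values())
    return passed, rates, missing


# Illustrative report fragment; the numbers are made up for the example.
sample = """<coverage><packages><package name="spectraxgk"><classes>
<class name="linear_krylov.py" filename="spectraxgk/linear_krylov.py" line-rate="0.93"/>
<class name="diffrax_integrators.py" filename="spectraxgk/diffrax_integrators.py" line-rate="0.88"/>
</classes></package></packages></coverage>"""

print(check_core_gate(sample, ["linear_krylov.py", "diffrax_integrators.py"]))
```

Treating a module that is absent from the report as a failure prevents the gate from passing silently when a focused test file stops importing the module.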