Validation And Coverage Strategy
================================

Purpose
-------

The refactor branch uses a traceability manifest to keep software coverage,
physics validation, and publication artifacts tied together. The package-wide
target is still 95% coverage, but coverage is treated as an engineering
guardrail, not as a goal in itself. A test counts as useful only when it
protects an equation, a numerical method, a diagnostic convention, an artifact
contract, or an autodiff/performance guarantee.

The machine-readable manifest lives at
``tools/validation_coverage_manifest.toml`` and is checked by
``tools/check_validation_coverage_manifest.py``. Each critical module entry
records:

- the source file and owning refactor lane;
- reference anchors from the literature, independent-code comparisons, or
  documented numerical methods;
- physics contracts that should remain true across refactors;
- numerical contracts such as observed order, conservation, window policy, or
  finite-value handling;
- fast tests that run locally;
- shipped artifacts or gate reports that document the validation state;
- the next tests needed to reach the package-wide 95% target.
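
As an illustration only, the sketch below parses one hypothetical module entry
and checks that each kind of information listed above is present. The table
layout (``[modules.linear]``), every key name, and every value are assumptions
made for this example; the real schema is whatever
``tools/check_validation_coverage_manifest.py`` enforces. ``tomllib`` requires
Python 3.11 or newer.

.. code-block:: python

   import tomllib

   # Hypothetical entry: key names and values are illustrative, not the real schema.
   EXAMPLE_ENTRY = """
   [modules.linear]
   source = "linear.py"
   lane = "example-lane"
   reference_anchors = ["documented numerical method"]
   physics_contracts = ["example physics contract"]
   numerical_contracts = ["observed order", "finite-value handling"]
   fast_tests = ["tests/test_example.py"]
   artifacts = ["docs/_static/example_gate.json"]
   next_tests = ["example follow-up test"]
   """

   # One assumed key per bullet in the list above (source and lane share a bullet).
   REQUIRED_KEYS = (
       "source",
       "lane",
       "reference_anchors",
       "physics_contracts",
       "numerical_contracts",
       "fast_tests",
       "artifacts",
       "next_tests",
   )


   def missing_keys(entry):
       """Return the required keys that are absent or empty in one module entry."""
       return [key for key in REQUIRED_KEYS if not entry.get(key)]


   manifest = tomllib.loads(EXAMPLE_ENTRY)
   for name, entry in manifest["modules"].items():
       print(name, "missing:", missing_keys(entry))

An empty list means the entry carries every field this sketch looks for. The
checker script may apply stricter or different rules.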

Claim-scope synchronization
---------------------------

Validation status and scientific claims are intentionally separate from raw
coverage. Use :doc:`release_scope` as the human-readable claim ledger, and keep
it synchronized with ``docs/_static/manuscript_readiness_status.json``,
``docs/_static/open_research_lane_status.json``, and the validation coverage
manifest. A passing test or coverage line is not enough to promote a physics or
performance claim unless the relevant artifact also records the observable,
reference, tolerance, and accepted claim level. Likewise, an example that runs
successfully becomes a release claim only when it is labeled as release-gated
and tied to the relevant artifact; otherwise keep it framed as a stress lane,
pilot, or deferred manuscript lane.
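
The promotion rule above can be sketched as a small predicate. This is a
minimal illustration, not the project's implementation: the function name
``promotable``, the metadata key names, and the example records are all
hypothetical, and the real JSON status files may be structured differently.

.. code-block:: python

   # Assumed metadata keys, mirroring "observable, reference, tolerance, and
   # accepted claim level" from the text; the real field names may differ.
   REQUIRED_METADATA = ("observable", "reference", "tolerance", "claim_level")


   def promotable(artifact):
       """Return True only if the artifact can back a release claim.

       The artifact must be labeled release-gated and must record every
       required metadata field with a non-empty value.
       """
       has_metadata = all(
           artifact.get(key) not in (None, "") for key in REQUIRED_METADATA
       )
       return has_metadata and artifact.get("label") == "release-gated"


   # Hypothetical records for illustration only.
   gated = {
       "label": "release-gated",
       "observable": "example observable",
       "reference": "example reference",
       "tolerance": 0.05,
       "claim_level": "example level",
   }
   pilot = dict(gated, label="pilot")
   no_tolerance = {k: v for k, v in gated.items() if k != "tolerance"}

   print(promotable(gated), promotable(pilot), promotable(no_tolerance))

Only the first record qualifies: the second is labeled as a pilot, and the
third passes its label check but does not record a tolerance.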

How to check the manifest
-------------------------

Run:

.. code-block:: bash

   python tools/check_validation_coverage_manifest.py

For CI or release bookkeeping, write a JSON summary:

.. code-block:: bash

   python tools/check_validation_coverage_manifest.py \
     --out-json docs/_static/validation_coverage_manifest_summary.json

The wide-coverage CI job attaches measured coverage from the combined Cobertura
report:

.. code-block:: bash

   python tools/check_validation_coverage_manifest.py \
     --coverage-xml coverage-wide.xml \
     --enforce-package-coverage \
     --out-json docs/_static/validation_coverage_manifest_summary.json

This invocation fails if total package coverage drops below the manifest
target. It also records direct/owned module coverage gaps for the next refactor
tranche. Module-level coverage enforcement is intentionally a separate switch,
so the release gate can remain package-wide while the manifest still exposes
specific debt.
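
To make the package-wide versus module-level distinction concrete, the sketch
below reads a tiny Cobertura-style document. It is not the checker's code: the
function names, the sample XML, and its numbers are invented for illustration.
The only facts relied on are that a Cobertura report stores a ``line-rate``
attribute on the root ``coverage`` element and on each ``class`` element, and
that each ``class`` element carries a ``filename``.

.. code-block:: python

   import xml.etree.ElementTree as ET

   PACKAGE_TARGET = 0.95  # package-wide target stated in this document

   # Invented sample report; the rates are placeholders, not measured values.
   SAMPLE = """<coverage line-rate="0.96">
     <packages><package name="pkg" line-rate="0.96"><classes>
       <class name="linear.py" filename="pkg/linear.py" line-rate="0.98"/>
       <class name="runtime.py" filename="pkg/runtime.py" line-rate="0.91"/>
     </classes></package></packages>
   </coverage>"""


   def package_line_rate(xml_text):
       """Return the total line rate stored on the root <coverage> element."""
       return float(ET.fromstring(xml_text).attrib["line-rate"])


   def module_gaps(xml_text, target):
       """Map each file whose line rate is below ``target`` to its rate."""
       gaps = {}
       for cls in ET.fromstring(xml_text).iter("class"):
           rate = float(cls.attrib["line-rate"])
           if rate < target:
               gaps[cls.attrib["filename"]] = rate
       return gaps


   # The package-wide check is the gate; per-file gaps are only reported.
   gate_passes = package_line_rate(SAMPLE) >= PACKAGE_TARGET
   print("package gate passes:", gate_passes)
   print("module gaps:", module_gaps(SAMPLE, PACKAGE_TARGET))

In this sample the package gate passes while one file is still reported as
debt, which is the situation the separate module-level switch is meant to
expose.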

The manifest complements ``tools/make_validation_gate_index.py``. The gate
index reports which validation artifacts currently pass. The coverage manifest
reports whether the remaining refactor and testing work is traceable to
physics, numerics, artifacts, and tests.

Finalization sequence
---------------------

The remaining work should be closed in this order.

1. **Freeze module contracts before moving code.**
   For each large file, write or update tests for current public behavior, then
   extract only cohesive helpers. Keep compatibility exports until examples,
   docs, and benchmark scripts use the new module boundaries.

2. **Finish the high-priority refactor modules.**
   The active blockers are ``runtime.py``, ``linear.py``, ``nonlinear.py``,
   ``benchmarks.py``, ``diagnostics.py``, ``runtime_artifacts.py``,
   ``validation_gates.py``, ``zonal_validation.py``, and
   ``from_gx/vmec.py``. Each slice should land with targeted tests and no
   physics-model change.

3. **Turn open or deferred physics lanes into explicit gates.**
   Literature-facing lanes should produce JSON/CSV/PNG/PDF artifacts with the
   same observable, window, tolerance, and source recorded in metadata. The
   current priority list is W7-X zonal recurrence/damping, W7-X
   fluctuation-spectrum experimental transfer functions, W7-X TEM /
   kinetic-electron and multi-flux-tube validation, production nonlinear
   transport-gradient gates, optimized-equilibrium nonlinear audits, and any
   stricter case-specific nonlinear window-statistics retuning that should
   become a paper claim.

4. **Replace coverage gaps with physics or numerics tests.**
   Do not add shallow import-only tests to chase the number. Prefer tests for
   ladder identities, field-solve limits, bracket antisymmetry, diagnostic
   normalization, strict JSON output, observed-order gates, restart invariants,
   and artifact reload behavior.

5. **Validate differentiability explicitly.**
   Autodiff examples should carry finite-difference or tangent checks, inverse
   recovery diagnostics, and covariance/uncertainty estimates. The Phase-A
   ``vmec_jax`` and ``booz_xform_jax`` bridge now carries a tracer-safe
   geometry-observable sensitivity check, a two-parameter inverse design, and
   local UQ covariance diagnostics. Reduced linear, quasilinear, and
   nonlinear-window-estimator derivatives now have AD/finite-difference gates,
   but production nonlinear transport derivatives still need long-window
   heat-flux convergence, local-gradient conditioning, and optimized-equilibrium
   audits before they are used for stellarator heat-flux optimization claims.

6. **Keep performance measurements separated from validation.**
   Performance panels should report cold compile, warm runtime, memory, output
   time, and parallelization speedup separately. Parallelization gates should
   first target independent scans, UQ ensembles, and sensitivity batches where
   strong scaling is scientifically useful and robust.

7. **Raise CI thresholds in phases.**
   Keep default tests under the five-minute local budget. Enforce fast critical
   modules first, then broad package coverage on the wide CI lane, then manual
   office/GPU parity and performance sweeps. The merge target is package-wide
   coverage at or above 95% with the validation manifest and gate index both
   passing.
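
As one example of the numerics tests preferred in step 4, an observed-order
gate can be sketched as follows. The sketch uses a second-order central
difference on ``sin`` as a stand-in for a real solver kernel; the function
names and the ``0.05`` tolerance are assumptions for this example, not values
taken from the package.

.. code-block:: python

   import math


   def central_difference(f, x, h):
       """Second-order central-difference approximation of f'(x)."""
       return (f(x + h) - f(x - h)) / (2.0 * h)


   def observed_order(f, exact, x, h):
       """Estimate the convergence order from the errors at steps h and h/2."""
       coarse = abs(central_difference(f, x, h) - exact)
       fine = abs(central_difference(f, x, h / 2.0) - exact)
       return math.log2(coarse / fine)


   # d/dx sin(x) = cos(x), so the exact derivative at x = 1 is cos(1).
   order = observed_order(math.sin, math.cos(1.0), 1.0, 1e-2)

   # Gate: the measured order must match the design order of the scheme.
   assert abs(order - 2.0) < 0.05, f"observed order {order:.3f} is not ~2"

A gate of this shape fails when a refactor silently degrades the scheme, for
example by replacing the central difference with a one-sided one, even though
line coverage would be unchanged.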

Release readiness criteria
--------------------------

The refactor branch is ready to merge when:

- package-wide coverage is at least 95% on the wide lane;
- all high-priority manifest modules have their planned extraction tests;
- the tracked validation gate index is passing, or any open lanes are
  explicitly labeled as non-release exploratory artifacts;
- shipped examples still run and plot from output files;
- W7-X, HSX, Cyclone, Cyclone-Miller, KBM, ETG, Miller, and VMEC examples have
  current documentation that labels each lane as release-gated, stress, pilot,
  or deferred and matches the tracked artifacts;
- autodiff examples validate gradients and inverse/UQ outputs;
- the performance manifest points to current runtime/memory panels, CPU/GPU
  profiler artifacts, and numerical-identity gates for every performance
  claim made in the README/docs;
- docs build with warnings treated as errors;
- package build, release workflow, and PyPI metadata checks pass.