Status & State
Where live run state lives and how it is written
config.yaml is one-shot per run — every key is configuration. Live state
(what is running, what just finished) lives separately so the config stays a
clean run profile.
Live run state — the process table
There is no status section in config.yaml. Whether a run is live right
now is answered by the process table:
pgrep -af 'sync_1node_cc|train_1node_cc'   # launch wrappers
pgrep -af 'main_ppo'                       # verl trainer

/rl:check and /rl:run both probe these. /rl:run refuses to start while a
matching training PID is alive; once nothing matches, the box is free, and there
is no stale flag to clean up after a crash.
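
The probe-then-refuse behaviour can be sketched as a small shell guard. This is a minimal illustration and not the actual /rl:run implementation: the function name training_is_live is hypothetical, and only the two pgrep patterns come from the text above.

```shell
# Hypothetical guard: return 0 if any process matches one of the given
# extended-regex patterns, 1 otherwise. Not the real /rl:run code.
training_is_live() {
  for pattern in "$@"; do
    # pgrep -f matches against the full command line and exits 0 on a match.
    if pgrep -f "$pattern" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

# Patterns copied from the probes above.
if training_is_live 'sync_1node_cc|train_1node_cc' 'main_ppo'; then
  echo "refusing to start: a training process is alive"
else
  echo "box is free"
fi
```

Because the answer comes from the process table, a crashed trainer simply stops matching; nothing has to be reset by hand. One caveat of pgrep -f is that it also matches any unrelated process whose command line happens to contain the pattern, such as an editor opened on a file with that name.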
artifacts/index.yaml
The append-only run ledger, one row per run, written by
scripts/archive_run.sh's EXIT trap:
- id: run_001
  started_at: "..."
  completed_at: "..."
  status: completed        # running | completed | failed
  archive: artifacts/archives/run_001/
  notes: "one-line summary"

This is the source of truth for run history. The newest entry tells you what the block last did.
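
To illustrate how an EXIT trap can append one row per run, here is a minimal bash sketch. It is an assumption about the general shape, not the contents of scripts/archive_run.sh: the helper names append_ledger_row and run_with_ledger, the NOTES variable, and the timestamp format are all hypothetical, and the sketch only ever writes completed or failed.

```shell
# Hypothetical helper: append one ledger row. Uses >> so existing rows are
# never rewritten (append-only).
append_ledger_row() {
  index=$1; run_id=$2; started_at=$3; exit_code=$4
  if [ "$exit_code" -eq 0 ]; then status=completed; else status=failed; fi
  mkdir -p "$(dirname "$index")"
  cat >>"$index" <<EOF
- id: ${run_id}
  started_at: "${started_at}"
  completed_at: "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  status: ${status}
  archive: artifacts/archives/${run_id}/
  notes: "${NOTES:-}"
EOF
}

# Hypothetical wrapper: run a command in a subshell whose EXIT trap records
# the outcome, whether the command succeeds or fails.
run_with_ledger() {
  ledger=$1; id=$2; shift 2
  (
    started=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    # Inside the trap, $? is the status the subshell is exiting with.
    trap 'append_ledger_row "$ledger" "$id" "$started" "$?"' EXIT
    "$@"
    exit "$?"
  )
}
```

A call such as `run_with_ledger artifacts/index.yaml run_001 ./train.sh` (with ./train.sh standing in for the real launch command) would leave exactly one new row behind, which is what makes the newest entry a reliable summary of the last run.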
Visual state
For a live view, the dashboard reads the run logs directly — no status file required. It infers a run's state from log file mtime/size (running vs finished) and renders the per-step metrics.
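
As a rough illustration of inferring state from a log file alone, the sketch below classifies a log as running or finished by how recently it was modified. The function name log_state and the 120-second idle threshold are assumptions, not the dashboard's actual logic; it checks mtime only (not size) and relies on GNU stat.

```shell
# Hypothetical classifier: print "running" if the log was modified within
# the last idle_after seconds, "finished" otherwise, "missing" if absent.
log_state() {
  log=$1
  idle_after=${2:-120}   # assumed threshold in seconds
  if [ ! -f "$log" ]; then
    echo "missing"
    return 1
  fi
  now=$(date +%s)
  mtime=$(stat -c %Y "$log")   # GNU coreutils; BSD/macOS would need stat -f %m
  if [ $((now - mtime)) -le "$idle_after" ]; then
    echo "running"
  else
    echo "finished"
  fi
}
```

The threshold is a trade-off: if it is shorter than the gap between two log writes, a healthy run briefly looks finished; if it is much longer, a crashed run keeps looking alive.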