hydromodpy.workflow.runner
Pipeline orchestrator.
Pipeline runs a list of Step objects sequentially, optionally
folding parallelisable cohorts through a thread pool. State is reconstructed
from durable artefacts (Zarr, Parquet, DuckDB rows) plus the TOML config,
never from pickle blobs. The workflow_steps journal is the single
source of truth for resume decisions: every step records its
inputs_hash, outputs_hash and artifact_uris rows there. A
HeartbeatPulse emits heartbeat events on workflow_events
while a step executes so hmp gc can detect zombie runs.
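As an illustration of how a hash-based journal can drive resume decisions, the sketch below compares a freshly computed inputs hash with the one recorded for each step and returns the first index that needs re-execution. It is a minimal sketch under stated assumptions, not the hydromodpy implementation: the helper names `inputs_hash` and `find_restart_index`, the step names, the payloads and the in-memory dict standing in for the workflow_steps journal are all hypothetical.

```python
import hashlib
import json


def inputs_hash(payload: dict) -> str:
    """Stable SHA-256 over a JSON-serialisable payload (hypothetical helper)."""
    # Sorting keys makes the hash independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_restart_index(steps, journal):
    """Return the index of the first step that must be re-executed.

    steps:   list of (name, payload) tuples in pipeline order.
    journal: dict mapping step name -> recorded inputs hash
             (a stand-in for rows of the real journal table).
    """
    for index, (name, payload) in enumerate(steps):
        # A missing row or a changed hash invalidates this step and all later ones.
        if journal.get(name) != inputs_hash(payload):
            return index
    return len(steps)


# Hypothetical pipeline: the "mesh" inputs changed since the last run.
steps = [
    ("validate", {"config": "run.toml"}),
    ("mesh", {"resolution": 75}),
    ("simulate", {"years": 10}),
]
journal = {
    "validate": inputs_hash({"config": "run.toml"}),
    "mesh": inputs_hash({"resolution": 100}),  # recorded with an older setting
}
restart_index = find_restart_index(steps, journal)
```

Here `restart_index` is 1: the validate step matches its journal row, while the mesh step and everything after it would have to run again.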
Two-phase execution
Pipeline.run operates in two phases:
Rebuild phase (steps in [0, restart_index)): each step is either re-executed in memory (cheap, idempotent steps such as validate, resolve, setup) or restored from artefacts via Step.rebuild_state (heavy steps with declared artifacts). No journal rows are written during this phase.
Execute phase (steps in [restart_index, end)): cohorts are obtained from the Kahn DAG sort; each cohort is dispatched through a ThreadPoolCohortExecutor (default) or sequentially when the caller passes parallel=False. The journal records inputs/outputs hashes and the event stream records step_start/step_end markers.
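The cohort dispatch in the execute phase can be pictured with a generic Kahn-style layering followed by a thread pool. This is a sketch under stated assumptions rather than the library's code: `kahn_cohorts`, `run_cohorts`, the `deps` mapping and the step names are hypothetical, and the standard-library `ThreadPoolExecutor` stands in for ThreadPoolCohortExecutor, whose actual interface is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def kahn_cohorts(deps):
    """Group steps into cohorts using Kahn's algorithm.

    deps maps each step name to the set of step names it depends on
    (every dependency must itself be a key of deps).
    Each cohort only depends on steps from earlier cohorts.
    """
    remaining = {name: set(parents) for name, parents in deps.items()}
    cohorts = []
    while remaining:
        # Steps with no unmet dependencies form the next cohort.
        ready = sorted(name for name, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("dependency cycle detected")
        cohorts.append(ready)
        for name in ready:
            del remaining[name]
        for parents in remaining.values():
            parents.difference_update(ready)
    return cohorts


def run_cohorts(cohorts, run_step, parallel=True):
    """Run cohorts in order; steps within a cohort may run concurrently."""
    results = {}
    for cohort in cohorts:
        if parallel and len(cohort) > 1:
            with ThreadPoolExecutor(max_workers=len(cohort)) as pool:
                outputs = list(pool.map(run_step, cohort))
        else:
            outputs = [run_step(name) for name in cohort]
        results.update(zip(cohort, outputs))
    return results


# Hypothetical dependency graph: two independent steps share one cohort.
deps = {
    "setup": set(),
    "recharge": {"setup"},
    "mesh": {"setup"},
    "simulate": {"recharge", "mesh"},
}
cohorts = kahn_cohorts(deps)
# str.upper is a placeholder for real step execution.
results = run_cohorts(cohorts, run_step=str.upper)
```

In this example the recharge and mesh steps land in the same cohort and may run concurrently, while simulate waits for both. Passing `parallel=False` runs the same cohorts one step at a time.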
Classes
Pipeline | Ordered list of steps executed as a single pipeline.