hydromodpy.workflow.runner

Pipeline orchestrator.

Pipeline runs a list of Step objects in order and can dispatch parallelisable cohorts through a thread pool. State is reconstructed from durable artefacts (Zarr, Parquet, DuckDB rows) plus the TOML config, never from pickle blobs. The workflow_steps journal is the single source of truth for resume decisions: every step records a row there with its inputs_hash, outputs_hash and artifact_uris. While a step executes, a HeartbeatPulse emits heartbeat events on workflow_events so that hmp gc can detect zombie runs.
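To illustrate how a journal of hashes can drive a resume decision, here is a minimal sketch. It is not hydromodpy's implementation: the helper names `inputs_hash` and `find_restart_index`, the dict-based journal and the exact fields that go into the hash are all assumptions made for the example. The idea is only that a step is reusable when the hash recomputed from its current inputs matches the hash stored in its journal row.

```python
import hashlib
import json


def inputs_hash(step_name, config, upstream_hashes):
    """Stable hash of everything that should invalidate a step when it changes.

    Hypothetical helper: the real journal may hash different fields.
    """
    payload = json.dumps(
        {"step": step_name, "config": config, "upstream": upstream_hashes},
        sort_keys=True,  # key order must not affect the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_restart_index(step_names, config, journal):
    """Return the index of the first step whose journal row is missing or stale.

    `journal` is a plain-dict stand-in for the workflow_steps table:
    {step_name: {"inputs_hash": ..., "outputs_hash": ...}}.
    """
    upstream = []
    for index, name in enumerate(step_names):
        expected = inputs_hash(name, config, upstream)
        row = journal.get(name)
        if row is None or row["inputs_hash"] != expected:
            return index  # this step and everything after it must run again
        upstream.append(row["outputs_hash"])
    return len(step_names)  # every step is up to date
```

Because each step's hash includes the outputs_hash of the steps before it, a change early in the chain invalidates every later step as well, which is what makes a single restart index sufficient in this sketch.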

Two-phase execution

Pipeline.run operates in two phases:

  • Rebuild phase (steps in [0, restart_index)): each step is either re-executed in memory (cheap, idempotent steps such as validate, resolve, setup) or restored from artefacts via Step.rebuild_state (heavy steps with declared artifacts). No journal rows are written during this phase.

  • Execute phase (steps in [restart_index, end)): cohorts are obtained from a Kahn topological sort of the step DAG; each cohort is dispatched through a ThreadPoolCohortExecutor (the default) or run sequentially when the caller passes parallel=False. The journal records the inputs and outputs hashes, and the event stream records step_start/step_end markers.
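The cohort dispatch described in the execute phase can be sketched with the standard library alone. This is an illustration under stated assumptions, not the ThreadPoolCohortExecutor itself: `kahn_cohorts`, `run_cohorts`, the `deps` mapping and the `run_step` callable are hypothetical names. Kahn's algorithm repeatedly peels off every step whose dependencies are already satisfied; each peeled layer is a cohort whose members do not depend on one another and can therefore share a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def kahn_cohorts(deps):
    """Group steps into cohorts with Kahn's algorithm.

    `deps` maps each step name to the set of step names it depends on.
    Every dependency must itself be a key of `deps`.
    """
    remaining = {name: set(parents) for name, parents in deps.items()}
    cohorts = []
    while remaining:
        # Steps with no unmet dependencies form the next cohort.
        ready = sorted(name for name, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        cohorts.append(ready)
        for name in ready:
            del remaining[name]
        for parents in remaining.values():
            parents.difference_update(ready)
    return cohorts


def run_cohorts(cohorts, run_step, parallel=True):
    """Run cohorts in order; steps inside one cohort may run concurrently."""
    results = {}
    for cohort in cohorts:
        if parallel and len(cohort) > 1:
            with ThreadPoolExecutor(max_workers=len(cohort)) as pool:
                outputs = list(pool.map(run_step, cohort))
        else:
            # Sequential fallback, analogous to passing parallel=False.
            outputs = [run_step(name) for name in cohort]
        results.update(zip(cohort, outputs))
    return results
```

A cohort boundary acts as a barrier in this sketch: the `with` block does not exit until every step in the cohort has finished, so a later cohort never starts before its dependencies are complete.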

Classes

Pipeline(steps, *[, workspace, catalog])

Ordered list of steps executed as a single pipeline.