Calibration Guide#
Audience: hydrogeologists, modelers, and HydroModPy contributors who need to calibrate the parameters of a catchment-scale groundwater model. Suggested reading order: §1 (principles), §2 (workflow), §3 (TOML), §7 (reading the results). Sections §4 to §9 are reference material to consult once the loop is running.
This guide is the single source of truth for the calibration
sub-system. Code: hydromodpy/calibration/.
For complementary reading, see Calibration Architecture (calibration architecture root), Design Patterns (item 7 covers the calibration adapters), The three workspace databases, and Calibration Workflow.
Overview#
A HydroModPy calibration is an ask/tell loop that iteratively adjusts a
few model parameters (hydraulic conductivity K, specific yield
Sy, drain conductance) so that the simulated output matches a
reference series (discharge at a station, head at a piezometer). The
loop is driven by an optimizer that proposes values and reacts to a
scalar objective (NSE, KGE, RMSE) computed at every trial.
Compared to a plain hmp run, a calibration:
produces a trace of N evaluations (one row per trial in the DuckDB catalog), not a single simulation;
reuses the expensive setup steps (geographic, mesh, data loading) across trials through the
prepare-once-evaluate-manyprimitive;only writes the best runs to disk as full Zarr / Parquet simulations. Other trials remain as lightweight rows in DuckDB;
can be resumed across sessions through the content-addressed
params_hashcache.
Use a calibration to fit a model to observations. Use a plain
hmp run when the parameters are already known.
Specific terms used in this guide:
TrialContext: prepared runtime, reused by every trial of a session (§2, §4).earliest_affected_step: integer that decides which pipeline steps are skipped per trial (§4).params_hash: SHA-256 fingerprint used by the cross-session cache (§8).promote_trial: action that turns a lightweight trial row into a complete Zarr / Parquet simulation (§2, §5).
End-to-end workflow#
Stage |
What flows into it and what it produces |
|---|---|
|
User config with |
|
Dispatches the TOML to |
|
Runs pipeline steps |
Ask/tell loop ( |
Each trial reads the prepared context, writes one row to DuckDB
( |
DuckDB ( |
Stores trial fingerprints and session metadata. Read by
|
|
Replays the chosen trials through the full pipeline. Writes
Zarr + Parquet + |
Catalog (simulations + Zarr + Parquet) |
Read by |
|
Reads sessions, iterations, simulation outputs, and renders the calibration HTML report with figures. |
Node by node:
TOML: a regular HydroModPy project TOML extended with a
[calibration]section and a top-level[workflow].mode = "calibration"marker. The rest of the file ([simulation],[flow],[data],[solver]) is exactly what you would write for a single run.``hmp run``: the unified CLI entry point. It reads
[workflow].mode = "calibration"and dispatches tohydromodpy.calibration.cli_runner.run_calibration_cli. Calibration has no separate top-level subcommand.``prepare_trials``: runs pipeline steps
[0..earliest)exactly once.earliestis computed from the dotted paths declared by[calibration.parameters.*](see §4). The preparedWorkflowContext(geographic, mesh, loaded forcings) is held in RAM and forked by every trial.Ask/tell loop: the optimizer proposes a parameter point, the loop forks the prepared context, injects the values, runs steps
[earliest..8]in lightweight mode (no disk writes beyond the solver’s own scratch files), extracts the scalar objective from RAM, and tells the optimizer. Repeat up tomax_iter.DuckDB: every trial adds one row to
calibration_iterations(sim_idstaysNULL). The session metadata lives in one row ofcalibration_sessionsfinalized at the end.``promote_trial`` (top-N): if
save_runs != "none", the chosen trials are replayed through the full pipeline (steps00..11) byhydromodpy.Project(cfg_path).simulate(**values). Each promotion creates a Zarr store, a Parquet directory, and asimulationsrow, and back-fills the correspondingcalibration_iterations.sim_id.``hmp report <session_id>``: post-processing CLI that reads the session and iterations from DuckDB, renders the six calibration figures, and emits a standalone HTML report at
<workspace>/reports/<session_id>/report.html.
The user TOML#
The examples below come from the canonical case shipped with the
repository:
examples/projects/02_nancon_watershed/run_calibration_k.toml.
Overlay and workflow marker#
base_config = "project.toml"
[workflow]
mode = "calibration"
base_config: relative path to the base project TOML (project.tomlhere). All sections of the base are inherited and overridden by the current file. This is how a single description of the catchment is shared between simulation, calibration, and sweep overlays.[workflow].mode = "calibration": the single switch that tellshmp runto dispatch to the ask/tell loop instead of the default single-simulation path. Dispatch logic lives inhydromodpy/cli/workflows.py:DISPATCH.
Standard simulation block#
[simulation]
name = "nancon_calibration_k"
description = "Optuna calibration of K on NSE(discharge), Sy/Ss frozen."
[simulation.time]
start_datetime = "2000-01-01"
end_datetime = "2002-12-31"
step_value = "1 month"
coverage_policy = "warn"
[[simulation.process]]
id = "flow_main"
type = "flow"
solvers = ["modflow_nwt"]
The [simulation] tree is exactly what you write for a single
hmp run: a name, a time window, and one or more processes.
Changing the solver does not change anything in the
``[calibration]`` section: the calibration code is solver-agnostic
(no modflow, modflow6, boussinesq, or solver.backend
string appears anywhere in hydromodpy/calibration/). Swap
"modflow_nwt" for "modflow6" and the same TOML calibrates the
other solver (subject to the metric extractor coverage noted in §6).
Frozen parameters#
[domain.depth_model]
thickness = 30.0
[flow.param.Sy.field]
value = 0.05
[flow.param.Ss.field]
value = 1e-5
[flow.sinks_sources.recharge]
first_clim = "first"
Any leaf not listed under [calibration.parameters.*] is frozen
at the value written in the TOML. Here Sy and Ss are frozen at
0.05 and 1e-5; only K is calibrated.
The [calibration] section#
[calibration]
method = "grid"
max_iter = 40
batch_size = 1
save_runs = "best_n"
save_best_n = 5
seed = 42
objective = "nse"
variable = "discharge"
use_cache = true
Exhaustive option reference:
Option |
Values |
Default |
Effect |
|---|---|---|---|
|
|
|
Sampler backing the ask/tell loop. |
|
integer >= 1 |
|
Maximum number of trial evaluations. |
|
|
|
How many trials to promote to full simulations after the loop. |
|
integer >= 0 |
|
Number of top trials promoted when |
|
|
|
Scalar metric; the string form is the Python escape hatch (§10). |
|
|
|
Observed variable to compare against. |
|
integer or |
|
Random seed for reproducibility. |
|
bool |
|
Enable the |
|
integer >= 1 |
|
Suggestions drawn per |
|
dict |
|
Extra kwargs forwarded to the sampler (for example
|
Per-parameter declarations#
[calibration.parameters.K]
bounds = [1e-6, 1e-3]
transform = "log"
prior = "log_uniform"
path = "flow.param.K.field.value"
units = "m/s"
Exhaustive option reference for each
[calibration.parameters.<name>] block:
Option |
Values |
Required |
Effect |
|---|---|---|---|
|
|
yes (except some grid/fixed-space setups) |
Physical bounds. For |
|
|
no (defaults to |
Internal sampling space. Use |
|
|
no (defaults to |
Prior used by Bayesian samplers ( |
|
dotted path |
yes |
Where to inject the value in |
|
free string |
no |
Informational label used in figures and reports. |
Recipes for common parameter shapes:
# Homogeneous K
[calibration.parameters.K]
bounds = [1e-6, 1e-3]
transform = "log"
path = "flow.param.K.field.value"
# Zoned K: one block per zone, each with its own `path`
[calibration.parameters.K_granite]
bounds = [1e-6, 1e-3]
transform = "log"
path = "flow.param.K.field.values.zone_granite"
[calibration.parameters.K_schiste]
bounds = [1e-6, 1e-3]
transform = "log"
path = "flow.param.K.field.values.zone_schiste"
# Specific yield (no log: already O(1))
[calibration.parameters.Sy]
bounds = [0.02, 0.30]
transform = "identity"
path = "flow.param.Sy.field.value"
# Drain conductance
[calibration.parameters.drain_cond]
bounds = [1e-4, 1e-1]
transform = "log"
path = "flow.sinks_sources.stream_drain.conductance"
# Aquifer thickness
[calibration.parameters.thickness]
bounds = [10.0, 80.0]
transform = "identity"
path = "domain.depth_model.thickness"
Recipe cheat sheet#
Use case |
Block |
|---|---|
Quick exploratory sweep (1-2 params) |
|
Default production calibration |
|
Multi-dim continuous (3+ params) |
|
Local refinement near a known optimum |
|
Full Bayesian posterior (Phase 4) |
|
Automatic step invalidation#
Every pipeline step declares which TOML subtrees it reads via a
config_sections class variable. When a calibration mutates
flow.param.K.field.value, only the steps that consume flow.*
(and their downstream siblings) need to re-run. The earlier steps
produce the same result on every trial and are executed exactly once
inside prepare_trials.
Step |
Name |
Phase |
TOML subtrees consumed |
|---|---|---|---|
00 |
Validate |
Shared (run once) |
|
01 |
Resolve |
Shared (run once) |
|
02 |
LoadData |
Shared (run once) |
|
03 |
BuildGeographic |
Shared (run once) |
|
04 |
BuildMesh |
Shared (run once) |
|
05 |
SetupProcess |
Shared (run once) |
|
06 |
PrepareSolver |
Looped per trial |
|
07 |
RunSolver |
Looped per trial |
|
08 |
Extract |
Looped per trial |
none (in-memory extraction) |
09 |
Derive |
Promoted only |
|
10 |
Export |
Promoted only |
none (write Zarr/Parquet) |
11 |
Display |
Promoted only |
|
Green (shared): steps
00..05. Executed once byprepare_trials. Geographic, mesh, and loaded forcings live inTrialContext.ctxand are shared by reference across every fork.Orange (looped): steps
06..08. Re-run per trial inrun_trial_light. The trial fork deep-copies the config, injects the new parameter values, and re-executes this slice only.Blue (promoted): steps
09..11. Never run during the calibration loop. Executed only for the trials picked up bypromote_trialafter the loop converges.
If I calibrate X, what is replayed?#
Calibrated path |
|
Steps shared (run once) |
Steps re-run per trial |
Rough speedup |
|---|---|---|---|---|
|
6 |
00-05 |
06-08 |
~3x |
|
6 |
00-05 |
06-08 |
~3x |
|
6 |
00-05 |
06-08 |
~3x |
|
6 |
00-05 |
06-08 |
~3x |
|
5 |
00-04 |
05-08 |
~2x |
|
4 |
00-03 |
04-08 |
~1.7x |
|
3 |
00-02 |
03-08 |
~1.3x |
Matching rule: dotted longest-prefix. Given an override path like
flow.param.K.field.value, the selector walks each step’s
config_sections and accepts a match when the section is a
dotted-prefix of the path. flow matches, flow.param.K
matches, flow.param.K.field.value matches, but flow_runtime
does not. The lowest index among matching steps wins; everything
downstream of it is forced to re-run even if its own sections did not
match (step 08 Extract, for instance, has empty
config_sections, yet it re-runs because it consumes the state
produced by step 07).
Implementation:
hydromodpy/workflow/internals/dependencies.py:earliest_affected_step.
Per-step annotations: config_sections: ClassVar[tuple[str, ...]]
on each of the 12 step_<nn>_*.py modules.
Storage#
Calibration storage flow, by container:
Container |
Component |
Role during the calibration loop |
|---|---|---|
RAM (loop only) |
Aligned vector + scalar metrics |
Built per trial by |
|
|
One row per session (start, finalize, status). |
|
|
One row per trial: parameters, objective, scalar metrics,
|
|
|
Promoted top-N trials only ( |
Filesystem |
|
Spatial fields written only for promoted trials. |
Filesystem |
|
Detailed timeseries, budgets, mass balance for promoted trials only. |
Filesystem |
|
PNG figures rendered post-loop via the display registry. |
Filesystem |
|
HTML report rendered by |
Rule of thumb: RAM inside the loop, DuckDB for the trace, Zarr / Parquet only for promoted runs.
Artefact |
Lives in |
Written when |
|---|---|---|
Simulated vector aligned on observations |
RAM only |
Each |
Per-station scalar metrics (NSE, KGE, RMSE) |
|
After each trial |
Session metadata |
|
Start + finalize |
Parameters + scalar objective |
|
After each trial |
|
Column in |
After each trial |
Spatial fields |
|
Only via |
Detailed timeseries (head, Q) |
|
Only via |
Figures (PNG) |
|
Post-loop via Display registry |
HTML report |
|
|
Disk-space cheat sheet for a representative 100-trial session on a moderately fine mesh:
|
DuckDB |
Zarr dirs |
Parquet dirs |
Typical total |
|---|---|---|---|---|
|
100 iter + 1 session |
0 |
0 |
~500 KB |
|
100 iter + 5 sim + 5 param + 5 metric + 1 session |
5 |
5 |
~2.5 GB |
|
100 iter + 100 sim + … |
100 |
100 |
~50 GB |
Recommendation. Use save_runs = "best_n" as your default. It
keeps the full trace queryable in DuckDB while only materializing the
trials you might actually inspect later. Use "all" only for short
diagnostic calibrations (<= 20 trials) when you want every single run
on disk.
Optimization methods#
Method |
Type |
Typical budget |
Deterministic |
Strength |
When to pick it |
|---|---|---|---|---|---|
|
Enumeration |
Product of |
Yes |
Simple, exhaustive |
1-2 params, unlimited budget |
|
Light Bayesian |
50-200 |
Seed-dependent |
Adaptive, easy |
Sensible default |
|
Evolutionary |
100-500 |
Seed-dependent |
Continuous multi-dim |
3+ continuous params |
|
Evolutionary |
100-500 |
Seed-dependent |
Robust |
Alternative to Optuna CMA-ES |
|
Local simplex |
50-100 |
Partially |
Fast local |
Refine around a known optimum |
|
GP surrogate + EI |
30-100 |
Seed-dependent |
Few expensive evaluations |
Slow solvers |
|
MCMC + GP surrogate |
1000+ |
No |
Full posterior |
Uncertainty quantification |
optimizer_kwargs hints:
grid:{ "n_points": {K = 10, Sy = 5} }to set per-parameter granularity. Defaults to a uniform count across all parameters.optuna:{ "sampler": "tpe" | "cmaes" | "random", "pruner": "median" }. Default sampler is TPE and is available in the base install;cmaesrequires the calibration extra and is strongly recommended when you have 3+ continuous parameters.scipy_de:{ "popsize": 15, "mutation": [0.5, 1.0], "recombination": 0.7 }, the standardscipy.optimize.differential_evolutionknobs.scipy_nelder_mead:{ "xatol": 1e-4, "fatol": 1e-4, "adaptive": true }.adaptive = trueis friendlier in higher dimensions.gp_mapping:{ "n_initial": 10, "acq": "ei" }: expected improvement over an RBF Gaussian Process surrogate.da_mh_gp:{ "burn_in": 200, "thin": 5, "proposal_scale": 0.3 }, Metropolis-Hastings tuned by the surrogate.
Reading the results#
Python API and DuckDB#
import hydromodpy as hmp
catalog = hmp.open("~/workspace")
sessions = catalog.calibration_sessions()
iters = catalog.calibration_iterations(session_id=sessions.iloc[0]["session_id"])
best = catalog.best(project="calibration", metric="nse")
best.plot("watertable_map", save="~/figures/")
print(iters.head())
catalog.calibration_sessions()returns aDataFramewith one row per session (method, objective, best_objective, duration_s).catalog.calibration_iterations(session_id=...)returns the full per-trial trace (parameters, objective_value, status, sim_id, params_hash).catalog.best(project, metric)returns aRunobject for the best promoted trial across sessions matchingproject.
Figures via the display registry#
Six named figures ship with HydroModPy. Each implements the Figure
Protocol and is registered in
hydromodpy/display/figures/__init__.py.
calibration_convergence: best-so-far objective vs iteration.calibration_trace: parallel plots of every parameter and the objective across iterations.calibration_landscape: 2D scatter of any pair of parameters coloured by objective value.calibration_posterior: marginal histograms per parameter.calibration_objective_surface: interpolated NSE surface over any 2-parameter slice.calibration_pairplot: pairwise grid of scatter plus histograms.
Render one figure at a time from the CLI (the session id is printed on
stderr during hmp run):
hmp display run_calibration_k.toml --session <session_id> --figure calibration_convergence
hmp display run_calibration_k.toml --session <session_id> --figure calibration_trace
hmp display run_calibration_k.toml --session <session_id> --figure calibration_landscape
hmp display run_calibration_k.toml --session <session_id> --figure calibration_posterior
hmp display run_calibration_k.toml --session <session_id> --figure calibration_objective_surface
hmp display run_calibration_k.toml --session <session_id> --figure calibration_pairplot
Pre-rendered examples for a Dupuit / MODFLOW 6 twin benchmark live
under docs/source/_static/capability_gallery/calibration/: reuse
them as visual references.
The loop can also render figures automatically at the end of the
session by listing them under [display] figures = [...] in the
TOML.
HTML report#
For users who do not want to drop into Python:
hmp report <session_id>
xdg-open ~/workspace/reports/<session_id>/report.html
The report embeds the six figures together with a summary table (method, objective, best parameters, best_sim_id, duration). It is fully standalone: the HTML file plus its sibling PNGs can be shipped as a single folder.
Cross-session cache and reproducibility#
Every trial is fingerprinted by a params_hash: the SHA-256 of its
canonical parameter JSON representation (keys sorted, floats
normalized). The hash is written on every calibration_iterations
row.
use_cache = true(default): before running a trial, the engine looks upparams_hashin theParamsHashCache. The cache is preloaded at session start from every completed and promoted iteration of every previous session on the same workspace. On a hit, thesim_idis reused, the solver is skipped entirely, and the cached objective value is returned.seed = 42: seeds the sampler. Combined with a deterministic solver, this makes the whole sequence reproducible. Leavenullfor stochastic exploration.
Disable the cache (use_cache = false) when:
the input data changed (new forcings, new DEM, new observations);
the solver version changed;
you want to benchmark true per-trial wall time.
Common pitfalls and how to avoid them#
``path`` points to a missing field. The engine raises a
CalibrationSetupErrorwith the list of valid paths. Double-check withhmp doctor --toml run_calibration_k.toml.Bounds too tight. The optimizer converges on a plateau and reports a false “best”. Widen
boundsand re-run.``transform = “identity”`` for K from 1e-6 to 1e-3. Uniform sampling in the physical space burns 99% of the budget on the top decade. Always use
transform = "log"for multi-order quantities (K, conductances, storage coefficients).``save_runs = “all”`` on 100 trials. You end up with ~50 GB of Zarr. Use
best_nunless you really want every trial on disk.``max_iter`` too low for a Bayesian method. Optuna TPE needs at least ~50 trials before exploiting;
da_mh_gpneeds a few hundred before the posterior stabilizes.Variable ``”discharge”`` but no hydrometry in ``[data]``. The metric extractor returns
NaNon every trial; the session completes in seconds with no usable output. Check that[data.hydrometry](or[data.piezometry]for head) is populated.MODFLOW 6 + discharge calibration. The extractor in
hydromodpy.calibration.metricscurrently covers MODFLOW-NWT only; MODFLOW 6 returnsNaN. Scheduled as future work: use MODFLOW-NWT in the meantime, or plug in a custom extractor through §10.Forgetting ``[workflow].mode = “calibration”``.
hmp runthen treats the TOML as a simulation and runsKexactly once with its default value. The[calibration]section is silently ignored.
Python API reference#
Programmatic entry point (no CLI needed):
from hydromodpy.calibration.cli_runner import run_calibration_cli
summary = run_calibration_cli(
"run_calibration_k.toml",
objective="mypkg.metrics:custom_nse",
workspace="~/workspace",
project="nancon_K",
)
print(summary["best_sim_id"], summary["best_objective"])
The positional argument is the TOML path; resolved relative to the current working directory.
objective="module.path:fn"is the Python escape hatch. The callable must match theTrialMetricFnsignature(ctx, *, objective, variable) -> (primary_metric, {component: value}). Use it when the default NSE / KGE / RMSE extractor does not fit your variable (multi-station weighted, filtered seasons, etc.).workspaceoverrides the workspace root resolved from the TOML (same rule as the binary resolver inWorkspaceConfig).projectis a free string written tocalibration_sessions.project.
Return dict keys:
session_id, method, n_iterations, best_objective,
best_sim_id, duration_s, save_runs, promoted
Low-level primitives#
The CLI is a thin wrapper around three primitives you can call directly for custom orchestration:
from hydromodpy.simulation.execution.trial import (
prepare_trials, run_trial_light, promote_trial,
)
trial_ctx = prepare_trials(
"run_calibration_k.toml",
override_paths={"K": "flow.param.K.field.value"},
)
result = run_trial_light(
trial_ctx, {"K": 1e-4},
objective="nse", variable="discharge",
)
if result.status == "completed":
sim_id = promote_trial(
"run_calibration_k.toml",
{"K": 1e-4},
paths={"K": "flow.param.K.field.value"},
name="manual_k_1e-4",
)
Useful diagnostics on a persisted trace:
from hydromodpy.calibration.diagnostics import (
convergence_rate, parameter_correlation,
)
rate = convergence_rate(iters)
print(rate["slope"], rate["r_squared"])
corr = parameter_correlation(iters, parameters=["K", "Sy"])
print(corr)
Render a specific figure programmatically (equivalent to the
hmp display call above):
Analytical test cases#
Two pure-Python demos ship for local experimentation (no MODFLOW install needed):
hydromodpy.calibration.cases.recession_brutsaert: hydrograph recession fit (Q(t) = Q0 * exp(-t/tau)) used as the golden forgrid_search,random_search,nelder_mead,simplex,cma_es,gp_mapping,da_mh_gp.hydromodpy.calibration.cases.groundwater_1d: 1D synthetic aquifer with analytical head profile.
They are the quickest way to check that a new optimizer adapter
behaves against the known-good METHOD_ABS_TOL tolerances.
See also#
Calibration Architecture for the calibration architecture root.
Calibration Architecture for the package map and every UML diagram (config, runtime, execution flows, case structure).
Calibration Workflow for the user-facing hub.
Calibration for the inverse-problem formulation and methods.
Calibration Benchmarks for stable benchmark pages.