Schema Evolution#
HydroModPy V1 ships an Alembic-like migration runner that applies
versioned SQL migrations to every DuckDB database the toolbox owns, and pins a
single integer schema version on Zarr and Parquet stores. The runner
records each application in a schema_migrations ledger and keeps one
row per component (catalog, cache, index).
For the storage layout that this policy applies to, see Storage Layout.
Scope#
Covered:
DuckDB databases: project catalog.duckdb, workspace data/cache.duckdb, machine index.duckdb.
Zarr v2 stores: simulations/<basename>.zarr/ with ZARR_SCHEMA_VERSION pinned in the root ACDD attributes.
Parquet v2.6 outputs with hmp.schema_version in KV metadata (PARQUET_SCHEMA_VERSION).
Portable .hmp packages produced by SimulationCatalog.export_package.
Out of scope: user TOML files. Their versioning is handled by Pydantic
v2 with ConfigDict(extra="forbid").
Migration runner#
Source: hydromodpy/core/migrations/runner.py plus per-component
migration directories:
hydromodpy/results/catalog/migrations/ for the project catalog;
hydromodpy/data/registry/migrations/ for the workspace cache;
hydromodpy/core/state/migrations/ for the global index.
Each migration is a numbered SQL file (0001_initial.sql,
0002_add_<slug>.sql, …) and applies cleanly in version order. The
runner:
ensures the schema_migrations ledger exists with columns version INTEGER, component TEXT, slug TEXT, checksum TEXT, applied_at TIMESTAMP;
reads the max applied version for the requested component;
applies every newer migration inside one transaction per file;
records the migration with a SHA-256 checksum of the SQL payload so a tampered migration is detected on re-runs.
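The four runner steps above can be sketched as follows. This is a minimal illustration, not the code in runner.py: it uses the standard-library sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies, and the function name apply_migrations, its arguments, and the error messages are assumptions made for the example.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Ledger columns as listed in this section.
LEDGER_DDL = """
CREATE TABLE IF NOT EXISTS schema_migrations (
    version INTEGER, component TEXT, slug TEXT,
    checksum TEXT, applied_at TIMESTAMP
)
"""


def apply_migrations(con, component, migrations_dir):
    """Apply pending NNNN_<slug>.sql files in order; return the versions applied.

    Hypothetical helper: sqlite3 stands in for DuckDB here.
    """
    con.execute(LEDGER_DDL)
    rows = con.execute(
        "SELECT version, checksum FROM schema_migrations WHERE component = ?",
        (component,),
    ).fetchall()
    recorded = dict(rows)              # version -> checksum already in the ledger
    current = max(recorded, default=0)
    applied = []
    for path in sorted(Path(migrations_dir).glob("[0-9][0-9][0-9][0-9]_*.sql")):
        version_text, _, slug = path.stem.partition("_")
        version = int(version_text)
        sql = path.read_text(encoding="utf-8")
        checksum = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        if version <= current:
            # Already applied: only verify that the file was not edited since.
            if recorded.get(version) != checksum:
                raise RuntimeError(f"checksum mismatch for {path.name}")
            continue
        if version != current + 1:
            raise RuntimeError(f"version gap before {path.name}")
        try:
            # BEGIN is part of the script so the migration and its ledger row
            # are committed together, one transaction per file.
            con.executescript(f"BEGIN;\n{sql}\n")
            con.execute(
                "INSERT INTO schema_migrations VALUES (?, ?, ?, ?, ?)",
                (version, component, slug, checksum,
                 datetime.now(timezone.utc).isoformat()),
            )
            con.execute("COMMIT")
        except Exception:
            if con.in_transaction:
                con.execute("ROLLBACK")
            raise
        current = version
        applied.append(version)
    return applied
```

In this sketch a second call against the same connection returns an empty list, and editing an already-applied file raises on the next run because its checksum no longer matches the ledger.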
Calling ensure_schema() from a backend (DuckDBBackend or any
other adapter implementing the protocol) deploys the latest schema for
that component.
The facade hmp.read and hmp.open call ensure_schema() on
first access so users never see a half-deployed catalog.
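The first-access behaviour can be pictured with a small guard. This is only a sketch of the pattern: SchemaBackend and LazyCatalog are hypothetical names, and the real signatures of ensure_schema(), hmp.read, and hmp.open are not shown in this page.

```python
from typing import Protocol


class SchemaBackend(Protocol):
    """Assumed shape of a backend: anything exposing ensure_schema()."""

    def ensure_schema(self) -> None: ...


class LazyCatalog:
    """Hypothetical facade that deploys the schema once, before the first read."""

    def __init__(self, backend: SchemaBackend) -> None:
        self._backend = backend
        self._ready = False

    def _ensure_ready(self) -> None:
        # Run the migrations on first access only; later calls skip this step.
        if not self._ready:
            self._backend.ensure_schema()
            self._ready = True

    def read(self, name: str) -> str:
        self._ensure_ready()
        # Placeholder for the actual query against the catalog.
        return f"read {name}"
```

The point of the guard is that no read path can reach the database before the schema is fully deployed.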
Principles#
One version per component. Each DuckDB has its own ledger row in schema_migrations. Each Zarr store carries zarr_schema_version in its root attributes. Each Parquet file carries hmp.schema_version in KV metadata.
Additive migrations first. Prefer ALTER TABLE ... ADD COLUMN with a default over deletions or renames. Spatial Zarr fields only grow (new datasets); existing ones keep their shape and dtype.
Monotone numbering. Versions are integers incremented by one per migration. No gaps. Downgrades are not supported; a migration is a one-way door.
Round-trip tests required. For every migration v(n) -> v(n+1) a test must cover:
  a minimal hand-built v(n) fixture;
  applying the migration produces a v(n+1) store readable by the current backend;
  the migration is idempotent: running it twice is a no-op.
Breaking reader change. Any change to shape, dtype, column order, or semantics of an existing field triggers a version bump and a migration. Pure refactors that do not touch disk do not bump the version.
Export/import boundary. .hmp packages embed the version of every component in their manifest. The import path rejects packages whose component versions exceed the local library and silently migrates older ones through the registry.
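The export/import rule reduces to a per-component comparison between the versions in the package manifest and the versions the local library supports. The sketch below is illustrative only: plan_import, the Action enum, and the plain-dict manifest shape are assumptions, as is the choice to reject components the local library does not know.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"    # same version, import as-is
    MIGRATE = "migrate"  # older package, upgrade during import
    REJECT = "reject"    # newer than the local library


def plan_import(manifest_versions: dict[str, int],
                local_versions: dict[str, int]) -> dict[str, Action]:
    """Decide, per component, how a package should be handled on import."""
    plan = {}
    for component, packaged in manifest_versions.items():
        local = local_versions.get(component)
        if local is None or packaged > local:
            # Unknown component (assumption) or a version from the future.
            plan[component] = Action.REJECT
        elif packaged < local:
            plan[component] = Action.MIGRATE
        else:
            plan[component] = Action.ACCEPT
    return plan


def can_import(plan: dict[str, Action]) -> bool:
    """A package is importable only if no component is rejected."""
    return Action.REJECT not in plan.values()
```

Because downgrades are not supported, a single component that is newer than the local library is enough to refuse the whole package in this sketch.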
Anti-patterns#
Do not silently accept unknown tables or columns. The reader rejects stores whose version exceeds the maximum it knows about.
Do not inject data from outside the migration. A migration operates only on the SQL or store handle handed to it.
Do not couple SQL and field-store version numbers. Each evolves independently: schema_migrations for DuckDB, ZARR_SCHEMA_VERSION and PARQUET_SCHEMA_VERSION for the columnar stores.
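A reader-side guard consistent with these rules might look like the sketch below. The constant values are placeholders rather than the versions HydroModPy actually pins, the metadata is modelled as a plain dict instead of real Zarr attributes or Parquet KV metadata, and UnsupportedSchemaVersion and check_store_version are hypothetical names.

```python
# Placeholder values for illustration only; each constant evolves on its own.
ZARR_SCHEMA_VERSION = 1
PARQUET_SCHEMA_VERSION = 1


class UnsupportedSchemaVersion(Exception):
    """Raised when a store is missing its version or is newer than the reader."""


def check_store_version(metadata: dict, key: str, max_known: int) -> int:
    """Return the store's schema version, or raise if it cannot be read safely."""
    if key not in metadata:
        raise UnsupportedSchemaVersion(f"missing {key!r} in store metadata")
    found = int(metadata[key])
    if found > max_known:
        # Never guess at the layout of a newer store.
        raise UnsupportedSchemaVersion(
            f"{key}={found} exceeds the maximum known version {max_known}"
        )
    return found


# Each store type is checked against its own constant.
zarr_attrs = {"zarr_schema_version": 1}
parquet_kv = {"hmp.schema_version": "1"}
zarr_version = check_store_version(zarr_attrs, "zarr_schema_version", ZARR_SCHEMA_VERSION)
parquet_version = check_store_version(parquet_kv, "hmp.schema_version", PARQUET_SCHEMA_VERSION)
```

Keeping the two constants separate means a Parquet layout change never forces a Zarr bump, and neither touches the DuckDB ledger.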
Versions today#
| Component | Version | Notes |
|---|---|---|
| Project catalog | | Initial v2 DDL: simulations, parameters, metrics, provenance, calibration, workflow_steps, schema_migrations. |
| Workspace cache | | Entries with workspace-relative paths, provenance, failures, validation_reports. |
| Machine global index | | Workspaces table plus federated views. |
| Zarr field store | | ACDD root attrs, CF |
| Parquet tabular store | | pyarrow |
| GeoParquet | | OGC 1.1, GeoArrow encoding. |
See also#
Storage Layout for the storage that this policy applies to.
Design Patterns for the Pydantic config layer that sits above the storage.
results for hmp.read and CatalogBackend.