The catalog door

HydroModPy stores its tabular state in three DuckDB files:

  • <workspace>/data/cache.duckdb – shared input cache.

  • <project>/catalog.duckdb – simulation results for one project.

  • <state_dir>/index.duckdb – machine-wide federation of every registered workspace.

End-user code never needs to know which file holds a given row. hmp.open is the single door onto the simulation catalog. The input cache and the machine-wide federation have their own entry points: hydromodpy.catalog.InputsNamespace or the hmp data CLI for the cache, and hmp.index() for the federation.

Opening a catalog

import hydromodpy as hmp

cat = hmp.open("~/proj/naizin")
sims = cat.find(solver="modflow6")          # SimulationGroup
workspaces = hmp.index()                     # machine-wide federation

hmp.open returns a SimulationCatalog (the engine itself, not a wrapper). With the default create=False it raises FileNotFoundError when no catalog.duckdb exists; pass create=True to initialise an empty catalog instead.
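The create flag lends itself to an "open, else initialise" pattern. The sketch below is illustrative only: open_or_create is a hypothetical helper, not part of HydroModPy, and it relies solely on the behaviour described above (FileNotFoundError with create=False, an empty catalog with create=True). The opener is passed in as an argument so the helper carries no import of its own.

```python
def open_or_create(open_fn, project):
    """Open an existing catalog, or initialise an empty one on first use.

    ``open_fn`` is expected to behave like ``hmp.open``: it raises
    FileNotFoundError when no catalog exists and ``create`` is False.
    """
    try:
        # Default path: the catalog already exists.
        return open_fn(project)
    except FileNotFoundError:
        # First use: ask the opener to initialise an empty catalog.
        return open_fn(project, create=True)


# Intended usage (assumes hydromodpy is installed):
#   import hydromodpy as hmp
#   cat = open_or_create(hmp.open, "~/proj/naizin")
```

Keeping the explicit try/except, rather than always passing create=True, means a mistyped project path still surfaces as an error wherever the helper is not used.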

The three databases

The simulation catalog – hmp.open

Backed by <project>/catalog.duckdb.

cat = hmp.open("~/proj/naizin")

# All simulations for this project as a DataFrame.
df = cat.frame

# Equality filters against ``v_simulation_summary`` columns.
# An unknown filter raises ValueError listing the valid keys.
group = cat.find(solver="modflow6", status="completed")

# Schema discovery.
cat.describe()
cat.tables()
cat.columns()
cat.variables()
cat.metrics()
cat.stations()

# A simulation reference (identifier elided here).
ref = "ab12cd34-...-...-...-..."

# Ranking and resolution.
cat.latest()
cat.best()
cat.worst()
cat.rank()
cat.resolve(ref)

# One sim by reference.
row = cat[ref]
data = cat.read(ref, "head")

# Raw SQL.
cat.sql("SELECT * FROM v_simulation_summary LIMIT 5")

The input cache

Backed by <workspace>/data/cache.duckdb. Reached through hydromodpy.catalog.InputsNamespace or the hmp data CLI, not through hmp.open.

from hydromodpy.catalog import InputsNamespace

inputs = InputsNamespace("~/proj/naizin")
inputs.has_cache()
inputs.db_path  # -> ``<workspace>/data/cache.duckdb``

# List entries, optionally filtered.
inputs.list(variable="recharge")
inputs.list(variable="head", source="brgm")

# Locate a single cached entry covering a given extent.
entry = inputs.find(
    variable="recharge",
    source="meteofrance",
    station_id=None,
    bbox=(2.0, 48.0, 3.0, 49.0),
)

The machine-global index – hmp.index

Backed by <state_dir>/index.duckdb. The file is opened in read-only mode, so concurrent hmp run writers keep their write lock.

# Every registered workspace plus federated search.
hmp.index()

Federated search across every workspace, including full-text search over descriptions and scientific objectives, lives on the index object returned by hmp.index().

Underlying objects

Callers that need a finer surface (custom SQL, transaction control, register/unregister) reach the underlying objects directly:

  • hydromodpy.results.catalog.SimulationCatalog

  • hydromodpy.data.registry.DataCatalogDuckDB

  • hydromodpy.core.state.global_index.GlobalIndex

These are the V1 implementations and remain the canonical entry points for low-level work.

Migrations runner

Each of the three DuckDB files owns a flat migrations/ directory holding one 0001_initial.sql. They share a single runner under hydromodpy.core.migrations.runner:

from hydromodpy.core.migrations import apply_migrations

apply_migrations(
    db_path="path/to/some.duckdb",
    migrations_dir="path/to/migrations/",
    component="catalog",  # or "data_cache", "index"
)

apply_migrations acquires a <db_path>.lock filelock so concurrent callers serialise. Already-applied migrations are skipped based on a checksum recorded in schema_migrations.
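The checksum-based skip can be sketched as follows. This is a minimal illustration and not the HydroModPy runner: it uses the standard-library sqlite3 module as a stand-in for DuckDB, the two-column layout of schema_migrations is an assumption, and the <db_path>.lock file lock is left out to keep the example dependency-free.

```python
import hashlib
import sqlite3
from pathlib import Path


def apply_migrations_sketch(conn, migrations_dir):
    """Apply unapplied *.sql files in name order; return the names applied."""
    # Bookkeeping table (assumed layout: file name plus content checksum).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations "
        "(name TEXT NOT NULL, checksum TEXT NOT NULL)"
    )
    recorded = {
        row[0] for row in conn.execute("SELECT checksum FROM schema_migrations")
    }
    applied = []
    for path in sorted(Path(migrations_dir).glob("*.sql")):
        sql = path.read_text(encoding="utf-8")
        checksum = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        if checksum in recorded:
            continue  # same content was applied before: skip it
        conn.executescript(sql)  # run the migration statements
        conn.execute(
            "INSERT INTO schema_migrations (name, checksum) VALUES (?, ?)",
            (path.name, checksum),
        )
        conn.commit()
        recorded.add(checksum)
        applied.append(path.name)
    return applied
```

Calling the function twice against the same directory applies 0001_initial.sql the first time and returns an empty list the second time, which is the idempotence the checksum record is meant to provide.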

Authentication

The catalog reads hydromodpy.core.auth to resolve the current operator. V1 ships a permissive default (LocalAuthBackend) that returns the OS user and allows every operation. Switching backends happens via the HMP_AUTH_BACKEND environment variable; no code change is needed in the catalog layer.
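Environment-driven backend selection of this kind can be sketched as below. Everything except the names LocalAuthBackend and HMP_AUTH_BACKEND is an assumption made for illustration: the method names, the BACKENDS registry, the "local" key and the resolve_backend helper are hypothetical and are not taken from hydromodpy.core.auth.

```python
import getpass
import os


class LocalAuthBackend:
    """Permissive stand-in: reports the OS user and allows everything."""

    def current_operator(self):  # hypothetical method name
        return getpass.getuser()

    def is_allowed(self, operation):  # hypothetical method name
        return True


# Hypothetical registry mapping HMP_AUTH_BACKEND values to backend classes.
BACKENDS = {"local": LocalAuthBackend}


def resolve_backend(environ=None):
    """Pick a backend from HMP_AUTH_BACKEND, defaulting to the local one."""
    environ = os.environ if environ is None else environ
    name = environ.get("HMP_AUTH_BACKEND", "local")
    try:
        return BACKENDS[name]()
    except KeyError:
        valid = ", ".join(sorted(BACKENDS))
        raise ValueError(
            f"unknown auth backend {name!r}; valid: {valid}"
        ) from None
```

Because the lookup happens in one place, code that asks for the current operator stays unchanged when a stricter backend is registered and selected through the environment.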

Path types

Every workspace / cache / state path argument is typed pathlib.Path | upath.UPath. The runtime accepts local paths and file:// URIs; any other scheme raises NotImplementedError with a clear message. The type widening lets callers pass a raw URI today even though only local URIs are honoured.
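The accept-local, reject-remote rule can be sketched with the standard library alone. The helper name to_local_path is hypothetical, and the sketch takes any value whose str() form is a path or URI rather than importing upath; it illustrates the described behaviour and is not HydroModPy's own normalisation code.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def to_local_path(value):
    """Return a local Path for plain paths and file:// URIs.

    Any other URI scheme raises NotImplementedError.
    """
    text = str(value)
    parsed = urlparse(text)
    if parsed.scheme == "file":
        # file:///tmp/ws -> /tmp/ws (percent-escapes decoded)
        return Path(unquote(parsed.path))
    if "://" in text:
        raise NotImplementedError(
            f"URI scheme {parsed.scheme!r} is not supported; "
            "pass a local path or a file:// URI"
        )
    # Plain local path; expand a leading "~" as a convenience.
    return Path(text).expanduser()
```

With this helper, "/tmp/ws" and "file:///tmp/ws" both resolve to the same local path, while a value such as "s3://bucket/ws" fails fast with an explicit message.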