ADR 0001 — DuckDB Adoption as Primary Analytics Engine

Status: Adopted (2025-10-08)

Context

  • The current pipeline relies on SQLite for imports, with extensive Python logic (pandas to_sql, index recalculation, manual validations) and significant limitations (limited DDL, degraded performance on large files, lack of analytical features).

  • The “Generic Import” refactoring aims for a more flexible engine to create dynamic tables, read massive files, manipulate geometries, and execute recursive CTEs.

  • DuckDB natively provides read_csv_auto, read_parquet, a spatial extension, and CREATE OR REPLACE TABLE statements that simplify the pipeline.

Decision

We are migrating Niamoto’s analytical infrastructure to DuckDB:

  • New import tables will be created in a DuckDB file (.duckdb) defined by the configuration (config.yml).

  • The niamoto.common.database module will be extended/adapted to encapsulate DuckDB (SQL execution, introspection, transactions).

  • Scripts and tests will use DuckDB by default; SQLite will remain only for temporary compatibility (targeted tests) but will no longer be the primary engine.

  • The CLI and GUI will automatically load the spatial extension when necessary (plots/shapes).

Consequences

Positive

  • “Direct” import with read_csv_auto / CREATE TABLE AS SELECT, reducing code and execution time.

  • Native support for recursive CTEs ⇒ simple adjacency list hierarchies.

  • Handling of modern formats (Parquet, GeoParquet) without additional conversions.

  • Simplified statistics/profiling generation via SQL.

Negative / Points of Attention

  • DuckDB learning curve for the team (DDL syntax, limitations). Snippets will be added to documentation.

  • Spatial extension must be explicitly loaded (initialization scripts ➜ check presence and document).

  • Migration of existing environments: provide an ATTACH script to copy old SQLite tables if necessary.

  • Adjust CI (DuckDB installation, extension) and packaging (pyproject.toml).

Actions

  1. Adapt Database + unit tests for DuckDB.

  2. Document configuration (docs/07-architecture/README.md, installation guides).

  3. Update scripts (bootstrap, tests) to create/initialize the DuckDB file.

  4. Prepare a migration guide (docs/08-roadmaps/generic-import-ultrathink.md ➜ migration appendix) with ATTACH examples.

Follow-up 2025-10-08

  • ✅ DuckDB helpers (SQL execution, introspection) integrated via registry and refactored loaders (core/common/database.py, core/imports/registry.py).

  • ✅ Spatial extension loaded via export/transform services after geospatial extractor migration.

  • 🔄 CLI stats and remaining loaders still aligned with sqlite_master: migration to DuckDB adapter in progress, tests/cli/test_stats.py tests to be finalized.