# ADR 0001 — DuckDB Adoption as Primary Analytics Engine *Status: Adopted (2025-10-08)* ## Context - The current pipeline relies on SQLite for imports, with extensive Python logic (pandas `to_sql`, index recalculation, manual validations) and significant limitations (limited DDL, degraded performance on large files, lack of analytical features). - The "Generic Import" refactoring aims for a more flexible engine to create dynamic tables, read massive files, manipulate geometries, and execute recursive CTEs. - DuckDB natively provides `read_csv_auto`, `read_parquet`, a spatial extension, and `CREATE OR REPLACE TABLE` statements that simplify the pipeline. ## Decision We are migrating Niamoto's analytical infrastructure to DuckDB: - New import tables will be created in a DuckDB file (`.duckdb`) defined by the configuration (`config.yml`). - The `niamoto.common.database` module will be extended/adapted to encapsulate DuckDB (SQL execution, introspection, transactions). - Scripts and tests will use DuckDB by default; SQLite will remain only for temporary compatibility (targeted tests) but will no longer be the primary engine. - The CLI and GUI will automatically load the spatial extension when necessary (plots/shapes). ## Consequences ### Positive - "Direct" import with `read_csv_auto` / `CREATE TABLE AS SELECT`, reducing code and execution time. - Native support for recursive CTEs ⇒ simple adjacency list hierarchies. - Handling of modern formats (Parquet, GeoParquet) without additional conversions. - Simplified statistics/profiling generation via SQL. ### Negative / Points of Attention - DuckDB learning curve for the team (DDL syntax, limitations). Snippets will be added to documentation. - Spatial extension must be explicitly loaded (initialization scripts ➜ check presence and document). - Migration of existing environments: provide an `ATTACH …` script to copy old SQLite tables if necessary. - Adjust CI (DuckDB installation, extension) and packaging (`pyproject.toml`). ### Actions 1. Adapt `Database` + unit tests for DuckDB. 2. Document configuration (`docs/07-architecture/README.md`, installation guides). 3. Update scripts (bootstrap, tests) to create/initialize the DuckDB file. 4. Prepare a migration guide (`docs/08-roadmaps/generic-import-ultrathink.md` ➜ migration appendix) with `ATTACH` examples. ## Follow-up 2025-10-08 - ✅ DuckDB helpers (SQL execution, introspection) integrated via registry and refactored loaders (`core/common/database.py`, `core/imports/registry.py`). - ✅ Spatial extension loaded via export/transform services after geospatial extractor migration. - 🔄 CLI stats and remaining loaders still aligned with `sqlite_master`: migration to DuckDB adapter in progress, `tests/cli/test_stats.py` tests to be finalized.