# Acquisition Wave Retrain and Evaluation (2026-03-21)

Status: Experiment. Audience: Team, AI agents. Purpose: Measure the impact of the SINP 1A + ETS + sPlotOpen acquisition wave after a full rebuild and retrain.
## Context

This run measures the effect of the acquisition wave on the ML detection stack after:

- rebuilding the gold set
- retraining the `header`, `values`, and `fusion` branches
- rerunning both the internal benchmark and the real-dataset suite
## Data integrated before retrain

### Added to the gold set

- TAXREF v18 (`ml/data/silver/taxref/TAXREFv18.txt`)
- ETS `Occurrence_ext.csv`
- ETS `Taxon_ext.csv`
- ETS `Measurement_or_Fact_ext.csv`
- `sPlotOpen_header(3).txt`
- `sPlotOpen_DT(2).txt`
- `sPlotOpen_CWM_CWV(2).txt`
- `sPlotOpen_metadata(2).txt`
### Added at runtime only

- `sinp:` alias block in `column_aliases.yaml`
- `ets:` alias block in `column_aliases.yaml`
- `splot:` alias block in `column_aliases.yaml`
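To make the runtime-only mechanism concrete, here is a minimal sketch of how per-source alias blocks could be merged into a single lookup at load time. The block names (`sinp`, `ets`, `splot`) come from this report; the YAML schema, the canonical concept names, and the `merge_alias_blocks` helper are assumptions for illustration, not the project's actual code.

```python
# Hypothetical sketch: merge per-source alias blocks (as parsed from a file
# like column_aliases.yaml) into one raw-header -> canonical-concept map.

def merge_alias_blocks(*blocks: dict) -> dict:
    """Flatten per-source alias blocks; lookup is case-insensitive on raw names."""
    merged = {}
    for block in blocks:
        for canonical, raw_names in block.items():
            for raw in raw_names:
                merged[raw.lower()] = canonical
    return merged

# Assumed shapes for the three runtime-only blocks (contents are invented).
sinp = {"taxonomy.species": ["nomScientifique"]}
ets = {"measurement.dbh": ["DBH_cm"]}
splot = {"plot.id": ["PlotObservationID"]}

aliases = merge_alias_blocks(sinp, ets, splot)
print(aliases["plotobservationid"])  # -> plot.id
```

Keeping these aliases out of the gold set (runtime only) means they steer inference without biasing the trained branches.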
### Explicitly left out

- OpenObs / SINP: source unavailable
- `species_trait_data.csv`: too specialized / semantically fragile
- PREDICTS: kept outside the critical path
### Resulting gold set

| Metric | Value |
|---|---|
| Labelled columns | 2540 |
| Coarse concepts | 61 |
| Added sources | |
Visible contribution by source (source names lost in extraction):

| Source | Columns |
|---|---|
| | 17 |
| | 9 |
| | 17 |
| | 4 |
| | 33 |
| | 6 |
| | 40 |
| | 15 |
## Commands run

    uv run python -m ml.scripts.data.build_gold_set
    uv run python -m ml.scripts.train.train_header_model
    uv run python -m ml.scripts.train.train_value_model
    uv run python -m ml.scripts.train.train_fusion
    uv run python -m ml.scripts.eval.run_eval_suite
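The five steps above must run in order, and a failure early on invalidates everything downstream. A small wrapper sketch, assuming only the module paths quoted above (the wrapper itself and its `dry_run` flag are hypothetical):

```python
# Hypothetical runner for the retrain pipeline; module paths are taken
# verbatim from the report, everything else is an assumption.
import subprocess

PIPELINE = [
    "ml.scripts.data.build_gold_set",
    "ml.scripts.train.train_header_model",
    "ml.scripts.train.train_value_model",
    "ml.scripts.train.train_fusion",
    "ml.scripts.eval.run_eval_suite",
]

def run_pipeline(dry_run: bool = True) -> list:
    """Build (and optionally execute) the uv commands in order."""
    commands = [["uv", "run", "python", "-m", module] for module in PIPELINE]
    if not dry_run:
        for cmd in commands:
            # check=True aborts the chain on the first failing step
            subprocess.run(cmd, check=True)
    return commands

print(len(run_pipeline()))  # 5 steps
```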
## Training results

| Model | Cross-val macro-F1 | Note |
|---|---|---|
| Header | 0.753 | strongest branch |
| Values | 0.378 | useful signal, still weak on generalization |
| Fusion | 0.639 | stronger than |
Warnings¶
header and fusion triggered ConvergenceWarning messages during retraining.
The models still trained and were saved successfully, but max_iter and/or the
solver deserved a later adjustment.
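One way to keep this from silently recurring is to escalate the warning to an error and retry with a larger iteration budget. A minimal sketch: the `ConvergenceWarning` class below is a stand-in for `sklearn.exceptions.ConvergenceWarning`, and `fit_with_budget` / `toy_fit` are hypothetical helpers, not the project's training code.

```python
# Sketch: stop a retrain from saving an under-converged model by turning
# ConvergenceWarning into an error and retrying with a bigger max_iter.
import warnings

class ConvergenceWarning(UserWarning):
    """Stand-in for sklearn.exceptions.ConvergenceWarning."""

def fit_with_budget(fit, max_iters=(100, 1000, 5000)):
    """Try increasing iteration budgets until fit() completes without warning."""
    for max_iter in max_iters:
        with warnings.catch_warnings():
            warnings.simplefilter("error", ConvergenceWarning)
            try:
                return fit(max_iter), max_iter
            except ConvergenceWarning:
                continue  # not converged: retry with a bigger budget
    raise RuntimeError("model did not converge within the iteration budgets")

# Toy fit: warns unless it gets at least 1000 iterations.
def toy_fit(max_iter):
    if max_iter < 1000:
        warnings.warn("did not converge", ConvergenceWarning)
    return "model"

model, used = fit_with_budget(toy_fit)
print(used)  # 1000
```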
## Evaluation results

Output file: `ml/data/eval/results/20260321_194036.json`
### Internal benchmarks after retrain

| Benchmark | Value |
|---|---|
| ProductScore | 80.8392 |
| GlobalScore / NiamotoOfflineScore | 82.764 |
Detailed ProductScore (bucket names lost in extraction):

| Bucket | Score |
|---|---|
| | 98.511 |
| | 91.018 |
| | 82.672 |
| | 75.093 |
| | 71.621 |
| | 63.634 |
Interpretation:

- the historical holdout metrics remain broadly solid
- they are consistent with the annotated dataset suite
- `anonymous` is now the clearest penalizing bucket
- coded headers and inventory-style exports remain the main generalization ceiling
### Full eval suite (9 datasets, 478 columns)

Dataset names were lost in extraction; per-dataset figures are preserved:

| Dataset | Cols | Role % | Concept % |
|---|---|---|---|
| | 57 | 96.5 | 91.2 |
| | 27 | 100.0 | 100.0 |
| | 61 | 83.6 | 83.6 |
| | 51 | 94.1 | 90.2 |
| | 45 | 91.1 | 88.9 |
| | 41 | 90.2 | 87.8 |
| | 136 | 89.0 | 77.2 |
| | 27 | 100.0 | 100.0 |
| | 33 | 75.8 | 63.6 |
| TOTAL | 478 | 90.4 | 84.7 |
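The TOTAL row is a column-weighted mean of the per-dataset scores. A minimal sketch reproducing it from the `(cols, role %, concept %)` triples in the table (dataset names are omitted because they do not affect the aggregate):

```python
# Reproduce the suite's TOTAL row as a column-weighted average.
ROWS = [
    (57, 96.5, 91.2),
    (27, 100.0, 100.0),
    (61, 83.6, 83.6),
    (51, 94.1, 90.2),
    (45, 91.1, 88.9),
    (41, 90.2, 87.8),
    (136, 89.0, 77.2),
    (27, 100.0, 100.0),
    (33, 75.8, 63.6),
]

def weighted_totals(rows):
    """Weight each dataset's role/concept accuracy by its column count."""
    total_cols = sum(cols for cols, _, _ in rows)
    role = sum(cols * r for cols, r, _ in rows) / total_cols
    concept = sum(cols * c for cols, _, c in rows) / total_cols
    return total_cols, round(role, 1), round(concept, 1)

print(weighted_totals(ROWS))  # (478, 90.4, 84.7)
```

The same computation over the 7 non-acceptance datasets yields the product-oriented aggregate reported below.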
Product-oriented aggregate view (7 datasets, 418 columns)¶
Aggregate |
Role % |
Concept % |
|---|---|---|
Tier 1 + Tier 1b + Silver |
90.9 |
85.4 |
## Interpretation

### What worked well

- `niamoto-gb` stayed at 100%
- `niamoto-nc` rose to 91.2%
- `guyadiv` rose to 83.6%
- the three GBIF datasets closest to the product all landed between 87.8% and 90.2%
The acquisition wave clearly improved the core product datasets and nearby standardized cases.
### What remains weak

The frozen out-of-train benchmark remains dominated by `acceptance-fia-or`:

| Dataset | Concept % |
|---|---|
| | 100.0 |
| `acceptance-fia-or` | 63.6 |
Main remaining errors:

- `measurement.biomass` -> `measurement.volume`
- `identifier.taxon` -> `taxonomy.species`
- `category.habitat` -> (not found)
- FIA-coded headers such as `SPCD`, `CR`, `VOLCFNET`, `VOLBFNET`
- GBIF taxonomic keys such as `acceptedTaxonKey`, `speciesKey`, `genericName`, `infraspecificEpithet`, `scientificNameAuthorship`
## Conclusion

This acquisition wave was worth it.

- Yes, it improved the Niamoto core datasets
- Yes, it stabilized real GBIF-like cases
- No, it did not solve coded inventory generalization yet

The next logical step was not another acquisition wave, but a targeted correction pass on:

- the remaining GBIF taxonomic key columns
- the FIA-coded columns
- `measurement.biomass`, `identifier.taxon`, `category.habitat`, `text.metadata`
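One possible shape for that correction pass is a post-prediction override table keyed on exact raw header names. The header names below come from the error list above; the override mechanism, the chosen target concepts, and `apply_corrections` are assumptions for illustration, not the project's actual design.

```python
# Hypothetical post-prediction correction layer: exact-header overrides
# applied on top of the model's column -> concept predictions.

CORRECTIONS = {
    # GBIF taxonomic keys the model currently mislabels (targets assumed)
    "acceptedTaxonKey": "identifier.taxon",
    "speciesKey": "identifier.taxon",
    # FIA-coded inventory columns (targets assumed)
    "VOLCFNET": "measurement.volume",
    "VOLBFNET": "measurement.volume",
}

def apply_corrections(predictions: dict) -> dict:
    """Overlay hand-written corrections; untouched columns keep the model's label."""
    return {col: CORRECTIONS.get(col, concept) for col, concept in predictions.items()}

preds = {"speciesKey": "taxonomy.species", "plot_id": "plot.id"}
print(apply_corrections(preds))  # speciesKey overridden, plot_id untouched
```

The appeal of this design is that it targets exactly the coded-header failure mode without another costly retrain.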