Ab Initio Data Quality 〈DELUXE – 2026〉
Replace NULL with explicit semantics. Use -999 for "offline," -9999 for "out of range," or better—split the column into value and value_metadata_flag . 3. The Referential Integrity Illusion Modern data lakes love "schema on read." This is the enemy of ab initio . You are essentially saying, “Let’s store the garbage, and we’ll figure out what kind of garbage it is later.”
We have it backwards.
Use tools like pydantic (Python), Great Expectations (with expect_column_values_to_not_be_null set to fatal ), or dbt 's constraints (enforced, not just documented). If the contract fails, the pipe breaks. Loudly. ab initio data quality
Stop cleaning the swamp. Stop building the bridge. Stop the garbage at the gate. Replace NULL with explicit semantics
Change is allowed. Silent change is not. Your first principle is: Schema version is part of the data identifier. events_v2.parquet is a different entity than events_v1.parquet . Never mutate; deprecate. The Referential Integrity Illusion Modern data lakes love