Data Pipeline To-Do List

Python + PostgreSQL · Raw → Bronze → Gold

1. Environment & Project Setup
Test: Import all libraries, load .env, and print the config values; confirm that nothing raises an error.
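A minimal sketch of this check, assuming the connection settings live in a .env file under the hypothetical keys PG_HOST, PG_PORT, PG_DATABASE, PG_USER and PG_PASSWORD. It uses only the standard library (a tiny KEY=VALUE parser stands in for a package such as python-dotenv) and masks the password so "print config values" never leaks credentials.

```python
import importlib
import os
from dataclasses import dataclass
from pathlib import Path


def check_imports(module_names):
    """Try importing each module; return the names that failed."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def read_env_file(path):
    """Minimal KEY=VALUE parser (assumption: no multi-line values or 'export' prefixes)."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int
    dbname: str
    user: str
    password: str

    def masked(self):
        # Safe to print: the real password is never echoed to the console.
        return f"{self.user}@{self.host}:{self.port}/{self.dbname} (password={'*' * 8})"


def load_config(env_path=".env"):
    """Build the config; real environment variables override values from the .env file."""
    file_values = read_env_file(env_path) if Path(env_path).exists() else {}

    def get(key, default):
        return os.environ.get(key, file_values.get(key, default))

    return DbConfig(
        host=get("PG_HOST", "localhost"),
        port=int(get("PG_PORT", "5432")),
        dbname=get("PG_DATABASE", "pipeline"),
        user=get("PG_USER", "postgres"),
        password=get("PG_PASSWORD", ""),
    )


if __name__ == "__main__":
    # Hypothetical module list; replace with the project's actual dependencies.
    print("missing modules:", check_imports(["csv", "json", "logging", "sqlite3"]))
    print(load_config().masked())
```

Letting real environment variables win over the file keeps the same code usable on a scheduler host where secrets are injected rather than stored in .env.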
2. Source Extraction
Test: Run extract(), print the first 5 rows and the row count, and confirm the _extracted_at column exists.
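One possible shape for extract(), assuming (hypothetically) that the source is a CSV file or file-like object with columns such as order_id and amount. Every row in a run gets the same UTC _extracted_at timestamp, which the raw layer can later use to tell batches apart.

```python
import csv
import io
from datetime import datetime, timezone


def extract(source):
    """Read CSV rows and stamp each with one shared _extracted_at timestamp.

    Assumption: the source is a CSV path or file-like object; swap in an API
    or database reader for other sources.
    """
    extracted_at = datetime.now(timezone.utc).isoformat()
    handle = open(source, newline="") if isinstance(source, str) else source
    try:
        rows = [dict(row, _extracted_at=extracted_at) for row in csv.DictReader(handle)]
    finally:
        if isinstance(source, str):
            handle.close()  # only close files this function opened
    return rows


if __name__ == "__main__":
    sample = io.StringIO("order_id,amount\n1,10.50\n2,4.25\n")  # hypothetical data
    rows = extract(sample)
    print(f"{len(rows)} rows")
    for row in rows[:5]:
        print(row)
    assert all("_extracted_at" in row for row in rows)
```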
3. Raw / History Landing Layer
Test: Run load_raw(), query the raw table, and confirm the row count grows on each run (history is retained, not overwritten).
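A sketch of an append-only load_raw(). To keep it runnable without a server it uses the standard-library sqlite3 module as a stand-in for PostgreSQL; the table layout is an assumption. Against PostgreSQL the idea is the same, but the payload column would more naturally be JSONB, the key an identity column, and the driver placeholders %s instead of ?.

```python
import json
import sqlite3

RAW_DDL = """
CREATE TABLE IF NOT EXISTS raw (
    raw_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    payload       TEXT NOT NULL,   -- source row as JSON, stored untouched
    _extracted_at TEXT NOT NULL
)
"""


def load_raw(conn, rows):
    """Append-only load: never TRUNCATE or DELETE, so every run adds history."""
    conn.execute(RAW_DDL)
    conn.executemany(
        "INSERT INTO raw (payload, _extracted_at) VALUES (?, ?)",
        [(json.dumps(r, sort_keys=True), r["_extracted_at"]) for r in rows],
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [{"order_id": "1", "_extracted_at": "2024-01-01T00:00:00+00:00"}]
    for _ in range(2):
        load_raw(conn, batch)
        print(conn.execute("SELECT COUNT(*) FROM raw").fetchone()[0])
```

Storing the untouched payload means bronze can always be rebuilt from raw if a cleansing rule changes later.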
4. Bronze Layer (Cleansing & Transformation)
Test: Run transform_bronze(), query the bronze table, check for nulls and duplicates, and confirm every column has the correct type.
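The three checks in this test map directly onto what transform_bronze() has to do. A minimal in-memory sketch, assuming hypothetical columns order_id, amount and order_date: cast each value to a proper type, reject rows that cannot be cast, and keep only the newest version of each order_id by _extracted_at. In PostgreSQL the dedupe step could instead be written with SELECT DISTINCT ON (order_id) ... ORDER BY order_id, _extracted_at DESC.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def _clean(row):
    """Cast one raw row to typed values; return None if a required field is unusable."""
    try:
        return {
            "order_id": int(row["order_id"]),
            "amount": Decimal(str(row["amount"]).strip()),
            "order_date": date.fromisoformat(str(row["order_date"]).strip()),
            "_extracted_at": row["_extracted_at"],
        }
    except (KeyError, TypeError, ValueError, InvalidOperation):
        return None


def transform_bronze(raw_rows):
    """Type-cast, drop unusable rows, and keep the newest version of each order_id."""
    latest = {}
    rejected = 0
    for row in raw_rows:
        clean = _clean(row)
        if clean is None:
            rejected += 1
            continue
        current = latest.get(clean["order_id"])
        # ISO-8601 UTC strings in one format sort chronologically, so string comparison works.
        if current is None or clean["_extracted_at"] > current["_extracted_at"]:
            latest[clean["order_id"]] = clean
    return list(latest.values()), rejected
```

Returning the rejected count alongside the rows gives the data quality and logging phases a number to report.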
5. Data Quality Checks
Test: Manually insert a bad row, run the checks, and confirm they catch the row and log it.
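A sketch of a rule-based check runner, assuming bronze rows are dicts and using hypothetical rules on order_id and amount. Row-level rules are predicates; uniqueness needs the whole batch, so it runs as a separate pass. Each failure is both returned (so the test can assert on it) and logged through the standard logging module.

```python
import logging

logger = logging.getLogger("pipeline.dq")

# Hypothetical rules: each name maps to a predicate that must hold for every row.
CHECKS = {
    "order_id_not_null": lambda r: r.get("order_id") is not None,
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}


def run_quality_checks(rows, checks=CHECKS):
    """Evaluate every rule on every row; log and return (check_name, row_index) failures."""
    failures = []
    for index, row in enumerate(rows):
        for name, predicate in checks.items():
            if not predicate(row):
                failures.append((name, index))
                logger.warning("DQ check %s failed on row %d: %r", name, index, row)
    # Duplicate-key check looks across the batch, not at a single row.
    seen = set()
    for index, row in enumerate(rows):
        key = row.get("order_id")
        if key is not None and key in seen:
            failures.append(("order_id_unique", index))
            logger.warning("DQ check order_id_unique failed on row %d: %r", index, row)
        seen.add(key)
    return failures
```

For the manual test, append something like {"order_id": None, "amount": -3} to a known-good batch and confirm both row-level rules report it.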
6. Gold Layer (Aggregation & Business Logic)
Test: Run transform_gold(), query the gold table, and manually verify one aggregation against the bronze source.
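As an illustration, suppose the gold table holds daily order counts and revenue (a hypothetical business metric; the SQL equivalent would be a GROUP BY order_date with COUNT(*) and SUM(amount)). The sketch below builds that aggregate from typed bronze rows and adds a spot_check() helper that recomputes one day straight from bronze, which is exactly the manual verification the test asks for.

```python
from collections import defaultdict
from decimal import Decimal


def transform_gold(bronze_rows):
    """Aggregate bronze orders into one row per order_date: order count and revenue."""
    totals = defaultdict(lambda: {"order_count": 0, "revenue": Decimal("0")})
    for row in bronze_rows:
        bucket = totals[row["order_date"]]
        bucket["order_count"] += 1
        bucket["revenue"] += row["amount"]
    # Sort by date so the output order is stable between runs.
    return [{"order_date": day, **values} for day, values in sorted(totals.items())]


def spot_check(gold_rows, bronze_rows, day):
    """Recompute one day's revenue directly from bronze and compare it with gold."""
    expected = sum((r["amount"] for r in bronze_rows if r["order_date"] == day), Decimal("0"))
    actual = next(g["revenue"] for g in gold_rows if g["order_date"] == day)
    return expected == actual
```

Using Decimal rather than float keeps the spot check exact, so an equality comparison is meaningful.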
7. Orchestration & Logging
Test: Run main() end-to-end, review pipeline.log, and confirm every phase is logged with its row count.
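One way to get a log that passes this test: wrap every phase in a helper that records start, row count, duration and any exception. This is a sketch under the assumption that phases can be chained as (name, function) pairs, each receiving the previous phase's output; the real extract/load/transform functions above have different signatures and would need small adapter lambdas.

```python
import logging
import time


def configure_logging(log_path="pipeline.log"):
    logging.basicConfig(
        filename=log_path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace handlers left over from earlier runs (Python 3.8+)
    )
    return logging.getLogger("pipeline")


def run_phase(logger, name, func, *args):
    """Run one phase, log its row count and duration, and re-raise on failure."""
    start = time.perf_counter()
    logger.info("phase=%s status=started", name)
    try:
        result = func(*args)
    except Exception:
        logger.exception("phase=%s status=failed", name)
        raise
    rows = len(result) if hasattr(result, "__len__") else result
    logger.info("phase=%s status=ok rows=%s seconds=%.2f", name, rows, time.perf_counter() - start)
    return result


def main(phases, log_path="pipeline.log"):
    """phases: ordered (name, func) pairs; each func receives the previous phase's output."""
    logger = configure_logging(log_path)
    data = None
    for name, func in phases:
        data = run_phase(logger, name, func, data)
    logger.info("pipeline status=complete")
    return data
```

Re-raising after logging means a failed phase stops the run instead of feeding partial data into later layers, while the traceback still lands in pipeline.log.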
8. Scheduling & Final Verification
Test: Run the pipeline twice back-to-back and run SELECT COUNT(*) FROM raw after each run; the count must increase every time.
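The back-to-back check can be automated so it runs the same way whether the schedule comes from cron, Windows Task Scheduler, or an orchestrator. A sketch, again using sqlite3 as a stand-in for PostgreSQL and a hypothetical demo_pipeline() in place of main(): it records the raw count before and after each run and fails if any run did not add rows (for example, because raw is being truncated).

```python
import sqlite3
from datetime import datetime, timezone


def count_raw(conn):
    return conn.execute("SELECT COUNT(*) FROM raw").fetchone()[0]


def verify_back_to_back(conn, run_pipeline, runs=2):
    """Run the pipeline `runs` times; the raw count must strictly increase each time."""
    counts = [count_raw(conn)]
    for _ in range(runs):
        run_pipeline(conn)
        counts.append(count_raw(conn))
    growing = all(later > earlier for earlier, later in zip(counts, counts[1:]))
    return growing, counts


def demo_pipeline(conn):
    """Hypothetical stand-in for main(): appends one extracted batch to raw."""
    stamp = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO raw (payload, _extracted_at) VALUES (?, ?)",
        [('{"order_id": 1}', stamp), ('{"order_id": 2}', stamp)],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw (raw_id INTEGER PRIMARY KEY, payload TEXT, _extracted_at TEXT)")
    ok, counts = verify_back_to_back(conn, demo_pipeline)
    print(counts, "PASS" if ok else "FAIL: raw is being overwritten")
```

Note that a run whose source legitimately returns zero rows would also fail this check, so it is best pointed at a source known to produce data.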
Project Specifications