IBM CDC Databricks ETL pipeline
Bronze → Silver medallion architecture · IBM - CDC.ipynb · Delta Lake
IBM stock API
JSON · daily OHLCV data
Bronze layer
workspace.bronze.ibm · append-only · Delta
Watermark check
MAX(Date) in bronze
skip already-seen rows
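
The watermark check can be sketched in plain Python, with no Spark session, purely to show the logic. The function name `filter_new_rows` and the sample rows are hypothetical; in the notebook the watermark would presumably come from a `MAX(Date)` query against `workspace.bronze.ibm`.

```python
from datetime import date


def filter_new_rows(bronze_dates, incoming_rows):
    """Keep only incoming rows whose Date is later than the bronze watermark."""
    # Watermark = MAX(Date) already stored in bronze; None if the table is empty.
    watermark = max(bronze_dates, default=None)
    if watermark is None:
        # First load: nothing has been seen yet, so every row is new.
        return list(incoming_rows)
    return [row for row in incoming_rows if row["Date"] > watermark]


# Hypothetical sample data, not taken from the real API.
bronze_dates = [date(2024, 1, 2), date(2024, 1, 3)]
incoming = [
    {"Date": date(2024, 1, 3), "Close": 161.5},  # already in bronze, skipped
    {"Date": date(2024, 1, 4), "Close": 160.9},  # newer than watermark, kept
]
new_rows = filter_new_rows(bronze_dates, incoming)
```

A strict `>` comparison means a correction to a date already in bronze is not picked up by this check. Whether the notebook behaves this way is an assumption.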
Append new records
JSON → tabular
full history retained
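
A minimal sketch of the JSON → tabular step followed by an append-only write. The payload shape (a JSON object keyed by date) is an assumption, since the API response format is not given here. `json_to_rows` and `append_rows` are hypothetical helper names. Values are left as strings because type casting belongs to the silver layer.

```python
import json

# Hypothetical payload shape; the real API response may differ.
SAMPLE_PAYLOAD = json.dumps({
    "2024-01-04": {"open": "161.0", "high": "162.3", "low": "160.1",
                   "close": "160.9", "volume": "3900000"},
    "2024-01-03": {"open": "162.0", "high": "163.1", "low": "160.8",
                   "close": "161.5", "volume": "4100000"},
})

COLUMNS = ("Date", "Open", "High", "Low", "Close", "Volume")


def json_to_rows(payload):
    """Flatten a date-keyed JSON object into one row (dict) per trading day."""
    daily = json.loads(payload)
    rows = []
    for day in sorted(daily):  # ISO date strings sort chronologically
        bar = daily[day]
        rows.append({
            "Date": day,
            "Open": bar["open"],
            "High": bar["high"],
            "Low": bar["low"],
            "Close": bar["close"],
            "Volume": bar["volume"],
        })
    return rows


def append_rows(bronze, new_rows):
    """Append-only write: existing bronze rows are never modified or removed."""
    return bronze + new_rows


rows = json_to_rows(SAMPLE_PAYLOAD)
bronze = append_rows([{"Date": "2024-01-02"}], rows)
```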
new dates only
reads bronze
Silver layer
workspace.silver.ibm · latest-per-Date · Delta
Clean & validate
cast types · drop nulls
de-duplicate on Date
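
The three cleaning steps (cast types, drop nulls, de-duplicate on Date) can be illustrated with a small plain-Python sketch. The column subset, the helper name `clean_rows`, and the rule that a later row wins over an earlier one for the same Date are assumptions made for illustration.

```python
from datetime import date

REQUIRED = ("Date", "Close", "Volume")  # assumed subset of the OHLCV columns


def clean_rows(raw_rows):
    """Cast types, drop rows with missing values, keep the last row per Date."""
    latest = {}
    for row in raw_rows:
        # Drop rows where any required field is missing or empty.
        if any(row.get(col) in (None, "") for col in REQUIRED):
            continue
        cleaned = {
            "Date": date.fromisoformat(row["Date"]),
            "Close": float(row["Close"]),
            "Volume": int(row["Volume"]),
        }
        # De-duplicate on Date: a later row overwrites an earlier one.
        latest[cleaned["Date"]] = cleaned
    return [latest[d] for d in sorted(latest)]


# Hypothetical raw bronze rows.
raw = [
    {"Date": "2024-01-03", "Close": "161.5", "Volume": "4100000"},
    {"Date": "2024-01-03", "Close": "161.7", "Volume": "4150000"},  # duplicate Date
    {"Date": "2024-01-04", "Close": None, "Volume": "3900000"},     # null, dropped
    {"Date": "2024-01-05", "Close": "159.2", "Volume": "3800000"},
]
silver_ready = clean_rows(raw)
```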
Delta MERGE
key: Date
insert new · update existing
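
The upsert semantics of the MERGE (key: Date, insert new, update existing) can be emulated with a dictionary. This is not Delta Lake code, only a sketch of the matching rule; `merge_on_date` and the sample rows are hypothetical, and the source batch is assumed to hold at most one row per Date.

```python
def merge_on_date(target, source):
    """Upsert source rows into target, matching on the Date key."""
    merged = {row["Date"]: dict(row) for row in target}
    inserted = updated = 0
    for row in source:
        if row["Date"] in merged:
            updated += 1   # matched: overwrite the existing row
        else:
            inserted += 1  # not matched: add a new row
        merged[row["Date"]] = dict(row)
    return [merged[d] for d in sorted(merged)], inserted, updated


# Hypothetical silver table and incoming batch.
silver = [{"Date": "2024-01-03", "Close": 161.5}]
batch = [
    {"Date": "2024-01-03", "Close": 161.7},  # existing Date, updated
    {"Date": "2024-01-04", "Close": 160.9},  # new Date, inserted
]
silver, inserted, updated = merge_on_date(silver, batch)
```

Re-running the same batch leaves the table unchanged, which is what makes a keyed merge safe to retry. In delta-spark the equivalent is typically `DeltaTable.merge(...)` with `whenMatchedUpdateAll()` and `whenNotMatchedInsertAll()`; whether the notebook uses that API or SQL `MERGE INTO` is not stated here.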
curated rows
Libraries: requests · pandas · numpy · pyspark · delta