Taxi Lakehouse (Azure)
ADF Copy into ADLS Gen2, Databricks Bronze/Silver layers, and a dbt Gold fact table (fct_taxi_daily), with email alerts and CI integration.
Role: Data Engineer • 2024
- ADF schedule: 06:00 daily trigger
- Layers: Bronze/Silver refined ingestion
- Gold: fct_taxi_daily business fact table
- Alerts: email on failure
Technology Stack
Azure, ADF, ADLS Gen2, Databricks/Delta, dbt, GitHub Actions
Problem
Deliver a reliable, scheduled taxi-data pipeline that produces validated Gold facts, with pragmatic operational handling for failure alerts and notebook deployment.
Architecture
ADF Copy → ADLS Gen2 (raw) → Databricks (Bronze → Silver) → dbt (Gold: fct_taxi_daily)
- ADF: Daily 06:00 trigger; strict/warn validation toggle; robust filename handling
- Databricks: Bronze ingestion; Silver cleansing and feature columns
- dbt: Gold marts with tests and CI; contracts enforced
- Ops: GitHub Actions imports notebooks into the Databricks workspace; email alerts on failure
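The Bronze → Silver → Gold flow can be illustrated with a minimal pure-Python sketch. This is not the project's Databricks or dbt code. The column names (`pickup`, `dropoff`, `trip_distance`, `fare_amount`), the sanity rules, the `duration_min` feature column and the daily aggregates are all hypothetical, and plain lists stand in for Delta tables.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Trip:
    """Hypothetical Silver row; the project's real schema is not documented here."""
    pickup: datetime
    dropoff: datetime
    trip_distance: float
    fare_amount: float
    duration_min: float  # hypothetical derived feature column added in Silver


def to_silver(bronze: list[dict]) -> list[Trip]:
    """Cast raw string fields, drop rows that fail sanity checks, add features."""
    silver: list[Trip] = []
    for row in bronze:
        try:
            pickup = datetime.fromisoformat(row["pickup"])
            dropoff = datetime.fromisoformat(row["dropoff"])
            distance = float(row["trip_distance"])
            fare = float(row["fare_amount"])
        except (KeyError, ValueError):
            continue  # missing or unparseable field: keep it out of Silver
        if dropoff < pickup or distance < 0 or fare < 0:
            continue  # logically invalid trip
        duration = (dropoff - pickup).total_seconds() / 60
        silver.append(Trip(pickup, dropoff, distance, fare, duration))
    return silver


def to_gold_daily(silver: list[Trip]) -> dict[date, dict]:
    """One row per pickup date: a hypothetical stand-in for fct_taxi_daily."""
    daily: dict[date, dict] = defaultdict(
        lambda: {"trips": 0, "total_fare": 0.0, "total_distance": 0.0}
    )
    for trip in silver:
        row = daily[trip.pickup.date()]
        row["trips"] += 1
        row["total_fare"] += trip.fare_amount
        row["total_distance"] += trip.trip_distance
    return dict(daily)


bronze = [
    {"pickup": "2024-01-01T08:00:00", "dropoff": "2024-01-01T08:20:00",
     "trip_distance": "3.5", "fare_amount": "15.0"},
    {"pickup": "2024-01-01T09:00:00", "dropoff": "2024-01-01T08:50:00",
     "trip_distance": "1.0", "fare_amount": "7.0"},  # dropoff before pickup: dropped
]
print(to_gold_daily(to_silver(bronze)))
```

Each layer is kept as a separate pure function to mirror the layer boundary. In the pipeline described above, Silver cleansing runs in Databricks and the Gold aggregate is a dbt model with tests.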
Results & Impact
- Predictable daily runs with actionable failure notifications
- dbt Gold fact fct_taxi_daily enables downstream BI
- Resilient ingestion handles raw filename variability
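The strict/warn validation toggle and the tolerance for raw filename variability can be sketched in Python. The real logic lives in the ADF pipeline and is not shown in this write-up, so everything below is an assumption. The accepted patterns are modelled on NYC TLC-style names such as `yellow_tripdata_2024-01.parquet`. The `parse_filename` and `validate_files` helpers, the `RawFile` record and the `ValidationError` type are all hypothetical.

```python
from __future__ import annotations

import logging
import re
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("taxi_ingest")  # hypothetical logger name

# Hypothetical accepted pattern. Matches variants such as
# "yellow_tripdata_2024-01.parquet", "Yellow_TripData_2024_01.CSV"
# and "green-tripdata-202401.parquet".
FILENAME_RE = re.compile(
    r"^(?P<service>yellow|green|fhv)[-_]?tripdata[-_]?"
    r"(?P<year>\d{4})[-_]?(?P<month>\d{2})\.(?P<ext>parquet|csv)$",
    re.IGNORECASE,
)


class ValidationError(ValueError):
    """Raised in strict mode when any raw file name is unrecognised."""


@dataclass(frozen=True)
class RawFile:
    service: str
    year: int
    month: int
    ext: str

    @property
    def canonical_name(self) -> str:
        """Hypothetical normalised name, so downstream paths stay predictable."""
        return f"{self.service}_tripdata_{self.year:04d}-{self.month:02d}.{self.ext}"


def parse_filename(name: str) -> Optional[RawFile]:
    """Return the parsed file, or None if the name does not fit the pattern."""
    base = name.strip().rsplit("/", 1)[-1]  # drop any folder prefix
    match = FILENAME_RE.match(base)
    if match is None:
        return None
    month = int(match["month"])
    if not 1 <= month <= 12:
        return None
    return RawFile(
        service=match["service"].lower(),
        year=int(match["year"]),
        month=month,
        ext=match["ext"].lower(),
    )


def validate_files(names: list[str], mode: str = "strict") -> list[RawFile]:
    """'strict' fails the run on any bad name; 'warn' logs and skips it."""
    if mode not in ("strict", "warn"):
        raise ValueError(f"unknown validation mode: {mode!r}")
    accepted: list[RawFile] = []
    rejected: list[str] = []
    for name in names:
        parsed = parse_filename(name)
        if parsed is None:
            rejected.append(name)
        else:
            accepted.append(parsed)
    if rejected and mode == "strict":
        raise ValidationError(f"unrecognised raw files: {rejected}")
    for name in rejected:
        logger.warning("skipping unrecognised raw file: %s", name)
    return accepted
```

A plausible way to use the toggle is to run scheduled production in strict mode, so a misnamed file fails the run and triggers the email alert. Warn mode would then let a backfill continue while still logging what was skipped.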