Taxi Lakehouse (Azure)

ADF Copy to ADLS, Databricks Bronze/Silver, and dbt Gold fact fct_taxi_daily with email alerts and CI integration.

Role: Data Engineer (2024)
ADF Schedule: 06:00 (daily trigger)
Layers: Bronze/Silver (refined ingestion)
Gold: fct_taxi_daily (business fact table)
Alerts: Email (on failure)

Technology Stack

Azure, ADF, ADLS Gen2, Databricks/Delta, dbt, GitHub Actions

Problem

Deliver a reliable, scheduled taxi pipeline with validated Gold facts and pragmatic ops for failures and notebook deployment.

Architecture

ADF Copy → ADLS Gen2 (raw) → Databricks (Bronze → Silver) → dbt (Gold: fct_taxi_daily)
  • ADF: Daily 06:00 trigger; strict/warn validation toggle; robust filename handling
  • Databricks: Bronze ingestion; Silver cleansing and feature columns
  • dbt: Gold marts with tests and CI; contracts enforced
  • Ops: GitHub Actions to import notebooks; failure alerts via email
  • Repository README
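
The strict/warn validation toggle mentioned for the ADF stage can be illustrated with a small sketch. This is not the project's actual implementation: the function name `validate_batch`, the `mode` parameter, and the column names in the usage note are assumptions chosen for illustration. The idea is that strict mode stops the run on any failed check, while warn mode logs the problems and lets the pipeline continue.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("taxi_validation")


class ValidationError(Exception):
    """Raised in strict mode when any check fails."""


@dataclass
class ValidationResult:
    passed: bool
    problems: list


def validate_batch(rows, required_columns, mode="strict"):
    """Check that each row has a non-null value for every required column.

    Hypothetical helper: mode="strict" raises on any problem,
    mode="warn" logs each problem and returns the result.
    """
    if mode not in ("strict", "warn"):
        raise ValueError(f"unknown validation mode: {mode!r}")

    problems = []
    if not rows:
        problems.append("batch is empty")
    for index, row in enumerate(rows):
        for column in required_columns:
            if row.get(column) is None:
                problems.append(f"row {index}: missing {column}")

    # Strict mode fails the run; warn mode only records what went wrong.
    if problems and mode == "strict":
        raise ValidationError("; ".join(problems))
    for problem in problems:
        logger.warning("validation: %s", problem)
    return ValidationResult(passed=not problems, problems=problems)
```

Calling `validate_batch(rows, ["pickup_datetime", "fare_amount"], mode="warn")` on a batch with a null `pickup_datetime` would return a result with `passed` set to `False` and continue, whereas the same call with `mode="strict"` would raise `ValidationError`, which a scheduler could surface as a failed run and an email alert.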

Results & Impact

  • Predictable daily runs with actionable failure notifications
  • dbt Gold fact fct_taxi_daily enables downstream BI
  • Resilient ingestion handles raw filename variability
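
One way to handle raw filename variability is to normalize incoming names to a single canonical form before ingestion. The sketch below assumes a hypothetical naming scheme (a lowercase prefix, a year and month, an optional day, and a `csv` or `parquet` extension); the pattern, the function name `normalize_filename`, and the example filenames are illustrative assumptions, not the conventions used in this project.

```python
import re

# Assumed pattern: <prefix>[_-]<YYYY>[-_]?<MM>[[-_]?<DD>].<csv|parquet>
_FILENAME_RE = re.compile(
    r"^(?P<prefix>[a-z]+(?:[_-][a-z]+)*)[_-]"
    r"(?P<year>\d{4})[-_]?(?P<month>\d{2})(?:[-_]?(?P<day>\d{2}))?"
    r"\.(?P<ext>csv|parquet)$"
)


def normalize_filename(raw_name):
    """Return a canonical '<prefix>_<YYYY>-<MM>[-<DD>].<ext>' name, or None.

    None means the name does not match the assumed scheme, so the caller
    can decide whether to skip the file or fail the run.
    """
    # Tolerate stray whitespace, mixed case, and spaces used as separators.
    name = raw_name.strip().lower().replace(" ", "_")
    match = _FILENAME_RE.match(name)
    if match is None:
        return None

    if not 1 <= int(match["month"]) <= 12:
        return None
    date_part = f"{match['year']}-{match['month']}"
    if match["day"] is not None:
        if not 1 <= int(match["day"]) <= 31:
            return None
        date_part += f"-{match['day']}"

    prefix = match["prefix"].replace("-", "_")
    return f"{prefix}_{date_part}.{match['ext']}"
```

Under these assumptions, variants such as `Yellow-Tripdata-2024_01.PARQUET` and `yellow_tripdata_202401.parquet` both map to `yellow_tripdata_2024-01.parquet`, while an unrelated file such as `notes.txt` returns `None`.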