Tampa Rent Signals Data Pipeline
Production-ready data engineering pipeline integrating Zillow, ApartmentList, and FRED data with dbt Core, Great Expectations, and Dagster orchestration on Snowflake.
Role: Data Engineer • 2024
- Orchestration: 15 assets (Dagster software-defined assets)
- Data Quality: 100+ rules (Great Expectations validations)
- Asset Checks: 12 checks (comprehensive validation pipeline)
- API Endpoints: 9 endpoints (FastAPI production deployment)
Technology Stack
Snowflake, dbt Core, Dagster, Great Expectations, FastAPI, Python, AWS S3, Docker, Render
Problem
Rental market data is fragmented across multiple sources (Zillow, ApartmentList, FRED), which makes it difficult to analyze trends, compare markets, and correlate rents with economic indicators. Investors and analysts need a unified, production-grade data platform with historical tracking, data quality guarantees, and API access for on-demand insights.
Constraints
- Multiple data sources with varying schemas and formats (wide vs long format)
- Need for historical tracking with slowly changing dimensions (SCD Type 2)
- Comprehensive data quality validation across Bronze, Silver, and Gold layers
- Production-grade orchestration with monitoring and alerting
- RESTful API deployment for external consumption
- Cost-effective cloud data warehouse solution
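The wide-versus-long mismatch is the first thing the Bronze-to-Silver step has to resolve. The sketch below unpivots a wide, ZORI-like CSV (one column per month) into long rows using only the standard library. The sample data, the column names (RegionName, StateName), and the output field names are assumptions for illustration, not the project's actual schema. In the real pipeline this reshaping would more likely live in a dbt staging model (for example using Snowflake's UNPIVOT), but the logic is the same.

```python
import csv
import io

# Hypothetical sample in a wide, ZORI-like layout: one row per region and one
# column per month. The column names are assumptions for illustration.
WIDE_CSV = """RegionName,StateName,2024-01-31,2024-02-29
Tampa,FL,2050.5,2061.0
Orlando,FL,1890.0,
"""

ID_COLUMNS = ("RegionName", "StateName")


def wide_to_long(text: str) -> list[dict]:
    """Unpivot month columns into one (region, state, month, value) row each."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, raw in record.items():
            if column in ID_COLUMNS:
                continue  # identifier columns are repeated on every row, not unpivoted
            if not raw:
                continue  # skip empty cells instead of emitting null facts
            rows.append(
                {
                    "region_name": record["RegionName"],
                    "state_name": record["StateName"],
                    "month": column,
                    "rent_index": float(raw),
                }
            )
    return rows
```

Skipping empty cells means a region that starts reporting later simply has fewer long rows, which keeps the downstream fact table free of placeholder nulls.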
Architecture
Built a modern data platform implementing Bronze → Silver → Gold medallion architecture with Snowflake:
AWS S3 (Raw CSV) → Snowflake Bronze → dbt Silver (Star Schema) → dbt Gold (Business Marts) → FastAPI (Render)
Data Flow
- Bronze Layer: Raw CSV ingestion from S3 (Zillow ZORI, ApartmentList, FRED CPI)
- Silver Layer: dbt-managed star schema with SCD Type 2 dimensions and fact tables
- Gold Layer: Business-ready mart models with pre-calculated analytics
- Orchestration: Dagster software-defined assets with Great Expectations validation
- API Layer: FastAPI deployment on Render with 9 production endpoints
Star Schema Design
- Dimensions: DIM_LOCATION (SCD Type 2), DIM_TIME, DIM_ECONOMIC_SERIES, DIM_DATA_SOURCE
- Facts: FACT_RENT_ZORI, FACT_RENT_APTLIST, FACT_ECONOMIC_INDICATOR
- Marts: mart_rent_trends, mart_market_rankings, mart_economic_correlation, mart_regional_summary
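To make the star schema concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for Snowflake. Only DIM_LOCATION and FACT_RENT_ZORI are modeled, their columns are assumed, and the year-over-year query is one plausible shape for a mart_rent_trends model rather than the project's actual SQL.

```python
import sqlite3
from contextlib import closing

# In-memory stand-in for part of the Silver star schema. The table names come
# from the design above; the columns and the YoY logic are assumptions.
SCHEMA = """
CREATE TABLE DIM_LOCATION (
    LOCATION_SK INTEGER PRIMARY KEY,   -- surrogate key
    REGION_NAME TEXT NOT NULL
);
CREATE TABLE FACT_RENT_ZORI (
    LOCATION_SK INTEGER NOT NULL REFERENCES DIM_LOCATION (LOCATION_SK),
    MONTH_DATE  TEXT NOT NULL,          -- first day of the month, ISO 8601
    ZORI_VALUE  REAL NOT NULL
);
"""

# mart_rent_trends-style query: join each month to the same month one year
# earlier and compute year-over-year growth in percent.
YOY_SQL = """
SELECT d.REGION_NAME,
       cur.MONTH_DATE,
       ROUND(100.0 * (cur.ZORI_VALUE - prev.ZORI_VALUE) / prev.ZORI_VALUE, 2) AS YOY_PCT
FROM FACT_RENT_ZORI AS cur
JOIN FACT_RENT_ZORI AS prev
  ON prev.LOCATION_SK = cur.LOCATION_SK
 AND prev.MONTH_DATE = date(cur.MONTH_DATE, '-12 months')
JOIN DIM_LOCATION AS d
  ON d.LOCATION_SK = cur.LOCATION_SK
ORDER BY d.REGION_NAME, cur.MONTH_DATE
"""


def build_rent_trends(locations, facts):
    """Load dimension and fact rows, then return (region, month, yoy_pct) tuples."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO DIM_LOCATION VALUES (?, ?)", locations)
        conn.executemany("INSERT INTO FACT_RENT_ZORI VALUES (?, ?, ?)", facts)
        return conn.execute(YOY_SQL).fetchall()
```

For example, build_rent_trends([(1, "Tampa")], [(1, "2023-01-01", 2000.0), (1, "2024-01-01", 2100.0)]) returns [("Tampa", "2024-01-01", 5.0)]; months without a prior-year observation drop out of the inner join. In Snowflake the same join would typically use DATEADD(month, -12, ...) or a LAG window function.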
Implementation
Modern Data Stack
- dbt Core: 15+ models with staging, core, and mart layers; incremental processing for large datasets
- Dagster: 15 software-defined assets with automated scheduling, asset checks, and monitoring
- Great Expectations: 100+ validation rules with layer-specific quality gates
- Snowflake: Cloud data warehouse, with clustering keys layered on Snowflake's automatic micro-partitioning to improve query pruning
- FastAPI: Production API with 9 endpoints for market data, trends, rankings, and analytics
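Incremental processing usually means loading only rows newer than what the target already holds. This Python sketch mimics that high-water-mark filter with plain lists; the month_date and zori field names are hypothetical, and a dbt model would express the same idea with is_incremental() and a filter against {{ this }}.

```python
from datetime import date


def run_incremental(target: list[dict], source_rows: list[dict]) -> int:
    """Append only rows newer than the target's high-water mark; return how many.

    Mirrors the common dbt incremental filter
    `where month_date > (select max(month_date) from {{ this }})`.
    On the first run the target is empty, so every source row is loaded.
    """
    high_water_mark = max((row["month_date"] for row in target), default=None)
    if high_water_mark is None:
        new_rows = list(source_rows)
    else:
        new_rows = [r for r in source_rows if r["month_date"] > high_water_mark]
    target.extend(new_rows)
    return len(new_rows)


# Usage with hypothetical monthly rows.
warehouse_table: list[dict] = []
batch = [
    {"month_date": date(2024, 1, 1), "zori": 2050.5},
    {"month_date": date(2024, 2, 1), "zori": 2061.0},
]
print(run_incremental(warehouse_table, batch))  # 2: first run loads everything
print(run_incremental(warehouse_table, batch))  # 0: nothing newer than 2024-02-01
```

A strict > filter misses late-arriving corrections to months already loaded; a common variant reprocesses a short lookback window and relies on a unique_key merge to deduplicate.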
Data Quality Framework
- 100+ Validation Rules: Comprehensive business rule validation across all pipeline layers
- Bronze Validation: Schema validation, null checks, data type verification
- Silver Validation: Business rule enforcement, referential integrity, SCD Type 2 consistency
- Gold Validation: Metric accuracy, trend validation, cross-source consistency
- Operational Monitoring: Data freshness checks, pipeline health, quality score tracking
- Statistical Validation: Outlier detection, range checking, and data profiling
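Layer-specific quality gates can be modeled as rules tagged with the layer they apply to. The sketch below is plain Python, not the Great Expectations API; its rules loosely mirror expectations such as expect_column_values_to_not_be_null and expect_column_values_to_be_between, and the rule names and rent bounds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


@dataclass(frozen=True)
class Rule:
    name: str
    layer: str  # "bronze", "silver" or "gold"
    check: Callable[[dict], bool]


# Hypothetical rules; the 300-10,000 USD rent bounds are illustrative only.
RULES = [
    Rule("region_not_null", "bronze", lambda r: bool(r.get("region_name"))),
    Rule("rent_is_numeric", "bronze", lambda r: _is_number(r.get("rent"))),
    Rule("rent_in_range", "silver",
         lambda r: _is_number(r.get("rent")) and 300 <= r["rent"] <= 10_000),
]


def validate(rows: list[dict], layer: str) -> dict[str, int]:
    """Return {rule_name: failing_row_count} for every rule tagged with `layer`."""
    return {
        rule.name: sum(1 for row in rows if not rule.check(row))
        for rule in RULES
        if rule.layer == layer
    }


def quality_gate(rows: list[dict], layer: str) -> bool:
    """A layer passes its gate only when no rule has failing rows."""
    return all(count == 0 for count in validate(rows, layer).values())
```

A pipeline would call quality_gate before promoting data to the next layer and halt or alert when it returns False.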
SCD Type 2 Implementation
- dbt Snapshots: Automated historical tracking for changing dimensions
- Effective Dating: EFFECTIVE_DATE and END_DATE columns plus an IS_CURRENT flag enable point-in-time (time-travel) queries
- Surrogate Keys: Immutable keys for fact table relationships
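The SCD Type 2 behavior described above can be sketched in a few lines. This is an illustrative in-memory version, not the dbt snapshot implementation: dbt snapshots record validity in dbt_valid_from and dbt_valid_to columns, so EFFECTIVE_DATE and END_DATE are presumably renamed or derived downstream. The LocationVersion fields, including the tracked metro_name attribute and the "R001" region ID used in testing, are assumptions.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LocationVersion:
    location_sk: int          # surrogate key, new for every version
    region_id: str            # natural key from the source system
    metro_name: str           # tracked (Type 2) attribute
    effective_date: date
    end_date: Optional[date]  # None while the version is current
    is_current: bool


def apply_scd2(history: list[LocationVersion], region_id: str,
               metro_name: str, as_of: date) -> list[LocationVersion]:
    """Return a new history list with the incoming attribute value applied."""
    current = next(
        (v for v in history if v.region_id == region_id and v.is_current), None
    )
    if current is not None and current.metro_name == metro_name:
        return history  # unchanged attribute: keep the existing version open
    next_sk = max((v.location_sk for v in history), default=0) + 1
    # Close the previous version (if any) at the new version's effective date.
    updated = [
        replace(v, end_date=as_of, is_current=False) if v is current else v
        for v in history
    ]
    updated.append(LocationVersion(next_sk, region_id, metro_name, as_of, None, True))
    return updated
```

Each change closes the old row at the new row's effective date and mints a new surrogate key, so facts loaded earlier keep pointing at the version that was valid when they were recorded.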
Results & Impact
Orchestration Excellence
- 15 Software-Defined Assets: Complete pipeline managed via Dagster with dependency tracking
- Automated Scheduling: Daily and weekly execution that re-computes only the assets affected by new data
- Asset Checks: 12 comprehensive validation checks integrating Great Expectations
- Monitoring Dashboard: Built-in observability via Dagster UI with alerting
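Dependency tracking and selective re-computation come down to walking the asset graph. Dagster derives this graph from the asset definitions itself; the sketch below only illustrates the idea with the standard library's graphlib, and the asset names and edges are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each asset maps to the set of upstream assets it reads.
ASSET_DEPS: dict[str, set[str]] = {
    "bronze_zillow_zori": set(),
    "bronze_aptlist": set(),
    "bronze_fred_cpi": set(),
    "dim_location": {"bronze_zillow_zori", "bronze_aptlist"},
    "fact_rent_zori": {"bronze_zillow_zori", "dim_location"},
    "fact_economic_indicator": {"bronze_fred_cpi"},
    "mart_economic_correlation": {"fact_rent_zori", "fact_economic_indicator"},
}


def recompute_plan(changed: set[str], deps=ASSET_DEPS) -> list[str]:
    """Return the changed assets plus everything downstream, in dependency order."""
    affected = set(changed)
    grew = True
    while grew:  # propagate until no new downstream asset is found
        grew = False
        for asset, upstream in deps.items():
            if asset not in affected and upstream & affected:
                affected.add(asset)
                grew = True
    # static_order() yields every node after all of its predecessors.
    return [a for a in TopologicalSorter(deps).static_order() if a in affected]
```

For instance, recompute_plan({"bronze_fred_cpi"}) yields ["bronze_fred_cpi", "fact_economic_indicator", "mart_economic_correlation"], leaving the Zillow and ApartmentList branches untouched.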
Data Quality Achievement
- 100+ Validation Rules: Comprehensive Great Expectations validations across all pipeline layers
- 12 Asset Checks: Dagster asset checks integrating Great Expectations for automated quality gates
- Zero Data Loss: Complete audit trail with data lineage tracking via mart_data_lineage
- Business Rule Enforcement: Automated validation of rent growth limits and CPI ranges
- Cross-Source Consistency: Unified metrics across Zillow, ApartmentList, and FRED
- Statistical Validation: Outlier detection, range checking, and automated data profiling
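Rent growth limits and statistical outlier checks reduce to simple arithmetic. The sketch below flags month-over-month changes above a limit and values with a large z-score; the 10% limit and the 3.0 z-score threshold are assumptions, not the project's configured values.

```python
from statistics import mean, pstdev

# Hypothetical threshold: flag month-over-month rent moves larger than 10%.
MAX_MOM_GROWTH = 0.10


def growth_violations(series: list[tuple[str, float]],
                      limit: float = MAX_MOM_GROWTH) -> list[tuple[str, float]]:
    """Return (month, fractional_change) pairs whose change exceeds `limit`."""
    violations = []
    for (_, prev_value), (month, value) in zip(series, series[1:]):
        change = (value - prev_value) / prev_value
        if abs(change) > limit:
            violations.append((month, round(change, 4)))
    return violations


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indexes whose population z-score exceeds `threshold`."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # constant series: no spread, so nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

With a population standard deviation, the largest possible z-score in a sample of n values is sqrt(n - 1), so a 3.0 threshold can never fire on 10 or fewer points; short series are better served by fixed growth limits or an IQR rule.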
Historical Tracking
- SCD Type 2 Implementation: Full historical tracking using dbt snapshots
- Time-Travel Queries: Point-in-time analysis with effective dating
- Dimension Evolution: Track changes in location attributes and economic series
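A point-in-time query picks the version whose validity interval contains the requested date. The sketch uses SQLite as a stand-in for Snowflake, and the table contents (region R001 and its two metro names) are hypothetical.

```python
import sqlite3
from contextlib import closing

# Hypothetical SCD Type 2 rows:
# (LOCATION_SK, REGION_ID, METRO_NAME, EFFECTIVE_DATE, END_DATE, IS_CURRENT)
DIM_LOCATION_ROWS = [
    (1, "R001", "Tampa, FL", "2023-01-01", "2024-01-01", 0),
    (2, "R001", "Tampa-St. Petersburg, FL", "2024-01-01", None, 1),
]

# END_DATE is exclusive, so each version covers [EFFECTIVE_DATE, END_DATE).
# ISO 8601 date strings compare correctly as text.
AS_OF_SQL = """
SELECT METRO_NAME
FROM DIM_LOCATION
WHERE REGION_ID = ?
  AND EFFECTIVE_DATE <= ?
  AND (END_DATE IS NULL OR END_DATE > ?)
"""


def metro_as_of(region_id: str, as_of: str):
    """Return the metro name that was valid for `region_id` on `as_of` (ISO date)."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(
            "CREATE TABLE DIM_LOCATION (LOCATION_SK INTEGER PRIMARY KEY, "
            "REGION_ID TEXT, METRO_NAME TEXT, EFFECTIVE_DATE TEXT, "
            "END_DATE TEXT, IS_CURRENT INTEGER)"
        )
        conn.executemany(
            "INSERT INTO DIM_LOCATION VALUES (?, ?, ?, ?, ?, ?)", DIM_LOCATION_ROWS
        )
        row = conn.execute(AS_OF_SQL, (region_id, as_of, as_of)).fetchone()
        return row[0] if row else None
```

Treating END_DATE as exclusive keeps the intervals half-open, so exactly one version matches on the day a change takes effect.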
API Deployment
- 9 Production Endpoints: Market data, trends, comparisons, price drops, rankings, analytics
- Live Deployment: Hosted on Render with auto-deploy from GitHub
- Interactive Documentation: Swagger UI and ReDoc for API exploration
- Production-Ready: Error handling, input validation, pagination, health monitoring
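Input validation and pagination for list endpoints can be sketched without any framework. In FastAPI the same bounds would normally be declared on the query parameters (for example Query(ge=1, le=100)), with violations returned as 422 responses; the default page size of 20 and the cap of 100 below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Sequence

DEFAULT_PAGE_SIZE = 20   # hypothetical default
MAX_PAGE_SIZE = 100      # hypothetical upper bound


@dataclass(frozen=True)
class Page:
    items: list[Any]
    page: int
    page_size: int
    total: int

    @property
    def has_next(self) -> bool:
        # More rows remain if this page ends before the total count.
        return self.page * self.page_size < self.total


def paginate(items: Sequence[Any], page: int = 1,
             page_size: int = DEFAULT_PAGE_SIZE) -> Page:
    """Validate paging parameters, then return one slice of `items`."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    start = (page - 1) * page_size
    return Page(list(items[start:start + page_size]), page, page_size, len(items))
```

Returning the total alongside the slice lets clients build page controls without a second count query.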
Technical Excellence
- Medallion Architecture: Industry-standard Bronze → Silver → Gold pattern
- Modern Stack: dbt Core + Dagster + Great Expectations + FastAPI
- Infrastructure as Code: AWS infrastructure with IAM policies and S3 automation
- Comprehensive Documentation: dbt docs, Dagster lineage, API documentation