Real-Time Offers Engine — Kafka → Spark → S3 → SQS/Lambda
Built a sub-minute streaming pipeline that enriches card transactions and triggers SMS-style offers, backed by a PII-minimized data lake in S3.
Role: Data Engineer • 2024
End-to-End Latency: 30-60s (transaction to notification)
Processing Speed: 10K+/min (transactions per minute)
System Uptime: 99.9% (fault-tolerant architecture)
Monthly Cost: ~$50 (demo workload operational cost)
Technology Stack
Apache Spark, Confluent Kafka, AWS S3, AWS Lambda, AWS SQS, Python, Real-time Streaming
Problem
Marketing teams need to send relevant offers (e.g., 10% off groceries) within about a minute of a customer transaction. The system must be reliable, low-cost, and privacy-aware, keeping stored PII to a minimum.
Architecture
Built a production-ready streaming architecture:
Kafka (Confluent) → Spark Structured Streaming → S3 (Parquet, partitioned by date) + SQS → Lambda for notifications
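The last hop (SQS → Lambda) can be sketched as a small handler. This is a minimal illustration, not the project's actual code: the message fields `phone` and `offer_text` and the `send_sms` helper are hypothetical, and returning `batchItemFailures` only has an effect if partial batch responses (`ReportBatchItemFailures`) are enabled on the SQS event source mapping.

```python
import json


def send_sms(phone: str, text: str) -> None:
    """Hypothetical stand-in for an SMS provider call; here it only prints."""
    print(f"SMS to {phone}: {text}")


def handler(event, context, sender=send_sms):
    """Process one SQS batch and report which messages failed.

    Each SQS record carries its payload as a JSON string in record["body"].
    Messages listed in batchItemFailures are retried by SQS; the rest are
    treated as successfully processed.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            msg = json.loads(record["body"])
            sender(msg["phone"], msg["offer_text"])
        except (KeyError, ValueError, TypeError):
            # Malformed body or missing field: flag only this message.
            failures.append({"itemIdentifier": record.get("messageId")})
    return {"batchItemFailures": failures}
```

Passing `sender` as a parameter keeps the handler testable without a real SMS provider; in a deployed function only `event` and `context` would be supplied.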
Key Features
- Real-time Processing: 30-60 second end-to-end latency
- PII Protection: Hashed phone numbers in data lake, raw only in notification path
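- Fault Tolerance: Spark checkpointing for restart recovery, with exactly-once results where the sink is idempotent (the SQS path is at-least-once unless consumers deduplicate)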
- Dual-Sink Pattern: Parallel writes to analytics (S3) and notifications (SQS)
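The PII split across the two sinks can be illustrated in plain Python. This is a sketch under assumptions: the transaction fields (`txn_id`, `phone`, `category`, `amount`) and the function names are hypothetical, and in the pipeline itself the same logic would run inside the Spark job rather than on single dicts.

```python
import hashlib


def hash_phone(phone: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded phone number."""
    return hashlib.sha256(phone.encode("utf-8")).hexdigest()


def to_lake_row(txn: dict) -> dict:
    """Build the row for the S3 (analytics) sink: raw phone dropped, hash kept."""
    row = {k: v for k, v in txn.items() if k != "phone"}
    row["phone_hash"] = hash_phone(txn["phone"])
    return row


def to_notification(txn: dict, offer_text: str) -> dict:
    """Build the message for the SQS (notification) sink: only what the SMS needs."""
    return {"phone": txn["phone"], "offer_text": offer_text}
```

Both functions return new dicts and leave the input untouched, so the same transaction can feed both sinks. Note that a plain hash of a phone number can be guessed by enumerating the number space; a keyed hash such as HMAC is one possible hardening, not something the original design states.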
Implementation
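- Broadcast Joins: Efficient offer matching by broadcasting the small offers table to every executor, avoiding a shuffle of the transaction stream
- Partitioned Storage: Date-based partitioning so date-filtered queries read only the partitions they need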
- Privacy by Design: SHA-256 phone hashing for lake storage
- Serverless Notifications: SQS + Lambda for scalable SMS delivery
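Offer matching and date partitioning can be shown with a plain-Python analogue. This is a sketch, not the Spark job: the `OFFERS` table, the `category` field, and the bucket path are made up for illustration. In PySpark the equivalent would typically use the `broadcast()` join hint from `pyspark.sql.functions` and `partitionBy("date")` on the writer.

```python
from datetime import datetime, timezone

# Hypothetical offers table, small enough to copy to every worker.
OFFERS = {
    "grocery": "10% off groceries",
    "fuel": "5% back on fuel",
}


def match_offer(txn: dict, offers: dict = OFFERS):
    """Return the offer text for the transaction's category, or None.

    A local dict lookup per record mirrors what a broadcast join achieves:
    the small side is available everywhere, so the large side never moves.
    """
    return offers.get(txn.get("category"))


def partition_prefix(ts_epoch_s: float,
                     base: str = "s3://example-bucket/transactions") -> str:
    """Return a Hive-style date partition prefix, computed in UTC."""
    day = datetime.fromtimestamp(ts_epoch_s, tz=timezone.utc).date()
    return f"{base}/date={day.isoformat()}/"
```

Computing the partition date in UTC keeps a transaction in the same partition regardless of the worker's local time zone.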
Results & Impact
- High Throughput: 10K+ transactions per minute (roughly 167 per second)
- Low Latency: P95 end-to-end latency under 60 seconds, transaction to notification
- Cost Efficient: ~$50/month for the demo workload, with serverless components for notifications
- Production Ready: Comprehensive error handling and monitoring
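For reference, a P95 figure like the one above can be computed from per-message latency samples with the nearest-rank method. This is an illustrative helper under that assumption; the source does not say which percentile definition or tooling produced the reported number.

```python
def p95(samples):
    """Nearest-rank 95th percentile.

    Returns the smallest sample such that at least 95% of all samples
    are less than or equal to it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

The samples would be end-to-end latencies in seconds, for example the notification send time minus the transaction event time for each message.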