Real-Time Offers Engine — Kafka → Spark → S3 → SQS/Lambda

Problem

Marketing teams need to send relevant offers within seconds of customer transactions (e.g., 10% off groceries). The system must be reliable, low-cost, and privacy-aware with PII minimization.

Architecture

Built a production-ready streaming architecture:

Kafka (Confluent) → Spark Structured Streaming → S3 (Parquet, partitioned by date) + SQS → Lambda for notifications

Key Features

Real-time Processing: 30-60 second end-to-end latency
PII Protection: Hashed phone numbers in data lake, raw only in notification path
Fault Tolerance: Spark checkpointing for exactly-once processing
Dual-Sink Pattern: Parallel writes to analytics (S3) and notifications (SQS)

Implementation

Broadcast Joins: Efficient offer matching using broadcast variables
Partitioned Storage: Date-based partitioning for optimal query performance
Privacy by Design: SHA-256 phone hashing for lake storage
Serverless Notifications: SQS + Lambda for scalable SMS delivery

Results & Impact

High Throughput: 10K+ transactions per minute processing capability
Low Latency: P95 < 60 seconds for real-time offer delivery
Cost Efficient: ~$50/month operational costs with serverless architecture
Production Ready: Comprehensive error handling and monitoring