Skip to main content

Real-Time Offers Engine — Kafka → Spark → S3 → SQS/Lambda

Built a sub-minute streaming pipeline that enriches card transactions and triggers SMS-style offers; PII-safe data lake in S3.

Role: Data Engineer2024
30-60s
End-to-End Latency
Transaction to notification
10K+/min
Processing Speed
Transactions per minute
99.9%
System Uptime
Fault-tolerant architecture
~$50
Monthly Cost
Demo workload operational cost

Technology Stack

Apache SparkConfluent KafkaAWS S3AWS LambdaAWS SQSPythonReal-time Streaming

Problem

Marketing teams need to send relevant offers within seconds of customer transactions (e.g., 10% off groceries). The system must be reliable, low-cost, and privacy-aware with PII minimization.

Architecture

Built a production-ready streaming architecture:

Kafka (Confluent) → Spark Structured Streaming → S3 (Parquet, partitioned by date) + SQS → Lambda for notifications

Key Features

  • Real-time Processing: 30-60 second end-to-end latency
  • PII Protection: Hashed phone numbers in data lake, raw only in notification path
  • Fault Tolerance: Spark checkpointing for exactly-once processing
  • Dual-Sink Pattern: Parallel writes to analytics (S3) and notifications (SQS)

Implementation

  • Broadcast Joins: Efficient offer matching using broadcast variables
  • Partitioned Storage: Date-based partitioning for optimal query performance
  • Privacy by Design: SHA-256 phone hashing for lake storage
  • Serverless Notifications: SQS + Lambda for scalable SMS delivery

Results & Impact

  • High Throughput: 10K+ transactions per minute processing capability
  • Low Latency: P95 < 60 seconds for real-time offer delivery
  • Cost Efficient: ~$50/month operational costs with serverless architecture
  • Production Ready: Comprehensive error handling and monitoring