System Architecture
Sim-Pesa is designed as a distributed, event-driven system to accurately simulate the asynchronous nature of mobile money payments.
1. System Components
1.1 Ingestion API (Express)
The gateway for all external requests. It mimics the Daraja API surface.
- Responsibility: Request validation, auth token issuance, initial transaction recording, and task enqueuing.
- Statelessness: The API does not process business logic; it merely offloads tasks to the queue.
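As a rough illustration of the validate, record, and enqueue pattern described above, the sketch below keeps the handler free of business logic. It is framework-agnostic TypeScript rather than actual Express code, and the database and queue calls are synchronous stand-ins. The `Deps` interface, `handleStkPush`, `validate`, the `Ack` shape, and the phone number format are hypothetical; only the `BusinessShortCode`, `Amount`, and `PhoneNumber` fields, the `PENDING` status, and the `stk-push-request` job name come from this document.

```typescript
// Hypothetical request shape; only these three field names appear in this document.
interface StkPushRequest {
  BusinessShortCode: string;
  Amount: number;
  PhoneNumber: string;
}

// Stand-ins for the database and the queue so the sketch runs without dependencies.
// A real implementation would most likely make these asynchronous.
interface Deps {
  newId(): string;
  saveTransaction(tx: { checkoutId: string; status: "PENDING"; request: StkPushRequest }): void;
  enqueue(jobName: string, payload: { checkoutId: string }): void;
}

type Ack =
  | { accepted: true; checkoutId: string }
  | { accepted: false; error: string };

// Returns an error message, or null when the body is acceptable.
// The 10 to 12 digit phone rule is an assumption made for illustration.
function validate(body: Partial<StkPushRequest>): string | null {
  if (typeof body.BusinessShortCode !== "string" || body.BusinessShortCode === "") {
    return "BusinessShortCode is required";
  }
  if (typeof body.Amount !== "number" || !(body.Amount > 0)) {
    return "Amount must be a positive number";
  }
  if (typeof body.PhoneNumber !== "string" || !/^\d{10,12}$/.test(body.PhoneNumber)) {
    return "PhoneNumber must be 10 to 12 digits";
  }
  return null;
}

// Validate, record the transaction as PENDING, enqueue the job, acknowledge.
function handleStkPush(body: Partial<StkPushRequest>, deps: Deps): Ack {
  const error = validate(body);
  if (error !== null) {
    return { accepted: false, error };
  }
  const checkoutId = deps.newId();
  deps.saveTransaction({ checkoutId, status: "PENDING", request: body as StkPushRequest });
  deps.enqueue("stk-push-request", { checkoutId });
  return { accepted: true, checkoutId };
}
```

Nothing is written or enqueued for a rejected request, so the worker only ever sees jobs that passed validation.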
1.2 Background Worker (BullMQ)
The engine of the simulator.
- Responsibility: Manages the transaction state machine, performs balance updates, and handles webhook dispatches.
- Concurrency: Uses row-level locking (SELECT ... FOR UPDATE) in PostgreSQL to ensure data consistency during balance updates.
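The locking pattern can be sketched roughly as follows. The `SqlClient` interface is a minimal stand-in for a PostgreSQL client, and `debitUser` and the `users` table name are hypothetical; the `phone_number` and `balance` columns are taken from the storage description in this document. The sketch takes the lock only around the debit itself, which is an assumption about where the transaction boundaries lie.

```typescript
// Minimal stand-in for a PostgreSQL client with a promise-based query method.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Debits `amount` from the user's balance inside one transaction.
// Returns false when the user is missing or the balance is insufficient.
async function debitUser(db: SqlClient, phoneNumber: string, amount: number): Promise<boolean> {
  await db.query("BEGIN");
  try {
    // FOR UPDATE makes concurrent transactions wait on this row until COMMIT or ROLLBACK.
    const { rows } = await db.query(
      "SELECT balance FROM users WHERE phone_number = $1 FOR UPDATE",
      [phoneNumber],
    );
    const row = rows[0];
    if (row === undefined || Number(row.balance) < amount) {
      await db.query("ROLLBACK");
      return false;
    }
    await db.query(
      "UPDATE users SET balance = balance - $1 WHERE phone_number = $2",
      [amount, phoneNumber],
    );
    await db.query("COMMIT");
    return true;
  } catch (err) {
    // Any failure releases the lock by rolling back before rethrowing.
    await db.query("ROLLBACK");
    throw err;
  }
}
```

Because the balance check and the update happen while the row is locked, two workers debiting the same user cannot both read the same starting balance.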
1.3 Signaling Layer (Redis)
Since the worker needs to wait for user interaction (PIN entry), it uses Redis Pub/Sub for low-latency signaling.
- Flow:
  - Worker subscribes to pin:<checkout_id>.
  - UI sends PIN via API.
  - API publishes the result to the Redis channel.
  - Worker receives the signal and resumes processing.
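The subscribe, publish, and resume flow above might look roughly like this. To stay runnable without a Redis server, the sketch uses a tiny in-memory bus in place of Redis Pub/Sub; `InMemoryBus`, `waitForPin` as written here, and the `TIMEOUT` result are hypothetical, while the `pin:<checkout_id>` channel name and the `CORRECT` message come from this document.

```typescript
type Handler = (message: string) => void;

// In-memory stand-in for Redis Pub/Sub (illustration only).
class InMemoryBus {
  private handlers = new Map<string, Set<Handler>>();

  // Registers a handler and returns a function that removes it again.
  subscribe(channel: string, handler: Handler): () => void {
    const set = this.handlers.get(channel) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(channel, set);
    return () => { set.delete(handler); };
  }

  // Delivers to current subscribers only and returns how many received the message.
  publish(channel: string, message: string): number {
    const receivers = Array.from(this.handlers.get(channel) ?? []);
    receivers.forEach((handler) => handler(message));
    return receivers.length;
  }

  listenerCount(channel: string): number {
    return this.handlers.get(channel)?.size ?? 0;
  }
}

// Resolves with the published PIN result, or with "TIMEOUT" if nothing arrives in time.
function waitForPin(bus: InMemoryBus, checkoutId: string, timeoutMs: number): Promise<string> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      unsubscribe();
      resolve("TIMEOUT");
    }, timeoutMs);
    const unsubscribe = bus.subscribe(`pin:${checkoutId}`, (message) => {
      clearTimeout(timer);
      unsubscribe();
      resolve(message);
    });
  });
}
```

Pub/Sub delivery is fire-and-forget: a message published while nobody is subscribed is simply dropped, so in a design like this the worker has to be subscribed before the PIN result is published.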
1.4 Persistent Storage (PostgreSQL)
- Merchants: Registered entities with short_code and callback_url.
- Users: Simulated customers with phone_number, pin, and balance.
- Transactions: Audit log and state tracker for every request.
- Webhooks: Tracks every dispatch attempt for observability.
2. Event Flow: STK Push Lifecycle
The following sequence describes a successful STK Push:
- Initiation: POST /stkpush/v1/processrequest
  - API generates a checkout_id.
  - API saves transaction as PENDING.
  - API enqueues stk-push-request job.
  - API returns acknowledgement to client.
- Pickup: Worker picks up the job.
  - Worker transitions status to PROCESSING.
  - Worker validates that the merchant and user exist.
  - Worker locks the user row to ensure balance integrity.
  - Worker enters a wait state (waitForPin) for 15 minutes (default).
- Simulation: User interaction via Dashboard.
  - User sees the transaction on the "Virtual Phone".
  - User enters the PIN and submits.
  - UI calls POST /stkpush/pin/:checkout_id.
  - API validates PIN against the DB and publishes CORRECT to Redis.
- Completion: Worker receives CORRECT.
  - Worker deducts the amount from the User's balance.
  - Worker updates transaction status to SUCCESS.
  - Worker enqueues a send-webhook job.
- Notification: Webhook Worker picks up the job.
  - Worker POSTs the callback payload to the Merchant's callback_url.
  - If it fails, BullMQ retries with exponential backoff.
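The status changes in this lifecycle can be read as a small state machine. The sketch below is one possible encoding, not the simulator's actual code: `PENDING`, `PROCESSING`, and `SUCCESS` come from the sequence above, while `FAILED` is a hypothetical terminal state added to cover outcomes such as a wrong PIN or a timeout, and `allowedTransitions` and `transition` are illustrative names.

```typescript
// FAILED is an assumed state; the other three are named in this document.
type Status = "PENDING" | "PROCESSING" | "SUCCESS" | "FAILED";

// Each status maps to the statuses it may move to; terminal states map to nothing.
const allowedTransitions: Record<Status, readonly Status[]> = {
  PENDING: ["PROCESSING"],
  PROCESSING: ["SUCCESS", "FAILED"],
  SUCCESS: [],
  FAILED: [],
};

// Returns the next status, or throws if the move is not allowed.
function transition(current: Status, next: Status): Status {
  if (allowedTransitions[current].indexOf(next) === -1) {
    throw new Error(`Illegal transition ${current} -> ${next}`);
  }
  return next;
}

// Happy path from the sequence above: API records PENDING, worker moves it forward.
let status: Status = "PENDING";
status = transition(status, "PROCESSING"); // worker picked up the job
status = transition(status, "SUCCESS");    // PIN was CORRECT and the balance was debited
```

Rejecting illegal moves in one place means a duplicated or late job cannot, for example, push a completed transaction back to `PROCESSING`.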
3. Data Consistency Approach
- Idempotency: The API uses a Redis-based lock on the combination of BusinessShortCode, Amount, and PhoneNumber to prevent accidental duplicate submissions within a short window.
- Transactional Integrity: All balance updates and status transitions are wrapped in PostgreSQL transactions.
- Queue Reliability: BullMQ persists jobs in Redis, so they are not lost if a worker crashes. Jobs that exhaust their attempts are moved to "failed" and can be retried.
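The idempotency lock described above could be sketched as follows. In Redis this kind of lock is commonly taken with `SET key value NX PX <ttl>`, which only succeeds when the key does not already exist; here an in-memory class imitates that behaviour so the example runs on its own. The `idem:` key prefix, the 30-second window, and the names `idempotencyKey` and `ExpiringLocks` are assumptions, since this document only speaks of "a short window".

```typescript
interface StkPushFields {
  BusinessShortCode: string;
  Amount: number;
  PhoneNumber: string;
}

// Builds one lock key from the three fields named in this document.
function idempotencyKey(fields: StkPushFields): string {
  return `idem:${fields.BusinessShortCode}:${fields.Amount}:${fields.PhoneNumber}`;
}

// In-memory imitation of "set if absent, with expiry" (illustration only).
class ExpiringLocks {
  private expiresAt = new Map<string, number>();

  // Returns true when the lock was taken, false when an unexpired lock already exists.
  // `now` is injectable so the behaviour can be checked without waiting.
  acquire(key: string, ttlMs: number, now: number = Date.now()): boolean {
    const expiry = this.expiresAt.get(key);
    if (expiry !== undefined && expiry > now) {
      return false;
    }
    this.expiresAt.set(key, now + ttlMs);
    return true;
  }
}

const WINDOW_MS = 30_000; // assumed window length
const locks = new ExpiringLocks();
const key = idempotencyKey({ BusinessShortCode: "174379", Amount: 100, PhoneNumber: "254700000000" });

const first = locks.acquire(key, WINDOW_MS, 0);              // lock taken, request proceeds
const duplicate = locks.acquire(key, WINDOW_MS, 1_000);      // same request 1s later, rejected
const afterWindow = locks.acquire(key, WINDOW_MS, 30_000);   // window elapsed, allowed again
```

A different amount or phone number produces a different key, so only exact repeats inside the window are treated as duplicates.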
4. Webhook Retry Strategy
- Attempts: 5
- Backoff: Exponential (delay * 2 ^ (attempt - 1))
- Initial Delay: 2000ms
- Failure Visibility: Failed webhooks are logged in the webhook_attempts table for debugging.
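To make the retry schedule concrete, the sketch below evaluates the formula above for the listed settings. The options object uses the `attempts` and `backoff` fields that BullMQ accepts as job options, and `retryDelayMs` is a hypothetical helper that assumes `attempt` counts the attempts already made, starting at 1.

```typescript
// Retry settings from the list above, in the shape of BullMQ job options.
const webhookJobOptions = {
  attempts: 5,
  backoff: { type: "exponential", delay: 2000 },
};

// Delay before the next retry: delay * 2 ^ (attempt - 1).
function retryDelayMs(
  attempt: number,
  initialDelayMs: number = webhookJobOptions.backoff.delay,
): number {
  return initialDelayMs * Math.pow(2, attempt - 1);
}

// With 5 attempts in total there are at most 4 retries after the first try.
const schedule = [1, 2, 3, 4].map((attempt) => retryDelayMs(attempt));
// schedule is [2000, 4000, 8000, 16000]

// Total time spent waiting between attempts before the job is marked failed.
const totalWaitMs = schedule.reduce((sum, ms) => sum + ms, 0);
// totalWaitMs is 30000
```

Under these assumptions a merchant endpoint that stays down for more than about 30 seconds, plus request time, will exhaust all five attempts.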