```html

Building a Twilio-Backed SMS Relay for Multi-Carrier Failover: QDN Cascading Forward Architecture

What Was Done

We implemented infrastructure to support cascading SMS forwarding for QuickDumpNow (QDN), enabling a Twilio-mediated relay system that forwards incoming SMS messages through multiple carriers and backup numbers when primary routing fails. This solves a critical operational problem: the underlying carrier infrastructure doesn't support multi-leg forwarding rules natively, so we needed an application-layer solution.

The core requirement came from QDN's dispatch workflow: when a customer texts a primary QDN number, that message needs to route to Sergio (the main dispatcher) first, then to his backup (858-335-4807) if Sergio doesn't acknowledge within a timeout window. The carrier can't enforce this logic at the telecom level, so the Twilio SDK, API Gateway, and Lambda together become the orchestration layer.

Technical Architecture

Credentials and Authentication

Twilio credentials were provisioned and stored in the standard secrets file:

  • TWILIO_ACCOUNT_SID: The primary account identifier required for all Twilio SDK calls and admin operations
  • TWILIO_AUTH_TOKEN: The authentication token for SDK runtime operations (preferred for application code)
  • TWILIO_API_KEY and TWILIO_API_SECRET: Alternative credentials for SDK instantiation (used together with the Account SID), stored separately because an API key can be revoked and reissued without rotating the account's auth token

All credentials were appended to /Users/cb/Documents/repos/.secrets/repos.env with file permissions locked down to mode 600 (read/write owner only). This follows the pattern used across the Queen of San Diego infrastructure for environment variable secrets.
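As a minimal sketch of how a script might consume that file safely, the loader below refuses to read it unless group and other permissions are cleared, then checks that the Twilio keys are present. It assumes the file uses plain KEY=VALUE lines; the function name load_env_file and the REQUIRED tuple are hypothetical and not part of the existing tooling.

```python
import os
import stat

# Keys this sketch expects; the names come from the list above.
REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def load_env_file(path):
    """Parse KEY=VALUE lines from a secrets file that must be owner-only."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:  # any group/other permission bit is set
        raise PermissionError(f"{path} has mode {oct(mode)}; expected 0o600")

    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and anything that is not an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")

    missing = [k for k in REQUIRED if k not in values]
    if missing:
        raise KeyError(f"missing keys in {path}: {', '.join(missing)}")
    return values
```

Failing loudly on a loose file mode keeps an accidental chmod from going unnoticed.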

A reference document was created at /Users/cb/.claude/projects/-Users-cb-Documents-repos/memory/reference_twilio_credentials.md to ensure future development sessions know which credentials to use in which context (SDK instantiation vs. admin API calls).

Relay Logic Flow

The relay operates as follows:

  1. Incoming SMS Webhook: Twilio receives an inbound SMS to the QDN number and POSTs to an API Gateway endpoint
  2. Lambda Processing: A Lambda function (likely extending the existing qdn-data-crud function or creating a new qdn-sms-relay function) receives the form-encoded webhook payload, extracts the sender and message body, and records a receipt timestamp
  3. Primary Dispatcher Forward: The Lambda invokes Twilio SDK to send the message to Sergio's primary number using client.messages.create(to=SERGIO_PRIMARY, from_=QDN_TWILIO_NUMBER, body=...)
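  4. Timeout / Fallback: A separate scheduled Lambda or SQS-delayed message triggers after a configurable timeout (e.g., 15 minutes). If no acknowledgment record exists in DynamoDB at that point, the relay forwards the message to the backup number. Note that 15 minutes is the maximum per-message delay SQS allows, so a longer window would require a different scheduler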
  5. Reply Path: Responses from either dispatcher route back through Twilio to the customer, maintaining conversation context
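Steps 1 through 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the deployed handler: it assumes an API Gateway proxy event carrying Twilio's form-encoded body, the phone numbers are placeholders, and parse_inbound_sms and forward_to_primary are hypothetical names. The client argument is expected to be a Twilio REST client (or any object exposing messages.create), passed in so the logic can be exercised without network access. Request signature validation is omitted here and would be needed before production use.

```python
import base64
from urllib.parse import parse_qs

# Placeholder numbers for illustration only.
SERGIO_PRIMARY = "+15550100001"
QDN_TWILIO_NUMBER = "+15550100000"


def parse_inbound_sms(event):
    """Extract sender, body, and message SID from a proxy-style event."""
    raw = event.get("body") or ""
    if event.get("isBase64Encoded"):
        raw = base64.b64decode(raw).decode("utf-8")
    form = parse_qs(raw, keep_blank_values=True)
    return {
        "message_sid": form.get("MessageSid", [""])[0],
        "customer_phone": form.get("From", [""])[0],
        "body": form.get("Body", [""])[0],
    }


def forward_to_primary(client, sms):
    """Relay the customer's text to the primary dispatcher."""
    text = f"From {sms['customer_phone']}: {sms['body']}"
    return client.messages.create(
        to=SERGIO_PRIMARY,
        from_=QDN_TWILIO_NUMBER,
        body=text,
    )
```

Prefixing the forwarded body with the customer's number is one simple way to preserve context, since the dispatcher otherwise sees only the QDN Twilio number as the sender.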

Infrastructure Changes

API Gateway Routes

Four new HTTP endpoints were added to the QDN API Gateway resource (exact endpoint paths to be confirmed during implementation):

  • POST /sms/inbound — Twilio webhook for incoming SMS
  • POST /sms/status — Twilio status callbacks (delivery receipts, undelivered and failed message errors)
  • POST /sms/reply — Response messages from dispatchers
  • GET /sms/dispatch-status/{job_id} — Query current dispatch relay status

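CORS was enabled for these routes (an OPTIONS method was added to all four endpoints) so that a browser-based dashboard component can query relay status. In practice only GET /sms/dispatch-status/{job_id} is expected to be called from a browser; the Twilio-facing POST routes are server-to-server calls and do not depend on CORS.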

Lambda Function Extensions

The existing qdn-data-crud Lambda at /Users/cb/Documents/repos/sites/dashboard.quickdumpnow.com/lambda/lambda_function.py was analyzed to identify where append_message() is called. The relay logic will need to:

  • Intercept message creation to detect Twilio inbound events
  • Invoke Twilio SDK methods to forward messages
  • Store dispatcher acknowledgment timestamps in the job record (DynamoDB table, structure TBD)
  • Publish dispatch-relay events to the maintenance.json state file so the dashboard can display real-time relay status

All calls to append_message need to be audited and updated to pass along job context (customer phone, message origin) so the relay can make intelligent routing decisions.

CloudFront and Domain Routing

The QDN distribution (exact CloudFront distribution ID to be confirmed from AWS console) already routes traffic to the API Gateway origin. No additional CloudFront configuration is needed for the relay itself; all SMS handling is backend-only.

DynamoDB State Storage

Relay state is persisted in the existing QDN DynamoDB table (table name TBD, likely qdn-jobs or similar). For each inbound message, we store:

  • message_id (Twilio-issued SID)
  • customer_phone
  • body
  • forwarded_to_primary_at (timestamp)
  • primary_ack_at (timestamp when Sergio acknowledges, by reply or via the dashboard, or null if the timeout elapses first)
  • forwarded_to_backup_at (timestamp if fallback triggered)
  • final_status (delivered, acked, backup_triggered, failed)
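One possible in-code shape for this record is sketched below. Since the table name and structure are still TBD, treat this as an assumption-laden illustration: the RelayRecord class and its to_item method are hypothetical, timestamps are assumed to be ISO-8601 strings, and unset fields are dropped rather than written as nulls.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Status values taken from the field list above.
VALID_STATUSES = {"delivered", "acked", "backup_triggered", "failed"}


@dataclass
class RelayRecord:
    message_id: str                      # Twilio-issued SID
    customer_phone: str
    body: str
    forwarded_to_primary_at: Optional[str] = None
    primary_ack_at: Optional[str] = None
    forwarded_to_backup_at: Optional[str] = None
    final_status: str = "delivered"

    def __post_init__(self):
        if self.final_status not in VALID_STATUSES:
            raise ValueError(f"unknown final_status: {self.final_status}")

    def to_item(self):
        """Return a plain dict, omitting attributes that are still unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Omitting unset attributes means a missing primary_ack_at can be tested with an attribute-existence check instead of a null comparison.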

Key Decisions

Why Twilio Over Custom Telecom Integration

Twilio abstracts carrier complexity and provides webhook-based event delivery, which integrates naturally with serverless Lambda. Integrating directly with carrier APIs would require a separate carrier-specific SDK integration for each carrier, along with longer support cycles. Twilio's audit trail and message SID tracking also simplify debugging dispatcher-to-customer communication.

Webhook vs. Polling

We use Twilio webhooks rather than polling the Twilio REST API. Webhooks are event-driven, reduce Lambda invocation count, and provide real-time notification of inbound SMS. The tradeoff is that webhook infrastructure must be reliable (API Gateway must stay up and process POST requests consistently).

Timeout Window for Fallback

The 15-minute timeout was chosen as a reasonable window for a dispatcher to acknowledge receipt (either via reply or manual dashboard interaction). This is configurable via environment variable and can be tuned based on operational experience.
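A minimal sketch of the fallback check follows, assuming the timeout is read from an environment variable and that relay timestamps are stored as ISO-8601 strings with a UTC offset. The variable name RELAY_TIMEOUT_SECONDS and the function names are hypothetical; the record keys mirror the relay fields forwarded_to_primary_at, primary_ack_at, and forwarded_to_backup_at.

```python
import os
from datetime import datetime, timedelta, timezone


def relay_timeout():
    """Read the fallback window from the environment (default 15 minutes)."""
    return timedelta(seconds=int(os.environ.get("RELAY_TIMEOUT_SECONDS", "900")))


def should_forward_to_backup(record, now=None):
    """Return True when the primary leg has timed out without an ack."""
    now = now or datetime.now(timezone.utc)
    # Already acknowledged, or already escalated: nothing to do.
    if record.get("primary_ack_at") or record.get("forwarded_to_backup_at"):
        return False
    sent = record.get("forwarded_to_primary_at")
    if not sent:
        return False
    return now - datetime.fromisoformat(sent) >= relay_timeout()
```

Checking forwarded_to_backup_at as well as the ack keeps the function safe to call more than once for the same message, which matters if the delayed trigger is ever delivered twice.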

State Machine vs. Imperative Logic

Relay state is stored as discrete fields in DynamoDB (timestamps for each leg) rather than a formal state machine. This keeps the code simple and queryable for debugging, but we can easily upgrade to a formal state machine (Step Functions) later if relay logic becomes more complex (e.g., multi-level escalation, time-of-day routing).

Deployment and Testing

Next steps include:

  • Provisioning a Twilio phone number (if not already done) and adding it to the QDN account