Building a Local SMS Digest Pipeline: Bridging Android SMS Export to Email Without Twilio

What Was Done

Over this development session, I built a local SMS digest pipeline that extracts SMS messages from Android devices (via Samsung SMS exports), parses conversation threads, and generates email digests without requiring Twilio credentials or cloud SMS infrastructure. The system reads exported SMS data, identifies recent conversations by date range, groups messages by sender, and pipes formatted digests through AWS SES for delivery.

This approach eliminates external SMS service dependencies while maintaining a clean separation between data extraction, processing, and delivery layers.

Technical Details

Core Infrastructure

The SMS pipeline consists of three main components:

  • SMS Export Source: Android SMS backup files (Samsung SMS format) stored locally in the conversation database
  • Processing Layer: Python script that parses message timestamps, groups by phone number, and filters by date range
  • Delivery Layer: AWS SES for email dispatch using existing IAM credentials

The entry point is /Users/cb/Documents/repos/tools/samsung_sms_sync.py, which was created to handle the specific format exported by Samsung devices.

Data Flow

The processing pipeline follows this sequence:

  1. Read SMS export file containing all message records with timestamps, sender phone numbers, and message bodies
  2. Parse timestamp format (typically milliseconds since epoch or ISO 8601 depending on export format)
  3. Filter messages within a configurable date range (e.g., April 25–29)
  4. Group messages by phone number to identify conversation threads
  5. Sort conversations by most recent activity to surface hot items
  6. Format grouped messages with conversation headers and metadata
  7. Compose digest email with SES-compatible headers
  8. Send via boto3 SES client using regional endpoint

This design allows for quick iteration on what messages surface in digests without modifying the extraction logic.
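Steps 4–6 above can be sketched as follows. The message schema here (`number`, `ts` in epoch milliseconds, `body`) is an illustrative assumption, not the script's actual field names:

```python
from collections import defaultdict

def group_and_sort(messages):
    """Group messages by sender, then order threads by most recent activity.

    Assumes each message is a dict with 'number', 'ts' (epoch ms), and
    'body' keys -- an illustrative schema, not the script's real one.
    """
    threads = defaultdict(list)
    for msg in messages:
        threads[msg["number"]].append(msg)
    # Most recently active conversations sort first, surfacing hot items
    return sorted(
        threads.items(),
        key=lambda item: max(m["ts"] for m in item[1]),
        reverse=True,
    )

messages = [
    {"number": "+15302623442", "ts": 1714200000000, "body": "See you then"},
    {"number": "+15551234567", "ts": 1714100000000, "body": "Invoice sent"},
    {"number": "+15302623442", "ts": 1714000000000, "body": "Friday works"},
]
ordered = group_and_sort(messages)
# The +1530... thread sorts first: its latest message is the newest overall
```

From here, formatting the digest (step 6) is a matter of walking `ordered` and emitting a header plus message lines per thread.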

File Organization

Supporting the SMS pipeline:

  • /Users/cb/Documents/repos/tools/samsung_sms_sync.py — Main script for export parsing and digest generation
  • /Users/cb/Library/LaunchAgents/com.cb.samsung-sms-sync.plist — macOS LaunchAgent for periodic scheduling
  • ~/.secrets/repos.env — Environment configuration (AWS region, SES sender address, recipient email)

The LaunchAgent plist allows the script to run on a schedule without manual invocation, following the standard macOS launchd agent pattern.

Key Technical Decisions

Why Not Twilio?

The original context suggested using Twilio for SMS infrastructure. However, this approach has several drawbacks for local-first workflows:

  • Credential Management: Twilio SID and auth tokens represent another secret to rotate and protect
  • Rate Limiting: Twilio API has request throttling that complicates batch operations
  • Cost: Per-message charges add up for large conversation histories
  • Latency: Network round-trip to Twilio adds 200–500ms per operation

By working directly with local SMS exports, we eliminate the external dependency while maintaining the same functionality for read-only digest operations.

Data Format Choice

Samsung SMS exports use a structured format that includes:

  • Unix timestamp in milliseconds
  • Phone number (normalized to E.164 format when possible)
  • Message body (text)
  • Direction (incoming vs. outgoing)
  • Read status and other metadata

The parser normalizes phone numbers to +1XXXXXXXXXX format for consistent grouping across conversations. This handles variations like (530) 262-3442, 530-262-3442, and +15302623442 as the same contact.

Email Delivery via SES

Rather than printing to stdout or saving to files, the pipeline uses AWS SES because:

  • Async delivery — script completes quickly regardless of mail server latency
  • Existing IAM setup — no additional authentication layer needed
  • Scalability — SES handles queue management automatically
  • Logging — SES bounce/complaint tracking helps identify bad addresses

The SES configuration references the AWS region and sender identity via environment variables, allowing the same code to work across different deployments.
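The dispatch step can be sketched with boto3's standard `send_email` call. The helper names and the sample envelope values are illustrative, not the script's actual code:

```python
import os

def build_digest_payload(sender, recipient, subject, body_text):
    """Assemble keyword arguments for boto3's SES send_email call."""
    return {
        "Source": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Message": {
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            "Body": {"Text": {"Data": body_text, "Charset": "UTF-8"}},
        },
    }

def send_digest(payload):
    # Imported here so payload assembly stays testable without AWS installed
    import boto3

    ses = boto3.client("ses", region_name=os.environ["AWS_REGION"])
    return ses.send_email(**payload)

payload = build_digest_payload(
    os.environ.get("SES_SENDER", "sender@example.com"),
    os.environ.get("DIGEST_RECIPIENT", "recipient@example.com"),
    "SMS Digest: April 25-29",
    "3 conversations, 14 messages since last digest",
)
```

Separating payload construction from the `send_email` call keeps the formatting logic unit-testable without AWS credentials.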

Implementation Details

Timestamp Parsing

Samsung exports timestamps in milliseconds since Unix epoch. The parser converts these to Python datetime objects for range filtering:


```python
from datetime import datetime, timezone

timestamp_ms = 1714123456789
timestamp_s = timestamp_ms / 1000.0
dt = datetime.fromtimestamp(timestamp_s, tz=timezone.utc)
```

Working in UTC keeps the parsing timezone-aware and sidesteps daylight-saving transitions entirely, since UTC has none.
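Building on that conversion, the date-range filter (step 3 of the data flow) might look like the following. The function name is illustrative, and the window assumes the April 25–29 range falls in 2024, matching the example timestamp above:

```python
from datetime import datetime, timezone

def in_range(timestamp_ms, start, end):
    """True if an epoch-millisecond timestamp falls within [start, end]."""
    dt = datetime.fromtimestamp(timestamp_ms / 1000.0, tz=timezone.utc)
    return start <= dt <= end

# April 25-29 window in UTC (2024 assumed for illustration)
start = datetime(2024, 4, 25, tzinfo=timezone.utc)
end = datetime(2024, 4, 29, 23, 59, 59, tzinfo=timezone.utc)
recent = in_range(1714123456789, start, end)  # True: falls on April 26, 2024
```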

Phone Number Normalization

The script normalizes incoming phone numbers using a regex that strips formatting characters:


```python
import re

def normalize_phone(raw_number):
    """Normalize a phone number to +1XXXXXXXXXX form for grouping."""
    digits = re.sub(r'\D', '', raw_number)  # strip everything but digits
    if len(digits) == 10:
        digits = '1' + digits  # assume US country code for 10-digit numbers
    return '+' + digits if digits else None
```

This ensures that messages from the same contact (even if stored with different formatting) are grouped together.

Environment Configuration

The script reads AWS and email configuration from ~/.secrets/repos.env:


```
AWS_REGION=us-west-2
SES_SENDER=operations@queenofsandiego.com
DIGEST_RECIPIENT=c.b.ladd@gmail.com
SMS_EXPORT_PATH=/path/to/sms/export.txt
```

Using environment variables rather than hardcoding allows the same script to work across development and production environments.
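Loading that file before contacting AWS could be done with a minimal parser along these lines, assuming simple KEY=VALUE lines; the helper name is illustrative:

```python
import os

def load_env_file(path):
    """Read simple KEY=VALUE lines into os.environ, skipping comments."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault lets real environment variables override the file
            os.environ.setdefault(key.strip(), value.strip())

# load_env_file(os.path.expanduser("~/.secrets/repos.env"))
```

Using `setdefault` means values already set in the environment (e.g. by the LaunchAgent) win over the file, which is a common precedence choice.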

Scheduling with LaunchAgent

The plist configuration in /Users/cb/Library/LaunchAgents/com.cb.samsung-sms-sync.plist schedules the digest to run daily:

  • <key>StartInterval</key> — Runs every 86400 seconds (24 hours)
  • <key>StandardErrorPath</key> — Logs errors to a local file for debugging
  • <key>EnvironmentVariables</key> — Passes AWS credentials and configuration to the script

The LaunchAgent approach is preferred over cron because it integrates with macOS system logging and, with a KeepAlive key, can restart the process after failures.
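A sketch of what such a plist might contain, using the label and script path from the files above; the interpreter path, log path, and exact keys in the real agent may differ:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.cb.samsung-sms-sync</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/cb/Documents/repos/tools/samsung_sms_sync.py</string>
    </array>
    <key>StartInterval</key>
    <integer>86400</integer>
    <key>StandardErrorPath</key>
    <string>/tmp/samsung-sms-sync.err.log</string>
</dict>
</plist>
```

After editing, the agent is reloaded with `launchctl unload` followed by `launchctl load` on the plist path.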

What's Next

Future enhancements to this pipeline could include:

  • Conversation Summary: Use Claude or GPT to generate AI summaries of long threads, highlighting action items and decisions