Building a Local SMS Digest Pipeline: Bridging Android SMS Export to Email Without Twilio
What Was Done
Over this development session, I built a local SMS digest pipeline that reads SMS messages exported from an Android device (a Samsung SMS export), parses conversation threads, and generates email digests without requiring Twilio credentials or cloud SMS infrastructure. The system reads the exported SMS data, selects recent messages by date range, groups them by sender, and sends the formatted digests through AWS SES.
This approach eliminates external SMS service dependencies while maintaining a clean separation between data extraction, processing, and delivery layers.
Technical Details
Core Infrastructure
The SMS pipeline consists of three main components:
- SMS Export Source: Android SMS backup files (Samsung SMS format) stored locally in the conversation database
- Processing Layer: Python script that parses message timestamps, groups by phone number, and filters by date range
- Delivery Layer: AWS SES for email dispatch using existing IAM credentials
The entry point is /Users/cb/Documents/repos/tools/samsung_sms_sync.py, which was created to handle the specific format exported by Samsung devices.
Data Flow
The processing pipeline follows this sequence:
- Read SMS export file containing all message records with timestamps, sender phone numbers, and message bodies
- Parse timestamp format (typically milliseconds since epoch or ISO 8601 depending on export format)
- Filter messages within a configurable date range (e.g., April 25–29)
- Group messages by phone number to identify conversation threads
- Sort conversations by most recent activity to surface hot items
- Format grouped messages with conversation headers and metadata
- Compose digest email with SES-compatible headers
- Send via the boto3 SES client using a regional endpoint
This design allows for quick iteration on what messages surface in digests without modifying the extraction logic.
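The filter, group, sort, and format steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in samsung_sms_sync.py: the `Message` record, the `build_digest` function, and the plain-text layout are hypothetical, and the sketch assumes phone numbers are already normalized and timestamps are timezone-aware UTC datetimes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Message:
    # Hypothetical minimal record: normalized phone, UTC timestamp, body text.
    phone: str
    sent_at: datetime
    body: str


def build_digest(messages, start, end):
    """Filter to [start, end), group by phone, and order threads by latest activity."""
    # Step 1: keep only messages inside the date window (end is exclusive).
    in_range = [m for m in messages if start <= m.sent_at < end]

    # Step 2: group messages into per-contact threads.
    threads = defaultdict(list)
    for m in in_range:
        threads[m.phone].append(m)

    # Step 3: sort each thread chronologically, then put the most recently active thread first.
    for thread in threads.values():
        thread.sort(key=lambda m: m.sent_at)
    ordered = sorted(threads.items(), key=lambda kv: kv[1][-1].sent_at, reverse=True)

    # Step 4: render a plain-text digest with one header per conversation.
    lines = []
    for phone, thread in ordered:
        lines.append(f"== {phone}: {len(thread)} message(s), last at {thread[-1].sent_at:%Y-%m-%d %H:%M} UTC ==")
        for m in thread:
            lines.append(f"  [{m.sent_at:%m-%d %H:%M}] {m.body}")
        lines.append("")
    return "\n".join(lines)
```

Using a half-open window (for April 25–29, a start of April 25 00:00 and an end of April 30 00:00) includes every message on the last day without needing an end-of-day timestamp.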
File Organization
Supporting the SMS pipeline:
- /Users/cb/Documents/repos/tools/samsung_sms_sync.py: main script for export parsing and digest generation
- /Users/cb/Library/LaunchAgents/com.cb.samsung-sms-sync.plist: macOS LaunchAgent for periodic scheduling
- ~/.secrets/repos.env: environment configuration (AWS region, SES sender address, recipient email)
The LaunchAgent plist lets launchd run the script on a schedule without manual invocation, following the standard macOS pattern for per-user agents.
Key Technical Decisions
Why Not Twilio?
The original context suggested using Twilio for SMS infrastructure. However, this approach has several drawbacks for local-first workflows:
- Credential Management: Twilio SID and auth tokens represent another secret to rotate and protect
- Rate Limiting: Twilio API has request throttling that complicates batch operations
- Cost: Per-message charges add up for large conversation histories
- Latency: Network round-trip to Twilio adds 200–500ms per operation
By working directly with local SMS exports, we eliminate the external dependency while maintaining the same functionality for read-only digest operations.
Data Format Choice
Samsung SMS exports use a structured format that includes:
- Unix timestamp in milliseconds
- Phone number (normalized to E.164 format when possible)
- Message body (text)
- Direction (incoming vs. outgoing)
- Read status and other metadata
The parser normalizes phone numbers to +1XXXXXXXXXX format for consistent grouping across conversations. This handles variations like (530) 262-3442, 530-262-3442, and +15302623442 as the same contact.
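One way to hold these fields in Python is a small dataclass. The column names used below (`address`, `date`, `body`, `type`, `read`) follow Android's SMS content provider and are an assumption about the Samsung export; the real file may use different keys. `SmsRecord`, `Direction`, and `record_from_row` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class Direction(Enum):
    INCOMING = "incoming"
    OUTGOING = "outgoing"


@dataclass
class SmsRecord:
    address: str       # raw phone number as exported; normalized later for grouping
    sent_at: datetime  # timezone-aware UTC
    body: str
    direction: Direction
    read: bool


def record_from_row(row: dict) -> SmsRecord:
    """Map one export row (assumed Android-style column names) to an SmsRecord."""
    # Android's SMS provider uses type 1 for received and 2 for sent messages;
    # treating every other type (drafts, outbox, failed) as outgoing is a simplification.
    direction = Direction.INCOMING if int(row["type"]) == 1 else Direction.OUTGOING
    return SmsRecord(
        address=row["address"],
        # Integer millisecond arithmetic avoids float rounding in the conversion.
        sent_at=EPOCH + timedelta(milliseconds=int(row["date"])),
        body=row.get("body") or "",
        direction=direction,
        read=str(row.get("read", "0")) == "1",
    )
```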
Email Delivery via SES
Rather than printing to stdout or saving to files, the pipeline uses AWS SES because:
- Decoupled delivery: SES accepts the message and handles delivery to the recipient's mail server, so the script finishes quickly regardless of downstream mail latency
- Existing IAM setup: no additional authentication layer is needed
- Scalability: SES handles queue management automatically
- Logging: SES bounce and complaint tracking helps identify bad addresses
The SES configuration references the AWS region and sender identity via environment variables, allowing the same code to work across different deployments.
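A minimal sketch of the SES call, assuming boto3 and the SES v1 `send_email` API with a plain-text body. It reads `AWS_REGION`, `SES_SENDER`, and `DIGEST_RECIPIENT` from the environment, matching the variable names in the configuration file shown in the Environment Configuration section; `build_ses_request` and `send_digest` are hypothetical helper names. Credentials are resolved by boto3's default credential chain.

```python
import os


def build_ses_request(sender, recipient, subject, body_text):
    """Build keyword arguments for the SES v1 SendEmail API (plain-text body)."""
    return {
        "Source": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Message": {
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            "Body": {"Text": {"Data": body_text, "Charset": "UTF-8"}},
        },
    }


def send_digest(subject, body_text):
    """Send the digest via SES using settings taken from environment variables."""
    import boto3  # imported lazily so build_ses_request works without boto3 installed

    ses = boto3.client("ses", region_name=os.environ["AWS_REGION"])
    request = build_ses_request(
        os.environ["SES_SENDER"], os.environ["DIGEST_RECIPIENT"], subject, body_text
    )
    response = ses.send_email(**request)
    # SES returns a MessageId once it has accepted the message for delivery.
    return response["MessageId"]
```

Keeping the request builder separate from the network call makes the message shape easy to check without AWS access. SES only accepts mail from a verified sender identity, and while an account is in the SES sandbox the recipient address must be verified as well.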
Implementation Details
Timestamp Parsing
The Samsung export stores timestamps as milliseconds since the Unix epoch. The parser converts these to Python datetime objects for range filtering:
from datetime import datetime, timezone
timestamp_ms = 1714123456789
timestamp_s = timestamp_ms / 1000.0
dt = datetime.fromtimestamp(timestamp_s, tz=timezone.utc)
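Producing a timezone-aware UTC datetime avoids ambiguity around daylight saving time transitions, since UTC has none; converting to local time for display or for local date boundaries can be done afterward with astimezone().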
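Since the data flow allows for either epoch milliseconds or ISO 8601 timestamps depending on the export, a parser can accept both. This sketch (`parse_timestamp` is a hypothetical name) uses integer millisecond arithmetic instead of float division, so no float rounding is involved, and it assumes ISO strings without an offset are already in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(value):
    """Return a UTC-aware datetime from epoch milliseconds or an ISO 8601 string."""
    # Integers, or strings made only of digits, are treated as epoch milliseconds.
    if isinstance(value, int) or (isinstance(value, str) and value.isdigit()):
        return EPOCH + timedelta(milliseconds=int(value))
    text = value.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 onward.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    # Assumption: ISO strings without an offset are already in UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```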
Phone Number Normalization
The script normalizes incoming phone numbers using a regex that strips formatting characters:
import re

def normalize_phone(raw_number):
    digits = re.sub(r'\D', '', raw_number)
    if len(digits) == 10:
        digits = '1' + digits  # Add US country code
    return '+' + digits if digits else None
This ensures that messages from the same contact (even if stored with different formatting) are grouped together.
Environment Configuration
The script reads AWS and email configuration from ~/.secrets/repos.env:
AWS_REGION=us-west-2
SES_SENDER=operations@queenofsandiego.com
DIGEST_RECIPIENT=c.b.ladd@gmail.com
SMS_EXPORT_PATH=/path/to/sms/export.txt
Using environment variables rather than hardcoding allows the same script to work across development and production environments.
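One dependency-free way to read this file is a small `KEY=VALUE` parser. This sketch assumes the file uses simple shell-style assignments (optionally prefixed with `export`, optionally quoted); `load_env_file` is a hypothetical name, and a library such as python-dotenv would be an alternative.

```python
import os
from pathlib import Path


def load_env_file(path="~/.secrets/repos.env", override=False):
    """Load simple KEY=VALUE lines into os.environ and return them as a dict."""
    values = {}
    for raw in Path(path).expanduser().read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop one layer of matching surrounding quotes, if present.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
        # By default, variables already set in the environment take precedence.
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```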
Scheduling with LaunchAgent
The plist configuration in /Users/cb/Library/LaunchAgents/com.cb.samsung-sms-sync.plist schedules the digest to run daily:
- <key>StartInterval</key>: runs every 86400 seconds (24 hours)
- <key>StandardErrorPath</key>: logs errors to a local file for debugging
- <key>EnvironmentVariables</key>: passes AWS credentials and configuration to the script
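The LaunchAgent approach is preferred over cron because launchd is the scheduler Apple recommends on macOS, it can redirect a job's output and errors to log files, and it can relaunch a job that exits unexpectedly when KeepAlive is configured.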
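The plist can also be generated from Python with the standard library's `plistlib`, which avoids hand-editing XML. In this sketch the interpreter path, log file names, `RunAtLoad`, and the `EnvironmentVariables` contents are assumptions for illustration; only the label and the 24-hour `StartInterval` come from the setup described above. `build_launch_agent` and `write_launch_agent` are hypothetical names.

```python
import plistlib
from pathlib import Path

LABEL = "com.cb.samsung-sms-sync"


def build_launch_agent(python_path, script_path, log_dir):
    """Return a LaunchAgent definition that runs the script every 24 hours."""
    return {
        "Label": LABEL,
        # launchd does not use a shell, so both paths should be absolute.
        "ProgramArguments": [python_path, script_path],
        "StartInterval": 86400,  # seconds between runs (24 hours)
        "RunAtLoad": True,       # assumption: also run once when the agent is loaded
        "StandardOutPath": f"{log_dir}/samsung-sms-sync.out.log",
        "StandardErrorPath": f"{log_dir}/samsung-sms-sync.err.log",
        # Example value from repos.env; other variables can be added the same way.
        "EnvironmentVariables": {"AWS_REGION": "us-west-2"},
    }


def write_launch_agent(plist, directory="~/Library/LaunchAgents"):
    """Serialize the definition to <directory>/<LABEL>.plist and return the path."""
    path = Path(directory).expanduser() / f"{LABEL}.plist"
    path.write_bytes(plistlib.dumps(plist))
    return path
```

After writing the file, the agent can be loaded with `launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.cb.samsung-sms-sync.plist` (or the older `launchctl load` form).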
What's Next
Future enhancements to this pipeline could include:
- Conversation Summary: Use Claude or GPT to generate AI summaries of long threads, highlighting action items and decisions