Building a Local SMS Digest Pipeline: Extracting and Aggregating Messages Without Cloud Dependencies
In a recent development session, I built a local SMS aggregation system that extracts message threads from Samsung/Android devices and generates digest emails without relying on Twilio or other cloud SMS services. This post covers the architecture, implementation decisions, and the tooling that made it possible.
The Problem: SMS Data Trapped in Silos
The existing infrastructure relied on Twilio for SMS operations, but not all business lines had credentials configured in the environment. Additionally, there was a need to surface actionable SMS insights from multiple conversations without manual thread-reading. The challenge: extract raw SMS data from local sources, parse conversation threads intelligently, and deliver a structured digest via email.
Architecture Overview
The solution consists of three layers:
- Data Source Layer: Samsung SMS export files and Mac Messages database queries
- Processing Layer: Python script to parse, filter, and summarize conversations
- Delivery Layer: AWS SES for email dispatch
Implementation Details
File Structure and Creation
Two primary files were created and iterated on:
- /Users/cb/Documents/repos/tools/samsung_sms_sync.py — Main SMS extraction and digest generation script
- /Users/cb/Library/LaunchAgents/com.cb.samsung-sms-sync.plist — macOS LaunchAgent configuration for scheduled execution
The Python script handles:
- Reading Samsung SMS export files (typically exported via third-party SMS backup tools)
- Querying the macOS Messages.app SQLite database (~/Library/Messages/chat.db)
- Filtering conversations by date range (recent activity window)
- Extracting and formatting message threads with sender metadata
- Generating human-readable digests for specific contacts
- Sending digests via AWS SES
Message Database Queries
The macOS Messages database approach required understanding the schema:
# Query chat identifiers and most-recent message dates.
# Note: chat_identifier lives on the chat table and date on the
# message table, so both must be joined through chat_message_join.
sqlite3 ~/Library/Messages/chat.db \
  "SELECT c.chat_identifier, MAX(m.date) AS last_msg \
   FROM chat c \
   JOIN chat_message_join cmj ON cmj.chat_id = c.ROWID \
   JOIN message m ON m.ROWID = cmj.message_id \
   GROUP BY c.chat_identifier \
   ORDER BY last_msg DESC LIMIT 20;"
This identifies recent conversations. Subsequent queries extract message bodies, timestamps, and sender information across the date range of interest (e.g., April 25-29).
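One wrinkle when extracting timestamps: Messages.app stores dates as an offset from the Apple epoch (2001-01-01), and modern macOS versions store nanoseconds rather than seconds. A small helper (a sketch; the magnitude threshold is a heuristic, not an official API) handles both:

```python
from datetime import datetime, timedelta, timezone

# Messages.app dates count from 2001-01-01, not the Unix epoch
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def apple_date_to_datetime(raw: int) -> datetime:
    """Convert a chat.db date value to a UTC datetime.

    Recent macOS releases store nanoseconds since the Apple epoch;
    older ones stored whole seconds. Values large enough to only make
    sense as nanoseconds are scaled down first (heuristic assumption).
    """
    seconds = raw / 1e9 if raw > 1e12 else raw
    return APPLE_EPOCH + timedelta(seconds=seconds)
```

With this in place, the raw `MAX(m.date)` values from the query above become comparable, human-readable timestamps for the digest.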
Samsung SMS Export Format
Samsung devices export SMS data through third-party applications that produce structured formats (typically CSV or JSON). The script parses these exports by:
- Reading the export file header to identify column structure
- Mapping phone numbers to contact names where available
- Sorting messages chronologically within each conversation thread
- Filtering out system messages and metadata
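The parsing steps above can be sketched roughly as follows. Column names vary between backup tools, so `address`, `date`, `body`, and `type` here are assumptions modeled on common CSV export formats, not a fixed Samsung schema:

```python
import csv
from datetime import datetime

def parse_samsung_export(path, contacts=None):
    """Parse a Samsung SMS backup CSV into chronologically sorted messages.

    `contacts` maps phone numbers to display names. Column names are
    assumptions; adjust them to match your backup tool's export header.
    """
    contacts = contacts or {}
    messages = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            body = (row.get("body") or "").strip()
            if not body:  # skip system messages and metadata-only rows
                continue
            number = row.get("address", "")
            messages.append({
                "contact": contacts.get(number, number),
                # export timestamps assumed to be Unix milliseconds
                "timestamp": datetime.fromtimestamp(int(row["date"]) / 1000),
                "direction": "sent" if row.get("type") == "2" else "received",
                "body": body,
            })
    messages.sort(key=lambda m: m["timestamp"])
    return messages
```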
Key Processing Decisions
Why No Twilio Dependency
The original architecture assumed Twilio credentials would be available in .secrets/repos.env, but they weren't configured for all business lines. Rather than creating new Twilio accounts, the local SMS extraction approach:
- Eliminates API rate limits and quota concerns
- Reduces operational overhead (no credential rotation, no monthly billing)
- Leverages existing device backups as a source of truth
- Works offline — no internet dependency for SMS access
Digest Generation Strategy
Raw message threads are processed into summaries by:
- Contact-specific digests: Extracting all messages from a single contact (e.g., Sergio) to surface key decisions, blockers, and action items
- Time-windowed digests: Filtering recent activity across all conversations to highlight urgent issues
- Sentiment-aware summaries: Flagging emotional tone (frustration, urgency) to help prioritize responses
Each digest is formatted as plain text with clear sections for money/payments, operational blockers, and next steps.
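A minimal sketch of that sectioned plain-text formatting is shown below. The keyword buckets are illustrative stand-ins; real categorization could use richer rules or LLM-based summarization as discussed later:

```python
def format_digest(contact, messages):
    """Render messages into a plain-text digest with fixed sections.

    Keyword lists are hypothetical examples; anything unmatched
    falls through to "Next Steps".
    """
    sections = {"Money / Payments": [], "Operational Blockers": [], "Next Steps": []}
    keywords = {
        "Money / Payments": ("pay", "invoice", "total", "$"),
        "Operational Blockers": ("broken", "leak", "failure", "down"),
    }
    for msg in messages:
        lowered = msg["body"].lower()
        for section, words in keywords.items():
            if any(w in lowered for w in words):
                sections[section].append(msg["body"])
                break
        else:  # no keyword matched: treat as a follow-up item
            sections["Next Steps"].append(msg["body"])

    lines = [f"SMS Digest: {contact}", "=" * 30]
    for section, items in sections.items():
        lines.append("")
        lines.append(section)
        if items:
            lines.extend(f"  - {item}" for item in items)
        else:
            lines.append("  (none)")
    return "\n".join(lines)
```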
Email Delivery via AWS SES
Digests are sent to c.b.ladd@gmail.com using AWS SES, which requires:
- AWS credentials configured in ~/.aws/credentials or environment variables
- Sender email verified in SES (typically the business email)
- Python boto3 library for SES client initialization
Example command structure (credentials omitted):
import boto3

# Initialize the SES client in the region where the sender is verified
ses_client = boto3.client('ses', region_name='us-west-2')

response = ses_client.send_email(
    Source='sender@example.com',
    Destination={'ToAddresses': ['recipient@example.com']},
    Message={
        'Subject': {'Data': 'SMS Digest: Sergio'},
        # SES requires the text body wrapped in a {'Data': ...} dict
        'Body': {'Text': {'Data': digest_text}},
    },
)
Scheduling and Automation
The LaunchAgent plist file (com.cb.samsung-sms-sync.plist) schedules the script to run at regular intervals:
- Label: Unique identifier for the service
- ProgramArguments: Path to Python interpreter and script
- StartInterval: Execution frequency (e.g., every 3600 seconds = hourly)
- StandardOutPath / StandardErrorPath: Log file locations for debugging
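Putting those keys together, a minimal plist might look like the following (the interpreter and log paths are assumptions; the script path matches the one created above):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.cb.samsung-sms-sync</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/cb/Documents/repos/tools/samsung_sms_sync.py</string>
    </array>
    <key>StartInterval</key>
    <integer>3600</integer>
    <key>StandardOutPath</key>
    <string>/tmp/samsung-sms-sync.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/samsung-sms-sync.err</string>
</dict>
</plist>
```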
The LaunchAgent is loaded via:
launchctl load ~/Library/LaunchAgents/com.cb.samsung-sms-sync.plist
Operational Insights from Session
The SMS digest pipeline surfaced critical business issues:
- Communication gaps: Carole's emails weren't being received; direct SMS follow-up was needed
- Fleet maintenance: Vehicle and trailer issues (compressor failure, electrical problems, brake system leaks) required immediate attention
- Financial reconciliation: Sergio needed payment totals to calculate his percentage and plan equipment purchases
- Logistics optimization: Trailer consolidation at a central warehouse (Otay) was underway to reduce dump run costs
- Business development: A potential 24/7 tire roadside service partnership was being pitched
These insights were surfaced quickly because the digest script filtered conversations by date and contact, making it easy to spot patterns and action items.
What's Next
- Sentiment analysis: Integrate NLP to automatically flag urgent or frustrated messages
- Action item extraction: Use LLM-based summarization to identify explicit requests and blockers
- Multi-source aggregation: Combine SMS, email, and Slack into a unified daily digest
- Conversation threading: Link related SMS threads across multiple contacts to surface complex projects
- Metric tracking: