Building a Secure Local SMS Ingestion Pipeline: macOS Messages Database Integration for Business Operations

This post documents the design and implementation of a local SMS ingestion system that bridges the Apple Messages database with a business operations workflow, specifically for managing incoming text messages to a business line (619-986-7344) without exposing personal communications.

The Problem: Controlled Access to Business SMS

A critical operational need emerged: pull incoming SMS messages on a dedicated business line into a structured system for review and action, while maintaining strict privacy boundaries around personal text conversations on the same device. The challenge wasn't just technical—it was architectural: how do you selectively surface business communications without accidentally exposing personal data?

The constraints were clear:

  • SMS arrives on a shared iPhone tied to multiple numbers (personal: 773-941-2265 and business: 619-986-7344)
  • Must read from local storage only—no carrier APIs, no cloud sync, no monthly charges
  • Must be queryable by an LLM agent when explicitly directed
  • Must have explicit guardrails to prevent reading personal messages
  • Must integrate with existing booking/calendar workflows

Technical Architecture: Messages Database + Shell Script Pipeline

macOS stores all SMS and iMessage data in a SQLite database located at:

~/Library/Messages/chat.db

Note that reading this file from a script requires granting the invoking terminal Full Disk Access (System Settings → Privacy & Security). The database contains three key tables for our use case:

  • chat — conversation metadata (phone numbers, service type)
  • message — individual messages with timestamps and content
  • chat_message_join — relationship table linking messages to conversations

The database schema includes fields that let us filter by:

  • chat.chat_identifier — the phone number in E.164 format or iCloud handle
  • chat.service_name — 'SMS' for text messages vs 'iMessage'
  • message.date — timestamp (nanoseconds since 2001-01-01, the Apple "Cocoa" epoch)
  • message.text — message content
  • message.is_from_me — boolean flag for sent vs received
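The three-table relationship is easiest to see against a throwaway database. The sketch below uses invented toy data; only the table and column names mirror the real chat.db schema, and the real database is never touched:

```shell
# Build a tiny stand-in for chat.db (toy data) and run the three-table join.
db=$(mktemp)
sqlite3 "$db" <<'EOF'
CREATE TABLE chat (ROWID INTEGER PRIMARY KEY, chat_identifier TEXT);
CREATE TABLE message (ROWID INTEGER PRIMARY KEY, text TEXT, date INTEGER, is_from_me INTEGER);
CREATE TABLE chat_message_join (chat_id INTEGER, message_id INTEGER);
INSERT INTO chat VALUES (1, '+16199867344');
INSERT INTO message VALUES (10, 'Is the charter still on?', 700000000000000000, 0);
INSERT INTO chat_message_join VALUES (1, 10);
EOF
# chat_message_join resolves which conversation each message belongs to.
result=$(sqlite3 "$db" "
  SELECT c.chat_identifier, m.text
  FROM message m
  JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
  JOIN chat c ON cmj.chat_id = c.ROWID;")
echo "$result"   # → +16199867344|Is the charter still on?
rm -f "$db"
```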

Implementation: The read-sms Script

A shell script was created at /Users/cb/bin/read-sms to query the Messages database and export business line SMS in a format suitable for agent consumption:

#!/bin/bash
# read-sms: Query macOS Messages database for 619-986-7344 conversations
# Filters SMS (not iMessage) to the JADA business line only

MESSAGES_DB=~/Library/Messages/chat.db
OUTPUT_FILE=~/Desktop/jada_sms_export.txt

sqlite3 "$MESSAGES_DB" > "$OUTPUT_FILE" << 'EOF'
.mode csv
SELECT 
  datetime(m.date/1000000000 + 978307200, 'unixepoch', 'localtime') as timestamp,
  c.chat_identifier,
  m.text,
  CASE WHEN m.is_from_me = 1 THEN 'SENT' ELSE 'RECEIVED' END as direction
FROM message m
JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
JOIN chat c ON cmj.chat_id = c.ROWID
WHERE c.chat_identifier LIKE '%6199867344'
  AND c.service_name = 'SMS'
ORDER BY m.date DESC;
EOF

Key design decisions in this script:

  • Explicit phone number matching: The WHERE clause uses LIKE '%6199867344' so the business line matches however the handle is stored—with or without the +1 country code prefix. Anchoring on the full ten-digit number keeps the pattern from accidentally matching unrelated conversations.
  • Service filtering: service_name = 'SMS' ensures we only capture SMS text messages, not iMessage conversations which might be intermingled with the same contact.
  • Timestamp conversion: Modern macOS stores message dates as nanoseconds since 2001-01-01 (the Cocoa epoch), not Unix epoch. The expression m.date/1000000000 + 978307200 divides down to seconds, then adds the Unix timestamp of 2001-01-01 to produce a Unix timestamp, which datetime() renders as human-readable local time.
  • Direction flagging: The is_from_me boolean is rendered as 'SENT' or 'RECEIVED' for clarity in exported files.
  • Reverse chronological order: Most recent messages first, supporting the workflow pattern where agents review today's messages immediately.
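The epoch arithmetic above is easy to sanity-check with sqlite3 alone. The nanosecond value below is invented and no real chat.db is involved:

```shell
# Sanity-check the Cocoa-epoch conversion on an invented nanosecond value.
# 978307200 is the Unix timestamp of 2001-01-01 00:00:00 UTC.
ts=$(sqlite3 :memory: \
  "SELECT datetime(700000000000000000/1000000000 + 978307200, 'unixepoch');")
echo "$ts"   # 700,000,000 seconds ≈ 22.2 years after the 2001 epoch → March 2023
```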

Privacy Guardrails and Integration Points

Two supporting documents enforce the privacy boundary:

  • /Users/cb/.claude/projects/.../memory/feedback_personal_number_privacy.md — An explicit instruction file stating: "Do NOT read SMS to 773-941-2265 under any circumstances. This is a personal line. The business line is 619-986-7344 only."
  • /Users/cb/.claude/projects/.../memory/MEMORY.md — Updated to include the context: "read-sms script exists at ~/bin/read-sms. It queries Messages.db for 619-986-7344 conversations only. It is safe to run when CB directs SMS review."

This pattern leverages the agent's context memory to prevent accidental privacy violations. The agent reads these files before taking substantive action and has explicit awareness of what is and isn't safe to query.
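The memory-file guardrail can also be mirrored in code. The sketch below is hypothetical—the check_line helper and allow-list variable are not part of the current read-sms script—but it shows how a future parameterized variant could refuse non-approved numbers before any query runs:

```shell
# Hypothetical code-level guardrail: an allow-list of business lines.
# The personal line (773-941-2265) is deliberately absent.
ALLOWED_LINES="6199867344"

check_line() {
  case " $ALLOWED_LINES " in
    *" $1 "*) return 0 ;;   # approved business line
    *) echo "refusing: $1 is not an approved business line" >&2
       return 1 ;;
  esac
}

check_line 6199867344 && echo "619-986-7344: ok to query"
check_line 7739412265 || echo "773-941-2265: blocked"
```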

Integration with Operational Workflows

The SMS pipeline feeds into existing systems:

  • Booking verification: When checking if charter guests have paid (as happened on Apr 29 for the ash scattering event), the agent can cross-reference SMS conversations with Stripe transaction records and JADA booking sheets.
  • Calendar alignment: The script's reverse-chronological output pairs naturally with queries like "get today's JADA calendar events via Lambda" — SMS can be reviewed for the same day's activities.
  • Contact routing: By querying chat_identifier, we can later expand to handle multiple business lines (e.g., separate dispatch, customer service numbers) without creating separate databases.

What's Next

Current implementation is functional but deliberately minimal. Potential enhancements:

  • Automated daily export: A LaunchAgent could run read-sms on schedule and write to a dated file (e.g., ~/sms-archive/2024-04-29.txt), creating a searchable archive.
  • Structured JSON output: Replacing CSV with JSON would make it easier for agents to parse metadata (sender, timestamp) separately from message content.
  • Attachment detection: The current query doesn't capture MMS attachments. Extending it to flag messages with media would require joining through the attachment and message_attachment_join tables and handling binary data.
  • Multi-number support: If other business lines are added, parameterizing the phone number would let a single script serve multiple lines with different privacy rules.
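For the JSON enhancement in particular, recent sqlite3 builds (3.33+) support it natively via .mode json—the same dot-command dropped into read-sms would apply to the chat.db query. A minimal sketch against a throwaway in-memory table:

```shell
# Demonstrate sqlite3's built-in JSON output mode (requires sqlite3 3.33+).
# Toy table and data; no real chat.db involved.
json=$(sqlite3 :memory: <<'EOF'
.mode json
CREATE TABLE demo (sender TEXT, body TEXT);
INSERT INTO demo VALUES ('+16199867344', 'Running 10 minutes late');
SELECT * FROM demo;
EOF
)
echo "$json"   # a JSON array of row objects, one key per column
```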

The core principle driving this design: minimal, auditable, local-first infrastructure with explicit privacy boundaries. No cloud sync, no API keys, no monthly charges. Just SQLite, bash, and clear rules about what can and cannot be accessed.