Automating GetMyBoat Lead Ingestion with Playwright: Browser Automation for SPA Navigation and Email Pipeline Integration

When manual lead tracking across multiple platforms becomes a bottleneck, browser automation can take over the repetitive steps. This post documents the approach we took to automate GetMyBoat inbox scraping, parse conversation threads into a structured pipeline, and integrate the results into our warm-lead workflow via email delivery and persistent reporting.

The Problem: Manual GetMyBoat Inbox Review

GetMyBoat is a single-page application (SPA) that requires authentication and presents vessel inquiries in a conversational UI. Historically, working these inquiries meant reviewing each thread by hand, copying details into spreadsheets, and handing leads off by email. With Carole managing dozens of inbound leads, we needed a repeatable, automated approach that could:

Log in to the GetMyBoat account (carole@sailjada.com) programmatically
Navigate the SPA to the owner inbox without triggering bot detection
Parse conversation threads into structured JSON (thread ID, guest name, message count, pricing data)
Extract JADA pricing references from conversation content
Generate a markdown report and email it to stakeholders
Maintain a persistent browser profile to avoid repeated authentication

Technical Architecture: Playwright + Persistent Profiles

We chose Playwright over Selenium for its built-in auto-waiting, which suits SPAs that render content after the initial page load, and for its native support for persistent browser contexts. The stack:

Playwright (browser automation) — Python bindings
Chromium — headless (and headed for debugging)
Gmail API — for report delivery
JSON/Markdown — structured data output

All scripts were written into /tmp/gmb_*.py files during development, with final persistent state stored in the JADA ops directory.

Implementation: Four Key Scripts

`gmb_login.py` — Headed Authentication with Profile Persistence

The first challenge was authentication. GetMyBoat uses a modern login form and client-side session management. Rather than scrape session tokens, we took a "headed login" approach:

python /tmp/gmb_login.py \
  --creds-env GMB_EMAIL,GMB_PASSWORD \
  --profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
  --headed

This script:

Launches a visible Chromium window
Navigates to https://www.getmyboat.com/login
Waits for the email input field and types credentials from environment variables
Submits the form and waits for successful redirect (checks for the inbox URL pattern)
Saves the profile (cookies, local storage, IndexedDB) to --profile-dir

The key insight: GetMyBoat's SPA stores session state in Chromium's storage layers, not just HTTP cookies. By persisting the entire profile, subsequent runs can reuse the logged-in state without re-authenticating.
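The headed-login flow can be sketched roughly as follows with Playwright's sync API. This is a minimal sketch rather than the actual gmb_login.py: the form selectors, the `**/owner/inbox**` redirect pattern, and the helper names `read_creds` and `headed_login` are assumptions made for illustration.

```python
import os


def read_creds(env_names: str) -> tuple[str, str]:
    """Resolve 'EMAIL_VAR,PASSWORD_VAR' (the --creds-env format) to their values."""
    email_var, password_var = (name.strip() for name in env_names.split(","))
    return os.environ[email_var], os.environ[password_var]


def headed_login(profile_dir: str, email: str, password: str) -> None:
    """Log in once in a visible window so the session lands in profile_dir."""
    # Imported lazily so read_creds() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # A persistent context keeps cookies, local storage, and IndexedDB on disk.
        context = p.chromium.launch_persistent_context(profile_dir, headless=False)
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://www.getmyboat.com/login")
        # Hypothetical selectors: inspect the real form before relying on them.
        page.fill('input[type="email"]', email)
        page.fill('input[type="password"]', password)
        page.click('button[type="submit"]')
        # Assumed redirect pattern for a successful login.
        page.wait_for_url("**/owner/inbox**", timeout=60_000)
        # Close cleanly so the profile directory is left in a consistent state.
        context.close()
```

Later scripts would call `launch_persistent_context` with the same directory (and `headless=True`) to pick up the saved session.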

`gmb_inbox.py` — SPA Navigation and Thread Enumeration

With a persistent profile, we can now navigate directly to the inbox:

python /tmp/gmb_inbox.py \
  --profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
  --output /Users/cb/Documents/repos/jada-ops/gmb-inbox-threads.json

This script:

Loads the saved profile (avoiding login)
Navigates to https://www.getmyboat.com/owner/inbox
Waits for the thread list to render (targets the conversation panel container)
Extracts thread metadata: thread ID, guest name, date, message count, vessel name, inquiry price
Writes a JSON array of all visible threads to the --output path

SPA navigation required careful wait conditions. The inbox uses dynamic content loading, so we explicitly wait for:

page.wait_for_selector('.conversation-list-item'): ensures the thread list has rendered
page.wait_for_load_state('networkidle'): waits until there have been no network connections for at least 500 ms
A 30-second fallback timeout to catch stalled loads
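The three wait conditions above might be combined as in the sketch below. It assumes `page` is a Playwright sync `Page` opened from the saved profile; the function name `load_thread_items` is hypothetical, and a timeout is simply allowed to propagate as Playwright's `TimeoutError`.

```python
INBOX_URL = "https://www.getmyboat.com/owner/inbox"
THREAD_SELECTOR = ".conversation-list-item"


def load_thread_items(page, timeout_ms: int = 30_000) -> int:
    """Open the inbox and return the number of rendered thread rows.

    `page` is expected to be a Playwright sync `Page`.
    """
    # Stop at DOMContentLoaded; the SPA fills in the list afterwards.
    page.goto(INBOX_URL, wait_until="domcontentloaded")
    # Raises Playwright's TimeoutError if no row is visible within timeout_ms.
    page.wait_for_selector(THREAD_SELECTOR, state="visible", timeout=timeout_ms)
    # Then wait for network activity to settle, under the same time budget.
    page.wait_for_load_state("networkidle", timeout=timeout_ms)
    return page.locator(THREAD_SELECTOR).count()
```

Waiting on the selector first keeps the check tied to what the script actually reads. The `networkidle` wait is a secondary guard, since background polling on some pages can keep the network from ever going quiet.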

`gmb_scrape.py` — Message Parsing and JADA Pricing Extraction

Once we have a thread list, we iterate over each thread and extract the full conversation:

python /tmp/gmb_scrape.py \
  --profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
  --threads-json /Users/cb/Documents/repos/jada-ops/gmb-inbox-threads.json \
  --output /Users/cb/Documents/repos/jada-ops/gmb-full-conversations.json

For each thread, we:

Click the thread in the SPA (triggers message panel to load)
Wait for the message list to render
Parse each message: sender, timestamp, body text
Search message bodies for JADA pricing patterns (e.g., "$X,XXX/day", "JADA Charter")
Store structured output: { thread_id, guest_name, messages: [{ sender, timestamp, body, has_jada_pricing }], total_value_mentioned }
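One way to pin down the structured output described above is a pair of dataclasses. The field names follow the shape listed in the last step; the field types, the defaults, and the `to_json` helper are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Message:
    sender: str
    timestamp: str  # kept as displayed in the UI; the exact format is an assumption
    body: str
    has_jada_pricing: bool = False


@dataclass
class ThreadRecord:
    thread_id: str
    guest_name: str
    messages: list[Message] = field(default_factory=list)
    total_value_mentioned: float = 0.0


def to_json(records: list[ThreadRecord]) -> str:
    """Serialize records into a JSON array suitable for the --output file."""
    # asdict() recurses into the nested Message dataclasses.
    return json.dumps([asdict(r) for r in records], indent=2)
```

Keeping the schema in one place means the scraper and the report generator agree on key names without passing loose dictionaries around.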

The pricing extraction uses regex patterns derived from historical GetMyBoat inquiries:

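PRICING_PATTERNS = [
    r'\$[\d,]+(?:/day|/night|/week|/month)',  # rate quotes such as "$1,500/day"
    r'JADA\s+(?:Charter|Pricing)',            # explicit JADA references
    r'(?:interested|quote|booking)\s+.*\$',   # intent keyword, then a "$" later on the same line
]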
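The sketch below shows how these patterns might be applied to message bodies. The patterns are copied from the list above; the `has_jada_pricing` and `total_value_mentioned` helpers, and the choice to compute the total as a plain sum of every dollar amount found, are assumptions, since the post does not say how that field is derived.

```python
import re

PRICING_PATTERNS = [
    r'\$[\d,]+(?:/day|/night|/week|/month)',
    r'JADA\s+(?:Charter|Pricing)',
    r'(?:interested|quote|booking)\s+.*\$',
]
_COMPILED = [re.compile(p) for p in PRICING_PATTERNS]

# Assumed helper pattern: a dollar sign followed by digits, commas, optional cents.
_AMOUNT = re.compile(r'\$(\d[\d,]*(?:\.\d+)?)')


def has_jada_pricing(body: str) -> bool:
    """True if any pricing pattern matches anywhere in the message body."""
    return any(p.search(body) for p in _COMPILED)


def total_value_mentioned(bodies: list[str]) -> float:
    """Sum every dollar amount found across the given message bodies."""
    total = 0.0
    for body in bodies:
        for raw in _AMOUNT.findall(body):
            total += float(raw.replace(",", ""))
    return total
```

Note that the patterns are case-sensitive as written, so "jada charter" in lowercase would not match unless `re.IGNORECASE` were added.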

`gmb_watch.py` — Real-Time Monitoring and Report Generation

The final script ties everything together, polling the inbox on a fixed interval:

python /tmp/gmb_watch.py \
  --profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
  --output-dir /Users/cb/Documents/repos/jada-ops/reports \
  --email-to c.b.ladd@gmail.com \
  --interval 3600

This script:

Runs gmb_inbox.py and gmb_scrape.py in sequence
Generates a markdown report with pipeline summary: thread count, total inquiries, hot leads (threads mentioning JADA pricing)
Uses the Gmail API to send the report to stakeholders
Logs all actions to /Users/cb/Documents/repos/jada-ops/gmb-watch.log
Loops on a configurable interval (default: hourly)
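The report and delivery steps could look something like the sketch below. It assumes the conversation dictionaries have the shape produced by the scrape step; `build_report` and `gmail_payload` are hypothetical helper names, and the report layout is illustrative. Since the post does not define "total inquiries", the sketch reports only the thread count and hot leads.

```python
import base64
from email.message import EmailMessage


def build_report(conversations: list[dict]) -> str:
    """Render a markdown pipeline summary from scraped thread dicts."""
    # A hot lead is a thread where at least one message mentions JADA pricing.
    hot = [
        c for c in conversations
        if any(m.get("has_jada_pricing") for m in c.get("messages", []))
    ]
    lines = [
        "# GetMyBoat Pipeline Report",
        "",
        f"- Threads: {len(conversations)}",
        f"- Hot leads: {len(hot)}",
        "",
        "## Hot leads",
        "",
    ]
    lines += [f"- {c['guest_name']} (thread {c['thread_id']})" for c in hot]
    return "\n".join(lines) + "\n"


def gmail_payload(report_md: str, to_addr: str, subject: str) -> dict:
    """Wrap the report in an RFC 822 message, base64url-encoded for the Gmail API."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(report_md)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"raw": raw}
```

With an authorized google-api-python-client service object, the returned dictionary would be passed as `body` to `service.users().messages().send(userId="me", body=payload).execute()`. The hourly loop can then be a `while True` that runs the pipeline and calls `time.sleep(interval)`.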