```html

Automating GetMyBoat Lead Ingestion with Playwright: Browser Automation for SPA Navigation and Email Pipeline Integration

When manual lead tracking across multiple platforms becomes a bottleneck, browser automation can take over the repetitive steps. This post documents the approach we took to automate GetMyBoat inbox scraping, parse conversation threads into a structured pipeline, and integrate the results into our warm-lead workflow via email delivery and persistent reporting.

The Problem: Manual GetMyBoat Inbox Review

GetMyBoat is a single-page application (SPA) that requires authentication and presents vessel inquiries in a conversational UI. Historically, this meant manual review of each thread, copy-paste into spreadsheets, and email handoffs. With Carole managing dozens of inbound leads, we needed a repeatable, automated approach that could:

  • Log in to the GetMyBoat account (carole@sailjada.com) programmatically
  • Navigate the SPA to the owner inbox without triggering bot detection
  • Parse conversation threads into structured JSON (thread ID, guest name, message count, pricing data)
  • Extract JADA pricing references from conversation content
  • Generate a markdown report and email it to stakeholders
  • Maintain a persistent browser profile to avoid repeated authentication

Technical Architecture: Playwright + Persistent Profiles

We chose Playwright over Selenium for its built-in auto-waiting, which suits SPAs that render content after navigation, and for its native support for persistent browser contexts. The stack:

  • Playwright (browser automation) — Python bindings
  • Chromium — headless (and headed for debugging)
  • Gmail API — for report delivery
  • JSON/Markdown — structured data output

All scripts were written into /tmp/gmb_*.py files during development, with final persistent state stored in the JADA ops directory.

Implementation: Four Key Scripts

gmb_login.py — Headed Authentication with Profile Persistence

The first challenge was authentication. GetMyBoat uses a modern login form and client-side session management. Rather than scrape session tokens, we took a "headed login" approach:

python /tmp/gmb_login.py \
  --creds-env GMB_EMAIL,GMB_PASSWORD \
  --profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
  --headed

This script:

  • Launches a visible Chromium window
  • Navigates to https://www.getmyboat.com/login
  • Waits for the email input field and types credentials from environment variables
  • Submits the form and waits for successful redirect (checks for the inbox URL pattern)
  • Saves the profile (cookies, local storage, IndexedDB) to --profile-dir

The key insight: GetMyBoat's SPA stores session state in Chromium's storage layers, not just HTTP cookies. By persisting the entire profile, subsequent runs can reuse the logged-in state without re-authenticating.
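A minimal sketch of the headed login flow, assuming Playwright's sync API. The form selectors and the `**/owner/inbox**` URL glob are guesses for illustration and are not taken from GetMyBoat's markup; `read_credentials` and `headed_login` are hypothetical names, not the contents of gmb_login.py.

```python
import os


def read_credentials(env_spec, environ=None):
    """Resolve an 'EMAIL_VAR,PASSWORD_VAR' spec (the --creds-env value) to values."""
    environ = os.environ if environ is None else environ
    names = [name.strip() for name in env_spec.split(",")]
    if len(names) != 2:
        raise ValueError("expected two comma-separated variable names")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return environ[names[0]], environ[names[1]]


def headed_login(profile_dir, email, password, timeout_ms=60_000):
    """Log in once in a visible window; Chromium keeps the session in profile_dir."""
    # Imported lazily so the credential helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # A persistent context keeps cookies, local storage, and IndexedDB on disk.
        context = p.chromium.launch_persistent_context(profile_dir, headless=False)
        page = context.pages[0] if context.pages else context.new_page()
        page.goto("https://www.getmyboat.com/login")
        # Assumed selectors; adjust them to the real login form.
        page.fill('input[type="email"]', email)
        page.fill('input[type="password"]', password)
        page.click('button[type="submit"]')
        # Assumed URL pattern for a successful redirect to the inbox.
        page.wait_for_url("**/owner/inbox**", timeout=timeout_ms)
        context.close()
```

Because the context is persistent, there is no explicit "save" call: closing the context leaves the logged-in state in the profile directory for later runs.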

gmb_inbox.py — SPA Navigation and Thread Enumeration

With a persistent profile, we can now navigate directly to the inbox:

python /tmp/gmb_inbox.py \
  --profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
  --output /Users/cb/Documents/repos/jada-ops/gmb-inbox-threads.json

This script:

  • Loads the saved profile (avoiding login)
  • Navigates to https://www.getmyboat.com/owner/inbox
  • Waits for the thread list to render (targets the conversation panel container)
  • Extracts thread metadata: thread ID, guest name, date, message count, vessel name, inquiry price
  • Writes a JSON array of all visible threads to the --output file
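Text scraped from the thread list arrives as display strings, so it helps to normalize it before writing JSON. The sketch below assumes each raw row is a dict of strings keyed by the metadata fields listed above; `ThreadSummary`, `parse_price`, `normalize_thread`, and `write_threads` are hypothetical names, and the input shape is an assumption rather than what gmb_inbox.py actually produces.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ThreadSummary:
    thread_id: str
    guest_name: str
    date: str
    message_count: int
    vessel_name: str
    inquiry_price: Optional[float]


def parse_price(text):
    """Turn '$1,250' into 1250.0; return None when no dollar amount is present."""
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", text or "")
    return float(match.group(1).replace(",", "")) if match else None


def normalize_thread(raw):
    """Convert one raw row of scraped strings into a typed record."""
    digits = re.sub(r"\D", "", raw.get("message_count", ""))
    return ThreadSummary(
        thread_id=raw["thread_id"],
        guest_name=raw.get("guest_name", "").strip(),
        date=raw.get("date", "").strip(),
        message_count=int(digits) if digits else 0,
        vessel_name=raw.get("vessel_name", "").strip(),
        inquiry_price=parse_price(raw.get("inquiry_price")),
    )


def write_threads(raw_rows, output_path):
    """Write the normalized threads as a JSON array and return them."""
    threads = [asdict(normalize_thread(row)) for row in raw_rows]
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(threads, fh, indent=2)
    return threads
```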

SPA navigation required careful wait conditions. The inbox uses dynamic content loading, so we explicitly wait for:

  • page.wait_for_selector('.conversation-list-item'): confirms that at least one thread row has rendered
  • page.wait_for_load_state('networkidle'): waits until there have been no network connections for at least 500 ms
  • Fallback timeout (30 seconds) to catch stalled loads
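The three wait conditions can be combined into one helper. This is a sketch under assumptions: `wait_for_inbox` is a hypothetical name, and treating a `networkidle` timeout as non-fatal is our own design choice here (a busy SPA may keep connections open indefinitely), not something the original script is known to do.

```python
try:
    from playwright.sync_api import TimeoutError as PlaywrightTimeout
except ImportError:  # lets the helper be imported where Playwright is absent
    PlaywrightTimeout = TimeoutError


def wait_for_inbox(page, selector=".conversation-list-item", timeout_ms=30_000):
    """Return True once the thread list has rendered, False on a stalled load."""
    try:
        # Wait for at least one list item to become visible.
        page.wait_for_selector(selector, timeout=timeout_ms)
    except PlaywrightTimeout:
        return False
    try:
        # Give in-flight requests a chance to settle. A timeout here is
        # tolerated because the list itself has already rendered.
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
    except PlaywrightTimeout:
        pass
    return True
```

A caller can then retry or log the failure when the function returns False instead of letting the timeout exception end the run.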

gmb_scrape.py — Message Parsing and JADA Pricing Extraction

Once we have a thread list, we iterate over each thread and extract the full conversation:

python /tmp/gmb_scrape.py \
  --profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
  --threads-json /Users/cb/Documents/repos/jada-ops/gmb-inbox-threads.json \
  --output /Users/cb/Documents/repos/jada-ops/gmb-full-conversations.json

For each thread, we:

  • Click the thread in the SPA (triggers message panel to load)
  • Wait for the message list to render
  • Parse each message: sender, timestamp, body text
  • Search message bodies for JADA pricing patterns (e.g., "$X,XXX/day", "JADA Charter")
  • Store structured output: { thread_id, guest_name, messages: [{ sender, timestamp, body, has_jada_pricing }], total_value_mentioned }
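The click, wait, and parse steps might look like the sketch below. The `data-thread-id` attribute and the `.message`, `.sender`, `.timestamp`, and `.body` class names are assumptions about the markup, and `scrape_thread` is a hypothetical name; only the Playwright calls (`click`, `wait_for_selector`, `eval_on_selector_all`) are real API. Pricing detection is left out of this step.

```python
# JavaScript evaluated in the page: map each message element to a plain object.
# The class names below are assumed, not taken from GetMyBoat's markup.
EXTRACT_MESSAGES_JS = """
els => els.map(el => ({
  sender: (el.querySelector('.sender') || {}).innerText || '',
  timestamp: (el.querySelector('.timestamp') || {}).innerText || '',
  body: (el.querySelector('.body') || {}).innerText || ''
}))
"""


def scrape_thread(page, thread, message_selector=".message", timeout_ms=30_000):
    """Open one thread in the SPA and return its messages as a structured record."""
    thread_id = thread["thread_id"]
    # Assumed attribute for locating the thread's row in the conversation list.
    page.click('[data-thread-id="%s"]' % thread_id)
    # Note: if the previous thread's messages are still in the DOM, this wait
    # returns immediately; a stricter check would wait for a thread-specific marker.
    page.wait_for_selector(message_selector, timeout=timeout_ms)
    raw = page.eval_on_selector_all(message_selector, EXTRACT_MESSAGES_JS)
    messages = [
        {key: (m.get(key) or "").strip() for key in ("sender", "timestamp", "body")}
        for m in raw
    ]
    return {
        "thread_id": thread_id,
        "guest_name": thread.get("guest_name", ""),
        "messages": messages,
    }
```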

The pricing extraction uses regex patterns derived from historical GetMyBoat inquiries:

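PRICING_PATTERNS = [
  r'\$[\d,]+(?:/day|/night|/week|/month)',  # explicit rate, e.g. "$1,200/day"
  r'JADA\s+(?:Charter|Pricing)',            # direct mention of JADA charter or pricing
  r'(?:interested|quote|booking)\s+.*\$'    # intent keyword later followed by a dollar sign
]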
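To show how these patterns could feed the `has_jada_pricing` and `total_value_mentioned` fields, here is a small sketch. The patterns are copied from the list above; case-insensitive matching and summing every dollar amount in the thread are our assumptions, since the post does not say how the total is computed. `has_jada_pricing`, `dollar_amounts`, and `annotate_thread` are hypothetical helper names.

```python
import re

PRICING_PATTERNS = [
    r'\$[\d,]+(?:/day|/night|/week|/month)',
    r'JADA\s+(?:Charter|Pricing)',
    r'(?:interested|quote|booking)\s+.*\$',
]
# Assumption: match case-insensitively so "jada charter" also counts.
COMPILED = [re.compile(p, re.IGNORECASE) for p in PRICING_PATTERNS]
# Captures the numeric part of amounts such as "$1,200" or "$950.50".
AMOUNT = re.compile(r'\$(\d[\d,]*(?:\.\d+)?)')


def has_jada_pricing(body):
    """True when any pricing pattern appears anywhere in the message body."""
    return any(p.search(body) for p in COMPILED)


def dollar_amounts(body):
    """Return every dollar amount in the body as a float."""
    return [float(m.replace(",", "")) for m in AMOUNT.findall(body)]


def annotate_thread(record):
    """Flag each message and total the amounts; modifies record in place."""
    total = 0.0
    for message in record["messages"]:
        message["has_jada_pricing"] = has_jada_pricing(message["body"])
        total += sum(dollar_amounts(message["body"]))
    record["total_value_mentioned"] = total
    return record
```

Summing every amount double-counts a rate that is quoted twice in the same thread, so the total is best read as a rough signal of deal size rather than a booking value.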

gmb_watch.py — Scheduled Monitoring and Report Generation

The final script ties everything together:

python /tmp/gmb_watch.py \
  --profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
  --output-dir /Users/cb/Documents/repos/jada-ops/reports \
  --email-to c.b.ladd@gmail.com \
  --interval 3600

This script:

  • Runs gmb_inbox.py and gmb_scrape.py in sequence
  • Generates a markdown report with pipeline summary: thread count, total inquiries, hot leads (threads mentioning JADA pricing)
  • Uses the Gmail API to send the report to stakeholders
  • Logs all actions to /Users/cb/Documents/repos/jada-ops/gmb-watch.log
  • Loops on a configurable interval (default: hourly)
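The reporting half of this script can be sketched with the standard library alone. The report layout, subject line, and function names (`build_report`, `gmail_send_body`, `watch`) are illustrative assumptions; the sketch reports thread count and hot leads only. The one Gmail-specific fact relied on is that `users.messages.send` accepts a body of the form `{"raw": <base64url-encoded RFC 2822 message>}`.

```python
import base64
import time
from email.message import EmailMessage


def build_report(conversations):
    """Render a markdown summary; hot leads are threads with a pricing mention."""
    hot = [
        c for c in conversations
        if any(m.get("has_jada_pricing") for m in c["messages"])
    ]
    lines = [
        "# GetMyBoat Pipeline Report",
        "",
        f"- Threads: {len(conversations)}",
        f"- Hot leads: {len(hot)}",
        "",
        "## Hot leads",
        "",
    ]
    lines += [f"- {c['guest_name']} ({c['thread_id']})" for c in hot] or ["- none"]
    return "\n".join(lines) + "\n"


def gmail_send_body(report_md, to_addr, subject="GetMyBoat pipeline report"):
    """Build the request body for Gmail's users.messages.send."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(report_md)
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")}


def watch(run_once, interval_s=3600, max_runs=None, sleep=time.sleep):
    """Call run_once() every interval_s seconds; max_runs=None loops forever."""
    runs = 0
    while max_runs is None or runs < max_runs:
        run_once()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_s)
    return runs
```

With an authorized client from google-api-python-client, sending would then be `service.users().messages().send(userId="me", body=gmail_send_body(report, to_addr)).execute()`. Passing `sleep` and `max_runs` as parameters keeps the loop testable without waiting an hour per iteration.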