Automating GetMyBoat Lead Ingestion with Playwright: Browser Automation for SPA Navigation and Email Pipeline Integration
When manual lead tracking across multiple platforms becomes a bottleneck, browser automation offers a compelling solution. This post documents the approach we took to automate GetMyBoat inbox scraping, parse conversation threads into a structured pipeline, and integrate the results into our warm-lead workflow via email delivery and persistent reporting.
The Problem: Manual GetMyBoat Inbox Review
GetMyBoat is a single-page application (SPA) that requires authentication and presents vessel inquiries in a conversational UI. Historically, this meant manual review of each thread, copy-paste into spreadsheets, and email handoffs. With Carole managing dozens of inbound leads, we needed a repeatable, automated approach that could:
- Log in to the GetMyBoat account (carole@sailjada.com) programmatically
- Navigate the SPA to the owner inbox without triggering bot detection
- Parse conversation threads into structured JSON (thread ID, guest name, message count, pricing data)
- Extract JADA pricing references from conversation content
- Generate a markdown report and email it to stakeholders
- Maintain a persistent browser profile to avoid repeated authentication
Technical Architecture: Playwright + Persistent Profiles
We chose Playwright over Selenium for two reasons. Its actions auto-wait for elements to become actionable, which suits a dynamically rendered SPA. It also supports persistent browser contexts natively. The stack:
- Playwright (browser automation) — Python bindings
- Chromium — headless (and headed for debugging)
- Gmail API — for report delivery
- JSON/Markdown — structured data output
During development, all scripts lived in /tmp/gmb_*.py files. Persistent state (the browser profile, JSON output, reports, and the watch log) is stored in the JADA ops directory, /Users/cb/Documents/repos/jada-ops.
Implementation: Four Key Scripts
gmb_login.py — Headed Authentication with Profile Persistence
The first challenge was authentication. GetMyBoat uses a modern login form and client-side session management. Rather than scrape session tokens, we took a "headed login" approach:
python /tmp/gmb_login.py \
--creds-env GMB_EMAIL,GMB_PASSWORD \
--profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
--headed
This script:
- Launches a visible Chromium window
- Navigates to https://www.getmyboat.com/login
- Waits for the email input field and types credentials from environment variables
- Submits the form and waits for a successful redirect (checks for the inbox URL pattern)
- Saves the profile (cookies, local storage, IndexedDB) to --profile-dir
The key insight: GetMyBoat's SPA stores session state in Chromium's storage layers, not just HTTP cookies. By persisting the entire profile, subsequent runs can reuse the logged-in state without re-authenticating.
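Below is a minimal sketch of the headed login flow, assuming Playwright's sync API. The login URL, the environment variable names, and the inbox redirect check come from the description above. Everything else is hypothetical and not taken from gmb_login.py: the form selectors (input[type="email"], input[type="password"], button[type="submit"]) and the function names read_credentials and headed_login.

```python
"""Headed login sketch; form selectors are hypothetical, not from gmb_login.py."""
import os


def read_credentials(env_spec: str) -> tuple:
    """Resolve a spec like 'GMB_EMAIL,GMB_PASSWORD' into (email, password)."""
    names = [name.strip() for name in env_spec.split(",")]
    if len(names) != 2:
        raise ValueError("expected exactly two variable names: EMAIL_VAR,PASSWORD_VAR")
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise KeyError(f"unset environment variables: {', '.join(missing)}")
    return os.environ[names[0]], os.environ[names[1]]


def headed_login(profile_dir: str, env_spec: str = "GMB_EMAIL,GMB_PASSWORD") -> None:
    # Imported lazily so the credential helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    email, password = read_credentials(env_spec)
    with sync_playwright() as p:
        # A persistent context keeps cookies, localStorage and IndexedDB in
        # profile_dir, so later runs can start already logged in.
        context = p.chromium.launch_persistent_context(profile_dir, headless=False)
        page = context.new_page()
        page.goto("https://www.getmyboat.com/login")
        page.fill('input[type="email"]', email)        # hypothetical selector
        page.fill('input[type="password"]', password)  # hypothetical selector
        page.click('button[type="submit"]')            # hypothetical selector
        # Treat a redirect to the owner inbox as proof that login succeeded.
        page.wait_for_url("**/owner/inbox**", timeout=60_000)
        context.close()
```

Calling headed_login("/Users/cb/Documents/repos/jada-ops/gmb-profile") leaves a profile directory that later scripts can pass to launch_persistent_context. Importing Playwright inside the function keeps the credential parsing testable on machines without a browser.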
gmb_inbox.py — SPA Navigation and Thread Enumeration
With a persistent profile, we can now navigate directly to the inbox:
python /tmp/gmb_inbox.py \
--profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
--output /Users/cb/Documents/repos/jada-ops/gmb-inbox-threads.json
This script:
- Loads the saved profile (avoiding login)
- Navigates to https://www.getmyboat.com/owner/inbox
- Waits for the thread list to render (targets the conversation panel container)
- Extracts thread metadata: thread ID, guest name, date, message count, vessel name, inquiry price
- Returns a JSON array of all visible threads
SPA navigation required careful wait conditions. The inbox uses dynamic content loading, so we explicitly wait for:
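- page.wait_for_selector('.conversation-list-item'): ensures the thread list has rendered
- page.wait_for_load_state('networkidle'): waits until there have been no network connections for at least 500 ms, so in-flight XHR requests have settled
- A fallback timeout (30 seconds) to catch stalled loads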
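In Playwright's sync API, that wait sequence might look like the sketch below. Only the .conversation-list-item selector and the inbox URL come from the original. The following are hypothetical: the data-thread-id attribute, the child selectors (.guest-name, .thread-date, .vessel-name, .message-count, .inquiry-price), and the helper names normalize_thread, first_text and enumerate_threads.

```python
"""Inbox enumeration sketch; child selectors and attributes are hypothetical."""
import json
import re

INBOX_URL = "https://www.getmyboat.com/owner/inbox"
PRICE_RE = re.compile(r"\$(\d[\d,]*(?:\.\d{2})?)")


def normalize_thread(raw: dict) -> dict:
    """Turn the raw strings scraped from one list item into typed fields."""
    count_match = re.search(r"\d+", raw.get("count_text") or "")
    price_match = PRICE_RE.search(raw.get("price_text") or "")
    return {
        "thread_id": raw["thread_id"],
        "guest_name": (raw.get("guest_name") or "").strip(),
        "date": (raw.get("date_text") or "").strip(),
        "vessel_name": (raw.get("vessel_name") or "").strip(),
        "message_count": int(count_match.group()) if count_match else 0,
        "inquiry_price": float(price_match.group(1).replace(",", "")) if price_match else None,
    }


def first_text(locator) -> str:
    # all_inner_texts() returns [] instead of waiting when nothing matches.
    texts = locator.all_inner_texts()
    return texts[0] if texts else ""


def enumerate_threads(profile_dir: str, output_path: str) -> list:
    from playwright.sync_api import sync_playwright  # lazy: helpers above need no browser

    with sync_playwright() as p:
        context = p.chromium.launch_persistent_context(profile_dir, headless=True)
        page = context.new_page()
        page.goto(INBOX_URL)
        # Wait 1: at least one thread row is in the DOM (30 s cap for stalled loads).
        page.wait_for_selector(".conversation-list-item", timeout=30_000)
        # Wait 2: no network connections for 500 ms, so lazy XHRs have settled.
        page.wait_for_load_state("networkidle", timeout=30_000)
        threads = []
        for item in page.locator(".conversation-list-item").all():
            raw = {
                "thread_id": item.get_attribute("data-thread-id"),  # hypothetical attribute
                "guest_name": first_text(item.locator(".guest-name")),
                "date_text": first_text(item.locator(".thread-date")),
                "vessel_name": first_text(item.locator(".vessel-name")),
                "count_text": first_text(item.locator(".message-count")),
                "price_text": first_text(item.locator(".inquiry-price")),
            }
            threads.append(normalize_thread(raw))
        context.close()
    with open(output_path, "w") as fh:
        json.dump(threads, fh, indent=2)
    return threads
```

Playwright's documentation discourages relying on 'networkidle' alone. That is one reason this sketch treats the selector wait as the primary readiness signal.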
gmb_scrape.py — Message Parsing and JADA Pricing Extraction
Once we have a thread list, we iterate over each thread and extract the full conversation:
python /tmp/gmb_scrape.py \
--profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
--threads-json /Users/cb/Documents/repos/jada-ops/gmb-inbox-threads.json \
--output /Users/cb/Documents/repos/jada-ops/gmb-full-conversations.json
For each thread, we:
- Click the thread in the SPA (triggers message panel to load)
- Wait for the message list to render
- Parse each message: sender, timestamp, body text
- Search message bodies for JADA pricing patterns (e.g., "$X,XXX/day", "JADA Charter")
- Store structured output:
{ thread_id, guest_name, messages: [{ sender, timestamp, body, has_jada_pricing }], total_value_mentioned }
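One way to pin down that output shape in Python is a pair of dataclasses. This is a sketch: the class names Message and Conversation and the is_hot_lead property are hypothetical. The field names mirror the structure above. Timestamps stay plain strings because their on-page format isn't documented here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Message:
    sender: str
    timestamp: str  # raw string from the UI; no format is assumed
    body: str
    has_jada_pricing: bool = False


@dataclass
class Conversation:
    thread_id: str
    guest_name: str
    messages: list = field(default_factory=list)  # list of Message
    total_value_mentioned: float = 0.0

    @property
    def is_hot_lead(self) -> bool:
        # "Hot" here means at least one message matched a JADA pricing pattern.
        return any(m.has_jada_pricing for m in self.messages)

    def to_json(self) -> str:
        # asdict() recurses into the nested Message dataclasses.
        return json.dumps(asdict(self), indent=2)
```

Serializing a list of these with json.dumps([asdict(c) for c in conversations]) would produce an array in the shape described for gmb-full-conversations.json.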
The pricing extraction uses regex patterns derived from historical GetMyBoat inquiries:
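PRICING_PATTERNS = [
    r'\$[\d,]+(?:/day|/night|/week|/month)',  # a rate, e.g. "$2,500/day"
    r'JADA\s+(?:Charter|Pricing)',            # explicit JADA product references
    r'(?:interested|quote|booking)\s+.*\$'    # intent word followed later by a dollar sign
]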
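The next sketch applies those patterns to message bodies. The helper names and the AMOUNT_RE pattern are hypothetical additions. The post doesn't specify how total_value_mentioned is computed, so this version simply sums every dollar figure it finds. As a result, a quote repeated across replies is counted more than once.

```python
import re

# Same three patterns as above, compiled once for reuse across threads.
PRICING_PATTERNS = [
    r'\$[\d,]+(?:/day|/night|/week|/month)',
    r'JADA\s+(?:Charter|Pricing)',
    r'(?:interested|quote|booking)\s+.*\$',
]
COMPILED_PATTERNS = [re.compile(p) for p in PRICING_PATTERNS]

# Hypothetical: captures the whole-dollar part of "$2,500" or "$2,500.00".
AMOUNT_RE = re.compile(r'\$(\d[\d,]*)(?:\.\d{2})?')


def has_jada_pricing(body: str) -> bool:
    """True if any pricing pattern appears anywhere in the body."""
    return any(p.search(body) for p in COMPILED_PATTERNS)


def amounts_mentioned(body: str) -> list:
    """Whole-dollar amounts in order of appearance, commas removed."""
    return [int(m.group(1).replace(',', '')) for m in AMOUNT_RE.finditer(body)]


def total_value_mentioned(bodies) -> int:
    """Naive total: sums every amount across all message bodies."""
    return sum(sum(amounts_mentioned(b)) for b in bodies)
```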
gmb_watch.py — Scheduled Monitoring and Report Generation
The final script ties everything together:
python /tmp/gmb_watch.py \
--profile-dir /Users/cb/Documents/repos/jada-ops/gmb-profile \
--output-dir /Users/cb/Documents/repos/jada-ops/reports \
--email-to c.b.ladd@gmail.com \
--interval 3600
This script:
- Runs gmb_inbox.py and gmb_scrape.py in sequence
- Generates a markdown report with a pipeline summary: thread count, total inquiries, hot leads (threads mentioning JADA pricing)
- Uses the Gmail API to send the report to stakeholders
- Logs all actions to /Users/cb/Documents/repos/jada-ops/gmb-watch.log
- Loops on a configurable interval (default: hourly)
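Here is a compressed sketch of one watch cycle. It rests on three assumptions. First, the two scripts accept the flags shown earlier. Second, service is an already-authorized Gmail API client from google-api-python-client; building it and handling OAuth are out of scope. Third, each conversation follows the structure listed for gmb_scrape.py. The report filename and the sender parameter are hypothetical. The real report's metrics may also differ: this sketch reports a message count rather than guessing how "total inquiries" is defined.

```python
"""Sketch of one gmb_watch.py cycle: scrape, summarize, save, email, sleep."""
import base64
import json
import logging
import subprocess
import sys
import time
from email.message import EmailMessage

OPS = "/Users/cb/Documents/repos/jada-ops"
PROFILE = f"{OPS}/gmb-profile"
THREADS_JSON = f"{OPS}/gmb-inbox-threads.json"
CONVS_JSON = f"{OPS}/gmb-full-conversations.json"


def is_hot(conv: dict) -> bool:
    """A hot lead is a thread with at least one JADA pricing mention."""
    return any(m.get("has_jada_pricing") for m in conv.get("messages", []))


def build_report(conversations: list, generated_at: str) -> str:
    hot = [c for c in conversations if is_hot(c)]
    n_messages = sum(len(c.get("messages", [])) for c in conversations)
    lines = [
        f"# GetMyBoat pipeline report ({generated_at})",
        "",
        f"- Threads: {len(conversations)}",
        f"- Messages: {n_messages}",
        f"- Hot leads (JADA pricing mentioned): {len(hot)}",
        "",
        "## Hot leads",
    ]
    for c in hot:
        value = c.get("total_value_mentioned", 0)
        lines.append(f"- {c['guest_name']} (thread {c['thread_id']}): ${value:,.0f} mentioned")
    if not hot:
        lines.append("- none this run")
    return "\n".join(lines) + "\n"


def gmail_body(report_md: str, to: str, sender: str, subject: str) -> dict:
    msg = EmailMessage()
    msg["To"] = to
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(report_md)
    # users.messages.send expects the RFC 2822 message base64url-encoded in "raw".
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")}


def run_cycle(service, sender: str, email_to: str) -> None:
    # Run the two existing scripts in sequence; check=True stops on failure.
    subprocess.run([sys.executable, "/tmp/gmb_inbox.py", "--profile-dir", PROFILE,
                    "--output", THREADS_JSON], check=True)
    subprocess.run([sys.executable, "/tmp/gmb_scrape.py", "--profile-dir", PROFILE,
                    "--threads-json", THREADS_JSON, "--output", CONVS_JSON], check=True)
    with open(CONVS_JSON) as fh:
        conversations = json.load(fh)
    stamp = time.strftime("%Y-%m-%d %H:%M")
    report = build_report(conversations, stamp)
    # Hypothetical filename scheme inside the --output-dir from the command above.
    with open(f"{OPS}/reports/gmb-report-{time.strftime('%Y%m%d-%H%M')}.md", "w") as fh:
        fh.write(report)
    body = gmail_body(report, email_to, sender, f"GetMyBoat pipeline {stamp}")
    service.users().messages().send(userId="me", body=body).execute()
    logging.info("sent report to %s (%d threads)", email_to, len(conversations))


def watch(service, sender: str, email_to: str, interval: int = 3600) -> None:
    logging.basicConfig(filename=f"{OPS}/gmb-watch.log", level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    while True:
        try:
            run_cycle(service, sender, email_to)
        except Exception:
            # Log the traceback and keep the loop alive for the next interval.
            logging.exception("watch cycle failed")
        time.sleep(interval)
```

Calling watch(service, sender, "c.b.ladd@gmail.com") would match the --email-to and --interval 3600 values in the command above. Catching exceptions per cycle means one failed scrape does not stop the hourly loop.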