Automating GetMyBoat Lead Ingestion: Browser Automation, Session Persistence, and Warm Lead Routing
When a battery shutdown interrupted six hours of distributed development work across five Claude Code sessions, the GetMyBoat warm lead automation pipeline was mid-implementation. This post reconstructs the technical approach, the blockers encountered, and the infrastructure pattern we're building to transform manual inbox triage into a persistent, headless lead-scoring system.
The Problem: Manual Lead Qualification on GetMyBoat
Carole manages charter inquiries through GetMyBoat's platform inbox. Each inquiry arrives as a notification email to carole@sailjada.com, but the qualification workflow—determining deposit status, availability, and match quality—requires logging into GetMyBoat, navigating the SPA, and manually reading lead details. The goal: automate this triage by extracting structured lead data, scoring warm prospects, and routing replies back to Carole with context.
Architecture: Playwright, Persistent Profiles, and Read-Only Extraction
Our implementation uses three layers:
- Authentication layer: Playwright headless browser with a persistent user profile stored at /tmp/gmb_profile, eliminating the need to re-authenticate on each run.
- Extraction layer: Read-only SPA navigation to the inbox URL (discovered via /tmp/gmb_explore.py) without mutating GetMyBoat state.
- Routing layer: Structured lead JSON routed to a warm-lead responder (existing Carole prompt infrastructure) for draft reply generation.
This three-layer design isolates concerns: login complexity doesn't bleed into extraction logic, and extraction doesn't risk accidental mutation of lead status or messages.
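As a rough illustration of the authentication layer, a persistent-profile session could be opened along these lines. This is a minimal sketch, not the actual contents of gmb_session.py: the function names launch_kwargs, open_session, and close_session are hypothetical, and only Playwright's launch_persistent_context call is taken from the library itself.

```python
from pathlib import Path


def launch_kwargs(profile_dir: str, headless: bool = True) -> dict:
    """Build the arguments for a persistent Chromium context (hypothetical helper)."""
    path = Path(profile_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)  # create the profile directory if missing
    return {"user_data_dir": str(path), "headless": headless}


def open_session(profile_dir: str, headless: bool = True):
    """Return (playwright, context, page) backed by the on-disk profile."""
    # Imported lazily so launch_kwargs can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    pw = sync_playwright().start()
    context = pw.chromium.launch_persistent_context(**launch_kwargs(profile_dir, headless))
    # Reuse the context's initial page if one exists, otherwise open a new one.
    page = context.pages[0] if context.pages else context.new_page()
    return pw, context, page


def close_session(pw, context) -> None:
    """Close the context first, then stop the Playwright driver."""
    context.close()
    pw.stop()
```

Because cookies and local storage live in the profile directory rather than in the script, the extraction layer can call open_session with the same path and never touch login code.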
Technical Implementation: Files and Flow
Four Python scripts were created in /tmp/ during development:
- /tmp/gmb_session.py: Core session manager. Initializes Playwright context with persistent profile, handles browser launch/teardown, exposes page object for navigation.
- /tmp/gmb_login.py: Headed login workflow. Launches a visible Chromium window, navigates to https://getmyboat.com/login, waits for user credential entry or credential injection via environment, validates post-login state (checks for dashboard elements).
- /tmp/gmb_explore.py: SPA navigation discovery. Once logged in, programmatically traverses GetMyBoat's navigation menu to isolate the inbox URL and inbox container selectors, then writes discovered URLs to stdout.
- /tmp/gmb_lead_scan.py: Read-only lead extraction. Given an inbox URL and CSS selectors, queries the DOM for lead list items, extracts structured fields (inquiry date, lead name, vessel type, availability window), and outputs JSON.
Invocation pattern:
# Session init (one-time or when profile expires)
python /tmp/gmb_session.py --action init --profile-dir /tmp/gmb_profile
# Headed login (interactive, stores persistent session)
python /tmp/gmb_login.py \
--profile-dir /tmp/gmb_profile \
--email carole@sailjada.com \
--headless false
# Explore SPA to find inbox URL
python /tmp/gmb_explore.py \
--profile-dir /tmp/gmb_profile \
--output /tmp/gmb_inbox_url.txt
# Read-only lead scan (idempotent, no mutations)
python /tmp/gmb_lead_scan.py \
--profile-dir /tmp/gmb_profile \
--inbox-url "$(cat /tmp/gmb_inbox_url.txt)" \
--output /tmp/gmb_leads.json
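The JSON written by the lead scan could take a shape like the following. This is a sketch under assumptions: the four fields come from the description of gmb_lead_scan.py above, but the snake_case key names, the Lead class, and the normalize and leads_to_json helpers are illustrative rather than the script's real schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Lead:
    """One inbox entry; key names are assumed, not taken from the real script."""
    inquiry_date: str
    lead_name: str
    vessel_type: str
    availability_window: str


def normalize(raw: dict) -> Lead:
    """Collapse whitespace in each field and default missing ones to ''."""
    def clean(key: str) -> str:
        return " ".join(str(raw.get(key) or "").split())

    return Lead(
        inquiry_date=clean("inquiry_date"),
        lead_name=clean("lead_name"),
        vessel_type=clean("vessel_type"),
        availability_window=clean("availability_window"),
    )


def leads_to_json(raw_items: list) -> str:
    """Serialize scraped rows as a JSON array of normalized lead objects."""
    return json.dumps([asdict(normalize(item)) for item in raw_items], indent=2)
```

Normalizing before serialization keeps DOM quirks such as stray newlines inside a name cell out of the data the responder prompt sees.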
Browser Automation: Playwright and Chromium Matching
Playwright introduced a critical dependency: a venv with both playwright and google-api-client (for Gmail token verification). The Python interpreter discovery process:
- Checked existing venv: ~/venv/lib/python3.11/site-packages already had google-api-client.
- Installed Playwright asynchronously, then synchronously verified the Chromium download: playwright install chromium.
- Tested import and launch: python -c "from playwright.sync_api import sync_playwright; sync_playwright().start()".
- Verified profile persistence by launching headed mode with --user-data-dir=/tmp/gmb_profile, then killing the script mid-session and re-launching to confirm the profile was cached.
This approach avoids re-downloading Chromium on each CI run by baking the venv (and ~/.cache/ms-playwright) into the deployment artifact.
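A quick pre-flight check can confirm that a given interpreter sees both dependencies before any browser is launched. The sketch below uses only the standard library; the helper name missing_modules is hypothetical, and the import name googleapiclient is an assumption about which Google client package the venv actually contains.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level import names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names: "playwright" for Playwright and "googleapiclient"
# for the Google API client; adjust to match the packages in the venv.
REQUIRED = ["playwright", "googleapiclient"]
```

Calling missing_modules(REQUIRED) with the venv's interpreter returns an empty list when both packages are importable, which makes it easy to fail fast in a deployment script.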
Session Persistence and Headed Mode
The key friction point: GetMyBoat's login requires either interactive credential entry (headed mode) or credential injection. We chose headed mode during development to validate the UX, with a plan to inject credentials from environment variables in production:
- Headed mode (development): Browser window visible, user enters credentials manually, session persists to /tmp/gmb_profile.
- Headless mode (production): Credentials injected via page.fill() and page.keyboard.press('Enter'), profile cached across runs.
The persistent profile approach amortizes the login cost: first run is slow (interactive login + inbox discovery), subsequent runs reuse the session cookie and skip authentication entirely.
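One way to express "log in only when the cached session has expired" is sketched below. Everything here is an assumption made for illustration: the GMB_EMAIL and GMB_PASSWORD variable names, the CSS selectors, and the idea that an expired session redirects to a /login path have not been verified against GetMyBoat.

```python
import os
from urllib.parse import urlparse


def needs_login(current_url: str) -> bool:
    """Treat landing on a /login path as a missing or expired session (assumed behavior)."""
    path = urlparse(current_url).path.rstrip("/")
    return path == "/login" or path.startswith("/login/")


def credentials_from_env(env=os.environ):
    """Read credentials from the hypothetical GMB_EMAIL / GMB_PASSWORD variables."""
    email, password = env.get("GMB_EMAIL"), env.get("GMB_PASSWORD")
    if not email or not password:
        raise RuntimeError("GMB_EMAIL and GMB_PASSWORD must both be set")
    return email, password


def ensure_logged_in(page, inbox_url: str) -> None:
    """Navigate to the inbox; inject credentials only if bounced to the login page."""
    page.goto(inbox_url)
    if not needs_login(page.url):
        return  # the session stored in the persistent profile is still valid
    email, password = credentials_from_env()
    # Placeholder selectors, not checked against the real login form.
    page.fill("input[type=email]", email)
    page.fill("input[type=password]", password)
    page.keyboard.press("Enter")
```

Keeping the URL check in a pure function means the session-reuse logic can be unit-tested without launching a browser.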
Blocker Encountered: Playwright Login Timeout
When the battery shutdown occurred, the last attempt to run gmb_login.py in headed mode had timed out after ~30 seconds waiting for the GetMyBoat login page to render. Hypothesized causes:
- GetMyBoat's login endpoint may be rate-limiting or blocking Playwright's user-agent string.
- Page load time for the SPA (before the login form becomes interactive) exceeded Playwright's default timeout (page.goto(url, timeout=30000)).
- CSS selector mismatch: login button or credential fields may have changed DOM structure.
Next debugging steps (post-session): increase the Playwright timeout to 60s, log every response via page.on('response'), record a HAR file and inspect it for blocking resources, and repeat the test in headless mode with a DevTools Protocol listener attached.
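The first three debugging steps might be combined into one diagnostic run, roughly as follows. This is a sketch rather than the planned script: the function names and the /tmp/gmb_login.har path are hypothetical, while the timeout, wait_until, record_har_path, and page.on("response") usages are standard Playwright options.

```python
def failed_responses(records, threshold=400):
    """Keep (status, url) pairs at or above the threshold, e.g. 403 or 429."""
    return [(status, url) for status, url in records if status >= threshold]


def debug_login_load(profile_dir: str, url: str, har_path: str = "/tmp/gmb_login.har"):
    """Load the login page with a 60s timeout, recording responses and a HAR file."""
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
    from playwright.sync_api import sync_playwright

    records = []
    with sync_playwright() as pw:
        context = pw.chromium.launch_persistent_context(
            profile_dir, headless=False, record_har_path=har_path
        )
        page = context.pages[0] if context.pages else context.new_page()
        # Collect the status and URL of every response seen during navigation.
        page.on("response", lambda response: records.append((response.status, response.url)))
        try:
            # Wait only for DOMContentLoaded so slow trackers do not mask the real cause.
            page.goto(url, timeout=60_000, wait_until="domcontentloaded")
        except PlaywrightTimeoutError:
            print("navigation still timed out at 60s")
        finally:
            context.close()  # the HAR file is written when the context closes
    return failed_responses(records)
```

A run that returns mostly 403 or 429 entries would point toward the blocking or rate-limiting hypothesis, while an empty list alongside a timeout would suggest slow rendering or a selector problem instead.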
Infrastructure: Gmail Token Verification and Warm Lead Routing
The extracted lead JSON is routed through existing infrastructure:
- Gmail token account (verified from ~/.config/gcloud/application_default_credentials.json) provides OAuth context for Carole's inbox.
- Warm lead responder prompt (at ~/Documents/repos/sailjada/prompts/warm-lead-responder.md) ingests structured lead JSON and generates draft replies.
- Replies are staged in a drafts folder (not auto-sent) for Carole's final review and dispatch.
This keeps the human in the loop while eliminating the manual lead extraction step.
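The staging step could be as simple as writing one file per lead into the drafts folder. The sketch below is an assumption about layout, not the existing infrastructure: the stage_draft helper, the file naming scheme, and the pending_review status field are all hypothetical.

```python
import json
import re
from pathlib import Path


def _slug(text: str) -> str:
    """Reduce arbitrary text to a filesystem-safe token."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_") or "unknown"


def stage_draft(drafts_dir: str, lead: dict, reply_text: str) -> Path:
    """Write a draft reply next to its lead data; nothing is sent from here."""
    out_dir = Path(drafts_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    date_part = _slug(str(lead.get("inquiry_date") or ""))
    name_part = _slug(str(lead.get("lead_name") or ""))
    path = out_dir / f"{date_part}_{name_part}.json"
    payload = {"lead": lead, "draft_reply": reply_text, "status": "pending_review"}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

Storing the lead alongside the draft gives the reviewer the context the reply was generated from without reopening GetMyBoat.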
Key Decisions and Trade-offs
- Read-only extraction: We don't mutate lead status or mark inquiries as read. This safety-first approach means Carole always