Automating GetMyBoat Lead Intake: Browser Automation, Session State Management, and Warm Lead Handoff Infrastructure
This post documents the technical approach taken to automate lead capture and handoff workflows for the Sail JADA GetMyBoat integration, including the infrastructure decisions, browser automation patterns, and why the initial Playwright-based login strategy hit timeout constraints.
The Problem: Manual Lead Triage on GetMyBoat
GetMyBoat inquiries arrive as email notifications to carole@sailjada.com, but triaging them and drafting responses by hand creates bottlenecks. The goal was to:
- Automatically parse incoming GetMyBoat notification emails from the inbox
- Extract lead metadata (contact name, boat interest, inquiry date)
- Generate response drafts for warm leads
- Maintain session state across multiple automation runs without re-authenticating
- Route high-intent leads through a separate notification channel
Technical Architecture: Three-Layer Approach
Layer 1: Gmail Integration (/tmp/gmb_session.py, /tmp/gmb_login.py)
Rather than scraping the GetMyBoat web UI directly, we prioritized stable email-based lead detection. The approach:
- OAuth token persistence: Store Google OAuth2 tokens in a dedicated venv to avoid re-authentication on every run. Token shape: standard Google credential JSON with `access_token`, `refresh_token`, `expires_in`, and `token_uri`.
- Gmail API queries: Use Gmail's search-operator syntax (the same syntax as the Gmail search box) to isolate true GetMyBoat platform notifications: `from:noreply@getmyboat.com subject:("New inquiry" OR "Inquiry response") is:unread`
- Venv isolation: Install the Google client libraries (`google-auth`, `google-auth-oauthlib`, `google-auth-httplib2`, `google-api-python-client`) in a dedicated Python environment to avoid version conflicts with Playwright and other automation tools.
Why this approach: Email is more reliable than HTML scraping. GetMyBoat's web UI is heavily JavaScript-rendered and prone to rate limiting. Anchoring to the platform's notification emails gives us a source of truth that doesn't require maintaining browser session state.
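A minimal sketch of the Layer 1 read path, assuming the token file was written by google-auth's `Credentials.to_json()` (a raw token-endpoint response with `access_token`/`expires_in` would need converting first). `TOKEN_PATH`, `build_gmb_query`, `load_credentials`, and `list_unread_notification_ids` are hypothetical names, not the contents of `/tmp/gmb_session.py` or `/tmp/gmb_login.py`. The Gmail call (`users().messages().list` with `q=`) is the standard client-library API; the Google imports are deferred so the query builder runs without those packages installed.

```python
"""Sketch: list unread GetMyBoat notifications via the Gmail API."""
from pathlib import Path

# Read-only scope, matching the post's "Read-only Gmail scope" decision.
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]

# Hypothetical location for the persisted OAuth token.
TOKEN_PATH = Path.home() / ".config" / "sailjada" / "gmail_token.json"


def build_gmb_query(subjects: list[str], sender: str = "noreply@getmyboat.com",
                    unread_only: bool = True) -> str:
    """Build a Gmail search string; each subject phrase is quoted and OR-ed."""
    subject_clause = " OR ".join(f'"{s}"' for s in subjects)
    parts = [f"from:{sender}", f"subject:({subject_clause})"]
    if unread_only:
        parts.append("is:unread")
    return " ".join(parts)


def load_credentials(token_path: Path = TOKEN_PATH):
    """Load stored credentials, refreshing and re-saving them if expired."""
    # Deferred imports: only needed when actually talking to Google.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials

    creds = Credentials.from_authorized_user_file(str(token_path), SCOPES)
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())
        token_path.write_text(creds.to_json())  # persist the refreshed token
    return creds


def list_unread_notification_ids(query: str, max_results: int = 50) -> list[str]:
    """Return message IDs matching `query` (first page only, for brevity)."""
    from googleapiclient.discovery import build

    service = build("gmail", "v1", credentials=load_credentials())
    resp = service.users().messages().list(
        userId="me", q=query, maxResults=max_results
    ).execute()
    return [m["id"] for m in resp.get("messages", [])]
```

Called as `list_unread_notification_ids(build_gmb_query(["New inquiry", "Inquiry response"]))`, this returns only message IDs; fetching each body takes a follow-up `users().messages().get(userId="me", id=...)` call.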
Layer 2: Browser Automation Exploration (/tmp/gmb_lead_scan.py)
For deeper lead context (photos, calendar availability, multi-message threads), we explored Playwright-based login to the GetMyBoat account dashboard. The implementation:
- Headed browser mode: Launch a persistent Chromium instance with `headless=False` to allow manual intervention if a CAPTCHA or bot detection triggers.
- Credential injection: Use Playwright's `page.fill()` and `page.click()` to automate login form submission: `await page.fill('input[name="email"]', getmyboat_email)`, then `await page.fill('input[name="password"]', getmyboat_password)`, then `await page.click('button[type="submit"]')`.
- Session persistence: Save browser context state (cookies, storage) to a local directory so subsequent runs reuse the authenticated session.
- Playwright installation: A custom venv with `pip install playwright`, followed by `playwright install chromium` to download the Chromium build Playwright expects (necessary because each Playwright release targets a specific browser revision, and a system-installed Chromium may not match it).
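The login flow above might look roughly like this with Playwright's async API. This is a sketch under assumptions: the login URL is passed in because the post doesn't name it, `state.json` is a hypothetical filename inside the session directory described later in the post, and the 24-hour freshness check is arbitrary. It uses `browser.new_context(storage_state=...)` plus `context.storage_state(path=...)` rather than a persistent Chromium profile; both are real Playwright mechanisms, and this one produces a portable JSON file.

```python
"""Sketch: headed Playwright login that reuses a saved storage state."""
import time
from pathlib import Path

# Hypothetical file inside the session directory named in the post.
STATE_PATH = Path.home() / ".playwright-sessions" / "getmyboat" / "state.json"


def saved_state_usable(path: Path, max_age_hours: float = 24.0) -> bool:
    """True if a saved storage state exists and is younger than max_age_hours.

    The 24-hour default is an arbitrary assumption, not GetMyBoat's cookie lifetime.
    """
    if not path.is_file():
        return False
    age_seconds = time.time() - path.stat().st_mtime
    return age_seconds < max_age_hours * 3600


async def login_and_save(login_url: str, email: str, password: str,
                         state_path: Path = STATE_PATH) -> None:
    """Log in (or reuse a saved session) and persist cookies/localStorage."""
    from playwright.async_api import async_playwright  # deferred: optional dependency

    reuse = saved_state_usable(state_path)
    async with async_playwright() as p:
        # Headed, so a human can solve a CAPTCHA if one appears.
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context(
            storage_state=str(state_path) if reuse else None
        )
        page = await context.new_page()
        await page.goto(login_url, timeout=60_000)
        if not reuse:
            # Selectors come from the post; they may not match the live form.
            await page.fill('input[name="email"]', email)
            await page.fill('input[name="password"]', password)
            await page.click('button[type="submit"]')
            # Crude "login finished" signal; waiting for a dashboard selector is sturdier.
            await page.wait_for_load_state("networkidle", timeout=120_000)
        state_path.parent.mkdir(parents=True, exist_ok=True)
        await context.storage_state(path=str(state_path))
        await browser.close()
```

It would be run with `asyncio.run(login_and_save(login_url, email, password))`, ideally with the credentials read from environment variables rather than hard-coded.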
Status: The headed login session was launched but timed out before completing the inbox capture. The timeout likely stems from:
- GetMyBoat's bot-detection heuristics (an unusual login pattern, or automation indicators that remain visible even in headed mode)
- Network latency during the Chromium download and launch sequence
- Missing User-Agent or TLS fingerprinting adjustments required by modern bot detection
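To tell these causes apart, a diagnostic pass could record what the login page actually sees. In this sketch the function name and artifact filenames are hypothetical; it wraps navigation in Playwright tracing, catches Playwright's `TimeoutError`, and reports `navigator.userAgent` and `navigator.webdriver`, two values that bot-detection scripts often inspect.

```python
"""Sketch: capture evidence when the GetMyBoat login page times out."""
from pathlib import Path


async def diagnose_login(login_url: str, out_dir: Path, timeout_ms: int = 30_000) -> dict:
    """Load the login page under tracing and report automation signals.

    Writes a screenshot and a trace archive to out_dir (hypothetical names).
    """
    from playwright.async_api import TimeoutError as PlaywrightTimeoutError
    from playwright.async_api import async_playwright

    out_dir.mkdir(parents=True, exist_ok=True)
    report = {"timed_out": False}
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        # Trace archive with screenshots and DOM snapshots of every step.
        await context.tracing.start(screenshots=True, snapshots=True)
        page = await context.new_page()
        try:
            await page.goto(login_url, timeout=timeout_ms)
        except PlaywrightTimeoutError:
            report["timed_out"] = True
        # What the page's own scripts would see.
        report["user_agent"] = await page.evaluate("navigator.userAgent")
        report["webdriver"] = await page.evaluate("navigator.webdriver")
        await page.screenshot(path=str(out_dir / "login.png"), full_page=True)
        await context.tracing.stop(path=str(out_dir / "trace.zip"))
        await browser.close()
    return report
```

The archive opens with `playwright show-trace trace.zip`. Setting `DEBUG=pw:api` in the environment additionally prints Playwright's API-level log, which is one way to get the verbose logging needed to pin down where the timeout occurs.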
Infrastructure Decisions
Why Multi-Venv, Multi-Layer?
The primary venv houses Google Sheets/Gmail libraries. Playwright was installed in a secondary venv because:
- Playwright and its bundled Chromium build are resource-heavy; keeping them in their own venv keeps the Gmail environment lean and prevents dependency conflicts.
- Google's client libraries bring their own dependency tree (notably `httplib2`, plus the deprecated `oauth2client` in older setups); installing Playwright into a fresh environment avoids resolver conflicts between the two stacks.
- If browser automation fails, the email-based lead detection pipeline (Layer 1) remains unaffected.
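The two-environment layout can be scripted so it is reproducible. In this sketch the venv paths are hypothetical, the package lists come from the post, and `run()` only prints the commands unless `dry_run=False` is passed. It uses `python -m playwright install chromium`, the module form of the `playwright install chromium` command, so the browser download is tied to the Playwright version in that venv.

```python
"""Sketch: build (or just print) the commands for the two-venv layout."""
import subprocess
import sys
from pathlib import Path

# Hypothetical venv locations; the post does not give the real paths.
VENVS = {
    Path.home() / ".venvs" / "gmb-gmail": [
        "google-auth", "google-auth-oauthlib",
        "google-auth-httplib2", "google-api-python-client",
    ],
    Path.home() / ".venvs" / "gmb-playwright": ["playwright"],
}


def venv_python(env_dir: Path) -> Path:
    """Interpreter inside a venv (POSIX layout; Windows uses Scripts/python.exe)."""
    return env_dir / "bin" / "python"


def setup_commands(venvs: dict[Path, list[str]]) -> list[list[str]]:
    """One venv creation + pip install per environment, plus the browser download."""
    cmds: list[list[str]] = []
    for env_dir, packages in venvs.items():
        py = str(venv_python(env_dir))
        cmds.append([sys.executable, "-m", "venv", str(env_dir)])
        cmds.append([py, "-m", "pip", "install", *packages])
        if "playwright" in packages:
            # Download the Chromium build this Playwright release expects.
            cmds.append([py, "-m", "playwright", "install", "chromium"])
    return cmds


def run(dry_run: bool = True) -> None:
    for cmd in setup_commands(VENVS):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Because each environment has its own interpreter, the Gmail scripts and the Playwright scripts are launched with different `python` paths, and a broken Playwright install never touches the Layer 1 environment.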
Session State Storage
Playwright context state is serialized to `~/.playwright-sessions/getmyboat/` as JSON, including:
- `cookies.json`: authentication tokens
- `localStorage.json`: client-side app state
- `sessionStorage.json`: temporary session variables
This allows the next automation run to skip login entirely, reducing latency and bot-detection risk.
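One caveat on this layout: Playwright's own `context.storage_state()` captures cookies and per-origin localStorage but not sessionStorage, so the third file has to be collected separately, for example with `await page.evaluate("() => JSON.stringify(sessionStorage)")`, and replayed through `context.add_init_script()`. The sketch below shows one way to split and reassemble the files; the function names are hypothetical, and the post doesn't say how the original scripts did it.

```python
"""Sketch: split Playwright storage state into the three session files."""
import json
from pathlib import Path


def save_session_files(storage_state: dict, session_storage: dict[str, str],
                       out_dir: Path) -> None:
    """Write cookies.json, localStorage.json and sessionStorage.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "cookies.json").write_text(
        json.dumps(storage_state.get("cookies", []), indent=2))
    # Playwright groups localStorage by origin: [{"origin": ..., "localStorage": [...]}]
    (out_dir / "localStorage.json").write_text(
        json.dumps(storage_state.get("origins", []), indent=2))
    (out_dir / "sessionStorage.json").write_text(json.dumps(session_storage, indent=2))


def load_storage_state(out_dir: Path) -> dict:
    """Rebuild the dict accepted by browser.new_context(storage_state=...)."""
    return {
        "cookies": json.loads((out_dir / "cookies.json").read_text()),
        "origins": json.loads((out_dir / "localStorage.json").read_text()),
    }


def session_storage_init_script(out_dir: Path, origin: str) -> str:
    """JS for context.add_init_script() that restores sessionStorage on `origin`."""
    data = (out_dir / "sessionStorage.json").read_text()  # JSON is a valid JS literal
    return (
        f"if (window.location.origin === {json.dumps(origin)}) {{\n"
        f"  const entries = {data};\n"
        "  for (const [k, v] of Object.entries(entries)) window.sessionStorage.setItem(k, v);\n"
        "}"
    )
```

On the next run, `browser.new_context(storage_state=load_storage_state(out_dir))` restores cookies and localStorage, and `await context.add_init_script(session_storage_init_script(out_dir, origin))` covers sessionStorage. Login is skipped only while the saved cookies are still accepted by the server.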
Data Flow: Email → Lead Object → Handoff
The intended pipeline:
- Read inbox: Gmail API queries for unread GetMyBoat notifications.
- Parse metadata: Extract sender, subject line, and message body using regex.
- Classify intent: Heuristic scoring (keywords: "available," "price," "dates," etc.) to separate warm leads from bounces.
- Fetch context: (Optional) Use Playwright to pull additional detail from GetMyBoat dashboard.
- Generate draft response: Template-based reply with Carole's tone and availability info.
- Log + handoff: Write summary to `/Users/cb/Documents/repos/sailjada/leads/` as JSON and notify via email or Slack.
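The parse, classify, and log-and-handoff steps can be sketched in plain Python. The keywords come from the post; their weights, the warm threshold, the `From:` and ISO-date patterns, and the output filename scheme are all assumptions, since the actual notification layout isn't shown. The Slack helper posts to an incoming-webhook URL, which accepts a JSON body with a `text` field.

```python
"""Sketch: notification email -> lead object -> JSON log (+ optional Slack)."""
import json
import re
import urllib.request
from dataclasses import asdict, dataclass, field
from datetime import date
from pathlib import Path
from typing import Optional

# Keywords come from the post; the weights and threshold are assumptions.
INTENT_KEYWORDS = {"available": 1, "price": 1, "dates": 1}
WARM_THRESHOLD = 2

# Hypothetical patterns: the real notification layout is not documented.
NAME_RE = re.compile(r"^From:\s*(?P<name>.+)$", re.MULTILINE)
DATE_RE = re.compile(r"\b(?P<date>\d{4}-\d{2}-\d{2})\b")


@dataclass
class Lead:
    message_id: str
    subject: str
    contact_name: Optional[str]
    inquiry_date: Optional[str]
    score: int
    warm: bool
    matched: list[str] = field(default_factory=list)


def score_intent(text: str) -> tuple[int, list[str]]:
    """Sum keyword weights over whole-word matches (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matched = [kw for kw in INTENT_KEYWORDS if kw in words]
    return sum(INTENT_KEYWORDS[kw] for kw in matched), matched


def parse_lead(message_id: str, subject: str, body: str) -> Lead:
    name = NAME_RE.search(body)
    when = DATE_RE.search(body)
    score, matched = score_intent(f"{subject}\n{body}")
    return Lead(
        message_id=message_id,
        subject=subject,
        contact_name=name.group("name").strip() if name else None,
        inquiry_date=when.group("date") if when else None,
        score=score,
        warm=score >= WARM_THRESHOLD,
        matched=matched,
    )


def write_lead(lead: Lead, leads_dir: Path) -> Path:
    """One JSON file per message; the date-plus-ID naming is hypothetical."""
    leads_dir.mkdir(parents=True, exist_ok=True)
    path = leads_dir / f"{date.today().isoformat()}-{lead.message_id}.json"
    path.write_text(json.dumps(asdict(lead), indent=2))
    return path


def notify_slack(webhook_url: str, lead: Lead) -> None:
    """POST a one-line summary to a Slack incoming webhook."""
    text = f"Warm GetMyBoat lead: {lead.contact_name or 'unknown'} ({lead.subject})"
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10).close()
```

In the pipeline, `write_lead` would target `/Users/cb/Documents/repos/sailjada/leads/`, and `notify_slack` would be called only when `lead.warm` is true, so low-intent messages are logged without generating a notification.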
Key Decisions & Trade-offs
- Email-first: We deliberately prioritized stable email parsing over full web-scraping to reduce maintenance burden and bot-detection risk.
- Read-only Gmail scope: The OAuth token was requested with the Gmail `readonly` scope to minimize the security surface (no accidental deletions or reply-all disasters).
- Headed browser for now: The initial Playwright experiment uses `headless=False` to allow manual CAPTCHA solving if needed. For production, we'd either invest in anti-bot measures (User-Agent spoofing, TLS fingerprinting) or accept that some leads require manual triage.
- Handoff prompt in /tmp: The analysis and response-drafting logic was staged in `/tmp/sailjada-tier2-handoff-prompt.md` as a Tier 2/3 handoff doc, allowing you to review and adjust the automation criteria before writing the full production handler.
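One practical consequence of the read-only scope: `gmail.readonly` cannot modify labels, so the script can't mark notifications as read, and an `is:unread` query keeps returning the same messages on every run until someone opens them in Gmail. A small local ledger of processed message IDs covers that. The class name and file location are hypothetical; the write goes to a temporary file first and is then swapped in with `Path.replace`.

```python
"""Sketch: dedupe Gmail message IDs locally under the gmail.readonly scope."""
import json
from pathlib import Path


class ProcessedLedger:
    """Remembers which Gmail message IDs have already been handled.

    Stands in for "mark as read", which the read-only scope cannot do.
    The ledger path passed in is hypothetical.
    """

    def __init__(self, path: Path):
        self.path = path
        self.seen: set[str] = set(json.loads(path.read_text())) if path.exists() else set()

    def filter_new(self, message_ids: list[str]) -> list[str]:
        """Keep only IDs not processed on an earlier run, preserving order."""
        return [m for m in message_ids if m not in self.seen]

    def mark(self, message_ids: list[str]) -> None:
        self.seen.update(message_ids)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write a temp file, then rename, so a crash mid-write keeps the old ledger.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(self.seen)))
        tmp.replace(self.path)
```

Each run would pass the IDs from the Gmail query through `filter_new`, process only those, and call `mark` afterwards, so a crash before `mark` simply retries the same leads on the next run.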
What's Next
A battery shutdown interrupted this work mid-iteration. The next steps are:
- Debug the Playwright timeout: Add verbose logging and check if GetMyBoat is actively blocking Playwright's default User-Agent. Consider using