Automating GetMyBoat Lead Intake: Browser Automation, Session State Management, and Warm Lead Handoff Infrastructure

This post documents the technical approach taken to automate lead capture and handoff workflows for the Sail JADA GetMyBoat integration, including the infrastructure decisions, browser automation patterns, and why the initial Playwright-based login strategy hit timeout constraints.

The Problem: Manual Lead Triage on GetMyBoat

GetMyBoat inquiries arrive as email notifications to carole@sailjada.com, but manual triage and response drafting create bottlenecks. The goal was to:

Automatically parse incoming GetMyBoat notification emails from the inbox
Extract lead metadata (contact name, boat interest, inquiry date)
Generate response drafts for warm leads
Maintain session state across multiple automation runs without re-authenticating
Route high-intent leads through a separate notification channel

Technical Architecture: Three-Layer Approach

Layer 1: Gmail Integration (`/tmp/gmb_session.py`, `/tmp/gmb_login.py`)

Rather than scraping the GetMyBoat web UI directly, we prioritized stable email-based lead detection. The approach:

OAuth token persistence: Store the Google OAuth2 token on disk for the scripts running in the dedicated venv, so each run can refresh it instead of re-authenticating. Token shape: standard Google credential JSON with access_token, refresh_token, expires_in, and token_uri.
Gmail API queries: Use Gmail's search query syntax (the q parameter of messages.list) to isolate true GetMyBoat platform notifications:
```
from:noreply@getmyboat.com
subject:("New inquiry" OR "Inquiry response")
is:unread
```
Venv isolation: Install Google client libraries in a dedicated Python environment (google-auth, google-auth-oauthlib, google-auth-httplib2, google-api-python-client) to avoid version conflicts with Playwright and other automation tools.
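
The token reuse and inbox query described above might look roughly like the sketch below. It assumes the token was saved in the authorized-user JSON format that google-auth writes, and that the `requests` package is installed for the refresh transport. The names `build_query`, `load_gmail_service`, `list_unread_ids`, and `token_path` are hypothetical and are not taken from `/tmp/gmb_session.py`.

```python
from pathlib import Path

# Read-only scope, matching the post's decision to avoid write access.
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]


def build_query(sender="noreply@getmyboat.com",
                subjects=("New inquiry", "Inquiry response")):
    """Build a Gmail search string for unread platform notifications."""
    subject_clause = " OR ".join(f'"{s}"' for s in subjects)
    return f"from:{sender} subject:({subject_clause}) is:unread"


def load_gmail_service(token_path):
    """Load saved credentials, refresh them if expired, return a Gmail client."""
    # Imported lazily so the pure helpers work without the Google libraries.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from googleapiclient.discovery import build

    creds = Credentials.from_authorized_user_file(str(token_path), SCOPES)
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())
        Path(token_path).write_text(creds.to_json())  # persist the new token
    return build("gmail", "v1", credentials=creds)


def list_unread_ids(service, query=None):
    """Return the message ids matching the query (first page of results only)."""
    resp = service.users().messages().list(
        userId="me", q=query or build_query()
    ).execute()
    return [m["id"] for m in resp.get("messages", [])]
```

Keeping `build_query` separate from the API call makes the search string easy to inspect and adjust if GetMyBoat changes its notification subjects.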

Why this approach: Email is more reliable than HTML scraping. GetMyBoat's web UI is heavily JavaScript-rendered and prone to rate-limiting. By anchoring to the email notifications, we have a source of truth that doesn't require maintaining browser session state.

Layer 2: Browser Automation Exploration (`/tmp/gmb_lead_scan.py`)

For deeper lead context (photos, calendar availability, multi-message threads), we explored Playwright-based login to the GetMyBoat account dashboard. The implementation:

Headed browser mode: Launch a persistent Chromium instance with headless=False to allow manual intervention if CAPTCHA or bot-detection triggers.

Credential injection: Use Playwright's page.fill() and page.click() to automate login form submission:

await page.fill('input[name="email"]', getmyboat_email)
await page.fill('input[name="password"]', getmyboat_password)
await page.click('button[type="submit"]')

Session persistence: Save browser context state (cookies, storage) to a local directory so subsequent runs reuse the authenticated session.
Playwright installation: Custom venv with pip install playwright followed by playwright install chromium to download the matching Chromium build (necessary because each Playwright release expects its own browser build, and system Chromium versions rarely match).
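
Put together, the login and state-saving steps above could be sketched as follows with Playwright's async API. The login URL is not given in this post, so `login_url` is a required parameter. `login_and_save`, `state_path`, and `timeout_ms` are hypothetical names, and the selectors are the ones quoted above, which may not match the live form.

```python
from pathlib import Path

# Selectors taken from the snippet above; they may not match the live page.
EMAIL_INPUT = 'input[name="email"]'
PASSWORD_INPUT = 'input[name="password"]'
SUBMIT_BUTTON = 'button[type="submit"]'


async def login_and_save(login_url, email, password, state_path,
                         timeout_ms=60_000):
    """Log in through a headed Chromium window and save cookies/localStorage."""
    # Imported lazily so this module loads even where Playwright is absent.
    from playwright.async_api import async_playwright

    state_path = Path(state_path)
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        # Reuse a previously saved session when one exists.
        context = await browser.new_context(
            storage_state=str(state_path) if state_path.exists() else None
        )
        page = await context.new_page()
        page.set_default_timeout(timeout_ms)
        await page.goto(login_url)
        # Fill the form only if it is present; a reused session may skip it.
        # A form rendered late by JavaScript could be missed by this check.
        if await page.locator(EMAIL_INPUT).count() > 0:
            await page.fill(EMAIL_INPUT, email)
            await page.fill(PASSWORD_INPUT, password)
            await page.click(SUBMIT_BUTTON)
            await page.wait_for_load_state("networkidle")
        state_path.parent.mkdir(parents=True, exist_ok=True)
        await context.storage_state(path=str(state_path))
        await browser.close()
```

The coroutine would be driven with `asyncio.run(login_and_save(...))`. Because the window is headed, a CAPTCHA can be solved by hand while the script waits, up to the configured timeout.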

Status: The headed login session was launched but timed out before completing the inbox capture. The timeout likely stems from:

GetMyBoat's bot-detection heuristics (unusual login pattern, browser-automation indicators such as navigator.webdriver)
Network latency during the Chromium download and launch sequence
Missing user-agent or TLS fingerprinting adjustments required by modern bot detection
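
One way to tell these causes apart is to log what the browser reports and what the server returns before the timeout fires. The sketch below is an assumption about how that could be done, not the logic in `/tmp/gmb_lead_scan.py`. `describe_response` and `goto_with_diagnostics` are hypothetical helpers, and treating 403 or 429 as a sign of blocking is only a heuristic.

```python
import logging

log = logging.getLogger("gmb_lead_scan")


def describe_response(status, url):
    """Summarize an HTTP response; 403 and 429 often accompany bot blocking."""
    flag = "POSSIBLE BLOCK" if status in (403, 429) else "ok"
    return f"{status} {flag} {url}"


async def goto_with_diagnostics(page, url, screenshot_path, timeout_ms=30_000):
    """Navigate, logging every response and saving a screenshot on timeout."""
    from playwright.async_api import TimeoutError as PlaywrightTimeoutError

    # Log each response so blocked or throttled requests show up in the output.
    page.on("response", lambda r: log.info(describe_response(r.status, r.url)))
    # Record what a detection script running in the page would see.
    log.info("user agent: %s", await page.evaluate("navigator.userAgent"))
    log.info("webdriver flag: %s", await page.evaluate("navigator.webdriver"))
    try:
        await page.goto(url, timeout=timeout_ms)
        return True
    except PlaywrightTimeoutError:
        await page.screenshot(path=str(screenshot_path))
        log.warning("timed out loading %s; screenshot at %s", url, screenshot_path)
        return False
```

A screenshot taken at the moment of the timeout usually shows whether the page stalled on a challenge screen or simply never finished loading.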

Infrastructure Decisions

Why Multi-Venv, Multi-Layer?

The primary venv houses Google Sheets/Gmail libraries. Playwright was installed in a secondary venv because:

Playwright ships its own browser binaries and Python dependencies; isolating it keeps them from colliding with the Google stack.
Google's API client depends on httplib2 (through google-auth-httplib2); keeping Playwright in a fresh environment avoids version-resolution surprises between the two.
If browser automation fails, the email-based lead detection pipeline (Layer 1) remains unaffected.

Session State Storage

Playwright context state is serialized to ~/.playwright-sessions/getmyboat/ as JSON, including:

cookies.json — Authentication tokens
localStorage.json — Client-side app state
sessionStorage.json — Temporary session variables

This allows the next automation run to skip login entirely, reducing latency and bot-detection risk.
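
Skipping login is only safe while the saved cookies are still valid, so a run can check them first. The sketch below assumes `cookies.json` holds the list of cookie objects returned by Playwright's `BrowserContext.cookies()`, where `expires` is a Unix timestamp in seconds and `-1` marks a session cookie. `has_live_cookies` is a hypothetical helper. Note that Playwright's built-in `storage_state()` captures cookies and localStorage but not sessionStorage, so a `sessionStorage.json` file would have to be written and restored by the script itself.

```python
import json
import time
from pathlib import Path


def has_live_cookies(cookies_path, now=None):
    """Return True if the saved cookie file holds at least one unexpired cookie."""
    path = Path(cookies_path).expanduser()
    if not path.exists():
        return False
    now = time.time() if now is None else now
    cookies = json.loads(path.read_text())
    for cookie in cookies:
        expires = cookie.get("expires", -1)
        # Session cookies (expires == -1) carry no expiry, so they are
        # counted as live here.
        if expires == -1 or expires > now:
            return True
    return False
```

If this returns False, the run falls back to the full login flow. A True result does not guarantee the server still honors the session, only that the cookies have not expired locally.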

Data Flow: Email → Lead Object → Handoff

The intended pipeline:

Read inbox: Gmail API queries for unread GetMyBoat notifications.
Parse metadata: Extract sender, subject line, and message body using regex.
Classify intent: Heuristic scoring (keywords: "available," "price," "dates," etc.) to separate warm leads from bounces.
Fetch context: (Optional) Use Playwright to pull additional detail from GetMyBoat dashboard.
Generate draft response: Template-based reply with Carole's tone and availability info.
Log + handoff: Write summary to `/Users/cb/Documents/repos/sailjada/leads/` as JSON and notify via email or Slack.
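
The classify and log steps of this pipeline can be sketched with the standard library alone. The notification body format is not shown in this post, so the sketch takes already-decoded strings and does not attempt to extract the contact name or boat. The keyword weights, `WARM_THRESHOLD`, and the `Lead`, `score_intent`, `parse_lead`, and `write_lead` names are all illustrative assumptions.

```python
import json
import re
from dataclasses import asdict, dataclass
from pathlib import Path

# Keywords from the post; the weights and threshold are illustrative guesses.
INTENT_KEYWORDS = {"available": 1, "price": 1, "dates": 1}
WARM_THRESHOLD = 2


@dataclass
class Lead:
    sender: str
    subject: str
    score: int
    warm: bool


def score_intent(body):
    """Sum the weights of intent keywords that appear as whole words."""
    text = body.lower()
    return sum(weight for keyword, weight in INTENT_KEYWORDS.items()
               if re.search(rf"\b{re.escape(keyword)}\b", text))


def parse_lead(sender, subject, body):
    """Build a Lead from already-decoded header and body strings."""
    score = score_intent(body)
    return Lead(sender=sender, subject=subject.strip(), score=score,
                warm=score >= WARM_THRESHOLD)


def write_lead(lead, out_dir, message_id):
    """Write the lead as JSON, named after the Gmail message id."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{message_id}.json"
    path.write_text(json.dumps(asdict(lead), indent=2))
    return path
```

Matching on word boundaries keeps "unavailable" from counting as "available", which matters when the score decides whether a lead is routed as warm.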

Key Decisions & Trade-offs

Email-first: We deliberately prioritized stable email parsing over full web-scraping to reduce maintenance burden and bot-detection risk.
Read-only Gmail scope: OAuth token was requested with readonly scope on Gmail to minimize security surface (no accidental deletions or reply-all disasters).
Headed browser for now: The initial Playwright experiment uses headless=False to allow manual CAPTCHA-solving if needed. For production, we'd either invest in anti-detection measures (User-Agent overrides, TLS fingerprint adjustments) or accept that some leads require manual triage.
Handoff prompt in /tmp: The analysis and response-drafting logic was staged in `/tmp/sailjada-tier2-handoff-prompt.md` as a Tier 2/3 handoff doc, allowing you to review and adjust the automation criteria before writing the full production handler.

What's Next

A battery shutdown interrupted this work mid-iteration. The next steps are:

Debug the Playwright timeout: Add verbose logging and check if GetMyBoat is actively blocking Playwright's default User-Agent. Consider using