Automating GetMyBoat Lead Capture with Playwright: Building a Headless Browser Pipeline for SPA Navigation


What Was Done

Over a series of interrupted development sessions, we built out a multi-stage Python-based lead capture pipeline for GetMyBoat inquiries tied to the Sail JADA property charter operation. The work spanned browser automation setup, persistent session management, and SPA (Single Page Application) navigation challenges specific to GetMyBoat's dynamic inbox interface.

The core objective: reliably extract warm leads from GetMyBoat's owner inbox without manual intervention, authenticating as carole@sailjada.com and navigating past GetMyBoat's JavaScript-heavy UI to capture real inbox state. However, the final session exposed a critical blocker: the Playwright login flow timed out in headless mode. That issue is documented below for the next iteration.

Technical Details: The Pipeline Architecture

File Structure

All scripts were written to /tmp/ for iteration speed:

/tmp/gmb_session.py — Core Playwright browser context and profile management
/tmp/gmb_login.py — Authentication flow with headless/headed mode toggle
/tmp/gmb_inbox.py — Inbox page detection and navigation waits
/tmp/gmb_lead_scan.py — Read-only lead extraction from DOM
/tmp/gmb_explore.py — SPA navigation discovery for inbox URL patterns
/tmp/gmb_watch.py — Real-time inbox URL capture as user navigates
/tmp/gmb_manual.py — Prefill credentials and hand off to manual user login

Browser Automation Stack

We selected Playwright (not Selenium) for its native event-driven architecture and superior SPA handling. The decision was driven by:

Event-driven waits: page.wait_for_selector() and page.wait_for_url() are more reliable than polling for dynamic content
Persistent context: a Playwright browser context combined with an explicit storage_state file saves and restores auth tokens, so scripts do not have to log in again on every run
Chromium stability: Matches GetMyBoat's target browser; reduces CSS/JS rendering drift

Installation required a custom venv due to existing project dependencies:

python3 -m venv /path/to/gmb_venv
source /path/to/gmb_venv/bin/activate
pip install playwright google-api-python-client google-auth
playwright install chromium

Session & Profile Management

Rather than create a fresh browser context on every run, we persisted user profile state to disk:

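# Restore cookies and localStorage saved by a previous run.
# The file must already exist; on the very first run, omit storage_state.
context = await browser.new_context(
    storage_state="/tmp/gmb_carole_profile.json"
)
# ... log in or reuse the restored session ...
# Write the current state back to disk for the next run
await context.storage_state(path="/tmp/gmb_carole_profile.json")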
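The restore-then-save pattern above can be wrapped so that a missing profile file does not break the first run. The following is a minimal sketch, not the project's actual gmb_session.py: the helper names (context_kwargs, open_context, save_context) are hypothetical, and `browser` and `context` are assumed to be Playwright's async Browser and BrowserContext objects.

```python
from pathlib import Path
from typing import Any, Dict, Optional

PROFILE_PATH = Path("/tmp/gmb_carole_profile.json")


def context_kwargs(profile: Path = PROFILE_PATH,
                   user_agent: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for browser.new_context() (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    # Pointing storage_state at a file that does not exist fails,
    # so only include it once a profile has been saved.
    if profile.is_file():
        kwargs["storage_state"] = str(profile)
    if user_agent:
        kwargs["user_agent"] = user_agent
    return kwargs


async def open_context(browser: Any, profile: Path = PROFILE_PATH) -> Any:
    """Create a context, restoring saved auth state when a profile exists."""
    return await browser.new_context(**context_kwargs(profile))


async def save_context(context: Any, profile: Path = PROFILE_PATH) -> None:
    """Write the context's storage state back to disk after a successful login."""
    profile.parent.mkdir(parents=True, exist_ok=True)
    await context.storage_state(path=str(profile))
```

On a first run, context_kwargs() returns an empty dict and the context starts unauthenticated; calling save_context() after a successful login makes later runs pick up the saved state.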

This allowed us to:

Reuse authentication across script restarts (avoiding login timeouts)
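Capture cookies and localStorage for GetMyBoat's SPA state (Playwright's storage_state does not include sessionStorage, which would need to be saved separately if the app relies on it)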
Run headed mode (headless=False) for debugging without losing auth state

The GetMyBoat SPA Challenge

GetMyBoat's inbox is a client-rendered SPA. Navigation does not trigger full page reloads; instead, JavaScript updates the DOM and the URL in place. Standard Selenium waits often fail because the page reports "ready" before the inbox content loads.

Our solution: two-pronged detection in /tmp/gmb_inbox.py:

import re

# Wait for DOM markers specific to the inbox page
await page.wait_for_selector('[data-test-id="inbox-message-list"]', timeout=10000)

# Also wait for URL pattern match (GetMyBoat's inbox URL contains /owner/inbox)
await page.wait_for_url(re.compile(r'/owner/inbox'), timeout=10000)

Once both conditions were met, we knew the inbox was safe to scrape. The gmb_explore.py script then mapped available UI elements to discover the exact inbox URL structure, which varied based on authentication state and active filters.
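The two-condition check can be packaged as a single helper that reports whether the inbox is ready. This is a sketch under assumptions: wait_for_inbox is a hypothetical name, `page` is assumed to be a Playwright async Page, and the selector and URL pattern are the ones quoted above. It runs both waits concurrently rather than one after the other, so the total wait is bounded by a single timeout.

```python
import asyncio
import re
from typing import Any

INBOX_SELECTOR = '[data-test-id="inbox-message-list"]'
INBOX_URL = re.compile(r"/owner/inbox")


async def wait_for_inbox(page: Any, timeout_ms: int = 10_000) -> bool:
    """Return True once both the inbox URL and the message list are present."""
    # return_exceptions=True lets both waits finish and hands back any
    # exception (such as a Playwright timeout) as a result instead of raising.
    results = await asyncio.gather(
        page.wait_for_url(INBOX_URL, timeout=timeout_ms),
        page.wait_for_selector(INBOX_SELECTOR, timeout=timeout_ms),
        return_exceptions=True,
    )
    return not any(isinstance(result, Exception) for result in results)
```

A caller would scrape only when wait_for_inbox(page) returns True. Note that this sketch treats any exception as "not ready", which hides the difference between a timeout and a crashed page; a stricter version would re-raise anything that is not a timeout.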



Lead Extraction & Read-Only Scanning


/tmp/gmb_lead_scan.py performs a read-only DOM query:


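leads = await page.query_selector_all('.message-row')
for lead in leads:
    sender = await lead.query_selector('.sender-name')
    message = await lead.query_selector('.preview-text')
    timestamp = await lead.query_selector('.timestamp')
    # query_selector returns None when a child is missing, so guard each read
    sender_text = await sender.inner_text() if sender else ""
    message_text = await message.inner_text() if message else ""
    timestamp_text = await timestamp.inner_text() if timestamp else ""
    # Log the extracted text; nothing on the page is clicked or modified
    print(sender_text, timestamp_text, message_text)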


This approach:



Requires zero write permissions (safe for auditing)
Can be run repeatedly without side effects
Works with GetMyBoat's dynamically rendered list (no full-page screenshot needed)
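Because the scan can be repeated without side effects, the extracted text is easy to normalize and log in a stable format. The sketch below is illustrative only: the Lead record, build_leads, and to_jsonl are hypothetical names, and the input is assumed to be (sender, preview, timestamp) text triples already read from the DOM.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Lead:
    sender: str
    preview: str
    timestamp: str


def build_leads(rows: Iterable[Tuple[str, str, str]]) -> List[Lead]:
    """Normalize text triples, dropping rows without a sender and exact duplicates."""
    seen = set()
    leads: List[Lead] = []
    for sender, preview, timestamp in rows:
        # Collapse internal whitespace in the preview so re-renders compare equal
        lead = Lead(sender.strip(), " ".join(preview.split()), timestamp.strip())
        if not lead.sender or lead in seen:
            continue
        seen.add(lead)
        leads.append(lead)
    return leads


def to_jsonl(leads: Iterable[Lead]) -> str:
    """Serialize leads as one JSON object per line."""
    return "\n".join(json.dumps(asdict(lead), ensure_ascii=False) for lead in leads)
```

Writing one JSON object per line keeps repeated scans easy to diff, which suits a pipeline that may rescan the same inbox many times.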


Authentication & Credential Handling


Credentials were stored in a separate secrets file (not committed to version control). The login flow in /tmp/gmb_login.py prefills the email/password fields and waits for either:



Successful redirect to the dashboard
Explicit user interaction (2FA, CAPTCHA)



In headed mode, we prefilled credentials and handed off to the user:


await page.fill('input[name="email"]', carole_email)
await page.fill('input[name="password"]', carole_password)
# User clicks submit manually to handle any 2FA
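For the prefill step, the credentials have to come from the secrets file rather than from the script itself. Here is a minimal sketch of reading them from plain KEY=VALUE text; the key names GMB_EMAIL and GMB_PASSWORD and both function names are assumptions for illustration, and any decryption of the secrets file is outside its scope.

```python
from typing import Dict, Tuple


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_credentials(text: str) -> Tuple[str, str]:
    """Return (email, password), failing loudly if either key is absent."""
    env = parse_env(text)
    missing = [key for key in ("GMB_EMAIL", "GMB_PASSWORD") if not env.get(key)]
    if missing:
        raise ValueError("Missing keys: " + ", ".join(missing))
    return env["GMB_EMAIL"], env["GMB_PASSWORD"]
```

The returned pair would then feed the two page.fill() calls, so no credential ever appears in a script under /tmp/.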

The Authentication Timeout Issue


The final session (a headless run of gmb_login.py) timed out waiting for the login redirect. Likely causes:



GetMyBoat may detect headless Chromium and block it (for example, through User-Agent filtering)
Network latency or temporary service degradation
Unhandled 2FA or rate-limiting



Next steps: add User-Agent spoofing, retry logic, and screenshot capture on timeout for debugging.
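As a starting point for the retry and screenshot steps, the sketch below wraps any awaitable login action. It is an untested outline rather than the planned implementation: run_with_retries and its parameters are hypothetical, and `page` is assumed to be a Playwright async Page. Playwright raises its own TimeoutError (playwright.async_api.TimeoutError) rather than Python's built-in one, so that class would need to be passed in through retry_on.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple, Type


async def run_with_retries(
    page: Any,
    action: Callable[[], Awaitable[Any]],
    attempts: int = 3,
    base_delay_s: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    screenshot_prefix: str = "/tmp/gmb_login_timeout",
) -> Any:
    """Run action(), retrying on timeout and saving a screenshot per failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except retry_on:
            # Record what the page looked like when the wait gave up
            await page.screenshot(path=f"{screenshot_prefix}_{attempt}.png")
            if attempt == attempts:
                raise
            # Exponential backoff: base, 2x base, 4x base, ...
            await asyncio.sleep(base_delay_s * 2 ** (attempt - 1))
```

The User-Agent override is separate from this wrapper: Playwright accepts it as the user_agent argument to browser.new_context(). Whether a different User-Agent alone gets past GetMyBoat's checks is unverified.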


Infrastructure & File Organization


All development scripts live in /tmp/ for rapid iteration. Production deployment (when ready) will move to:



~/Documents/repos/sail-jada/gmb_automation/ (main source)
~/Documents/repos/sail-jada/secrets/gmb_creds.env (encrypted, not committed)
Persistent profile: ~/.gmb_profiles/carole_profile.json (gitignored)



Google API integration (from earlier sessions) remains in ~/Documents/repos/sail-jada/lib/gmail_helper.py for warm lead follow-up automation.


Key Decisions