Automating GetMyBoat Lead Capture with Playwright: Building a Headless Browser Pipeline for SPA Navigation
What Was Done
Over a series of interrupted development sessions, we built out a multi-stage Python-based lead capture pipeline for GetMyBoat inquiries tied to the Sail JADA property charter operation. The work spanned browser automation setup, persistent session management, and SPA (Single Page Application) navigation challenges specific to GetMyBoat's dynamic inbox interface.
The core objective: reliably extract warm leads from GetMyBoat's owner inbox without manual intervention, authenticating as carole@sailjada.com and navigating past GetMyBoat's JavaScript-heavy UI to capture real inbox state. However, the final session exposed a critical blocker: the Playwright login flow timed out. That failure is documented below for the next iteration.
Technical Details: The Pipeline Architecture
File Structure
All scripts were written to /tmp/ for iteration speed:
- /tmp/gmb_session.py: Core Playwright browser context and profile management
- /tmp/gmb_login.py: Authentication flow with headless/headed mode toggle
- /tmp/gmb_inbox.py: Inbox page detection and navigation waits
- /tmp/gmb_lead_scan.py: Read-only lead extraction from DOM
- /tmp/gmb_explore.py: SPA navigation discovery for inbox URL patterns
- /tmp/gmb_watch.py: Real-time inbox URL capture as user navigates
- /tmp/gmb_manual.py: Prefill credentials and hand off to manual user login
Browser Automation Stack
We selected Playwright (not Selenium) for its native event-driven architecture and superior SPA handling. The decision was driven by:
- Event-driven waits: page.wait_for_selector() and page.wait_for_url() are more reliable than polling for dynamic content
- Persistent context: a Playwright BrowserContext created with an explicit storage_state restores saved auth state without logging in again
- Chromium stability: Matches GetMyBoat's target browser; reduces CSS/JS rendering drift
Installation required a custom venv due to existing project dependencies:
python3 -m venv /path/to/gmb_venv
source /path/to/gmb_venv/bin/activate
pip install playwright google-api-python-client google-auth
playwright install chromium
Session & Profile Management
Rather than create a fresh browser context on every run, we persisted user profile state to disk:
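# Restore saved cookies and localStorage (the file must already exist)
context = await browser.new_context(
    storage_state="/tmp/gmb_carole_profile.json"
)
# ... log in or navigate ...
# Write the current state back to disk for the next run
await context.storage_state(path="/tmp/gmb_carole_profile.json")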
This allowed us to:
- Reuse authentication across script restarts (avoiding login timeouts)
- Capture cookies and localStorage for GetMyBoat's SPA state (Playwright's storage_state does not include sessionStorage)
- Run headed mode (headless=False) for debugging without losing auth state
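One gap in the snippet above is the very first run, when no profile file exists yet and passing its path to new_context() would fail. The following is a minimal sketch of one way to handle that, not the actual contents of gmb_session.py. The helper names (usable_storage_state, open_context, save_context) are hypothetical, and browser is assumed to be a Playwright async Browser.

```python
import json
from pathlib import Path
from typing import Any, Optional

PROFILE_PATH = Path("/tmp/gmb_carole_profile.json")  # path used in this write-up


def usable_storage_state(path: Path) -> Optional[str]:
    """Return the path as a string if it holds parseable JSON, else None."""
    try:
        json.loads(path.read_text())
    except (OSError, ValueError):
        # Missing, unreadable, or corrupt file: fall back to a fresh context.
        return None
    return str(path)


async def open_context(browser: Any, path: Path = PROFILE_PATH) -> Any:
    """Create a context, restoring saved state only when a usable file exists.

    `browser` is assumed to be a Playwright async Browser.
    """
    return await browser.new_context(storage_state=usable_storage_state(path))


async def save_context(context: Any, path: Path = PROFILE_PATH) -> None:
    """Write the context's current cookies and localStorage back to disk."""
    await context.storage_state(path=str(path))
```

Calling save_context() right after a successful login, rather than only at exit, would keep the saved state fresh even if a later step crashes.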
The GetMyBoat SPA Challenge
GetMyBoat's inbox is a client-rendered SPA. Navigation does not trigger full page reloads; instead, JavaScript updates the DOM and the URL in place. Waits keyed to the page load event (the usual Selenium approach) often fail because the page reports "ready" before the inbox content loads.
Our solution: two-pronged detection in /tmp/gmb_inbox.py:
import re

# Wait for DOM markers specific to the inbox page
await page.wait_for_selector('[data-test-id="inbox-message-list"]', timeout=10000)
# Also wait for URL pattern match (GetMyBoat's inbox URL contains /owner/inbox)
await page.wait_for_url(re.compile(r'/owner/inbox'), timeout=10000)
Once both conditions were met, we knew the inbox was safe to scrape. The gmb_explore.py script then mapped available UI elements to discover the exact inbox URL structure, which varied based on authentication state and active filters.
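The two waits can also be wrapped in one helper so callers get a single "inbox is ready" step. This is a sketch under assumptions rather than the code in gmb_inbox.py. The function name wait_for_inbox is hypothetical, page is assumed to be a Playwright async Page, and the waits run concurrently here instead of one after the other, so both share the same timeout window.

```python
import asyncio
import re
from typing import Any

# Selector and URL pattern taken from the snippet above.
INBOX_SELECTOR = '[data-test-id="inbox-message-list"]'
INBOX_URL = re.compile(r"/owner/inbox")


async def wait_for_inbox(page: Any, timeout_ms: int = 10_000) -> None:
    """Return once the inbox list is in the DOM and the URL matches.

    `page` is assumed to be a Playwright async Page. If either wait
    fails (for example by timing out), the exception propagates.
    """
    await asyncio.gather(
        page.wait_for_selector(INBOX_SELECTOR, timeout=timeout_ms),
        page.wait_for_url(INBOX_URL, timeout=timeout_ms),
    )
```

Running the waits concurrently caps the total wait at roughly one timeout instead of two, at the cost of a less specific failure when both conditions are missing.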
Lead Extraction & Read-Only Scanning
/tmp/gmb_lead_scan.py performs a read-only DOM query:
leads = await page.query_selector_all('.message-row')
for lead in leads:
    sender = await lead.query_selector('.sender-name')
    message = await lead.query_selector('.preview-text')
    timestamp = await lead.query_selector('.timestamp')
    # Extract text and log without modification
This approach:
- Requires zero write permissions (safe for auditing)
- Can be run repeatedly without side effects
- Works with GetMyBoat's dynamically rendered list (no full-page screenshot needed)
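The loop above stops at locating the elements. A minimal sketch of the remaining "extract text" step might look like the following. It reuses the selectors shown above, but the Lead dataclass and the scan_leads and _text helpers are hypothetical names, and page is assumed to be a Playwright async Page. Missing child elements become None instead of raising.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Lead:
    """One inbox row; a field is None when its element was not found."""
    sender: Optional[str]
    preview: Optional[str]
    timestamp: Optional[str]


async def _text(row: Any, selector: str) -> Optional[str]:
    """Return the trimmed inner text of the first match inside `row`, or None."""
    element = await row.query_selector(selector)
    if element is None:
        return None
    return (await element.inner_text()).strip()


async def scan_leads(page: Any) -> List[Lead]:
    """Read every .message-row on the page without clicking or typing."""
    rows = await page.query_selector_all(".message-row")
    return [
        Lead(
            sender=await _text(row, ".sender-name"),
            preview=await _text(row, ".preview-text"),
            timestamp=await _text(row, ".timestamp"),
        )
        for row in rows
    ]
```

Returning plain dataclass instances keeps the scan decoupled from the browser, so the results can be logged or passed on after the page is closed.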
Authentication & Credential Handling
Credentials were stored in a separate secrets file (not committed to version control). The login flow in /tmp/gmb_login.py prefills the email/password fields and waits for either:
- Successful redirect to the dashboard
- Explicit user interaction (2FA, CAPTCHA)
In headed mode, we prefilled credentials and handed off to the user:
await page.fill('input[name="email"]', carole_email)
await page.fill('input[name="password"]', carole_password)
# User clicks submit manually to handle any 2FA
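The prefill-then-hand-off flow can be sketched as a single coroutine that fills both fields and then waits for the post-login URL. This is an illustration under assumptions, not the contents of gmb_login.py or gmb_manual.py. The name prefill_and_hand_off is hypothetical, page is assumed to be a Playwright async Page, and success_url is a required parameter because the source does not state the dashboard URL.

```python
import re
from typing import Any, Pattern, Union

# Field selectors taken from the snippet above.
EMAIL_INPUT = 'input[name="email"]'
PASSWORD_INPUT = 'input[name="password"]'


async def prefill_and_hand_off(
    page: Any,
    email: str,
    password: str,
    success_url: Union[str, Pattern[str]],
    manual_timeout_ms: int = 300_000,
) -> None:
    """Fill the login form, then wait for a person to finish signing in.

    `page` is assumed to be a Playwright async Page. `success_url` is
    whatever URL or compiled pattern marks a completed login; it is
    caller-supplied because the real value is not known here.
    """
    await page.fill(EMAIL_INPUT, email)
    await page.fill(PASSWORD_INPUT, password)
    # No click here: the person submits the form and handles any 2FA or CAPTCHA.
    await page.wait_for_url(success_url, timeout=manual_timeout_ms)
```

The long default timeout (five minutes, an arbitrary choice) only makes sense in headed mode, where someone is present to complete the login.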
The Authentication Timeout Issue
The final session (a headless run of gmb_login.py) timed out waiting for the login redirect. Likely causes:
- GetMyBoat may detect headless Chromium and block (User-Agent filtering)
- Network latency or temporary service degradation
- Unhandled 2FA or rate-limiting
Next steps: add User-Agent spoofing, retry logic, and screenshot capture on timeout for debugging.
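As a starting point for those next steps, the retry and screenshot pieces could be combined in one wrapper. This is a sketch of planned work, not something that exists in the pipeline yet. The name run_with_retries, the screenshot file naming, and the backoff values are all assumptions, and page is assumed to be a Playwright async Page.

```python
import asyncio
from pathlib import Path
from typing import Any, Awaitable, Callable, Tuple, Type


async def run_with_retries(
    page: Any,
    action: Callable[[], Awaitable[Any]],
    attempts: int = 3,
    delay_s: float = 2.0,
    screenshot_dir: Path = Path("/tmp"),
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Any:
    """Run `action`, saving a screenshot after each failed attempt.

    `page` is assumed to be a Playwright async Page. In real use,
    `retry_on` would likely be narrowed to Playwright's TimeoutError.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except retry_on:
            # Capture what the browser showed at the moment of failure.
            shot = screenshot_dir / f"gmb_login_fail_{attempt}.png"
            await page.screenshot(path=str(shot))
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            await asyncio.sleep(delay_s * attempt)  # linear backoff
```

Playwright's browser.new_context() also accepts a user_agent argument, which is where a User-Agent override would go; whether that is enough to get past GetMyBoat's checks is untested.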
Infrastructure & File Organization
All development scripts live in /tmp/ for rapid iteration. Production deployment (when ready) will move to:
- ~/Documents/repos/sail-jada/gmb_automation/ (main source)
- ~/Documents/repos/sail-jada/secrets/gmb_creds.env (encrypted, not committed)
- Persistent profile: ~/.gmb_profiles/carole_profile.json (gitignored)
Google API integration (from earlier sessions) remains in ~/Documents/repos/sail-jada/lib/gmail_helper.py for warm lead follow-up automation.