```html

Automating GetMyBoat Lead Ingestion with Playwright: Architecture & Implementation Challenges

Over the past development session, we built out a Python-based lead scraper for GetMyBoat inquiries tied to the Sail JADA rental operation. This post documents the architecture decisions, implementation details, and the blocking issue we encountered that requires follow-up work.

What We Built

Two Python scripts were created in /tmp/ to automate GetMyBoat lead retrieval:

  • /tmp/gmb_login.py — handles browser automation and account authentication
  • /tmp/gmb_lead_scan.py — extracts lead data from the authenticated session

The goal: replace manually checking the carole@sailjada.com inbox for GetMyBoat platform notifications with a scheduled, read-only scraper that classifies warm leads and queues draft replies for human review.

Technical Architecture

Browser Automation Stack

We selected Playwright over Selenium for this task because:

  • Multi-browser support — Chromium, Firefox, WebKit in a single API
  • Native async/await — cleaner coroutine handling than Selenium's blocking model
  • Built-in wait strategies — automatic network idle detection and selector polling
  • Headless + headed modes — easier debugging when selectors fail

Installation required creating a dedicated Python venv to avoid conflicts with existing Google API client libraries:

python3 -m venv /path/to/venv
source /path/to/venv/bin/activate
pip install playwright google-api-python-client
playwright install chromium

We verified Chromium availability and launch capability before writing authentication code:

python3 -c "from playwright.sync_api import sync_playwright; p = sync_playwright().start(); browser = p.chromium.launch(); print('Chromium OK'); browser.close(); p.stop()"

Gmail Token & Account Routing

The system reuses the Google OAuth2 tokens already stored alongside the project's other credentials. Rather than embedding GetMyBoat credentials in plaintext, we adopted the following pattern:

  • Store GetMyBoat credentials in environment variables or a secrets manager (not in version control)
  • Use the existing Gmail token infrastructure to verify that incoming platform notifications originate from GetMyBoat's SMTP servers
  • Cross-reference MX records for sailjada.com to ensure reply routing works correctly
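As a minimal sketch of the first point, credentials can be read from the environment at startup and rejected early if they are missing. The variable names GMB_EMAIL and GMB_PASSWORD and the GmbCredentials type are hypothetical; they are not taken from the actual scripts.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GmbCredentials:
    email: str
    # repr=False keeps the password out of logs and tracebacks
    password: str = field(repr=False)


def load_credentials(env=os.environ) -> GmbCredentials:
    """Read GetMyBoat credentials from the environment; fail loudly if absent."""
    # GMB_EMAIL / GMB_PASSWORD are assumed names, chosen for illustration only
    required = ("GMB_EMAIL", "GMB_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return GmbCredentials(email=env["GMB_EMAIL"], password=env["GMB_PASSWORD"])
```

Passing the mapping in as a parameter keeps the function easy to test and makes it simple to swap in a secrets manager later.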

We verified the MX record setup before building the reply logic:

nslookup -type=MX sailjada.com
# Expected: Google Workspace MX entries (aspmx.l.google.com, etc.)
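The same check could be scripted so a scheduled run can fail fast if the records ever change. The sketch below shells out to nslookup and parses its text output; the function names are ours, and the parsing assumes the usual "mail exchanger = ..." line format, which can vary between platforms.

```python
import subprocess


def parse_mx_hosts(nslookup_output: str) -> list[str]:
    """Extract mail exchanger hostnames from `nslookup -type=MX` output."""
    hosts = []
    for line in nslookup_output.splitlines():
        if "mail exchanger" not in line or "=" not in line:
            continue
        # The hostname is the last token after the final "=" on the line
        parts = line.rsplit("=", 1)[-1].split()
        if parts:
            hosts.append(parts[-1].rstrip(".").lower())
    return hosts


def uses_google_mx(hosts: list[str]) -> bool:
    """True if at least one MX host exists and all of them belong to Google."""
    google_suffixes = (".google.com", ".googlemail.com")
    return bool(hosts) and all(h.endswith(google_suffixes) for h in hosts)


def lookup_mx(domain: str) -> list[str]:
    """Run nslookup for the domain and return the parsed MX hostnames."""
    result = subprocess.run(
        ["nslookup", "-type=MX", domain],
        capture_output=True, text=True, timeout=15, check=False,
    )
    return parse_mx_hosts(result.stdout)
```

A caller would run uses_google_mx(lookup_mx("sailjada.com")) and skip the reply step when it returns False.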

Implementation: The Login Script

/tmp/gmb_login.py handles the critical authentication step. Key design decisions:

  • Sync API — used Playwright's synchronous interface (not async) for simpler error handling in a cron/scheduled context
  • Network idle waits — after login, wait for all network requests to settle before returning the authenticated context
  • Headless mode default — production runs headless; headless=False for debugging
  • Timeout handling — 30-second timeouts on page navigation, 10-second timeouts on selector polls

Pseudocode structure:

from playwright.sync_api import sync_playwright

def login_to_getmyboat(email, password, headless=True):
    """
    Authenticate to GetMyBoat and return (page, browser, playwright).
    On success the caller must call browser.close() and playwright.stop().
    """
    playwright = sync_playwright().start()
    browser = None
    try:
        browser = playwright.chromium.launch(headless=headless)
        context = browser.new_context()
        page = context.new_page()
        page.set_default_timeout(10_000)             # selector waits: 10 s
        page.set_default_navigation_timeout(30_000)  # navigations: 30 s

        # Navigate to login
        page.goto("https://www.getmyboat.com/login", wait_until="networkidle")

        # Fill and submit credentials
        page.fill('input[name="email"]', email)
        page.fill('input[name="password"]', password)
        page.click('button[type="submit"]')

        # Wait for redirect to dashboard
        page.wait_for_url("**/dashboard**", timeout=30000)
        return page, browser, playwright
    except Exception:
        # Release the browser and driver process if login fails, then re-raise
        if browser is not None:
            browser.close()
        playwright.stop()
        raise

The Lead Scan Script

/tmp/gmb_lead_scan.py uses the authenticated page to extract lead metadata. It:

  • Navigates to the inquiries/messages inbox
  • Queries the DOM for lead cards (selector: .inquiry-card or similar)
  • Extracts structured data: sender name, message preview, date, vessel, charter dates
  • Classifies leads as "warm" (multi-message thread, booked within 30 days) or "cold"
  • Returns JSON for downstream processing (reply drafting, CRM sync)
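The classification and JSON steps can be sketched independently of the DOM scraping. This version makes two assumptions that the description above leaves open: a lead is warm only when both signals hold, and "booked within 30 days" means the requested charter start date falls within 30 days of the scan. The Lead fields are illustrative, not the script's actual schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class Lead:
    sender: str
    preview: str
    received: date
    vessel: str
    charter_start: Optional[date]  # None when the inquiry names no date
    message_count: int


def classify(lead: Lead, today: date, window_days: int = 30) -> str:
    """Return "warm" or "cold" under the assumed rule described above."""
    if lead.charter_start is None:
        return "cold"
    days_out = (lead.charter_start - today).days
    soon = 0 <= days_out <= window_days
    return "warm" if lead.message_count > 1 and soon else "cold"


def to_json(leads: list[Lead], today: date) -> str:
    """Serialize leads with ISO dates and a status field."""
    rows = []
    for lead in leads:
        row = asdict(lead)
        row["received"] = lead.received.isoformat()
        row["charter_start"] = (
            lead.charter_start.isoformat() if lead.charter_start else None
        )
        row["status"] = classify(lead, today)
        rows.append(row)
    return json.dumps(rows, indent=2)
```

Taking today as a parameter rather than calling date.today() inside classify keeps the rule deterministic and easy to test.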

Where We Hit a Blocker

The authentication test timed out at the 30-second mark during the credentials submission phase. This prevented us from completing the lead extraction workflow.
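A timeout by itself says little about what the browser was looking at when it fired. One option, sketched here, is to save a screenshot, the page HTML, and the final URL whenever the wait fails. The helper name and file layout are hypothetical; it relies only on the url, title(), content(), and screenshot(path=...) members of Playwright's Page.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def capture_failure(page, out_dir, label="login_timeout"):
    """Save a screenshot, the HTML, and basic metadata for the current page."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    stem = f"{label}_{stamp}"

    # What the browser was actually rendering when the wait gave up
    page.screenshot(path=str(out / f"{stem}.png"))
    (out / f"{stem}.html").write_text(page.content(), encoding="utf-8")

    # The final URL and title often reveal a challenge or verification page
    meta = {"url": page.url, "title": page.title(), "captured_at": stamp}
    (out / f"{stem}.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta
```

In the login script this would be called from an except block that catches playwright.sync_api.TimeoutError around the wait_for_url call, before the browser is closed.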

Possible causes under investigation:

  • Cloudflare or rate limiting — GetMyBoat may detect browser automation and serve a challenge page instead of the dashboard
  • MFA requirement — the account may have two-factor authentication enabled, requiring a TOTP token or email verification
  • Session timeout — the account may have been inactive long enough to require re-verification
  • Network policy — the development machine's IP may be flagged or geofenced
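If the MFA hypothesis turns out to be the cause, and if the second factor is a standard authenticator-app code (neither is confirmed yet), the code can be generated locally. The sketch below implements RFC 6238 TOTP with HMAC-SHA1 using only the standard library; in practice a maintained library such as pyotp would be the more usual choice.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a base32 secret."""
    # Authenticator secrets are often shown with spaces and without padding
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)

    # The moving factor is the number of whole time steps since the epoch
    now = time.time() if at is None else at
    counter = int(now // step)

    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The shared secret would be loaded the same way as the other credentials, from the environment or a secrets manager, never from the script itself.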

Key Decisions & Rationale

  • Read-only scraper — we intentionally built a passive observer, not an automated reply bot. All outgoing messages are queued for human review before sending.
  • Separate venv — isolating Playwright from the main Google API environment prevents dependency conflicts and allows for easy rollback or version-pinning.
  • Sync API over async — while Playwright's async API is performant, the cron-scheduled nature of this task doesn't require concurrent page operations. Sync code is easier to reason about in a scraper context.
  • Playwright over Puppeteer — the project is Python-based, and Puppeteer is officially a Node.js library.
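The read-only decision implies some holding area between the scraper and anything that sends mail. As one possible shape for it, the sketch below appends draft replies to a JSON Lines file with a pending_review status. The file format, field names, and function names are all assumptions for illustration, not the project's actual queue.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def queue_for_review(outbox_path, lead_id, draft):
    """Append a draft reply to a JSON Lines outbox; nothing is sent here."""
    entry = {
        "lead_id": lead_id,
        "draft": draft,
        "status": "pending_review",
        "queued_at": datetime.now(timezone.utc).isoformat(),
    }
    path = Path(outbox_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def pending(outbox_path):
    """Return every queued draft that still awaits human review."""
    path = Path(outbox_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    return [e for e in entries if e["status"] == "pending_review"]
```

Because the scraper only ever appends to this file, a bug in lead classification cannot result in a message reaching a customer without someone approving it first.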

What's Next

  • Debug the login timeout — run gmb_login.py with headless=False (headed mode) to see which page is actually being served when the 30-second timeout fires
  • Add TOTP support — if MFA is enabled, integrate a TOTP library to handle time-based one-time passwords
  • Implement retry logic — exponential backoff for Cloudflare challenges
  • Test with a secondary account — isolate whether the blocker is account-specific or environmental
  • Wire into the warm lead