```html

Automating GetMyBoat Lead Pipeline with Playwright: Browser Automation for SPA Navigation and Email Integration

Over the past development session, we built out an end-to-end automation pipeline to capture, parse, and respond to GetMyBoat warm leads for the Sail JADA charter business. This post details the technical architecture, the challenges we encountered with browser automation on single-page applications, and the infrastructure decisions that shaped the solution.

The Problem: Manual Lead Tracking at Scale

Sail JADA's GetMyBoat inquiries were being tracked manually through email forwarding and loose spreadsheets. With multiple concurrent charter opportunities and a need to respond quickly to interested customers, this workflow didn't scale. We needed:

  • Real-time visibility into the GetMyBoat inbox without logging in through the web UI
  • Structured extraction of conversation metadata (inquiry date, customer name, vessel interest, pricing)
  • Programmatic acknowledgment of warm leads with templated responses
  • Pipeline value calculation based on extracted offer data
  • Automated report generation and email delivery

Technical Architecture: Multi-Stage Scraper with Persistent Sessions

We implemented a modular Python-based scraper suite using Playwright for browser automation. The architecture consists of five discrete stages:

Stage 1: Authentication and Session Persistence

File: /tmp/gmb_login.py

Rather than embed credentials in each script, we created a persistent Chromium profile that survives restarts. The login flow:


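# Login pattern (no credentials shown)
import asyncio
from playwright.async_api import async_playwright

STATE_PATH = "/persistent/path/gmb_session.json"

async def login() -> None:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto("https://www.getmyboat.com/login")
        # Human-in-the-loop: prefill form, wait up to 5 min for the user to complete 2FA
        await page.wait_for_url(lambda url: "/login" not in url, timeout=300_000)
        # Save the session; later runs pass storage_state=STATE_PATH to new_context()
        await context.storage_state(path=STATE_PATH)
        await browser.close()

asyncio.run(login())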

Why persistent profiles? GetMyBoat enforces rate-limiting and session validation. Re-authenticating on every run was both slower and riskier. By saving the browser context's storage state (cookies and localStorage; Playwright's storage state does not capture sessionStorage), we could run in headed mode once for login, then run headless for all subsequent scrapes.
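
Before launching a headless scrape, it can help to check whether the saved state is worth trying at all. The sketch below is an illustration rather than the code used in this pipeline: the function name `session_is_usable` is hypothetical, and it assumes the file follows Playwright's storage-state JSON layout, where each cookie carries an `expires` Unix timestamp (or -1 for session cookies). It cannot detect a session that the server has invalidated, so a failed scrape should still fall back to the headed login.

```python
import json
import time
from pathlib import Path
from typing import Optional


def session_is_usable(state_path: str, now: Optional[float] = None) -> bool:
    """Return True if the saved storage state holds at least one unexpired cookie."""
    path = Path(state_path)
    if not path.is_file():
        return False
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # a truncated or corrupt file means we must log in again
    now = time.time() if now is None else now
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        # -1 marks a session cookie, which has no expiry time to compare against
        if expires == -1 or expires > now:
            return True
    return False
```

A runner could call `session_is_usable("/persistent/path/gmb_session.json")` and only start the headed login script when it returns False.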

Stage 2: SPA Navigation and Inbox Discovery

Files: /tmp/gmb_explore.py, /tmp/gmb_watch.py

GetMyBoat's inbox is a client-side rendered SPA. The inbox URL is not static; it's generated dynamically after navigation through the dashboard. We used two techniques:

  • Network interception: Watch for XHR/fetch calls to the API endpoints that load conversation lists
  • URL polling: Detect when the user navigates to the true inbox path and capture it

# Monitor network requests
page.on("response", lambda response: 
    print(response.url) if "/inbox" in response.url else None
)

# Wait for a specific navigation pattern
await page.wait_for_url("**/inbox/**", timeout=30000)
inbox_url = page.url

This was necessary because the SPA uses client-side routing; a traditional HTTP crawler would never reach the inbox without executing JavaScript.
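
The response listener above prints matches as they arrive. For later inspection it may be more convenient to collect them, and to match on the URL path only so that a query string mentioning "inbox" does not count. The following is a minimal sketch under those assumptions; the class name `InboxResponseRecorder` is hypothetical, and it relies only on the `url` attribute of Playwright's response objects.

```python
from typing import List
from urllib.parse import urlparse


class InboxResponseRecorder:
    """Collects URLs of responses whose path contains a marker such as '/inbox'."""

    def __init__(self, marker: str = "/inbox") -> None:
        self.marker = marker
        self.urls: List[str] = []

    def __call__(self, response) -> None:
        # Compare against the path only, ignoring host, query string, and fragment
        if self.marker in urlparse(response.url).path:
            self.urls.append(response.url)


# Usage with a Playwright page (not executed here):
#   recorder = InboxResponseRecorder()
#   page.on("response", recorder)
#   ... navigate ...
#   print(recorder.urls)
```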

Stage 3: Full Conversation Extraction

File: /tmp/gmb_scrape.py

Once in the inbox, we extract:

  • Pipeline list (all open inquiries with vessel names, customer names, dates)
  • Full conversation threads for each inquiry (message text, timestamps, sender role)
  • Metadata: last activity date, customer contact info, inquiry subject

The scraper waits for the DOM to stabilize (no new messages added for 2 seconds), then iterates through conversation panels:


# Wait for conversation list to be interactive
await page.wait_for_selector("[data-testid='conversation-item']", timeout=10000)

# Extract all visible conversations
conversations = await page.evaluate("""
  () => {
    return Array.from(document.querySelectorAll('[data-testid="conversation-item"]'))
      .map(el => ({
        customer: el.querySelector('[data-name]')?.textContent?.trim(),
        subject: el.querySelector('[data-subject]')?.textContent?.trim(),
        lastMessage: el.querySelector('[data-timestamp]')?.getAttribute('data-timestamp')
      }))
  }
""")
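
The stabilization wait described above (no new items for 2 seconds) is not shown in the snippet, so here is one way it could be sketched. The helper name `wait_until_stable` and its parameters are assumptions for illustration; it takes any async callable that returns a count, which keeps it independent of the page object.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_stable(
    get_count: Callable[[], Awaitable[int]],
    quiet_seconds: float = 2.0,
    poll_interval: float = 0.25,
    timeout: float = 30.0,
) -> int:
    """Poll get_count() until its value has not changed for quiet_seconds."""
    deadline = time.monotonic() + timeout
    last_count = await get_count()
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        if time.monotonic() - last_change >= quiet_seconds:
            return last_count
        await asyncio.sleep(poll_interval)
        count = await get_count()
        if count != last_count:
            # Something was added or removed, so restart the quiet window
            last_count, last_change = count, time.monotonic()
    raise TimeoutError(f"count did not stabilize within {timeout} seconds")
```

With Playwright's async API, the callable could be `lambda: page.locator("[data-testid='conversation-item']").count()`, since `Locator.count()` returns an awaitable.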

Stage 4: Warm Lead Acknowledgment

Files: /tmp/gmb_send_ack.py, /tmp/gmb_options.py, /tmp/gmb_buttons.py

For selected warm leads, we draft templated acknowledgments. The flow:

  1. Click the conversation in the SPA
  2. Wait for the message compose area to render
  3. Inject templated text (parameterized by customer name, vessel type, date)
  4. Optionally send or save as draft for human review
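
Step 3 can be illustrated with the standard library's `string.Template`. The template wording, the constant `ACK_TEMPLATE`, and the function `render_ack` below are all hypothetical; the actual acknowledgment text is not reproduced in this post. Using `substitute` rather than `safe_substitute` means a missing field raises an error instead of producing a half-filled message.

```python
from string import Template

# Hypothetical wording; the real acknowledgment text is not shown in this post
ACK_TEMPLATE = Template(
    "Hi $customer_name, thanks for your interest in a $vessel_type charter "
    "on $charter_date. We will follow up shortly with availability and pricing."
)


def render_ack(customer_name: str, vessel_type: str, charter_date: str) -> str:
    """Fill the acknowledgment template; raises KeyError if a field is missing."""
    return ACK_TEMPLATE.substitute(
        customer_name=customer_name.strip(),
        vessel_type=vessel_type.strip(),
        charter_date=charter_date.strip(),
    )
```

The rendered string can then be typed into the compose area, for example with `await page.fill(compose_selector, text)`, where `compose_selector` stands for whichever selector matches the message box.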

We built separate modules to handle GetMyBoat's dynamic button selectors, which vary based on UI state:


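# Multiple selector patterns for robustness
from playwright.async_api import TimeoutError as PlaywrightTimeoutError

send_button_selectors = [
    "button[aria-label='Send message']",
    "button:has-text('Send')",
    "[data-testid='compose-send']"
]

for selector in send_button_selectors:
    try:
        await page.click(selector, timeout=5000)
        break
    except PlaywrightTimeoutError:
        continue  # selector not present in this UI state; try the next one
else:
    # No selector matched, so fail loudly instead of silently sending nothing
    raise RuntimeError("No send button selector matched")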

Stage 5: Report Generation and Email Delivery

Files: /tmp/gmb_manual.py (post-processing), integration with Gmail API

We parse extracted conversations into a structured markdown report containing:

  • Pipeline summary (total inquiries, pipeline value by vessel type)
  • Per-inquiry cards with: customer name, inquiry date, vessel interest, extracted pricing, conversation excerpt
  • Recommended actions (follow-up needed, pricing mismatches, high-intent signals)
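
The pipeline summary and per-inquiry cards listed above could be produced along these lines. This is a sketch, not the contents of gmb_manual.py: the `Inquiry` dataclass, its field names, and the report headings are assumptions, and inquiries with no extracted price are simply left out of the totals.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Inquiry:
    customer: str
    vessel_type: str
    inquiry_date: str
    offer_usd: Optional[float]  # None when no price could be extracted


def pipeline_value_by_vessel(inquiries: List[Inquiry]) -> Dict[str, float]:
    """Sum extracted offers per vessel type, skipping inquiries without a price."""
    totals: Dict[str, float] = defaultdict(float)
    for inq in inquiries:
        if inq.offer_usd is not None:
            totals[inq.vessel_type] += inq.offer_usd
    return dict(totals)


def render_report(inquiries: List[Inquiry]) -> str:
    """Render a small markdown report: summary first, then one line per inquiry."""
    lines = ["# GetMyBoat Pipeline Report", "", f"Total inquiries: {len(inquiries)}", ""]
    lines.append("## Pipeline value by vessel type")
    for vessel, total in sorted(pipeline_value_by_vessel(inquiries).items()):
        lines.append(f"- {vessel}: ${total:,.2f}")
    lines += ["", "## Inquiries"]
    for inq in inquiries:
        price = f"${inq.offer_usd:,.2f}" if inq.offer_usd is not None else "no price extracted"
        lines.append(f"- {inq.customer} ({inq.inquiry_date}): {inq.vessel_type}, {price}")
    return "\n".join(lines) + "\n"
```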

The report is saved to `/Users/cb/Documents/repos/jada-ops/getmyboat-report.md` and emailed through the Gmail API using a service account with delegated access.
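
The Gmail API's send endpoint accepts a message as a base64url-encoded RFC 2822 string in a `raw` field. The sketch below builds that payload with the standard library only; the function name `build_gmail_payload` and the addresses in the usage note are placeholders, and the credential setup for the delegated service account is deliberately left out.

```python
import base64
from email.message import EmailMessage
from typing import Dict


def build_gmail_payload(sender: str, recipient: str, subject: str, body: str) -> Dict[str, str]:
    """Return a request body for the Gmail API's users.messages.send method."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # plain-text part; the markdown is sent as-is
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"raw": raw}
```

With google-api-python-client, the result would be passed as `service.users().messages().send(userId="me", body=payload).execute()`, where `service` is an authorized Gmail client and the sender is the delegated mailbox.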

Infrastructure and Data Flow

  • Session storage: Chromium profile stored at `/persistent/gmb_session/` (survives restarts)
  • Scrape artifacts: Raw HTML and parsed JSON written to /tmp/gmb_*.json
  • Reports: Generated markdown and HTML saved to ~/Documents/repos/jada-ops/
  • Email delivery: Gmail API (OAuth2 + service account delegation, no IMAP)
  • Logging: Session transcript auto-published to tech blog at `tech.queenofsandiego.com`

Key Challenges and Decisions

Challenge 1: SPA Navigation Timing. GetMyBoat's SPA doesn't redirect on successful login; it updates client state. We resolved this by watching for specific XHR responses