```html

Diagnosing Multi-Layer Email Delivery Failures: SES Infrastructure vs. Gmail Relay Authentication

Last week, we encountered a seemingly simple email delivery problem: messages sent through our Queen of San Diego domain weren't reaching recipients. What looked like a single failure had to be traced through two separate layers of our email stack, and only one of them turned out to be broken. This post walks through the diagnostic process and the architectural lessons we learned.

The Problem Statement

Three emails sent via admin@queenofsandiego.com bounced back with authentication errors. The bounce notifications arrived at the jadasailing@gmail.com account, suggesting the emails never successfully left the sending system. The error message indicated "misconfigured or out of date" credentials, which initially pointed to a single root cause.

Diagnostic Layer 1: Email Infrastructure Validation

Our first instinct was to validate the actual email infrastructure in AWS SES. We systematically checked:

  • SES Identity Verification: Confirmed that admin@queenofsandiego.com was verified in the us-east-1 region (our primary SES region) and that the account had production access there (out of the SES sandbox)
  • DKIM Configuration: Verified all three DKIM CNAME records were properly propagated in Route53:
    • DKIM token 1 → [token 1].dkim.amazonses.com
    • DKIM token 2 → [token 2].dkim.amazonses.com
    • DKIM token 3 → [token 3].dkim.amazonses.com
  • SPF Record: Confirmed that the domain's SPF TXT record authorizes SES (include:amazonses.com)
  • DMARC Policy: Validated DMARC TXT record with alignment policy
  • Custom MAIL FROM Domain: Checked that mail.queenofsandiego.com had proper MX and SPF configuration
  • Suppression Lists: Queried SES suppression lists across both us-east-1 and us-west-2 regions to ensure recipient addresses weren't bounce-suppressed

Result: The SES infrastructure was completely clean. All DNS records were in place, DKIM signatures would validate, and no recipients were on suppression lists.

Diagnostic Layer 2: Gmail "Send Mail As" Configuration

This is where we discovered the actual failure. The email sending flow used Gmail's "Send mail as" feature, which allows the jadasailing@gmail.com account to send on behalf of admin@queenofsandiego.com. Gmail supports two modes for this:

  • Gmail's SMTP relays (default, limited throughput)
  • Custom SMTP server (external relay, like AWS SES)

Our configuration was using the second approach: the alias was configured to relay outbound email through SES's SMTP endpoint (email-smtp.us-east-1.amazonaws.com:587) using IAM-derived credentials.

The Critical Discovery: The SMTP credentials stored in the Gmail alias configuration were stale. An SES SMTP password is not the IAM secret access key itself; it is derived from that key with a Signature Version 4 style HMAC-SHA256 chain, and the SMTP username is the access key ID. When the IAM key is rotated, the stored username and password both stop working. SES rejected the authentication attempt, and Gmail returned the bounce notification to the sending account (jadasailing@gmail.com).

Why This Architecture Created Hidden Fragility

Our setup had three potential email sending pathways:

  1. Direct SES API calls (via boto3/Lambda) — Using IAM role credentials, automatically rotated
  2. SES SMTP relay via Lambda — Using long-lived IAM user credentials, manually managed
  3. Gmail alias with SES SMTP relay — Using stored plaintext-equivalent credentials, must be manually updated

Path #3 was the weakest link. Unlike the Lambda functions on path #1 (which use temporary STS credentials that refresh automatically), the Gmail alias stores credentials that never refresh. If the underlying IAM key is rotated, nothing warns us in advance: the next send simply bounces with an authentication error.

The Infrastructure Audit Trail

To understand what happened, we traced through our key management:

  • SES SMTP User: Located in AWS IAM under service account ses-smtp-user
  • Credential Rotation: Checked CloudTrail for CreateAccessKey and DeleteAccessKey events on this user — found recent rotation
  • Verification: Generated new SES SMTP credentials from the currently active IAM access key using the Signature Version 4 derivation (not the legacy Signature Version 2 scheme)
  • Test: Authenticated directly against email-smtp.us-east-1.amazonaws.com:587 using the new derived password — connection succeeded

The Fix

The solution required manual intervention in Gmail:

Gmail Settings → Accounts and Import → "Send mail as"
  → Edit admin@queenofsandiego.com
    → Update SMTP Server: email-smtp.us-east-1.amazonaws.com
    → Port: 587
    → Username: [SES SMTP username, which is the access key ID of the active IAM key]
    → Password: [newly derived v4 signature password]
  → Test configuration
  → Save

Once updated, emails sent through the Gmail alias began delivering successfully.

What We Learned: Architectural Resilience

This incident revealed that manual credential management is a single point of failure. We're implementing changes:

  • Deprecate Gmail-based relays: Moving all bulk email to Lambda-based SES API calls, which use temporary STS credentials
  • Implement SES credential rotation alerts: When IAM keys rotate, we need notification triggers for any stored SMTP credentials
  • Update documentation: Clearly separate our three email pathways and their credential lifecycle requirements
  • Monitoring: Add SES delivery status tracking to CloudWatch to catch relay failures earlier

Key Takeaway

Email delivery spans several layers, and a failure in one can look like a failure in another. Because the error message pointed to "credentials," we first suspected the SES infrastructure itself. In reality, SES was healthy; the SMTP credentials stored in the Gmail alias had gone stale after a key rotation. Always validate infrastructure separately from configuration, because they fail independently.

```
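
The v4 derivation and the direct SMTP test mentioned in the audit trail can be sketched roughly as follows. This is a minimal sketch based on the derivation published in the AWS SES documentation, not the exact script we ran. The function names `derive_smtp_password` and `check_smtp_login` and the `smtp_factory` parameter are our own illustrative choices, and the sketch assumes the IAM user is allowed to send through SES.

```python
import base64
import hashlib
import hmac
import smtplib

# Fixed inputs for the SES SMTP password derivation, taken from the AWS SES
# documentation. They are constants of the scheme, not per-account settings.
DATE = "11111111"
SERVICE = "ses"
MESSAGE = "SendRawEmail"
TERMINAL = "aws4_request"
VERSION = 0x04


def _sign(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the signing-key chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_smtp_password(secret_access_key: str, region: str) -> str:
    """Derive the region-specific SES SMTP password from an IAM secret key."""
    signature = _sign(("AWS4" + secret_access_key).encode("utf-8"), DATE)
    signature = _sign(signature, region)
    signature = _sign(signature, SERVICE)
    signature = _sign(signature, TERMINAL)
    signature = _sign(signature, MESSAGE)
    # Prefix the version byte, then base64-encode the 33-byte result.
    return base64.b64encode(bytes([VERSION]) + signature).decode("utf-8")


def check_smtp_login(host, port, username, password, smtp_factory=smtplib.SMTP):
    """Return True if STARTTLS + AUTH succeeds, False if the server rejects AUTH.

    smtp_factory is a hypothetical injection point so the check can be
    exercised without a network connection.
    """
    with smtp_factory(host, port, timeout=10) as server:
        server.starttls()
        try:
            server.login(username, password)
        except smtplib.SMTPAuthenticationError:
            return False
    return True
```

Called with the active key's secret and `us-east-1`, `derive_smtp_password` produces the value that goes into Gmail's password field, and the matching access key ID goes into the username field. Because the region is part of the derivation, a password derived for one region will not authenticate against another region's SMTP endpoint.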