Debugging a Subdomain Outage: DNS Shadowing, CloudFront Validation, and Lambda Checkout Recovery

When adamcherrycomics.dangerouscentaur.com went down for hours, the investigation uncovered a classic case of DNS wildcard shadowing, compounded by a misconfigured Lambda checkout function. This post walks through the systematic debugging approach, the infrastructure changes we made, and the patterns and lessons that came out of the recovery.

The Problem: Multiple Layers of Failure

The site was completely unreachable. Initial curl requests timed out. The first instinct was to assume CloudFront was down, but verbose HTTP checks revealed the real picture: DNS wasn't resolving at all.

curl -v https://adamcherrycomics.dangerouscentaur.com/ --max-time 10

This immediately hung. We pivoted to DNS diagnostics:

dig adamcherrycomics.dangerouscentaur.com @8.8.8.8
dig adamcherrycomics.dangerouscentaur.com @pdns2.namecheap.com

Together with a direct request to the distribution, the results showed a critical discrepancy. CloudFront distribution E2Q4UU71SRNTMB was healthy and returned 200s when accessed directly with the site's Host header, but DNS resolution for the subdomain was failing. That pointed to the Namecheap DNS configuration rather than the AWS infrastructure.

DNS Shadowing: The RFC 1034 Problem

The Namecheap DNS records for dangerouscentaur.com contained:

  • A wildcard CNAME: * → dxxxxxxxxxxxxx.cloudfront.net (masked)
  • An explicit subdomain CNAME: www.adamcherrycomics → dxxxxxxxxxxxxx.cloudfront.net (masked)

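The problem: RFC 1034 § 4.3.3 specifies that a wildcard only matches names that do not exist in the zone, and RFC 4592 clarifies that this includes empty non-terminals. The www.adamcherrycomics CNAME implicitly created a node at adamcherrycomics that owns no records of its own, and that node blocked the wildcard from matching. Queries for adamcherrycomics.dangerouscentaur.com found the empty node instead of the wildcard. A standards-compliant server answers such a query with NOERROR and an empty answer section (NODATA) rather than the wildcard's CNAME, so clients had no address to connect to.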

The fix: add an explicit CNAME for adamcherrycomics itself to the Namecheap record set, so the name no longer depends on the wildcard.
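To make the shadowing rule concrete, here is a toy resolver. It is a simplified model, not Namecheap's implementation: it only maps owner names to CNAME targets, ignores record types, CNAME chasing and DNSSEC, and `d-placeholder.cloudfront.net` is a made-up stand-in for the masked distribution domain. It follows the closest-encloser logic described in RFC 4592, where a name that exists (even as an empty non-terminal) never matches a wildcard.

```python
from typing import Dict, Optional, Set, Tuple

APEX = "dangerouscentaur.com"
TARGET = "d-placeholder.cloudfront.net"  # placeholder for the masked CloudFront domain

# Records before the fix: owner name -> CNAME target.
BEFORE: Dict[str, str] = {
    "*.dangerouscentaur.com": TARGET,
    "www.adamcherrycomics.dangerouscentaur.com": TARGET,
}


def existing_nodes(records: Dict[str, str], apex: str) -> Set[str]:
    """Every owner name plus each ancestor up to the apex (empty non-terminals included)."""
    nodes: Set[str] = set()
    for owner in records:
        labels = owner.split(".")
        for i in range(len(labels)):
            name = ".".join(labels[i:])
            nodes.add(name)
            if name == apex:
                break
    return nodes


def lookup(name: str, records: Dict[str, str], apex: str) -> Tuple[str, Optional[str]]:
    """Return ("ANSWER", target), ("NODATA", None) or ("NXDOMAIN", None)."""
    if name in records:
        return ("ANSWER", records[name])
    nodes = existing_nodes(records, apex)
    if name in nodes:
        # The name exists but owns no records: the wildcard must not match it.
        return ("NODATA", None)
    labels = name.split(".")
    for i in range(1, len(labels)):
        encloser = ".".join(labels[i:])
        if encloser in nodes:
            # Closest encloser found; only a wildcard directly beneath it can match.
            wildcard = "*." + encloser
            if wildcard in records:
                return ("ANSWER", records[wildcard])
            return ("NXDOMAIN", None)
    return ("NXDOMAIN", None)


if __name__ == "__main__":
    site = "adamcherrycomics." + APEX
    print(lookup(site, BEFORE, APEX))  # ('NODATA', None): shadowed by the empty node
    after = {**BEFORE, site: TARGET}  # the explicit CNAME added by the fix
    print(lookup(site, after, APEX))  # ('ANSWER', 'd-placeholder.cloudfront.net')
```

In this model, a name without its own node (say, a hypothetical shop.dangerouscentaur.com) still falls through to the wildcard; only names sitting above an explicit deeper record are shadowed.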

Infrastructure Changes

DNS Records (Namecheap):

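We queried the current record set, merged in the new record, and wrote the full set back. The merge is essential because Namecheap's setHosts operation replaces the domain's entire host list, so any existing record missing from the request is deleted: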

GET /dns/list.json?domain=dangerouscentaur.com
POST /dns/setHosts.json with existing records + new adamcherrycomics CNAME

New record added (HostId 511341200):

  • Host: adamcherrycomics
  • Type: CNAME
  • Value: dxxxxxxxxxxxxx.cloudfront.net (masked in this post)
  • TTL: 1800 seconds
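The merge-and-update step can be sketched in Python. This assumes the list response has already been parsed into dicts with Name, Type, Address and TTL keys (an assumed shape) and that the write ultimately reaches Namecheap's setHosts command, which takes numbered HostName1/RecordType1/Address1/TTL1 parameters. The HTTP call is left out, and the sample records below are illustrative, with a placeholder target.

```python
from typing import Dict, List

Record = Dict[str, str]


def merge_record(existing: List[Record], new: Record) -> List[Record]:
    """Return a full record set: everything already present plus `new`.

    A record with the same Name and Type is replaced rather than duplicated.
    """
    kept = [r for r in existing if (r["Name"], r["Type"]) != (new["Name"], new["Type"])]
    return kept + [new]


def sethosts_params(sld: str, tld: str, records: List[Record]) -> Dict[str, str]:
    """Flatten a record set into numbered setHosts-style parameters."""
    params = {"SLD": sld, "TLD": tld}
    for i, r in enumerate(records, start=1):
        params[f"HostName{i}"] = r["Name"]
        params[f"RecordType{i}"] = r["Type"]
        params[f"Address{i}"] = r["Address"]
        params[f"TTL{i}"] = r["TTL"]
    return params


# Sample data standing in for the parsed list response (values are placeholders).
existing = [
    {"Name": "*", "Type": "CNAME", "Address": "d-placeholder.cloudfront.net", "TTL": "1800"},
    {"Name": "www.adamcherrycomics", "Type": "CNAME", "Address": "d-placeholder.cloudfront.net", "TTL": "1800"},
]
new = {"Name": "adamcherrycomics", "Type": "CNAME", "Address": "d-placeholder.cloudfront.net", "TTL": "1800"}

merged = merge_record(existing, new)
params = sethosts_params("dangerouscentaur", "com", merged)

# Guard: the write must never drop a host that was already there.
assert {r["Name"] for r in existing} <= {r["Name"] for r in merged}
```

Building the complete parameter set from the freshly listed records, and checking it before sending, is what keeps a single-record change from silently wiping the wildcard.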

After adding the record, we checked it against both Namecheap authoritative nameservers and Google's public recursive resolver:

dig adamcherrycomics.dangerouscentaur.com @pdns1.namecheap.com
dig adamcherrycomics.dangerouscentaur.com @pdns2.namecheap.com
dig adamcherrycomics.dangerouscentaur.com @8.8.8.8

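Even after the authoritative servers return the new record, a recursive resolver such as 8.8.8.8 can keep serving a cached empty answer until its negative-caching TTL (derived from the zone's SOA record) expires. We therefore also polled until the site responded over HTTPS, confirming end-to-end connectivity before declaring the DNS layer fixed.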
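A polling loop like that can be sketched with the standard library alone. The attempt count and delay are arbitrary assumptions, and `fetch` and `sleep` are injectable so the loop can be exercised without network access.

```python
import http.client
import time
import urllib.request
from typing import Callable


def https_status(url: str, timeout: float = 10.0) -> int:
    """GET `url` and return the HTTP status; DNS, TLS and HTTP errors raise."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def wait_until_up(
    url: str,
    fetch: Callable[[str], int] = https_status,
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == 200:
                return True
        except (OSError, http.client.HTTPException):
            # urllib.error.URLError (including HTTPError for 4xx/5xx), DNS failures
            # and socket timeouts are OSError subclasses: treat them as "not up yet".
            pass
        if attempt < attempts:
            sleep(delay)
    return False
```

Calling `wait_until_up("https://adamcherrycomics.dangerouscentaur.com/")` waits up to roughly five minutes with these defaults. Going through the system resolver and a real TLS handshake is deliberate: it exercises the same path a visitor's browser takes.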

The Secondary Issue: Lambda Checkout Failures

Once DNS was fixed, we discovered a second layer of failure: checkout was broken. The site's index.html contained Stripe integration code that called a Lambda function endpoint, so we started by tracing the checkout flow from the page:

grep -i stripe /Users/cb/Documents/repos/sites/adamcherrycomics.com/index.html
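To pull the endpoint out of the page rather than reading grep output by eye, a small scan for the Lambda function URL host pattern (https://<url-id>.lambda-url.<region>.on.aws/) is enough. This sketch assumes the URL appears as a literal string in the HTML; a URL assembled at runtime in JavaScript would not be found.

```python
import re
from typing import List

# Lambda function URLs have the form https://<url-id>.lambda-url.<region>.on.aws/
FUNCTION_URL = re.compile(r"https://[a-z0-9]+\.lambda-url\.[a-z0-9-]+\.on\.aws/?")


def find_function_urls(html: str) -> List[str]:
    """Return each distinct Lambda function URL in `html`, in order of appearance."""
    seen: List[str] = []
    for match in FUNCTION_URL.findall(html):
        if match not in seen:
            seen.append(match)
    return seen
```

Running it over the contents of index.html gives the exact endpoint to compare against the function's configured URL.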

The page called a Lambda function URL. We checked the function adam-cherry-checkout:

  • Runtime: Python 3.x
  • Source: /lambda/checkout.py
  • Auth Config: CORS enabled, but preflight (OPTIONS) responses were initially missing the CORS headers
  • Resource Policy: Verified to allow invocation from the CloudFront distribution
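For reference, this is what a handler-side answer to a browser preflight looks like for a function URL event (payload format 2.0, where the HTTP method sits at requestContext.http.method). It is a hypothetical sketch, not the contents of checkout.py: the allowed origin is an assumption and the checkout logic is stubbed out. When CORS is configured on the function URL itself, Lambda is expected to handle preflights on the function's behalf, so it is safest to set CORS headers in only one of the two places.

```python
import json
from typing import Any, Dict

# Assumption: the page is served from this origin; adjust to the real site origin(s).
ALLOWED_ORIGIN = "https://adamcherrycomics.dangerouscentaur.com"

CORS_HEADERS = {
    "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    "Access-Control-Max-Age": "86400",
}


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Answer CORS preflights and attach CORS headers to every other response."""
    method = event.get("requestContext", {}).get("http", {}).get("method", "")
    if method == "OPTIONS":
        # Preflight: no body, just the headers the browser asked about.
        return {"statusCode": 204, "headers": CORS_HEADERS, "body": ""}
    # Placeholder for the real checkout logic (session creation omitted).
    result = {"status": "ok"}
    return {
        "statusCode": 200,
        "headers": {**CORS_HEADERS, "Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```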

The Lambda function was failing on import: the typing_extensions module was missing from the deployment package.

Lambda Deployment and Fixes

We rebuilt the deployment artifact with all dependencies included:

cd /Users/cb/Documents/repos/sites/adamcherrycomics.com/lambda
pip install -r requirements.txt -t .
zip -r checkout.zip . -x "*.pyc"
aws lambda update-function-code --function-name adam-cherry-checkout --zip-file fileb://checkout.zip
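Since the original failure was a module missing from the package, a quick check of the archive before uploading can catch a repeat. A sketch using only the standard library's zipfile module; it recognizes package directories and single-file .py modules, but not compiled extension modules (.so files).

```python
import zipfile
from typing import Iterable, Set


def top_level_modules(names: Iterable[str]) -> Set[str]:
    """Importable top-level names in a zip listing: package dirs and single-file .py modules."""
    found: Set[str] = set()
    for name in names:
        head, sep, _ = name.partition("/")
        if sep:
            found.add(head)  # e.g. "stripe/__init__.py" -> "stripe"
        elif head.endswith(".py"):
            found.add(head[: -len(".py")])  # e.g. "typing_extensions.py"
    return found


def missing_from_zip(zip_path: str, required: Iterable[str]) -> Set[str]:
    """Return the required modules that the deployment zip does not contain."""
    with zipfile.ZipFile(zip_path) as zf:
        present = top_level_modules(zf.namelist())
    return set(required) - present
```

For example, `missing_from_zip("checkout.zip", ["checkout", "stripe", "typing_extensions"])` should return an empty set before update-function-code runs; that module list is an assumption based on the Stripe dependency.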

The function was then tested with the Stripe checkout payload for the "hells-lounge" product:

aws lambda invoke --function-name adam-cherry-checkout \
  --cli-binary-format raw-in-base64-out \
  --payload '{"product":"hells-lounge"}' \
  response.json
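One caveat with testing this way: a direct invoke delivers the payload as the event itself, while a function URL wraps the HTTP request in an event whose body is a string (base64-encoded when isBase64Encoded is true). If the handler only expects one shape, a direct-invoke test can pass while browser requests fail. A hypothetical helper that accepts both:

```python
import base64
import json
from typing import Any, Dict


def request_payload(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return the checkout request JSON for either invocation style."""
    if "requestContext" not in event:
        # `aws lambda invoke --payload ...`: the payload itself is the event.
        return event
    # Function URL event: the HTTP body arrives as a string, possibly base64-encoded.
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)
```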

Multiple iterations addressed configuration issues:

  • Added missing typing_extensions to the zip
  • Fixed ui_mode parameter in Stripe session configuration (changed from deprecated parameter to correct Stripe API format)
  • Updated the redirect URL to use hosted_page instead of deprecated Stripe redirect patterns

We read the CloudWatch Logs group /aws/lambda/adam-cherry-checkout to see the actual runtime errors instead of inferring them from generic HTTP 500s.
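Pulling those errors out of the log group can also be scripted. This sketch uses boto3's filter_log_events paginator (imported lazily, so the parser works without boto3 installed) plus a regex for the "No module named 'X'" text that Python's import errors contain. The filter pattern is an illustrative choice, not the one used in the incident.

```python
import re
from typing import Any, Iterable, Iterator, List, Optional

# Matches the module name in Python's "No module named 'X'" import errors.
NO_MODULE = re.compile(r"No module named '([^']+)'")


def missing_modules(messages: Iterable[str]) -> List[str]:
    """Distinct module names from import errors, in order of first appearance."""
    found: List[str] = []
    for message in messages:
        for name in NO_MODULE.findall(message):
            if name not in found:
                found.append(name)
    return found


def error_messages(log_group: str, start_ms: int, client: Optional[Any] = None) -> Iterator[str]:
    """Yield messages of log events since `start_ms` (epoch milliseconds) that look like errors."""
    if client is None:
        import boto3  # lazy import so missing_modules() works without boto3 installed

        client = boto3.client("logs")
    paginator = client.get_paginator("filter_log_events")
    # "?ERROR ?Traceback" matches events containing either term.
    for page in paginator.paginate(
        logGroupName=log_group, startTime=start_ms, filterPattern="?ERROR ?Traceback"
    ):
        for event in page.get("events", []):
            yield event["message"]
```

`missing_modules(error_messages("/aws/lambda/adam-cherry-checkout", start_ms))` turns a page of stack traces into a short list of packages to add to the zip.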

Content Deployment and Cache Invalidation

Once checkout.py was fixed, we updated index.html in S3:

aws s3 cp /Users/cb/Documents/repos/sites/adamcherrycomics.com/index.html \
  s3://adamcherrycomics-content/index.html --cache-control "max-age=3600"

Then invalidated the CloudFront cache for distribution E2Q4UU71SRNTMB:

aws cloudfront create-invalidation --distribution-id E2Q4UU71SRNTMB --paths "/*"

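This cleared the cached copies at CloudFront's edge locations, so once the invalidation completed, the next request for each path was fetched fresh from S3. Invalidation does not touch browser caches, though: a visitor who had already cached the old index.html keeps it until its max-age expires (the new upload sets 3600 seconds).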
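If deploys are scripted, the invalidation and a wait for it to finish can be done with boto3's CloudFront client (`create_invalidation` plus the `invalidation_completed` waiter). A sketch, assuming a timestamp-based CallerReference is acceptable (it only needs to be unique per request); boto3 is imported lazily and the client is injectable for testing.

```python
import time
from typing import Any, Dict, List, Optional


def invalidation_batch(paths: List[str]) -> Dict[str, Any]:
    """Build the InvalidationBatch structure for create_invalidation()."""
    for p in paths:
        if not p.startswith("/"):
            raise ValueError(f"invalidation paths must start with '/': {p!r}")
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        # Must be unique per request; a nanosecond timestamp is a simple choice.
        "CallerReference": f"deploy-{time.time_ns()}",
    }


def invalidate_and_wait(distribution_id: str, paths: List[str], client: Optional[Any] = None) -> str:
    """Create an invalidation, block until CloudFront reports it complete, return its ID."""
    if client is None:
        import boto3  # lazy import so invalidation_batch() works without boto3

        client = boto3.client("cloudfront")
    resp = client.create_invalidation(
        DistributionId=distribution_id, InvalidationBatch=invalidation_batch(paths)
    )
    inv_id = resp["Invalidation"]["Id"]
    client.get_waiter("invalidation_completed").wait(DistributionId=distribution_id, Id=inv_id)
    return inv_id
```

Waiting on completion confirms the edge caches are clear before anyone checks the site, which removes one source of "is it fixed yet?" confusion during a recovery.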

Architecture Patterns and Lessons

Layered Debugging: We systematically isolated each layer—DNS, HTTP, CloudFront, Lambda, application code—rather than assuming the whole stack failed.

Direct Nameserver Queries: Querying Namecheap's authoritative nameservers directly bypassed recursive resolver caches and showed the true state of the zone without waiting for cached answers to expire.

CloudWatch Logs as Ground Truth: Instead of guessing from HTTP status codes, we checked actual Lambda