Migrating tech.queenofsandiego.com to a 5-Layer Model Workspace Protocol: Token Efficiency and Workflow Scalability

What Was Done

We converted the tech.queenofsandiego.com blog workflow from a monolithic context architecture to a 5-layer Model Workspace Protocol, cutting session-startup token overhead by approximately 97% (from ~10,925 to ~310 tokens) while maintaining full feature parity. The pilot replaced three always-loaded context files totaling ~10,925 tokens with a structured directory tree that distributes context across five discrete layers, each loaded only when relevant to the current task.

The Problem: Context Bloat

The original workflow loaded a single monolithic context file (~6,563 tokens) on every session, regardless of whether we were picking a topic, researching, drafting, or publishing. This file conflated Layer 0 (navigation/routing logic) with Layer 3 (pricing rules, deployment safety guardrails, competitor analysis), information that is entirely irrelevant when brainstorming an article title.

Full baseline overhead, loaded on every session:

/repos/CLAUDE.md: ~942 tokens (org-level routing, applied to all properties)
queenofsandiego.com/CLAUDE.md: ~3,420 tokens (site-level rules for deployments, safety, business logic)
tech_blog_generator.py: ~6,563 tokens (legacy monolithic context)

Total per-session overhead: ~10,925 tokens. Analysis showed ~90% of this was wasted on inapplicable context.

Architecture: The 5-Layer Protocol

We restructured the workflow into a tree at /repos/workspaces/tech-blog/:

workspaces/tech-blog/
├── CLAUDE.md                    # Layer 0: navigation map only (~80 tokens)
├── CONTEXT.md                   # Layer 1: router logic
├── reference/
│   └── voice.md                 # Layer 1: brand voice guidelines
└── [01-topic|02-research|03-draft|04-publish]/
    ├── CONTEXT.md               # Layer 2: stage-specific rules
    └── artifacts/               # Layer 4: outputs (auto-generated per run)
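To make the tree concrete, here is a minimal sketch of a script that scaffolds the same layout. The function name `scaffold` and the choice to create empty placeholder files are our assumptions for illustration; the real files were written by hand.

```python
from pathlib import Path

# Stage directory names taken from the tree above.
STAGES = ["01-topic", "02-research", "03-draft", "04-publish"]


def scaffold(root: Path) -> list[Path]:
    """Create the workspace skeleton under root and return the context files, sorted.

    Hypothetical helper: it only creates empty placeholders and never
    overwrites the content of a file that already exists.
    """
    files = [
        root / "CLAUDE.md",               # Layer 0: navigation map
        root / "CONTEXT.md",              # Layer 1: router logic
        root / "reference" / "voice.md",  # Layer 1: brand voice
    ]
    for stage in STAGES:
        files.append(root / stage / "CONTEXT.md")  # Layer 2: stage rules
        # Layer 4: per-run outputs live here and are never auto-loaded.
        (root / stage / "artifacts").mkdir(parents=True, exist_ok=True)
    for f in files:
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch(exist_ok=True)
    return sorted(files)
```

Calling `scaffold(Path("/repos/workspaces/tech-blog"))` would produce seven context files plus four empty `artifacts/` directories; running it a second time changes nothing.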

Layer 0 (Map): A minimal CLAUDE.md containing only routing logic—"if in stage 01, load 01-topic/CONTEXT.md; if publishing, apply guardrails from stage 04."

Layer 1 (Narrative): Global context that matters across all stages. The root CONTEXT.md covers user intent, the workflow overview, and stage routing; reference/voice.md covers tone, vocabulary, and audience (Sergio and other engineers).

Layer 2 (Stage-Specific Rules): Each stage directory contains a CONTEXT.md with only the prompts, guardrails, and examples relevant to that phase. For example:

01-topic/CONTEXT.md: brainstorm prompts, audience definitions, topic vetting criteria
02-research/CONTEXT.md: citation requirements, fact-check processes, source ranking
03-draft/CONTEXT.md: structural templates, code-example formatting, technical depth guidelines
04-publish/CONTEXT.md: deployment paths, CloudFront invalidation logic, safety checks before going live

Layer 3 (Guardrails): Environment-specific rules loaded conditionally—pricing for tier decisions, deploy safety for publication, competitor analysis for positioning.

Layer 4 (Artifacts): Per-run outputs (drafts, research notes, checklists) stored in stage-specific directories, never auto-loaded into context.
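The five layers differ mainly in when they enter context. The sketch below models that as a small table; the `Load` policy names and the `startup_layers` helper are hypothetical labels we chose for illustration, not part of the protocol itself.

```python
from dataclasses import dataclass
from enum import Enum


class Load(Enum):
    """When a layer enters context (hypothetical policy names)."""
    ALWAYS = "always, at session startup"
    ON_STAGE_ENTRY = "when its stage is entered"
    CONDITIONAL = "only when the task needs it"
    NEVER_AUTO = "never auto-loaded"


@dataclass(frozen=True)
class Layer:
    number: int
    name: str
    policy: Load


# Layer names and numbering follow the descriptions above.
LAYERS = (
    Layer(0, "Map", Load.ALWAYS),
    Layer(1, "Narrative", Load.ALWAYS),
    Layer(2, "Stage-Specific Rules", Load.ON_STAGE_ENTRY),
    Layer(3, "Guardrails", Load.CONDITIONAL),
    Layer(4, "Artifacts", Load.NEVER_AUTO),
)


def startup_layers() -> list[int]:
    """Return the layer numbers that are paid for on every session."""
    return [layer.number for layer in LAYERS if layer.policy is Load.ALWAYS]
```

Only Layers 0 and 1 are loaded unconditionally, which is why the startup cost stays small no matter how much the stage rules and guardrails grow.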

Technical Implementation

Directory Structure & File Organization

Each stage is an isolated context boundary. Moving from topic selection to research required a single instruction: "Switch to 02-research/"—which triggered Claude to unload topic-specific templates and load research guidelines instead.

Artifacts (work-in-progress blog posts, research docs, fact-check notes) were stored in artifacts/ subdirectories within each stage, indexed in a manifest file that allowed selective loading without auto-inclusion.
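As an illustration of manifest-driven selective loading, here is a rough sketch. The file name `manifest.json`, its fields, and both function names are assumptions for this example; the actual manifest format is not specified in this post.

```python
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def write_manifest(artifacts_dir: Path) -> Path:
    """Index every file in artifacts_dir (except the manifest itself)."""
    entries = [
        {"file": p.name, "bytes": p.stat().st_size}
        for p in sorted(artifacts_dir.iterdir())
        if p.is_file() and p.name != MANIFEST_NAME
    ]
    manifest = artifacts_dir / MANIFEST_NAME
    manifest.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return manifest


def load_selected(artifacts_dir: Path, wanted: set[str]) -> dict[str, str]:
    """Read only the requested artifacts that the manifest actually lists.

    Nothing is loaded unless it is named explicitly in wanted, which is
    the "selective loading without auto-inclusion" behavior.
    """
    manifest = artifacts_dir / MANIFEST_NAME
    listed = {e["file"] for e in json.loads(manifest.read_text(encoding="utf-8"))}
    return {
        name: (artifacts_dir / name).read_text(encoding="utf-8")
        for name in sorted(wanted & listed)
    }
```

The manifest is small enough to read on demand, so a session can see what exists in `artifacts/` without pulling any draft or research note into context until it is asked for by name.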

Router Logic in Layer 1

The CONTEXT.md at the root level contains conditional routing:

# If `STAGE` env variable is set, apply stage-specific context
if $STAGE == "01-topic":
  load "01-topic/CONTEXT.md"
elif $STAGE == "02-research":
  load "02-research/CONTEXT.md"
# ... etc

This ensures that only the relevant stage context and voice guidelines are loaded, plus Layer 0's navigation map.
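The routing rules above are instructions for the model rather than executable code. For readers who want to reason about them programmatically, the following sketch expresses the same decision in Python. The function names are hypothetical, and treating an unset or empty `STAGE` as "no stage selected" is our assumption.

```python
import os
from typing import Mapping, Optional

STAGES = ("01-topic", "02-research", "03-draft", "04-publish")

# Layer 0 map plus Layer 1 narrative and voice: loaded on every session.
ALWAYS_LOADED = ["CLAUDE.md", "CONTEXT.md", "reference/voice.md"]


def files_to_load(stage: Optional[str]) -> list[str]:
    """Return the context files for a session, given the active stage."""
    if stage is None:
        return list(ALWAYS_LOADED)
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    # Exactly one Layer 2 file is added; other stages stay out of context.
    return ALWAYS_LOADED + [f"{stage}/CONTEXT.md"]


def files_from_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Resolve the file list from the STAGE environment variable."""
    return files_to_load(env.get("STAGE") or None)
```

For example, `files_from_env({"STAGE": "02-research"})` yields the three always-loaded files followed by `02-research/CONTEXT.md`, and nothing from the other three stages.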

Reference Layer (Voice & Patterns)

Brand voice, code-formatting rules, and target-audience definitions live in reference/voice.md—loaded once per session, cached by Claude, and referenced by stage-specific contexts. This eliminated duplication while keeping voice definitions centralized.

Deployment & Infrastructure

Blog posts generated via this workflow are published to the following infrastructure:

Source repository: /repos/workspaces/tech-blog/ on a local macOS machine, synced via iCloud
Generated artifacts: Markdown files written to 04-publish/artifacts/, then synced to production
Web serving: Posts are published to tech.queenofsandiego.com and served through a CloudFront distribution (exact distribution ID omitted for security)
DNS: Routed through the Route 53 hosted zone for queenofsandiego.com
Build pipeline: A static site generator reads Markdown from 04-publish/artifacts/, outputs HTML, and syncs it to an S3 bucket (bucket name omitted)
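After the S3 sync, changed paths need to be invalidated in CloudFront. The sketch below only builds the request payload; the helper name and the `tech-blog-` reference prefix are hypothetical, and the distribution ID is deliberately left out, as it is elsewhere in this post.

```python
import time
from typing import Optional


def build_invalidation_batch(
    paths: list[str], caller_reference: Optional[str] = None
) -> dict:
    """Build the InvalidationBatch structure for a CloudFront invalidation.

    With boto3 this dict would be passed as the InvalidationBatch argument of
    boto3.client("cloudfront").create_invalidation(DistributionId=..., ...).
    """
    # CloudFront paths start with "/"; deduplicate and sort for stable output.
    normalized = sorted({p if p.startswith("/") else "/" + p for p in paths})
    if not normalized:
        raise ValueError("at least one path is required")
    return {
        "Paths": {"Quantity": len(normalized), "Items": normalized},
        # CallerReference must be unique per request; a timestamp is one option.
        "CallerReference": caller_reference or f"tech-blog-{int(time.time())}",
    }
```

Keeping the payload construction separate from the AWS call makes it easy to inspect exactly which paths a publish run would invalidate before anything is sent.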

Deployment uses a pre-publish safety check that validates YAML frontmatter, checks for credential leaks via regex patterns, and confirms all internal links resolve. This check is defined in 04-publish/CONTEXT.md and executed before CloudFront invalidation.
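A simplified sketch of such a pre-publish check follows. The two leak patterns, the Markdown link regex, and the `check_post` signature are illustrative assumptions, not the rules defined in 04-publish/CONTEXT.md. It also checks only that front matter delimiters are present; real YAML validation would need a parser such as the third-party PyYAML package.

```python
import re
from pathlib import Path

# Hypothetical leak patterns, shown only as examples of the regex approach.
LEAK_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # shape of an AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")        # Markdown [text](target)
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def check_post(path: Path, site_root: Path) -> list[str]:
    """Return a list of problems found in one Markdown post (empty means OK)."""
    text = path.read_text(encoding="utf-8")
    lines = [line.strip() for line in text.splitlines()]
    problems = []

    # 1. Front matter: first line is '---' and a closing '---' follows.
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        problems.append("missing or unterminated front matter")

    # 2. Credential leaks: flag anything matching a known secret shape.
    for pattern in LEAK_PATTERNS:
        if pattern.search(text):
            problems.append(f"possible credential matching /{pattern.pattern}/")

    # 3. Internal links: skip external URLs and in-page anchors, check the rest.
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        rel = target.split("#", 1)[0]
        base = site_root if rel.startswith("/") else path.parent
        if not (base / rel.lstrip("/")).exists():
            problems.append(f"broken internal link: {target}")
    return problems
```

A publish script could run `check_post` over every file in 04-publish/artifacts/ and abort before the sync and invalidation steps if any list comes back non-empty.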

Token Savings & Measured Results

Baseline (old monolithic approach): 10,925 tokens per session startup.

New 5-layer approach (measured):

Layer 0 (CLAUDE.md map): ~80 tokens
Layer 1 (CONTEXT.md + voice.md): ~230 tokens
Layer 2 (stage-specific CONTEXT.md, ~150 tokens per stage): loaded only when needed
Per-session startup: ~310 tokens (97% reduction)
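The 97% figure follows directly from the measurements above. This short calculation reproduces it; the token counts are the ones reported in this post, and the variable names are ours.

```python
# Baseline: every file was loaded on every session.
baseline = {
    "/repos/CLAUDE.md": 942,
    "queenofsandiego.com/CLAUDE.md": 3420,
    "tech_blog_generator.py": 6563,
}
# New startup cost: only Layer 0 and Layer 1.
startup = {"Layer 0 (CLAUDE.md map)": 80, "Layer 1 (CONTEXT.md + voice.md)": 230}

baseline_total = sum(baseline.values())  # 10925
startup_total = sum(startup.values())    # 310
startup_reduction_pct = round(100 * (1 - startup_total / baseline_total), 1)

print(f"{baseline_total} -> {startup_total} tokens ({startup_reduction_pct}% reduction)")
# prints "10925 -> 310 tokens (97.2% reduction)"
```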

The savings compound across multi-stage workflows. A typical blog-post session that moves through all four stages (topic → research → draft → publish) now loads:

~310 tokens at start
+~150 tokens for each stage entered (loaded only on entry; 4 × ~150 = ~600)
Total per multi-stage session: ~910 tokens (vs. 10,925 with the monolithic approach, a ~92% reduction)
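The same arithmetic generalizes to sessions that touch only some stages. The function below is a small sketch using the approximate per-stage figure from this post; its name and default values are our own choices.

```python
BASELINE_TOKENS = 10925  # monolithic approach, paid on every session


def session_tokens(stages_entered: int, startup: int = 310, per_stage: int = 150) -> int:
    """Total context tokens for a session that enters the given number of stages."""
    if stages_entered < 0:
        raise ValueError("stages_entered must be non-negative")
    return startup + per_stage * stages_entered


full_session = session_tokens(4)  # 310 + 4 * 150 = 910
full_reduction_pct = round(100 * (1 - full_session / BASELINE_TOKENS), 1)  # 91.7
research_only = session_tokens(1)  # 460
```

Even a session that walks through every stage stays under a tenth of the old baseline, and a quick single-stage session costs roughly 460 tokens.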