Migrating tech.queenofsandiego.com to a 5-Layer Model Workspace Protocol: Token Efficiency and Workflow Scalability
What Was Done
We converted the tech.queenofsandiego.com blog workflow from a monolithic context architecture to a 5-layer Model Workspace Protocol, reducing token overhead by approximately 97% per session startup (from ~10,925 tokens to ~310) while maintaining full feature parity. The pilot replaced ~10,925 tokens of always-loaded context, spread across three files, with a structured directory tree that distributes context across five discrete layers, each loaded only when relevant to the current task.
The Problem: Context Bloat
The original workflow loaded a single enormous CLAUDE.md file (~6,563 tokens) on every session, regardless of whether we were picking a topic, researching, drafting, or publishing. This file conflated Layer 0 (navigation/routing logic) with Layer 3 (pricing rules, deployment safety guardrails, competitor analysis)—information entirely irrelevant when brainstorming an article title.
Full baseline overhead per session, by source:
- /repos/CLAUDE.md: ~942 tokens (org-level routing, applied to all properties)
- queenofsandiego.com/CLAUDE.md: ~3,420 tokens (site-level rules for deployments, safety, business logic)
- tech_blog_generator.py: ~6,563 tokens (legacy monolithic context)
Total per-session overhead: ~10,925 tokens. Analysis showed ~90% of this was wasted on inapplicable context.
Architecture: The 5-Layer Protocol
We restructured the workflow into a tree at /repos/workspaces/tech-blog/:
workspaces/tech-blog/
├── CLAUDE.md            # Layer 0: navigation map only (~80 tokens)
├── CONTEXT.md           # Layer 1: router logic
├── reference/
│   └── voice.md         # Layer 1: brand voice guidelines
└── [01-topic|02-research|03-draft|04-publish]/
    ├── CONTEXT.md       # Layer 2: stage-specific rules
    └── artifacts/       # Layer 4: outputs (auto-generated per run)
Layer 0 (Map): A minimal CLAUDE.md containing only a navigation map of where each stage's context lives, for example: "if in stage 01, load 01-topic/CONTEXT.md; if publishing, apply guardrails from stage 04."
Layer 1 (Narrative): Global context that matters across all stages. CONTEXT.md holds user intent and the workflow overview; reference/voice.md holds tone, vocabulary, and audience (Sergio and other engineers).
Layer 2 (Stage-Specific Rules): Each stage directory contains a CONTEXT.md with only the prompts, guardrails, and examples relevant to that phase. For example:
- 01-topic/CONTEXT.md: brainstorm prompts, audience definitions, topic vetting criteria
- 02-research/CONTEXT.md: citation requirements, fact-check processes, source ranking
- 03-draft/CONTEXT.md: structural templates, code-example formatting, technical depth guidelines
- 04-publish/CONTEXT.md: deployment paths, CloudFront invalidation logic, safety checks before going live
Layer 3 (Guardrails): Environment-specific rules loaded conditionally—pricing for tier decisions, deploy safety for publication, competitor analysis for positioning.
Layer 4 (Artifacts): Per-run outputs (drafts, research notes, checklists) stored in stage-specific directories, never auto-loaded into context.
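To make the layout concrete, here is a minimal sketch that scaffolds the tree described above as empty files. The function name `scaffold` and the use of a temporary directory are assumptions for illustration; the post does not say how the real tree was created, and no Layer 3 files are created here because their location is not shown.

```python
import tempfile
from pathlib import Path

# Stage directory names as listed in the workspace tree.
STAGES = ["01-topic", "02-research", "03-draft", "04-publish"]


def scaffold(root: Path) -> list:
    """Create the workspace skeleton under `root` (hypothetical helper).

    Returns the list of context files created, as Path objects.
    """
    files = [
        root / "CLAUDE.md",                # Layer 0: navigation map
        root / "CONTEXT.md",               # Layer 1: router logic
        root / "reference" / "voice.md",   # Layer 1: brand voice
    ]
    for stage in STAGES:
        files.append(root / stage / "CONTEXT.md")  # Layer 2: stage rules
        # Layer 4: per-run outputs live here and are never auto-loaded.
        (root / stage / "artifacts").mkdir(parents=True, exist_ok=True)
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "workspaces" / "tech-blog"
        for created in sorted(scaffold(workspace)):
            print(created.relative_to(workspace))
```

Running it against a temporary directory keeps the sketch side-effect free; pointing `root` at a real workspace path would create the same skeleton there.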
Technical Implementation
Directory Structure & File Organization
Each stage is an isolated context boundary. Moving from topic selection to research required a single instruction: "Switch to 02-research/"—which triggered Claude to unload topic-specific templates and load research guidelines instead.
Artifacts (work-in-progress blog posts, research docs, fact-check notes) were stored in artifacts/ subdirectories within each stage, indexed in a manifest file that allowed selective loading without auto-inclusion.
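The manifest format is not described here, so the following is only a sketch of how such an index could support selective loading. It assumes a JSON manifest with hypothetical fields (`path`, `stage`, `summary`, `approx_tokens`), hypothetical artifact file names, and a hypothetical `select_artifacts` helper that picks entries for one stage within a token budget.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactEntry:
    """One manifest row. All field names are assumptions for this sketch."""
    path: str
    stage: str
    summary: str
    approx_tokens: int


def parse_manifest(text: str) -> list:
    """Parse a JSON array of manifest rows into ArtifactEntry objects."""
    return [ArtifactEntry(**row) for row in json.loads(text)]


def select_artifacts(entries, stage: str, token_budget: int) -> list:
    """Return entries for `stage`, in manifest order, that fit the budget.

    Entries that would push the running total over the budget are skipped,
    so nothing is loaded unless it is explicitly selected here.
    """
    chosen, used = [], 0
    for entry in entries:
        if entry.stage != stage:
            continue
        if used + entry.approx_tokens > token_budget:
            continue
        chosen.append(entry)
        used += entry.approx_tokens
    return chosen


# Hypothetical manifest contents, for illustration only.
EXAMPLE_MANIFEST = """[
  {"path": "02-research/artifacts/sources.md", "stage": "02-research",
   "summary": "ranked source list", "approx_tokens": 400},
  {"path": "02-research/artifacts/fact-check.md", "stage": "02-research",
   "summary": "fact-check notes", "approx_tokens": 900},
  {"path": "03-draft/artifacts/draft-v1.md", "stage": "03-draft",
   "summary": "first draft", "approx_tokens": 2500}
]"""

if __name__ == "__main__":
    entries = parse_manifest(EXAMPLE_MANIFEST)
    for entry in select_artifacts(entries, "02-research", token_budget=1000):
        print(entry.path, entry.approx_tokens)
```

The point of the design is that the manifest itself stays small (paths and summaries), while the artifacts it points to are read only on request.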
Router Logic in Layer 1
The CONTEXT.md at the root level contains conditional routing:
# If the `STAGE` env variable is set, apply stage-specific context
if $STAGE == "01-topic":
    load "01-topic/CONTEXT.md"
elif $STAGE == "02-research":
    load "02-research/CONTEXT.md"
# ... etc
This ensures that only the relevant stage context and voice guidelines are loaded, plus Layer 0's navigation map.
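The routing rule above is pseudocode inside a context file rather than an executed program. As a rough runnable equivalent, the sketch below resolves which files belong in context for a given stage. The function `context_files` and the `ALWAYS_LOADED` tuple are hypothetical names; reading `STAGE` from the environment follows the comment in the pseudocode.

```python
import os
from typing import Optional

STAGES = ("01-topic", "02-research", "03-draft", "04-publish")

# Layer 0 map plus Layer 1 router and voice guidelines: loaded every session.
ALWAYS_LOADED = ("CLAUDE.md", "CONTEXT.md", "reference/voice.md")


def context_files(stage: Optional[str]) -> list:
    """Return the relative paths to load for `stage` (hypothetical helper).

    With no stage set, only the always-loaded files are returned.
    An unknown stage raises ValueError instead of silently loading nothing.
    """
    files = list(ALWAYS_LOADED)
    if stage is None:
        return files
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    files.append(f"{stage}/CONTEXT.md")  # Layer 2: only the active stage
    return files


if __name__ == "__main__":
    # STAGE may be unset, in which case only Layers 0 and 1 are listed.
    for path in context_files(os.environ.get("STAGE")):
        print(path)
```

Failing loudly on an unrecognized stage is a deliberate choice in this sketch: a typo in the stage name would otherwise look like a successful, very cheap session.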
Reference Layer (Voice & Patterns)
Brand voice, code-formatting rules, and target-audience definitions live in reference/voice.md—loaded once per session, cached by Claude, and referenced by stage-specific contexts. This eliminated duplication while keeping voice definitions centralized.
Deployment & Infrastructure
Blog posts generated via this workflow are published to the following infrastructure:
- Source repository: /repos/workspaces/tech-blog/ on local macOS via iCloud sync
- Generated artifacts: Markdown files output to 04-publish/artifacts/, then synced to production
- Web serving: Posts published to tech.queenofsandiego.com, served via CloudFront distribution (exact distribution ID omitted for security)
- DNS: Routed via Route53 hosted zone for queenofsandiego.com
- Build pipeline: Static site generator reads Markdown from 04-publish/artifacts/, outputs HTML, and syncs to S3 bucket (bucket name omitted)
Deployment uses a pre-publish safety check that validates YAML frontmatter, checks for credential leaks via regex patterns, and confirms all internal links resolve. This check is defined in 04-publish/CONTEXT.md and executed before CloudFront invalidation.
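The three checks can be sketched with the standard library alone. This is not the check defined in 04-publish/CONTEXT.md: the required frontmatter keys, the secret patterns, and the `check_post` function are assumptions, and the frontmatter test only looks for a delimited block with `key: value` lines rather than performing full YAML validation.

```python
import re

# A leading block delimited by '---' lines is treated as frontmatter.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)

# Assumed required keys; the real list is not given in this post.
REQUIRED_KEYS = ("title", "date")

# Assumed patterns: AWS access key IDs and PEM private key headers.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

# Markdown links whose target starts with '/', ignoring any #fragment.
INTERNAL_LINK_RE = re.compile(r"\]\((/[^)\s#]*)")


def check_post(markdown: str, known_paths: set) -> list:
    """Return a list of human-readable problems; empty means the post passes."""
    problems = []
    match = FRONTMATTER_RE.match(markdown)
    if match is None:
        problems.append("missing frontmatter block")
        body = markdown
    else:
        keys = {
            line.split(":", 1)[0].strip()
            for line in match.group(1).splitlines()
            if ":" in line
        }
        for key in REQUIRED_KEYS:
            if key not in keys:
                problems.append(f"frontmatter missing key: {key}")
        body = markdown[match.end():]
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(markdown):
            problems.append(f"possible credential leak: {name}")
    for target in INTERNAL_LINK_RE.findall(body):
        if target not in known_paths:
            problems.append(f"unresolved internal link: {target}")
    return problems


if __name__ == "__main__":
    post = "---\ntitle: Example\ndate: 2024-01-01\n---\nSee [about](/about/).\n"
    print(check_post(post, known_paths={"/about/"}))
```

A gate built this way would run before any invalidation step and block the publish whenever the returned list is non-empty.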
Token Savings & Measured Results
Baseline (old monolithic approach): 10,925 tokens per session startup.
New 5-layer approach (measured):
- Layer 0 (CLAUDE.md map): ~80 tokens
- Layer 1 (CONTEXT.md + voice.md): ~230 tokens
- Layer 2 (stage-specific CONTEXT.md, ~150 tokens per stage): loaded only when needed
- Per-session startup: ~310 tokens (97% reduction)
The savings compound across multi-stage workflows. A typical blog-post session that moves through all four stages (topic → research → draft → publish) now loads:
- ~310 tokens at start
- +~150 tokens for each of the four stages, loaded only when that stage is entered (~600 total)
- Total per multi-stage session: ~910 tokens (vs. 10,925 with the monolithic approach, a reduction of about 92%)
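The figures above can be reproduced with a few lines of arithmetic. The numbers are the approximate token counts quoted in this post; the variable names and the `reduction_pct` helper are only for illustration.

```python
# Approximate token counts quoted in this post.
BASELINE_FILES = {
    "/repos/CLAUDE.md": 942,
    "queenofsandiego.com/CLAUDE.md": 3420,
    "tech_blog_generator.py": 6563,
}
LAYER_0_TOKENS = 80       # CLAUDE.md navigation map
LAYER_1_TOKENS = 230      # CONTEXT.md + reference/voice.md
STAGE_TOKENS = 150        # one stage-specific CONTEXT.md
STAGE_COUNT = 4           # topic, research, draft, publish


def reduction_pct(new: int, old: int) -> float:
    """Percentage reduction going from `old` tokens to `new` tokens."""
    return 100.0 * (1.0 - new / old)


baseline = sum(BASELINE_FILES.values())
startup = LAYER_0_TOKENS + LAYER_1_TOKENS
full_session = startup + STAGE_TOKENS * STAGE_COUNT

if __name__ == "__main__":
    print(f"baseline:     {baseline} tokens")
    print(f"startup:      {startup} tokens "
          f"({reduction_pct(startup, baseline):.1f}% reduction)")
    print(f"full session: {full_session} tokens "
          f"({reduction_pct(full_session, baseline):.1f}% reduction)")
```

This gives 10,925 baseline tokens, 310 at startup (about 97.2% lower), and 910 for a session that visits all four stages (about 91.7% lower).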