Migrating tech.queenofsandiego.com to a 5-Layer Model Workspace Protocol: Token Efficiency Through Structural Isolation
What Was Done
We identified and began executing a pilot migration of the tech.queenofsandiego.com blog workflow from a monolithic, always-on context architecture to a 5-layer Model Workspace Protocol structure. The goal: reduce per-session token overhead from ~10,925 tokens (primarily wasted on redundant context) down to ~370 tokens—a 30× improvement—while maintaining full feature parity and establishing a replicable migration pattern for other workflows.
This pilot was deliberately scoped to the least mission-critical workflow to prove the token-efficiency wins before generalizing the pattern to higher-risk systems like the ticket-runner or deposit/booking infrastructure.
The Problem: Context Bloat and Layer Mixing
The original workflow loaded:
- ~942 tokens from scattered CLAUDE.md files across the repos structure
- ~3,420 tokens from the monolithic Queen of San Diego site CLAUDE.md (auto-loaded every session, regardless of context)
- ~6,563 tokens from tech_blog_generator.py, the 2,100+ line Python generator script
- ~90% waste: Layer-0 structural maps mixed with Layer-3 pricing rules, deployment safety checklists, and competitor analysis, none relevant to drafting a post about database indexing
Every session—whether for topic selection, research gathering, draft review, or publication—paid the full 10,925-token cost upfront, then discarded 90% of it.
Technical Architecture: The 5-Layer Model Workspace Protocol
The migration restructures context into five discrete layers, each loaded only when needed:
- Layer 0 (Map): A tiny CLAUDE.md in /workspaces/tech-blog/ (~80 tokens) describing the entire workflow structure and routing logic. No rules, no pricing, no deploy checklists; just "go to Layer 1 for topic work, Layer 2 for research," etc.
- Layer 1 (Router & Defaults): A CONTEXT.md file that bridges Layer 0 to stages. Loads the voice guide (reference/voice.md) and stage metadata. ~120 tokens.
- Layer 2 (Stage-Specific Context): Four directories (01-topic/, 02-research/, 03-draft/, 04-publish/), each with its own CONTEXT.md (~80–120 tokens per stage). Topic stage loads post-idea templates; research stage loads source-citation rules; draft stage loads the style guide; publish stage loads the deployment checklist.
- Layer 3 (Domain Rules): Extracted into reference/ subdirectories. voice.md for tone and audience, deploy-safety.md for publication guardrails, pricing.md for any monetization context (if needed in future), business.md for competitive/strategic context. None auto-loaded; pulled in by Layer-2 CONTEXT.md only when the stage requires it.
- Layer 4 (Artifacts): Per-run state, drafts, research notes, and generated outputs. Kept in stage directories to maintain lineage and avoid accumulating context debt across sessions.
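The on-demand loading described above can be sketched as a small path resolver. This is an illustrative sketch, not the workflow's actual loader: the function name `files_for_stage` and the `STAGE_REFERENCES` table are hypothetical, and the pairing of stages to reference files is an assumption based on the stage descriptions in the list.

```python
from pathlib import Path

# Hypothetical table: which Layer-3 reference files each stage pulls in.
# The pairings are assumptions drawn from the stage descriptions above.
STAGE_REFERENCES = {
    "01-topic": [],
    "02-research": [],
    "03-draft": ["voice.md"],
    "04-publish": ["deploy-safety.md"],
}


def files_for_stage(root: Path, stage: str) -> list[Path]:
    """Return the context files to load for one stage, in layer order."""
    if stage not in STAGE_REFERENCES:
        raise ValueError(f"unknown stage: {stage}")
    paths = [
        root / "CLAUDE.md",           # Layer 0: map
        root / "CONTEXT.md",          # Layer 1: router and defaults
        root / stage / "CONTEXT.md",  # Layer 2: stage-specific context
    ]
    # Layer 3: only the reference files this stage asks for
    paths += [root / "reference" / name for name in STAGE_REFERENCES[stage]]
    return paths


if __name__ == "__main__":
    for path in files_for_stage(Path("workspaces/tech-blog"), "04-publish"):
        print(path)
```

The point of the sketch is that nothing outside the returned list is read for a given session, so pricing.md and business.md cost nothing unless a stage explicitly lists them.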
Directory Structure and File Organization
/Users/cb/Library/Mobile Documents/com~apple~CloudDocs/repos/workspaces/tech-blog/
├── CLAUDE.md # Layer 0: 80 tok, map only
├── CONTEXT.md # Layer 1: 120 tok, router + voice
├── reference/
│ ├── voice.md # Tone, audience, post structure
│ ├── deploy-safety.md # Publication checklist & safety rules
│ ├── pricing.md # Monetization context (stage-gated)
│ └── business.md # Competitive analysis (stage-gated)
├── 01-topic/
│ ├── CONTEXT.md # Idea refinement rules, templates
│ └── [artifacts: post-ideas, outlines, feedback]
├── 02-research/
│ ├── CONTEXT.md # Source citation rules, fact-check gates
│ └── [artifacts: sources, notes, evidence chains]
├── 03-draft/
│ ├── CONTEXT.md # Writing style, editing rules, review gates
│ └── [artifacts: drafts, revision history, editor feedback]
└── 04-publish/
  ├── CONTEXT.md # Final checks, deploy sequence, rollback rules
  └── [artifacts: final post, deploy logs, live URLs]
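A skeleton like the one above could be created with a short script. The following is a minimal sketch under stated assumptions: the `scaffold` function is hypothetical, it creates empty placeholder files only, and the caller supplies the workspace root.

```python
from pathlib import Path

STAGES = ["01-topic", "02-research", "03-draft", "04-publish"]
REFERENCE_FILES = ["voice.md", "deploy-safety.md", "pricing.md", "business.md"]


def scaffold(root: Path) -> list[Path]:
    """Create the empty workspace skeleton and return the file paths."""
    files = [root / "CLAUDE.md", root / "CONTEXT.md"]
    files += [root / "reference" / name for name in REFERENCE_FILES]
    files += [root / stage / "CONTEXT.md" for stage in STAGES]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # leaves existing file contents untouched
    return files


if __name__ == "__main__":
    for created in scaffold(Path("workspaces/tech-blog")):
        print(created)
```

Because `touch` does not truncate existing files, rerunning the script on a populated workspace should be safe; artifact files inside the stage directories are left alone.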
Token Accounting: Before and After
Before (monolithic):
- Repos CLAUDE.md: ~942 tokens
- QOS site CLAUDE.md: ~3,420 tokens
- tech_blog_generator.py: ~6,563 tokens
- Total per session: ~10,925 tokens (all loaded, most unused)
After (5-layer pilot):
- Layer 0 (map): ~80 tokens
- Layer 1 (router): ~120 tokens
- Layer 2 (stage, e.g., 03-draft): ~100 tokens
- Layer 3 (referenced, e.g., voice.md + deploy-safety.md): ~70 tokens
- Total per session: ~370 tokens (only relevant context loaded)
The generator script is replaced by stage-specific CONTEXT.md files that describe rules inline; no 2,100-line Python blob overhead.
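The arithmetic behind these totals can be checked in a few lines. The figures are the estimates quoted above; the dictionary names are only illustrative.

```python
# Token estimates quoted in the accounting above
before = {
    "repos CLAUDE.md": 942,
    "QOS site CLAUDE.md": 3420,
    "tech_blog_generator.py": 6563,
}
after = {
    "layer 0 (map)": 80,
    "layer 1 (router)": 120,
    "layer 2 (stage)": 100,
    "layer 3 (referenced)": 70,
}

total_before = sum(before.values())
total_after = sum(after.values())
ratio = total_before / total_after
saved_pct = 100 * (1 - total_after / total_before)

print(f"before: {total_before} tokens")  # before: 10925 tokens
print(f"after:  {total_after} tokens")   # after:  370 tokens
print(f"ratio:  {ratio:.1f}x")           # ratio:  29.5x
print(f"saved:  {saved_pct:.1f}%")       # saved:  96.6%
```

The ratio works out to about 29.5, which is where the rounded 30× figure comes from.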
Key Decisions and Rationale
- Why a monolithic Layer 0 map? The map is tiny, always needed, and serves as a cognitive index. It's the only file auto-loaded; everything else is on-demand. This breaks the assumption that "CLAUDE.md = all context."
- Why stage directories (01-, 02-, etc.)? Sequential naming makes workflow progression explicit and prevents accidental jumps. Each stage has its own state, so artifacts don't leak into unrelated sessions. The stage number also makes file references unambiguous across teams.
- Why reference/ as a separate tree? Domain rules (voice, safety, pricing, business) are orthogonal to workflow stages. Isolating them allows rules to be updated without touching stage logic, and stages can optionally include them without bloating Layer 2.
- Why generate CONTEXT.md files instead of a Python generator? Markdown context files are searchable, versionable, and don't require runtime dependencies. They're also self-documenting: a developer can read 03-draft/CONTEXT.md and immediately understand what rules apply, whereas Python code requires execution and inspection.
- Why start with tech-blog, not higher-impact workflows? Tech-blog is read/write-only, isolated from critical systems,