Migrating the Tech Blog to a 5-Layer Model Workspace Protocol: Token Optimization and Workflow Efficiency
What Was Done
We converted tech.queenofsandiego.com's blog publishing workflow from a monolithic context architecture to a 5-layer Model Workspace Protocol. This migration reduced per-session token overhead from ~10,925 tokens to a target of ~370 tokens—a 29× reduction—while maintaining feature parity and improving workflow clarity for multi-stage content production.
The tech blog was selected as the pilot because it's the lowest mission-criticality workflow in our portfolio, making it the ideal proving ground for this context-layering pattern before generalizing to deposit handling, ticket routing, and other high-stakes systems.
Technical Details: The Token Problem
The baseline implementation loaded three context layers simultaneously on every session:
- Layer 0 (Map): ~80 tokens of navigation and purpose (~30 tokens actually needed)
- Layer 3 (Business Rules): Pricing policies, deployment safety guardrails, competitor research constraints, and brand voice guidelines, mixed into the root `CLAUDE.md` alongside the navigation map. This added ~3,420 tokens that were irrelevant to 80% of sessions.
- Layer 4 (Tools): The `tech_blog_generator.py` script itself: ~6,563 tokens of executable Python, loaded on every run even when only reading draft status or checking voice tone.
Total waste: ~90% of the loaded context was orthogonal to any given session's actual work.
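As a rough check on the ~90% figure, the per-layer counts quoted above can be summed directly. This is a minimal sketch that assumes the business-rules and generator layers together make up the portion a typical session does not need; the variable names are illustrative only.

```python
# Baseline token counts quoted in this post.
BASELINE_TOTAL = 10_925
BUSINESS_RULES = 3_420    # Layer 3, mixed into the root context
GENERATOR_SCRIPT = 6_563  # Layer 4, loaded on every run

# Assumption: a typical session needs neither of these two layers.
unneeded = BUSINESS_RULES + GENERATOR_SCRIPT
waste_share = unneeded / BASELINE_TOTAL

print(f"{unneeded} of {BASELINE_TOTAL} tokens ({waste_share:.1%}) unused")
# prints: 9983 of 10925 tokens (91.4%) unused
```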
Architecture: The 5-Layer Model
The new structure lives in /repos/workspaces/tech-blog/:
tech-blog/
├── CLAUDE.md # Layer 0: ~30 tokens, pure navigation
├── CONTEXT.md # Layer 1: ~200 tokens, smart router
├── reference/
│ └── voice.md # Tone/brand (loaded only in draft/publish stages)
├── 01-topic/
│ └── CONTEXT.md # Layer 2: Topic ideation rules + templates
├── 02-research/
│ └── CONTEXT.md # Layer 2: Research methodology + sources
├── 03-draft/
│ └── CONTEXT.md # Layer 2: Writing process + QA + SEO rules
└── 04-publish/
  └── CONTEXT.md # Layer 2: Publishing checklist + CDN invalidation
Why this structure:
- Root `CLAUDE.md` contains only the purpose statement and a pointer to `CONTEXT.md`. The router examines the user's request and loads only the relevant stage's context.
- Each stage has its own `CONTEXT.md` (Layer 2) that declares dependencies, rules, and sub-tasks, but does not inline business logic from other stages.
- `reference/voice.md` is conditionally loaded during draft/publish, not during topic/research phases where tone guidelines are noise.
- The generator script itself remains in `/tools/tech_blog_generator.py` but is summoned only when a stage explicitly requires execution, not on every session boot.
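The loading rules above can be sketched as a small manifest function. This is an illustration, not the workspace's actual mechanism (the real routing lives in markdown); `WORKSPACE`, `VOICE_STAGES`, and `files_for_stage` are hypothetical names introduced for the example.

```python
from pathlib import PurePosixPath

WORKSPACE = PurePosixPath("/repos/workspaces/tech-blog")
STAGES = ("01-topic", "02-research", "03-draft", "04-publish")
# Stages that also pull in the voice guide, per the list above.
VOICE_STAGES = {"03-draft", "04-publish"}


def files_for_stage(stage: str) -> list:
    """Return the context files to load for one stage, in load order."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    files = [
        WORKSPACE / "CLAUDE.md",           # Layer 0: purpose + pointer
        WORKSPACE / "CONTEXT.md",          # Layer 1: router
        WORKSPACE / stage / "CONTEXT.md",  # Layer 2: stage rules
    ]
    if stage in VOICE_STAGES:
        files.append(WORKSPACE / "reference" / "voice.md")
    return [str(f) for f in files]


for path in files_for_stage("03-draft"):
    print(path)
```

A topic or research session resolves to three files; draft and publish sessions add `reference/voice.md` as a fourth.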
Infrastructure and File Organization
Baseline measurements:
- `/repos/sites/queenofsandiego.com/CLAUDE.md`: 942 tokens (mixed layers, not split)
- `/repos/sites/queenofsandiego.com/reference/` files: 3,420 tokens (business rules, pricing, deploy safety)
- `/tools/tech_blog_generator.py`: 6,563 tokens (always loaded, conditionally executed)
- Total baseline: 10,925 tokens per session
New measurements (target):
- `/repos/workspaces/tech-blog/CLAUDE.md`: ~30 tokens (purpose only)
- `/repos/workspaces/tech-blog/CONTEXT.md`: ~200 tokens (router logic and dispatcher)
- One active stage's `CONTEXT.md`: ~80–120 tokens (stage-specific rules)
- `reference/voice.md`: ~60 tokens (conditionally loaded in draft/publish only)
- Total active context: ~370–410 tokens per session
The generator script is now referenced via URI in the stage-specific `CONTEXT.md` files and loaded on demand, not globally.
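The baseline and target totals can be reproduced from the per-file figures listed in this section. The sketch below counts `voice.md` in every session, so it gives the upper bound that applies to draft and publish work; the variable names are illustrative.

```python
# Baseline: the three files loaded on every session.
BASELINE = 942 + 3_420 + 6_563

# Target budgets; the stage context is quoted as a range.
fixed = 30 + 200 + 60  # CLAUDE.md + router CONTEXT.md + voice.md
stage_low, stage_high = 80, 120

target_low = fixed + stage_low
target_high = fixed + stage_high

print(BASELINE)                 # 10925
print(target_low, target_high)  # 370 410
print(round(BASELINE / target_low, 1), round(BASELINE / target_high, 1))
# 29.5 26.6
```

The reduction factor therefore falls between roughly 27× and 29×, depending on the size of the active stage's context.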
Workflow Integration
The router in /repos/workspaces/tech-blog/CONTEXT.md uses a simple dispatch pattern:
# Example dispatcher logic (pseudocode in actual markdown)
User says: "I want to brainstorm post ideas"
→ Load: 01-topic/CONTEXT.md
→ Skip: voice.md, generator.py (not needed yet)
User says: "I need to finalize and publish"
→ Load: 04-publish/CONTEXT.md
→ Include: reference/voice.md (final QA)
→ Include: generator.py (invokes S3/CloudFront sync)
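If the same dispatch were expressed in code rather than markdown, it might look like the keyword matcher below. The keyword table and the `route` function are assumptions made for illustration; the actual router is prose in `CONTEXT.md`, and how it interprets a request is not specified here.

```python
from typing import Optional

# Hypothetical keyword table, checked in pipeline order.
ROUTES = [
    (("brainstorm", "idea", "topic"), "01-topic"),
    (("research", "source", "fact"), "02-research"),
    (("draft", "write", "outline"), "03-draft"),
    (("publish", "finalize", "deploy"), "04-publish"),
]


def route(request: str) -> Optional[str]:
    """Return the first stage whose keywords appear in the request."""
    text = request.lower()
    for keywords, stage in ROUTES:
        if any(keyword in text for keyword in keywords):
            return stage
    return None  # no match: load nothing beyond the router itself


print(route("I want to brainstorm post ideas"))  # 01-topic
print(route("I need to finalize and publish"))   # 04-publish
```

First-match ordering keeps the sketch simple, but it means an ambiguous request resolves to the earliest stage in the pipeline.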
Each stage's CONTEXT.md declares what it can do, what inputs it needs, and what artifacts it produces:
- 01-topic: Accepts brief idea, outputs title + outline + keyword targets. Constraints: brand voice must align with prior posts (references `reference/voice.md` summary, not full file).
- 02-research: Accepts title + outline, outputs annotated sources + competitor summaries. Runs fact-checking logic from `tools/tech_blog_fact_check.py` (on-demand).
- 03-draft: Accepts outline + sources, outputs full markdown + inline TODOs. Loads `reference/voice.md` for real-time tone feedback.
- 04-publish: Accepts final markdown, outputs S3 key, triggers CloudFront cache invalidation, updates sitemap. Invokes `tech_blog_generator.py` for deployment.
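One way to sanity-check these stage contracts is to model each stage's inputs and outputs and confirm that every input is produced by an earlier stage. The artifact labels below (`idea`, `sources`, `markdown`, and so on) are normalized names chosen for this sketch, and `StageContract` and `missing_inputs` are hypothetical helpers, not part of the workspace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageContract:
    name: str
    inputs: frozenset
    outputs: frozenset


# Artifact names are simplified labels for the items listed above.
PIPELINE = [
    StageContract("01-topic", frozenset({"idea"}),
                  frozenset({"title", "outline", "keywords"})),
    StageContract("02-research", frozenset({"title", "outline"}),
                  frozenset({"sources", "competitor_summaries"})),
    StageContract("03-draft", frozenset({"outline", "sources"}),
                  frozenset({"markdown", "todos"})),
    StageContract("04-publish", frozenset({"markdown"}),
                  frozenset({"s3_key"})),
]


def missing_inputs(pipeline, initial):
    """Map each stage name to the inputs no earlier stage has produced."""
    available = set(initial)
    gaps = {}
    for stage in pipeline:
        missing = stage.inputs - available
        if missing:
            gaps[stage.name] = missing
        available |= stage.outputs
    return gaps


print(missing_inputs(PIPELINE, {"idea"}))  # {}
```

An empty result means the chain is closed: starting from a brief idea, each stage can run on artifacts produced upstream.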
Key Decisions
Why stage-local CONTEXT.md instead of a monolithic generator script?
A single script handles the full pipeline, but knowledge workers jump between stages. A developer researching topics doesn't need deploy safety rules. Splitting context by stage lets each worker load only their scope, reducing cognitive overhead and token cost.
Why conditionally load reference/voice.md?
Brand voice guidelines are critical during draft and publish QA but irrelevant during topic ideation and research sourcing. Moving them out of the root context and loading them per stage saves ~60 tokens in 80% of sessions.
Why keep the generator script separate instead of inlining logic into each stage?
The generator is executable Python. Inlining it into markdown would force re-parsing and re-interpretation. Storing it separately and referencing it by URI from the stage contexts allows on-demand loading and direct execution without rehydration.
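The on-demand pattern can be illustrated with a cached reader that touches the script file only when a stage first asks for it. This is a sketch under assumptions: how the workspace actually resolves its URI references is not described here, `load_tool_source` is a hypothetical helper, and the demo uses a temporary stand-in file rather than the real `/tools/tech_blog_generator.py`.

```python
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_tool_source(path: str) -> str:
    """Read a tool's source on first request, then serve it from cache."""
    return Path(path).read_text(encoding="utf-8")


# Demo with a stand-in script in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "tech_blog_generator.py"
    script.write_text("print('deploying')\n", encoding="utf-8")
    first = load_tool_source(str(script))   # reads from disk
    second = load_tool_source(str(script))  # served from cache

info = load_tool_source.cache_info()
print(info.misses, info.hits)  # 1 1
```

Sessions that never reach a stage requiring the script never call the loader, so the script's tokens stay out of context entirely.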