Implementing a 5-Layer Model Workspace Protocol for Tech Blog Generation: Token Efficiency through Hierarchical Context
What Was Done
We migrated the tech.queenofsandiego.com blog generation workflow from a monolithic context model to a structured 5-layer workspace protocol. The pilot cut ~10,925 tokens of always-on context overhead to a ~370-token baseline (a 29.5× reduction) by decomposing a single 6,563-token generator script and a bloated Layer-0 CLAUDE.md into discrete, stage-specific contexts that load only when needed.
This wasn't a refactor for its own sake. The old architecture loaded the entire pricing table, deployment safety rules, competitor analysis, and business context every session, even when the task was just brainstorming a topic outline. The new design uses a router pattern with four execution stages, each with surgical context injection.
Technical Details: The Old vs. New Architecture
Baseline Audit
Before migration, we measured token consumption for the three files that loaded on every session:
- Repos-layer CLAUDE.md: ~942 tokens (generic site config, deploy guards, API endpoints)
- QoS site CLAUDE.md: ~3,420 tokens (pricing, business rules, brand voice, competitor data)
- tech_blog_generator.py: ~6,563 tokens (orchestration logic, all four post-creation stages hardcoded in one script)
- Total per-session overhead: ~10,925 tokens
- Waste ratio: ~90% of that loaded regardless of which stage was running
The tech_blog_generator.py script contained everything: topic research, outline drafting, full-post generation, and publication—all parsed into memory and all context-eligible on every invocation.
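The audit arithmetic can be checked in a few lines. This is only a worked calculation using the token counts quoted above and the ~370-token post-migration baseline from the summary; the variable names are illustrative and not taken from the actual tooling.

```python
# Token counts reported in the baseline audit.
baseline_tokens = {
    "repos-layer CLAUDE.md": 942,
    "QoS site CLAUDE.md": 3_420,
    "tech_blog_generator.py": 6_563,
}

old_overhead = sum(baseline_tokens.values())  # always-on cost before migration
new_baseline = 370                            # approximate always-on cost after migration

reduction = old_overhead / new_baseline
removed_fraction = 1 - new_baseline / old_overhead

print(f"old overhead: {old_overhead} tokens")
print(f"reduction: {reduction:.1f}x")
print(f"always-on tokens removed: {removed_fraction:.1%}")
```

The three files sum to 10,925 tokens, and dividing by 370 gives the 29.5× figure.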
New Directory Structure
We scaffolded a dedicated workspace tree at /Users/cb/icloud-repos/workspaces/tech-blog/:
workspaces/tech-blog/
├── CLAUDE.md            # Layer 0: 80 tokens, pointer map only
├── CONTEXT.md           # Layer 1: router logic, dispatch rules
├── reference/
│   └── voice.md         # Brand voice guidelines (~200 tokens)
├── 01-topic/
│   └── CONTEXT.md       # Layer 2: topic research stage context
├── 02-research/
│   └── CONTEXT.md       # Layer 2: deep research + outlining context
├── 03-draft/
│   └── CONTEXT.md       # Layer 2: full-post writing context
└── 04-publish/
    └── CONTEXT.md       # Layer 2: publication & promotion context
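A tree like this is easy to scaffold with a short script. The following is a minimal sketch, not the script used for the migration: the `scaffold` function and `STAGES` list are hypothetical names, and the files are created empty so that their contents can be written by hand.

```python
from pathlib import Path

# Stage directory names as they appear in the workspace tree.
STAGES = ["01-topic", "02-research", "03-draft", "04-publish"]


def scaffold(root: Path) -> list[Path]:
    """Create the workspace tree under root and return the files newly created."""
    files = [
        root / "CLAUDE.md",              # Layer 0: pointer map
        root / "CONTEXT.md",             # Layer 1: router
        root / "reference" / "voice.md",
    ]
    files += [root / stage / "CONTEXT.md" for stage in STAGES]  # Layer 2

    created = []
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite an existing context file
            path.touch()
            created.append(path)
    return created


if __name__ == "__main__":
    import tempfile

    # Demonstrate against a throwaway directory rather than a real workspace.
    with tempfile.TemporaryDirectory() as tmp:
        for created_file in scaffold(Path(tmp) / "workspaces" / "tech-blog"):
            print(created_file.relative_to(tmp))
```

Because existing files are skipped, running the script a second time against the same root is a no-op.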
Layer 0: The Map (CLAUDE.md)
The new root CLAUDE.md is deliberately minimal—about 80 tokens. It serves only as a navigation layer:
- File structure (where each stage lives)
- Dispatch rule: "If working on topic selection, load 01-topic/CONTEXT.md"
- Pointer to CONTEXT.md for the actual router logic
- One-line reference to reference/voice.md for brand consistency
It does not include pricing tables, deployment checklists, competitor data, or orchestration logic.
Layer 1: The Router (CONTEXT.md at workspace root)
This ~300-token file contains the decision tree:
- Stage detection: reads the user's stated goal or current artifacts to determine which of the four stages is active
- Context injection rules: "If in topic phase, do not load draft or publish contexts"
- Tool binding: which shell commands, Python scripts, or API calls are legal in each stage
- Artifact expectations: "By end of stage 02, you must have a /04-publish/outline.md"
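The router itself is a prose file read by the model, not code, but its stage-detection and context-injection rules can be illustrated as a small dispatch function. Everything below is a hypothetical sketch: the keyword hints in `STAGE_HINTS` and the function names are assumptions made for illustration, and the real CONTEXT.md may decide differently (for example, by inspecting existing artifacts).

```python
from typing import Optional

# Hypothetical keyword hints per stage; the real router's rules are not shown here.
STAGE_HINTS = {
    "01-topic": ("topic", "brainstorm", "idea"),
    "02-research": ("research", "outline"),
    "03-draft": ("draft", "write"),
    "04-publish": ("publish", "frontmatter", "deploy"),
}

ALWAYS_LOADED = ["CLAUDE.md", "CONTEXT.md"]  # Layer 0 and Layer 1


def detect_stage(goal: str) -> Optional[str]:
    """Return the first stage whose hint words appear in the goal, else None."""
    text = goal.lower()
    for stage, hints in STAGE_HINTS.items():
        if any(word in text for word in hints):
            return stage
    return None


def contexts_to_load(goal: str) -> list[str]:
    """Layer 0 and Layer 1, plus at most one Layer 2 context for the active stage."""
    files = list(ALWAYS_LOADED)
    stage = detect_stage(goal)
    if stage is not None:
        files.append(f"{stage}/CONTEXT.md")
    return files


print(contexts_to_load("Brainstorm a topic for next week"))
```

The property that matters is that at most one stage context is ever appended, so a topic session never pulls in the draft or publish contexts.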
Layer 2: Stage-Specific Contexts
Each of the four CONTEXT.md files in 01-topic/, 02-research/, 03-draft/, and 04-publish/ is loaded only when that stage is active. For example:
- 01-topic/CONTEXT.md (~250 tokens): instructions for brainstorming and vetting blog post ideas, links to past post categories, SEO keyword strategies
- 02-research/CONTEXT.md (~400 tokens): deep-dive research prompts, competitor blog analysis, technical depth requirements, outline structure template
- 03-draft/CONTEXT.md (~600 tokens): tone & voice guidance, code-example formatting rules, internal link strategy, length targets, technical accuracy standards
- 04-publish/CONTEXT.md (~350 tokens): frontmatter schema, Hugo shortcode usage, social-media teaser generation, update-dashboard.py integration
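To see what a session costs under this layout, the approximate figures quoted in this post can be added up per stage. This is a rough worked calculation, not a measurement: the numbers are the rounded estimates given above (so the two always-on layers sum to about 380 rather than the ~370 quoted in the summary), and it ignores reference/voice.md and any per-run artifacts. The function name `session_cost` is illustrative.

```python
LAYER_0 = 80    # root CLAUDE.md (approximate)
LAYER_1 = 300   # root CONTEXT.md router (approximate)

# Approximate Layer 2 sizes quoted for each stage context.
STAGE_TOKENS = {
    "01-topic": 250,
    "02-research": 400,
    "03-draft": 600,
    "04-publish": 350,
}

OLD_OVERHEAD = 10_925  # always-on cost before the migration


def session_cost(stage: str) -> int:
    """Tokens loaded for one session: both always-on layers plus one stage context."""
    return LAYER_0 + LAYER_1 + STAGE_TOKENS[stage]


for stage in STAGE_TOKENS:
    cost = session_cost(stage)
    print(f"{stage}: {cost} tokens ({cost / OLD_OVERHEAD:.1%} of the old overhead)")
```

Even the heaviest stage, drafting, comes to roughly 980 tokens under these estimates, which is under a tenth of the old per-session overhead.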
Layer 3 & 4: Reference + Runtime Artifacts
- reference/voice.md (~200 tokens) is always available but not auto-loaded; it's cited by name when brand tone questions arise
- Per-run artifacts (topic brainstorm notes, research findings, draft .md files) live in stage directories and are loaded only on subsequent invocations within that stage
Infrastructure & Tool Integration
The migration touched no infrastructure—no S3 buckets, CloudFront distributions, or Route53 records. The entire change is context and workflow organization. However, we did integrate three existing tools:
- tech_blog_generator.py (now staged): the orchestration script no longer lives as a single 6,563-token monolith; its core functions are refactored into stage-specific modules under workspaces/tech-blog/
- update_dashboard.py (on jada-agent box): the publish stage calls this to log finished posts to the project dashboard
- Hugo static site generator: the publication stage validates frontmatter and builds a local preview before pushing to the tech.queenofsandiego.com repo
All invocations use the ! remote-command gate when touching the box; no bare SSH or API calls.
Key Decisions
Why Four Stages?
Topic, research, draft, and publish are the natural inflection points where context needs differ most. A writer may iterate on a topic outline for days without ever touching technical depth (stage 2). Another may skip stages 1–2 entirely if the idea is pre-approved. Four stages let each workflow path load only what it needs.
Why Not a Single Generator with Branches?
The original tech_blog_generator.py always made all four stages context-eligible on every invocation. By splitting into four discrete CONTEXT.md files,