Implementing a 5-Layer Model Workspace Protocol for Tech Blog Generation: Token Efficiency through Hierarchical Context

```html

What Was Done

We migrated the tech.queenofsandiego.com blog generation workflow from a monolithic context model to a structured 5-layer workspace protocol. The pilot converted ~10,925 context tokens of always-on overhead into a ~370-token baseline—a 29.5× reduction—by decomposing a single 6,563-token generator script and bloated Layer-0 CLAUDE.md into discrete, stage-specific contexts that load only when needed.

This wasn't a refactor for its own sake. The old architecture loaded the entire pricing table, deployment safety rules, competitor analysis, and business context every session, even when the task was just brainstorming a topic outline. The new design uses a router pattern with four execution stages, each with surgical context injection.

Technical Details: The Old vs. New Architecture

Baseline Audit

Before migration, we measured exact token consumption across three categories:

Repos-layer CLAUDE.md: ~942 tokens (generic site config, deploy guards, API endpoints)
QoS site CLAUDE.md: ~3,420 tokens (pricing, business rules, brand voice, competitor data)
tech_blog_generator.py: ~6,563 tokens (orchestration logic, all four post-creation stages hardcoded in one script)
Total per-session overhead: ~10,925 tokens
Waste ratio: roughly 90% of those tokens were loaded regardless of which stage was running
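The audit arithmetic can be checked with a few lines of Python. This is only a sketch: the three counts are copied from the list above, and `new_baseline` is the ~370-token always-on figure reported for the new protocol. The variable names are illustrative, not taken from the project.

```python
# Approximate token counts copied from the baseline audit.
baseline_tokens = {
    "repos-layer CLAUDE.md": 942,
    "QoS site CLAUDE.md": 3420,
    "tech_blog_generator.py": 6563,
}

# Always-on overhead under the old monolithic layout.
old_overhead = sum(baseline_tokens.values())

# Always-on baseline reported for the new protocol (approximate).
new_baseline = 370

reduction = old_overhead / new_baseline

print(f"old overhead: {old_overhead} tokens")
print(f"new baseline: {new_baseline} tokens")
print(f"reduction:    {reduction:.1f}x")
```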

The tech_blog_generator.py script contained everything: topic research, outline drafting, full-post generation, and publication—all parsed into memory and all context-eligible on every invocation.

New Directory Structure

We scaffolded a dedicated workspace tree at /Users/cb/icloud-repos/workspaces/tech-blog/:

workspaces/tech-blog/
├── CLAUDE.md                    # Layer 0: 80 tokens, pointer map only
├── CONTEXT.md                   # Layer 1: router logic, dispatch rules
├── reference/
│   └── voice.md                 # Brand voice guidelines (~200 tokens)
├── 01-topic/
│   └── CONTEXT.md               # Layer 2: topic research stage context
├── 02-research/
│   └── CONTEXT.md               # Layer 2: deep research + outlining context
├── 03-draft/
│   └── CONTEXT.md               # Layer 2: full-post writing context
└── 04-publish/
    └── CONTEXT.md               # Layer 2: publication & promotion context
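A tree like this could be scaffolded with a short script. The sketch below assumes only the file and directory names shown above. The `scaffold` helper is hypothetical, it creates empty placeholder files, and it runs against a temporary directory rather than the real workspace path.

```python
from pathlib import Path
import tempfile

# Stage directory names taken from the tree above.
STAGES = ["01-topic", "02-research", "03-draft", "04-publish"]


def scaffold(root: Path) -> list[Path]:
    """Create the workspace tree under root and return the files created."""
    files = [
        root / "CLAUDE.md",             # Layer 0: pointer map
        root / "CONTEXT.md",            # Layer 1: router
        root / "reference" / "voice.md",
    ]
    # One Layer 2 context file per stage directory.
    files += [root / stage / "CONTEXT.md" for stage in STAGES]
    for f in files:
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch(exist_ok=True)          # empty placeholder, never overwrites
    return sorted(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold(Path(tmp) / "workspaces" / "tech-blog")
        for path in created:
            print(path.relative_to(tmp))
```

Because `touch(exist_ok=True)` leaves existing files alone, re-running the helper against a populated workspace would not clobber context files that have already been written.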

Layer 0: The Map (CLAUDE.md)

The new root CLAUDE.md is deliberately minimal—about 80 tokens. It serves only as a navigation layer:

File structure (where each stage lives)
Dispatch rule: "If working on topic selection, load 01-topic/CONTEXT.md"
Pointer to CONTEXT.md for the actual router logic
One-line reference to reference/voice.md for brand consistency

It does not include pricing tables, deployment checklists, competitor data, or orchestration logic.

Layer 1: The Router (CONTEXT.md at workspace root)

This ~300-token file contains the decision tree:

Stage detection: reads the user's stated goal or current artifacts to determine which of the four stages is active
Context injection rules: "If in topic phase, do not load draft or publish contexts"
Tool binding: which shell commands, Python scripts, or API calls are legal in each stage
Artifact expectations: "By end of stage 02, you must have a 02-research/outline.md"
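The router itself is a prose file read by the model, not code, but its dispatch behavior can be illustrated in Python. In the sketch below, the keyword hints in `STAGE_HINTS` and the function names are assumptions made for illustration. The actual detection rules in CONTEXT.md are not shown in this post.

```python
from typing import List, Optional

# Stage name -> directory, matching the workspace tree.
STAGE_DIRS = {
    "topic": "01-topic",
    "research": "02-research",
    "draft": "03-draft",
    "publish": "04-publish",
}

# Hypothetical keyword hints; the real router's rules may differ.
STAGE_HINTS = {
    "topic": ("topic", "brainstorm", "idea"),
    "research": ("research", "outline"),
    "draft": ("draft", "write"),
    "publish": ("publish", "frontmatter", "deploy"),
}


def detect_stage(goal: str) -> Optional[str]:
    """Return the first stage whose hints appear in the stated goal."""
    text = goal.lower()
    for stage, hints in STAGE_HINTS.items():
        if any(hint in text for hint in hints):
            return stage
    return None


def contexts_to_load(goal: str) -> List[str]:
    """Layers 0 and 1 always load; at most one stage context is added."""
    files = ["CLAUDE.md", "CONTEXT.md"]
    stage = detect_stage(goal)
    if stage is not None:
        files.append(f"{STAGE_DIRS[stage]}/CONTEXT.md")
    return files


if __name__ == "__main__":
    print(contexts_to_load("Brainstorm a topic for next week"))
    print(contexts_to_load("Write the draft from the approved outline"))
```

The property that matters is that `contexts_to_load` never returns more than one stage context, which mirrors the rule that the topic phase must not pull in draft or publish material.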

Layer 2: Stage-Specific Contexts

Each of the four CONTEXT.md files in 01-topic/, 02-research/, 03-draft/, and 04-publish/ is loaded only when that stage is active. For example:

01-topic/CONTEXT.md (~250 tokens): instructions for brainstorming and vetting blog post ideas, links to past post categories, SEO keyword strategies
02-research/CONTEXT.md (~400 tokens): deep-dive research prompts, competitor blog analysis, technical depth requirements, outline structure template
03-draft/CONTEXT.md (~600 tokens): tone & voice guidance, code-example formatting rules, internal link strategy, length targets, technical accuracy standards
04-publish/CONTEXT.md (~350 tokens): frontmatter schema, Hugo shortcode usage, social-media teaser generation, update_dashboard.py integration

Layer 3 & 4: Reference + Runtime Artifacts

reference/voice.md (~200 tokens) is always available but not auto-loaded; it's cited by name when brand tone questions arise
Per-run artifacts (topic brainstorm notes, research findings, draft .md files) live in stage directories and are loaded only on subsequent invocations within that stage
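To see what each stage costs once its context is loaded, the approximate figures quoted above can be added up. This sketch assumes the load for a stage is Layer 0 plus Layer 1 plus that stage's CONTEXT.md. It leaves out reference/voice.md (not auto-loaded) and per-run artifacts, whose sizes vary. The rounded Layer 0 and Layer 1 figures sum to 380, slightly above the ~370 baseline reported earlier, so treat the results as estimates.

```python
# Approximate token figures quoted in the sections above.
LAYER0_TOKENS = 80    # root CLAUDE.md
LAYER1_TOKENS = 300   # root CONTEXT.md (router)
STAGE_TOKENS = {
    "01-topic": 250,
    "02-research": 400,
    "03-draft": 600,
    "04-publish": 350,
}
OLD_OVERHEAD = 10925  # always-on overhead before the migration


def stage_cost(stage: str) -> int:
    """Tokens loaded when one stage is active, excluding artifacts."""
    return LAYER0_TOKENS + LAYER1_TOKENS + STAGE_TOKENS[stage]


if __name__ == "__main__":
    for stage in STAGE_TOKENS:
        cost = stage_cost(stage)
        saved = 1 - cost / OLD_OVERHEAD
        print(f"{stage}: {cost} tokens ({saved:.0%} below the old overhead)")
```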

Infrastructure & Tool Integration

The migration touched no infrastructure—no S3 buckets, CloudFront distributions, or Route53 records. The entire change is context and workflow organization. However, we did integrate three existing tools:

tech_blog_generator.py (now staged): the orchestration script no longer lives as a single 6,563-token monolith; its core functions are refactored into stage-specific modules under workspaces/tech-blog/
update_dashboard.py (on jada-agent box): the publish stage calls this to log finished posts to the project dashboard
Hugo static site generator: the publish stage validates frontmatter and builds a local preview before pushing to the tech.queenofsandiego.com repo

All invocations use the ! remote-command gate when touching the box; no bare SSH or API calls.

Key Decisions

Why Four Stages?

Topic, research, draft, and publish are the natural inflection points where context needs differ most. A writer may iterate on a topic outline for days without ever touching technical depth (stage 2). Another may skip stages 1–2 entirely if the idea is pre-approved. Four stages let each workflow path load only what it needs.

Why Not a Single Generator with Branches?

The original tech_blog_generator.py put the logic for all four stages in context on every invocation, whichever stage was actually running. By splitting into four discrete CONTEXT.md files,