Migrating the Tech Blog to a 5-Layer Model Workspace Protocol: Token Optimization and Workflow Efficiency


What Was Done

We converted tech.queenofsandiego.com's blog publishing workflow from a monolithic context architecture to a 5-layer Model Workspace Protocol. The migration is designed to cut per-session token overhead from ~10,925 tokens to a target of ~370 tokens (roughly a 29× reduction) while maintaining feature parity and making the multi-stage content workflow easier to follow.

The tech blog was selected as the pilot because it is the least mission-critical workflow in our portfolio, which makes it the ideal proving ground for this context-layering pattern before we generalize it to deposit handling, ticket routing, and other high-stakes systems.

Technical Details: The Token Problem

The baseline implementation loaded three context layers simultaneously on every session:

Layer 0 (Map): ~80 tokens of navigation and purpose (~30 tokens actually needed)
Layer 3 (Business Rules): Pricing policies, deployment safety guardrails, competitor research constraints, and brand voice guidelines—mixed into the root CLAUDE.md alongside the navigation map. This added ~3,420 tokens that were irrelevant to 80% of sessions.
Layer 4 (Tools): The tech_blog_generator.py script itself: ~6,563 tokens of executable Python, loaded on every run even when only reading draft status or checking voice tone.



Total waste: ~90% of the loaded context was orthogonal to any given session's actual work.

Architecture: The 5-Layer Model

The new structure lives in /repos/workspaces/tech-blog/:

tech-blog/
├── CLAUDE.md                 # Layer 0: ~30 tokens, pure navigation
├── CONTEXT.md                # Layer 1: ~200 tokens, smart router
├── reference/
│   └── voice.md              # Tone/brand (loaded only in draft/publish stages)
├── 01-topic/
│   └── CONTEXT.md            # Layer 2: Topic ideation rules + templates
├── 02-research/
│   └── CONTEXT.md            # Layer 2: Research methodology + sources
├── 03-draft/
│   └── CONTEXT.md            # Layer 2: Writing process + QA + SEO rules
└── 04-publish/
    └── CONTEXT.md            # Layer 2: Publishing checklist + CDN invalidation


Why this structure:


  Root CLAUDE.md contains only the purpose statement and a pointer to CONTEXT.md. The router examines the user's request and loads only the relevant stage's context.
  Each stage has its own CONTEXT.md (Layer 2) that declares dependencies, rules, and sub-tasks—but does not inline business logic from other stages.
  reference/voice.md is conditionally loaded during draft/publish, not during topic/research phases where tone guidelines are noise.
  The generator script itself remains in /tools/tech_blog_generator.py but is summoned only when a stage explicitly requires execution, not on every session boot.


Infrastructure and File Organization

Baseline measurements:


  /repos/sites/queenofsandiego.com/CLAUDE.md: 942 tokens (mixed layers, not split)
  /repos/sites/queenofsandiego.com/reference/ files: 3,420 tokens (business rules, pricing, deploy safety)
  /tools/tech_blog_generator.py: 6,563 tokens (always loaded, conditionally executed)
  Total baseline: 10,925 tokens per session


New measurements (target):


  /repos/workspaces/tech-blog/CLAUDE.md: ~30 tokens (purpose only)
  /repos/workspaces/tech-blog/CONTEXT.md: ~200 tokens (router logic and dispatcher)
  One active stage's CONTEXT.md: ~80–120 tokens (stage-specific rules)
  reference/voice.md: ~60 tokens (conditionally loaded in draft/publish only)
  Total active context: ~370–410 tokens per session
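
As a quick sanity check on the arithmetic, the totals and the reduction factor can be recomputed from the figures quoted above. This is only a sketch: the dictionary keys are shortened labels for the files listed in this post, and the numbers are the approximate token counts already given, not new measurements.

```python
# Baseline figures as quoted in the measurements above (approximate token counts).
baseline = {
    "sites/queenofsandiego.com/CLAUDE.md": 942,
    "sites/queenofsandiego.com/reference/": 3420,
    "tools/tech_blog_generator.py": 6563,
}

# Target layout; the active stage context is quoted as a range of ~80-120 tokens.
target_fixed = {"CLAUDE.md": 30, "CONTEXT.md": 200, "reference/voice.md": 60}
stage_low, stage_high = 80, 120

baseline_total = sum(baseline.values())
target_low = sum(target_fixed.values()) + stage_low
target_high = sum(target_fixed.values()) + stage_high

print(baseline_total)                          # 10925
print(target_low, target_high)                 # 370 410
print(round(baseline_total / target_low, 1))   # 29.5
print(round(baseline_total / target_high, 1))  # 26.6
```

The headline 29× figure corresponds to the low end of the target range; at the high end the reduction is closer to 27×.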


The generator script is now referenced via URI in the stage-specific CONTEXT.md files and loaded on demand, not globally.


Workflow Integration

The router in /repos/workspaces/tech-blog/CONTEXT.md uses a simple dispatch pattern:

# Example dispatcher logic (pseudocode in actual markdown)
User says: "I want to brainstorm post ideas"
  → Load: 01-topic/CONTEXT.md
  → Skip: voice.md, generator.py (not needed yet)

User says: "I need to finalize and publish"
  → Load: 04-publish/CONTEXT.md
  → Include: reference/voice.md (final QA)
  → Include: generator.py (invokes S3/CloudFront sync)
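
For readers who prefer code to prose, the same dispatch idea might look like this in Python. This is a minimal sketch, not the actual router: the real router is markdown instructions in CONTEXT.md, and the keyword table and the `route` function here are hypothetical. The file paths are the ones used in this post.

```python
# Hypothetical keyword table; the real router is prose in CONTEXT.md.
# Each entry maps trigger words to the files that stage should load.
ROUTES = [
    (("brainstorm", "idea", "topic"), ["01-topic/CONTEXT.md"]),
    (("research", "source", "fact-check"), ["02-research/CONTEXT.md"]),
    (("draft", "write"), ["03-draft/CONTEXT.md", "reference/voice.md"]),
    (("publish", "finalize", "deploy"),
     ["04-publish/CONTEXT.md", "reference/voice.md", "/tools/tech_blog_generator.py"]),
]


def route(request):
    """Return the files to load for a request; the first matching stage wins."""
    text = request.lower()
    for keywords, files in ROUTES:
        if any(word in text for word in keywords):
            return list(files)
    return []  # no match: load nothing beyond the router itself


print(route("I want to brainstorm post ideas"))
# ['01-topic/CONTEXT.md']
print(route("I need to finalize and publish"))
# ['04-publish/CONTEXT.md', 'reference/voice.md', '/tools/tech_blog_generator.py']
```

Substring matching with first-match-wins is deliberately crude; a request that mentions two stages resolves to whichever appears first in the table.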


Each stage's CONTEXT.md declares what it can do, what inputs it needs, and what artifacts it produces:


  01-topic: Accepts brief idea, outputs title + outline + keyword targets. Constraints: brand voice must align with prior posts (references a summary of reference/voice.md, not the full file).
  02-research: Accepts title + outline, outputs annotated sources + competitor summaries. Runs fact-checking logic from tools/tech_blog_fact_check.py (on-demand).
  03-draft: Accepts outline + sources, outputs full markdown + inline TODOs. Loads reference/voice.md for real-time tone feedback.
  04-publish: Accepts final markdown, outputs S3 key, triggers CloudFront cache invalidation, updates sitemap. Invokes tech_blog_generator.py for deployment.
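
One way to picture these declarations is as small contracts that can be checked before a stage starts. The sketch below is illustrative only: `StageContract`, `missing_inputs`, and the artifact labels (for example `annotated_sources` standing in for "sources") are hypothetical names for the inputs and outputs listed above, and side effects such as cache invalidation are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageContract:
    accepts: frozenset   # artifacts the stage needs before it can start
    produces: frozenset  # artifacts the stage hands to later stages


# Artifact names are hypothetical labels for the inputs/outputs listed above.
CONTRACTS = {
    "01-topic": StageContract(
        frozenset({"brief_idea"}),
        frozenset({"title", "outline", "keyword_targets"}),
    ),
    "02-research": StageContract(
        frozenset({"title", "outline"}),
        frozenset({"annotated_sources", "competitor_summaries"}),
    ),
    "03-draft": StageContract(
        frozenset({"outline", "annotated_sources"}),
        frozenset({"draft_markdown", "inline_todos"}),
    ),
    "04-publish": StageContract(
        frozenset({"final_markdown"}),
        frozenset({"s3_key"}),
    ),
}


def missing_inputs(stage, available):
    """Return the artifacts a stage still needs, given what already exists."""
    return set(CONTRACTS[stage].accepts - set(available))


print(missing_inputs("03-draft", {"outline"}))
# {'annotated_sources'}
```

A check like this makes it obvious when someone jumps to drafting before research has produced its sources.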



Key Decisions

Why stage-local CONTEXT.md instead of a monolithic generator script?

A single script handles the full pipeline, but knowledge workers jump between stages. A developer researching topics doesn't need deploy safety rules. Splitting context by stage lets each worker load only their scope, reducing cognitive overhead and token cost.

Why conditionally load reference/voice.md?

Brand voice guidelines are critical during draft and publish QA but are irrelevant during topic ideation and research sourcing. Moving it out of the root context and making it stage-aware saves ~60 tokens in 80% of sessions.

Why keep the generator script separate instead of inlining logic into each stage?

The generator is executable Python. Inlining it into markdown would force it to be re-parsed and re-interpreted on every load. By storing it separately and referencing it by URI from the stage contexts, we allow on-demand loading and direct execution without rehydration.
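
To make the on-demand idea concrete, here is a minimal sketch of how a stage might hold only a reference to its tool and execute it in a child process when needed. `TOOL_REFS`, `tool_for_stage`, and `run_tool` are hypothetical names; the script paths are the ones mentioned in this post, and the sketch assumes the tools can be run as plain command-line scripts.

```python
import subprocess
import sys
from pathlib import Path

# Stage-to-tool references, using the script paths named in this post.
# Stages without an entry never touch a tool.
TOOL_REFS = {
    "02-research": Path("tools/tech_blog_fact_check.py"),
    "04-publish": Path("/tools/tech_blog_generator.py"),
}


def tool_for_stage(stage):
    """Return the tool path a stage references, or None if it runs no tool."""
    return TOOL_REFS.get(stage)


def run_tool(path, *args):
    """Execute a script in a child interpreter and return its stdout.

    Only the path travels with the stage context; the script source is
    never read into the session.
    """
    result = subprocess.run(
        [sys.executable, str(path), *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

With this shape, a topic or draft session gets `None` from `tool_for_stage` and pays nothing for the generator, while a publish session pays only for the path string.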