Semantic Controls for AI Workflows Part 2
AI as an intelligence building block
Conversation Summary
- Started with the article concept - Semantic compression as a way to pack more meaning into symbolic syntax for AI workflows
- Identified two core goals - Better AI instructions (input compression) AND better AI outputs (semantic shorthand that gets transpiled)
- Recognized the dual nature of symbols - They carry both structural information AND semantic meaning (e.g., a single symbol can mark both a standalone block and a critical warning about consequences)
- Questioned the verbosity problem - Why have AI generate massive amounts of text when we only want decision-making, not prose generation?
- Explored thinking models - Would extended reasoning help? Are there limits? Does this approach optimize for what thinking models do well?
- Hit the training question - Does this require training a custom model, or can it work with in-context learning and existing models?
- Realized the positioning - As a full-stack dev, training models is out of scope. Better to build infrastructure that works across all models as they improve.
- Simplified to prompting - Maybe this is "just" a really good prompt that constrains output format?
- Added structure with schemas - Use Zod + structured outputs to enforce valid component selection and data shapes
- Discovered the key distinction - Component choice constraints (structural validation) vs forcing function constraints (semantic requirements)
- Defined the control system - Schema descriptions that encode "how to think" about each component, creating enforceable semantic requirements
- Faced the validation problem - How do you validate if output is semantically good, not just structurally valid? (It's AI evaluation all the way down)
- Landed on pragmatic goals - Not perfect automation, but significant quality improvement and faster composition for web development
Final Decisions & Core Insights
What I Want to Build
A control system for AI web generation that forces semantic reasoning at each decision point.
Not training a model. Not inventing new symbols. Building a schema + validation layer that:
- Constrains component choices (structural validation via Zod)
- Encodes reasoning requirements in schema descriptions (what to think about)
- Validates outputs pass tests before execution (the actual forcing function)
- Works across any model that supports structured outputs
- Transpiles to my actual component library
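As a rough sketch, that layer reduces to three stages. The interface below is illustrative only - the names are mine, not an existing API:

import type { ReactNode } from 'react'

// Illustrative shape of the layer, not an existing API:
// generate is constrained by schemas, validate is the forcing function,
// transpile maps validated specs onto the real component library.
interface HarnessLayer<Spec> {
  generate(intent: string): Promise<unknown>   // structured output from any capable model
  validate(candidate: unknown): Spec           // Zod parse: reject anything invalid
  transpile(spec: Spec): ReactNode             // render with my actual components
}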
The Testing Harness Paradigm
This is fundamentally what this entire approach is about.
The game has shifted from reinforcement learning and training with rewards to harnessing frontier models through enforcement mechanisms.
Think of it this way:
- Frontier models are the river - powerful, constantly flowing, improving every few months
- Harnesses are the pipes - structured outputs, schemas, validation layers that channel that power to where it's needed
- My codebase is the houses - the pipes deliver clean water (valid, useful output) to every house (component, feature, page)
The core insight: Testing harnesses provide concrete success criteria and enforcement. I'm not trying to make AI smarter through training. I'm forcing it to do the right thing by constraining what it can output to only represent valid decisions.
What testing harnesses give me:
- Concrete success criteria - "Did you satisfy the constraints?" (not vague quality judgments)
- Enforcement mechanism - "You can't proceed unless you pass the test" (the actual forcing function)
- Control without training - Force correct behavior through validation, not RL rewards
This is exactly what structured outputs + schemas + validation achieve:
- Schemas encode structural and semantic requirements ("what fields are needed and what to think about")
- Schema descriptions prompt semantic reasoning at each field ("why am I choosing this component?")
- Validation tests are the forcing functions - AI can only proceed if output passes (structural + semantic checks)
- I get reliable results from the best available model
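A minimal sketch of that forcing function, assuming Zod and a hypothetical enforce helper (the name is mine):

import { z } from 'zod'

// The enforcement step: structured output either parses against the schema or the
// pipeline refuses to proceed, surfacing the validation errors for a retry.
function enforce<S extends z.ZodTypeAny>(schema: S, raw: unknown): z.infer<S> {
  const result = schema.safeParse(raw)
  if (!result.success) {
    const issues = result.error.issues
      .map(issue => `${issue.path.join('.')}: ${issue.message}`)
      .join('; ')
    throw new Error(`Output rejected: ${issues}`)
  }
  return result.data
}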
The new paradigm:
- Old way: Train/fine-tune models with rewards for specific tasks
- New way: Take the best model + build enforcement infrastructure (testing harnesses)
I'm not in the model-building business. I'm in the model-harnessing business.
The Key Innovation
Schemas encode semantic requirements. Validation tests are the forcing functions that enforce them.
Schemas tell the AI what to think about. Tests force the AI to actually do it right. This is the concrete implementation of the testing harness paradigm.
Example of the difference:
Lazy schema:
z.object({
  type: z.literal('hero'),
  title: z.string(),
  subtitle: z.string()
})
Control system schema:
z.object({
  type: z.literal('hero'),
  problem: z.string().describe('What pain point brings users here?'),
  solution: z.string().describe('What transformation does this enable?'),
  proof: z.string().describe('Why should they believe you?'),
  cta: z.object({
    verb: z.enum(['start', 'try', 'explore', 'build']),
    outcome: z.string().describe('What happens after click?')
  })
})
The second one forces the AI to think like a designer/copywriter, not just fill in boxes.
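The describe() strings are not just documentation: with structured outputs they can travel to the model as description fields in the JSON Schema. A quick sketch, assuming the zod-to-json-schema package:

import { z } from 'zod'
import { zodToJsonSchema } from 'zod-to-json-schema'

// Same pattern as the control-system schema above, abbreviated to two fields
const ControlHeroSchema = z.object({
  type: z.literal('hero'),
  problem: z.string().describe('What pain point brings users here?'),
  // ...remaining fields as above
})

// Each describe() call becomes a `description` in the emitted JSON Schema, so the model
// sees the reasoning prompt attached to the exact field it has to fill, e.g.
// { properties: { problem: { type: 'string', description: 'What pain point brings users here?' } } }
const heroJsonSchema = zodToJsonSchema(ControlHeroSchema)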
Strategic Positioning
Staying in my lane as a full-stack developer:
- Don't train models (not my domain, not my competitive advantage)
- Build the infrastructure layer that works across all models
- Let model providers compete on reasoning quality
- I compete on schema design and developer experience
The leverage play:
- Models improve every 6 months → my system improves automatically
- I own the interface between human intent and AI output
What Success Looks Like
Not: Fully autonomous, perfect page generation
But: 10x faster page building with 80% quality that needs 20% human refinement
The comparison:
- Manual: Quality 9/10, Speed: 1 page/hour
- Generic AI: Quality 4/10, Unusable
- v0/Lovable: Quality 6/10, Not your design system
- My system: Quality 7-8/10, Fast, MY components
The Honest Limitations
- Quality validation is hard - You can't programmatically determine whether a problem statement is compelling, only that it exists and has the right shape
- Still needs human oversight - The control system forces the right questions to be answered, but humans judge whether the answers are good
- It's still prompting - Just highly structured prompting with enforcement mechanisms
- Not revolutionary - A practical tool for web dev leverage, not a fundamental AI breakthrough
Next Steps for Demo
1. Define My Component Library (Start Small)
Choose 5-8 core components I actually use:
- Hero
- FeatureGrid
- CTA
- CodeBlock
- SocialProof
- FAQ
- Pricing
- Newsletter
Document what each component is for semantically, not just what it looks like.
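One lightweight way to capture that, sketched here with made-up wording, is a registry of semantic purposes that can feed both the system prompt and the schema descriptions:

// Hypothetical registry; the wording is illustrative, not final copy
const componentPurpose = {
  Hero: "Name the visitor's pain, promise the transformation, offer exactly one next step",
  FeatureGrid: "Scannable proof of breadth: a handful of capabilities, each tied to a user outcome",
  CTA: "A single decision point: one verb, one outcome, no competing choices",
  CodeBlock: "Show, don't tell: a snippet that proves the developer experience",
  SocialProof: "Borrowed trust: logos, quotes, or numbers from people like the reader",
  FAQ: "Objection handling: answer the questions that block a decision",
  Pricing: "Make the trade explicit: what you pay, what you get, which tier fits whom",
  Newsletter: "A low-commitment next step for readers who are not ready to act yet",
} as const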
2. Design Control Schemas
For each component, create a Zod schema with:
- Structural constraints (required fields, types, lengths)
- Semantic forcing functions (descriptions that require specific decisions)
- Validation rules that enforce quality minimums
Example template:
import { z } from 'zod'

const HeroSchema = z.object({
  type: z.literal('hero'),
  // What questions must the AI answer?
  problem: z.string().min(20).describe('What pain point?'),
  solution: z.string().min(20).describe('What transformation?'),
  proof: z.string().describe('Why believe you?'),
  cta: z.object({
    action: z.string().describe('Specific verb'),
    outcome: z.string().describe('What happens after?')
  })
})
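The "quality minimums" above can go further than min() lengths with Zod refinements. A sketch, where the specific rules are illustrative guesses rather than settled heuristics:

// Illustrative refinements for a quality floor
const ProblemStatement = z
  .string()
  .min(20)
  .refine(
    text => !/lorem ipsum|todo|placeholder/i.test(text),
    { message: 'Problem statement must be real copy, not filler' }
  )
  .refine(
    text => text.trim().split(/\s+/).length >= 8,
    { message: 'Problem statement should be a full sentence, not a fragment' }
  )

HeroSchema could then use ProblemStatement for its problem field instead of a bare z.string().min(20).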
3. Build the Generation System
// System prompt that explains the component library
// + Structured output using my schemas
// + Validation layer (Zod)
// + Error handling / retry logic
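A minimal sketch of that loop, assuming the HeroSchema from step 2 is in scope and a hypothetical callModel helper that wraps whichever provider's structured-output API is in use:

import { z } from 'zod'

// Hypothetical wrapper around the provider SDK's structured-output call; not a real API
declare function callModel(system: string, user: string): Promise<unknown>

const SYSTEM_PROMPT =
  'You compose pages from a fixed component library. ' +
  'Return JSON matching the provided schema and treat every field description as a decision you must make.'

async function generateHero(brief: string, maxRetries = 2): Promise<z.infer<typeof HeroSchema>> {
  let feedback = ''
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(
      SYSTEM_PROMPT,
      feedback ? `${brief}\n\nYour previous output failed validation:\n${feedback}` : brief
    )
    const result = HeroSchema.safeParse(raw)   // validation layer: the forcing function
    if (result.success) return result.data     // only a valid decision set gets through
    feedback = result.error.issues             // retry with concrete error messages
      .map(issue => `${issue.path.join('.')}: ${issue.message}`)
      .join('\n')
  }
  throw new Error(`No valid output after ${maxRetries + 1} attempts:\n${feedback}`)
}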
4. Create the Transpiler
Map validated JSON to my actual React components:
// PageSchema is the page-level Zod schema; z.infer gives the validated JSON its static type
function transpile(pageData: z.infer<typeof PageSchema>) {
  return pageData.components.map(component => {
    switch (component.type) {
      case 'hero':
        return (
          <Hero
            problem={component.problem}
            solution={component.solution}
            // ... map all fields
          />
        )
      // ... other components
    }
  })
}
5. Run the Comparison Test
Pick a concrete page to build (e.g., "Landing page for an AI coding tool")
Generate it three ways:
- Baseline AI - Generic prompt, no controls
- My system - Schema-controlled generation
- Manual - I write it myself
Evaluate on:
- Information architecture quality
- Semantic appropriateness of component choices
- Content quality within each component
- Time to completion
- Amount of editing needed
Success metric: If #2 is noticeably better than #1 and gets me 80% of the way to #3 in 20% of the time, I've built something valuable.
6. Iterate on Schema Design
Based on test results:
- Which semantic forcing functions actually improved output?
- Which constraints were too rigid or too loose?
- What components/fields need better descriptions?
- Where did the AI consistently make poor decisions?
7. Build the Developer Interface
Once the core works:
- CLI tool? Web interface? VSCode extension?
- How do developers describe what page they want?
- How do they review/edit generated components?
- How do they customize the component library?
Open Questions to Explore
- How many examples does the AI need? - Can it work zero-shot, or does it need few-shot examples of good component usage?
- Does model choice matter significantly? - Will Claude vs GPT-4 vs Gemini produce notably different quality with the same schemas?
- What's the optimal granularity? - Should schemas be very detailed (many fields, specific descriptions) or minimal (fewer constraints, more AI freedom)?
- How do we handle composition? - When should the AI use multiple small components vs one large component? How do we encode that decision?
- What about variations? - How do schemas handle "Hero with image" vs "Hero with video" vs "Hero with code example"? One possible encoding is sketched below.
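On the variations question, one possible encoding - a guess, not a settled design - is a discriminated union for the media slot inside the hero schema:

import { z } from 'zod'

// Each variant forces its own semantic decision instead of a generic "media" blob
const HeroMedia = z.discriminatedUnion('kind', [
  z.object({
    kind: z.literal('image'),
    alt: z.string().describe('What should the image prove to the visitor?')
  }),
  z.object({
    kind: z.literal('video'),
    caption: z.string().describe('Why is watching worth 30 seconds?')
  }),
  z.object({
    kind: z.literal('code'),
    language: z.string(),
    purpose: z.string().describe('What does this snippet demonstrate?')
  })
])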
The Real Value Proposition
For me personally: Compose pages 10x faster using my own design system
For other developers: A system they can customize with their component libraries to get the same leverage
For the ecosystem: Proof that schema design + semantic forcing functions can significantly improve AI output quality without training custom models
Philosophical Takeaway
Sometimes the best AI innovation isn't building a smarter model - it's building better testing harnesses that force existing models to pass concrete tests before they can proceed.
I'm not making AI smarter. I'm making it pass my tests.
Schemas with semantic forcing functions are testing harnesses. Structured outputs are enforcement mechanisms. I'm not training behavior - I'm requiring proof of correct decisions at every step.