Semantic Controls for AI Workflows Part 2
AI as an intelligence building block
Conversation Summary
- Started with the article concept - Semantic compression as a way to pack more meaning into symbolic syntax for AI workflows
- Identified two core goals - Better AI instructions (input compression) AND better AI outputs (semantic shorthand that gets transpiled)
- Recognized the dual nature of symbols - They carry both structural information AND semantic meaning (e.g., a single symbol can mark both a standalone block and a critical warning about consequences)
- Questioned the verbosity problem - Why have AI generate massive amounts of text when we only want decision-making, not prose generation?
- Explored thinking models - Would extended reasoning help? Are there limits? Does this approach optimize for what thinking models do well?
- Hit the training question - Does this require training a custom model, or can it work with in-context learning and existing models?
- Realized the positioning - As a full-stack dev, training models is out of scope. Better to build infrastructure that works across all models as they improve.
- Simplified to prompting - Maybe this is "just" a really good prompt that constrains output format?
- Added structure with schemas - Use Zod + structured outputs to enforce valid component selection and data shapes
- Discovered the key distinction - Component choice constraints (structural validation) vs forcing function constraints (semantic requirements)
- Defined the control system - Schema descriptions that encode "how to think" about each component, creating enforceable semantic requirements
- Faced the validation problem - How do you validate if output is semantically good, not just structurally valid? (It's AI evaluation all the way down)
- Landed on pragmatic goals - Not perfect automation, but significant quality improvement and faster composition for web development
Final Decisions & Core Insights
What I Want to Build
A control system for AI web generation that forces semantic reasoning at each decision point.
Not training a model. Not inventing new symbols. Building a schema + validation layer that:
- Constrains component choices (structural validation via Zod)
- Encodes reasoning requirements in schema descriptions (what to think about)
- Validates outputs pass tests before execution (the actual forcing function)
- Works across any model that supports structured outputs
- Transpiles to my actual component library
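As a rough sketch, that layer reduces to three stages. The interface below is illustrative only - the names are mine, not an existing API:

import type { ReactNode } from 'react'

// Illustrative shape of the layer, not an existing API:
// generate is constrained by schemas, validate is the forcing function,
// transpile maps validated specs onto the real component library.
interface HarnessLayer<Spec> {
  generate(intent: string): Promise<unknown>   // structured output from any capable model
  validate(candidate: unknown): Spec           // Zod parse: reject anything invalid
  transpile(spec: Spec): ReactNode             // render with my actual components
}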
The Testing Harness Paradigm
This is fundamentally what this entire approach is about.
The game has shifted from reinforcement learning and training with rewards to harnessing frontier models through enforcement mechanisms.
Think of it this way:
- Frontier models are the river - powerful, constantly flowing, improving every few months
- Harnesses are the pipes - structured outputs, schemas, validation layers that channel that power to where it's needed
- My codebase is the houses - the pipes deliver clean water (valid, useful output) to every house (component, feature, page)
The core insight: Testing harnesses provide concrete success criteria and enforcement. I'm not trying to make AI smarter through training. I'm forcing it to do the right thing by constraining what it can output to only represent valid decisions.
What testing harnesses give me:
- Concrete success criteria - "Did you satisfy the constraints?" (not vague quality judgments)
- Enforcement mechanism - "You can't proceed unless you pass the test" (the actual forcing function)
- Control without training - Force correct behavior through validation, not RL rewards
This is exactly what structured outputs + schemas + validation achieve:
- Schemas encode structural and semantic requirements ("what fields are needed and what to think about")
- Schema descriptions prompt semantic reasoning at each field ("why am I choosing this component?")
- Validation tests are the forcing functions - AI can only proceed if output passes (structural + semantic checks)
- I get reliable results from the best available model
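A minimal sketch of that forcing function, assuming Zod and a hypothetical enforce helper (the name is mine):

import { z } from 'zod'

// The enforcement step: structured output either parses against the schema or the
// pipeline refuses to proceed, surfacing the validation errors for a retry.
function enforce<S extends z.ZodTypeAny>(schema: S, raw: unknown): z.infer<S> {
  const result = schema.safeParse(raw)
  if (!result.success) {
    const issues = result.error.issues
      .map(issue => `${issue.path.join('.')}: ${issue.message}`)
      .join('; ')
    throw new Error(`Output rejected: ${issues}`)
  }
  return result.data
}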
The new paradigm:
- Old way: Train/fine-tune models with rewards for specific tasks
- New way: Take the best model + build enforcement infrastructure (testing harnesses)
I'm not in the model-building business. I'm in the model-harnessing business.
The Key Innovation
Schemas encode semantic requirements. Validation tests are the forcing functions that enforce them.
Schemas tell the AI what to think about. Tests force the AI to actually do it right. This is the concrete implementation of the testing harness paradigm.
Example of the difference:
Lazy schema:
z.object({
  type: z.literal('hero'),
  title: z.string(),
  subtitle: z.string()
})
Control system schema:
z.object({
  type: z.literal('hero'),
  problem: z.string().describe('What pain point brings users here?'),
  solution: z.string().describe('What transformation does this enable?'),
  proof: z.string().describe('Why should they believe you?'),
  cta: z.object({
    verb: z.enum(['start', 'try', 'explore', 'build']),
    outcome: z.string().describe('What happens after click?')
  })
})
The second one forces the AI to think like a designer/copywriter, not just fill in boxes.
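The describe() strings are not just documentation: with structured outputs they can travel to the model as description fields in the JSON Schema. A quick sketch, assuming the zod-to-json-schema package:

import { z } from 'zod'
import { zodToJsonSchema } from 'zod-to-json-schema'

// Same pattern as the control-system schema above, abbreviated to two fields
const ControlHeroSchema = z.object({
  type: z.literal('hero'),
  problem: z.string().describe('What pain point brings users here?'),
  // ...remaining fields as above
})

// Each describe() call becomes a `description` in the emitted JSON Schema, so the model
// sees the reasoning prompt attached to the exact field it has to fill, e.g.
// { properties: { problem: { type: 'string', description: 'What pain point brings users here?' } } }
const heroJsonSchema = zodToJsonSchema(ControlHeroSchema)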
Strategic Positioning
Staying in my lane as a full-stack developer:
- Don't train models (not my domain, not my competitive advantage)
- Build the infrastructure layer that works across all models
- Let model providers compete on reasoning quality
- I compete on schema design and developer experience
The leverage play:
- Models improve every 6 months → my system improves automatically
- I own the interface between human intent and AI output
What Success Looks Like
Not: Fully autonomous, perfect page generation
But: 10x faster page building with 80% quality that needs 20% human refinement
The comparison:
- Manual: Quality 9/10, Speed: 1 page/hour
- Generic AI: Quality 4/10, Unusable
- v0/Lovable: Quality 6/10, Not your design system
- My system: Quality 7-8/10, Fast, MY components
The Honest Limitations
- Quality validation is hard - You can't programmatically determine whether a problem statement is compelling, only that it exists and has the right shape
- Still needs human oversight - The control system forces the right questions to be answered, but humans judge whether the answers are good
- It's still prompting - Just highly structured prompting with enforcement mechanisms
- Not revolutionary - A practical tool for web dev leverage, not a fundamental AI breakthrough
Next Steps for Demo
1. Define My Component Library (Start Small)
Choose 5-8 core components I actually use:
- Hero
- FeatureGrid
- CTA
- CodeBlock
- SocialProof
- FAQ
- Pricing
- Newsletter
Document what each component is for semantically, not just what it looks like.
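One lightweight way to capture that, sketched here with made-up wording, is a registry of semantic purposes that can feed both the system prompt and the schema descriptions:

// Hypothetical registry; the wording is illustrative, not final copy
const componentPurpose = {
  Hero: "Name the visitor's pain, promise the transformation, offer exactly one next step",
  FeatureGrid: "Scannable proof of breadth: a handful of capabilities, each tied to a user outcome",
  CTA: "A single decision point: one verb, one outcome, no competing choices",
  CodeBlock: "Show, don't tell: a snippet that proves the developer experience",
  SocialProof: "Borrowed trust: logos, quotes, or numbers from people like the reader",
  FAQ: "Objection handling: answer the questions that block a decision",
  Pricing: "Make the trade explicit: what you pay, what you get, which tier fits whom",
  Newsletter: "A low-commitment next step for readers who are not ready to act yet",
} as const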
2. Design Control Schemas
For each component, create a Zod schema with:
- Structural constraints (required fields, types, lengths)
- Semantic forcing functions (descriptions that require specific decisions)
- Validation rules that enforce quality minimums
Example template:
import { z } from 'zod'

const HeroSchema = z.object({
  type: z.literal('hero'),
  // What questions must the AI answer?
  problem: z.string().min(20).describe('What pain point?'),
  solution: z.string().min(20).describe('What transformation?'),
  proof: z.string().describe('Why believe you?'),
  cta: z.object({
    action: z.string().describe('Specific verb'),
    outcome: z.string().describe('What happens after?')
  })
})
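The "quality minimums" above can go further than min() lengths with Zod refinements. A sketch, where the specific rules are illustrative guesses rather than settled heuristics:

// Illustrative refinements for a quality floor
const ProblemStatement = z
  .string()
  .min(20)
  .refine(
    text => !/lorem ipsum|todo|placeholder/i.test(text),
    { message: 'Problem statement must be real copy, not filler' }
  )
  .refine(
    text => text.trim().split(/\s+/).length >= 8,
    { message: 'Problem statement should be a full sentence, not a fragment' }
  )

HeroSchema could then use ProblemStatement for its problem field instead of a bare z.string().min(20).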
3. Build the Generation System
// System prompt that explains the component library
// + Structured output using my schemas
// + Validation layer (Zod)
// + Error handling / retry logic
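A minimal sketch of that loop, assuming the HeroSchema from step 2 is in scope and a hypothetical callModel helper that wraps whichever provider's structured-output API is in use:

import { z } from 'zod'

// Hypothetical wrapper around the provider SDK's structured-output call; not a real API
declare function callModel(system: string, user: string): Promise<unknown>

const SYSTEM_PROMPT =
  'You compose pages from a fixed component library. ' +
  'Return JSON matching the provided schema and treat every field description as a decision you must make.'

async function generateHero(brief: string, maxRetries = 2): Promise<z.infer<typeof HeroSchema>> {
  let feedback = ''
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(
      SYSTEM_PROMPT,
      feedback ? `${brief}\n\nYour previous output failed validation:\n${feedback}` : brief
    )
    const result = HeroSchema.safeParse(raw)   // validation layer: the forcing function
    if (result.success) return result.data     // only a valid decision set gets through
    feedback = result.error.issues             // retry with concrete error messages
      .map(issue => `${issue.path.join('.')}: ${issue.message}`)
      .join('\n')
  }
  throw new Error(`No valid output after ${maxRetries + 1} attempts:\n${feedback}`)
}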
4. Create the Transpiler
Map validated JSON to my actual React components:
// PageSchema is the page-level Zod schema; z.infer gives the validated JSON its static type
function transpile(pageData: z.infer<typeof PageSchema>) {
  return pageData.components.map(component => {
    switch (component.type) {
      case 'hero':
        return (
          <Hero
            problem={component.problem}
            solution={component.solution}
            // ... map all fields
          />
        )
      // ... other components
    }
  })
}
5. Run the Comparison Test
Pick a concrete page to build (e.g., "Landing page for an AI coding tool")
Generate it three ways:
- Baseline AI - Generic prompt, no controls
- My system - Schema-controlled generation
- Manual - I write it myself
Evaluate on:
- Information architecture quality
- Semantic appropriateness of component choices
- Content quality within each component
- Time to completion
- Amount of editing needed
Success metric: If #2 is noticeably better than #1 and gets me 80% of the way to #3 in 20% of the time, I've built something valuable.
6. Iterate on Schema Design
Based on test results:
- Which semantic forcing functions actually improved output?
- Which constraints were too rigid or too loose?
- What components/fields need better descriptions?
- Where did the AI consistently make poor decisions?
7. Build the Developer Interface
Once the core works:
- CLI tool? Web interface? VSCode extension?
- How do developers describe what page they want?
- How do they review/edit generated components?
- How do they customize the component library?
Open Questions to Explore
- How many examples does the AI need? - Can it work zero-shot, or does it need few-shot examples of good component usage?
- Does model choice matter significantly? - Will Claude vs GPT-4 vs Gemini produce notably different quality with the same schemas?
- What's the optimal granularity? - Should schemas be very detailed (many fields, specific descriptions) or minimal (fewer constraints, more AI freedom)?
- How do we handle composition? - When should the AI use multiple small components vs one large component? How do we encode that decision?
- What about variations? - How do schemas handle "Hero with image" vs "Hero with video" vs "Hero with code example"? One possible encoding is sketched below.
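On the variations question, one possible encoding - a guess, not a settled design - is a discriminated union for the media slot inside the hero schema:

import { z } from 'zod'

// Each variant forces its own semantic decision instead of a generic "media" blob
const HeroMedia = z.discriminatedUnion('kind', [
  z.object({
    kind: z.literal('image'),
    alt: z.string().describe('What should the image prove to the visitor?')
  }),
  z.object({
    kind: z.literal('video'),
    caption: z.string().describe('Why is watching worth 30 seconds?')
  }),
  z.object({
    kind: z.literal('code'),
    language: z.string(),
    purpose: z.string().describe('What does this snippet demonstrate?')
  })
])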
The Real Value Proposition
For me personally: Compose pages 10x faster using my own design system
For other developers: A system they can customize with their component libraries to get the same leverage
For the ecosystem: Proof that schema design + semantic forcing functions can significantly improve AI output quality without training custom models
Philosophical Takeaway
Sometimes the best AI innovation isn't building a smarter model - it's building better testing harnesses that force existing models to pass concrete tests before they can proceed.
I'm not making AI smarter. I'm making it pass my tests.
Schemas with semantic forcing functions are testing harnesses. Structured outputs are enforcement mechanisms. I'm not training behavior - I'm requiring proof of correct decisions at every step.