Audio as Optional Prop: Adding Human Commentary to AI Content
February 2, 2026
Feature spec for audio as optional prop on content primitives - attach human commentary to any component
The Problem
I'm generating a lot of AI-assisted content now: articles, documentation, feature specs, all written in my IDE with AI coding assistants. (See Why I Write Everything in My IDE Now for the full workflow.)
The velocity is amazing, but it creates a new problem: readers encounter AI-synthesized content and may want to know why I thought it was worth creating. Audio commentary can help with this by adding human context, reasoning, and transparency without slowing down the creation loop.
Evolution: From VoiceNote Component to Audio as Optional Prop
Initially, I imagined a standalone VoiceNote component: a special callout box with an audio player and transcript. Like a blockquote, but with voice.
But then I realized: audio isn't a content type, it's metadata. I don't want to insert special "voice note boxes" that interrupt the flow. I want to attach audio to the content that already exists.
- `<CodeBlock audio="...">` - "Here's what this code does"
- `<Heading audio="...">` - "Here's what this section is about"
This is better because:
- Semantic integrity - Headings stay headings, code stays code
- Natural breakpoints - Audio at headings acts like chapter markers
- No flow interruption - Audio enhances existing content, doesn't fragment it
- Composable - Any component can have audio, not just special "voice note" components
So this spec evolved from "VoiceNote component" to "audio as optional prop."
What Audio Commentary Provides
Transparency and context:
- Process visibility - "Here's why I curated this"
- Human reasoning - "Why this matters to me"
- Decision context - "What made this worth shipping"
It's not about explaining the text. It's about explaining why the text exists and why it matters.
Why This Is Different
Current content formats force you to choose:
- Blogs - Static text only, one voice throughout
- Video essays - Full commitment to video format, can't skim
- Podcasts - Audio-only, completely separate from written content
This approach is a hybrid: comprehensive AI-generated text plus inline human reasoning attached to specific content. Best of both:
- Readers get comprehensive, well-synthesized content
- AI does the synthesis, human shares the reasoning and curation
- Perfect transparency workflow - clearly marked AI synthesis vs human reasoning
The Workflow Evolution
Phase 1: Basic Implementation
- Add audio props to CodeBlock and Heading components
- Create basic AudioPlayer component
- Record on phone, transcribe with Whisper, add audio prop manually
- Works but has friction
Phase 2: Friction Elimination
- Always-hot mic setup at desk
- System-wide hotkey (e.g., `Cmd+Shift+V`) triggers recording
- Auto-transcribes, auto-titles, drops files in `/content/audio/inbox/`
- VSCode command to insert the audio prop from the inbox into the current component
- Takes "I should record this" → actually recording from 10% to 90%
Phase 3: Multi-Modal Composition
- Audio + code (explain why code is ugly)
- Audio + images (context for screenshots)
- Audio + debug visualizations (thought vs reality)
- Multiple audio clips per page for different sections
- Threading for longer explanations across components
Phase 4: Content Modes
- Read Mode - Text only, no audio
- Commentary Mode - All audio players visible
- Audio Tour Mode - Auto-plays audio as you scroll
- Lets different users consume content their way
The "Director's Commentary" Track
Every article could have a toggle at the top to switch between consumption modes:
- [📖 Read Mode] - Just the text, no audio
- [🎙️ Commentary Mode] - Shows all audio players
- [▶️ Audio Tour Mode] - Auto-plays audio as you scroll past components
This lets different users consume the content differently:
- Speed readers: Text only
- Deep learners: Commentary mode
- Multitaskers: Audio tour (listen while doing dishes) - auto-play sketched below
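Audio tour mode is the only mode that needs real machinery. Here's a rough sketch of the auto-play trigger, assuming an IntersectionObserver around each audio-bearing component. The hook name, threshold, and pause-on-exit behavior are guesses to tune, and browsers may block autoplay until the user has interacted with the page (hence the catch):

```tsx
// Rough sketch: auto-play a clip when its component scrolls into view,
// pause it again when the component scrolls out.
import { useEffect, useRef } from "react";

export function useAudioTour(enabled: boolean) {
  const wrapperRef = useRef<HTMLDivElement>(null);
  const audioRef = useRef<HTMLAudioElement>(null);

  useEffect(() => {
    const wrapper = wrapperRef.current;
    const audio = audioRef.current;
    if (!enabled || !wrapper || !audio) return;

    const observer = new IntersectionObserver(
      ([entry]) => {
        // Play once the component is mostly on screen; pause when it leaves.
        if (entry.isIntersecting) {
          audio.play().catch(() => {}); // autoplay may be blocked pre-gesture
        } else {
          audio.pause();
        }
      },
      { threshold: 0.6 },
    );

    observer.observe(wrapper);
    return () => observer.disconnect();
  }, [enabled]);

  return { wrapperRef, audioRef };
}
```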
Component Specification
Audio on CodeBlock
```tsx
<CodeBlock
  language="typescript"
  audio="/audio/why-this-code.mp3"
  audioDuration="0:45"
>
  {codeString}
</CodeBlock>
```

The audio player appears inline with the code block. When played, it provides context about implementation decisions, trade-offs, or "why this code is ugly but necessary."
Audio on Heading
```tsx
<Heading
  level={2}
  audio="/audio/section-intro.mp3"
  audioDuration="1:20"
>
  The Core Architecture
</Heading>
```

The audio player appears next to or below the heading. It acts as a chapter marker: "here's what this section is about and why it matters."
Shared Audio Props
```typescript
interface AudioProps {
  audio?: string;           // Path to audio file (mp3/wav/etc.)
  audioDuration?: string;   // Display duration (e.g., "0:45", "2:30")
  audioWaveform?: string;   // Optional: path to waveform image/data
  audioTranscript?: string; // Optional: full transcript text
}
```

These props get added to existing primitives (CodeBlock, Heading). When present, the component renders an inline audio player.

Note on audioDuration: the browser can read the duration from the audio file itself once it loads. audioDuration is just an optimization for displaying the duration immediately on initial render (SSR) and preventing layout shift. It's entirely optional; you can omit it and let the player figure it out after the audio loads.
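As a sketch of that fallback, a hook can fill in the display once the browser fires loadedmetadata. The hook name and time formatting here are illustrative:

```tsx
// Illustrative fallback: show the audioDuration prop immediately when given;
// otherwise fill the display in once the browser has parsed the file header.
// Note: with preload="none", metadata only arrives once playback starts,
// which is exactly why the prop helps for SSR and layout stability.
import { useEffect, useRef, useState } from "react";

function useDisplayDuration(initial?: string) {
  const audioRef = useRef<HTMLAudioElement>(null);
  const [display, setDisplay] = useState(initial ?? "--:--");

  useEffect(() => {
    const el = audioRef.current;
    if (!el || initial) return; // the prop wins; skip the metadata wait
    const onLoaded = () => {
      const secs = Math.round(el.duration);
      setDisplay(`${Math.floor(secs / 60)}:${String(secs % 60).padStart(2, "0")}`);
    };
    el.addEventListener("loadedmetadata", onLoaded);
    return () => el.removeEventListener("loadedmetadata", onLoaded);
  }, [initial]);

  return { audioRef, display };
}
```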
Visual Design
Audio player should be minimal and integrated with existing component styling:
- Icon indicator - 🎙️ or speaker icon to signal audio is available
- Inline player - Appears within or adjacent to component, not as separate callout
- Minimal controls - Play/pause, progress bar, speed control, duration
- Theme integration - Uses existing theme colors and styling
- Transcript toggle - Optional: show/hide full transcript text
Audio Player Features
- Play/Pause button - Primary control
- Progress bar - Show position in audio, allow seeking
- Playback speed - 0.5x, 0.75x, 1x, 1.25x, 1.5x, 2x
- Duration display - Show total time and current position
- Optional waveform visualization - If available, show audio waveform
- Keyboard controls - Space to play/pause, arrow keys to seek (sketched below)
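A sketch of the keyboard wiring; the key choices are the conventional ones, not a decided spec:

```tsx
// Illustrative keyboard handler: space toggles play/pause, arrows seek ±5s.
// Attach via onKeyDown on the player's focusable wrapper so it doesn't
// capture keystrokes for the whole page.
import type { KeyboardEvent } from "react";

function handlePlayerKeys(e: KeyboardEvent, audio: HTMLAudioElement): void {
  if (e.key === " ") {
    e.preventDefault(); // keep the page from scrolling on space
    if (audio.paused) audio.play();
    else audio.pause();
  } else if (e.key === "ArrowRight") {
    audio.currentTime = Math.min(audio.currentTime + 5, audio.duration);
  } else if (e.key === "ArrowLeft") {
    audio.currentTime = Math.max(audio.currentTime - 5, 0);
  }
}
```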
Usage Examples
Audio + Code
```tsx
<CodeBlock
  language="typescript"
  audio="/audio/why-i-hate-this-code.mp3"
>
  // The ugly code in question
  function messyButNecessary() {
    // Yeah so this function is ugly as heck,
    // but here's why I had to do it this way...
  }
</CodeBlock>
```

Audio + Section Heading
```tsx
<Heading
  level={2}
  audio="/audio/the-key-insight.mp3"
  audioDuration="1:15"
>
  The Key Innovation
</Heading>
<Paragraph>
  [AI-generated explanation of the innovation...]
</Paragraph>
```

The audio provides: "This is where it clicked for me. I was stuck thinking about 'X' but it's actually 'Y'. That reframe changed everything."
Implementation Details
Three pieces to make this work:
1. Shared AudioPlayer Component
A reusable <AudioPlayer> component that handles all playback logic (a minimal sketch follows the list):
- Play/pause state management
- Progress bar with seeking
- Playback speed controls (0.5x, 1x, 1.5x, 2x)
- Duration display and current time
- Keyboard shortcuts (space for play/pause, arrows for seek)
- Optional transcript toggle
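A minimal sketch of that core, using the AudioPlayerProps interface defined just below. The markup and control layout are placeholders, not the final design; keyboard wiring is as sketched in the features section above:

```tsx
// Minimal playback core: play/pause, cycling speed control, duration,
// optional transcript. Progress bar, waveform, and styling omitted.
import { useRef, useState } from "react";

const SPEEDS = [0.5, 0.75, 1, 1.25, 1.5, 2];

export function AudioPlayer({ src, duration, transcript }: AudioPlayerProps) {
  const audioRef = useRef<HTMLAudioElement>(null);
  const [playing, setPlaying] = useState(false);
  const [speed, setSpeed] = useState(1);

  const togglePlay = () => {
    const el = audioRef.current;
    if (!el) return;
    if (playing) el.pause();
    else el.play();
    setPlaying(!playing);
  };

  const cycleSpeed = () => {
    const next = SPEEDS[(SPEEDS.indexOf(speed) + 1) % SPEEDS.length];
    if (audioRef.current) audioRef.current.playbackRate = next;
    setSpeed(next);
  };

  return (
    <div>
      {/* preload="none" defers the network request until first play */}
      <audio
        ref={audioRef}
        src={src}
        preload="none"
        onEnded={() => setPlaying(false)}
      />
      <button onClick={togglePlay}>{playing ? "Pause" : "Play"}</button>
      <button onClick={cycleSpeed}>{speed}x</button>
      {duration && <span>{duration}</span>}
      {transcript && (
        <details>
          <summary>Transcript</summary>
          {transcript}
        </details>
      )}
    </div>
  );
}
```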
```typescript
interface AudioPlayerProps {
  src: string;          // Path to audio file
  duration?: string;    // Display duration
  waveform?: string;    // Optional waveform visualization
  transcript?: string;  // Optional transcript text
}
```

2. Optional Audio Props on Primitives
Each primitive (CodeBlock, Heading, etc.) gets extended with optional audio props:
```typescript
interface AudioProps {
  audio?: string;           // Path to audio file
  audioDuration?: string;   // Display duration
  audioWaveform?: string;   // Optional waveform
  audioTranscript?: string; // Optional transcript
}

// CodeBlock extends its existing props
interface CodeBlockProps extends AudioProps {
  language: string;
  children: string;
  // ...existing props
}

// Heading extends its existing props
interface HeadingProps extends AudioProps {
  level: 1 | 2 | 3 | 4 | 5 | 6;
  children: React.ReactNode;
  // ...existing props
}
```

3. Integration Per Component
Each primitive decides where to render the AudioPlayer:
```tsx
// CodeBlock renders the audio player at the bottom
export function CodeBlock({
  audio,
  audioDuration,
  audioTranscript,
  language,
  children,
}: CodeBlockProps) {
  return (
    <div>
      <pre><code>{children}</code></pre>
      {audio && (
        <AudioPlayer
          src={audio}
          duration={audioDuration}
          transcript={audioTranscript}
        />
      )}
    </div>
  );
}

// Heading renders the audio player inline after the text
export function Heading({
  audio,
  audioDuration,
  level,
  children,
}: HeadingProps) {
  const Tag = `h${level}` as keyof JSX.IntrinsicElements;
  return (
    <div>
      <Tag>{children}</Tag>
      {audio && (
        <AudioPlayer
          src={audio}
          duration={audioDuration}
        />
      )}
    </div>
  );
}
```

The positioning is the only custom part: CodeBlock might put the player at the bottom, Heading might put it inline. The playback logic is handled entirely by the shared AudioPlayer.
Phase 1: Add Audio Props to Primitives
- Create shared AudioPlayer component in `src/components/AudioPlayer/`
- Add optional audio props to the CodeBlock component
- Add optional audio props to Heading primitive
- HTML5 audio element with custom controls
- Theme-integrated styling (use theme colors, spacing, radii)
- Responsive design for mobile
Phase 2: Always-On Recording Workflow
The killer feature that makes audio commentary actually usable at scale:
- Global hotkey - `Cmd+Shift+V` triggers recording from anywhere
- Auto-transcription - Whisper runs in the background, generates a transcript
- Smart titling - AI generates a preliminary title from the first 10 words
- Inbox staging - Saves to `/content/voice-notes/inbox/`
- File management - Moves audio to `/public/audio/` with proper naming
- AI categorization - Optional suggestion of which article it relates to

Friction elimination: the current workflow is Think → Open app → Record → Save → Transcribe → File → Insert. The target workflow is Think → Hit hotkey → Talk → Done. The difference between "I should record this" and actually recording it is literally one keypress.
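As a rough sketch of the transcribe-and-stage step, assuming Node and the same Whisper CLI used in Step 4 of the content workflow below. The helper name, file layout, and the `--output_dir`/`--output_format` flags are assumptions to verify against your install:

```typescript
// ingest-voice-note.ts -- hypothetical Phase 2 inbox step: transcribe a
// fresh recording, title it from the first words, stage both in the inbox.
import { execSync } from "node:child_process";
import { mkdirSync, readFileSync, renameSync } from "node:fs";
import { basename, join } from "node:path";

const INBOX = "content/voice-notes/inbox";

export function ingestRecording(audioPath: string): void {
  mkdirSync(INBOX, { recursive: true });

  // Transcribe straight into the inbox so the .txt lands next to the audio.
  execSync(
    `whisper "${audioPath}" --model base --output_dir "${INBOX}" --output_format txt`,
  );

  const stem = basename(audioPath).replace(/\.[^.]+$/, "");
  const transcript = readFileSync(join(INBOX, `${stem}.txt`), "utf8");

  // "Smart titling", naive version: slugify the first 10 words.
  const slug = transcript
    .split(/\s+/)
    .slice(0, 10)
    .join(" ")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/(^-|-$)/g, "");

  renameSync(audioPath, join(INBOX, `${slug || stem}.mp3`));
}
```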
Phase 3: Enhanced Features
- Waveform generation and visualization
- Auto-sync text highlighting as audio plays
- Timestamps for jumping to specific sections
- Download transcript option
- Share audio clip functionality
Use Cases
1. AI-Generated Document Commentary
Scenario: You generate a comprehensive 5,000-word document with AI about your Timeline component architecture. You add audio at key sections to guide readers through your thinking.
```tsx
<Heading level={2} audio="/audio/aha-moment.mp3">
  The Key Innovation
</Heading>
<Paragraph>
  [AI-generated explanation of the Timeline component...]
</Paragraph>
<Paragraph>
  Audio provides: "This is where it clicked for me. I was stuck thinking
  about this as a layout problem, but it's actually a data structure problem.
  That reframe changed everything."
</Paragraph>
```

2. Tutorial Walkthroughs
Scenario: Technical tutorial with code examples. Audio on CodeBlocks explains "why" decisions were made, not just "what" the code does.
3. Roadmap Context
Scenario: Feature roadmap document. Audio on headings adds personal context about priorities, trade-offs, and decision-making process.
4. Content Curation
Scenario: AI synthesizes research from multiple sources. Audio commentary adds "this source is particularly valuable because..." or "notice how these three ideas connect..."
Technical Considerations
Audio Format & Compression
- Format: MP3 (best browser compatibility) or WebM (smaller file sizes)
- Bitrate: 64kbps for voice is sufficient (significantly smaller than music)
- Mono vs Stereo: Mono for voice (half the file size)
- Target size: 30-60 seconds = ~250-500KB, 2-3 minutes = ~1-1.5MB (example encode command below)
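For encoding, something like the following ffmpeg invocation matches those targets: mono (`-ac 1`) at 64 kbps (`-b:a 64k`). Verify the resulting sizes on your own recordings:

```bash
# Mono, 64 kbps MP3 from a raw recording
ffmpeg -i raw-recording.wav -ac 1 -b:a 64k -codec:a libmp3lame why-this-code.mp3
```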
Accessibility
- Transcript text is always visible (audio enhancement, not replacement)
- Full keyboard navigation support
- ARIA labels for screen readers
- Visual indicators when audio is playing
- Respect prefers-reduced-motion: disable animations (snippet below)
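The reduced-motion check is a one-liner, shown here from TypeScript; a CSS @media block works just as well:

```tsx
// Skip waveform/progress animations for users who request reduced motion.
const prefersReducedMotion =
  typeof window !== "undefined" &&
  window.matchMedia("(prefers-reduced-motion: reduce)").matches;
```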
Performance
- Lazy load audio files (don't preload until user interaction)
- Cache audio files in browser
- Show loading state when fetching audio
- Progressive loading for longer audio clips
Mobile Experience
- Larger touch targets for controls
- Simplified UI on small screens
- Handle background audio (continue playing when scrolling)
- Respect system audio settings and volume
Content Workflow
Step 1: Generate Base Content
Use AI to create comprehensive document on topic. Let it be thorough - that's what it's good at.
Step 2: Identify Commentary Points
Read through and mark spots where you want to add your reasoning:
- "Here's why this idea matters to me"
- "This is the moment it clicked - let me explain"
- "Why I decided to include this / why I curated this"
- "The real-world reason this exists"
- "What made this worth documenting and shipping"
Step 3: Record Audio Commentary
- Open voice memos on phone or use desktop recorder
- Record spontaneously (conversational, not scripted)
- Keep it short: 30-90 seconds per clip
- Name files descriptively: `schema-insight.mp3`, `aha-moment.mp3`
Step 4: Transcribe & Add Audio Prop
- Run Whisper locally: `whisper audio.mp3 --model base`
- Copy transcript text (optional, for accessibility)
- Add audio prop to relevant component (CodeBlock or Heading)
- Lightly edit the transcript for readability if included
Step 5: Review & Publish
Listen to each audio clip in context. Does it flow? Does it add value? Adjust placement or re-record if needed.
Success Metrics
How do we know this feature is working?
- Engagement: Do people actually play the audio? Track play rates.
- Completion: Do they listen all the way through? Track completion rates.
- Time on page: Does voice commentary increase time spent on content?
- Feedback: Direct comments about audio commentary - helpful or distracting?
- Personal satisfaction: Does this make AI-generated content feel more authentic and valuable?
Open Questions
- How many audio clips per page before it becomes overwhelming?
- Should we show a "total audio commentary time" at the top of articles?
- How do we handle audio in article excerpts/previews?
- Should threading support automatic numbering ("Part 1 of 3")?
- Can audio commentary be searched/indexed for discovery?
- What's the right UX for audio tour mode auto-play behavior?
Ideal Future State
- Composable across all content types
- Zero-friction capture workflow
- AI handles synthesis, human adds the "why"
- Built into the development environment itself
Next Steps
- Build basic AudioPlayer component (MVP - just play/pause + progress bar)
- Add audio props to CodeBlock and Heading primitives
- Test in one document (maybe this spec or a technical article)
- Record 2-3 test audio clips and add them to components
- Get feedback (does this feel natural? does it add value?)
- Iterate on design and UX based on real usage
- Add enhanced features (waveform, transcript toggle, etc.)
- Document the workflow for future content creation
- Write blog post about the pattern and open source it