Audio as Optional Prop: Adding Human Commentary to AI Content

By Jay Griffin with Claude Sonnet 4.5 (AI-assisted: Jay's design evolution, Claude documented the spec)
January 29, 2026 · Updated February 2, 2026

Tags: feature-spec, audio, ai-content, primitives, design-systems, transparency

Feature spec for audio as an optional prop on content primitives - attach human commentary to any component

The Problem

I'm generating a lot of AI-assisted content now—articles, documentation, feature specs—all written in my IDE with AI coding assistants. (See Why I Write Everything in My IDE Now for the full workflow.)

The velocity is amazing, but it creates a new problem: readers encounter AI-synthesized content and may want to know why I thought it was worth creating. Audio commentary can help with this—adding human context, reasoning, and transparency without slowing down the creation loop.

Evolution: From VoiceNote Component to Audio as Optional Prop

Initially, I imagined a standalone VoiceNote component—a special callout box with audio player and transcript. Like a blockquote but with voice.

But then I realized: audio isn't a content type, it's metadata. I don't want to insert special "voice note boxes" that interrupt the flow. I want to attach audio to the content that already exists.

This is better because the commentary stays attached to the content it describes: no special boxes interrupting the flow, no new content type to author, and the primitives behave exactly as before when no audio is present.

So this spec evolved from "VoiceNote component" to "audio as optional prop."

What Audio Commentary Provides

Transparency and context: a human voice explaining the reasoning behind the AI-synthesized text.

It's not about explaining the text. It's about explaining why the text exists and why it matters.

Why This Is Different

Current content formats force you to choose: comprehensive AI-generated text with no human voice in it, or hand-written prose that gives up the velocity.

Audio is the hybrid: AI-generated comprehensive text plus inline human reasoning attached to specific content. Best of both: the depth and speed of generation, with the author's "why" preserved where it matters.

The Workflow Evolution

The plan develops in four phases:

  1. Basic implementation: audio props on the primitives plus a shared AudioPlayer
  2. Friction elimination: an always-on, hotkey-driven recording workflow
  3. Multi-modal composition: interleaving generated text, code, and recorded reasoning
  4. Content modes: letting readers choose how they consume the result

The "Director's Commentary" Track

Every article could have a toggle at the top to switch between consumption modes: read the text on its own, or read it with the commentary track enabled.

This lets different users consume the content differently: skimmers stick to the text, while readers who want the reasoning play the audio as they go.
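
A minimal sketch of how that toggle might be wired, assuming a React context. The mode names and ContentModeProvider are illustrative, not part of the spec:

import { createContext, useContext, useState, ReactNode } from "react";

// Hypothetical consumption modes; the names are illustrative.
type ContentMode = "text-only" | "text-with-commentary";

const ContentModeContext = createContext<ContentMode>("text-with-commentary");

export function ContentModeProvider({ children }: { children: ReactNode }) {
  const [mode, setMode] = useState<ContentMode>("text-with-commentary");
  return (
    <ContentModeContext.Provider value={mode}>
      {/* The toggle at the top of the article */}
      <button
        onClick={() =>
          setMode((m) => (m === "text-only" ? "text-with-commentary" : "text-only"))
        }
      >
        {mode === "text-only" ? "Show commentary" : "Hide commentary"}
      </button>
      {children}
    </ContentModeContext.Provider>
  );
}

// Primitives consult the mode before rendering their AudioPlayer.
export const useContentMode = () => useContext(ContentModeContext);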

Component Specification

Audio on CodeBlock

<CodeBlock 
  language="typescript" 
  audio="/audio/why-this-code.mp3"
  audioDuration="0:45"
>
  {codeString}
</CodeBlock>

Audio player appears inline with the code block. When played, provides context about implementation decisions, trade-offs, or "why this code is ugly but necessary."

Audio on Heading

<Heading 
  level={2} 
  audio="/audio/section-intro.mp3"
  audioDuration="1:20"
>
  The Core Architecture
</Heading>

Audio player appears next to or below heading. Acts as a chapter marker—"here's what this section is about and why it matters."

Shared Audio Props

interface AudioProps {
  audio?: string;           // Path to audio file (mp3/wav/etc)
  audioDuration?: string;   // Display duration (e.g., "0:45", "2:30")
  audioWaveform?: string;   // Optional: Path to waveform image/data
  audioTranscript?: string; // Optional: Full transcript text
}

These props get added to existing primitives (CodeBlock, Heading). When present, the component renders an inline audio player.

Note on audioDuration: the browser can read the duration from the audio file itself once it loads. audioDuration is just an optimization for displaying the duration immediately on initial render (SSR) and preventing layout shift. It's entirely optional—omit it and the player figures it out after the audio loads.
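
For illustration, here's roughly what that fallback could look like inside the player, using the standard loadedmetadata event. The hook name and formatting helper are just a sketch:

import { useRef, useState } from "react";

// Sketch: derive a display duration from the file when no prop is given.
function useAudioDuration(provided?: string) {
  const audioRef = useRef<HTMLAudioElement>(null);
  const [duration, setDuration] = useState(provided);

  const onLoadedMetadata = () => {
    const el = audioRef.current;
    if (provided || !el) return;
    const total = Math.round(el.duration);
    setDuration(`${Math.floor(total / 60)}:${String(total % 60).padStart(2, "0")}`);
  };

  return { audioRef, duration, onLoadedMetadata };
}

Attach audioRef and onLoadedMetadata to the underlying audio element and the displayed duration fills in as soon as metadata arrives.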

Visual Design

The audio player should be minimal and integrated with the existing component styling: an accent on the content, not a competing element.

Audio Player Features

The MVP is deliberately small: play/pause and a progress bar. Waveform visualization and a transcript toggle are enhanced features (see Next Steps).

Usage Examples

Audio + Code

<CodeBlock 
  language="typescript" 
  audio="/audio/why-i-hate-this-code.mp3"
>
  // The ugly code in question
  function messyButNecessary() {
    // Yeah so this function is ugly as heck, 
    // but here's why I had to do it this way...
  }
</CodeBlock>

Audio + Section Heading

<Heading 
  level={2} 
  audio="/audio/the-key-insight.mp3"
  audioDuration="1:15"
>
  The Key Innovation
</Heading>

<Paragraph>
  [AI-generated explanation of the innovation...]
</Paragraph>

Audio provides: "This is where it clicked for me. I was stuck thinking about 'X' but it's actually 'Y'. That reframe changed everything."

Implementation Details

Three pieces to make this work:

1. Shared AudioPlayer Component

A reusable <AudioPlayer> component that handles all playback logic:

interface AudioPlayerProps {
  src: string;              // Path to audio file
  duration?: string;        // Display duration
  waveform?: string;        // Optional waveform visualization
  transcript?: string;      // Optional transcript text
}
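
As a concrete starting point, here's a minimal sketch of the MVP player from Next Steps (play/pause plus a progress bar), built on the native audio element. Everything beyond the interface above is an assumption, including the markup and class name:

import { useRef, useState } from "react";

// Minimal play/pause + progress sketch; not the final design.
export function AudioPlayer({ src, duration, transcript }: AudioPlayerProps) {
  const audioRef = useRef<HTMLAudioElement>(null);
  const [playing, setPlaying] = useState(false);
  const [progress, setProgress] = useState(0); // 0..1

  const toggle = () => {
    const el = audioRef.current;
    if (!el) return;
    if (playing) el.pause();
    else void el.play();
    setPlaying(!playing);
  };

  return (
    <div className="audio-player">
      <button onClick={toggle} aria-label={playing ? "Pause commentary" : "Play commentary"}>
        {playing ? "Pause" : "Play"}
      </button>
      {/* Progress driven by the timeupdate event */}
      <progress value={progress} max={1} />
      {duration && <span>{duration}</span>}
      {/* Transcript doubles as the accessible fallback */}
      {transcript && (
        <details>
          <summary>Transcript</summary>
          <p>{transcript}</p>
        </details>
      )}
      <audio
        ref={audioRef}
        src={src}
        onTimeUpdate={(e) => {
          const el = e.currentTarget;
          setProgress(el.duration ? el.currentTime / el.duration : 0);
        }}
        onEnded={() => setPlaying(false)}
      />
    </div>
  );
}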

2. Optional Audio Props on Primitives

Each primitive (CodeBlock, Heading, etc.) gets extended with optional audio props:

interface AudioProps {
  audio?: string;           // Path to audio file
  audioDuration?: string;   // Display duration
  audioWaveform?: string;   // Optional waveform
  audioTranscript?: string; // Optional transcript
}

// CodeBlock extends its existing props
interface CodeBlockProps extends AudioProps {
  language: string;
  children: string;
  // ...existing props
}

// Heading extends its existing props
interface HeadingProps extends AudioProps {
  level: 1 | 2 | 3 | 4 | 5 | 6;
  children: React.ReactNode;
  // ...existing props
}

3. Integration Per Component

Each primitive decides where to render the AudioPlayer:

// CodeBlock renders audio player at bottom
export function CodeBlock({
  audio,
  audioDuration,
  audioTranscript,
  language,
  children,
}: CodeBlockProps) {
  return (
    <div>
      <pre><code>{children}</code></pre>
      {audio && (
        <AudioPlayer 
          src={audio} 
          duration={audioDuration}
          transcript={audioTranscript}
        />
      )}
    </div>
  );
}

// Heading renders audio player inline after text
export function Heading({
  audio,
  audioDuration,
  level,
  children,
}: HeadingProps) {
  const Tag = `h${level}` as keyof JSX.IntrinsicElements;
  return (
    <div>
      <Tag>{children}</Tag>
      {audio && (
        <AudioPlayer 
          src={audio} 
          duration={audioDuration}
        />
      )}
    </div>
  );
}

The positioning is the only custom part—CodeBlock might put it at the bottom, Heading might put it inline. The playback logic is entirely handled by the shared AudioPlayer.

Phase 1: Add Audio Props to Primitives

This is everything specified above: the shared AudioPlayer plus optional audio props on CodeBlock and Heading.

Phase 2: Always-On Recording Workflow

The killer feature that makes audio commentary actually usable at scale:

Friction elimination: Current workflow is Think → Open app → Record → Save → Transcribe → File → Insert. Target workflow is Think → Hit hotkey → Talk → Done. The difference between "I should record this" and actually recording it is literally one keypress.
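
The capture tool itself is an open design question. As a sketch of just the recording half, here's what one-keypress capture could look like in a browser context using the MediaRecorder API; the global hotkey wiring and auto-transcription are out of scope:

// Sketch: start/stop capture with the MediaRecorder API.
// Hotkey wiring and transcription live elsewhere.
async function startCommentaryCapture() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  const done = new Promise<Blob>((resolve) => {
    recorder.onstop = () => {
      stream.getTracks().forEach((t) => t.stop()); // release the mic
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
  });

  recorder.start();
  // The second keypress calls stop(); `done` resolves with the clip.
  return { stop: () => recorder.stop(), done };
}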

Phase 3: Enhanced Features

Waveform visualization, transcript toggles, and the richer multi-modal composition sketched in the workflow evolution above.

Use Cases

1. AI-Generated Document Commentary

Scenario: You generate a comprehensive 5,000-word document with AI about your Timeline component architecture. You add audio at key sections to guide readers through your thinking.

.tsx
<Heading level={2} audio="/audio/aha-moment.mp3">
  The Key Innovation
</Heading>

<Paragraph>
  [AI-generated explanation of the Timeline component...]
</Paragraph>

<Paragraph>
  Audio provides: "This is where it clicked for me. I was stuck thinking 
  about this as a layout problem, but it's actually a data structure problem. 
  That reframe changed everything."
</Paragraph>

2. Tutorial Walkthroughs

Scenario: Technical tutorial with code examples. Audio on CodeBlocks explains "why" decisions were made, not just "what" the code does.

3. Roadmap Context

Scenario: Feature roadmap document. Audio on headings adds personal context about priorities, trade-offs, and decision-making process.

4. Content Curation

Scenario: AI synthesizes research from multiple sources. Audio commentary adds "this source is particularly valuable because..." or "notice how these three ideas connect..."
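
The spec only commits to CodeBlock and Heading, but the pattern generalizes. As a hypothetical example (not part of this spec), a Blockquote primitive could carry curation commentary the same way:

// Hypothetical: the same AudioProps pattern on a Blockquote primitive.
interface BlockquoteProps extends AudioProps {
  cite?: string;
  children: React.ReactNode;
}

export function Blockquote({ audio, audioDuration, cite, children }: BlockquoteProps) {
  return (
    <div>
      <blockquote cite={cite}>{children}</blockquote>
      {/* "This source is particularly valuable because..." */}
      {audio && <AudioPlayer src={audio} duration={audioDuration} />}
    </div>
  );
}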

Technical Considerations

Audio Format & Compression

The props accept standard web formats (mp3/wav, per AudioProps). Spoken-word commentary compresses well, so short clips stay small.

Accessibility

audioTranscript exists for exactly this: the transcript doubles as a readable fallback for anyone who can't, or would rather not, listen.

Performance

Mobile Experience

Content Workflow

Step 1: Generate Base Content

Use AI to create a comprehensive document on the topic. Let it be thorough - that's what it's good at.

Step 2: Identify Commentary Points

Read through and mark the spots where you want to add your reasoning: aha moments, trade-offs, places where the "why" isn't in the text.

Step 3: Record Audio Commentary

Record a short, conversational clip for each marked spot. The examples above run from about 45 seconds to a couple of minutes.

Step 4: Transcribe & Add Audio Prop

Transcribe each clip for the audioTranscript prop, then attach the file to the relevant component via the audio prop.
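
This step is scriptable. A sketch assuming the openai Node SDK and its Whisper transcription endpoint; any transcription service would do:

import fs from "node:fs";
import OpenAI from "openai";

// Sketch: turn a recorded clip into audioTranscript text.
// Assumes the openai Node SDK; swap in any transcription service.
const client = new OpenAI();

async function transcribe(path: string): Promise<string> {
  const result = await client.audio.transcriptions.create({
    file: fs.createReadStream(path),
    model: "whisper-1",
  });
  return result.text; // paste into the audioTranscript prop
}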

Step 5: Review & Publish

Listen to each audio clip in context. Does it flow? Does it add value? Adjust placement or re-record if needed.

Success Metrics

How do we know this feature is working?

Open Questions

Ideal Future State

Next Steps

  1. Build basic AudioPlayer component (MVP - just play/pause + progress bar)
  2. Add audio props to CodeBlock and Heading primitives
  3. Test in one document (maybe this spec or a technical article)
  4. Record 2-3 test audio clips and add them to components
  5. Get feedback (does this feel natural? does it add value?)
  6. Iterate on design and UX based on real usage
  7. Add enhanced features (waveform, transcript toggle, etc.)
  8. Document the workflow for future content creation
  9. Write blog post about the pattern and open source it