// SKILL

AUDIO NARRATION
30 VOICES, 80+ LANGUAGES

Every blog post has a potential audience that would rather listen than read. Audio narration makes your content available to commuters, people who rely on screen readers, and other accessibility tools, with no recording session required. Claude Blog wraps Google Gemini TTS to generate broadcast-quality narration directly from your published markdown: 30 voices across 80+ languages, multi-speaker dialogue support, and 24kHz mono WAV output. The Google AI Studio API key is free, and so is generation at a normal blog cadence.

BY DANIEL AGRICI · UPDATED MAY 2026

$ /blog audio generate

REQUIRES CLAUDE BLOG INSTALLED IN CLAUDE CODE

// HOW IT WORKS

PODCAST-QUALITY AUDIO FROM MARKDOWN.

Audio narration is the simplest accessibility upgrade your blog can ship. Screen readers handle the basic case, but a natural-sounding narrator opens your content to a completely different audience: people who would rather listen during a commute, a workout, or while cooking dinner. Claude Blog wraps Google Gemini TTS so you can generate that narration directly from the same markdown that powers your post.

Pick a voice, run /blog audio generate, get a 24kHz mono WAV back. No re-recording, no DAW, no studio booking. The generator handles multi-speaker dialogue automatically when your post contains speaker tags, which makes interview-format and debate-format posts work without extra effort.
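To make the speaker-tag idea concrete, here is a minimal sketch of how tagged dialogue could be split into turns and mapped to voices. It is not Claude Blog's actual parser: the tag format (a capitalised name followed by a colon, such as `Alice:`), the function name `split_dialogue`, and the round-robin voice assignment are all assumptions made for illustration.

```python
import re
from itertools import cycle

# Assumed tag format: a capitalised name and a colon at the start of a line.
SPEAKER_TAG = re.compile(r"^([A-Z][A-Za-z]*):\s*(.+)$")


def split_dialogue(text, voices=("Kore", "Charon", "Puck")):
    """Return (speaker, voice, line) tuples for every tagged line.

    Hypothetical helper: each new speaker takes the next voice in `voices`.
    With more speakers than voices, voices are reused.
    """
    pool = cycle(voices)
    assigned = {}
    turns = []
    for raw in text.splitlines():
        match = SPEAKER_TAG.match(raw.strip())
        if not match:
            continue  # untagged lines are skipped in this sketch
        speaker, line = match.groups()
        if speaker not in assigned:
            assigned[speaker] = next(pool)
        turns.append((speaker, assigned[speaker], line))
    return turns


sample = "Alice: Welcome to the show.\nBob: Thanks for having me.\nAlice: Let's begin."
for turn in split_dialogue(sample):
    print(turn)
```

A pattern this loose would also treat a line such as `Note: ...` as a speaker, so a real implementation would likely need a stricter tag convention or an explicit speaker list.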

01
30 GEMINI VOICES
Charon, Kore, Puck, Aoede, Fenrir, and 25 more. Each voice has a distinct timbre. Preview voices with /blog audio voices before committing.
02
80+ LANGUAGES
Generate narration in any major language. Multilingual posts can output one audio file per language with native pronunciation.
03
MULTI-SPEAKER DIALOGUE
Interview-style content supports multiple voices per file. Annotate dialogue with speaker tags; each speaker gets a distinct voice automatically.
04
24KHZ MONO WAV
Broadcast-quality output. 24kHz mono WAV format, suitable for podcast distribution, in-page audio players, or accessibility playback.
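For readers curious what "24kHz mono WAV" means at the file level, the sketch below wraps raw PCM samples in a WAV container using Python's standard `wave` module. The 16-bit sample width and the helper names `pcm_to_wav` and `duration_seconds` are assumptions for illustration; the page only states the sample rate and channel count.

```python
import io
import wave

SAMPLE_RATE = 24_000  # 24kHz, as stated above
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample; 16-bit PCM is an assumption


def pcm_to_wav(pcm: bytes, target) -> None:
    """Wrap raw PCM bytes in a WAV container (target: path or file object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)


def duration_seconds(pcm: bytes) -> float:
    """Playback length implied by the byte count and the format constants."""
    return len(pcm) / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)


# One second of silence stands in for real narration audio.
silence = b"\x00\x00" * SAMPLE_RATE
buffer = io.BytesIO()
pcm_to_wav(silence, buffer)
print(duration_seconds(silence))
```

Under these assumptions, one minute of narration occupies roughly 2.9 MB uncompressed (24,000 samples × 2 bytes × 60 seconds).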

// USAGE

HOW TO RUN AUDIO

Commands

Command                       What it does
/blog audio generate <file>   Generates a narration audio file from a blog markdown post
/blog audio voices            Lists all 30 available voices with sample audio previews
/blog audio setup             Configures the Google AI Studio API key for Gemini TTS

Voice selection

Voices map to tones. Charon is authoritative and works well for analytical or technical posts. Kore is warm and suits storytelling, lifestyle, or interview content. Puck is energetic and works for upbeat marketing pieces, product launches, and high-energy explainers. Aoede is breezy and reads well for casual narratives. Fenrir is deeper and grounded, useful for serious analysis. Each voice carries the same multilingual range, so pick the timbre that matches your post, not the language.
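If you pick voices in a script rather than by hand, the tone-to-voice pairing above can be kept as a small lookup table. This is only an illustrative sketch: the dictionary keys, the `pick_voice` helper, and the fallback to Kore are assumptions, not part of Claude Blog.

```python
# Tone labels taken from the descriptions above; the keys themselves are hypothetical.
VOICE_BY_TONE = {
    "authoritative": "Charon",
    "warm": "Kore",
    "energetic": "Puck",
    "breezy": "Aoede",
    "grounded": "Fenrir",
}


def pick_voice(tone: str, default: str = "Kore") -> str:
    """Return the voice for a tone label, falling back to an assumed default."""
    return VOICE_BY_TONE.get(tone.strip().lower(), default)


print(pick_voice("Authoritative"))
```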

Costs and limits

Gemini TTS is free on the Google AI Studio free tier for a normal blog cadence. Rate limits apply (requests per minute, characters per day), which is fine for a typical publishing rhythm of 1 to 3 posts per week. For podcast-scale workflows or full-archive backfills, the paid Gemini API tier is billed per million characters of input text, currently around $10 per million. Output audio itself is not metered; only the input text is counted.
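As a rough way to budget an archive backfill, the arithmetic can be sketched as below. The default rate is the approximate paid-tier figure quoted in the FAQ on this page and may change; the post count and average length are hypothetical numbers chosen for the example.

```python
def estimate_cost_usd(total_chars: int, usd_per_million_chars: float = 10.0) -> float:
    """Estimate paid-tier cost from input characters (the rate is an assumption)."""
    return total_chars / 1_000_000 * usd_per_million_chars


posts = 200          # hypothetical archive size
avg_chars = 9_000    # hypothetical average post length in characters
total = posts * avg_chars

print(f"{total:,} characters -> ${estimate_cost_usd(total):.2f}")
```

With these example numbers, 1.8 million characters works out to about $18. Check current Gemini API pricing before relying on the estimate.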

// FAQ

QUESTIONS ABOUT AUDIO

Gemini TTS is free for normal blog usage. The Google AI Studio free tier covers tens of thousands of characters per day, sufficient for a blog cadence of 1-3 posts per week. The paid Gemini API tier lifts the limit for podcast-scale workflows, with pricing currently around $10 per million characters.
The 30 available voices include Charon, Kore, Puck, Aoede, Fenrir, Sulafat, Algieba, Despina, Erinome, Schedar, Umbriel, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, Iapetus, Gacrux, Pulcherrima, Enceladus, and more. Run /blog audio voices to preview each one.
Multi-speaker dialogue is supported. Annotate dialogue in your blog markdown with speaker tags like 'Alice:' and 'Bob:'. The audio generator detects speaker shifts and assigns distinct voices automatically. This is useful for interview content, debate-format posts, and dialogue-driven explainers.
More than 80 languages are supported, including English, Spanish, French, German, Japanese, Korean, Mandarin, Portuguese, Italian, Dutch, Russian, Arabic, Hindi, Indonesian, Vietnamese, Thai, Turkish, Polish, Swedish, Danish, and many more. The language is detected automatically from the source text, so native pronunciation needs no extra configuration.
Output is 24kHz mono WAV by default. It is broadcast-quality and suitable for podcast distribution, in-page HTML audio players, or accessibility playback tools. Convert to MP3 or AAC for distribution if needed.
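One common way to do the MP3 conversion is with ffmpeg, which is a separate tool and not part of Claude Blog. The sketch below builds the command in Python and only runs it if ffmpeg and the input file are both present. The `mp3_command` helper, the file name `narration.wav`, and the 128k bitrate are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def mp3_command(wav_path: str, bitrate: str = "128k") -> list[str]:
    """Build an ffmpeg command that encodes a WAV file to MP3 next to it."""
    mp3_path = str(Path(wav_path).with_suffix(".mp3"))
    return [
        "ffmpeg", "-y",           # overwrite the output if it already exists
        "-i", wav_path,           # input WAV
        "-codec:a", "libmp3lame", # MP3 encoder
        "-b:a", bitrate,          # audio bitrate
        mp3_path,
    ]


cmd = mp3_command("narration.wav")
print(" ".join(cmd))

# Run only when ffmpeg is installed and the (hypothetical) input file exists.
if shutil.which("ffmpeg") and Path("narration.wav").exists():
    subprocess.run(cmd, check=True)
```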

ADD AUDIO TO YOUR BLOG
IN 30 SECONDS.

$ git clone --depth 1 https://github.com/AgriciDaniel/claude-blog.git && bash claude-blog/install.sh