Talk API

Text-to-speech synthesis with multiple voices and formats

The Talk API converts text into natural-sounding speech audio using state-of-the-art TTS models. It supports 17 voices across American and British English accents, 5 audio formats, and custom voice cloning via ElevenLabs.

Base URL

https://api.do.dev/v1/talk

Endpoints

| Method | Path | Description | Scope |
|--------|------|-------------|-------|
| POST | /v1/talk/speech | Generate speech audio from text | talk:speech |
| GET | /v1/talk/voices | List available voices | talk:voices |
| GET | /v1/talk/formats | List supported audio formats | talk:formats |
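
The endpoint list can be mirrored in client code as a small lookup, which makes it easy to check which scope a key needs before calling an endpoint. The Python sketch below is illustrative only: `ENDPOINTS`, `required_scope`, and `endpoint_url` are hypothetical names, not part of any official SDK, and how scopes are attached to a key is not covered here.

```python
# Hypothetical client-side mirror of the documented endpoints.
# ENDPOINTS, required_scope, and endpoint_url are illustrative names only.
BASE_HOST = "https://api.do.dev"

ENDPOINTS = {
    ("POST", "/v1/talk/speech"): "talk:speech",
    ("GET", "/v1/talk/voices"): "talk:voices",
    ("GET", "/v1/talk/formats"): "talk:formats",
}


def required_scope(method: str, path: str) -> str:
    """Return the scope listed for an endpoint, or raise KeyError if unknown."""
    return ENDPOINTS[(method.upper(), path)]


def endpoint_url(path: str) -> str:
    """Join the API host with a documented path."""
    return BASE_HOST + path
```

For example, `required_scope("GET", "/v1/talk/voices")` looks up the scope listed for the voices endpoint.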

Authentication

All endpoints require a do_live_* API key from do.dev. Pass it as a Bearer token in the Authorization header:

curl -H "Authorization: Bearer do_live_your_key_here" \
  https://api.do.dev/v1/talk/voices

See Authentication for details.
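
To keep the key out of source code, it can be read from the environment and turned into the Authorization header. A minimal Python sketch, assuming the key is stored in an environment variable named `DO_API_KEY` (a name chosen for this example, not one the API defines):

```python
import os


def auth_headers(env_var: str = "DO_API_KEY") -> dict:
    """Build the Authorization header from a key stored in the environment.

    DO_API_KEY is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var, "")
    # The docs describe keys as following the do_live_* pattern.
    if not api_key.startswith("do_live_"):
        raise ValueError(f"{env_var} must hold a do_live_* API key")
    return {"Authorization": f"Bearer {api_key}"}
```

The prefix check is only a local sanity check to catch an unset or mistyped variable early; the server still decides whether the key is valid.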

Quick Example

curl -X POST https://api.do.dev/v1/talk/speech \
  -H "Authorization: Bearer do_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world!", "voice": "aria", "format": "mp3"}' \
  --output hello.mp3
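
The same request can be made from Python using only the standard library. This is a sketch under a few assumptions: the function names are hypothetical, the response body is assumed to be the raw audio bytes (as the `--output` flag in the curl example suggests), and error handling is left to the exceptions `urllib` raises by default.

```python
import json
import urllib.request

SPEECH_URL = "https://api.do.dev/v1/talk/speech"


def build_speech_request(api_key, text, voice="aria", fmt="mp3"):
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"text": text, "voice": voice, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, out_path, voice="aria", fmt="mp3"):
    """Send the request and save the response body, assumed to be audio bytes."""
    request = build_speech_request(api_key, text, voice, fmt)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Calling `synthesize_to_file("do_live_your_key_here", "Hello world!", "hello.mp3")` would mirror the curl command. Building the request separately from sending it makes the payload easy to inspect without a network call.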

Webhooks

The Talk API generates events when speech is synthesized:

| Event Type | Trigger |
|------------|---------|
| talk.speech.generated | Text-to-speech audio generated |

See Talk Webhooks for payload details, or Webhooks & Events for the full system overview.
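
A receiver typically routes incoming events by type. The Python sketch below shows that routing; the payload shape is an assumption (a JSON object with a top-level `type` field), as are the handler names, so check Talk Webhooks for the actual fields before relying on it.

```python
import json


def on_speech_generated(event: dict) -> str:
    """Hypothetical handler; a real one might store or forward the audio."""
    return "handled talk.speech.generated"


# Map documented event types to handlers.
HANDLERS = {"talk.speech.generated": on_speech_generated}


def dispatch(raw_body: bytes) -> str:
    """Route a webhook body to its handler.

    Assumes the payload is JSON with a top-level "type" field; this shape
    is an assumption and is not confirmed by the Talk API docs.
    """
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        # Unknown event types are skipped rather than treated as errors.
        return "ignored"
    return handler(event)
```

Ignoring unknown types keeps the receiver working if new event types are added later.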
