Talk API

Text-to-speech synthesis with multiple voices and formats

The Talk API converts text into natural-sounding speech audio using state-of-the-art TTS models. It supports 17 voices across American and British English accents, 5 audio formats, and custom voice cloning via ElevenLabs.

Base URL

https://api.do.dev/v1/talk

Endpoints

Method	Path (relative to https://api.do.dev)	Description	Required scope
POST	`/v1/talk/speech`	Generate speech audio from text	`talk:speech`
GET	`/v1/talk/voices`	List available voices	`talk:voices`
GET	`/v1/talk/formats`	List supported audio formats	`talk:formats`

Authentication

All endpoints require a do_live_* API key from do.dev, and the key needs the scope listed for the endpoint it calls. Pass the key as a Bearer token in the Authorization header:

curl -H "Authorization: Bearer do_live_your_key_here" \
  https://api.do.dev/v1/talk/voices

See Authentication for details.
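
To keep the key out of shell history and scripts, the header can be built from an environment variable. The sketch below is one way to do that, not part of the Talk API itself: the variable name DO_API_KEY and the helper name talk_auth_header are assumptions made for illustration. The only fact taken from this page is the do_live_ key prefix.

```shell
# Hypothetical helper: prints the Authorization header for a Talk API request.
# DO_API_KEY is an assumed environment variable name, not defined by the API.
talk_auth_header() {
  key="${DO_API_KEY:-}"
  case "$key" in
    # Accept only keys that start with the documented do_live_ prefix.
    do_live_?*) printf 'Authorization: Bearer %s\n' "$key" ;;
    *)
      echo "DO_API_KEY must be set to a do_live_* key" >&2
      return 1
      ;;
  esac
}
```

With the function defined, the voices request becomes `curl -H "$(talk_auth_header)" https://api.do.dev/v1/talk/voices`. Checking the prefix locally catches a missing or mistyped key before any request is sent.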

Quick Example

curl -X POST https://api.do.dev/v1/talk/speech \
  -H "Authorization: Bearer do_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world!", "voice": "aria", "format": "mp3"}' \
  --output hello.mp3
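
Because the command above writes the response body straight to hello.mp3, a failed request could leave an error message saved under an audio file name. The following sketch wraps the same request so the file is kept only on success. It assumes the API reports errors with HTTP 4xx/5xx status codes, which this page does not state. The helper name talk_speech, the .part suffix, and the DO_API_KEY variable are hypothetical.

```shell
# Hypothetical wrapper around the speech request shown above.
# Usage: talk_speech '<json body>' <output file>
# Assumes DO_API_KEY holds a do_live_* key (assumed variable name).
talk_speech() {
  body="$1"        # JSON request body, e.g. '{"text": "Hello world!", "voice": "aria", "format": "mp3"}'
  out="$2"         # final destination for the audio
  tmp="$out.part"  # download here first so a failure never clobbers $out

  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses
  # instead of writing the error body to the output file.
  if curl --fail --silent --show-error -X POST https://api.do.dev/v1/talk/speech \
      -H "Authorization: Bearer $DO_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$body" \
      --output "$tmp"; then
    mv "$tmp" "$out"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

A call such as `talk_speech '{"text": "Hello world!", "voice": "aria", "format": "mp3"}' hello.mp3` then either produces hello.mp3 or returns a non-zero status and leaves no partial file behind.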

Webhooks

The Talk API emits a webhook event each time speech is synthesized:

Event Type	Trigger
`talk.speech.generated`	Text-to-speech audio generated

See Talk Webhooks for payload details, or Webhooks & Events for the full system overview.

What's Next?

Quick Start — Generate your first audio in minutes
Speech API — Full endpoint reference
Voices — Explore all 17 voices
Formats — Choose the right audio format
Rate Limits — Understand usage limits
Webhooks — Receive events when speech is generated