Text-to-speech synthesis with multiple voices and formats
The Talk API converts text into natural-sounding speech audio using state-of-the-art TTS models. It supports 17 voices across American and British English accents, 5 audio formats, and custom voice cloning via ElevenLabs.
Base URL: https://api.do.dev/v1/talk

| Method | Path | Description | Scope |
|---|---|---|---|
| POST | /v1/talk/speech | Generate speech audio from text | talk:speech |
| GET | /v1/talk/voices | List available voices | talk:voices |
| GET | /v1/talk/formats | List supported audio formats | talk:formats |
All endpoints require a do_live_* API key from do.dev. Pass it as a Bearer token:
curl -H "Authorization: Bearer do_live_your_key_here" \
https://api.do.dev/v1/talk/voices

See Authentication for details.
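For callers not using curl, the Bearer scheme can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: `build_request`, `speech_request`, and `send` are hypothetical helper names, and the JSON fields (`text`, `voice`, `format`) are taken from the curl speech example on this page. Response schemas are not described here, so `send` returns raw bytes.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.do.dev/v1/talk"  # base URL given on this page


def build_request(api_key: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request: GET without a payload, POST JSON with one."""
    # The API key is passed as a Bearer token, as in the curl examples.
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers=headers,
        method="POST" if payload is not None else "GET",
    )


def speech_request(api_key: str, text: str, voice: str,
                   audio_format: str) -> urllib.request.Request:
    """Mirror the curl speech call: same path and the same three JSON fields."""
    return build_request(
        api_key, "/speech",
        {"text": text, "voice": voice, "format": audio_format},
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw body (JSON bytes or audio bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a real key, `send(build_request(key, "/voices"))` would fetch the voice list, and writing the bytes from `send(speech_request(key, "Hello world!", "aria", "mp3"))` to a file corresponds to curl's `--output` flag.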
To generate speech and save the audio to a file:
curl -X POST https://api.do.dev/v1/talk/speech \
-H "Authorization: Bearer do_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{"text": "Hello world!", "voice": "aria", "format": "mp3"}' \
--output hello.mp3

The Talk API generates events when speech is synthesized:

| Event Type | Trigger |
|---|---|
| talk.speech.generated | Text-to-speech audio generated |
See Talk Webhooks for payload details, or Webhooks & Events for the full system overview.
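A receiver for these events might route deliveries by event type. The sketch below is illustrative only: the payload format is documented in Talk Webhooks, not here, so the top-level `type` field is an assumption, and `dispatch` is a hypothetical helper name. Only the event name `talk.speech.generated` comes from the table above.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]

# Event type taken from the events table on this page.
SPEECH_GENERATED = "talk.speech.generated"


def dispatch(raw_body: bytes, handlers: Dict[str, Handler]) -> bool:
    """Route one webhook delivery; return True if a handler ran.

    Assumption: the body is a JSON object with a top-level "type" field.
    Check the real payload shape in the Talk Webhooks documentation.
    """
    event = json.loads(raw_body)
    if not isinstance(event, dict):
        return False  # not the object shape this sketch assumes
    handler = handlers.get(event.get("type"))
    if handler is None:
        return False  # unknown event types are ignored
    handler(event)
    return True
```

Returning a boolean rather than raising on unknown types keeps the receiver tolerant of event types added later.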