Generate Speech

Convert text to natural-sounding speech audio

The speech endpoint converts text into spoken audio using built-in or custom AI voices and returns the audio file in the response body.

Endpoint

POST /v1/talk/speech

Authentication

Requires a do_live_* API key with the talk:speech scope, sent in either of these headers:

  • Authorization: Bearer <key> (recommended)
  • X-API-Key: <key>

See Authentication for details.
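As a minimal sketch, the two header options can be produced by one small helper when you build requests yourself. The name auth_headers is hypothetical and not part of any do.dev SDK.

```python
def auth_headers(api_key, use_bearer=True):
    """Return headers for a JSON request to the Talk API.

    Hypothetical helper: use_bearer=True sends the recommended
    Authorization: Bearer <key> header; False sends X-API-Key instead.
    """
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["X-API-Key"] = api_key
    return headers
```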

Request Body

Parameter | Type | Required | Default | Description
text | string | Yes | - | The text to convert to speech (max 10,000 characters)
voice | string | No | asteria | Voice ID to use (see Voice Options)
format | string | No | mp3 | Output format: mp3, wav, flac, aac, opus
speed | number | No | 1.0 | Speaking speed, from 0.5 to 2.0 (1.0 is normal speed)
sampleRate | number | No | varies | Sample rate in Hz: 8000, 16000, 24000, or 48000
customVoiceId | string | No | - | ElevenLabs voice ID for custom or cloned voices
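The limits in this table can be checked before a request is sent, saving a round trip for obviously invalid input. A minimal sketch with no dependencies; build_speech_body is a hypothetical helper, and the server's own validation remains the source of truth (for example, it may count characters differently from Python's len()).

```python
# Limits copied from the Request Body table; the server stays authoritative.
MAX_TEXT_LENGTH = 10_000
FORMATS = {"mp3", "wav", "flac", "aac", "opus"}
SAMPLE_RATES = {8000, 16000, 24000, 48000}


def build_speech_body(text, voice=None, audio_format=None, speed=None,
                      sample_rate=None, custom_voice_id=None):
    """Return a JSON-ready body for POST /v1/talk/speech (hypothetical helper).

    Unset optional fields are omitted so the server applies its defaults.
    """
    if not isinstance(text, str):
        raise ValueError("'text' is required and must be a string")
    # len() counts Unicode code points; the server may count differently.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError("'text' exceeds maximum length of 10,000 characters")
    if audio_format is not None and audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")
    if speed is not None and not (isinstance(speed, (int, float)) and 0.5 <= speed <= 2.0):
        raise ValueError("'speed' must be a number between 0.5 and 2.0")
    if sample_rate is not None and sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sampleRate: {sample_rate}")

    # Map Python-style argument names to the API's JSON field names.
    optional = {
        "voice": voice,
        "format": audio_format,
        "speed": speed,
        "sampleRate": sample_rate,
        "customVoiceId": custom_voice_id,
    }
    body = {"text": text}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

The returned dict can be passed directly as the json= argument of an HTTP client.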

Request Example

curl -X POST "https://api.do.dev/v1/talk/speech" \
  -H "Authorization: Bearer do_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to the Talk API! This is a demonstration of text-to-speech.",
    "voice": "aria",
    "format": "mp3",
    "speed": 1.0
  }' \
  --output output.mp3

Response

Success (200 OK)

Returns the audio file directly as binary data.

Response Headers:

Header | Description
Content-Type | Audio MIME type (e.g., audio/mpeg for MP3)
X-Characters-Used | Number of characters processed
X-Audio-Duration | Audio duration in seconds (Deepgram voices only)
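A minimal sketch for reading these headers, for example to track character usage. parse_usage_headers is a hypothetical helper; it accepts any mapping with .get(), such as the case-insensitive response.headers object in requests. Because X-Audio-Duration is only sent for Deepgram voices, the duration may be None.

```python
def parse_usage_headers(headers):
    """Extract usage metadata from a speech response's headers (hypothetical helper)."""
    chars = headers.get("X-Characters-Used")
    duration = headers.get("X-Audio-Duration")  # absent for non-Deepgram voices
    return {
        "content_type": headers.get("Content-Type"),
        "characters_used": int(chars) if chars is not None else None,
        "audio_duration_s": float(duration) if duration is not None else None,
    }
```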

Error Responses

Bad Request (400)

Missing or invalid text:

{
  "error": "'text' is required and must be a string"
}

Text too long:

{
  "error": "'text' exceeds maximum length of 10,000 characters"
}

Invalid voice:

{
  "error": "Invalid voice: unknown. Valid voices: thalia, helena, aria, ..."
}

Invalid speed:

{
  "error": "'speed' must be a number between 0.5 and 2.0"
}

Unauthorized (401)

{
  "error": "API key required. Use Authorization: Bearer <key> or X-API-Key header."
}

Too Many Requests (429)

{
  "error": "Rate limit exceeded"
}
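These errors fall into two groups: 400 and 401 responses fail again if resent unchanged, while 429 is transient. A minimal retry sketch, assuming exponential backoff is acceptable (the rate-limit window is not documented here). call_with_retry and SpeechAPIError are hypothetical names, and send is a hypothetical callable that performs one request and returns (status, body), where body is the audio bytes on 200 and the parsed JSON error object otherwise.

```python
import random
import time


class SpeechAPIError(Exception):
    """Raised for non-200 responses; carries the status and error message."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body), retrying only on 429 Too Many Requests."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts - 1:
            # Exponential backoff with jitter: 1-2 s, 2-4 s, 4-8 s, ...
            sleep(base_delay * 2 ** attempt * (1 + random.random()))
            continue
        # Any other status, or a 429 on the final attempt, is raised.
        raise SpeechAPIError(status, body.get("error", "unknown error"))
```

In practice, send could wrap requests.post(...) and return (response.status_code, response.content if response.ok else response.json()). The sleep parameter exists so tests can record delays instead of waiting.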

Voice Options

Voice ID | Display Name | Gender | Accent | Description
thalia | Zara | Female | American | Clear, Confident, Energetic
helena | Grace | Female | American | Caring, Natural, Friendly
aria | Claire | Female | American | Warm, Professional, Expressive
cora | Sage | Female | American | Smooth, Calm, Soothing
emma | Victoria | Female | British | Elegant, Refined, Clear
evelyn | Maya | Female | American | Warm, Empathetic, Approachable
apollo | Max | Male | American | Confident, Casual, Comfortable
orion | Drake | Male | American | Deep, Authoritative, Clear
theo | Finn | Male | American | Friendly, Natural, Warm
marcus | Blake | Male | American | Professional, Confident, Clear
james | Oliver | Male | British | Refined, Articulate, Warm

Legacy Aura Voices

Voice ID | Display Name | Gender | Accent | Description
asteria | Echo | Female | American | Classic, Clear
luna | Serena | Female | American | Soft, Gentle
stella | Nova | Female | American | Bright, Energetic
athena | Iris | Female | British | Sophisticated, Wise
zeus | Atlas | Male | American | Commanding, Strong
orpheus | Phoenix | Male | American | Smooth, Melodic
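The API expects the voice ID, while a user interface may show the display name (for example, Claire is the aria voice). A minimal sketch that accepts either and returns the ID; resolve_voice is a hypothetical helper, the catalog is copied from the two tables above, and the server's list (echoed in its "Invalid voice" error) is authoritative.

```python
# Voice IDs and display names copied from the two voice tables above.
VOICES = {
    "thalia": "Zara", "helena": "Grace", "aria": "Claire", "cora": "Sage",
    "emma": "Victoria", "evelyn": "Maya", "apollo": "Max", "orion": "Drake",
    "theo": "Finn", "marcus": "Blake", "james": "Oliver",
}
LEGACY_AURA_VOICES = {
    "asteria": "Echo", "luna": "Serena", "stella": "Nova",
    "athena": "Iris", "zeus": "Atlas", "orpheus": "Phoenix",
}


def resolve_voice(name):
    """Accept a voice ID or display name (case-insensitive) and return the voice ID."""
    catalog = {**VOICES, **LEGACY_AURA_VOICES}
    key = name.strip().lower()
    if key in catalog:
        return key
    for voice_id, display_name in catalog.items():
        if display_name.lower() == key:
            return voice_id
    raise ValueError(f"Invalid voice: {name}")
```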

Custom Voices (ElevenLabs)

For custom or cloned voices, pass the ElevenLabs voice ID in customVoiceId:

curl -X POST "https://api.do.dev/v1/talk/speech" \
  -H "Authorization: Bearer do_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from my custom voice!",
    "customVoiceId": "your_elevenlabs_voice_id"
  }' \
  --output custom.mp3

Output Formats

Format | Content-Type | Description
mp3 | audio/mpeg | Most compatible, good compression
wav | audio/wav | Uncompressed, highest quality
flac | audio/flac | Lossless compression
aac | audio/aac | Good for Apple devices
opus | audio/opus | Web-optimized, efficient
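When saving responses to disk, the file extension should match the requested format, and checking the response Content-Type against this table guards against writing something other than audio under an audio file name. A minimal sketch; output_filename and check_content_type are hypothetical helpers.

```python
# Content types copied from the Output Formats table.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "wav": "audio/wav",
    "flac": "audio/flac",
    "aac": "audio/aac",
    "opus": "audio/opus",
}


def output_filename(stem, audio_format="mp3"):
    """Return a file name whose extension matches the requested format."""
    if audio_format not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {audio_format}")
    return f"{stem}.{audio_format}"


def check_content_type(audio_format, content_type):
    """Return True if a response Content-Type matches the requested format.

    Media-type parameters such as '; charset=utf-8' are ignored.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == CONTENT_TYPES[audio_format]
```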

Code Examples

JavaScript (Browser)

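// Assumes API_KEY is defined elsewhere. A key embedded in browser code is
// visible to anyone who loads the page.
async function generateSpeech(text, voice = "aria") {
  const response = await fetch("https://api.do.dev/v1/talk/speech", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ text, voice })
  });

  if (!response.ok) {
    // Error bodies are JSON objects of the form { "error": "..." }
    const error = await response.json().catch(() => ({}));
    throw new Error(error.error || `Request failed with status ${response.status}`);
  }

  // Play the audio in the browser and release the object URL when playback ends
  const blob = await response.blob();
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.addEventListener("ended", () => URL.revokeObjectURL(url), { once: true });
  await audio.play(); // rejects if the browser blocks autoplay before a user gesture

  return audio;
}

// Top-level await requires a module script (<script type="module">)
await generateSpeech("Hello, this is a test!", "aria");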

Python

import os

import requests

def generate_speech(text, voice="aria", audio_format="mp3"):
    response = requests.post(
        "https://api.do.dev/v1/talk/speech",
        headers={"Authorization": f"Bearer {os.environ['DO_API_KEY']}"},
        # json= serializes the body and sets Content-Type: application/json
        json={"text": text, "voice": voice, "format": audio_format},
        timeout=60,  # seconds; without a timeout a stalled request can hang forever
    )
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.content  # raw audio bytes

audio = generate_speech("Hello, this is a test!")
with open("output.mp3", "wb") as f:
    f.write(audio)

Node.js

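// Requires Node.js 18+ (global fetch) and an ES module (.mjs or "type": "module")
// so that top-level await works.
import { writeFile } from "node:fs/promises";

async function generateSpeech(text, voice = "aria", format = "mp3") {
  const response = await fetch("https://api.do.dev/v1/talk/speech", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.DO_API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ text, voice, format })
  });

  if (!response.ok) {
    // Error bodies are JSON objects of the form { "error": "..." }
    const error = await response.json().catch(() => ({}));
    throw new Error(error.error || `Request failed with status ${response.status}`);
  }

  // Convert the binary response body into a Node.js Buffer
  return Buffer.from(await response.arrayBuffer());
}

const audio = await generateSpeech("Hello, this is a test!");
await writeFile("output.mp3", audio);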

Tips

  • Split texts longer than 10,000 characters into smaller chunks and send one request per chunk
  • Cache generated audio for repeated text to reduce API calls
  • Use MP3 for web delivery and WAV for editing or maximum quality
  • Lower the speed for clarity or raise it for faster playback
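The first two tips can be sketched as follows, assuming sentence-ending punctuation (., !, ?) is an acceptable split point and that cached audio is keyed on the full request body, so changing the voice, format, or speed produces a new key. split_text and cache_key are hypothetical helpers; chunks rejoin sentences with single spaces, so original line breaks are not preserved.

```python
import hashlib
import json
import re

MAX_CHARS = 10_000  # per-request limit from the Request Body table


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters at sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def cache_key(body):
    """Stable cache key: identical request bodies hash identically regardless of key order."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Each chunk is then sent as its own request, and cache_key({"text": chunk, "voice": "aria", "format": "mp3"}) can serve as a file name or lookup key for the stored audio.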