Speech

Speech generation is an experimental feature.
The AI SDK provides the generateSpeech function to generate speech from text using a speech model.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
});
To access the generated audio:
const audioData = result.audio.uint8Array; // audio data as Uint8Array
// or
const audioBase64 = result.audio.base64; // audio data as base64 string
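
The base64 form is convenient when the audio has to travel as text, for example in a data URL. The following is a minimal sketch, not part of the SDK: `toDataUrl` and `AudioLike` are hypothetical names, the media type is an assumption supplied by the caller, and a stand-in object replaces `result.audio` so the snippet runs without calling a provider.

```typescript
// Minimal structural type for the part of `result.audio` used here.
interface AudioLike {
  base64: string;
}

// Hypothetical helper: builds a data URL from base64 audio.
// `mediaType` is an assumption; 'audio/mpeg' is the usual type for mp3 output.
function toDataUrl(audio: AudioLike, mediaType: string = 'audio/mpeg'): string {
  return `data:${mediaType};base64,${audio.base64}`;
}

// Stand-in for `result.audio` ('SGVsbG8=' is just sample base64 text).
const fakeAudio: AudioLike = { base64: 'SGVsbG8=' };
const dataUrl = toDataUrl(fakeAudio);
console.log(dataUrl); // data:audio/mpeg;base64,SGVsbG8=
```

In a browser, a URL built this way can be assigned to the `src` of an `<audio>` element. For larger files, writing `result.audio.uint8Array` to disk or a Blob is usually the better fit.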

Settings

Voice Selection

Different models support different voices. Refer to your provider’s documentation for available voices:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'nova', // OpenAI voices include: alloy, echo, fable, onyx, nova, shimmer
});

Output Format

You can specify the desired output format for the audio:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  outputFormat: 'mp3', // Options: mp3, wav, opus, aac, flac, etc.
});

Speech Speed

Some models support adjusting the speed of the generated speech:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  speed: 1.25, // Speed multiplier (OpenAI accepts 0.25 to 4.0)
});
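
When the speed comes from user input, it may help to clamp it before passing it on. This is a small sketch under stated assumptions: `clampSpeed` is a hypothetical helper, and the bounds are the 0.25 to 4.0 range from the example above, which may not hold for other providers.

```typescript
// Bounds taken from the example above; other providers may use a different range.
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

// Hypothetical helper: keeps a requested speed inside the supported range.
function clampSpeed(requested: number): number {
  if (Number.isNaN(requested)) {
    throw new RangeError('speed must be a number');
  }
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, requested));
}

console.log(clampSpeed(1.25)); // 1.25
console.log(clampSpeed(10)); // 4
console.log(clampSpeed(0)); // 0.25
```

The clamped value can then be passed as the `speed` option.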

Language Setting

You can specify the language for speech generation (provider support varies):
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { lmnt } from '@ai-sdk/lmnt';

const result = await generateSpeech({
  model: lmnt.speech('aurora'),
  text: 'Hola, mundo!',
  language: 'es', // Spanish (ISO 639-1 language code)
});

Instructions

Some models accept additional instructions that guide the speech generation, such as tone or pacing. With OpenAI, instructions are supported by gpt-4o-mini-tts but not by tts-1 or tts-1-hd:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('gpt-4o-mini-tts'),
  text: 'Hello, world!',
  voice: 'alloy',
  instructions: 'Speak in a slow and steady tone',
});

Provider-Specific Settings

You can set model-specific settings with the providerOptions parameter:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  providerOptions: {
    openai: {
      // provider-specific options
    },
  },
});

Retries

The generateSpeech function accepts an optional maxRetries parameter that you can use to set the maximum number of retries. It defaults to 2 retries (3 attempts in total). You can set it to 0 to disable retries.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  maxRetries: 0, // Disable retries
});
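
The relationship between retries and attempts can be illustrated with a short sketch. This is not the SDK's implementation: `withRetries` is a hypothetical, synchronous helper that ignores any delay between attempts and only shows why `maxRetries: 2` means up to three attempts.

```typescript
// Hypothetical helper illustrating the counting only; no backoff, no async.
function withRetries<T>(operation: () => T, maxRetries: number = 2): T {
  let lastError: unknown;
  // One initial attempt plus up to `maxRetries` retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return operation();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

let attempts = 0;
try {
  withRetries(() => {
    attempts++;
    throw new Error('simulated provider failure');
  });
} catch {
  console.log(attempts); // 3
}
```

With `maxRetries` set to 0, the loop body runs exactly once, which corresponds to disabling retries.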

Abort Signals and Timeouts

generateSpeech accepts an optional abortSignal parameter of type AbortSignal that you can use to abort the speech generation process or set a timeout.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  abortSignal: AbortSignal.timeout(10000), // Abort after 10 seconds
});
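
If a request should stop either on a timeout or on an explicit user action, one option is to drive a single signal from an AbortController. The sketch below uses only standard web APIs; `createCancelableSignal` and `CancelableSignal` are hypothetical names introduced for illustration.

```typescript
interface CancelableSignal {
  signal: AbortSignal;
  cancel: () => void;
}

// Hypothetical helper: one signal that aborts on timeout or on manual cancel.
function createCancelableSignal(timeoutMs: number): CancelableSignal {
  const controller = new AbortController();
  // Abort automatically once the timeout elapses.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return {
    signal: controller.signal,
    cancel: () => {
      clearTimeout(timer); // stop the pending timeout
      controller.abort(); // abort immediately
    },
  };
}

const { signal, cancel } = createCancelableSignal(10_000);
// Pass `signal` as the `abortSignal` option, then call `cancel()`,
// for example from a "Stop" button handler.
cancel();
console.log(signal.aborted); // true
```

Calling `cancel()` after the request has finished is harmless and also clears the pending timer.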

Custom Headers

generateSpeech accepts an optional headers parameter of type Record<string, string> that you can use to add custom headers to the speech generation request.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  headers: { 'X-Custom-Header': 'custom-value' },
});

Response Information

In addition to the audio, the object returned by generateSpeech includes any provider warnings, the raw provider responses, and provider-specific metadata:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
});

console.log(result.audio); // Generated audio file
console.log(result.warnings); // Any warnings from the provider
console.log(result.responses); // Raw provider responses
console.log(result.providerMetadata); // Provider-specific metadata
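
For logging, a compact summary is often more useful than dumping the whole object. The following is a sketch under assumptions: `summarize` and `SpeechResultLike` are hypothetical, the type covers only the fields used here, and warnings are typed as `unknown` because their exact shape depends on the SDK version.

```typescript
// Structural subset of the result described above.
interface SpeechResultLike {
  audio: { uint8Array: Uint8Array };
  warnings: unknown[];
}

// Hypothetical helper: one-line summary of audio size and warning count.
function summarize(result: SpeechResultLike): string {
  const bytes = result.audio.uint8Array.byteLength;
  const warningCount = result.warnings.length;
  return `${bytes} bytes of audio, ${warningCount} warning(s)`;
}

// Stand-in for a real result so the sketch runs without calling a provider.
const fakeResult: SpeechResultLike = {
  audio: { uint8Array: new Uint8Array([1, 2, 3, 4]) },
  warnings: [],
};
console.log(summarize(fakeResult)); // 4 bytes of audio, 0 warning(s)
```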

Speech Providers & Models

Several providers offer speech generation models:
| Provider | Model |
| --- | --- |
| OpenAI | tts-1 |
| OpenAI | tts-1-hd |
| ElevenLabs | Various |
| LMNT | aurora |