Speech

Speech generation is an experimental feature.

The AI SDK provides the generateSpeech function (currently exported as experimental_generateSpeech) to generate speech from text using a speech model.

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
});

To access the generated audio:

const audioData = result.audio.uint8Array; // audio data as Uint8Array
// or
const audioBase64 = result.audio.base64; // audio data as base64 string
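
Once you have the audio, you can write the bytes to disk or embed the base64 string in a data URL. The following is a minimal sketch, assuming a Node.js runtime. `saveAudio`, `toDataUrl`, and the `AudioLike` interface are hypothetical helpers (not part of the AI SDK), and the placeholder bytes stand in for `result.audio`:

```typescript
import { readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Assumed subset of the audio object shown above (not the SDK's full type).
interface AudioLike {
  uint8Array: Uint8Array;
  base64: string;
}

// Hypothetical helper: write the raw audio bytes to disk and return the path.
function saveAudio(audio: AudioLike, filePath: string): string {
  writeFileSync(filePath, audio.uint8Array);
  return filePath;
}

// Hypothetical helper: build a data URL, e.g. for the src of an <audio> element.
function toDataUrl(audio: AudioLike, mimeType = 'audio/mpeg'): string {
  return `data:${mimeType};base64,${audio.base64}`;
}

// Placeholder bytes standing in for result.audio from generateSpeech.
const placeholderBytes = new Uint8Array([1, 2, 3]);
const placeholderAudio: AudioLike = {
  uint8Array: placeholderBytes,
  base64: Buffer.from(placeholderBytes).toString('base64'),
};

const savedPath = saveAudio(placeholderAudio, join(tmpdir(), 'speech-example.mp3'));
const bytesOnDisk = readFileSync(savedPath).length;
const dataUrl = toDataUrl(placeholderAudio);
```

In real use, pass `result.audio` instead of the placeholder object, and choose a file extension and MIME type that match the output format you requested.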

Settings

Voice Selection

Different models support different voices. Refer to your provider’s documentation for available voices:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'nova', // OpenAI voices include alloy, echo, fable, onyx, nova, shimmer
});

Output Format

You can specify the desired output format for the audio:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  outputFormat: 'mp3', // Options: mp3, wav, opus, aac, flac, etc.
});
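
When you save or serve the audio, the file extension and MIME type should match the format you requested. A small lookup like the one below is one way to keep them in sync. This is a sketch: `FORMAT_INFO` and `describeFormat` are hypothetical names, and the table lists commonly used MIME types rather than anything defined by the AI SDK or a provider:

```typescript
// Commonly used MIME types and file extensions. The list is an assumption
// made for illustration and is not taken from the AI SDK or any provider.
const FORMAT_INFO: Record<string, { mimeType: string; extension: string }> = {
  mp3: { mimeType: 'audio/mpeg', extension: '.mp3' },
  wav: { mimeType: 'audio/wav', extension: '.wav' },
  aac: { mimeType: 'audio/aac', extension: '.aac' },
  flac: { mimeType: 'audio/flac', extension: '.flac' },
};

// Hypothetical helper: look up a format, falling back to a generic binary
// type for formats that are not in the table.
function describeFormat(outputFormat: string): { mimeType: string; extension: string } {
  const key = outputFormat.toLowerCase();
  return (
    FORMAT_INFO[key] ?? {
      mimeType: 'application/octet-stream',
      extension: `.${key}`,
    }
  );
}
```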

Speech Speed

Some models support adjusting the speed of the generated speech:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  speed: 1.25, // Speed multiplier (0.25 to 4.0)
});
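
If the speed comes from user input, you may want to validate it before passing it on. The sketch below clamps a value to the 0.25 to 4.0 range noted in the example above. `clampSpeed` is a hypothetical helper, and the bounds are assumed to apply to the model you use; other models may accept a different range:

```typescript
// Bounds taken from the comment in the example above; other models may differ.
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

// Hypothetical helper: reject non-finite input and clamp the rest into range.
function clampSpeed(speed: number): number {
  if (!Number.isFinite(speed)) {
    throw new RangeError(`speed must be a finite number, got ${speed}`);
  }
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, speed));
}
```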

Language Setting

You can specify the language for speech generation (provider support varies):

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { lmnt } from '@ai-sdk/lmnt';

const result = await generateSpeech({
  model: lmnt.speech('aurora'),
  text: 'Hola, mundo!',
  language: 'es', // Spanish (ISO 639-1 language code)
});

Instructions

Some models accept additional instructions to guide the speech generation:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('gpt-4o-mini-tts'), // instructions do not work with tts-1 or tts-1-hd
  text: 'Hello, world!',
  voice: 'alloy',
  instructions: 'Speak in a slow and steady tone',
});

Provider-Specific Settings

You can set model-specific settings with the providerOptions parameter:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  providerOptions: {
    openai: {
      // provider-specific options
    },
  },
});

Retries

The generateSpeech function accepts an optional maxRetries parameter that you can use to set the maximum number of retries. It defaults to 2 retries (3 attempts in total). You can set it to 0 to disable retries.

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  maxRetries: 0, // Disable retries
});
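
The attempt arithmetic (maxRetries retries means maxRetries + 1 attempts in total) can be illustrated with a simplified loop. This is only a sketch: `withRetries` is a hypothetical helper and does not reproduce the SDK's actual retry logic, which may differ in delays and in which errors it retries:

```typescript
// Hypothetical helper: run `fn` up to maxRetries + 1 times and return the
// first successful result. Rethrows the last error if every attempt fails.
async function withRetries<T>(
  fn: (attempt: number) => Promise<T>,
  maxRetries = 2,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error; // remember the failure and try again, if attempts remain
    }
  }
  throw lastError;
}
```

With the default of 2, a call that always fails is attempted three times; with 0, it is attempted once.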

Abort Signals and Timeouts

generateSpeech accepts an optional abortSignal parameter of type AbortSignal that you can use to abort the speech generation process or set a timeout.

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  abortSignal: AbortSignal.timeout(10000), // Abort after 10 seconds
});
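
If you need both a timeout and a way to cancel manually (for example, from a "Stop" button), you can drive a single AbortController from both. The following is a minimal sketch using only standard AbortController and timer APIs. `createCancellableTimeout` is a hypothetical helper, not part of the AI SDK:

```typescript
// Hypothetical helper: returns a signal that aborts after `ms` milliseconds
// or when `cancel()` is called, whichever happens first.
function createCancellableTimeout(ms: number) {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(new Error(`Timed out after ${ms} ms`)),
    ms,
  );
  return {
    signal: controller.signal,
    // Abort immediately and stop the pending timer.
    cancel: () => {
      clearTimeout(timer);
      controller.abort(new Error('Cancelled by caller'));
    },
    // Stop the timer without aborting, e.g. after the request has finished.
    clear: () => clearTimeout(timer),
  };
}
```

You would pass the returned `signal` as `abortSignal`, call `cancel()` to stop early, and call `clear()` once the request completes so the timer does not keep the process alive.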

Custom Headers

generateSpeech accepts an optional headers parameter of type Record<string, string> that you can use to add custom headers to the speech generation request.

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  headers: { 'X-Custom-Header': 'custom-value' },
});

Response Information

In addition to the generated audio, the generateSpeech result includes any provider warnings, the raw provider responses, and provider-specific metadata:

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
});

console.log(result.audio); // Generated audio file
console.log(result.warnings); // Any warnings from the provider
console.log(result.responses); // Raw provider responses
console.log(result.providerMetadata); // Provider-specific metadata

Speech Providers & Models

Several providers offer speech generation models:

Provider	Model
OpenAI	`tts-1`
OpenAI	`tts-1-hd`
ElevenLabs	Various
LMNT	`aurora`
