Speech
Speech generation is an experimental feature.
The AI SDK provides the generateSpeech function (currently exported as experimental_generateSpeech) to generate speech from text using a speech model.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
});
To access the generated audio:
const audioData = result.audio.uint8Array; // audio data as Uint8Array
// or
const audioBase64 = result.audio.base64; // audio data as base64 string
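As a rough illustration of how the base64 form might be used, the sketch below builds a data URL that a browser audio element could play. The helper name toAudioDataUrl and the 'audio/mpeg' media type are assumptions for this example and are not part of the AI SDK; the media type has to match the format the provider actually returned.

```typescript
// Hypothetical helper (not part of the AI SDK): wraps base64-encoded audio
// in a data URL so it can be assigned to an <audio> element's `src`.
function toAudioDataUrl(base64: string, mediaType: string): string {
  return `data:${mediaType};base64,${base64}`;
}

// Placeholder payload for illustration; in practice pass result.audio.base64.
const exampleUrl = toAudioDataUrl('AAAA', 'audio/mpeg');
console.log(exampleUrl);
```

For server-side use, the Uint8Array form is usually the better fit, since it can be written to disk or streamed without a decoding step.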
Settings
Voice Selection
Different models support different voices. Refer to your provider’s documentation for available voices:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'nova', // Options: alloy, echo, fable, onyx, nova, shimmer
});
Output Format
You can specify the desired output format for the audio:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  outputFormat: 'mp3', // Options: mp3, wav, opus, aac, flac, etc.
});
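When the audio is later served or saved, the chosen output format usually has to be translated into a media type. The following is a minimal sketch of such a lookup; the function name mediaTypeForFormat and the mapping itself are assumptions for illustration, and formats not listed (for example opus, whose container can vary) fall back to a generic binary type.

```typescript
// Assumed mapping from output format names to common media types.
// Not provided by the AI SDK; extend or correct it for your provider.
const MEDIA_TYPES = new Map<string, string>([
  ['mp3', 'audio/mpeg'],
  ['wav', 'audio/wav'],
  ['aac', 'audio/aac'],
  ['flac', 'audio/flac'],
]);

// Returns the media type for a format name, ignoring case.
// Unknown formats map to a generic binary media type.
function mediaTypeForFormat(format: string): string {
  return MEDIA_TYPES.get(format.toLowerCase()) ?? 'application/octet-stream';
}
```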
Speech Speed
Some models support adjusting the speed of the generated speech:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  speed: 1.25, // Speed multiplier (0.25 to 4.0)
});
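If the speed comes from user input, it may be worth clamping it before passing it on. The sketch below assumes the 0.25 to 4.0 range from the example above; the helper name clampSpeed is hypothetical, and other models may accept a different range.

```typescript
// Hypothetical helper: keeps a requested speed inside an allowed range.
// The default bounds mirror the range noted in the example above.
function clampSpeed(speed: number, min = 0.25, max = 4.0): number {
  if (!Number.isFinite(speed)) {
    throw new RangeError(`speed must be a finite number, got ${speed}`);
  }
  return Math.min(max, Math.max(min, speed));
}
```

The clamped value can then be passed as the speed setting.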
Language Setting
You can specify the language for speech generation (provider support varies):
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { lmnt } from '@ai-sdk/lmnt';
const result = await generateSpeech({
  model: lmnt.speech('aurora'),
  text: 'Hola, mundo!',
  language: 'es', // Spanish (ISO 639-1 language code)
});
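Since the language is given as an ISO 639-1 code, a quick shape check can catch obvious mistakes such as passing a three-letter code or a full language name. This is only a sketch: the helper name looksLikeIso6391 is hypothetical, and it checks the two-lowercase-letter shape rather than whether the code is actually assigned or supported by a given provider.

```typescript
// Hypothetical helper: true when the string has the shape of an
// ISO 639-1 code (exactly two lowercase ASCII letters).
function looksLikeIso6391(code: string): boolean {
  return /^[a-z]{2}$/.test(code);
}
```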
Instructions
Some models accept additional instructions to guide the speech generation:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  // OpenAI documents instructions as unsupported on tts-1 and tts-1-hd
  model: openai.speech('gpt-4o-mini-tts'),
  text: 'Hello, world!',
  voice: 'alloy',
  instructions: 'Speak in a slow and steady tone',
});
Provider-Specific Settings
You can set model-specific settings with the providerOptions parameter:
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  providerOptions: {
    openai: {
      // provider-specific options
    },
  },
});
Retries
The generateSpeech function accepts an optional maxRetries parameter
that you can use to set the maximum number of retries.
It defaults to 2 retries (3 attempts in total). You can set it to 0 to disable retries.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  maxRetries: 0, // Disable retries
});
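To make the relationship between retries and attempts concrete, here is a simplified sketch of a retry loop in which maxRetries of 2 yields up to 3 attempts. It is not the SDK's implementation: the helper name withRetries is hypothetical, and this version retries every error immediately, whereas a real client may add backoff or retry only certain failures.

```typescript
// Simplified illustration of retry semantics (not the AI SDK's code):
// the operation runs once, then up to `maxRetries` more times on failure.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 2,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError; // all attempts failed
}
```

With maxRetries set to 0, the loop body runs exactly once, which matches the "disable retries" behavior described above.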
Abort Signals and Timeouts
generateSpeech accepts an optional abortSignal parameter of type AbortSignal that you can use to abort the speech generation process or set a timeout.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  abortSignal: AbortSignal.timeout(10000), // Abort after 10 seconds
});
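A timeout and a manual cancel (for example, a "stop" button) can be combined in one signal. The sketch below shows one possible way to do that with a plain AbortController; the helper name createAbortHandle is hypothetical and not part of the AI SDK.

```typescript
// Hypothetical helper: returns a signal that aborts either when the
// timeout elapses or when cancel() is called, whichever comes first.
function createAbortHandle(timeoutMs: number) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return {
    signal: controller.signal,
    cancel(): void {
      clearTimeout(timer); // stop the pending timeout
      controller.abort();
    },
  };
}
```

The returned signal could be passed as the abortSignal setting, with cancel() wired to whatever user action should stop the request.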
Custom Headers
generateSpeech accepts an optional headers parameter of type Record<string, string> that you can use to add custom headers to the speech generation request.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
  headers: { 'X-Custom-Header': 'custom-value' },
});
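Response Information
The generateSpeech function returns a result object that contains the generated audio along with any warnings, the provider responses, and provider-specific metadata: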
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello, world!',
  voice: 'alloy',
});
console.log(result.audio); // Generated audio file
console.log(result.warnings); // Any warnings from the provider
console.log(result.responses); // Raw provider responses
console.log(result.providerMetadata); // Provider-specific metadata
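Warnings are easy to overlook when they are only logged as raw objects. As a small sketch, the helper below turns a warnings array into readable lines; the name describeWarnings is hypothetical, and because the exact shape of a warning is not specified here, each entry is treated as opaque and serialized as JSON.

```typescript
// Hypothetical helper: formats warnings as numbered, human-readable lines.
// Each warning is treated as an opaque value and serialized with JSON.
function describeWarnings(warnings: readonly unknown[] | undefined): string[] {
  return (warnings ?? []).map(
    (warning, index) => `warning ${index + 1}: ${JSON.stringify(warning)}`,
  );
}
```

The resulting lines can be logged or surfaced in a UI after each call, for instance by passing result.warnings.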
Speech Providers & Models
Several providers offer speech generation models:
| Provider | Model |
|---|---|
| OpenAI | tts-1 |
| OpenAI | tts-1-hd |
| ElevenLabs | Various |
| LMNT | aurora |