Streaming vs blocking
Blocking approach
With blocking generation, the entire response must be generated before anything is displayed.
Streaming approach
With streaming, text appears incrementally as it’s generated.
Streaming implementation
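To make the contrast between the blocking and streaming approaches concrete, here is a minimal sketch. It does not call the SDK: `fakeModelStream`, `displayBlocking`, and `displayStreaming` are hypothetical names, and the generator only simulates a model that produces chunks with a small delay.

```typescript
// Hypothetical stand-in for a model: yields the reply one chunk at a time.
async function* fakeModelStream(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) {
    // Simulated generation delay (assumption: 5 ms per chunk).
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    yield chunk;
  }
}

// Blocking: collect every chunk, then display once at the end.
async function displayBlocking(
  stream: AsyncIterable<string>,
  display: (text: string) => void,
): Promise<void> {
  let full = "";
  for await (const chunk of stream) {
    full += chunk;
  }
  display(full); // nothing is shown until generation has finished
}

// Streaming: display each chunk as soon as it arrives.
async function displayStreaming(
  stream: AsyncIterable<string>,
  display: (text: string) => void,
): Promise<void> {
  for await (const chunk of stream) {
    display(chunk);
  }
}
```

Both functions end up showing the same text; the difference is that the blocking version calls `display` once, while the streaming version calls it once per chunk.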
The AI SDK makes streaming straightforward.
Text streaming
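A text stream can be consumed as an async iterable of strings. The sketch below assumes that shape and uses a hypothetical `fakeTextStream` generator in place of a real model call, so it runs without an API key. `streamIntoView` is also an illustrative name, not an SDK function.

```typescript
// Hypothetical stand-in for a text stream: yields the reply in small pieces.
async function* fakeTextStream(): AsyncGenerator<string> {
  yield "Str";
  yield "eam";
  yield "ing";
}

// Re-render the accumulated text after every chunk, as a chat UI might.
async function streamIntoView(
  textStream: AsyncIterable<string>,
  render: (textSoFar: string) => void,
): Promise<string> {
  let textSoFar = "";
  for await (const chunk of textStream) {
    textSoFar += chunk;
    render(textSoFar); // called once per chunk with everything received so far
  }
  return textSoFar;
}
```

In a terminal, `render` could be replaced by a function that writes only the new chunk; in a UI framework it would typically update component state.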
Stream text as it’s generated.
Full stream
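A full stream carries typed parts rather than plain strings. The part shapes below (`text-delta` with `textDelta`, `finish` with `finishReason` and `usage`) are assumptions made for illustration; the exact field names depend on the SDK version, so check them against the version in use. `fakeFullStream` and `summarizeFullStream` are hypothetical helpers.

```typescript
// Assumed part shapes; real field names may differ between SDK versions.
type StreamPart =
  | { type: "text-delta"; textDelta: string }
  | { type: "finish"; finishReason: string; usage: { totalTokens: number } };

interface StreamSummary {
  text: string;
  finishReason: string | null;
  totalTokens: number | null;
}

// Hypothetical stand-in for a full stream.
async function* fakeFullStream(): AsyncGenerator<StreamPart> {
  yield { type: "text-delta", textDelta: "Hi" };
  yield { type: "text-delta", textDelta: "!" };
  yield { type: "finish", finishReason: "stop", usage: { totalTokens: 12 } };
}

// Separate generated text from metadata by switching on the part type.
async function summarizeFullStream(
  parts: AsyncIterable<StreamPart>,
): Promise<StreamSummary> {
  const summary: StreamSummary = { text: "", finishReason: null, totalTokens: null };
  for await (const part of parts) {
    switch (part.type) {
      case "text-delta":
        summary.text += part.textDelta;
        break;
      case "finish":
        summary.finishReason = part.finishReason;
        summary.totalTokens = part.usage.totalTokens;
        break;
    }
  }
  return summary;
}
```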
Access all stream parts, including metadata.
Stream to response
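The idea of writing chunks to an HTTP response as they arrive can be sketched without any framework. `WritableResponse` is a hypothetical interface covering only the `setHeader`, `write`, and `end` methods that Node's `http.ServerResponse` also provides; `pipeTextToResponse` is an illustrative helper, not an SDK function, and it ignores backpressure for brevity.

```typescript
// Minimal response shape (assumption): a subset of Node's http.ServerResponse.
interface WritableResponse {
  setHeader(name: string, value: string): void;
  write(chunk: string): void;
  end(): void;
}

// Forward each chunk to the response as soon as it is produced.
async function pipeTextToResponse(
  stream: AsyncIterable<string>,
  res: WritableResponse,
): Promise<void> {
  res.setHeader("Content-Type", "text/plain; charset=utf-8");
  try {
    for await (const chunk of stream) {
      res.write(chunk);
    }
  } finally {
    res.end(); // always close the response, even if the stream throws
  }
}
```

Inside a Node request handler, the `res` object passed by `http.createServer` could be handed to this function directly.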
In a web server, stream directly to an HTTP response.
Stream types
The streamText function returns multiple stream types:
textStream
Only the generated text.
fullStream
All stream parts with metadata.
toDataStream
A web-compatible stream.
Streaming with tools
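When tools are involved, tool calls and their results arrive as parts of the same stream as the text. The sketch below pairs each call with its result by ID. The part shapes (`tool-call`, `tool-result`, `toolCallId`, `toolName`, `args`, `result`) are assumptions for illustration and may not match a given SDK version; `fakeToolStream` and `collectToolInvocations` are hypothetical.

```typescript
// Assumed part shapes for a stream that includes tool activity.
type ToolStreamPart =
  | { type: "text-delta"; textDelta: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool-result"; toolCallId: string; toolName: string; result: unknown };

interface ToolInvocation {
  toolName: string;
  args: unknown;
  result?: unknown;
}

// Hypothetical stand-in: one tool call, its result, then the final text.
async function* fakeToolStream(): AsyncGenerator<ToolStreamPart> {
  yield { type: "tool-call", toolCallId: "call-1", toolName: "getWeather", args: { city: "Paris" } };
  yield { type: "tool-result", toolCallId: "call-1", toolName: "getWeather", result: { tempC: 18 } };
  yield { type: "text-delta", textDelta: "It is 18 degrees in Paris." };
}

// Collect text and match each tool result to its call via toolCallId.
async function collectToolInvocations(
  parts: AsyncIterable<ToolStreamPart>,
): Promise<{ text: string; invocations: ToolInvocation[] }> {
  const byId = new Map<string, ToolInvocation>();
  let text = "";
  for await (const part of parts) {
    switch (part.type) {
      case "text-delta":
        text += part.textDelta;
        break;
      case "tool-call":
        byId.set(part.toolCallId, { toolName: part.toolName, args: part.args });
        break;
      case "tool-result": {
        const invocation = byId.get(part.toolCallId);
        if (invocation) invocation.result = part.result;
        break;
      }
    }
  }
  return { text, invocations: Array.from(byId.values()) };
}
```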
Stream tool calls and results.
Stream callbacks
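Callback-style handling can be sketched as a thin wrapper around a stream. The callback names `onChunk`, `onFinish`, and `onError` and the `consumeWithCallbacks` helper are chosen here for illustration; they show the pattern rather than the SDK's exact option signatures.

```typescript
// Illustrative callback set; every callback is optional.
interface StreamCallbacks {
  onChunk?: (chunk: string) => void;
  onFinish?: (fullText: string) => void;
  onError?: (error: unknown) => void;
}

// Drive the stream and notify the caller at each stage.
async function consumeWithCallbacks(
  stream: AsyncIterable<string>,
  callbacks: StreamCallbacks,
): Promise<void> {
  let fullText = "";
  try {
    for await (const chunk of stream) {
      fullText += chunk;
      callbacks.onChunk?.(chunk);
    }
  } catch (error) {
    // A failure while reading the stream is reported; onFinish is skipped.
    callbacks.onError?.(error);
    return;
  }
  callbacks.onFinish?.(fullText);
}
```

Keeping `onFinish` outside the `try` block means an exception thrown by the caller's own `onFinish` handler is not mistaken for a stream error.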
Handle stream events with callbacks.
Consuming streams
Multiple ways to consume the stream:
Async iteration
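Async iteration uses `for await...of`, which also makes it possible to stop reading early. The sketch below assumes the stream is an async iterable of strings; `trackedStream` and `readUntil` are hypothetical helpers. Breaking out of the loop calls the iterator's `return()` method, so a generator-based producer gets a chance to clean up in its `finally` block.

```typescript
// Hypothetical producer that reports when it has been closed.
async function* trackedStream(
  chunks: string[],
  onClose: () => void,
): AsyncGenerator<string> {
  try {
    for (const chunk of chunks) {
      yield chunk;
    }
  } finally {
    onClose(); // runs on normal completion and when the consumer stops early
  }
}

// Read chunks until at least maxChars characters have been collected.
async function readUntil(
  stream: AsyncIterable<string>,
  maxChars: number,
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
    if (text.length >= maxChars) break; // leaving the loop closes the iterator
  }
  return text;
}
```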
Promise-based
Wait for the complete result.
Response streaming
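For runtimes that use the web-standard `Response` type, a stream of strings can be wrapped in a `ReadableStream` and returned as the response body. This is a sketch under the assumption that `Response`, `ReadableStream`, and `TextEncoder` are available as globals (as in recent Node versions and edge runtimes). `toStreamingResponse` is a hypothetical helper, not an SDK method.

```typescript
// Wrap an async iterable of strings in a streaming web Response.
function toStreamingResponse(stream: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = stream[Symbol.asyncIterator]();

  const body = new ReadableStream<Uint8Array>({
    // pull() is called whenever the consumer is ready for more data.
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
    // If the client disconnects, stop the underlying producer.
    async cancel() {
      await iterator.return?.();
    },
  });

  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Using `pull` rather than pushing everything in `start` means chunks are only requested from the producer as fast as the client reads them.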
Stream to an HTTP response.
Performance considerations
Streaming provides better perceived performance:
- Immediate feedback: Users see responses start appearing within 1-2 seconds instead of waiting 10+ seconds
- Progressive disclosure: Long responses become readable before they’re complete
- Better UX: Loading indicators can be replaced with actual content
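The difference between time to first chunk and total generation time is what drives these benefits, and it can be measured directly. The sketch below uses a hypothetical `delayedStream` generator with an artificial delay per chunk; `measureStream` is an illustrative helper, and real timings depend on the model and network.

```typescript
// Hypothetical producer that waits delayMs before each chunk.
async function* delayedStream(
  chunks: string[],
  delayMs: number,
): AsyncGenerator<string> {
  for (const chunk of chunks) {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    yield chunk;
  }
}

interface StreamTiming {
  firstChunkMs: number | null; // null if the stream produced no chunks
  totalMs: number;
  totalChars: number;
}

// Record when the first chunk arrives and when the stream completes.
async function measureStream(stream: AsyncIterable<string>): Promise<StreamTiming> {
  const start = Date.now();
  let firstChunkMs: number | null = null;
  let totalChars = 0;
  for await (const chunk of stream) {
    if (firstChunkMs === null) {
      firstChunkMs = Date.now() - start;
    }
    totalChars += chunk.length;
  }
  return { firstChunkMs, totalMs: Date.now() - start, totalChars };
}
```

With streaming, the user starts reading after roughly `firstChunkMs`; with blocking, nothing is visible until `totalMs` has elapsed.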
When to use streaming
Use streaming when:
- Generating long-form content (essays, articles, stories)
- Building chat interfaces
- Responses take more than a few seconds
- You want to show progress to users
Use blocking when:
- Responses are short and fast (< 2 seconds)
- You need the complete response before processing
- Building batch processing systems
- Simplicity is more important than perceived speed