Multi-Modal Chatbot
Learn how to build a chatbot capable of understanding both images and PDFs using the AI SDK.
What is Multi-Modal?
Multi-modal refers to the ability of AI models to process and understand multiple types of input formats. In this guide, we’ll focus on:
- Images: Screenshots, photos, diagrams
- PDFs: Documents, reports, forms
- Text: Regular chat messages
Prerequisites
- Node.js 18+
- A Vercel AI Gateway API key
- Basic knowledge of Next.js and React
Setup
Create a new Next.js application:
pnpm create next-app@latest multi-modal-chatbot
cd multi-modal-chatbot
Install dependencies:
pnpm add ai @ai-sdk/react
Configure your API key in a `.env.local` file in the project root:
AI_GATEWAY_API_KEY=your_api_key_here
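Next.js loads `.env.local` into `process.env` on the server. If you want the app to fail fast with a clear message when the key is missing, you could add a small guard at the top of a server module. `requireEnv` is a hypothetical helper, not part of the AI SDK. It takes the environment object as a parameter so it can be tested without touching `process.env`.

```typescript
// Hypothetical helper (not part of the AI SDK): read a required variable
// from an environment-like object and throw a descriptive error if it is
// missing or blank.
export function requireEnv(
  env: Record<string, string | undefined>,
  name: string,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in a server module (assumes Node's process.env):
// const apiKey = requireEnv(process.env, 'AI_GATEWAY_API_KEY');
```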
Implementation
Create the API Route
Create a route handler at `app/api/chat/route.ts` (the chat UI below posts to `/api/chat`) that processes multi-modal messages:
import { streamText, convertToModelMessages, UIMessage } from 'ai';
// Allow streaming responses for up to 30 seconds
export const maxDuration = 30;
export async function POST(req: Request) {
  // useChat sends the full message history, including file parts
  const { messages }: { messages: UIMessage[] } = await req.json();
  const result = streamText({
    model: 'openai/gpt-4o',
    messages: await convertToModelMessages(messages),
  });
  // Stream the response back in the format useChat expects
  return result.toUIMessageStreamResponse();
}
`convertToModelMessages` converts the UI messages sent by `useChat`, including their image and PDF file parts, into the model message format that `streamText` expects.
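In this guide, files reach the route as data URLs (the helper in the next step produces them). A base64 data URL packs the media type and the encoded bytes into one string. The sketch below splits such a URL into those two pieces to show what the format carries. It illustrates the data URL format (RFC 2397) only; it is not the AI SDK's internal conversion code, and `parseDataURL` is a hypothetical name.

```typescript
// Hypothetical helper for illustration: split a base64 data URL such as
// "data:image/png;base64,iVBORw0..." into its media type and payload.
// Only the base64 form (as produced by FileReader.readAsDataURL) is handled.
export interface ParsedDataURL {
  mediaType: string;
  base64: string;
}

export function parseDataURL(url: string): ParsedDataURL {
  // "data:" + media type + optional parameters (e.g. ";charset=utf-8")
  // + ";base64," + payload.
  const match = /^data:([^;,]+)((?:;[^;,]+)*);base64,(.*)$/.exec(url);
  if (!match) {
    throw new Error('Expected a base64-encoded data URL');
  }
  return { mediaType: match[1], base64: match[3] };
}
```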
File Upload Helper
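Create a helper in `lib/file-utils.ts` (matching the `@/lib/file-utils` import used by the chat component below) that converts the selected files to data URLs: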
export async function convertFilesToDataURLs(files: FileList) {
  return Promise.all(
    Array.from(files).map(
      file =>
        new Promise<{
          type: 'file';
          mediaType: string;
          url: string;
        }>((resolve, reject) => {
          const reader = new FileReader();
          reader.onload = () => {
            resolve({
              type: 'file',
              mediaType: file.type,
              url: reader.result as string,
            });
          };
          reader.onerror = reject;
          reader.readAsDataURL(file);
        }),
    ),
  );
}
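`FileReader` only exists in the browser. If you want to build the same `{ type: 'file', mediaType, url }` parts elsewhere, for example in a test or a Node.js script, you can base64-encode raw bytes yourself. This sketch assumes a runtime with a global `btoa` (modern browsers and Node.js 16+); `bytesToFilePart` is a hypothetical helper, not an AI SDK export.

```typescript
// Hypothetical helper: build the same file-part shape that
// convertFilesToDataURLs produces, starting from raw bytes instead of a File.
export interface FilePart {
  type: 'file';
  mediaType: string;
  url: string;
}

export function bytesToFilePart(bytes: Uint8Array, mediaType: string): FilePart {
  // btoa expects a "binary string" with one character per byte (0-255).
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return {
    type: 'file',
    mediaType,
    url: `data:${mediaType};base64,${btoa(binary)}`,
  };
}
```

Building the string byte by byte keeps the sketch simple; for large files a chunked approach would avoid long string concatenation.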
Chat Interface with File Upload
Build the frontend (for example, in `app/page.tsx`) with support for uploading images and PDFs:
'use client';
import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';
import { useRef, useState } from 'react';
import Image from 'next/image';
import { convertFilesToDataURLs } from '@/lib/file-utils';
export default function Chat() {
  const [input, setInput] = useState('');
  const [files, setFiles] = useState<FileList | undefined>(undefined);
  const fileInputRef = useRef<HTMLInputElement>(null);
  const { messages, sendMessage } = useChat({
    transport: new DefaultChatTransport({
      api: '/api/chat',
    }),
  });
  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      {messages.map(m => (
        <div key={m.id} className="whitespace-pre-wrap">
          {m.role === 'user' ? 'User: ' : 'AI: '}
          {m.parts.map((part, index) => {
            if (part.type === 'text') {
              return <span key={`${m.id}-text-${index}`}>{part.text}</span>;
            }
            if (part.type === 'file' && part.mediaType?.startsWith('image/')) {
              return (
                <Image
                  key={`${m.id}-image-${index}`}
                  src={part.url}
                  width={500}
                  height={500}
                  alt={`attachment-${index}`}
                />
              );
            }
            if (part.type === 'file' && part.mediaType === 'application/pdf') {
              return (
                <iframe
                  key={`${m.id}-pdf-${index}`}
                  src={part.url}
                  width={500}
                  height={600}
                  title={`pdf-${index}`}
                />
              );
            }
            return null;
          })}
        </div>
      ))}
      <form
        className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl space-y-2"
        onSubmit={async event => {
          event.preventDefault();
          const fileParts =
            files && files.length > 0
              ? await convertFilesToDataURLs(files)
              : [];
          sendMessage({
            role: 'user',
            parts: [{ type: 'text', text: input }, ...fileParts],
          });
          setInput('');
          setFiles(undefined);
          if (fileInputRef.current) {
            fileInputRef.current.value = '';
          }
        }}
      >
        <input
          type="file"
          accept="image/*,application/pdf"
          onChange={event => {
            if (event.target.files) {
              setFiles(event.target.files);
            }
          }}
          multiple
          ref={fileInputRef}
        />
        <input
          className="w-full p-2"
          value={input}
          placeholder="Say something..."
          onChange={e => setInput(e.target.value)}
        />
      </form>
    </div>
  );
}
Key Features
Message Parts Structure
Each message carries a `parts` array. This guide works with two part types (the SDK defines others, such as reasoning and tool parts):
type MessagePart =
  | { type: 'text'; text: string }
  | { type: 'file'; mediaType: string; url: string };
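Because `type` is a discriminant, TypeScript narrows each variant once you check it. The sketch below reuses the simplified `MessagePart` union and adds two hypothetical helpers, not SDK functions: one joins a message's text parts, the other counts attachments by top-level media type.

```typescript
// Simplified part union from this guide; the AI SDK's own UI part types
// include more variants and optional fields.
type MessagePart =
  | { type: 'text'; text: string }
  | { type: 'file'; mediaType: string; url: string };

// Hypothetical helper: concatenate all text parts of a message.
function getMessageText(parts: MessagePart[]): string {
  return parts
    .filter(
      (part): part is Extract<MessagePart, { type: 'text' }> =>
        part.type === 'text',
    )
    .map(part => part.text)
    .join('');
}

// Hypothetical helper: count file parts by top-level media type
// ("image", "application", ...).
function countAttachments(parts: MessagePart[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const part of parts) {
    if (part.type !== 'file') continue;
    // part is narrowed to the file variant here, so mediaType is available.
    const kind = part.mediaType.split('/')[0];
    counts[kind] = (counts[kind] ?? 0) + 1;
  }
  return counts;
}
```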
File Processing
- User selects files via input field
- Files are converted to data URLs using FileReader API
- Data URLs are sent as part of the message
- Model processes the files alongside text
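The submit handler in the chat component always sends a text part, even when the input is empty and the user only attached files. If you prefer to omit empty text parts, and to skip sending entirely when there is nothing to send, a hypothetical helper like this one could assemble the parts before calling `sendMessage`:

```typescript
type TextPart = { type: 'text'; text: string };
type FilePart = { type: 'file'; mediaType: string; url: string };

// Hypothetical helper: build the parts array for a user message.
// The text part is included only if the input has non-whitespace content
// (the original, untrimmed input is sent). Returns null when there is
// neither text nor an attachment.
function buildUserParts(
  input: string,
  fileParts: FilePart[],
): Array<TextPart | FilePart> | null {
  const parts: Array<TextPart | FilePart> = [];
  if (input.trim().length > 0) {
    parts.push({ type: 'text', text: input });
  }
  parts.push(...fileParts);
  return parts.length > 0 ? parts : null;
}
```

In `onSubmit`, you would call it after `convertFilesToDataURLs` and only call `sendMessage({ role: 'user', parts })` when the result is not `null`.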
The interface renders different parts appropriately:
- Text: Displayed as plain text
- Images: Rendered using Next.js Image component
- PDFs: Displayed in an iframe
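The branching inside the component's `m.parts.map` callback can be pulled into a pure function, which keeps the JSX short and is easy to unit-test. `getPartKind` is a hypothetical name; its media-type checks mirror the component's (an `image/` prefix and an exact `application/pdf` match).

```typescript
type MessagePart =
  | { type: 'text'; text: string }
  | { type: 'file'; mediaType: string; url: string };

type PartKind = 'text' | 'image' | 'pdf' | 'unsupported';

// Hypothetical helper mirroring the component's rendering branches.
function getPartKind(part: MessagePart): PartKind {
  if (part.type === 'text') return 'text';
  if (part.mediaType.startsWith('image/')) return 'image';
  if (part.mediaType === 'application/pdf') return 'pdf';
  // The component renders nothing (returns null) for other file types.
  return 'unsupported';
}
```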
Running the Application
Start the development server with `pnpm dev`, then visit http://localhost:3000 and try:
- Upload an image and ask “What’s in this image?”
- Upload a PDF and ask “Summarize this document”
- Send a regular text message
Using Other Providers
The route handler passes a plain model string, which the AI SDK resolves through the AI Gateway, so switching to another provider with multi-modal support only requires changing the `model` value in the `streamText` call:
const result = streamText({
  // Anthropic
  model: 'anthropic/claude-sonnet-4-20250514',
  // Google (use instead of the line above)
  // model: 'google/gemini-2.5-flash',
  messages: await convertToModelMessages(messages),
});
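If you want users to pick the provider per request instead of hard-coding it, one option is to accept a short key from the client and map it to a model string on the server, so arbitrary model IDs cannot be requested. The keys, `MODELS`, and `resolveModel` are hypothetical; the model strings are the ones used in this guide.

```typescript
// Hypothetical allow-list built from the model strings in this guide.
const MODELS = {
  openai: 'openai/gpt-4o',
  anthropic: 'anthropic/claude-sonnet-4-20250514',
  google: 'google/gemini-2.5-flash',
} as const;

type ModelKey = keyof typeof MODELS;

// Map an untrusted value from the request body to an allowed model string,
// falling back to a default for missing or unknown keys. hasOwnProperty
// keeps inherited names such as "toString" from matching.
function resolveModel(key: unknown, fallback: ModelKey = 'openai'): string {
  if (typeof key === 'string' && Object.prototype.hasOwnProperty.call(MODELS, key)) {
    return MODELS[key as ModelKey];
  }
  return MODELS[fallback];
}
```

In the route, you would read the extra field alongside `messages` from the request body and pass `model: resolveModel(provider)` to `streamText`; the client would need to send that field with each request.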
Best Practices
- File Size: Be mindful of each provider's file size limits; base64 data URLs are about a third larger than the original file
- Image Quality: Balance image quality with upload speed
- Error Handling: Handle file upload errors gracefully
- Loading States: Show progress indicators during file processing
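For the file-size and error-handling points, a client-side check that runs before any conversion catches oversized or unexpected files early. The sketch works on anything with `name`, `type`, and `size` (browser `File` objects and a `FileList` qualify). The 5 MB limit and `validateFiles` are placeholder assumptions, not values from the AI SDK or any provider; check your provider's documented limits.

```typescript
// Structural type: browser File objects satisfy this.
interface FileLike {
  name: string;
  type: string;
  size: number; // bytes
}

// Assumed limit for illustration only; adjust to your provider.
const MAX_FILE_BYTES = 5 * 1024 * 1024;

// Same types the file input accepts: images and PDFs.
function isAllowedType(type: string): boolean {
  return type.startsWith('image/') || type === 'application/pdf';
}

// Hypothetical validator: returns one error message per rejected file.
// ArrayLike lets you pass a FileList or a plain array.
function validateFiles(files: ArrayLike<FileLike>): string[] {
  const errors: string[] = [];
  for (let i = 0; i < files.length; i++) {
    const file = files[i];
    if (!isAllowedType(file.type)) {
      errors.push(`${file.name}: unsupported type "${file.type || 'unknown'}"`);
    } else if (file.size > MAX_FILE_BYTES) {
      errors.push(`${file.name}: larger than ${MAX_FILE_BYTES} bytes`);
    }
  }
  return errors;
}
```

In the file input's `onChange`, you could call `validateFiles(event.target.files)` after the null check and show the messages instead of calling `setFiles` when the array is non-empty.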
Next Steps
- Add file size validation
- Implement drag-and-drop file upload
- Add support for more file types
- Implement file preview before sending
- Add tools for more advanced interactions
Resources