
RAG Chatbot

Learn how to build a chatbot that uses retrieval-augmented generation (RAG) to answer questions based on a custom knowledge base.

What is RAG?

RAG (Retrieval-Augmented Generation) enhances AI responses by fetching relevant information and providing it to the language model as context. This lets the model answer questions about information it wasn't trained on, such as proprietary data or recent events.

How It Works

  1. Chunking: Break source material into smaller pieces
  2. Embedding: Convert text chunks into vector representations
  3. Storage: Store embeddings in a vector database
  4. Retrieval: When a user asks a question, embed the query and find similar chunks
  5. Generation: Pass relevant chunks to the LLM as context
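The five steps above can be sketched end to end in a toy, dependency-free form. This is a hypothetical illustration only: it stands in a 26-dimensional letter-frequency vector for a real embedding model and an in-memory array for the vector database, but the chunk → embed → store → retrieve flow is the same shape the guide builds below.

```typescript
// 1. Chunking: split source text into sentence-sized pieces
const chunk = (text: string): string[] =>
  text.split('.').map(s => s.trim()).filter(s => s.length > 0);

// 2. Embedding: toy stand-in for a real model — a letter-frequency vector
const embed = (text: string): number[] => {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i] += 1;
  }
  return v;
};

// Cosine similarity: dot product of the vectors divided by their magnitudes
const cosine = (a: number[], b: number[]): number => {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
};

// 3. Storage: keep (chunk, embedding) pairs in memory
const store = chunk(
  'The sky is blue. Grass is green. My favorite food is pizza.',
).map(content => ({ content, embedding: embed(content) }));

// 4. Retrieval: embed the query and rank stored chunks by similarity
const retrieve = (query: string, k = 1) =>
  store
    .map(r => ({ ...r, similarity: cosine(embed(query), r.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);

// 5. Generation: the top chunks would be passed to the LLM as context
console.log(retrieve('What food do I like?')[0].content);
```

A real pipeline swaps the toy embedding for a model call and the array for pgvector, which is exactly what the implementation sections below do.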

Prerequisites

  • Node.js 18+
  • A Vercel AI Gateway API key
  • PostgreSQL with pgvector extension

Setup

Clone the starter repository:

```bash
git clone https://github.com/vercel/ai-sdk-rag-starter
cd ai-sdk-rag-starter
pnpm install
```

Database Setup

Create a .env file:

```bash
cp .env.example .env
```

Add your database URL and AI Gateway API key:

```bash
DATABASE_URL="your-postgres-url"
AI_GATEWAY_API_KEY="your-api-key"
```

Run the migrations:

```bash
pnpm db:migrate
```

Implementation

Create Embeddings Schema

Define a table to store text chunks and their embeddings:

```ts
import { pgTable, text, varchar, vector, index } from 'drizzle-orm/pg-core';
import { resources } from './resources';

export const embeddings = pgTable(
  'embeddings',
  {
    id: varchar('id', { length: 191 }).primaryKey(),
    resourceId: varchar('resource_id', { length: 191 }).references(
      () => resources.id,
      { onDelete: 'cascade' },
    ),
    content: text('content').notNull(),
    embedding: vector('embedding', { dimensions: 1536 }).notNull(),
  },
  table => ({
    embeddingIndex: index('embeddingIndex').using(
      'hnsw',
      table.embedding.op('vector_cosine_ops'),
    ),
  }),
);
```

Generate Embeddings

Create a function to chunk and embed text:

```ts
import { embedMany } from 'ai';

const embeddingModel = 'openai/text-embedding-ada-002';

// Naive chunking: split on sentence boundaries and drop empty pieces
const generateChunks = (input: string): string[] => {
  return input
    .trim()
    .split('.')
    .filter(i => i !== '');
};

export const generateEmbeddings = async (
  value: string,
): Promise<Array<{ embedding: number[]; content: string }>> => {
  const chunks = generateChunks(value);
  // embedMany embeds all chunks in a single batched call
  const { embeddings } = await embedMany({
    model: embeddingModel,
    values: chunks,
  });
  return embeddings.map((e, i) => ({ content: chunks[i], embedding: e }));
};
```

Store Resources with Embeddings

Create a server action to save content and generate embeddings:

```ts
'use server';

import { db } from '../db';
import { resources } from '../db/schema/resources';
import { embeddings as embeddingsTable } from '../db/schema/embeddings';
import { generateEmbeddings } from '../ai/embedding';

export const createResource = async (input: { content: string }) => {
  try {
    const { content } = input;

    const [resource] = await db
      .insert(resources)
      .values({ content })
      .returning();

    const embeddings = await generateEmbeddings(content);
    await db.insert(embeddingsTable).values(
      embeddings.map(embedding => ({
        resourceId: resource.id,
        ...embedding,
      })),
    );

    return 'Resource successfully created and embedded.';
  } catch (error) {
    return error instanceof Error && error.message.length > 0
      ? error.message
      : 'Error, please try again.';
  }
};
```

Retrieve Similar Content

Implement semantic search using cosine similarity (in the same file where embeddingModel is defined):

```ts
import { embed } from 'ai';
import { cosineDistance, desc, gt, sql } from 'drizzle-orm';
import { db } from '../db';
import { embeddings } from '../db/schema/embeddings';

export const findRelevantContent = async (userQuery: string) => {
  // Embed the query with the same model used for the stored chunks
  const { embedding: userQueryEmbedded } = await embed({
    model: embeddingModel,
    value: userQuery,
  });

  // pgvector's cosineDistance returns a distance; 1 - distance is similarity
  const similarity = sql<number>`1 - (${cosineDistance(
    embeddings.embedding,
    userQueryEmbedded,
  )})`;

  // Return up to four chunks with similarity above 0.5
  const similarGuides = await db
    .select({ name: embeddings.content, similarity })
    .from(embeddings)
    .where(gt(similarity, 0.5))
    .orderBy(t => desc(t.similarity))
    .limit(4);

  return similarGuides;
};
```

Create the Chat Interface

Build a route handler that uses tools for adding and retrieving information:

```ts
import { createResource } from '@/lib/actions/resources';
import { findRelevantContent } from '@/lib/ai/embedding';
import {
  convertToModelMessages,
  streamText,
  tool,
  UIMessage,
  stepCountIs,
} from 'ai';
import { z } from 'zod';

export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: 'openai/gpt-4o',
    messages: convertToModelMessages(messages),
    stopWhen: stepCountIs(5),
    system: `You are a helpful assistant. Check your knowledge base before answering any questions.
    Only respond to questions using information from tool calls.
    If no relevant information is found in the tool calls, respond, "Sorry, I don't know."`,
    tools: {
      addResource: tool({
        description: `add a resource to your knowledge base.
          If the user provides a random piece of knowledge unprompted, use this tool without asking for confirmation.`,
        inputSchema: z.object({
          content: z
            .string()
            .describe('the content or resource to add to the knowledge base'),
        }),
        execute: async ({ content }) => createResource({ content }),
      }),
      getInformation: tool({
        description: `get information from your knowledge base to answer questions.`,
        inputSchema: z.object({
          question: z.string().describe("the user's question"),
        }),
        execute: async ({ question }) => findRelevantContent(question),
      }),
    },
  });

  return result.toUIMessageStreamResponse();
}
```

Frontend with useChat

Create a chat interface using the useChat hook:

```tsx
'use client';

import { useChat } from '@ai-sdk/react';
import { useState } from 'react';

export default function Chat() {
  const [input, setInput] = useState('');
  const { messages, sendMessage } = useChat();

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      <div className="space-y-4">
        {messages.map(m => (
          <div key={m.id} className="whitespace-pre-wrap">
            <div>
              <div className="font-bold">{m.role}</div>
              {m.parts.map((part, i) => {
                switch (part.type) {
                  case 'text':
                    return <p key={i}>{part.text}</p>;
                  case 'tool-addResource':
                  case 'tool-getInformation':
                    return (
                      <p key={i}>
                        call{part.state === 'output-available' ? 'ed' : 'ing'}{' '}
                        tool: {part.type}
                      </p>
                    );
                  default:
                    return null;
                }
              })}
            </div>
          </div>
        ))}
      </div>

      <form
        onSubmit={e => {
          e.preventDefault();
          sendMessage({ text: input });
          setInput('');
        }}
      >
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={e => setInput(e.currentTarget.value)}
        />
      </form>
    </div>
  );
}
```

Running the Application

```bash
pnpm run dev
```

Visit http://localhost:3000 and try:
  1. Tell the chatbot information: “My favorite food is pizza”
  2. Ask questions: “What is my favorite food?”
The chatbot will store information in its knowledge base and retrieve it when needed.

Key Concepts

  • Embeddings: Vector representations of text that capture semantic meaning
  • Vector Database: Stores embeddings and enables similarity search
  • Cosine Similarity: Measures how similar two embeddings are
  • Chunking: Breaking text into smaller pieces for better embedding quality
  • Tools: Enable the agent to add and retrieve information dynamically

Next Steps

  • Experiment with different chunking strategies
  • Try different embedding models
  • Implement more advanced retrieval techniques
  • Add user-specific knowledge bases
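As a starting point for the chunking experiments above, here is a hedged sketch (not part of the starter repo) of one common alternative to the guide's sentence splitting: fixed-size character windows with overlap, so content that spans a chunk boundary still appears intact in at least one chunk.

```typescript
// Hypothetical sliding-window chunker. `size` and `overlap` are in
// characters; consecutive chunks share `overlap` characters.
const chunkWithOverlap = (
  text: string,
  size = 200,
  overlap = 50,
): string[] => {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
};
```

Overlap trades extra storage and embedding cost for better recall near chunk boundaries; sizing chunks to the embedding model's effective context is another common experiment.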
