
LlamaIndex Adapter

LlamaIndex is a framework for building LLM-powered applications that helps you ingest, structure, and access private or domain-specific data. LlamaIndex.TS offers the core features of LlamaIndex for Python for popular runtimes like Node.js (official support), Vercel Edge Functions (experimental), and Deno (experimental).

Installation

llamaindex is a required peer dependency.
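
The adapter and its peers can be installed with your package manager of choice; a typical invocation with npm (the package names are the ones used throughout this page):

```shell
npm install @ai-sdk/llamaindex llamaindex ai
```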

Features

  • Transform LlamaIndex ChatEngine and QueryEngine streams to AI SDK UIMessageStream
  • Seamless integration with AI SDK UI hooks like useCompletion
  • Support for RAG (Retrieval Augmented Generation) workflows
  • Compatible with LlamaIndex’s document processing and indexing capabilities

Example: Completion

Here is a basic example that uses both AI SDK and LlamaIndex together with the Next.js App Router. The AI SDK @ai-sdk/llamaindex package converts the stream returned by the chat method on a LlamaIndex ChatEngine, or the query method on a LlamaIndex QueryEngine, into a stream that pipes text to the client.
import { OpenAI, SimpleChatEngine } from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const llm = new OpenAI({ model: 'gpt-4o' });
  const chatEngine = new SimpleChatEngine({ llm });

  const stream = await chatEngine.chat({
    message: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}
Then, we use the AI SDK's useCompletion hook in the page component to handle the completion:
'use client';

import { useCompletion } from '@ai-sdk/react';

export default function Chat() {
  const { completion, input, handleInputChange, handleSubmit } =
    useCompletion();

  return (
    <div>
      {completion}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}

Example: RAG with QueryEngine

LlamaIndex excels at building RAG applications. Here’s an example using a QueryEngine with document indexing:
import {
  OpenAI,
  VectorStoreIndex,
  SimpleDirectoryReader,
} from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

// Initialize once (consider caching in production)
let queryEngine: any = null;

async function getQueryEngine() {
  if (!queryEngine) {
    // Load documents from a directory
    const reader = new SimpleDirectoryReader();
    const documents = await reader.loadData('./data');

    // Create index from documents
    const index = await VectorStoreIndex.fromDocuments(documents);

    // Create query engine
    queryEngine = index.asQueryEngine({
      llm: new OpenAI({ model: 'gpt-4o' }),
    });
  }
  return queryEngine;
}

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const engine = await getQueryEngine();

  const stream = await engine.query({
    query: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}

Example: Chat with Context

Build a conversational interface with document context:
import {
  OpenAI,
  VectorStoreIndex,
  ContextChatEngine,
  SimpleDirectoryReader,
} from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

let chatEngine: any = null;

async function getChatEngine() {
  if (!chatEngine) {
    const reader = new SimpleDirectoryReader();
    const documents = await reader.loadData('./data');
    const index = await VectorStoreIndex.fromDocuments(documents);

    // Create a chat engine with context
    chatEngine = new ContextChatEngine({
      retriever: index.asRetriever(),
      llm: new OpenAI({ model: 'gpt-4o' }),
    });
  }
  return chatEngine;
}

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const engine = await getChatEngine();

  const stream = await engine.chat({
    message: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}
Use with the useCompletion hook on the client:
'use client';

import { useCompletion } from '@ai-sdk/react';

export default function ChatWithContext() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } =
    useCompletion({
      api: '/api/chat',
    });

  return (
    <div>
      <div className="response">
        {completion || 'Ask a question about your documents...'}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          disabled={isLoading}
          placeholder="What would you like to know?"
        />
        <button type="submit" disabled={isLoading}>
          {isLoading ? 'Thinking...' : 'Ask'}
        </button>
      </form>
    </div>
  );
}

API Reference

toUIMessageStream(stream)

Converts a LlamaIndex ChatEngine or QueryEngine stream to an AI SDK UIMessageStream.
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

const stream = await chatEngine.chat({
  message: prompt,
  stream: true,
});

return createUIMessageStreamResponse({
  stream: toUIMessageStream(stream),
});
Parameters:
  • stream: AsyncIterable - Stream from LlamaIndex ChatEngine or QueryEngine
Returns: ReadableStream<UIMessageChunk>
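
For intuition, here is a simplified, self-contained sketch of the kind of conversion the adapter performs. The EngineChunk shape, toTextStream, and mockEngineStream below are illustrative stand-ins, not the package's actual implementation: the idea is draining an AsyncIterable of delta chunks into a ReadableStream.

```typescript
// Illustrative chunk shape: each streamed chunk carries a text delta.
type EngineChunk = { delta: string };

// Sketch of the adapter's job: wrap an AsyncIterable of engine
// chunks into a ReadableStream of text parts.
function toTextStream(
  iterable: AsyncIterable<EngineChunk>,
): ReadableStream<string> {
  return new ReadableStream<string>({
    async start(controller) {
      for await (const chunk of iterable) {
        controller.enqueue(chunk.delta);
      }
      controller.close();
    },
  });
}

// A mock engine stream standing in for chatEngine.chat({ stream: true }).
async function* mockEngineStream(): AsyncGenerator<EngineChunk> {
  yield { delta: 'Hello, ' };
  yield { delta: 'world!' };
}

async function main() {
  const reader = toTextStream(mockEngineStream()).getReader();
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value;
  }
  console.log(text); // → Hello, world!
}

main();
```

The real adapter emits UIMessageChunk objects rather than raw strings, so that createUIMessageStreamResponse can serialize them for the AI SDK UI hooks on the client.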

Integration with LlamaIndex Features

The adapter works seamlessly with LlamaIndex’s powerful features:

Document Loaders

  • Load documents from various sources (files, URLs, databases)
  • Support for multiple file formats (PDF, Markdown, JSON, etc.)
  • Custom document readers

Vector Stores

  • In-memory vector storage
  • Integration with external vector databases
  • Efficient similarity search

Retrievers

  • Vector similarity retrieval
  • Keyword-based retrieval
  • Hybrid retrieval strategies

Query Engines

  • Simple query engine for basic RAG
  • Sub-question query engine for complex queries
  • Custom query engines

Chat Engines

  • Simple chat engine
  • Context chat engine with retrieval
  • Condense question chat engine

More Examples

create-llama is the easiest way to get started with LlamaIndex. It uses the AI SDK to connect to LlamaIndex in all its generated code.
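
A project can be scaffolded by running the generator directly (assuming npm):

```shell
npx create-llama@latest
```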
