
LlamaIndex Adapter

LlamaIndex is a framework for building LLM-powered applications that helps you ingest, structure, and access private or domain-specific data. LlamaIndex.TS offers the core features of LlamaIndex for Python for popular runtimes like Node.js (official support), Vercel Edge Functions (experimental), and Deno (experimental).

Installation

llamaindex is a required peer dependency.
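
The adapter and its peers can be installed with your package manager of choice; a typical invocation with npm (the package names are the ones used throughout this page):

```shell
npm install @ai-sdk/llamaindex llamaindex ai
```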

Features

  • Transform LlamaIndex ChatEngine and QueryEngine streams to AI SDK UIMessageStream
  • Seamless integration with AI SDK UI hooks like useCompletion
  • Support for RAG (Retrieval Augmented Generation) workflows
  • Compatible with LlamaIndex’s document processing and indexing capabilities

Example: Completion

Here is a basic example that uses both AI SDK and LlamaIndex together with the Next.js App Router. The AI SDK @ai-sdk/llamaindex package converts the stream returned by the chat method on a LlamaIndex ChatEngine, or the query method on a LlamaIndex QueryEngine, into a stream that pipes text to the client.
import { OpenAI, SimpleChatEngine } from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const llm = new OpenAI({ model: 'gpt-4o' });
  const chatEngine = new SimpleChatEngine({ llm });

  const stream = await chatEngine.chat({
    message: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}
Then, we use the AI SDK's useCompletion hook in the page component to handle the completion:
'use client';

import { useCompletion } from '@ai-sdk/react';

export default function Chat() {
  const { completion, input, handleInputChange, handleSubmit } =
    useCompletion();

  return (
    <div>
      {completion}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}

Example: RAG with QueryEngine

LlamaIndex excels at building RAG applications. Here’s an example using a QueryEngine with document indexing:
import {
  OpenAI,
  VectorStoreIndex,
  SimpleDirectoryReader,
} from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

// Initialize once (consider caching in production)
let queryEngine: any = null;

async function getQueryEngine() {
  if (!queryEngine) {
    // Load documents from a directory
    const reader = new SimpleDirectoryReader();
    const documents = await reader.loadData('./data');

    // Create index from documents
    const index = await VectorStoreIndex.fromDocuments(documents);

    // Create query engine
    queryEngine = index.asQueryEngine({
      llm: new OpenAI({ model: 'gpt-4o' }),
    });
  }
  return queryEngine;
}

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const engine = await getQueryEngine();

  const stream = await engine.query({
    query: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}

Example: Chat with Context

Build a conversational interface with document context:
import {
  OpenAI,
  VectorStoreIndex,
  ContextChatEngine,
  SimpleDirectoryReader,
} from 'llamaindex';
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

export const maxDuration = 60;

let chatEngine: any = null;

async function getChatEngine() {
  if (!chatEngine) {
    const reader = new SimpleDirectoryReader();
    const documents = await reader.loadData('./data');
    const index = await VectorStoreIndex.fromDocuments(documents);

    // Create a chat engine with context
    chatEngine = new ContextChatEngine({
      retriever: index.asRetriever(),
      llm: new OpenAI({ model: 'gpt-4o' }),
    });
  }
  return chatEngine;
}

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const engine = await getChatEngine();

  const stream = await engine.chat({
    message: prompt,
    stream: true,
  });

  return createUIMessageStreamResponse({
    stream: toUIMessageStream(stream),
  });
}
Use with the useCompletion hook on the client:
'use client';

import { useCompletion } from '@ai-sdk/react';

export default function ChatWithContext() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } =
    useCompletion({
      api: '/api/chat',
    });

  return (
    <div>
      <div className="response">
        {completion || 'Ask a question about your documents...'}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          disabled={isLoading}
          placeholder="What would you like to know?"
        />
        <button type="submit" disabled={isLoading}>
          {isLoading ? 'Thinking...' : 'Ask'}
        </button>
      </form>
    </div>
  );
}

API Reference

toUIMessageStream(stream)

Converts a LlamaIndex ChatEngine or QueryEngine stream to an AI SDK UIMessageStream.
import { toUIMessageStream } from '@ai-sdk/llamaindex';
import { createUIMessageStreamResponse } from 'ai';

const stream = await chatEngine.chat({
  message: prompt,
  stream: true,
});

return createUIMessageStreamResponse({
  stream: toUIMessageStream(stream),
});
Parameters:
  • stream: AsyncIterable - Stream from LlamaIndex ChatEngine or QueryEngine
Returns: ReadableStream<UIMessageChunk>
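
For intuition, here is a simplified, self-contained sketch of the kind of conversion the adapter performs. The EngineChunk shape, toTextStream, and mockEngineStream below are illustrative stand-ins, not the package's actual implementation: the idea is draining an AsyncIterable of delta chunks into a ReadableStream.

```typescript
// Illustrative chunk shape: each streamed chunk carries a text delta.
type EngineChunk = { delta: string };

// Sketch of the adapter's job: wrap an AsyncIterable of engine
// chunks into a ReadableStream of text parts.
function toTextStream(
  iterable: AsyncIterable<EngineChunk>,
): ReadableStream<string> {
  return new ReadableStream<string>({
    async start(controller) {
      for await (const chunk of iterable) {
        controller.enqueue(chunk.delta);
      }
      controller.close();
    },
  });
}

// A mock engine stream standing in for chatEngine.chat({ stream: true }).
async function* mockEngineStream(): AsyncGenerator<EngineChunk> {
  yield { delta: 'Hello, ' };
  yield { delta: 'world!' };
}

async function main() {
  const reader = toTextStream(mockEngineStream()).getReader();
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value;
  }
  console.log(text); // → Hello, world!
}

main();
```

The real adapter emits UIMessageChunk objects rather than raw strings, so that createUIMessageStreamResponse can serialize them for the AI SDK UI hooks on the client.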

Integration with LlamaIndex Features

The adapter works seamlessly with LlamaIndex’s powerful features:

Document Loaders

  • Load documents from various sources (files, URLs, databases)
  • Support for multiple file formats (PDF, Markdown, JSON, etc.)
  • Custom document readers

Vector Stores

  • In-memory vector storage
  • Integration with external vector databases
  • Efficient similarity search

Retrievers

  • Vector similarity retrieval
  • Keyword-based retrieval
  • Hybrid retrieval strategies

Query Engines

  • Simple query engine for basic RAG
  • Sub-question query engine for complex queries
  • Custom query engines

Chat Engines

  • Simple chat engine
  • Context chat engine with retrieval
  • Condense question chat engine

More Examples

create-llama is the easiest way to get started with LlamaIndex. It uses the AI SDK to connect to LlamaIndex in all its generated code.
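
A project can be scaffolded by running the generator directly (assuming npm):

```shell
npx create-llama@latest
```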
