In this tutorial, we will guide you through the steps to create a Chainlit application integrated with Llama Index.

Preview of the app you'll build


Before diving in, ensure that the following prerequisites are met:

  • A working installation of Chainlit
  • The Llama Index package installed
  • An OpenAI API key
  • A basic understanding of Python programming
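A quick way to confirm the first three prerequisites is a small standard-library check. The sketch below is a hypothetical helper, not part of Chainlit or Llama Index. It assumes the packages are importable as `chainlit` and `llama_index` and that the key is stored in the `OPENAI_API_KEY` environment variable, which is where the script in Step 3 reads it from.

```python
import importlib.util
import os


def check_prerequisites(packages=("chainlit", "llama_index"), env_var="OPENAI_API_KEY"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not installed: {name}")
    if not os.environ.get(env_var):
        problems.append(f"environment variable not set: {env_var}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Running the file prints one line per missing prerequisite and prints nothing when everything is in place.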

Step 1: Set Up Your Data Directory

Create a folder named data in the root of your app folder. Download the State of the Union file (or use any files of your own choice) and place it in the data folder.
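If you prefer to set the folder up from Python, the step above could be sketched as follows. `prepare_data_dir` is a hypothetical helper name. It only creates the `data` folder and lists what is in it, and it does not download any file for you.

```python
from pathlib import Path


def prepare_data_dir(root="."):
    """Create <root>/data if needed and return the files currently inside it."""
    data_dir = Path(root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    # Only regular files are listed; subfolders are ignored in this sketch.
    return sorted(p for p in data_dir.iterdir() if p.is_file())
```

Calling `prepare_data_dir()` from the root of your app folder returns an empty list until you have copied at least one document into `data`.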

Step 2: Create the Python Script

Create a new Python file in your project directory. This file will contain the main logic for your LLM application.

Step 3: Write the Application Logic

In this file, import the necessary packages and define two functions: one to handle a new chat session and one to handle messages coming in from the UI.

In this tutorial, we are going to use RetrieverQueryEngine. Here’s the basic structure of the script:
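import os
import openai

from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine
from llama_index.callbacks.base import CallbackManager
from llama_index import (
    LLMPredictor,
    ServiceContext,
    StorageContext,
    load_index_from_storage,
)
from langchain.chat_models import ChatOpenAI
import chainlit as cl

openai.api_key = os.environ.get("OPENAI_API_KEY")

try:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    # load index
    index = load_index_from_storage(storage_context)
except Exception:
    # No usable persisted index: build one from the files in ./data
    from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

    documents = SimpleDirectoryReader("./data").load_data()
    index = GPTVectorStoreIndex.from_documents(documents)
    # Assumed step: save the index so the next start can load it from ./storage
    index.storage_context.persist(persist_dir="./storage")


@cl.on_chat_start
async def factory():
    # Assumed arguments: streaming is enabled because main() reads response.response_gen.
    llm_predictor = LLMPredictor(llm=ChatOpenAI(streaming=True))
    # Assumed wiring: the Chainlit callback handler forwards intermediate steps to the UI.
    service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor,
        callback_manager=CallbackManager([cl.LlamaIndexCallbackHandler()]),
    )

    query_engine = index.as_query_engine(
        service_context=service_context,
        streaming=True,
    )

    # Store one query engine per chat session
    cl.user_session.set("query_engine", query_engine)


@cl.on_message
async def main(message: cl.Message):
    query_engine = cl.user_session.get("query_engine")  # type: RetrieverQueryEngine
    # Run the blocking query call without stalling the event loop
    response = await cl.make_async(query_engine.query)(message.content)

    response_message = cl.Message(content="")

    # Stream the answer to the UI token by token
    for token in response.response_gen:
        await response_message.stream_token(token=token)

    if response.response_txt:
        response_message.content = response.response_txt

    await response_message.send()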

This code sets up an instance of RetrieverQueryEngine for each chat session. The RetrieverQueryEngine is invoked every time a user sends a message, and it generates the response.
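The per-message flow can be illustrated without either library. The sketch below is a stand-in under stated assumptions: `fake_query` and `FakeStreamingResponse` are hypothetical substitutes for the query engine and its streaming response, and `asyncio.to_thread` plays roughly the role that `cl.make_async` plays in the script, running a blocking call without stalling the event loop. It is not Chainlit's or Llama Index's implementation.

```python
import asyncio
import time


class FakeStreamingResponse:
    """Hypothetical stand-in for a streaming response that yields tokens lazily."""

    def __init__(self, text):
        self._text = text

    @property
    def response_gen(self):
        # Yield the answer piece by piece, keeping the whitespace.
        for word in self._text.split(" "):
            yield word + " "


def fake_query(question):
    """Hypothetical blocking call standing in for query_engine.query."""
    time.sleep(0.01)  # simulate retrieval and LLM latency
    return FakeStreamingResponse(f"You asked: {question}")


async def handle_message(question):
    # Run the blocking query in a worker thread so the event loop stays responsive.
    response = await asyncio.to_thread(fake_query, question)
    streamed = []
    for token in response.response_gen:
        streamed.append(token)  # a real UI would display each token here
    return "".join(streamed).strip()


if __name__ == "__main__":
    print(asyncio.run(handle_message("What is Chainlit?")))
```

The design point is the split between the two phases: the slow, blocking query runs off the event loop, and the tokens are then consumed one at a time so the UI can show partial output.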

The callback handlers are responsible for listening to the intermediate steps and sending them to the UI.
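As a rough mental model of that mechanism, the toy example below shows a manager that forwards step events to registered handlers. All names here (`StepCollector`, `ToyCallbackManager`, `on_step`, `emit`) are hypothetical and are not the Llama Index or Chainlit API. The real handlers receive richer event data and send it to the UI instead of storing it in a list.

```python
class StepCollector:
    """Hypothetical handler that records each intermediate step it is told about."""

    def __init__(self):
        self.steps = []

    def on_step(self, name, detail):
        # A UI-facing handler would send this to the front end instead of storing it.
        self.steps.append(f"{name}: {detail}")


class ToyCallbackManager:
    """Hypothetical manager that fans each event out to every registered handler."""

    def __init__(self, handlers):
        self.handlers = list(handlers)

    def emit(self, name, detail):
        for handler in self.handlers:
            handler.on_step(name, detail)


def answer(question, callbacks):
    # Each stage reports to the manager before moving on.
    callbacks.emit("retrieve", f"looking up context for {question!r}")
    callbacks.emit("llm", "generating answer")
    return "done"
```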

Step 4: Launch the Application

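To kick off your LLM app, open a terminal, navigate to the directory containing your script, and run the following command, replacing <your_script>.py with the name of the file you created in Step 2:

chainlit run <your_script>.py -w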

The -w flag enables auto-reloading so that you don’t have to restart the server each time you modify your application. Your chatbot UI should now be accessible at http://localhost:8000.
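To confirm that the server came up, one option is a plain TCP check against the port mentioned above. This is a minimal sketch using only the standard library. `is_listening` is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that it is Chainlit.

```python
import socket


def is_listening(host="localhost", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False
```

Calling `is_listening()` with the defaults checks localhost on port 8000.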