Llama Index
In this tutorial, we will guide you through the steps to create a Chainlit application integrated with Llama Index.

Preview of the app you'll build
Prerequisites
Before diving in, ensure that the following prerequisites are met:
- A working installation of Chainlit
- The Llama Index package installed
- An OpenAI API key
- A basic understanding of Python programming
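If you still need to install the packages, one possible setup is shown below. Note that the script in this tutorial uses the older (pre-0.10) llama_index import paths together with langchain's ChatOpenAI, so you may need to pin versions compatible with those imports; the exact packages and key value here are placeholders for your own environment.

pip install chainlit llama-index langchain openai
export OPENAI_API_KEY="your-api-key"  # make the key available to the app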
Step 1: Set Up Your Data Directory
Create a folder named data in the root of your app folder. Download the state of the union file (or any files of your own choice) and place it in the data folder.
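For reference, assuming you keep a file name such as state_of_the_union.txt (any text files placed in data will be picked up later by SimpleDirectoryReader), your project should look roughly like this once app.py is created in the next step:

your-app/
├── app.py
└── data/
    └── state_of_the_union.txt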
Step 2: Create the Python Script
Create a new Python file named app.py in your project directory. This file will contain the main logic for your LLM application.
Step 3: Write the Application Logic
In app.py, import the necessary packages and define one function to handle a new chat session and another to handle incoming messages from the UI. In this tutorial, we are going to use RetrieverQueryEngine. Here’s the basic structure of the script:
import os

import openai
from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine
from llama_index.callbacks.base import CallbackManager
from llama_index import (
    LLMPredictor,
    ServiceContext,
    StorageContext,
    load_index_from_storage,
)
from langchain.chat_models import ChatOpenAI

import chainlit as cl

openai.api_key = os.environ.get("OPENAI_API_KEY")

try:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    # load index
    index = load_index_from_storage(storage_context)
except Exception:
    # first run: build the index from the documents in ./data and persist it
    from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

    documents = SimpleDirectoryReader("./data").load_data()
    index = GPTVectorStoreIndex.from_documents(documents)
    index.storage_context.persist()


@cl.on_chat_start
async def factory():
    # create a streaming query engine for each new chat session
    llm_predictor = LLMPredictor(
        llm=ChatOpenAI(
            temperature=0,
            model_name="gpt-3.5-turbo",
            streaming=True,
        ),
    )
    service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor,
        chunk_size=512,
        callback_manager=CallbackManager([cl.LlamaIndexCallbackHandler()]),
    )

    query_engine = index.as_query_engine(
        service_context=service_context,
        streaming=True,
    )

    cl.user_session.set("query_engine", query_engine)


@cl.on_message
async def main(message):
    # retrieve the query engine created for this session and run the query
    query_engine = cl.user_session.get("query_engine")  # type: RetrieverQueryEngine
    response = await cl.make_async(query_engine.query)(message)

    # stream the answer tokens back to the UI as they arrive
    response_message = cl.Message(content="")

    for token in response.response_gen:
        await response_message.stream_token(token=token)

    if response.response_txt:
        response_message.content = response.response_txt

    await response_message.send()
This code sets up an instance of RetrieverQueryEngine for each chat session. The RetrieverQueryEngine is invoked every time a user sends a message to generate the response. The callback handlers are responsible for listening to the intermediate steps and sending them to the UI.
Step 4: Launch the Application
To kick off your LLM app, open a terminal, navigate to the directory containing app.py, and run the following command:
chainlit run app.py -w
The -w flag enables auto-reloading so that you don’t have to restart the server each time you modify your application. Your chatbot UI should now be accessible at http://localhost:8000.
What’s Next?
Congratulations! You’ve just created your first LLM app with Chainlit and Llama Index. From here, you can add elements and actions to create a more sophisticated app.
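For example, a natural first extension is to attach the retrieved source passage to each answer as a Chainlit Text element. The sketch below is one way to do that with the older llama_index response object (source_nodes and get_text are assumptions tied to that API and may differ in newer releases); it would replace the on_message handler from app.py:

import chainlit as cl

@cl.on_message
async def main(message):
    query_engine = cl.user_session.get("query_engine")
    response = await cl.make_async(query_engine.query)(message)

    # stream the answer as before
    response_message = cl.Message(content="")
    for token in response.response_gen:
        await response_message.stream_token(token=token)

    # attach the first retrieved chunk as an inline text element
    if response.source_nodes:
        response_message.elements = [
            cl.Text(
                name="source",
                content=response.source_nodes[0].node.get_text(),
                display="inline",
            )
        ]

    await response_message.send()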
Happy coding! 🎉