If streaming is enabled at the LLM level, LangChain will only stream the intermediate steps. You can enable final answer streaming by passing `stream_final_answer=True` to the callback handler.
```python
import chainlit as cl

# Optionally, you can also pass the prefix tokens that will be used
# to identify the final answer
answer_prefix_tokens = ["FINAL", "ANSWER"]

cl.LangchainCallbackHandler(
    stream_final_answer=True,
    answer_prefix_tokens=answer_prefix_tokens,
)
```
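For context, here is a minimal sketch of the handler wired into a LangChain agent inside a Chainlit app. The agent setup (`OpenAI`, `load_tools`, `AgentType.ZERO_SHOT_REACT_DESCRIPTION`) and the use of `cl.user_session` and `cl.make_async` are illustrative assumptions, not the only supported configuration; only `cl.LangchainCallbackHandler` and its arguments come from the snippet above.

```python
import chainlit as cl
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI


@cl.on_chat_start
def start():
    # Streaming must be enabled at the LLM level for tokens to reach
    # the callback handler.
    llm = OpenAI(temperature=0, streaming=True)
    tools = load_tools(["llm-math"], llm=llm)
    agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
    cl.user_session.set("agent", agent)


@cl.on_message
async def main(message: cl.Message):
    agent = cl.user_session.get("agent")

    # The handler watches the token stream and starts streaming the
    # message to the UI once the answer prefix is detected. ["Final",
    # "Answer"] matches the "Final Answer:" pattern of the ReAct prompt.
    cb = cl.LangchainCallbackHandler(
        stream_final_answer=True,
        answer_prefix_tokens=["Final", "Answer"],
    )
    await cl.make_async(agent.run)(message.content, callbacks=[cb])
```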
Final answer streaming only works with prompts that have a consistent final answer pattern. It also does not work with `AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION`.