Build a ChatGPT Clone with LangChain and OpenAI in 5 Steps
Build your own ChatGPT clone with LangChain, OpenAI, and Streamlit in five steps. Create a conversational AI with memory and real-time streaming.
Last updated: May 15, 2026
You will build a ChatGPT clone using LangChain for memory and streaming, OpenAI for the model, and Streamlit for the UI. The tutorial covers setup, conversation chain creation, and a real-time chat interface.
Have you ever wanted to create your own conversational AI assistant, complete with memory, streaming responses, and a polished chat interface? In this tutorial, you will build a ChatGPT clone from scratch using LangChain for orchestration, OpenAI for the language model, and Streamlit for the user interface. By the end, you will have a fully functional chatbot that maintains conversation context and streams responses in real time. This project is perfect for understanding the core components behind modern conversational agents and serves as a foundation for more advanced systems.
Prerequisites
Before you start, make sure you have the following:
- Python 3.9 or newer installed on your machine
- An OpenAI API key (set as an environment variable `OPENAI_API_KEY`)
- Basic familiarity with Python and async programming
- A terminal and a code editor
You will also need to install these Python packages:
pip install langchain langchain-openai streamlit python-dotenv

Architecture Overview
The system consists of three main layers:
- UI Layer (Streamlit): Handles user input, displays messages, and manages session state.
- Orchestration Layer (LangChain): Manages conversation memory, chains prompts, and streams responses.
- Model Layer (OpenAI): Generates replies using the GPT-4 or GPT-3.5 model.
The following diagram shows how these components interact during a single user query:
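As a complement to the diagram, the same request flow can be sketched in plain Python. This is a minimal illustration rather than the tutorial's implementation: `fake_model`, `Orchestrator`, and `run_turn` are hypothetical names, and the fake model simply echoes the input so the example runs without an API key.

```python
from typing import Callable, Iterator


# Model layer (stand-in): yields a reply token by token, like a streaming LLM.
def fake_model(messages: list) -> Iterator[str]:
    last_user = messages[-1]["content"]
    for token in ["You", " said:", " ", last_user]:
        yield token


# Orchestration layer: keeps the history and forwards each token to the UI.
class Orchestrator:
    def __init__(self, model):
        self.model = model
        self.history = []

    def ask(self, user_input: str, on_token: Callable[[str], None]) -> str:
        self.history.append({"role": "user", "content": user_input})
        reply = ""
        for token in self.model(self.history):
            reply += token
            on_token(token)  # the UI layer decides how to render it
        self.history.append({"role": "assistant", "content": reply})
        return reply


# UI layer (stand-in): collects tokens the way a chat widget would render them.
def run_turn(orchestrator: Orchestrator, user_input: str):
    rendered = []
    reply = orchestrator.ask(user_input, rendered.append)
    return reply, rendered


orchestrator = Orchestrator(fake_model)
reply, rendered = run_turn(orchestrator, "hi")
```

In the real application, Streamlit plays the role of `run_turn`, LangChain the role of `Orchestrator`, and the OpenAI API the role of `fake_model`.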
Step-by-Step Implementation
Step 1: Set Up Environment Variables
Create a .env file in your project root and add your OpenAI API key:
OPENAI_API_KEY=sk-your-key-here
Then create a file named chatbot.py and load the environment variables at the top:
import os

from dotenv import load_dotenv

# Read variables from .env into the process environment.
load_dotenv()

openai_api_key = os.getenv("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OPENAI_API_KEY is not set; add it to .env or your environment")

Step 2: Create the LangChain Chain with Memory
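LangChain provides ConversationBufferMemory and ConversationChain, which handle prompt history automatically. Recent LangChain releases mark both classes as deprecated, so you may see deprecation warnings; this tutorial assumes a version that still provides them. We will configure the chain with streaming enabled: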
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.7,
    streaming=True,  # emit tokens as they are generated
    openai_api_key=openai_api_key,
)

# Buffer memory keeps the full message history and feeds it back into each prompt.
memory = ConversationBufferMemory(return_messages=True)

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False,
)

Notice that we set `streaming=True` on the LLM. This lets us receive tokens one at a time instead of waiting for the full response. Setting `verbose=False` keeps the console clean.
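To make the role of the memory object more concrete, here is a rough sketch of what a buffer-style memory does conceptually: it stores every turn verbatim and replays all of them in front of the next input. `BufferMemorySketch` and its methods are hypothetical names for illustration only, and the prompt layout is an assumption, not LangChain's exact template.

```python
class BufferMemorySketch:
    """Hypothetical stand-in for a buffer memory: stores every turn verbatim."""

    def __init__(self):
        self.messages = []

    def save_turn(self, user_text, ai_text):
        # Each exchange adds two entries; nothing is ever dropped or summarized.
        self.messages.append(("Human", user_text))
        self.messages.append(("AI", ai_text))

    def as_prompt(self, new_input):
        # Replay the whole history, then append the new input for the model.
        lines = [f"{role}: {text}" for role, text in self.messages]
        lines.append(f"Human: {new_input}")
        lines.append("AI:")
        return "\n".join(lines)


sketch = BufferMemorySketch()
sketch.save_turn("My name is Ada.", "Nice to meet you, Ada.")
prompt_text = sketch.as_prompt("What is my name?")
```

Because nothing is discarded, the prompt grows with every turn, which is worth keeping in mind for long conversations and token costs.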
Step 3: Build the Streamlit User Interface
Streamlit makes it easy to create a chat interface. We will use session state to store the conversation history and a callback to handle streaming:
import streamlit as st
from langchain.callbacks.base import BaseCallbackHandler


class StreamHandler(BaseCallbackHandler):
    """Render the partial response each time a new token arrives."""

    def __init__(self, container, initial_text=""):
        self.container = container
        self.text = initial_text

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token
        self.container.markdown(self.text)


st.set_page_config(page_title="ChatGPT Clone", page_icon="🤖")
st.title("ChatGPT Clone with LangChain")

# Streamlit reruns the script on every interaction, so keep the history
# and the chain in session state instead of rebuilding them each time.
if "messages" not in st.session_state:
    st.session_state.messages = []
if "conversation" not in st.session_state:
    st.session_state.conversation = conversation

# Replay the stored history.
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Type your message..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        stream_handler = StreamHandler(st.empty())
        response = st.session_state.conversation.predict(
            input=prompt, callbacks=[stream_handler]
        )
    st.session_state.messages.append({"role": "assistant", "content": response})

Key points: The `StreamHandler` callback re-renders the UI container each time a new token arrives, so the reply appears while it is still being generated. The displayed history lives in `st.session_state.messages`, and the chain itself is kept in `st.session_state.conversation`, so both survive Streamlit's reruns.
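The token-accumulation logic in the callback can be tried without Streamlit or an API key. The sketch below mirrors the shape of `StreamHandler` but drops the LangChain base class; `RecordingContainer` and `StreamHandlerSketch` are hypothetical stand-ins, and the container records each render instead of drawing it.

```python
class RecordingContainer:
    """Hypothetical stand-in for the placeholder returned by st.empty()."""

    def __init__(self):
        self.renders = []

    def markdown(self, text):
        # A real placeholder replaces its content; here each render is recorded.
        self.renders.append(text)


class StreamHandlerSketch:
    """Same shape as StreamHandler, minus the LangChain base class."""

    def __init__(self, container, initial_text=""):
        self.container = container
        self.text = initial_text

    def on_llm_new_token(self, token, **kwargs):
        # Append the new token and re-render the full text so far.
        self.text += token
        self.container.markdown(self.text)


container = RecordingContainer()
handler = StreamHandlerSketch(container)
for token in ["Hel", "lo", "!"]:
    handler.on_llm_new_token(token)
```

Each call re-renders the whole accumulated string rather than only the new token, which is why the handler keeps its own `text` buffer.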
Step 4: Run the Application
Save the file and run it from the terminal:
streamlit run chatbot.py

Your browser will open at http://localhost:8501. You can now chat with your clone. Type a message and watch the response stream in.
Step 5: Add Conversation Persistence (Optional)
By default, memory resets when you refresh the page. To persist conversations across sessions, you can save the memory to a file or database. Here is a simple JSON-based approach:
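import json

from langchain_core.messages import messages_from_dict, messages_to_dict


def save_memory(memory, filepath="memory.json"):
    # Message objects are not JSON-serializable, so convert them to dicts first.
    data = {"history": messages_to_dict(memory.chat_memory.messages)}
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f)


def load_memory(memory, filepath="memory.json"):
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        return  # nothing saved yet
    # Rebuild message objects from the stored dicts.
    memory.chat_memory.messages = messages_from_dict(data["history"])

Call `load_memory(memory)` once at startup, before the chain handles its first message, and `save_memory(memory)` after each response. The conversation then survives a page refresh or an app restart.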
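The helpers above cover the chain's memory, but the transcript that Step 3 renders from `st.session_state.messages` is a separate list of plain dictionaries. One possible way to persist it as well is sketched below; `save_transcript`, `load_transcript`, and the `transcript.json` file name are hypothetical and not part of the original tutorial.

```python
import json
from pathlib import Path


def save_transcript(messages, filepath="transcript.json"):
    # The UI history is a list of {"role": ..., "content": ...} dicts,
    # so it can be written to JSON directly.
    text = json.dumps(messages, ensure_ascii=False, indent=2)
    Path(filepath).write_text(text, encoding="utf-8")


def load_transcript(filepath="transcript.json"):
    path = Path(filepath)
    if not path.exists():
        return []  # first run: start with an empty transcript
    return json.loads(path.read_text(encoding="utf-8"))
```

In the app, the loaded list could be used as the initial value of `st.session_state.messages`, with `save_transcript` called after each assistant reply.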
Common Pitfalls
- API key errors: Make sure the `.env` file is in the same directory as your script and that the variable is named exactly `OPENAI_API_KEY`. Restart Streamlit after changing the file.
- Streaming not working: Verify that `streaming=True` is set on the `ChatOpenAI` instance and that you pass the `callbacks` list to `predict()`. If you call `invoke()` instead without passing the callbacks (for example through its `config` argument), the handler never receives tokens.
- Memory not persisting: Streamlit reruns the script on every interaction. Store objects such as the `conversation` chain in `st.session_state`; otherwise a new chain is created on each rerun and its memory is lost.
- Rate limiting: OpenAI applies rate limits, which are tighter on free and low-tier accounts. If you get 429 errors, retry after a short delay or fall back to a lighter model such as `gpt-3.5-turbo`.
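For the rate-limiting case, a common pattern is to retry with exponentially growing delays. The sketch below is a generic illustration under stated assumptions: `RateLimitedError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and `call_with_backoff` is not a LangChain or OpenAI function.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 exception."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitedError with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitedError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt, plus jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

In the chatbot, `func` could be a small lambda wrapping the `predict()` call. Note that a retried call restarts the streamed reply, so the stream handler's text should be reset before each attempt.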
Next Steps
You now have a working ChatGPT clone with streaming and memory. To take it further, consider adding retrieval-augmented generation (RAG) so your chatbot can answer questions based on your own documents. You could also switch to an open-source model like LLaMA 3 running locally via Ollama to avoid API costs. Another improvement is to add a system prompt that gives your chatbot a specific personality or role. The LangChain documentation is an excellent resource for exploring these extensions.
Frequently Asked Questions
Do I need a paid OpenAI account?
You need an OpenAI API key, and API usage is billed per token. If your account includes free trial credits, they are enough to follow this tutorial. Alternatively, you can swap the model for a local one like LLaMA 3 via Ollama.
Can I use a different LLM provider?
Absolutely. LangChain supports many providers. Replace `ChatOpenAI` with `ChatAnthropic`, `ChatGoogleGenerativeAI`, or a local model via `ChatOllama`, and install the matching integration package. The rest of the code remains largely unchanged.
How do I clear the conversation memory?
Calling `memory.clear()` resets the chain's memory. In the Streamlit UI, add a button that calls this method and also empties `st.session_state.messages`, so the displayed transcript is cleared too. Alternatively, restart the Streamlit app by pressing Ctrl+C and running it again.
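A reset helper along these lines could back such a button. This is a sketch under assumptions: `reset_conversation` is a hypothetical name, `FakeMemory` only imitates the `clear()` method of a LangChain memory object, and a plain dict stands in for `st.session_state`.

```python
def reset_conversation(session_state, memory):
    """Clear both the rendered transcript and the chain's memory."""
    session_state["messages"] = []
    memory.clear()


class FakeMemory:
    """Hypothetical stand-in exposing the same clear() method as the real memory."""

    def __init__(self):
        self.turns = ["earlier turn"]

    def clear(self):
        self.turns = []


state = {"messages": [{"role": "user", "content": "hi"}]}
fake_memory = FakeMemory()
reset_conversation(state, fake_memory)
```

In the app, this could be wired up with something like `if st.button("Clear conversation"): reset_conversation(st.session_state, memory)`, passing whichever memory object the running chain actually uses.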
Why is my response not streaming?
Make sure you set `streaming=True` on the LLM object and pass `callbacks=[stream_handler]` to the `predict()` method. Also verify that you are using a model that supports streaming, such as GPT-3.5 or GPT-4.