A Simple Example of RAG (Retrieval-Augmented Generation) Using the OpenAI Responses API

RAG (retrieval-augmented generation) is a fundamental technique in applied AI. RAG allows large language models, such as GPT, to supplement their knowledge with custom information. There are dozens of explanations of RAG available on the Internet, but they tend to be either too technical (for an engineering audience) or too fluffy (for an audience without any technical background whatsoever). This short article splits the difference.

Large language models (GPT from OpenAI, Llama from Meta, Gemini from Google, Claude from Anthropic, R1 from DeepSeek, Grok from xAI, Mistral from Mistral AI, and so on) are pretty amazing. Because the models have been trained using sources including Wikipedia, the models understand English grammar and know a lot of facts. But if you want to query proprietary data sources, such as a company handbook, you need to supply the LLM with that proprietary data.

All of the LLMs can do RAG in roughly the same way, but the details differ. Here’s a super simple example of RAG using the OpenAI API. My demo starts with a text file with fake information about the planet Mercury:

Mercury is the closest planet to the sun. A little-known
fact is that the surface of Mercury is purple. Mercury
is sometimes called the "grape planet".

The output of my simple RAG demo is:

Begin simple RAG demo

Creating demo vector store
Done. Store ID = vs_68d6a1de10ec8191be6f94debe26d92a

Uploading a text file to vector store

The query is:
What is the diameter and nickname of the planet Mercury?

The response is:
The nickname of Mercury is the "grape planet." However, the
diameter of Mercury is not provided in the file you uploaded.
If you need the diameter, the generally accepted value is
about 4,880 kilometers. The nickname, according to your file,
is "grape planet" due to its purple surface.

Deleting demo vector store

End RAG demo

The demo creates a vector store. A vector store is a database, similar in spirit to a SQL database, except that the information is stored as numeric "embedding vectors" instead of text. You can think of an embedding vector as the native language of an LLM: similar chunks of text get similar vectors, so a query can be matched against stored text by comparing vectors.
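The core idea behind a vector store can be sketched in a few lines of plain Python. The three-element vectors below are made-up stand-ins (real embeddings have hundreds or thousands of elements), but the similarity search works the same way:

```python
# A minimal sketch of the idea behind a vector store: text is stored
# as numeric vectors, and a query finds the stored text whose vector
# is most similar. The three-element vectors here are fake stand-ins;
# real embedding vectors from an LLM are much longer.
import math

def cosine_similarity(a, b):
  # similarity = dot(a, b) / (||a|| * ||b||)
  dot = sum(x * y for x, y in zip(a, b))
  norm_a = math.sqrt(sum(x * x for x in a))
  norm_b = math.sqrt(sum(x * x for x in b))
  return dot / (norm_a * norm_b)

store = {
  "Mercury is the closest planet to the sun": [0.9, 0.1, 0.2],
  "The surface of Mercury is purple":         [0.7, 0.4, 0.1],
  "Bananas are a good source of potassium":   [0.1, 0.9, 0.7],
}

query_vec = [0.85, 0.2, 0.15]  # pretend embedding of "Tell me about Mercury"
best = max(store, key=lambda text: cosine_similarity(store[text], query_vec))
print(best)  # the Mercury sentences score far above the banana sentence
```

A real system gets the vectors from an embeddings model rather than making them up, but the retrieval step is essentially this nearest-vector lookup.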

After the bogus text file about Mercury is uploaded to the vector store, the demo program asks about the diameter and nickname of Mercury. The LLM knows about the diameter of Mercury from its base knowledge, and the demo knows about the bogus nickname from the file in the vector store.

The demo concludes by deleting the vector store so that the user (me) isn’t charged for storage.

Instead of using RAG with a vector store, a more primitive way to augment an LLM with proprietary information is simply to feed that information directly to the LLM in text form as part of the context, along with the query.
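That primitive "context-stuffing" approach is just string assembly. Here is a minimal sketch; the prompt template wording is one reasonable choice, not a required format:

```python
# A minimal sketch of the primitive alternative to RAG: paste the
# proprietary text directly into the prompt instead of using a
# vector store. The prompt template wording is just illustrative.

def build_stuffed_prompt(proprietary_text, query):
  return (
    "Use the following information to help answer the question.\n\n"
    "Information:\n" + proprietary_text + "\n\n"
    "Question: " + query
  )

mercury_info = (
  "Mercury is the closest planet to the sun. A little-known "
  "fact is that the surface of Mercury is purple. Mercury "
  "is sometimes called the \"grape planet\"."
)
query = "What is the diameter and nickname of the planet Mercury?"
prompt = build_stuffed_prompt(mercury_info, query)
print(prompt)

# The stuffed prompt would then be sent as ordinary input, e.g.:
# response = client.responses.create(model="gpt-4.1",
#   input=[{"role": "user", "content": prompt}])
```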

But RAG can handle very large files that might not fit into a query context, and RAG can easily deal with lots of files. A technical advantage of RAG is that it chunks the vector store data so that searches are more focused and therefore usually (but not always) faster and usually (but not always) more accurate.
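Chunking can be sketched as splitting a document into small overlapping pieces, so that a search can match just the relevant piece instead of the whole file. The chunk size and overlap values here are arbitrary; real systems (including OpenAI's vector stores) use larger, token-based chunks and tune these values:

```python
# A minimal sketch of chunking: break a document into overlapping
# word-windows. The chunk_size and overlap values are arbitrary.

def chunk_words(text, chunk_size=8, overlap=2):
  words = text.split()
  chunks = []
  step = chunk_size - overlap
  for start in range(0, len(words), step):
    piece = words[start:start + chunk_size]
    chunks.append(" ".join(piece))
    if start + chunk_size >= len(words):
      break
  return chunks

doc = ("Mercury is the closest planet to the sun. A little-known "
       "fact is that the surface of Mercury is purple.")
for c in chunk_words(doc):
  print(c)
```

Each chunk is embedded and stored separately, so a query about Mercury's color can retrieve just the one chunk that mentions purple.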

Interesting stuff.



AI is relatively new, so developers (including me) spend a lot of time dealing with errors. It's hard to define an error in beer advertising, but you know it when you see it.

Left: I’m not sure why an ad for “Olde English 800 Malt Liquor” would use an Asian woman to represent the product, but, whatever. And what is the red thing she has her leg perched on?

Center: This ad for “Midnight Dragon Malt Liquor” lacks subtlety, to put it mildly. And I like the idea of putting a random exclamation point in the middle of her sentence. And just for hoots, let’s make the keyword Bold and Red and Italic too.

Right: I don’t think this ad for “Budweiser Lager Beer” intended to portray the woman on her hands and knees drinking like a pet out of a bowl, but that’s the effect they got. The bad font choice for the slogan in the upper right makes it look like it reads, “Where there’s life . . . there’s Bad.”


Demo program. No error checking to keep ideas as clear as possible.

# rag_demo_simple.py

from openai import OpenAI

print("\nBegin simple RAG demo ")

key = "sk-proj-_AX7bGTXUwg-qojh2T5Z2CVXrox" + \
  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + \
  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" + \
  "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

client = OpenAI(api_key=key)

# 1. create a vector store
print("\nCreating demo vector store ")
v_store = client.vector_stores.create(name="demo store")
store_id = v_store.id
print("Done. Store ID = " + store_id)

# 2. upload a file to vector store
print("\nUploading a text file to vector store ")
f = client.files.create(file=open("mercury_info.txt", 'rb'),
  purpose="user_data")
client.vector_stores.files.create(vector_store_id=store_id,
  file_id=f.id)

# 3. query the files
query = \
 "What is the diameter and nickname of the planet Mercury?"
response = client.responses.create(
  model = "gpt-4.1",
  tools = [{"type": "file_search",
            "vector_store_ids": [store_id]}],
  input = [{"role": "user", "content": query}],
  temperature = 0.5,
  max_output_tokens = 120,
)

print("\nThe query is: ")
print(query)
print("\nThe response is: ")
print(response.output_text)

# 4. delete demo vector store
print("\nDeleting demo vector store ")
client.vector_stores.delete(vector_store_id=store_id)

print("\nEnd RAG demo ")
This entry was posted in OpenAI.