Working With OpenAI Vector Stores

I was investigating retrieval-augmented generation (RAG) systems using the OpenAI Responses API. In a nutshell, a RAG system lets a query draw on information from specific proprietary documents, such as a company handbook. Until recently, implementing RAG took a lot of work. The new Responses API dramatically simplifies creating a RAG system, but the process is still not trivial.

The key lines of code in a Responses API program look like:

query = "What is our Acme Company vacation policy?"
# vector_store_details holds info about an existing vector store
response = client.responses.create(
  input=query,
  model="gpt-4o-mini",
  tools=[{
    "type": "file_search",
    "vector_store_ids": [vector_store_details['id']],
  }]
)
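
The call above returns a response object, and the generated answer still has to be read from it. Here is a minimal sketch of one way to wrap the call, assuming the Python SDK's output_text convenience property. The function name ask_with_file_search and its parameters are my own placeholders, and the client is passed in rather than read from global scope.

```python
def ask_with_file_search(client, query, vector_store_id,
                         model="gpt-4o-mini"):
    """Send one query with the file_search tool enabled and
    return the answer as plain text."""
    response = client.responses.create(
        input=query,
        model=model,
        tools=[{
            "type": "file_search",
            "vector_store_ids": [vector_store_id],
        }],
    )
    # output_text joins the text parts of the model output
    return response.output_text
```

With a real client (for example, client = OpenAI()) and the ID of an existing vector store, the returned string can simply be printed.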

You must have an existing vector store that is located on the OpenAI servers. A vector store holds embedded documents (the text has been split into chunks and converted into numeric vectors), as opposed to a SQL database, which holds structured data.
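
To make the idea of searching numeric vectors concrete, here is a toy sketch. It is only an illustration of the general idea and not how the OpenAI service works internally. The three-dimensional vectors and chunk names are made up; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # cosine of the angle between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# made-up "embeddings" for three document chunks
toy_store = {
    "vacation policy chunk": [0.9, 0.1, 0.0],
    "expense report chunk": [0.1, 0.9, 0.1],
    "dress code chunk": [0.0, 0.2, 0.9],
}

def nearest_chunk(query_vec, store):
    # return the name of the chunk most similar to the query
    return max(store,
               key=lambda name: cosine_similarity(query_vec, store[name]))

toy_query = [0.8, 0.2, 0.1]  # pretend embedding of a vacation question
print(nearest_chunk(toy_query, toy_store))  # vacation policy chunk
```

The point is that retrieval is by closeness of meaning (as captured by the vectors) rather than by exact match on a column value, which is what a SQL query would do.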

So, I knew that I had to more or less master how to work with OpenAI vector stores. Yesterday, I explored creating and deleting a vector store, and I quickly learned that there were a lot of other topics I needed to learn. The OpenAI documentation at platform.openai.com/docs/api-reference/vector_stores/create is not great, but it’s pretty good, especially compared to the documentation for other LLM systems such as Meta Llama, Google Gemini, and Anthropic Claude.

Walking through the documentation was interesting but frustrating in the sense that every time I learned something new, I realized there were five other new things to learn. Anyway, I wrote some functions to:

* create a vector store
* delete a vector store
* upload a single document to a vector store
* upload multiple documents from a directory
* list the IDs of existing vector stores

Many of my functions are just relatively simple wrappers over an API call. Some of my demo code:

from openai import OpenAI
import os

def my_create_vector_store(store_name):
  # assumes a global-scope client object
  try:
    v_store = client.vector_stores.create(name=store_name)
    store_info = {
      "id": v_store.id,
      "name": v_store.name,
      "created_at": v_store.created_at,
      "file_count": v_store.file_counts.completed
    }
    return store_info
  except Exception as e:
    print("FATAL in my_create_vector_store() " + str(e))
    return None

# -----------------------------------------------------------

def my_delete_vector_store(store_id):
  # assumes a global-scope client object
  # returns the deletion info object, or None on failure
  try:
    return client.vector_stores.delete(vector_store_id=store_id)
  except Exception as e:
    print("FATAL in my_delete_vector_store() " + str(e))
    return None
   
# -----------------------------------------------------------

def my_get_vector_store_ids():
  # assumes a global-scope client object
  # note: list() is paginated, so .data holds only the first page
  existing_stores = client.vector_stores.list()
  return [store.id for store in existing_stores.data]

# -----------------------------------------------------------

def my_upload_single_txt_file(file_path, store_id):
  # assumes a global-scope client object
  # file_path is full path + the file name with extension
  file_name = os.path.basename(file_path)
  try:
    # the with-block closes the file handle after the upload
    with open(file_path, 'rb') as f:
      file_res = client.files.create(file=f, purpose="user_data")
    # attach the uploaded file to the vector store
    attach_res = client.vector_stores.files.create(
      vector_store_id=store_id,
      file_id=file_res.id)
    return {"file": file_name, "status": "success"}
  except Exception as e:
    print("FATAL in my_upload_single_txt_file() " + str(e))
    return {"file": file_name, "status": "failed"}
   
# -----------------------------------------------------------

def my_upload_multiple_txt_files(store_id, src_dir):
  # assumes a global-scope client object
  # uploads all the files in src_dir to vector store
  try:
    for fn in os.listdir(src_dir):
      src_path = os.path.join(src_dir, fn)
      if not os.path.isfile(src_path):
        continue  # skip subdirectories
      print("uploading " + str(src_path))
      my_upload_single_txt_file(src_path, store_id)
  except Exception as e:
    print("FATAL in my_upload_multiple_txt_files() " + str(e))

# -----------------------------------------------------------
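
One of the "five other new things": attaching a file to a vector store only starts processing on the server, so a file is not necessarily searchable the moment the attach call returns. Here is a rough sketch of a polling helper, assuming the status values in_progress, completed, failed, and cancelled from the API reference. The function name wait_for_file_processing and its timeout parameters are my own placeholders, and the client is passed in rather than read from global scope.

```python
import time

def wait_for_file_processing(client, store_id, file_id,
                             timeout_s=60.0, poll_s=1.0):
    """Poll until the attached file leaves the 'in_progress' state.
    Returns the final status string, or 'timeout'."""
    deadline = time.monotonic() + timeout_s
    while True:
        vs_file = client.vector_stores.files.retrieve(
            vector_store_id=store_id, file_id=file_id)
        if vs_file.status != "in_progress":
            # "completed", "failed" or "cancelled"
            return vs_file.status
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_s)
```

The file_id argument is the ID returned by client.files.create(), so an upload function would need to hand that ID back to the caller to use a helper like this.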

This was just a beginning, but as the saying goes, every journey begins with a first step. Fascinating stuff.


