Category: AI

Customizing ChatGPT Output with OpenAI and VectorDB

Introduction

In recent years, OpenAI has revolutionized the field of natural language processing with its advanced language models like ChatGPT. These models excel at generating human-like text and engaging in conversations. However, sometimes we may want to customize the output to align it with specific reference data or tailor it to specific domains. In this blog post, we will explore how to leverage OpenAI and a VectorDB to achieve this level of customization.

Understanding OpenAI and VectorDB

OpenAI is a renowned organization at the forefront of artificial intelligence research. It has developed language models capable of generating coherent and contextually relevant text from given prompts. One such model is ChatGPT, which has been trained on vast amounts of diverse data to engage in interactive conversations.

A vector database (VectorDB), on the other hand, is a tool for indexing and retrieving documents by semantic similarity. Each document is converted into a vector embedding, and the similarity between a query's embedding and each document's embedding determines which documents are retrieved.
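To make the similarity calculation concrete, here is a minimal sketch of cosine similarity, the measure most vector stores use under the hood. The three-dimensional vectors are toy values for illustration; real embeddings from OpenAI's embedding models have many more dimensions (1536 for text-embedding-ada-002).

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for a document and a query.
doc = [0.2, 0.7, 0.1]
query = [0.25, 0.6, 0.05]
print(round(cosine_similarity(doc, query), 3))  # close to 1.0 = very similar
```

A score near 1.0 means the document and query point in almost the same direction in embedding space, which is why this pair would rank highly in a retrieval step.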

Using OpenAI and VectorDB Together

To illustrate how OpenAI and VectorDB work together, let's walk through the following sample code:

import os
import sys

import openai
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma

import constants

os.environ["OPENAI_API_KEY"] = constants.APIKEY
# Enable to save to disk & reuse the model (for repeated queries on the same data)
PERSIST = False

query = None
if len(sys.argv) > 1:
  query = sys.argv[1]

if PERSIST and os.path.exists("persist"):
  print("Reusing index...\n")
  vectorstore = Chroma(persist_directory="persist", embedding_function=OpenAIEmbeddings())
  index = VectorStoreIndexWrapper(vectorstore=vectorstore)
else:
  #loader = TextLoader("data/data.txt") # Use this line if you only need data.txt
  loader = DirectoryLoader("data/")
  if PERSIST:
    index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory":"persist"}).from_loaders([loader])
  else:
    index = VectorstoreIndexCreator().from_loaders([loader])

# Combine the chat model with the retriever; k=1 returns only the single
# most similar document chunk for each query.
chain = ConversationalRetrievalChain.from_llm(
  llm=ChatOpenAI(model="gpt-3.5-turbo"),
  retriever=index.vectorstore.as_retriever(search_kwargs={"k": 1}),
)

chat_history = []
while True:
  if not query:
    query = input("Prompt: ")
  if query in ['quit', 'q', 'exit']:
    sys.exit()
  result = chain({"question": query, "chat_history": chat_history})
  print(result['answer'])

  chat_history.append((query, result['answer']))
  query = None

The code does the following:

  1. Setting up the environment:
    • The code imports the necessary libraries and sets the OpenAI API key.
    • The PERSIST variable determines whether to save and reuse the model or not.
  2. Loading and indexing the data:
    • The code loads the reference data using a TextLoader or DirectoryLoader, depending on the requirements.
    • If PERSIST is set to True, the code creates or reuses a VectorstoreIndexWrapper for efficient retrieval.
  3. Creating a ConversationalRetrievalChain:
    • The chain is initialized with a ChatOpenAI language model and the VectorDB index for retrieval.
    • This chain combines the power of OpenAI’s language model with the semantic similarity-based retrieval capabilities of VectorDB.
  4. Customizing the output:
    • The code sets up a chat history to keep track of previous interactions.
    • It enters a loop where the user can input prompts or queries.
    • The input is processed using the ConversationalRetrievalChain, which generates an appropriate response based on the given question and chat history.
    • The response is then displayed to the user.
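The steps above can be seen in isolation by stubbing out the chain. The sketch below replaces ConversationalRetrievalChain with a hypothetical fake_chain function so the chat-history bookkeeping is easy to follow without an API key:

```python
def fake_chain(inputs):
    # Stand-in for ConversationalRetrievalChain: echoes the question and
    # reports how many prior turns of history it received.
    turns = len(inputs["chat_history"])
    return {"answer": f"({turns} prior turns) You asked: {inputs['question']}"}

chat_history = []
for query in ["What is VectorDB?", "How does it use embeddings?"]:
    result = fake_chain({"question": query, "chat_history": chat_history})
    print(result["answer"])
    # Append the (question, answer) pair so the next turn has context.
    chat_history.append((query, result["answer"]))
```

Each turn receives the accumulated history, which is what lets the real chain resolve follow-up questions like "How does *it* use embeddings?" against earlier answers.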

Let's start the program and see what the output is:

Dangers

The dangers for apps and social media are evident here. By supplying their own data sources (VectorDBs), operators can massage OpenAI's output to align with a particular political party, compounding the polarising effect that social media and targeted advertising have already had on our culture. Many challenges lie ahead in protecting our language and cultural identity from such influences.

Opportunity

This will supercharge personalisation in online e-commerce. I am talking about a 2007 iPhone moment here. With very few changes to e-commerce architecture, you can have intelligent chatbots that understand a customer's context from their browsing history and order history alone. It will supercharge capabilities that usually require expensive subscriptions to tools like Zendesk, and Google Dialogflow-style bots will move into real, meaningful conversations on websites. A bot could remind me if I forgot to order an item I usually buy, or recommend cool events happening on the weekend based on my products and browsing patterns, with very little data ingestion!
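As a sketch of how that could plug into the pipeline above: browsing and order history just needs to be flattened into plain-text documents that the DirectoryLoader can ingest. The helper below is hypothetical (the `customer` structure and field names are illustrative, not from any real e-commerce API):

```python
def history_to_documents(customer):
    # Turn a customer's order and browsing history into plain-text lines
    # suitable for dropping into the data/ directory as .txt files.
    docs = []
    for item in customer.get("orders", []):
        docs.append(f"Customer ordered {item['name']} on {item['date']}.")
    for page in customer.get("browsing", []):
        docs.append(f"Customer viewed the {page} page.")
    return docs

customer = {
    "orders": [{"name": "espresso beans", "date": "2023-06-01"}],
    "browsing": ["grinders", "milk frothers"],
}
for line in history_to_documents(customer):
    print(line)
```

Once those lines are written into the data/ directory, the existing indexing code picks them up unchanged, and the chatbot can answer questions like "what did I order last month?" from the customer's own history.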

Conclusion

In this blog post, we explored how to leverage OpenAI and VectorDB to customize the output from ChatGPT. By combining the strengths of OpenAI’s language model with the semantic similarity-based retrieval of VectorDB, we can create more tailored and domain-specific responses. This allows us to align the output with specific reference data and achieve greater control over the generated text. The provided code snippet serves as a starting point for implementing this customization in your own projects. So, go ahead and experiment with OpenAI and VectorDB to unlock new possibilities in natural language processing.

Full source code can be downloaded from here:

https://github.com/Romiko/ai-sandbox/tree/master/open-ai

Tip: a ChatGPT subscription is NOT the same as an OpenAI API subscription. To run this, you will need an API key, and paid API billing if you have already used up your 3-month free trial quota.

You can set this all up, and configure usage rates and limits, here:

https://platform.openai.com/account/billing/limits

VSCode Tips – Anaconda Environments
launch.json

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "justMyCode": true,
      "cwd": "${fileDirname}"
    }
  ]
}

Set up the Anaconda Python interpreter:

Press Ctrl + Shift + P, type "Python: Select Interpreter", and pick an environment. I chose my Anaconda base environment for now; you can have many environments.

From here, you can debug in VSCode using Anaconda environments with ease.

Have Fun!