The Silent Killer: How Enabling OIDC on AKS Can Break Your Apps (Even If You Don’t Use Workload Identity Yet)

So, you’re doing the “right thing.” You’re preparing your AKS cluster for the future by enabling the OIDC Issuer and Workload Identity. You haven’t even migrated your apps to use Federated Identity yet—you’re still rocking the classic Azure Pod Identity (or just standard Service Accounts). No harm, no foul, right?

Wrong.

As soon as you flip the switch on OIDC, Kubernetes changes the fundamental way it treats Service Account tokens. If you have long-running batch jobs (like Airflow workers, Spark jobs, or long-polling sensors), you might be walking into a 401 Unauthorized trap.


The “Gotcha”: Token Lifespan

Before OIDC enablement, your pods likely used legacy tokens. These were static, long-lived (often valid for ~1 year), and lived as simple secrets. They were the “set it and forget it” of the auth world.


How do you know whether you are using the OIDC tokens? Inspect the token mounted in your containers at:

/var/run/secrets/kubernetes.io/serviceaccount/token

If the audience contains xyz.oic.<env>-aks.azure.com, it is the OIDC token, even though you have not implemented Workload Identity yet:

"aud": [
  "https://australiaeast.oic.prod-aks.azure.com/<tenantguid>/<guid>/"
]
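To check quickly from your workstation, decode the token's payload. A minimal sketch, assuming kubectl access and an illustrative deployment name (my-app):

# Dump the mounted token and decode the JWT payload to inspect the audience
kubectl exec deploy/my-app -- cat /var/run/secrets/kubernetes.io/serviceaccount/token \
  | cut -d '.' -f2 \
  | python3 -c "import base64, json, sys; p = sys.stdin.read().strip(); print(json.loads(base64.urlsafe_b64decode(p + '=' * (-len(p) % 4)))['aud'])"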

The Moment You Enable OIDC/Workload Identity: AKS shifts to Bound Projected Tokens. These are significantly more secure but come with a strict catch: The default expiration is 1 hour (3600 seconds).

If your app starts a session and doesn’t explicitly refresh that token, it will expire 60 minutes later. For a 4-hour batch job or a persistent sensor, this means your app will work perfectly… until it suddenly doesn’t.


Why It’s Sneaky

  • Azure Identity Still Works: Your connection to Key Vault or Storage via Pod Identity stays up.
  • The K8s API Fails: Only the calls within the cluster (like checking the status of another pod or a SparkApplication CRD) start throwing 401s.
  • It’s a Time Bomb: Everything looks fine in your 10-minute dev test. The failure only triggers in Production when the job hits the 61st minute, or the token expires mid-process.

The Quick Fix: The 24-Hour Band-Aid

If you aren’t ready to refactor your code to handle token rotation (which is the “real” fix), you can manually override the token lifespan using a Projected Volume in your Deployment or StatefulSet.

By mounting a custom token, you can extend that 1-hour window to something more batch-friendly, like 24 hours.

The Workaround YAML

You need to disable the automatic token mount and provide your own via volumes and volumeMounts.

# 1. Disable the default automount on the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: your-app-sa
automountServiceAccountToken: false
---
# In the Deployment/StatefulSet pod template spec:
spec:
  automountServiceAccountToken: false
  serviceAccountName: your-app-sa
  containers:
    - name: my-app
      volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: custom-token
          readOnly: true
  # 2. Project a token with a longer expiration
  volumes:
    - name: custom-token
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              # Match this to your cluster's OIDC issuer audience
              audience: https://australiaeast.oic.prod-aks.azure.com/YOUR-GUID/
              expirationSeconds: 86400 # 24 hours
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace
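Once the pods restart with this volume, it is worth confirming the new lifetime actually took effect (same illustrative deployment name and decode trick as above):

# exp minus iat should print 86400 (24 hours)
kubectl exec deploy/my-app -- cat /var/run/secrets/kubernetes.io/serviceaccount/token \
  | cut -d '.' -f2 \
  | python3 -c "import base64, json, sys; p = sys.stdin.read().strip(); c = json.loads(base64.urlsafe_b64decode(p + '=' * (-len(p) % 4))); print(c['exp'] - c['iat'])"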

The Long-Term Play

While the 24-hour token buys you time, it’s a temporary safety net. Microsoft and the Kubernetes community are pushing for shorter token lifespans (AKS 1.33+ will likely enforce this more strictly).

Your to-do list:

  1. Upgrade your SDKs: Modern Kubernetes clients (and Airflow providers) have built-in logic to reload tokens from the disk when they change.
  2. Avoid Persistent Clients: Instead of one long-lived client object, initialize the client inside your retry loops (see the sketch after this list).
  3. Go All In: Finish the migration to Azure Workload Identity and move away from Pod Identity entirely.
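As a sketch of item 2, here is the retry-loop pattern using the official kubernetes Python client (the pod and namespace names are illustrative, and the backoff is simplistic):

# Build the client inside the loop so each attempt picks up a freshly rotated token
import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException

def core_v1():
    config.load_incluster_config()  # re-reads the projected token file from disk
    return client.CoreV1Api()

for attempt in range(5):
    try:
        v1 = core_v1()  # fresh client, fresh token
        pod = v1.read_namespaced_pod(name="spark-driver", namespace="jobs")
        print(pod.status.phase)
        break
    except ApiException as e:
        if e.status != 401:
            raise
        time.sleep(2 ** attempt)  # back off, then retry with a new token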

Don’t let a security “improvement” become your next P1 incident. Check your batch job durations today!

TIP: Use the TokenReview API to test your tokens once you switch a cluster to OIDC.
https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-review-v1/
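A minimal sketch (deployment name illustrative): grab the token a pod is actually using and ask the API server whether it still accepts it.

TOKEN=$(kubectl exec deploy/my-app -- cat /var/run/secrets/kubernetes.io/serviceaccount/token)
kubectl create -o yaml -f - <<EOF
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: ${TOKEN}
EOF
# status.authenticated: true in the output means the API server still accepts the token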

See:
https://learn.microsoft.com/en-us/azure/aks/workload-identity-migrate-from-pod-identity – This article does not warn you that flicking the OIDC switch affects token behaviour.

https://kubernetes.io/docs/concepts/storage/projected-volumes/

https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/

Customizing ChatGPT Output with OpenAI and VectorDB

Introduction

In recent years, OpenAI has revolutionized the field of natural language processing with its advanced language models like ChatGPT. These models excel at generating human-like text and engaging in conversations. However, sometimes we may want to customize the output to align it with specific reference data or tailor it to specific domains. In this blog post, we will explore how to leverage OpenAI and a VectorDB to achieve this level of customization.

Understanding OpenAI and VectorDB: OpenAI is a renowned organization at the forefront of artificial intelligence research. They have developed language models capable of generating coherent and contextually relevant text based on given prompts. One such model is ChatGPT, which has been trained on vast amounts of diverse data to engage in interactive conversations.

VectorDB, on the other hand, is a powerful tool that enables the creation of indexes and retrieval mechanisms for documents based on semantic similarity. It leverages vector embeddings to calculate the similarity between documents and queries, facilitating efficient retrieval of relevant information.
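To make "semantic similarity" concrete before diving into the code, here is a minimal sketch using the pre-1.0 openai package (matching the era of the snippet below; the model name and sentences are illustrative):

import os

import numpy as np
import openai  # openai < 1.0, matching the imports in the snippet below

openai.api_key = os.environ["OPENAI_API_KEY"]

def embed(text):
    # Turn a piece of text into a vector embedding
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

doc = embed("Our store offers refunds within 30 days of purchase.")
query = embed("Can I get my money back?")
# Cosine similarity: closer to 1.0 means more semantically related
print(doc @ query / (np.linalg.norm(doc) * np.linalg.norm(query)))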

Using OpenAI and VectorDB Together: To illustrate the use of OpenAI and VectorDB together, let’s dive into the provided sample code snippet:

import os
import sys

import openai
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma

import constants

os.environ["OPENAI_API_KEY"] = constants.APIKEY
# Enable to save to disk & reuse the model (for repeated queries on the same data)
PERSIST = False

query = None
if len(sys.argv) > 1:
  query = sys.argv[1]

if PERSIST and os.path.exists("persist"):
  print("Reusing index...\n")
  vectorstore = Chroma(persist_directory="persist", embedding_function=OpenAIEmbeddings())
  index = VectorStoreIndexWrapper(vectorstore=vectorstore)
else:
  #loader = TextLoader("data/data.txt") # Use this line if you only need data.txt
  loader = DirectoryLoader("data/")
  if PERSIST:
    index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory":"persist"}).from_loaders([loader])
  else:
    index = VectorstoreIndexCreator().from_loaders([loader])

chain = ConversationalRetrievalChain.from_llm(
  llm=ChatOpenAI(model="gpt-3.5-turbo"),
  retriever=index.vectorstore.as_retriever(search_kwargs={"k": 1}),
)

chat_history = []
while True:
  if not query:
    query = input("Prompt: ")
  if query in ['quit', 'q', 'exit']:
    sys.exit()
  result = chain({"question": query, "chat_history": chat_history})
  print(result['answer'])

  chat_history.append((query, result['answer']))
  query = None
Let’s walk through the code:

  1. Setting up the environment:
    • The code imports the necessary libraries and sets the OpenAI API key.
    • The PERSIST variable determines whether to save and reuse the model or not.
  2. Loading and indexing the data:
    • The code loads the reference data using a TextLoader or DirectoryLoader, depending on the requirements.
    • If PERSIST is set to True, the code creates or reuses a VectorstoreIndexWrapper for efficient retrieval.
  3. Creating a ConversationalRetrievalChain:
    • The chain is initialized with a ChatOpenAI language model and the VectorDB index for retrieval.
    • This chain combines the power of OpenAI’s language model with the semantic similarity-based retrieval capabilities of VectorDB.
  4. Customizing the output:
    • The code sets up a chat history to keep track of previous interactions.
    • It enters a loop where the user can input prompts or queries.
    • The input is processed using the ConversationalRetrievalChain, which generates an appropriate response based on the given question and chat history.
    • The response is then displayed to the user.
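Before running it, note that the snippet expects a constants.py alongside it, plus a few packages. A minimal sketch (the exact package set is my assumption based on the imports above, from the LangChain 0.0.x era):

pip install langchain openai chromadb tiktoken unstructured

# constants.py — supplies the key imported by the snippet (use your own key)
APIKEY = "sk-..."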

Let’s start the program and see what the output is.

Dangers

The dangers for apps and social media are evident here. Utilising their own data sources (VectorDBs), the output of OpenAI can be massaged to align with a particular political party, contributing to the polarising effect social media and targeted advertising have already had on our culture. A lot of challenges lie ahead in protecting our language, cultural identity and influences.

Opportunity

This will supercharge personalisation in the online e-commerce space. I am talking about a 2007 iPhone moment here. With very few changes to e-commerce architecture, you can have intelligent chatbots that understand a customer’s context from their browsing and order history alone. It will supercharge capabilities that usually require expensive subscriptions to tools like Zendesk, and Google Dialogflow will move into real, meaningful conversations on websites. A chatbot could remind me if I forgot to order an item I usually buy, or recommend cool events happening on the weekend based on my products and browsing patterns, all with very little data ingestion!

Conclusion

In this blog post, we explored how to leverage OpenAI and VectorDB to customize the output from ChatGPT. By combining the strengths of OpenAI’s language model with the semantic similarity-based retrieval of VectorDB, we can create more tailored and domain-specific responses. This allows us to align the output with specific reference data and achieve greater control over the generated text. The provided code snippet serves as a starting point for implementing this customization in your own projects. So, go ahead and experiment with OpenAI and VectorDB to unlock new possibilities in natural language processing.

Full source code can be downloaded from here:

https://github.com/Romiko/ai-sandbox/tree/master/open-ai

Tip: An OpenAI (ChatGPT) subscription is NOT the same as an OpenAI API subscription. To run this, you will need an API key, and a paid API plan once you have used up your 3-month free trial quota.

You can set this all up, and make sure you configure usage rates and limits:

https://platform.openai.com/account/billing/limits

VSCode Tips – Anaconda Environments
launch.json

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "justMyCode": true,
      "cwd": "${fileDirname}"
    }
  ]
}

Setup Anaconda Python Interpreter:

Press Ctrl + Shift + P, type “Python: Select Interpreter”, and choose your Anaconda base environment for now. You can have many environments.
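If you need more than the base environment, a typical flow looks like this (environment name and packages are illustrative):

conda create -n ai-sandbox python=3.10
conda activate ai-sandbox
pip install langchain openai
# The new environment now appears in VSCode's "Python: Select Interpreter" list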

From here, you can debug in VSCode using Anaconda environments with ease.

Have Fun!