TL;DR: In late Aug–Sep 2025, Bitnami (Broadcom) shifted most free images off docker.io/bitnami, introduced a latest-only, dev-intended “bitnamisecure” subset, archived versioned tags to docker.io/bitnamilegacy (no updates), ran rolling brownouts of popular images, and said their OCI Helm charts on Docker Hub would stop receiving updates (except for the tiny free subset). Result: lots of teams saw pull failures and surprise drift, especially for core bits like kubectl, ExternalDNS, PostgreSQL; some Helm charts still referenced images that went missing mid-migration. GitHub+2hub.docker.com+2

What changed (and when)

Timeline. Bitnami announced the change for 28 Aug 2025, then postponed deletion of the public catalog to 29 Sep 2025, running three 24-hour brownouts to “raise awareness.” Brownout sets explicitly included external-dns (Aug 28) and kubectl, redis, postgresql, mongodb (Sep 17). Tags were later restored, except very old distro bases. GitHub
Free tier becomes “bitnamisecure/…” Available only as latest and “intended for development” (their wording). No version matrix. hub.docker.com+1
Legacy archive. Versioned tags moved to docker.io/bitnamilegacy—no updates, no support; meant only as a temporary bridge. GitHub+1
Charts. Source code stays on GitHub, but OCI charts on Docker Hub stop receiving updates (except the small free subset) and won’t work out-of-the-box unless you override image repos. Bitnami’s own FAQ shows helm upgrade … --set image.repository=bitnamilegacy/... as a short-term band-aid. GitHub

That mix of latest-only + brownouts + chart defaults still pointing at moved/blocked images is why so many clusters copped it, bru—especially anything depending on kubectl sidecars/hooks, ExternalDNS, or PostgreSQL images. GitHub

Why “latest-only, dev-intended” breaks production hygiene

Production needs immutability and pinning. “Latest” is mutable and can introduce breaking changes or CVE regressions without your staging gates ever seeing them. Bitnami explicitly positions these bitnamisecure/* freebies as development-only; if you need versions, you’re pointed to a paid catalog. That alone makes the free images not fit for prod, regardless of hardening claims. hub.docker.com

How clusters actually broke

Brownouts removed popular images for 24h windows. If your charts/Jobs still pulled from docker.io/bitnami, pods simply couldn’t pull. Next reconciliation loop? CrashLoop/back-off. GitHub
Chart/image mismatch. OCI charts remain published but aren’t updated to point at the new repos; unless you override every image.repository (and sometimes initContainer/metrics sidecars), you deploy a chart that references unavailable images. Bitnami’s own example shows how many fields you might need to override in something like PostgreSQL. GitHub
kubectl images. Lots of ops charts use a tiny kubectl image for hooks or jobs. When bitnami/kubectl went dark during brownouts, those jobs failed. Upstream alternatives exist (see below). hub.docker.com+1

Better defaults for core components (ditch the vendor lock)

Wherever possible, move back upstream for the chart and use official/community images:

ExternalDNS – Upstream chart & docs (Kubernetes SIGs): kubernetes-sigs/external-dns. Image: registry.k8s.io/external-dns/external-dns (pin a tag). GitHub+1
Velero – Upstream chart (VMware Tanzu Helm repo on Artifact Hub) and upstream images (pin). artifacthub.io
kubectl – Prefer upstream registry: registry.k8s.io hosts Kubernetes container images; several maintained images provide kubectl (or use distro images like alpine/kubectl/rancher/kubectl if they meet your standards—pin exact versions). GitHub+3Kubernetes+3GitHub+3

For stateful services:

PostgreSQL – Operators such as CloudNativePG (CNCF project). Alternatives include commercial operators; or, if you stick with straight images, use the official postgres image and manage via your own Helm/Kustomize. cloudnative-pg.io+1
MongoDB – Percona Operator for MongoDB (open-source) is a strong, widely used option. Percona Documentation+1
Redis – Consider the official redis image (or valkey where appropriate), plus a community operator if you need HA/cluster features; evaluate operator maturity and open issues for your SLA needs. (Context from Bitnami’s lists shows Redis/Valkey were part of the brownout sets.)

Questions Bitnami should answer publicly

Why ship a dev-only latest-only free tier for components that underpin production clusters, without a long freeze window and frictionless migration for chart defaults? (Their Docker Hub pages literally say latest-only and dev-intended.) hub.docker.com
Why brownouts of ubiquitous infra images (external-dns, kubectl, postgresql) during the migration window, increasing blast radius for unsuspecting teams? GitHub
Why leave OCI charts published but not updated to sane defaults (or at least yanking them) so new installs don’t reference unavailable registries by default?

Bitnami

Gain confidence, control and visibility of your software supply chain security with production-ready open source software delivered continuously in hardened images, with minimal CVEs and transparency you can trust.

We have lost confidence in your software supply chain.

TL;DR: Pin versions, set sane resources, respect system-node taints, make Gatekeeper happy, no-encoding secrets, and mirror images (Never pull from public registries and blindly trust them).

Works great on AKS, EKS, GKE — examples below use AKS.

The default dynakube template that Dynatrace provides you – will probably not work in the real world. You have zero trust, Calico firewalls, OPA Gatekeeper and perhaps some system pool taints?

Quick checks (healthy install):

dynatrace-operator Deployment is Ready
2x dynatrace-webhook pods
dynatrace-oneagent-csi-driver DaemonSet on every node (incl. system)
OneAgent pods per node (incl. system)
1x ActiveGate StatefulSet ready
Optional OTEL collector running if you enabled it

k get dynakube
NAME                  APIURL                                  STATUS    AGE
xxx-prd-xxxxxxxx      https://xxx.live.dynatrace.com/api   Running   13d

kubectl -n dynatrace get deploy,sts

# CSI & OneAgent on all nodes
kubectl -n dynatrace get ds

# Dynakube CR status
kubectl -n dynatrace get dynakube -o wide

# RBAC sanity for k8s monitoring
kubectl auth can-i list dynakubes.dynatrace.com \
  --as=system:serviceaccount:dynatrace:dynatrace-kubernetes-monitoring --all-namespaces

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dynatrace-operator   1/1     1            1           232d
deployment.apps/dynatrace-webhook    2/2     2            2           13d

NAME                                                  READY   AGE
statefulset.apps/xxx-prd-xxxxxxxxxxx-activegate       1/1     13d
statefulset.apps/xxx-prd-xxxxxxxxxxx-otel-collector   1/1     13d
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
xxx-prd-xxxxxxxxxxx-oneagent    9         9         9       9            9           <none>          13d
dynatrace-oneagent-csi-driver   9         9         9       9            9           <none>          13d
NAME                  APIURL                                  STATUS    AGE
xxx-prd-xxxxxxxxxxx   https://xxx.live.dynatrace.com/api   Running   13d
yes

Here are field-tested tips to keep Dynatrace humming on Kubernetes without fighting OPA Gatekeeper, seccomp, or AKS quirks.

1) Start with a clean Dynakube spec (and pin your versions)

Pin your operator chart/image and treat upgrades as real change (PRs, changelog, Argo sync-waves). A lean cloudNativeFullStack baseline that plays nicely with Gatekeeper:

apiVersion: dynatrace.com/v1beta5
kind: DynaKube
metadata:
  name: dynakube-main
  namespace: dynatrace
  labels:
    dynatrace.com/created-by: "dynatrace.kubernetes"
  annotations:
    # Helps Gatekeeper/PSA by ensuring init containers use a seccomp profile
    feature.dynatrace.com/init-container-seccomp-profile: "true"
    # GitOps safety
    argocd.argoproj.io/sync-wave: "5"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  apiUrl: https://<your-environment>.live.dynatrace.com/api
  metadataEnrichment:
    enabled: true

  oneAgent:
    hostGroup: PaaS_Development   # pick a sensible naming scheme: PaaS_<Env>
    cloudNativeFullStack:
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
          operator: Exists
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
          operator: Exists
        - key: "CriticalAddonsOnly"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      oneAgentResources:
        requests:
          cpu: 100m
          memory: 512Mi
        limits:
          cpu: 300m
          memory: 1.5Gi

  activeGate:
    capabilities: [routing, kubernetes-monitoring, debugging]
    resources:
      requests:
        cpu: 500m
        memory: 1.5Gi
      limits:
        cpu: 1000m
        memory: 1.5Gi

  logMonitoring: {}
  telemetryIngest:
    protocols: [jaeger, otlp, statsd, zipkin]
    serviceName: telemetry-ingest

  templates:
    otelCollector:
      imageRef:
        repository: <your-acr>.azurecr.io/dynatrace/dynatrace-otel-collector
        tag: latest

Why this works: it respects control-plane taints, adds the CriticalAddonsOnly toleration for system pools, sets reasonable resource bounds, and preps you for GitOps.

2) System node pools are sacred — add the toleration

If your CSI Driver or OneAgent skips system nodes, your visibility and injection can be patchy. Make sure you’ve got:

tolerations:
  - key: "CriticalAddonsOnly"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

Your taints might be different, so check what taints you have on your systempools. This is the difference between “almost there” and “golden”.

3) Resource requests that won’t sandbag the cluster

OneAgent: requests: cpu 100m / mem 512Mi and limits: cpu 300m / mem 1.5Gi are a good starting point for mixed workloads.
ActiveGate: requests: 500m / 1.5Gi, limits: 1000m / 1.5Gi.
Tune off SLOs and node shapes; don’t be shy to profile and trim.

4) Make Gatekeeper your mate (OPA policies that help, not hinder)

Enforce the seccomp hint on DynaKube CRs (so the operator sets profiles on init containers and your PSA/Gatekeeper policies stay green).

ConstraintTemplate (checks DynaKube annotations):

5) Secrets: avoid the dreaded encode (akv2k8s tip)

Kubernetes Secret.data is base64 on the wire, but tools like akv2k8s can also feed you values that are already base64. If using tools like akv2k8s, use this to transform the output.

apiVersion: spv.no/v1
kind: AzureKeyVaultSecret
metadata:
  name: dynatrace-api-token-akvs
  namespace: dynatrace
spec:
  vault:
    name: kv-xxx-001
    object:
      name: DynatraceApiToken
      type: secret
  output:
    transform:
      - base64decode
    secret:
      name: aks-xxx-001
      type: Opaque
      dataKey: apiToken
---
apiVersion: spv.no/v1
kind: AzureKeyVaultSecret
metadata:
  name: dynatrace-dataingest-token-akvs
  namespace: dynatrace
spec:
  vault:
    name: kv-xxx-001
    object:
      name: DynatraceDataIngestToken
      type: secret
  output:
    transform:
        - base64decode
    secret:
      name: aks-xxx-001
      type: Opaque
      dataKey: dataIngestToken

This will ensure Dynatrace can read the Kubernentes Opaque secret as it, no base64 encoding on the secret.

6) Mirror images to your registry (and pin)

Air-gapping or just speeding up pulls? Mirror dynatrace-operator, activegate, dynatrace-otel-collector into your ACR/ECR/GCR and reference them via the Dynakube templates.*.imageRef blocks or Helm values. GitOps + private registry = fewer surprises.

We use ACR Cache.

7) RBAC: fix the “list dynakubes permission is missing” warning

If you see that warning in the UI, verify the service account:

# https://docs.dynatrace.com/docs/ingest-from/setup-on-k8s/reference/security
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dynatrace-k8smon-extra-perms
rules:
  - apiGroups: ["dynatrace.com"]
    resources: ["dynakubes"]
    verbs: ["get","list","watch"]
  - apiGroups: [""]
    resources: ["configmaps","secrets"]
    verbs: ["get","list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dynatrace-k8smon-extra-perms
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: dynatrace-k8smon-extra-perms
subjects:
  - kind: ServiceAccount
    name: dynatrace-kubernetes-monitoring
    namespace: dynatrace

kubectl auth can-i list dynakubes.dynatrace.com \
–as=system:serviceaccount:dynatrace:dynatrace-kubernetes-monitoring –all-namespaces

If “no”, ensure the chart installed/updated the ClusterRole and ClusterRoleBinding that grant list/watch/get on dynakubes.dynatrace.com. Sometimes upgrading the operator or re-syncing RBAC via Helm/Argo cleans it up.

8) HostGroup naming that scales

Keep it boring and predictable:

PaaS_Development
PaaS_NonProduction
PaaS_Production

9) GitOps tricks (ArgoCD/Flux)

Use argocd.argoproj.io/sync-wave to ensure CRDs & operator land before Dynakube.
For major upgrades or URL/token churn:
1. kubectl -n dynatrace delete dynakube <name>
2. wait for operator cleanup
3. sync the new spec (Force + Prune if needed).

10) Networking & egress

If you restrict egress, either:

Allow ActiveGate to route traffic out and keep workload egress tight; or
Allowlist Dynatrace SaaS endpoints directly.
Don’t forget webhook call-backs and OTLP ports if you’re shipping traces/logs.

11) Troubleshooting you’ll actually use

OneAgent not injecting? Check the CSI Driver DaemonSet and the node you’re scheduling on. Make sure tolerations cover system pools.
Pods crash-loop with sidecar errors? Often token/secret issues — confirm you didn’t double-encode.
UI shows “permission missing”? Re-check RBAC and chart version; reconcile with Helm/Argo.
Gatekeeper blocking? Dry-run constraints first; add namespace/label-based exemptions for operator internals.

12) What “good” looks like

A healthy cluster shows:

dynatrace-operator 1/1
dynatrace-webhook 2/2
dynatrace-oneagent-csi-driver DESIRED == READY == node count
OneAgent pods present on all worker and system nodes
ActiveGate 1/1
Optional OTEL collector 1/1
…and dashboards populating within minutes.

That’s it — keep it simple, pin your bits, let Gatekeeper help (not hurt), and your Dynatrace setup will surf smooth swells instead of close-outs.

Other useful commands – hardcore diagnosis

kubectl exec -n dynatrace deployment/dynatrace-operator -- dynatrace-operator support-archive --stdout > operator-support-archive.zip

What the Dynatrace webhooks do on Kubernetes

When you install the Dynatrace Operator, you’ll see pods named something like dynatrace-webhook-xxxxx. They back one or more admission webhook configurations. In practice they do three big jobs:

Mutating Pods for OneAgent injection
- Adds init containers / volume mounts / env vars so your app Pods load the OneAgent bits that come from the CSI driver.
- Ensures the right binaries and libraries are available (e.g., via mounted volumes) and the process gets the proper preload/agent settings.
- Respects opt-in/opt-out annotations/labels on namespaces and Pods (e.g. dynatrace.com/inject: "false" to skip a Pod).
- Can also add Dynatrace metadata enrichment env/labels so the platform sees k8s context (workload, namespace, node, etc.).
Validating Dynatrace CRs (like DynaKube)
- Schema and consistency checks: catches bad combinations (e.g., missing fields, wrong mode), so you don’t admit a broken config.
- Helps avoid partial/failed rollouts by rejecting misconfigured specs early.
Hardening/compatibility tweaks
- With certain features enabled, the mutating webhook helps ensure injected init containers comply with cluster policies (e.g., seccomp, PSA/PSS).
- That’s why we recommend the annotation you’ve been using:
  feature.dynatrace.com/init-container-seccomp-profile: "true"
  It keeps Gatekeeper/PSA happy when it inspects the injected bits.

Why two `dynatrace-webhook` pods?

High availability for admission traffic. If one goes down, the other still serves the API server’s webhook calls.

How this ties into Gatekeeper/PSA

Gatekeeper (OPA) also uses validating admission.
The Dynatrace mutating webhook will first shape the Pod (add mounts/env/init).
Gatekeeper then validates the final Pod spec.
If you’re enforcing “must have seccomp/resources,” ensure Dynatrace’s injected init/sidecar also satisfies those rules (hence that seccomp annotation and resource limits you’ve set).

Dynatrace Active Gate

A Dynatrace ActiveGate acts as a secure proxy between Dynatrace OneAgents and Dynatrace Clusters or between Dynatrace OneAgents and other ActiveGates—those closer to the Dynatrace Cluster.
It establishes Dynatrace presence—in your local network. In this way it allows you to reduce your interaction with Dynatrace to one single point—available locally. Besides convenience, this solution optimizes traffic volume, reduces the complexity of the network and cost. It also ensures the security of sealed networks.

The docs on Active Gate and version compatibility with Dynakube are not yet mature. Ensure the following:

With Dynatrace Operator 1.7 the v1beta1 and v1beta2 API versions for the DynaKube custom resource were removed.

ActiveGates up to and including version 1.323 used to call the v1beta1 endpoint. Starting from ActiveGate 1.325, the DynaKube endpoint was changed to v1beta3
Ensure your ActiveGate is up to date with the latest version.

Dynatrace CPU and Memory Requests and Limits

Sources:
https://docs.dynatrace.com/docs/ingest-from/setup-on-k8s/guides/deployment-and-configuration/resource-management/dto-resource-limits

https://community.dynatrace.com/t5/Troubleshooting/Troubleshooting-Kubernetes-CPU-Throttling-Problems-in-Dynatrace/ta-p/250345

As part of our ongoing platform reliability work, we’ve introduced explicit CPU and memory requests/limits for all Dynatrace components running on AKS.

🧩 Why it matters

Previously, the OneAgent and ActiveGate pods relied on Kubernetes’ default scheduling behaviour. This meant:

No guaranteed CPU/memory allocation → possible throttling or eviction during cluster load spikes.
Risk of noisy-neighbour effects on shared nodes.
Unpredictable autoscaling signals and Dynatrace performance fluctuations.

Setting requests and limits gives the scheduler clear boundaries:

Requests = guaranteed resources for stable operation
Limits = hard ceiling to prevent runaway usage
Helps Dynatrace collect telemetry without starving app workloads

⚙️ Updated configuration

OneAgent

oneAgentResources:
  requests:
    cpu: 100m
    memory: 512Mi
  limits:
    cpu: 300m
    memory: 1.5Gi

ActiveGate

resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 1Gi

These values were tuned from observed averages across DEV, UAT and PROD clusters. They provide a safe baseline—enough headroom for spikes while keeping node utilisation predictable.

🧠 Key takeaway

Explicit resource boundaries = fewer throttled agents, steadier telemetry, and happier nodes.

Other resources:

installCRD: true

operator:
  resources:
    requests:
      cpu: "50m"
      memory: "64Mi"
    limits:
      cpu: "100m"
      memory: "128Mi"

webhook:
  resources:
    requests:
      cpu: "150m"
      memory: "128Mi"
    limits:
      cpu: "300m"
      memory: "128Mi"

csidriver:
  csiInit:
    resources:
      requests:
        cpu: "50m"
        memory: "100Mi"
      limits:
        cpu: "50m"
        memory: "100Mi"
  server:
    resources:
      requests:
        cpu: "50m"
        memory: "100Mi"
      limits:
        cpu: "100m"
        memory: "100Mi"
  provisioner:
    resources:
      requests:
        cpu: "200m"
        memory: "100Mi"
      limits:
        cpu: "300m"
        memory: "100Mi"
  registrar:
    resources:
      requests:
        cpu: "20m"
        memory: "30Mi"
      limits:
        cpu: "30m"
        memory: "30Mi"
  livenessprobe:
    resources:
      requests:
        cpu: "20m"
        memory: "30Mi"
      limits:
        cpu: "30m"
        memory: "30Mi"

Dynakube

apiVersion: dynatrace.com/v1beta5
kind: DynaKube
metadata:
  name: xxx
  namespace: dynatrace
  labels:
    dynatrace.com/created-by: "dynatrace.kubernetes"
  annotations:
    feature.dynatrace.com/k8s-app-enabled: "true"
    argocd.argoproj.io/sync-wave: "5"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    feature.dynatrace.com/init-container-seccomp-profile: "true"
# Link to api reference for further information: https://docs.dynatrace.com/docs/ingest-from/setup-on-k8s/reference/dynakube-parameters
spec:
  apiUrl: https://xxx.live.dynatrace.com/api
  metadataEnrichment:
    enabled: true
  oneAgent:
    hostGroup: xxx
    cloudNativeFullStack:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
        - effect: "NoSchedule"
          key: "CriticalAddonsOnly"
          operator: "Equal"
          value: "true"
      oneAgentResources:
        requests:
          cpu: 100m
          memory: 512Mi
        limits:
          cpu: 300m
          memory: 1.5Gi
  activeGate:
    capabilities:
      - routing
      - kubernetes-monitoring
      #- debugging
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 1Gi
  logMonitoring: {}
  telemetryIngest:
    protocols:
      - jaeger
      - otlp
      - statsd
      - zipkin
    serviceName: telemetry-ingest

  templates:
    otelCollector:
      imageRef:
        repository: xxx.azurecr.io/dynatrace/dynatrace-otel-collector
        tag: latest
      resources:
        requests:
          cpu: 150m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 1Gi