TL;DR: Pin versions, set sane resources, respect system-node taints, make Gatekeeper happy, don't double-encode secrets, and mirror images (never pull from public registries and blindly trust them).
Works great on AKS, EKS, GKE — examples below use AKS.
The default Dynakube template that Dynatrace provides will probably not work in the real world. You have zero trust, Calico firewalls, OPA Gatekeeper and perhaps some system pool taints, right?
Quick checks (healthy install):
- dynatrace-operator Deployment is Ready
- 2x dynatrace-webhook pods
- dynatrace-oneagent-csi-driver DaemonSet on every node (incl. system)
- OneAgent pods per node (incl. system)
- 1x ActiveGate StatefulSet ready
- Optional OTEL collector running if you enabled it
k get dynakube
NAME APIURL STATUS AGE
xxx-prd-xxxxxxxx https://xxx.live.dynatrace.com/api Running 13d
kubectl -n dynatrace get deploy,sts
# CSI & OneAgent on all nodes
kubectl -n dynatrace get ds
# Dynakube CR status
kubectl -n dynatrace get dynakube -o wide
# RBAC sanity for k8s monitoring
kubectl auth can-i list dynakubes.dynatrace.com \
--as=system:serviceaccount:dynatrace:dynatrace-kubernetes-monitoring --all-namespaces
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dynatrace-operator 1/1 1 1 232d
deployment.apps/dynatrace-webhook 2/2 2 2 13d
NAME READY AGE
statefulset.apps/xxx-prd-xxxxxxxxxxx-activegate 1/1 13d
statefulset.apps/xxx-prd-xxxxxxxxxxx-otel-collector 1/1 13d
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
xxx-prd-xxxxxxxxxxx-oneagent 9 9 9 9 9 <none> 13d
dynatrace-oneagent-csi-driver 9 9 9 9 9 <none> 13d
NAME APIURL STATUS AGE
xxx-prd-xxxxxxxxxxx https://xxx.live.dynatrace.com/api Running 13d
yes
Here are field-tested tips to keep Dynatrace humming on Kubernetes without fighting OPA Gatekeeper, seccomp, or AKS quirks.
1) Start with a clean Dynakube spec (and pin your versions)
Pin your operator chart/image and treat upgrades as a real change (PRs, changelog, Argo sync-waves). A lean cloudNativeFullStack baseline that plays nicely with Gatekeeper:
apiVersion: dynatrace.com/v1beta5
kind: DynaKube
metadata:
name: dynakube-main
namespace: dynatrace
labels:
dynatrace.com/created-by: "dynatrace.kubernetes"
annotations:
# Helps Gatekeeper/PSA by ensuring init containers use a seccomp profile
feature.dynatrace.com/init-container-seccomp-profile: "true"
# GitOps safety
argocd.argoproj.io/sync-wave: "5"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
apiUrl: https://<your-environment>.live.dynatrace.com/api
metadataEnrichment:
enabled: true
oneAgent:
hostGroup: PaaS_Development # pick a sensible naming scheme: PaaS_<Env>
cloudNativeFullStack:
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
operator: Exists
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
operator: Exists
- key: "CriticalAddonsOnly"
operator: "Equal"
value: "true"
effect: "NoSchedule"
oneAgentResources:
requests:
cpu: 100m
memory: 512Mi
limits:
cpu: 300m
memory: 1.5Gi
activeGate:
capabilities: [routing, kubernetes-monitoring, debugging]
resources:
requests:
cpu: 500m
memory: 1.5Gi
limits:
cpu: 1000m
memory: 1.5Gi
logMonitoring: {}
telemetryIngest:
protocols: [jaeger, otlp, statsd, zipkin]
serviceName: telemetry-ingest
templates:
otelCollector:
imageRef:
repository: <your-acr>.azurecr.io/dynatrace/dynatrace-otel-collector
tag: latest
Why this works: it respects control-plane taints, adds the CriticalAddonsOnly toleration for system pools, sets reasonable resource bounds, and preps you for GitOps. One caveat: the otelCollector tag above is latest for brevity; pin it in production, per the TL;DR.
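For the pinning part, a minimal sketch of an Argo CD Application that pins the operator chart (the repoURL is Dynatrace's documented Helm repo; the targetRevision is a placeholder, pin whatever version you have actually tested):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dynatrace-operator
  namespace: argocd
spec:
  project: default
  source:
    # Dynatrace's public Helm repo for the operator chart
    repoURL: https://raw.githubusercontent.com/Dynatrace/dynatrace-operator/main/config/helm/repos/stable
    chart: dynatrace-operator
    targetRevision: 1.7.1  # placeholder - pin the chart version you tested
  destination:
    server: https://kubernetes.default.svc
    namespace: dynatrace
  syncPolicy:
    syncOptions:
      - CreateNamespace=true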
2) System node pools are sacred — add the toleration
If your CSI Driver or OneAgent skips system nodes, your visibility and injection can be patchy. Make sure you’ve got:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Equal"
value: "true"
effect: "NoSchedule"
Your taints might be different, so check what taints you actually have on your system pools (see the commands below). This is the difference between “almost there” and “golden”.
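A quick way to see what is actually tainted (plain kubectl, nothing Dynatrace-specific):
# List every node with its taint keys; AKS system pools typically carry
# CriticalAddonsOnly=true:NoSchedule
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
# Full taint objects if you need effect/value too
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'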
3) Resource requests that won’t sandbag the cluster
- OneAgent: requests: cpu 100m / mem 512Mi and limits: cpu 300m / mem 1.5Gi are a good starting point for mixed workloads.
- ActiveGate: requests: 500m / 1.5Gi, limits: 1000m / 1.5Gi.
Tune based on your SLOs and node shapes; don't be shy about profiling and trimming.
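Profiling is one command away (assumes metrics-server, which AKS ships by default):
# Actual consumption per container - compare against the requests/limits above
kubectl -n dynatrace top pods --containers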
4) Make Gatekeeper your mate (OPA policies that help, not hinder)
Enforce the seccomp hint on DynaKube CRs (so the operator sets profiles on init containers and your PSA/Gatekeeper policies stay green).
ConstraintTemplate (checks DynaKube annotations):
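A minimal sketch, assuming Gatekeeper v3 (the template/constraint names are invented for illustration; the Rego just asserts the annotation value, so adapt it to your policy suite):
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: dynakuberequiresseccomp
spec:
  crd:
    spec:
      names:
        kind: DynakubeRequiresSeccomp
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package dynakuberequiresseccomp

        violation[{"msg": msg}] {
          # Only inspect DynaKube custom resources
          input.review.kind.kind == "DynaKube"
          ann := object.get(input.review.object.metadata, "annotations", {})
          object.get(ann, "feature.dynatrace.com/init-container-seccomp-profile", "") != "true"
          msg := "DynaKube must set feature.dynatrace.com/init-container-seccomp-profile: \"true\""
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: DynakubeRequiresSeccomp
metadata:
  name: dynakube-requires-seccomp
spec:
  match:
    kinds:
      - apiGroups: ["dynatrace.com"]
        kinds: ["DynaKube"]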
5) Secrets: avoid the dreaded encode (akv2k8s tip)
Kubernetes Secret.data is base64 on the wire, but tools like akv2k8s can feed you values that are already base64-encoded, leaving you with a double-encoded token. If you use akv2k8s, apply a base64decode transform to the output:
apiVersion: spv.no/v1
kind: AzureKeyVaultSecret
metadata:
name: dynatrace-api-token-akvs
namespace: dynatrace
spec:
vault:
name: kv-xxx-001
object:
name: DynatraceApiToken
type: secret
output:
transform:
- base64decode
secret:
name: aks-xxx-001
type: Opaque
dataKey: apiToken
---
apiVersion: spv.no/v1
kind: AzureKeyVaultSecret
metadata:
name: dynatrace-dataingest-token-akvs
namespace: dynatrace
spec:
vault:
name: kv-xxx-001
object:
name: DynatraceDataIngestToken
type: secret
output:
transform:
- base64decode
secret:
name: aks-xxx-001
type: Opaque
dataKey: dataIngestToken
This ensures Dynatrace can read the Kubernetes Opaque secret as is, with no double base64 encoding on the secret.
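To verify you didn't double-encode (secret/key names are from the example above):
# Should print a plain dt0c01.* token, not another base64 blob
kubectl -n dynatrace get secret aks-xxx-001 -o jsonpath='{.data.apiToken}' | base64 -d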
6) Mirror images to your registry (and pin)
Air-gapping or just speeding up pulls? Mirror dynatrace-operator, activegate, dynatrace-otel-collector into your ACR/ECR/GCR and reference them via the Dynakube templates.*.imageRef blocks or Helm values. GitOps + private registry = fewer surprises.
We use ACR Cache.
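As a sketch, an ACR cache rule via the Azure CLI (registry name is a placeholder; double-check the upstream registry Dynatrace publishes to for your operator version):
az acr cache create \
  --registry <your-acr> \
  --name dynatrace-operator \
  --source-repo docker.io/dynatrace/dynatrace-operator \
  --target-repo dynatrace/dynatrace-operator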

7) RBAC: fix the “list dynakubes permission is missing” warning
If you see that warning in the UI, verify the service account:
# https://docs.dynatrace.com/docs/ingest-from/setup-on-k8s/reference/security
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: dynatrace-k8smon-extra-perms
rules:
- apiGroups: ["dynatrace.com"]
resources: ["dynakubes"]
verbs: ["get","list","watch"]
- apiGroups: [""]
resources: ["configmaps","secrets"]
verbs: ["get","list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: dynatrace-k8smon-extra-perms
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: dynatrace-k8smon-extra-perms
subjects:
- kind: ServiceAccount
name: dynatrace-kubernetes-monitoring
namespace: dynatrace
kubectl auth can-i list dynakubes.dynatrace.com \
--as=system:serviceaccount:dynatrace:dynatrace-kubernetes-monitoring --all-namespaces
If “no”, ensure the chart installed/updated the ClusterRole and ClusterRoleBinding that grant list/watch/get on dynakubes.dynatrace.com. Sometimes upgrading the operator or re-syncing RBAC via Helm/Argo cleans it up.
8) HostGroup naming that scales
Keep it boring and predictable:
PaaS_Development
PaaS_NonProduction
PaaS_Production
9) GitOps tricks (ArgoCD/Flux)
- Use argocd.argoproj.io/sync-wave to ensure CRDs & operator land before the Dynakube.
- For major upgrades or URL/token churn: kubectl -n dynatrace delete dynakube <name>, wait for operator cleanup, then sync the new spec (Force + Prune if needed); see the shell sketch below.
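The same flow in shell form (names are placeholders; the argocd CLI flags assume a recent Argo CD):
kubectl -n dynatrace delete dynakube dynakube-main
# wait until the operator has cleaned up OneAgent/ActiveGate pods
kubectl -n dynatrace get pods -w
# then re-sync the new spec
argocd app sync dynatrace --force --prune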
10) Networking & egress
If you restrict egress, either:
- Allow ActiveGate to route traffic out and keep workload egress tight; or
- Allowlist Dynatrace SaaS endpoints directly.
Don’t forget webhook call-backs and OTLP ports if you’re shipping traces/logs.
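A minimal egress sketch for the allowlist route (standard NetworkPolicy, enforced by Calico on AKS; this only pins ports, so tighten the destinations to Dynatrace's published IP ranges where you can):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dynatrace-egress
  namespace: dynatrace
spec:
  podSelector: {}        # all pods in the dynatrace namespace
  policyTypes:
    - Egress
  egress:
    - ports:             # DNS lookups
        - protocol: UDP
          port: 53
    - ports:             # HTTPS to Dynatrace SaaS / ActiveGate
        - protocol: TCP
          port: 443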
11) Troubleshooting you’ll actually use
- OneAgent not injecting? Check the CSI Driver DaemonSet and the node you’re scheduling on. Make sure tolerations cover system pools.
- Pods crash-loop with sidecar errors? Often token/secret issues — confirm you didn’t double-encode.
- UI shows “permission missing”? Re-check RBAC and chart version; reconcile with Helm/Argo.
- Gatekeeper blocking? Dry-run constraints first; add namespace/label-based exemptions for operator internals.
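Commands that usually narrow it down (the CSI container names follow the Helm values later in this post; verify against your chart version):
# Webhook decisions and injection errors
kubectl -n dynatrace logs deploy/dynatrace-webhook
# CSI driver provisioning problems (agent download/mount)
kubectl -n dynatrace logs ds/dynatrace-oneagent-csi-driver -c provisioner
# Gatekeeper denials surface as events in the app namespace
kubectl get events -n <your-app-namespace> --sort-by=.lastTimestamp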
12) What “good” looks like
A healthy cluster shows:
- dynatrace-operator 1/1
- dynatrace-webhook 2/2
- dynatrace-oneagent-csi-driver DESIRED == READY == node count
- OneAgent pods present on all worker and system nodes
- ActiveGate 1/1
- Optional OTEL collector 1/1
…and dashboards populating within minutes.
That’s it — keep it simple, pin your bits, let Gatekeeper help (not hurt), and your Dynatrace setup will surf smooth swells instead of close-outs.
Other useful commands – hardcore diagnosis
kubectl exec -n dynatrace deployment/dynatrace-operator -- dynatrace-operator support-archive --stdout > operator-support-archive.zip
What the Dynatrace webhooks do on Kubernetes
When you install the Dynatrace Operator, you’ll see pods named something like dynatrace-webhook-xxxxx. They back one or more admission webhook configurations. In practice they do three big jobs:
- Mutating Pods for OneAgent injection
- Adds init containers / volume mounts / env vars so your app Pods load the OneAgent bits that come from the CSI driver.
- Ensures the right binaries and libraries are available (e.g., via mounted volumes) and the process gets the proper preload/agent settings.
- Respects opt-in/opt-out annotations/labels on namespaces and Pods (e.g. dynatrace.com/inject: "false" to skip a Pod); see the namespace example after this list.
- Can also add Dynatrace metadata enrichment env/labels so the platform sees k8s context (workload, namespace, node, etc.).
- Validating Dynatrace CRs (like DynaKube)
- Schema and consistency checks: catches bad combinations (e.g., missing fields, wrong mode), so you don’t admit a broken config.
- Helps avoid partial/failed rollouts by rejecting misconfigured specs early.
- Hardening/compatibility tweaks
- With certain features enabled, the mutating webhook helps ensure injected init containers comply with cluster policies (e.g., seccomp, PSA/PSS).
- That’s why we recommend the annotation you’ve been using:
feature.dynatrace.com/init-container-seccomp-profile: "true"
It keeps Gatekeeper/PSA happy when it inspects the injected bits.
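The namespace-level opt-out mentioned above, as a sketch (dynatrace.com/inject is the label referenced in this post; check your operator version's docs for the exact label vs. annotation semantics):
apiVersion: v1
kind: Namespace
metadata:
  name: no-monitoring
  labels:
    dynatrace.com/inject: "false"  # skip injection for every Pod in this namespace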
Why two dynatrace-webhook pods?
- High availability for admission traffic. If one goes down, the other still serves the API server’s webhook calls.
How this ties into Gatekeeper/PSA
- Gatekeeper (OPA) also uses validating admission.
- The Dynatrace mutating webhook will first shape the Pod (add mounts/env/init).
- Gatekeeper then validates the final Pod spec.
- If you’re enforcing “must have seccomp/resources,” ensure Dynatrace’s injected init/sidecar also satisfies those rules (hence that seccomp annotation and resource limits you’ve set).
Dynatrace Active Gate
A Dynatrace ActiveGate acts as a secure proxy between Dynatrace OneAgents and Dynatrace Clusters, or between OneAgents and other ActiveGates (those closer to the Dynatrace Cluster).
It establishes a Dynatrace presence in your local network, letting you reduce your interaction with Dynatrace to a single, locally available point. Besides convenience, this optimizes traffic volume, reduces network complexity and cost, and keeps sealed networks secure.
The docs on ActiveGate and version compatibility with the DynaKube are not yet mature. Ensure the following:
With Dynatrace Operator 1.7, the v1beta1 and v1beta2 API versions for the DynaKube custom resource were removed.
ActiveGates up to and including version 1.323 used to call the v1beta1 endpoint. Starting from ActiveGate 1.325, the DynaKube endpoint changed to v1beta3.
Ensure your ActiveGate is up to date with the latest version; the image check below is a quick way to see what's running.
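A quick check of which ActiveGate image (and therefore version) is actually running (StatefulSet name is a placeholder):
kubectl -n dynatrace get sts <name>-activegate \
  -o jsonpath='{.spec.template.spec.containers[0].image}'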
Dynatrace CPU and Memory Requests and Limits
As part of our ongoing platform reliability work, we’ve introduced explicit CPU and memory requests/limits for all Dynatrace components running on AKS.
🧩 Why it matters
Previously, the OneAgent and ActiveGate pods relied on Kubernetes’ default scheduling behaviour. This meant:
- No guaranteed CPU/memory allocation → possible throttling or eviction during cluster load spikes.
- Risk of noisy-neighbour effects on shared nodes.
- Unpredictable autoscaling signals and Dynatrace performance fluctuations.
Setting requests and limits gives the scheduler clear boundaries:
- Requests = guaranteed resources for stable operation
- Limits = hard ceiling to prevent runaway usage
- Helps Dynatrace collect telemetry without starving app workloads
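With requests below limits, these pods land in the Burstable QoS class: guaranteed their request, allowed to burst to the limit, and evicted before Guaranteed pods under node pressure. Verify with:
kubectl -n dynatrace get pods -o custom-columns='NAME:.metadata.name,QOS:.status.qosClass'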
⚙️ Updated configuration
OneAgent
oneAgentResources:
requests:
cpu: 100m
memory: 512Mi
limits:
cpu: 300m
memory: 1.5Gi
ActiveGate
resources:
requests:
cpu: 200m
memory: 512Mi
limits:
cpu: 500m
memory: 1Gi
These values were tuned from observed averages across DEV, UAT and PROD clusters. They provide a safe baseline—enough headroom for spikes while keeping node utilisation predictable.
🧠 Key takeaway
Explicit resource boundaries = fewer throttled agents, steadier telemetry, and happier nodes.
Other resources (operator Helm chart values):
installCRD: true
operator:
resources:
requests:
cpu: "50m"
memory: "64Mi"
limits:
cpu: "100m"
memory: "128Mi"
webhook:
resources:
requests:
cpu: "150m"
memory: "128Mi"
limits:
cpu: "300m"
memory: "128Mi"
csidriver:
csiInit:
resources:
requests:
cpu: "50m"
memory: "100Mi"
limits:
cpu: "50m"
memory: "100Mi"
server:
resources:
requests:
cpu: "50m"
memory: "100Mi"
limits:
cpu: "100m"
memory: "100Mi"
provisioner:
resources:
requests:
cpu: "200m"
memory: "100Mi"
limits:
cpu: "300m"
memory: "100Mi"
registrar:
resources:
requests:
cpu: "20m"
memory: "30Mi"
limits:
cpu: "30m"
memory: "30Mi"
livenessprobe:
resources:
requests:
cpu: "20m"
memory: "30Mi"
limits:
cpu: "30m"
memory: "30Mi"
Dynakube
apiVersion: dynatrace.com/v1beta5
kind: DynaKube
metadata:
name: xxx
namespace: dynatrace
labels:
dynatrace.com/created-by: "dynatrace.kubernetes"
annotations:
feature.dynatrace.com/k8s-app-enabled: "true"
argocd.argoproj.io/sync-wave: "5"
argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
feature.dynatrace.com/init-container-seccomp-profile: "true"
# Link to api reference for further information: https://docs.dynatrace.com/docs/ingest-from/setup-on-k8s/reference/dynakube-parameters
spec:
apiUrl: https://xxx.live.dynatrace.com/api
metadataEnrichment:
enabled: true
oneAgent:
hostGroup: xxx
cloudNativeFullStack:
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Exists
- effect: "NoSchedule"
key: "CriticalAddonsOnly"
operator: "Equal"
value: "true"
oneAgentResources:
requests:
cpu: 100m
memory: 512Mi
limits:
cpu: 300m
memory: 1.5Gi
activeGate:
capabilities:
- routing
- kubernetes-monitoring
#- debugging
resources:
requests:
cpu: 200m
memory: 512Mi
limits:
cpu: 500m
memory: 1Gi
logMonitoring: {}
telemetryIngest:
protocols:
- jaeger
- otlp
- statsd
- zipkin
serviceName: telemetry-ingest
templates:
otelCollector:
imageRef:
repository: xxx.azurecr.io/dynatrace/dynatrace-otel-collector
tag: latest
resources:
requests:
cpu: 150m
memory: 256Mi
limits:
cpu: 500m
memory: 1Gi



