
- In the upstream image, hostname, TLS, reverse proxy, --optimized, and JVM tuning are positioned as deployment-time concerns. The starting baseline ships a general-purpose UBI base, root group membership, no JVM security tuning, and dozens of OS-level CVEs at any given moment.
- A Maven dependencyManagement override patches transitive JARs (Vert.x, Netty, Jackson, JDBC) at build time, and Copa retrofits OS packages post-build when a fresh CVE lands.

To be clear up front: the upstream Keycloak container is a competent, well-maintained starting point, and Keycloak's own production deployment documentation treats it as one. It walks through --optimized, hostname, TLS, reverse proxy, and JVM tuning as deployment-time concerns rather than baking them into the image. We are not arguing that the upstream image is broken. We are arguing that the gap between that baseline and a regulated production posture is real, specific, and worth closing in one place.
The same reasoning applies to the FIPS-validated variants. FIPS 140 certifies cryptographic modules; it does not certify container attack surface, package count, or runtime user model. A FIPS-compliant Keycloak image still rides the same general-purpose base layer with the same baseline package surface as the standard one. Hardening sits at a different layer of the stack, and that is the layer this post is about.
Keycloak handles authentication tokens, session data, and user credentials — the most sensitive assets in any system. IAM sits at the trust root of every service that delegates authentication to it, which means a compromised Keycloak is never a single-system incident — it hands an attacker the keys to every downstream system that relies on its verdicts.
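The default container image was not designed for that threat model. It ships with a general-purpose UBI base, root group membership, no JVM security tuning, and dozens of OS-level CVEs at any given moment.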
The upstream Keycloak image is itself multi-stage built on UBI. None of these issues come from the Keycloak distribution. They come from the general-purpose base layer underneath it. Hardening means swapping that layer out and applying a systematic set of controls on top.
Every package, every permission, and every open port that exists without a clear operational purpose is an attack surface waiting to be exploited.
A hardened image looks like a security artifact, but its largest impact is operational. The CVE count on a scanner dashboard is a downstream signal; what platform and IAM teams actually feel is the cadence of unplanned work that surrounds it.
On the default image, a critical CVE in the base layer or a transitive Java dependency creates a familiar pattern: a Friday-evening scanner alert, an ad-hoc rebuild, a coordination thread with security, an unplanned change window. Multiply that by the number of times it happens in a year and the cost is on-call attrition, not just engineering hours. A pre-hardened image with a daily-rebuilt base, a versioned Maven override flow, and Copa-style post-build patching folds most of that work into the normal pipeline. The incident exposure window shrinks from weeks to hours, and patching becomes a scheduled event rather than a fire drill.
The compliance side benefits from the same property. SBOM, signed base, OpenVEX-documented exceptions, and a reproducible build digest mean auditor questions — which CVEs are open?, how did you assess this finding?, show me the supply chain — have machine-readable answers that already live alongside the image. The artifact arrives at the audit with its own evidence package.
Air-gapped operations are where these properties stop being convenient and start being mandatory. With no outbound path to the internet, every runtime apk add, every transitive Maven fetch, every fallback to an upstream package mirror is an outage waiting to happen. A self-contained, fully patched image with a complete SBOM and a deterministic update cadence is the only model that survives — and the one that lets a platform team plan around a published image rather than around the day a CVE happens to drop.
The market for secure, minimal container images did not appear by accident. It is the operational answer to a regulatory wave that has redrawn what "shipping software" means in the last five years.
The Cyber Resilience Act (Regulation (EU) 2024/2847) entered into force in December 2024, with full applicability in December 2027. Its scope is broad — any "product with digital elements" placed on the EU market — and its essential requirements (Annex I) map directly onto how a container image is built and shipped:
… dependencyManagement overrides for Java CVEs), and Tier 2b (Copa post-build patching).

NIS2 (Directive (EU) 2022/2555) reinforces the same posture for "essential" and "important" entities by making the operator accountable for supply-chain risk, not just the vendor.
The U.S. trajectory is policy-led rather than regulation-led, but the destination is similar:
For some operators the requirement is not "secure by default" but "demonstrably isolated." A few of the frameworks our customers run under spell this out explicitly:
In each of these environments an IAM image must be self-contained by construction — no runtime package fetches, no external CDN dependencies, no upstream-managed update channel. The hardening properties described later in this post are not aesthetic improvements there; they are entry requirements.
The shape of the requirements is consistent across both jurisdictions and these sector mandates: prove what is in the image, prove it is patched, prove it is configured securely, and prove it stays that way over time. That is the gap a hardened image is built to close, and it is why this category exists at all.
There is a healthy market for hardened Keycloak images today. Chainguard ships one, Bitnami has secure variants, and Red Hat's build of Keycloak (RHBK) is maintained by Red Hat, a major contributor to upstream Keycloak. These are real products built by competent teams, and we recommend evaluating them on their merits.
We chose to build our own anyway, for one reason: the layer of the stack we are unwilling to outsource is the one we know best. We operate Keycloak under regulated workloads every day. We patch transitive Java dependencies before upstream Keycloak releases catch up — the Tier 2 Maven dependencyManagement flow described later in this post is something an off-the-shelf image cannot offer, because it requires building Keycloak from source on our schedule, not the upstream's. For a system that holds session tokens and credentials, that level of application-layer ownership is the difference between a product and a platform.
This post is not an argument against the existing market. It is an explanation of why, for an identity team that lives inside Keycloak, building the runtime ourselves is the right call.
The base image sets the ceiling on your container's security posture. Most Keycloak deployments use Ubuntu, Debian, or Alpine. We chose Wolfi OS by Chainguard.
| Criteria | Ubuntu/Debian | Alpine | Wolfi OS |
|---|---|---|---|
| Package count | 400+ | 50–100 | ~65 |
| Known CVEs (typical) | 20–50+ | 5–15 | 0 |
| Supply chain signing | Partial | Partial | Full (Sigstore) |
| glibc compatibility | Yes | No (musl) | Yes |
| Image size | 200–400 MB | 50–100 MB | ~245 MB |
| Update cadence | Monthly | Weekly | Daily |
Alpine uses musl libc instead of glibc. The size win is real, but musl introduces subtle compatibility issues with Java around DNS resolution, thread handling, and locale support. Keycloak runs on Quarkus and Vert.x, both of which depend on glibc-specific behavior that musl does not fully replicate. We have hit this in practice; we do not chase it any longer.
Wolfi is purpose-built for containers by Chainguard, the company behind Sigstore. It gives us:
- Roughly 65 packages in the runtime image, which you can verify with apk info | wc -l.

Wolfi is not a general-purpose Linux distribution. It does not ship a package manager in the final runtime if you strip APK. That is a feature, not a limitation: it prevents unauthorized package installation at runtime.
# Runtime stage — Wolfi OS by Chainguard
FROM cgr.dev/chainguard/wolfi-base:latest
# `:latest` is intentional here, not an anti-pattern.
# Wolfi rebuilds packages from source daily and applies CVE patches within hours
# of disclosure. Reproducibility is preserved at the CI/CD layer via immutable
# calendar-versioned tags (YYYY.M.D). For compliance-mandated environments
# (SOC 2, ISO 27001, PCI-DSS), pin the base image by SHA-256 digest instead.
RUN apk update && apk upgrade && \
    apk add --no-cache \
        openjdk-21-jre \
        bash \
        ca-certificates-bundle && \
    rm -rf /var/cache/apk/*
Three packages. That is the entire runtime dependency surface:
- openjdk-21-jre: Java runtime. Not the full JDK. No compiler, no debugging tools.
- bash: required by Keycloak startup scripts.
- ca-certificates-bundle: TLS certificate verification.

curl is used during the build stage only; it is not retained in the runtime image.
latest and Reproducibility

Pinning to a snapshot like wolfi-base:20260115 would freeze your base at that date and silently accumulate unpatched vulnerabilities until you bump the tag. For a security-critical workload like Keycloak, where the entire purpose of the image is to minimize attack surface, the cost of a stale base far exceeds the cost of non-deterministic builds.
Reproducibility lives one layer up:
- Every CI/CD build publishes an immutable calendar-versioned tag such as 2026.4.7 that points to a specific image digest.
- The digest behind a published tag stays on record: docker inspect or crane manifest retrieves it.
- The latest tag only affects new builds, not running containers.
- For compliance-mandated environments, replace latest with a SHA-256 digest: cgr.dev/chainguard/wolfi-base@sha256:abc123... This buys byte-level reproducibility while still allowing controlled updates.

The stronger long-term default is digest pinning combined with an automated update bot such as Renovate, Dependabot, or an internal equivalent. The Dockerfile references a specific digest, so every build starts from exactly the same base image. The bot opens a pull request whenever Chainguard publishes a new digest, which triggers the normal CI pipeline (Trivy scan, tests, manual review) before merge.
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "pinDigests": true,
  "docker": { "enabled": true },
  "schedule": ["before 6am on monday"],
  "packageRules": [
    {
      "matchPackageNames": ["cgr.dev/chainguard/wolfi-base"],
      "groupName": "wolfi base image",
      "automerge": false
    }
  ]
}
With this pattern in place, :latest no longer appears anywhere in the image graph. Reproducibility comes from the digest, freshness comes from the bot, governance comes from the pull request review.
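The digest-pinning rule can also be checked mechanically before a build starts. The following is a minimal sketch of such a gate, not part of the image described here: the function name `unpinned_images` and the sample Dockerfile text are assumptions made for illustration. It reports every FROM line whose image reference lacks an `@sha256:` digest, while ignoring `scratch` and references to earlier build stages.

```python
import re

# Captures the image reference of a FROM line, skipping flags such as --platform=...
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(dockerfile_text: str) -> list[str]:
    """Return image references from FROM lines that are not pinned by digest."""
    stage_aliases: set[str] = set()
    unpinned: list[str] = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match is None:
            continue
        image, alias = match.group(1), match.group(2)
        # "FROM scratch" and references to earlier build stages are not registry images.
        is_registry_image = image.lower() != "scratch" and image not in stage_aliases
        if is_registry_image and DIGEST_RE.search(image) is None:
            unpinned.append(image)
        if alias:
            stage_aliases.add(alias)
    return unpinned


if __name__ == "__main__":
    # Hypothetical two-stage Dockerfile: the first FROM is tag-based and gets reported.
    example = "FROM cgr.dev/chainguard/wolfi-base:latest AS builder\nFROM builder\n"
    print(unpinned_images(example))
```

A CI job could fail the build whenever the returned list is non-empty, which keeps tag-based references from creeping back in through later edits.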
Running containers as root is one of the most common — and most dangerous — misconfigurations. If an attacker escapes the container process, root execution hands them host-level privileges.
The hardened image creates a dedicated keycloak user with explicit UID/GID and tightens file permissions:
ARG USER=keycloak
ARG UID=2000
ARG GID=2000
RUN addgroup -g ${GID} ${USER} && \
    adduser -u ${UID} -G ${USER} -s /bin/bash -D ${USER}
COPY --from=builder --chown=${USER}:${USER} /tmp/keycloak-dist/ ${KEYCLOAK_HOME}/
# Principle of least privilege
RUN find ${KEYCLOAK_HOME} -type d -exec chmod 755 {} + && \
    find ${KEYCLOAK_HOME} -type f -exec chmod 644 {} + && \
    chmod 755 ${KEYCLOAK_HOME}/bin/*.sh
USER ${USER}
UID 2000 (rather than 1000) avoids collisions with default user accounts on host systems. That matters in Kubernetes environments where runAsNonRoot and runAsUser security contexts enforce non-root execution from the orchestrator side as well.
This image is compatible with:
- Kubernetes security contexts that set runAsNonRoot: true, allowPrivilegeEscalation: false, and drop ALL capabilities.

The entrypoint script configures the JVM with security-first defaults. Each parameter has a specific defensive or operational purpose.
-XX:+ExitOnOutOfMemoryError
-XX:MaxRAMPercentage=70
-XX:InitialRAMPercentage=50
-XX:MetaspaceSize=96M
-XX:MaxMetaspaceSize=256m
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/heap.hprof
| Parameter | Purpose |
|---|---|
| ExitOnOutOfMemoryError | Terminates the JVM on OOM instead of running in a degraded state. Kubernetes restarts the pod cleanly. A zombie Keycloak that appears alive but cannot allocate memory is worse than a clean restart. |
| MaxRAMPercentage=70 | Allocates 70% of the container memory limit to the JVM heap. The remaining 30% absorbs native memory, thread stacks, and metaspace, keeping the OOM killer at bay. |
| MetaspaceSize / MaxMetaspaceSize | Bounds metaspace between 96 MB and 256 MB to prevent classloader leaks from consuming unbounded memory. |
| HeapDumpOnOutOfMemoryError | Combined with a persistent volume mounted at /tmp, gives you a post-mortem .hprof artifact before Kubernetes restarts the pod. |
A flag-semantics note: -XX:MinRAMPercentage is used by the JVM to compute the maximum heap in small containers, not to enforce a minimum heap. If you want a fixed heap, set MaxRAMPercentage and InitialRAMPercentage to the same value, or use -Xms/-Xmx explicitly.
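To make the percentages concrete, here is a small arithmetic sketch of how the two RAM-percentage flags translate into heap sizes for a given container memory limit. The helper name `jvm_heap_bytes` is hypothetical, and the figures are approximations: the real JVM rounds heap sizes to its own alignment, so treat the output as a planning estimate rather than what the JVM will report.

```python
GIB = 1024 ** 3


def jvm_heap_bytes(container_limit_bytes: int,
                   max_ram_percentage: float = 70.0,
                   initial_ram_percentage: float = 50.0) -> tuple[int, int]:
    """Return (initial_heap, max_heap) in bytes for a container memory limit."""
    initial_heap = int(container_limit_bytes * initial_ram_percentage / 100)
    max_heap = int(container_limit_bytes * max_ram_percentage / 100)
    return initial_heap, max_heap


if __name__ == "__main__":
    limit = 4 * GIB  # example container memory limit
    initial, maximum = jvm_heap_bytes(limit)
    print(f"initial heap ~ {initial / GIB:.1f} GiB, max heap ~ {maximum / GIB:.1f} GiB")
    # Whatever is not heap must cover metaspace, thread stacks, and native memory.
    print(f"non-heap headroom ~ {(limit - maximum) / GIB:.1f} GiB")
```

With a 4 GiB limit this leaves roughly 1.2 GiB outside the heap, which comfortably covers the 256 MB metaspace ceiling set above.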
-Djava.security.egd=file:/dev/urandom
The default /dev/random blocks when the entropy pool is exhausted. In containerized environments — especially freshly spawned pods — the pool is often insufficient at boot. Keycloak hangs during startup while generating cryptographic keys for sessions and tokens.
/dev/urandom provides cryptographically secure random numbers without blocking, as confirmed by Linux kernel maintainers and NIST SP 800-90A. Since Linux 5.6, /dev/random blocks only until the kernel's random number generator has been initialized and otherwise behaves like /dev/urandom, so its blocking behavior provides no additional security benefit on modern kernels.
-XX:+UseZGC
-XX:+ZGenerational
-XX:StartFlightRecording=filename=/var/log/jfr/recording.jfr,settings=profile,maxsize=512m,maxage=1h,dumponexit=true,disk=true
Generational ZGC is the collector we recommend on Java 21 for latency-sensitive workloads like an IAM system. G1GC targets 200 ms pauses by default; Generational ZGC is designed for sub-millisecond pauses, which keeps GC out of the authentication latency budget.
Java Flight Recorder is configured for production:
- A bounded recording (maxsize=512m, maxage=1h, disk=true) prevents the JFR file from growing indefinitely.

JFR is disabled by default and gated behind ENABLE_JFR=true at deployment time. Keep it off in steady-state production and turn it on for targeted diagnostic windows. The default JFR stack depth of 64 frames is sufficient for most scenarios; raising it adds linear CPU and memory overhead per event.
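The ENABLE_JFR gate amounts to a conditional append to the JVM option list. The real entrypoint is a shell script whose contents are not shown in this post, so the following is only a sketch of the logic in Python; `build_java_opts` is a hypothetical name, and the flag values are copied from the listing above.

```python
import os

JFR_FLAG = (
    "-XX:StartFlightRecording=filename=/var/log/jfr/recording.jfr,"
    "settings=profile,maxsize=512m,maxage=1h,dumponexit=true,disk=true"
)


def build_java_opts(env: dict[str, str]) -> list[str]:
    """Return GC flags, plus the JFR flag only when ENABLE_JFR is exactly 'true'."""
    opts = ["-XX:+UseZGC", "-XX:+ZGenerational"]
    if env.get("ENABLE_JFR", "false").strip().lower() == "true":
        opts.append(JFR_FLAG)
    return opts


if __name__ == "__main__":
    # With ENABLE_JFR unset, only the two GC flags are printed.
    print(" ".join(build_java_opts(dict(os.environ))))
```

Treating anything other than the literal value true as off keeps a typo in a deployment manifest from silently enabling profiling.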
exec "${JAVA_BIN}" ${JAVA_OPTS} \
-cp "${QUARKUS_CP}" \
"${MAIN_CLASS}" \
${CONFIG_FILE_ARG} \
${KC_START_CMD}
The exec command replaces the shell process with the Java process, making it PID 1. Three things follow:
- SIGTERM from Kubernetes reaches the JVM directly, so graceful shutdown within terminationGracePeriodSeconds works as expected.

Without exec, the shell remains PID 1 and may not forward signals, leading to forceful pod termination and potential session loss.
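The same process-replacement idea exists outside the shell. As an illustration only, the sketch below shows it in Python with `os.execvp`, which swaps the current process image for the target program so that no wrapper process is left in between to swallow signals. The function names and the sample arguments are assumptions, the config-file argument from the shell snippet is omitted for brevity, and `shlex.split` only approximates the word splitting a shell applies to unquoted variables.

```python
import os
import shlex


def build_argv(java_bin: str, java_opts: str, classpath: str,
               main_class: str, start_cmd: str) -> list[str]:
    """Assemble the argument vector the launcher hands to the JVM."""
    return [java_bin, *shlex.split(java_opts), "-cp", classpath,
            main_class, *shlex.split(start_cmd)]


def exec_java(argv: list[str]) -> None:
    """Replace the current process with the JVM; this call does not return on success."""
    os.execvp(argv[0], argv)


if __name__ == "__main__":
    # Placeholder values; printing instead of calling exec_java keeps the demo side-effect free.
    print(build_argv("java", "-XX:+UseZGC -XX:+ExitOnOutOfMemoryError",
                     "/opt/keycloak/lib/app.jar", "example.Main", "start --optimized"))
```

Because the launcher's PID is reused by the JVM, a SIGTERM sent to PID 1 lands on Java itself rather than on an intermediate interpreter.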
The --optimized Flag

Keycloak runs on Quarkus, which separates configuration into build-time (immutable) and runtime (environment-specific) phases. Knowing which knob lives where is critical for both security and startup performance.
# Runs during image build, results are cached
RUN bin/kc.sh build
This pre-compiles the Quarkus application with the selected database driver (PostgreSQL), feature flags, and cache configuration. At startup, start --optimized skips compilation entirely.
KC_DB_URL=jdbc:postgresql://db.example.com:5432/keycloak
KC_DB_USERNAME=keycloak_user
KC_DB_PASSWORD=<from-secret-manager>
KC_HOSTNAME=auth.example.com
KC_BOOTSTRAP_ADMIN_USERNAME=admin
KC_BOOTSTRAP_ADMIN_PASSWORD=<from-secret-manager>
Never hardcode credentials in the Dockerfile or entrypoint. Use Kubernetes Secrets, HashiCorp Vault, or your cloud provider's secret manager to inject sensitive values at runtime.
A note on HTTP vs HTTPS: the hardened image does not ship KC_HTTP_ENABLED=true. Keycloak's upstream default is HTTPS-only and we keep that default to avoid exposing a plaintext listener by accident. Deployments that terminate TLS at a sidecar or ingress and need plain HTTP on the pod-local interface must opt in explicitly, ideally with mTLS between the ingress and the pod. For direct HTTPS, configure KC_HTTPS_CERTIFICATE_FILE and KC_HTTPS_CERTIFICATE_KEY_FILE and leave HTTP off.
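A deployment pipeline can catch missing or risky runtime settings before a pod ever starts. The following is a sketch of such a preflight check, assuming the variable names listed above; the function name `check_runtime_env` and the choice of which variables count as required are illustrative and not taken from the image.

```python
REQUIRED_VARS = ("KC_DB_URL", "KC_DB_USERNAME", "KC_DB_PASSWORD", "KC_HOSTNAME")


def check_runtime_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems found in the runtime environment."""
    problems = [f"missing {name}" for name in REQUIRED_VARS if not env.get(name)]
    # Plain HTTP must be an explicit, reviewed decision rather than an accident.
    if env.get("KC_HTTP_ENABLED", "false").strip().lower() == "true":
        problems.append("KC_HTTP_ENABLED=true exposes a plaintext listener; confirm this is intentional")
    return problems


if __name__ == "__main__":
    sample = {
        "KC_DB_URL": "jdbc:postgresql://db.example.com:5432/keycloak",
        "KC_DB_USERNAME": "keycloak_user",
        "KC_HOSTNAME": "auth.example.com",
    }
    for problem in check_runtime_env(sample):
        print(problem)
```

The check only looks at presence, never at values, so secrets injected from a secret manager are not echoed into pipeline logs.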
Achieving zero CVEs is not a one-time event; it is an ongoing process. The hardened image implements a three-tier vulnerability management strategy.
By starting from Wolfi, we begin with zero known OS-level vulnerabilities. Chainguard rebuilds packages from source daily and typically ships patches within hours of disclosure. Result: 0 OS-level CVEs across roughly 65 packages.
When a CVE is discovered in a Keycloak application dependency — a transitive Java library like Vert.x, Netty, Jackson, or a JDBC driver — we patch it before the vulnerable JAR ever reaches the image.
The mechanism is a Maven dependencyManagement override applied to Keycloak's source pom.xml at build time. This replaces an earlier post-build pattern that downloaded a patched JAR with curl, verified its SHA-256, and overwrote the old filename in place. That earlier pattern worked, but SBOM tooling reported the stale filename even though the content was patched. With the override, Maven resolves the patched version natively, so Quarkus packaging produces a correctly named JAR (for example io.vertx.vertx-core-4.5.24.jar), and every downstream tool — Trivy, Syft, Grype, auditor filesystem inspection — agrees on the version.
The override lives as a small, reviewable patch file in the build context (pom-vertx-override.patch):
--- a/pom.xml
+++ b/pom.xml
@@ -312,6 +312,13 @@
<dependencyManagement>
<dependencies>
+ <!-- Tier 2 CVE patches (override Quarkus BOM transitive versions) -->
+ <!-- CVE-2026-1002: vertx-core 4.5.23 -> 4.5.24 -->
+ <dependency>
+ <groupId>io.vertx</groupId>
+ <artifactId>vertx-core</artifactId>
+ <version>4.5.24</version>
+ </dependency>
<dependency>
<groupId>org.infinispan</groupId>
<artifactId>infinispan-bom</artifactId>
The builder stage applies it before the Maven build runs:
RUN curl -L ... && sha256sum -c - && tar xz ...
# Tier 2 application-layer CVE patches via Maven dependencyManagement override.
# Applied BEFORE the Maven build so packaged JARs carry the correct filename.
COPY pom-vertx-override.patch /tmp/pom-vertx-override.patch
RUN cd /tmp/keycloak && patch -p1 < /tmp/pom-vertx-override.patch
# Maven build resolves io.vertx:vertx-core:4.5.24 natively
RUN mvn clean install -DskipTests -Pdistribution --strict-checksums
Three guarantees fall out of this approach:
- The packaged JAR is named io.vertx.vertx-core-4.5.24.jar. Tools that derive version from filename (Syft) and tools that read pom.properties inside the JAR (Trivy, Grype) all report 4.5.24. No divergence between what the image contains and what scanners report.
- … RUN layers.
- Maven verifies artifact checksums under --strict-checksums, and the resolved version is pinned via dependencyManagement so every build pulls the same JAR.

Downloading artifacts from external repositories during a Docker build introduces a supply chain dependency. Without checksum verification, a compromised CDN or repository mirror could inject malicious code into your image and the build would succeed silently. Always verify SHA-256 checksums for any externally fetched binary.
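For artifacts fetched outside Maven, the verification step is small enough to write by hand. The sketch below streams a file through SHA-256 and refuses to continue on a mismatch; `sha256_of` and `verify_artifact` are hypothetical helper names, and the temporary file stands in for a downloaded binary.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large artifacts never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> None:
    """Raise ValueError unless the file's SHA-256 matches the expected hex digest."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(f"checksum mismatch for {path.name}: expected {expected_hex}, got {actual}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "artifact.bin"
        artifact.write_bytes(b"example payload")
        verify_artifact(artifact, hashlib.sha256(b"example payload").hexdigest())  # passes
        try:
            verify_artifact(artifact, "0" * 64)  # deliberately wrong digest
        except ValueError as err:
            print("rejected:", err)
```

The expected digest should come from a source other than the download location itself, for example a value committed to the repository next to the Dockerfile.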
The Maven override closes the window for application-layer CVEs at build time. Copa (project-copacetic) handles the other direction: OS-level CVEs disclosed after an image has already been built and deployed. Copa does not rebuild the image. It invokes the distribution's native package manager (apk on Wolfi or Alpine, apt on Debian, dnf on UBI) inside a buildkit session, upgrades only the vulnerable packages named in a Trivy report, and appends the resulting filesystem diff as a thin layer on top of the existing image. A patch run takes seconds rather than the minutes a full rebuild would take.
The two mechanisms are complementary, not alternatives:
- Maven dependencyManagement override (Tier 2, build time): Java libraries inside the Keycloak distribution. Copa cannot touch these because they are not apk packages; they live inside the distribution tarball.
- Copa (Tier 2b, post-build): OS-level packages (openjdk-21-jre, glibc, openssl, ca-certificates-bundle) and retrofitting already-deployed images when a fresh CVE is disclosed between release and the next scheduled rebuild.

A minimal CI integration that runs Copa as a safety net after the main image build:
# Scan the freshly built image
trivy image --format json --output scan.json keycloak-hardened:${TAG}
# Patch the fixable OS-level findings listed in the report with Copa
copa patch \
  -i keycloak-hardened:${TAG} \
  -r scan.json \
  -t ${TAG}-patched \
  -a ${BUILDKIT_ADDR} \
  --push
# Rescan and gate the pipeline on the patched image
trivy image --exit-code 1 --severity HIGH,CRITICAL \
  --vex openvex.json \
  keycloak-hardened:${TAG}-patched
Because Wolfi already delivers a near-zero OS CVE baseline, Copa typically has nothing to do on a steady-state run. It becomes load-bearing only when a same-day disclosure lands between the base-image rebuild and the next scheduled pipeline run.
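Since Copa usually has nothing to do, a pipeline may want to skip the patch step unless the report contains something it can act on. The following sketch reads a Trivy JSON report and lists HIGH and CRITICAL OS-package findings that have a fixed version available. The field names follow Trivy's JSON output as we understand it; the function name `fixable_findings` and the sample report, including its placeholder CVE identifiers, are invented for illustration.

```python
import json

# Invented sample report; the CVE identifiers are placeholders, not real advisories.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {
      "Target": "keycloak-hardened (wolfi)",
      "Class": "os-pkgs",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "PkgName": "openssl",
         "InstalledVersion": "1.0", "FixedVersion": "1.1", "Severity": "HIGH"},
        {"VulnerabilityID": "CVE-0000-0002", "PkgName": "glibc",
         "InstalledVersion": "2.0", "FixedVersion": "", "Severity": "CRITICAL"}
      ]
    },
    {"Target": "Java", "Class": "lang-pkgs", "Vulnerabilities": null}
  ]
}
""")


def fixable_findings(report: dict, severities=("HIGH", "CRITICAL")) -> list[tuple[str, str, str]]:
    """Return (cve, package, fixed_version) for OS packages that have a fix available."""
    findings = []
    for result in report.get("Results", []):
        if result.get("Class") != "os-pkgs":
            continue  # Java libraries are handled by the Maven override, not by Copa
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in severities and vuln.get("FixedVersion"):
                findings.append((vuln["VulnerabilityID"], vuln["PkgName"], vuln["FixedVersion"]))
    return findings


if __name__ == "__main__":
    todo = fixable_findings(SAMPLE_REPORT)
    print("run copa" if todo else "nothing to patch", todo)
```

Findings without a fixed version are left out on purpose: a package-manager upgrade cannot resolve them, so they belong in triage or in a VEX statement instead.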
Not every CVE flag from a scanner represents a real vulnerability. The OpenVEX standard provides a machine-readable format to document these exceptions transparently:
{
"@context": "https://openvex.dev/ns/v0.2.0",
"author": "SecurityTeam",
"statements": [
{
"vulnerability": { "name": "CVE-2025-59250" },
"products": [{ "@id": "pkg:maven/com.microsoft.sqlserver/mssql-jdbc@13.2.1" }],
"status": "not_affected",
"justification": "vulnerable_code_not_present",
"impact_statement": "Version 13.2.1.jre11 is in use but Trivy incorrectly parses the version string. This version is patched and safe."
}
]
}
This approach buys three things at once:
trivy image --vex openvex.json keycloak-hardened:latest
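Because a VEX document suppresses scanner findings, it deserves its own lint step before it is trusted in a pipeline. The sketch below checks a few rules from the OpenVEX specification as we understand them: a known status value, a named vulnerability, and a justification or impact statement on every not_affected entry. It is a partial check under those assumptions, not a full schema validator, and `vex_problems` is a hypothetical name.

```python
VALID_STATUSES = {"not_affected", "affected", "fixed", "under_investigation"}


def vex_problems(document: dict) -> list[str]:
    """Return a list of problems found in an OpenVEX-style document."""
    problems = []
    statements = document.get("statements") or []
    if not statements:
        problems.append("document has no statements")
    for index, statement in enumerate(statements):
        status = statement.get("status")
        if status not in VALID_STATUSES:
            problems.append(f"statement {index}: unknown status {status!r}")
        if not (statement.get("vulnerability") or {}).get("name"):
            problems.append(f"statement {index}: missing vulnerability name")
        # An unexplained not_affected claim is exactly what an auditor will question.
        if status == "not_affected" and not (
            statement.get("justification") or statement.get("impact_statement")
        ):
            problems.append(f"statement {index}: not_affected needs a justification or impact_statement")
    return problems


if __name__ == "__main__":
    doc = {
        "statements": [
            {"vulnerability": {"name": "CVE-2025-59250"}, "status": "not_affected"}
        ]
    }
    print(vex_problems(doc))
```

Running the check in the same pull request that edits openvex.json keeps exceptions reviewable at the moment they are introduced.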
| Tier | Strategy | Example | Outcome |
|---|---|---|---|
| Source | Wolfi OS | 0 OS-level CVEs | Attack surface removed |
| Active patching | Maven override + Copa post-build | CVE-2026-1002 (Vert.x) | Vulnerability fixed, supply chain verified |
| Transparent docs | OpenVEX | CVE-2025-59250 (FP) | Audit trail preserved |
The hardened image is designed to operate as a first-class citizen in Kubernetes.
DJGROUPS_DNS_QUERY=keycloak-headless.keycloak-ns.svc.cluster.local
This lets Keycloak instances discover each other via Kubernetes headless services, forming a distributed cache cluster for session replication.
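Discovery through a headless service is ordinary DNS: the service name resolves to one address record per ready pod. The sketch below shows that lookup from the client side. The function name `resolve_peers` is hypothetical, and port 7800 is assumed here as the JGroups TCP port; the service name in the comment is the one from the setting above and only resolves from inside the cluster.

```python
import socket


def resolve_peers(hostname: str, port: int = 7800) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses a DNS name resolves to."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Inside the cluster you would pass the headless service name, for example:
    #   resolve_peers("keycloak-headless.keycloak-ns.svc.cluster.local")
    # A literal address is used here so the demo also runs without cluster DNS.
    print(resolve_peers("127.0.0.1"))
```

Running the lookup from a debug pod is a quick way to confirm that the headless service actually returns every Keycloak replica before suspecting the cache configuration.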
With --health-enabled=true at build time, Keycloak exposes SmallRye Health endpoints:
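# Keycloak 25 and later serve health endpoints on the management port (9000 by default).
# Use 8080 only if your deployment keeps health on the main HTTP listener.
livenessProbe:
  httpGet:
    path: /health/live
    port: 9000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 9000
  initialDelaySeconds: 30
  periodSeconds: 10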
A note on the Dockerfile HEALTHCHECK directive: we intentionally omit it. Kubernetes does not consult HEALTHCHECK; the kubelet runs its own liveness and readiness probes. Adding one is dead configuration under Kubernetes and only expands the runtime toolchain by forcing curl or wget into the runtime image. For container runtimes that do consult HEALTHCHECK (Docker, Podman standalone), prefer a tiny statically linked health binary over curl.
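# Pod level (spec.securityContext)
securityContext:
  runAsNonRoot: true
  runAsUser: 2000
  runAsGroup: 2000
  fsGroup: 2000
  seccompProfile:
    type: RuntimeDefault
# Container level (spec.containers[].securityContext)
securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL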
Together with a RuntimeDefault seccomp profile, these settings meet the restricted Pod Security Standard, the strictest level available in Kubernetes.
readOnlyRootFilesystem: true requires writable volumes for /tmp (Quarkus class generation, JFR recordings) and /opt/keycloak/data (cluster state, tenant config). Without these, the pod fails with a read-only filesystem error. A minimal working pair is an emptyDir with medium: Memory for /tmp and an emptyDir for /opt/keycloak/data.
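The security context settings can be linted before a manifest reaches the cluster. The sketch below checks a subset of the restricted Pod Security Standard against pod-level and container-level settings passed in as plain dictionaries. It covers only the four controls named in the code, leaves out others such as volume types and host namespaces, and is no substitute for the admission controller; `restricted_violations` and the sample values are assumptions for illustration.

```python
def restricted_violations(pod_sc: dict, container_sc: dict) -> list[str]:
    """Check a subset of the 'restricted' profile; container settings override pod settings."""
    problems = []
    run_as_non_root = container_sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
    if run_as_non_root is not True:
        problems.append("runAsNonRoot must be true")
    if container_sc.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be false")
    dropped = (container_sc.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        problems.append("capabilities.drop must include ALL")
    seccomp = container_sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
    if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
        problems.append("seccompProfile.type must be RuntimeDefault or Localhost")
    return problems


if __name__ == "__main__":
    # Illustrative values, not read from a real manifest.
    pod = {"runAsNonRoot": True, "runAsUser": 2000, "seccompProfile": {"type": "RuntimeDefault"}}
    container = {
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    }
    print(restricted_violations(pod, container) or "no violations in the checked subset")
```

readOnlyRootFilesystem is not part of the restricted profile itself, so the check does not require it; it remains a hardening choice on top of the standard.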
The build pipeline produces native images for both AMD64 and ARM64 architectures, with security scanning gating every promotion.
Key pipeline features:
- Immutable calendar-versioned tags in the form YYYY.M.D (e.g., 2026.4.7).
- … keycloak-base/hardened/ change.
- Security scans run per architecture (--platform linux/amd64 and linux/arm64) rather than against the manifest list alone, because architecture-dependent binaries and native libraries can carry CVEs that are invisible from a single-platform scan.

| Metric | Default Keycloak Image | Hardened Image |
|---|---|---|
| OS-level CVEs | 15–30+ | 0 |
| Total packages | 200–400+ | 65 |
| Runs as root | Yes | No (UID 2000) |
| Build tools in runtime | Yes (JDK, Maven traces) | No |
| JVM security tuning | None | Full |
| OOM behavior | Hang / degrade | Clean exit + restart |
| Startup time (optimized) | 30–60s | < 10s |
| Multi-architecture | Single | AMD64 + ARM64 |
| CVE documentation | None | OpenVEX |
A reasonable starting point in Kubernetes is 1 CPU / 2 GB RAM for development and 2 CPU / 4 GB RAM for production workloads. MaxRAMPercentage=70 automatically allocates 70% of the container memory limit to the JVM heap. Tune from there based on JFR and Prometheus metrics under real traffic.
Wolfi OS is an open-source project initiated and maintained by Chainguard. The Wolfi build recipes are released under the Apache License 2.0. Individual packages bundled in the image carry their own upstream licenses — Apache-2.0, MIT, GPL-2.0, GPL-3.0, LGPL-2.1, and GCC runtime exceptions among them — and the complete per-package license inventory ships in the image SBOM.
"Wolfi", "Chainguard", and "Sigstore" are trademarks of their respective owners. References here are for attribution only and do not imply endorsement, sponsorship, or affiliation. If you build downstream images on Wolfi, track package-level license obligations through your SBOM tooling rather than relying on a base-image disclaimer to cover them.
This post opens a series on running IAM in regulated production. Future installments will cover IAM observability with OpenTelemetry, gateway-level authorization enforcement, multi-tenant operations, and air-gapped identity deployments. The immediate follow-up is narrower: we will grade this image against the Cyber Resilience Act's Annex I essential requirements line by line — what we already meet, what needs additional controls in the deployment layer, and where the open questions are. If you want to read it when it lands, subscribe to the Keymate newsletter or follow us on LinkedIn.
Keymate builds identity infrastructure that goes beyond authentication. If you are running Keycloak in regulated production and want a hardened image that comes with the security posture described above already wired in, we would love to talk.