
Production Hardening

Goal

Apply security hardening practices to your Keymate deployment to meet production security requirements. At the end of this guide, your deployment has hardened identity configuration, encrypted service communication, automated TLS, API gateway protection, network isolation, audit logging, and secure credential management.

Audience

Security engineers, platform engineers, and operators responsible for the security posture of production Keymate deployments.

Prerequisites

Before You Start

Production hardening builds on top of a properly deployed and operationally sound platform. Complete the Deployment Best Practices guide before applying security hardening. Hardening a misconfigured deployment adds complexity without improving security.

Warning: Security hardening is not a one-time activity. Review these practices after every major upgrade and whenever compliance requirements change.

Steps

1. Harden the identity provider

The identity provider is the authentication entry point for all users. Hardening it reduces the attack surface for credential-based attacks.

Recommended actions:

Area | Action
Admin console | Restrict admin console access to internal networks or VPN. Never expose the admin console publicly
Brute-force protection | Enable brute-force detection with account lockout after repeated failed attempts
Session policies | Set session timeouts: idle timeout (15-30 minutes), max session lifetime (8-12 hours)
Token lifetime | Set access token lifetime based on your security requirements (5-15 minutes for high-security environments)
Password policies | Enforce minimum length (12+ characters), complexity requirements, and password history
Unused flows | Disable authentication flows you do not use (e.g., direct grant if not needed)
Default accounts | Change or disable default administrative accounts after initial setup
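
The recommended values above can be checked automatically. The following is a minimal sketch, assuming you can export the relevant settings from your identity provider into a simple structure. The `IdpSettings` class and its field names are hypothetical and do not correspond to any Keymate or identity provider API. The upper bounds come from the table above; the token lifetime bound applies to high-security environments.

```python
from dataclasses import dataclass


@dataclass
class IdpSettings:
    """Hypothetical snapshot of identity provider settings (illustrative names)."""
    idle_timeout_minutes: int
    max_session_hours: int
    access_token_minutes: int
    min_password_length: int
    brute_force_protection: bool
    direct_grant_enabled: bool


def check_idp_settings(s: IdpSettings) -> list[str]:
    """Return a list of findings; an empty list means every check passed."""
    findings = []
    if s.idle_timeout_minutes > 30:
        findings.append("idle timeout exceeds 30 minutes")
    if s.max_session_hours > 12:
        findings.append("max session lifetime exceeds 12 hours")
    if s.access_token_minutes > 15:
        findings.append("access token lifetime exceeds 15 minutes")
    if s.min_password_length < 12:
        findings.append("minimum password length below 12 characters")
    if not s.brute_force_protection:
        findings.append("brute-force detection is disabled")
    if s.direct_grant_enabled:
        findings.append("direct grant flow is enabled; disable it if unused")
    return findings


if __name__ == "__main__":
    current = IdpSettings(60, 10, 10, 8, True, False)
    for finding in check_idp_settings(current):
        print("FINDING:", finding)
```

Running a check like this after each upgrade helps catch settings that were reset to defaults.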

2. Enable service mesh encryption (mTLS)

The service mesh provides mutual TLS between all platform services, encrypting and authenticating all inter-service communication.

Recommended actions:

Area | Action
mTLS mode | Set the service mesh to strict mTLS mode: reject any unencrypted inter-service traffic
Certificate rotation | Verify the service mesh automatically rotates mTLS certificates on a regular schedule
Peer authentication | Confirm that all platform namespaces enforce peer authentication policies

Strict mTLS means that even if an attacker gains access to the cluster network, they cannot intercept or tamper with traffic between Keymate services without valid certificates.
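
As an illustration of a strict peer authentication policy, the sketch below builds a namespace-wide manifest. It assumes an Istio-compatible service mesh, which this guide does not confirm; if your deployment uses a different mesh, the resource kind and fields will differ. The namespace name is a placeholder.

```python
import json


def strict_peer_authentication(namespace: str) -> dict:
    """Build an Istio-style PeerAuthentication manifest enforcing strict mTLS.

    Assumption: the mesh is Istio-compatible. A policy named "default" with
    no selector applies to every workload in the given namespace.
    """
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    # "keymate-platform" is a placeholder, not a namespace defined by this guide.
    print(json.dumps(strict_peer_authentication("keymate-platform"), indent=2))
```

The JSON output can be saved to a file and applied with your usual deployment tooling, once per platform namespace.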

3. Automate TLS for external endpoints

Secure all external-facing endpoints (login pages, API gateway, admin interfaces) with TLS and valid certificates.

Recommended actions:

Area | Action
Automated provisioning | Use certificate automation to provision and renew certificates before they expire
Certificate monitoring | Set up alerts for certificates that expire within 14 days
Protocol version | Enforce TLS 1.2 as minimum; prefer TLS 1.3 where supported
Cipher suites | Disable weak cipher suites (RC4, DES, 3DES, MD5-based)
HSTS | Enable HTTP Strict Transport Security headers on all external endpoints
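
The certificate monitoring and protocol version rows can be sketched with the Python standard library. This is an illustrative check, not a replacement for your monitoring stack. The 14-day threshold comes from the table above; the function names are hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone


def hardened_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the server certificate's notAfter string."""
    ctx = hardened_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the certificate's notAfter time."""
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def expires_soon(not_after: str, now: datetime, threshold_days: int = 14) -> bool:
    """True when the certificate expires within the alert threshold."""
    return days_until_expiry(not_after, now) < threshold_days
```

For example, calling `expires_soon(fetch_not_after("login.example.com"), datetime.now(timezone.utc))` on a schedule would flag certificates that automation failed to renew. The hostname here is a placeholder.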

4. Harden the API gateway

The API gateway sits at the network edge of the platform, handling all inbound traffic and enforcing access policies.

Recommended actions:

Area | Action
Rate limiting | Configure rate limits per client, per Tenant, and per endpoint to prevent abuse
Request validation | Enable request size limits and header validation to block malformed requests
IP restrictions | Restrict access to known IP ranges where applicable (admin endpoints, internal APIs)
CORS policies | Configure strict CORS policies: allow only the specific origins that need access
Error handling | Verify error responses do not leak internal details (stack traces, internal hostnames, component versions)
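
To make the rate limiting row concrete, the sketch below shows one common technique, a token bucket keyed by client, Tenant, and endpoint. It illustrates the concept only and is not how the Keymate API gateway implements rate limiting; in practice you configure limits in the gateway itself. The class name and parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (client_id, tenant_id, endpoint)


class TokenBucketLimiter:
    """In-memory token bucket with one bucket per (client, tenant, endpoint)."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec   # tokens added per second
        self.burst = burst         # maximum tokens a bucket can hold
        self.clock = clock         # injectable for testing
        self._buckets: Dict[Key, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, client_id: str, tenant_id: str, endpoint: str) -> bool:
        """Consume one token if available; return False when the limit is hit."""
        key = (client_id, tenant_id, endpoint)
        now = self.clock()
        tokens, last = self._buckets.get(key, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

Keying the bucket on all three dimensions means one noisy client cannot exhaust the allowance of other clients in the same Tenant, and a burst against one endpoint does not block the others.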

5. Enforce network isolation

Restrict network traffic to only the paths that platform components require.

Recommended actions:

Area | Action
Network policies | Apply Kubernetes NetworkPolicies to restrict ingress and egress per namespace
Data layer isolation | Block direct external access to databases, caches, and message brokers; only application services should reach them
Namespace separation | Ensure each deployment layer runs in its own namespace with scoped access
Egress control | Restrict outbound cluster traffic to only required destinations (DNS, certificate authorities, external integrations)
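
A typical starting point for the network policies row is a default-deny policy per namespace, followed by narrow allow rules. The sketch below generates two standard Kubernetes NetworkPolicy manifests as JSON. The policy names and the namespace are placeholders, and the set of allow rules your deployment needs will be larger than this.

```python
import json


def default_deny(namespace: str) -> dict:
    """Deny all ingress and egress for every pod in the namespace.

    An empty podSelector selects all pods; listing both policy types with
    no rules blocks traffic in both directions.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_dns_egress(namespace: str) -> dict:
    """Allow every pod in the namespace to send DNS queries on port 53.

    The rule has no "to" clause, so it permits port 53 to any destination.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{"ports": [{"protocol": "UDP", "port": 53},
                                  {"protocol": "TCP", "port": 53}]}],
        },
    }


if __name__ == "__main__":
    # "keymate-data" is a placeholder namespace, not one defined by this guide.
    for policy in (default_deny("keymate-data"), allow_dns_egress("keymate-data")):
        print(json.dumps(policy, indent=2))
```

NetworkPolicies are additive, so the DNS rule opens only port 53 while everything else stays blocked by the default-deny policy. Enforcement requires a network plugin that supports NetworkPolicies.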

6. Configure audit logging

Capture security-relevant events for compliance, forensics, and anomaly detection.

What to audit:

Event category | Examples
Authentication | Login attempts (success and failure), password changes, account lockouts
Authorization | Access decisions (granted and denied), policy changes, role assignments
Administration | Tenant creation, user provisioning, configuration changes
System | Component restarts, certificate rotations, health check failures

Where audit data goes:

Audit events flow through the OpenTelemetry pipeline alongside other telemetry. They appear in the Observability dashboards, and you can export them to external tools for compliance archival.
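
If you emit audit events from your own integrations, a consistent structure makes them easier to filter and archive. The sketch below builds a JSON audit record using the four event categories listed above. The field names are hypothetical; this is not the Keymate audit schema or an OpenTelemetry API, only an example of a structured record that a log pipeline could ingest.

```python
import json
from datetime import datetime, timezone
from typing import Optional

AUDIT_CATEGORIES = {"authentication", "authorization", "administration", "system"}


def build_audit_event(category: str, action: str, outcome: str, actor: str,
                      timestamp: Optional[datetime] = None) -> str:
    """Return a JSON audit record; reject categories outside the audit scope."""
    if category not in AUDIT_CATEGORIES:
        raise ValueError(f"unknown audit category: {category}")
    ts = timestamp or datetime.now(timezone.utc)
    event = {
        "timestamp": ts.isoformat(),
        "category": category,
        "action": action,    # e.g. "login", "role_assignment"
        "outcome": outcome,  # e.g. "success", "failure", "denied"
        "actor": actor,      # user or service that triggered the event
    }
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    print(build_audit_event("authentication", "login", "failure", "alice"))
```

Recording both successful and failed outcomes matters: a run of failures followed by a success is a common sign of credential guessing.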

Tip: Configure audit log retention based on your compliance requirements. Many regulations require 1-7 years of audit data retention. Export audit logs to long-term storage rather than relying on the in-cluster observability stack for retention.

7. Secure credential management

Protect all credentials used by the platform.

Recommended actions:

Area | Action
Kubernetes Secrets | Store all credentials in Kubernetes Secrets with encryption at rest enabled
External secrets | Use an external secrets operator to sync credentials from a centralized vault
Rotation schedule | Establish a rotation schedule for database passwords, API keys, and service accounts
Least privilege | Grant each component only the permissions it requires; do not share admin credentials
Git exclusion | Never commit plaintext credentials to Git repositories. Commit encrypted credentials only when using a tool designed for that purpose, such as sealed secrets
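
A rotation schedule is only useful if overdue credentials are detected. The following is a minimal sketch, assuming you track the last rotation date of each credential somewhere you can query. The credential names and the 90-day maximum age are illustrative assumptions, not values prescribed by this guide.

```python
from datetime import date, timedelta


def overdue_credentials(last_rotated: dict[str, date], max_age_days: int,
                        today: date) -> list[str]:
    """Return the names of credentials last rotated more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, rotated in last_rotated.items() if rotated < cutoff)


if __name__ == "__main__":
    # Hypothetical inventory; replace with data from your vault or secrets operator.
    inventory = {
        "db-password": date(2025, 1, 1),
        "gateway-api-key": date(2025, 5, 20),
    }
    for name in overdue_credentials(inventory, max_age_days=90, today=date(2025, 6, 1)):
        print("Rotation overdue:", name)
```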

8. Review and validate

After applying all hardening steps, validate the security posture.

Validation checklist:

Check | How to verify
Admin console not publicly accessible | Attempt to access admin URL from outside the allowed network
mTLS enforced | Deploy a test pod without a sidecar and verify it cannot reach platform services
TLS valid on all endpoints | Run a TLS scanner against all external endpoints
Rate limiting active | Send requests exceeding the rate limit and verify they are rejected
Network policies enforced | Attempt to connect directly to a database pod from an unauthorized namespace
Audit logging operational | Trigger a test event and verify it appears in the audit log
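
Several of these checks reduce to "this host and port must not be reachable from here". The sketch below is a small TCP reachability probe you could run from outside the allowed network or from a pod in an unauthorized namespace. The hostnames in the example are placeholders.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder targets: each one should be blocked when probed from an
    # unauthorized location.
    must_be_blocked = [("admin.example.internal", 443), ("postgres.example.internal", 5432)]
    for host, port in must_be_blocked:
        status = "FAIL (reachable)" if is_reachable(host, port) else "ok (blocked)"
        print(f"{host}:{port} {status}")
```

A blocked result only proves that the path from the probe's location is closed, so run the probe from each network position you care about.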

Validation Scenario

Scenario

A security engineer reviews a Keymate deployment before go-live to ensure it meets the organization's production security requirements.

Expected Result

  • No platform service allows unauthenticated access
  • mTLS encrypts all inter-service traffic
  • External endpoints use valid TLS certificates with strong cipher suites
  • Rate limiting prevents API abuse
  • Network policies block unauthorized traffic paths
  • Audit logs capture authentication, authorization, and administration events
  • All credentials reside in Kubernetes Secrets, not in configuration files

How to Verify

  • Run a port scan against the cluster to identify exposed services
  • Attempt unauthenticated access to all endpoints
  • Verify mTLS by inspecting service mesh configuration
  • Check certificate validity and cipher suites with an SSL testing tool
  • Review audit log output for completeness

Troubleshooting

  • Services fail after enabling strict mTLS. Some components may lack sidecar proxies. Verify all platform pods have the service mesh sidecar and that namespace labeling is correct.
  • TLS certificate renewal fails. Check certificate automation logs. Common causes: DNS challenge failures, rate limiting from the certificate authority, and expired credentials for DNS providers.
  • Network policies block legitimate traffic. Start with allow-list policies and add restrictions incrementally. Use network policy logging (if available) to identify blocked traffic before enforcement.
  • Audit log volume is too high. Tune the audit level to capture security events without logging every routine operation. Focus on authentication, authorization decisions, and configuration changes.

Next Steps