Production Hardening
Goal
Apply security hardening practices to your Keymate deployment to meet production security requirements. At the end of this guide, your deployment has hardened identity configuration, encrypted service communication, automated TLS, API gateway protection, network isolation, audit logging, and secure credential management.
Audience
Security engineers, platform engineers, and operators responsible for the security posture of production Keymate deployments.
Prerequisites
- A running Keymate platform (Helm-based or GitOps-based)
- Deployment Best Practices applied (HA, resource limits, monitoring)
- Administrative access to the Kubernetes cluster and platform components
Before You Start
Production hardening builds on top of a properly deployed and operationally sound platform. Complete the Deployment Best Practices guide before applying security hardening. Hardening a misconfigured deployment adds complexity without improving security.
Security hardening is not a one-time activity. Review these practices after every major upgrade and whenever compliance requirements change.
Steps
1. Harden the identity provider
The identity provider is the authentication entry point for all users. Hardening it reduces the attack surface for credential-based attacks.
Recommended actions:
| Area | Action |
|---|---|
| Admin console | Restrict admin console access to internal networks or VPN. Never expose the admin console publicly |
| Brute-force protection | Enable brute-force detection with account lockout after repeated failed attempts |
| Session policies | Set session timeouts: idle timeout (15-30 minutes), max session lifetime (8-12 hours) |
| Token lifetime | Set access token lifetime based on your security requirements (5-15 minutes for high-security environments) |
| Password policies | Enforce minimum length (12+ characters), complexity requirements, and password history |
| Unused flows | Disable authentication flows you do not use (e.g., direct grant if not needed) |
| Default accounts | Change or disable default administrative accounts after initial setup |
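
The ranges in the table can be checked mechanically once the identity provider's settings have been exported. The following is a minimal sketch under stated assumptions: `IdentitySettings` and its field names are hypothetical and do not correspond to a Keymate or identity provider API, and the thresholds are the upper bounds from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IdentitySettings:
    """Hypothetical snapshot of identity provider settings (illustrative names)."""
    idle_timeout_min: int
    max_session_hours: int
    access_token_min: int
    password_min_length: int
    brute_force_protection: bool


def identity_findings(s: IdentitySettings) -> List[str]:
    """Return one finding per setting that falls outside the table's ranges."""
    findings = []
    if s.idle_timeout_min > 30:
        findings.append("idle timeout exceeds 30 minutes")
    if s.max_session_hours > 12:
        findings.append("max session lifetime exceeds 12 hours")
    if s.access_token_min > 15:
        findings.append("access token lifetime exceeds 15 minutes")
    if s.password_min_length < 12:
        findings.append("password minimum length is below 12")
    if not s.brute_force_protection:
        findings.append("brute-force detection is disabled")
    return findings
```

An empty result means the exported values fall within the recommended ranges. It does not cover the items that need manual review, such as admin console exposure and unused flows.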
2. Enable service mesh encryption (mTLS)
The service mesh provides mutual TLS between all platform services, encrypting and authenticating all inter-service communication.
Recommended actions:
| Area | Action |
|---|---|
| mTLS mode | Set the service mesh to strict mTLS mode — reject any unencrypted inter-service traffic |
| Certificate rotation | Verify the service mesh automatically rotates mTLS certificates on a regular schedule |
| Peer authentication | Confirm that all platform namespaces enforce peer authentication policies |
Strict mTLS means that even if an attacker gains access to the cluster network, they cannot intercept or tamper with traffic between Keymate services without valid certificates.
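
To confirm that every platform namespace enforces strict mode, it can help to compare the effective mode per namespace against the expected value. This sketch assumes you have already collected the modes from your service mesh into a dictionary. The mode strings (`STRICT`, `PERMISSIVE`) and the namespace names in the usage note are assumptions, since this guide does not name a specific mesh.

```python
from typing import Dict, List


def namespaces_not_strict(modes: Dict[str, str]) -> List[str]:
    """Return the namespaces whose peer authentication mode is not STRICT, sorted."""
    return sorted(ns for ns, mode in modes.items() if mode.upper() != "STRICT")
```

For example, `namespaces_not_strict({"keymate-core": "STRICT", "keymate-data": "PERMISSIVE"})` returns `["keymate-data"]`, which identifies the namespace that still accepts unencrypted traffic.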
3. Automate TLS for external endpoints
Secure all external-facing endpoints (login pages, API gateway, admin interfaces) with TLS and valid certificates.
Recommended actions:
| Area | Action |
|---|---|
| Automated provisioning | Use certificate automation to provision and renew certificates before they expire |
| Certificate monitoring | Set up alerts for certificates that expire within 14 days |
| Protocol version | Enforce TLS 1.2 as the minimum version; prefer TLS 1.3 where supported |
| Cipher suites | Disable weak cipher suites (RC4, DES, 3DES, MD5-based) |
| HSTS | Enable HTTP Strict Transport Security headers on all external endpoints |
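
The protocol floor and the 14-day alert window can be expressed with Python's standard `ssl` module. This is a sketch of a monitoring-side check, not of how the gateway itself is configured. The function names and `ALERT_WINDOW_DAYS` are illustrative.

```python
import ssl
from datetime import datetime, timezone

ALERT_WINDOW_DAYS = 14  # alert threshold taken from the table above


def hardened_client_context() -> ssl.SSLContext:
    """Client context that refuses to negotiate anything older than TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter value.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2030 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def needs_renewal_alert(not_after: str, now: datetime) -> bool:
    """True when the certificate expires within the alert window."""
    return days_until_expiry(not_after, now) <= ALERT_WINDOW_DAYS
```

A probe can open a connection with `hardened_client_context()`, read `notAfter` from `getpeercert()`, and pass it to `needs_renewal_alert`. A handshake failure against an endpoint indicates that it does not offer TLS 1.2 or later.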
4. Harden the API gateway
The API gateway sits at the network edge of the platform, handling all inbound traffic and enforcing access policies.
Recommended actions:
| Area | Action |
|---|---|
| Rate limiting | Configure rate limits per client, per Tenant, and per endpoint to prevent abuse |
| Request validation | Enable request size limits and header validation to block malformed requests |
| IP restrictions | Restrict access to known IP ranges where applicable (admin endpoints, internal APIs) |
| CORS policies | Configure strict CORS policies — allow only the specific origins that need access |
| Error handling | Verify error responses do not leak internal details (stack traces, internal hostnames, component versions) |
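
Rate limiting is normally configured in the gateway itself. The sketch below only illustrates the token bucket technique that many gateways use, so the effect of the rate and burst parameters is easier to reason about. `TokenBucketLimiter` is a hypothetical in-memory class. As a simplification, it keeps one bucket per combination of client, tenant, and endpoint rather than three independent limits.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (client_id, tenant_id, endpoint)


class TokenBucketLimiter:
    """In-memory token bucket keyed by client, tenant, and endpoint."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        # key -> (tokens remaining, time of last request)
        self._buckets: Dict[Key, Tuple[float, float]] = {}

    def allow(self, client_id: str, tenant_id: str, endpoint: str) -> bool:
        key = (client_id, tenant_id, endpoint)
        now = self.clock()
        tokens, last = self._buckets.get(key, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[key] = (tokens, now)
        return allowed
```

With `rate_per_sec=1.0` and `burst=2`, a client can send two requests back to back, is rejected on the third, and regains one request per second afterwards. Other clients are unaffected because each key has its own bucket.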
5. Enforce network isolation
Restrict network traffic to only the paths that platform components require.
Recommended actions:
| Area | Action |
|---|---|
| Network policies | Apply Kubernetes NetworkPolicies to restrict ingress and egress per namespace |
| Data layer isolation | Block direct external access to databases, caches, and message brokers — only application services should reach them |
| Namespace separation | Ensure each deployment layer runs in its own namespace with scoped access |
| Egress control | Restrict outbound cluster traffic to only required destinations (DNS, certificate authorities, external integrations) |
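
A common pattern for the first two rows is a default-deny policy per namespace plus narrow allow rules. The sketch below generates both as Kubernetes NetworkPolicy manifests in JSON, which `kubectl apply -f` accepts. The namespace names and the port are placeholders, not Keymate defaults. The namespace selector relies on the `kubernetes.io/metadata.name` label that recent Kubernetes versions set on every namespace.

```python
import json
from typing import Any, Dict


def default_deny(namespace: str) -> Dict[str, Any]:
    """NetworkPolicy that selects every pod and allows no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_from_namespace(namespace: str, source_namespace: str, port: int) -> Dict[str, Any]:
    """NetworkPolicy that admits TCP ingress on `port` from one source namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-from-{source_namespace}", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"namespaceSelector": {"matchLabels": {
                    "kubernetes.io/metadata.name": source_namespace}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder namespaces and port; replace with your own values.
    for policy in (default_deny("keymate-data"),
                   allow_from_namespace("keymate-data", "keymate-apps", 5432)):
        print(json.dumps(policy, indent=2))
```

Because the default-deny policy also blocks egress, pods in that namespace lose DNS resolution until an explicit egress rule permits it. Plan the allow rules before applying the deny rule.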
6. Configure audit logging
Capture security-relevant events for compliance, forensics, and anomaly detection.
What to audit:
| Event category | Examples |
|---|---|
| Authentication | Login attempts (success and failure), password changes, account lockouts |
| Authorization | Access decisions (granted and denied), policy changes, role assignments |
| Administration | Tenant creation, user provisioning, configuration changes |
| System | Component restarts, certificate rotations, health check failures |
Where audit data goes:
Audit events flow through the OpenTelemetry pipeline alongside other telemetry. They appear in the Observability dashboards, and you can export them to external tools for compliance archival.
Configure audit log retention based on your compliance requirements. Many regulations require 1-7 years of audit data retention. Export audit logs to long-term storage rather than relying on the in-cluster observability stack for retention.
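
When exporting audit events to long-term storage, a consistent record shape makes later queries and retention policies simpler. The following is a sketch only: the field names are illustrative and are not a Keymate or OpenTelemetry schema. The categories mirror the table above.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict

AUDIT_CATEGORIES = {"authentication", "authorization", "administration", "system"}


def audit_event(category: str, action: str, outcome: str, actor: str,
                at: datetime) -> Dict[str, Any]:
    """Build a flat audit record; `at` must be a timezone-aware datetime."""
    if category not in AUDIT_CATEGORIES:
        raise ValueError(f"unknown audit category: {category}")
    return {
        "timestamp": at.astimezone(timezone.utc).isoformat(),
        "category": category,
        "action": action,
        "outcome": outcome,
        "actor": actor,
    }


def to_json_line(event: Dict[str, Any]) -> str:
    """Serialize one event per line, with stable key order, for export."""
    return json.dumps(event, sort_keys=True)
```

Rejecting unknown categories at the source keeps the exported data set limited to the event types you have decided to retain.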
7. Secure credential management
Protect all credentials used by the platform.
Recommended actions:
| Area | Action |
|---|---|
| Kubernetes Secrets | Store all credentials in Kubernetes Secrets with encryption at rest enabled |
| External secrets | Use an external secrets operator to sync credentials from a centralized vault |
| Rotation schedule | Establish a rotation schedule for database passwords, API keys, and service accounts |
| Least privilege | Grant each component only the permissions it requires — no shared admin credentials |
| Git exclusion | Never commit plaintext credentials to Git repositories. Commit encrypted credentials only when using a tool built for that purpose, such as sealed secrets |
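
A rotation schedule is only useful if overdue credentials are detected. As a minimal sketch, assuming you record the last rotation time of each credential somewhere you can query, the check reduces to a date comparison. The credential names in the usage note are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def overdue_credentials(last_rotated: Dict[str, datetime], max_age: timedelta,
                        now: datetime) -> List[str]:
    """Return the names of credentials last rotated more than `max_age` ago, sorted."""
    return sorted(name for name, rotated in last_rotated.items()
                  if now - rotated > max_age)
```

For example, with a 90-day `max_age`, a `db-password` entry rotated 120 days ago is reported while an `api-key` entry rotated 10 days ago is not. The maximum age itself depends on your compliance requirements.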
8. Review and validate
After applying all hardening steps, validate the security posture.
Validation checklist:
| Check | How to verify |
|---|---|
| Admin console not publicly accessible | Attempt to access admin URL from outside the allowed network |
| mTLS enforced | Deploy a test pod without a sidecar and verify it cannot reach platform services |
| TLS valid on all endpoints | Run a TLS scanner against all external endpoints |
| Rate limiting active | Send requests exceeding the rate limit and verify they are rejected |
| Network policies enforced | Attempt to connect directly to a database pod from an unauthorized namespace |
| Audit logging operational | Trigger a test event and verify it appears in the audit log |
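
Several checks in this list come down to confirming that a connection attempt fails from a location that should not have access. A small TCP probe, run from outside the allowed network or from a pod in an unauthorized namespace, is one way to do that. This sketch uses only the standard library, and `can_connect` is an illustrative helper name.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For the admin console and database checks, the expected result from an unauthorized location is `False`. A `True` result means the restriction or network policy is not being enforced. Note that a successful TCP connection does not by itself prove that mTLS is bypassed, because a strict mesh may accept the connection and then reject the request.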
Validation Scenario
Scenario
A security engineer reviews a Keymate deployment before go-live to ensure it meets the organization's production security requirements.
Expected Result
- No platform service allows unauthenticated access
- mTLS encrypts all inter-service traffic
- External endpoints use valid TLS certificates with strong cipher suites
- Rate limiting prevents API abuse
- Network policies block unauthorized traffic paths
- Audit logs capture authentication, authorization, and administration events
- All credentials reside in Kubernetes Secrets, not in configuration files
How to Verify
- Run a port scan against the cluster to identify exposed services
- Attempt unauthenticated access to all endpoints
- Verify mTLS by inspecting the service mesh configuration and confirming that every platform namespace enforces strict mode
- Check certificate validity and cipher suites with an SSL testing tool
- Review audit log output for completeness
Troubleshooting
- Services fail after enabling strict mTLS. Some components may lack sidecar proxies. Verify all platform pods have the service mesh sidecar and that namespace labeling is correct.
- TLS certificate renewal fails. Check certificate automation logs. Common causes: DNS challenge failures, rate limiting from the certificate authority, and expired credentials for DNS providers.
- Network policies block legitimate traffic. Start with policies that allow the traffic you know is required, then tighten them incrementally. Use network policy logging (if available) to identify traffic that would be blocked before you enforce a stricter policy.
- Audit log volume is too high. Tune the audit level to capture security events without logging every routine operation. Focus on authentication, authorization decisions, and configuration changes.
Next Steps
- Observability Overview — Monitor security events through the telemetry pipeline
- Deployment Best Practices — Review operational best practices alongside security hardening
- Business Continuity & SLO Guardrails — Plan for failure scenarios and recovery