Deployment Best Practices
Goal
Apply production-grade best practices to your Keymate deployment. This guide covers high availability, resource management, monitoring activation, backup planning, upgrade strategies, and namespace organization — turning an initial installation into a production-ready platform.
Audience
Platform engineers and operators responsible for running Keymate in production environments.
Prerequisites
- A running Keymate deployment (Helm-based or GitOps-based)
- Administrative access to the Kubernetes cluster
- Familiarity with Kubernetes resource management (requests, limits, PodDisruptionBudgets)
Before You Start
These best practices apply after the initial installation is complete and verified. If you have not installed the platform yet, start with the Pre-Deployment Checklist and the appropriate installation guide.
Steps
1. Configure high availability
Run critical components with multiple replicas to eliminate single points of failure.
Recommended replica counts for production:
| Component | Minimum replicas | Notes |
|---|---|---|
| Identity Provider | 2 | Session affinity recommended |
| Authorization Engine | 2 | Stateless — scales horizontally |
| API Gateway | 2 | Handles all inbound traffic |
| Platform Services | 2 | Per service instance |
| Relational Database | 2 (primary + replica) | Use operator-managed failover |
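With a Helm-based deployment, replica counts are typically set in the values file. A minimal sketch, assuming hypothetical value keys (`identityProvider.replicaCount`, and so on) — check your chart's actual values schema before using these names:

```yaml
# values-production.yaml — replica counts for HA (key names are illustrative)
identityProvider:
  replicaCount: 2
authorizationEngine:
  replicaCount: 2
apiGateway:
  replicaCount: 2
```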
Configure PodDisruptionBudgets to prevent voluntary disruptions from taking all replicas offline at once:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: identity-provider-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: identity-provider
```
2. Set resource requests and limits
Set explicit CPU and memory requests and limits for every Keymate component. This prevents resource contention and ensures the Kubernetes scheduler places pods on nodes with sufficient capacity.
Principles:
- Set requests to the expected steady-state usage — this reserves the resources
- Set limits to handle peak workloads — this prevents a single component from consuming all node resources
- Never deploy production workloads without resource requests
```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2000m"
    memory: "2Gi"
```
Start with conservative limits and adjust based on observed usage through the Metrics dashboard. Over-provisioning wastes resources; under-provisioning causes OOM kills and CPU throttling.
3. Enable monitoring from day one
Do not wait for an incident to set up monitoring. Enable observability immediately after installation.
Minimum monitoring setup:
- Activate the Observability layer during installation
- Configure dashboards for key indicators: request latency, error rate, CPU/memory usage, database connection pool utilization
- Set up alerts for critical conditions: pod restarts, high error rates, certificate expiration, database replication lag
- Verify that logs, metrics, and traces are flowing into the observability platform
If you use external monitoring tools, configure telemetry export through the OpenTelemetry-first model. You can run both the built-in observability stack and your own tools simultaneously.
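As one concrete example of an alert for a critical condition, a Prometheus alert rule for frequent pod restarts might look like the following. This is a sketch that assumes the Prometheus Operator CRDs are installed; the metric, threshold, and rule names are illustrative, not Keymate-specific:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keymate-critical-alerts
spec:
  groups:
    - name: keymate.availability
      rules:
        - alert: PodRestartingFrequently
          # Fires when a container restarts more than 3 times in 15 minutes
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting frequently"
```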
4. Implement a backup strategy
Protect against data loss by establishing regular backups for all persistent data.
What to back up:
| Data | Frequency | Retention |
|---|---|---|
| Identity provider database | Daily + before upgrades | 30 days minimum |
| Authorization engine data | Daily + before upgrades | 30 days minimum |
| Platform configuration | On every change (GitOps handles this automatically) | Full Git history |
| TLS certificates and secrets | On change | Align with certificate lifecycle |
Backup principles:
- Automate backups — teams forget manual backups under pressure
- Store backups in a separate location from the cluster (object storage, off-site storage)
- Test restores regularly — a backup you cannot restore is not a backup
- Document the restore procedure and ensure at least two team members can execute it
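A daily automated backup can be implemented as a Kubernetes CronJob. The sketch below assumes a PostgreSQL identity database; the image, secret, and volume names are placeholders, not actual Keymate resource names:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: identity-db-backup
spec:
  schedule: "0 2 * * *"   # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump "$DATABASE_URL" | gzip > /backup/identity-$(date +%F).sql.gz
              envFrom:
                - secretRef:
                    name: identity-db-credentials   # placeholder secret name
              volumeMounts:
                - name: backup-volume
                  mountPath: /backup
          volumes:
            - name: backup-volume
              persistentVolumeClaim:
                claimName: backup-pvc   # placeholder; prefer off-cluster object storage
```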
5. Plan your upgrade strategy
Keymate is a multi-component platform. Upgrading requires coordination across layers.
Upgrade principles:
- Upgrade one layer at a time following the dependency order: infrastructure → data → application → observability
- Read release notes before every upgrade — pay attention to breaking changes and migration requirements
- Test upgrades in a non-production environment before applying to production
- Take database backups before every upgrade
- Monitor the platform closely after each upgrade for unexpected behavior
For GitOps deployments:
Upgrades are Git commits. Promote version changes through your environment pipeline (dev → staging → production) and let ArgoCD apply them.
For Helm deployments:
Use helm upgrade per component in the correct dependency order. Validate each layer before proceeding to the next.
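The dependency-ordered upgrade can be sketched as a sequence of commands. The release and chart names below are illustrative, not the actual Keymate chart names — substitute your own:

```shell
# Upgrade layers in dependency order; validate each layer before the next
helm upgrade keymate-infra keymate/infrastructure -n platform-infra -f values-prod.yaml
kubectl rollout status deploy -n platform-infra --timeout=5m

helm upgrade keymate-data keymate/data -n platform-data -f values-prod.yaml
kubectl rollout status statefulset -n platform-data --timeout=10m

helm upgrade keymate-app keymate/application -n platform-app -f values-prod.yaml
helm upgrade keymate-obs keymate/observability -n platform-obs -f values-prod.yaml
```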
6. Organize namespaces
Use dedicated namespaces to isolate components by function. This improves access control, resource quotas, and operational visibility.
Recommended namespace layout:
| Namespace | Purpose |
|---|---|
| Platform infrastructure | Service mesh, certificate management |
| Platform data | Databases, caches, message brokers |
| Platform application | Identity, authorization, gateway, services |
| Platform observability | Telemetry, dashboards, alerting |
Avoid deploying all Keymate components into a single namespace. Namespace separation enables fine-grained RBAC, independent resource quotas, and cleaner operational visibility.
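Namespace separation lets you attach an independent ResourceQuota to each namespace. A sketch, with illustrative namespace names and quota values — size them from observed usage:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: platform-app-quota
  namespace: platform-app   # illustrative namespace name
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```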
7. Secure secrets management
Production deployments require disciplined secrets handling.
- Store all credentials in Kubernetes Secrets, not in Helm values files or Git repositories
- Use an external secrets operator (e.g., External Secrets Operator, Vault) if your organization requires centralized secrets management
- Rotate database passwords and API keys on a regular schedule
- Audit secret access through Kubernetes audit logs
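If you use the External Secrets Operator, a credential can be synced from a centralized store into a Kubernetes Secret. A sketch, assuming the operator is installed; the store, path, and key names are placeholders:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: identity-db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend        # placeholder SecretStore name
    kind: ClusterSecretStore
  target:
    name: identity-db-credentials   # Kubernetes Secret to create
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: keymate/identity-db    # placeholder path in the external store
        property: url
```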
8. Configure network policies
Restrict network traffic to only the communication paths that platform components require.
- Allow inter-namespace traffic only between components that need to communicate
- Block direct external access to data layer services (databases, caches)
- Use the service mesh for mTLS enforcement between all platform services
- Review network policies after adding new components or tenants
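Blocking direct access to the data layer can be expressed as a NetworkPolicy that only admits traffic from the application namespace. The namespace names, pod labels, and port below are illustrative assumptions, not Keymate defaults:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-app-only
  namespace: platform-data   # illustrative data-layer namespace
spec:
  podSelector:
    matchLabels:
      app: relational-database   # illustrative pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: platform-app
      ports:
        - protocol: TCP
          port: 5432
```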
Validation Scenario
Scenario
After applying all best practices, verify that the production deployment meets operational readiness criteria.
Expected Result
- All critical components run with 2+ replicas
- PodDisruptionBudgets protect all stateful services
- Every pod has resource requests and limits
- Monitoring dashboards show metrics, logs, and traces flowing
- Alerts fire correctly for test conditions (e.g., kill a pod, verify alert)
- Database backup runs successfully and a test restore completes
- Network policies block unauthorized traffic paths
How to Verify
- `kubectl get pdb -A` — verify PodDisruptionBudgets exist
- `kubectl top pods -A` — verify resource usage is within limits
- Trigger a test alert and confirm it reaches the notification channel
- Restore a database backup to a test instance and verify data integrity
Troubleshooting
- Pods evicted or OOMKilled. Resource limits are too low. Increase memory limits and review actual usage in the metrics dashboard.
- Rolling update takes too long. The PodDisruptionBudget `minAvailable` may be too high relative to the replica count. Ensure at least one pod can be disrupted at a time.
- Backup job fails. Check storage permissions and available disk space in the backup destination. Verify network access from the cluster to the backup storage.
- Upgrade breaks a component. Roll back to the previous version using `helm rollback` or a Git revert. Check release notes for missed migration steps.
Next Steps
- Production Hardening — Apply security-specific hardening on top of these operational best practices
- Scaling and Performance — Scale beyond the initial high availability setup
- Observability Overview — Deep dive into monitoring, alerting, and telemetry export