Logs
Goal
Use the Keymate observability platform to collect, search, filter, and correlate logs across all platform components. By the end of this guide, you will be able to find relevant log entries quickly, filter by service, severity, or time range, and correlate logs with traces and metrics for efficient troubleshooting.
Audience
Operators responsible for monitoring and troubleshooting the Keymate platform in production.
Prerequisites
- A running Keymate deployment with the observability layer deployed
- Access to the observability dashboard
Before You Start
Keymate collects logs from all platform components automatically through the OpenTelemetry pipeline. You do not need additional instrumentation. The platform structures logs in JSON format, which enables precise filtering and correlation.
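Because the logs are structured JSON, each entry can be parsed into fields and filtered precisely rather than matched as free text. The sketch below shows what this looks like; the exact field names (`level`, `service`, `trace_id`) are illustrative assumptions, not the platform's documented schema.

```python
import json

# A hypothetical structured log line, similar in shape to what an
# OpenTelemetry pipeline might emit; field names are assumptions.
raw_line = json.dumps({
    "timestamp": "2024-05-01T12:34:56Z",
    "level": "ERROR",
    "service": "api-gateway",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "message": "upstream connection failure",
})

entry = json.loads(raw_line)

# Structured fields allow exact filtering instead of full-text matching.
is_relevant = entry["level"] == "ERROR" and entry["service"] == "api-gateway"
print(is_relevant)  # True
```

Full-text search would also match the string "ERROR" inside a message body; filtering on the parsed `level` field avoids those false positives.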
Log Sources
The platform collects logs from three categories:
| Source | What it captures | Examples |
|---|---|---|
| Application logs | Business logic events from Keymate services | Authentication attempts, authorization decisions, policy evaluations, API requests |
| Infrastructure logs | Kubernetes and platform infrastructure events | Pod lifecycle, node events, service mesh proxy logs, certificate operations |
| Audit logs | Security-relevant operational events | Admin actions, configuration changes, tenant operations, role assignments |
Steps
1. Access the log explorer
Open the observability dashboard and navigate to the log explorer. This interface provides full-text search across all collected logs with filtering capabilities.
2. Search by time range
Start every investigation by narrowing the time range. Most operational issues are bounded in time, so begin with a narrow window around the reported incident and expand it only if needed.
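Conceptually, a time-range filter keeps only entries whose timestamp falls inside the investigation window. A minimal sketch, assuming ISO 8601 timestamps in a `timestamp` field (an assumption for illustration):

```python
from datetime import datetime, timezone

def in_window(entry, start, end):
    """Keep only entries whose timestamp falls inside [start, end)."""
    ts = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
    return start <= ts < end

logs = [
    {"timestamp": "2024-05-01T12:00:00Z", "message": "before incident"},
    {"timestamp": "2024-05-01T12:31:00Z", "message": "during incident"},
]

# A 15-minute window around the reported incident.
start = datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc)
end = datetime(2024, 5, 1, 12, 45, tzinfo=timezone.utc)

window = [e for e in logs if in_window(e, start, end)]
print(len(window))  # 1
```

Narrowing first also keeps queries fast: the backend scans far less data for a 15-minute window than for a full day.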
3. Filter by service
Filter logs to a specific service when you know which component is involved. Common service filters:
| Service | When to filter |
|---|---|
| Identity provider | Authentication failures, session issues, federation errors |
| Authorization engine | Access denied errors, policy evaluation issues |
| API gateway | Request routing errors, rate limiting events, TLS issues |
| Platform services | Business logic errors, integration failures |
4. Filter by severity
Use severity levels to focus on the most relevant entries:
| Level | When to use |
|---|---|
| ERROR | Active issues requiring investigation |
| WARN | Potential issues that may escalate |
| INFO | Normal operational events, useful for understanding flow |
| DEBUG | Detailed diagnostic information (typically high volume, use sparingly) |
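Service and severity filters are typically combined: restrict to the suspect service, then keep only entries at or above a minimum level. A sketch under the same assumed field names (`service`, `level`):

```python
def filter_logs(entries, service=None, min_level="INFO"):
    """Filter structured entries by service and minimum severity."""
    order = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}
    return [
        e for e in entries
        if (service is None or e["service"] == service)
        and order[e["level"]] >= order[min_level]
    ]

entries = [
    {"service": "api-gateway", "level": "ERROR", "message": "upstream timeout"},
    {"service": "api-gateway", "level": "DEBUG", "message": "routing decision"},
    {"service": "identity-provider", "level": "WARN", "message": "slow token issuance"},
]

# Only the api-gateway ERROR entry survives both filters.
matches = filter_logs(entries, service="api-gateway", min_level="WARN")
print(len(matches))  # 1
```

Treating severities as an ordered scale (rather than exact matches) lets one query cover "WARN and worse", which is usually what an investigation needs.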
5. Correlate with traces
When a log entry relates to a specific request, use the trace ID to find the full distributed trace. This shows the entire request flow across services, helping you pinpoint where an issue originated.
The observability platform links log entries to their corresponding traces when a trace ID is present.
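The underlying idea is simple: entries that share a trace ID belong to the same request, regardless of which service emitted them. A sketch of that grouping, assuming a `trace_id` field:

```python
from collections import defaultdict

def group_by_trace(entries):
    """Group log entries that share a trace ID so one request can be
    followed across services."""
    traces = defaultdict(list)
    for e in entries:
        if e.get("trace_id"):  # entries without a trace ID cannot be correlated
            traces[e["trace_id"]].append(e)
    return traces

entries = [
    {"service": "api-gateway", "trace_id": "abc123", "message": "request received"},
    {"service": "authorization-engine", "trace_id": "abc123", "message": "policy denied"},
    {"service": "identity-provider", "trace_id": "def456", "message": "token issued"},
]

traces = group_by_trace(entries)
print(len(traces["abc123"]))  # 2
```

Here the two `abc123` entries reconstruct one request's path: it entered at the gateway and was denied by the authorization engine.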
6. Set up log-based alerts
Create alerts that trigger when specific log patterns appear:
- Error rate exceeds a threshold within a time window
- Specific error messages appear (e.g., database connection failures)
- Authentication failure rate spikes (potential brute-force attack)
- Certificate expiration warnings
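The first pattern, an error-rate threshold over a time window, can be sketched as follows. The entry shape (`level`, an epoch-seconds `epoch` field) and the evaluation style are assumptions; a real alerting backend evaluates this continuously rather than on demand:

```python
def error_rate_alert(entries, threshold, window_seconds, now):
    """Fire when the number of ERROR entries inside the trailing
    window exceeds the threshold."""
    recent_errors = [
        e for e in entries
        if e["level"] == "ERROR" and now - e["epoch"] <= window_seconds
    ]
    return len(recent_errors) > threshold

entries = [
    {"level": "ERROR", "epoch": 995},
    {"level": "ERROR", "epoch": 998},
    {"level": "INFO", "epoch": 999},
    {"level": "ERROR", "epoch": 900},  # outside the 60-second window
]

# Two errors inside the trailing 60 seconds exceed a threshold of 1.
print(error_rate_alert(entries, threshold=1, window_seconds=60, now=1000))  # True
```

Thresholding on a rate rather than on single entries avoids paging on isolated, self-healing errors while still catching sustained failures.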
Validation Scenario
Scenario
An operator receives a report that some API requests are returning 500 errors. They use logs to identify the root cause.
Expected Result
- The operator filters logs by the API gateway service and ERROR severity
- They find log entries showing upstream connection failures to the authorization engine
- They correlate with the trace ID to see the full request path
- They identify that the authorization engine is restarting due to an out-of-memory condition
How to Verify
- Search logs for `level:ERROR` within the reported time range
- Verify the log entries include structured fields (service name, trace ID, timestamp)
- Confirm the trace link navigates to the corresponding distributed trace
Troubleshooting
- No logs appearing. Verify the telemetry collector is running and can reach the log storage. Check collector logs for pipeline errors.
- Logs missing from a specific service. Verify the service pod has the telemetry sidecar or SDK. Check namespace labels for telemetry collection.
- Search is slow. Narrow the time range and add specific filters before running broad searches. Structured field queries perform better than full-text search.
Next Steps
- Metrics — Complement log investigation with metric dashboards
- Traces & Root Cause Analysis — Follow trace IDs from logs to full request flows
- Export & Tooling Portability — Export logs to external tools