Logs
Goal
Use the Keymate observability platform to collect, search, filter, and correlate logs across all platform components. By the end of this guide, you will be able to find relevant log entries quickly, filter by service, severity, or time range, and correlate logs with traces and metrics for efficient troubleshooting.
Audience
Operators responsible for monitoring and troubleshooting the Keymate platform in production.
Prerequisites
- A running Keymate deployment with the observability layer deployed
- Access to the observability dashboard
Before You Start
Keymate collects logs from all platform components automatically through the OpenTelemetry pipeline. You do not need additional instrumentation. The platform structures logs in JSON format, which enables precise filtering and correlation.
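Because the logs are structured JSON, each entry can be parsed into fields and filtered precisely rather than matched as free text. The sketch below shows what this looks like; the exact field names (`level`, `service`, `trace_id`) are illustrative assumptions, not the platform's documented schema.

```python
import json

# A hypothetical structured log line, similar in shape to what an
# OpenTelemetry pipeline might emit; field names are assumptions.
raw_line = json.dumps({
    "timestamp": "2024-05-01T12:34:56Z",
    "level": "ERROR",
    "service": "api-gateway",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "message": "upstream connection failure",
})

entry = json.loads(raw_line)

# Structured fields allow exact filtering instead of full-text matching.
is_relevant = entry["level"] == "ERROR" and entry["service"] == "api-gateway"
print(is_relevant)  # True
```

Full-text search would also match the string "ERROR" inside a message body; filtering on the parsed `level` field avoids those false positives.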
Log Sources
The platform collects logs from three categories:
| Source | What it captures | Examples |
|---|---|---|
| Application logs | Business logic events from Keymate services | Authentication attempts, authorization decisions, policy evaluations, API requests |
| Infrastructure logs | Kubernetes and platform infrastructure events | Pod lifecycle, node events, service mesh proxy logs, certificate operations |
| Audit logs | Security-relevant operational events | Admin actions, configuration changes, tenant operations, role assignments |
Steps
1. Access the log explorer
Open the observability dashboard and navigate to the log explorer. This interface provides full-text search across all collected logs with filtering capabilities.
2. Search by time range
Start every investigation by narrowing the time range. Most operational issues are bounded in time, so begin with a narrow window around the reported incident and expand it only if needed.
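Conceptually, a time-range filter keeps only entries whose timestamp falls inside the investigation window. A minimal sketch, assuming ISO 8601 timestamps in a `timestamp` field (an assumption for illustration):

```python
from datetime import datetime, timezone

def in_window(entry, start, end):
    """Keep only entries whose timestamp falls inside [start, end)."""
    ts = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
    return start <= ts < end

logs = [
    {"timestamp": "2024-05-01T12:00:00Z", "message": "before incident"},
    {"timestamp": "2024-05-01T12:31:00Z", "message": "during incident"},
]

# A 15-minute window around the reported incident.
start = datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc)
end = datetime(2024, 5, 1, 12, 45, tzinfo=timezone.utc)

window = [e for e in logs if in_window(e, start, end)]
print(len(window))  # 1
```

Narrowing first also keeps queries fast: the backend scans far less data for a 15-minute window than for a full day.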
3. Filter by service
Filter logs to a specific service when you know which component is involved. Common service filters:
| Service | When to filter |
|---|---|
| Identity provider | Authentication failures, session issues, federation errors |
| Authorization engine | Access denied errors, policy evaluation issues |
| API gateway | Request routing errors, rate limiting events, TLS issues |
| Platform services | Business logic errors, integration failures |
4. Filter by severity
Use severity levels to focus on the most relevant entries:
| Level | When to use |
|---|---|
| ERROR | Active issues requiring investigation |
| WARN | Potential issues that may escalate |
| INFO | Normal operational events, useful for understanding flow |
| DEBUG | Detailed diagnostic information (typically high volume, use sparingly) |
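Service and severity filters are typically combined: restrict to the suspect service, then keep only entries at or above a minimum level. A sketch under the same assumed field names (`service`, `level`):

```python
def filter_logs(entries, service=None, min_level="INFO"):
    """Filter structured entries by service and minimum severity."""
    order = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}
    return [
        e for e in entries
        if (service is None or e["service"] == service)
        and order[e["level"]] >= order[min_level]
    ]

entries = [
    {"service": "api-gateway", "level": "ERROR", "message": "upstream timeout"},
    {"service": "api-gateway", "level": "DEBUG", "message": "routing decision"},
    {"service": "identity-provider", "level": "WARN", "message": "slow token issuance"},
]

# Only the api-gateway ERROR entry survives both filters.
matches = filter_logs(entries, service="api-gateway", min_level="WARN")
print(len(matches))  # 1
```

Treating severities as an ordered scale (rather than exact matches) lets one query cover "WARN and worse", which is usually what an investigation needs.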
5. Correlate with traces
When a log entry relates to a specific request, use the trace ID to find the full distributed trace. This shows the entire request flow across services, helping you pinpoint where an issue originated.
The observability platform links log entries to their corresponding traces when a trace ID is present.
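The underlying idea is simple: entries that share a trace ID belong to the same request, regardless of which service emitted them. A sketch of that grouping, assuming a `trace_id` field:

```python
from collections import defaultdict

def group_by_trace(entries):
    """Group log entries that share a trace ID so one request can be
    followed across services."""
    traces = defaultdict(list)
    for e in entries:
        if e.get("trace_id"):  # entries without a trace ID cannot be correlated
            traces[e["trace_id"]].append(e)
    return traces

entries = [
    {"service": "api-gateway", "trace_id": "abc123", "message": "request received"},
    {"service": "authorization-engine", "trace_id": "abc123", "message": "policy denied"},
    {"service": "identity-provider", "trace_id": "def456", "message": "token issued"},
]

traces = group_by_trace(entries)
print(len(traces["abc123"]))  # 2
```

Here the two `abc123` entries reconstruct one request's path: it entered at the gateway and was denied by the authorization engine.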
6. Set up log-based alerts
Create alerts that trigger when specific log patterns appear:
- Error rate exceeds a threshold within a time window
- Specific error messages appear (e.g., database connection failures)
- Authentication failure rate spikes (potential brute-force attack)
- Certificate expiration warnings
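The first pattern, an error-rate threshold over a time window, can be sketched as follows. The entry shape (`level`, an epoch-seconds `epoch` field) and the evaluation style are assumptions; a real alerting backend evaluates this continuously rather than on demand:

```python
def error_rate_alert(entries, threshold, window_seconds, now):
    """Fire when the number of ERROR entries inside the trailing
    window exceeds the threshold."""
    recent_errors = [
        e for e in entries
        if e["level"] == "ERROR" and now - e["epoch"] <= window_seconds
    ]
    return len(recent_errors) > threshold

entries = [
    {"level": "ERROR", "epoch": 995},
    {"level": "ERROR", "epoch": 998},
    {"level": "INFO", "epoch": 999},
    {"level": "ERROR", "epoch": 900},  # outside the 60-second window
]

# Two errors inside the trailing 60 seconds exceed a threshold of 1.
print(error_rate_alert(entries, threshold=1, window_seconds=60, now=1000))  # True
```

Thresholding on a rate rather than on single entries avoids paging on isolated, self-healing errors while still catching sustained failures.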
Validation Scenario
Scenario
An operator receives a report that some API requests are returning 500 errors. They use logs to identify the root cause.
Expected Result
- The operator filters logs by the API gateway service and ERROR severity
- They find log entries showing upstream connection failures to the authorization engine
- They correlate with the trace ID to see the full request path
- They identify that the authorization engine is restarting due to an out-of-memory condition
How to Verify
- Search logs for `level:ERROR` within the reported time range
- Verify the log entries include structured fields (service name, trace ID, timestamp)
- Confirm the trace link navigates to the corresponding distributed trace
Troubleshooting
- No logs appearing. Verify the telemetry collector is running and can reach the log storage. Check collector logs for pipeline errors.
- Logs missing from a specific service. Verify the service pod has the telemetry sidecar or SDK. Check namespace labels for telemetry collection.
- Search is slow. Narrow the time range and add specific filters before running broad searches. Structured field queries perform better than full-text search.
Next Steps
- Metrics — Complement log investigation with metric dashboards
- Traces & Root Cause Analysis — Follow trace IDs from logs to full request flows
- Export & Tooling Portability — Export logs to external tools