Logs

Goal

Use the Keymate observability platform to collect, search, filter, and correlate logs across all platform components. By the end of this guide, you will be able to find relevant log entries quickly, filter them by service, severity, or time range, and correlate logs with traces and metrics for efficient troubleshooting.

Audience

Operators responsible for monitoring and troubleshooting the Keymate platform in production.

Prerequisites

  • A running Keymate deployment with the observability layer deployed
  • Access to the observability dashboard

Before You Start

Keymate collects logs from all platform components automatically through the OpenTelemetry pipeline. You do not need additional instrumentation. The platform structures logs in JSON format, which enables precise filtering and correlation.
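A structured entry might look like the following. The exact field names are illustrative assumptions; consult your deployment's log schema for the authoritative layout.

```json
{
  "timestamp": "2024-05-14T09:32:41.118Z",
  "severity": "ERROR",
  "service": "api-gateway",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "message": "upstream connect error to authorization engine"
}
```

Because every entry carries the same fields, queries can match on `service`, `severity`, or `trace_id` directly instead of relying on full-text search.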

Log Sources

The platform collects logs from three categories:

| Source | What it captures | Examples |
| --- | --- | --- |
| Application logs | Business logic events from Keymate services | Authentication attempts, authorization decisions, policy evaluations, API requests |
| Infrastructure logs | Kubernetes and platform infrastructure events | Pod lifecycle, node events, service mesh proxy logs, certificate operations |
| Audit logs | Security-relevant operational events | Admin actions, configuration changes, tenant operations, role assignments |

Steps

1. Access the log explorer

Open the observability dashboard and navigate to the log explorer. This interface provides full-text search across all collected logs with filtering capabilities.

2. Search by time range

Start every investigation by narrowing the time range. Most operational issues are bounded in time, so begin with a narrow window around the reported incident and expand it only if needed.
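The same narrowing idea applies to exported log lines. A minimal sketch in Python, assuming each line is a JSON object with an ISO 8601 `timestamp` field (a schema assumption, not a documented Keymate contract):

```python
import json
from datetime import datetime, timezone

def in_window(entry: dict, start: datetime, end: datetime) -> bool:
    """Return True if the entry's timestamp falls within [start, end)."""
    # fromisoformat() in older Python does not accept a trailing "Z",
    # so normalize it to an explicit UTC offset first.
    ts = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
    return start <= ts < end

# Two exported log lines; only the second falls in the incident window.
logs = [json.loads(line) for line in [
    '{"timestamp": "2024-05-14T09:20:00Z", "severity": "INFO", "message": "ok"}',
    '{"timestamp": "2024-05-14T09:32:41Z", "severity": "ERROR", "message": "upstream error"}',
]]
start = datetime(2024, 5, 14, 9, 30, tzinfo=timezone.utc)
end = datetime(2024, 5, 14, 9, 35, tzinfo=timezone.utc)
window = [e for e in logs if in_window(e, start, end)]
```

Widening the window is then just a change to `start` and `end`, which keeps the expensive full-text search off the critical path.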

3. Filter by service

Filter logs to a specific service when you know which component is involved. Common service filters:

| Service | When to filter |
| --- | --- |
| Identity provider | Authentication failures, session issues, federation errors |
| Authorization engine | Access denied errors, policy evaluation issues |
| API gateway | Request routing errors, rate limiting events, TLS issues |
| Platform services | Business logic errors, integration failures |
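Because the logs are structured, a service filter is an exact match on one field rather than a text search. A sketch over parsed entries, assuming a `service` field in the schema:

```python
def filter_by_service(entries: list[dict], service: str) -> list[dict]:
    """Keep only entries emitted by the given service."""
    # .get() tolerates entries that lack the field entirely.
    return [e for e in entries if e.get("service") == service]

entries = [
    {"service": "api-gateway", "message": "rate limit exceeded"},
    {"service": "identity-provider", "message": "login failed"},
    {"service": "api-gateway", "message": "TLS handshake error"},
]
gateway = filter_by_service(entries, "api-gateway")
```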

4. Filter by severity

Use severity levels to focus on the most relevant entries:

| Level | When to use |
| --- | --- |
| ERROR | Active issues requiring investigation |
| WARN | Potential issues that may escalate |
| INFO | Normal operational events, useful for understanding flow |
| DEBUG | Detailed diagnostic information (typically high volume, use sparingly) |
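A quick severity breakdown tells you where to look first. A sketch that tallies entries per level, assuming a `severity` field:

```python
from collections import Counter

def severity_counts(entries: list[dict]) -> Counter:
    """Count entries per severity level to prioritize triage."""
    return Counter(e.get("severity", "UNKNOWN") for e in entries)

entries = [
    {"severity": "ERROR", "message": "db connection refused"},
    {"severity": "WARN", "message": "retrying upstream call"},
    {"severity": "INFO", "message": "request completed"},
    {"severity": "ERROR", "message": "db connection refused"},
]
counts = severity_counts(entries)
```

A spike in ERROR relative to the baseline is usually the place to start; DEBUG is best enabled only for the narrowed window and service.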

5. Correlate with traces

When a log entry relates to a specific request, use the trace ID to find the full distributed trace. This shows the entire request flow across services, helping you pinpoint where an issue originated.

The observability platform links log entries to their corresponding traces when a trace ID is present.
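The correlation step amounts to grouping entries that share a trace ID. A sketch, assuming a `trace_id` field on entries that carry trace context:

```python
from collections import defaultdict

def group_by_trace(entries: list[dict]) -> dict[str, list[dict]]:
    """Group log entries by trace ID so one request's logs read together."""
    traces: dict[str, list[dict]] = defaultdict(list)
    for e in entries:
        if "trace_id" in e:  # not every entry has trace context
            traces[e["trace_id"]].append(e)
    return dict(traces)

entries = [
    {"trace_id": "abc123", "service": "api-gateway", "message": "request received"},
    {"trace_id": "abc123", "service": "authorization-engine", "message": "policy denied"},
    {"service": "api-gateway", "message": "health check"},  # no trace context
]
traces = group_by_trace(entries)
```

Reading the `abc123` group in timestamp order reconstructs the request's path across services, which is what the dashboard's trace link does for you.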

6. Set up log-based alerts

Create alerts that trigger when specific log patterns appear:

  • Error rate exceeds a threshold within a time window
  • Specific error messages appear (e.g., database connection failures)
  • Authentication failure rate spikes (potential brute-force attack)
  • Certificate expiration warnings
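The first pattern above, an error-rate threshold over a sliding window, can be sketched as follows. The evaluation logic and field names are illustrative; real alerts are configured in the observability platform, not hand-rolled:

```python
from datetime import datetime, timedelta, timezone

def error_rate_exceeded(entries: list[dict], window: timedelta,
                        threshold: int, now: datetime) -> bool:
    """Fire when the number of ERROR entries within the window meets the threshold."""
    cutoff = now - window
    errors = sum(
        1 for e in entries
        if e["severity"] == "ERROR"
        and datetime.fromisoformat(e["timestamp"]) >= cutoff
    )
    return errors >= threshold

now = datetime(2024, 5, 14, 9, 35, tzinfo=timezone.utc)
entries = [
    {"timestamp": "2024-05-14T09:33:00+00:00", "severity": "ERROR"},
    {"timestamp": "2024-05-14T09:34:00+00:00", "severity": "ERROR"},
    {"timestamp": "2024-05-14T09:00:00+00:00", "severity": "ERROR"},  # outside window
]
fired = error_rate_exceeded(entries, timedelta(minutes=5), threshold=2, now=now)
```

Choosing the window and threshold is a trade-off: short windows catch bursts quickly but are noisier, while longer windows smooth out transient spikes.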

Validation Scenario

Scenario

An operator receives a report that some API requests are returning 500 errors. They use logs to identify the root cause.

Expected Result

  • The operator filters logs by the API gateway service and ERROR severity
  • They find log entries showing upstream connection failures to the authorization engine
  • They correlate with the trace ID to see the full request path
  • They identify that the authorization engine is restarting due to an out-of-memory condition

How to Verify

  • Search logs for level:ERROR within the reported time range
  • Verify the log entries include structured fields (service name, trace ID, timestamp)
  • Confirm the trace link navigates to the corresponding distributed trace

Troubleshooting

  • No logs appearing. Verify the telemetry collector is running and can reach the log storage. Check collector logs for pipeline errors.
  • Logs missing from a specific service. Verify the service pod has the telemetry sidecar or SDK. Check namespace labels for telemetry collection.
  • Search is slow. Narrow the time range and add specific filters before running broad searches. Structured field queries perform better than full-text search.

Next Steps