FGA Backend Sync Issues
This guide provides diagnostic steps and resolution paths for synchronization failures between the Keymate platform and FGA Engine backends. Use this guide when authorization decisions return unexpected results or when policy changes do not take effect.
Symptom
Authorization decisions return unexpected results, or changes to policies, permissions, or relationships do not take effect. You may observe:
- Permission checks return stale results after policy updates
- Policy expressions do not reflect webhook events from Integration Hub
- Audit logs are missing or delayed
- Cache hit/miss patterns indicate sync failures
- Response headers show increased latency or error status
Likely Causes
- Webhook processing failures — Event validation errors or malformed payloads
- Cache unavailability — Distributed cache connection timeouts or network issues
- Audit collection failures — Audit service unavailable or gRPC connection issues
- Token exchange failures — Keycloak token endpoint unreachable or misconfigured
- Network partitions — Connectivity issues between FGA services
How to Diagnose
Check 1: Inspect Response Headers
Examine authorization response headers for sync status:
| Header | Value | Meaning |
|---|---|---|
| Keymate-Decision | result="allow\|deny\|error" | Authorization decision |
| Keymate-Decision-Cache | hit or miss | Cache status |
| Keymate-Decision-Latency | milliseconds | Total processing time |
| Keymate-Decision-Authority-Latency | milliseconds | Downstream call time (absent on cache hit) |
Diagnosis:
- Consistent `miss` with high latency → Cache sync issue
- `error` result → Check error code in response body
- Missing `Authority-Latency` on `miss` → Early pipeline failure
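The three rules above can be applied mechanically to captured response headers. The following is a minimal sketch, not part of the platform: the function name `diagnose_headers` and the 500 ms cutoff for "high latency" are assumptions (the guide gives no number), and it assumes the latency headers carry a plain number of milliseconds.

```python
from typing import Mapping, Optional

# Assumed cutoff for "high latency"; the guide does not specify a value.
HIGH_LATENCY_MS = 500.0


def _to_ms(value: Optional[str]) -> Optional[float]:
    """Parse a latency header, returning None if it is absent or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def diagnose_headers(headers: Mapping[str, str],
                     high_latency_ms: float = HIGH_LATENCY_MS) -> Optional[str]:
    """Return a diagnostic hint for one response, or None if nothing stands out."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    decision = h.get("keymate-decision", "").replace(" ", "")
    cache = h.get("keymate-decision-cache", "").lower()
    latency = _to_ms(h.get("keymate-decision-latency"))

    if 'result="error"' in decision:
        return "error result: check the error code in the response body"
    if cache == "miss" and "keymate-decision-authority-latency" not in h:
        return "miss without Authority-Latency: early pipeline failure"
    if cache == "miss" and latency is not None and latency >= high_latency_ms:
        return "miss with high latency: possible cache sync issue"
    return None
```

A single slow `miss` is normal on a cold cache, so treat the last hint as meaningful only when it repeats across several requests for the same check.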
Check 2: Review Error Codes
Common error codes indicating sync issues:
| Code | Status | Description |
|---|---|---|
| TOKEN_INACTIVE | 401 | Token expired or revoked |
| TOKEN_EXCHANGE_FAILED | 403 | Keycloak token exchange failed |
| PERMISSION_CHECK_FAILED | 500 | Permission service unavailable |
| RESOURCE_RESOLUTION_FAILED | 400 | Access rules not matching |
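When triaging many failed requests, it can help to keep the table above as a lookup. The sketch below copies the codes, statuses, and descriptions from the table; the `next_step` text is an editorial suggestion drawn from the checks and resolutions in this guide, not something the platform returns, and the names `ErrorInfo` and `lookup_error` are hypothetical.

```python
from typing import NamedTuple, Optional


class ErrorInfo(NamedTuple):
    status: int
    description: str
    next_step: str  # editorial suggestion, not platform output


ERROR_CODES = {
    "TOKEN_INACTIVE": ErrorInfo(
        401, "Token expired or revoked",
        "Obtain a fresh token and retry"),
    "TOKEN_EXCHANGE_FAILED": ErrorInfo(
        403, "Keycloak token exchange failed",
        "See Resolution 4: Token Exchange Failures"),
    "PERMISSION_CHECK_FAILED": ErrorInfo(
        500, "Permission service unavailable",
        "See Check 3: Verify Service Health"),
    "RESOURCE_RESOLUTION_FAILED": ErrorInfo(
        400, "Access rules not matching",
        "Review access rule patterns against the request"),
}


def lookup_error(code: str) -> Optional[ErrorInfo]:
    """Return the table entry for an error code, or None if it is not listed."""
    return ERROR_CODES.get(code.strip().upper())
```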
Check 3: Verify Service Health
Check health endpoints for FGA services:
# TODO: replace with actual diagnostic command
curl -f http://<fga-service-host>:<port>/health
Health endpoints return service status:
{
  "status": "UP",
  "checks": [...]
}
If status is not UP, the service may be experiencing issues.
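To check several services in one pass, the same test can be scripted. This is a minimal sketch using only the Python standard library; it assumes the health document is JSON with a top-level `status` field as shown above, and the function names `is_up` and `check_health` are hypothetical. The URL you pass in is whatever health endpoint your deployment exposes.

```python
import json
import urllib.request


def is_up(body: str) -> bool:
    """True only when the health document reports status "UP"."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(doc, dict) and doc.get("status") == "UP"


def check_health(url: str, timeout: float = 5.0) -> bool:
    """Fetch a health endpoint; any network, HTTP, or URL error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_up(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # a malformed URL or undecodable body raises ValueError.
        return False
```

Treating every failure mode as "not healthy" keeps the result a simple yes/no; inspect the service logs to find out why a given endpoint failed.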
Check 4: Review Audit Logs
Look for these patterns in audit logs:
| Log Pattern | Meaning |
|---|---|
| Webhook success | Event processed successfully |
| Webhook failed | Event validation or processing error |
| Audit request failed | Audit collection temporarily unavailable |
| Cache operation timeout | Distributed cache connectivity issue |
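A quick tally of these patterns over a time window shows which failure dominates. The sketch below assumes the phrases in the table appear literally (case-insensitively) in each log line; the real log wording and format may differ, so adjust `PATTERNS` to match what your logs actually contain. The function name `count_patterns` is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Phrases taken from the table above; actual log wording may differ.
PATTERNS = (
    "Webhook success",
    "Webhook failed",
    "Audit request failed",
    "Cache operation timeout",
)


def count_patterns(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each known pattern (case-insensitive)."""
    counts = Counter({p: 0 for p in PATTERNS})
    for line in lines:
        lowered = line.lower()
        for pattern in PATTERNS:
            if pattern.lower() in lowered:
                counts[pattern] += 1
    return counts
```

Because the function accepts any iterable of strings, an open log file can be passed in directly.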
Check 5: Check Cache Connectivity
If cache operations are failing:
- Verify distributed cache service is running
- Check network connectivity between services
- Review cache operation retry logs
How to Resolve
Resolution 1: Webhook Event Failures
If policy expression changes are not syncing:
- Validate event payload: Ensure all required fields are present:
  - Event source: id, service, type, resource type, operation type, timestamp
  - Payload: id, time, realm, resource, action, auth context
  - Context map: all fields required for the event type
- Check supported event types: Only these events are processed:
  - attribute-definition:create
  - attribute-definition:update
  - attribute-definition:delete
- Retry the event: If validation passed, retry from Integration Hub
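The payload checks above can be run locally before retrying an event. The sketch below is illustrative only: the JSON key names (`source`, `payload`, `resourceType`, `operationType`, `authContext`) are assumptions, since this guide lists the fields but not their wire names, and the context map is not checked because its required fields depend on the event type. The event type is passed in separately to avoid guessing where it lives in the event.

```python
from typing import Any, List, Mapping, Sequence

SUPPORTED_EVENT_TYPES = {
    "attribute-definition:create",
    "attribute-definition:update",
    "attribute-definition:delete",
}

# Hypothetical key names; confirm them against the real event schema.
REQUIRED_SOURCE = ("id", "service", "type", "resourceType", "operationType", "timestamp")
REQUIRED_PAYLOAD = ("id", "time", "realm", "resource", "action", "authContext")


def missing_keys(section: Any, required: Sequence[str]) -> List[str]:
    """List required keys that are absent or empty in one section of the event."""
    if not isinstance(section, Mapping):
        return list(required)
    return [key for key in required if section.get(key) in (None, "")]


def validate_event(event_type: str, event: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if event_type not in SUPPORTED_EVENT_TYPES:
        problems.append(f"unsupported event type: {event_type}")
    for name, required in (("source", REQUIRED_SOURCE), ("payload", REQUIRED_PAYLOAD)):
        for key in missing_keys(event.get(name), required):
            problems.append(f"missing {name}.{key}")
    return problems
```

An empty result means only that these basic checks passed; the platform's own validation remains authoritative.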
Resolution 2: Cache Sync Issues
If permission decisions are stale:
- Wait for cache TTL — Permission cache has a short TTL (typically seconds)
- Verify cache service — Check distributed cache health
- Review retry configuration — Cache operations retry automatically with backoff
- Monitor cache headers: `Keymate-Decision-Cache` should show `hit` after the initial `miss`
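The last step can be checked by repeating an identical permission check a few times within the cache TTL and recording the `Keymate-Decision-Cache` value of each response. A minimal sketch of the comparison follows; the function name `cache_warms_up` is hypothetical, and collecting the header values is left to whatever client you already use.

```python
from typing import Sequence


def cache_warms_up(cache_values: Sequence[str]) -> bool:
    """True if any request after the first one reported a cache hit."""
    normalised = [value.strip().lower() for value in cache_values]
    return "hit" in normalised[1:]
```

If this returns False for requests sent well inside the TTL, entries are not being stored or read back, which points at the distributed cache rather than at the policy data.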
Resolution 3: Audit Collection Failures
If audit logs are missing:
- Check audit service — Verify gRPC port is accessible
- Review network connectivity — Ensure services can reach audit collector
- Wait for retry — Audit failures retry with exponential backoff
- Note: Audit failures are non-blocking — authorization continues
Resolution 4: Token Exchange Failures
If token exchange consistently fails:
- Verify Keycloak availability — Check token endpoint is reachable
- Check client configuration — Ensure target client exists and accepts exchange
- Review access rules — Verify rule patterns match the request
- Check token validity — Source token must not be expired
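For the token validity check, the expiry of a source token can be inspected locally. This sketch assumes the source token is a JWT, as Keycloak access tokens are by default, and reads the standard `exp` claim without verifying the signature. It is a diagnostic aid only: it cannot detect revocation, so a token that looks unexpired here may still be rejected. The function names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True if the token's exp claim is at or before the given time (default: now)."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return False  # no exp claim: expiry cannot be judged from the token alone
    current = time.time() if now is None else now
    return current >= float(exp)
```

Never use an unverified decode like this to make an authorization decision; it is only suitable for reading claims while troubleshooting.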
Signals to Inspect
- Logs: Look for webhook success/failure, audit request status, cache timeout patterns
- Metrics: Monitor cache hit rate, authorization latency, error rate by code
- Traces: Check distributed traces for latency breakdown across services
- Audit events: Review authorization decisions and policy change events
Key Metrics to Monitor
| Metric | Description | Alert Threshold |
|---|---|---|
| Cache hit rate | Percentage of cache hits | < 80% may indicate issues |
| Authorization latency | P99 response time | Sudden increase indicates sync delay |
| Error rate by code | Errors grouped by code | Any sustained increase |
| Webhook processing time | Event processing duration | Approaching the processing timeout |
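If no metrics backend is available, the first two metrics can be estimated from a sample of captured responses. The sketch below computes the cache hit rate from `Keymate-Decision-Cache` values and a nearest-rank P99 from `Keymate-Decision-Latency` values; the function names are hypothetical, and a monitoring system may compute percentiles with a different method.

```python
from typing import Sequence


def cache_hit_rate(cache_values: Sequence[str]) -> float:
    """Percentage of responses whose cache header was "hit" (0.0 for no samples)."""
    values = [value.strip().lower() for value in cache_values]
    if not values:
        return 0.0
    return 100.0 * values.count("hit") / len(values)


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty latency sample."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # ceil(0.99 * n) computed with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With small samples the P99 is close to the maximum, so compare it against a baseline from the same sample size rather than against a dashboard value.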
Escalation Notes
Escalate to the platform team when:
- Cache service is completely unavailable for more than 5 minutes
- Webhook events are consistently rejected after payload validation
- Token exchange failures persist after Keycloak verification
- Authorization latency exceeds SLA thresholds
Data to attach:
- Response headers from failed requests
- Error codes and messages
- Service health check results
- Relevant time window for log analysis
- Cache hit/miss statistics
If authorization is consistently failing for all users, this may indicate a critical sync failure. Check all FGA service health endpoints and distributed cache connectivity immediately.
Next Step
After resolving sync issues, verify authorization behavior by testing permission checks with the Decision Simulation tool.
Related Docs
- FGA Engine — FGA architecture overview
- Policy Engine — Policy evaluation
- Access Gateway — Request authorization
- Audit & Observability — Monitoring and logs