Architecture Overview
Klarsicht is an AI agent that sits on top of your existing monitoring stack. It doesn’t collect data, doesn’t store metrics, and doesn’t build dashboards. It adds intelligence — layers 4 and 5 (correlation and reasoning).
Components
┌─────────────────────────────────────────────────┐
│ Your Kubernetes Cluster │
│ │
│ Grafana ──webhook──▸ Klarsicht Agent │
│ │ ├─▸ K8s API (read-only)│
│ │ ├─▸ Prometheus (PromQL) │
│ │ └─▸ PostgreSQL (storage)│
│ │ │
│ ▼ │
│ LLM API │
│ (external or local) │
└─────────────────────────────────────────────────┘
Investigation flow
When a Grafana alert fires:
- Receive — Webhook payload arrives at
POST /alertwith alertname, namespace, pod, severity - Parse — Extract context: which pod, which namespace, when did it start
- Inspect — Agent reads pod status, container states, restart count, exit codes
- Logs — Pull last 100 lines from current and previous container
- Events — Kubernetes warning events (BackOff, FailedMount, Unhealthy)
- Metrics — PromQL queries for CPU, memory, error rate in ±30min window
- Correlate — Check recent deployments, upstream pods, node health
- Synthesize — LLM produces root cause, confidence score, fix steps, postmortem
The entire process takes 15-60 seconds.
RBAC
The agent uses a ClusterRole with read-only access:
| Resource | Verbs |
|---|---|
| pods | get, list |
| pods/log | get |
| events | list |
| deployments | get, list |
| replicasets | get, list |
| nodes | get |
No write permissions. No exec. No delete. If the agent crashes, nothing else is affected.
Deployment models
Agent Mode
The LLM runs externally (Anthropic, OpenAI, etc.). Only investigation context — pod names, log snippets, metric values — is sent. No raw data export.
On-Prem Mode
The LLM runs inside your cluster via Ollama, vLLM, or any OpenAI-compatible endpoint. Zero external calls. Suitable for air-gapped environments and regulated industries.