Concepts: Core
This section explains the core ideas behind Redis SRE Agent and how pieces fit together.
- Query modes
- Without an instance_id: Get knowledge-based advice from your ingested docs, runbooks, and wikis. The agent searches the knowledge base and provides general guidance without touching live systems.
- With an instance_id: Get live triage of a specific Redis instance. The agent uses configured providers (Prometheus, Loki, Redis CLI, etc.) to fetch metrics, logs, and diagnostics, then analyzes the instance state.
Note: Internally, these modes use different agent implementations (Knowledge Agent vs. Triage Agent), so you'll sometimes see those names in output or logs.
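To make the two query modes concrete, here is a minimal sketch of how a request body might differ between them. The payload shape (a `message` plus a `context` object carrying `instance_id`) is an assumption inferred from the "message+context" label in the lifecycle diagram, and `build_task_payload` is a hypothetical helper, not part of the agent's API.

```python
import json
from typing import Optional


def build_task_payload(message: str, instance_id: Optional[str] = None) -> dict:
    """Build a request body for a new task (field names are assumptions)."""
    payload = {"message": message, "context": {}}
    if instance_id is not None:
        # Assumption: the presence of instance_id is what selects live triage.
        payload["context"]["instance_id"] = instance_id
    return payload


# Knowledge mode: no instance_id, so no live systems are touched.
knowledge = build_task_payload("How do I tune maxmemory-policy?")

# Triage mode: instance_id refers to a registered instance (hypothetical ID).
triage = build_task_payload("Why is latency spiking?", instance_id="redis-prod-1")

print(json.dumps(knowledge, indent=2))
print(json.dumps(triage, indent=2))
```

The only difference between the two bodies is the `instance_id` entry, which mirrors how the text above describes the mode switch.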
- How the triage agent works
When you provide an instance_id, the agent uses a deep-research approach:
- Initial triage: Analyzes your query and the instance context to identify what needs investigation
- Research decomposition: Breaks the investigation into parallel research topics (e.g., "check memory pressure", "analyze slow operations", "review replication lag")
- Parallel research tracks: Each topic runs in its own tool-calling loop, using providers (Prometheus, Loki, Redis CLI, etc.) to gather data
- Synthesis: Combines findings from all research tracks into a coherent analysis with actionable recommendations
This parallel research design allows the agent to investigate multiple aspects of an issue simultaneously, making triage faster and more thorough.
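The decompose, fan-out, and synthesize flow described above can be sketched with `asyncio`. This is an illustration of the pattern only, not the agent's actual implementation: `research_track`, `synthesize`, and `triage` are hypothetical names, and the track body is a stand-in for a real tool-calling loop against providers.

```python
import asyncio


async def research_track(topic: str) -> dict:
    """Stand-in for one tool-calling loop; a real track would query providers."""
    await asyncio.sleep(0.01)  # simulate provider I/O
    return {"topic": topic, "finding": f"no anomaly found for: {topic}"}


def synthesize(findings: list) -> str:
    """Combine per-track findings into a single report."""
    return "\n".join(f"- {f['topic']}: {f['finding']}" for f in findings)


async def triage(topics: list) -> str:
    # Run all tracks concurrently; gather returns results in input order.
    findings = await asyncio.gather(*(research_track(t) for t in topics))
    return synthesize(findings)


topics = [
    "check memory pressure",
    "analyze slow operations",
    "review replication lag",
]
report = asyncio.run(triage(topics))
print(report)
```

Because the tracks wait on I/O concurrently, total wall-clock time is closer to the slowest track than to the sum of all tracks, which is the speedup the text refers to.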
- Tasks vs. Threads
- Task: How you interact with the agent. Create a task to run a query or triage. Each task has a task_id and tracks execution status (queued, running, completed, failed).
- Thread: What happened during execution. Contains the conversation history, messages, tool calls, and results. Each thread has a thread_id.
When you create a task, the API creates or reuses a thread to store the execution history. You can:
- Poll the task for status: GET /api/v1/tasks/{task_id}
- Read the thread for results: GET /api/v1/threads/{thread_id}
- Stream updates via WebSocket: ws://localhost:8000/api/v1/ws/tasks/{thread_id}
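A minimal polling sketch for the task endpoint listed above. It assumes the API is reachable on `localhost:8000` (the host shown in the WebSocket URL) and that the task response exposes its state under a `status` key; both are assumptions, and `wait_for_task` is a hypothetical helper. The `fetch` parameter is injectable so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # assumption: same host as the WebSocket example
TERMINAL = {"completed", "failed"}  # terminal statuses named in the text


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], dict] = http_get_json,
    interval: float = 1.0,
    max_polls: int = 60,
) -> dict:
    """Poll the task endpoint until it reports a terminal status."""
    url = f"{BASE_URL}/api/v1/tasks/{task_id}"
    for _ in range(max_polls):
        task = fetch(url)
        # Assumption: the response carries the state under a "status" key.
        if task.get("status") in TERMINAL:
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Once the task reaches a terminal status, the full conversation and tool results can be read from the thread endpoint, `GET /api/v1/threads/{thread_id}`.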
- Jobs
- Ad-hoc jobs: On-demand via CLI or API. Each run creates a task and streams results to a thread.
- Scheduled jobs: Recurring health checks defined by schedules. Each execution produces a task and posts into the same thread.
- Instances and Context
- Create instance records with instance create (CLI) or POST /api/v1/instances (API)
- Provide instance_id in your query to trigger live triage with tools
- Instance metadata (environment, usage, description) helps the agent understand context
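As an illustration of what an instance record might carry, the sketch below models the metadata fields named above (environment, usage, description). The `name` and `connection_url` fields, the example values, and the `InstanceRecord` class are hypothetical; consult the API reference for the actual schema accepted by `POST /api/v1/instances`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InstanceRecord:
    # "name" and "connection_url" are hypothetical fields for illustration.
    name: str
    connection_url: str
    # Metadata fields named in the text above.
    environment: str
    usage: str
    description: str


record = InstanceRecord(
    name="cache-prod",
    connection_url="redis://cache-prod.example.internal:6379",
    environment="production",
    usage="cache",
    description="Session cache for the web tier",
)

# JSON body one might send when creating the instance (shape is an assumption).
body = json.dumps(asdict(record), indent=2)
print(body)
```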
- Providers (Integrations)
- Pluggable integrations for metrics (Prometheus), logs (Loki), tickets (GitHub/Jira), clouds, and more.
- Configure via environment. See: how-to/tool-providers.md
- Security and Secrets
- Use a 32-byte master key for envelope encryption of secrets at rest.
- See: how-to/configuration/encryption.md
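A short sketch of generating a 32-byte master key with Python's standard library. Base64 is used here only as a convenient text form for an environment variable; the encoding and variable name the agent actually expects are not stated in this section, so treat them as assumptions and check the encryption how-to.

```python
import base64
import secrets

# 32 random bytes from the operating system's CSPRNG.
master_key = secrets.token_bytes(32)

# Assumption: base64 text form for storage in an environment variable.
encoded = base64.b64encode(master_key).decode("ascii")

print(len(master_key))  # 32
print(encoded)
```

Generate the key once, store it in a secrets manager, and avoid committing it to version control; losing it makes previously encrypted secrets unrecoverable.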
Diagram: Agents & Providers (high-level)
flowchart LR
User[User/Caller]
KA[Knowledge Agent]
TA[Triage Agent]
KB[(Knowledge Base)]
Prov["Providers\n(Prometheus/Loki/etc.)"]
Redis[(Target Redis)]
User -->|Ask| KA
KA --> KB
User -->|Triage| TA
TA --> Prov
TA --> Redis
Diagram: Threads & Tasks lifecycle (simplified)
sequenceDiagram
participant Client
participant API
participant Worker
Client->>API: POST /api/v1/tasks (message+context)
API-->>Client: task_id, thread_id
API->>Worker: enqueue task
Worker->>Providers: query metrics/logs
Worker->>Redis: check instance
Worker-->>API: stream updates to thread
Client->>API: GET /api/v1/tasks/{task_id}
API-->>Client: status/result