Docker Deployment Guide¶
This guide covers deploying Redis SRE Agent using Docker in environments with internet access. For air-gapped environments, see Air-Gapped Deployment.
Overview¶
The standard Docker deployment includes:
- Redis - Agent state, task queue, and vector storage
- LiteLLM Proxy - LLM gateway supporting OpenAI, Anthropic, Azure, etc.
- Prometheus/Grafana - Metrics and dashboards
- Loki - Log aggregation
- Tempo - Distributed tracing
Quick Start¶
1. Clone and Configure¶
git clone https://github.com/redis-applied-ai/redis-sre-agent.git
cd redis-sre-agent
# Copy example environment
cp .env.example .env
2. Set Required Environment Variables¶
Edit .env:
# Required: OpenAI API key (used by LiteLLM proxy)
OPENAI_API_KEY=sk-your-openai-key
# Optional: Anthropic for Claude models
ANTHROPIC_API_KEY=sk-ant-your-key
# LiteLLM proxy authentication (clients use this to auth to proxy)
LITELLM_MASTER_KEY=sk-1234
# API authentication
REDIS_SRE_MASTER_KEY=your-secret-key
3. Start Services¶
# Start core services
docker-compose up -d
# View logs
docker-compose logs -f sre-agent sre-worker
4. Access Services¶
| Service | URL | Credentials |
|---|---|---|
| SRE Agent API | http://localhost:8080 | REDIS_SRE_MASTER_KEY header |
| SRE Agent UI | http://localhost:3002 | - |
| Grafana | http://localhost:3001 | admin / admin |
| Prometheus | http://localhost:9090 | - |
| LiteLLM UI | http://localhost:4000/ui | admin / admin |
Architecture¶
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ SRE Agent │────▶│ LiteLLM │────▶│ OpenAI │
│ API │ │ Proxy │ │ Anthropic │
└─────────────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────┐ ┌─────────────┐
│ Redis │◀────│ SRE Worker │
│ (state + │ │ (background │
│ vectors) │ │ tasks) │
└─────────────┘ └─────────────┘
│
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Prometheus │────▶│ Grafana │◀────│ Loki │
└─────────────┘ └─────────────┘ └─────────────┘
Services¶
Core Services¶
| Service | Purpose | Port |
|---|---|---|
sre-agent |
API server | 8080 |
sre-worker |
Background task processor | - |
redis |
Agent state and vectors | 7843 |
litellm |
LLM proxy | 4000 |
Monitoring Stack¶
| Service | Purpose | Port |
|---|---|---|
prometheus |
Metrics collection | 9090 |
grafana |
Dashboards | 3001 |
loki |
Log aggregation | 3100 |
tempo |
Distributed tracing | 3200 |
Demo/Testing¶
| Service | Purpose | Port |
|---|---|---|
redis-demo |
Target Redis for testing | 7844 |
redis-demo-replica |
Replica for replication tests | 7845 |
redis-exporter |
Redis metrics exporter | 9121 |
Configuration¶
Embedding Models¶
By default, the standard deployment uses OpenAI embeddings:
# Default (OpenAI API)
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
VECTOR_DIM=1536
For local embeddings (no OpenAI API needed for embeddings):
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
VECTOR_DIM=384
LLM Configuration¶
LiteLLM supports multiple providers. Edit monitoring/litellm/config.yaml:
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: os.environ/OPENAI_API_KEY
- model_name: claude-3
litellm_params:
model: anthropic/claude-3-sonnet-20240229
api_key: os.environ/ANTHROPIC_API_KEY
MCP Servers (Optional)¶
Enable MCP integrations with the mcp profile:
# Start with GitHub MCP server
docker-compose --profile mcp up -d
For HTTPS MCP (required for Claude Desktop):
# Generate self-signed certs
./scripts/generate-mcp-certs.sh
# Start with SSL
docker-compose --profile ssl up -d
Development Mode¶
The default compose mounts source code for hot-reload:
volumes:
- ./redis_sre_agent:/app/redis_sre_agent # Source code
- ./tests:/app/tests # Tests
Changes to Python files trigger automatic reload.
Production Considerations¶
For production deployments:
- Use pre-built images instead of building from source
- Remove volume mounts for source code
- Configure proper secrets management
- Set up persistent storage for Redis, Prometheus, Grafana
- Configure TLS for all external endpoints
- Set resource limits on containers
Docker Hub Images¶
Pre-built images are available on Docker Hub:
| Tag | Description |
|---|---|
redislabs/redis-sre-agent:latest |
Latest standard image |
redislabs/redis-sre-agent:airgap |
Air-gap image with bundled models |
redislabs/redis-sre-agent:v1.0.0 |
Versioned release (example) |
redislabs/redis-sre-agent:v1.0.0-airgap |
Versioned air-gap release |
Example production overrides:
# docker-compose.prod.yml
services:
sre-agent:
image: redislabs/redis-sre-agent:latest
volumes: [] # Remove dev mounts
deploy:
resources:
limits:
cpus: '2'
memory: 4G
Comparison: Standard vs Air-Gapped¶
| Feature | Standard | Air-Gapped |
|---|---|---|
| Internet required | Yes | No |
| Embedding models | OpenAI API | Bundled HuggingFace |
| LLM access | Direct or LiteLLM | Customer's internal proxy |
| Redis | Included | Customer provides |
| Image size | ~1.5GB | ~4GB (includes models) |
| MCP servers | Full support | Limited (no npx) |
Troubleshooting¶
Services Won't Start¶
# Check logs
docker-compose logs sre-agent
# Verify Redis is healthy
docker-compose exec redis redis-cli ping
LLM Errors¶
# Test LiteLLM
curl http://localhost:4000/health
# Check LiteLLM logs
docker-compose logs litellm
Worker Not Processing Tasks¶
# Check worker logs
docker-compose logs sre-worker
# Verify worker is connected
curl http://localhost:8080/api/v1/health