Docker Deployment Guide¶

This guide covers deploying Redis SRE Agent using Docker in environments with internet access. For air-gapped environments, see Air-Gapped Deployment.

Overview¶

The standard Docker deployment includes:

Redis - Agent state, task queue, and vector storage
LiteLLM Proxy - LLM gateway supporting OpenAI, Anthropic, Azure, etc.
Prometheus/Grafana - Metrics and dashboards
Loki - Log aggregation
Tempo - Distributed tracing

Quick Start¶

1. Clone and Configure¶

git clone https://github.com/redis-applied-ai/redis-sre-agent.git
cd redis-sre-agent

# Copy example environment
cp .env.example .env

2. Set Required Environment Variables¶

Edit .env:

# Required: OpenAI API key (used by LiteLLM proxy)
OPENAI_API_KEY=sk-your-openai-key

# Optional: Anthropic for Claude models
ANTHROPIC_API_KEY=sk-ant-your-key

# LiteLLM proxy authentication (clients use this to auth to proxy)
LITELLM_MASTER_KEY=sk-1234

# API authentication
REDIS_SRE_MASTER_KEY=your-secret-key

3. Start Services¶

# Start core services
docker-compose up -d

# View logs
docker-compose logs -f sre-agent sre-worker

4. Access Services¶

Service	URL	Credentials
SRE Agent API	http://localhost:8080	`REDIS_SRE_MASTER_KEY` header
SRE Agent UI	http://localhost:3002	-
Grafana	http://localhost:3001	admin / admin
Prometheus	http://localhost:9090	-
LiteLLM UI	http://localhost:4000/ui	admin / admin

Architecture¶

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  SRE Agent  │────▶│   LiteLLM   │────▶│   OpenAI    │
│     API     │     │    Proxy    │     │  Anthropic  │
└─────────────┘     └─────────────┘     └─────────────┘
       │
       ▼
┌─────────────┐     ┌─────────────┐
│    Redis    │◀────│ SRE Worker  │
│  (state +   │     │ (background │
│   vectors)  │     │   tasks)    │
└─────────────┘     └─────────────┘
       │
       ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Prometheus  │────▶│   Grafana   │◀────│    Loki     │
└─────────────┘     └─────────────┘     └─────────────┘

Services¶

Core Services¶

Service	Purpose	Port
`sre-agent`	API server	8080
`sre-worker`	Background task processor	-
`redis`	Agent state and vectors	7843
`litellm`	LLM proxy	4000

Monitoring Stack¶

Service	Purpose	Port
`prometheus`	Metrics collection	9090
`grafana`	Dashboards	3001
`loki`	Log aggregation	3100
`tempo`	Distributed tracing	3200

Demo/Testing¶

Service	Purpose	Port
`redis-demo`	Target Redis for testing	7844
`redis-demo-replica`	Replica for replication tests	7845
`redis-exporter`	Redis metrics exporter	9121

Configuration¶

Embedding Models¶

By default, the standard deployment uses OpenAI embeddings:

# Default (OpenAI API)
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
VECTOR_DIM=1536

For local embeddings (no OpenAI API needed for embeddings):

EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
VECTOR_DIM=384

LLM Configuration¶

LiteLLM supports multiple providers. Edit monitoring/litellm/config.yaml:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

MCP Servers (Optional)¶

Enable MCP integrations with the mcp profile:

# Start with GitHub MCP server
docker-compose --profile mcp up -d

For HTTPS MCP (required for Claude Desktop):

# Generate self-signed certs
./scripts/generate-mcp-certs.sh

# Start with SSL
docker-compose --profile ssl up -d

Development Mode¶

The default compose mounts source code for hot-reload:

volumes:
  - ./redis_sre_agent:/app/redis_sre_agent  # Source code
  - ./tests:/app/tests                       # Tests

Changes to Python files trigger automatic reload.

Production Considerations¶

For production deployments:

Use pre-built images instead of building from source
Remove volume mounts for source code
Configure proper secrets management
Set up persistent storage for Redis, Prometheus, Grafana
Configure TLS for all external endpoints
Set resource limits on containers

Docker Hub Images¶

Pre-built images are available on Docker Hub:

Tag	Description
`redislabs/redis-sre-agent:latest`	Latest standard image
`redislabs/redis-sre-agent:airgap`	Air-gap image with bundled models
`redislabs/redis-sre-agent:v1.0.0`	Versioned release (example)
`redislabs/redis-sre-agent:v1.0.0-airgap`	Versioned air-gap release

Example production overrides:

# docker-compose.prod.yml
services:
  sre-agent:
    image: redislabs/redis-sre-agent:latest
    volumes: []  # Remove dev mounts
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G

Comparison: Standard vs Air-Gapped¶

Feature	Standard	Air-Gapped
Internet required	Yes	No
Embedding models	OpenAI API	Bundled HuggingFace
LLM access	Direct or LiteLLM	Customer's internal proxy
Redis	Included	Customer provides
Image size	~1.5GB	~4GB (includes models)
MCP servers	Full support	Limited (no npx)

Troubleshooting¶

Services Won't Start¶

# Check logs
docker-compose logs sre-agent

# Verify Redis is healthy
docker-compose exec redis redis-cli ping

LLM Errors¶

# Test LiteLLM
curl http://localhost:4000/health

# Check LiteLLM logs
docker-compose logs litellm

Worker Not Processing Tasks¶

# Check worker logs
docker-compose logs sre-worker

# Verify worker is connected
curl http://localhost:8080/api/v1/health