Migrating from Caddy to Traefik

This guide walks through migrating from the current Caddy setup to Traefik for dynamic multi-tenant proxy management.

Why Migrate?

Current Caddy Issues

  • Configuration Conflicts: Mixing Caddyfile and API causes srv1/srv2 port conflicts
  • Complex Sync Logic: Requires workarounds with PATCH/PUT operations
  • Limited Scalability: Not designed for thousands of dynamic routes
  • Poor Observability: Limited metrics and debugging capabilities

Traefik Benefits

  • True API-First: Built for dynamic configuration
  • No Conflicts: Single configuration source via HTTP provider
  • Highly Scalable: Efficiently handles thousands of routes
  • Observable: Built-in metrics, tracing, and dashboard
  • Zero-Downtime: Updates without restarts

Architecture Comparison

Current (Caddy)

Convex → Sync API → Caddy JSON API ←→ Caddyfile

                    CONFLICTS!

New (Traefik)

Convex → API Endpoint → Traefik HTTP Provider

         Clean JSON Config
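
Traefik's HTTP provider simply polls a URL on an interval and applies the JSON body it receives as its dynamic configuration. As a rough TypeScript sketch of the payload shape this guide relies on (field names follow Traefik's dynamic-configuration schema; only the parts used here are modeled):

// Minimal shape of the JSON the config endpoint returns to Traefik's HTTP provider.
// The real schema has many more options; this models only what this guide uses.
interface TraefikDynamicConfig {
  http: {
    routers: Record<string, {
      rule: string                      // e.g. "Host(`do.dev`)"
      service: string                   // key into the services map below
      tls?: { certResolver: string }    // e.g. "cloudflare"
    }>
    services: Record<string, {
      loadBalancer: {
        servers: { url: string }[]                          // upstream targets
        healthCheck?: { path: string; interval: string }    // optional active health check
      }
    }>
  }
}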

Migration Steps

Phase 1: Setup (Day 1)

  1. Deploy Traefik alongside Caddy:

    cd apps/projects/local/traefik
    ./setup.sh
  2. Configure Environment:

    # Edit .env with your Cloudflare token
    vim .env
  3. Start Traefik:

    docker-compose up -d
  4. Verify Dashboard: open http://localhost:8080/dashboard/ in a browser and confirm Traefik is up (a scripted check is sketched below).
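
If you prefer a scripted check, something like the following confirms Traefik is answering (run with an ESM-aware runner such as tsx; it assumes the ping endpoint and API are exposed on port 8080, as elsewhere in this guide):

// Confirm Traefik is up: /ping should return "OK" and the routers API should respond.
const base = 'http://localhost:8080'

const ping = await fetch(`${base}/ping`)
console.log(`ping: ${ping.status} ${(await ping.text()).trim()}`)

const routers = await fetch(`${base}/api/http/routers`)
console.log(`routers API: ${routers.status}, ${((await routers.json()) as unknown[]).length} routers`)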

Phase 2: Testing (Days 2-3)

  1. Test Configuration Endpoint (a scripted version of this check follows this list):

    # Check if API returns valid config
    curl http://localhost:3010/api/traefik/config | jq
  2. Add Test Domain:

    • Create a test route in Convex pointing to Traefik
    • Update DNS for test domain to Traefik IP
    • Verify SSL certificate generation
  3. Monitor Performance:

    # Watch Traefik logs
    docker logs -f traefik-proxy
    
    # Check metrics
    curl http://localhost:8080/metrics
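
Step 1 can also be scripted. A minimal sketch, assuming the config endpoint from Phase 1 and the payload shape shown earlier:

// Fetch the config endpoint and confirm it actually contains routers and services.
const res = await fetch('http://localhost:3010/api/traefik/config')
if (!res.ok) throw new Error(`config endpoint returned ${res.status}`)

const config = await res.json()
const routerNames = Object.keys(config?.http?.routers ?? {})
const serviceNames = Object.keys(config?.http?.services ?? {})

console.log(`${routerNames.length} routers, ${serviceNames.length} services`)
if (routerNames.length === 0) throw new Error('no routers in config; check the sync source in Convex')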

Phase 3: Migration (Days 4-7)

  1. Batch Migration Strategy:

    // Suggested batch order
    const migrationBatches = [
      // Batch 1: Low traffic domains
      ['docs.dev', 'isup.dev'],
      
      // Batch 2: Medium traffic
      ['biturl.dev', 'contacts.dev', 'homepage.dev'],
      
      // Batch 3: High traffic
      ['do.dev', 'local.dev', 'customers.dev'],
      
      // Batch 4: Critical services
      ['dns.local.dev', 'talk.dev', 'sell.dev']
    ]
  2. For Each Batch:

    • Update DNS to point to Traefik IP
    • Monitor for 2-4 hours
    • Check error rates and performance (a probe script is sketched after this list)
    • Proceed to next batch if stable
  3. Rollback Plan:

    • Keep Caddy running throughout migration
    • DNS changes can be reverted quickly
    • Document any issues for each domain
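
The per-batch monitoring can be backed by a small probe script. A sketch (the batch contents come from the list above; run it periodically during the 2-4 hour watch window):

// Probe every domain in the current batch over HTTPS and print the status code.
// Failures or 5xx responses are a signal to pause before moving to the next batch.
const batch = ['docs.dev', 'isup.dev']   // batch 1 from the list above

const results = await Promise.all(
  batch.map(async (domain) => {
    try {
      const res = await fetch(`https://${domain}/`, { method: 'HEAD', redirect: 'manual' })
      return `${domain}: ${res.status}`
    } catch (err) {
      return `${domain}: FAILED (${(err as Error).message})`
    }
  })
)

console.log(results.join('\n'))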

Phase 4: Cutover (Day 8)

  1. Final Validation:

    # Test all domains
    for domain in do.dev contacts.dev biturl.dev; do
      echo "Testing $domain..."
      curl -I https://$domain
    done
  2. Stop Caddy:

    docker stop caddy-reverse-proxy
    docker rm caddy-reverse-proxy
  3. Update Infrastructure:

    • Remove Caddy configuration files
    • Update documentation
    • Update monitoring alerts

Configuration Mapping

Caddy Route → Traefik Route

Caddy (Caddyfile):

do.dev {
    reverse_proxy 10.1.0.33:3005 10.3.0.33:3005 {
        health_uri /
        health_interval 30s
        lb_policy first
    }
}

Traefik (JSON):

{
  "http": {
    "routers": {
      "router-do-dev": {
        "rule": "Host(`do.dev`)",
        "service": "service-do-dev",
        "tls": { "certResolver": "cloudflare" }
      }
    },
    "services": {
      "service-do-dev": {
        "loadBalancer": {
          "servers": [
            { "url": "http://10.1.0.33:3005" },
            { "url": "http://10.3.0.33:3005" }
          ],
          "healthCheck": {
            "path": "/",
            "interval": "30s"
          }
        }
      }
    }
  }
}
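
If the config endpoint builds this JSON from the routes stored in Convex, the mapping is mechanical. A minimal sketch, assuming a hypothetical route record with a domain and a list of upstream addresses (adjust to the real Convex schema):

// Hypothetical route record shape; the real Convex schema may differ.
interface RouteRecord {
  domain: string         // e.g. "do.dev"
  upstreams: string[]    // e.g. ["10.1.0.33:3005", "10.3.0.33:3005"]
}

// Build the router/service pair for one route, mirroring the Traefik JSON above.
function toTraefikRoute(route: RouteRecord) {
  const key = route.domain.replace(/\./g, '-')
  return {
    routers: {
      [`router-${key}`]: {
        rule: `Host(\`${route.domain}\`)`,
        service: `service-${key}`,
        tls: { certResolver: 'cloudflare' },
      },
    },
    services: {
      [`service-${key}`]: {
        loadBalancer: {
          servers: route.upstreams.map((addr) => ({ url: `http://${addr}` })),
          healthCheck: { path: '/', interval: '30s' },
        },
      },
    },
  }
}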

UI Updates Required

1. Update Sync Function

Replace Caddy sync with Traefik config regeneration:

// Old (Caddy)
await caddyClient.loadConfiguration(routes)

// New (Traefik)
// Traefik's HTTP provider polls /api/traefik/config, so a "sync" is just
// regenerating the JSON that the endpoint serves
await fetch('/api/traefik/config', { method: 'POST' })

2. Update Server Status Page

Change health check endpoint:

// Old
const health = await fetch('http://10.3.3.3:2019/health')

// New  
const health = await fetch('http://traefik:8080/ping')

3. Update Route Management

No changes needed; routes are still stored in Convex!

Monitoring & Debugging

Useful Commands

# View current routes
curl http://localhost:8080/api/http/routers | jq

# View service health
curl http://localhost:8080/api/http/services | jq

# Check a specific route (router names are suffixed with their provider, e.g. @http)
curl http://localhost:8080/api/http/routers/router-do-dev@http | jq

# View real-time access logs
docker logs -f traefik-proxy 2>&1 | grep AccessLog | jq

Metrics to Monitor

  1. Response Times: traefik_service_request_duration_seconds
  2. Error Rates: traefik_service_requests_total{code=~"5.."} (a scraping sketch follows this list)
  3. Active Connections: traefik_service_open_connections
  4. Certificate Status: Check dashboard or /api/http/routers
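
For a quick look at error rates without a full Prometheus setup, the /metrics output can be scraped directly. A sketch (label names follow Traefik's Prometheus metrics; adjust the regexes if your version differs):

// Scrape Traefik's Prometheus endpoint and total up 5xx responses per service.
const text = await (await fetch('http://localhost:8080/metrics')).text()
const errors = new Map<string, number>()

for (const line of text.split('\n')) {
  // Sample line: traefik_service_requests_total{code="502",method="GET",service="service-do-dev@http",...} 3
  const match = line.match(/^traefik_service_requests_total\{([^}]*)\}\s+([0-9.eE+-]+)/)
  if (!match || !/code="5\d\d"/.test(match[1])) continue
  const service = match[1].match(/service="([^"]+)"/)?.[1] ?? 'unknown'
  errors.set(service, (errors.get(service) ?? 0) + Number(match[2]))
}

for (const [service, count] of errors) console.log(`${service}: ${count} 5xx responses`)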

Common Issues & Solutions

Issue: Routes Not Appearing

Solution: Check API endpoint is returning valid JSON:

curl http://localhost:3010/api/traefik/config | jq '.http.routers'

Issue: SSL Certificate Errors

Solution:

  1. Check Cloudflare token is valid
  2. Ensure domain points to Traefik IP
  3. Check ACME logs: docker logs traefik-proxy | grep acme

Issue: Health Checks Failing

Solution: Verify upstream services are accessible:

curl http://10.1.0.33:3005/  # From Traefik container network

Issue: Configuration Not Updating

Solution: Check polling is working:

docker logs traefik-proxy | grep "Configuration loaded from"

Performance Tuning

For High Traffic

Add to docker-compose.yml:

services:
  traefik:
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G

Connection Pooling

In the dynamic service configuration (note: the old maxConn option is not available on Traefik v2+ load balancers; connection reuse is tuned through a serversTransport, and per-route concurrency limits would use the inFlightReq middleware):

{
  "http": {
    "services": {
      "service-do-dev": {
        "loadBalancer": {
          "servers": [ ... ],
          "serversTransport": "pooled",
          "responseForwarding": { "flushInterval": "100ms" }
        }
      }
    },
    "serversTransports": {
      "pooled": { "maxIdleConnsPerHost": 100 }
    }
  }
}

Success Criteria

Migration is complete when:

  • All domains resolve via Traefik
  • Zero Caddy containers running
  • SSL certificates valid for all domains (a certificate check is sketched after this list)
  • Health checks passing for all upstreams
  • Response times ≤ previous Caddy setup
  • Error rates < 0.1%
  • Monitoring dashboards updated
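
The certificate criterion can be checked programmatically as well. A sketch using Node's tls module (the domain list is illustrative; substitute the full set):

// Open a TLS connection to each domain and report how long its certificate remains valid.
import tls from 'node:tls'

const domains = ['do.dev', 'contacts.dev', 'biturl.dev']   // illustrative subset

function certDaysRemaining(host: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host, port: 443, servername: host }, () => {
      const cert = socket.getPeerCertificate()
      socket.end()
      resolve(Math.floor((new Date(cert.valid_to).getTime() - Date.now()) / 86_400_000))
    })
    socket.on('error', reject)
  })
}

for (const domain of domains) {
  try {
    console.log(`${domain}: certificate valid for ${await certDaysRemaining(domain)} more days`)
  } catch (err) {
    console.log(`${domain}: TLS check failed (${(err as Error).message})`)
  }
}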

Rollback Procedure

If issues arise:

  1. Quick Rollback (< 5 minutes):

    # Start Caddy
    cd /root/local/caddy
    docker-compose up -d
    
    # Update DNS back to Caddy IP
  2. Investigate Issues:

    • Check Traefik logs
    • Review configuration
    • Test individual routes
  3. Fix and Retry:

    • Address specific issues
    • Test with single domain
    • Proceed with migration

Post-Migration Tasks

  1. Documentation:

    • Update README files
    • Remove Caddy documentation
    • Update runbooks
  2. Cleanup:

    • Remove Caddy containers
    • Delete Caddy configuration files
    • Remove unused API endpoints
  3. Optimization:

    • Enable caching where appropriate
    • Fine-tune rate limits
    • Configure advanced middleware

Support & Resources

  • Traefik documentation: https://doc.traefik.io/traefik/
  • Traefik community forum: https://community.traefik.io/

Remember: Take it slow, test thoroughly, and keep Caddy as a fallback until you're 100% confident in Traefik!
