Caddy as a Load Balancer and Fallback Strategies.

April 14, 2026

Simone Manini

When building web applications, you inevitably reach a point where a single server is no longer sufficient. Redundancy is required, and this means having a load balancer.

Although many highly advanced enterprise solutions exist, Caddy has become an extremely interesting choice. It is written in Go, automatically handles HTTPS, and includes a surprisingly robust reverse proxy and load balancer.

But reading the documentation is not enough to truly gain confidence. To understand how your infrastructure handles failures, you need to deliberately break it. In this article, we will build a small lab using Podman, FastAPI, and Caddy. We will configure a load balancer and test what happens when a backend crashes, when a service becomes slow, and how to route traffic more intelligently.

Setup: building the lab

To see Caddy in action, we need a simple application to balance traffic across. We will use a lightweight Python app built with FastAPI that returns which backend handled the request.

Here is our application code (app.py):

from fastapi import FastAPI
import os
import time

app = FastAPI()
backend_id = os.getenv("BACKEND_ID", "Unknown")

@app.get("/")
def health_check():
    return {"status": "ok", "backend": backend_id}

@app.get("/test-api")
def test_api():
    print(f"DEBUG: Request received by Backend {backend_id}")
    return {"message": f"Hello, I'm BACKEND_{backend_id}"}
    
# We will use this later for our "delayed" scenario
@app.get("/slow-api")
def slow_api():
    # Only make Backend 2 simulate a hang, leaving Backend 1 healthy
    if backend_id == "2":
        time.sleep(5) 
        return {"message": f"Slow response from BACKEND_{backend_id}"}
        
    return {"message": f"Normal response from BACKEND_{backend_id}"}

To run it, we simply need a Containerfile:

FROM python:3.11-slim
RUN pip install fastapi uvicorn
COPY app.py .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

And a compose.yml file to orchestrate Caddy together with two backend instances:

services:
  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
    depends_on:
      - backend1
      - backend2

  backend1:
    build: .
    environment:
      - BACKEND_ID=1

  backend2:
    build: .
    environment:
      - BACKEND_ID=2

The Ideal Case (Happy Path)

Let’s take a look at the initial configuration (Caddyfile). This configuration tells Caddy to listen on port 80 and distribute traffic evenly across the two backends using a round-robin policy.

:80 {
    reverse_proxy backend1:8080 backend2:8080 {
        lb_policy round_robin

        # Retry once if a connection fails
        lb_retries 1
        
        # Active Health Checks
        health_uri /
        health_interval 30s
        health_status 200
        
        # Passive Health Check: How long to remember a node is dead
        fail_duration 10s
    }

    log {
        output stdout
    }
}

If we start everything and send repeated requests to the endpoint, we will see the responses alternate evenly between the first and the second backend.

starting test
{"message":"Hello, I'm BACKEND_2"} 200
{"message":"Hello, I'm BACKEND_1"} 200
{"message":"Hello, I'm BACKEND_2"} 200
{"message":"Hello, I'm BACKEND_1"} 200
{"message":"Hello, I'm BACKEND_2"} 200
{"message":"Hello, I'm BACKEND_1"} 200
{"message":"Hello, I'm BACKEND_2"} 200
{"message":"Hello, I'm BACKEND_1"} 200
{"message":"Hello, I'm BACKEND_2"} 200
{"message":"Hello, I'm BACKEND_1"} 200
{"message":"Hello, I'm BACKEND_2"} 200
{"message":"Hello, I'm BACKEND_1"} 200
{"message":"Hello, I'm BACKEND_2"} 200
{"message":"Hello, I'm BACKEND_1"} 200
{"message":"Hello, I'm BACKEND_2"} 200
{"message":"Hello, I'm BACKEND_1"} 200

Everything works perfectly.
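To make the alternation above concrete, here is a minimal Python model of round-robin selection. It is only a sketch of the idea, not Caddy's implementation; the RoundRobin class and its pick method are hypothetical names introduced for illustration.

```python
from itertools import count


class RoundRobin:
    """Toy model of a round-robin selector (not Caddy's actual code)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._counter = count()  # monotonically increasing position counter

    def pick(self, healthy=None):
        """Return the next backend in rotation, skipping unhealthy ones."""
        healthy = set(self.backends if healthy is None else healthy)
        # Look at each backend at most once per request.
        for _ in range(len(self.backends)):
            candidate = self.backends[next(self._counter) % len(self.backends)]
            if candidate in healthy:
                return candidate
        return None  # no healthy backend available


lb = RoundRobin(["backend1:8080", "backend2:8080"])
picks = [lb.pick() for _ in range(4)]
print(picks)
```

With both backends healthy the picks simply alternate, which matches the output of the lab. Passing a restricted healthy list shows how the rotation skips a node that has been taken out of the pool.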

What happens when a service crashes?

The real test of a load balancer is not how it handles success, but how it handles failures. What happens if the second backend completely stops working?

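We use a Bash script (test.sh) to find out: we wait for health checks to kick in, intentionally stop one of the containers, and then send a few requests. The script calls docker; with Podman, either use the docker compatibility alias or replace docker with podman, and adjust the container name to whatever your compose tool generated.

#!/usr/bin/env bash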

echo "Starting failover test..."
sleep 32 

echo "Stopping backend2..."
docker stop caddy-load-balancer-test-backend2-1

echo "Firing requests immediately after crash..."
for i in {1..4}; do
    curl -s -w " HTTP: %{http_code}\n" http://localhost/test-api
done

echo "Waiting 10 seconds for Caddy's fail_duration..."
sleep 10

echo "Firing requests after fail_duration expires..."
for i in {1..4}; do
    curl -s -w " HTTP: %{http_code}\n" http://localhost/test-api
done

echo "Restarting backend2..."
docker start caddy-load-balancer-test-backend2-1
sleep 2 

echo "Firing requests after recovery..."
for i in {1..4}; do
    curl -s -w " HTTP: %{http_code}\n" http://localhost/test-api
done

When we run this test, something very interesting happens: the end user never sees any errors.

Starting failover test...
Stopping backend2...
caddy-load-balancer-test-backend2-1
Firing requests immediately after crash...
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
Waiting 10 seconds for Caddy's fail_duration...
Firing requests after fail_duration expires...
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
Restarting backend2...
caddy-load-balancer-test-backend2-1
Firing requests after recovery...
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_2"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_2"} HTTP: 200

When the second backend goes down, the first request that Caddy sends to it fails. Thanks to lb_retries 1, Caddy intercepts the error, automatically retries the request on the first backend, and returns a valid response.

Subsequently, Caddy marks the backend as unavailable for the configured fail_duration (10 seconds here) and routes all traffic to the healthy one. After this interval, it retries the failed backend, and if it is available again, it automatically reintegrates it into the pool.
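The interplay between the retry and the "remember the failure" window can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Caddy's code: FailoverProxy, handle and is_up are hypothetical names, time is passed in explicitly so nothing actually sleeps, and a single failure is enough to mark a node as down.

```python
class FailoverProxy:
    """Toy model of retry plus passive health checks (not Caddy's code)."""

    def __init__(self, backends, fail_duration=10.0, retries=1):
        self.backends = list(backends)
        self.fail_duration = fail_duration  # seconds a failure is remembered
        self.retries = retries              # extra attempts per request
        self.failed_at = {}                 # backend -> time of last failure
        self._next = 0                      # round-robin position

    def _available(self, now):
        # A backend is usable if it never failed or its failure has expired.
        return [
            b for b in self.backends
            if b not in self.failed_at
            or now - self.failed_at[b] >= self.fail_duration
        ]

    def handle(self, is_up, now):
        """Route one request; is_up(backend) simulates the connection attempt."""
        for _ in range(self.retries + 1):
            candidates = self._available(now)
            if not candidates:
                break
            backend = candidates[self._next % len(candidates)]
            self._next += 1
            if is_up(backend):
                return backend
            self.failed_at[backend] = now  # remember the failure
        return None  # the client would see an error


proxy = FailoverProxy(["backend1", "backend2"])
down = lambda b: b != "backend2"  # backend2 has crashed
during_outage = [proxy.handle(down, now=t) for t in (0, 1, 2, 12)]
after_recovery = [proxy.handle(lambda b: True, now=t) for t in (23, 24)]
print(during_outage, after_recovery)
```

In this model the client is always answered by backend1 during the outage, because the one failed attempt per request is absorbed by the retry. Once backend2 is reachable again and its failure window has expired, it rejoins the rotation.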

The slow service case

An immediate error is easy to handle. But what happens if a node is online but slow? For example, it could be stuck on a database query, causing delays of several seconds.

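By default, Caddy patiently waits for the backend response. This means that if a backend is slow, some users will experience high latency.

We can improve this behavior by setting a time limit: if the backend does not respond within a certain threshold, the request is retried on another node. In the configuration below, lb_try_duration gives Caddy a 2-second budget for finding a backend that can serve each request.

:80 {
    reverse_proxy backend1:8080 backend2:8080 {
        lb_policy round_robin
        lb_try_duration 2s
        
        health_uri /
        health_interval 30s
        health_status 200
        fail_duration 10s
    }
}

One caveat: according to the Caddy documentation, lb_try_duration only limits how long Caddy keeps retrying a request. It does not by itself interrupt a backend that has accepted the connection and is simply slow to answer. To abandon such a response you would typically also set a transport-level timeout (for example response_header_timeout inside a transport http block) and keep lb_try_duration longer than that timeout, so that there is still time left for the retry.

In this way, even if a backend is slow, the user still receives a fast response, preventing a single node from degrading the overall experience.

Starting failover test...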
caddy-load-balancer-test-backend2-1
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
{"message":"Hello, I'm BACKEND_1"} HTTP: 200
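The relationship between a per-attempt timeout and the overall retry budget is easy to get wrong, so here is a small Python simulation of it. It is a toy model with simulated latencies, not a description of Caddy's internals; route_with_timeout, attempt_timeout and try_duration are names chosen for this sketch, loosely standing in for a transport timeout and lb_try_duration.

```python
def route_with_timeout(latencies, order, attempt_timeout, try_duration):
    """Try backends in order under a per-attempt timeout and a total budget.

    latencies maps backend -> simulated response time in seconds.
    Returns (backend, elapsed_seconds), or (None, elapsed_seconds) when the
    budget runs out before any backend answers in time.
    """
    elapsed = 0.0
    for backend in order:
        if elapsed >= try_duration:
            break  # overall retry budget exhausted
        latency = latencies[backend]
        if latency <= attempt_timeout:
            return backend, elapsed + latency
        elapsed += attempt_timeout  # gave up on this attempt after the timeout
    return None, elapsed


latencies = {"backend1": 0.05, "backend2": 5.0}
order = ["backend2", "backend1"]  # the slow node happens to be picked first

# Budget longer than the timeout: the retry reaches backend1.
print(route_with_timeout(latencies, order, attempt_timeout=2.0, try_duration=5.0))

# Budget equal to the timeout: no time is left for a second attempt.
print(route_with_timeout(latencies, order, attempt_timeout=2.0, try_duration=2.0))
```

The second call illustrates why the total budget should be larger than the per-attempt timeout: otherwise the first slow attempt consumes all of it and the request fails instead of being retried.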

Beyond Round-Robin: Other Routing Techniques

Round-robin is the ideal starting point because it is fully predictable, but it is also “blind.” It simply distributes traffic as if dealing cards. Depending on your architecture, you may need Caddy to route traffic more intelligently. You can find all available options in the official documentation for its reverse proxy load balancing. Here are some alternatives and how to include them in your Caddyfile:

Least Connections (least_conn)

Instead of blindly sending every second request to a specific server, Caddy checks how many active and still-unfinished requests each backend is handling at that moment. It then routes new requests to the least busy server.
If one server is heavily loaded, for example due to generating a heavy report, Caddy will naturally direct new, lighter requests to the other idle servers, preventing congestion.

				
:80 {
    reverse_proxy backend1:8080 backend2:8080 {
        lb_policy least_conn
        
        # Active Health Checks
        health_uri /
        health_interval 2s
        health_timeout 2s
        health_status 200
        
        fail_duration 10s
    }
}

What happens if a server is slow? Let’s test it with this script:

#!/usr/bin/env bash

echo "Starting Least Connections test..."

# We fire 4 simultaneous requests to absolutely guarantee Backend 2 
# catches at least one and gets "jammed" for 5 seconds.
echo "Triggering a traffic jam in the background..."
(curl -s http://localhost/slow-api; echo " <-- Background request") &
(curl -s http://localhost/slow-api; echo " <-- Background request") &
(curl -s http://localhost/slow-api; echo " <-- Background request") &
(curl -s http://localhost/slow-api; echo " <-- Background request") &

# Wait a full second to ensure the connections are registered by Caddy
sleep 1

echo "Firing 6 rapid-fire requests to the fast endpoint..."
# Because Backend 2 is now definitely stuck, Caddy will route ALL of these to Backend 1.
for i in {1..6}; do
    curl -s http://localhost/test-api
    echo "" 
done

echo "Waiting for the jammed background requests to finally clear..."
wait
echo "Test complete."

Running the script produces the following output:

Starting Least Connections test...
Triggering a traffic jam in the background...
{"message":"Normal response from BACKEND_1"}{"message":"Normal response from BACKEND_1"} <-- Background request
 <-- Background request
Firing 6 rapid-fire requests to the fast endpoint...
{"message":"Hello, I'm BACKEND_1"}
{"message":"Hello, I'm BACKEND_1"}
{"message":"Hello, I'm BACKEND_1"}
{"message":"Hello, I'm BACKEND_1"}
{"message":"Hello, I'm BACKEND_1"}
{"message":"Hello, I'm BACKEND_1"}
Waiting for the jammed background requests to finally clear...
{"message":"Slow response from BACKEND_2"}{"message":"Slow response from BACKEND_2"} <-- Background request
 <-- Background request
Test complete.

The four background requests are split between the two nodes. Backend 1 answers its two immediately; meanwhile, Backend 2 is stuck executing time.sleep(5) and currently has 2 active pending connections.

One second later, we send 6 new requests. If we were using round-robin, Caddy would blindly send 3 of them to Backend 2, forcing those users to wait in a queue. But since we are using least_conn, Caddy evaluates the situation: it sees that Backend 2 is busy (2 active connections), while Backend 1 is completely free (0 connections). At that point, it routes 100% of the new traffic to Backend 1, fully protecting the user from the slow server.
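The decision described above boils down to "pick the node with the fewest requests in flight". A minimal Python sketch of that rule follows; the least_conn function and the active dictionary are illustrative names, and the random tie-breaking is part of this model rather than a statement about Caddy's source code.

```python
import random


def least_conn(active):
    """Pick the backend with the fewest in-flight requests.

    active maps backend name -> number of requests still being served.
    Ties are broken at random.
    """
    fewest = min(active.values())
    candidates = [b for b, n in active.items() if n == fewest]
    return random.choice(candidates)


# State right after the "traffic jam": Backend 2 holds two sleeping requests.
active = {"backend1": 0, "backend2": 2}

picks = []
for _ in range(6):
    backend = least_conn(active)
    active[backend] += 1  # request starts
    picks.append(backend)
    active[backend] -= 1  # /test-api answers at once, so the slot frees up

print(picks)
```

Because the six requests are sent one after another and each finishes immediately, Backend 1 is back at zero connections every time a new request arrives, so it wins every comparison against Backend 2.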

IP Hash (ip_hash)

If your application stores user sessions in the server’s local memory (such as login state), round-robin can break the application’s behavior. If a user logs in on the first server and the next request is routed to the second, they may suddenly appear logged out.

IP Hash solves this problem by using the user’s IP address to associate them with a specific backend. It ensures that, as long as the user’s IP does not change, all requests are routed to the same server. An even better approach is to use cookie-based sessions.

				
:80 {
    reverse_proxy backend1:8080 backend2:8080 {
        lb_policy ip_hash
        
        health_uri /
        health_interval 30s
        health_status 200
        fail_duration 10s
    }
}
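The core idea of hashing an IP address to a backend can be shown with a short Python sketch. This is a deliberately simplified model (SHA-256 plus modulo); Caddy's own hash function and mapping are likely different, and the ip_hash function below is a hypothetical helper, not part of any library.

```python
import hashlib


def ip_hash(client_ip, backends):
    """Map a client IP to one backend deterministically."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and wrap it
    # around the number of backends.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["backend1:8080", "backend2:8080"]
for ip in ("203.0.113.7", "203.0.113.7", "198.51.100.23"):
    print(ip, "->", ip_hash(ip, backends))
```

The same address always lands on the same backend as long as the pool is unchanged. One limitation of this naive modulo scheme is that adding or removing a backend can remap many clients at once.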

Sticky Sessions via Cookie (cookie)

Although IP hashing is useful, it can be unreliable when hundreds of users share the same IP address behind a corporate firewall, or when a mobile network IP changes dynamically.

Cookie-based policies are the most robust way to handle stateful applications. Caddy injects a special session cookie into the user’s browser during the first visit. For all subsequent requests, Caddy reads that cookie and ensures that the user is routed directly to the same server that holds their session data.

				
:80 {
    reverse_proxy backend1:8080 backend2:8080 {
        lb_policy cookie sticky_session_id
        
        health_uri /
        health_interval 30s
        health_status 200
        fail_duration 10s
    }
}
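The cookie mechanism can be modeled in Python as follows. This is a sketch under assumptions: the cookie value is taken to be an HMAC of the backend address, SECRET is a made-up value, and route and cookie_value are hypothetical helpers rather than Caddy APIs. Only the cookie name sticky_session_id comes from the configuration above.

```python
import hashlib
import hmac
import random

SECRET = b"example-secret"  # hypothetical value used only in this sketch


def cookie_value(backend):
    """Opaque token identifying a backend without exposing its address."""
    return hmac.new(SECRET, backend.encode("utf-8"), hashlib.sha256).hexdigest()


def route(cookies, backends, cookie_name="sticky_session_id"):
    """Return (backend, set_cookie); set_cookie is None for returning clients."""
    by_token = {cookie_value(b): b for b in backends}
    backend = by_token.get(cookies.get(cookie_name))
    if backend is not None:
        return backend, None  # known client: keep it on the same backend
    backend = random.choice(backends)  # first visit, or its backend is gone
    return backend, cookie_value(backend)


backends = ["backend1:8080", "backend2:8080"]
first, set_cookie = route({}, backends)          # first visit, no cookie yet
jar = {"sticky_session_id": set_cookie}          # the browser stores the cookie
repeat = [route(jar, backends)[0] for _ in range(5)]
print(first, repeat)
```

Every request that carries the cookie goes back to the backend chosen on the first visit. If that backend disappears from the pool, the token no longer matches anything, so the client is assigned a new backend and receives a fresh cookie.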

Random (random)

Interestingly, if no load balancing policy is specified in the Caddyfile, the default behavior of Caddy is random. In practice, it selects a backend completely at random.

This made me curious, so I started wondering why. I discovered that while round-robin is more suitable for small setups (like our two-node lab), random routing is actually extremely fast and introduces very low computational overhead when dealing with large clusters of identical microservices.

For this reason, it can make sense in large-scale stateless applications.

				
:80 {
    reverse_proxy backend1:8080 backend2:8080 {
        lb_policy random
        
        health_uri /
        health_interval 30s
        health_status 200
        fail_duration 10s
    }
}
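To get a feel for why random selection works well on larger pools, the following Python sketch picks among five hypothetical backends ten thousand times and counts the results. The backend names, the pick_random helper and the fixed seed are assumptions of this example, chosen only to make the run reproducible.

```python
import random
from collections import Counter


def pick_random(backends, rng):
    """Select a backend uniformly at random; no shared state is needed."""
    return rng.choice(backends)


rng = random.Random(42)  # fixed seed so repeated runs give the same counts
backends = [f"backend{i}" for i in range(1, 6)]
counts = Counter(pick_random(backends, rng) for _ in range(10_000))

for backend in backends:
    print(backend, counts[backend])
```

Each selection is independent and needs neither a rotation counter nor a table of open connections, yet over many requests the load ends up spread roughly evenly across the pool.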

Conclusion

Load balancing serves to protect users from the inevitable critical moments of infrastructure failure. As we have seen in this lab, Caddy makes configuring this resilience extremely simple. With just a few lines of configuration, it is possible to build a system that gracefully handles sudden crashes, mitigates silent network issues, and routes traffic intelligently based on the specific needs of the application.
