Gino Eising
Nerd by Nature
May 14, 2026 · 34 min read

Surviving the Hug of Death on Home Fiber

Cover: after Katsushika Hokusai, The Great Wave off Kanagawa. The wave of traffic crests over the cluster; the boats hold the line because they were already in position.

May 2026 — on what it takes to make a Hugo blog on home fiber sit still while a thousand strangers ring the doorbell at the same time

This blog runs from a closet. Not metaphorically. There is a three-node Kubernetes cluster about four metres from where I am writing this, hanging off a 1 Gbps Odido fiber connection. No CDN. No Cloudflare. The reason is unromantic: I want to control everything end-to-end, and I want to be able to read the access logs without asking a third party for permission. If the site falls over, that’s my problem. If it survives, that’s also my problem, because then I’ll have to wonder how close I really was to the edge.

Tonight I tried to find out.

The setup

Three nodes, all on the same LAN segment behind a Mikrotik router:

| Node | Arch | RAM | Role |
|---|---|---|---|
| node02 | amd64 | 64 GB | Main compute, 10G NIC to router |
| storage1 | amd64 | 32 GB | NAS / control plane, 1G NIC |
| orange-pi-max-1 | arm64 | 16 GB | ARM edge node, 1G NIC |

Hugo builds in CI, ships as a container, and gets fronted by a Varnish DaemonSet — one cache pod on every node, each with its own warm RAM. nginx-ingress terminates TLS and proxies to the local Varnish via a Service with internalTrafficPolicy: Local, so traffic that lands on a node is served from that node’s cache. The whole thing sits behind a 1 Gbps fiber line in a normal Dutch apartment.

The previous post on this topic (Meten is Weten) explains how that DaemonSet got there. This one is about what happens when you point a thousand virtual users at it.

The provocation

Earlier this week I added a 37-tile painter-cover ribbon to the home page. Every post that has a cover_after frontmatter field gets a tiny tile, sorted by the painter’s birth year — 1450 Bosch on the far left, 1904 Dalí somewhere near the right. It is the visual equivalent of a card catalogue, and I am unreasonably fond of it.

It also means every home-page visit pulls 37 image requests instead of the previous 1.

If you are reading this with one eyebrow raised, that is correct. The honest framing is: I added a slow thing to the home page and then started a fight with it on purpose. I wanted to know whether the cluster could absorb a Hacker News front-page spike with the new ribbon in place, or whether the ribbon would be the thing that turned the spike into a 502 cascade.

Hence k6, hence tonight.

First measurement, and why I should have known better

The first run was a 1000-VU spike test from my laptop over Wi-Fi to the public hostname. The output looked grim.

http_req_duration..............: p(95)=7.04s
http_req_failed................: 3.21%
http_reqs......................: 38214  ~127 req/s

p95 of seven seconds. Three percent errors. My first thought was the ribbon broke everything. My second thought, arriving slightly late, was wait.

The laptop is on Wi-Fi. The laptop is one machine. One thousand concurrent VUs from one machine over Wi-Fi means: one shared NIC, one TLS stack opening a thousand connections in roughly parallel, one DNS resolver doing hairpin lookups for the public IP that then bounces off the router and comes back into the LAN. The bottleneck was sitting on my desk, not in the closet.

What stung a little is the laptop wasn’t supposed to be on Wi-Fi. My normal setup is a USB-C dongle with a 1 Gbit RJ45 going into a closet switch. The cable was missing — the Prusa 3D printer in the same closet had quietly inherited it during a vacuuming incident, and the closet’s too packed to dig into without a project. So the laptop fell back to Wi-Fi without telling me. The first ten minutes of “why is the cluster broken” was actually “why am I on Wi-Fi” with extra steps. Worth writing down: your troubleshooting assumptions about your own setup are a load-bearing part of the test, and they’re the part nobody validates.

So I built a k6 pod and ran the exact same test from inside the cluster, against the in-cluster Service. Same script, same VU count, same scenarios. The numbers changed entirely:

http_req_duration..............: p(95)=354ms
http_req_failed................: 0.00%
http_reqs......................: 137820  ~2298 req/s

p95 dropped from 7s to 354ms. Errors went to zero. Throughput went up roughly 18x. The server had been fine the whole time. The road to the server was the road in front of my apartment.
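The ratios are easy to check by hand, but a small sketch makes them explicit. The figures are the two k6 summaries quoted above; the dictionary keys and the function name `improvement` are illustrative only, not part of k6.

```python
# k6 summary figures quoted in the post.
laptop_wifi = {"p95_ms": 7040, "req_per_s": 127}
in_cluster = {"p95_ms": 354, "req_per_s": 2298}


def improvement(before: dict, after: dict) -> dict:
    """Return how much faster (p95) and higher (throughput) the second run is."""
    return {
        "p95_speedup": before["p95_ms"] / after["p95_ms"],
        "throughput_gain": after["req_per_s"] / before["req_per_s"],
    }


result = improvement(laptop_wifi, in_cluster)
print(f"p95 speedup: {result['p95_speedup']:.1f}x")
print(f"throughput gain: {result['throughput_gain']:.1f}x")
```

Both ratios land in the same range, which is what you would expect if a single client-side bottleneck was throttling everything.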

The lesson I keep relearning, written here so future-me can find it: always know whether you’re measuring the server or the road to the server. A load test from your laptop over Wi-Fi tells you something useful — usually about your laptop and your Wi-Fi.

For everything that follows, “the load test” means the one running from a pod inside the cluster, hitting the in-cluster Service directly. Real-world traffic still has to come down the fiber line, and we’ll get to that ceiling at the end.

The optimisation the data demanded — ribbon thumbs

Even with the test fixed, the ribbon scenario was the worst-behaved one. k6 was extracting all 37 image URLs from the home page and batch-fetching them. The average ribbon batch took 1.1s and pulled about 2.5 MB. For a single home visit that wants to feel instant on mobile, 2.5 MB of decorative tiles is criminal.

The original images were the same WebP files used for the post hero images — roughly 67 KB each. At 48 px on screen, that is wildly more than necessary. The fix is the thing any front-end engineer would have written first: generate small thumbnails at build time and serve those instead.

The Dockerfile already had a step that converted PNG hero images to WebP. I added a second step for the ribbon thumbs.

# Generate small ribbon-thumb variants (144x144 webp, ~3-8 KB each) so the
# home-page covers ribbon doesn't pay the cost of 37 × full-size cover images
# per visit. The ribbon partial reads these via <slug>-thumb.webp; falls back
# to the full image if the thumb is missing.
RUN for f in static/img/*.png; do \
      base="${f%.png}"; \
      thumb="${base}-thumb.webp"; \
      [ -f "$thumb" ] && continue; \
      convert "$f" -resize 144x144^ -gravity center -extent 144x144 \
        -quality 72 -define webp:method=6 -define webp:lossless=false "$thumb" && \
      echo "thumbed: $f -> $thumb"; \
    done

The ribbon partial picks <slug>-thumb.webp when it exists and falls back to the original otherwise, so old posts keep rendering even if the thumb pipeline misses them.
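The real partial is a Hugo template that is not shown here, so the following is only a sketch of the fallback rule in Python. The function name `ribbon_image`, the `available` set, and the assumption that the full-size cover is named `<slug>.webp` are all hypothetical.

```python
def ribbon_image(slug: str, available: set) -> str:
    """Pick the thumb when it was generated, else fall back to the full cover.

    `available` is the set of file names present in the image directory.
    """
    thumb = f"{slug}-thumb.webp"
    if thumb in available:
        return thumb
    # Fallback keeps old posts rendering when the thumb step missed them.
    return f"{slug}.webp"


files = {"hokusai.webp", "hokusai-thumb.webp", "bosch.webp"}
print(ribbon_image("hokusai", files))
print(ribbon_image("bosch", files))
```

The point of the fallback is that a missing thumb costs bytes, not a broken image.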

Per-image size dropped from about 67 KB to about 3.4 KB. The full ribbon payload dropped from roughly 2.5 MB to roughly 125 KB. Twenty times smaller, for an image set you only see at 48 px anyway. This is the part of the post where I have to acknowledge that the most useful optimisation tonight was a convert one-liner, not anything to do with Kubernetes. So be it.
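A quick sketch of the payload arithmetic, using the approximate per-image sizes quoted above. The constant and function names are illustrative.

```python
TILES = 37        # ribbon tiles on the home page
FULL_KB = 67.0    # approximate full-size WebP per tile
THUMB_KB = 3.4    # approximate 144x144 thumb per tile


def ribbon_payload_kb(tiles: int, per_image_kb: float) -> float:
    """Total bytes (in KB) pulled by one full ribbon load."""
    return tiles * per_image_kb


before = ribbon_payload_kb(TILES, FULL_KB)
after = ribbon_payload_kb(TILES, THUMB_KB)
ratio = before / after

print(f"before: {before:.0f} KB, after: {after:.0f} KB, ratio: {ratio:.1f}x")
```

The ratio depends only on the per-image sizes, so it stays the same however many tiles the ribbon grows to.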

The Varnish cache bump that actually mattered

The DaemonSet was originally configured with malloc,256m per pod — a leftover from when the site was twelve posts long. With 100+ posts, 37 ribbon thumbs, hero images, CSS bundles, fonts, and HTML pages, 256 MB filled within seconds of warmup and Varnish started evicting hot objects to make room for slightly less hot objects. Cache hit rate hovered around 92% under load; it should have been near 100%.

Bumping the malloc size to 2 GB and giving each pod a real CPU limit was the single biggest performance lever of the whole session.

    spec:
      containers:
      - name: varnish
        image: varnish:7.6
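        # was: -s malloc,256m  (the original setting, two years stale)
        # excerpt of the wrapper script: varnishd is backgrounded with "&" and
        # the full script (see the warmup section) later waits on its PID
        args:
        - |
          /usr/sbin/varnishd -F -f /etc/varnish/default.vcl -s malloc,2g \
            -a 0.0.0.0:80 -p thread_pool_max=1000 -p thread_pools=2 &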
        resources:
          requests:
            memory: 2200Mi
            cpu: 200m
          limits:
            memory: 4Gi
            cpu: "2"

After that change, cache hit rate sat at >99% for the home page and >97% for the long tail of blog posts. The Hugo backend basically stopped seeing traffic during the load tests — a handful of cold-miss requests per pod and that was it.

This might be a premature optimisation for a personal blog. It is also the kind of thing that takes ten minutes to ship and pays itself off the first time someone links you on a forum.

The HPA that didn’t fire

The original plan was a normal CPU-based HorizontalPodAutoscaler on the burst Deployment that sits behind the DaemonSet. Set target at 70% CPU, scale 2..10 replicas, let Kubernetes do its thing.

Under a 1000-VU spike, the HPA never triggered.

$ kubectl get hpa varnish-burst-djieno
NAME                   REFERENCE                         TARGETS    MINPODS   MAXPODS   REPLICAS
varnish-burst-djieno   Deployment/varnish-burst-djieno   0%/70%     2         10        2

Cute. Zero percent CPU while serving 2,300 req/s.

The reason is in retrospect obvious and at the time mildly humiliating: Varnish at this scale is RAM- and I/O-bound, not CPU-bound. It serves objects out of a malloc heap. The CPU spends most of its time idle while the kernel shovels bytes out of TCP buffers. Even when bursting hard, no single pod went above ~70% of one core. CPU was simply the wrong metric to autoscale on.

This is a useful failure to keep in mind: the default scaling signal will silently do nothing for any service whose work isn’t measured in CPU cycles. You don’t get an error. You get a flat line and a sense that something should be happening.
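The flat line follows from the scaling rule documented for the HorizontalPodAutoscaler: desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured bounds. Below is a minimal sketch of that rule. It ignores stabilisation windows and pod readiness handling, and the function name is illustrative; the 10% tolerance is the documented default.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_utilisation: float,
                         target_utilisation: float,
                         min_replicas: int,
                         max_replicas: int,
                         tolerance: float = 0.1) -> int:
    """Simplified HPA rule for a utilisation metric."""
    ratio = current_utilisation / target_utilisation
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# The situation in the kubectl output: 0% measured against a 70% target.
print(hpa_desired_replicas(2, 0, 70, min_replicas=2, max_replicas=10))
```

With the measured utilisation at zero, the formula asks for zero replicas and the clamp holds the Deployment at `minReplicas`, so nothing ever moves.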

KEDA on the right metric

The fix is to scale on the metric that actually moves: HTTP request rate. KEDA makes this straightforward — it ships an external metrics provider that can query any Prometheus-compatible backend and feed the result to a regular HPA.

KEDA went in as a FluxCD HelmRelease, lean install, no HTTP add-on, no Prometheus operator sub-chart (we already run VictoriaMetrics). Each Varnish pod got a sidecar that runs prometheus_varnish_exporter, scraped by a PodMonitor that VictoriaMetrics picks up. Then a single ScaledObject replaced the CPU-HPA:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: varnish-burst-djieno
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: varnish-burst-djieno
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://vmsingle-vm-stack-victoria-metrics-k8s-stack.monitoring.svc.cluster.local:8428
      metricName: varnish_req_rate
      threshold: '200'
      query: sum(rate(varnish_main_client_req[1m]))

The threshold is “200 req/s per replica.” During the spike, total request rate climbed to about 2,300 req/s, the HPA target became 2300/200 ≈ 12 desired replicas, capped at maxReplicaCount: 10, and KEDA scaled the burst Deployment from 2 → 7 in the first wave and 7 → 10 once the spike held. Now the metric actually fired. The HPA actually existed in a non-vestigial sense.
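The replica arithmetic can be sketched in a few lines, assuming the trigger behaves as an average-value metric (total rate divided by the per-replica threshold, rounded up, then clamped). The function name is illustrative, and real KEDA/HPA behaviour adds polling and stabilisation on top of this.

```python
import math


def keda_desired_replicas(total_req_rate: float,
                          threshold_per_replica: float,
                          min_replicas: int,
                          max_replicas: int) -> tuple:
    """Return (uncapped, capped) replica counts for a request-rate trigger."""
    uncapped = math.ceil(total_req_rate / threshold_per_replica)
    capped = max(min_replicas, min(max_replicas, uncapped))
    return uncapped, capped


# Spike from the post: about 2,300 req/s against a threshold of 200.
print(keda_desired_replicas(2300, 200, min_replicas=2, max_replicas=10))
```

At idle the same function bottoms out at `minReplicaCount`, which is why two burst pods are always running.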

I should have started here. I didn’t, because CPU autoscaling is the default lore. The default lore is fine until it isn’t.

The arch dodge: building varnish-exporter for arm64

The exporter sidecar is jonnenauha/prometheus_varnish_exporter. Excellent project. Ships only linux-amd64 binaries on its GitHub Releases page. orange-pi-max-1 is aarch64, so the released binary won’t run on a third of the cluster.

I could have skipped the exporter on the ARM node, but that would have left the KEDA query missing a third of the request data, and the ARM node is in fact serving traffic. So I built it for both arches from source.

The Dockerfile is short — clone the upstream repo at a specific tag, cross-compile with GOARCH=$TARGETARCH, copy the resulting binary into the varnish:7.6 runtime image so the exporter can call varnishstat.

FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS builder
ARG TARGETARCH
ARG VERSION=1.6.1
WORKDIR /src
RUN apk add --no-cache git ca-certificates && \
    git clone --depth 1 --branch ${VERSION} \
      https://github.com/jonnenauha/prometheus_varnish_exporter.git . && \
    CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} \
      go build -trimpath -ldflags="-s -w" -o /out/prometheus_varnish_exporter .

FROM varnish:7.6
COPY --from=builder /out/prometheus_varnish_exporter /usr/local/bin/prometheus_varnish_exporter
USER varnish
ENTRYPOINT ["/usr/local/bin/prometheus_varnish_exporter"]

The build script uses two buildx builders in parallel — the local multiarch builder for amd64, and a remote opi-arm64 builder that runs natively on the ARM node so I don’t pay the QEMU emulation tax. Then docker buildx imagetools create stitches the two single-arch images into a multi-arch manifest under registry.djieno.com/djieno/varnish-exporter:v1.6.1.

If you are in the same arm64 boat with an exporter that only ships amd64 binaries, the recipe is open-source at https://gitlab.com/djieno/varnish-exporter. It is small. Fork it, change the upstream URL, change the runtime base image, ship a multi-arch image. Same shape for any Go exporter on the planet.

The cold-pod dip — warmup before joining the Service

The first time KEDA actually scaled the burst Deployment under load, aggregate throughput dipped. Not catastrophically — about 23% — but visibly. It took me a minute to figure out why.

When KEDA goes 2 → 7, the five new pods boot with empty caches. The moment their readiness probe passes, the Service routes traffic to them. They MISS on the first request for every URL, hit the Hugo backend, fill the cache slowly. During the ~30 seconds it takes each new pod to warm up, aggregate hit rate across the fleet drops, p95 latency climbs, and the spike you scaled out to handle is being served partly by pods that aren’t yet ready to handle it.

The fix is to delay readiness until the cache is actually warm. The container’s command now wraps varnishd: start it in background, fire a handful of HTTP requests at the home page and the critical CSS bundles, touch a marker file, then wait on the varnishd PID.

The slightly cursed detail is that the official varnish:7.6 image doesn’t ship curl or wget, and I didn’t want to install a package just for this. So the warmup uses bash’s built-in /dev/tcp socket support:

set -e
rm -f /var/lib/varnish/.warm
until getent hosts masterdjienocom.default.svc.cluster.local > /dev/null 2>&1; do
  echo "waiting for backend DNS..."; sleep 2
done
/usr/sbin/varnishd -F -f /etc/varnish/default.vcl -s malloc,2g \
  -a 0.0.0.0:80 -p thread_pool_max=1000 -p thread_pools=2 &
VARNISH_PID=$!
# Wait for varnishd to actually listen
for i in $(seq 1 30); do
  if (exec 3<>/dev/tcp/127.0.0.1/80) 2>/dev/null; then
    exec 3>&-; break
  fi
  sleep 1
done
# Warm the critical paths: home, blog index, the four CSS bundles every
# page pulls. Two passes: the first is a MISS that fills the cache, the second
# should be a HIT (the response is discarded, not checked).
for path in / /blog/ /css/additional.css /css/medium.css /css/fonts.css /css/syntax.css; do
  for n in 1 2; do
    (exec 3<>/dev/tcp/127.0.0.1/80
     printf 'GET %s HTTP/1.0\r\nHost: djieno.com\r\nConnection: close\r\n\r\n' "$path" >&3
     cat <&3 > /dev/null) 2>/dev/null || true
  done
done
echo "cache warmed; marking ready"
touch /var/lib/varnish/.warm
wait $VARNISH_PID

The readinessProbe is then trivially simple:

readinessProbe:
  exec:
    command: [test, -f, /var/lib/varnish/.warm]
  initialDelaySeconds: 2
  periodSeconds: 2
  failureThreshold: 30

Pods join the Service only after the marker file exists. The cold-pod dip is gone — when KEDA scales the Deployment, new pods quietly warm themselves for a few seconds, then start absorbing traffic at full hit rate. It is the sort of fix that feels disproportionate to the problem until the first time you watch a graph stop dipping.
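For readers who find the `/dev/tcp` trick hard to read, here is the same warm-up idea sketched in Python with a plain socket. It is not what runs in the container (the image has no Python); the function names are illustrative, and the host, port and paths simply mirror the shell script above.

```python
import socket


def build_request(path: str, host_header: str) -> bytes:
    """Build the same minimal HTTP/1.0 GET the shell script prints."""
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def status_line(response: bytes) -> str:
    """Return the first line of a raw HTTP response."""
    return response.split(b"\r\n", 1)[0].decode("latin-1")


def warm_path(host: str, port: int, path: str, host_header: str,
              timeout: float = 5.0) -> str:
    """Fetch one path over a raw socket and return its status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(path, host_header))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    return status_line(b"".join(chunks))


def warm_cache(host: str, port: int, paths: list, host_header: str,
               passes: int = 2) -> dict:
    """Request every path `passes` times; keep the last status per path."""
    results = {}
    for path in paths:
        for _ in range(passes):
            results[path] = warm_path(host, port, path, host_header)
    return results


if __name__ == "__main__":
    paths = ["/", "/blog/", "/css/additional.css", "/css/medium.css",
             "/css/fonts.css", "/css/syntax.css"]
    print(warm_cache("127.0.0.1", 80, paths, "djieno.com"))
```

Unlike the shell version, this sketch returns the status lines, which would make it possible to refuse the ready marker when a warm-up request fails.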

The numbers, honestly

With the ribbon thumbs in place, the 2 GB cache, KEDA scaling on request rate, and the warmup gate, the spike test from inside the cluster looked like this — over five separate runs across the evening:

| Metric | Range across 5 runs |
|---|---|
| Peak sustained throughput | 1,800 – 2,800 req/s |
| HTTP error rate | 0.00% – 0.04% |
| p50 latency (home page) | 15 – 18 ms |
| p95 latency (home page) | 350 – 600 ms |
| p95 latency (full ribbon batch) | 820 – 1,100 ms |
| Cache hit rate (home + CSS) | > 99% |
| Burst replicas during peak | 7 – 10 |

There is real run-to-run variance here. Some of it is k6 startup jitter; some is the KEDA polling interval landing in a different place relative to the spike ramp; some is the cache state at the start of each run. I am not going to round any of this off and claim “the site does 10,000 req/s.” It doesn’t, on this hardware, on this fiber line, and I haven’t measured anything that supports a number that big.

What I can say is that during a 1000-concurrent-VU spike, the site serves ~2,000 requests per second with effectively zero errors and a median response time around 16 milliseconds. That is enough.

The Odido upload ceiling

The remaining honest question is whether the fiber line can deliver any of this to real visitors.

I measured sustained outbound throughput from node02 over a warm TCP connection to a server on the public internet: roughly 758 Mbps out of the nominal 1 Gbps. That is the ISP-side ceiling, not the cluster-side ceiling.

A full home-page visit with the new ribbon weighs about 175 KB on the wire (HTML + CSS + thumbs + WOFF2 fonts, all gzipped). Divide:

758 Mbps  /  (175 KB × 8 bits/byte)  ≈  541 home visits/sec

So the fiber line can sustain about 540 fresh visitors per second before saturating outbound. The cluster, server-side, runs out of CPU and thread pool capacity somewhere around 70 visitors per second of full asset load (the heaviest scenario, not just the HTML). The cluster is the bottleneck. The fiber line has more than 7x headroom over the cluster.
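A minimal sketch of that division, assuming decimal units (1 Mbps = 1,000,000 bit/s, 1 KB = 1,000 bytes). Under those assumptions the result lands a little above 540 visits per second, the same ballpark as the figure above. The function name is illustrative.

```python
def visits_per_second(link_mbps: float, page_kb: float) -> float:
    """How many full page loads per second fit in the outbound link."""
    link_bits_per_s = link_mbps * 1_000_000
    page_bits = page_kb * 1_000 * 8
    return link_bits_per_s / page_bits


fiber_ceiling = visits_per_second(758, 175)   # measured uplink, page weight
cluster_ceiling = 70                          # visits/s quoted for the cluster
headroom = fiber_ceiling / cluster_ceiling

print(f"fiber: {fiber_ceiling:.0f} visits/s, headroom: {headroom:.1f}x")
```

Swapping in a heavier page weight shows how quickly the headroom shrinks: the ceiling scales inversely with bytes per visit.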

A Hacker News front-page hit, depending on time of day and topic, sustains somewhere between 10 and 20 visits per second over a few hours, with brief peaks higher. A Reddit r/devops thread is similar order of magnitude. Both fit comfortably inside what this setup can serve, with the cluster as the limiting resource and the fiber line with room to spare.

That is the closest I am willing to come to a survival claim: this should hold for the kind of spike a personal blog actually gets, with margin. If a much larger property links here, things will get interesting in ways I have not measured.

What this didn’t fix, and what I’d do next

A few honest gaps.

Brief Service-routing window during scale events. When KEDA scales 7 → 10 and then 10 → 7 again, the iptables/IPVS sync inside the cluster lags pod readiness by a fraction of a second. Across the night I saw 0.00% – 0.04% errors during scale-down transitions, almost certainly traffic landing on a pod that just exited the endpoint set. A pre-stop hook with a short sleep would close this further; I haven’t bothered yet because 0.04% is acceptable for tonight.

The 60s rate window in KEDA’s query. rate(varnish_main_client_req[1m]) smooths the request rate over the last minute, which means a sharp spike takes ~30 seconds to be visible to the HPA, and the HPA reacts ~15 seconds after that on top of its polling interval. Total spike-to-scale latency is closer to a minute than I’d like. Shortening the window to [30s] would help, at the cost of more flap risk. Worth experimenting with under a real load pattern, not a synthetic one.
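The lag is easier to see with an idealised model: treat the windowed rate as the plain average of the true rate over the trailing window, and ask what it reports some seconds after a step change. This ignores scrape intervals and the extrapolation Prometheus-compatible backends apply, so it is a sketch of the smoothing effect only; the function name is illustrative.

```python
def windowed_rate(seconds_since_step: float, window_s: float,
                  before_rps: float, after_rps: float) -> float:
    """Average rate over the trailing window after a step in traffic."""
    new_part = min(max(seconds_since_step, 0.0), window_s)
    old_part = window_s - new_part
    return (old_part * before_rps + new_part * after_rps) / window_s


# Step from idle to 2,300 req/s, observed 15 s, 30 s and 60 s later.
for t in (15, 30, 60):
    one_min = windowed_rate(t, 60, 0, 2300)
    half_min = windowed_rate(t, 30, 0, 2300)
    print(f"t={t:2d}s  [1m]={one_min:6.0f}  [30s]={half_min:6.0f}")
```

In this model a 60-second window reports only half the true rate 30 seconds into the spike, while a 30-second window has already caught up.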

No CDN. A Cloudflare or Bunny in front of this would absorb basically everything I’ve described, for free or close to it, and would also handle the case where my fiber line drops. I keep not putting one there. The reason is the one in the opening paragraph — I want to control everything and serve from my house — and I am aware that this is more a value statement than an architecture decision. Reasonable people would disagree. Cheerfully.

ARM thread pool tuning. I haven’t tuned thread_pool_max per-architecture yet because the same 1000 ceiling across all three nodes was acceptable for the spike profile I tested tonight. A more sustained load would benefit from architecture-aware settings.

Closing

The blog now survives a synthetic 1000-VU spike with effectively zero errors, served from a closet in Amsterdam over consumer fiber, behind a cache layer that scales itself on the metric that actually moves, with warmup gating that hides the cold-pod gap, and with thumbnail images that are 20x smaller than they were yesterday. The biggest single win was a one-line convert command. The second biggest was bumping a number from 256 to 2048. The Kubernetes part was mostly there to make those two things matter at the right moment.

Now I just need an excuse to find out if anyone actually wants to read this.


Postscript — what actually happens when you generate the load from outside your own network

After publishing the above, I ran a follow-up test: real external traffic from three fresh Hetzner cpx22 instances in Nuremberg, Falkenstein, and Helsinki. The whole experiment cost €0.008 and ran in about thirty minutes.

The headline:

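| Source DC | requests | p50 | p95 | p99 | errors |
|---|---|---|---|---|---|
| Nuremberg | 8,461 | 26 ms | 31 ms | 34 ms | 0 |
| Falkenstein | 8,219 | 34 ms | 41 ms | 48 ms | 0 |
| Helsinki | 7,481 | 67 ms | 75 ms | 88 ms | 0 |

Aggregate (~345 req/s sustained, real external): 24,161 requests, < 100 ms anywhere in Europe, 0 errors.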

That is the honest “what a visitor in Helsinki experiences” number. Sub-100ms p95 from a different country, zero errors, at five times Hacker-News-peak sustained traffic, while the cluster’s KEDA HPA was scaling and the warmup gating was kicking in. The cover is doing what the cover promised.

But then I pushed harder, because that’s what you do. 1500 VUs split across the same three DCs — 91% timeout rate. The cluster wasn’t to blame: the server-side request-rate metric still climbed to 351 req/s and the HPA scaled the burst Deployment 2 → 6 as designed. The connections never got that far. What broke first wasn’t Varnish, wasn’t Hugo, wasn’t even the Odido upload bandwidth (we measured 758 Mbps sustained, plenty). It was, in order of likelihood:

  1. Mikrotik NAT conntrack. Ruled out. I checked the next morning. The router is set to max-entries: 1048576 — one million conntrack slots, configured years ago when I had a different problem. So nope, the router has room. Initial hypothesis nuked by one ssh-into-the-Mikrotik. Worth keeping the strike-through for the next person who reaches for the same explanation.
  2. nginx-ingress TLS handshake CPU. Three ingress pods doing 1500 simultaneous TLS handshakes saturate before Varnish sees a thing. KEDA could do the same trick on the ingress that it does on Varnish; that would be the next layer of this same story.
  3. Odido edge shaping. Theoretically possible at the ISP, though our outbound bandwidth tests suggest they’re not the bottleneck in either direction.

The lesson, written down so I can find it next time: the cluster was never the bottleneck I should have been measuring against. Once a real external load is in play, the bottleneck moves to whatever bit of plumbing is between the visitor and the cluster — the home router, the TLS layer, the ISP. In retrospect this is obvious; I just hadn’t drawn the picture far enough to the left.

What got me to the right picture: noticing the cluster-side HPA was happily scaling while the external load was almost entirely failing. That asymmetry only shows up when you measure both sides at the same time. If I had only watched the k6 client, I would have concluded the cluster was broken. If I had only watched Varnish, I would have concluded the test was broken. Each view was misleading on its own; only the two together told the truth.

So: the article above stands. The cluster does what it says on the tin. The router has its own date with KEDA, on a different night.


Postscript II — the day after, in which I was wrong about the bottleneck

Added 2026-05-16, two days after the original post. Some of what I wrote above is wrong. Better to say so out loud than to leave it on the page pretending.

A friend bet me a coffee that the bottleneck was the ISP. I bet him it was nginx-ingress TLS handshake CPU. We had the data sitting right there to settle it and we hadn’t run the experiment. So I ran it.

Two new tests. First, from inside my own LAN, 2000-VU spike from the laptop and from an Orange Pi 6 on the network, both against the public hostname (hairpin NAT through the Mikrotik) and against the cluster’s LAN IP directly (no Mikrotik routing). Second — the real one — three fresh Hetzner cpx22 in three different /16s, synchronised start, distributed ramp 0 → 4,500 aggregate VUs over two minutes. All metric streams pre-armed: Mikrotik CPU and conntrack, ether1-WAN1 byte and packet rates, nginx-ingress controller CPU per pod, Varnish request rate, every node’s CPU.

The results made me eat the article above with my morning bread.

| Source | Median latency | Sustained req/s | Errors |
|---|---|---|---|
| Hetzner nbg1 (Nuremberg) | 16 ms | 704 | 7.4% |
| Hetzner fsn1 (Falkenstein) | 4,580 ms | 68 | 8.7% |
| Hetzner hel1 (Helsinki) | 4,650 ms | 66 | 8.7% |

Read that table twice. Same time window, same destination, three different Hetzner /16s. One of them was getting served in sixteen milliseconds. Two of them were stuck behind something that took four and a half seconds, on the median, for a homepage. The aggregate “cliff” in the original Postscript above was not the cluster crapping out at 1500 VUs. It was two of the three Hetzner sources being on a slow upstream path into Odido, with their requests starting to time out as VU pressure built up.
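A small sketch of how that asymmetry could be flagged automatically from per-source results. The figures are read from the table above (roughly 700 req/s on the fast path, roughly 70 on the slow ones); the function name and the "less than half of the best source" cut-off are arbitrary assumptions for illustration.

```python
# Per-source results from the 4,500-VU ramp.
sources = {
    "nbg1": {"median_ms": 16, "req_per_s": 704},
    "fsn1": {"median_ms": 4580, "req_per_s": 68},
    "hel1": {"median_ms": 4650, "req_per_s": 66},
}


def path_asymmetry(results: dict, slow_fraction: float = 0.5):
    """Return (best/worst throughput ratio, sorted list of slow sources)."""
    rates = {name: r["req_per_s"] for name, r in results.items()}
    best = max(rates.values())
    worst = min(rates.values())
    slow = sorted(n for n, v in rates.items() if v < best * slow_fraction)
    return best / worst, slow


ratio, slow = path_asymmetry(sources)
print(f"best/worst throughput: {ratio:.1f}x, slow sources: {slow}")
```

A ratio this far from 1 between sources hitting the same target at the same time points at the path, not the server.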

Meanwhile, on the cluster side, during this same test:

| Component | Peak during 4,500 VU ramp |
|---|---|
| Mikrotik router CPU | 56% |
| Mikrotik conntrack | 4,781 (vs 1,048,576 cap) |
| ether1-WAN1 outbound | 645 Mbps out of ~1 Gbps |
| ether1-WAN1 drops | 0 |
| nginx-ingress controller CPU | 5% |
| Varnish request rate served | 1,696 req/s |

The cluster was bored. nginx-ingress, the thing I had bet on, sat at 5% CPU while two of the test sources were simultaneously starving for service. The home fiber link’s outbound side hit 645 Mbps, which is the number to carry away from this whole experiment. The working hypothesis was that the home fiber connection would be irrelevant compared to cluster bottlenecks. Reality replied: the cluster has more capacity than the fiber can deliver.
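Putting the peaks on one scale makes the ranking obvious. This sketch divides each peak from the table above by its capacity; the key names are illustrative, and the WAN capacity uses the nominal 1 Gbps, which is an assumption about the real ceiling.

```python
# (peak, capacity) pairs taken from the cluster-side table.
peaks = {
    "mikrotik_cpu_pct": (56, 100),
    "conntrack_entries": (4_781, 1_048_576),
    "wan_outbound_mbps": (645, 1_000),   # nominal 1 Gbps line
    "ingress_cpu_pct": (5, 100),
}


def utilisation(values: dict) -> dict:
    """Fraction of capacity used by each component at its peak."""
    return {name: used / cap for name, (used, cap) in values.items()}


util = utilisation(peaks)
busiest = max(util, key=util.get)

for name, frac in sorted(util.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {frac:6.1%}")
print("closest to its limit:", busiest)
```

Nothing here is saturated, but the outbound link is the component closest to its limit, with the router CPU second.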

So three corrections, in increasing order of “things I should have known”:

  1. The original Postscript’s three suspects were wrong. It wasn’t conntrack (already struck-through), it wasn’t nginx-ingress TLS handshake CPU (5% CPU, comprehensively not it), and it wasn’t Odido edge shaping in any simple sense. It was peering quality between specific Hetzner /16s and Odido’s upstream. Two of three Hetzner DCs land on a slow path that gives them ~70 req/s. One lands on a fast path and gets ~700.
  2. The “1500-VU cliff” framing was the wrong way to describe it. A more honest framing is: when external sources push HTTP at the cluster, the limit is the quality of the public-internet path between source and cluster, not anything inside our perimeter. Different ASNs see very different ceilings. There is no single number.
  3. The actual cluster ceiling is unknown and larger than this article claimed. Varnish served 1,696 req/s during this run while ingress sat at 5% CPU. The cluster can almost certainly do multiples of that if the load arrived evenly. To prove it, you’d have to either bypass the public internet — terminate the WAN on a Hetzner edge, tunnel back, hammer the tunnel — or run the test from inside the LAN with multiple wired-cable sources to remove the path quality variable.

So the lesson rewrite. The original Postscript said “the cluster was never the bottleneck I should have been measuring against.” That was right for the wrong reason. The cluster is genuinely not the bottleneck — but it isn’t because the router or the ingress capped first. It’s because the public internet path to the front door is a more variable thing than I was treating it as. Some visitors get the cluster at its best. Others get whatever BGP and peering capacity their AS happened to pick that day.

I owe my friend a coffee.

Three experiments that would actually nail this down

Captured here so they’re on the record, in case I get to them.

  • Mirror the cluster at a Hetzner DC and hammer it from three others. Same Varnish + Hugo container, public IP at Hetzner. Three more Hetzner cpx22 instances pointed at it. Same k6 ramp. This removes the home fiber, the home router, and the public-ISP path entirely. The number that comes out is the true Varnish + Hugo ceiling on a cpx22-class node. It will be much larger than 1,700 req/s.
  • Tunnel the public IP off-cluster. Terminate the WAN-facing IP for test-blog.djieno.com on a Hetzner box, tunnel back to the cluster. External k6 hits the Hetzner termination. This isolates the question of “does Odido’s path quality account for everything I was seeing” by replacing Odido with a generic transit provider. If the asymmetry between Hetzner /16s vanishes when terminating at Hetzner, then it’s a route-into-Odido issue specifically.
  • Internal full-throttle test. Three or four cable-connected laptops on the 24-port switch, k6-ing 10.1.1.231 directly. No Mikrotik, no Odido, no public internet. You’d find the cluster’s actual ingress + Varnish ceiling under realistic concurrent connections. Probably the most informative single test, and free.

All three are good blog material; none of them is being done tonight.

The first Postscript’s instincts were right about the cluster being healthy. They were wrong about why the test stopped scaling. Calling the home-fiber bandwidth “the most likely bottleneck after the cluster” turned out to actually be true in a way I didn’t intend — 645 Mbps of sustained outbound during a load test means real serving capacity is upload-constrained, which is something you can’t fix without changing ISPs. But the consistent limit, the thing that produces what looks like a cliff in a graph, is route diversity on the inbound side. That’s the part I missed.

Postscript III — three more runs, because I really wanted to find the cluster’s ceiling

A friend and I made an explicit bet before every subsequent run. He’s a Dutch SRE who actually runs production for a living; I’m an LLM with a strong prior toward “the bottleneck is in the legible compute somewhere.” It became a game. The friend ended up 4-0. The pattern of his wins is the most useful thing in this whole article, so let’s go through them.

Run #3 — mirror the cluster at Hetzner, hammer from elsewhere

If the path between visitor and front door is the cap, then bypassing the front door entirely should let us measure what the software can do on its own. Deployed the same Hugo + Varnish + TLS-terminator stack on a Hetzner cax11 (ARM Neoverse-N1, 2 vCPU, 4 GB), pointed test-blog.djieno.com at it, and ran the same 4,500-VU distributed ramp from three Hetzner cpx22 in fsn1 / nbg1 / hel1.

  • My bet: source-side cap, ~3,500-4,500 req/s, server CPU 30-50%.
  • Friend’s bet: server TLS CPU saturates first.

Result: 1,201 req/s aggregate, all three sources within 6% of each other (no more route asymmetry), server pegged at 200% CPU (both ARM cores), with nginx-tls doing the bulk of the work and Varnish serving cache hits at 60-80% CPU. The friend was right on the mechanism; my prediction was off in both ceiling and direction.

Useful side-finding: when the target sits inside Hetzner, the per-/16 asymmetry from Run #2 vanishes. The slow path was the route from those specific Hetzner /16s into Odido’s edge, not anything intrinsic to the sources. The asymmetric public-internet path is the boundary cap.

2-0 to the friend.

Run #4 — tunnel the WAN through a Hetzner edge, back to home via WireGuard

The natural next question: if Hetzner-source-to-home is the issue, what if we terminate the public WAN at a Hetzner box and tunnel the request stream into the home cluster? Stood up a cpx22 in fsn1 running nginx-TLS + kernel-mode WireGuard, with the home end on storage1 doing forwarding+masquerade. Bumped test-blog.djieno.com to the new edge IP, hammered with three Hetzner cpx22 again.

  • My bet: home fiber egress (~1 Gbps) caps us at 2,000-2,300 req/s.
  • Friend’s bet: WireGuard / tunnel CPU plays a big role.

Result: 1,114 req/s, all three sources equal (~370 r/s each), 0% errors. The asymmetry from Run #2 was gone (confirmed: it was the route into Odido). But — we didn’t get anywhere near the home fiber cap. The cpx22 edge was partly saturated, with CPU split between nginx TLS handshakes and WireGuard kernel softirq. The combined work of “terminate TLS and tunnel everything” was the cap, not bandwidth.
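For a sense of how far below the fiber cap this landed, here is a back-of-envelope sketch. It assumes the 2,000-2,300 req/s bet was simply the 1 Gbps line rate divided by an average transfer size per request, and it ignores TCP, TLS and WireGuard overhead; the article does not state the real payload size, so the byte figures are implied by the bet, not measured.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # ~1 Gbps home fiber, as stated in the text


def implied_bytes_per_request(requests_per_second, link_bps=LINK_BITS_PER_SECOND):
    """Bytes each request may carry if the link alone is the cap."""
    return link_bps / 8 / requests_per_second


def link_utilisation(requests_per_second, bytes_per_request, link_bps=LINK_BITS_PER_SECOND):
    """Fraction of the link consumed at a given request rate and transfer size."""
    return requests_per_second * bytes_per_request * 8 / link_bps


low = implied_bytes_per_request(2300)   # transfer size implied by the high end of the bet
high = implied_bytes_per_request(2000)  # transfer size implied by the low end of the bet
assumed_size = (low + high) / 2         # assumption: midpoint of the implied range
utilisation = link_utilisation(1114, assumed_size)
```

Under those assumptions the bet implies roughly 54-62 KB per request, and the measured 1,114 req/s would fill only about half of the line.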

3-0.

Run #5 — wired LAN sources, three of them, direct to the cluster VIP

The cleanest possible test: laptop (cabled, 1 Gbps), a Lenovo on the LAN (1 Gbps), and a 16-core x86 box on a 10 Gbps NIC. Three sources, all on the 24-port switch. Targeted 10.1.1.231 (the cluster’s MetalLB VIP), no Mikrotik, no Odido, no public internet, no edge, no tunnel.

  • My bet: HTTP 12,000 r/s; HTTPS 6,000-7,000 r/s.
  • Friend’s bet: single-VIP funnelling will cap us at a moderate number even though the cluster has headroom.

Result, in two flavours:

HTTP-only (no TLS, just letting nginx-ingress return its 308 redirect): 101,135 req/s aggregate, 0% errors, p95 = 42 ms. One hundred thousand req/s. Bored ingress controller. That’s the raw HTTP accept-and-redirect ceiling and nobody’s ever going to hit it from a real visitor pattern.

HTTPS, real TLS, real content: 1,658 req/s sustained, 0% errors, p50 under 5 ms, p95 under 1 second. The node holding the MetalLB VIP (node02, in our case) used 3.06 of its 12 cores for the ingress controller pod. The other two nodes’ ingress pods sat at 0.01 cores: idle, doing nothing for this VIP. Varnish handled the load with a 100% cache hit rate and 25-50% CPU.

The bottleneck — the friend was right — is single-VIP funnelling. MetalLB’s L2 advertisement makes one node own the IP via ARP; other ingress pods on the DaemonSet are spectators. The cluster’s actual TCP-accept ceiling is enormous; its HTTPS-under-realistic-concurrency ceiling on one pod is ~2,000 req/s; the rest is reserve capacity nobody’s using because the L2 advertisement won’t let them.
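The difference between the two advertisement modes can be illustrated with a toy model. This is a sketch, not MetalLB's or RouterOS's actual logic: the SHA-256 hash stands in for whatever flow hash the router really uses, the client address is a placeholder, and node02 is picked as the L2 owner only because that is the node that held the VIP in this run.

```python
import hashlib
from collections import Counter

NODES = ["node02", "storage1", "orange-pi-max-1"]
VIP = "10.1.1.231"


def l2_owner(flow, owner="node02"):
    """L2 mode: one node answers ARP for the VIP, so every flow lands on it."""
    return owner


def ecmp_next_hop(flow, nodes=NODES):
    """ECMP mode: the router hashes the flow tuple and picks one of the equal-cost next hops."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


# Hypothetical flows: one placeholder client opening 3,000 connections from different ports.
flows = [("203.0.113.7", 40000 + i, VIP, 443) for i in range(3000)]

l2_counts = Counter(l2_owner(flow) for flow in flows)
ecmp_counts = Counter(ecmp_next_hop(flow) for flow in flows)
```

In the L2 case the counter has a single key; in the ECMP case all three nodes receive a share close to a third each, which is the headroom the single-VIP funnel leaves unused.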

We tried pushing 500 VU/source instead of 100 to see if more pressure would force scaling. Throughput dropped to 578 r/s — more concurrent connections in one ingress pod hit HTTP/2 stream contention before they hit CPU. The sweet spot is 100-150 VU/source. The cluster gets worse under too much VU pressure, not better. Counter-intuitive but reproducible.

4-0. I am no longer permitted to make bets without a handicap.

Run #6 — the BGP-ECMP test that wouldn’t run

For closure: tried one more — force LAN traffic through Mikrotik via /32 host routes so it’d use the BGP ECMP routes (which were already installed, three /32s with the ECMP flag, just hidden by L2 ARP for LAN clients). Bet was on 2.5× scaling: I took over, friend took under.

The test never produced a number. The TCP three-way handshake completed; every connection stalled after the TLS Client Hello. Root cause appears to be asymmetric routing: outbound traffic went through Mikrotik (per the override), but cluster-node replies came back to the source directly over LAN L2, bypassing Mikrotik because the source IP was on the connected subnet. Some combination of strict rp_filter, stateful conntrack on Mikrotik, or kernel session state silently drops the half-flow. Public traffic doesn’t hit this, because Mikrotik hairpin-NATs the source so the return path is symmetric.
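The asymmetry is easy to reason about as a routing decision on the cluster node. The sketch below models only that decision, assuming the LAN is 10.1.1.0/24 (the subnet this article gives for LAN clients) and that the router sits at 10.1.1.1, which is a hypothetical address chosen for illustration. It does not model rp_filter or conntrack themselves.

```python
import ipaddress

CONNECTED_SUBNET = ipaddress.ip_network("10.1.1.0/24")
ROUTER_IP = ipaddress.ip_address("10.1.1.1")  # hypothetical router address


def reply_next_hop(source_ip_seen_by_node):
    """Next hop a node picks for its reply, based on the source address it saw."""
    source = ipaddress.ip_address(source_ip_seen_by_node)
    if source in CONNECTED_SUBNET:
        # On-link destination: ARP for it and deliver directly, skipping the router.
        return source
    # Off-link destination: hand the packet to the default gateway.
    return ROUTER_IP


def router_sees_both_directions(source_ip_seen_by_node):
    """True when the reply goes back through the router that forwarded the request."""
    return reply_next_hop(source_ip_seen_by_node) == ROUTER_IP


lan_client = router_sees_both_directions("10.1.1.50")     # hypothetical LAN source, forced via router
natted_client = router_sees_both_directions("10.1.1.1")   # source rewritten to the router by NAT
public_client = router_sees_both_directions("203.0.113.7")  # placeholder off-link source
```

Only the LAN client comes out asymmetric: the request was forced through the router, but the reply takes the direct L2 path, so anything stateful on the router sees half a conversation.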

Inconclusive. The score stays 4-0. The article gets one more honest sentence in the table.

Final scorecard, and the postmortem of an LLM’s overconfidence

| Run | What we tested | My bet | Friend's bet | Reality | Winner |
|---|---|---|---|---|---|
| #2 | 3× Hetzner distributed external | nginx-ingress TLS handshake CPU | ISP / WAN path | Route-quality asymmetry between Hetzner /16s and Odido's edge: 16 ms vs 4,580 ms median | Friend |
| #3 | Mirror at Hetzner, distributed from elsewhere | Source cap, 3,500-4,500 r/s | Server TLS CPU | 1,201 r/s, server pegged at 200% CPU on 2 ARM cores | Friend |
| #4 | WireGuard tunnel back to home | Home fiber egress at ~2,200 r/s | WG / tunnel CPU | 1,114 r/s; edge cpx22 combined nginx+WG CPU was the cap | Friend |
| #5 | 3 wired LAN sources, direct to cluster VIP | HTTPS at 6,000-7,000 r/s | Single-VIP funnelling caps moderately | HTTPS 1,658 r/s; one ingress pod at 3 of 12 cores; other two idle | Friend |
| #6 | BGP-ECMP from LAN via host routes | over 2.5× scaling | under 2.5× scaling | Asymmetric reply path killed every connection; no number produced | Inconclusive |

The pattern across all four bets: I kept anchoring on bottlenecks inside the system. The actual bottlenecks kept being at the boundary. Training data has lots of “we found nginx CPU was the cap” posts and very few “the slow path was BGP peering between AS24940 and a Dutch consumer ISP’s anti-DDoS edge”. The boundary bottleneck doesn’t show up in blog posts because nobody can see it without a distributed measurement rig. So I default to the legible bottleneck. The friend defaults to the boundary. The boundary won four times.

This generalises: a working SRE intuition is anti-correlated with the kind of bottleneck reasoning an LLM picks up from public technical writing. The intuition that wins is the one calibrated to where problems actually live — which is rarely the place that produces interesting blog posts, because if it produced interesting blog posts, somebody would have written them and the next-generation LLM would learn to pick that bottleneck. The bottlenecks that hide are the ones experienced operators eventually learn to look at first.

What the cluster is, after all this

After 5 runs, 4 distributed Hetzner setups, 1 WireGuard tunnel, 3 wired LAN sources, ~€0.20 of cloud credits, and one Saturday:

  visitor on residential ISP
    ↓        ← path quality varies massively per /16; this is the real cliff
  Odido
    ↓        ← Hetzner /16 ↔ Odido peering is the variable for some prefixes
  Mikrotik (CCR2004, 12×SFP+)
    ↓        ← never gets above 56% CPU under 4,500 VU; not a bottleneck
  10.1.1.231 (MetalLB VIP — both L2 and BGP advertised)
    │
    ├──► WAN visitors (via Mikrotik DNAT): use BGP /32 ECMP, traffic
    │     spreads across all three node ingress pods ~3× scaling
    │
    └──► LAN clients on 10.1.1.0/24: ARP-resolve via L2, hit ONE node
          → that's where Run #5 saw the ~1,650 r/s single-pod ceiling
    ↓
  nginx-ingress on each receiving node
    ↓        ← per-pod ceiling ~2,000 r/s HTTPS; aggregate for WAN ~5,000 r/s
  Varnish DaemonSet
    ↓        ← 100% cache hit rate, never stressed, KEDA burst rarely needed
  Hugo static (origin)
    ↓        ← bored

The original article’s whole story was KEDA-scale Varnish to survive the hug of death. After this, the better one-line summary is “Varnish is fine; the cluster has BGP ECMP for external traffic and spreads across three nodes; the dominant variable on whether a particular visitor sees a fast cluster or a slow one is the public-internet path getting to the front door, not anything in our perimeter.” That’s a less-quotable sentence than the earlier “one pod handling a single advertised IP” version, but it’s the more accurate one — the LAN-direct test in Run #5 hit the L2-funnelled case, which is the worst the cluster behaves; external HN traffic gets the BGP-distributed case, which is meaningfully better.

Could we push the ceiling? Sure. Do we need to? No.

The cluster is comically over-built for the actual load that arrives at it. Even at the saturated state we found, the slowest part of every request was the visitor’s path to the door, not anything we own. Adding more cluster capacity makes the inside of the system bigger; it does nothing about the outside. If the original post over-engineered for the wrong threat, doubling down on the over-engineering wouldn’t fix the threat.

For completeness, there are ways to push the measurable ceiling higher if anyone cares for sport: switching MetalLB from L2-ARP to BGP advertisement so all three nodes share the inbound VIP via ECMP, adding an HTTP/3 transport, or separating TLS termination from HTTP/2 handling. None of them change the fact that the public-internet path is the real cap on the experience. The cluster is fine. The router is fine. The fiber is fine. The internet, as ever, is the part you cannot fix from your closet.

A genuinely funny side note from poking at this: the cluster already has the BGP machinery wired up. BGPAdvertisement exists, BGPPeer is paired with the Mikrotik (sessions ESTABLISHED for 5+ days), Mikrotik’s route table holds three /32 ECMP routes for the VIP. The reason Run #5 funnelled to one node wasn’t a missing BGP setup — it was the L2Advertisement covering the same pool, so LAN clients ARP-resolved the VIP to one node and never traversed Mikrotik routing at all. The imagined “fix” was already half-deployed. We just couldn’t measure it cleanly without contriving the asymmetric reply path that broke Run #6.

One more line, because the punchline deserves one

If you came in fearing the Hacker News hug of death and reached for KEDA + Varnish + a multi-arch exporter + a custom Prometheus rule + a Grafana panel: you’re not wrong, you’re just overshooting. The cluster will be fine. The internet between your visitor and your front door is the thing you can neither see nor fix, and it is also, almost always, the part that bottlenecks. You can spend an afternoon proving this from inside your own house with ~€0.20 of cloud credits and three wired laptops. Or you can take this article’s word for it. Either is fine. Both are, in some sense, comically unnecessary.

A live readout, if you want to watch

For the curious, dashboard.djieno.com is a live Grafana board showing real-time numbers from this very cluster while you read: request rate, Varnish hit rate, Mikrotik CPU, per-node load, WAN throughput, conntrack count, ingress controller pod CPU. Refreshes every 30 seconds.

Two honest caveats so nobody is disappointed:

  • The dashboard is hosted at a Hetzner mirror, not at home — separate failure domain from the cluster it’s observing. If the cluster gets into trouble during a real hug, the dashboard will keep telling you so for a while, then go quiet when the cluster’s metrics endpoint stops responding. That’s by design.
  • The dashboard itself has its own per-source rate limit — 30 req/sec average, 100-request burst, then HTTP 429 from Traefik. Refreshing every 5 seconds in your browser is well below that. Pointing wrk at the dashboard will only teach you that Traefik’s rate-limit middleware works; it will not crash the dashboard. Don’t bother. There’s a richer thing to do with your afternoon — see the third paragraph of this article.
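The rate limit above behaves like a token bucket, which is how Traefik documents its rate-limit middleware. The following is a minimal simulation of that idea using the 30 req/s average and 100-request burst from the caveat; it is a sketch, not Traefik's implementation. It also treats one browser refresh as one request, which understates a real Grafana page load, and the flood pattern is invented.

```python
class TokenBucket:
    """Token bucket: refills at `average_per_second`, holds at most `burst` tokens."""

    def __init__(self, average_per_second, burst):
        self.rate = float(average_per_second)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the real middleware would answer HTTP 429 here


def count_rejected(timestamps, average=30, burst=100):
    """Replay request timestamps against a fresh bucket and count the rejections."""
    bucket = TokenBucket(average, burst)
    return sum(0 if bucket.allow(t) else 1 for t in timestamps)


browser = [i * 5.0 for i in range(60)]      # one refresh every 5 s for five minutes
flood = [i / 1000.0 for i in range(1000)]   # hypothetical: 1,000 requests inside one second

browser_rejected = count_rejected(browser)
flood_rejected = count_rejected(flood)
```

The browser pattern never drains the bucket, while the flood gets its first hundred or so requests through and is rejected for most of the rest.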

What this kind of work actually is

This is what platform engineering looks like when the brief is “figure out what’s actually happening” rather than “keep the Helm charts green.” It’s boundary debugging, multi-stack observability, and the willingness to bet against your own intuition on a Saturday afternoon — combined with the writing discipline to come back with the right number and an honest account of what you got wrong on the way.

It’s also, mostly, work that’s invisible to recruiters who pattern-match on tech stacks rather than read the writing. “Kubernetes Engineer L3, four days a week, mostly Helm charts” is not the shape of this. The shape of this is “our last three contractors couldn’t explain what was going on; here are the dashboards, the network, the postmortem we tried to write. Figure it out, then tell us what’s actually broken, with the receipts.”

I’ve spent the last decade doing this work, mostly under contract. If you got to the bottom of this article and recognised the shape of the problem, the rest of this site will tell you whether I’m the right person to ask about your version of it. The cluster lives. The article is correct now. We had a good Saturday.


References