Kaizen! Let it crash (Friends) - The Changelog Recap
Podcast: The Changelog
Published: 2026-01-17
Duration: 1 hr 41 min
Guests: Gerhard Lazu
Summary
The episode covers a run of out-of-memory crashes and unusual download patterns that threatened server stability. Gerhard Lazu and the hosts discuss fixes centered on Varnish caching and the 'Let It Crash' philosophy from Erlang.
What Happened
Gerhard Lazu returns to discuss persistent out-of-memory errors. Between October and December, the Pipedream system crashed 43 times after running out of memory, largely because of how Varnish handled its cache of large files. To address this, the team explored caching MP3 files on disk instead of in memory.
The team noticed more than 10,000 distinct IPs repeatedly downloading the same file, a pattern that could indicate a DDoS attack. They attributed the traffic partly to the rise of AI and machine learning, which has produced unexpected spikes in data requests. They also considered using the throttle VMOD in Varnish to limit downloads per IP and keep bandwidth costs in check.
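The idea of limiting downloads per IP can be sketched in a few lines. This is a minimal illustration in Python, not the throttle VMOD's actual API or the team's configuration; the class name `PerIpLimiter` and the `limit` and `window` parameters are assumptions made up for this example.

```python
import time
from collections import defaultdict, deque


class PerIpLimiter:
    """Hypothetical sliding-window limiter: at most `limit` downloads
    per client IP within any `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behavior can be tested
        self.hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # a server would typically answer with HTTP 429
        recent.append(now)
        return True
```

A caller would check `allow(client_ip)` before serving each MP3 and reject the request when it returns `False`. Note that with over 10,000 distinct IPs each fetching the same file, a per-IP limit only helps if individual IPs repeat often enough to hit it.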
A misconfiguration on Fly.io, where the concurrency limit counted connections instead of requests, led to 2,700 long-running connections and made the server load worse. Switching to HTTP/1 resolved timeout issues in two regions, and the team now runs hourly checks for hanging connections.
On the networking side, Gerhard described improvements to his home network, which now achieves sub-three-millisecond latency. His long-term goals include upgrading to a 5-gigabit connection and possibly moving somewhere with better internet infrastructure.
The episode also highlighted the most downloaded episode of The Changelog, titled 'OAuth, it's complicated,' which surpassed 1 million downloads. This prompted the team to consider using Cloudflare R2 to serve content more efficiently and mitigate further issues with popular episodes.
The 'Let It Crash' philosophy from Erlang was emphasized as a way to build resilient systems. Rather than trying to handle every error in place, a failing component is allowed to crash in a controlled manner and is restarted in a known-good state, which keeps the overall system stable. This approach was contrasted with the challenges the team faced with Varnish's memory management.
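The restart-on-failure idea can be sketched as follows. This is a simplified Python illustration, not Erlang/OTP's supervisor implementation and not code from the episode; the `supervise` function and its `max_restarts` parameter are hypothetical names chosen for this example.

```python
def supervise(worker, max_restarts):
    """Run worker(); if it raises, start it again from scratch.

    Returns (result, restarts_used). After max_restarts failed
    restarts, the last error propagates to the caller, much as a
    supervisor that gives up escalates the failure upward.
    """
    restarts = 0
    while True:
        try:
            return worker(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and let the failure propagate
            restarts += 1


# Example worker that fails twice before succeeding.
attempts = {"count": 0}

def flaky_worker():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated crash")
    return "ok"


result, restarts_used = supervise(flaky_worker, max_restarts=5)
print(result, restarts_used)  # prints: ok 2
```

The worker contains no recovery logic of its own; all of it lives in the supervising loop, which is the separation the philosophy argues for.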
Gerhard and the hosts also discussed tools like Namespace and Depot.dev, which speed up CI/CD pipelines and shorten build times. They praised Namespace for caching dependencies and Docker layers, which makes it a faster alternative to standard GitHub Actions runs.
Key Insights
- Between October and December, the Pipedream system experienced 43 out-of-memory crashes, largely due to how Varnish handled its cache of large files, prompting the team to explore caching MP3 files on disk instead of in memory.
- More than 10,000 distinct IPs repeatedly downloaded the same file, a pattern that could indicate a DDoS attack and that the team attributed partly to the rise of AI and machine learning.
- A misconfiguration on Fly.io, where the concurrency limit counted connections instead of requests, resulted in 2,700 long-running connections and made the server load worse; switching to HTTP/1 resolved timeout issues in two regions.
- The 'Let It Crash' philosophy from Erlang is used to build resilient systems by allowing components to fail in a controlled manner, contrasting with the challenges faced in Varnish's memory management.