Which Kubernetes Distribution Should You Choose? Lessons From My Failure.
August 17, 2023 • Matthew Duong • Kubernetes;Self Hosting • 4 min read
Last weekend, my home Kubernetes cluster blew up. I was running a 6-node cluster with some heavy workloads for coderone (AI game tournament) and headbot (generative AI avatars), and suddenly, things went south. Here's my journey from the collapse to the rebuild, and what I learned in the process.
I've been using MicroK8s for the past three years, initially drawn to it as an optional add-on for Ubuntu installations. However, issues started to emerge when all of a sudden I noticed some of these symptoms:
Upon further investigation, I found that MicroK8s uses dqlite for storing cluster configuration. Though lightweight, dqlite isn't as mature as etcd, the data store that most production-grade / high availability distributions use.
Dqlite stands for "Distributed SQLite," and it aims to extend the well-known SQLite database engine to a clustered environment. SQLite itself is an embedded SQL database, highly praised for its reliability and efficiency in a wide range of applications, from mobile devices to web servers.
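To make "embedded" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not dqlite and not how MicroK8s talks to its data store; the table name and key are made up for illustration. The point is only that plain SQLite runs inside the calling process with no server to connect to, which is why something like dqlite has to add replication on top before it can serve a multi-node cluster.

```python
import sqlite3


def read_back(value: str) -> str:
    """Store one row in an in-process SQLite database and read it back."""
    # ":memory:" keeps the database inside this process; no server is involved.
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical key/value table, used only for this illustration.
        conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
        conn.execute("INSERT INTO kv (key, value) VALUES (?, ?)", ("demo/key", value))
        conn.commit()
        row = conn.execute("SELECT value FROM kv WHERE key = ?", ("demo/key",)).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(read_back("hello"))
```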
Simplicity: One of the most appealing aspects of dqlite is its simplicity. It's easier to install and get up and running than many other distributed databases.
Resource Efficiency: Dqlite is lightweight and has lower CPU and memory requirements compared to etcd, making it a popular choice for smaller clusters or edge computing.
Maturity: Dqlite is not as mature as etcd. This can translate to less community support and fewer features geared toward high availability and data consistency.
Limited Use-Cases: Given its lighter weight, dqlite may not be the best option for more extensive, production-grade clusters requiring high availability and fault tolerance.
Etcd is a distributed key-value store used as Kubernetes’ backing store for all cluster data. Developed by CoreOS, it's now maintained by the Cloud Native Computing Foundation (CNCF).
High Availability: Etcd is built with high availability in mind. It can tolerate machine failures, network partitions, and will elect a new leader automatically if the current one fails.
ACID Compliant: Etcd offers ACID properties, ensuring data consistency across the cluster. This is crucial for applications that require transactions to be processed reliably.
Resource Intensity: Etcd is sensitive to disk write latency, so clusters need fast storage (an SSD is typically recommended). There are explicit warnings against running etcd on the SD card of a Raspberry Pi.
Complexity: The setup can be complex and might require a better understanding of its operational aspects, including cluster configuration, data backup, and regular maintenance.
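The high-availability point above comes down to simple arithmetic. Etcd uses the Raft consensus algorithm, where a write is committed only once a majority (a quorum) of members has acknowledged it. The sketch below is not etcd code; the function names are my own, and it just works out how many member failures a cluster of a given size can survive.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Three members tolerate one failure and so do four, which is why odd member counts such as three or five are the usual choice for the control plane.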
After pulling the plug on MicroK8s, I explored other distributions: K3s and RKE2, to be precise.
K3s seemed like a good fit at first, but my efforts to set it up in high-availability mode were not successful. By default, K3s uses SQLite for single-node setups and switches to embedded etcd for high-availability setups. I was easily able to get a single node running, but I was unable to set up the high-availability configuration (for three nodes). So I moved on to RKE2.
Out of the box RKE2 promised:
FIPS 140-2 Compliance: Designed with a focus on security, RKE2 meets the requirements for US government projects.
Air-Gap Support: It can run in environments without direct internet access, offering flexibility for secure or isolated deployments.
Windows Node Support: Unlike many distributions that focus solely on Linux nodes, RKE2 extends its support to Windows nodes.
Best of all, the setup worked the first time around. It took me about 5 minutes to set up and join 6 nodes in the cluster: 3 master nodes and 3 worker nodes.
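For a rough idea of what joining those six nodes involves, here is a sketch that renders a minimal RKE2 config file per node. This is not a record of my exact steps. It assumes the documented `server` and `token` keys in `/etc/rancher/rke2/config.yaml` and the default supervisor port 9345; the hostnames, the token, and the helper names are hypothetical, so check the RKE2 docs before relying on it.

```python
from typing import Dict, Optional, Sequence

RKE2_SUPERVISOR_PORT = 9345  # assumed default port that joining nodes register against


def render_config(token: str, join_host: Optional[str] = None) -> str:
    """Render a minimal config.yaml body for one node.

    The first server has no `server:` line; every other node points at it.
    """
    lines = []
    if join_host is not None:
        lines.append(f"server: https://{join_host}:{RKE2_SUPERVISOR_PORT}")
    lines.append(f"token: {token}")
    return "\n".join(lines) + "\n"


def plan_cluster(servers: Sequence[str], agents: Sequence[str], token: str) -> Dict[str, str]:
    """Map each hostname to the config text it would need."""
    if not servers:
        raise ValueError("at least one server node is required")
    first = servers[0]
    plan = {first: render_config(token)}
    for host in list(servers[1:]) + list(agents):
        plan[host] = render_config(token, join_host=first)
    return plan


if __name__ == "__main__":
    # Hypothetical hostnames and token, for illustration only.
    cluster = plan_cluster(["cp1", "cp2", "cp3"], ["w1", "w2", "w3"], "my-shared-secret")
    for host, config in cluster.items():
        print(f"# {host}\n{config}")
```

The joining servers and the workers end up with the same two lines here; as far as I understand it, what separates them is which service you start on the node (`rke2-server` or `rke2-agent`).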
The main takeaway from this experience is that each flavour of Kubernetes serves a different purpose. If you only intend to run a single node and want to experiment, then I would highly recommend MicroK8s.
Out of the box it is highly configurable with many free addons that are usually quite finicky to install:
However, the shortcomings become apparent almost immediately when you want to scale up and run in high availability. Your network quickly becomes choked and your cluster may just suddenly break.
My take on K3s: I've had quite some success with K3s where hardware is limited. I made a tutorial on running your own blog from a Raspberry Pi. It promises to be the solution for IoT and is supposed to be able to run on beefier hardware in high-availability mode. However, I could not get that configuration working out of the box.
Up to this point, RKE2 has been running smoothly for me, easily handling the workloads that brought down my MicroK8s setup. The installation process was even simpler than with MicroK8s. However, it does require more initial configuration for features that come pre-configured in MicroK8s.