Baremetal K8S cluster is FUN!

Yuwei Sung
3 min read · Jan 30, 2021

Long story short, I left my 8-year job and moved to a startup. With that move, I lost the company-sponsored GCP account I had been using as my lab, so I picked up my home lab equipment and built my first baremetal K8S cluster at home. This is what I have learned so far.

Before the home lab project, I used Git, GitHub, Ansible, Terraform, VS Code, and kubeadm to quickly bring up a cluster and automate an environment for experimenting with microservices. The learning path was bumpy, but I think I picked the right tools to make it less frustrating. I wanted to stick with the same toolset while building the home lab, so I decided to create a branch from my GCP-friendly kubeadm scripts for the baremetal scripts. After creating the new git branch, I stopped and rethought the process. First, I needed to rewrite the Terraform scripts to kickstart the OS build with DHCP, DNS, PXE boot, TFTP, and Ubuntu kickstart. Then, I needed to revise the Ansible scripts to handle storage and network bonding. That should have been all I needed to move my project from the cloud to the baremetal world. As you may guess, if it were that easy, you wouldn't be reading this article.
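For context, the kubeadm side of the bring-up is small. A minimal sketch of the config (the endpoint address, Kubernetes version, and pod CIDR here are placeholders, not my actual values):

```
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.20.2                   # whatever release you target
controlPlaneEndpoint: "192.168.1.20:6443"    # placeholder master address
networking:
  podSubnet: "10.244.0.0/16"                 # placeholder; must match your CNI config
```

From there, "kubeadm init --config kubeadm.yaml" on the master and the printed "kubeadm join" command on each worker are all the kubeadm part takes.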

For Terraform-driven PXE boot, I haven't found a good fit for my home lab; I may need to write a module for it. I ended up doing the OS and networking setup manually. I only have one RAID1 NAS server, one Raspberry Pi as the DNS/NTP/DHCP server, one master with 32 GB of RAM, and three fanless micro-form-factor workers with 16 GB of RAM each. The manual setup took me two hours. I need to come back and revisit this when I get more "quiet" machines. I really miss VMware vSphere/vSAN/NSX-T.
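Since I ended up doing it manually, here is the direction an automated version could take. This is only a sketch, assuming dnsmasq on the Pi serves DHCP and TFTP for PXE; the interface name, address range, and TFTP root are placeholders for my network, not canonical values:

```
- hosts: pi
  become: true
  tasks:
    - name: Install dnsmasq to serve DHCP/DNS/TFTP
      apt:
        name: dnsmasq
        state: present

    - name: Point dnsmasq at the PXE boot image
      copy:
        dest: /etc/dnsmasq.d/pxe.conf
        content: |
          interface=eth0                              # placeholder NIC
          dhcp-range=192.168.1.100,192.168.1.150,12h  # placeholder range
          dhcp-boot=pxelinux.0
          enable-tftp
          tftp-root=/srv/tftp
      notify: restart dnsmasq

  handlers:
    - name: restart dnsmasq
      service:
        name: dnsmasq
        state: restarted
```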

For the Ansible script, this part was really easy and fast. Most of my time went into setting up storage device auto-detection. Great! I could get my home lab set up in an hour and start deploying pods, adding services, and exposing ingress.
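To give a flavor of the auto-detection: the idea is to read Ansible's built-in device facts and keep only the drives that are still empty. A rough sketch (the host group and the "empty_nvme_devices" variable are illustrative names, not from my actual playbook):

```
- hosts: workers
  become: true
  tasks:
    # ansible_devices is a built-in fact: a dict of device name -> details.
    # Keep only NVMe devices that have no partitions yet.
    - name: Collect empty NVMe devices
      set_fact:
        empty_nvme_devices: >-
          {{ ansible_devices | dict2items
             | selectattr('key', 'match', '^nvme')
             | selectattr('value.partitions', 'equalto', {})
             | map(attribute='key') | list }}

    - name: Show what was found
      debug:
        var: empty_nvme_devices
```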

Yes, bringing up a "basic" K8S cluster is pretty easy. My script ran without "error," I could run "k get nodes," and the nodes showed "Ready." I was ready to deploy an Nginx pod and see if I could curl it. Wait! I don't have shared storage across my worker nodes. How can the kube-scheduler move the pod and still keep the volume mounted? I need CSI. There are many options (NFS, iSCSI), but they all assume shared storage that every worker node can access. I have some M.2 SSD drives on the worker nodes, and it would be great to make them shareable as a "fast" storage class. I found rook-ceph to be a good option for sharing the local drives. This is like a poor man's vSAN. I love it.

But I spent six hours getting the Rook/Ceph operator to mount my local drives correctly. Most of that time went into troubleshooting the Calico networking. It turned out the culprit was the multiple NICs on my worker nodes, which made Calico create routes on the wrong IPs. Along the way, I had to run the Rook teardown process multiple times.
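For reference, here is a trimmed-down sketch of a CephCluster manifest along the lines of what I landed on. The Ceph image tag and device filter are from my lab, so treat them as placeholders rather than recommended values:

```
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v15.2.8        # Ceph Octopus; pick a current tag
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3                        # one mon per node in a 3-worker lab
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: ^nvme             # only grab the M.2 drives
```

Rook's example manifests then layer a CephBlockPool and a StorageClass on top of this, which is where the "fast" class comes from.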
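The networking fix itself is small once identified: pin Calico's IP autodetection to the NIC that carries the cluster network instead of letting it pick the first interface it finds. An excerpt of the relevant calico-node DaemonSet setting (the interface regex matches my workers' NICs and is a placeholder):

```
spec:
  template:
    spec:
      containers:
        - name: calico-node
          env:
            # "eno1" is the NIC name on my workers; adjust to your hardware.
            - name: IP_AUTODETECTION_METHOD
              value: "interface=eno1"
```

The same change can be applied in place with "kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD=interface=eno1".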

Next is ingress. The official K8S docs list many options for exposing your services. NodePort is popular since you really don't need to do anything, and it fits my home lab environment (it is a private network, and the node port range is big enough). But NodePort is not "elegant" to me, and a physical load balancer is not an option. So I picked MetalLB. With MetalLB, I can assign a subset of my home lab IPs as an address pool for K8S load balancers. Moreover, the NGINX ingress controller automatically picks up an IP from the MetalLB pool via the layer 2 protocol.
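At the time of writing, the whole MetalLB layer 2 setup is one ConfigMap. The address range below is a spare slice of my home subnet, so it is a placeholder for whatever your LAN can spare:

```
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
      - name: default
        protocol: layer2
        addresses:
          - 192.168.1.200-192.168.1.220
```

Once that is applied, any Service of type LoadBalancer, including the one the NGINX ingress controller creates, gets the next free address from the pool.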

At this stage, I am confident that all the microservice deployments I developed for the GCP lab will work without any problem in my baremetal home lab.

In the next article, I will share more details of these lessons learned. Stay tuned.


Yuwei Sung

A data nerd who went from data center field engineer to cloud database reliability engineer.