Homelab

Posted by Danny on 2024-04-21

My Homelab for 2024

This is more of a living document than a blog post and I'll keep editing and adding to it as things change.

I thought about doing some version control, but hey, that's what git is for, so here's the link.

The Physical

Nodes

Optiplexes

Components    Spec
CPU           1x i7-9700 & 3x i7-8700
Memory        93 GiB
Storage       4x 256 GB SSD
NIC (1G)      4x Gigabit Intel NIC
NIC (10G)     4x 10 Gigabit ConnectX-3
Proxmox VE    pve/8.1.3/b46aac3b42da5d15

Zeus

Zeus (Why a plant in the case? It wanted root access)

Components    Spec
CPU           1x i7-6700K
Memory        32 GiB
Storage       4x Seagate BarraCuda 4 TB drives (RAIDZ1)
Storage       2x 2 TB random drives (Pool 2 - RAIDZ1)
GPU           NVIDIA RTX 3050 (patched drivers for transcoding)
NIC (1G)      1x Gigabit Intel NIC
NIC (10G)     1x 10 Gigabit ConnectX-3
Proxmox VE    pve/8.1.3/b46aac3b42da5d15

I run the four Dell OptiPlexes as Proxmox nodes only; I bought them used for a really good deal.

The original lab was mainly just Zeus, but now he's the granddaddy of all my servers. He's been through more disasters than a Greek tragedy. Still, he acts as an extra node in the Proxmox cluster, serves as my NAS, and runs one of the K8s masters (mainly for graphics-related processes). I used to run TrueNAS Scale for the NAS but moved to a plain Debian instance running ZFS, as I didn't need most of the things TrueNAS provided.

Proxmox Web UI

Thus it's a 5-node cluster, with local LVM-thin storage on each host. I'm experimenting with Ceph and HA but don't have enough OSDs yet to be production-ready. Availability is mainly handled at L7, but there are a few VMs in Proxmox HA.
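
For poking at the cluster outside the web UI, Proxmox exposes a REST API; /api2/json/cluster/status is a standard endpoint. A minimal sketch, assuming a hypothetical hostname and API token:

    import requests

    # Hypothetical host and API token - substitute your own.
    PVE_HOST = "https://pve1.lab.local:8006"
    PVE_TOKEN = "root@pam!monitoring=00000000-0000-0000-0000-000000000000"

    headers = {"Authorization": f"PVEAPIToken={PVE_TOKEN}"}
    # cluster/status returns one entry per node plus one for the cluster itself.
    resp = requests.get(f"{PVE_HOST}/api2/json/cluster/status",
                        headers=headers, verify=False, timeout=5)
    for item in resp.json()["data"]:
        if item["type"] == "node":
            print(item["name"], "online" if item.get("online") else "offline")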

I run Debian wherever possible and also use it as my daily driver for my PC and laptop.

I use two separate bridges for networking: one with the gigabit cards for all VMs, LXCs (these are being moved to K8s), the web UI, and Corosync. The 10G cards are on a separate bridge, no LACP; this is reserved for the Kubernetes cluster and Ceph/storage bandwidth. I set up spanning tree on the MikroTiks so I can lose one cable and still reach every node.
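
To sanity-check which NICs ended up on which bridge, the kernel already exposes bridge membership under sysfs. A small sketch that just reads it (bridge names like vmbr0/vmbr1 are assumptions):

    from pathlib import Path

    # Every Linux bridge (e.g. vmbr0 for the gigabit side, vmbr1 for the 10G side)
    # lists its member ports under /sys/class/net/<bridge>/brif/.
    for brif in sorted(Path("/sys/class/net").glob("*/brif")):
        bridge = brif.parent.name
        ports = sorted(p.name for p in brif.iterdir())
        print(f"{bridge}: {', '.join(ports) or '(no ports)'}")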

Networking

Homelab Network

Router/Firewall

I have a gigabit fiber uplink from my ISP going into my main firewall, a Protectli Vault FW4B running pfSense in HA. I run the secondary instance on Zeus, and even though pfsync is enabled, the WAN link currently only terminates at the Protectli. Thus, I don't route any traffic through Zeus at the moment, but if needed I could manually move the WAN link over.

By default everything is rejected both ways; I only open ports to the MetalLB IPs for my Istio Gateways, and I control ingress/egress through them for all hosted apps. I also run Snort and have done a lot of finagling to get it working nicely, though it's never perfect. There are also three VPNs running: a WireGuard instance for personal use, an OpenVPN server set up because a specific type of network traffic only allows TCP streams (which WireGuard doesn't support), and lastly a WireGuard instance for my work and only work traffic. My goal in the future is to get another fully dedicated firewall box for proper HA.
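
A quick way to sanity-check the "reject everything" posture is a dumb TCP probe against the exposed addresses. A sketch with made-up MetalLB IPs and the ports the Istio Gateways are meant to expose:

    import socket

    # Hypothetical MetalLB IPs and the ports that should be reachable.
    TARGETS = {"192.0.2.10": [80, 443], "192.0.2.11": [443]}

    for ip, ports in TARGETS.items():
        for port in ports:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.settimeout(2)
                state = "open" if s.connect_ex((ip, port)) == 0 else "closed/filtered"
                print(f"{ip}:{port} -> {state}")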

SW1 Core switch: MikroTik CRS309-1G-8S+IN

Nothing fancy, just running SwitchOS.

SW2: W.I.P. about to be CRS326-24G-2S+RM

coming very soon...

AP

Only one Access Point with three SSIDs: LAN, GUESTS, and IOT.

VLANS

Base services

DNS...

I use two PiHoles for DNS filtering/blacklisting, plus two PowerDNS Recursors and two PowerDNS Authoritative Servers connected to my PostgreSQL DB in the K8s cluster.
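
To check that each layer of the chain answers on its own, a quick dnspython sketch works; the IPs and the internal record below are placeholders:

    import dns.resolver

    # Placeholder addresses for each layer of the chain.
    LAYERS = {
        "pihole": "10.0.10.2",
        "recursor": "10.0.10.4",
        "authoritative": "10.0.10.6",
    }

    for name, ip in LAYERS.items():
        r = dns.resolver.Resolver(configure=False)
        r.nameservers = [ip]
        try:
            answer = r.resolve("nas.home.lab", "A")  # placeholder internal record
            print(name, [rr.to_text() for rr in answer])
        except Exception as exc:
            print(name, "lookup failed:", exc)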

Kea DHCP on my pfSense router pushes the two PiHoles to my clients; servers use the recursors directly.

The PowerDNS solution is pretty overkill for my needs, but it was a good learning experience, and if one of my VMs is ever down, at least my DNS stays running :). It does let me mess around a lot with routing and gather a good bit of metrics! I've also been using PowerDNS Admin as a nice way to manage the records.
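
PowerDNS Admin drives the Authoritative server's built-in HTTP API, so the same records can also be scripted directly. A sketch assuming the default port 8081 and a placeholder host and API key:

    import requests

    # Placeholder host and key; the API is enabled with api=yes / api-key=... in pdns.conf.
    API = "http://10.0.10.6:8081/api/v1/servers/localhost"
    HEADERS = {"X-API-Key": "changeme"}

    # List every zone the authoritative server knows about.
    for zone in requests.get(f"{API}/zones", headers=HEADERS, timeout=5).json():
        print(zone["name"], zone["kind"], zone["serial"])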

Git + Ops

Forgejo, Jenkins, and Harbor! All running in the K8S cluster. I'm slowly moving my projects to being hosted on Forgejo and just mirrored to GitHub.

Kubernetes

Set up through some simple Ansible playbooks. I didn't do this the first two times I nuked the cluster, but reinstalling all the packages and configuring the mounts, users, etc. really became a pain.

So it's a 3-master (etcd on the masters), 2-worker cluster. Everything in the cluster is configured with FluxCD. I previously used Rancher for easier introspection but wasn't a fan, especially while learning; I've gone back to the CLI and occasionally Lens. Initially I used Rancher + Helm charts for deployments, but as complexity grew, I quickly learned why GitOps is nice.
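
For a quick look at the 3+2 layout without the dashboards, the official Kubernetes Python client works against the same kubeconfig kubectl uses. A minimal sketch:

    from kubernetes import client, config

    # Uses the local kubeconfig, same as kubectl.
    config.load_kube_config()
    v1 = client.CoreV1Api()

    for node in v1.list_node().items:
        # Node roles are encoded as node-role.kubernetes.io/<role> labels.
        roles = [k.rsplit("/", 1)[1]
                 for k in node.metadata.labels
                 if k.startswith("node-role.kubernetes.io/")] or ["worker"]
        print(node.metadata.name, ",".join(roles))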

This is still a W.I.P. I'll have a detailed write-up about everything running in the future, but here's a simple overview of what's hosted right now.