The background to this is that over the last few years I've had the pleasure of using VMware vCenter at work on a Cisco UCS. This has been used for our development and staging environments. On my home server, meanwhile, I have been successfully using LXC and now LXD.
I originally tried LXC a number of years back out of frustration with some VMware Workstation pains I was having on a fairly resource-limited server. Thinking I'd give containerisation a try, I discovered that for the workload I was dealing with, LXC gave me an almost 4x speedup: suddenly my IO and memory bottlenecks vanished.
Needless to say I'm fairly sold on containerisation for development and testing.
Now that I've been using it for some time, I've been looking for a way to take this environment away with me on my laptop when I travel. I've also wanted to be able to create a standalone 'distribution' I can give away to share the goodness.
Getting this all set up and running is fairly complex, and as LXD is still fairly new, there is not a lot of documentation around and none that I could find for making it work on the Mac.
For this article I'm going to focus on getting it running on the Mac using Funtoo as the Linux distribution for the LXD host; however, it is certainly possible to replicate this for other distros and hypervisors. For example, another good choice for the host might be Ubuntu or Alpine.
So, enough talk, now on with the details. Firstly, what am I trying to achieve?
- A mini datacenter I can use to quickly spin up 10 Linux images to do some platform development and testing on a typical MacBook
- Support for fast snapshotting and an efficient filesystem architecture
- A datacenter environment that has some of the features of a vCenter or a Public cloud
- A design that is API enabled and could easily be linked into a continuous integration environment.
- Something I can zip up to under 500Meg and throw onto my Google Drive to share.
To get there, the main ingredients are:
- LXD as our datacenter control system
- ZFS so we can do the fast snapshotting, and because it is just generally awesome
- Xhyve so we can use the native (free) hypervisor inside OSX
- Plan 9 filesystem for sharing with the host (there are plenty of other ways however I haven't played with this one before and it seems like an efficient way to do things)
At this point you can continue reading for the details, or you can grab the latest zip file from my Google Drive and then jump to Booting and Configuring the Datacenter below.
Now, let's talk about the choices in a little more detail:
LXD
In order to comfortably run around ten servers in my laptop datacenter I’m going to need to use containerisation (C-Level Hypervisor) rather than full virtualisation.
Containerisation means that rather than running a separate kernel for each vm, I can run a single kernel and share it and its memory across each of my containerised virtual machines.
LXD leverages the power of LXC containers by wrapping them under the control of a REST-API-enabled daemon. It also provides a great command line client for managing machines.
For more details, go check out the main website for LXD.
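To give a quick flavour of the two faces of the same daemon, here is a minimal sketch (assuming LXD is running and its socket is at the default /var/lib/lxd/unix.socket; the container name web1 is just an example):

lxc launch images:alpine/3.4/amd64 web1            # the command line client
curl --unix-socket /var/lib/lxd/unix.socket \
     http://lxd/1.0/containers                     # the same daemon answering over its REST API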
ZFS
A truly amazing filesystem, ZFS has massive scalability, built-in integrity, built-in compression, drive pooling and RAID, just to mention a few things.
Go read about it and come back convinced if you aren't already. For our purposes ZFS is already leveraged by LXD to provide near instantaneous snapshotting.
You could also use Btrfs if you prefer. This will get you to the same goal, however I'm biased towards the Sun / Solaris camp on this one.
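As a taste of what this buys us once everything below is wired up, snapshotting and rolling back a container becomes a one-liner (alpine1 is a placeholder name; we launch containers like this near the end of the walkthrough):

lxc snapshot alpine1 clean       # near instant on a ZFS backend
lxc restore alpine1 clean        # roll the container back to that point
lxc info alpine1                 # the Snapshots section shows what you have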
Xhyve with 9p / VirtFS
Xhyve is the OS X port of FreeBSD's bhyve hypervisor (roughly the BSD equivalent of qemu/KVM). The nice thing about this is that it is free and works natively on just about every Mac with a recent OS and CPU. It is also very small and can be easily bundled with what we are building (it is, for example, bundled as part of Docker for Mac).
Go check out Dunkelstern's excellent Braindump on the topic.
Having a look around, I found a number of forks of the original xhyve, and one in particular caught my attention, having already merged in support for the Plan 9 filesystem.
Here's the Plan
Now that we have talked a little about the technology, let's draw up a plan to pull all the pieces together.
- Update the build environment and kernel
- (since I will copy a working kernel and modules from here)
- Build ZFS
- Build a host OS environment for LXD
- kernel with virtio, plan9 and required CGroup namespace options set
- LXD
- network bridges
- squid + other useful packages
- Create the raw disk image for Xhyve
- Build Xhyve with Plan9
- Wire it all together and create a Zip file to distribute
1. Update the build environment and kernel
Since I already have Funtoo installed on my build machine, my first step is to update everything (especially in light of the recent Dirty COW kernel issues).
So, following my usual upgrade steps:
emerge --update --newuse --deep --with-bdeps=y @system -a
emerge --update --newuse --deep --with-bdeps=y @world -a
emerge @preserved-rebuild -a
emerge --depclean -a
According to the release notes for the latest ZFS package the newest kernel currently supported is the 4.8 series. I've been using 4.4.19 successfully for a while, so it looks like it is time to take the plunge and update to 4.8.5
The following command lets me update the kernel using the config for the old working kernel as a starting point
genkernel --zfs --menuconfig --kernel-config=/etc/kernels/kernel-config-x86_64-4.4.19 all
Make sure you have virtio, 9p filesystem, all the control groups and namespace options, ip masquerading and bridging turned on. (I've put my final kernel-config in the shared directory)
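A quick way to eyeball the new config before building is to grep for the relevant symbols. The names below are the ones I'd expect in a 4.8 series config, so treat this as a rough guide rather than an exhaustive list:

grep -E "VIRTIO_NET|NET_9P_VIRTIO|9P_FS|CGROUPS|NAMESPACES|CONFIG_BRIDGE=|IP_NF_NAT|MASQUERADE" \
    /usr/src/linux/.config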
2. Build ZFS
On my build environment I'm using 'zfs on root' (hence the --zfs in the genkernel command above), so my first priority is to get my build machine working and rebooted safely.
Now that the kernel has built correctly and is linked in via /usr/src/linux, I can emerge in the Solaris Porting Layer and the ZFS kernel mods as follows:
emerge -av spl zfs-kmod
Finally I need to rebuild my zfs package but before I start I need one more trick. In the /etc/make.conf file I set:
FEATURES="buildpkg"
This means that when zfs finishes building I end up with a nice binary Gentoo package sitting under /usr/portage/packages. This binary package comes in handy later when we build the host environment.
emerge -av zfs
After doing the usual messing about with grub.cfg I've now rebooted successfully and have a nice running 4.8.5 kernel.
The best test to know if it is also going to work smoothly is to run lxc-checkconfig and make sure that each item it outputs is a green 'enabled'. You can run this when you have rebooted into your brand new kernel, or you can point it at your kernel source and get it to check the config of a non running kernel.
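For example, to point it at the config of a non-running kernel (the script reads the CONFIG environment variable and falls back to /proc/config.gz):

CONFIG=/usr/src/linux/.config lxc-checkconfig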
3. Build a host environment for LXD
So far we have only been working on the build environment, and my assumption is that if you have got this far you have a working build machine running Funtoo, Gentoo or another Linux distribution with working ZFS on a recent kernel. To continue we also need LXD running in the build environment, as we are going to use an LXD container to build the host OS before converting it into a disk image that Xhyve can boot.
Under Funtoo or Gentoo this is as simple as:
emerge -av lxd
At the time of writing there is no Funtoo in the provided lxd image repo, however I notice that there is now a Gentoo image, so it should be fairly easy to start from there as an alternative. I've also been keeping my eye on Alpine because of its very low footprint, however my efforts getting lxd to install from the Alpine test repo have so far been unsuccessful.
To get my Funtoo bootstrapped, I started with an older stage3 so the resulting mini datacenter will hopefully run on a wider variety of Macs, including those dating back a few years.
When starting from a stage3 you need to untar it into a rootfs directory and tarball it up again with the appropriate metadata.yaml so that it can then be imported into lxd (see the section on Manually building an Image in Stéphane Graber's article)
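The repackaging boils down to something like the following sketch; the stage3 filename, creation date and description are placeholders, and Stéphane's article describes the metadata fields in full:

mkdir -p funtoo-image/rootfs
tar -xpf stage3-funtoo-latest.tar.xz -C funtoo-image/rootfs
cat > funtoo-image/metadata.yaml <<EOF
architecture: x86_64
creation_date: 1477958400
properties:
  description: Funtoo stage3
  os: funtoo
EOF
tar -czf myfuntoostage3.tar.gz -C funtoo-image metadata.yaml rootfs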
Once you have imported your image you can use it to 'init' a new container as follows:
lxc image import myfuntoostage3.tar.gz --alias funtoo
lxc init funtoo funtoo
Now we can do a few more handy tricks to link it through to your build host's existing portage and distfiles repositories as follows:
lxc config device add funtoo sharedmem disk source=/dev/shm path=/dev/shm
lxc config device add funtoo portagedir disk source=/usr/portage path=/usr/portage
lxc config device add funtoo distfiles disk source=/data/distfiles path=/var/distfiles
lxc config set funtoo security.privileged true
I've set security.privileged to true since this means we get 'normal' uid:gid values in the exposed filesystem mounted on the build environment. Later we need to use this exposed rootfs to build an image usable by Xhyve.
So at this point if you list your containers, you should see the new funtoo container appear:
zen ~ # lxc list
+------------+---------+----------------------+------+------------+-----------+
|    NAME    |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+------------+---------+----------------------+------+------------+-----------+
| funtoo     | STOPPED |                      |      | PERSISTENT | 0         |
+------------+---------+----------------------+------+------------+-----------+
It is time to start the container, jump inside it and complete the usual steps to get a running, updated OS built from source. You can check out the instructions on Funtoo.org or refer to the notes for your distro of choice.
Once we are updated, then we need to configure the networking:
zen ~ # lxc start funtoo
zen ~ # lxc exec funtoo bash
funtoo ~ # cd /etc/init.d/
funtoo init.d # ln -s netif.tmpl net.eth0
funtoo init.d # ln -s netif.tmpl net.br0
funtoo init.d # ln -s netif.tmpl net.lxdbr0
funtoo init.d # rc-update add net.eth0 default
 * service net.eth0 added to runlevel default
funtoo init.d # rc-update add net.br0 default
 * service net.br0 added to runlevel default
funtoo init.d # rc-update add net.lxdbr0 default
 * service net.lxdbr0 added to runlevel default
This gives us two bridge interfaces we can slave to eth0
We need to set up the configs for these interfaces such that br0 gets an address via dhcp from Xhyve (it helps if your build host is also running a dhcp server) and lxdbr0 is assigned to some other class C network. First, the eth0 interface needs to not get an IP address:
cat > /etc/conf.d/net.eth0
template="interface-noip"
^D
The br0 interface will get an address via dhcp
cat > /etc/conf.d/net.br0
template="bridge"
stp="on"
forwarding=1
slaves="net.eth0"
^D
The lxdbr0, as you might suspect, is for the LXD guest network. This one is given a static address. I have arbitrarily chosen the 192.168.64.0 network here:
cat > /etc/conf.d/net.lxdbr0
template="bridge"
ipaddr="192.168.64.3/24"
gateway="192.168.64.1"
nameservers="8.8.8.8"
domain="lxd.local"
slaves="net.eth0"
stp="on"
forwarding=1
^D
Make sure you add the essential ingredients such as dhcp, dhcpcd and squid (if you want to expose a proxy to your guests)
emerge -av dhcp dhcpcd bind-tools inetd dnsmasq squid
If you are interested in trying pylxd for accessing the api from python, you might also want to add pip and any other useful utility packages that you know and love.
emerge -av dev-python/pip netkit-telnetd netcat
Now, since we want to run a dhcp daemon for the LXD guests, we need to add this to /etc/dhcpcd.conf so that it only serves addresses on lxdbr0:
denyinterfaces eth0 br0
I added the following lines to /etc/dhcp/dhcpd.conf to give us a 192.168.14.0 network for the guests
subnet 192.168.14.0 netmask 255.255.255.0 {
    range 192.168.14.10 192.168.14.200;
    option routers 192.168.14.1;
}
We also need to add dhcpcd to the default runlevel
rc-update add dhcpcd
At this point you should be able to poweroff and restart the container to observe that the new interfaces correctly come up. (this assumes your buildhost or local network exposes a dhcp server)
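Something along these lines is enough to verify it from the build host:

funtoo ~ # poweroff
zen ~ # lxc start funtoo
zen ~ # lxc exec funtoo -- ip addr show br0    # expect a dhcp assigned address here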
Now that we have gotten this far, we have a working LXD guest sharing the buildhost's kernel (which we updated earlier to support LXD and ZFS)
Rather than trying to rebuild the kernel and modules within this guest, we can leverage the hard work we did earlier and re-use the ZFS package and kernel-modules from the build host.
Within the guest we can install ZFS from the binary package as follows:
cd /usr/portage/packages/sys-fs/
funtoo sys-fs # emerge --nodeps zfs-0.6.5.8.tbz2
And now add the various zfs services to the default runlevel
rc-update add zfs-import
rc-update add zfs-mount
Finally, edit /etc/inittab and comment out all the normal terminals, leaving only a serial console for Xhyve to communicate over:
# SERIAL CONSOLES
s0:12345:respawn:/sbin/agetty -L 115200 ttyS0 vt100
#s1:12345:respawn:/sbin/agetty -L 115200 ttyS1 vt100
We also want to force the virtio-net and zfs drivers to load at boot so we need to edit /etc/conf.d/modules as follows:
modules="virtio-net zfs"
For non privileged containers we need both /etc/subuid and /etc/subgid to look like this:
dnsmasq:100000:65536
lxd:1000000:65536
root:1000000:65536
squid:165536:65536
Optionally we can trim down the image and remove anything we don't think we need. For example:
emerge -C man-pages man-pages-posix debian-sources genkernel postfix
rm -rf /var/spool/postfix/*
For file sharing between the LXD host and the real host, make a directory called /mnt/shared and add the following to /etc/fstab:
host /mnt/shared 9p defaults,rw,relatime,sync,dirsync,trans=virtio,version=9p2000.L 0 0
/dev/vda1 / ext4 noatime,rw 0 1
Finally, set a root password and powerdown the container.
Now that the container is off (and we can still access its filesystem from the host) we can add the host's modules to the filesystem as follows:
cp -a /lib/modules/4.8.5/ /var/lib/lxd/containers/funtoo/rootfs/lib/modules/
It is a good idea to do a quick sanity check and make sure that both zfs.ko and virtio_net.ko were in there somewhere!
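A one-liner like this does the trick:

find /var/lib/lxd/containers/funtoo/rootfs/lib/modules/4.8.5 \
    \( -name 'zfs.ko' -o -name 'virtio_net.ko' \) -print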
Or to keep it smaller, you might want to just copy the modules you need.
4. Create the raw disk image for Xhyve
All the hard work we did in the previous section has now left us with the files needed for our new raw host OS mounted at /var/lib/lxd/containers/funtoo/rootfs
To get these into an image we could do this manually, or we can take the shortcut of using virt-make-fs (emerge this into your build environment if you don't already have it):
virt-make-fs --type=ext4 --format=raw --size=+300M --partition -- \
    /data/lxd/containers/funtoo/rootfs funraw.img
This takes about 5 minutes to run on my Intel NUC and at the end of the operation I have a new raw image. (The +300M gives us just a little headroom on the root partition, however the aim is that we will use /mnt/shared or a zfs pool for any new space required within the image)
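If you want to double check the result before copying it over to the Mac, the libguestfs tooling that ships alongside virt-make-fs can inspect the image:

virt-filesystems -a funraw.img --all --long -h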
5. Build Xhyve with Plan9 support
Now on your Mac, clone the xhyve repo and build it as follows:
git clone --recursive https://github.com/jceel/xhyve
cd xhyve ; make
(Note: I did this on 10.11 and it worked fine; however, I notice there are now errors building on MacOS 10.12.1. YMMV, so I've made my binaries available in my shared directory.)
6. Wire it all together and create a Zip file to distribute
Firstly we copy our new funraw.img, kernel and initramfs to our Mac.
I created a separate directory called lxd with the following files in it:
simonmac:lxd simon$ ls -F
funraw.img                          readme.txt       xhyve.9p*
initramfs-genkernel-x86_64-4.8.5    runxhyve.sh*
kernel-genkernel-x86_64-4.8.5       shared/
There is an empty directory called 'shared' which we will use to share files with the host MacOS.
We can start the image for the first time using the following: (sample provided in my shared directory)
./runxhyve.sh
This will ask for your MacOS password since it calls sudo, then you should see the OpenRC boot messages being pumped out of the virtual serial device and printed to stdout by Xhyve.
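If you would rather roll your own runxhyve.sh than use my sample, it boils down to a single xhyve invocation along these lines. Treat this as a sketch only: the memory size, PCI slot numbers and in particular the virtio-9p argument syntax are assumptions that may need adjusting for the fork you built (check its README), while the kernel command line needs to match the image we created (root on /dev/vda1, console on ttyS0):

#!/bin/sh
# sketch only - slot numbers and the 9p option syntax may differ in your xhyve fork
KERNEL=kernel-genkernel-x86_64-4.8.5
INITRD=initramfs-genkernel-x86_64-4.8.5
CMDLINE="root=/dev/vda1 console=ttyS0"

sudo ./xhyve.9p -A -m 2G -c 2 \
    -s 0:0,hostbridge -s 31,lpc \
    -s 2:0,virtio-net \
    -s 4,virtio-blk,funraw.img \
    -s 8,virtio-9p,path=shared,tag=host \
    -l com1,stdio \
    -f kexec,$KERNEL,$INITRD,"$CMDLINE"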
On first boot you will notice a few errors to do with the filesystem not mounting read write. I suspect this is something to do with the way virt-make-fs constructs the image, however this is easily fixed by logging in and doing the following:
localhost ~ # mount / -o rw,remount
localhost ~ # touch /etc/conf.d
localhost ~ # poweroff
Now when we run the xhyve shell script again it should boot normally and you should be able to ping external hosts. You should also have access to the host's shared directory on /mnt/shared.
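A couple of quick checks from inside the datacenter host will confirm both:

ping -c 1 8.8.8.8          # external connectivity via br0
mount | grep 9p            # the 'host' share should be mounted on /mnt/shared
touch /mnt/shared/hello    # and the file should appear in the shared/ directory on the Mac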
At this point I have run 'poweroff' and zipped the entire directory into a zip file and uploaded it to my Google Drive.
7. Booting and Configuring the Datacenter
Ok, let's run it again (or for the first time if you have downloaded my prebuilt zip file):
./runxhyve.sh
Next we need somewhere to put the ZFS pool that LXD will use. Rather than having to maintain extra space in the datacenter OS partition for this, we have the advantage of being able to use the host's file system.
Let's create a 10 Gig pool to start with. We can always grow this later or do other zfs magic on it if required.
It would be even nicer if MacOS supported sparse files, however for our purposes this file uses up all the space we ask for:
dd if=/dev/zero of=/mnt/shared/pool.zfs bs=$((1024 * 1024)) count=0 seek=10240
zpool create -f -o cachefile= -O compression=on -m none lxdpool /mnt/shared/pool.zfs
Now let's check that with 'zpool list':
localhost ~ # zpool list
NAME      SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxdpool  9.94G   283K  9.94G         -     0%     0%  1.00x  ONLINE  -
Also it might be nice to allocate some swap space. We can set this up as follows:
zfs create -V 2G -b $(getconf PAGESIZE) \
    -o primarycache=metadata \
    -o com.sun:auto-snapshot=false lxdpool/swap
Make and mount the swap:
mkswap -f /dev/zvol/lxdpool/swap
swapon /dev/zvol/lxdpool/swap
Rather than add a swap line to /etc/fstab, I've opted for creating a /etc/local.d/lxd.start script instead. This script checks for the swap device before enabling swap, and it also ensures your shared directory is correctly in place. Here is a sample one:
#!/bin/bash
if [ ! -d /mnt/shared/images ]; then
    mkdir /mnt/shared/images
fi

# If zfs swap is defined lets activate it
if [ -e /dev/zvol/lxdpool/swap ]; then
    swapon /dev/zvol/lxdpool/swap
fi
Ok, before we initialise LXD, we need to softlink some directories into our /mnt/shared so that we don't eat into the extra headroom on the root filesystem that we created with virt-make-fs
rm -rf /var/lib/lxd/images
mkdir -p /mnt/shared/images
ln -s /mnt/shared/images/ /var/lib/lxd/images
Now let's initialise LXD:
localhost lxd # lxd init
Name of the storage backend to use (dir or zfs) [default=zfs]:
Create a new ZFS pool (yes/no) [default=yes]? no
Name of the existing ZFS pool or dataset: lxdpool
Would you like LXD to be available over the network (yes/no) [default=no]?
Would you like stale cached images to be updated automatically (yes/no) [default=yes]? no
Would you like to create a new network bridge (yes/no) [default=yes]? no
LXD has been successfully configured.
For simplicity I chose not to setup the network control port for this walkthrough.
This is a good point to poweroff the datacenter host and start it again from the ./runxhyve.sh shell script. Everything should come back up again, 'zpool list' should show the lxdpool as ONLINE and 'free' should show you your swap space.
8. Testing the Datacenter
Let's create a couple of virtual machines and then test the networking between them, and from them to the LXD host.
For the virtual machine choice, you can pick any distro you like out of "lxc image list images:"; I've chosen a nice small alpine image for this test. Once they are running, you can log into each container with "lxc exec alpineN sh".
localhost ~ # lxc launch images:alpine/3.4/amd64 alpine1
Creating alpine1
Retrieving image: 100%
Starting alpine1
localhost ~ # lxc launch images:alpine/3.4/amd64 alpine2
Creating alpine2
Starting alpine2
localhost ~ # lxc list
+---------+---------+-----------------------+------+------------+-----------+
|  NAME   |  STATE  |         IPV4          | IPV6 |    TYPE    | SNAPSHOTS |
+---------+---------+-----------------------+------+------------+-----------+
| alpine1 | RUNNING | 192.168.14.189 (eth0) |      | PERSISTENT | 0         |
+---------+---------+-----------------------+------+------------+-----------+
| alpine2 | RUNNING | 192.168.14.72 (eth0)  |      | PERSISTENT | 0         |
+---------+---------+-----------------------+------+------------+-----------+
localhost ~ # lxc exec alpine1 sh
~ # ping -c 1 192.168.14.72
PING 192.168.14.72 (192.168.14.72): 56 data bytes
64 bytes from 192.168.14.72: seq=0 ttl=64 time=0.212 ms

--- 192.168.14.72 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.212/0.212/0.212 ms
~ # ping -c 1 192.168.14.1
PING 192.168.14.1 (192.168.14.1): 56 data bytes
64 bytes from 192.168.14.1: seq=0 ttl=64 time=0.131 ms

--- 192.168.14.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.131/0.131/0.131 ms
This shows we can ping the LXD host 192.168.14.1 and the other alpine container on its dynamically assigned 192.168.14.72
If you want to enable the VMs to reach the outside you can either enable natting on the datacenter host (already enabled if you downloaded the zip) or point applications on them to the squid proxy we installed.
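For reference, the NAT baked into the zip amounts to something like the following on the datacenter host (a sketch using the bridge and guest network configured earlier):

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 192.168.14.0/24 -o br0 -j MASQUERADE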
How far can this scale, you might ask? After editing the runxhyve.sh script to give the LXD host 6 Gigs of RAM, I have successfully run up 40 centos6 machines (36 salt minions and 4 salt masters) with them all happily talking to each other over zeromq. FYI: in order to make this work I needed to add the following line to /etc/sysctl.conf so I could have that many containers without getting a 'Too many open files' error:
fs.inotify.max_user_instances = 1024
Conclusion
This has been a long post, however we have walked a long journey and ended up with a ready-to-run datacenter in a zip file. I've shared the final zip and the various key components on my Google Drive. I hope you found this useful; it was certainly fun to put together. What's next? I'm feeling tempted to see if I can mod this to also run under KVM and Hyper-V.