Saturday, 5 November 2016

A Mini Datacenter for OSX / MacOS using LXD and Xhyve

This post documents my efforts to create a 'portable' mini data center for OSX/MacOS that I can easily pass to friends and colleagues as a single zip file.

The background to this is that over the last few years I've had the pleasure of using VMware vCenter at work on a Cisco UCS. This has been used for our development and staging environments. On my home server, meanwhile, I have been successfully using LXC and now LXD.

I originally tried LXC a number of years back out of frustration with some VMware Workstation pains I was having on a fairly resource-limited server. Thinking I'd give containerisation a try, I discovered that for the workload I was dealing with, LXC gave me an almost 4x speedup: suddenly my IO and memory bottlenecks vanished.

Needless to say I'm fairly sold on containerisation for development and testing.

Now that I've been using it for some time, I've been looking for a way to take this environment away with me on my laptop when I travel. I've also wanted to be able to create a standalone 'distribution' I can give away to share the goodness.

Getting this all set up and running is fairly complex, and as LXD is still fairly new, there is not a lot of documentation around and none that I could find for making it work on the Mac.

For this article I'm going to focus on getting it running on the Mac using Funtoo as the Linux distribution for the LXD host; however, it is certainly possible to replicate this with other distros and hypervisors. For example, another good choice for the host might be Ubuntu or Alpine.

So, enough talk, now on with the details. Firstly, what am I trying to achieve?
  • A mini Datacenter I can use to quickly spin up 10 Linux images to do some platform development and testing on a typical MacBook
  • Support for fast snapshotting and an efficient filesystem architecture
  • A datacenter environment that has some of the features of a vCenter or a Public cloud
  • A design that is API enabled and could easily be linked into a continuous integration environment.
  • Something I can zip up to under 500 MB and throw onto my Google Drive to share.
In order to do this on a laptop (let's say my MacBook Air) I'm going to need to use some pretty cool stuff, so let's list out the goodies we are about to play with:
  1. LXD as our datacenter control system
  2. ZFS so we can do the fast snapshotting, and because it is just generally awesome
  3. Xhyve so we can use the native (free) hypervisor inside OSX 
  4. Plan 9 filesystem for sharing with the host (there are plenty of other ways, however I haven't played with this one before and it seems like an efficient way to do things)
At this point you can continue reading for the details, or you can grab the latest zip file from my Google Drive and then jump to Booting and Configuring the Datacenter below.

Now, let's talk about the choices in a little more detail:

LXD

In order to comfortably run around ten servers in my laptop datacenter I'm going to need to use containerisation (OS-level virtualisation) rather than full virtualisation.

Containerisation means that rather than running a separate kernel for each VM, I can run a single kernel and share it and its memory across each of my containerised virtual machines.

LXD leverages the power of LXC containers by wrapping them under the control of a REST API enabled daemon; it also provides a great command line client for managing machines.
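
As a quick illustration of the API side, you can talk to the daemon directly over its local Unix socket. This is just a sketch assuming the default LXD 2.x socket path; adjust it if your install puts the socket elsewhere:

# Query the LXD REST API over its Unix socket (default path on LXD 2.x)
curl --unix-socket /var/lib/lxd/unix.socket http://lxd/1.0
# List containers via the same API (this is what the 'lxc' client wraps)
curl --unix-socket /var/lib/lxd/unix.socket http://lxd/1.0/containers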

For more details, go check out the main website for LXD.
There is also a great set of articles by Stéphane Graber well worth taking a look at.

ZFS

A truly amazing filesystem, ZFS has massive scalability, built-in integrity, built-in compression, drive pooling and RAID, just to mention a few things.

Go read about it and come back convinced if you aren't already. For our purposes ZFS is already leveraged by LXD to provide near instantaneous snapshotting.
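
To give a flavour of what that enables once everything is up, snapshotting and rolling back a container is a one-liner (container and snapshot names below are just examples):

# Snapshot a container almost instantly (a ZFS snapshot under the hood)
lxc snapshot alpine1 before-upgrade
# ...break something, then roll straight back
lxc restore alpine1 before-upgrade
# The underlying ZFS snapshots are visible from the LXD host
zfs list -t snapshot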

You could also use Btrfs if you prefer. This will get you to the same goal, however I'm biased towards the Sun / Solaris camp on this one.

Xhyve with 9p / VirtFS

Xhyve is the OSX port of bhyve, FreeBSD's native hypervisor (roughly its equivalent of KVM/QEMU). The nice thing about this is that it is free and works natively on just about every Mac with a recent OS and CPU. It is also very small and can be easily bundled with what we are building (it is, for example, bundled as part of Docker for Mac).

Go check out Dunkelstern's excellent Braindump on the topic.

Having a look around I found a number of forks of the original xhyve, and one in particular caught my attention, having already merged in support for the Plan 9 filesystem.

Here's the Plan

Now that we have talked a little about the technology, let's draw up a plan to pull all the pieces together.
  • Update the build environment and kernel 
    • (since I will copy a working kernel and modules from here)
  • Build ZFS
  • Build a host OS environment for LXD
    • kernel with virtio, plan9 and required CGroup namespace options set
    • LXD
    • network bridges
    • squid + other useful packages
  • Create the raw disk image for Xhyve
  • Build Xhyve with Plan9
  • Wire it all together and create a Zip file to distribute

1. Update the build environment and kernel

Since I already have Funtoo installed on my build machine, my first step is to update everything (especially in light of the recent Dirty COW kernel vulnerability).
So, following my usual upgrade steps:

emerge --update --newuse --deep --with-bdeps=y @system -a
emerge --update --newuse --deep --with-bdeps=y @world -a 
emerge @preserved-rebuild -a
emerge --depclean -a

According to the release notes for the latest ZFS package, the newest kernel currently supported is the 4.8 series. I've been using 4.4.19 successfully for a while, so it looks like it is time to take the plunge and update to 4.8.5.

The following command lets me update the kernel, using the config for the old working kernel as a starting point:

genkernel --zfs --menuconfig --kernel-config=/etc/kernels/kernel-config-x86_64-4.4.19 all

Make sure you have virtio, the 9p filesystem, all the control group and namespace options, IP masquerading and bridging turned on. (I've put my final kernel-config in the shared directory.)
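
As a quick spot check before rebooting, you can grep the freshly built config for the important options (the symbols below are the usual ones; names can vary slightly between kernel versions):

grep -E 'VIRTIO_NET|NET_9P_VIRTIO|9P_FS|CGROUPS|NAMESPACES|CONFIG_BRIDGE=|IP_NF_TARGET_MASQUERADE' /usr/src/linux/.config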

2. Build ZFS

On my build environment I'm using 'ZFS on root', hence the --zfs in the above genkernel command, so my first priority is to get my build machine working and rebooted safely.

Now that the kernel has built correctly and is linked in via /usr/src/linux, I can emerge in the Solaris Porting Layer and the ZFS kernel mods as follows:

emerge -av spl zfs-kmod

Finally I need to rebuild my zfs package but before I start I need one more trick. In the /etc/make.conf file I set:

FEATURES="buildpkg"

This means that when zfs finishes building I end up with a nice binary Gentoo package sitting under /usr/portage/packages. This binary package comes in handy later when we build the host environment.

emerge -av zfs

After doing the usual messing about with grub.cfg I've now rebooted successfully and have a nice running 4.8.5 kernel.

3. Build a host environment for LXD

So far we have only been working on the build environment, and my assumption is that if you get this far you have a working build machine running Funtoo, Gentoo or another Linux distro with working ZFS on a recent kernel. To continue we also need a working LXD in the build environment, as we are going to use an LXD container to build the host environment before converting it into a disk image that can be used by Xhyve.

Under Funtoo or Gentoo this is as simple as:

emerge -av lxd

The best test to know if it is also going to work smoothly is to run lxc-checkconfig and make sure that each item it outputs is a green 'enabled'. You can run this when you have rebooted into your brand new kernel, or you can point it at your kernel source and get it to check the config of a non-running kernel.
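
For example, to check a kernel you haven't booted yet, point lxc-checkconfig at its config file via the CONFIG environment variable:

# Check the currently running kernel
lxc-checkconfig
# Check a non-running kernel by pointing at its config
CONFIG=/usr/src/linux/.config lxc-checkconfig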

At the time of writing there is no Funtoo in the provided LXD image repository; however, I notice that there is now a Gentoo image, so it should be fairly easy to start from there as an alternative. I've also been keeping my eye on Alpine because of its very low footprint, however my efforts getting LXD to install from the Alpine test repo have so far been unsuccessful.

To get my Funtoo bootstrapped, I started with an older stage3 so the resulting mini datacenter will hopefully run on a wider variety of Macs, including those dating back a few years.

When starting from a stage3 you need to untar it into a rootfs directory and tar it up again with the appropriate metadata.yaml so that it can then be imported into LXD (see the section on manually building an image in Stéphane Graber's article).
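
As a rough sketch of that repack (the metadata.yaml values and the stage3 filename are illustrative; adjust them to match your tarball):

mkdir -p funtoo-image/rootfs && cd funtoo-image
# Unpack the stage3 into rootfs/ (the path to your stage3 tarball will differ)
tar xpf /path/to/funtoo-stage3.tar.xz -C rootfs
# Minimal metadata.yaml describing the image (example values)
cat > metadata.yaml << EOF
architecture: "x86_64"
creation_date: $(date +%s)
properties:
  description: "Funtoo stage3"
  os: "funtoo"
EOF
# Bundle metadata + rootfs into an importable LXD image tarball
tar czf ../myfuntoostage3.tar.gz metadata.yaml rootfs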

Once you have imported your image you can use it to 'init' a new container as follows:

lxc image import myfuntoostage3.tar.gz --alias funtoo
lxc init funtoo funtoo

Now we can do a few more handy tricks to link it through to your build host's existing portage and distfiles repositories as follows:

lxc config device add funtoo sharedmem disk source=/dev/shm path=/dev/shm
lxc config device add funtoo portagedir disk source=/usr/portage path=/usr/portage
lxc config device add funtoo distfiles disk source=/data/distfiles path=/var/distfiles
lxc config set funtoo security.privileged true

I've set security.privileged to true since this means we get 'normal' uid:gid values in the exposed filesystem mounted on the build environment. Later we need to use this exposed rootfs to build an image usable by Xhyve.

So at this point if you list your containers, you should see the new funtoo container appear:

zen ~ # lxc list
+------------+---------+----------------------+------+------------+-----------+
|    NAME    |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS |
+------------+---------+----------------------+------+------------+-----------+
| funtoo     | STOPPED |                      |      | PERSISTENT | 0         |
+------------+---------+----------------------+------+------------+-----------+

It is time to start the container, jump inside it and complete the usual steps for getting a running, updated OS built from source. You can check out the instructions on Funtoo.org or refer to the notes for your distro of choice.

Once everything is updated, we need to configure the networking:

zen ~ # lxc start funtoo
zen ~ # lxc exec funtoo bash
funtoo ~ # cd /etc/init.d/
funtoo init.d # ln -s netif.tmpl net.eth0
funtoo init.d # ln -s netif.tmpl net.br0
funtoo init.d # ln -s netif.tmpl net.lxdbr0
funtoo init.d # rc-update add net.eth0 default
 * service net.eth0 added to runlevel default
funtoo init.d # rc-update add net.br0 default
 * service net.br0 added to runlevel default
funtoo init.d # rc-update add net.lxdbr0 default
 * service net.lxdbr0 added to runlevel default

This gives us two bridge interfaces we can slave to eth0.

We need to set up the configs for these interfaces such that br0 gets an address via DHCP from Xhyve (it helps if your build host is also running a DHCP server) and lxdbr0 is assigned to some other class C. First, the eth0 interface needs to not get an IP address of its own:

cat > /etc/conf.d/net.eth0
template="interface-noip"
^D

The br0 interface will get an address via dhcp

cat > /etc/conf.d/net.br0
template="bridge"
stp="on"
forwarding=1
slaves="net.eth0"
^D

The lxdbr0, as you might suspect, is for the LXD guest network. This one is given a static address. I have arbitrarily chosen the 192.168.64.0 network here:

cat > /etc/conf.d/net.lxdbr0
template="bridge"
ipaddr="192.168.64.3/24"
gateway="192.168.64.1"
nameservers="8.8.8.8"
domain="lxd.local"
slaves="net.eth0"
stp="on"
forwarding=1
^D

Make sure you add the essential ingredients such as dhcp, dhcpcd and squid (if you want to expose a proxy to your guests):

emerge -av dhcp dhcpcd bind-tools inetd dnsmasq squid

If you are interested in trying pylxd for accessing the API from Python, you might also want to add pip and any other useful utility packages that you know and love.

emerge -av dev-python/pip netkit-telnetd netcat

Now, since we want to run a DHCP daemon for the LXD guests, we need to add the following to /etc/dhcpcd.conf so that addresses are only served on lxdbr0:

denyinterfaces eth0 br0

I added the following lines to /etc/dhcp/dhcpd.conf to give us a 192.168.14.0 network for the guests:

subnet 192.168.14.0 netmask 255.255.255.0 {
 range 192.168.14.10 192.168.14.200;
 option routers 192.168.14.1;
}

We also need to add dhcpcd to the default runlevel:

rc-update add dhcpcd

At this point you should be able to power off and restart the container and observe that the new interfaces come up correctly. (This assumes your build host or local network exposes a DHCP server.)

Now that we have gotten this far, we have a working LXD guest sharing the build host's kernel (which we updated earlier to support LXD and ZFS).

Rather than trying to rebuild the kernel and modules within this guest, we can leverage the hard work we did earlier and re-use the ZFS package and kernel-modules from the build host.

Within the guest we can install ZFS from the binary package as follows:

funtoo ~ # cd /usr/portage/packages/sys-fs/
funtoo sys-fs # emerge --nodeps zfs-0.6.5.8.tbz2

And now add the various zfs services to the default runlevel:

rc-update add zfs-import
rc-update add zfs-mount

Finally, edit /etc/inittab, commenting out the local terminals and leaving only a serial console for Xhyve to communicate over:

# SERIAL CONSOLES
s0:12345:respawn:/sbin/agetty -L 115200 ttyS0 vt100
#s1:12345:respawn:/sbin/agetty -L 115200 ttyS1 vt100

We also want to force the virtio-net and zfs drivers to load at boot, so we need to edit /etc/conf.d/modules as follows:

modules="virtio-net zfs"

For unprivileged containers we need both /etc/subuid and /etc/subgid to look like this:

dnsmasq:100000:65536
lxd:1000000:65536
root:1000000:65536
squid:165536:65536

Optionally we can trim down the image and remove anything we don't think we need. For example:

emerge -C man-pages man-pages-posix debian-sources genkernel postfix
rm -rf /var/spool/postfix/*

For file sharing between the LXD host and the real host, make a directory called /mnt/shared and add the following to /etc/fstab:

host       /mnt/shared 9p defaults,rw,relatime,sync,dirsync,trans=virtio,version=9p2000.L 0 0
/dev/vda1  /           ext4            noatime,rw              0 1

Finally, set a root password and power down the container.

Now that the container is off (and its filesystem is still accessible from the host) we can add the host's kernel modules to the filesystem as follows:

cp -a /lib/modules/4.8.5/ /var/lib/lxd/containers/funtoo/rootfs/lib/modules/

It is a good idea to do a quick sanity check and make sure that both zfs.ko and virtio_net.ko were in there somewhere!

Or to keep it smaller, you might want to just copy the modules you need.
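
A quick way to do that sanity check from the build host:

# Confirm the two modules we care about made it into the container's rootfs
find /var/lib/lxd/containers/funtoo/rootfs/lib/modules/4.8.5 -name 'zfs.ko' -o -name 'virtio_net.ko'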

4. Create the raw disk image for Xhyve

All the hard work we did in the previous section has now left us with the files needed for our new raw host OS mounted at /var/lib/lxd/containers/funtoo/rootfs

To get these into an image we could do this manually, or we can take the shortcut of using virt-make-fs (emerge this into your build environment if you don't already have it):

virt-make-fs --type=ext4 --format=raw --size=+300M --partition -- \
    /data/lxd/containers/funtoo/rootfs funraw.img

This takes about 5 minutes to run on my Intel NUC, and at the end of the operation I have a new raw image. (The +300M gives us just a little headroom on the root partition; the aim is that we will use /mnt/shared or a ZFS pool for any new space required within the image.)
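
If you want to double-check the result before shipping it to the Mac, the libguestfs tools (installed alongside virt-make-fs) can list what ended up inside the image:

# Show the partitions and filesystems inside the freshly built raw image
virt-filesystems -a funraw.img --all --long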

5. Build Xhyve with Plan9 support

Now on your Mac, clone the xhyve repo and build it as follows:

git clone --recursive https://github.com/jceel/xhyve
cd xhyve ; make

(Note: I did this on 10.11 and it worked fine; however, I notice there are now errors building on MacOS 10.12.1. YMMV, so I've made my binaries available in my shared directory.)

6. Wire it all together and create a Zip file to distribute

Firstly we copy our new funraw.img, kernel and initramfs to our Mac.
I created a separate directory called lxd with the following files in it:

simonmac:lxd simon$ ls -F
funraw.img                              readme.txt                              xhyve.9p*
initramfs-genkernel-x86_64-4.8.5        runxhyve.sh*
kernel-genkernel-x86_64-4.8.5           shared/

There is an empty directory called 'shared' which we will use to share files with the host MacOS.

We can start the image for the first time using the following: (sample provided in my shared directory)

./runxhyve.sh

This will ask for your MacOS password since it calls sudo, then you should see the OpenRC boot messages being pumped out of the virtual serial device and printed to stdout by Xhyve.
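
For reference, runxhyve.sh is essentially a wrapper around a single xhyve invocation. The sketch below is roughly what mine looks like; treat the memory/CPU sizing and in particular the virtio-9p device syntax as assumptions, since the 9p option format differs between xhyve forks (check the README of the fork you built):

#!/bin/bash
# Rough sketch of runxhyve.sh -- adjust sizing and the 9p share syntax to suit your fork
KERNEL=kernel-genkernel-x86_64-4.8.5
INITRD=initramfs-genkernel-x86_64-4.8.5
CMDLINE="earlyprintk=serial console=ttyS0 root=/dev/vda1"
# The 9p share tag 'host' matches the tag used in the guest's /etc/fstab
sudo ./xhyve.9p -A -m 2G -c 2 \
    -s 0:0,hostbridge -s 31,lpc \
    -l com1,stdio \
    -s 2:0,virtio-net \
    -s 4,virtio-blk,funraw.img \
    -s 5,virtio-9p,host=$(pwd)/shared \
    -f kexec,$KERNEL,$INITRD,"$CMDLINE"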

On first boot you will notice a few errors to do with the filesystem not mounting read-write. I suspect this is something to do with the way virt-make-fs constructs the image; however, it is easily fixed by logging in and doing the following:

localhost ~ # mount / -o rw,remount
localhost ~ # touch /etc/conf.d
localhost ~ # poweroff

Now when we run the xhyve shell script again it should boot normally, and you should be able to ping external hosts. You should also have access to the host's shared directory on /mnt/shared.

At this point I have run 'poweroff' and zipped the entire directory into a zip file and uploaded it to my Google Drive.

7. Booting and Configuring the Datacenter

Ok, let's run it again (or for the first time, if you have downloaded my prebuilt zip file):

./runxhyve.sh

First we need to set up a nice ZFS pool to place all our guests into.
Rather than have to maintain extra space in the datacenter OS partition for this, we now have the advantage of being able to use the host's file system.

Let's create a 10 GB pool file to start with. We can always grow this later or do other ZFS magic on it if required.

It would be even nicer if MacOS supported sparse files; however, for our purposes this file uses up all the space we ask for:

dd if=/dev/zero of=/mnt/shared/pool.zfs bs=$((1024 * 1024)) count=0 seek=10240
zpool create -f -o cachefile= -O compression=on -m none lxdpool /mnt/shared/pool.zfs

Now let's check that with 'zpool list':

localhost ~ # zpool list
NAME      SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxdpool  9.94G   283K  9.94G         -     0%     0%  1.00x  ONLINE  -

Also it might be nice to allocate some swap space. We can set this up as follows:

zfs create -V 2G -b $(getconf PAGESIZE) \
              -o primarycache=metadata \
              -o com.sun:auto-snapshot=false lxdpool/swap

Make and mount the swap:

mkswap -f /dev/zvol/lxdpool/swap
swapon /dev/zvol/lxdpool/swap

Rather than add a swap line to /etc/fstab, I've opted for creating a /etc/local.d/lxd.start script instead. This script checks for the swap device before enabling swap, and it also ensures your shared directory is correctly in place. Here is a sample:

#!/bin/bash
if [ ! -d /mnt/shared/images ]; then
    mkdir /mnt/shared/images
fi

# If zfs swap is defined lets activate it
if [ -e /dev/zvol/lxdpool/swap ]; then
    swapon /dev/zvol/lxdpool/swap
fi

Ok, before we initialise LXD, we need to soft-link some directories into /mnt/shared so that we don't eat into the extra headroom on the root filesystem that we created with virt-make-fs:

rm -rf /var/lib/lxd/images
mkdir -p /mnt/shared/images
ln -s /mnt/shared/images/ /var/lib/lxd/images

Now let's initialise LXD:

localhost lxd # lxd init
Name of the storage backend to use (dir or zfs) [default=zfs]:
Create a new ZFS pool (yes/no) [default=yes]? no
Name of the existing ZFS pool or dataset: lxdpool
Would you like LXD to be available over the network (yes/no) [default=no]?
Would you like stale cached images to be updated automatically (yes/no) [default=yes]? no
Would you like to create a new network bridge (yes/no) [default=yes]? no
LXD has been successfully configured.

For simplicity I chose not to setup the network control port for this walkthrough.

This is a good point to power off the datacenter host and start it again from the ./runxhyve.sh shell script. Everything should come back up again: 'zpool list' should show the lxdpool as ONLINE and 'free' should show your swap space.

8. Testing the Datacenter 

Let's create a couple of virtual machines and then test the networking between them, and from them to the LXD host.

For the virtual machine choice, you can pick any distro you like out of "lxc image list images:". I've chosen a nice small Alpine image for this test.

localhost ~ # lxc launch images:alpine/3.4/amd64 alpine1
Creating alpine1
Retrieving image: 100%
Starting alpine1
localhost ~ # lxc launch images:alpine/3.4/amd64 alpine2
Creating alpine2
Starting alpine2
localhost ~ # lxc list
+---------+---------+-----------------------+------+------------+-----------+
|  NAME   |  STATE  |         IPV4          | IPV6 |    TYPE    | SNAPSHOTS |
+---------+---------+-----------------------+------+------------+-----------+
| alpine1 | RUNNING | 192.168.14.189 (eth0) |      | PERSISTENT | 0         |
+---------+---------+-----------------------+------+------------+-----------+
| alpine2 | RUNNING | 192.168.14.72 (eth0)  |      | PERSISTENT | 0         |
+---------+---------+-----------------------+------+------------+-----------+

You can log into each with "lxc exec alpineN sh"

localhost ~ # lxc exec alpine1 sh
~ # ping -c 1 192.168.14.72
PING 192.168.14.72 (192.168.14.72): 56 data bytes
64 bytes from 192.168.14.72: seq=0 ttl=64 time=0.212 ms

--- 192.168.14.72 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.212/0.212/0.212 ms
~ # ping -c 1 192.168.14.1
PING 192.168.14.1 (192.168.14.1): 56 data bytes
64 bytes from 192.168.14.1: seq=0 ttl=64 time=0.131 ms

--- 192.168.14.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.131/0.131/0.131 ms

This shows we can ping the LXD host at 192.168.14.1 and the other Alpine container on its dynamically assigned 192.168.14.72.

If you want the VMs to reach the outside world, you can either enable NAT on the datacenter host (already enabled if you downloaded the zip) or point applications on them at the Squid proxy we installed.
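
If you do need to switch NAT on yourself, a minimal sketch on the datacenter host looks something like this (assuming br0 is the upstream interface and 192.168.14.0/24 is the guest network, as configured earlier):

# Allow forwarding and masquerade guest traffic out through br0
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 192.168.14.0/24 -o br0 -j MASQUERADE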

How far can this scale, you might ask? After editing the runxhyve.sh script to give the LXD host 6 GB of RAM, I have successfully run up 40 CentOS 6 machines (36 Salt minions and 4 Salt masters), all happily talking to each other over ZeroMQ. FYI: in order to make this work I needed to add the following line to /etc/sysctl.conf so I could have that many containers without getting a 'Too many open files' error:

fs.inotify.max_user_instances = 1024
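
The sysctl.conf entry takes effect at the next boot; to apply it to the running host straight away you can also set it directly:

sysctl -w fs.inotify.max_user_instances=1024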

Conclusion

This has been a long post; however, we have walked a long journey and ended up with a ready-to-run Datacenter in a zip file. I've shared the final zip and the various key components on my Google Drive. I hope you found this useful; it was certainly fun to put together.

What's next? I'm feeling tempted to see if I can mod this to also run under KVM and Hyper-V.

Tuesday, 2 September 2014

LTP 750 Pump bearing replacement

Leisure Time Pump repair - LTP 750

A while back I scored a nice second hand pool pump courtesy of a friend that sensibly recovers gems like this from the local council roadside collection (thanks Mark). Anyway, it found a new life as part of my cobbled together pool filtering system along with a number of other second hand pvc pipes, connectors and valves.

The arrangement has been running well for a couple of years now keeping our $200 kmart kids pool nice and clean, however, recently it started making an ever increasing whine that eventually got to the point where I could hear the resonant frequency from the front street.

Not normally being one to delve outside the realms of computer software and electronics, I decided I'd take a shot at fixing it. Maybe it was something simple I could squirt some oil into. (Ha Ha)

pump assembly removal

Ok, a few bolts and the pump assembly separated from the motor body; so far, so good.

Now to remove the impeller... hang on, no nuts, bolts or screw heads... Hmm, if I force this thing I'm likely to break something, and then I'll need a new pump.

impeller removal


A quick search on YouTube and I discovered these amazing Pump Guys; six or seven instructional videos later I found myself educated enough to get down to the rotor. At this point the source of the noise became apparent: a 6202 bearing on the front side. Ok, time to invest in a bearing puller and a new bearing from my local BSC Bearings.

Getting the bearing back on the shaft was nice and easy using a bit of old pipe that Dad turned up on the lathe for me. A few careful taps of the hammer moved the bearing nicely down to its seat.

bearing pipe press


Now for the re-assembly. There are four long "through bolts" that hold the ends on the pump and keep the armature centred; the bearings press-fit into recesses in these pieces of cast metal. I first tried pulling it together by assembling and then tightening the bolts; however, the armature was binding, so I loosened it and started again, this time tapping with a rubber mallet.

Weirdly, each time I got the case back together it would spin freely until I tightened, and then it would bind. I tried rotating the back and it improved slightly. (The front has a drain hole that needs to point downwards so it has only one way it can go on)

Time to spin it up (fingers crossed).

Bench test


You can see the through bolts in this shot. Initially I left them loose, ran it for a while and slowly tightened them. Each time I ran it and then tightened, the armature seemed to spin more freely as it bedded into position... And the great part about this bench test was the uncanny silence. Ok, not quite silent, but sounding more like it should and less like a banshee.

Time for the final reassembly:

alignment slot

This bit is easy thanks to alignment slots and helpful arrows in the plastic to remind me which way is up.

Ok, now where did all those rubber seals go?


Nearly there, just have to add the o-ring in front of the seal

Fitting the o-ring

By pressing the spring-loaded seal back, the small o-ring can then be seated in the shaft groove. After this, there is one more large o-ring that goes around the impeller assembly and makes a seal when the plastic pump body slides on.

Reassembled and Fitted


Finally, the new improved almost silent version... 

Cost was $20 for the bearing puller and $10 for the bearing. (3 times the price from an online pool parts place)

Now that the "Banshee" is gone from the pool filter, maybe I'm feeling game enough to have a go at removing the "Freight Train" from the washing machine spin dry cycle.

Wednesday, 5 February 2014

Simple fix for Sticky keys on wired Apple Keyboard

Having recently been given an old iMac to refurbish, I have ended up with a nice wide wired Apple keyboard. (excuse the stock photo)



I've always wanted those extra function keys and Page Up and Page Down, since I do a lot of work with other OSes via remote control or VMware (and the two handy USB ports don't go astray).

Anyway, the problem was sticky keys. Some of them felt quite squishy, some of them crackled and some of them just didn't want to move.

A bit of research on this model revealed a number of really well documented but unsuccessful disassemblies, and having recently disassembled an older MacBook Air attempting to fix dead keys, I wasn't jumping at the opportunity to start a potentially destructive operation with screwdrivers and heat guns.

Anyway, after a few experiments it turns out these older keyboards are quite open in terms of liquid flow-through. (Probably a bad thing when coffee hits it, but a good thing if you are going to try flushing it with solvent.)

The solution was as simple as spraying around each bad key with an excess of isopropyl alcohol, letting it trickle inside the keyboard and then repeatedly pushing the offending key until it came good.

Now the keyboard works perfectly. No doubt the coffee or whatever has just moved further into the works, but hey, who am I to complain?

Hero for the day


Sunday, 26 January 2014

Curiosity killed the Thread

I've recently been intrigued by David Beazley's introduction to coroutines entitled "A Curious Course on Coroutines and Concurrency" and have had a go at taking his 'Operating System' a little further.

Having programmed some generators previously, I find the whole way of thinking quite interesting.

Basically the concept is that instead of relying on the OS to pre-emptively interrupt running tasks, we program them in such a way that they periodically hand back control to a scheduler. Ok, this is actually an old idea; however, achieving it using coroutines in Python makes it all new again :-) Any tasks that require IO use select and are taken out of the sequential execution loop until ready to continue. I've added some inter-task messaging and sleep system calls.

Using coroutines we end up with an operating system that runs tasks cooperatively, one after another in a loop, without the context-switching penalties imposed by threading or multiprocessing.

Admittedly there are some limitations on what you can do, but rather than running thousands of tasks we can now run millions. We can still increase processing efficiency using threads or processes but we can do it via task pools and exercise some control over how this happens.

Spawning a million tasks with pyos coroutines takes around 32 seconds on my i5-3427U Intel NUC with 16 GB of memory. However, attempting to spawn a million threads bombs out at fewer than 32,000. The difference is that with the former we are just instantiating a class. Putting the two systems on even ground and attempting to spawn only 31,000 tasks each gives me the following:


  • Coroutines:  1.1 seconds and 40 MB of memory
  • Threads: 28 seconds and 369 MB of memory

The test code looks something like this: (pyos9 is my modified version of David's pyos8)

#!/usr/bin/python2
# Perf and Unit tests
#
# Test what happens with large numbers of tasks
# Simon Ryan 2014

from pyos9 import *
import sys,random,time,threading

mode = sys.argv[1]

try:
    num = int(sys.argv[2])
except:
    num = 1000
now = time.time()

def readhellos():
    yield SubscribeMessage('hello')
    while 1:
        # We need to go idle in order to listen for the message
        yield Idle()
        message = (yield)
        if message and message[2] == 'quit':
            break

def spawninator():
    print('spawninator starting')
    for i in xrange(num):
        yield NewTask(readhellos())
    print('spawninator finishing spawning %d tasks in %2.2f seconds' % (num,time.time() - now))
    print('sending quit message')
    yield Message('hello','quit')

class globs:
    # Poor mans messaging for threads
    timetodie = False

def simpletask():
    # Something we can launch as a thread
    while 1:
        time.sleep(1)
        if globs.timetodie:
            break

if mode == 'pyos':
    sched = Scheduler()
    sched.new(spawninator())
    sched.mainloop()

else:
    # Do approximately the same but with threads
    threadlist = []
    for i in range(num):
         try:
             testthread = threading.Thread(target=simpletask)
             testthread.start()
         except:
             print('Thread create error at thread number %d' % (i+1))
             break
         threadlist.append(testthread)
    elapsed = time.time() - now

    print('spawninator finishing spawning %d tasks in %2.2f seconds' % (num,elapsed))

    globs.timetodie = True
    for task in threadlist:
        task.join()

    elapsed = time.time() - now

print('Finished in %2.2f seconds' % (time.time() - now))

Here is what the output looks like when I ask for 10 million tasks (this consumed 9.7 GB of resident memory):

# ./unittest1.py pyos 10000000
spawninator starting

spawninator finishing spawning 10000000 tasks in 367.74 seconds
sending quit message
no more tasks
Finished in 586.68 seconds

Why is this interesting? Being able to spawn this many tasks changes the way we can think about certain types of problems. The first one that comes to mind is large-scale simulation. Being able to write the simulated components as programs to be run, rather than trying to model them as arrays of parameters, opens up all sorts of interesting possibilities, particularly where component behaviour is complex, such as automobile traffic around Hong Kong or asteroid/ship collisions in a space game :-)


Check out the code on GitHub