Sunday 26 January 2014

Curiosity killed the Thread

I've recently been intrigued by David Beazley's introduction to coroutines entitled "A Curious Course on Coroutines and Concurrency" and have had a go at taking his 'Operating System' a little further.

Having programmed some generators previously, I find the whole way of thinking quite interesting.

Basically the concept is that instead of relying on the OS to pre-emptively interrupt running tasks, we program them so that they periodically hand control back to a scheduler. OK, this is actually an old idea; achieving it with coroutines in Python, however, makes it all new again :-) Any task that requires IO uses select and is taken out of the sequential execution loop until it is ready to continue. I've also added some inter-task messaging and sleep system calls.
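The core idea fits in a few lines. Here's a toy illustration (not pyos itself): each task is a generator that yields whenever it is willing to give up the CPU, and the scheduler resumes tasks round-robin from a queue, dropping them when they finish.

```python
from collections import deque

def run(tasks):
    # Round-robin scheduler: resume each task until it yields,
    # requeue it, and drop it once it raises StopIteration.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue            # task finished; don't requeue
        ready.append(task)

def counter(name, n):
    for i in range(n):
        print('%s %d' % (name, i))
        yield                   # trap back into the scheduler

run([counter('a', 2), counter('b', 2)])  # prints a 0, b 0, a 1, b 1
```

The `yield` is the whole trick: it is the point where a task voluntarily hands control back, which is what the OS would otherwise have to take by force.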

Using coroutines we end up with an operating system that runs tasks cooperatively, one after another in a loop, without the context-switching penalties imposed by threading or multiprocessing.

Admittedly there are some limitations on what you can do, but rather than running thousands of tasks we can now run millions. We can still increase processing efficiency with threads or processes, but we can do it via task pools and exercise some control over how that happens.
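One way that control can look — a sketch of the idea only, not pyos's actual API: a task yields a callable, the loop ships it to a pool and resumes the task when the result is back, so blocking work stalls one task rather than the whole loop. This uses Python 3's concurrent.futures (available to Python 2 via the futures backport); `run_tasks` and `worker` are illustrative names.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)
results = []

def run_tasks(tasks):
    # ready holds (task, value-to-send-in); waiting holds (future, task)
    ready = deque((t, None) for t in tasks)
    waiting = []
    while ready or waiting:
        # Requeue any task whose pool work has completed
        still_waiting = []
        for fut, task in waiting:
            if fut.done():
                ready.append((task, fut.result()))
            else:
                still_waiting.append((fut, task))
        waiting = still_waiting
        if not ready:
            continue            # busy-waits for brevity; a real loop would block
        task, value = ready.popleft()
        try:
            yielded = task.send(value)
        except StopIteration:
            continue
        if callable(yielded):
            waiting.append((pool.submit(yielded), task))  # offload to pool
        else:
            ready.append((task, None))

def worker(n):
    # Yielding a callable asks the loop to run it in the pool;
    # only this task waits while the loop keeps running the others.
    square = yield (lambda: n * n)
    results.append(square)

run_tasks([worker(i) for i in range(3)])
print(sorted(results))  # [0, 1, 4]
```

The cooperative loop stays in charge of when pool work is dispatched and when tasks resume — the control over "how this happens" that threads alone don't give you.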

Spawning a million tasks with pyos coroutines takes around 32 seconds on my i5-3427U Intel NUC with 16 GB of memory. Attempting to spawn a million threads, however, bombs out at under 32,000. The difference is that with the former we are just instantiating a class. Putting the two systems on even ground and spawning only 31,000 tasks each gives me the following:


  • Coroutines:  1.1 seconds and 40 MB of memory
  • Threads: 28 seconds and 369 MB of memory

The test code looks something like this: (pyos9 is my modified version of David's pyos8)

#!/usr/bin/python2
# Perf and Unit tests
#
# Test what happens with large numbers of tasks
# Simon Ryan 2014

from pyos9 import *
import sys,random,time,threading

mode = sys.argv[1]

try:
    num = int(sys.argv[2])
except (IndexError, ValueError):
    num = 1000
now = time.time()

def readhellos():
    yield SubscribeMessage('hello')
    while 1:
        # We need to go idle in order to listen for the message
        yield Idle()
        message = (yield)
        if message and message[2] == 'quit':
            break

def spawninator():
    print('spawninator starting')
    for i in xrange(num):
        yield NewTask(readhellos())
    print('spawninator finishing spawning %d tasks in %2.2f seconds' % (num,time.time() - now))
    print('sending quit message')
    yield Message('hello','quit')

class globs:
    # Poor mans messaging for threads
    timetodie = False

def simpletask():
    # Something we can launch as a thread
    while 1:
        time.sleep(1)
        if globs.timetodie:
            break

if mode == 'pyos':
    sched = Scheduler()
    sched.new(spawninator())
    sched.mainloop()

else:
    # Do approximately the same but with threads
    threadlist = []
    for i in range(num):
        try:
            testthread = threading.Thread(target=simpletask)
            testthread.start()
        except Exception:
            print('Thread create error at thread number %d' % (i+1))
            break
        threadlist.append(testthread)
    elapsed = time.time() - now

    print('spawninator finishing spawning %d tasks in %2.2f seconds' % (num,elapsed))

    globs.timetodie = True
    for task in threadlist:
        task.join()

    elapsed = time.time() - now

print('Finished in %2.2f seconds' % (time.time() - now))

Here is what the output looks like when I ask for 10 million tasks (this consumed 9.7 GB of resident memory):

# ./unittest1.py pyos 10000000
spawninator starting
spawninator finishing spawning 10000000 tasks in 367.74 seconds
sending quit message
no more tasks
Finished in 586.68 seconds

Why is this interesting? Being able to spawn this many tasks changes the way we can think about certain types of problems. The first that comes to mind is large-scale simulation. Writing the simulated components as programs to be run, rather than trying to model them as arrays of parameters, opens up all sorts of interesting possibilities, particularly where component behaviour is complex: automobile traffic around Hong Kong, say, or asteroid/ship collisions in a space game :-)
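As a taste of what that looks like (purely illustrative, not code from pyos): each simulated car below is a generator carrying its own state, and a tick loop advances every car once per simulation step.

```python
def car(speed):
    # Each car is a little program with its own state;
    # one yield per simulation tick.
    position = 0
    while position < 10:
        position += speed
        yield

def advance(gen):
    # Run one tick of a component; report whether it is still active.
    try:
        next(gen)
        return True
    except StopIteration:
        return False

def simulate(cars):
    # Tick loop: advance every car, drop finished ones, count ticks.
    ticks = 0
    while cars:
        cars = [c for c in cars if advance(c)]
        ticks += 1
    return ticks

print(simulate([car(5), car(2)]))  # 6 ticks until both cars arrive
```

The behaviour lives in ordinary code with local variables, rather than in rows of a parameter array, and with coroutine-level task costs you can afford millions of such components.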


Check out the code on GitHub