The other day, I decided to test a theory of mine. I said, in “Computing Is Broken and How to Fix It,” that even with a microkernel, we could reduce the interaction with the kernel (and increase performance) if the hardware supported pipes (circular buffers) for passing data between processes.
I tried a crude experiment on Linux, making use of shared memory between two processes.
I admit that I was scared. This is an idea that I have staked a lot on, so I wanted it to do well. And I wasn’t sure it would.
I needn’t have worried.
In order to reproduce my results, you will need these three files:
- The `Makefile`. This will build and run everything for you.
- The actual test code. This file requires certain `#define`s to work properly. The `Makefile` takes care of these.
- A small timer library I use in my tests. The test file already includes this file, but it needs to be there or you will get compile errors.
With these files, you should be able to just run `make && make run`, and it will
do it all for you. However, I ran my tests like this:
export CC=clang
export CFLAGS="-flto -O3 -DNDEBUG"
make
sudo make run
The `sudo` is there because the programs attempt to reduce their `nice` value
to give themselves higher priority. If that fails, they will continue regardless, so
`sudo` is not needed.
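To illustrate that "try, but carry on if it fails" behavior, here is a minimal sketch. It is not the code from the test file, which may use a different call; the function name `try_raise_priority` is hypothetical, and the sketch assumes the POSIX `setpriority` interface.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: ask for a given nice value for this process.
 * A lower nice value means a higher scheduling priority, and lowering
 * it normally needs root. On failure we only report the error and
 * return false, so the caller can continue at its current priority.
 */
bool try_raise_priority(int wanted_nice)
{
    /* who == 0 means "the calling process" */
    if (setpriority(PRIO_PROCESS, 0, wanted_nice) != 0) {
        perror("setpriority");
        return false;
    }
    return true;
}
```

A benchmark could call `try_raise_priority(-20)` once at startup and ignore the result. Without `sudo` the call would typically fail with a permissions error, and the run would simply proceed at normal priority.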
What `make && make run` will do is use the test file to compile and run two
different programs, one of which will use a Unix pipe to pass data from one
program to the other. The other one will use a circular buffer in shared memory,
kind of like I proposed in “Computing Is Broken.”
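For readers who have not seen one, a circular buffer in shared memory can be sketched roughly as follows. This is not the code from the test file. It is a minimal single-producer, single-consumer sketch, and the names `struct ring`, `ring_create`, `ring_push`, `ring_pop`, and the capacity `RING_SLOTS` are all assumptions made for illustration.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define RING_SLOTS 1024u /* hypothetical capacity; must be a power of two */
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0, "RING_SLOTS must be a power of two");

/* One producer process writes, one consumer process reads. */
struct ring {
    _Atomic uint64_t head; /* count of values written; stored only by the producer */
    _Atomic uint64_t tail; /* count of values read; stored only by the consumer */
    uint64_t slots[RING_SLOTS];
};

/* Map a ring into memory that stays shared with children after fork(). */
struct ring *ring_create(void)
{
    void *p = mmap(NULL, sizeof(struct ring), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    struct ring *r = p;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    return r;
}

/* Producer side. Returns false if the ring is full. Makes no system call. */
bool ring_push(struct ring *r, uint64_t value)
{
    uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1)] = value;
    /* Release: the slot write becomes visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the ring is empty. Makes no system call. */
bool ring_pop(struct ring *r, uint64_t *out)
{
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SLOTS - 1)];
    /* Release: the slot is free for reuse only after we have read it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

In a setup like this, a parent would call `ring_create()` before `fork()`, then one process would loop on `ring_push` and the other on `ring_pop`. The point of the comparison is that neither call enters the kernel, whereas each `write` and `read` on a Unix pipe is a system call.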
When you run it, you should see output like this:
./shmem
SHMEM Time: 0.000000007089606793
./kernel
KERNEL Time: 0.000000441069553283
The `SHMEM Time` is the average time per iteration when using shared memory
(circular buffer). The `KERNEL Time` is the average time per iteration when
using Unix pipes.
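An "average time per iteration" of this kind can be computed along the following lines. This is only a sketch under assumptions: it is not the timer library mentioned above, the names `elapsed_ns` and `average_seconds` are hypothetical, and it assumes the two readings come from `clock_gettime(CLOCK_MONOTONIC, ...)` taken before and after the loop.

```c
#include <stdint.h>
#include <time.h>

/* Elapsed nanoseconds between two clock readings (end must not precede start). */
int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Average seconds per iteration, given the total time for the whole loop. */
double average_seconds(int64_t total_ns, uint64_t iterations)
{
    return (double)total_ns / 1e9 / (double)iterations;
}
```

Printing the result with a format such as `"%.18f"` would give 18 digits after the decimal point, the same shape as the sample output above.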
When I ran these tests on an `x86_64` Gentoo Linux install, the shared memory
version was always more than 50 times faster than the pipe version. Sometimes,
it was as much as 64 times faster.
That was an incredible result, far above what I was expecting.
The performance difference is so large that I am actually confident that if hardware were to implement such a circular buffer, with more protections than the shared memory here (such as atomicity), a microkernel implemented on top of that hardware would be faster, maybe much faster, than the equivalent monolithic kernel.
But as always, I could be wrong. If I am, please tell me.