Please see the disclaimer.
The other day, I decided to test a theory of mine. I said, in “Computing Is Broken and How to Fix It,” that even with a microkernel, we could reduce the interaction with the kernel (and increase performance) if the hardware supported pipes (circular buffers) for passing data between processes.
I decided to test this theory with a crude test on Linux, making use of shared memory between two processes.
I admit that I was scared. This is an idea that I have staked a lot on, so I wanted it to do well. And I wasn’t sure it would.
I needn’t have worried.
In order to reproduce my results, you will need these three files:
- Makefile. This will build and run everything for you.
- The actual test code. This file requires certain #defines to work properly. The Makefile takes care of these.
- A small timer library I use in my tests. The test file already includes this file, but it needs to be there or you will get compile errors.
With these files, you should be able to just do make && make run, and it will do it all for you. However, I ran my tests like this:

    export CC=clang
    export CFLAGS="-flto -O3 -DNDEBUG"
    make
    sudo make run
sudo is there because the programs attempt to reduce their nice value
to give themselves higher priority. If that fails, they will continue regardless, so
sudo is optional.
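As a rough illustration of that "try, but carry on if it fails" behavior, here is a minimal sketch using the POSIX nice() call. It is not taken from the test file; the function name try_raise_priority and its delta parameter are made up for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: try to lower the nice value by `delta`.
 * Returns true on success, false if the kernel refused. */
static bool try_raise_priority(int delta)
{
    assert(delta >= 0);

    /* nice() can legitimately return -1 as the new nice value,
     * so errno is the only reliable failure signal. */
    errno = 0;
    if (nice(-delta) == -1 && errno != 0) {
        perror("nice");   /* report it, but let the caller carry on */
        return false;
    }
    return true;
}
```

A program would call this once at startup, for example (void)try_raise_priority(10);, and keep going whatever the result. Lowering the nice value usually requires root, which is why running without sudo only costs the priority boost.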
What make && make run does is use the test file to compile and run two
different programs. One uses a Unix pipe to pass data from one
process to the other. The other uses a circular buffer in shared memory,
much like the one I proposed in “Computing Is Broken.”
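To give a feel for what a circular buffer in shared memory can look like, here is a minimal single-producer, single-consumer sketch in C11. It is an assumption-laden illustration, not the actual test code: the names ring_t, ring_create, ring_push, and ring_pop, the capacity RING_CAP, and the choice of an anonymous shared mmap() are all mine.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical capacity; a power of two lets indices wrap with a mask. */
#define RING_CAP 1024u

/* Single-producer, single-consumer ring of 64-bit values. */
typedef struct {
    _Atomic uint32_t head;   /* next slot to write; only the producer stores it */
    _Atomic uint32_t tail;   /* next slot to read; only the consumer stores it */
    uint64_t slots[RING_CAP];
} ring_t;

/* Map the ring into anonymous shared memory so a fork()ed child sees it. */
static ring_t *ring_create(void)
{
    static_assert((RING_CAP & (RING_CAP - 1)) == 0,
                  "RING_CAP must be a power of two");
    void *p = mmap(NULL, sizeof(ring_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    ring_t *r = p;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    return r;
}

/* Producer side: returns false if the ring is full. No system call. */
static bool ring_push(ring_t *r, uint64_t v)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAP)
        return false;
    r->slots[head & (RING_CAP - 1)] = v;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. No system call. */
static bool ring_pop(ring_t *r, uint64_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_CAP - 1)];
    /* Release: the slot read finishes before the slot is handed back. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

In a two-process setup, the parent would call ring_create() before fork(), then one side loops on ring_push() and the other on ring_pop(). Once the mapping exists, neither side enters the kernel to move data, which is the whole point of the comparison with a pipe.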
When you run it, you should see output like this:
    ./shmem
    SHMEM Time: 0.000000007089606793
    ./kernel
    KERNEL Time: 0.000000441069553283
SHMEM Time is the average time per iteration when using shared memory (circular buffer). KERNEL Time is the average time per iteration when using Unix pipes.
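For anyone curious how an average time per iteration can be measured, the sketch below times a loop with clock_gettime() and divides by the iteration count. It does not reproduce my timer library; elapsed_seconds, average_time, and the stand-in workload bump are hypothetical names used only here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two timespec readings. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Run `op` `iters` times and return the mean wall-clock time per call. */
static double average_time(void (*op)(void *), void *arg, size_t iters)
{
    assert(iters > 0);
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < iters; i++)
        op(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_seconds(start, end) / (double)iters;
}

/* Stand-in workload: increments a counter through a volatile pointer
 * so the compiler cannot optimize the loop away. */
static void bump(void *arg)
{
    (*(volatile unsigned long *)arg)++;
}
```

In the real test, the operation being timed would be one send and receive through either the pipe or the circular buffer. Dividing the two sample averages shown above (KERNEL Time by SHMEM Time) gives a ratio of roughly 62.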
When I ran these tests on an x86_64 Gentoo Linux install, the shared memory version was always more than 50 times faster than the pipe version. Sometimes, the gap was as great as 64 times.
That was an incredible result, far above what I was expecting.
The performance difference is so large that I am actually confident of this: if hardware were to implement such a circular buffer, one that provides more protections than the shared memory here (atomicity, for example), a microkernel implemented on top of that hardware would be faster, maybe much faster, than the equivalent monolithic kernel.
But as always, I could be wrong. If I am, please tell me.