The other day, I decided to test a theory of mine. I said, in “Computing Is Broken and How to Fix It,” that even with a microkernel, we could reduce the interaction with the kernel (and increase performance) if the hardware supported pipes (circular buffers) for passing data between processes.

I tested it with a crude benchmark on Linux, using shared memory between two processes.

I admit that I was scared. This is an idea that I have staked a lot on, so I wanted it to do well. And I wasn’t sure it would.

I needn’t have worried.

In order to reproduce my results, you will need these three files:

  1. The Makefile. This will build and run everything for you.
  2. The actual test code. This file requires certain #defines to work properly; the Makefile takes care of these.
  3. A small timer library I use in my tests. The test file #includes it, so it must be present or you will get compile errors.

With these files, you should be able to just run make && make run, and it will build and run everything for you. However, I ran my tests like this:

export CC=clang
export CFLAGS="-flto -O3 -DNDEBUG"
make
sudo make run

The sudo is there because the programs try to lower their nice value to get higher scheduling priority. If that fails, they continue regardless, so sudo is not strictly needed.

make && make run compiles the test file into two different programs and runs them. One, kernel, uses a Unix pipe to pass data from one process to another. The other, shmem, uses a circular buffer in shared memory, much like what I proposed in “Computing Is Broken.”

When you run it, you should see output like this:

./shmem
SHMEM Time:  0.000000007089606793
./kernel
KERNEL Time: 0.000000441069553283

The SHMEM Time is the average time per iteration when using shared memory (circular buffer). The KERNEL Time is the average time per iteration when using Unix pipes.

When I ran these tests on an x86_64 Gentoo Linux install, the shared-memory version was always more than 50 times faster than the pipe version, and sometimes as much as 64 times faster.

That was an incredible result, far above what I was expecting.

The performance difference is so large that I am actually confident of this: if hardware implemented such a circular buffer, with more protections than the shared memory here (atomicity, for example), a microkernel built on that hardware would be faster, maybe much faster, than the equivalent monolithic kernel.

But as always, I could be wrong. If I am, please tell me.