Assumed Audience: Anyone interested in OS design.

Epistemic Status: Confident, but without time.

Introduction

I feel myself being pulled away from the programming world.

This is why I wrote my last two posts: even though I worded them as though I intend to work on Yao, my language, I’m actually not sure I will ever do so.

And that’s despite the fact that Yao already exists in a primitive state and works!

But I want to commit my ideas to “paper” so that others can follow after me and do better.

So this post is yet another one laying out design ideas, this time for an operating system.

The Problem

The problem with operating systems today is that they have poor security.

What makes good security? Isolation, capabilities, and as little code as possible. Current mainstream operating systems fall short on all three.

The Solution

The best kind of OS would create pits of success for the developers targeting it: the easy way to use an interface would also be the correct way. Current operating systems make it far too easy to fail.

Process Creation

Let’s take process creation.

I previously said this about process creation:

To pick the same example from both, I hate both fork() and CreateProcess().

Researchers from Microsoft Research, Boston University, and ETH Zurich wrote “A fork() in the Road” describing the problems with fork(), and they are right: fork() is too simple, it doesn’t scale, it is inefficient, and error handling becomes next to impossible.

(In my code, I have the child process return exit codes from 255 on down for error handling. It assumes, probably wrongly but right enough, that most programs won’t use those exit codes.)
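A minimal sketch of that exit-code convention, assuming a POSIX system and Python. The helper name spawn and the specific code assignments (255, 254) are made up for illustration and are not taken from the code mentioned above.

```python
import os

# Hypothetical reserved codes, counting down from 255.
EXEC_FAILED = 255   # exec itself failed (e.g., program not found)
SETUP_FAILED = 254  # a setup step before exec failed


def spawn(argv, cwd=None):
    """Fork, optionally chdir, exec argv in the child; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: the exit code is the only channel back to the parent.
        code = SETUP_FAILED
        try:
            if cwd is not None:
                os.chdir(cwd)
            code = EXEC_FAILED
            os.execvp(argv[0], argv)
        finally:
            # Reached only if something above raised;
            # a successful exec never returns.
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


result = spawn(["/nonexistent/program"])
if result == EXEC_FAILED:
    print("child could not exec")
```

The weakness is visible right in the sketch: a program that legitimately exits with 255 looks exactly like a failed exec.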

But CreateProcess() has the exact opposite problem: it’s too complex and limited because it takes a large, fixed set of arguments. It takes many lines of code to set up for it, and once you actually start the process, you have no control over it, which sucks if you need to do something with the new process that CreateProcess() cannot do on creation.

Windows does have ways of modifying running processes, which is great and sort of makes up for the limits, but heaven help you if you need to use one of those functions on the new process because you have no control. The only thing you can do is to suspend the start thread of the new process right away, do your thing to it, and resume the thread, praying that the new process hadn’t created a new runaway thread before it was suspended.

(And all of that is not even mentioning that the Windows way of passing a command-line is to pass a string to be parsed, not a list of strings. My code has a function literally to take a list of strings and turn it into a Windows-compatible command-line string with all of the juicy backslashing that implies.)
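To give a feel for that backslashing, here is a sketch of such a function in Python. It assumes the quoting rules used by the Microsoft C runtime when it splits a command line back into arguments (backslashes are only special in front of a double quote) and does nothing about cmd.exe metacharacters. The names quote_arg and to_command_line are hypothetical.

```python
def quote_arg(arg):
    """Quote one argument so the C runtime parses it back unchanged."""
    if arg and not any(c in arg for c in ' \t\n\v"'):
        return arg  # nothing special: pass through as-is
    out = ['"']
    backslashes = 0
    for ch in arg:
        if ch == '\\':
            backslashes += 1  # hold them until we see what follows
        elif ch == '"':
            # Double the pending backslashes, then escape the quote.
            out.append('\\' * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            # Backslashes not followed by a quote stay literal.
            out.append('\\' * backslashes + ch)
            backslashes = 0
    # Double trailing backslashes so they do not escape the closing quote.
    out.append('\\' * (2 * backslashes) + '"')
    return ''.join(out)


def to_command_line(args):
    """Join a list of strings into one Windows-style command-line string."""
    return ' '.join(quote_arg(a) for a in args)


print(to_command_line(['prog', 'a b', 'say "hi"', 'C:\\dir name\\']))
```

Python's standard library carries a similar helper, subprocess.list2cmdline, for the same reason.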

The right API is in the middle: a zero-argument function to create a new, blank process (not a copy of the current process), but to create it in a suspended state, so you know it’s not going to run away from you.

Then, you use Windows-style functions to change the process, before it even starts, to set it up. Once you’re done, unsuspend it.

That end result gives you as much power as fork(), with the better scalability of CreateProcess(), and better ease-of-use than both. You could even have functions to map in a copy of the current process if you so wish, to implement fork() for process checkpointing!

In other words, talented programmers can implement things and make them work, yes, but it doesn’t imply that they are good at design.

My design is a pit of success; the error handling is easier because the programmer doesn’t have to do anything for a process that doesn’t start because an unstarted process is just reaped.
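The create-suspended, configure, resume flow from this section can be modeled as a toy in Python. This is only a sketch of the shape of the API, not a real kernel interface: every name in it (process_create, load_image, set_args, grant, resume, reap) and the example paths are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class State(Enum):
    SUSPENDED = auto()
    RUNNING = auto()
    REAPED = auto()


@dataclass
class Process:
    # Every name here is hypothetical; this models the flow, not a kernel.
    state: State = State.SUSPENDED
    image: Optional[str] = None
    args: list = field(default_factory=list)
    caps: dict = field(default_factory=dict)

    def _configurable(self):
        if self.state is not State.SUSPENDED:
            raise RuntimeError("can only configure a suspended process")

    def load_image(self, path):
        self._configurable()
        self.image = path

    def set_args(self, args):
        self._configurable()
        self.args = list(args)  # a list of strings, not one string to parse

    def grant(self, label, capability):
        self._configurable()
        self.caps[label] = capability

    def resume(self):
        self._configurable()
        if self.image is None:
            raise RuntimeError("nothing to run: no image loaded")
        self.state = State.RUNNING

    def reap(self):
        self.state = State.REAPED


def process_create():
    """Zero arguments: a blank process, suspended, not a copy of the caller."""
    return Process()


child = process_create()
child.load_image("/bin/editor")
child.set_args(["editor", "notes.txt"])
child.grant("docs", "dir:/home/gavin/docs")
child.resume()
```

If any setup step fails, the caller just reaps the still-suspended process; nothing has run, so there is nothing to clean up after.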

Process Trees

While we are talking about processes, they should be treated like threads in structured concurrency: parent processes do not get reaped until their child processes are done. And when a process is reaped, its whole process tree is reaped.

This would get rid of a lot of race conditions with process creation and destruction.
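Those two rules can be sketched as a small Python model. It is a toy under assumptions, not a scheduler: the class Proc, its exit and kill methods, and the reap_log list are all hypothetical names.

```python
reap_log = []  # order in which processes were actually reaped


class Proc:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.exited = False
        self.reaped = False
        if parent is not None:
            parent.children.append(self)

    def exit(self):
        """The process finished; it is reaped only once its children are."""
        self.exited = True
        self._try_reap()

    def kill(self):
        """Reap the whole subtree, children first."""
        for child in list(self.children):
            child.kill()
        self.exited = True
        self._try_reap()

    def _try_reap(self):
        if (self.exited and not self.reaped
                and all(c.reaped for c in self.children)):
            self.reaped = True
            reap_log.append(self.name)
            if self.parent is not None:
                # The parent may have exited already and be waiting on us.
                self.parent._try_reap()


shell = Proc("shell")
build = Proc("build", parent=shell)
cc = Proc("cc", parent=build)

build.exit()   # build is done, but cc is still running: not reaped yet
cc.exit()      # now cc is reaped, and build right after it
shell.kill()   # reaps whatever is left of the tree
print(reap_log)
```

The parent never disappears out from under a child, which is where many of the races come from today.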

Process IDs

Process IDs should be capabilities, just like everything else.

Waitable Capabilities

Every capability should be waitable in poll() or whatever the new OS uses.

This means that the OS should make it possible to wait on processes (Windows got this right), asynchronous I/O, signals, and basically all other resources.

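Also, waiting should be O(1) and level-triggered: with level-triggered waiting, a capability keeps reporting ready for as long as it has something pending, so a missed wakeup does not mean lost data.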
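Level-triggered behavior can be observed on current systems with Python's selectors module, which uses the platform's poll-style mechanism in its default level-triggered mode. This sketch uses a socket pair as a stand-in for a waitable capability; it shows the semantics only and says nothing about how the new OS would implement them.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ)

writer.send(b"hello")

# Level-triggered: the reader is reported ready on every wait
# for as long as the unread data is still there.
first = sel.select(timeout=1)
second = sel.select(timeout=1)

data = reader.recv(1024)        # consume the pending data

# Nothing is pending anymore, so a non-blocking wait reports nothing.
third = sel.select(timeout=0)

print(len(first), len(second), data, len(third))

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```

An edge-triggered wait would have reported readiness once, at the moment the data arrived, and a program that missed it would never hear about that data again.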

Signals

Signals should be waitable, and there should be at least one kind that can actually interrupt execution (for CPU-heavy workloads).

However, the interrupt should only be delivered at a point where the interrupt is safe, i.e., no mutexes are held or anything like that.
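One way to picture safe-point delivery is a per-task queue of pending interrupts that is only drained when no locks are held. The following Python model is a sketch under that assumption; Task, lock, unlock, and interrupt are hypothetical names, and a real kernel would track far more than a lock count.

```python
class Task:
    def __init__(self):
        self.locks_held = 0
        self.pending = []     # interrupts that arrived at an unsafe point
        self.delivered = []   # interrupts the task has actually seen

    def lock(self):
        self.locks_held += 1

    def unlock(self):
        self.locks_held -= 1
        self._deliver_if_safe()  # releasing the last lock is a safe point

    def interrupt(self, sig):
        self.pending.append(sig)
        self._deliver_if_safe()

    def _deliver_if_safe(self):
        # Safe means no mutexes are held; otherwise keep the signal queued.
        if self.locks_held == 0:
            while self.pending:
                self.delivered.append(self.pending.pop(0))


task = Task()
task.lock()
task.interrupt("INT")                 # arrives while a mutex is held
held_back = list(task.delivered)      # nothing delivered yet
task.unlock()                         # safe point reached: delivered now
print(held_back, task.delivered)
```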

Trees

A theme for this OS design is that capability trees are great, and we can restrict capabilities by only giving access to subtrees.

And the first kind of tree: directories.

Processes should not have access to all directories and files. Instead, a process should be given access to a set of directories under certain labels.

Windows actually got this sort of right by putting directories under drive names, not some nebulous root directory that has no meaning anymore because of multiple mounts.

As an example, the ssh program should only be given access to the ssh config directory and the ssh config directories for every user.

A word processor should have no access to files at all, but when the user tries to open one, the OS should provide a file picker, then give access to just the file that the user opens.
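A rough sketch of labeled directory grants, in Python. The Namespace class, the label:path syntax, and the example directories are assumptions for illustration. The string checks stand in for what would really be kernel-held directory capabilities, and they ignore things like symlinks.

```python
import posixpath


class Namespace:
    """Maps labels to the only directory subtrees a process may touch."""

    def __init__(self, grants):
        self.grants = dict(grants)  # label -> absolute, normalized directory

    def resolve(self, path):
        """Turn 'label:relative/path' into a real path or refuse."""
        label, sep, rel = path.partition(":")
        if not sep or label not in self.grants:
            raise PermissionError(f"no capability for {path!r}")
        root = self.grants[label]
        full = posixpath.normpath(posixpath.join(root, rel))
        # Reject anything that climbs out of the granted subtree.
        if full != root and not full.startswith(root.rstrip("/") + "/"):
            raise PermissionError(f"{path!r} escapes its subtree")
        return full


# Hypothetical grants for an ssh-like program.
ssh_ns = Namespace({"config": "/etc/ssh", "user": "/home/gavin/.ssh"})

print(ssh_ns.resolve("config:ssh_config"))
try:
    ssh_ns.resolve("config:../passwd")
except PermissionError as err:
    print("denied:", err)
```

There is no root directory in this model at all; a path without a granted label simply does not name anything.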

Another example of a capability tree is users: there could be multiple “root” users, and they could have sub users.

Say gavin is a root user; I could also have gavin.firefox just to run Firefox and gavin.dev for my development environment.
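With dotted names like these, the tree relationship is just a prefix check. A tiny Python sketch follows; the function names are hypothetical, and the rule that a user has authority over exactly its own subtree is an assumption about how such a tree would be used.

```python
def parent_of(user):
    """Return the parent user, or None for a root user."""
    head, sep, _ = user.rpartition(".")
    return head if sep else None


def is_within(user, ancestor):
    """True if user is ancestor itself or one of its sub users."""
    return user == ancestor or user.startswith(ancestor + ".")


print(parent_of("gavin.firefox"))
print(is_within("gavin.dev", "gavin"))
print(is_within("gavin", "gavin.firefox"))
```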

Isolation

Having multiple root users means that there is not one root user that can do everything, so it’s safer. But those root users are still important, and the only thing that should run under each of them is its init.

Because yeah, why not have multiple inits too? And multiple virtual machines?

If this is sounding like Qubes OS, that’s intentional.

Microkernel

Oh, and the OS should be built on a microkernel; the privileged part of an OS should have as little code as possible because it’s extremely bad to have bugs there.

Conclusion

Okay, I get it; this post meandered and didn’t really have a form, but it’s late at night, and I’m running out of motivation. Like I said, I may leave programming behind.

So this is the best I can do.

But I did cover the important things.

Finally, if you have questions about pits of success, just remember: question the interfaces. Before you commit to an API, think about use cases and make sure all of those use cases are easy.

That’s it!

I’m sorry I couldn’t write more. I have some more notes, but I haven’t spent tons of time on this.

The best way to get back at me would be to do better and actually implement your better design. So please do that.