Please see the disclaimer.

Assumed Audience: Programmers. All programmers.

Discuss on Hacker News and Reddit.

Epistemic Status: Confident with few doubts. Minor mistakes may exist, though. Corrections welcome.

If you want to skip to the actual material, use this link.

This post is meant to be informative, but it does have an ad at the end, which will be clearly marked. You have been warned!


I used to think that my Bachelor’s Degree in Computer Science was useless, that I didn’t learn much that would help me in programming.

I was wrong.

There was at least one class that was well worth it: Theory of Computation.

I also consider a few others worth it:

  • The class about linear algebra.
  • The class about discrete math.
  • The class about different kinds of algorithms.
  • The classes on computer graphics.
  • The class on cryptography. I wish they had taught us more about elliptic curves.
  • A class I audited about signals processing. I wish I had had the time to take this class for real.

I have used material from all of these classes. I encourage programmers to study these if they haven’t.

When I showed up to the Theory of Computation class, I was solidly unhappy. I was certain that this would be a worthless class; I held the opinion that theory was practically meaningless.

And then I showed up to class and found out that it was really an applied math class; there was little theory and a lot of proofs.

I struggled. I hated it.

But I came out of that class with two things: a problem that annoyed me and an understanding of what computers cannot do.

The problem that annoyed me is about the soft limits of what computers can do; it turns out that there are some algorithms that can be quick most of the time, but can also be impossibly slow in certain cases.

And as far as we know, those algorithms are optimal.

It makes sense that there are some things that are so expensive to compute that they are not worth it.

I had always assumed there were, but it annoyed me that we didn’t even know if that limit existed.

So computers may have soft limits.

Surely, they won’t have hard limits?

Surprisingly, we know the answer to this question: they do!

In other words, there are some problems computers cannot solve, even if given infinite time and space.

When I attended that class, this blew me away. I had never considered that there were things computers could never do.

I still had the mainstream belief that AI would be just as capable as humans.

Why I now believe otherwise, and how people can measure AI “intelligence,” is a subject for another post.

Well, time passed, and I graduated. While I knew those hard limits were important, I didn’t think I would run into them.

I was wrong again; I run into them all the time!

Beyond that, I have run into a slew of programmers, on Hacker News and other forums, that seem to not understand these hard limits and what they mean.

An example is this exchange, started by a user asking me to explain why I don’t like Zig.

So because I believe that these hard limits are essential to our programming, here is a free class on the most important programming knowledge that programmers don’t understand.

I may not use completely mainstream terminology in order to make this more accessible. I may also skip a lot of formality and some nuance.

The Entscheidungsproblem

We start by going back to the 1930s. At King’s College, Cambridge, there is an exceptionally bright student named Alan Turing working on his Master’s course and writing a paper.

Unknown to everyone at the time, this brilliant man would someday save the British Empire, and maybe Western Civilization.

Also unknown to everyone, this paper would be the most important scientific result of the 20th century!

And that is despite the fact that Turing was beaten to the punch!

The man who beat him was Alonzo Church, and their results were proven equivalent.

Soon after finishing his paper, but before it was published, Turing went to Princeton to study under his rival.

What a marvelous partnership!

They used very different models. Church used something he called the “lambda calculus,” and Turing used hypothetical machines that came to be called “Turing Machines.”

But what problem were they tackling?

It’s called the Entscheidungsproblem, a name that, to this day, I can’t remember how to spell or say.

It means “decision problem” according to Wikipedia, and that agrees with what I know.

As far as I know, this is the same “decision” in the “decision” problems that make up the P and NP complexity classes.

This is also why “computability” is called “decidability”. I just use “computable” and “computability” for accessibility; all of the literature uses “decidable” and “decidability.”

This “decision problem” was defined as giving a (mathematical) statement to an algorithm and having that algorithm decide whether that statement was true or not.

This was a big deal at the time; mathematicians would have loved to prove things automatically.

And then along came Alonzo Church and Alan Turing, who proved that solving the Entscheidungsproblem is impossible.

Well, there goes that dream.

But if Alonzo Church proved it first, why is Alan Turing the central character of this story?

Because there is one aspect of Turing’s model that makes it more useful: Universal Turing Machines (UTMs).

Universal Turing Machines

The idea of UTMs is that you can have Turing Machines running other Turing Machines.

This idea is so basic to computer science and the world in general that this is the biggest reason I say that Turing’s result, while later than Church’s, is still more important!

How important is it?

Without this concept, would we understand the idea of interpreters?

You run interpreters every day, even if you don’t know it. Python is an interpreter. Your shell is an interpreter. Your text editor is running interpreters. Your web browser runs an interpreter.

It goes even further!

It means that Turing Machines can take other Turing Machines as input and do stuff to them. So even compilers depend on this concept!

And it goes even deeper!

Say you are running a Python program that is implementing a DSL. How many layers of compilers and interpreters are you running?

  1. Your Python program is an interpreter of the DSL.
  2. Python itself is an interpreter of your Python program.
  3. Python is implemented in C, which is implemented by a C compiler.
  4. The Python interpreter, written in C, has been translated to machine code, and your hardware’s microcode acts as a compiler, translating that machine code into something even closer to the hardware.
  5. The microcode itself is implemented by actual hardware, which acts as an interpreter of the translated code.
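To make the top of that tower concrete, here is a minimal sketch in Python: a made-up stack DSL (the opcodes push, add, and mul are invented for illustration) being interpreted by a Python program, which is itself interpreted by CPython, and so on down.

```python
# A toy stack-machine DSL, interpreted by Python, which is in turn
# interpreted by CPython, which is compiled C, which is machine code,
# which the hardware implements. Interpreters and compilers, layer
# after layer.
def run_dsl(program):
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":
            stack.append(stack.pop() * stack.pop())
    return stack.pop()

# (2 + 3) * 4
result = run_dsl([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)])
```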

Yep. It’s that central; it goes all the way down to hardware.

Imagine how many levels deep you can get running your Python program in the web browser!

The Halting Problem

So what did Alan Turing prove, exactly?

I’m going to let Mark Jago and Computerphile explain that with a wonderful animation.


Seriously, you should watch it. But if you don’t…

Turing constructed an equivalent problem to the Entscheidungsproblem, the Halting Problem. This problem asks whether any machine can figure out whether any other machine will halt or run forever.

Then he constructed a hypothetical machine to solve that problem.

Then he constructed another hypothetical machine that ran the first and basically reversed the answer of the first.

Then he “ran” the last machine using itself as input.

Basically, when run on itself, this machine would run forever if it halted, and it would halt if it ran forever.

Boom! Contradiction. The assumption must be false.

What is the assumption? The existence of the first hypothetical machine that can solve the Halting Problem.

And since no machine can solve the Halting Problem, no machine can solve the Entscheidungsproblem.
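The whole construction fits in a few lines of Python. This is an illustrative sketch, not the formal proof: `naive_decider` stands in for any claimed halting decider, and `make_trouble` builds the machine that reverses its answer.

```python
def make_trouble(halts):
    # Build a program that does the opposite of whatever `halts`
    # predicts about it.
    def trouble():
        if halts(trouble):
            while True:      # predicted to halt? Then loop forever.
                pass
        return "halted"      # predicted to loop? Then halt.
    return trouble

def naive_decider(program):
    return False             # a (bad) decider: claims nothing ever halts

t = make_trouble(naive_decider)
result = t()  # halts immediately, so the decider was wrong about t
```

No matter what decider you hand to `make_trouble`, the resulting program does the opposite of the prediction, so no correct decider can exist.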

By the way, Turing’s trick of using a paradox to eliminate the possibility that any machine could solve the Halting Problem is exactly why I think that proving the existence of one-way functions might be easier than solving P vs. NP. I think such a contradiction might exist.


That may have been all Turing proved, but people absolutely built on the shoulders of that giant and proved stronger results.

This was “easy” because the other major thing about UTMs is how well they generalize to proving that just about any property of an algorithm is not computable in the general case. (This generalization is known as Rice’s theorem.)

You want to know that a function is never called? Nope.

You want to know that a line of code is never executed? Sorry.

You want to ensure a red function is never passed to a blue function? Futile.

You want to check if a program is a virus? Fruitless.

Yep, virus scanners are almost completely useless. They can only detect known viruses, and only if those viruses are not changed.

You want to ensure a certain kind of bug never exists in your program? Impossible.

This is crucial to emphasize: The true importance of Turing-completeness is that there is at least one property that cannot be proven!

You may be able to prove some properties of a program, but if you cannot prove them all, that is the fault of Turing-completeness. And sometimes, you may be able to prove a property you care about, but if you cannot prove it every time, that is the fault of Turing-completeness.

“But Gavin, we have safe languages! They ensure that memory bugs never exist, right?”

No, they just crash if one exists. They are using runtime checks to stop a memory bug from happening. The bug still existed.

More on the power of runtime checks later.
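A tiny Python illustration of the point (Python standing in for any memory-safe language): the logic bug is still present; the runtime check just turns it into a clean, detectable failure instead of silent corruption.

```python
def get_tenth(items):
    # The bug (asking for an index that doesn't exist) is still in the
    # program; the bounds check only controls what happens when it fires.
    return items[10]

xs = [1, 2, 3]
try:
    get_tenth(xs)
    outcome = "no error"
except IndexError:
    outcome = "caught out-of-bounds access"
```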

This generalization is pervasive and pernicious. Without assumptions (that are often wrong), proving the correctness of programs is hopeless.

You can prove certain properties if you make assumptions, like the seL4 project did, but if any one of those assumptions is false, your entire proof is garbage.

Over time, people began to realize the importance of what Church and Turing had proved and just how pervasive the concept would be.

So they gave it a name in honor of the man that fought and won a war with just numbers and his piercing intellect.

They called it “Turing-complete.”


To be Turing-complete, a thing needs to be able to accomplish what any other Turing-complete thing can do.

Yes, that definition is circular, but we can adjust it.

Universal Turing Machines are Turing-complete. Church’s lambda calculus is Turing-complete.

So we can change the definition to this:


An adjective that applies to a thing that can do anything a Universal Turing Machine can do.


But that brings up a question: what exactly can a UTM do?

Let’s first describe what a UTM is.

A UTM has four things: a set of states and a way to transition between them (the program), an unbounded tape with distinct slots (memory), something to read and write symbols in a slot on the tape (reading and writing memory), and something to move the tape.

Okay, cool. What it is also tells us what it can do: it can read and write stuff on a tape and move that tape.

But there’s a missing detail: how can it move the tape?

Most people would assume that it can move the tape both backwards and forwards, and they are correct.

It turns out that moving the tape in both directions is important! Without it, a Turing machine would not be Turing-complete!

Understanding this was a mind-blowing moment for me in class.

And it goes even further: the machine must be able to move the tape backwards and forwards without any restriction. If it can only move the tape 5 spots backwards before it must move the tape at least 5 spots forward, it is no longer Turing-complete.
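Here is a minimal sketch of those pieces in Python (the transition-table encoding is my own, invented for illustration): a sparse dict plays the unbounded tape, and the head moves freely in both directions.

```python
def run_tm(program, tape_input, start, accept, max_steps=10_000):
    tape = dict(enumerate(tape_input))  # the unbounded tape (sparse)
    head, state = 0, start
    while state != accept and max_steps > 0:
        symbol = tape.get(head, "_")            # read the current slot
        if (state, symbol) not in program:
            break                               # no transition: halt
        new_symbol, move, state = program[(state, symbol)]
        tape[head] = new_symbol                 # write the slot
        head += 1 if move == "R" else -1        # move freely either way
        max_steps -= 1
    return tape, state

# A machine that flips every bit and halts at the first blank:
flip = {
    ("scan", "0"): ("1", "R", "scan"),
    ("scan", "1"): ("0", "R", "scan"),
    ("scan", "_"): ("_", "R", "done"),
}
tape, state = run_tm(flip, "0110", "scan", "done")
```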

Put positively, this means that Turing-completeness only needs four simple things:

  • Memory;
  • An ability to read anywhere in memory;
  • An ability to write anywhere in memory;
  • A program that can leverage those abilities.

“That’s nice, Gavin, but what does that mean in practice?”

In practice, assuming that reading and writing memory is possible, if your compiler or interpreter has, or can implement, just two keywords, then it is Turing-complete.

What are these two keywords?

In programming languages, they are usually if and while.

Yep. That’s it.

if gives you the ability to move forwards any amount.

while gives you the ability to move backwards any amount, and unlike restricted forms of for, it doesn’t prevent you from moving backwards after a certain amount of moves.

What I mean by “restricted forms of for” is for loops that only iterate over a fixed list or iterator, such as a foreach.

C’s for loop, because you can change the loop index, and/or use a condition that has nothing to do with the loop index, is unrestricted, and therefore, is equivalent to a while loop with extra steps.

For me, this was astonishing. My teacher kept talking about how powerful Turing-completeness was; I expected that it would be difficult to implement it.

Nope. It just takes if and while. It’s so easy that people accidentally make things Turing-complete all the time.
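Brainfuck is the canonical demonstration: it is little more than if and while (both spelled with brackets) over a tape, yet it is Turing-complete. Here is a sketch of an interpreter for it, with a step budget so it always returns:

```python
def run_bf(code, max_steps=100_000):
    tape = [0] * 30_000                 # memory
    ptr = pc = 0
    stack, jump = [], {}
    for i, c in enumerate(code):        # match up the brackets
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i
    while pc < len(code) and max_steps > 0:
        c = code[pc]
        if c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "[" and tape[ptr] == 0:   # the "if": skip the loop
            pc = jump[pc]
        elif c == "]" and tape[ptr] != 0:   # the "while": jump back
            pc = jump[pc]
        pc += 1
        max_steps -= 1
    return tape

# 4 iterations of "add 4 to the next cell": leaves 16 in cell 1.
cells = run_bf("++++[>++++<-]>")
```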

I have an acquaintance. He made a calculator language for himself and tried to restrict it so that it wasn’t Turing-complete.

I asked him if it had if statements and while loops. He said yes, and I told him that it was Turing-complete.

His reply was classic. “horsefeathers. i accidentally a language”


Of course, it may seem that Turing-completeness should be shunned if it makes things impossible to prove.

Unfortunately, there are algorithms we want to run for which Turing-completeness is required. Even worse, it turns out that applies to most interesting algorithms.

This is the Faustian Bargain of Turing-completeness: you need it, even if you don’t want it.

Don’t believe me?

Well, the astute among you noticed that I have mentioned “power” twice and have talked about what machines can do.

Surely there is a power spectrum!

Yes, there is. And Turing-completeness is on that spectrum.

And at each level are two things: a type of (hypothetical) machine, and a type of language.

Obviously, Turing machines and Turing-complete languages are in the Turing-complete level.

If Turing-completeness is on that power spectrum, then surely there are things that are less powerful and more powerful, right?

Yes and no. Turing-completeness is as powerful as we can implement.

Sure, there are general recursive functions, as opposed to primitive recursive functions, but it turns out that general recursive functions correspond directly to Turing-completeness.

So what is at the next power level down?

The machines that implement that power level are called pushdown automata, and the languages are context-free languages.

If the machine is a deterministic pushdown automaton, it only needs a deterministic context-free language, but it turns out that the non-deterministic versions have more power than the deterministic ones.

Essentially, pushdown automata have a stack, and they can use the stack.

Crucially, though, they can only use the top of the stack.

The next level down has finite-state machines and regular languages.

These have a fixed set of states, and their memory consists solely of the states, their relationships, and the current state.

It turns out that there are deterministic and non-deterministic versions of finite-state machines as well, but they are actually equivalent.

And then, at the bottom, you have combinational logic. It is mostly uninteresting.

Consequences of Power

And why is combinational logic uninteresting? Because the same input always results in the same output.

There’s not much power in that, nor many consequences; you can design your combinational logic to always return the desired results.


Finite-state machines can do more stuff. They are useful for things like vending machines, turnstiles, traffic lights, elevators, and anything that is only in one fixed state at a time.

If a programmer is smart, he will turn his code into a finite-state machine whenever he can because finite-state machines can be proven correct.
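For example, a turnstile fits in a small transition table, and because the state space is finite, you can check properties of it exhaustively (a sketch; the states and events are illustrative):

```python
# A turnstile as an explicit finite-state machine.
TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}

def step(state, event):
    return TRANSITIONS[(state, event)]

# Finite states and events mean properties can be checked exhaustively:
STATES = {"locked", "unlocked"}
EVENTS = {"coin", "push"}
total = all(step(s, e) in STATES for s in STATES for e in EVENTS)
```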

Pushdown automata can recognize, well, context-free languages.

Context-free languages are useful in things like file formats; if a format is easy to parse, then it will likely have fewer bugs.

Pushdown automata can be tricky to prove correct, but as far as I know, it can be done.
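The classic context-free example is matched brackets. A sketch of the pushdown idea in Python: the only memory is a stack, and only its top is ever consulted.

```python
def balanced(text):
    stack = []
    closers = {")": "(", "]": "[", "}": "{"}
    for c in text:
        if c in "([{":
            stack.append(c)          # push an opener
        elif c in closers:
            # only the top of the stack is ever examined
            if not stack or stack.pop() != closers[c]:
                return False
    return not stack                 # accept only if the stack is empty
```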

And then there’s Turing machines.


As already mentioned, Turing-completeness is pervasive, so pervasive that it happens by accident.

This is no accident!

I said above that most interesting algorithms need Turing-completeness. Here are some examples:

  • Languages that are not context-free. Yep! Some languages need full power to parse!
  • Schedules. Any kind of scheduling, really.
  • Google Maps and how it finds the way to your destination.
  • Facebook and how it maps relationships between people.
  • Recommendation algorithms.
  • Compression of data. If you’ve ever watched video or listened to music, you have used this.
  • Newton’s method, useful for computing square roots, any kind of root, and lots of other nice things.
  • Gradient descents. You like AI? At its heart is gradient descent.
  • Internet search. Google, Bing, etc.
  • Image rendering. This happens in every movie with CGI.
  • Physics simulation. This also happens in every movie with CGI.
  • Any algorithm to make sure computers agree.
  • Laying out text on a sheet of paper, like in your word processor.
  • Processing changes to a spreadsheet.
  • Video games. Every video game literally has a while loop at the top. This loop renders every frame, simulates all physics per frame, emits sound, takes your input, everything.
  • Operating systems. Every operating system has a while loop at the top. This loop continuously dispatches processes, gives out resources, and takes them back, until the user shuts down.

You get the idea; Turing-completeness is necessary for anything interesting enough to not have fixed processing sizes.

And as we will see, fixed processing sizes can sometimes be effectively Turing-complete.

Static vs. Dynamic

Note that I said fixed processing sizes; if you have a fixed size, you’re often okay.

That is an example of something that is static.

For example, say you limit files for your newest app to 1 MB. Is that enough to work around Turing-completeness?

It might be, as long as you do a fixed amount of processing per unit of data, and that fixed amount multiplied by the amount of data does not take too long.

But here’s the kicker: anything that varies at runtime cannot be statically determined, in general. That’s why you need to have a fixed amount of processing. If one type of input causes your runtime to explode, you just ran into a Turing bomb.
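The Collatz sequence is a tiny, famous example of this: a three-line loop whose running time, as a function of its input, nobody has been able to bound, or even prove finite, for all inputs.

```python
def collatz_steps(n):
    # How many steps to reach 1? Nobody has proven this loop always
    # terminates, let alone bounded how long it takes for a given n.
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# Neighboring inputs can take wildly different numbers of steps.
a, b = collatz_steps(26), collatz_steps(27)
```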

And in general, things that can be completely statically determined are not as powerful as Turing-complete things.

Say you statically determined that your physics simulator will have a fixed processing time for every frame. Say you statically determined that it will only ever run for a fixed set of frames.

Would your physics simulator be as powerful as one that could run however long the user wanted? Of course not.

This is why one of the most reliable pieces of software in the world, SQLite, does not use static analysis.

Yup! In this era when Rust and its static analysis are all the rage, SQLite eschews those techniques. In fact, they say,

More bugs have been introduced into SQLite while trying to get it to compile without warnings than have been found by static analysis.

In other words, static analysis has been a net negative for SQLite.

Why is this?

I think one reason is that SQLite runs on different platforms, and a lot of them.

As said in that link, “formal verification requires a formal specification,” and building a formal specification for SQLite would be far too much work simply because all of the platforms are different.

Comptime vs. Runtime

That said, SQLite, and most reliable software I know about, love fuzzing.

Why do they love it? Because fuzzing is to runtime what static analysis is to compile time.

See, if you have something that’s Turing-complete, the only way to tell what it will do on certain input is to run it on that input!

That’s essentially what fuzzing is: finding as many “interesting” inputs as possible and seeing how the software acts with those inputs.

And even then, it might never halt; the Halting Problem bites again. This is why fuzzers use timeouts.

In fact, there is a company pushing what they call “invariant development as a service.”

Their technique? Fuzzing. And they explicitly chose it over formal verification, which is über static analysis.

And they used fuzzers they wrote to easily find bugs in programs that were found first by formal verification.

Someone even offered more context:

This article is targeted at proving programs that run on blockchain-based virtual machines. These are deterministic to their core. This is quite different from the environments that most programs run in. For example, every time code on the EVM runs in production, it is being run in parallel on hundreds to hundreds of thousands of systems, and the outputs/storage writes/events all have to agree exactly after execution, or bad things happen. The EVM is also single-threaded, reasonably well defined, and with multiple different VM implementations in many different languages. So programs here are much easier to verify.

In addition, program execution costs are paid per opcode executed, so program sizes range from hundreds of lines to about thirty thousand lines (with the high side being considered excessive and insane). It’s again quite different than the average execution environment.

danielvf on Hacker News

In short, this is the perfect environment for formal verification: small programs, fully deterministic behavior, and a hard requirement to get things right the first time. And fuzzers still beat formal verification.

Or would have, if they had been used.

“But isn’t the number of possible inputs for a program so large that fuzzing would be mostly useless?”

Ah, good question.

The answer is yes and no.

Yes, fuzzing can be useless in the naive case.

But in the hands of smart and experienced people like the SQLite authors, fuzzing can be guided to yield more useful inputs.
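Even the naive version is easy to sketch in Python (the function under test and its invariant are invented for illustration); blind random input generation trips the assert almost immediately:

```python
import random

def parse_age(text):
    # Toy function under test: an assert states the invariant.
    age = int(text)
    assert 0 <= age <= 150, "age out of range"
    return age

def fuzz(fn, trials=1_000, seed=1):
    random.seed(seed)                    # reproducible runs
    failures = []
    for _ in range(trials):
        candidate = str(random.randint(-10**6, 10**6))
        try:
            fn(candidate)
        except AssertionError:
            failures.append(candidate)   # found an invariant violation
    return failures

found = fuzz(parse_age)
```

Guided fuzzers do the same thing, but pick inputs that exercise new code paths instead of drawing blindly.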

And in the experience of at least one person, specifications needed for formal verification were far more complex than the code itself.

This holds with seL4, a formally proven operating system, which had about 10k LoC of C and around 200k LoC of proof.

Another way to make fuzzing better is asserts, which are notoriously useful before code is tested, especially while fuzzing.

“Notorious, Gavin?”

Absolutely. You will hate them because they will cause so many crashes. Or rather, a fuzzer will deliberately try to trip them and succeed.

But this is a good thing! It means:

  • Bugs are found closer to the actual source.
  • Bugs are more easily found because fuzzers detect crashes more easily than bugs that don’t cause crashes.
  • If you build software with assertions, then even if there are bugs, a lot of those bugs do not turn into bigger problems.
  • And finally, asserts constrain behavior in a way that culls the state space exponentially.

“Infinite” State

I want to talk about that last point.

Have you wondered how much your programs can do? Or rather, how many states they can be in while doing their work?

If you haven’t, it’s time to level up, so here’s a speedrun masterclass.

Every bit your program has in memory contributes to its state. Some contribute more than others. (For example, a bit that controls a branch contributes more than a plain old bit used for data.)

Every bit, or set of bits, that contributes a whole independent bit of state, meaning that it can control something, we’ll call a control bit.

Your program has 2^N states, where N is the number of control bits it has.

Yes, I’m handwaving cases where programs have more or fewer bits depending on input. Yes, sometimes bits may not be independent of each other.

Work with me here.

How big does N have to be before you can never hit every possible state?

First, let’s define never. You may think never means “infinite time and space.” Let’s shrink that ever so slightly to “the time and space in this universe.”

In other words, we can assume that we can use all of the energy in the universe, and all of the time, and all of the space. But we do not have an infinite supply.

Okay, with that out of the way, how far could we go with our universe? How big could N be before it’s impossible to hit every state?

You ready for this? The answer is a mere 300.

That’s it. If you have 300 control bits in your program, you cannot reach every possible state.

Don’t believe me? Take it from a physicist:

The universe could currently register ≈ 10^90 bits. To register this amount of information requires every degree of freedom of every particle in the universe.

Dr. Seth Lloyd

In other words, you’d have to use the whole universe to be able to reach 10^90 states. And converting from 10^90 to a power of 2 yields about 2^299, so 300 bits.
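The conversion is easy to check: 10^90 is just shy of 2^299, so 299 bits hold the count and 300 overshoot it.

```python
universe_states = 10**90          # states the whole universe can register
assert 2**298 < universe_states < 2**299
assert universe_states < 2**300   # 300 control bits are already too many
bits_needed = universe_states.bit_length()
```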

In other words, you don’t need a program that deals with an infinite amount of data; you just need enough program to have 300 control bits.

And if we were to be more realistic and restrict ourselves to the resources we have on Earth, it’s even less.

This is actually the entire deal behind cryptography: good cryptography is designed such that every single bit of a key is a control bit. But 300 bits is “Too Much Crypto,” so cryptographers “settled” on 256 because we only have Earth; we don’t need to worry about the entire universe!

You remember how I said I use asserts? I use them because they help me remove unnecessary control bits, which I define as the bits that create accidental complexity.

Mathematical vs. Practical

Now, the mathematicians in the audience might object.


Here is why I am wrong in their eyes: the definition of a Turing machine requires an infinite tape. So as soon as I talk about finite stuff, it is not a Turing machine, and it is not Turing-complete.

They are technically correct. And if I were trying to prove a solution to the most annoying math problem, I would use the mathematical definition of a Turing machine.

But when it comes to real programs running on real computers, I don’t.

There is a difference between mathematical Turing-completeness and practical Turing-completeness, and for physical humans and machines, we should care about practical Turing-completeness.

Why? Because the only thing we should care about in actual software is how good it is, and it is only as good as its lack of bugs and other defects.

Sure, our machines may not be mathematically Turing-complete, but if we still cannot prove properties about those programs, they may as well be Turing-complete.

The rule of thumb that I use: if you cannot prove even one substantial property of the program, it is Turing-complete.

As an example, take Starlark and Meson.

One is a non-Turing-complete language for build systems (used by Bazel and Buck2), and the other is a build system with a bespoke non-Turing-complete language.

Ostensibly, both are not Turing-complete, and that is sort of true.

But they both made one crucial mistake in their language design: loops can break out early with the break keyword.

Sure, they don’t have a while loop, and their for loops are restricted to ranges with upper limits.

But with break keywords, I can simulate a while loop.

I wrote a program in Starlark that could either halt or not on a practical Turing machine:

chrs = [ "0", "1", "2", "3", "4", "5", "6", "7", "8",
	     "9", "a", "b", "c", "d", "e", "f" ]

def hxn(n):
	n7 = chrs[n % 16]
	n6 = chrs[n // 16 % 16]
	n5 = chrs[n // 256 % 16]
	n4 = chrs[n // 4096 % 16]
	n3 = chrs[n // 65536 % 16]
	n2 = chrs[n // 1048576 % 16]
	n1 = chrs[n // 16777216 % 16]
	n0 = chrs[n // 268435456]
	return str(n0) + str(n1) + str(n2) + str(n3) + \
		   str(n4) + str(n5) + str(n6) + str(n7)

def test(stop):
	x = 4294967295
	do_break = False
	for i in range(0, x):
		for j in range(0, x):
			for k in range(0, x):
				for l in range(0, x):
					for m in range(0, x):
						for n in range(0, x):
							for o in range(0, x):
								for p in range(0, x):
									num_str = hxn(i) + hxn(j) + \
									          hxn(k) + hxn(l) + \
									          hxn(m) + hxn(n) + \
									          hxn(o) + hxn(p)
									if num_str == stop:
										do_break = True
										break
								if do_break: break
							if do_break: break
						if do_break: break
					if do_break: break
				if do_break: break
			if do_break: break
		if do_break: break
	return x

Yes, it’s ugly. But it does run; you just need to add a call to test() with a stop string at the bottom.

Now, why is it ugly?

Because the Starlark designers did do one clever thing: they limited integers to 2^32-1. This means that in order to get enough iterations to span the life of the universe, I needed 8 nested loops at that limit.

This gives me 2^256 iterations, plenty of time to wait for death.

And depending on the value of the string I pass in, I could make it go as many iterations as I want. I could even make it impossible to stop by passing in the string "GO".

Or even "STOP", just to be cheeky.

In addition, while Starlark can prove that the code above iterates at most a fixed, although huge, number of times, it cannot prove the opposite.

In other words, it can prove the maximum number of iterations, but it cannot prove the minimum.

That looks like an unprovable property, which is exactly what you would have in a Turing-complete language.

I could go further and implement math on 256-bit integers disguised as strings, but that is just rubbing salt in the wounds of a dead horse.

Anyway, what I have done in that twisted code is implement something that can act like a while loop on a practical Turing machine, and I could extend that loop to any size by just adding more nested loops and bigger strings.

So the author of Meson doth protest, as does a user of Starlark, and both are technically correct. But in this case, if users are nesting for loops to simulate while loops to make a ray tracer, you have lost.

In essence, your language is still Turing-complete in practice.

At that point, just give your users a while loop for usability’s sake.

Program Size

But if our physical machines don’t have to be infinite to be effectively Turing-complete, how big does a program have to be before it is also effectively Turing-complete, i.e., has at least 2^300 states?

That I don’t know, because it depends on the program, but I think I have a good starting point.

I wrote an implementation of POSIX bc and Unix dc. It’s more sophisticated than other implementations, and most importantly, it has zero dependencies.

At least one person thinks bc is too small to compare against a server program, but I believe my bc can be compared to servers in one thing: like servers, it has more than 2^300 states.

I think.

And how big is bc? 14.3k LoC. That’s not including comments or blank lines. Or headers.

bc is tiny compared to the software that companies produce, and it doesn’t even touch the outside world except through stdin, stdout, stderr, and reading files on disk! Yep, not even writing files!

Mine will also touch the kernel random number generator.

Nevertheless, it is still big enough to have states that will never be reached.

Another example: Zig at comptime. They use a branch limiter to make sure comptime halts.

This begs the question: how many branches are okay before the program could have unreachable states?

Well, it could be effectively infinite if there is only one branch and it’s for a limited foreach loop. That’s only one control bit.

But it could be as low as 300 if all branches depend on different control bits!

Another example: last I heard, seL4 was around 10k LoC. They managed to prove it correct at that size, but it was an enormous effort. I bet it was, and is, on the very edge of what humans can prove.

And I bet it was because they culled control bits. You should too.

Sources of State

So where can control bits come from?

Unfortunately, they can come from anywhere!

Memory? Yep.

Files on disk? Sure.

Random number generator? Most definitely.

Stuff from the network? Oh, yeah.

Sharp rocks at the bottom? Most likely.

Programmers are bad at managing state, so the best programming systems try to eliminate sources of state, or at least reduce the amount of state those sources can contribute.

This is what Rust does and why people like it so much.

I wish our operating systems were designed this way; Rust and friends can only do so much.

But that’s a post for another day.

And the worst programming systems? They add state. Deliberately.

Good C and C++ programmers have the scars to prove it.

Why? Because the king of state explosion is Undefined Behavior (UB), and C and C++ have a lot of it.

To boil UB down, whenever system designers say, “…the behavior is undefined,” they are really saying, “The sources of state for the behavior in this situation are as close to infinite as possible.”

Yep, that’s right: UB makes state explode.

That is because the true behavior of code subject to UB depends on the state of esoteric things we don't usually consider: the compiler, its version, its flags, the OS, the hardware, and maybe even the Internet!

Yep! In the presence of UB, you have to worry about everything else that the computer has been running!

This is why I hate UB and why I hate compiler developers who exploit UB.

“But Gavin, aren’t those compiler developers really reducing the state from UB?”

Sometimes, but they do so at the cost of taking away from the programmer the very tools they need to manage that state.

Sure, your program may have fewer states, but the states that are left are all evil; so while you have a smaller state space, more of it is not just bad, but actively terrible!


So there you have it. That is what Turing-completeness is, why it is important to understand, and what its consequences are for our finite machines, which are technically not Turing-complete.

I hope this was useful.

And remember: the magic measure is 300 control bits; once you go beyond that, you best consider your program effectively infinite.

Below is the ad mentioned at the top!


I have spent more than 10 years learning about Turing-completeness, including being embarrassed live in front of an audience at a conference.

The guy was not trying to embarrass me; my own ignorance did.

With all of that knowledge, you might assume that I try to avoid Turing-completeness wherever possible.

Well, no, I don't.

I even built a build system called Rig that has a Turing-complete language backing it.

There is a reason it’s Turing-complete; well, 300 reasons: control bits.

That is such a small number that, even though builds should not be big programs, I have to assume that builds can have that many control bits.

Just take it from users of build systems:

[A]ny sufficiently complex build system needs a Turing complete language for configuration.

obsidian_golem on Reddit

Meson is quite good. It’s biggest issue is it is not Turing complete so it has certain limits of extensibility.

marcthe12 on Hacker News

Rig is meant to not have limits of extensibility, and it is meant to have a better user experience, so it will be Turing-complete and provide a while loop, among other things.

Though Rig may not be able to parse while loops at release; it didn’t need them to bootstrap itself.

But that is not all; despite embracing Turing-completeness, I know there must be a way to avoid it in the default case.

So there will be: Rig will be able to selectively turn off keywords, and by default, it will. So it will not be Turing-complete by default.

Though not at the initial release.

In addition, because Rig will be intended to build code from outside sources, which may not be trustworthy, it will be able to check permissions at runtime using capabilities.

Why runtime? The same reason fuzzing is better than formal verification: because running something is the only way to tell if it goes haywire.

This will go so far as to check which commands to run; if you are building a third-party C project, you will be able to restrict it to only call the C compiler of your choice. Or to only connect to certain IP addresses or domains. Or to not connect to the Internet at all.

You will even be able to restrict it from touching disk, or only touching disk in certain directories.

That is the power of runtime: checks can be as coarse or as detailed as you want, and as long as there aren’t bugs in the check code (Turing-completeness strikes again!), you are guaranteed that those bad things cannot happen!

Yes, it will require a lot of work, but the plumbing already exists.

Why do this? Because I plan to have commercial customers, who depend on software supply chains. My hope is that supply chain attacks on my customers can be curtailed, and that I will be paid for doing so.

I don’t even care about being paid handsomely; I just want enough.

And while runtime capabilities will start in Yao and Rig scripts, run by an interpreter, once Yao’s compiler can generate actual machine code, I will add runtime capability checks to compiled Yao code.

In fact, Yao uses my own LLVM-like compiler backend, and that backend is where the runtime capabilities live, so any language that targets my backend would get those runtime checks for free.

Yes, these runtime capabilities are not as advanced as CHERI, but they are something that will exist on any platform, for any piece of software.

And finally, before we part ways, I would like to publicly answer a question I got:

If turing-completeness is necessary, why should there even exist a build system independent from a programming language now?

choeger on Hacker News

Besides the answer given on Hacker News (multi-language builds) and the capabilities system I just mentioned, I have one good reason: those languages are not designed for builds.

Rig’s language, Yao, is my own creation. I have been designing it for 12 years.

And it has one trick: it is extensible in ways other languages are not.

Yes, Rust and Lisp have macros. Lisp has reader macros.

But Yao has lex modes and user-defined keywords.

Lex modes are like reader macros; in fact, they may be equivalent.

In Yao, they let you do this:

$ clang -o test test.c;

Which will run Clang on test.c.

In short, lex modes let Yao have a first-class shell-like syntax for running commands that can take Yao expressions as arguments. And that is not all they can do.

But user-defined keywords are even better.

They are like macros, except that they are run directly by the compiler at parse time, as though they were part of the compiler.

All of Yao’s keywords are defined as user-defined keywords, and all of Rig’s are too. Since Rig can have its own, it can define its own build DSL.

And that’s exactly what it does. In fact, the first Yao program to ever run is its own build script, which is written in the Rig DSL.

So why should a build system exist independently of the language it builds? So it can have a build language more fit for purpose.

And all of the other reasons too.

If all of that sounds interesting to you, wait one more week; Rig and its language, Yao, will be publicly released on April 2, 2024.

Look for an announcement on Hacker News, Reddit, and this blog!