Assumed Audience: Hackers, programmers, anyone in a software-related job.
Epistemic Status: Extremely confident, even a little smug. Hey, I’m only human.
I am angry. My anger took over, and I am writing a blog post, even though I tried to mellow out.
So yeah, expect some verbal fire and brimstone; Sodom and Gomorrah are lonely.
The Introduction
Earlier today, I was on Hacker News.
I saw a post about undefined behavior (UB) in C.
I saw someone complaining about a technique I use and decided to comment.
The resulting conversation was terribly depressing and infuriating.
The Technique
So…what is this technique that I must vigorously defend? It has to be life-changing, right? Surely, it must be the secret to life, the universe, and everything, yes? After all, only something like that would deserve the negative energy and subpar post I’m now writing, no?
Um, well, no…
I use unsigned integers, not signed integers.
“Uh, Gavin, what are unsigned and signed integers?”
Signed integers can go negative, i.e., they can have a negative “sign.” Unsigned integers cannot go negative.
That’s good enough for this discussion. Now get off my lawn!
I even go so far as to implement my own arithmetic to simulate signed two’s-complement arithmetic.
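Here is a minimal sketch of what that can look like, assuming 32-bit values. The names (s32, s32_add, s32_less, and friends) are made up for illustration and are not taken from any real library; the point is only that every operation runs on unsigned bits, so none of them can overflow into UB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: a 32-bit two's-complement value stored in unsigned bits. */
typedef uint32_t s32;

#define S32_SIGN_BIT UINT32_C(0x80000000)

/* Unsigned arithmetic is defined modulo 2^32, which produces exactly the
 * two's-complement bit pattern for add, subtract, and negate. */
static inline s32 s32_add(s32 a, s32 b) { return (s32)(a + b); }
static inline s32 s32_sub(s32 a, s32 b) { return (s32)(a - b); }
static inline s32 s32_neg(s32 a) { return (s32)(0u - a); }

/* Widen to 64 bits so the product cannot overflow, then keep the low
 * 32 bits, which equal the low 32 bits of the signed product. */
static inline s32 s32_mul(s32 a, s32 b)
{
    return (s32)((uint64_t)a * b);
}

/* The top bit is the sign. */
static inline bool s32_is_negative(s32 a)
{
    return (a & S32_SIGN_BIT) != 0;
}

/* Flipping the sign bit maps signed order onto unsigned order. */
static inline bool s32_less(s32 a, s32 b)
{
    return (a ^ S32_SIGN_BIT) < (b ^ S32_SIGN_BIT);
}
```

With these, s32_add(0x7FFFFFFF, 1) quietly yields 0x80000000 (the bit pattern of the most negative value) instead of handing the compiler a license to kill.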
The Destroyer
“That’s it? So what?”
Ah, if only. No, there are three more facts you must know:
- Just about every operation on signed integers can result in UB.
- Conversely, almost every operation on unsigned integers cannot result in UB.
- UB can cause anything to happen.
In other words, if you use signed integers, then every +, -, *, /, >>, and << you write can blow up your program.
Oh, never mind, it can blow up your computer.
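To make the contrast concrete, here is a small sketch of the guards you would need in front of signed + and / to stay out of UB, next to the unsigned version that needs none. The helper names (int_add_overflows, int_div_is_undefined, uint_add) are hypothetical, and this covers only two of the operators above.

```c
#include <limits.h>
#include <stdbool.h>

/* True if a + b would overflow int. The test itself never overflows,
 * because it only subtracts from the limit in the safe direction. */
static inline bool int_add_overflows(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* True if a / b is undefined: division by zero, or INT_MIN / -1, whose
 * result does not fit in int on two's-complement machines. */
static inline bool int_div_is_undefined(int a, int b)
{
    return b == 0 || (a == INT_MIN && b == -1);
}

/* Unsigned addition needs no guard: the result is defined to wrap
 * modulo UINT_MAX + 1. */
static inline unsigned uint_add(unsigned a, unsigned b)
{
    return a + b;
}
```

Forget one of those guards on the signed side and the compiler gets to assume the bad case never happens. On the unsigned side, uint_add(UINT_MAX, 1u) is just 0.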
The post we were discussing was about the “Sledgehammer Principle,” which says that UB takes a sledgehammer to the part of the program that executed UB.
I believe that the author, JeanHeyd Meneide, is good at what he does. And he is on the committee that standardizes C.
But Mr. Meneide is terribly, awfully wrong about this. UB is not a sledgehammer; it is a nuclear holocaust!
After UB, anything can happen; the program and the computer may be dead.
Or we all might be dead because
- the AI controlling the nuclear arsenal executes some UB,
- which leads to light off of clouds looking like a hostile missile launch,
- which leads the AI to launch a “counter” attack,
- which leads to a real missile launch.
What? Too unrealistic? Nah, it almost happened.
So why? Why is undefined behavior in C like this? After all, Java, Go, Rust, and others have undefined behavior too; what makes C (and C++) so special?
Well, there’s one more fact that you need to know: compiler authors for C are fans of 00UB.
00UB, as I define it, is the idea that UB is licensed to kill.
“Kill what, Gavin?”
Anything. Programs, computers, people, all living life, the nearest black hole. Whatever it can reach out and touch.
The Villains
Who are the villains of this James Bond wannabe retconned story?
Well, this spy better look in the mirror and deep six MI6 because he and his handlers are the villains.
Yes, that’s right: compiler authors are the villains.
Compiler authors, who I shall call SPECTRE (very appropriate), like 00UB because they can use it as an excuse to destroy your code so that they look impressive on benchmarks.
Oh, your program had UB? I’ll just delete that crucial check for NULL! No biggie.
Oh, your code had an infinite loop? Well I looped the loop over your hard drive to wipe it. Your machine should run faster now, in more ways than one, so I did you a favor!
Because all they care about is all the accolades from all the people who love all the raw speed. All of it.
In fact, every last bit of it; they want to go so fast that they’ll make their compilers exorcize massive sections of code.
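For anyone who has not seen the vanishing NULL check in the wild, the pattern looks roughly like the sketch below. The struct and function names are hypothetical. Dereferencing a null pointer is UB, so once the code has dereferenced cfg, an optimizer is allowed to assume cfg is not NULL and treat the later check as dead code.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct config {
    int flag;
};

/* Risky ordering: cfg is dereferenced before it is checked. If cfg is
 * NULL, the dereference is UB, so the compiler may assume it is not
 * NULL and remove the check that follows. */
static inline int read_flag_risky(const struct config *cfg)
{
    int flag = cfg->flag;
    if (cfg == NULL) {
        return -1; /* an optimizer may treat this branch as unreachable */
    }
    return flag;
}

/* Safe ordering: check first, dereference second. Nothing here gives
 * the compiler an excuse to drop the check. */
static inline int read_flag_checked(const struct config *cfg)
{
    if (cfg == NULL) {
        return -1;
    }
    return cfg->flag;
}
```

Both functions behave the same for a valid pointer. Only read_flag_checked has defined behavior when handed NULL.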
“Oh, just be a better programmer and avoid UB!” they’ll say.
So they assume that programmers will actually be superhuman and avoid all of the teeming traps that would impede Indiana Jones?
Nah, they’re either ignorant or malicious. And to claim ignorance of the foibles of fleshy fools, as all mortals are, is more brazen than the prows of Roman quinqueremes quickly quashing Carthage combatants.
Malicious, they must be.
“That’s too harsh, Gavin!”
Is it? They have indirectly given us at least half of all cybersecurity vulnerabilities. All to look good on benchmarks.
In fact, John Regehr said it better:
It is basically evil to make certain program actions wrong, but to not give developers any way to tell whether or not their code performs these actions and, if so, where.
– John Regehr (emphasis added)
But surely, there are people who could stop them, right?
Yes, they exist. They are the committee for standard C.
But perhaps as a foreboding, the committee has a wonderfully good name for a sinister spy syndicate: WG14.
And so it is; WG14 is made up of the very people they must stop.
Besides two people, the committee, as far as I know, tacitly endorses the view that 00UB is official policy.
And though JeanHeyd Meneide claims to be one of the good guys, he is one of the bad guys because he’s on the committee!
The Useful Idiots
JeanHeyd Meneide claims that users blinked first. That’s false.
The truth is that there were some useful idiots that accepted the narrative created by SPECTRE; they decided that, yes, performance was everything.
And this is where my conversation on Hacker News appears: I was talking with a useful idiot.
Yes, this person seemed to actually believe that performance was more important than anything else.
He complained about using unsigned integers to avoid UB because it would hurt optimization! And that was after acknowledging that the possible bugs from using unsigned integers would be less bad than UB.
(⊙.⊙)
The Punishment
In the spirit of Ben Franklin, those who would give up essential correctness, to purchase a little temporary performance, deserve neither.
I hereby order such punishment to be carried out.
And so it is!
Wait, what?
Yep, the punishment has already been carried out.
Of course, SPECTRE may convince themselves that nothing has happened, but I bet every single one of them has had their data breached from some company somewhere. And I bet it has happened multiple times, enough that at least one of the breaches was caused by UB.
But it gets worse; that’s only the visible price.
The invisible cost is something SPECTRE themselves might hate, but they might not realize that they are the cause.
You see, I love C. I also despise it, and others do too.
Why? Because it’s “unsafe,” which is code for “bugs will probably cause structure smashing somewhere.”
This makes people nervous.
It makes me nervous too, but I’ve developed tools to get around it. Everyone else isn’t so lucky.
What they do instead is create better platforms, ones that keep you safe.
Boom! Electron.
Which is famous as one of the slowest, bloatiest, most disastrous artifacts of software in existence.
Nevertheless, programmers choose to write their apps using Electron.
And the world mourned.
But it is still better! Because it’s safer.
If WG14 had done their job and SPECTRE hadn’t raised a ruckus for their rationalization of viral vulnerabilities, and instead, had fixed up the problems with C, perhaps C would still be the best language to write apps in.
Think of it: it could be safe and fast! But SPECTRE and WG14 decided that they wanted faster.
So I hereby sentence SPECTRE and WG14 to life imprisonment in an Operating System written in Electron and order this sentence to be carried out.
Oh, wait, it already has been as well; the operating system is called Google Chrome.
Yes, Electron uses Chrome, not the other way around.
Don’t nitpick me for a joke.
And just like the last punishment, all of us are suffering it.
The Solution
You think I’m being harsh again?
WG14 could have fixed this, long ago. By removing dumb UB from the language and defining it.
“No, they couldn’t; it’s too ingrained!”
I’m sure there are tons of developers that would hunt them down and glare hard at them for daring to reduce bugs!
“But they couldn’t fix pointers, not without breaking the ABI!”
Bull roar.
There are malloc() implementations that have fast ways of returning the true size of allocations. Combine that with a language-level construct, and you have a way to get the length of an array from just the pointer.
So perhaps add something analogous to sizeof(), except it takes a pointer value and returns the length. Call it lengthof(). Or lenof(), whatever.
So WG14 could add a function to the malloc() set that returns the size, and lengthof() could take that value and divide it by the size of the type that the pointer points to; easy way to get the length. No change to the ABI!
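To show the arithmetic, here is a rough user-space sketch. It is not the real proposal: a real version would live inside the malloc() implementation and the language, whereas this one fakes it with a wrapper that stashes the requested size in a header in front of each allocation. Every name here (sized_malloc, sized_size, sized_free, LENGTHOF) is hypothetical, and it only works on pointers that came from sized_malloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored just before each allocation. The union
 * pads it to the strictest alignment so the pointer handed to the
 * caller stays properly aligned for any type. */
typedef union {
    size_t size;       /* bytes the caller asked for */
    max_align_t align; /* present only for alignment */
} sized_header;

/* Hypothetical malloc() wrapper that remembers the requested size. */
static inline void *sized_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(sized_header)) {
        return NULL; /* header + size would wrap around */
    }
    sized_header *h = malloc(sizeof *h + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    return h + 1; /* caller's memory starts right after the header */
}

/* The "function in the malloc() set that returns the size." */
static inline size_t sized_size(const void *p)
{
    return ((const sized_header *)p - 1)->size;
}

static inline void sized_free(void *p)
{
    if (p != NULL) {
        free((sized_header *)p - 1);
    }
}

/* Allocation size divided by the size of the pointed-to type. */
#define LENGTHOF(p) (sized_size(p) / sizeof *(p))
```

Given int *xs = sized_malloc(10 * sizeof *xs), LENGTHOF(xs) comes back as 10, and the pointer itself is still a plain int *.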
“But what about pointers to the stack, Gavin?”
Do you seriously think the language wouldn’t have some way of taking a pointer, figuring out where in the stack it is, what function it is for, and what local variable it is for?
“But that would require using the frame pointer!”
So what? That’s a small price to pay for that feature.
Which, by the way, would enable a bunch of other stuff too, like stacktraces and good swag like that.
“But what about losing optimizations? Does that mean we can’t optimize based on the absence of UB anymore?”
Yeah, that’s the point.
But if you really must have some optimizations, only assume when the UB is opt-in, not opt-out.
Here’s what I mean: the restrict keyword is something the programmer has to explicitly put in the code, and it tells the compiler that the programmer is taking responsibility for its use.
Thus, restrict is opt-in, and the compiler can go to town.
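As a concrete picture of an opt-in promise, here is a minimal sketch. The function name add_into is hypothetical. By writing restrict on both parameters, the programmer tells the compiler that dst and src never overlap, so the compiler may reorder or vectorize the loads and stores without checking.

```c
#include <stddef.h>

/* restrict is the programmer's explicit promise that dst and src do
 * not overlap. Unsigned elements are used so the addition itself
 * wraps instead of overflowing. */
static inline void add_into(size_t n, unsigned *restrict dst,
                            const unsigned *restrict src)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}
```

Call it with two separate arrays and all is well. Call it with overlapping ones and the UB is on the caller, who signed for it by typing restrict.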
Signed arithmetic overflow? That’s just normal code; don’t assume.
Is the programmer sharing data across threads? That’s explicit, and thus, it is opt-in, and the compiler can assume no data races.
Non-terminating loop? At least warn, but for the love of all beautiful things, don’t just remove them!
At this point, I’m using unsigned types to opt out of optimizations; that is backwards.
And that should be common; it shouldn’t be controversial.
Point being, there are solutions. What gives, WG?
The Conclusion
Sigh. I’m sorry for this rant. I’m sick and tired and sick and tired of the mess we have in software.
But this was a lot of fun to write. And cathartic.
Anyway, 00UB is bad, and the people who push it are bad, and we can still fix this, and we should.
It’s past midnight; I’m going to bed.