The Scourge of 00UB

Assumed Audience: Programmers and hackers.

Introduction

So I was part of the founding group of Stack Exchange’s Programming Language Design and Implementation site. I joined because I’m designing a language myself.

At first, things were great. And then came one person.

Now, I respected this person. I like several blog posts on this person’s blog, which are mostly on programming languages, their design, and how to make best use of them.

So when I saw this person join, I was happy to hear more from them.

And then…

Well, I saw a question about undefined behavior, and I answered. This sparked a discussion with this person about undefined behavior.

There seemed to be a lot of misunderstandings; I could not get a handle on what this person thought UB meant.

I finally figured it out: this person’s definition of UB was not “the language spec can’t guarantee anything.” Instead, it was “compilers can assume UB does not exist and optimize accordingly.”

Wat.

This was a moment when things clicked for me. But to explain why and what, I need to explain some background.

UB in C and C++

UB in C and C++ is roughly defined as “anything can happen.”

The true quote is “the Standard imposes no requirements.”

Compiler authors believe that this lets them do whatever they want.

Perhaps they are right, but that is not the question I want to address.

The question is: should compiler authors be able to do whatever they want? I argue that they should not.

Compiler Author vs. Language User

Now, I’m a language designer. I’m a compiler author. You would think I would have much of the same opinion as other compiler authors.

But I have something different from typical compiler authors: I am a language user first.

This may seem like no distinction at all, but it is.

The distinction is hinted at in Russ Cox’s new blog post “C and C++ Prioritize Performance Over Correctness.” In fact, the hint is in the title: authors of C and C++ compilers care more about performance than about correctness.

This is the perspective of someone who is a compiler author first.

I call this perspective “00UB” because, like the 00 agent James Bond, this perspective claims that UB has a license to kill code.

Meanwhile, most programmers care more about correctness. By “correctness,” I mean that they want their code to run the way they think it should run and that the compiler will be a faithful translator.

This is the perspective of someone who is a language user first.

Don’t believe me? Well, Russ Cox lists some egregious “optimizations” that Clang does, and this is the response to one of them.

Programmers do not expect that compilers will remove overflow checks, but they will.
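
To make that concrete, here is a minimal sketch of the kind of overflow check at stake. The function names are hypothetical, made up for illustration. The first version adds and then inspects the result; because signed overflow is UB, a compiler is allowed to assume the sum never wraps and may delete the test. The second version checks the operands before adding, so no overflow is ever evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the check many programmers write first.
 * It performs the addition, then asks whether the result wrapped.
 * If a + b overflows, the addition itself is UB, so an optimizer
 * may assume it never happens and remove the comparison. */
bool add_overflows_naive(int a, int b) {
    return b > 0 && a + b < a;
}

/* Hypothetical helper: compare against the limits before adding,
 * so the expression never overflows for any pair of ints. */
bool add_overflows_safe(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return false;
}
```

With -fwrapv on GCC or Clang, signed overflow is defined to wrap, so the first version should behave the way its author expected; without that flag, only the second version is safe to rely on.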

Programmers do not expect that compilers will remove infinite loops that do not have side effects, but they will.

Note that those surprised programmers are actually Rust compiler authors.
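
Here is a rough sketch of how this can look in C. The functions are hypothetical illustrations, not code from any of the projects mentioned. C11 (section 6.8.5) lets the implementation assume that a loop terminates if its controlling expression is not a constant and it does no I/O, volatile access, or atomic operation. Under that rule, a compiler may compile the first function down to a bare "return true" even for an input that would loop forever, such as 0. The second function touches a volatile counter in the loop body, which counts as a side effect, so the loop has to stay.

```c
#include <stdbool.h>

/* Hypothetical example: iterate the Collatz step until n reaches 1.
 * The loop has no side effects, so the compiler may assume it
 * terminates and skip it entirely. */
bool reaches_one(unsigned long n) {
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
    }
    return true;
}

/* Same loop, but each iteration writes to a volatile object.
 * A volatile access is a side effect, so the termination
 * assumption no longer applies to this loop. */
bool reaches_one_kept(unsigned long n) {
    volatile unsigned long steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps = steps + 1;
    }
    return true;
}
```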

Programmers do not expect that compilers will remove NULL pointer checks, but they will.

The response of the Linux kernel devs to that GCC optimization was to disable it as much as possible using -fno-delete-null-pointer-checks.
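
The pattern behind that kind of surprise looks roughly like the sketch below. The struct and function names are hypothetical, chosen only for illustration. In the first function, the pointer is dereferenced before it is checked. Dereferencing a null pointer is UB, so the compiler may conclude the pointer cannot be null and drop the check that follows. The second function checks first, so there is nothing for the compiler to infer.

```c
#include <stddef.h>

/* Hypothetical type standing in for any structure reached by pointer. */
struct device {
    int flags;
};

/* Dereference first, check second. Because dev->flags would be UB
 * for a null dev, an optimizer may treat the check as dead code. */
int read_flags_risky(struct device *dev) {
    int flags = dev->flags;
    if (dev == NULL) {
        return -1;
    }
    return flags;
}

/* Check first, dereference second. The check cannot be removed,
 * because nothing before it depends on dev being non-null. */
int read_flags_safe(struct device *dev) {
    if (dev == NULL) {
        return -1;
    }
    return dev->flags;
}
```

Only the second function is safe to call with NULL. The -fno-delete-null-pointer-checks flag is meant to stop the compiler from making that inference, but the check-first ordering does not depend on any flag.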

By the way, Linux also uses -fwrapv, -fno-strict-overflow, and -fno-strict-aliasing.
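
For a sense of what -fno-strict-aliasing protects, here is a small sketch with hypothetical function names. It assumes float is a 32-bit IEEE 754 type, which is true on common platforms but not guaranteed by the standard. The first function reads a float through a pointer to an unrelated integer type, which breaks the aliasing rules and is UB without that flag. The second copies the bytes with memcpy, which is well defined.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reads a float object through a uint32_t lvalue. This violates the
 * strict aliasing rules, so the optimizer may reorder or miscompile it. */
uint32_t float_bits_cast(float f) {
    return *(uint32_t *)&f;
}

/* Copies the object representation instead. memcpy is allowed to
 * read any object's bytes, so this has defined behavior. */
uint32_t float_bits_memcpy(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

In practice, compilers usually turn the memcpy version into the same single move instruction, so the well-defined form should cost nothing.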

Basically Evil

And so, at the end of that discussion I mentioned earlier, I had the epiphany that compiler authors for C and C++ have deliberately pushed a definition of UB that most programmers never even consider before they are burned by it.

This is why that person was so adamant that my definition of UB, the one shared by most unburned programmers, is wrong: this person is a compiler author first and was consciously trying to push the definition that fit their worldview.

Unfortunately, C and C++ compiler authors have largely succeeded.

How did they do this? Easy: they control the standard.

Few people think about this, but there are actual, breathing people who have to propose, debate, and incorporate changes to the standard.

There are a lot of people on the C++ committee and a smaller number on the C committee.

And many of them are compiler authors. In fact, they make up a huge chunk of the committees.

So despite holding the minority worldview, they have managed to force it on us by fiat because we have to use their compilers. And they have managed to stop several proposals to remove undefined behavior from the standards.

Including this one.

Now, in earlier times when I was more incendiary, I might have said that compiler authors were malicious. But now…

Wait, compiler authors are creating compilers that deliberately miscompile code. In fact, a compiler researcher, John Regehr, said,

It is basically evil to make certain program actions wrong, but to not give developers any way to tell whether or not their code performs these actions and, if so, where.
– John Regehr (emphasis added)

So yeah, they are malicious, and I’m not the only one saying so.

What to Do?

So what do we do?

Well, first, we need to start pushing back on the 00UB definition of UB. We should all start using -fwrapv, -fno-delete-null-pointer-checks, -fno-strict-overflow, and -fno-strict-aliasing on Clang and GCC (and the equivalents on MSVC). And this should become the de facto standard for C and C++.

Chris Lattner, a compiler author who started LLVM, concedes that using those flags is tantamount to using separate dialects. He says that they are non-portable, but since GCC and Clang are basically the only two compilers for Unix-like systems, I would consider them portable enough.

Second, someone should create boringcc, a compiler that uses the definition of UB that most programmers use, and it should be made completely cross-platform, able to target Windows, macOS, iOS, Linux, Android, the BSDs, and any other semi-important platforms.

Yes, I’ve thought about creating boringcc myself, but I’m busy with other projects for my business.

However, if you want boringcc, contact my business address. If I get enough interest, I’ll do it because real, tangible interest would convince me that there is more of a business case for that compiler than my current projects.

Third, if you can get on the C or C++ committee, do so; having more voices against the 00UB worldview would help.

Conclusion

Whatever the case, we need to start pushing back against this perspective; it is a scourge on our industry, destroying confidence in our code and our compilers.

And compiler authors need to back down from their “evil” perspective.
