Why Perfect Software Is Nearly Impossible

Introduction

tl;dr It takes a lot of work, but it’s still possible.

For the past year and a half, I have built a bc as a résumé piece, and because it was meant to help me get a job, I wanted it to be as perfect as possible. And I learned a lot of lessons that I will take with me to my current job and beyond.

This post is about those lessons and what it takes to make perfect, or nearly perfect, software.

Learn the Problem

For any programmer writing a bc, there are three important things to get right:

Parser
Interpreter
Math

From a bit of experience writing compilers, I had enough knowledge to write the parser and interpreter without much trouble. However, at the time I started, I did not know how to write code to calculate arbitrary-precision math. And that was a big problem.

So the first thing I needed to do was to learn the intricacies of the problem I was trying to solve. I studied and learned the brute force algorithms for all four fundamental operations (addition, subtraction, multiplication, and division), the standard algorithm for power (exponentiation by squaring), and the standard algorithm (Newton-Raphson) for square root (required for bc). I also had to learn the algorithms to estimate sin(x), cos(x), atan(x), e^x, ln(x), and the integer bessel function.

And later, when needed, I learned a faster function for multiplication: Karatsuba. More on that later.

But as I was saying, learning the ins and outs of the problem is crucial; without doing so, I would never have been able to make bug-free math.

But that’s not all. The person who first asked me to write the bc actually wanted me to write the parser and interpreter; he was writing an arbitrary-precision math library, specifically for use in a bc.

To keep me under control, he kept insulting my intelligence about math and made me feel terrible about myself. Nevertheless, it didn’t take long for me to realize that even though he was writing his library for a bc, he was still doing it wrong because his library did not adhere to the standard. This is yet another thing that is part of learning the problem: learn the applicable standards.

So, two weeks after I started, I started learning the algorithms, and two weeks later, I had a library that was more finished than his. As such, I had proof that I was not as stupid as I had been told.

This taught me another lesson: do not listen to those who do not have your best interests at heart.

Yes, it’s a life lesson, not a software development lesson. But it was an important one for me.

Feedback Must Be Fast

Because of the person mentioned above who made me feel unintelligent when it came to math, I did not dare do any other algorithms for the four fundamental operations other than the stupid, brute-force ones, especially for multiplication, because the person who insulted me specifically said that I could never implement Karatsuba multiplication. This caused my bc to be unbearably slow on power, which is just a chain of multiplications, because the brute-force multiplication algorithm is slow.

It took the encouragement of my wife and another programmer who knew the person who insulted me to try writing an implementation of the Karatsuba algorithm.

And it only took me two days. I was blown away by what I could do.

But the important bit is that when I enabled Karatsuba, my test suite went from 45 seconds to 12 seconds. That did something that I sorely needed: to reduce the average length of my edit, compile, and test cycle in order to increase the speed of feedback.

This has an interesting corollary: the test suite must be fast.

A nice side effect of a faster test suite is that I was able to run the test suite more often.

The end result of these changes was to double the rate at which I could develop the bc. Before this experience, I had not considered the effect that a slow test suite would have on my ability to program, but now, I know that the effect is big.

Testing Must Be Thorough

There were two reasons that my test suite was not fast at the beginning:

All tests were run on the full bc (all integration tests, no unit tests),
I made the test suite thorough.

Number 2 was something that I intuitively knew had to be the case in order for the bc to approach perfection.

And yes, you have to make a thorough test suite. If it’s not tested, it’s not perfect. For that reason, coverage matters. A test suite must have excellent coverage.

Note, however, it does not need 100% coverage because high coverage starts to have diminishing returns.

One thing I did that had great returns was testing errors. In fact, if I did not have a way to test errors, my bc would work great in normal conditions, but it would be fragile in any other case. In order to make a program robust, the test suite must test error conditions.

Unfortunately for me, the amount of testing that a bc needs to have a thorough test suite is ridiculous. Fortunately for me, I learned quickly how to generate tests that could be generated. Thus I learned that tests should be generated if possible.

The other testing tool that I used was assert(). I used it to test all of the assumptions I made at various points, including all preconditions that procedures had and the postconditions that they were supposed to guarantee.

Combined with a high coverage, thorough test suite, assert() made sure that bugs were caught quickly. Therefore, using assert() to test assumptions increases the effectiveness of the test suite.

As a side effect, using assert() to document assumptions makes it easy to remember what those assumptions are.

The result: I love assert(). (Go authors: why in the world have you refused to make assert() available?)

I also learned something that I did not expect, if assert() is used properly and liberally with a thorough test suite, unit testing is a hindrance.

The reason is that all of the units, procedures in this case, are all thoroughly tested by a full integration test suite, and the assert() calls always ensure that the procedure’s preconditions and postconditions are always met.

However, this means that it is doubly important to have a good test suite.

Now, regarding unit tests, I would be remiss if I did not quote something (edited slightly for grammar and spelling) that I was told by a programmer with more experience than me:

There’s just a little thing I want to elaborate on: you’re saying “unit testing does not matter as much as I thought.” That is true for the kind of project that bc is. It’s a C program at a small-to-moderate scale where you’re the only writer, or close to it. In this precise circumstance, unit testing would bring you more bloat and more pain than it would help you. Because it wasn’t devised for that kind of software; we don’t really have good frameworks for testing that kind of software.
(I know. I’ve written a lot of similar software and finding a good testing framework has always been a pain point.)
However, in different working conditions, typically when you’re doing a large project in collaboration, unit testing is vital. That’s the place it has been designed for; that’s the place development resources or weight of the framework are not limiting factors. It’s also more difficult to test large software by just running it on your machine, as we can do with bc. :) So unit testing is how you do it. And it’s especially useful in collaborative work, because you have to make sure the interfaces you provide your coworkers are correctly implemented.
So, in summary, when a manager or software architect tells you that unit testing is important, don’t let your experience with bc tell you they’re lying. They’re not; their field of expertise is just not the kind of software you’ve written. And the advice was not applicable to your use case.

I agree with him that unit testing does not apply to bc’s particular use case, but I am not sure I agree with him that unit testing is ever useful, as long as assert() is used properly with a good test suite, especially since he says that one of the uses of unit tests is to make sure that the interfaces are properly implemented, and good use of assert() does that job as well.

However, as said before, he has more experience than me, so maybe he is right.

Tools Enable Perfection

I did not start using tools right away. I only did so once I was encouraged to by other programmers.

I learned how to use static analysis tools, runtime analysis tools, continuous integration tools, and fuzzers.

Static analysis tools were, by far, the easiest to use. Just run the tool and fix the problems it found, if they are valid.

I used two static analysis tools: scan-build and Coverity Scan.

Runtime analysis tools are harder; I would run the bc under the tool, collect the results, and fix the bugs.

I used four runtime analysis tools: valgrind, AddressSanitizer, MemorySanitizer, and UndefinedBehaviorSanitizer.

Late in the game, I decided to learn how to use the Travis CI to ensure my commits are buildable with both gcc and clang. This was invaluable to find out quickly when I had committed dud code.

However, runtime analysis and continuous integration tools come with a caveat: in order to work well, they need to be paired with a thorough test suite, which means that it is triply important to have a good test suite.

But of all of these tools, my favorite, by far, was the fuzzer that I used: afl.

I did not want to put the effort into learning how to fuzz my code, but as mentioned earlier, I was pushed hard to do so, and boy, am I glad that I was.

Even though other tools caught as many bugs as afl did, the ones that it caught were the subtlest bugs. And even though it was not as easy to use as the other tools, it gave me the biggest bang for the buck since the bugs it found would have been exponentially harder to find otherwise.

And for the cherry on top, afl does not require a thorough test suite.

But I didn’t just use afl to find bugs; I added the test cases it built to my test suite. I used it to create regression tests.

There are two main lessons: first, if you have a good test suite, using tools makes finding bugs easier. Second, fuzzing is worth the effort.

Use Safe Languages

Did you notice that most of the tools I used above were necessary because I used C, an unsafe language?

There were good reasons for me to write in C, but if those reasons had not existed, I would have been better off writing bc in a safe language. If I had, I would not have had to use many of those tools; they would have been unnecessary.

In fact, most of my effort was spent in chasing down bugs that would not have been a problem at all if I had used a safe language.

The lesson, as aptly stated by Trevor Jim, is that using an unsafe language is a design flaw, unless adequate care is taken to ensure that memory corruption either does not happen or is properly dealt with.

Trevor said, “If you decide to write your project in C/C++ you must have a plan for handling the inevitable memory corruption in advance. Anything else is malpractice.”

And I agree. So what was my plan for handling memory corruption? Eliminate it.

I tested thoroughly and with the tools above. If they detected a possible memory corruption bug, I fixed it. With prejudice. The end result is that, as far as I know, no user of my bc has ever encountered a memory corruption bug in the wild.

Now, Trevor suggested formally verifying correctness. I did not do that, but if bc were a setuid program, or did anything more to its environment than read files and write to stdout, I would have. As it was, when I added dc to my bc, I left out the ! command, which runs a shell command, for this very reason; it would simply be too dangerous, and malpractice, to put it in without formally verifying the code.

(Disclaimer: I don’t know if Trevor would think that I put enough effort in. I am just saying that I think that I did.)

False Goals Must Be Avoided

There are two projects that wanted my bc: busybox and toybox. For both projects, I offered to be the maintainer of the bc, but both turned me down because my philosophy is different.

For busybox, the overriding goal is small executable size. The result is that the readability suffers and sometimes, so does correctness, which was my goal.

For toybox, the goal was as few lines of code as possible, which also made readability suffer.

From the experience of dealing with them, I learned that anything other than correctness is a false goal.

The other thing I learned from that is that readability enables correctness.

Code Must Be Consistent

One thing that enables readability is consistency. Throughout my time writing bc, I spent effort keeping the code consistent, using patterns for variable naming, function naming, function parameters, assert() calls, and many other things.

As a result, whenever I go to any code in my bc, I can make assumptions about many different things, reducing my cognitive load and enabling me to spend more of my capabilities on perfecting the code.

If you want readability and perfection, code must be consistent.

And because the number one thing that enables code to be consistent is good style, code must be written in a good style.

Strive for Portability

From the beginning, I knew that I wanted my bc to have no other dependencies other than POSIX utilities and API’s. It was difficult.

First, I could not use gcc extensions; I had to use vanilla C99. This limited what I could write and forced me to write simple code. Second, I had to use POSIX make, which has very few standard features. In fact, it does not have conditionals, so if there are any build options at all, it is utterly useless.

But since I did not want GNU make or GNU autotools as dependencies, I had to make my own custom build system with POSIX sh scripts. I couldn’t even use bash!

However, the end result was that my bc builds without modification on Linux, Mac OSX, FreeBSD, OpenBSD, and NetBSD. And I am sure it will build without modification on other POSIX systems.

In fact, with the exception of signal handling, my bc builds on Windows! To be honest, even signal handling works on Windows; my bc detects if it is being compiled for Windows and enables the Windows version of signal handling.

Portability is hard, but having it means that my bc is starting to be used. Thus, portability makes it easier to get users.

Optimizability Is a False Goal

Throughout the entire project, I hardly ever worried about writing code in such a way that a compiler could easily optimize it; I just focused on portability.

But none of that hurt me.

When not optimized, my bc takes 27.16 seconds to run the test suite on my machine. When optimized with the recommended optimizations, my bc takes 11.17 seconds, less than half the time.

It turns out that optimizability is a side effect of writing simple code, just as simple code is a side effect of writing portable code.

But there was yet another good side effect of portable code: optimizations do not change the behavior of the code.

This mostly has to do with the elimination of undefined behavior (UB). By definition, undefined behavior is not portable, so to make sure that my code was portable, I used the tools above to ensure that it did not exist. And I tested for it before every release.

The end result was that I have had no users on any platform report that tests failed for them, except for one user, but it was because of xterm not keeping up with test output.

The lesson here is that portable code will not change behavior when optimized.

Avoid Asynchrony

One surprising thing I learned is that asynchrony is terrible.

I needed to put signal handling for SIGINT and SIGTERM in my code. Because of the rule that only async-signal safe functions can be used in signal handlers, I had to use a direct write() call on stderr. This meant that whenever my bc wrote to stderr otherwise, it needed to flush right away.

And even with that precaution, there is an unavoidable race condition.

On top of that, whenever I am running a long operation, which includes all math, I need to keep checking a flag to see if a signal was sent. This actually does slow down my bc slightly.

And still, I cannot guarantee that my bc will respond to a signal in a timely fashion.

As a result, one of my goals for Yao is to eliminate the need for such code. But more on that in a future post.

Conclusion

In a way, this “résumé piece” was one of my greatest educations and achievements.

But that is not the point of this post. The real point is that perfect software is attainable; it just takes a lot of effort.

Is my bc perfect? Well, no. Not quite. But it’s close.

(And really, the best measure of perfection is whether or not a user runs into a bug. So far, other than one user that loves to actively find bugs, not a single one has. And he hasn’t found any recently, either.)

Still, now I know what it takes to make perfect software. After this experience, I can better design Yao because, after all, Yao is supposed to make perfect software easy.

But until then, these lessons, and the habits I have developed from them, will serve me well.

About

Contact

Archive

Categories

Tags

Subscribe