Please see the disclaimer.
Oh, the irony!
Merely hours after I posted my last post, in which I said, “I have found
more bugs than users have,” I got an email from a FreeBSD user (where my
dc are default) reporting a bug in my
What I claimed in my previous post is still true; I found and fixed two bugs in the past two weeks after a month of inactivity.
The bug was found by running the following script (found here) with my
#!/bin/sh if test $# -lt 1 then echo usage: $0 year exit 1 fi echo $* '[ddsf[lfp[too early ]Pq]s@1583>@ ddd19%1+sg100/1+d3*4/12-sx8*5+25/5-sz5*4/lx-10-sdlg11*20+lz+lx-30% d[30+]s@0>@d[[1+]s@lg11<@]s@25=@d[1+]s@24=@se44le-d[30+]s@21>@dld+7%-7+ [March ]smd[31-[April ]sm]s@31<@psnlmPpsn1z>p]splpx' | dc | tr '\012' ' ' echo ''
Wow, that is unreadable! The truth is that
dc code is, in my opinion, the
best (or worst) example of a write-only language. It’s even worse than
And even though Wikipedia claims that the term is used in humor, I am not
joking. My evidence is just how much work it took to analyze this 233-byte
Of course, there is a reason for that: an RPN parser is dead simple.
As proof, compare the size of my
dc’s lexer and parser to my
bc’s lexer and parser.
This fact meant that
dc was one of the first programs Ken Thompson ran on the
PDP-11, and people have used
dc to implement Diffie-Hellman as a
protest against unconstitutional crypto export restrictions.
In other words,
dc is a historically significant language and calculator; it
might be useful to know it.
So let’s break this script down line-by-line.
But in order to do that, we need to learn some basics.
dc is a stack-oriented language, like Forth.
And it’s not nearly as well-designed.
This means that you push things on the stack, in a certain order, to do things with them.
The way this is done is by Reverse Polish Notation, like old HP calculators.
I still have one of those old HP calculators, the one I used in Junior High, and even though I wrote my first program on a TI-84 (which I also still have), I am still nostalgically fond of that calculator.
Now that I know
dc code, I should probably attempt to write a program on that
Also, read that link about HP calculators. It’s a treasure.
dc has variables, called “registers.” In some
register names can only be one character/byte. In some, you can get access to
actual variable names.
dc has strings, but they do not look like strings in most programming
languages. They are bounded by square brackets:
Fourth, strings can contain other strings. (The brackets have to be balanced.)
Fifth, strings can be executed. This means that when looking at a string, you have to guess whether it is supposed to be printed, executed, or both. Often, that is easy, but not necessarily.
And finally, the combination of the above two things is how you create
statements, loops, and functions in
For more details about how to create those things, see my post about
But even though we understand
dc code now, it would serve us well to reformat
dc code is no easy task.
The reason is that newlines don’t matter, most of the time. There are times when adding or removing a newline in the wrong place can completely change the meaning of the code.
dc’s case, you must know what commands take a register name, and which do
If you are using my
dc, you can check the manpage, but in this case, I will do
the formatting for you.
#!/bin/sh if test $# -lt 1 then echo usage: $0 year exit 1 fi echo $* \ '[ d d sf [ lf p [too early ] P q ] s@ 1583 >@ d d d 19 % 1 + sg 100 / 1 + d 3 * 4 / 12 - sx 8 * 5 + 25 / 5 - sz 5 * 4 / lx - 10 - sd lg 11 * 20 + lz + lx - 30 % d [ 30 + ] s@ 0 >@ d [ [ 1 + ] s@ lg 11 <@ ] s@ 25 =@ d [ 1 + ] s@ 24 =@ se 44 le - d [ 30 + ] s@ 21 >@ d ld + 7 % - 7 + [March ] sm d [ 31- [April ] sm ] s@ 31 <@ p sn lm P p sn 1 z >p ] sp lp x' \ | dc \ | tr '\012' ' ' echo ''
Yes, that is one command per line. That is how many commands there are in that amount of code.
That is why people sometimes use it for code golf.
But I would still like to make one more change. The two lines
[too early and
]P are ugly, but they are that way because that string is supposed to be
printed, and the programmer wanted it printed with a newline. Adding the newline
directly does that.
However, there is a way to get around that, resulting in this code:
#!/bin/sh if test $# -lt 1 then echo usage: $0 year exit 1 fi echo $* \ '[ d d sf [ lf p [too early] P 10 P q ] s@ 1583 >@ d d d 19 % 1 + sg 100 / 1 + d 3 * 4 / 12 - sx 8 * 5 + 25 / 5 - sz 5 * 4 / lx - 10 - sd lg 11 * 20 + lz + lx - 30 % d [ 30 + ] s@ 0 >@ d [ [ 1 + ] s@ lg 11 <@ ] s@ 25 =@ d [ 1 + ] s@ 24 =@ se 44 le - d [ 30 + ] s@ 21 >@ d ld + 7 % - 7 + [March ] sm d [ 31- [April ] sm ] s@ 31 <@ p sn lm P p sn 1 z >p ] sp lp x' \ | dc \ | tr '\012' ' ' echo ''
Just trust me that
10P will print a newline.
The full script can be downloaded from here.
Now that we have formatted the code, let’s look at it line-by-line.
First, let’s look at the last two lines of the entire script:
| tr '\012' ' '
Obviously, this translates all newlines
dc prints into spaces.
And if it wasn’t obvious, that just prints a newline.
Now, all the shell lines:
#!/bin/sh if test $# -lt 1 then echo usage: $0 year exit 1 fi echo $* \
The first line is obvious.
The second through sixth lines just check the script argument.
The last line just echoes the year into
dc, which, since
dc is stack-based,
makes the year the first thing on the stack.
The very first line is this:
Simply put, that is just the start of the
dc code (
') and the start of a
[). When we get to the end, this long string will be executed, so in
essence, this string is still code.
In this case, think of it more like the opening brace of a function.
d d sf
d command copies the item on the top of the stack and pushes the
copy onto the stack. In other words, it “duplicates” the top of the stack.
There are two
d commands, so whatever is on the top of the stack will be
duplicated twice, leaving three total copies on the stack.
And what was on the top of the stack? The year; it was inserted by the shell code.
So at the end of the first two lines, there are three copies of the year on the top of the stack.
But the next command is
s, which stores the top of the stack into a register.
In this case, the script only uses one-character names, so the year is stored
At the end of the
s command; we have two copies of the year on the stack, and
the year is also stored in the
[ lf p [too early] P 10 P q ] s@
This bunch of code is much more complicated, and that’s not just because there is more of it.
This is our first example of a full string (two, actually).
First, the larger outer string is pushed onto the stack.
s command stores the string into the
dc’s register names are only one character, they can be basically
any character, even non-alpha-numeric ones. This includes whitespace,
This is why inserting a newline into the wrong place can change the meaning of code.
We’ll come back to the contents of the string in a minute because first, let’s talk about these two lines:
This tiny bit of code is the most complicated yet. It introduces string execution. In fact, it introduces conditional execution.
The number is just pushed onto the stack, but the next command is
>. This is a
command that pops the top two items off the stack and conditionally executes the
contents of the register after it.
In this case, the condition is that the value of the top of the stack is greater than the value of the second-to-top item on the stack.
While that makes sense if you don’t think about it, it is actually backwards from how other programming languages do it. So don’t think about it too hard.
Since the top of the stack is
1583, and the second-to-top is the year, this
command will execute the contents of the
@ register if
1583 is greater than
I suspect that the year 1583 is chosen because it was the first year under the Gregorian Calendar that Easter was celebrated.
What are the contents of the
@ register? The string I skipped over earlier, so
let’s look at it now.
lf p [too early] P 10 P q
l command pushes the contents of the given register onto the stack, so the
first command will push the contents of the
f register onto the stack.
f contains the last duplicate of the year, so the year is pushed onto the
p command prints the item on top of the stack, but it does not pop the
item off of the stack. It also prints a newline after. This is one of the
[too early] line pushes that string on the stack, and the
prints it without a newline.
P lines print a newline, and the
q command quits
Of course, since
tr translates all of the newlines into spaces, and because it
comes after the text, and because a newline is printed after
dc exits anyway
echo ''), the
P lines could be removed.
In essence, this first part of the script is an just an error check, an
statement, but that is not obvious at first glance.
The next lines are:
d d d
The item on the top of the stack is duplicated three times.
What’s on the top of the stack? The year. Again.
There used to be three, but one was used for storing into the
f register, and
one was used to do the comparison with
So after the duplications, there are four copies of the year on the stack.
The next lines begin the algorithm in earnest.
This appears to be calculating the variable
a (year mod 19), which appears in
all of the algorithms on Wikipedia.
Remember the pattern of “number, operation” because it appears again and again.
I cannot figure out this line. In none of the algorithms listed in Wikipedia is
a is incremented.
a is popped from the stack and stored in register
This divides the year (which is the top of the stack again) by 100, which
appears to calculate
k from Gauss’ algorithm, or
b from the later
1 + d
Once again, the variable is incremented. This is confusing again because Gauss'
algorithm does not increment
k, and the others do not increment
But whatever the value is, it is also duplicated.
3 * 4 /
The duplicate value is multiplied by
3, and the result is divided by
At this point, it is starting to look like Gauss’ algorithm is the one in use, with changes, of course.
12 - sx
12 is subtracted from the result, and that result is popped and stored
Now, before we go any further, we have to answer the question that you always have to answer for stack-based languages: what’s on the top of the stack?
The value that was incremented after getting divided by
With that in mind, let’s continue:
8 * 5 + 25 / 5 - sz
That value is multiplied by
8, added to
5, divided by
25, and has
subtracted from it before it is popped and stored in the
Looking at the algorithms, the Gauss algorithm is still in the running, but the 1961 New Scientist algorithm still seems to have a chance.
Again we ask the question: what’s on the top of the stack?
It’s the year again; there are still two copies.
5 * 4 /
The year is multiplied by
5, and the result is divided by
4, and it is left
on the top of the stack.
Next, the value in the
x register is pushed onto the stack, and we should
remind ourselves what is in
x: the value that was multiplied by
4, and had
12 subtracted from it.
- 10 - sd
First, the value loaded from
x is subtracted from our previous value, then
10 is subtracted from that, before that result is stored in the
Again, what is on the top of the stack?
It’s the last copy of the year, but it turns out that it doesn’t matter because…
lg 11 * 20 +
The value in the
g register is loaded (the one modded by
incremented), multiplied by
11, and added to
lz + lx - 30 % d
The value in
z (the one divided by
25 and less
5) s loaded and added to
the value that was already there. Then the value in
x is loaded (mulitplied by
3, divided by
4, and less
12), and that is subtracted from the value that
had been on top before all of that is modded by
30 and duplicated.
At this point, I have lost what the algorithm was…the one used here is certainly not on Wikipedia.
But what is on the stack?
Debugging tip: to figure out what is on the stack at a certain point, I just add
fq after the command I am interested in.
f command prints the contents of the stack, and
q will quit
dc in most
cases. If there are more than 3 strings executing, you would put that number of
strings on the stack and then run the
All that’s on the stack is the two copies of the result just calculated and the year.
[ 30 + ] s@
Next, a string is pushed onto the stack. This doesn’t look like a string you
would want to print, so it’s probably code. And it is stored into the
Another string was stored in the
@ register, the error check string at the
beginning. It doesn’t matter now, so the program can safely overwrite it.
0 is pushed onto the stack, compared against the former top of the stack,
and if it is greater (i.e., the former top of the stack is less than 0), the
string just stored in
@ is executed (just another
if statement), which just
adds 30 to the top of the stack.
Remember that the items that were compared were removed from the top of the
stack, so the current top of the stack is the value that was modded by
duplicated. This was, in fact, why it was duplicated.
d [ [ 1 + ] s@ lg 11 <@ ] s@
The value on top of the stack, the value that was just added to
duplicated and another string that looks like code is pushed on the stack before
being stored in the
Next, the number on top of the stack is compared to
25, and if they are equal,
the contents of
@ are executed, so let’s go back to that.
[ 1 + ] s@ lg 11 <@
First, a string is pushed onto the stack and stored in
It looks like whoever wrote this code loved to use the
@ register to store
code to execute in
Then the contents of the
g register are loaded, and if
11 is less than that,
that string is executed. Obviously, that string just increments the top of the
What is the top of the stack at that point? It’s the value that was duplicated before the first outer string was pushed, so that’s the value that could be incremented.
d [ 1 + ] s@ 24 =@
That value, which may have been incremented or not, is duplicated before yet
if statement. This time, it compares it against
24, and if they are
equal, the original (not the duplicate, which is gone) is incremented.
The pattern of “duplication,
if statement” is a good one to remember. It will
pop up again.
se 44 le -
Next, that value is stored in the
e register before
44 is pushed onto the
stack and the value in
e is pushed onto the stack again before being
It was pushed onto the stack again because in subtraction, order of the operands matters.
d [ 30 + ] s@ 21 >@
Here is the duplication,
if statement pattern again. I will leave it to the
reader to analyze it in detail.
d ld + 7 % - 7 +
The value on top is duplicated before the contents of the
d register are
pushed. Then they are added before the patter of “number, operation” is done
once. Then, the result is subtracted from the original before yet another
“number, operation” pattern.
[March ] sm
March is stored into the
d [ 31- [April ] sm ] s@
The value on top of the stack is duplicated before another string is put into
@. It looks like it’s going to be another
Yes, it is. If the top of the stack is greater than 31, then we execute that string.
31- [April ] sm
31 is subtracted from the top of the stack (the original because the duplicate
was consumed), and then
April is stored into
In other words, the top of the stack is the day, and if it’s greater than
(the number of days in March), then Easter will be in April, and we need to
adjust the day.
p prints the current item on the stack, which is the day, followed by a
newline (later changed by
tr). It does not, however, consume the day, so next,
it is stored into the
First, the contents of
m are loaded, which is the month, and it is popped off
the stack and printed (without a newline).
The item on top of the stack (which is the year, by the way) is printed, popped,
and stored into
1 z >p
These lines push
1 onto the stack, then they push the number of items on the
stack onto the stack.
It pushes the length of the stack at the start of the command.
Then if the stack length is greater than
1, the contents of
p are executed.
In other words, these lines are checking that the stack was empty before their
execution, and if they were not, it executes the contents of
What are the contents of
p? We need to continue to find out.
] sp lp x' \
Here, that long string is finished and stored into
p. Then it is loaded again
(this is done to ensure there is a copy of it) and executed with the command
x, which unconditionally executes a string.
So those stack-checking lines above will end up re-executing
p if the stack
was not empty.
And just like that, I know where the bug is in my
dc: it’s not popping items
off the stack properly when printing them with the
The bug was fixed by commit
These lines would not have been necessary, by the way, if it had not been for
the stack check, which is the very reason it found a bug in my
I hope you enjoyed this tour of the historically significant and hacker-like
dc calculator. I don’t know that there was any point to it other than for me
to add to the oral tradition of software engineering.
And to debug my