Please see the disclaimer.

Introduction

Oh, the irony!

Merely hours after I posted my last post, in which I said, “I have found more bugs than users have,” I got an email from a FreeBSD user (where my bc and dc are default) reporting a bug in my dc.

What I claimed in my previous post is still true; I found and fixed two bugs in the past two weeks after a month of inactivity.

The Script

The bug was found by running the following script (found here) with my dc:

#!/bin/sh
if test $# -lt 1
then
	echo usage: $0 year
	exit 1
fi
echo $* '[ddsf[lfp[too early
]Pq]s@1583>@
ddd19%1+sg100/1+d3*4/12-sx8*5+25/5-sz5*4/lx-10-sdlg11*20+lz+lx-30%
d[30+]s@0>@d[[1+]s@lg11<@]s@25=@d[1+]s@24=@se44le-d[30+]s@21>@dld+7%-7+
[March ]smd[31-[April ]sm]s@31<@psnlmPpsn1z>p]splpx' | dc | tr '\012' ' '
echo ''

Wow, that is unreadable! The truth is that dc code is, in my opinion, the best (or worst) example of a write-only language. It’s even worse than Perl!

And even though Wikipedia claims that the term is used in humor, I am not joking. My evidence is just how much work it took to analyze this 233-byte dc program.

Of course, there is a reason for that: an RPN parser is dead simple.

As proof, compare the size of my dc's lexer and parser to my bc's lexer and parser.

There is more code to the dc's lexer and parser, but bc's lexer and parser uses that code as well.

This fact meant that dc was one of the first programs Ken Thompson ran on the PDP-11, and people have used dc to implement Diffie-Hellman as a protest against unconstitutional crypto export restrictions.

In other words, dc is a historically significant language and calculator; it might be useful to know it.

So let’s break this script down line-by-line.

Basics

But in order to do that, we need to learn some basics.

dc Concepts

First, dc is a stack-oriented language, like Forth.

And it’s not nearly as well-designed.

This means that you push things on the stack, in a certain order, to do things with them.

The way this is done is by Reverse Polish Notation, like old HP calculators.

I still have one of those old HP calculators, the one I used in Junior High, and even though I wrote my first program on a TI-84 (which I also still have), I am still nostalgically fond of that calculator.

Now that I know dc code, I should probably attempt to write a program on that HP.

Also, read that link about HP calculators. It’s a treasure.

Second, dc has variables, called “registers.” In some dc implementations, register names can only be one character/byte. In some, you can get access to actual variable names.

Third, dc has strings, but they do not look like strings in most programming languages. They are bounded by square brackets: [].

Fourth, strings can contain other strings. (The brackets have to be balanced.)

Fifth, strings can be executed. This means that when looking at a string, you have to guess whether it is supposed to be printed, executed, or both. Often, that is easy, but not necessarily.

And finally, the combination of the above two things is how you create if statements, loops, and functions in dc.

For more details about how to create those things, see my post about dc tricks.

Reformatting

But even though we understand dc code now, it would serve us well to reformat it.

And reformatting dc code is no easy task.

The reason is that newlines don’t matter, most of the time. There are times when adding or removing a newline in the wrong place can completely change the meaning of the code.

In dc's case, you must know what commands take a register name, and which do not.

If you are using my dc, you can check the manpage, but in this case, I will do the formatting for you.

#!/bin/sh
if test $# -lt 1
then
	echo usage: $0 year
	exit 1
fi
echo $* \
'[
	d
	d
	sf
	[
		lf
		p
		[too early
]
		P
		q
	]
	s@
	1583
	>@
	d
	d
	d
	19
	%
	1
	+
	sg
	100
	/
	1
	+
	d
	3
	*
	4
	/
	12
	-
	sx
	8
	*
	5
	+
	25
	/
	5
	-
	sz
	5
	*
	4
	/
	lx
	-
	10
	-
	sd
	lg
	11
	*
	20
	+
	lz
	+
	lx
	-
	30
	%
	d
	[
		30
		+
	]
	s@
	0
	>@
	d
	[
		[
			1
			+
		]
		s@
		lg
		11
		<@
	]
	s@
	25
	=@
	d
	[
		1
		+
	]
	s@
	24
	=@
	se
	44
	le
	-
	d
	[
		30
		+
	]
	s@
	21
	>@
	d
	ld
	+
	7
	%
	-
	7
	+
	[March ]
	sm
	d
	[
		31-
		[April ]
		sm
	]
	s@
	31
	<@
	p
	sn
	lm
	P
	p
	sn
	1
	z
	>p
]
sp
lp
x' \
| dc \
| tr '\012' ' '
echo ''

Yes, that is one command per line. That is how many commands there are in that amount of code.

That is why people sometimes use it for code golf.

But I would still like to make one more change. The two lines [too early and ]P are ugly, but they are that way because that string is supposed to be printed, and the programmer wanted it printed with a newline. Adding the newline directly does that.

However, there is a way to get around that, resulting in this code:

#!/bin/sh
if test $# -lt 1
then
	echo usage: $0 year
	exit 1
fi
echo $* \
'[
	d
	d
	sf
	[
		lf
		p
		[too early]
		P
		10
		P
		q
	]
	s@
	1583
	>@
	d
	d
	d
	19
	%
	1
	+
	sg
	100
	/
	1
	+
	d
	3
	*
	4
	/
	12
	-
	sx
	8
	*
	5
	+
	25
	/
	5
	-
	sz
	5
	*
	4
	/
	lx
	-
	10
	-
	sd
	lg
	11
	*
	20
	+
	lz
	+
	lx
	-
	30
	%
	d
	[
		30
		+
	]
	s@
	0
	>@
	d
	[
		[
			1
			+
		]
		s@
		lg
		11
		<@
	]
	s@
	25
	=@
	d
	[
		1
		+
	]
	s@
	24
	=@
	se
	44
	le
	-
	d
	[
		30
		+
	]
	s@
	21
	>@
	d
	ld
	+
	7
	%
	-
	7
	+
	[March ]
	sm
	d
	[
		31-
		[April ]
		sm
	]
	s@
	31
	<@
	p
	sn
	lm
	P
	p
	sn
	1
	z
	>p
]
sp
lp
x' \
| dc \
| tr '\012' ' '
echo ''

Just trust me that 10P will print a newline.

The full script can be downloaded from here.

Analysis

Now that we have formatted the code, let’s look at it line-by-line.

Shell

First, let’s look at the last two lines of the entire script:

| tr '\012' ' '

Obviously, this translates all newlines dc prints into spaces.

echo ''

And if it wasn’t obvious, that just prints a newline.

Now, all the shell lines:

#!/bin/sh
if test $# -lt 1
then
	echo usage: $0 year
	exit 1
fi
echo $* \

The first line is obvious.

The second through sixth lines just check the script argument.

The last line just echoes the year into dc, which, since dc is stack-based, makes the year the first thing on the stack.

dc

The very first line is this:

'[

Simply put, that is just the start of the dc code (') and the start of a string ([). When we get to the end, this long string will be executed, so in essence, this string is still code.

In this case, think of it more like the opening brace of a function.

Preparation

	d
	d
	sf

Next, the d command copies the item on the top of the stack and pushes the copy onto the stack. In other words, it “duplicates” the top of the stack.

There are two d commands, so whatever is on the top of the stack will be duplicated twice, leaving three total copies on the stack.

And what was on the top of the stack? The year; it was inserted by the shell code.

So at the end of the first two lines, there are three copies of the year on the top of the stack.

But the next command is s, which stores the top of the stack into a register. In this case, the script only uses one-character names, so the year is stored into register f.

At the end of the s command; we have two copies of the year on the stack, and the year is also stored in the f register.

	[
		lf
		p
		[too early]
		P
		10
		P
		q
	]
	s@

This bunch of code is much more complicated, and that’s not just because there is more of it.

This is our first example of a full string (two, actually).

First, the larger outer string is pushed onto the stack.

Then, the s command stores the string into the @ register.

Because dc's register names are only one character, they can be basically any character, even non-alpha-numeric ones. This includes whitespace, especially newlines.

This is why inserting a newline into the wrong place can change the meaning of code.

We’ll come back to the contents of the string in a minute because first, let’s talk about these two lines:

	1583
	>@

This tiny bit of code is the most complicated yet. It introduces string execution. In fact, it introduces conditional execution.

The number is just pushed onto the stack, but the next command is >. This is a command that pops the top two items off the stack and conditionally executes the contents of the register after it.

In this case, the condition is that the value of the top of the stack is greater than the value of the second-to-top item on the stack.

While that makes sense if you don’t think about it, it is actually backwards from how other programming languages do it. So don’t think about it too hard.

Since the top of the stack is 1583, and the second-to-top is the year, this command will execute the contents of the @ register if 1583 is greater than the year.

I suspect that the year 1583 is chosen because it was the first year under the Gregorian Calendar that Easter was celebrated.

What are the contents of the @ register? The string I skipped over earlier, so let’s look at it now.

		lf
		p
		[too early]
		P
		10
		P
		q

The l command pushes the contents of the given register onto the stack, so the first command will push the contents of the f register onto the stack.

f contains the last duplicate of the year, so the year is pushed onto the stack again.

The p command prints the item on top of the stack, but it does not pop the item off of the stack. It also prints a newline after. This is one of the newlines that tr translates.

The [too early] line pushes that string on the stack, and the P command prints it without a newline.

Then the 10 and P lines print a newline, and the q command quits dc.

Of course, since tr translates all of the newlines into spaces, and because it comes after the text, and because a newline is printed after dc exits anyway (by echo ''), the 10 and P lines could be removed.

In essence, this first part of the script is an just an error check, an if statement, but that is not obvious at first glance.

Algorithm

The next lines are:

	d
	d
	d

The item on the top of the stack is duplicated three times.

What’s on the top of the stack? The year. Again.

There used to be three, but one was used for storing into the f register, and one was used to do the comparison with 1583.

So after the duplications, there are four copies of the year on the stack.

The next lines begin the algorithm in earnest.

	19
	%

This appears to be calculating the variable a (year mod 19), which appears in all of the algorithms on Wikipedia.

Remember the pattern of “number, operation” because it appears again and again.

	1
	+

I cannot figure out this line. In none of the algorithms listed in Wikipedia is a is incremented.

	sg

Incremented variable a is popped from the stack and stored in register g.

	100
	/

This divides the year (which is the top of the stack again) by 100, which appears to calculate k from Gauss’ algorithm, or b from the later algorithms.

	1
	+
	d

Once again, the variable is incremented. This is confusing again because Gauss’ algorithm does not increment k, and the others do not increment b.

But whatever the value is, it is also duplicated.

	3
	*
	4
	/

The duplicate value is multiplied by 3, and the result is divided by 4.

At this point, it is starting to look like Gauss’ algorithm is the one in use, with changes, of course.

	12
	-
	sx

Next, 12 is subtracted from the result, and that result is popped and stored into the x register.

Now, before we go any further, we have to answer the question that you always have to answer for stack-based languages: what’s on the top of the stack?

The value that was incremented after getting divided by 100.

With that in mind, let’s continue:

	8
	*
	5
	+
	25
	/
	5
	-
	sz

That value is multiplied by 8, added to 5, divided by 25, and has 5 subtracted from it before it is popped and stored in the z register.

Looking at the algorithms, the Gauss algorithm is still in the running, but the 1961 New Scientist algorithm still seems to have a chance.

Again we ask the question: what’s on the top of the stack?

It’s the year again; there are still two copies.

	5
	*
	4
	/

The year is multiplied by 5, and the result is divided by 4, and it is left on the top of the stack.

	lx

Next, the value in the x register is pushed onto the stack, and we should remind ourselves what is in x: the value that was multiplied by 3, divided by 4, and had 12 subtracted from it.

	-
	10
	-
	sd

First, the value loaded from x is subtracted from our previous value, then 10 is subtracted from that, before that result is stored in the d register.

Again, what is on the top of the stack?

It’s the last copy of the year, but it turns out that it doesn’t matter because…

	lg
	11
	*
	20
	+

The value in the g register is loaded (the one modded by 19 and incremented), multiplied by 11, and added to 20.

	lz
	+
	lx
	-
	30
	%
	d

The value in z (the one divided by 25 and less 5) s loaded and added to the value that was already there. Then the value in x is loaded (mulitplied by 3, divided by 4, and less 12), and that is subtracted from the value that had been on top before all of that is modded by 30 and duplicated.

At this point, I have lost what the algorithm was…the one used here is certainly not on Wikipedia.

But what is on the stack?

Debugging tip: to figure out what is on the stack at a certain point, I just add fq after the command I am interested in.

The f command prints the contents of the stack, and q will quit dc in most cases. If there are more than 3 strings executing, you would put that number of strings on the stack and then run the Q command.

All that’s on the stack is the two copies of the result just calculated and the year.

	[
		30
		+
	]
	s@

Next, a string is pushed onto the stack. This doesn’t look like a string you would want to print, so it’s probably code. And it is stored into the @ register.

Another string was stored in the @ register, the error check string at the beginning. It doesn’t matter now, so the program can safely overwrite it.

	0
	>@

Then 0 is pushed onto the stack, compared against the former top of the stack, and if it is greater (i.e., the former top of the stack is less than 0), the string just stored in @ is executed (just another if statement), which just adds 30 to the top of the stack.

Remember that the items that were compared were removed from the top of the stack, so the current top of the stack is the value that was modded by 30 and duplicated. This was, in fact, why it was duplicated.

	d
	[
		[
			1
			+
		]
		s@
		lg
		11
		<@
	]
	s@

The value on top of the stack, the value that was just added to 30 is duplicated and another string that looks like code is pushed on the stack before being stored in the @ register.

	25
	=@

Next, the number on top of the stack is compared to 25, and if they are equal, the contents of @ are executed, so let’s go back to that.

		[
			1
			+
		]
		s@
		lg
		11
		<@

First, a string is pushed onto the stack and stored in @.

It looks like whoever wrote this code loved to use the @ register to store code to execute in if statements.

Then the contents of the g register are loaded, and if 11 is less than that, that string is executed. Obviously, that string just increments the top of the stack.

What is the top of the stack at that point? It’s the value that was duplicated before the first outer string was pushed, so that’s the value that could be incremented.

	d
	[
		1
		+
	]
	s@
	24
	=@

That value, which may have been incremented or not, is duplicated before yet another if statement. This time, it compares it against 24, and if they are equal, the original (not the duplicate, which is gone) is incremented.

The pattern of “duplication, if statement” is a good one to remember. It will pop up again.

	se
	44
	le
	-

Next, that value is stored in the e register before 44 is pushed onto the stack and the value in e is pushed onto the stack again before being subtracted from 44.

It was pushed onto the stack again because in subtraction, order of the operands matters.

	d
	[
		30
		+
	]
	s@
	21
	>@

Here is the duplication, if statement pattern again. I will leave it to the reader to analyze it in detail.

	d
	ld
	+
	7
	%
	-
	7
	+

The value on top is duplicated before the contents of the d register are pushed. Then they are added before the patter of “number, operation” is done once. Then, the result is subtracted from the original before yet another “number, operation” pattern.

	[March ]
	sm

The string March is stored into the m register.

	d
	[
		31-
		[April ]
		sm
	]
	s@

The value on top of the stack is duplicated before another string is put into @. It looks like it’s going to be another if statement.

	31
	<@

Yes, it is. If the top of the stack is greater than 31, then we execute that string.

		31-
		[April ]
		sm

31 is subtracted from the top of the stack (the original because the duplicate was consumed), and then April is stored into m.

In other words, the top of the stack is the day, and if it’s greater than 31 (the number of days in March), then Easter will be in April, and we need to adjust the day.

	p
	sn

The p prints the current item on the stack, which is the day, followed by a newline (later changed by tr). It does not, however, consume the day, so next, it is stored into the n register.


	lm
	P

First, the contents of m are loaded, which is the month, and it is popped off the stack and printed (without a newline).

	p
	sn

The item on top of the stack (which is the year, by the way) is printed, popped, and stored into n.

	1
	z
	>p

These lines push 1 onto the stack, then they push the number of items on the stack onto the stack.

It pushes the length of the stack at the start of the command.

Then if the stack length is greater than 1, the contents of p are executed.

In other words, these lines are checking that the stack was empty before their execution, and if they were not, it executes the contents of p.

What are the contents of p? We need to continue to find out.

End

]
sp
lp
x' \

Here, that long string is finished and stored into p. Then it is loaded again (this is done to ensure there is a copy of it) and executed with the command x, which unconditionally executes a string.

So those stack-checking lines above will end up re-executing p if the stack was not empty.

And just like that, I know where the bug is in my dc: it’s not popping items off the stack properly when printing them with the P command.

The bug was fixed by commit 34ca7b50d1c0.

These lines would not have been necessary, by the way, if it had not been for the stack check, which is the very reason it found a bug in my dc.

Conclusion

I hope you enjoyed this tour of the historically significant and hacker-like dc calculator. I don’t know that there was any point to it other than for me to add to the oral tradition of software engineering.

And to debug my dc without gdb.