Please see the disclaimer.
By this point, it should be well-known among my non-existent readership that I
wrote an implementation of POSIX
bc. It is less well-known that I work
for SchedMD, LLC supporting Slurm, the number one supercomputer
When submitting a job to a Slurm “cluster,” we can either submit a normal executable or a shell script. When I am working with Slurm on a day-to-day basis, I usually use shell scripts.
I did not need a non-seeded PRNG because Linux has
/dev/urandom, which is
also as close to cryptographically secure as I could get in shell.
bash, which is the shell I use, has the special variable,
but it is only guaranteed to return 15 bits of random data. Also, because it
probably uses C's
rand(), which is low quality, the random stream is not
good. And because C's
rand() is not good (and also is only guaranteed to
return 15 bits of random data), I can't just write a simple C program to
generate random numbers either.
After thinking about it awhile, I realized that a pseudorandom number generator
bc would be worth the effort.
There are a few reasons for this:
bccould make sure the output is unbiased when generating a random number within a range.
bccould generate integers of any size, since it can handle integers of any size.
bccould generate reals with arbitrary precision as well.
So I got to work.
Choosing the Algorithm
The first step was choosing which PRNG algorithm I would use. I had a few requirements:
- The PRNG should be a family of PRNG's able to generate either 32 or 64 bits
of random data on each call. This is because my
bccan be compiled for 32-bit and 64-bit platforms.
- The PRNG should be relatively fast.
- The PRNG should at least pass BigCrush and PractRand.
Starting from this list, I began doing my research.
I first came across this critique of the PCG generators, and I must admit, it took me a bit to realize that it was written by the author of the xoroshiro generators. Once I did, I started looking for other sources.
This link says that some members of the xoroshiro family fail BigCrush. It also says that the PCG family pass BigCrush.
This link says that the xoroshiro family failed PractRand, but that the PCG generators passed.
There were more, but I did not find anyone who managed to break the PCG generators after trying, while there were several people who managed to show instances of the xoroshiro generators not passing the PRNG test suites.
The difference in tone between the critique and the response was…enlightening. The critique had vitriol, but the response showed a willingness to admit mistakes and learn.
Then I began reading the blog by the PCG author, and I learned more about PRNG's than I ever expected too.
When it came down to it, I was leaning towards the PCG generators, but I freely admit that I was emotionally swayed, by the difference in tone between the two authors plus the plethora of information on the PCG blog, to choose the PCG family of PRNG's.
However, which PCG generators I use was still important; Melissa O'Neill,
the author of the PCG generators, has made it clear which of the generators she
would recommend for general-purpose use when speed isn't paramount and
prediction difficulty is needed:
pcg32 for 32 bits and
pcg64 for 64 bits.
And those were the ones I went with. No, they are not the fastest (thought they are still fast), but I trust, from what was written by other, unbiased experts, that the quality of those generators would be as high as I could get in 32 or 64 bits.
Designing the API
The next step was designing the API, which was a surprisingly hard thing to do.
Well, designing the way to generate random numbers was easy: built-in functions.
For querying the maximum value generated by the PRNG, I added the
maxrand() function. Users can use it to see what the maximum integer generated
rand() built-in function.
rand() is the function that generates a random number without any extra
processing. It just generates the 64- or 32-bit random integer, turns it into a
bc understands, and returns it.
irand(x) takes an integer
x and returns an integer in the range
When doing so, it removes bias. It can generate integers of arbitrary size as
x is greater than the value returned by
maxrand(), it generates
more than one integer, multiplying some by correct powers to grow the integer to
the correct size. So
irand(x) can generate a massive integer if
These three built-in functions are also available as commands in
From there, I added functions to
bc's extended math library, instead of
built-in functions. The reason for this is because all of these other functions
could easily be implemented with the built-in functions.
frand(p)generates a real in the range
pdigits after the decimal point.
ifrand(x,p)does a combination of
xwith its sign flipped with a probability of
0.5. In other words, it randomizes the sign of
brand(): returns a random boolean value.
The implementation of these three functions is in
The API Problems
Didn't I say that designing the API was hard? Yes, but only one thing was actually hard: handling the seed.
bc should be able to both query and set the seed. And that's only
part of what made it difficult.
In the PCG family of generators, there are actually two parts to the seed: the
seed itself and the “stream selector." Each of these has twice the number
of bits than the PRNG returns on every call,1 e.g.,
pcg64's seed and stream
selector both have 128 bits, and
pcg32's seed and stream have 64 bits.
These aspects caused two problems:
- I needed 128-bit math for the 64-bit
- The seed in the
bcneeded to be two separate values.
While POSIX, which is the standard my
bc depends on for portability, requires
implementations to have a
uint64_t type (required for a 32-bit
implementation), it does not require them to have a
uint128_t type. Thus, I
could not depend on having 128-bit math available for my
bc, even though
clang provide 128-bit math most of the time. I implemented the required
math by hand, though my build system detects if 128-bit math is already
available and uses it if it is.
This solved problem 1.
Two Values for the Seed
The other problem, two values for the seed, was much harder.
I knew that I would need to make the seed values special variables, like
scale. That way, they could both be set (by assignment)
and queried (by putting them in an expression).
However, I did not like the thought of adding two new variables, mostly because it would make an awkward upgrade path for my users if I later switched to an algorithm that only needed one seed value since one special variable would become a regular variable.
It took a long time for me to find the solution: have only one value and use both sides of the decimal point.
The current seed API for my
bc is this: there is a special value called
seed, and unlike
scale, which must all be integers,
seed can be, and always is (because of a quirk in the PCG family), a real
number. The integer portion is converted to a 128-bit (for 64-bit) or 64-bit
(for 32-bit) integer and is used as the stream selector. The non-integer portion
is multiplied by a certain value and truncated to turn it into a 128-bit (for
64-bit) or 64-bit (for 32-bit) integer and is used as the seed value.
Do NOT make
bc scripts that depend on the above information because it
could become invalid if I change the PRNG implementation later.
Any guarantees that
bc provides regarding the PRNG API are spelled out clearly
bc manual, which can be accessed by
man bc in the shell if you
installed it with man pages. These guarantees will remain the same regardless of
which PRNG algorithm is used.
But even that wasn't all.
bc has an option,
-g that turns globals into stacks. This means
that if a global is changed in a function, the change is not seen outside the
This is one of my
bc's greatest features because it makes writing
easier; however, it creates a problem with the PRNG because
seed is changed
every time the PRNG is used, which should be visible to other functions.
I struggled for a long time, and finally came up with an answer: if
assigned to in a function, then any calls to the PRNG after the assignment do
seed's value outside the function. If the PRNG is used before an
seed in a function, then changes to
seed are visible outside
Seems simple enough, but the code to implement it was not.
After figuring that out, it was just of matter of implementing it and the rest
of my design, which took some time to do while following the style I already had
bc. It also took some time to test.
To do this, I had to do many things that were new and interesting. Here are some of them.
- As mentioned before, I had to implement 128-bit math.
- I learned how to unbias a PRNG when generating with a smaller range.
- I had to make sure the
seedwas set if it was queried before any random number was generated.
- I had to learn how to generate an integer of arbitrary size.
That last one is the most interesting to me because I almost got it wrong. Twice. In fact, I did get it wrong for a time.
At first, I thought that I could just generate a bunch of integers in the range
[0,maxrand()], or in other words, just use
rand(), and only the integer
that was placed at the most significant position had to be generated with a
smaller range. That is not quite the case, but it was close.
However, the second mistake I made was thinking that it was not close. My next
try was to generate a number, but each integer portion had to be less than the
one in the target. This was completely wrong;
irand(2^64+1) would only
generate two numbers,
My next try was to check each to see if each integer portion was larger than its equivalent in the target. If it was, I subtracted one from the next portion, which was more significant.
Anyway, I think I got it right this time, but I may be wrong. Bug reports are welcome.
Status of the PRNG
After I did all of this work, readers may be surprised to know that I am not
putting out an official release of
bc with the PRNG. The reason is that I
asked the majority of my users if they had a use for it, and the answer was
unanimous: no, they did not.
This change bloats
bc by almost 5 kb when building with size optimizations;
that is massive, and since
bc is not really used nowadays (most people use
Python or another arbitrary-precision calculator), except to build the Linux
kernel, I can't justify putting it in a release, especially when some of my
users want a small
That said, if you want
bc with its PRNG, you can build it from the
branch, and you can rest assured that it is not disappearing anytime soon
because I use it. It is also heavily tested, just like real releases are.
And if, over time, more and more users decide that they want to use it, then maybe someday, I will put out an official release with the PRNG.
So if you decide to use it, let me know.