I have no clue if this idea is original or not. It may not be.

On Sunday, I had a flash of insight: what if we could use the Greatest Common Divisor (GCD) algorithm to factor an integer N? How could we make it do that?

The answer was clear: make sure all of the possible factors, the primes up to sqrt(N), are in one of the arguments to the GCD algorithm. In other words, we need to run GCD on N and the primorial that includes all primes up to sqrt(N).
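
Here is a toy illustration, with numbers small enough that the primorial stays manageable:

from math import gcd

# 221 = 13 * 17, and sqrt(221) is about 14.9, so the primorial of
# all primes up to 14 is 2 * 3 * 5 * 7 * 11 * 13 = 30030.
n = 221
primorial = 2 * 3 * 5 * 7 * 11 * 13
print(gcd(primorial, n))  # prints 13, a nontrivial factor of n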

Unfortunately, the primorial we want is HUGE, so that won’t work.

Maybe if we split the primes into sections and ran GCD on just the products of those sections, we could avoid the need for huge numbers.

In fact, it turns out that if we do that, the algorithm gains a few nice properties.

First, there is a bad case for the algorithm: when all of N’s factors are contained in a single section, GCD returns N, which is unhelpful. But if we build each section by multiplying in primes until the product first reaches or exceeds N, then whenever GCD returns N, the last prime multiplied into the product must be a factor. (The product is squarefree, so if that last prime were not a factor of N, then N would divide the product with that prime removed, and that smaller product is less than N, which is impossible.) So if GCD returns N, we can just take the last prime as a factor.

The whole algorithm is surprisingly small in code:

from math import gcd

def factor(n):
    # Returns a nontrivial factor of n > 1 (or n itself if n is prime).
    primes = get_next_prime()
    p = next(primes)
    while True:
        prod = 1
        # Build the first product of consecutive primes that reaches n.
        while prod < n:
            last = p
            prod *= last
            p = next(primes)
        g = gcd(prod, n)
        if g == n:
            # Bad case: all of n's factors fell into this section, and
            # the last prime multiplied in is guaranteed to be one.
            return last
        if g != 1:
            return g

This code was written by my coworker Broderick Gardner. It has been slightly edited.

Of course, this is assuming that GCD is implemented somewhere (the code above pulls in Python’s math.gcd), as well as a prime sieve in the get_next_prime() generator.
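
If you want something self-contained to play with, a naive trial-division generator works as a stand-in (a real implementation would use an actual sieve):

from itertools import count

def get_next_prime():
    # Naive stand-in for a real sieve: yields primes in order,
    # checking each candidate by trial division.
    for candidate in count(2):
        if all(candidate % d != 0 for d in range(2, int(candidate ** 0.5) + 1)):
            yield candidate

With that stand-in, factor(221) first builds the section 2 * 3 * 5 * 7 * 11 = 2310 (GCD 1), then the section 13 * 17 = 221, where GCD returns 221 itself, so the bad case kicks in and 17 comes back as a factor.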

Actually, as it turns out, this algorithm has the makings of a prime sieve as well: when you need to test a number P to see if it’s prime, you just run GCD on P and all of the relevant products (those covering the primes up to sqrt(P)).
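
A minimal sketch of that test, where section_products is assumed to hold the sectioned products covering every prime up to the square root of p:

from math import gcd

def is_prime_via_products(p, section_products):
    # p > 1 is prime exactly when it shares no factor with any
    # product of the primes up to sqrt(p).
    return all(gcd(prod, p) == 1 for prod in section_products)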

Even better, if all you want is a product of primes, you can use this modified sieve to get them directly: multiply all of the candidate numbers together until you reach a limit, then run GCD on that number and on the prime products up to the square root of the largest candidate. If the GCD is 1, all of the candidate numbers are primes. Otherwise, divide by the GCD and repeat until the GCD does return 1. At that point, you know all of the factors left inside the product are primes themselves.
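
Here is a sketch of that batched version; again, section_products is assumed to cover every prime up to the square root of the largest candidate:

from math import gcd

def batch_residue(candidates, section_products):
    # Multiply the candidates together, then strip out every prime
    # small enough to appear in the section products. Whatever is
    # left is a product of primes.
    residue = 1
    for c in candidates:
        residue *= c
    for prod in section_products:
        g = gcd(residue, prod)
        while g != 1:
            residue //= g
            g = gcd(residue, prod)
    return residue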

Of course, wheel factorization will still help.

And with all of this said, I am sure there will be a question or two:

Is it possible that the NSA is using this to crack RSA?

It’s possible, but not very likely. Using an upper bound on the number of primes below a number x, which is (x / ln(x)) * (1 + (1.2762 / ln(x))), I can estimate the amount of storage needed to store all of the primorials.

The code (for bc with the -l flag) for calculating the amount of storage is below.

/* Upper bound on the number of primes up to sqrt(x). */
define bound(x) {

	auto s, l

	s = sqrt(x)
	l = l(s)

	return s / l * (1 + (1.2762 / l))
}

/* Bytes needed to store the primorials below 2^t. */
define primorials_data(t) {

	auto i, j, sum, bits[], sizes[]

	/* Prime counts for the smallest bit sizes, set by hand. */
	sizes[0] = 0
	sizes[1] = 2
	sizes[2] = 2
	sizes[3] = 2
	sizes[4] = 5

	for (i = 5; i < t; ++i)
	{

		/* Primes of exactly i bits: the upper bound minus the
		 * primes already counted in smaller sizes. */
		sizes[i] = bound(2^i)

		for (j = 0; j < i; ++j)
		{
			sizes[i] -= sizes[j]
		}
	}

	for (i = 0; i < t; ++i)
	{
		/* i bits per prime, minus the half bit lost, on average,
		 * per multiplication (see below). */
		bits[i] = sizes[i] * i - sizes[i] / 2
	}

	sum = 0

	for (i = 0; i < t; ++i)
	{
		sum += bits[i]
	}

	/* Total size in bytes, rounded up. */
	return ceil(sum / 8, 0)
}

You need my bc with its extended math library for this code to work.

The function bound(x) just calculates the prime bound I mentioned earlier. The function primorials_data(t) calculates the amount of data needed to store the primorials below 2^t. This is how it works:

First, the prime counts for the first few bit sizes are set as constants. Then, for every bit size from 5 up to t - 1, the upper limit on the number of primes of that exact size is calculated by taking the upper bound for that size and subtracting the number of primes counted in all previous sizes. Then it calculates the number of bits needed for all sizes and sums them together, dividing by 8 at the end to give a result in bytes.

In the calculation of bits, there is a bit of code that may seem weird: sizes[i] / 2. This code is there because of the rules for the size of a product: a product’s size can either be the sum of the sizes of the operands or one bit less, and both outcomes happen with a probability of roughly 0.5. The sizes[i] / 2 takes those lost bits into account.

The result: to store the prime products up to 2^128 with no wasted bits would require almost 5 exabytes of storage.

Now, while the NSA almost certainly can store 5 exabytes, that is only enough to generate primes up to 2^256 (using the sieve trick above), which is only enough to crack 512-bit RSA keys. That’s not enough to crack even 1024-bit keys.

And that’s still a lot of data; going any further is basically infeasible.

What is the computational complexity of this algorithm?

As stated earlier, it’s exponential. (I didn’t care to calculate exactly what it is.)

The reason for this is that the upper limit on the number of primes we need is (sqrt(N) / ln(sqrt(N))) * (1 + (1.2762 / ln(sqrt(N)))) (because we only need primes up to sqrt(N)). That count grows fairly slowly as a function of N itself, but complexity is measured against the size of the input in bits, and sqrt(N) is exponential in the bit length of N, which is enough to place this algorithm into exponential territory. If the number of primes didn’t grow so fast, this algorithm would be polynomial, because the biggest limiting factor (heh) is the number of primes.
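
To see the growth concretely, here is that bound evaluated at a few key sizes (this is just the formula above, nothing more):

from math import log, sqrt

def prime_bound(x):
    # Upper bound on the number of primes below x.
    return x / log(x) * (1 + 1.2762 / log(x))

for bits in (64, 128, 256):
    print(bits, prime_bound(sqrt(2.0 ** bits)))
# Doubling the key size roughly squares the number of primes needed.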

Note that this complexity still applies if you use the primorial sieve trick because that trick is still limited by the number of primes.

Can this algorithm be sped up?

Absolutely. Just about every part of this algorithm is ready-made to be embarrassingly parallel. Once you partition the primes into sections, the algorithm can be run on each section in parallel. This goes for the prime product sieve as well, though the primes needed for the sieve do need to be stored either in memory or on disk in order to do so.
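
Here is a minimal sketch of the parallel factoring step, assuming the section products are already computed and stored; the bad-case handling from earlier is omitted for brevity:

from math import gcd
from multiprocessing import Pool

def check_section(args):
    # Worker: test one section product against n.
    prod, n = args
    g = gcd(prod, n)
    return g if 1 < g < n else None

def parallel_factor(n, section_products):
    with Pool() as pool:
        args = ((prod, n) for prod in section_products)
        for g in pool.imap_unordered(check_section, args):
            if g is not None:
                return g
    return None  # no section produced a nontrivial factor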

And further, if you have a lot of keys to factor, just factor them all at the same time, using the same data. If you have enough keys, you might bring the amortized complexity of the algorithm into polynomial territory, though that would probably require tons of keys.
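
As a sketch of that amortization (again assuming precomputed section products), sweep each product across every key before moving to the next, so each product only has to be generated or loaded once:

from math import gcd

def factor_many(keys, section_products):
    found = {}
    pending = set(keys)
    for prod in section_products:
        for n in list(pending):
            g = gcd(prod, n)
            if 1 < g < n:
                found[n] = g
                pending.discard(n)
        if not pending:
            break
    return found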

In my opinion, it is that last fact that means there is a chance that the NSA is using this algorithm; they might be able to justify the use of the algorithm because of the amount of factoring they might want to do.

If any of you have further questions or even comments, please don’t hesitate to contact me.