Programmers Dojo

Posts

Publications by Jeff Vitter

Jeff Vitter is the provost and executive vice chancellor and the Roy A. Roberts Distinguished Professor at the University of Kansas. I just stumbled across his page and found a great book on external memory algorithms and data structures external memory algorithms and data structures. IE: algorithms and data structures to use when your dataset is too large to fit in main memory. He has many other papers on his site including several on external memory graph algorithms. Here's the link... http://www.ittc.ku.edu/~jsv/Papers/catalog/

Stuck on a Big Hard Programming Task? Read this!

Here is a great article about tackling a difficult problem. http://softwarecreation.org/2011/stuck-on-a-big-hard-programming-task-read-this/

How big are primes?

So, If we were to calculate the first billion primes how much ram would we need? At the end the primes won't fit in a 32bit integer. If we store them all in 64bit integers we'd need 8GB. Even if they all fit in 32bit integers we'd still need 4GB of storage just to store the list. Seeing at my little machine has 4GB ram that isn't going to work either. How about just storing the intervals between primes? For the first primes (2, 3, 5, 7, 11, 13, 17) we would store (2, 1, 2, 2, 4, 2, 4) Using a byte for every interval less than 128 and a short for the rest lets us store the first billion primes in just about 1GB. Now that we can do! Maybe we can do even better though...

Segmented Sieve

Our next version works just like the previous sieve but works on a segment of the number line at a time. That might be why it is called a segmented sieve . Now we are sieving multiples of the primes from 1M numbers at a time, adding those primes to the list and then moving on to the next 1M numbers. I hit 140M primes in 60 seconds on my box... but now I run into a new problem. This version of the sieve quickly consumes all the memory in my box and slows to a crawl. To progress we will need to find a more efficient way of storing the primes we have found.

Basic Sieve

I've implemented a basic sieve to see if it does any better than trial division. While trial division found the first million primes in 8.28 seconds the sieve does the same task in 0.84 seconds. That's a little better! Running the sieve on the range [2..1,000,000,002) yields 50,847,534 primes in 78.14 seconds. I'm currently storing a byte for each number so that means I'm using 1GB ram. We can't just keep increasing the range because we will eventually run out of memory. Even storing a single bit for each number (we only need to know if it has been crossed our or not) would only divide this by a factor of 8. Perhaps we can work on a section of the number line at a time?

Sieve of Eratosthenes

Another way to generate a list of primes is through the Sieve of Eratosthenes . Essentially you create a long list of numbers starting at 2. 1. Set p equal to 2. This is your first prime. 2. Cross out every pth number because they are all divisible by p. 3. The first number after p which hasn't been crossed out is the new prime. 4. Repeat from setup 2. As an example lets take the number line from 2 to 10. list = 2, 3, 4, 5, 6, 7, 8, 9, 10 1. p = 2 2. list = 2 , 3, 4 , 5, 6 , 7, 8 , 9, 10 3. p = 3 4. goto 2 2. list = 2 , 3 , 4 , 5, 6 , 7, 8 , 9 , 10 3. p = 5 4. goto 2 2. list = 2 , 3 , 4 , 5, 6 , 7, 8 , 9 , 10 3. p = 7

Simple list of primes using trial division.

Using the list from an earlier post here is a simple prime generator. trial_division.c This will generate one million primes in around 8 seconds on my machine. Not bad but I think we can probably do better.