Showing posts from April, 2011

Publications by Jeff Vitter

Jeff Vitter is the provost and executive vice chancellor and the Roy A. Roberts Distinguished Professor at the University of Kansas. I just stumbled across his page and found a great book on external memory algorithms and data structures, i.e., algorithms and data structures to use when your dataset is too large to fit in main memory. He has many other papers on his site, including several on external-memory graph algorithms. Here's the link... http://www.ittc.ku.edu/~jsv/Papers/catalog/

How big are primes?

So, if we were to calculate the first billion primes, how much RAM would we need? By the end the primes won't fit in a 32-bit integer, so if we store them all as 64-bit integers we'd need 8GB. Even if they all fit in 32-bit integers we'd still need 4GB just to store the list. Seeing as my little machine has 4GB of RAM, that isn't going to work either. How about storing just the intervals between primes? For the first primes (2, 3, 5, 7, 11, 13, 17) we would store (2, 1, 2, 2, 4, 2, 4). Using a byte for every interval less than 128 and a short for the rest lets us store the first billion primes in just about 1GB. Now that we can do! Maybe we can do even better though...

Segmented Sieve

Our next version works just like the previous sieve, but on one segment of the number line at a time, which is why it is called a segmented sieve. Now we sieve multiples of the known primes from 1M numbers at a time, add the surviving primes to the list, and then move on to the next 1M numbers. I hit 140M primes in 60 seconds on my box... but now I run into a new problem. This version of the sieve quickly consumes all the memory in my box and slows to a crawl. To make progress we will need a more efficient way of storing the primes we have found.

Basic Sieve

I've implemented a basic sieve to see if it does any better than trial division. While trial division found the first million primes in 8.28 seconds, the sieve does the same task in 0.84 seconds. That's a little better! Running the sieve on the range [2..1,000,000,002) yields 50,847,534 primes in 78.14 seconds. I'm currently storing a byte for each number, so that means I'm using 1GB of RAM. We can't just keep increasing the range because we will eventually run out of memory. Even storing a single bit for each number (we only need to know whether it has been crossed out or not) would only divide this by a factor of 8. Perhaps we can work on a section of the number line at a time?

Sieve of Eratosthenes

Another way to generate a list of primes is through the Sieve of Eratosthenes. Essentially you create a long list of numbers starting at 2:

1. Set p equal to 2. This is your first prime.
2. Cross out every pth number after p, because they are all divisible by p.
3. The first number after p which hasn't been crossed out is the new prime; set p to it.
4. Repeat from step 2.

As an example, let's take the number line from 2 to 10 (crossed-out numbers in brackets):

list = 2, 3, 4, 5, 6, 7, 8, 9, 10
1. p = 2
2. list = 2, 3, [4], 5, [6], 7, [8], 9, [10]
3. p = 3
4. goto 2
2. list = 2, 3, [4], 5, [6], 7, [8], [9], [10]
3. p = 5
4. goto 2
2. list = 2, 3, [4], 5, [6], 7, [8], [9], [10]
3. p = 7

comparison of graph500 implementations #2

As promised, another graph but at a larger scale.  This should give some general idea for real performance on desktop class machines. I think I might attempt an out of core implementation of graph500.

comparison of graph500 implementations

Quick graph I created of the reference graph500 implementations and my bgl implementation on my home box. The data sizes are small enough that we see heavy cache effects.  I'll run another set at the larger graph sizes so that we can get a better picture of relative performance.

graph500 with the boost graph library

Because I don't have enough to do... I decided to start trying to implement the graph500 benchmark in the boost graph library and then the parallel boost graph library. I've got a very simple implementation working using an adjacency_list and a BFS visitor, but I'm sure I can do much better. On my box: seq-csr at scale 20 is ~3.5e+07 TEPS; adjacency-list at scale 20 is ~5.0e+07 TEPS (TEPS: traversed edges per second). I'll work on this one a bit more and then try a compressed sparse row graph.

Staking Claims: A History of Programming Language Design Claims and Evidence

Found on Lambda the Ultimate (a great programming languages blog, btw): a great article on programming languages through history and the truth of their claims. From the article: While still a relatively young field, computer science has a vast body of knowledge in the domain of programming languages. When a new language is introduced, its designers make claims which distinguish their language from previous languages. However, it often feels like language designers do not feel a pressing need to back these claims with evidence beyond personal anecdotes. Peer reviewers are likely to agree. In this paper, we present preliminary work which revisits the history of such claims by examining a number of language design papers which span the history of programming language development. We focus on the issue of claim-evidence correspondence, or determining how often claims are or are not backed by evidence. These preliminary results confirm that unsupported claims have been around since t
C Craft

C Craft is a great article about C, including discussion of the good, the bad, and why you can't eat just one. Quote from the preface: Let me open with a disclaimer. What follows is an unabashedly opinionated diatribe certain to offend many programmers. Craftsmen inevitably grow defensive of their favourite tools and practices. Rather than respect this natural sensitivity, I will exploit it, shamelessly intensifying the vitriol for attention. I care little whether one applauds or deplores my views; I mainly want my audience to feel compelled to read on. However, I try to be informative as well as incendiary. Whether one finds my statements profound or unsound, whether one abhors or adores my examples, and whether one is amused or bemused by my observations, I hope one leaves wiser.

Our first list.

For our list of primes we need a list (naturally). The best way to understand data structures is to implement them in a low-level language. Since I don't feel like inflicting ASM on everyone (although I do like assembler), we will use the next best thing: C. Our list needs a data element (the found prime) and a pointer to the next element. In C that looks like this:

struct prime_elem_s {
  uint64_t prime;
  struct prime_elem_s *next;
};
typedef struct prime_elem_s prime_elem_t;

We define a structure containing the current prime and a pointer to the next element. Then we define a type prime_elem_t to refer to that structure. In this case we want to add elements to the end of the list, so we will also need a tail pointer. We will create a list header which keeps head and tail pointers as well as the number of elements in the list:

typedef struct {
  uint64_t length;
  prime_elem_t *head;
  prime_elem_t *tail;
} prime_list_t;

Now we can write a function which appends a new prime to the tail of the list.