The programmable programmer: Don's benchmark.

20071128

Don's benchmark.

It appears the mob has spoken and that I am a wrong headed twit.

My apologies to Don on the misunderstanding. When I first saw the numbers I assumed that the speed had to be due to memoization. It's not.

I saw a link earlier to a post by Don Stewart in that he asserts that Haskell is 40 times faster than python for a given algorithm.

He then goes on to point out how easy it is to drop hints to the code to parallelize the code.

While I won't go into any threading issues, the core point that it takes python almost thirty seconds to generate each Fibonacci number up to 36 is misplaced.

Haskell is a language built around the idea of functions that can be assumed to be functionally transparent. If you give them the same set of arguments they'll give the same answer.

The upside to this is that the compiler writers can automatically memoize every function if they want to unless its been explicitly marked as being otherwise.

Python, being based upon the idea of dynamic objects rather than that of transparent functions has no need for such optimizing. The developers time is better spent ensuring rapid dictionary insertion and deletion over transparent function memoization.

But, as he doesn't seem to mind having to annotate concurrancy, I'm sure that annotating transparency is just as fair.

#!/usr/bin/python

# Don's original python
# def fib(n):
#     if n == 0 or n == 1:
#          return n
#     else:
#         return fib(n-1) + fib(n-2)
#     
# for i in range(36):
#     print "n=%d => %d" % (i, fib(i))

def memoize ( func ) :
    """ In haskell mutability must be handled explicitly.
    Only fair that Python do the same to transparent functionality
    """
    remembered = {}
    def memoized ( *args ) :
        if args in remembered :
            return remembered[ args ]
        else :
            new = func( *args )
            remembered[ args ] = new
            return new
    return memoized

@memoize
def fib(n):
    if n == 0 or n == 1:
         return n
    else:
        return fib(n-1) + fib(n-2)
    
for i in range(36):
    print "n=%d => %d" % (i, fib(i))

The above code runs at a comfortable

  real    0m0.020s
  user    0m0.012s
  sys     0m0.008s

which compared to his times

  Ruby (1.8.5)   64.26s
  Python (2.4) 25.16s
  Haskell (GHC 6.8)  0.78s
  Parallel Haskell (GHC 6.8)  0.67s

fairs reasonably.

Since Python allows mutable variables a looping implementation would have been more idiomatic python. Even better to use a generator. Then we can start output before the sequence is complete.

All his benchmark really showed was that memoization on transparent functions is really great. Thanks guy.

11 comments:

Unknown said...: > All his benchmark really showed was that memoization on transparent functions is really great. Thanks guy.

There was no memoization in his benchmark code. And you completely and utterly missed his point, congratulation.; November 28, 2007 at 2:36 PM
Anonymous said...: Except (GHC) Haskell doesn't memoize function calls, and certainly not in this example, as you may see if you take a look at the generated code. Purity certainly parallelization easier, but it's still parallelizing lots of repeated work.

Dons actually cheats by specifically requesting the nasty fixed-size overflowing Int rather than the safe bignum Integer type. Changing the type to Integer seems to cost about 8.5x.; November 28, 2007 at 2:55 PM
Andrew said...: There's no memoization in the Haskell version:

http://programming.reddit.com/info/61no8/comments/c02jva2

Good discussion of the parallelization hints:

http://programming.reddit.com/info/61no8/comments/c02juu8; November 28, 2007 at 2:57 PM
Cale Gibbard said...: Except that this entire post is misguided, since the Haskell version which Don posted doesn't do any memoisation.

Memoising by default would be a silly thing for a Haskell implementation to do, since it would generally cause memory usage to go through the roof, so no implementation does that. If you want to memoise, you can define a constant array or list which holds the values of the function, and that will stick around for as long as it remains in scope.

(Often if the function is recursive you then want to modify it to look in the data structure for its recursive calls, whenever the parameter is in range.)

Don's program did none of that.; November 28, 2007 at 3:04 PM
Unknown said...: You are wrong. Haskell does not use memoisation in given example.
They could use it though by defining fib as a lazy list:
fib = 0 : 1 : zipWith (+) fib tail fib

So, no soup for you.; November 28, 2007 at 3:50 PM
Don Stewart said...: The Haskell version does not memoise results -- it is just faster as Haskell is compiled to native code, optimised better, and has type erasure.

A memoised implementation is much faster again. See examples here:

http://haskell.org/haskellwiki/The_Fibonacci_sequence; November 28, 2007 at 4:02 PM
Anonymous said...: I think i have to thank you for one thing - we can comment :)

There are some other blogs which dont allow comments. I dont read them anymore.

(I leave to others what can be said about your entry here though ;> ); November 28, 2007 at 4:04 PM
Don Stewart said...: This premise of this post is incorrect. Haskell doesn not transparently memoise results.

The reason GHC produces faster code here is not due to magic tricks, but simply it compiles to native code, its an optimising compiler, and it the language supports type erasure.; November 28, 2007 at 4:06 PM
Logan Capaldo said...: Haskell does not magically memoize functions.
Here is a memoized version:

import Text.Printf ( printf )
import Control.Monad ( forM_ )

fibs :: [Int]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
fib :: Int -> Int
fib n = fibs !! n

main = forM_ [0..35] $ \i ->
printf "n=%d => %d\n" i (fib i)

On my machine the timing for this version is

0.03 real 0.00 user 0.00 sys

and for the regular, non-memoized (non-parrallelized) version

59.20 real 25.30 user 0.40 sys; November 28, 2007 at 7:05 PM
danly said...: Nice, you compare your own final benchmark against his raw numbers.

How about running those tests again on a single, identical machine, eh?

And... A few dozen additional printable characters vs. your a pile of Python hackage. Not convincing.; November 28, 2007 at 8:07 PM
Unknown said...: > And... A few dozen additional printable characters vs. your a pile of Python hackage. Not convincing.

Let's be fair in both ways: while michael was completely and utterly wrong in saying that Dons' code memoizes, there is no "pile of Python hackage" in his code: the memoize function is kind-of a staple in Python (and of examples of Python decorators in particular) and it's trivial to find that kind of basic implementation on the python cookbook. So even if you don't want to write it it takes all of one search.

And that memoize function (plus the @memoize decoration) is, as far as I can tell, everything micheal added to dons' python code (which wasn't written by dons in the first place); November 28, 2007 at 11:41 PM

The programmable programmer

20071128

Don's benchmark.

11 comments:

Blog Archive

About Me