DNA Reverse-complement

117 views
Skip to first unread message

James L.

unread,
Jan 25, 2013, 9:57:24 AM1/25/13
to scala-l...@googlegroups.com

Rex Kerr

unread,
Jan 25, 2013, 10:07:17 AM1/25/13
to scala-l...@googlegroups.com
It's about 30% slower than the best Java version, which is typical (rarely faster, usually comparable, sometimes up to ~30% slower given apparently essentially identical approaches).  Feel free to improve it; I don't really have time.

  --Rex

Dave

unread,
Jan 25, 2013, 10:20:34 AM1/25/13
to scala-l...@googlegroups.com
The scala version is java 6 bytecode running on java 7 runtime.
The java version is java 7 bytecode running on java 7 runtime.


Op vrijdag 25 januari 2013 15:57:24 UTC+1 schreef James L. het volgende:

Aleksey Nikiforov

unread,
Jan 25, 2013, 12:35:41 PM1/25/13
to scala-l...@googlegroups.com
It looks like Java version exploits multithreading via ForkJoinPool. Scala version, on the other hand, seems to be single-threaded.



--
 
 

Aleksey Nikiforov

unread,
Jan 25, 2013, 12:45:29 PM1/25/13
to scala-l...@googlegroups.com
From the description of the benchmark:

"We are trying to show the performance of various programming language implementations - so we ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result."
(http://benchmarksgame.alioth.debian.org/u64/performance.php?test=revcomp#about)

Strapping multithreading to an algorithm may or may not be a form of cheating. But I think it certainly violates the spirit of the benchmark.

Rex Kerr

unread,
Jan 25, 2013, 12:47:37 PM1/25/13
to scala-l...@googlegroups.com
Multithreading matters on the quad-core test but not on the linked single-core one.

Threading is explicitly encouraged for multi-core tests (why have them otherwise?!).  Look at the single-core tests if you don't want to consider multithreading.

  --Rex

Aleksey Nikiforov

unread,
Jan 25, 2013, 12:52:32 PM1/25/13
to scala-l...@googlegroups.com
The java test with fork-join that beats scala is in the one-core category: "x64 Ubuntu : Intel® Q6600® one core Computer Language Benchmarks Game".

My guess is that it gains acceleration from hypertheading. If the assumption for one-core tests is to be single threaded, this is definitly cheating.


--
 
 

Erik Osheim

unread,
Jan 25, 2013, 1:03:21 PM1/25/13
to scala-l...@googlegroups.com
On Fri, Jan 25, 2013 at 06:57:24AM -0800, James L. wrote:
> I still think the Scala code isn't well optimize, is it?

For what it's worth, on my machine [1] for N=25M I get real times of:

C gcc #2 (top-ranked) 0.922s
Java 7 #6 1.505s
Scala #4 2.031s
Scala port of J7 #6 1.694s

So what this says is that the newer Java implementation (which may not
have been around when Scala #4 was written) is 0.526s faster than Scala
(i.e. the Scala version is ~35% slower). However, when I ported Java 7
#6 to Scala in ~15 minutes I was able to close the gap to 0.189s (which
means that the Scala version I wrote is ~13% slower).

I didn't make any attempt to optimize the Java program in an
intelligent way for Scala (e.g. with @tailrec) or to come up with a
novel algorithm. Either of those things is possible IMO.

Bear in mind that when I timed "hello world" it took about 0.3s to run
in Scala (I assume there is also some kind of startup overhead for
Java). Given that the difference between C and my Scala example is
~0.75s, that means that on my system about 40% the difference may be
startup time.

For most things I work on, startup time is not the primary concern.

Hope this helps give some perspective.

-- Erik

[1] MBP Retina, 2.7GHz core i7, 16G RAM, OSX, Java 1.7.0_04-b21, Scala
2.9.2 (only realized this after doing the work, don't feel like redoing
it again).

Erik Osheim

unread,
Jan 25, 2013, 1:07:34 PM1/25/13
to scala-l...@googlegroups.com
On Fri, Jan 25, 2013 at 11:45:29AM -0600, Aleksey Nikiforov wrote:
> Strapping multithreading to an algorithm may or may not be a form of
> cheating. But I think it certainly violates the spirit of the benchmark.

The top-ranked C benchmark here is using POSIX threads. So I think it
must be accepted.

-- Erik

Dave

unread,
Jan 25, 2013, 1:10:07 PM1/25/13
to scala-l...@googlegroups.com, er...@plastic-idolatry.com
Apart from improving the implementation, did you see some speedup when using bytecode 7 instead of 6 (maybe the jit compiler optimizes better)?

Op vrijdag 25 januari 2013 19:03:21 UTC+1 schreef Erik Osheim het volgende:

Aleksey Nikiforov

unread,
Jan 25, 2013, 1:10:49 PM1/25/13
to scala-l...@googlegroups.com
Indeed, I think the benchmar suit runs the same tests for quad core and single core.



-- Erik

--



er...@plastic-idolatry.com

unread,
Jan 25, 2013, 1:19:37 PM1/25/13
to scala-l...@googlegroups.com
On Fri, Jan 25, 2013 at 10:10:07AM -0800, Dave wrote:
> Apart from improving the implementation, did you see some speedup when
> using bytecode 7 instead of 6 (maybe the jit compiler optimizes better)?

> > C gcc #2 (top-ranked) 0.922s
> > Java 7 #6 1.505s
> > Scala #4 2.031s
> > Scala port of J7 #6 1.694s

So just now with Scala 2.10.0 and -target:1.7 I saw 1.678s on my port.
So I guess there might be a benefit?

Please bear in mind that I'm not going to the trouble of running these
things N times, calculating statistical significance, etc. These
results are anecdotal and should be treated as such.

-- Erik

Dave

unread,
Jan 25, 2013, 1:20:13 PM1/25/13
to scala-l...@googlegroups.com, er...@plastic-idolatry.com

2.9.2 (only realized this after doing the work, don't feel like redoing
it again).

O but this might be the cause why it is faster on your computer.
When I looked at the shootout benchmark I immediately saw that the scala 2.10 is slower than 2.9, because Haskell GHC was higher in the ranking. In the previous version was higher in the ranking.

Dave

unread,
Jan 25, 2013, 1:21:47 PM1/25/13
to scala-l...@googlegroups.com, er...@plastic-idolatry.com
forgot a word:
In the previous version was Scala higher in the ranking.

Rex Kerr

unread,
Jan 25, 2013, 1:41:28 PM1/25/13
to scala-l...@googlegroups.com
It's pretty easy to submit this as another Scala solution (maybe another 15 minutes to make the account and go through the "how to submit" instructions).  Why don't you?

  --Rex



--



Erik Osheim

unread,
Jan 25, 2013, 1:47:59 PM1/25/13
to scala-l...@googlegroups.com
On Fri, Jan 25, 2013 at 01:41:28PM -0500, Rex Kerr wrote:
> It's pretty easy to submit this as another Scala solution (maybe another 15
> minutes to make the account and go through the "how to submit"
> instructions). Why don't you?

That's a good idea.

I'll try to do that later tonight... along with all the other Scala
things I need to do! ;)

-- Erik

er...@plastic-idolatry.com

unread,
Jan 25, 2013, 11:41:09 PM1/25/13
to scala-l...@googlegroups.com
On Fri, Jan 25, 2013 at 01:19:37PM -0500, er...@plastic-idolatry.com wrote:
> > > C gcc #2 (top-ranked) 0.922s
> > > Java 7 #6 1.505s
> > > Scala #4 2.031s
> > > Scala port of J7 #6 1.694s
>
> So just now with Scala 2.10.0 and -target:1.7 I saw 1.678s on my port.

One final follow-up:

After a bit of optimization, I was able to get the Scala 2.10 example
down to ~1.540s. I had hoped to get it faster than Java (since I
discovered at least one trick that Java wasn't using). There may be
some kind of overhead that Scala is imposing here; the answers are in
the bytecode I expect.

But anyway, these numbers look good enough to defeat any performance
FUD (versus Java at least).

-- Erik

P.S. I've attached the code I'm using in case anyone is curious. It's
heavily based on Java 7 #6 (like I said) and I certainly would not call
it idiomatic Scala code. :P

er...@plastic-idolatry.com

unread,
Jan 25, 2013, 11:47:04 PM1/25/13
to scala-l...@googlegroups.com
On Fri, Jan 25, 2013 at 11:41:09PM -0500, er...@plastic-idolatry.com wrote:
> P.S. I've attached the code I'm using in case anyone is curious. It's
> heavily based on Java 7 #6 (like I said) and I certainly would not call
> it idiomatic Scala code. :P

That statement was false, but now is true.

-- Erik
newest.scala

er...@plastic-idolatry.com

unread,
Jan 25, 2013, 11:58:20 PM1/25/13
to scala-l...@googlegroups.com
Actually it seems like I emailed the list the wrong version and it has
a bug. Sorry for the noise everyone. I will submit a corrected
benchmark to that site although I will probably stop posting to this
thread. Feel free to email me off-list if you have questions.

Sorry,

-- Erik

Sassa Nf

unread,
Jan 26, 2013, 5:13:35 AM1/26/13
to scala-l...@googlegroups.com
I wouldn't pay too much attention to that benchmark.

It isn't clear how the file is buffered there, and the benchmark owner wouldn't disclose that, when I asked about it.

I tried Java 7 #6 on my Westmere with tons of RAM, elapsed time is about 10 seconds, if the file isn't cached (is read first time)

Elapsed time is about 2 seconds, when the file is in /dev/shm (proving that Java 7 #6 gets such results only because by the time it was running the file was definitely cached)


There is nothing to optimize language-wise. The efficiency will come from eliminating single-threaded memory scanning.

Alex

Sassa Nf

unread,
Jan 26, 2013, 5:57:33 AM1/26/13
to scala-l...@googlegroups.com
Also, there is one, just one, contribution whose CPU time is 5x less than Elapsed time. The benchmark owner won't tell why / how that's measured. There is no way that's possible. Even if someone would consider lock contention of some kind, wouldn't you expect at least one thread burn the CPU, hence CPU time to be at least equal to Elapsed time?...

In my env the picture is very different. Which in itself discredits the benchmark, too.


Alex


2013/1/26 Sassa Nf <sass...@gmail.com>

--
 
 

Isaac Gouy

unread,
Jan 26, 2013, 12:04:46 PM1/26/13
to scala-l...@googlegroups.com
> I wouldn't pay too much attention to that benchmark.

If anyone is curious about Alex's comments, please see the benchmarks
game tracker item here --

https://alioth.debian.org/tracker/
index.php?func=detail&aid=313896&group_id=30402&atid=413100

Sassa Nf

unread,
Jan 26, 2013, 3:03:14 PM1/26/13
to scala-l...@googlegroups.com
Probably offtopic, but would be great if the issue with 10 seconds elapsed vs 2 seconds CPU time could be investigated further.

This is sample output I have in my env. Order of magnitude for Java 7 #6 is the same as yours, so the order of magnitude for my contribution should be similar to what I get here; given that we have around 4 cores busy, there should be no big advantage from Westmere vs quadcore:

Java 7 #7
[fork_join]$ time ~/jdk/jdk7/jdk1.7.0/bin/java -cp classes b_nio < /dev/shm/revcomp.25000000.in > /dev/null
0.370340876 -- elapsed time as seen by the app (the rest is JVM startup)

real    0m0.546s
user    0m1.933s
sys     0m0.196s

Java 7 #6
[fork_join]$ time ~/jdk/jdk7/jdk1.7.0/bin/java -cp classes revcomp < /dev/shm/revcomp.25000000.in > /dev/null
0.898935256

real    0m0.970s
user    0m0.858s
sys     0m0.356s


Anyway, this is to demonstrate the actual saving is in a algorithmic improvement - avoid scanning memory.


Alex


2013/1/26 Isaac Gouy <igo...@yahoo.com>

--



Sassa Nf

unread,
Jan 26, 2013, 3:35:11 PM1/26/13
to scala-l...@googlegroups.com
I forgot you want to keep the output, so redirecting to a /dev/shm file yields higher times, still Java 7 #6 is good approximation of alioth results, and Java 7 #7 offers a great improvement over it, which should be seen:

[mwm_jms@mwm-c12 fork_join]$ time ~/jdk/jdk7/jdk1.7.0/bin/java -cp classes b_nio < /dev/shm/revcomp.25000000.in > /dev/shm/revcomp.25000000.out
0.552204378

real    0m0.726s
user    0m1.780s
sys     0m0.387s
[mwm_jms@mwm-c12 fork_join]$ time ~/jdk/jdk7/jdk1.7.0/bin/java -cp classes revcomp < /dev/shm/revcomp.25000000.in > /dev/shm/revcomp.25000000.out
1.048816646

real    0m1.141s
user    0m0.863s
sys     0m0.522s


2013/1/26 Sassa Nf <sass...@gmail.com>
Reply all
Reply to author
Forward
0 new messages