Why is this simple iteration over maps as slow as the Ruby equivalent?

Ijonas Kisselbach

Apr 9, 2014, 1:53:24 AM

to golan...@googlegroups.com

Hi folks,

I'm comparing the performance of Node, Go, and Ruby. My test iterates over 10 million small hashes and performs a small operation (multiplying two numbers) on each one. Stage 1 (the prep) loads the 10 million hashes into memory; stage 2 performs the iteration and multiplication. Each stage is timed separately.

The Go version is the slowest implementation of the test, slower even than Ruby.

The Node version performs the test on my laptop with the following results:

- prep took 3155ms to run.

- calls took 684ms to run.

The Go version performs with the following results:

- prep took 13.899046356s to run.

- calls took 3.305280308s to run.

The Ruby version performs with the following results:

- prep took 12940.857ms to run.

- calls took 2670.061ms to run.

You can see the source of all three tests here https://gist.github.com/ijonas/10229518

Question: Is my test using maps in a stupidly inefficient way? If I skip the 10 million maps and instead iterate over a plain array of 10 million integers, multiplying each one, the test completes in a superfast 33ms.
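For readers without access to the gist, the Go test can be approximated as below. This is a minimal sketch reconstructed from the description in this post, not the gist itself: the `prepare` and `applyTax` names, the `map[string]float64` value type, the 1.2 multiplier (borrowed from a later reply in this thread), and the reduced document count are all assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// prepare builds n small maps, each holding a random "subtotal" and a zero "total".
// (Hypothetical helper; the value type is an assumption.)
func prepare(n int, r *rand.Rand) []map[string]float64 {
	docs := make([]map[string]float64, n)
	for i := range docs {
		docs[i] = map[string]float64{"subtotal": r.Float64(), "total": 0}
	}
	return docs
}

// applyTax writes subtotal*rate into each map's "total" field.
func applyTax(docs []map[string]float64, rate float64) {
	for _, doc := range docs {
		doc["total"] = doc["subtotal"] * rate
	}
}

func main() {
	const n = 1000000 // the thread uses 10 million; reduced here to keep memory modest
	r := rand.New(rand.NewSource(1))

	start := time.Now()
	docs := prepare(n, r)
	fmt.Printf("The prep took %v to run.\n", time.Since(start))

	start = time.Now()
	applyTax(docs, 1.2)
	fmt.Printf("The calls took %v to run.\n", time.Since(start))
}
```

Timings from a sketch like this depend heavily on the Go version and the machine, so they are not directly comparable to the numbers quoted above.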

Thanks,

Ijonas.

Dan Kortschak

Apr 9, 2014, 1:58:04 AM

to Ijonas Kisselbach, golan...@googlegroups.com

why use maps for that when a struct { subtotal, total float64 } would work?

Ijonas Kisselbach

Apr 9, 2014, 2:01:59 AM

to Dan Kortschak, golan...@googlegroups.com

Hi Dan,

Good question, I should've provided more background.

IRL we're going to be processing schemaless documents held in MongoDB, where the names & types of fields in the documents will only be known at runtime, hence the use of maps instead of structs.

Thanks,

Ijonas.

On Wed, Apr 9, 2014 at 6:58 AM, Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:

why use maps for that when a struct { subtotal, total float64 } would work?

--
http://about.me/ijonas

Dmitry Vyukov

Apr 9, 2014, 2:03:16 AM

to Ijonas Kisselbach, golang-nuts

In Go you do it as:
http://play.golang.org/p/SNGxfek_TZ

The calls took 94.674265ms to run.
The calls took 33.886967ms to run.
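The playground program is not reproduced in this thread. A minimal sketch of the struct approach Dan suggested, assuming a hypothetical `Doc` type and `applyTax` helper, might look like this:

```go
package main

import "fmt"

// Doc is a fixed-layout record; field access compiles to a plain memory
// offset, with no hashing or key comparison. (Hypothetical type name.)
type Doc struct {
	Subtotal, Total float64
}

// applyTax writes Subtotal*rate into Total for every document in place.
func applyTax(docs []Doc, rate float64) {
	for i := range docs {
		docs[i].Total = docs[i].Subtotal * rate
	}
}

func main() {
	docs := []Doc{{Subtotal: 10}, {Subtotal: 2.5}}
	applyTax(docs, 2)
	fmt.Println(docs[0].Total, docs[1].Total)
}
```

A `[]Doc` is also a single contiguous allocation, whereas a slice of maps needs one separate map allocation per document, which is likely a large part of the prep-phase difference.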

> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Dmitry Vyukov

Apr 9, 2014, 2:05:05 AM

to Ijonas Kisselbach, Dan Kortschak, golan...@googlegroups.com

On Wed, Apr 9, 2014 at 10:01 AM, Ijonas Kisselbach
<ijonas.k...@gmail.com> wrote:
> Hi Dan,
>
> Good question, I should've provided more background.
>
> IRL we're going to be processing schemaless documents held in MongoDB, where
> the names & types of fields in the documents will only be known at runtime,
> hence the use of maps instead of structs.

And what will be actual types of fields?

Dan Kortschak

Apr 9, 2014, 2:07:58 AM

to Ijonas Kisselbach, golan...@googlegroups.com

Toy examples often don't work because context is important; take a look at Egon's guide on how to ask questions (don't have a link, sorry). I can think of a few optimisations, but your use case may preclude them, so please give a more complete description of the problem if you can.

Ijonas Kisselbach

Apr 9, 2014, 2:08:27 AM

to Dmitry Vyukov, Dan Kortschak, golan...@googlegroups.com

Thanks Dmitry,

They're going to be MongoDB BSON documents, so nested hashes of integers, strings, dates, etc. (they're all end-user definable at runtime). For the speed test we're reducing that to a simulation using a map of strings and ints.

Thanks,

Ijonas.

--
http://about.me/ijonas

Ijonas Kisselbach

Apr 9, 2014, 2:15:26 AM

to Dan Kortschak, golan...@googlegroups.com

Hi Dan,

I didn't think I was posting a toy example. Apologies for not explaining in the original post that structs weren't an option for me. Structs can't be used because the type information is defined at runtime, hence the use of maps.

The question remains: why does the use of maps slow down the performance of Go so badly? Is it the garbage collector? Dmitry's use of a struct and the resulting performance improvements suggest so.

I'm stumped hence the post.

Thanks,

Ijonas.

On Wed, Apr 9, 2014 at 7:07 AM, Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:

Toy examples often don't work because context is important, take a look at Egon's how to ask question (don't have a link sorry). I can think of a few optimisations, but your use case may preclude them, so please give a more complete description of the problem if you can.

--
http://about.me/ijonas

egon

Apr 9, 2014, 2:21:44 AM

to golan...@googlegroups.com, Ijonas Kisselbach

On Wednesday, April 9, 2014 9:07:58 AM UTC+3, kortschak wrote:

Toy examples often don't work because context is important, take a look at Egon's how to ask question (don't have a link sorry).

It's up on the go-wiki https://code.google.com/p/go-wiki/wiki/HowToAsk

Tamás Gulácsi

Apr 9, 2014, 2:35:00 AM

to golan...@googlegroups.com

Two values are not a job for maps. Use a struct or a slice.
Either this is a toy example (your real maps will have at least a few dozen keys), or you know the interesting keys beforehand and can use structs.
Augmenting your documents with the interesting values:
    type Docu struct {
        All  map[string]interface{}
        Nums []float64
    }
would be better.

All in all, use a struct or a slice, not a map, for this.

Henrik Johansson

Apr 9, 2014, 2:45:15 AM

to golang-nuts

But guys really?

Don't assume the op has no clue and is stupid.

If he needs to use a map he needs to use a map and he has a legitimate question about the performance.

These may all be good suggestions, that is not what I am saying.

Dmitry Vyukov

Apr 9, 2014, 2:46:19 AM

to Ijonas Kisselbach, Dan Kortschak, golan...@googlegroups.com

On Wed, Apr 9, 2014 at 10:15 AM, Ijonas Kisselbach
<ijonas.k...@gmail.com> wrote:
> Hi Dan,
>
> I didn't think I was posting a toy example. Apologies for not explaining
> that structs weren't an option to me during the original post. Structs can't
> be used because the type information is defined at runtime, hence the use of
> maps.
>
> The problem remains why does the use of maps slow down the performance of Go
> so badly? Is it garbage collector? Dmitry's use of a struct and the
> resulting performance improvements suggest so.

I guess that Go was not optimized for this type of workload.
GC consumes some significant part of the first phase (40% or so),
because heap grows from 0 to 2.5GB. But it should not be a problem in
real program.
When you care about performance you usually use structs (as most of
the answers in this thread suggest). Maps can be made somewhat faster,
but still they will lose an order of magnitude to structs. I think
that's why it was not optimized to death in Go.

Your full use case is still not completely clear, and it looks quite complex.
Probably you can store data in slices instead, and map field names to
indices in the slice.
E.g. the following program:
http://play.golang.org/p/Zq_s8zMJwp
runs in:
The calls took 637.994092ms to run.
The calls took 38.182108ms to run.
and does not use hardcoded structs (but it misses the interesting part
of mapping names to indices).
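The part the playground program leaves out, mapping names to indices, can be sketched roughly as follows. This is an illustration under stated assumptions, not Dmitry's code: the `Schema` type and its `Slot` and `Row` methods are hypothetical names, and all values are assumed to be float64.

```go
package main

import "fmt"

// Schema assigns each field name a stable slot index the first time it is seen.
// (Hypothetical type; one shared Schema replaces millions of per-document maps.)
type Schema struct {
	index map[string]int
	names []string
}

func NewSchema() *Schema {
	return &Schema{index: make(map[string]int)}
}

// Slot returns the index for name, allocating a new slot if needed.
func (s *Schema) Slot(name string) int {
	if i, ok := s.index[name]; ok {
		return i
	}
	i := len(s.names)
	s.index[name] = i
	s.names = append(s.names, name)
	return i
}

// Row converts one schemaless document into a slice laid out by slot index.
// Rows built before a new field was registered are shorter than later rows,
// so callers should register all expected fields up front or bounds-check.
func (s *Schema) Row(doc map[string]float64) []float64 {
	for name := range doc {
		s.Slot(name)
	}
	row := make([]float64, len(s.names))
	for name, v := range doc {
		row[s.index[name]] = v
	}
	return row
}

func main() {
	schema := NewSchema()
	raw := []map[string]float64{
		{"subtotal": 10, "total": 0},
		{"subtotal": 2.5, "total": 0},
	}
	rows := make([][]float64, len(raw))
	for i, doc := range raw {
		rows[i] = schema.Row(doc)
	}

	// Resolve the names once, outside the hot loop.
	sub, tot := schema.Slot("subtotal"), schema.Slot("total")
	for _, row := range rows {
		row[tot] = row[sub] * 1.2
	}
	fmt.Println(rows[0][tot], rows[1][tot])
}
```

The string hashing happens once per field name per query instead of once per document, which is where the savings come from.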

egon

Apr 9, 2014, 2:52:01 AM

to golan...@googlegroups.com

On Wednesday, April 9, 2014 9:45:15 AM UTC+3, Henrik Johansson wrote:

But guys really?

Don't assume the op has no clue and is stupid.
If he needs to use a map he needs to use a map and he has a legitimate question about the performance.

That's the reason for providing more context and information.

It's always hard to tell from the question alone whether the person knows what he is doing or not. Also, very smart people can do stupid stuff as well. Or indeed there can be very good reasons for doing that. It's impossible to tell which case it is.

Also, most of the people on forums are well meaning, they just don't like to write "maybe", "in my humble opinion, the X approach would be better"... etc. It's just faster to write "have you tried X?", "doing X would be faster." and so on; it saves time in reading/writing and it's clearer, although it is less "friendly".

+ egon

Francesc Campoy Flores

Apr 9, 2014, 3:05:18 AM

to egon, golan...@googlegroups.com

You could also avoid using maps if the number of fields of every doc is pretty small and try using a slice of key value pairs.

I tried as an experiment and it's way faster (for the example you're proposing).

http://play.golang.org/p/cDElATDhWA
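Since the playground link may not stay available, here is a minimal sketch of the key/value-slice idea. It is not the linked program: the `Field` and `Doc` types and the `Get` and `Set` methods are hypothetical names, and values are assumed to be float64.

```go
package main

import "fmt"

// Field is one key/value pair of a document. (Hypothetical type.)
type Field struct {
	Key   string
	Value float64
}

// Doc is a small document stored as a flat slice of fields.
type Doc []Field

// Get scans the fields linearly; for a handful of fields this avoids the
// hashing and bucket lookup a map would perform.
func (d Doc) Get(key string) (float64, bool) {
	for i := range d {
		if d[i].Key == key {
			return d[i].Value, true
		}
	}
	return 0, false
}

// Set overwrites an existing field or appends a new one, returning the
// (possibly grown) document.
func (d Doc) Set(key string, v float64) Doc {
	for i := range d {
		if d[i].Key == key {
			d[i].Value = v
			return d
		}
	}
	return append(d, Field{key, v})
}

func main() {
	docs := []Doc{
		{{"subtotal", 10}, {"total", 0}},
		{{"subtotal", 2.5}, {"total", 0}},
	}
	for i, d := range docs {
		if sub, ok := d.Get("subtotal"); ok {
			docs[i] = d.Set("total", sub*1.2)
		}
	}
	fmt.Println(docs)
}
```

Lookup cost grows linearly with the number of fields, so this trade-off only pays off while documents stay small.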

--

Francesc Campoy

http://twitter.com/francesc

Ijonas Kisselbach

Apr 9, 2014, 3:05:33 AM

to Dmitry Vyukov, Dan Kortschak, golan...@googlegroups.com

Thanks Dmitry I'll try and rethink the solution design.

IRL I've got 1.5 million MongoDB documents (hence the use of maps in the example) with tens of fields in them, all user-definable. The user, through the use of Excel-like formulas, can perform calculations on those fields, say multiply 5 of the 50 fields. So the "real life" process could involve 1.5 million iterations over 5 user-defined fields, all multiplied together.

I've tried to distill the essence of the problem into the original post.

--
http://about.me/ijonas

Caleb Spare

Apr 9, 2014, 3:12:00 AM

to Ijonas Kisselbach, Dmitry Vyukov, Dan Kortschak, golan...@googlegroups.com

What versions of things are you using? I get significantly different results.

linux/amd64
Go 1.2.1, node 0.10.25, ruby 2.0.0p247

Node: 2101ms prep, 492ms calls
Go: 6602ms prep, 570ms calls
Ruby: 11165ms prep, 1518ms calls

So for me, Go is about twice the speed of ruby in your toy benchmark
(but still slower than node).

That said:

Pervasive use of hashmaps (please, let's not give in to the Perl/Ruby
shortening to "hashes") is a big cause of general slowness in dynamic
languages like Javascript, Ruby, and Python. These encourage their
usage throughout their APIs and even in the core languages.

As a result, map optimizations (in carefully tuned native code) have
been a major performance focus for such language implementations. It
does not surprise me that small maps perform very well here.

-Caleb

Francesc Campoy Flores

Apr 9, 2014, 3:16:07 AM

to Caleb Spare, Ijonas Kisselbach, Dmitry Vyukov, Dan Kortschak, golan...@googlegroups.com

Note: my code before had a bug, this is the correct version: http://play.golang.org/p/2hMar8vVij

Bakul Shah

Apr 9, 2014, 3:20:22 AM

to Ijonas Kisselbach, Dan Kortschak, golan...@googlegroups.com

On Wed, 09 Apr 2014 07:15:26 BST Ijonas Kisselbach <ijonas.k...@gmail.com> wrote:
> I didn't think I was posting a toy example. Apologies for not explaining
> that structs weren't an option to me during the original post. Structs
> can't be used because the type information is defined at runtime, hence the
> use of maps.
>
> The problem remains why does the use of maps slow down the performance of
> Go so badly? Is it garbage collector? Dmitry's use of a struct and the
> resulting performance improvements suggest so.

Try implementing the same logic (with small hashmaps) in C++.
You won't see much improvement. Maps are expensive! For
something like

mymap["somestring"] = foo

You have to compute the hash for the string which involves
iterating over the whole string. If there is already something
in the hash bucket, you have to compare the new string with
existing one to make sure different strings don't hash to the
same index and so on. Add to that the overhead of creating
many small maps and initializing them.

For better performance in effect you have to implement what a
compiler does for structures.

For example, keep a single global symbol table.

For your simple example you'd create a simple "symbol table"
and map key strings to ints. Something like:

    symtab := make(map[string]int)
    ...
    symtab["total"] = 0    // slot 0 for total
    symtab["subtotal"] = 1 // slot 1 for subtotal
    ...
    for i := 0; i < total; i++ {
        // slot 0 (total) starts at zero; slot 1 (subtotal) gets the random value
        docs[i] = [2]float64{0.0, r.Float64()}
    }
    ...
    itot := symtab["total"]
    isubtot := symtab["subtotal"]
    for i := 0; i < total; i++ {
        docs[i][itot] = docs[i][isubtot] * 1.2
    }

If you have many different kinds of runtime "structures", you
will need a more complex symbol table than just a simple
string->int map. If you have a known schema, you can optimize
things quite a bit. But if a priori you don't know how many
subfields may exist, you need something even more complex.

To summarize: For better performance (but not clarity) use
small vectors/slices & indices in them instead of maps. That
is in effect what a compiler does with structs!

Dmitry Vyukov

Apr 9, 2014, 3:22:46 AM

to Ijonas Kisselbach, Dan Kortschak, golan...@googlegroups.com

On Wed, Apr 9, 2014 at 11:05 AM, Ijonas Kisselbach
<ijonas.k...@gmail.com> wrote:
> Thanks Dmitry I'll try and rethink the solution design.
>
> IRL I've got 1.5million MongoDB documents (hence the use of Maps in the
> example) with tens of fields in them, all user-definable. The user, through
> the use of Excel-like formulas, can perform calculations on those fields,
> say multiply 5 of the 50 fields. So the "real life" process could involve
> 1.5million iterations over 5 user-defined fields, all multiplied together.

Yes, I would try to use slices in this case, where a field maps to an
index in the slices.
You may also try to incorporate Francesc's idea about flattening maps
into {string, float64} slices. []float64 will be faster, but not as
flexible, though.
Then I would try to "compile" the user request into some form that
operates only with field indices. And then apply that compiled form to
the collection of documents.
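One way to read the "compile" step is sketched below for the multiply-some-fields case mentioned earlier in the thread. The `compileProduct` function, the `slots` map, and the field names are hypothetical; a real formula language would need a proper parser and more operators.

```go
package main

import "fmt"

// compileProduct resolves field names to slot indices once and returns a
// function that multiplies those slots for a given row. The returned closure
// touches only integers and floats; no string hashing happens per document.
func compileProduct(slots map[string]int, fields []string) (func(row []float64) float64, error) {
	idx := make([]int, len(fields))
	for i, name := range fields {
		slot, ok := slots[name]
		if !ok {
			return nil, fmt.Errorf("unknown field %q", name)
		}
		idx[i] = slot
	}
	return func(row []float64) float64 {
		product := 1.0
		for _, i := range idx {
			product *= row[i]
		}
		return product
	}, nil
}

func main() {
	// Assumed layout: every row stores price, qty, discount in these slots.
	slots := map[string]int{"price": 0, "qty": 1, "discount": 2}
	rows := [][]float64{{10, 3, 0.5}, {4, 2, 1}}

	eval, err := compileProduct(slots, []string{"price", "qty", "discount"})
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, row := range rows {
		fmt.Println(eval(row))
	}
}
```

Errors such as a misspelled field name surface once at compile time rather than once per document.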

egon

Apr 9, 2014, 3:24:45 AM

to golan...@googlegroups.com, Dmitry Vyukov, Dan Kortschak

On Wednesday, April 9, 2014 10:05:33 AM UTC+3, Ijonas Kisselbach wrote:

Thanks Dmitry I'll try and rethink the solution design.

IRL I've got 1.5million MongoDB documents (hence the use of Maps in the example) with tens of fields in them, all user-definable. The user, through the use of Excel-like formulas, can perform calculations on those fields, say multiply 5 of the 50 fields. So the "real life" process could involve 1.5million iterations over 5 user-defined fields, all multiplied together.

What kind of calculations?

Is there any reason you don't want to use http://docs.mongodb.org/manual/tutorial/map-reduce-examples/ directly?

mgo has also support for using it in Go http://godoc.org/labix.org/v2/mgo#MapReduce

+ egon

Ijonas Kisselbach

Apr 9, 2014, 3:26:53 AM

to Dmitry Vyukov, Dan Kortschak, golan...@googlegroups.com

Thanks folks,

The last couple suggestions around using slices and a global map to provide an index into those slices have brought great clarity into an alternative approach.

Much appreciated.

--
http://about.me/ijonas

Ijonas Kisselbach

Apr 9, 2014, 3:31:41 AM

to egon, golan...@googlegroups.com, Dmitry Vyukov, Dan Kortschak

Hi Egon,

User-defined calculations, using an Excel-like formula language. I suppose an option would be to transpile those formulas (a DSL we've built) into JavaScript compatible with a MongoDB map-reduce.

Thanks,

Ijonas.

--
http://about.me/ijonas

Rodrigo Kochenburger

Apr 9, 2014, 3:56:24 AM

to Ijonas Kisselbach, egon, golan...@googlegroups.com, Dmitry Vyukov, Dan Kortschak

Ijonas,

Just out of curiosity have you tried profiling the Go version and see where time is mostly being spent?
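For anyone wanting to try this, a minimal sketch of CPU profiling with the standard `runtime/pprof` package follows. The `buildDocs` and `profileCPU` helpers are hypothetical, and the workload only imitates the benchmark on a smaller scale.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// buildDocs mimics the benchmark's prep phase on a smaller scale.
func buildDocs(n int) []map[string]float64 {
	docs := make([]map[string]float64, n)
	for i := range docs {
		docs[i] = map[string]float64{"subtotal": float64(i), "total": 0}
	}
	return docs
}

// profileCPU runs work under the CPU profiler and returns the raw profile bytes.
func profileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := profileCPU(func() {
		docs := buildDocs(1000000)
		for _, d := range docs {
			d["total"] = d["subtotal"] * 1.2
		}
	})
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of CPU profile\n", len(data))
}
```

In practice the profile would be written to a file and inspected with `go tool pprof`, which shows how much time goes to map assignment, allocation, and garbage collection.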

I gotta say, even though I expect maps to be slow, I wouldn't expect Ruby to be faster. Especially because there are only very small hashes in your example, so I doubt it's actually the hash that is consuming so much time.

Also, which version of Go are you using?

- RK

Tom Payne

Apr 9, 2014, 4:06:57 AM

to Rodrigo Kochenburger, Ijonas Kisselbach, egon, golan...@googlegroups.com, Dmitry Vyukov, Dan Kortschak

Just a quick note on the *JavaScript* code that you're using for the benchmark:

https://gist.github.com/ijonas/10229518#file-speedtest-js

This is not actually testing maps in JavaScript :)

The reason is that the properties used (subtotal and total) are both constant and initialised in a fixed order. This means V8 will generate a "hidden class" for these objects. Hidden classes allow V8 to do constant-time lookup of fields, and the performance characteristics are generally much closer to those of Go's structs.

If you really want to test maps in JavaScript/V8, you'll need to randomise property names to defeat the hidden class optimisation. More info here:

http://v8-io12.appspot.com/#29

(the whole presentation is well worth reading if you're interested in JavaScript VM performance).

Regards,

Tom

egon

Apr 9, 2014, 4:20:10 AM

to golan...@googlegroups.com, egon, Dmitry Vyukov, Dan Kortschak

On Wednesday, April 9, 2014 10:31:41 AM UTC+3, Ijonas Kisselbach wrote:

Hi Egon,

User-defined calculations, using an Excel-like formula language. I suppose an option would be to transpile those formulas (a DSL we've built) into JavaScript compatible with a MongoDB map-reduce.

I would definitely try that route. Also I noticed there is one more addition there that may help you http://docs.mongodb.org/manual/aggregation/

I predict that it will be faster than using Node, Go or Ruby; and probably will scale better. The data transformations with any language will probably kill the performance and MongoDB (theoretically) can avoid those.

+ egon

Taru Karttunen

Apr 9, 2014, 4:46:32 AM

to Tom Payne, Rodrigo Kochenburger, Ijonas Kisselbach, egon, golan...@googlegroups.com, Dmitry Vyukov, Dan Kortschak

On 09.04 10:06, Tom Payne wrote:
> Just a quick note on the *JavaScript* code that you're using for the
> benchmark:
> https://gist.github.com/ijonas/10229518#file-speedtest-js
>
> This is not actually testing maps in JavaScript :)
>
> The reason is that properties used (subtotal and total) are both constant
> and initialised in a fixed order. This means the V8 will generate an
> "hidden class" for these objects. Hidden classes allow V8 to do
> constant-time lookup of fields, and the performance characteristics are
> generally much closer to those of Go's structs.
>
> If you really want to test maps in JavaScript/V8, you'll need to randomise
> property names to defeat the hidden class optimisation. More info here:
> http://v8-io12.appspot.com/#29
> (the whole presentation is well worth reading if you're interested in
> JavaScript VM performance).

Also the Ruby version is using symbols (interned strings) rather than
strings. So this is mostly apples to oranges.

You should really be benchmarking your real use case...

- Taru Karttunen

C Banning

Apr 9, 2014, 5:29:17 AM

to golan...@googlegroups.com

On OS X 10.9.2 / 2.6 GHz Intel Core i7, 8 GB -

NodeJS v0.10.26 > node speedtest.js:

The prep took 2296ms to run.

The calls took 481ms to run.

Go v1.2 darwin/amd64 > go run speedtest.go:

The prep took 6.328421182s to run.

The calls took 524.069029ms to run.

Guess it depends on what your production platform will be. (Changed Go program print statements to be same as for JS.)

C Banning

Apr 9, 2014, 6:15:02 AM

to golan...@googlegroups.com

On a 4 core/4 GB VM running Linux version 3.11.0-15-generic (Ubuntu) 64-bit -

NodeJS v0.6.12 > node speedtest.js:

The prep took 6648ms to run.

The calls took 436ms to run

Go v1.2 cross-compiled for GOOS=linux > ./speedtest:

The prep took 15.367587177s to run.

The calls took 620.526937ms to run.

Henrik Johansson

Apr 9, 2014, 6:38:17 AM

to golang-nuts

Well...

➜ ~ ruby --version

jruby 1.7.4 (1.9.3po392) 2013-05-16 2390d3b on Java HotSpot(TM) 64-Bit Server VM 1.8.0-b132 +indy [linux-amd64]

ruby -J-Xms4096m -J-Xmx4096m speedtest.rb

Prepping.

The prep took 12028.0ms to run.

Ready.

The calls took 15035.0ms to run.

Have to give it quite a bit of memory to run.

Taru Karttunen

Apr 9, 2014, 7:26:21 AM

to Tom Payne, Rodrigo Kochenburger, Ijonas Kisselbach, egon, golan...@googlegroups.com, Dmitry Vyukov, Dan Kortschak

And benchmarking with interned strings in Go, Go is faster than Ruby
in your comparison... Getting random names for V8 is left as an
exercise.

/go/path/src/hack/speed $ ruby speedtest.rb
Prepping.
The prep took 7367.164496ms to run.
Ready.
The calls took 1122.499116ms to run.
/go/path/src/hack/speed $ go run speedtest.go
Prepping.
The calls took 4.025018352s to run.
The calls took 410.672552ms to run.

http://play.golang.org/p/jkZU1rqchd

- Taru Karttunen

roger peppe

Apr 9, 2014, 8:07:09 AM

to Ijonas Kisselbach, golang-nuts

Have you considered using bson.D instead of maps?

RickyS

Apr 9, 2014, 1:02:03 PM

to golan...@googlegroups.com

Just to clarify. The idea of make-ing ten million maps is a no-go in Go.
A map with 10-million elements makes more sense. And that's pretty much what you have in ruby and JavaScript.

Caleb Spare

Apr 9, 2014, 1:05:44 PM

to RickyS, golang-nuts

What? The Ruby/JS code he shared each made an array with 10 million maps.

Benjamin Measures

Apr 9, 2014, 1:19:51 PM

to golan...@googlegroups.com, RickyS

On Wednesday, 9 April 2014 18:05:44 UTC+1, Caleb Spare wrote:

What? The Ruby/JS code he shared each made an array with 10 million maps.

You must've missed the important bits of the thread:

On Wednesday, 9 April 2014 09:06:57 UTC+1, Tom Payne wrote:

The reason is that properties used (subtotal and total) are both constant and initialised in a fixed order. This means the V8 will generate an "hidden class" for these objects.

Caleb Spare

Apr 9, 2014, 1:21:15 PM

to Benjamin Measures, golang-nuts, RickyS

No, I read the whole thread. Those are (very important, no doubt)
implementation details. I was responding to

> A map with 10-million elements makes more sense. And that's pretty much what you have in ruby and JavaScript.

which is just not the case.

RickyS

Apr 9, 2014, 1:24:25 PM

to golan...@googlegroups.com, RickyS

They both use a single "docs" with subscripts, some of which are hash-mapped sometimes. This gives the language runtime a different
set of challenges. See the comments about the V8 hidden classes.

They also worked on different data, as they each used their own rand function to generate the test cases.

RickyS

Apr 9, 2014, 3:32:33 PM

to golan...@googlegroups.com, Benjamin Measures, RickyS

Well, I think you'll agree a single map with 10 million elements makes more sense than 10 million maps with just one key-value pair each. If not, please explain.

I also agree that the program doesn't need any sort of map, as it is written. Though you might if you were bringing fields from MongoDB.

So exactly which of my claims is "Just not the case" ?

Oh, I think I see. The Ruby is using real subscripts. The JavaScript is using hidden structs, so neither is really hashing, I guess. Is that what you mean?

Robert K

Apr 9, 2014, 3:42:39 PM

to golan...@googlegroups.com, Benjamin Measures, RickyS

In this case, with Go you tell it to use a map and it does explicitly that. With V8, you tell it to use something "maplike" and it figures out an efficient implementation. That could be hashing, could be an array, could be a struct, etc.

Caleb Spare

Apr 9, 2014, 3:55:29 PM

to RickyS, golang-nuts, Benjamin Measures

Yes, I'm talking about your claims about what the programs do, rather
than whether any of this makes sense or is a meaningful benchmark or
would be better done not using maps, which has been pretty thoroughly
explored on this thread.

I'm also talking about language semantics, not implementation. Hidden
classes are part of v8, not javascript-the-language.

You said this:

> A map with 10-million elements makes more sense. And that's pretty much what you have in ruby and JavaScript.

In the original post, the Go, Ruby, and Javascript code each made 10
million map-like objects (Go map, Ruby Hash, JS object). They were
contained in each language's array-like data structure (Go slice, Ruby
Array, JS array object). It is true that JS uses an associative array,
but that doesn't take away from the fact that there are still 10
million map-like objects in play in all three code samples. Saying

> a single map with 10 million elements makes more sense than 10 million maps with just one key-value pair each

is misleading, because none of the provided code samples had "a single
map with 10 million elements" except for JS, and only on a
technicality: JS's array-like type is technically a "map" as well. And
those 10 million elements are themselves map-like objects so I don't
think it is any better than keeping them in any other data structure.

Benjamin Measures

Apr 9, 2014, 5:22:16 PM

to golan...@googlegroups.com, RickyS, Benjamin Measures

On Wednesday, 9 April 2014 20:55:29 UTC+1, Caleb Spare wrote:

I'm also talking about language semantics, not implementation.

If you're going to argue semantics then note the JS code appended *object literals*[1] to an array (of length 10^7). That objects may (or may not) be implemented as hashmaps is, well, an implementation detail.

In fact, Javascript has no associative array and the spec "does not define concept of “hash” (and similar)."[2]

So, semantically, apples-to-apples would be an array of struct (which is the recommended approach in Go).

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Values,_variables,_and_literals#Object_literals

[2] http://dmitrysoshnikov.com/ecmascript/chapter-7-2-oop-ecmascript-implementation/#associative-arrays

Sugu Sougoumarane

Apr 10, 2014, 12:16:57 PM

to golan...@googlegroups.com, RickyS, Benjamin Measures

Avoid doing crazy optimizations this early. If a map is what's natural, use it.

If you're going to be decoding millions of bson objects, you probably have many CPUs to use. So, you may just win by setting GOMAXPROCS to the number of CPUs available.

I'm also guessing that you're more likely to be decoding millions of docs rather than a few docs that contain a million objects. So, they should parallelize well with goroutines.
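A minimal sketch of that kind of parallelism, keeping the maps as they are, is shown below. The `applyTaxParallel` helper and the chunking scheme are assumptions for illustration. Note that since Go 1.5, GOMAXPROCS defaults to the number of available CPUs, so setting it by hand is usually unnecessary on current releases.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// applyTaxParallel splits docs into contiguous chunks and processes each chunk
// in its own goroutine. Each map is touched by exactly one goroutine, so no
// locking is needed.
func applyTaxParallel(docs []map[string]float64, rate float64, workers int) {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(docs) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < len(docs); start += chunk {
		end := start + chunk
		if end > len(docs) {
			end = len(docs)
		}
		wg.Add(1)
		go func(part []map[string]float64) {
			defer wg.Done()
			for _, doc := range part {
				doc["total"] = doc["subtotal"] * rate
			}
		}(docs[start:end])
	}
	wg.Wait()
}

func main() {
	docs := make([]map[string]float64, 100000)
	for i := range docs {
		docs[i] = map[string]float64{"subtotal": float64(i), "total": 0}
	}
	applyTaxParallel(docs, 1.2, runtime.NumCPU())
	fmt.Println(docs[10]["total"])
}
```

This speeds up the calls phase only; the prep phase is dominated by allocation and garbage collection, which more goroutines will not necessarily help.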

Once you get your full program running, you may actually find that your bottlenecks are elsewhere. As others have recommended before, profile and optimize as needed.
