Enum vs NamedTuple

Jason Frey

Nov 13, 2018, 7:35:29 PM

to Crystal

I had seen a shard recently that was using NamedTuples for what would otherwise be an enumeration, and I thought I’d contribute by converting it to an Enum. However, I decided to run a performance check between them, and was surprised that NamedTuples were faster than Enums. Can someone explain why this is? Intuitively, I feel like Enums should be faster.

Jason

enum MyEnum
  Foo
  Bar
end 

my_tuple = {foo: 1, bar: 2}
my_hash = {:foo => 1, :bar => 2} 

require "benchmark"
Benchmark.ips do |x|
  x.report("from tuple") { a = my_tuple[:foo] }
  x.report("from enum") { a = MyEnum::Foo.value } # .value here or not seems to make no difference
  x.report("from hash") { a = my_hash[:foo] }
end

$ crystal run --release crystal_test.cr
from tuple 327.76M (  3.05ns) (± 8.09%)  0 B/op        fastest
 from enum 286.91M (  3.49ns) (± 7.89%)  0 B/op   1.14× slower
 from hash   43.3M ( 23.09ns) (± 4.86%)  0 B/op   7.57× slower

Ary Borenszweig

Nov 13, 2018, 7:51:45 PM

to crysta...@googlegroups.com

When operations take 2~3 nanoseconds, the benchmark suite seems to become unreliable. Try this:

~~~
require "benchmark"

Benchmark.ips do |x|
  x.report("one") { 1 }
  x.report("two") { 2 }
  x.report("three") { 3 }
  x.report("four") { 4 }
end
~~~

I get:

~~~
  one 462.31M (  2.16ns) (± 5.25%)  0 B/op        fastest
  two 422.92M (  2.36ns) (±11.24%)  0 B/op   1.09× slower
three 459.17M (  2.18ns) (±10.85%)  0 B/op   1.01× slower
 four 437.28M (  2.29ns) (±10.13%)  0 B/op   1.06× slower
~~~

If you compile this program in release mode with `crystal foo.cr --release --emit llvm-ir`:

~~~
enum MyEnum
  Foo
  Bar
end

# Named tuple_value to match the symbol in the emitted IR.
fun tuple_value : Int32
  my_tuple = {foo: 1, bar: 2}
  my_tuple[:foo]
end

fun enum_value : Int32
  MyEnum::Foo.value
end
~~~

somewhere in the file you'll see this:

~~~
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @tuple_value() local_unnamed_addr #9 !dbg !13637 {
alloca:
  ret i32 1, !dbg !13639
}

; Function Attrs: norecurse nounwind readnone uwtable
define i32 @enum_value() local_unnamed_addr #9 !dbg !13640 {
entry:
  ret i32 0, !dbg !13641
}
~~~

That means that LLVM was able to optimize it to constants so there shouldn't be a real difference.

That said, it would be interesting to benchmark the benchmarker and check why it gives different values for different entries within the same run. It's curious that if you swap the order in your code, the same entry still comes out as the slowest.
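As a rough sanity check on the noise, here is a small sketch that takes the mean times posted in this thread and compares the slowest-to-fastest ratio within each run. The `spread` helper and the variable names are made up for illustration; the numbers are copied from the two outputs above.

```crystal
# Mean times in nanoseconds, copied from the two runs posted in this thread.
constants = {one: 2.16, two: 2.36, three: 2.18, four: 2.29}
lookups = {tuple: 3.05, enum_value: 3.49}

# Hypothetical helper: ratio of the slowest to the fastest entry in a run.
def spread(times)
  values = times.values
  values.max / values.min
end

puts spread(constants).round(2) # prints 1.09
puts spread(lookups).round(2)   # prints 1.14
```

The four constant blocks do identical work yet differ by about 9%, which is the same order as the 14% gap between the tuple and enum lookups and close to the reported ± figures.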


Ary Borenszweig

Nov 13, 2018, 7:52:54 PM

to crysta...@googlegroups.com

Correction: Crystal optimizes the values to constants, it's not LLVM. Crystal will write an integer literal for both `SomeEnum.value` and `named_tuple[:known_value]`.
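To make the values concrete, here is a minimal sketch reusing the enum and named tuple from the original post. It only prints what each expression evaluates to; it does not show what the compiler emits.

```crystal
enum MyEnum
  Foo
  Bar
end

my_tuple = {foo: 1, bar: 2}

# Enum members are numbered from 0 by default and are backed by Int32.
puts MyEnum::Foo.value # prints 0
puts MyEnum::Bar.value # prints 1

# The key is a literal, so the compiler knows the entry's type statically.
puts my_tuple[:foo]         # prints 1
puts typeof(my_tuple[:foo]) # prints Int32
```

These values match the `ret i32 0` and `ret i32 1` in the IR above.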

Jason Frey

Nov 13, 2018, 8:05:42 PM

to Crystal

> When operations take 2~3 nanoseconds, the benchmark suite seems to become unreliable.

Oh that's interesting. I'm curious what causes that.

> It's curious that if you swap the order in your code it still gives the slower one as slowest.

I actually tried that, and that's part of why I thought it was legitimately slower. The results were relatively consistent.

> Correction: Crystal optimizes the values to constants, it's not LLVM. Crystal will write an integer literal for both `SomeEnum.value` and `named_tuple[:known_value]`.

Ah! I expected the Enum to resolve to a constant, but I didn't expect the NamedTuple to do so, and that constant resolution was why I expected Enums to be significantly faster. Now it makes sense why they are basically the same.