Enum vs NamedTuple


Jason Frey

2018/11/13 19:35:29
To: Crystal
I had seen a shard recently that was using NamedTuples for what would otherwise be an enumeration, and I thought I’d contribute by converting it to an Enum. However, I decided to run a performance check between them, and was surprised that NamedTuples were faster than Enums. Can someone explain why this is? Intuitively, I feel like Enums should be faster.

Jason

enum MyEnum
  Foo
  Bar
end

my_tuple = {foo: 1, bar: 2}
my_hash = {:foo => 1, :bar => 2}

require "benchmark"
Benchmark.ips do |x|
  x.report("from tuple") { a = my_tuple[:foo] }
  x.report("from enum") { a = MyEnum::Foo.value } # .value here or not seems to make no difference
  x.report("from hash") { a = my_hash[:foo] }
end

$ crystal run --release crystal_test.cr
from tuple 327.76M (  3.05ns) (± 8.09%)  0 B/op        fastest
 from enum 286.91M (  3.49ns) (± 7.89%)  0 B/op   1.14× slower
 from hash   43.3M ( 23.09ns) (± 4.86%)  0 B/op   7.57× slower


Ary Borenszweig

2018/11/13 19:51:45
To: crysta...@googlegroups.com
When operations take 2~3 nanoseconds the benchmark suite seems to become unreliable. Try this:

~~~
require "benchmark"

Benchmark.ips do |x|
  x.report("one") { 1 }
  x.report("two") { 2 }
  x.report("three") { 3 }
  x.report("four") { 4 }
end
~~~

I get:

~~~
  one 462.31M (  2.16ns) (± 5.25%)  0 B/op        fastest
  two 422.92M (  2.36ns) (±11.24%)  0 B/op   1.09× slower
three 459.17M (  2.18ns) (±10.85%)  0 B/op   1.01× slower
 four 437.28M (  2.29ns) (±10.13%)  0 B/op   1.06× slower
~~~

If you compile this program in release mode with `crystal foo.cr --release --emit llvm-ir`:

~~~
enum MyEnum
  Foo
  Bar
end

fun tuple_value : Int32
  my_tuple = {foo: 1, bar: 2}
  my_tuple[:foo]
end

fun enum_value : Int32
  MyEnum::Foo.value
end
~~~

somewhere in the file you'll see this:

~~~
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @tuple_value() local_unnamed_addr #9 !dbg !13637 {
alloca:
  ret i32 1, !dbg !13639
}

; Function Attrs: norecurse nounwind readnone uwtable
define i32 @enum_value() local_unnamed_addr #9 !dbg !13640 {
entry:
  ret i32 0, !dbg !13641
}
~~~

That means that LLVM was able to optimize it to constants so there shouldn't be a real difference.

That said, it would be interesting to benchmark the benchmarker and check why it gives different values for different indexes within the same run. It's curious that if you swap the order in your code it still gives the slower one as slowest.


Ary Borenszweig

2018/11/13 19:52:54
To: crysta...@googlegroups.com
Correction: Crystal optimizes the values to constants, it's not LLVM. Crystal will write an integer literal for both `SomeEnum.value` and `named_tuple[:known_value]`.
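
As a small illustration (a sketch added for clarity, reusing the `MyEnum` and `my_tuple` definitions from the thread), the two expressions evaluate to the same integers that appear as `ret i32 0` and `ret i32 1` in the IR above, and both have type `Int32`:

```crystal
enum MyEnum
  Foo
  Bar
end

my_tuple = {foo: 1, bar: 2}

# Enum members without explicit values are numbered from 0,
# and the default underlying type is Int32.
puts MyEnum::Foo.value         # prints 0
puts MyEnum::Bar.value         # prints 1
puts typeof(MyEnum::Foo.value) # prints Int32

# Indexing a NamedTuple with a literal key gives the stored Int32.
puts my_tuple[:foo]         # prints 1
puts typeof(my_tuple[:foo]) # prints Int32
```

This only shows the values and types involved; it does not by itself prove which compiler stage does the folding.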

Jason Frey

2018/11/13 20:05:42
To: Crystal
> When operations take 2~3 nanoseconds the benchmark suite seems to become unreliable.

Oh that's interesting.  I'm curious what causes that.

> It's curious that if you swap the order in your code it still gives the slower one as slowest.

I actually tried that, and that's part of why I thought it was legitimately slower.  The results were relatively consistent.

> Correction: Crystal optimizes the values to constants, it's not LLVM. Crystal will write an integer literal for both `SomeEnum.value` and `named_tuple[:known_value]`.

Ah! I expected the Enum to resolve to a constant, but I didn't expect the NamedTuple to do so, and that constant resolution was why I expected Enums to be significantly faster.  Now it makes sense why they are basically the same.
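
A minimal sketch of the difference, reusing `my_tuple` and `my_hash` from the first message (the `:baz` key is a made-up name for illustration): a literal key on a NamedTuple is checked by the compiler, while a Hash key is only looked up when the program runs.

```crystal
my_tuple = {foo: 1, bar: 2}
my_hash = {:foo => 1, :bar => 2}

# Literal key on a NamedTuple: the key is part of the tuple's type.
puts my_tuple[:foo] # prints 1

# Uncommenting the next line should be rejected at compile time,
# because :baz is not a key of this NamedTuple type.
# my_tuple[:baz]

# Hash: the key is hashed and looked up at runtime.
puts my_hash[:foo]          # prints 1
puts my_hash[:baz]?.inspect # prints nil (missing key, found only at runtime)
```

That runtime lookup is a plausible reason the hash case is the only one that stands out in the benchmark.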


Thanks Ary!

Jason