String.length(string) > 10 == String.longer?(string, 10)
String.length(string) >= 10 == String.longer?(string, 10 - 1)
String.length(string) <= 10 == !String.longer?(string, 10)
String.length(string) < 10 == !String.longer?(string, 10 - 1)
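To make the equivalences above concrete, here is a minimal sketch of how such a check could stop early. It is only an illustration: `String.longer?/2` does not exist in Elixir, so the module name `LongerSketch` and its helper are hypothetical. The only real library call it relies on is `String.next_grapheme/1`.

```elixir
defmodule LongerSketch do
  # Hypothetical stand-in for the proposed String.longer?/2.
  # Returns true if `string` has more than `limit` graphemes,
  # and stops walking the string as soon as the answer is known.
  def longer?(string, limit)
      when is_binary(string) and is_integer(limit) and limit >= 0 do
    do_longer?(String.next_grapheme(string), limit)
  end

  # The string ran out before the limit was exceeded.
  defp do_longer?(nil, _remaining), do: false

  # There is one more grapheme and the limit is already used up.
  defp do_longer?({_grapheme, _rest}, 0), do: true

  # Consume one grapheme and keep going.
  defp do_longer?({_grapheme, rest}, remaining) do
    do_longer?(String.next_grapheme(rest), remaining - 1)
  end
end
```

With this sketch, `LongerSketch.longer?(string, 10 - 1)` answers the same question as `String.length(string) >= 10`, but it never reads more than ten graphemes.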
Warning: The function you are trying to benchmark is super fast, making time measures unreliable!
Benchee won't measure individual runs but rather run it a couple of times and report the average back. Measures will still be correct, but the overhead of running it n times goes into the measurement. Also statistical results aren't as good, as they are based on averages now. If possible, increase the input size so that an individual run takes more than 10μs
Name                                  ips     average   deviation      median
string.length, 10              1012798.94      0.99μs  (±305.69%)      0.90μs
string.longer?, 10, 10         1005594.61      0.99μs  (±184.19%)      0.90μs
string.longer?, 10000, 10       896158.77      1.12μs  (±364.52%)      1.00μs
string.longer?, 10000, 5000       2298.79    435.01μs   (±44.71%)    390.00μs
string.length, 10000              1241.32    805.59μs   (±25.77%)    761.00μs

Comparison:
string.length, 10              1012798.94
string.longer?, 10, 10         1005594.61 - 1.01x slower
string.longer?, 10000, 10       896158.77 - 1.13x slower
string.longer?, 10000, 5000       2298.79 - 440.58x slower
string.length, 10000              1241.32 - 815.90x slower
Further optimizations
I considered checking the byte_size of the string, hoping to find conditions under which we could say for sure what the result would be.
I planned to return false if the byte_size was below the limit, since that would mean there are fewer codepoints than the limit. But I'm not sure there are no situations where a codepoint could produce more than one grapheme.
I also planned to return true if the byte_size was more than four times the limit, since each codepoint uses at most four bytes. But since a grapheme can consist of multiple codepoints, it can also use more than four bytes, and I'm not sure what the upper limit is (if there is any).
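A small illustration of the second concern, as a sketch rather than a proof: a base letter followed by combining marks is counted as a single grapheme, so one grapheme can easily take more than four bytes. The string below is made up for the example.

```elixir
# One base letter followed by five combining acute accents (U+0301).
string = "e" <> String.duplicate("\u0301", 5)

# 1 byte for "e" plus 2 bytes for each accent.
byte_size(string)
# => 11

# The combining marks attach to the preceding letter.
String.length(string)
# => 1
```

Since more combining marks can be appended in the same way, a check of the form "byte_size greater than four times the limit" would wrongly return true for strings like this one.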
Checking the length of a String requires traversing the whole string. A fairly common use case is checking whether the String is longer or shorter than a given value. I might not care whether the string is 123 456 graphemes long or 654 321 if all I want to know is whether it is longer than 50 000 characters, but I still have to calculate the full length to find out.
On 26 Jun 2016, at 18:03, Filip Haglund <fille....@gmail.com> wrote:
Could this be done automatically by the compiler? Replacing `String.length str >= 13` with `String.at(str, 13) != nil`?