I think we run into the general issues with indices and UTF-8 strings here, don't we? What does 2 mean? Is it the 2nd code point or the 2nd grapheme?
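To make the ambiguity concrete, here is a minimal sketch using a string I made up for illustration (it is not from the thread): "e" followed by a combining acute accent. It is one grapheme but two code points and three bytes, so "index 2" means something different at each level.

```elixir
# Illustrative string (an assumption, not from the thread):
# "e" followed by U+0301 COMBINING ACUTE ACCENT, then "x".
sample = "e\u0301x"

grapheme_count = String.length(sample)                # counts graphemes
codepoint_count = length(String.codepoints(sample))   # counts code points
byte_count = byte_size(sample)                        # counts UTF-8 bytes

IO.inspect({grapheme_count, codepoint_count, byte_count})
# {2, 3, 4}
```

Here "x" sits at grapheme index 1, code point index 2, and byte offset 3.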
--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/2d1ba237-1755-4fd0-82f6-7f086a1c2b72%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
José: String.split_at already exists and works at the grapheme level. I'm not sure what you are suggesting.
I think it makes perfect sense to have index such that
String.at(str, String.index(str, chr)) == chr
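As a rough sketch of what that could look like: String.index does not exist in Elixir, so the module and function below are hypothetical names of my own. The helper works at the grapheme level so that it round-trips with String.at/2, and it is linear in the length of the string.

```elixir
defmodule GraphemeIndex do
  # Hypothetical helper, not part of Elixir's String module.
  # Returns the zero-based grapheme index of the first grapheme equal to
  # `grapheme`, or nil when the string does not contain it.
  def index(str, grapheme) do
    str
    |> String.graphemes()
    |> Enum.find_index(&(&1 == grapheme))
  end
end

word = "jos\u00E9jos\u00E9"   # "joséjosé" with a precomposed é (U+00E9)
e_acute = "\u00E9"
found_at = GraphemeIndex.index(word, e_acute)

IO.inspect(found_at)                            # 3
IO.inspect(String.at(word, found_at) == e_acute) # true
```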
I think it is more complicated than that because, depending on what you want to do, using the grapheme index is going to be much slower than working at the byte level.
For example, if you want to split based on the index, doing a straight split operation or working with bytes is going to perform much better than doing two grapheme based operations (which are linear). So unless you have a use case that won't work with binary:match nor any of the other String functions, I am not convinced an index function is useful.
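A small sketch of the alternatives mentioned above, using an example string chosen for illustration. It splits around the first "é" in two ways: a straight String.split/3 with `parts: 2`, and a byte-level split with :binary.match/2 plus binary_part/3. Neither needs a grapheme index.

```elixir
text = "jos\u00E9jos\u00E9"   # "joséjosé" with a precomposed é (U+00E9)
pattern = "\u00E9"

# Straight split on the pattern, at most two parts.
by_split = String.split(text, pattern, parts: 2)

# Byte-level: :binary.match/2 returns {byte_offset, byte_length} or :nomatch.
{pos, len} = :binary.match(text, pattern)
left = binary_part(text, 0, pos)
right = binary_part(text, pos + len, byte_size(text) - pos - len)

IO.inspect(by_split)       # ["jos", "josé"]
IO.inspect({pos, len})     # {3, 2}
IO.inspect({left, right})  # {"jos", "josé"}
```

The byte offsets from :binary.match/2 are only meaningful for byte-level functions such as binary_part/3; passing them to String.at/2 or String.split_at/2 would give the wrong position once a multi-byte character precedes the match.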
I virtually never work at the byte level. Almost all real-world strings I need to process come in as UTF-8 and do indeed contain multi-byte graphemes.
From C++ to Java to JS to Python to Ruby, they all have it. We are talking about simple string processing here.
It will be hard for me to explain to every new programmer I convert to Elixir <3 that String.index is not present in Elixir because it is slow or not idiomatic.
The remaining thing I would love to understand is the major cause of the slowness. Is it grapheme support or a limitation of the underlying VM? If we worked at the codepoint level like most languages, would it help with performance?
"joséjosé"
j o s é j o s é
<<106, 111, 115, 195, 169, 106, 111, 115, 195, 169>>
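The same breakdown can be reproduced in IEx or a script. This is only a sketch to show where grapheme positions and byte offsets diverge: the second "j" is at grapheme index 4 but byte offset 5, because "é" takes two bytes in UTF-8.

```elixir
name = "jos\u00E9jos\u00E9"   # "joséjosé" with a precomposed é (U+00E9)

graphemes = String.graphemes(name)
bytes = :binary.bin_to_list(name)

IO.inspect(graphemes)
# ["j", "o", "s", "é", "j", "o", "s", "é"]
IO.inspect(bytes, charlists: :as_lists)
# [106, 111, 115, 195, 169, 106, 111, 115, 195, 169]

IO.inspect({String.length(name), byte_size(name)})  # {8, 10}

# Second "j": grapheme index 4, byte offset 5.
IO.inspect(String.at(name, 4))       # "j"
IO.inspect(binary_part(name, 5, 1))  # "j"
```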