`Enum.join/2` behaviour in presence of non-UTF-8 binaries in enumerable


Łukasz Niemier

Nov 10, 2023, 10:35:32 AM
to elixir-l...@googlegroups.com
Currently `Enum.join/2` uses `IO.iodata_to_binary/1` internally to build the output string. My proposal is to change that to `:unicode.characters_to_binary/1` instead, which would let the function conform to its specs. Currently, if we have code like:

s = Enum.join([<<255>>, <<255>>])

it outputs a binary that is not a string (`String.valid?(s) == false`). With `:unicode.characters_to_binary/1` we could conform to the docs, but it would impose a performance hit because of the additional traversal needed to check that all binaries are properly UTF-8 encoded.
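To make the difference concrete, here is a minimal sketch comparing the two conversions on the same input. It assumes the validating function in question is Erlang's `:unicode.characters_to_binary/1`; the exact shape of the error tuple's remainder is not relied on here.

```elixir
# Two bytes that can never appear in valid UTF-8.
parts = [<<255>>, <<255>>]

# Enum.join/2 concatenates without validating, so the result is a
# binary that is not a valid string.
joined = Enum.join(parts)
IO.inspect(String.valid?(joined))

# :unicode.characters_to_binary/1 validates while converting and
# reports invalid input as a tagged tuple instead of returning a binary.
result = :unicode.characters_to_binary(parts)

case result do
  bin when is_binary(bin) -> IO.puts("valid UTF-8: #{bin}")
  {:error, _converted, _rest} -> IO.puts("invalid UTF-8 detected")
  {:incomplete, _converted, _rest} -> IO.puts("truncated UTF-8 sequence")
end
```

The extra pass over every byte is where the performance cost mentioned above would come from.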

So before opening a PR with that change, I wanted to consult here: which is preferred, correcting the specs (so that the function accepts any binary and returns a binary), or changing the implementation to check that the resulting binary is a valid `String.t()`?

--

Łukasz Niemier
luk...@niemier.pl

José Valim

Nov 10, 2023, 11:03:58 AM
to elixir-l...@googlegroups.com
Correct. When it comes to strings, Elixir generally assumes that the string has been validated before entering the system. If every function were to validate that its arguments are indeed strings, it would become quite expensive.

That said, I think your solution of changing the spec is the correct one, regardless of Unicode. We can give it binaries and it will emit binaries back. So a PR is definitely welcome.
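A rough sketch of the "validate before entering the system" approach follows. The `Boundary` module and `validate_all!/1` are hypothetical names used only for illustration; they are not part of Elixir.

```elixir
defmodule Boundary do
  # Hypothetical helper: check UTF-8 validity once, at the point where
  # external data enters the system, and raise on anything invalid.
  def validate_all!(binaries) do
    Enum.each(binaries, fn bin ->
      if not String.valid?(bin) do
        raise ArgumentError, "expected valid UTF-8, got: #{inspect(bin)}"
      end
    end)

    binaries
  end
end

# Once validated, downstream code can join without re-checking.
parts = Boundary.validate_all!(["zażółć", " ", "gęślą"])
joined = Enum.join(parts)
IO.inspect(String.valid?(joined))
```

Paying for validation once at the edge keeps functions like `Enum.join/2` free of per-call checks.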

--
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/C5E0B65F-3373-4980-A04D-351E0EFAEB8D%40niemier.pl.

Cameron Duley

Nov 10, 2023, 12:05:28 PM
to elixir-lang-core
Being able to join binary data that falls outside the UTF-8 spec is nice, particularly when working with old stuff that speaks in null terminators.

Haven't had to do this myself but the use case came to mind, and I found at least one public example.

That considered, +1 for supporting binary data in the spec!
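For illustration, a small sketch of that use case: joining arbitrary binary fields with a NUL byte as the separator. The field values are made up for this example and do not come from any real protocol.

```elixir
# Hypothetical fields; the middle one is not valid UTF-8.
fields = ["NAME", <<0xFF, 0xFE>>, "END"]

# Enum.join/2 accepts a binary joiner, here a single NUL byte.
packet = Enum.join(fields, <<0>>)
IO.inspect(packet)

# The result is a plain binary rather than a valid string.
IO.inspect(String.valid?(packet))

# Splitting on the NUL byte recovers the original fields.
IO.inspect(:binary.split(packet, <<0>>, [:global]))
```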

luk...@niemier.pl

Nov 10, 2023, 2:27:15 PM
to elixir-lang-core