## The basics
Elixir has binaries and lists. A binary holds bytes; a list can hold anything.
A string is a binary encoded in UTF-8. A char list is a list containing Unicode codepoints.
Strings and char lists are not tagged, i.e. there isn't anything in the list or in the binary marking it as text in a given encoding. It is the responsibility of the IO interface to know and properly enforce their encoding. This means that, from the perspective of the system, a list of integers could mean anything.
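To make the "not tagged" point concrete, here is a minimal sketch, assuming a current Elixir release where the `~c` sigil writes a char list literal. The variable names are only illustrative.

```elixir
# The same two integers, stored two different ways.
char_list = [104, 105]   # a plain list of integers
string = <<104, 105>>    # a plain binary holding two bytes

# Both are ordinary data structures.
IO.inspect(is_list(char_list))
IO.inspect(is_binary(string))

# Nothing marks them as text, yet they compare equal to the text literals.
IO.inspect(char_list == ~c"hi")
IO.inspect(string == "hi")
```

All four checks should print `true`: the literals are just a convenient notation for the underlying integers and bytes.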
## Codepoints and bytes
Char lists are made of codepoints and strings are made of bytes. This means that the conversion of a char list to a string is not simply a matter of "replacing [...] by <<...>>". Let's see an example:
# Codepoint for È
iex> [200]
'È'
iex> String.from_char_list [200]
"È"
iex> size String.from_char_list [200]
2
In the example above, we can see that the codepoint 200 is represented by two bytes in UTF-8.
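The same effect can be observed for other codepoints. The sketch below assumes a current Elixir release and uses the `utf8` binary modifier to encode each codepoint; the three sample codepoints are my own choice.

```elixir
# Encode a few codepoints as UTF-8 and count the resulting bytes.
codepoints = [97, 200, 8364]   # a, È, €

byte_sizes = for cp <- codepoints, do: byte_size(<<cp::utf8>>)
IO.inspect(byte_sizes)
# prints [1, 2, 3]

# Codepoint 200 becomes the two bytes 195 and 136.
IO.inspect(<<200::utf8>> == <<195, 136>>)
# prints true
```

One codepoint can therefore take one, two or more bytes, which is why a char list and its string are not interchangeable element by element.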
## The trouble in Elixir today
Strings were added to Elixir in later versions, which left some rough edges to smooth out.
The first issue concerns list_to_binary/1 and binary_to_list/1. Neither function makes any assumption about the encoding. Consider the code below:
iex> list_to_binary [200]
<<200>>
list_to_binary/1 performs a raw conversion from a list of integers up to 255 into a binary. The end result is not guaranteed, in any way, to be a valid string. From this perspective, list_to_binary/1 is a low-level operation. The issue is that developers are using list_to_binary/1 as the primary mechanism for converting char lists to strings, and that is going to yield the wrong result as soon as Unicode characters are involved. We even have this issue in the Elixir source code itself.
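The difference between the raw and the encoding-aware conversion can be checked directly. This is a sketch under the assumption of a current Elixir release, where the raw function is reached as `:erlang.list_to_binary/1` and `List.to_string/1` plays the role this post gives to String.from_char_list/1.

```elixir
# Raw conversion: each integer becomes exactly one byte.
raw = :erlang.list_to_binary([200])
IO.inspect(raw == <<200>>)         # true
IO.inspect(String.valid?(raw))     # false, <<200>> is not valid UTF-8

# Encoding-aware conversion: each integer is treated as a codepoint.
encoded = List.to_string([200])
IO.inspect(encoded == <<195, 136>>)   # true
IO.inspect(String.valid?(encoded))    # true

# Integers above 255 cannot be converted raw at all.
euro = [8364]
raw_result =
  try do
    :erlang.list_to_binary(euro)
  rescue
    ArgumentError -> :badarg
  end
IO.inspect(raw_result)   # :badarg
```

So the raw conversion either produces a binary that is not a valid string or fails outright once the char list leaves the Latin-1 range.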
Furthermore, string interpolation uses the to_binary/1 function (which is powered by the Binary.Chars protocol) to convert the interpolated content into a binary. Let's take a look at it:
iex> to_binary [200]
<<200>>
to_binary/1 uses the raw list_to_binary/1, which knows nothing about char lists or Unicode codepoints. Alexei has pointed out that this is very poor behaviour for a function that is supposed to interpolate contents into a String, which is meant to be in UTF-8 after all.
## The solution
In the coming days, list_to_binary/1 and binary_to_list/1 will be replaced by String.from_char_list/1 and String.to_char_list/1 respectively (the originals will still be available from the :erlang module). to_binary/1 and the Binary.Chars protocol will be replaced by to_string/1 and String.Chars.
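As a rough illustration of where this leads, the sketch below assumes a current Elixir release, in which to_string/1 and interpolation go through String.Chars and String.to_char_list/1 is spelled `String.to_charlist/1`. The byte-level function is still reachable through `:erlang`.

```elixir
# Codepoint-aware conversion via the String.Chars protocol.
converted = to_string([200])
interpolated = "#{[200]}"

IO.inspect(converted == "È")             # true
IO.inspect(interpolated == converted)    # true

# Going back: codepoints versus raw bytes.
IO.inspect(String.to_charlist("È") == [200])             # true
IO.inspect(:erlang.binary_to_list("È") == [195, 136])    # true
```

Interpolating a char list now yields a valid UTF-8 string, while the raw byte view remains available for code that really needs it.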
It is important to note that the remaining *_to_binary and binary_to_* functions won't be changed. I have many times contemplated getting rid of those functions and converting them to to_string variants. The truth is that some functions, like is_binary/1, cannot be converted into is_string/1: an is_string/1 would have to guarantee that the data is UTF-8 encoded, and we cannot give this guarantee inside a guard.
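The guard limitation can be sketched as follows. `Utf8Check` and `classify/1` are hypothetical names used only for illustration, and the example assumes a current Elixir release where `String.valid?/1` performs the UTF-8 check.

```elixir
defmodule Utf8Check do
  # is_binary/1 is allowed in a guard: it only checks the type of the term.
  def classify(bin) when is_binary(bin) do
    # The UTF-8 check has to walk the bytes, so it lives in the function body.
    if String.valid?(bin), do: :string, else: :raw_binary
  end

  def classify(_other), do: :not_a_binary
end

IO.inspect(Utf8Check.classify("È"))       # :string
IO.inspect(Utf8Check.classify(<<200>>))   # :raw_binary
IO.inspect(Utf8Check.classify([200]))     # :not_a_binary
```

The guard can only say "this is a binary"; whether that binary is a string is decided afterwards, in ordinary code.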