On 12/02, anu...@cinova.co wrote:
> 2. Strings/Binary complexity. I realize this is more a problem with
> Erlang's idiotic language design decision, but it flows into LFE and caused
> too many frustrations. I would randomly see strings looking like "(129 33
> 83 121)" in my database for reasons I still don't understand. I began
> brute-force lists:flatten, or my own utility function ensure-string around
> anything that is supposed to return a string. I think lfe_io:format1 was a
> big offender here, but I'm not sure about it. I suspect I don't understand
> it very well at all.
>
So what you noticed was not Erlang idiocy, it was Erlang smarts. I mean,
outside the fact that lists vs. strings is a bit confusing, what you
noticed (if it's solved through lists:flatten) was the presence of
iolists and iodata.
From Learn You Some Erlang (http://learnyousomeerlang.com/buckets-of-sockets#io-lists):
A string is a bit like a linked list of integers: for each character,
you've got to store the character itself plus a link towards the rest of
the list. Moreover, if you want to add elements to a list, either in the
middle or at the end, you have to traverse the whole list up to the
point you're modifying and then add your elements. This isn't the case
when you prepend, however:
    A = [a]
    B = [b|A] = [b,a]
    C = [c|B] = [c,b,a]
In the case of prepending, as above, whatever is held in A or B or
C never needs to be rewritten. The representation of C can be seen
as either [c,b,a], [c|B] or [c|[b|[a]]], among others. In the last
case, you can see that the shape of A is the same at the end of the
list as when it was declared. Similarly for B. Here's how it looks
with appending:
    A = [a]
    B = A ++ [b] = [a] ++ [b] = [a|[b]]
    C = B ++ [c] = [a|[b]] ++ [c] = [a|[b|[c]]]
Do you see all that rewriting? When we create B, we have to rewrite
A. When we write C, we have to rewrite B (including the [a|...] part
it contains). If we were to add D in a similar manner, we would need
to rewrite C. Over long strings, this becomes way too inefficient,
and it creates a lot of garbage left to be cleaned up by the Erlang
VM.
...
In these cases, IO lists are our saviour. IO lists are a weird type
of data structure. They are lists of either bytes (integers from 0
to 255), binaries, or other IO lists. This means that functions that
accept IO lists can accept items such as
    [$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]]
When this happens, the Erlang VM will just flatten the list as needed
to obtain the sequence of characters Hello World.
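To check the quoted example for yourself, here is a minimal sketch. The module name iolist_demo is made up for illustration; iolist_to_binary/1 is a standard BIF.

```erlang
-module(iolist_demo).
-export([hello/0]).

%% The IO list from the quoted passage: characters, a binary, a string
%% and nested lists, all mixed together in one term.
hello() ->
    IoList = [$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]],
    %% iolist_to_binary/1 walks the nesting and returns one flat binary.
    iolist_to_binary(IoList).
```

Calling iolist_demo:hello() should give back <<"Hello World">>.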
IO lists are accepted by most Erlang functions that do I/O or text
processing (sockets, the io, file, re and unicode modules, etc.), and they
let you append (or splice) in a waaay more efficient manner than doing an
operation, then flattening, then doing another operation, and so on.
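As a rough sketch of what "appending" looks like with iodata (the module and function names here are hypothetical, not from any library): instead of Acc ++ Line, you wrap the accumulator in a new list, so nothing already built gets copied.

```erlang
-module(iolist_append).
-export([join_lines/1]).

%% Append each line plus a newline by nesting the old accumulator inside
%% a new three-element list. The accumulated data is never traversed or
%% rewritten, unlike Acc ++ Line ++ "\n".
join_lines(Lines) ->
    lists:foldl(fun(Line, Acc) -> [Acc, Line, $\n] end, [], Lines).
```

The result is a deeply nested term, which is fine: you hand it as-is to whatever writes it out, and lines can be strings, binaries or other IO lists.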
It saves you a whole lot of runtime complexity, and as long as you
treat iodata as an opaque data type (meaning you don't go and iterate
all willy-nilly down the data structure, which is the sensible thing to
do in the Unicode age anyway), you should be able to have a good time
with it at a cheaper cost.
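For the database case, one way to go (a sketch only, with made-up module and function names) is to leave the data nested everywhere and convert once, right at the boundary where something really needs a flat value. io_lib:format/2 is a typical source of such nested character lists, and unicode:characters_to_binary/1 flattens them:

```erlang
-module(iodata_boundary).
-export([render/2, to_binary/1]).

%% Build the text. io_lib:format/2 returns character data that may be a
%% nested list rather than a flat string.
render(Name, Count) ->
    io_lib:format("~s has ~p items", [Name, Count]).

%% Convert once, at the edge (e.g. just before storing the value).
%% unicode:characters_to_binary/1 also copes with code points above 255,
%% which iolist_to_binary/1 would reject.
to_binary(Chars) ->
    unicode:characters_to_binary(Chars).
```

That would play the role of your ensure-string, but called in one place instead of around everything.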
Regards,
Fred.