[erlang-questions] Which is best? string:concat or ++?


Paul Barry

May 7, 2012, 9:31:04 AM
to Erlang
Hi folks.

This might be a case of a dumb question, but here goes anyway. :-)

I have two strings, A and B. Is it better to use:

string:concat(A, B)

or

A ++ B

when combining them to create a new string? I'm writing some code
that generates a chunk of HTML. I know that using ++ on big lists is
regarded as a "no-no", but is it acceptable here?

Thanks.

Paul.

--
Paul Barry, w: http://paulbarry.itcarlow.ie - e: paul....@itcarlow.ie
Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Paul Davis

May 7, 2012, 9:37:13 AM
to Paul Barry, Erlang
Quick scan of the source shows that string:concat/2 is exactly the ++
operator. So using ++ will save you a function invocation assuming the
Erlang compiler doesn't optimize that away.

https://github.com/erlang/otp/blob/maint/lib/stdlib/src/string.erl#L61
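
A quick way to convince yourself of the above is to compare the two results directly. This is only a minimal sketch; the module name concat_demo and the function same/2 are made up for illustration, and newer OTP documentation lists string:concat/2 among the older (obsolete) string functions, so ++ is probably the form to prefer in new code anyway.

```erlang
-module(concat_demo).
-export([same/2]).

%% Hypothetical helper: returns true when string:concat/2 and ++
%% build exactly the same list for the given strings.
same(A, B) ->
    string:concat(A, B) =:= A ++ B.
```

Calling concat_demo:same("<p>", "hi</p>") should return true for any pair of lists.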

Fredrik Andersson

May 7, 2012, 9:45:19 AM
to Paul Davis, Erlang
I would recommend also looking at
http://www.erlang.org/doc/man/lists.html#concat-1
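
For readers who have not used it: lists:concat/1 takes a list of "things" (atoms, integers, floats, strings) and returns one flat string. A small sketch follows; the module name lists_concat_demo and the function tag/2 are assumptions made up for this example.

```erlang
-module(lists_concat_demo).
-export([tag/2]).

%% Hypothetical helper: builds an opening heading tag.
%% lists:concat/1 converts the atom and the integer to text
%% and joins everything into a single flat string.
tag(Name, Level) when is_atom(Name), is_integer(Level) ->
    lists:concat(["<", Name, Level, ">"]).
```

Here lists_concat_demo:tag(h, 1) gives "<h1>". Note that the result is still an ordinary list, so every element is copied into the new string.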

Wes James

May 7, 2012, 10:12:45 AM
to Paul Barry, erlang-q...@erlang.org
On Mon, May 7, 2012 at 7:31 AM, Paul Barry <paul.jam...@gmail.com> wrote:
> Hi folks.
>
> This might be a case of a dumb question, but here goes anyway.  :-)
>
> I have two strings, A and B.  Is it better to use:
>
>   string:concat(A, B)
>
> or
>
>    A ++ B
>
> when combining them to create a new string?  I'm writing some code
> that generates a chunk of HTML.  I know that using ++ on big lists is
> regarded as a "no-no", but is it acceptable here?
>


You may want to consider, if possible, moving all your string
operations to binary format.

For example:

A = <<"<html><head><title>">>.

B = <<"My Title">>.

C = <<"</title></head><body>hi</body></html>">>.

All = <<A/binary, B/binary, C/binary>>.

The All part could just be a return from a fun, etc.
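
As a rough sketch of "a return from a fun", the three pieces can be folded into one function. The module name page_bin and the function page/1 are hypothetical, and the sketch assumes the title is already a binary and already HTML-escaped.

```erlang
-module(page_bin).
-export([page/1]).

%% Hypothetical helper: Title must be a binary. String literals
%% inside << >> are written straight into the new binary.
page(Title) when is_binary(Title) ->
    <<"<html><head><title>", Title/binary,
      "</title></head><body>hi</body></html>">>.
```

page_bin:page(<<"My Title">>) returns the same binary as All above.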

-wes

Paul Barry

May 7, 2012, 10:18:23 AM
to Wes James, erlang-q...@erlang.org
I take it that doing it that way is "faster" than string manipulation
(or is there some other reason for this suggestion)?


Wes James

May 7, 2012, 10:47:36 AM
to Paul Barry, erlang-q...@erlang.org

Fred Hebert

May 7, 2012, 10:17:33 AM
to Paul Barry, Erlang
Given you generate chunks of HTML and that this will undoubtedly be pushed over a socket at some point, what you want to use is IO Lists.

They are lists of either bytes (integers from 0 to 255), binaries, or other IO lists. This means that functions that accept IO lists can accept items such as [$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]]. When this happens, the Erlang VM flattens the list only as needed to obtain the sequence of characters Hello World.

Any function from the io module, file module, TCP and UDP sockets will be able to handle them. Some library functions, such as some coming from the unicode module and all of the functions from the re (for regular expressions) module will also handle them, to name a few.

So in your case, to join string A and B, no matter whether they're binaries or lists, just do [A,B] and that's your new string. You'll save a lot in terms of rewriting terms and whatnot doing things that way.
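
To see what the nested example above stands for, it can be flattened explicitly. This is only an illustrative sketch (the module name iolist_demo and its function names are made up); in real code you would normally hand the iolist to the socket or file function without flattening it yourself.

```erlang
-module(iolist_demo).
-export([hello/0, flat/0, bytes/0]).

%% The nested iolist from the message above: characters, strings,
%% binaries and sub-lists mixed freely.
hello() ->
    [$H, $e, [$l, <<"lo">>, " "], [[["W", "o"], <<"rl">>]] | [<<"d">>]].

%% Flatten it into one binary (only needed when a flat value is required).
flat() ->
    iolist_to_binary(hello()).

%% Number of bytes the iolist represents, computed without building
%% the flat result.
bytes() ->
    iolist_size(hello()).
```

iolist_demo:flat() gives <<"Hello World">> and iolist_demo:bytes() gives 11.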

Paul Barry

May 7, 2012, 10:02:39 AM
to Fredrik Andersson, Erlang
Thanks for the quick responses, Paul and Fredrik. That's cleared
things up for me (as well as given me another way to accomplish what I
need). Cheers. :-)

Bob Ippolito

May 7, 2012, 11:19:14 AM
to Fred Hebert, erlang-q...@erlang.org
The advantage of binaries is that they take up significantly less memory per character and you can send them to other processes on the same node with no copying. Iolists of binaries are also good to use for IO. 


Fred Hebert

May 7, 2012, 10:50:38 AM
to Wes James, erlang-q...@erlang.org
Cowboy does accept IOLists. In my experience, they are almost always the fastest data structure for concatenating strings that are going to be output. I do recommend them for any and all appending and prepending that needs to be done with web servers, files, etc.

On Mon May 7 10:48:19 2012, Wes James wrote:
> On Mon, May 7, 2012 at 8:18 AM, Paul Barry<paul.jam...@gmail.com> wrote:
>> I take it that doing it that way is "faster" than string manipulation
>> (or is there some other reason for this suggestion)?
>
> Based on some discussion I've seen on the list in the past, I believe
> binary is faster. In my case, I'm using binaries to construct html
> chunks as I'm using cowboy, but I think cowboy can also use io lists,
> like Fred mentioned.
>
> -wes

Wes James

May 7, 2012, 10:48:19 AM
to Paul Barry, erlang-q...@erlang.org
On Mon, May 7, 2012 at 8:18 AM, Paul Barry <paul.jam...@gmail.com> wrote:
> I take it that doing it that way is "faster" than string manipulation
> (or is there some other reason for this suggestion)?

Based on some discussion I've seen on the list in the past, I believe
binary is faster.  In my case, I'm using binaries to construct html
chunks as I'm using cowboy, but I think cowboy can also use io lists,
like Fred mentioned.

-wes

Paul Barry

May 7, 2012, 11:39:09 AM
to Bob Ippolito, erlang-q...@erlang.org
Thanks for all of that, folks.

If there are any other newbies out there wanting to know more about
IOlists (which are "missing" from my copies of "Programming Erlang"
as well as "Erlang Programming"), here's a couple of links that I
found that do a good job of describing same:

http://dev.af83.com/2012/01/16/erlang-iolist.html

http://prog21.dadgum.com/70.html

Regards,

Paul.

P.S. Before others (and Joe) shout at me, IOlists are mentioned on
page 230 of Programming Erlang (but only just).

Peer Stritzinger

May 7, 2012, 1:44:15 PM
to Paul Barry, erlang-q...@erlang.org
On Mon, May 7, 2012 at 5:39 PM, Paul Barry <paul.jam...@gmail.com> wrote:
> Thanks for all of that, folks.
>
> If there are any other newbies out there wanting to know more about
> IOlists (which are "missing" from my copies of  "Programming Erlang"
> as well as "Erlang Programming"), here's a couple of links that I
> found that do a good job of describing same:
>
>    http://dev.af83.com/2012/01/16/erlang-iolist.html
>
>    http://prog21.dadgum.com/70.html

And the beginning of this chapter (you should read the rest also and
buy the book once it is out!)

http://learnyousomeerlang.com/buckets-of-sockets

Cheers,
-- Peer

Richard O'Keefe

May 7, 2012, 6:25:50 PM
to Paul Barry, Erlang

On 8/05/2012, at 1:31 AM, Paul Barry wrote:

> Hi folks.
>
> This might be a case of a dumb question, but here goes anyway. :-)
>
> I have two strings, A and B. Is it better to use:
>
> string:concat(A, B)
>
> or
>
> A ++ B
>
> when combining them to create a new string?

Erlang is open source. You can read the source code of library modules,
and it's often instructive to do so. In string.erl you find

concat(S1, S2) -> S1 ++ S2.

There is probably no measurable difference.

> I'm writing some code
> that generates a chunk of HTML. I know that using ++ on big lists is
> regarded as a "no-no", but is it acceptable here?

I'm not aware of any guideline that says using ++ on big lists is
a no-no. If for some reason you *need* the concatenation of two
big lists as a list, ++ is the very best way to do it.

The thing that people are warned against is that
(((A ++ B) ++ C) ++ D) ++ E
is more expensive than
A ++ (B ++ (C ++ (D ++ E)))
because the former copies A 4 times, B 3 times, C 2 times,
while the latter copies them each only once. (This is true in
Lisp, Prolog, Erlang, Haskell, Clean, SML, F#, ... or even in
C if you do your own one-way linked lists there.)
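
The two groupings can be sketched with folds. This is an illustration only; the module name append_order and both function names are made up. Note that ++ is right-associative in Erlang, so A ++ B ++ C written without parentheses already groups the cheap way; the expensive shape usually sneaks in through an accumulator on the left of ++.

```erlang
-module(append_order).
-export([left_nested/1, right_nested/1]).

%% ((([] ++ A) ++ B) ++ C) ...: the accumulator is the LEFT operand,
%% so everything gathered so far is copied again at every step.
left_nested(Strings) ->
    lists:foldl(fun(S, Acc) -> Acc ++ S end, [], Strings).

%% A ++ (B ++ (C ++ [])): each string is the left operand exactly once,
%% so each one is copied exactly once.
right_nested(Strings) ->
    lists:foldr(fun(S, Acc) -> S ++ Acc end, [], Strings).
```

Both functions return the same string; only the amount of copying differs.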

You may be asking the wrong question.
The right question is probably "Is it a good idea to generate
HTML as a string in any programming language?" to which the
answer is "only in a language that does not let you build
trees."

One way to represent SGML data, including HTML, in Erlang
looks like
{Element_Name, [{Attribute_Name,Value}...], [Child...]}
or <<text as a binary>>
or "text as a list"

in which while generating the tree you never ever have to worry
about escaping any data. You write a function that walks over
a tree like this, perhaps sending it to a port, or perhaps
creating an IO list, and *that* function does whatever escaping
is necessary.

{p,[],["This <is> safe &so; is <this>!"]}

is perfectly safe, as long as your output function knows what it
is doing. Also, if you know you are generating HTML, you could
have your output function take care of omitting end tags for
empty-by-definition elements.

You wouldn't believe how easy it is to manipulate SGML data this
way until you've tried it.

(I _could_ have pointed to xmerl, but that's rather more complicated.)
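
A minimal sketch of such an output function, under stated assumptions: the module name html_tree and all function names are invented here, element and attribute names are atoms, attribute values are plain strings, text is treated as bytes or Latin-1 characters, and the omission of end tags for empty elements is left out.

```erlang
-module(html_tree).
-export([render/1]).

%% Walk {Name, Attrs, Children} | binary | string and return an iolist.
%% All escaping happens here, never while the tree is being built.
render({Name, Attrs, Children}) when is_atom(Name) ->
    Tag = atom_to_list(Name),
    [$<, Tag, [attr(A) || A <- Attrs], $>,
     [render(C) || C <- Children],
     "</", Tag, $>];
render(Text) when is_binary(Text) ->
    escape(binary_to_list(Text));
render(Text) when is_list(Text) ->
    escape(Text).

%% One attribute, with a leading space and an escaped, quoted value.
attr({Name, Value}) when is_atom(Name) ->
    [$\s, atom_to_list(Name), "=\"", escape(Value), "\""].

%% Replace the characters that are special in HTML text and attributes.
escape(Text) ->
    [esc(C) || C <- Text].

esc($<)  -> "&lt;";
esc($>)  -> "&gt;";
esc($&)  -> "&amp;";
esc($\") -> "&quot;";
esc(C)   -> C.
```

For example, iolist_to_binary(html_tree:render({p,[],["This <is> safe &so; is <this>!"]})) yields <<"<p>This &lt;is&gt; safe &amp;so; is &lt;this&gt;!</p>">>. Text containing codepoints above 255 would need unicode:characters_to_binary/1 instead of iolist_to_binary/1.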

Richard O'Keefe

May 7, 2012, 6:35:38 PM
to Wes James, erlang-q...@erlang.org
Lists can represent arbitrary Unicode codepoints as single elements.
Concatenation A++B copies A but shares B.
Suffixes of a list can be shared, taking prefixes requires copying.
Lists require at least two full words of memory per codepoint.

Binaries can represent bit-level data, or Latin 1 strings, or UTF-8
encoded Unicode. When used to represent Unicode, there is no
one-to-one correspondence between characters and bytes, which you
often don't need anyway.
Concatenating A and B *may* have to copy *both* A and B,
but Erlang can be astonishingly clever about binaries.
Any slice of a binary can be shared (but will likely be copied
if and only if it is *small*).
Binaries require one byte of memory per byte (which means up to 3
bytes for a BMP character) plus some fixed overhead.

Binaries are a closer analogue to say Java strings than lists are.

Concatenation is a thing best avoided for all three (lists, binaries,
Java strings). For example, instead of concatenating A++B++C, you
might just form a list [A,B,C] --- this is the "iolist" approach
that's been mentioned --- or indeed some other kind of tree, and
turn it into a single sequence only when you really really need to.
In practice, this is only when you cross an interface that demands
a string of some other sort.

I have done benchmarks in C, Lisp, Java, Smalltalk, and Prolog
over the years, and the cost of using strings instead of trees
is such as to drive all blood from the face. It is *scary* how
bad strings of *any* kind can be, compared with using trees.

It is also scary how *dangerous* strings can be, compared with
using trees.

Richard O'Keefe

May 7, 2012, 6:45:55 PM
to Paul Barry, erlang-q...@erlang.org

On 8/05/2012, at 3:39 AM, Paul Barry wrote:
>
> http://dev.af83.com/2012/01/16/erlang-iolist.html

Under the heading IOList, the third paragraph gets list
concatenation about as wrong as it possibly can.
(The last sentence of that paragraph is right, though.)

Let's consider a simple "nested list" representation of
strings.

nstring = integer | [] | [nstring|nstring]
integer represents a codepoint
[] represents an empty string
[A|B] represents (but is not) the concatenation of A and B

% Convert an nstring to a plain string.
% Do not allocate any space that is not part of the
% final result.

reify_nstring(NS) ->
    reify_nstring(NS, []).

reify_nstring([A|B], S) ->
    reify_nstring(A, reify_nstring(B, S));
reify_nstring([], S) ->
    S;
reify_nstring(C, S) when is_integer(C), C >= 0 ->
    [C|S].

% Perform F(C) for each character C in nstring S.

foreach_nstring(F, [A|B]) ->
    foreach_nstring(F, A),
    foreach_nstring(F, B);
foreach_nstring(_F, []) ->
    ok;
foreach_nstring(F, C) when is_integer(C), C >= 0 ->
    F(C).

% Report the effective length of an nstring in codepoints.

length_nstring(S) ->
    length_nstring(S, 0).

length_nstring([A|B], N) ->
    length_nstring(B, length_nstring(A, N));
length_nstring([], N) ->
    N;
length_nstring(C, N) when is_integer(C), C >= 0 ->
    N + 1.

You can generalise this to allow atoms and binaries
as well as character codes.

The basic idea is to *describe* concatenation without
*doing* it, and to give you functions that act *as if*
the concatenation had been done.
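
One possible shape of that generalisation, as a sketch only: the module name nstring_ext and the function reify/1 are made up, binaries are assumed to hold bytes or Latin-1 text (UTF-8 would need unicode:characters_to_list/1), and the conversions of binaries and atoms do allocate a temporary list, unlike the pure version above.

```erlang
-module(nstring_ext).
-export([reify/1]).

%% Convert an extended nstring (codepoints, [], pairs, binaries, atoms)
%% to a plain string.
reify(NS) ->
    reify(NS, []).

reify([A|B], S) ->
    reify(A, reify(B, S));
reify([], S) ->
    S;
reify(Bin, S) when is_binary(Bin) ->
    %% Assumption: one byte per character.
    binary_to_list(Bin) ++ S;
reify(Atom, S) when is_atom(Atom) ->
    atom_to_list(Atom) ++ S;
reify(C, S) when is_integer(C), C >= 0 ->
    [C|S].
```

With this, nstring_ext:reify([$a, [<<"bc">>, '<p>'], "xyz"]) gives "abc<p>xyz".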

Richard Carlsson

May 8, 2012, 3:44:20 AM
to erlang-q...@erlang.org
On 05/08/2012 12:25 AM, Richard O'Keefe wrote:
> One way to represent SGML data, including HTML, in Erlang
> looks like
> {Element_Name, [{Attribute_Name,Value}...], [Child...]}
> or <<text as a binary>>
> or "text as a list"
>
> in which while generating the tree you never ever have to worry
> about escaping any data. You write a function that walks over
> a tree like this, perhaps sending it to a port, or perhaps
> creating an IO list, and *that* function does whatever escaping
> is necessary.
>
> {p,[],["This<is> safe&so; is<this>!"]}
>
> is perfectly safe, as long as your output function knows what it
> is doing. Also, if you know you are generating HTML, you could
> have your output function take care of omitting end tags for
> empty-by-definition elements.
>
> You wouldn't believe how easy it is to manipulate SGML data this
> way until you've tried it.
>
> (I _could_ have pointed to xmerl, but that's rather more complicated.)

What you describe is what is called "simple form" in xmerl:

http://www.erlang.org/doc/man/xmerl.html#export_simple-3
http://www.erlang.org/doc/apps/xmerl/xmerl_ug.html#id56386

Using the xmerl_html callback module:

1> io:put_chars(xmerl:export_simple([{p,[],["This<is> safe&so; is<this>!"]}], xmerl_html)).
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<p>This&lt;is&gt; safe&amp;so; is&lt;this&gt;!</p>

and the xmerl_xml callback module:

2> io:put_chars(xmerl:export_simple([{p,[],["This<is> safe&so; is<this>!"]}], xmerl_xml)).
<?xml version="1.0"?><p>This&lt;is&gt; safe&amp;so; is&lt;this&gt;!</p>


/Richard Carlsson

Joe Armstrong

May 8, 2012, 4:40:48 AM
to Paul Barry, Erlang
Here's a dumb answer.

"You might not need to concatenate them" - if all you're going to do
is output the result later then you don't need to concatenate them.
For example:

    X = A ++ B,
    file:write_file("foo", X)

and

    X = [A,B],    %% <--- this thing is called an iolist
    file:write_file("foo", X)

will result in identical content in the file "foo". In this case
(where you're just going to send the concatenated data to an I/O
stream) you don't need to bother concatenating the data.

If you are building complex iolists (which is very common) and you need
to flatten them, then a good strategy is to build a single iolist (X)
and then call
erlang:iolist_to_binary(X).
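
A small sketch of that strategy: build everything as one nested iolist and flatten once at the very end. The module name rows and the function table/1 are invented for this example, and the cells are assumed to be strings or binaries that are already HTML-escaped.

```erlang
-module(rows).
-export([table/1]).

%% Hypothetical helper: one table row per cell.
%% Nothing is concatenated while building; the only copy happens
%% in the single iolist_to_binary/1 call at the end.
table(Cells) ->
    IoList = ["<table>",
              [["<tr><td>", Cell, "</td></tr>"] || Cell <- Cells],
              "</table>"],
    iolist_to_binary(IoList).
```

rows:table(["a", <<"b">>]) returns <<"<table><tr><td>a</td></tr><tr><td>b</td></tr></table>">>. If the result only goes to a file or socket, the final flattening step can be dropped and the iolist passed along as it is.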

Using '++' when you don't need to at all is a common programming error :-)

In the cases where you do need it, both A and B in A++B should probably be
small. If A is a long string then you should look for a different algorithm.


Cheers

/Joe