How to build iteratively a large Julia string

Maurice Diamantini

Feb 17, 2015, 9:28:13 AM
to julia...@googlegroups.com
Hi,

In Ruby, String is mutable and there is the << operator to append
a string to the end of another one.

In Julia, String is immutable, so I use the concatenation operator
when I need to build a large string:

    txt = ""
    while ...
        txt *= "yet another line.\n"
    end
    # do something with txt

This is very slow because it builds a new, ever larger string at each
iteration.
Is there another way in Julia to build such a string efficiently?
Or is there another type one can use for that task (Buffer, Array, ...)?

Thanks in advance,
-- Maurice

René Donner

Feb 17, 2015, 11:56:19 AM
to julia...@googlegroups.com
You could use

    a = Any[]
    while ...
        push!(a, somestring)
    end
    join(a)

Jameson Nash

Feb 17, 2015, 12:07:59 PM
to julia...@googlegroups.com
There is an IOBuffer type that works well as a string builder.
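
A minimal sketch of the string-builder pattern, assuming a current Julia 1.x release, where the buffer contents are extracted with `String(take!(io))` rather than the `takebuf_string` call used elsewhere in this thread. The function name `build_lines` is hypothetical and only serves the illustration.

```julia
# Accumulate the pieces in an in-memory buffer, then build the string once.
# `build_lines` is a hypothetical helper, not part of Julia's standard library.
function build_lines(n)
    io = IOBuffer()
    for i in 1:n
        print(io, "line ", i, '\n')   # appends to the buffer; no new string per iteration
    end
    return String(take!(io))          # take! returns the buffer's bytes and empties it
end

txt = build_lines(3)
```

The loop only appends bytes to the buffer; a String is created once, at the end.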

René Donner

Feb 17, 2015, 12:17:14 PM
to julia...@googlegroups.com
Good point, printing to the IOBuffer is the fastest option:

    strings = [randstring() for i in 1:1000000]

    # Print every string into one IOBuffer, then extract the result.
    function f1(strings)
        a = IOBuffer()
        for s in strings
            print(a, s)
        end
        takebuf_string(a)
    end

    # Join the array of strings directly.
    function f2(strings)
        join(strings)
    end

    # Copy the strings into an untyped array first, then join.
    function f3(strings)
        buf = Any[]
        for s in strings
            push!(buf, s)
        end
        join(buf)
    end

    assert(f1(strings)==f2(strings)==f3(strings))

    @time f1(strings)
    @time f2(strings)
    @time f3(strings)
    ;

elapsed time: 0.038563872 seconds (24778644 bytes allocated)
elapsed time: 0.054497171 seconds (24778692 bytes allocated)
elapsed time: 0.289836491 seconds (89548756 bytes allocated, 35.00% gc time)

Maurice Diamantini

Feb 18, 2015, 3:26:51 AM
to julia...@googlegroups.com
René and Jameson, thank you very much for your answers.

For completeness, here is one more test that builds a large string incrementally.

    function f4(strings)
        buf = ""
        for s in strings
            buf *= s   # allocates a new, longer string on every iteration
        end
        buf
    end
    # @time f4(strings);
    # I'm still waiting for the result! => Ctrl-C

    @time f4(strings[1:10_000]);
    # elapsed time: 0.43980588 seconds (401020144 bytes allocated, 78.57% gc time)

    @time f4(strings[1:100_000]);
    # elapsed time: 48.910219384 seconds (40008864504 bytes allocated, 66.90% gc time)
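
The allocation figures above look consistent with quadratic copying. As a rough sketch, assume every piece is 8 bytes long (the default length of `randstring()`) and that iteration k allocates a fresh string of k pieces; the total is then 8 * n * (n + 1) / 2 bytes. The helper name `copied_bytes` is hypothetical.

```julia
# Estimated bytes allocated when n pieces of `len` bytes each are appended
# one at a time with `*=`: iteration k creates a new string of k * len bytes.
# `copied_bytes` is a hypothetical helper used only for this estimate.
copied_bytes(n, len) = len * n * (n + 1) ÷ 2

small = copied_bytes(10_000, 8)     # 400_040_000
large = copied_bytes(100_000, 8)    # 40_000_400_000
```

Both estimates are within about 0.3% of the reported allocations, and ten times as many pieces costs roughly a hundred times as much memory traffic.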


Maurice Diamantini

Feb 18, 2015, 3:35:26 AM
to julia...@googlegroups.com

On Tuesday, February 17, 2015 at 17:47:08 UTC+1, Stefan Karpinski wrote:

> IOBuffer is what you're looking for:
> The takebuf_string function really needs a new name.

Stefan, thank you and sorry for my double post.

I vote for generalizing the `to_s(xx)` method as the standard way for converting something to string (à la Ruby).

(Off topic, but since Julia already uses the "!" character for mutating functions, it could also
use the "?" character for boolean accessors :-)

-- Maurice


Steven G. Johnson

Feb 18, 2015, 8:56:12 AM
to julia...@googlegroups.com
On Wednesday, February 18, 2015 at 3:35:26 AM UTC-5, Maurice Diamantini wrote:

> On Tuesday, February 17, 2015 at 17:47:08 UTC+1, Stefan Karpinski wrote:
>
> > IOBuffer is what you're looking for:
> > The takebuf_string function really needs a new name.
>
> Stefan, thank you and sorry for my double post.
>
> I vote for generalizing the `to_s(xx)` method as the standard way for converting something to string (à la Ruby).

Conversion in Julia usually uses the "convert" function, as in convert(Int, x) or (in Julia 0.4) the shorthand Int(x). In the case of strings, partly for historical reasons, we have string(x). However, string(x) makes a string out of the printed representation of x, not out of the buffer contents in the case of an IOBuffer.

Further, in this case, the "takebuf_string" function (or takebuf_array) isn't just a conversion; it is a mutation, because it empties the buffer. So arguably it should follow the Julia convention and have a ! appended to its name.
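
Both points can be checked with a short sketch, assuming a current Julia 1.x release, where the extraction is spelled `take!` (with the `!`) and wrapped in `String(...)`, instead of the `takebuf_string` of the version discussed here. The variable names are illustrative only.

```julia
io = IOBuffer()
print(io, "abc")

printed  = string(io)          # printed representation of the buffer object, not its contents
contents = String(take!(io))   # the accumulated text, "abc"
again    = String(take!(io))   # empty: the first take! already emptied the buffer
```

The second `take!` returns nothing further, which is the mutation that the `!` in the name signals.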

 

Steven G. Johnson

Feb 18, 2015, 8:57:32 AM
to julia...@googlegroups.com
On Wednesday, February 18, 2015 at 3:35:26 AM UTC-5, Maurice Diamantini wrote:

> (Off topic, but since Julia already uses the "!" character for mutating functions, it could also
> use the "?" character for boolean accessors :-)