Imitating scanf


Steven White

Dec 24, 2015, 11:51:56 PM
to julia-users
Say I have a plain-text data file that starts with some lines describing matrix sizes, etc., for the data that follows. For example, the first line might be
9 2 2 1.3

To read this file, in C I would use something like this call to scanf:
scanf("%d%d%d%lf",&n,&nup,&ndn,&a);

How do I do this in Julia? There doesn't seem to be a simple, obvious way. Of course, you can read the line as a string and parse it using regular expressions, but that is much messier than the call to scanf.

Here is a solution:
getline() = readdlm(IOBuffer(readline()),Any) 

n,nup,ndn,a = getline()

This seems to work reliably.  (For example, it doesn't choke if there happen to be two spaces between some of the numbers.) 

It is also obscure and doesn't give any warnings if some of the types aren't what was expected. It also seems inefficient if you try to read a lot of data this way.

Questions: Why isn't something like this (but better) built into Julia? Is there some nice method I've missed that does this better? Can this getline() function be improved?

Adrian Cuthbertson

Dec 25, 2015, 5:20:21 AM
to julia...@googlegroups.com
The essential part first:

julia> a,b,c,d=map(parse,split("9 2 2 1.3\n"))
4-element Array{Real,1}:
 9  
 2  
 2  
 1.3

but see what you actually get back...
julia> typeof(a)
Int64
julia> typeof(d)
Float64

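Each element keeps its own concrete type: parse reads each token as a Julia literal, so "9" becomes an Int64 and "1.3" becomes a Float64. This also shows off Julia's functional abilities (map, parse). See the manual on those.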

Now as originally needed...

julia> io=IOBuffer("9 2 2  1.3\n")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=10, maxsize=Inf, ptr=1, mark=-1)

julia> n,nup,ndn,a=map(parse,split(readline(io)))
4-element Array{Real,1}:
 9  
 2  
 2  
 1.3

julia> typeof(n)
Int64

julia> typeof(a)
Float64

It's probably better to do it properly in a function rather than like that as a one-liner - so you can catch exceptions, etc. 

-- Adrian.
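
One way to follow the suggestion above is sketched here. It assumes Julia 1.x, where tryparse returns nothing on failure. The name parse_fields is hypothetical (it is not part of Base), and the explicit type arguments are an addition that the one-liner above does not have.

```julia
# Hypothetical helper (not part of Base): parse one whitespace-separated
# line into the requested types, with a descriptive error on failure.
function parse_fields(line::AbstractString, types::Type...)
    tokens = split(line)   # splits on any run of whitespace
    length(tokens) == length(types) ||
        throw(ArgumentError("expected $(length(types)) fields, got $(length(tokens))"))
    return ntuple(length(types)) do j
        value = tryparse(types[j], tokens[j])   # nothing if the token does not parse
        value === nothing &&
            throw(ArgumentError("cannot parse \"$(tokens[j])\" as $(types[j])"))
        value
    end
end

n, nup, ndn, a = parse_fields("9 2 2  1.3\n", Int, Int, Int, Float64)
```

Unlike the untyped one-liner, this fails loudly when a field has an unexpected type or when the line has the wrong number of fields.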

Steven White

Jan 3, 2016, 4:48:01 PM
to julia-users
Adrian--thanks!  

I have a followup: parse seems about 100 times slower than the equivalent C++ code. I would have hoped for only a factor of two or so. Here is a C++ code fragment:

for(int i = 1; i <= 1000000; i++)
{
    std::string s("4.123456789");
    std::istringstream iss(s);
    double d;
    iss >> d;  // parse one double from the string stream
}
This takes 0.8 seconds to run on a Mac mini, so about a microsecond per parse of a double.

Here is a Julia version, with only 10^4 parses:
function testparse()
    parse("4.123456789")
end

@time testparse()
@time for i=1:10000
    testparse()
end
Running this on the same machine takes about 0.65 seconds. I get similar results if I call @time for a single call to testparse(), and with any other variations I could think of. This indicates Julia's parse is about a factor of 80 slower than the equivalent in C++.

Have I done something to make this slow, or is parse just very slow?   Why would it be so slow?

-Steve

Mauro

Jan 3, 2016, 5:08:08 PM
to julia...@googlegroups.com
> function testparse()
>     parse("4.123456789")
> end


Try:

function testparse()
    parse(Float64, "4.123456789")
end

I get a 100x speedup.
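
A rough way to compare the two calls is sketched below. It assumes Julia 1.x, where the untyped parse(str) used in this thread is spelled Meta.parse(str). The helper names time_untyped and time_typed are hypothetical, and absolute timings will vary by machine and Julia version.

```julia
# Assumes Julia 1.x: the untyped parse(str) from this thread is now Meta.parse(str).
s = "4.123456789"

untyped = Meta.parse(s)      # Julia source parser: must first work out what kind of text this is
typed = parse(Float64, s)    # dedicated numeric parser: always returns a Float64

# Hypothetical timing helpers; each returns elapsed seconds for n parses.
function time_untyped(s, n)
    @elapsed for _ in 1:n
        Meta.parse(s)
    end
end

function time_typed(s, n)
    @elapsed for _ in 1:n
        parse(Float64, s)
    end
end

time_untyped(s, 10); time_typed(s, 10)   # warm up so compilation is not timed
println("untyped: ", time_untyped(s, 10_000), " s")
println("typed:   ", time_typed(s, 10_000), " s")
```

The untyped call goes through the general-purpose parser for Julia code, while the typed call only has to recognize a floating-point number, which is a plausible reason for the gap reported above.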

Steven White

Jan 3, 2016, 7:34:33 PM
to julia-users
Excellent. I get a more than 100x speedup too. The question then is how to write a good utility function that reads numbers from a line of input, with the types specified for speed.
Here is something that seems to work correctly:

function getline(src,a,b...)
    ss = split(readline(src))
    t = (a,b...)
    ntuple(j->parse(t[j],ss[j]),length(t))
end

#=
STDIN has
9 2 2  1.3
=#

i,j,k,x = getline(STDIN,Int,Int,Int,Float64)
@show i,j,k,x
#=  Result:
(i,j,k,x) = (9,2,2,1.3)
=#  
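
The same pattern can be tried without typing into STDIN by reading from an IOBuffer. This is a sketch assuming Julia 1.x; the name read_typed and the second input line are made up for illustration.

```julia
# Same shape as getline above; read_typed is a hypothetical name.
function read_typed(src::IO, types::Type...)
    tokens = split(readline(src))   # one line, split on runs of whitespace
    ntuple(j -> parse(types[j], tokens[j]), length(types))
end

# Made-up input: a header line followed by a second line of two integers.
io = IOBuffer("9 2 2  1.3\n4 5\n")
n, nup, ndn, a = read_typed(io, Int, Int, Int, Float64)
rows, cols = read_typed(io, Int, Int)   # each call consumes the next line
@show n, nup, ndn, a
@show rows, cols
```

Because the requested types travel as arguments, each call site documents the expected layout of its line, much like a scanf format string.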

For non-speed-intensive uses, here is a method without the types:
               
getline(src=STDIN) = map(parse,split(readline(src)))  

Any suggestions/comments?