readchomp, readlines & friends on Windows


Peter Simon

Mar 10, 2014, 8:20:27 PM
to julia...@googlegroups.com
Just tried to grab the lines in a text file on Windows using Julia Version 0.3.0-prerelease+1570 with the following code:

lines = open("test.cut","r") do fid
   readchomp(fid)
end

The result, lines, came back as a single string with embedded "\r\n" characters. I had expected to get an array of strings without any trailing carriage returns or newlines. Looking in the Julia source code, I see that only '\n' is checked for.

I work in an environment where we freely intermix Linux-style ('\n'-terminated) and Windows-style ("\r\n"-terminated) files.  Most of our other tools work transparently with either file type, without any explicit need for conversion.

Are there any plans to make Julia's utility functions such as chomp(), readchomp(), etc., be able to handle these various line ending conventions?

Thanks,
--Peter

Kevin Squire

Mar 10, 2014, 8:31:03 PM
to julia...@googlegroups.com
If you have time, why don't you have a go at fixing this and submitting a pull request?

Kevin

Tony Kelman

Mar 10, 2014, 10:14:31 PM
to julia...@googlegroups.com
+1 for more "doing-the-right-thing"-ness wrt annoying carriage returns.

Semi-related: in test/spawn.jl, the tests on lines 17 and 23 fail when run in a Windows command prompt but succeed from an MSYS/Cygwin terminal because they pipe to whichever sort.exe is found in PATH, and lines are terminated by "\r\n" in the output from Windows' sort.exe.

Peter Simon

Mar 10, 2014, 11:19:25 PM
to julia...@googlegroups.com
Thank you to Tony for straightening me out (offline). As clearly stated in the help, readchomp() is supposed to return a single string, and it only strips newlines from the very end. What I wanted was

lines = open("test.txt","r") do fid
  map(chomp,readlines(fid))
end

which works fine.
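
For readers who want to check this against mixed line endings, here is a minimal sketch. It assumes a recent Julia (1.x), where chomp removes a trailing "\n" or "\r\n" and readlines accepts a keep keyword; the behaviour of the 0.3 prerelease discussed in this thread may differ. The IOBuffer and its contents are hypothetical stand-ins for a real file.

```julia
# Assumption: Julia 1.x. The IOBuffer stands in for a file that mixes
# Windows-style ("\r\n") and Linux-style ("\n") line endings.
io = IOBuffer("first\r\nsecond\nthird\r\n")

# keep=true retains the line endings, so chomp has something to strip.
lines = map(chomp, readlines(io; keep=true))

# With the default keep=false, readlines drops the line endings itself.
lines_default = readlines(IOBuffer("first\r\nsecond\nthird\r\n"))
```

In both cases the result should be three strings with no trailing carriage returns or newlines.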

Sorry for all this noise.

--Peter

Ivar Nesje

Mar 11, 2014, 3:45:52 AM
to julia...@googlegroups.com
Usually you should use eachline instead of readlines, especially for large files, because it returns an iterator rather than building an array of every line up front, so that garbage collection can free each line's string once you are done with it.
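
A minimal sketch of the eachline approach, assuming a recent Julia (1.x) where eachline accepts a file name and drops line endings by default. The function name count_chars, the temporary file, and its contents are hypothetical and exist only to make the example self-contained.

```julia
# Assumption: Julia 1.x. count_chars is a hypothetical helper for illustration.
# It processes one line at a time instead of holding every line in an array.
function count_chars(path)
    total = 0
    for line in eachline(path)   # "\n" and "\r\n" endings are dropped by default
        total += length(line)
    end
    return total
end

# Write a small file with mixed line endings to a temporary directory.
path = joinpath(mktempdir(), "test.txt")
write(path, "alpha\r\nbeta\ngamma\r\n")

n = count_chars(path)
```

Only the running total outlives each iteration here, so the memory needed should not grow with the number of lines in the file.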