CSV File with CR but no LF


Alan Jones

Jun 6, 2014, 10:05:39 AM
to csv...@googlegroups.com
I have a CSV file with a CR (CHAR 13 / 0x0D) at the end of each line but no LF (CHAR 10 / 0x0A).

When I run CSVfix on the file it seems to strip out all the 0x0D characters, so when I look at the output in an editor I have one long line instead of multiple lines.

Is there a way to get CSVfix to understand this?


I did some digging and found that CR-only line endings were an older convention on the Mac. I'm not sure why the output I am getting is like that.

I even thought about some sort of search and replace in hex, but I want something simple and portable.
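One quick way to confirm which convention a file really uses is to count the three kinds of terminator. The sketch below is a minimal Python example, assuming the whole file fits in memory; the function and script names are hypothetical and are not part of CSVfix.

```python
import sys


def count_line_endings(data: bytes) -> dict:
    """Count Windows (CR/LF), Unix (LF) and classic Mac (CR) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CR/LF pair contains one CR and one LF, so subtract the
        # pairs to get the stand-alone terminators.
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_eol.py myfile.csv
    with open(sys.argv[1], "rb") as f:
        print(count_line_endings(f.read()))
```

A file like the one described here would report a non-zero "CR" count and zeros for the other two.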

thanks

Alan


Neil Butterworth

Jun 6, 2014, 11:03:55 AM
to csv...@googlegroups.com
CSVfix (or more accurately, the C++ Standard Library) requires lines of text to be terminated by a newline. CSVfix cannot handle files that contain lines terminated only by a carriage return.

Alan Jones

Jun 6, 2014, 11:15:57 AM
to csv...@googlegroups.com
Wow, I was hoping you could suggest some sort of search/replace parameter.

Then I was thinking of suggesting a command that looks for any of the three line endings (CR, LF and CR/LF) and converts them to a single standard.
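A converter along those lines is easy to sketch outside CSVfix. The following is a minimal Python example (hypothetical function and script names, assuming the file fits in memory) that rewrites CR/LF, lone CR and lone LF endings to a single style, LF by default:

```python
import sys


def normalize_line_endings(data: bytes, eol: bytes = b"\n") -> bytes:
    """Convert CR/LF, lone CR and lone LF line endings to a single style."""
    # Collapse CR/LF pairs first; converting lone CRs first would turn
    # every CR/LF into two line breaks (an extra blank line).
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix if eol == b"\n" else unix.replace(b"\n", eol)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python normalize_eol.py input.csv output.csv
    with open(sys.argv[1], "rb") as src:
        data = src.read()
    with open(sys.argv[2], "wb") as dst:
        dst.write(normalize_line_endings(data))
```

Like a hex search and replace, this works on raw bytes, so it would also convert a CR or LF that sits inside a quoted field.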


FYI, I have found that CSVfix works just fine with LF (CHAR 10 / 0x0A, the Unix standard), not just CR/LF.

It is just CR-only line endings (with no LF) that are not handled.

I guess there is no easy way to work around the C++ Standard Library.

Well, please keep it in mind in case you stumble across something simple.

ggrothendieck

Jun 6, 2014, 9:27:21 PM
to csv...@googlegroups.com
I think csvfix not only fails to treat \r as an end of line but also discards it. I tried 'csvfix seq myfile.csv' on a file with \r line endings and it

(1) treated all the lines as a single line, and
(2) dropped the \r characters, so the last field of each line ran into the first field of the next line with nothing separating them.

Even if it does (1), could it not retain the \r characters?

Also, if csvfix is discarding them, does this not suggest that csvfix is actually handling them itself? In that case there is the possibility of handling them differently, such as implementing a user-specified action.

Neil Butterworth

Jun 7, 2014, 4:39:34 AM
to csv...@googlegroups.com
Translation of line endings is performed by the C++ Standard Library, before any CSVfix code gets a look-in. It would be possible to re-write CSVfix so this doesn't happen, but I have no intention of doing so.


Tom Wieczorek

Jun 7, 2014, 6:05:30 AM
to csv...@googlegroups.com
What about another tool to convert the line endings from Mac style to Windows or Unix style? You could use dos2unix for that, which should be available on most OS X and Linux installations, and is available on Windows, too. Just search for it with your favorite search engine.

ggrothendieck

Jun 7, 2014, 8:29:06 AM
to csv...@googlegroups.com
In the source file found here:

there is this comment:
    // Read the next character from input, setting mNext accordingly. Discards
    // any carriage returns, but if a newline is encountered bumps the line count
    // used for error reporting and resets the accumulated line text, which is
    // used for error reporting.
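To make the behaviour that comment describes concrete, here is a toy Python model of such a character reader. It is an illustration written for this discussion, not CSVfix's actual C++ code, and the class and function names are hypothetical. It shows how discarding CRs while counting lines only on LF leaves a CR-terminated file as a single line with the fields run together:

```python
class CharReader:
    """Toy model of a reader that discards CRs and counts lines on LF."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0
        self.line = 1        # line number used for error reporting
        self.line_text = ""  # text of the current line, for error messages

    def next_char(self):
        """Return the next non-CR character, or None at end of input."""
        while self.pos < len(self.text):
            c = self.text[self.pos]
            self.pos += 1
            if c == "\r":
                continue  # carriage returns are simply discarded
            if c == "\n":
                self.line += 1  # only LF bumps the line count
                self.line_text = ""
            else:
                self.line_text += c
            return c
        return None


def read_all(text: str):
    """Return (characters seen by the parser, final line count)."""
    reader = CharReader(text)
    chars = []
    while (c := reader.next_char()) is not None:
        chars.append(c)
    return "".join(chars), reader.line
```

For example, read_all("a,b\rc,d\r") returns ("a,bc,d", 1): one line, with "b" and "c" fused into a single field, which matches the symptom reported earlier in the thread.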

Can you comment on whether the part that says DISCARDS ANY CARRIAGE RETURNS is relevant to this discussion? It does suggest that csvfix is processing the \r. Is it not possible that the C++ Standard Library is processing the ends of lines, but at the same time characters it does not regard as ending a line (in this case \r) are passed through to csvfix, where they are processed?

ggrothendieck

Jun 7, 2014, 8:31:13 AM
to csv...@googlegroups.com
If you wanted to go that way, I think the relevant command would be mac2unix or mac2dos.

Neil Butterworth

Jun 7, 2014, 8:39:00 AM
to csv...@googlegroups.com
That parser is only used by the check command to validate CSV. The parser used to actually read CSV input for commands that accept such input is in the file a.csv.cpp in the alib library. Actually, looking at the code (which I hadn't done for a year or so for this particular file), I see that it does process carriage returns. However, I'm still not going to make the change.

Jonathan Leffler

Jun 7, 2014, 9:30:53 AM
to csv...@googlegroups.com
On Unix-like systems, another standard way to convert files with Mac \r line endings to Unix \n line endings would be with the tr command:

    tr '\r' '\n' <macstyle-input >unixstyle-output

The tr command is more likely to be available than mac2unix or mac2dos.
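Note that tr, like any byte-level replacement, also converts a CR that appears inside a quoted field. If that matters, a CSV-aware conversion can tell record terminators apart from embedded characters. A minimal sketch using Python's standard csv module, which accepts CR, LF and CR/LF record endings when the input is opened with newline="" (the function and script names are hypothetical):

```python
import csv
import io
import sys


def read_rows(text: str) -> list:
    # newline="" splits lines on CR, LF or CR/LF but leaves the characters
    # in place, so the csv module can keep CRs inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


def convert_text(text: str) -> str:
    """Re-emit the CSV records with every record terminated by LF alone."""
    out = io.StringIO()
    # Quote every field so that CRs or LFs kept inside fields stay unambiguous.
    writer = csv.writer(out, lineterminator="\n", quoting=csv.QUOTE_ALL)
    writer.writerows(read_rows(text))
    return out.getvalue()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python mac_csv_to_unix.py in.csv out.csv
    with open(sys.argv[1], newline="") as src:
        text = src.read()
    # newline="" stops Python translating "\n" to "\r\n" on Windows.
    with open(sys.argv[2], "w", newline="") as dst:
        dst.write(convert_text(text))
```

Quoting every field is a deliberately conservative choice; csv.QUOTE_MINIMAL (the module's default) quotes fewer fields if that is preferred.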



--
You received this message because you are subscribed to the Google Groups "csvfix" group.
To unsubscribe from this group and stop receiving emails from it, send an email to csvfix+un...@googlegroups.com.
To post to this group, send email to csv...@googlegroups.com.
Visit this group at http://groups.google.com/group/csvfix.
For more options, visit https://groups.google.com/d/optout.



--
Jonathan Leffler <jonathan...@gmail.com>  #include <disclaimer.h>
Guardian of DBD::Informix - v2013.0521 - http://dbi.perl.org
"Blessed are we who can laugh at ourselves, for we shall never cease to be amused."

ggrothendieck

Jun 7, 2014, 9:39:48 AM
to csv...@googlegroups.com
If it's agreed that \r will not be regarded as a line terminator, should it not then follow that it is a valid data character and be left in the field rather than discarded?

Also, as an aside, does csvfix accept sequences such as \a, \b, \f, \n, \r, \t and \ddd to represent special characters in regular expressions and other commands?

Neil Butterworth

Jun 7, 2014, 11:18:34 AM
to csv...@googlegroups.com
Newlines are retained in quoted fields. I am not interested in adding any special processing for carriage returns and am not going to change the parser. The regular expression engine in CSVfix is hand-rolled and does not support special characters; I intend to replace this engine with the std::regex standard library classes when GCC 4.9.0 becomes more widely available.

This discussion is now at an end, at least from my side.