Handling backslashes


slackbot

Jan 4, 2012, 9:05:41 AM
to CSVChat
My input CSV file has a field f1 with value:

"a\\b"

I'm using the plain vanilla reader and writer and my output CSV
results in the same field, f1, having the value:

a\\b

Note that the quotes are lost. Is this expected? Can anyone explain
what's going on here? I would expect my output to look like my input.

Thanks

shriop

Jan 4, 2012, 10:09:09 AM
to CSVChat
Yes, this is expected. Quotes are only required if the field value
contains a special character such as a comma, so the writer leaves
them off. Whoever created the original CSV file did not need to quote
it.

Bruce Dunwiddie
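
The quoting rule described above can be sketched as a small check. This is only an illustration under the common CSV convention (quote a field when it contains the delimiter, a quote character, or a line break); NeedsQuoting is a hypothetical helper, not part of the CsvWriter API, and the library's actual rule may differ in details.

```csharp
using System;

static class QuoteRule
{
    // Hypothetical helper, not the library's API. Assumes the common CSV
    // convention: quote only when the field contains the delimiter,
    // a double quote, or a line break.
    public static bool NeedsQuoting(string field, char delimiter = ',')
    {
        return field.IndexOf(delimiter) >= 0
            || field.IndexOf('"') >= 0
            || field.IndexOf('\n') >= 0
            || field.IndexOf('\r') >= 0;
    }
}
```

Under these assumptions, NeedsQuoting(@"a\\b") is false, which matches the unquoted output reported in the first post, while NeedsQuoting("a,b") is true.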

slackbot

Jan 4, 2012, 10:37:49 AM
to CSVChat
Thanks for the prompt reply.

The problem with not quoting it is that the backslashes double. That
is, if the input value was a\\a (no quotes), then the output becomes
a\\\\a. Repeated read-writes make the value grow exponentially.

Can that be avoided? If so, how?
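
The doubling described above can be reproduced without any CSV library. The sketch below assumes one plausible mechanism, which the thread does not confirm: the write side escapes every backslash by doubling it, while the read side keeps backslashes literally. EscapeOnWrite, ReadLiteral, and RoundTrip are hypothetical stand-ins, not library calls.

```csharp
using System;

static class RoundTripDemo
{
    // Hypothetical stand-in for a writer that escapes each backslash
    // by emitting two backslashes.
    public static string EscapeOnWrite(string value) => value.Replace("\\", "\\\\");

    // Hypothetical stand-in for a reader that does not unescape.
    public static string ReadLiteral(string text) => text;

    public static int CountBackslashes(string s)
    {
        int n = 0;
        foreach (char c in s)
        {
            if (c == '\\') n++;
        }
        return n;
    }

    // Applies write-then-read the given number of times.
    public static string RoundTrip(string value, int times)
    {
        for (int i = 0; i < times; i++)
        {
            value = ReadLiteral(EscapeOnWrite(value));
        }
        return value;
    }
}
```

Starting from a\\a (two backslashes), one round trip gives a\\\\a (four), and three round trips give sixteen backslashes. When escaping on write is paired with matching unescaping on read, the value stays stable instead.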

shriop

Jan 4, 2012, 7:27:57 PM
to CSVChat
That is not how it should work. Can I see some sample code that causes
this?

Bruce Dunwiddie

slackbot

Jan 5, 2012, 2:01:03 PM
to CSVChat
Here is a unit test I quickly put together:

https://gist.github.com/1566658

Additionally, when the cycle is repeated (read, write, read,
write...), the \\ doubles each time.

shriop

Jan 5, 2012, 11:54:57 PM
to CSVChat
OK, a couple of things. I know it's not exactly obvious, but to get
the functionality that I think you're looking for, you will also need
to set reader.Settings.UseTextQualifier to false.

To verify that in your test case, you have a couple of flaws to fix
besides that setting. Currently, you're writing out the contents of
data1 to the writer when you want to be writing out the contents of
reader[0]; otherwise you're not actually testing the reader. You also
need to move that call to Write up before your second call to
ReadRecord, so that reader[0] still has the value you're looking for.

I didn't feel like importing the actual testing framework namespace,
so I used the built-in Assert, and I took out a couple of Asserts that
I didn't care about, but I think the code below is a successful proof.
Also, for anyone else who might read this thread later without paying
too much attention: note the C# backslash escaping used when
hardcoding the test values, as it's easily overlooked.

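// Also needs a using directive for the CsvReader library's namespace.
using System.Diagnostics;
using System.IO;
using System.Text;

// C# escaping: this literal is the three characters a, \, b.
var data1 = "a\\b";

using (var reader = CsvReader.Parse(data1))
{
    reader.Settings.EscapeMode = EscapeMode.Backslash;
    reader.Settings.UseTextQualifier = false;

    using (var stream = new MemoryStream())
    {
        using (var writer = new CsvWriter(stream, ',', Encoding.Default))
        {
            writer.Settings.EscapeMode = EscapeMode.Backslash;

            // Call ReadRecord outside Debug.Assert so it still runs in
            // release builds, where Debug.Assert calls are compiled out.
            var hasRecord = reader.ReadRecord();
            Debug.Assert(hasRecord);

            // Write the parsed value before the next ReadRecord call.
            writer.Write(reader[0]);

            var hasMore = reader.ReadRecord();
            Debug.Assert(!hasMore);

            writer.Flush();
            var buffer = stream.ToArray();
            var data2 = Encoding.Default.GetString(buffer, 0, buffer.Length);
            Debug.Assert(data2 == data1);
        }
    }
}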

Bruce Dunwiddie
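
To make the C# escaping caveat from the post above concrete, here is a minimal sketch comparing a regular string literal with a verbatim one. The class and field names are hypothetical and exist only for illustration.

```csharp
using System;

static class LiteralDemo
{
    // Regular literal: \\ is an escape sequence for ONE backslash,
    // so this string is the three characters a, \, b.
    public static readonly string Regular = "a\\b";

    // Verbatim literal: backslashes are taken as written,
    // so this string is the four characters a, \, \, b.
    public static readonly string Verbatim = @"a\\b";
}
```

LiteralDemo.Regular.Length is 3 and LiteralDemo.Verbatim.Length is 4. If the file on disk really contains two backslashes, as in the first post, the test value would need to be written as @"a\\b" or "a\\\\b".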

shriop

Jan 5, 2012, 11:59:13 PM
to CSVChat
In thinking about this along with your initial question about quotes,
I guess this may not be a valid solution for you. Do you have an
actual file from an external source that you're trying to parse that
uses quotes as text qualifiers, and backslash escaping both inside and
outside text qualifier occurrences, or are you just testing what
you're assuming are valid scenarios?

Bruce Dunwiddie

slackbot

Jan 6, 2012, 9:38:12 AM
to CSVChat
The test didn't pass for me. The last assert fails and data2 is "a".

Also, with UseTextQualifier set to false the reader will not handle
quotes, which we need (we use the backslash to escape them).

slackbot

Jan 6, 2012, 9:47:50 AM
to CSVChat
The former. We default to the backslash for escaping. Altering that is
likely going to break existing (previously processed) files.

shriop

Jan 9, 2012, 9:02:05 AM
to CSVChat
Unfortunately, CsvReader doesn't currently handle this escape pattern.
It would be a fairly significant branch to add to the parsing logic,
so it's not something I could add quickly.

Bruce Dunwiddie
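
For readers who need the escape pattern discussed in this thread (quotes as text qualifiers, with backslash escaping both inside and outside quoted fields), a hand-rolled single-line parser is one possible workaround. The sketch below is not part of CsvReader and makes several assumptions: the delimiter is a comma, the qualifier is a double quote, a backslash makes the next character literal, and records never span lines. ParseLine is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class BackslashCsvSketch
{
    // Splits one line into fields. Assumptions: ',' is the delimiter,
    // '"' is the text qualifier, and '\' makes the following character
    // literal both inside and outside quoted fields.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '\\' && i + 1 < line.Length)
            {
                // Escaped character: skip the backslash, keep what follows.
                current.Append(line[++i]);
            }
            else if (c == '"')
            {
                // Unescaped qualifier toggles quoted state and is dropped.
                inQuotes = !inQuotes;
            }
            else if (c == ',' && !inQuotes)
            {
                // Unquoted, unescaped delimiter ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                // Ordinary character (or a trailing lone backslash).
                current.Append(c);
            }
        }

        fields.Add(current.ToString());
        return fields;
    }
}
```

With these rules, the line "a\\b",c\,d (as it appears in the file) parses into two fields: a\b and c,d. A matching writer would have to escape backslashes, quotes, and delimiters the same way for values to survive repeated read-write cycles.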