Scanf and Comma Delimiter

F Russell

Jan 21, 2020, 3:34:38 PM
I've always wondered why the scanf() function cannot accept
a comma as a delimiter. Reading CSV files would be so much
easier without having to write or use a special parser.

Is there some reason why a comma delimiter is not accepted
by scanf()?

Bonita Montero

Jan 21, 2020, 3:40:57 PM
Use C++ and the regex-classes.
C++ supports all common regex-dialects.

Scott Lurndal

Jan 21, 2020, 3:51:33 PM
Because scanf() was developed twenty years before Comma-Separated Value
(CSV) files were commonly used?

In any case, scanf() can handle lines with a fixed number of commas.

$ cat /tmp/a.c
#include <stdio.h>

int main(int argc, const char **argv)
{
    unsigned int column1, column2, column3, column4, column5;

    scanf("%u , %u , %u , %u , %u", &column1, &column2, &column3, &column4, &column5);

    printf("c1=%u, c2=%u, c3=%u, c4=%u, c5=%u\n", column1, column2, column3, column4, column5);

    return 0;
}

$ /tmp/a
5,3,2,1,0
c1=5, c2=3, c3=2, c4=1, c5=0
$ /tmp/a
15, 31 , 535 , 1, 31
c1=15, c2=31, c3=535, c4=1, c5=31

chad

Jan 21, 2020, 4:04:09 PM
That's a fairly strong assertion, since I have a 2000-page book that explains the theoretical differences between the Ruby, Python, PHP, and Perl regex implementations.

Jorgen Grahn

Jan 21, 2020, 4:31:30 PM
What delimiters /are/ accepted, anyway? The word "delimiter" doesn't
appear in the scanf() documentation I have; a quick reading says the
only special handling is, a sequence of whitespace characters counts
as just a chunk of whitespace, like \s+ in a Perl regexp.

(I never use scanf() myself -- I find it safer and easier to do the
whole parsing manually, combined with strtoul() and friends.)

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Jorgen Grahn

Jan 21, 2020, 4:34:40 PM
Please don't feed the trolls.

Malcolm McLean

Jan 21, 2020, 4:43:38 PM
It does.
int a, b;
scanf("%d, %d\n", &a, &b);

will read two comma-delimited values into a and b.

However CSV files have complex rules for strings and escapes, and also
allow missing values. You can't expect a standard library function to
support this with a single call, unless that function was specifically
designed for CSV files.


F Russell

Jan 21, 2020, 5:33:00 PM
On Tue, 21 Jan 2020 13:43:27 -0800, Malcolm McLean wrote:

>>
> It does.
> int a,b;
> scanf(%d, %d\n", &x, &b);
>
> will read two comma-delimited values into a and b.
>

That won't work for strings:

fscanf(fd, "%s,%s\n", string1, string2);

Richard Tobin

Jan 21, 2020, 6:25:08 PM
In article <r07u6...@news1.newsguy.com>, F Russell <f...@random.info> wrote:

>> It does.
>> int a,b;
>> scanf(%d, %d\n", &x, &b);
>>
>> will read two comma-delimited values into a and b.

>That won't work for strings:
>
>fscanf(fd, "%s,%s\n", string1, string2);

There is %[^,]

-- Richard

James Kuyper

Jan 22, 2020, 12:03:30 AM
On 1/21/20 4:31 PM, Jorgen Grahn wrote:
> On Tue, 2020-01-21, F Russell wrote:
>> I've always wondered why the scanf() function cannot accept
>> a comma as a delimiter. Reading CSV files would be so much
>> easier without having to write or use a special parser.
>>
>> Is there some reason why a comma delimiter is not accepted
>> by scanf()?
>
> What delimiters /are/ accepted, anyway? The word "delimiter" doesn't
> appear in the scanf() documentation I have; a quick reading says the
> only special handling is, a sequence of whitespace characters counts
> as just a chunk of whitespace, like \s+ in a Perl regexp.

As you say, the standard doesn't talk in terms of delimiters. However,
scanf() format strings can be set up to parse delimited text in two
different ways:

1. Each ordinary multi-byte character in a format string that isn't
white-space and isn't part of a conversion specification is considered
to be a directive specifying the following behavior: "reading the next
characters of the stream. If any of those characters differ from the
ones composing the directive, the directive fails and the differing and
subsequent characters remain unread. Similarly, if end-of-file, an
encoding error, or a read error prevents a character from being read,
the directive fails." (7.21.6.2p6).
A nul character terminates the format string. A white-space character
causes the behavior you describe above. A '%' character necessarily
marks the start of the next conversion specification. All other
characters are allowed in such a directive. Note that if you wish to
match a single '%' character, the conversion specifier "%%" will handle
that for you, so the null character is the only one you can't handle in
this fashion.

2. In a scanset conversion specifier "%[]", any character except a null
character can be part of the scanset. The '^', ']' and '-' characters
have special meanings in a scanset, but for each of them that meaning is
position dependent. If they appear in any other position, they are
treated as ordinary members of the scanset. The '^' character has a
special meaning only when it's the first character of a scan set. The
'-' and ']' characters have their special meanings only when they are
not the first character in the scanset (or the second character if the
first character is '^').
That leaves only one difficult case: a scanlist that contains only the
'^' character. This is not impossible to create, but it can only be
specified by writing a scan set that excludes every character except '^'.

James Kuyper

Jan 22, 2020, 12:30:52 AM
Why do you think that it isn't? What happens if you use your compiler to
compile the following program:

#include <stdio.h>
int main(void)
{
    int i, j;
    double d, e;
    char buffer[80];

    scanf("%d,%lf,%79[^,],%lf,%d\n", &i, &d, buffer, &e, &j);
    printf("%d,%lf,%s,%lf,%d\n", i, d, buffer, e, j);
}

and then pass it a file containing the following line:

1,234.567890, thousands_sep decimal_point, 1.234,5

When I run that program with that input, I get:

1,234.567890, thousands_sep decimal_point,1.234000,5

Note: the particular values I chose for that test are essentially an
in-joke, referring to an issue that makes comma-delimited input
problematic, but not for any reason that's directly related to your
question.

Keith Thompson

Jan 22, 2020, 2:16:07 AM
That won't work for strings either. If the input is "foo bar,baz",
it will fail; the first "%s" will match "foo" and it will then fail
looking for a comma.

(I'm taking a risk by posting this without trying it.)

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Anton Shepelev

Jan 22, 2020, 5:37:28 AM
Bonita Montero to F Russell:

> > I've always wondered why the scanf() function cannot
> > accept a comma as a delimiter. Reading CSV files would
> > be so much easier without having to write or use a
> > special parser. Is there some reason why a comma
> > delimiter is not accepted by scanf()?
>
> Use C++

But Mr Russell asked about C...

> and the regex-classes.
> C++ supports all common regex-dialects.

Using regex to read CSV is crazy, and using `scanf' is
unnecessary. CSV is by design so simple

https://tools.ietf.org/html/rfc4180

that you can write a generic CSV reader and writer in clean
C in several hours, with only the basic IO routines.

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Kenny McCormack

Jan 22, 2020, 6:46:53 AM
In article <20200122133713.a08ae7e1db2d5b68b06e4323@g{oogle}mail.com>,
Anton Shepelev <anton.txt@g{oogle}mail.com> wrote:
>Bonita Montero to F Russell:
>
>> > I've always wondered why the scanf() function cannot
>> > accept a comma as a delimiter. Reading CSV files would
>> > be so much easier without having to write or use a
>> > special parser. Is there some reason why a comma
>> > delimiter is not accepted by scanf()?
>>
>> Use C++
>
>But Mr Russel asked about C...
>
>> and the regex-classes.
>> C++ supports all common regex-dialects.
>
>Using regex to read CSV crazy, and using `scanf' is
>unnecessary. CSV is by design so simple
>
> https://tools.ietf.org/html/rfc4180
>
>that you can write a generic CSV reader and writer in clean
>C in several hours, with only the basic IO routines.

I'll bet your next creation is going to be this thing, nice and round, it
will revolutionize transportation. I think you've even got a name for it -
yeah, here it is - the wheel.

If there is anything in this world that should already be done and which
nobody in their right mind would consider writing from scratch, it's a CSV
parser.

--
"Women should not be enlightened or educated in any way. They should be
segregated because they are the cause of unholy erections in holy men.

-- Saint Augustine (354-430) --

Anton Shepelev

Jan 22, 2020, 7:26:13 AM
Kenny McCormack to Anton Shepelev:

> > Using regex to read CSV crazy, and using `scanf' is
> > unnecessary. CSV is by design so simple
> >
> > https://tools.ietf.org/html/rfc4180
> >
> > that you can write a generic CSV reader and writer in
> > clean C in several hours, with only the basic IO
> > routines.
>
> I'll bet your next creation is going to be this thing,
> nice and round, it will revolutionize transportation. I
> think you've even got a name for it - yeah, here it
> is -- the wheel.
>
> If there is anything in this world that should already be
> done and which nobody in their right mind would consider
> writing from scratch, it's a CSV parser.

I do not object to using an existing CSV parser if its code
meets my criteria of sanity and simplicity.

Bart

Jan 22, 2020, 9:02:13 AM
On 22/01/2020 11:46, Kenny McCormack wrote:
> In article <20200122133713.a08ae7e1db2d5b68b06e4323@g{oogle}mail.com>,
> Anton Shepelev <anton.txt@g{oogle}mail.com> wrote:

>> Using regex to read CSV crazy, and using `scanf' is
>> unnecessary. CSV is by design so simple
>>
>> https://tools.ietf.org/html/rfc4180
>>
>> that you can write a generic CSV reader and writer in clean
>> C in several hours, with only the basic IO routines.
>
> I'll bet your next creation is going to be this thing, nice and round, it
> will revolutionize transportation. I think you've even got a name for it -
> yeah, here it is - the wheel.
>
> If there is anything in this world that should already be done and which
> nobody in their right mind would consider writing from scratch, it's a CSV
> parser.

Since the mid-70s I've used languages where reading 3 variables could be
as easy as typing:

read a, b, c

Only C made it hard, and others have copied that (not even the
supposedly beginner-friendly Python can manage it that easily).

As for CSV, I'm not sure my own language needs anything special
(although I haven't read the spec at that link). Here's an example that
reads a CSV file and prints the fields on the console (without commas):


C:\mapps>type test.m
import clib
import mlib

proc start=
filehandle f # (FILE* in C)
int a # (int64_t in C)
ichar b, c # (char* in C)
real d # (double in C)

f:=fopen("test.csv","r")

while not myeof(f) do
readln @f, a, b, c, d
println a, b, c, d

pcm_freestr(b); pcm_freestr(c)
od

fclose(f)

end

A test input file:

C:\mapps>type test.csv
10, "twenty", thirty, 40.1 Extra
50 """sixty""" seventy 80.2

And the results of running that program:

C:\mapps>test
10 twenty thirty 40.100000
50 "sixty" seventy 80.200000

Challenge: write the equivalent in C with no more than double the number
of lines (ie. 30 or less plus blanks).

Notes:

* Notice the second line of the .csv file doesn't actually use commas,
which are not needed (so that you can't have spaces inside items
without quotes). The readln statement is actually general-purpose, not
just for csv. The C version can require commas.

* Notice the first line of the .csv has an extra field. Since the
program is line-oriented, it should read the first 4 fields and
ignore anything else following.

* It is assumed the .csv is correct and machine-generated (no error
checking for missing lines/fields, or incorrect numeric fields,
or overflows)

* I haven't maintained my 'readln' recently for strings, and the
quick method it uses is to return a pointer to an allocated string,
and store that in b or c. So these require freeing. The C version
can do what it likes.

* A maximum line length can be assumed if using line buffers (I think I
use 16KB, but the point is that it's fixed; I can change to 1MB
for example).

* My version doesn't have an upper limit on the lengths of the middle
string fields (other than being limited by line buffer length, but
that is a separate limitation).

* The CSV spec used is simple and illustrated in my test input. Quotes
are not needed around alphanumeric fields, unless they contain
special characters. Embedded quotes are then written as "".

Jorgen Grahn

Jan 22, 2020, 10:00:40 AM
On Wed, 2020-01-22, Anton Shepelev wrote:
> Bonita Montero to F Russell:
>
>> > I've always wondered why the scanf() function cannot
>> > accept a comma as a delimiter. Reading CSV files would
>> > be so much easier without having to write or use a
>> > special parser. Is there some reason why a comma
>> > delimiter is not accepted by scanf()?
>>
>> Use C++
>
> But Mr Russel asked about C...
>
>> and the regex-classes.
>> C++ supports all common regex-dialects.
>
> Using regex to read CSV crazy, and using `scanf' is
> unnecessary. CSV is by design so simple
>
> https://tools.ietf.org/html/rfc4180
>
> that you can write a generic CSV reader and writer in clean
> C in several hours, with only the basic IO routines.

Beware of the multiple dialects of CSV though. Although if the OP
considered using scanf(), he was probably not considering those.

/Jorgen

Andrey Tarasevich

Jan 22, 2020, 2:45:28 PM
Um... What are you talking about? `scanf` has absolutely no problems
working with a comma delimiter.

`scanf` is not very well suited for parsing CSV files mostly because
`scanf` format specifiers don't support empty sequences, i.e. when a
format specifier discovers that it matches an empty sequence, it
triggers an "error" and aborts the whole `scanf`. This issue applies
specifically to the `%[]` format.

This is something one should keep in mind when parsing CSV sequences like

value1,,,value2,value3,,value4

i.e. sequences where delimiters might be densely "packed" to designate
empty fields. But even that issue can be worked around, if one knows
what one's doing.

--
Best regards,
Andrey Tarasevich

Bart

Jan 22, 2020, 3:13:45 PM
On 22/01/2020 14:02, Bart wrote:

> As for CSV, I'm not sure my own language needs anything special
> (although I haven't read the spec at that link). Here's an example that
> reads a CSV file and prints the fields on the console (without commas):
>
>
> C:\mapps>type test.m
...

> Challenge: write the equivalent in C with no more than double the number
> of lines (ie. 30 or less plus blanks).
>
> Notes:

> * The CSV spec used is simple and illustrated in my test input. Quotes
>   are not needed around alphanumeric fields, unless they contain
>   special characters. Embedded quotes are then written as "".


On 22/01/2020 19:45, Andrey Tarasevich wrote:

> `scanf` is not very well suited for parsing CSV files mostly because
> `scanf` format specifiers don't support empty sequences,
> This is something one should keep in mind when parsing CSV sequences like
>
> value1,,,value2,value3,,value4

The program I posted deals with empty sequences like this (here it
necessarily requires commas). Empty alphanumerics are empty strings;
empty numeric fields are 0 or 0.0.

I've looked at the link about the CSV spec; a lot of it I don't agree
with and would not implement (such as splitting fields across multiple
lines, or allowing spaces inside open (not quoted) fields).

CSV is supposed to be a line-oriented format, not a character-oriented one (that sounds
like something originating in C and/or Unix).

Doesn't bother me as usually my CSV readers are for files I generate. Or
I generate CSV files without any of those quirks.

Note that for a general CSV file reader, you don't know in advance what
types the fields are, or how many fields per line (until you've read the
first line), so they would probably all need reading as strings. This
requires a different approach from my little test program.

Lew Pitcher

Jan 29, 2020, 11:11:08 AM
Anton Shepelev wrote:

[snip]
> Using regex to read CSV crazy, and using `scanf' is
> unnecessary. CSV is by design so simple
>
> https://tools.ietf.org/html/rfc4180
[snip]

Unfortunately, you (appear to) misunderstand the content and purpose of that
RFC.

From RFC 4180:

* "This memo provides information for the Internet community. It does
not specify an Internet standard of any kind."

* "While there are various specifications and implementations for the
CSV format (for ex. [4], [5], [6] and [7]), there is no formal
specification in existence, which allows for a wide variety of
interpretations of CSV files."

"[4] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File
Format", 2004,
<http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm>.

[5] Edoceo, Inc., "CSV Standard File Format", 2004,
<http://www.edoceo.com/utilis/csv-file-format.php>.

[6] Rodger, R. and O. Shanaghy, "Documentation for Ricebridge CSV
Manager", February 2005,
<http://www.ricebridge.com/products/csvman/reference.htm>.

[7] Raymond, E., "The Art of Unix Programming, Chapter 5", September
2003,
<http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>."

With respect to the CSV format, the key phrases from RFC 4180 are
"does not specify an Internet standard of any kind"
and
"there is no formal specification in existence"

The RFC does not "set a standard", but instead documents an apparent
minimum common subset of a variety of differing CSV specifications
and implementations.

CSV has no single "design", and cannot be described as "so simple".

However, if you are content to re-implement a minimal, mostly universally
compatible CSV parser, you /can/ do so easily in C.

--
Lew Pitcher
"In Skills, We Trust"