
reading a csv file and storing the fields separately!uh!

PatriciaBbn

May 29, 2001, 4:48:38 PM
What is the best way to read from a csv file where some of the fields have commas
and some are enclosed in quotation marks (" "), e.g.

"12345678","please, help","Harry, Tom",799,3,3854
"98765432","my brain's melted","sally","harry",6521,68,2154

I used

fscanf(file_ptr, "%[^,],%[^,],etc..",num,words,etc..);

/*where num and words are arrays*/

but this means that the number of fields is variable, dependent on any extra
commas that a field may have (unless I was misinformed).

I also thought about using

fgets(record, MAXLEN, record_ptr);

and then going through the record array removing the characters I didn't want
(" ") and storing the rest in other variables, but it still poses the same
problem as the fscanf.

I also thought it could be impossible but that would be too easy.

If anyone could shed some light on this rather dark topic I would really
appreciate it.

Thank you
Patricia


Mark McIntyre

May 29, 2001, 5:34:31 PM
On 29 May 2001 20:48:38 GMT, patri...@aol.com (PatriciaBbn) wrote:

>What is the best way to read from a csv file where some of the fields have commas
>and some are enclosed in quotation marks (" "), e.g.
>
>"12345678","please, help","Harry, Tom",799,3,3854
>"98765432","my brain's melted","sally","harry",6521,68,2154

I'd probably do this with strtok(), skipping the chars I didn't want.
I assume that the same fields always have quote marks round them.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>

Chris Torek

May 29, 2001, 11:39:40 PM
In article <20010529164838...@ng-cq1.aol.com>

PatriciaBbn <patri...@aol.com> writes:
>What is the best way to read from a csv file where some of the fields
>have commas and some are enclosed in quotation marks (" "), e.g.
>
>"12345678","please, help","Harry, Tom",799,3,3854
>"98765432","my brain's melted","sally","harry",6521,68,2154
>
>I used
> fscanf(file_ptr, "%[^,],%[^,],etc..",num,words,etc..);
> /*where num and words are arrays*/
>
>but this means that the number of fields is variable, dependent on any extra
>commas that a field may have (unless I was misinformed).

The scanf() family of functions are absolutely *horrible* for
doing input. The sscanf() variant is only tolerable because
it is not actually used for input -- it is only used to examine
something that was previously obtained, presumably via fgets().

The basic, fundamental underlying problem with the scanf() family
is that they execute a sequence of "directives" until one of the
directives fails, or they run out of directives. Each directive
does something almost but not entirely unlike what you intended.
For instance, "%d" usually means "confuse the user by not prompting
him when he fails to enter a number". :-)

(For fscanf(), the input is usually not interactive, which helps
a little.)
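To make the original poster's complaint concrete, here is a minimal sketch of what a "%[^,]" directive does to the first sample line. The helper name naive_two_fields and the 63-character field width are assumptions made for this illustration only; sscanf() is used instead of fscanf() so that the input is a plain string.

```c
#include <stdio.h>

/*
 * Hypothetical helper: pull the first two comma-separated chunks out of
 * line using the "%[^,]" approach. a and b must each have room for at
 * least 64 characters. Returns the number of successful conversions.
 */
int naive_two_fields(const char *line, char *a, char *b)
{
    /* "%63[^,]" stops at the first comma, quoted or not. */
    return sscanf(line, "%63[^,],%63[^,]", a, b);
}
```

Given the first sample line, a receives the first field with its quote marks still attached, and b receives only the text up to the comma inside "please, help", because the directive has no notion of quoting.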

>and also thought about using
> fgets(record, MAXLEN, record_ptr);

[you probably mean "file_ptr" for the last argument here]

This is considerably more promising. The fgets() call will read
up to MAXLEN-1 characters, or to a newline, whichever occurs first,
and store them in record[0], record[1], record[2], ..., record[k-1].
It will then set record[k] to '\0'. For the first example line,
then, there are 50 characters including the newline, so if MAXLEN
is at least 51, record[0] will be '"', record[1] will be '1',
record[10] will be ',', record[18] will also be ',', and so on.
In record[48] you will find the last non-newline character '4',
record[49] will be '\n', and record[50] will be '\0'.

If there is no newline in record[] -- e.g., if strchr(record, '\n')
returns NULL -- then the input stream contained a '\0' character,
or the input line was too long to fit in MAXLEN characters and yet
add the terminating '\0', or -- just possibly -- the last line of
the input stream does not end in a newline, and the fgets() call
read up to EOF. What you might want to do in these cases is up to
you (many just ignore the possibility).
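One way to make the missing-newline case visible to the caller is sketched below. The function name read_record and the three status codes are assumptions made for this example, and what to do about an incomplete line is still left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
enum line_status { LINE_OK, LINE_NO_NEWLINE, LINE_EOF };

/*
 * Read one line into buf (capacity size) and strip the trailing newline.
 * Returns LINE_OK if a complete line was read, LINE_NO_NEWLINE if fgets()
 * stopped without seeing '\n' (line too long, embedded '\0', or a final
 * line that lacks a newline), and LINE_EOF if nothing could be read.
 */
enum line_status read_record(char *buf, int size, FILE *fp)
{
    char *nl;

    if (fgets(buf, size, fp) == NULL)
        return LINE_EOF;

    nl = strchr(buf, '\n');
    if (nl == NULL)
        return LINE_NO_NEWLINE;   /* caller decides how to recover */

    *nl = '\0';                   /* drop the newline */
    return LINE_OK;
}
```

The three causes of a missing newline are deliberately lumped together here; telling them apart would need further checks such as feof().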

Now that you have a valid C string in record[] -- i.e., there is
a valid i such that record[i] == '\0', and for all record[j],
0 <= j < i, record[j] != '\0' -- you can begin examining it:

>then going through the record array removing the characters I didn't want
>(" ") and storing in other variables but it still poses the same problem as the
>fscanf.

Why?

You *do* need to decide what it means if you encounter an
input line like:

"abc

or:

1,,3,"four, five, six" "seven" "eight"

but it is easy enough to write code like this:

    char *p;
    int inside_quote = 0;

    for (p = record; *p; p++) {
        if (*p == '"') {
            inside_quote = 1 - inside_quote;
            continue;
        }
        if (*p == ',' && !inside_quote) {
            ... do something with un-quoted comma ...
        } else {
            ... do something with any other character, including
                a comma that is inside a quote ...
        }
    }
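As a sketch of how the two "do something" branches might be filled in, the function below splits a record in place into an array of field pointers. The name split_csv, the fields array and max_fields are assumptions for this example. It also assumes that quote marks are simply dropped (a doubled "" is not treated as an escaped quote) and that the record ends at the first newline, even one inside a quote.

```c
/*
 * Split rec in place into at most max_fields fields.
 * Quote characters are dropped; commas inside quotes are kept as data.
 * Returns the number of fields, or -1 if there are more than max_fields.
 */
int split_csv(char *rec, char **fields, int max_fields)
{
    char *src = rec;        /* next character to examine */
    char *dst = rec;        /* next position to store field data */
    int inside_quote = 0;
    int n = 0;

    if (max_fields < 1)
        return -1;
    fields[n++] = dst;      /* first field starts at the buffer */

    for (; *src != '\0' && *src != '\n'; src++) {
        if (*src == '"') {
            inside_quote = !inside_quote;
            continue;
        }
        if (*src == ',' && !inside_quote) {
            *dst++ = '\0';          /* un-quoted comma ends the field */
            if (n == max_fields)
                return -1;
            fields[n++] = dst;
        } else {
            *dst++ = *src;          /* ordinary character or quoted comma */
        }
    }
    *dst = '\0';                    /* terminate the last field */
    return n;
}
```

Because dst never gets ahead of src, the fields can be compacted inside the original buffer with no extra storage. Applied to the first sample line, this should yield six fields, with "please, help" and "Harry, Tom" each kept whole.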

Note that this loop just examines the record[] array one character
at a time. That means there is no real need to "preload" the data
from the file -- you could instead write:

    inside_quote = 0;
    while ((c = getc(file_ptr)) != EOF) {   /* c must be an int, not a char */
        if (c == '"') ...
        ... everything else also using "c" instead of *p ...
    }

although now you have to put the '\n'-handling inside this loop
(with the fgets(), you might have done it elsewhere already).
If newlines occurring while "inside a quote" are special, this may
actually give you a better way to deal with them.
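A minimal sketch of the getc() variant follows, under the assumption that a newline inside a quote is data and only a newline outside quotes ends the record. The function name count_fields is made up for this example, and it only counts fields rather than storing them.

```c
#include <stdio.h>

/*
 * Count the fields in the next record of fp, one character at a time.
 * A newline ends the record only when it is outside quotes.
 * Returns the field count (an empty line counts as one empty field),
 * or 0 if end of input was reached before any character was read.
 */
int count_fields(FILE *fp)
{
    int c;                  /* int, so that EOF is distinguishable */
    int inside_quote = 0;
    int commas = 0;
    int seen_any = 0;

    while ((c = getc(fp)) != EOF) {
        seen_any = 1;
        if (c == '"') {
            inside_quote = !inside_quote;
        } else if (c == '\n' && !inside_quote) {
            break;          /* end of this record */
        } else if (c == ',' && !inside_quote) {
            commas++;       /* un-quoted comma separates two fields */
        }
    }
    return seen_any ? commas + 1 : 0;
}
```

Calling it repeatedly until it returns 0 walks the file one record at a time, with no fixed line-length limit.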
--
In-Real-Life: Chris Torek, Wind River Systems
El Cerrito, CA, USA Domain: to...@bsdi.com +1 510 234 3167
http://claw.eng.bsdi.com/torek/ (not always up) I report spam to abuse@.

ke...@hplb.hpl.hp.com

May 30, 2001, 5:17:20 AM
In article <20010529164838...@ng-cq1.aol.com>,

patri...@aol.com (PatriciaBbn) writes:
> What is the best way to read from a csv file where some of the fields have
> commas and some are enclosed in quotation marks (" "), e.g.
>
> "12345678","please, help","Harry, Tom",799,3,3854
> "98765432","my brain's melted","sally","harry",6521,68,2154

I seem to recall that "The Practice of Programming" (Kernighan and Pike) has
a section on this, and it's a damn good book too.

Doesn't "C Unleashed", by some people whose names I can't quite recall
but I'm sure this group will remind me, have something on this? Both books
are at home, alas, so I can't check the details, and carrying Unleashed around
would wreck my back or my bag.

DON'T USE SCANF FOR THIS.

--
Chris "the car, I can wreck myself" Dollin
C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html

John Crowley

May 30, 2001, 9:05:27 AM
ke...@hplb.hpl.hp.com wrote:
>
> Doesn't "C Unleashed", by some people whose names I can't quite recall
> but I'm sure this group will remind me, have something on this? Both books
> are at home, alas, so I can't check the details, and carrying Unleashed around
> would wreck my back or my bag.
>

"C Unleashed" Chapter 6 p. 191 -- Messrs. Summit and Heathfield.

The OP could really benefit from reading it.

John Crowley

May 30, 2001, 9:14:46 AM
PatriciaBbn wrote:
>
>
> fscanf(file_ptr, "%[^,],%[^,],etc..",num,words,etc..);
>

See the reference to "C Unleashed" in another thread. But, the
function that follows should get you thinking in the right direction.
I started down this road and then read Steve Summit's chapter and
got much better ideas on how to do it.

This function just does a generic split by delimiter. You just need
to start adding tests for double quotes and embedded delimiters :)

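int split_by_delim (char *buf, char **fields, char delim,
                    int num_elems)
{
    int i = 0;

    if (num_elems < 1)      /* no room for even the first field */
        return -1;

    fields[i++] = buf;      /* first field starts at the buffer */

    while (*buf != '\0') {
        if (*buf == delim) {
            if (i + 1 > num_elems)  /* more fields than slots */
                return -1;
            *buf++ = '\0';          /* terminate the current field */
            fields[i++] = buf;      /* next field starts after the delimiter */
        } else
            buf++;
    }

    return 0;
}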
