"12345678","please, help","Harry, Tom",799,3,3854
"98765432","my brain's melted","sally","harry",6521,68,2154
I used
fscanf(file_ptr, "%[^,],%[^,],etc..",num,words,etc..);
/*where num and words are arrays*/
but this means that the number of fields varies depending on any extra
commas that a field may have (unless I was misinformed).
and also thought about using
fgets(record, MAXLEN, record_ptr);
and then going through the record array removing the characters I didn't want("
") and storing in other variables but it still poses the same problem as the
fscanf.
I also thought it could be impossible but that would be too easy.
If anyone could shed some light on this rather dark topic I would really
appreciate it.
Thank you
Patricia
>What is the best way to read from a csv file where some of the fields have commas
>and some are enclosed in quotation marks (" ") i.e.
>
>"12345678","please, help","Harry, Tom",799,3,3854
>"98765432","my brain's melted","sally","harry",6521,68,2154
I'd probably do this with strtok(), skipping the chars I didn't want.
I assume that the same fields always have quote marks round them.
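One caveat, shown as a minimal sketch: strtok() on its own treats every comma as a delimiter, including the ones inside quotes, and it silently skips empty fields. The helper name count_comma_tokens is made up for illustration.

```c
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() produces when the
   only delimiters are ',' and '\n'.  strtok() writes '\0' characters
   into the buffer, so record must be writable. */
int count_comma_tokens(char *record)
{
    int n = 0;
    char *tok;

    for (tok = strtok(record, ",\n"); tok != NULL; tok = strtok(NULL, ",\n"))
        n++;
    return n;
}
```

On the first sample line this gives 8 tokens rather than 6 fields, so the quoted fields need extra handling on top of the plain strtok() call.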
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
The scanf() family of functions are absolutely *horrible* for
doing input. The sscanf() variant is only tolerable because
it is not actually used for input -- it is only used to examine
something that was previously obtained, presumably via fgets().
The basic, fundamental underlying problem with the scanf() family
is that they execute a sequence of "directives" until one of the
directives fails, or they run out of directives. Each directive
does something almost but not entirely unlike what you intended.
For instance, "%d" usually means "confuse the user by not prompting
him when he fails to enter a number". :-)
(For fscanf(), the input is usually not interactive, which helps
a little.)
>and also thought about using
> fgets(record, MAXLEN, record_ptr);
[you probably mean "file_ptr" for the last argument here]
This is considerably more promising. The fgets() call will read
up to MAXLEN-1 characters, or to a newline, whichever occurs first,
and store them in record[0], record[1], record[2], ..., record[k-1].
It will then set record[k] to '\0'. For the first example line,
then, there are 50 characters including the newline, so if MAXLEN
is at least 51, record[0] will be '"', record[1] will be '1',
record[10] will be ',', record[18] will also be ',', and so on.
In record[48] you will find the last non-newline character '4',
record[49] will be '\n', and record[50] will be '\0'.
If there is no newline in record[] -- e.g., if strchr(record, '\n')
returns NULL -- then the input stream contained a '\0' character,
or the input line was too long to fit in MAXLEN characters and yet
add the terminating '\0', or -- just possibly -- the last line of
the input stream does not end in a newline, and the fgets() call
read up to EOF. What you might want to do in these cases is up to
you (many just ignore the possibility).
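As a rough sketch of that check (the names read_record and line_status are invented here, and stripping the newline is just one possible choice):

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_NO_NEWLINE, LINE_EOF };

/* Hypothetical helper: reads one line into record[] (which has room
   for maxlen chars) and reports whether a newline was seen. */
enum line_status read_record(char *record, int maxlen, FILE *fp)
{
    char *nl;

    if (fgets(record, maxlen, fp) == NULL)
        return LINE_EOF;          /* end of file or read error */
    nl = strchr(record, '\n');
    if (nl == NULL)
        return LINE_NO_NEWLINE;   /* line too long, embedded '\0',
                                     or last line lacks a newline */
    *nl = '\0';                   /* strip the newline */
    return LINE_OK;
}
```

What the caller does on LINE_NO_NEWLINE (keep reading, reject the record, accept it as the last line) is still a decision for the program.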
Now that you have a valid C string in record[] -- i.e., there is
a valid i such that record[i] == '\0', and for all record[j],
0 <= j < i, record[j] != '\0' -- you can begin examining it:
>then going through the record array removing the characters I didn't want("
>") and storing in other variables but it still poses the same problem as the
>fscanf.
Why?
You *do* need to decide what it means if you encounter an
input line like:
"abc
or:
1,,3,"four, five, six" "seven" "eight"
but it is easy enough to write code like this:
    char *p;
    int inside_quote = 0;

    for (p = record; *p; p++) {
        if (*p == '"') {
            inside_quote = 1 - inside_quote;
            continue;
        }
        if (*p == ',' && !inside_quote) {
            ... do something with un-quoted comma ...
        } else {
            ... do something with any other character, including
                a comma inside a quote ...
        }
    }
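Filled in, the loop above might look like the following sketch. The function name count_fields is hypothetical, and the choices it makes for odd input (an unterminated quote is an error, text between adjacent quoted strings stays in the same field) are assumptions, not the only reasonable answers.

```c
/* Hypothetical helper: returns the number of comma-separated fields
   in record, treating commas between double quotes as ordinary data.
   Returns -1 if a quote is still open at the end of the string. */
int count_fields(const char *record)
{
    const char *p;
    int inside_quote = 0;
    int nfields = 1;    /* even an empty record has one (empty) field */

    for (p = record; *p; p++) {
        if (*p == '"') {
            inside_quote = !inside_quote;
            continue;
        }
        if (*p == ',' && !inside_quote)
            nfields++;  /* un-quoted comma: a new field starts */
    }
    return inside_quote ? -1 : nfields;
}
```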
Note that this loop just examines the record[] array one character
at a time. That means there is no real need to "preload" the data
from the file -- you could instead write:
    int c;    /* int rather than char, so that EOF is distinguishable */

    inside_quote = 0;
    while ((c = getc(file_ptr)) != EOF) {
        if (c == '"') ...
        ... everything else also using "c" instead of *p ...
    }
although now you have to put the '\n'-handling inside this loop
(with the fgets(), you might have done it elsewhere already).
If newlines occurring while "inside a quote" are special, this may
actually give you a better way to deal with them.
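A minimal sketch of the getc() variant, assuming that a newline inside quotes is data and a newline outside quotes ends the record. The name next_record_fields is invented for illustration, and it only counts fields rather than storing them.

```c
#include <stdio.h>

/* Hypothetical helper: reads one record from fp with getc() and
   returns its field count, or EOF if there is nothing left to read.
   A newline ends the record only when it is outside quotes. */
int next_record_fields(FILE *fp)
{
    int c;                  /* int, so EOF can be told apart from data */
    int inside_quote = 0;
    int nfields = 1;
    int seen_any = 0;       /* did we read at least one character? */

    while ((c = getc(fp)) != EOF) {
        seen_any = 1;
        if (c == '"') {
            inside_quote = !inside_quote;
        } else if (c == '\n' && !inside_quote) {
            return nfields;             /* end of this record */
        } else if (c == ',' && !inside_quote) {
            nfields++;
        }
    }
    /* EOF: report a final record that lacked a newline, if any */
    return seen_any ? nfields : EOF;
}
```

Because the count is always at least 1 and EOF is negative, the two kinds of return value cannot collide.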
--
In-Real-Life: Chris Torek, Wind River Systems
El Cerrito, CA, USA Domain: to...@bsdi.com +1 510 234 3167
http://claw.eng.bsdi.com/torek/ (not always up) I report spam to abuse@.
Doesn't "C Unleashed", by some people whose names I can't quite recall
but I'm sure this group will remind me, have something on this? Both books
are at home, alas, so I can't check the details, and carrying Unleashed around
would wreck my back or my bag.
DON'T USE SCANF FOR THIS.
--
Chris "the car, I can wreck myself" Dollin
C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html
"C Unleashed" Chapter 6 p. 191 -- Messrs. Summit and Heathfield.
The OP could really benefit from reading it.
See the reference to "C Unleashed" in another thread. But, the
function that follows should get you thinking in the right direction.
I started down this road and then read Steve Summit's chapter and
got much better ideas on how to do it.
This function just does a generic split by delimiter. You just need
to start adding tests for double quotes and embedded delimiters :)
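/* Splits buf in place at each delim, storing a pointer to the start
   of each field in fields[].  Returns 0 on success, or -1 if buf
   holds more than num_elems fields.  Assumes num_elems >= 1. */
int split_by_delim (char *buf, char **fields, char delim,
                    int num_elems)
{
    int i = 0;

    fields[i++] = buf;                /* first field starts at buf */
    while (*buf != '\0') {
        if (*buf == delim) {
            if (i + 1 > num_elems)    /* no room for another field */
                return -1;
            *buf++ = '\0';            /* terminate the current field */
            fields[i++] = buf;        /* next field starts after delim */
        } else
            buf++;
    }
    return 0;
}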
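One way the quote test could be bolted on, as a sketch only: split_quoted is a hypothetical variant of split_by_delim(), not the version from the book. It assumes quotes never need to appear inside a field, drops the quote characters, and returns the field count instead of 0.

```c
/* Hypothetical variant of split_by_delim(): splits buf in place,
   ignoring delimiters between double quotes and dropping the quote
   characters themselves.  Returns the number of fields, or -1 if
   there are more than num_elems fields or a quote is left open. */
int split_quoted(char *buf, char **fields, char delim, int num_elems)
{
    char *src = buf;    /* next character to read */
    char *dst = buf;    /* next position to write; never ahead of src */
    int inside_quote = 0;
    int i = 0;

    if (num_elems < 1)
        return -1;
    fields[i++] = dst;
    for (; *src != '\0'; src++) {
        if (*src == '"') {
            inside_quote = !inside_quote;   /* drop the quote itself */
        } else if (*src == delim && !inside_quote) {
            if (i >= num_elems)
                return -1;
            *dst++ = '\0';                  /* terminate current field */
            fields[i++] = dst;
        } else {
            *dst++ = *src;                  /* keep ordinary character */
        }
    }
    *dst = '\0';
    return inside_quote ? -1 : i;
}
```

Called on the first sample line with delim ',' it should yield 6 fields, with fields[1] pointing at "please, help" (quotes removed, comma kept).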