On 22/01/2020 11:46, Kenny McCormack wrote:
> In article <20200122133713.a08ae7e1db2d5b68b06e4323@g{oogle}mail.com>,
> Anton Shepelev <anton.txt@g{oogle}mail.com> wrote:
>> Using regex to read CSV is crazy, and using `scanf' is
>> unnecessary. CSV is by design so simple
>>
>> https://tools.ietf.org/html/rfc4180
>>
>> that you can write a generic CSV reader and writer in clean
>> C in several hours, with only the basic IO routines.
>
> I'll bet your next creation is going to be this thing, nice and round, it
> will revolutionize transportation. I think you've even got a name for it -
> yeah, here it is - the wheel.
>
> If there is anything in this world that should already be done and which
> nobody in their right mind would consider writing from scratch, it's a CSV
> parser.
Since the mid-70s I've used languages where reading 3 variables could be
as easy as typing:
read a, b, c
Only C made it hard, and others have copied that (not even the
supposedly beginner-friendly Python can manage it that easily).
As for CSV, I'm not sure my own language needs anything special
(although I haven't read the spec at that link). Here's an example that
reads a CSV file and prints the fields on the console (without commas):
C:\mapps>type test.m
    import clib
    import mlib

    proc start=
        filehandle f        # (FILE* in C)
        int a               # (int64_t in C)
        ichar b, c          # (char* in C)
        real d              # (double in C)

        f:=fopen("test.csv","r")
        while not myeof(f) do
            readln @f, a, b, c, d
            println a, b, c, d
            pcm_freestr(b); pcm_freestr(c)
        od
        fclose(f)
    end
A test input file:
C:\mapps>type test.csv
10, "twenty", thirty, 40.1 Extra
50 """sixty""" seventy 80.2
And the results of running that program:
C:\mapps>test
10 twenty thirty 40.100000
50 "sixty" seventy 80.200000
Challenge: write the equivalent in C with no more than double the number
of lines (i.e. 30 or fewer, not counting blanks).
Notes:
* Notice the second line of the .csv file doesn't actually use commas.
They are optional in my format, because whitespace also separates items
(which is why items containing spaces must be quoted). The readln
statement is actually general-purpose, not just for csv. The C version
can require commas.
* Notice the first line of the .csv has an extra field. Since the
program is line-oriented, it should read the first 4 fields and
ignore anything else following.
* It is assumed the .csv is correct and machine-generated (no error
checking for missing lines/fields, or incorrect numeric fields,
or overflows)
* I haven't maintained my 'readln' recently for strings, and the
quick method it uses is to return a pointer to an allocated string,
and store that in b or c. So these require freeing. The C version
can do what it likes.
* A maximum line length can be assumed if using line buffers (I think I
use 16KB, but the point is that it's fixed; I can change it to 1MB,
for example).
* My version doesn't have an upper limit on the lengths of the middle
string fields (other than being limited by line buffer length, but
that is a separate limitation).
* The CSV spec used is simple and illustrated in my test input. Quotes
are not needed around alphanumeric fields, unless they contain
special characters. Embedded quotes are then written as "".
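
For comparison, here is one possible C reading of the rules in the notes above. It is only a sketch, and it makes no attempt to meet the 30-line budget. The names next_field, parse_line, dump_csv and struct record are made up for this illustration and appear nowhere in the original program. It assumes well-formed input, a fixed 16KB line buffer, and treats both commas and whitespace as separators, as the test file does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for the four fields of the example. */
struct record { long long a; char *b; char *c; double d; };

/* Copy the next field of a line into a freshly malloc'd string and
   advance *pp past it. Separators are whitespace and/or one comma.
   Inside a quoted field, "" stands for one literal quote.
   Returns NULL only if allocation fails. */
char *next_field(const char **pp)
{
    assert(pp != NULL && *pp != NULL);
    const char *p = *pp;
    char *out = malloc(strlen(p) + 1);  /* a field cannot outgrow the rest of the line */
    size_t n = 0;
    if (out == NULL) return NULL;

    while (isspace((unsigned char)*p)) p++;         /* skip leading blanks */
    if (*p == '"') {
        for (p++; *p != '\0'; p++) {
            if (*p == '"') {
                if (p[1] != '"') { p++; break; }    /* closing quote */
                p++;                                /* "" becomes one quote */
            }
            out[n++] = *p;
        }
    } else {
        while (*p != '\0' && *p != ',' && !isspace((unsigned char)*p))
            out[n++] = *p++;
    }
    while (isspace((unsigned char)*p)) p++;         /* skip trailing blanks */
    if (*p == ',') p++;                             /* and one optional comma */
    out[n] = '\0';
    *pp = p;
    return out;
}

/* Parse the first four fields of one line; anything after them is ignored.
   Numeric fields are not validated. Returns 1 on success, 0 if out of memory.
   The caller frees r->b and r->c. */
int parse_line(const char *line, struct record *r)
{
    char *f[4];
    for (int i = 0; i < 4; i++) {
        f[i] = next_field(&line);
        if (f[i] == NULL) {
            while (i > 0) free(f[--i]);
            return 0;
        }
    }
    r->a = strtoll(f[0], NULL, 10);
    r->b = f[1];
    r->c = f[2];
    r->d = strtod(f[3], NULL);
    free(f[0]);
    free(f[3]);
    return 1;
}

/* Read every line of `in` and print its first four fields to `out`,
   separated by spaces. Lines longer than the buffer are not handled. */
void dump_csv(FILE *in, FILE *out)
{
    static char line[16384];
    struct record r;
    while (fgets(line, sizeof line, in) != NULL) {
        if (!parse_line(line, &r)) break;
        fprintf(out, "%lld %s %s %f\n", r.a, r.b, r.c, r.d);
        free(r.b);
        free(r.c);
    }
}
```

A caller would fopen("test.csv", "r"), pass the handle and stdout to dump_csv, and fclose afterwards. Allocating each string field separately mirrors the pcm_freestr calls in the listing above; copying into caller-supplied buffers would be shorter but would put a limit on field length.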