CSV Parser

GlenB

unread,

Dec 17, 2009, 2:45:34 PM12/17/09

to

I've yet to find a decent parser for CSV on here, so here's one I
threw together that appears to work well. This is written for D3. Feel
free to repost on PickWiki.

*
* (c)2009 - Glen Batchelor
* License: public domain, free to modify and distribute
* Please retain this information in all distributions
*
SUBROUTINE CSV.PARSER(STRING,ARRAY)
$OPTIONS EXT
ARRAY = STRING
*
* CARRIAGE RETURNS SHOULD NOT EXIST INSIDE QUOTED CELLS SO WE
CAN CONVERT QUOTED CELLS TO CR DELIMITED DATA
* NOTE: LINE FEEDS/BREAKS MAY EXIST INSIDE QUOTED CELLS SO YOU
MAY WANT TO LOOK FOR CHAR(10) AND CONVERT
* ACCORDINGLY IN YOUR APPLICATION.
*
ARRAY = SWAP(ARRAY,',"',CHAR(13))
ARRAY = SWAP(ARRAY,'",',CHAR(13))
*
* GET RID OF "ESCAPED" QUOTES SINCE WE AREN'T LOOKING AT QUOTES
AT ALL NOW
*
ARRAY = SWAP(ARRAY,'""','"')
STRLEN = LEN(ARRAY)
*
* INDEX OUR CR DELIMITED CELLS OF COMMA-CONTAINING DATA AND MAP
THE START/END CHAR LOCATIONS
*
IGNORE.RANGES = ''
RANGEPTR = 0
CELLACTIVE = 0
FOR PTR = 1 TO STRLEN
IF ARRAY[PTR,1] = CHAR(13) AND CELLACTIVE = 0 THEN
CELLACTIVE = 1
RANGEPTR = RANGEPTR + 1
IGNORE.RANGES<RANGEPTR,1> = PTR
GO SCAN.NEXT.CHAR
END
IF ARRAY[PTR,1] = CHAR(13) AND CELLACTIVE = 1 THEN
CELLACTIVE = 0
IGNORE.RANGES<RANGEPTR,2> = PTR
END
SCAN.NEXT.CHAR:
NEXT PTR
*
* CONVERT COMMAS TO CHAR(254), SKIPPING ALL THE IGNORED RANGES
FOR EMBEDDED COMMAS
*
DEL.CNT = 0
FOR PTR = 1 TO STRLEN
* SEE IF THE CURRENT CHAR SHOULD BE EXCLUDED FROM POSSIBLE
COMMA-TO-CHAR(254) CONVERSION
GOSUB CHECK.RANGE
IF IGNORE.CHAR = 1 THEN
IF ARRAY[PTR,1] = CHAR(13) AND DEL.CNT = 0 THEN
NEWARRAY = ARRAY[1,PTR-1]:CHAR(254):ARRAY[PTR+1,STRLEN]
ARRAY = NEWARRAY
PTR = PTR - 1
STRLEN = LEN(ARRAY)
DEL.CNT = 1
END
IF ARRAY[PTR,1] = CHAR(13) AND DEL.CNT = 1 THEN
NEWARRAY = ARRAY[1,PTR-1]:CHAR(254):ARRAY[PTR+1,STRLEN]
ARRAY = NEWARRAY
PTR = PTR - 1
STRLEN = LEN(ARRAY)
DEL.CNT = 0
END
END ELSE
IF ARRAY[PTR,1] = "," THEN ARRAY[PTR,1] = CHAR(254)
END
NEXT PTR
RETURN
*
* IS THE CURRENT CHAR INSIDE A COMMA-CONTAINING DATA CELL?
*
CHECK.RANGE:
IGNORE.CHAR = 0
FOR RPTR = 1 TO RANGEPTR
IF PTR >= IGNORE.RANGES<RPTR,1> AND PTR <= IGNORE.RANGES<RPTR,
2> THEN
IGNORE.CHAR = 1
END
NEXT RPTR
RETURN

x

unread,

Dec 18, 2009, 7:06:15 PM12/18/09

to

On Dec 17, 2:45 pm, GlenB <batch...@bellsouth.net> wrote:
> I've yet to find a decent parser for CSV...

Use this one:
http://home.primus.ca/~lxp/prog.convert.from.csv.htm

Feed the routine one Excel line at a time, without the trailing CR:LF
Each Excel line is converted to an array delimited by @FM (char(254)).
The routine properly handles cells that have double quotes and commas
as part of their value such as "X,Y,Z" (with double quotes as part of
the value).
Here is a test:
EQU CR TO CHAR(13) ;* Carriage Return
EQU LF TO CHAR(10) ;* Line Feed

A = "123,'ABC'":CR:LF:',"""X,Y,Z"""':CR:LF ;* CSV value
NRI = COUNT(A,CR)
FOR I = 1 TO NRI
B = FIELD(A,CR:LF,I)
CALL CONVERT.FROM.CSV(B,C)
CRT 'Line ':I:' ':C
NEXT I
END

Results:
Line 1 123þ'ABC'
Line 2 þ"X,Y,Z"

Tony Gravagno

unread,

Dec 19, 2009, 2:54:22 PM12/19/09

to

See new PickWiki page which references this thread:
http://www.pickwiki.com/cgi-bin/wiki.pl?CSV
Updates welcome - that was a fast post just to show how this process
works.

Tony Gravagno
Nebula Research and Development
TG@ remove.pleaseNebula-RnD.com
remove.pleaseNebula-RnD.com/blog
Visit PickWiki.com! Contribute!
http://Twitter.com/TonyGravagno

Rex Gozar

unread,

Dec 22, 2009, 9:35:52 AM12/22/09

to

Glen,

I did some testing on your program and it appears that it doesn't work
when the first CSV value is quoted. You may want to double check the
code to make sure it works on the first and last CSV values.

rex

Mark Fuller

unread,

Dec 22, 2009, 10:00:36 AM12/22/09

to

All

You could use Reality which has this in built as standard. Just check
out CSV-VIEW. This has the ability to effectively have a QPTR which
allows updates/retrieval from a CSV file as if it's a standard Reality
file - cool..

Happy Holidays everyone

Mark Fuller

Tony Gravagno

unread,

Dec 22, 2009, 2:53:14 PM12/22/09

to

Mark Fuller wrote:

>You could use Reality which has this in built as standard. Just check
>out CSV-VIEW. This has the ability to effectively have a QPTR which
>allows updates/retrieval from a CSV file as if it's a standard Reality
>file - cool..

Yes Mark, very cool, and a feature that our friends at RD/TL
repeatedly insist has no value even though people ask for it and at
least one competitor apparently does understand the value. I'll note
that with jBase and QM we can do this without requiring the vendor to
build it in. D3 supports the "Open Systems File Interface" (OSFI) but
since they don't publish the interface there's nothing Open about it,
so this potentially amazing functionality goes completely unused for
all sorts of applications like this. Their loss, your gain. I doubt
someone is going to change their DBMS to read CSV into a dynamic
array, but I encourage you to keep posting tidbits like this so that
people see a regular stream of things they can do with Reality, and
that might turn some heads.

Regards,
T

Scott Ballinger

unread,

Dec 22, 2009, 3:30:55 PM12/22/09

to

Here is my (very simple [D3]) version.
/Scott Ballinger
Pareto Corporation
Edmonds WA USA
206 713 6006

csv.parser
001 subroutine csv.parser(string0,rec)
002 * 11-20-00 asb
003 * parse comma separated value record watching out for quotes ("")
004 * attribute delimited rec is returned
005
006 string = string0 ;* don't return changed string
007
008 call clean.sub(string)
009 **subroutine clean.sub(string)
010 *** remove bad characters from a string/record/whatever
011 **bad.chars = ""
012 **for n = 0 to 31
013 ** bad.chars := char(n)
014 **next n
015 **for n = 128 to 250
016 ** bad.chars := char(n)
017 **next n
018 **convert bad.chars to "" in string
019 **return
020
021 rec = ""
022 an = 1
023 loop
024 if string[1,1] ne '"' then
025 result = field(string,",",1)
026 string = string[col2()+1,999999]
027 end else
028 pos = index(string,'",',1)
029 if pos eq 0 then pos = len(string)
030 result = string[2,pos-2]
031 string = string[pos+2,999999]
032 end
033 result = swap(result,'""','"')
034 rec<an> = trim(result)
035 until string eq "" do
036 an +=1
037 repeat
038
039 return

Message has been deleted

x

unread,

Dec 22, 2009, 7:50:23 PM12/22/09

to

On Dec 22, 3:30 pm, Scott Ballinger <scott.ballin...@gmail.com> wrote:
> Here is my (very simple [D3]) version.
> /Scott Ballinger

Scott, there is a bug somewhere.
Example:
cell 1 123
cell 2 "A,B",C
cell 3 XYZ
CSV data: 123,"""A,B"",C",XYZ
Results
correct : 123þ"A,B",CþXYZ
csv.parser: 123þ"A,B"þC"þXYZ

x

unread,

Dec 22, 2009, 9:40:36 PM12/22/09

to

When dealing with cells that have quotes only, CSV2FIELDS adds an
extra quote.

For the same test data, CSV2ARRAY gives erroneous output with the
option set to 'N' as well as the option set to null.
Also, CSV2ARRAY has at line 63 the unassigned variable "err" (opts =
err).

cell 1 "
cell 2 ,
cell 3 ""
cell 4 ,,
cell 5 """
cell 6 ,,,

csv data : """",",","""""",",,","""""""",",,,"

correct : "þ,þ""þ,,þ"""þ,,,

csv2fields: ""þ,þ"""þ,,þ""""þ,,,

csv2array
(opts='N'): "þþþ""þþþþ""""þþþþ
(opts='') : þþþþþþþ"þþþþ

Tony Gravagno

unread,

Dec 23, 2009, 1:33:10 AM12/23/09

to

Lucian - I'm not quite sure if that was directed to me since it was a
response to my note about the PickWiki CSV page. If you have a
contribution for the page, fix to code, comment for the page, etc,
please post it to the page. Registering as a contributor only takes a
minute. I'm happy to insert new content if required, but I can't take
time to debug programs that people post. It is, after all, a
community resource. Sorry if I misunderstood your intent.

Tony Gravagno
Nebula Research and Development
TG@ remove.pleaseNebula-RnD.com

Nebula R&D sells mv.NET and other Pick/MultiValue products
worldwide, and provides related development services

remove.pleaseNebula-RnD.com/blog
Visit PickWiki.com! Contribute!
http://Twitter.com/TonyGravagno

lucian wrote:

>When dealing with cells that have quotes only, CSV2FIELDS adds an
>extra quote.
>
>For the same test data, CSV2ARRAY gives erroneous output with the
>option set to 'N' as well as the option set to null.
>Also, CSV2ARRAY has at line 63 the unassigned variable "err" (opts =
>err).
>
>cell 1 "
>cell 2 ,
>cell 3 ""
>cell 4 ,,
>cell 5 """
>cell 6 ,,,
>
>csv data : """",",","""""",",,","""""""",",,,"
>

>correct : "�,�""�,,�"""�,,,
>
>csv2fields: ""�,�"""�,,�""""�,,,
>
>csv2array
>(opts='N'): "��""��""""��
>(opts='') : ��"��

x

unread,

Dec 23, 2009, 3:06:05 AM12/23/09

to

On Dec 23, 1:33 am, Tony Gravagno

<address.is.in.po...@removethis.com.invalid> wrote:
> Lucian - I'm not quite sure if that was directed to me since it was a
> response to my note about the PickWiki CSV page.

Sorry Tony,
I didn't mean to reply to anyone in particular - I guess I've clicked
the "Reply" button at the bottom of the page without looking.
I did some testing and posted the results so the authors can fix
whatever is to fix.
It's always better to have someone else test your code, to help smooth
the wrinkles if any.
Lucian

Rex Gozar

unread,

Dec 23, 2009, 3:01:42 PM12/23/09

to

On Dec 22, 9:40 pm, x <lucian_p...@yahoo.com> wrote:
> When dealing with cells that have quotes only, CSV2FIELDS adds an
> extra quote.

CSV2FIELDS now fixed to only convert "escaped" quotes within a quoted
cell.

rex

Scott Ballinger

unread,

Dec 28, 2009, 12:14:08 PM12/28/09

to

Dude, note the "very simple" preface.
In the (unlikely) case of a data cell containing an adjacent quote and
a comma, one fix would be to convert the double quote marks (produced
by the csv rules converting quote to quote quote) to something that
can't occur in the data set. I choose to convert quote quote to char
(7) [already removed by the clean.sub call]. Then at the end convert
char(7) back to (single) quote.

Here is the revised csv.parser that works with an embedded quote +
comma. Any other special cases where this doesn't work?

/Scott

csv.parser
001 subroutine csv.parser(string0,rec)
002 * 11-20-00 asb

003 * 12-26-09 asb: [1] Handle case of embedded quote+comma in data
cell.
004 * parse comma separated value record watching out for quotes ("")
005 * attribute delimited rec is returned
006
007 string = string0 ;* don't return changed string
008
009 call clean.sub(string)
010 **subroutine clean.sub(string)
011 *** remove bad characters from a string/record/whatever
012 **bad.chars = ""
013 **for n = 0 to 31
014 ** bad.chars := char(n)
015 **next n
016 **for n = 128 to 250
017 ** bad.chars := char(n)
018 **next n
019 **convert bad.chars to "" in string
020 **return
021
022 bell = char(7) ;* [1] removed by clean.sub if any
previously existed
023 string = swap(string,'""',bell) ;* single " converted to
double "" by c
sv rules
024 rec = ""
025 an = 1
026 loop
027 if string[1,1] ne '"' then
028 result = field(string,",",1)
029 string = string[col2()+1,999999]
030 end else
031 pos = index(string,'",',1)
032 if pos eq 0 then pos = len(string)
033 result = string[2,pos-2]
034 string = string[pos+2,999999]
035 end
036 rec<an> = trim(result)
037 until string eq "" do
038 an +=1
039 repeat
040
041 rec = swap(rec,bell,'"') ;* [1] convert back to single " as
appropriate
042
043 return

Message has been deleted

x

unread,

Dec 28, 2009, 9:24:27 PM12/28/09

to

> Dude, note the "very simple" preface.

Dude, I've seen the preface, thank you.

> In the (unlikely) case of a data cell containing an adjacent quote and

> a comma, ...
It's wishful thinking to assume that you cannot have quotes and commas
in the same cell.
CSV is for text data interchange and that includes sentences with
quotes and commas and some times even CR:LF therefore you should not
"clean" the embedded CR:LF.
For reference see http://www.rfc-editor.org/rfc/rfc4180.txt
If all you have are well formed numbers than you don't need a
subroutine to do the job, just do a
CONVERT ",":CR:LF TO @VM:@FM IN REC

> Here is the revised csv.parser that works with an embedded quote +
> comma. Any other special cases where this doesn't work?

Fixed. No testing ?
As far as testing goes, here are the results based on the test data
shown in the e-mail thread.

cell 1 123
cell 2 "A,B",C
cell 3 XYZ

CSV data: 123,"""A,B"",C",XYZ

Results
correct : 123þ"A,B",CþXYZ

csv.parser old : 123þ"A,B"þC"þXYZ
csv.parser new : 123þ""AþB"þC"þXYZ

cell 1 "
cell 2 ,
cell 3 ""
cell 4 ,,
cell 5 """
cell 6 ,,,

CSV data : """",",","""""",",,","""""""",",,,"

Results
correct : "þ,þ""þ,,þ"""þ,,,
csv.parser old : "þþþ""þþþþ"""þþþþ
csv.parser new : ""þþþ"""þþþþ""""þþþþ

Tony Gravagno

unread,

Dec 30, 2009, 1:24:56 AM12/30/09

to

Ahh the wonderful world of open source where no good contribution goes
unpunished. :)

My recommendation is that the code get posted to PickWiki (or some
other site for public editing like http://cl1p.net). From there it
can be modified, everyone can see the latest version, and those who
are interested can beat it up until it's of a quality that's
acceptable to all.

T