*
* (c)2009 - Glen Batchelor
* License: public domain, free to modify and distribute
* Please retain this information in all distributions
*
SUBROUTINE CSV.PARSER(STRING,ARRAY)
$OPTIONS EXT
ARRAY = STRING
*
* CARRIAGE RETURNS SHOULD NOT EXIST INSIDE QUOTED CELLS SO WE
CAN CONVERT QUOTED CELLS TO CR DELIMITED DATA
* NOTE: LINE FEEDS/BREAKS MAY EXIST INSIDE QUOTED CELLS SO YOU
MAY WANT TO LOOK FOR CHAR(10) AND CONVERT
* ACCORDINGLY IN YOUR APPLICATION.
*
ARRAY = SWAP(ARRAY,',"',CHAR(13))
ARRAY = SWAP(ARRAY,'",',CHAR(13))
*
* GET RID OF "ESCAPED" QUOTES SINCE WE AREN'T LOOKING AT QUOTES
AT ALL NOW
*
ARRAY = SWAP(ARRAY,'""','"')
STRLEN = LEN(ARRAY)
*
* INDEX OUR CR DELIMITED CELLS OF COMMA-CONTAINING DATA AND MAP
THE START/END CHAR LOCATIONS
*
IGNORE.RANGES = ''
RANGEPTR = 0
CELLACTIVE = 0
FOR PTR = 1 TO STRLEN
IF ARRAY[PTR,1] = CHAR(13) AND CELLACTIVE = 0 THEN
CELLACTIVE = 1
RANGEPTR = RANGEPTR + 1
IGNORE.RANGES<RANGEPTR,1> = PTR
GO SCAN.NEXT.CHAR
END
IF ARRAY[PTR,1] = CHAR(13) AND CELLACTIVE = 1 THEN
CELLACTIVE = 0
IGNORE.RANGES<RANGEPTR,2> = PTR
END
SCAN.NEXT.CHAR:
NEXT PTR
*
* CONVERT COMMAS TO CHAR(254), SKIPPING ALL THE IGNORED RANGES
FOR EMBEDDED COMMAS
*
DEL.CNT = 0
FOR PTR = 1 TO STRLEN
* SEE IF THE CURRENT CHAR SHOULD BE EXCLUDED FROM POSSIBLE
COMMA-TO-CHAR(254) CONVERSION
GOSUB CHECK.RANGE
IF IGNORE.CHAR = 1 THEN
IF ARRAY[PTR,1] = CHAR(13) AND DEL.CNT = 0 THEN
NEWARRAY = ARRAY[1,PTR-1]:CHAR(254):ARRAY[PTR+1,STRLEN]
ARRAY = NEWARRAY
PTR = PTR - 1
STRLEN = LEN(ARRAY)
DEL.CNT = 1
END
IF ARRAY[PTR,1] = CHAR(13) AND DEL.CNT = 1 THEN
NEWARRAY = ARRAY[1,PTR-1]:CHAR(254):ARRAY[PTR+1,STRLEN]
ARRAY = NEWARRAY
PTR = PTR - 1
STRLEN = LEN(ARRAY)
DEL.CNT = 0
END
END ELSE
IF ARRAY[PTR,1] = "," THEN ARRAY[PTR,1] = CHAR(254)
END
NEXT PTR
RETURN
*
* IS THE CURRENT CHAR INSIDE A COMMA-CONTAINING DATA CELL?
*
CHECK.RANGE:
IGNORE.CHAR = 0
FOR RPTR = 1 TO RANGEPTR
IF PTR >= IGNORE.RANGES<RPTR,1> AND PTR <= IGNORE.RANGES<RPTR,
2> THEN
IGNORE.CHAR = 1
END
NEXT RPTR
RETURN
Use this one:
http://home.primus.ca/~lxp/prog.convert.from.csv.htm
Feed the routine one Excel line at a time, without the trailing CR:LF
Each Excel line is converted to an array delimited by @FM (char(254)).
The routine properly handles cells that have double quotes and commas
as part of their value such as "X,Y,Z" (with double quotes as part of
the value).
Here is a test:
EQU CR TO CHAR(13) ;* Carriage Return
EQU LF TO CHAR(10) ;* Line Feed
A = "123,'ABC'":CR:LF:',"""X,Y,Z"""':CR:LF ;* CSV value
NRI = COUNT(A,CR)
FOR I = 1 TO NRI
B = FIELD(A,CR:LF,I)
CALL CONVERT.FROM.CSV(B,C)
CRT 'Line ':I:' ':C
NEXT I
END
Results:
Line 1 123þ'ABC'
Line 2 þ"X,Y,Z"
Tony Gravagno
Nebula Research and Development
TG@ remove.pleaseNebula-RnD.com
remove.pleaseNebula-RnD.com/blog
Visit PickWiki.com! Contribute!
http://Twitter.com/TonyGravagno
I did some testing on your program and it appears that it doesn't work
when the first CSV value is quoted. You may want to double check the
code to make sure it works on the first and last CSV values.
rex
You could use Reality which has this in built as standard. Just check
out CSV-VIEW. This has the ability to effectively have a QPTR which
allows updates/retrieval from a CSV file as if it's a standard Reality
file - cool..
Happy Holidays everyone
Mark Fuller
>You could use Reality which has this in built as standard. Just check
>out CSV-VIEW. This has the ability to effectively have a QPTR which
>allows updates/retrieval from a CSV file as if it's a standard Reality
>file - cool..
Yes Mark, very cool, and a feature that our friends at RD/TL
repeatedly insist has no value even though people ask for it and at
least one competitor apparently does understand the value. I'll note
that with jBase and QM we can do this without requiring the vendor to
build it in. D3 supports the "Open Systems File Interface" (OSFI) but
since they don't publish the interface there's nothing Open about it,
so this potentially amazing functionality goes completely unused for
all sorts of applications like this. Their loss, your gain. I doubt
someone is going to change their DBMS to read CSV into a dynamic
array, but I encourage you to keep posting tidbits like this so that
people see a regular stream of things they can do with Reality, and
that might turn some heads.
Regards,
T
csv.parser
001 subroutine csv.parser(string0,rec)
002 * 11-20-00 asb
003 * parse comma separated value record watching out for quotes ("")
004 * attribute delimited rec is returned
005
006 string = string0 ;* don't return changed string
007
008 call clean.sub(string)
009 **subroutine clean.sub(string)
010 *** remove bad characters from a string/record/whatever
011 **bad.chars = ""
012 **for n = 0 to 31
013 ** bad.chars := char(n)
014 **next n
015 **for n = 128 to 250
016 ** bad.chars := char(n)
017 **next n
018 **convert bad.chars to "" in string
019 **return
020
021 rec = ""
022 an = 1
023 loop
024 if string[1,1] ne '"' then
025 result = field(string,",",1)
026 string = string[col2()+1,999999]
027 end else
028 pos = index(string,'",',1)
029 if pos eq 0 then pos = len(string)
030 result = string[2,pos-2]
031 string = string[pos+2,999999]
032 end
033 result = swap(result,'""','"')
034 rec<an> = trim(result)
035 until string eq "" do
036 an +=1
037 repeat
038
039 return
Scott, there is a bug somewhere.
Example:
cell 1 123
cell 2 "A,B",C
cell 3 XYZ
CSV data: 123,"""A,B"",C",XYZ
Results
correct : 123þ"A,B",CþXYZ
csv.parser: 123þ"A,B"þC"þXYZ
For the same test data, CSV2ARRAY gives erroneous output with the
option set to 'N' as well as the option set to null.
Also, CSV2ARRAY has at line 63 the unassigned variable "err" (opts =
err).
cell 1 "
cell 2 ,
cell 3 ""
cell 4 ,,
cell 5 """
cell 6 ,,,
csv data : """",",","""""",",,","""""""",",,,"
correct : "þ,þ""þ,,þ"""þ,,,
csv2fields: ""þ,þ"""þ,,þ""""þ,,,
csv2array
(opts='N'): "þþþ""þþþþ""""þþþþ
(opts='') : þþþþþþþ"þþþþ
Tony Gravagno
Nebula Research and Development
TG@ remove.pleaseNebula-RnD.com
Nebula R&D sells mv.NET and other Pick/MultiValue products
worldwide, and provides related development services
remove.pleaseNebula-RnD.com/blog
Visit PickWiki.com! Contribute!
http://Twitter.com/TonyGravagno
lucian wrote:
>When dealing with cells that have quotes only, CSV2FIELDS adds an
>extra quote.
>
>For the same test data, CSV2ARRAY gives erroneous output with the
>option set to 'N' as well as the option set to null.
>Also, CSV2ARRAY has at line 63 the unassigned variable "err" (opts =
>err).
>
>cell 1 "
>cell 2 ,
>cell 3 ""
>cell 4 ,,
>cell 5 """
>cell 6 ,,,
>
>csv data : """",",","""""",",,","""""""",",,,"
>
>correct : "�,�""�,,�"""�,,,
>
>csv2fields: ""�,�"""�,,�""""�,,,
>
>csv2array
>(opts='N'): "���""����""""����
>(opts='') : �������"����
Sorry Tony,
I didn't mean to reply to anyone in particular - I guess I've clicked
the "Reply" button at the bottom of the page without looking.
I did some testing and posted the results so the authors can fix
whatever is to fix.
It's always better to have someone else test your code, to help smooth
the wrinkles if any.
Lucian
CSV2FIELDS now fixed to only convert "escaped" quotes within a quoted
cell.
rex
Here is the revised csv.parser that works with an embedded quote +
comma. Any other special cases where this doesn't work?
/Scott
csv.parser
001 subroutine csv.parser(string0,rec)
002 * 11-20-00 asb
003 * 12-26-09 asb: [1] Handle case of embedded quote+comma in data
cell.
004 * parse comma separated value record watching out for quotes ("")
005 * attribute delimited rec is returned
006
007 string = string0 ;* don't return changed string
008
009 call clean.sub(string)
010 **subroutine clean.sub(string)
011 *** remove bad characters from a string/record/whatever
012 **bad.chars = ""
013 **for n = 0 to 31
014 ** bad.chars := char(n)
015 **next n
016 **for n = 128 to 250
017 ** bad.chars := char(n)
018 **next n
019 **convert bad.chars to "" in string
020 **return
021
022 bell = char(7) ;* [1] removed by clean.sub if any
previously existed
023 string = swap(string,'""',bell) ;* single " converted to
double "" by c
sv rules
024 rec = ""
025 an = 1
026 loop
027 if string[1,1] ne '"' then
028 result = field(string,",",1)
029 string = string[col2()+1,999999]
030 end else
031 pos = index(string,'",',1)
032 if pos eq 0 then pos = len(string)
033 result = string[2,pos-2]
034 string = string[pos+2,999999]
035 end
036 rec<an> = trim(result)
037 until string eq "" do
038 an +=1
039 repeat
040
041 rec = swap(rec,bell,'"') ;* [1] convert back to single " as
appropriate
042
043 return
> In the (unlikely) case of a data cell containing an adjacent quote and
> a comma, ...
It's wishful thinking to assume that you cannot have quotes and commas
in the same cell.
CSV is for text data interchange and that includes sentences with
quotes and commas and some times even CR:LF therefore you should not
"clean" the embedded CR:LF.
For reference see http://www.rfc-editor.org/rfc/rfc4180.txt
If all you have are well formed numbers than you don't need a
subroutine to do the job, just do a
CONVERT ",":CR:LF TO @VM:@FM IN REC
> Here is the revised csv.parser that works with an embedded quote +
> comma. Any other special cases where this doesn't work?
Fixed. No testing ?
As far as testing goes, here are the results based on the test data
shown in the e-mail thread.
cell 1 123
cell 2 "A,B",C
cell 3 XYZ
CSV data: 123,"""A,B"",C",XYZ
Results
correct : 123þ"A,B",CþXYZ
csv.parser old : 123þ"A,B"þC"þXYZ
csv.parser new : 123þ""AþB"þC"þXYZ
cell 1 "
cell 2 ,
cell 3 ""
cell 4 ,,
cell 5 """
cell 6 ,,,
CSV data : """",",","""""",",,","""""""",",,,"
Results
correct : "þ,þ""þ,,þ"""þ,,,
csv.parser old : "þþþ""þþþþ"""þþþþ
csv.parser new : ""þþþ"""þþþþ""""þþþþ
My recommendation is that the code get posted to PickWiki (or some
other site for public editing like http://cl1p.net). From there it
can be modified, everyone can see the latest version, and those who
are interested can beat it up until it's of a quality that's
acceptable to all.
T