Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

How to handle comma in the fields winthin a csv file using awk

317 views
Skip to first unread message

pH

unread,
May 17, 2002, 9:39:12 AM5/17/02
to
Hi All,

I have a csv datafile. Some of the fields have commas within the
field enlosed by double quotes. Example smith,"232 herhey dr,
MA",Pleasantville
How can I get awk to ignore commas within the two double quotes? I
already have the datafile and I cannot do much in changing the field
separator there...

Appreciate any help
pH

Brian Inglis

unread,
May 17, 2002, 10:48:24 AM5/17/02
to
On 17 May 2002 06:39:12 -0700, pha...@leasedirect.com (pH)
wrote:

It doesn't appear to be in the FAQ, but awk is not the ideal
approach for handling CSV files: Google for previous discussions
about this; but awk is great for handling tab delimited files!

--

Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Brian....@CSi.com (Brian dot Inglis at SystematicSw dot ab dot ca)
fake address use address above to reply

tos...@aol.com ab...@aol.com ab...@yahoo.com ab...@hotmail.com ab...@msn.com ab...@sprint.com ab...@earthlink.com ab...@cadvision.com ab...@ibsystems.com u...@ftc.gov
spam traps

Adrian Davis

unread,
May 17, 2002, 11:02:16 AM5/17/02
to

If you are interested in parsing CSV files in AWK I suggest you check
out "setcsv" which can be found at:-

http://www.deja.com/=dnc/getdoc.xp?AN=603309980

...it will even parse CSV files produced by Micosoft Excell, which
allows for newlines to be embedded in quoted fields!!

Regards,
=Adrian=


Dan Haygood

unread,
May 17, 2002, 1:15:51 PM5/17/02
to
awk doesn't do CSV, although, with changing times, it should, intriniscally.
There are a couple of suggestions for things like FPAT can get you to where
awk could handle csv, so look through old postings on Google's group search.

Otherwise, you're stuck with writing your own line-to-field processor that
knows about how your fields are delimited, how embedded quotes and commas
are handled, and so forth. In one such case, I was able to take advantage
of the fact that all fields were quoted, and that there was no whitespace
outside of quoted fields. I was able to take
"A","B"
and add quote-comma to the start, and comma-quote to the end.
","A","B","
Then I was able to break it apart on quote-comma-quote, using split(s, a,
/","/). This yeilds an array of four elements. The first and last are
blank.
{
line = "\"," $0 ",\""
n = split(line,array,/","/)
for (i = 2; i < n; i++)
print i ": " array[i]
}

And the embedded commas are now safe, unless they fall in between
Pascal-style quoted double-quotes. Pascal and Microsoft quote quotes by
doubling them. For instance, this would hose it:
"A","""B"",""C"""
which would be
A
"B","C"
but would be incorrectly interpretted by my code as
A
""B"
"C""

- Dan


"pH" <pha...@leasedirect.com> wrote in message
news:7ef63d4b.02051...@posting.google.com...

Christopher Hamel

unread,
May 17, 2002, 1:19:31 PM5/17/02
to
Perl also has a really nice module (Text::CSV) for working with CSV
files, and a2p will convert an AWK program to Perl pretty nicely,
minimzing the amount of work you will have to do (conceptually, at
least).

I've used Text::CSV a lot, especially in dealing with output from VBA.
Saves a lot of effort...

Disclaimer: I am not blasting AWK. In fact, I like it. Sometimes, a
screwdriver simply works better than a hammer.

Can't we all just get along?

laura fairhead

unread,
May 17, 2002, 1:43:21 PM5/17/02
to
On 17 May 2002 06:39:12 -0700, pha...@leasedirect.com (pH) wrote:

>Hi All,
>
>I have a csv datafile. Some of the fields have commas within the
>field enlosed by double quotes. Example smith,"232 herhey dr,
>MA",Pleasantville
>How can I get awk to ignore commas within the two double quotes? I

There is no way to do this with 'FS' because you would need an 'assertion'
operator like perl '(?=...)' and '(?!...)' in order to scan context ahead of
the seperator but not count the scanned area. Awk extended regular expressions
don't allow you to perform this operation so it's not possible with FS.
The meaning of the comma cannot be otherwise elucidated with FS; it's context
dependent.

>already have the datafile and I cannot do much in changing the field
>separator there...

Just use a routine to parse the fields into an array using a loop
and regular expression which matches a complete CSV field;

{
sub(/^[^"]+/,"") # skip non-quotes between fields
for(f=0;match($0,/"(""|[^"]*)*"/);sub(/^[^"]+/,"")) # parse fields in loop
{
fld[++f]=substr($0,RSTART+1,RLENGTH-2) # next field into fld[]
gsub(/""/,"\"",fld[f]) # change "" meta character to literal "
$0=substr($0,RSTART+RLENGTH,9999) # chop off field and loop
}
}


This example parses $0 as a CSV record putting the fields into the array fld[]
variable 'f' counts the number of fields extracted, index of the first field
is 1 like the split() function. Double quotes in the field are changed to
single ones and surrounding quotes are clipped off, this is under the assumption
you want to access the actual field data; modify the program as desired
if you do not. Change OFS and/or transfer the array back into $1/$2/.. to use
fields as 'awk' fields...


>
>Appreciate any help
>pH

byefrom

--
laura fairhead # la...@madonnaweb.com http://lf.8k.com
# if you are bored crack my sig.
1F8B0808CABB793C0000666667002D8E410E83300C04EF91F2877D00CA138A7A
EAA98F30C494480157B623C4EF1B508FDED1CEFA9152A23DE35D661593C5318E
630C313CD701BE92E390563326EE17A3CA818F5266E4C2461547F1F5267659CA
8EE2092F76C329ED02CA430C5373CC62FF94BAC6210B36D9F9BC4AB53378D978
80F2978A1A6E5D6F5133B67B6113178DC1059526698AFE5C17A5187E7D930492

Damon

unread,
May 17, 2002, 3:08:59 PM5/17/02
to
Brian Inglis <Brian....@SystematicSw.ab.ca> wrote in message news:<ku5aeug74ap54tn8p...@4ax.com>...

> It doesn't appear to be in the FAQ, but awk is not the ideal
> approach for handling CSV files: Google for previous discussions
> about this; but awk is great for handling tab delimited files!
>

What *would* be a better approach?

Phil Bewig

unread,
May 19, 2002, 7:25:31 PM5/19/02
to
Brian Inglis <Brian....@SystematicSw.ab.ca> wrote in message news:<ku5aeug74ap54tn8p...@4ax.com>...
> On 17 May 2002 06:39:12 -0700, pha...@leasedirect.com (pH)
> wrote:
>
> >Hi All,
> >
> >I have a csv datafile. Some of the fields have commas within the
> >field enlosed by double quotes. Example smith,"232 herhey dr,
> >MA",Pleasantville
> >How can I get awk to ignore commas within the two double quotes? I
> >already have the datafile and I cannot do much in changing the field
> >separator there...
>
> It doesn't appear to be in the FAQ, but awk is not the ideal
> approach for handling CSV files: Google for previous discussions
> about this; but awk is great for handling tab delimited files!

I agree that awk is not the proper tool for parsing csv files. Look
at
this message for my program that translates csv files to tab-separated
files, which can be easily parsed by awk:
http://groups.google.com/groups?q=csv.c&hl=en&lr=lang_en&ie=UTF8&oe=UTF8&newwindow=1&selm=455f7154.0203270634.58f33cf0%40posting.google.com&rnum=1.
It was posted to comp.lang.awk in
March 2002.

Watson Davis

unread,
May 24, 2002, 2:06:56 PM5/24/02
to

Wow.

Thanks.

Watson (the ninja of nice) Davis

0 new messages