CSV Parser

Paul Borman

Jun 29, 2011, 1:33:11 PM
to golang-dev
I would like to offer the CSV parser I have written for possible inclusion in the standard Go library.  I realize that there is no official standard for CSV files, which makes CSV parsing problematic.  For guidance I followed RFC 4180, which defines the CSV MIME type.  I wrote the module to handle a conservative set of CSV files used internally, but I tried to make the package more broadly useful (for example, it handles multi-line fields, which we have no need for ourselves).

Below is the output of "godoc csv"

    -Paul

PACKAGE

package csv
import "csv"

Package csv reads comma-separated values (CSV) files.

A csv file contains zero or more records of one or more fields per record.
Each record is separated by the newline character. The final record may
optionally be followed by a newline character.

field1,field2,field3

White space is considered part of a field.

Blank lines are ignored.  A line with only whitespace characters (excluding
the ending newline character) is not considered a blank line.
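
The white space and blank line rules above can be sketched with the encoding/csv package in today's standard library, whose API resembles but is not identical to the draft posted here. The helper name parseLoose and the sample input are illustrative assumptions, and FieldsPerRecord = -1 is the current package's way of disabling the field-count check.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLoose is a hypothetical helper. It parses src with today's
// encoding/csv and disables the per-record field-count check so that
// records of different lengths are accepted.
func parseLoose(src string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(src))
	r.FieldsPerRecord = -1 // assumption: current stdlib API, not the draft
	return r.ReadAll()
}

func main() {
	// Line 2 is blank and is skipped. Line 3 holds a single space,
	// so it is not blank and yields a one-field record.
	records, err := parseLoose("a, b ,c\n\n \nd,e,f\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, rec := range records {
		fmt.Printf("%q\n", rec)
	}
}
```

Note that the spaces around b are kept as part of the field, matching the rule that white space belongs to the field.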

Fields which start and stop with the quote character " are called
quoted-fields.  The beginning and ending quote are not part of the
field.

The source:

normal string,"quoted-field"

results in the fields

·normal string·quoted-field·

Within a quoted-field a quote character followed by a second quote
character is considered a single quote.

"the ""word"" is true","a ""quoted-field"""

results in

·the "word" is true·a "quoted-field"·

Newlines and commas may be included in a quoted-field.

"Multi-line
field","comma is ,"

results in

·Multi-line
field·comma is ,·
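
As a rough check of the quoting rules above, the three sample inputs can be fed to the encoding/csv package in today's standard library (an assumption: it is not the draft API shown in this message, although it follows the same quoting rules). The helper name parseAll is hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseAll is a hypothetical helper that reads every record from src
// using the default settings of today's encoding/csv reader.
func parseAll(src string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(src)).ReadAll()
}

func main() {
	// The three examples from the package documentation, one per record.
	// The third record contains a newline inside a quoted-field.
	src := `normal string,"quoted-field"
"the ""word"" is true","a ""quoted-field"""
"Multi-line
field","comma is ,"
`
	records, err := parseAll(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, rec := range records {
		// %q makes embedded quotes and newlines visible.
		fmt.Printf("%q\n", rec)
	}
}
```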


VARIABLES

var (
    ErrTrailingComma = &Error{"extra delimiter at end of line"}
    ErrBareQuote     = &Error{"bare \" in non-quoted-field"}
    ErrQuote         = &Error{"extraneous \" in field"}
    ErrFieldCount    = &Error{"wrong number of fields in line"}
)
These are the errors that can be returned in ParseError.Error.


FUNCTIONS

func ReadFile(path string) (records [][]string, err os.Error)
ReadFile reads and returns all the records from the file path.


TYPES

type Error struct {
    os.ErrorString
}

type ParseError struct {
    Line   int      // Line where the error occurred
    Column int      // Column (byte index) where the error occurred
    Error  os.Error // The actual error
}
A ParseError is returned for parsing errors.
The first line is 1.  The first column is 0.

func (e *ParseError) String() string

type Reader struct {
    RFC4180         bool // Enforce RFC 4180 restrictions
    Comma           int  // Field delimiter (set to ',' by NewReader)
    FieldsPerRecord int  // Number of expected fields per record (optional)
    // contains filtered or unexported fields
}
A Reader reads records from a CSV encoded file.

A Reader is created using NewReader.  Individual records are read from a
Reader using Read.  ReadAll will read all remaining records.

The behavior of the Reader can be altered through its public elements.

The Comma element determines the field delimiter.  It defaults to ','.

If the FieldsPerRecord is greater than zero, Read requires each record to
have the given number of fields.

If RFC4180 is true, Read requires the input to conform to the restrictions
made by RFC 4180:

  - If FieldsPerRecord is 0, Read sets it to the number of fields
    in the first record and then requires future records to have the
    same field count.
  - A quote must not appear in an unquoted field.
  - A non-doubled quote must not appear in a quoted field.
  - A record must not end in an unquoted empty field.

If RFC4180 is false, Read ignores lines that begin with #.

func NewReader(r io.Reader) *Reader
NewReader returns a new Reader that reads from r.

func (r *Reader) Read() (record []string, err os.Error)
Read reads one record from r.  The record is a slice of strings with each
string representing one field.

func (r *Reader) ReadAll() (records [][]string, err os.Error)
ReadAll reads all the remaining records from r as a slice of records.
Each record in the slice is represented as a slice of strings with each
string representing one field.

peterGo

Jun 29, 2011, 6:29:02 PM
to golang-dev
Paul,

[golang-dev] CSV library?
http://groups.google.com/group/golang-nuts/browse_thread/thread/a7da7b1719be5631/

CSV Package
http://groups.google.com/group/golang-nuts/browse_thread/thread/686db3f92a00df15/

Have you tested your program with all versions of Microsoft Office,
OpenOffice, LibreOffice, etc?

Peter

Russ Cox

Jun 29, 2011, 6:34:56 PM
to peterGo, golang-dev
> Have you tested you program with all versions of Microsoft Office,
> OpenOffice, LibreOffice, etc?

As long as it implements the standard (the RFC)
I am happy to let others deal with testing against
the infinity of CSV generators.
