[go-nuts] Re: [golang-dev] CSV library?

Giles Lean

Apr 29, 2010, 12:28:27 AM
to Kyle Consalus, golan...@googlegroups.com, golan...@googlegroups.com

[ Probably belongs on golang-nuts? cc'd there and Reply-to: set ]

Kyle Consalus <cons...@gmail.com> wrote:

> Is there an interest in having a simple CSV package in the Go standard
> packages?

Sure, if not _too_ simple: to make the standard packages (where
the Go team are starting to push back about additions) I'd like
to see it handle quoting etc so that it can be _the_ CSV package.

So:

Yes, if it's good enough (reading and writing) that nobody's
likely to feel the need to write another one;

No, if there are known popular or semi-popular CSV variants out there
that aren't catered for.

And of course, I speak only for me, not for the Powers that Be
who decide what gets included.

Good luck,

Giles

hotei

Apr 29, 2010, 12:45:46 AM
to golang-nuts
Kyle,
You might get some responses if you included a little more information
about what it is you're proposing. Provide an interface spec, maybe?
Which 'public' functions do you intend to provide? A sentence or
two about who might be interested in using it would help too. As proposed,
I'm not sure if you're into spreadsheets or particle physics.

Hotei

Kyle Consalus

Apr 29, 2010, 2:37:15 AM
to hotei, golang-nuts
Sure. I was going to just present the code, but I wanted to check if it was actually something of interest first.

The current interface: 

package csv

type Reader struct{}

func NewReader(io.ReadByter) *Reader
func (*Reader) ReadRow() ([]string, os.Error)
func ReadAll(io.ReadByter) ([][]string, os.Error)

type Writer struct{}

func NewWriter(io.Writer) *Writer
func (*Writer) WriteRow([]string) os.Error
func WriteAll(io.Writer, [][]string) os.Error


It looks at each byte of the input once, and it avoids almost all
unnecessary allocations. Quoting and whitespace removal are handled properly.
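
As an illustration of the single-pass idea described above, here is a minimal sketch of splitting one record into fields while honoring double-quoted fields and doubled-quote escapes. It is not the gocsv implementation: the function name parseRow is hypothetical, it peeks one byte ahead to detect an escaped quote, it handles only a single line, and it does no whitespace trimming.

```go
package main

import (
	"errors"
	"fmt"
)

// parseRow is a hypothetical helper (not from gocsv). It splits one CSV
// record into fields in a single forward pass over the bytes of line.
func parseRow(line string) ([]string, error) {
	var fields []string
	var cur []byte // bytes of the field currently being built
	inQuotes := false
	for i := 0; i < len(line); i++ {
		c := line[i]
		switch {
		case inQuotes && c == '"':
			if i+1 < len(line) && line[i+1] == '"' {
				cur = append(cur, '"') // a doubled quote is an escaped quote
				i++
			} else {
				inQuotes = false // closing quote
			}
		case inQuotes:
			cur = append(cur, c) // commas inside quotes are plain data
		case c == '"' && len(cur) == 0:
			inQuotes = true // opening quote at the start of a field
		case c == ',':
			fields = append(fields, string(cur))
			cur = cur[:0] // reuse the buffer; string(cur) already copied it
		default:
			cur = append(cur, c)
		}
	}
	if inQuotes {
		return nil, errors.New("unterminated quoted field")
	}
	return append(fields, string(cur)), nil
}

func main() {
	fields, err := parseRow(`a,"b,c","say ""hi""",`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d fields: %q\n", len(fields), fields)
}
```

Reusing one byte buffer for every field keeps the allocation count close to one per field, which is in the spirit of the claim above, though a real reader would also need to handle records that span lines.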

There is the matter of supporting different dialects (Excel, for example), and I would probably want to add a settings struct to handle that, but I wanted to get some validation on the basic design before I did that.




fge...@gmail.com

Apr 29, 2010, 4:04:42 PM
to golang-nuts
I would think that the relevant C or C++ code from the TPOP book (The Practice of Programming) might be useful.
http://cm.bell-labs.com/cm/cs/tpop/code.html

Andrew Gerrand

Apr 29, 2010, 7:51:43 PM
to Kyle Consalus, hotei, golang-nuts
The interface looks pretty sane to me. I'd like to see the underlying
implementation.

Does CSV usually include unicode? I suppose it's an ill- (or un-)
specified format, so it probably supports UTF8 by default.

Russ Cox

Apr 29, 2010, 8:09:38 PM
to Andrew Gerrand, Kyle Consalus, hotei, golang-nuts
This would be a great package to make available via goinstall.
The interface seems fine, though the word Row seems unnecessary.
But the real problem is what does CSV mean?

I wrote a program a few years ago to accept data from
a commercial application and it seemed easiest to have
the user export to CSV and give me that. After the umpteenth
time it broke because there was yet another special case
I didn't know about in the CSV, I gave up. We changed
the process to be export to Excel instead. Parsing .xls files
was far easier, or at least more well defined.

Russ

Kyle Consalus

Apr 30, 2010, 1:43:14 PM
to Andrew Gerrand, hotei, golang-nuts
On Thu, Apr 29, 2010 at 4:51 PM, Andrew Gerrand <a...@golang.org> wrote:
> The interface looks pretty sane to me. I'd like to see the underlying
> implementation.

The current implementation is visible at code.google.com/p/gocsv/source/browse/csv.go.
It hasn't been tested heavily, but it works with all of the CSV I've needed to parse.
I'll probably want to create a settings struct like Python's "csv.Dialect" to make it possible
to parse some wackier variants.


> Does CSV usually include unicode? I suppose it's an ill- (or un-)
> specified format, so it probably supports UTF8 by default.

You are correct, I believe.

Kyle Consalus

Apr 30, 2010, 1:56:09 PM
to r...@golang.org, Andrew Gerrand, hotei, golang-nuts
On Thu, Apr 29, 2010 at 5:09 PM, Russ Cox <r...@golang.org> wrote:
> This would be a great package to make available via goinstall.
>
> The interface seems fine, though the word Row seems unnecessary.

Perhaps. I was concerned that if I used just "Read", it wouldn't be clear if
a cell, a row, or a whole csv file were being read.

> But the real problem is what does CSV mean?
>
> I wrote a program a few years ago to accept data from
> a commercial application and it seemed easiest to have
> the user export to CSV and give me that.  After the umpteenth
> time it broke because there was yet another special case
> I didn't know about in the CSV, I gave up.  We changed
> the process to be export to Excel instead.  Parsing .xls files
> was far easier, or at least more well defined.

You won't find me arguing that CSV is a good data format.
However, it is a common format, and I created the library out of a real need to parse it.
For it to work for everything that people call CSV, I'll most likely have to allow nearly every
behavior to be configurable (I'm in the process of doing that now for space trimming).

Can you remember the special cases that broke you so I can make sure to handle them?


Andrew Gerrand

May 2, 2010, 9:33:53 PM
to Kyle Consalus, r...@golang.org, hotei, golang-nuts
This sounds like a sane approach to me.

Did you know that many European versions of Excel export CSV files
using semicolons as separators? (In much of Europe, a comma is used to
separate the whole part from the fractional part of a decimal number.)
And this behaviour is not configurable! Crazy.
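
To make the semicolon case concrete, the sketch below reads semicolon-separated input and then converts a decimal-comma number. It uses the encoding/csv package from the current Go standard library rather than the package proposed in this thread, and the helper names readSemicolon and parseDecimalComma are hypothetical. The conversion assumes the value contains no thousands separators.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// readSemicolon is a hypothetical helper that parses input whose fields
// are separated by semicolons instead of commas.
func readSemicolon(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = ';' // the separator is a field on encoding/csv's Reader
	return r.ReadAll()
}

// parseDecimalComma is a hypothetical helper that converts "3,14" to 3.14.
// Assumption: the string has at most one comma and no thousands separators.
func parseDecimalComma(s string) (float64, error) {
	return strconv.ParseFloat(strings.Replace(s, ",", ".", 1), 64)
}

func main() {
	rows, err := readSemicolon("name;price\nwidget;3,14\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	price, err := parseDecimalComma(rows[1][1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rows[1][0], price)
}
```

Because the comma no longer separates fields, values such as 3,14 arrive intact without quoting, which is presumably why those exports pick the semicolon.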

Andrew