FasterCSV vs. Ruport holy war (well, not exactly)

Gregory Brown

Apr 11, 2007, 6:21:58 PM
to Ruby Reports
This discussion might be better suited for the development list if we
end up digging deep into implementation details, but since decisions
on this kind of stuff will make an impact on users, we'll start it
here.

== Background ==

FasterCSV supports a lightweight Table / Row implementation that is
quite sufficient for basic data manipulation. It supports both
ordinal and by-field access, and provides much of the same core
functionality as Ruport's Data::Table and Data::Record.
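For readers who haven't used it: FasterCSV was later merged into Ruby 1.9 as the standard csv library, so this sketch uses CSV::Row and CSV::Table where the thread says FasterCSV::Row and FasterCSV::Table. It shows ordinal and by-field access on made-up sample data; it is an illustration, not code from either project.

```ruby
require "csv"

# A CSV::Row pairs headers with fields; the names and values here are made up.
headers = %w[first_name last_name age]
row = CSV::Row.new(headers, %w[zaphod beeblebrox 42])

row[0]            # ordinal access      => "zaphod"
row["last_name"]  # by-field access     => "beeblebrox"
row.fields        # all values in order => ["zaphod", "beeblebrox", "42"]

# A CSV::Table is essentially an Array of rows sharing the same headers.
table = CSV::Table.new([row, CSV::Row.new(headers, %w[arthur dent 30])])
table[1]["first_name"]  # => "arthur"
table.headers           # => ["first_name", "last_name", "age"]
```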

James and I have argued about design issues between the two
implementations for a while, but there are also some technical issues
that keep us from generalizing our code enough so that Ruport doesn't
need to reinvent the FCSV wheel.

Some of those issues follow, and JEG2 will jump in to give us FCSV
insight as needed, but I'm interested to hear what people think about
these things.

== Issues ==

* FCSV Tables support both by-column and by-row access, and have
triggers to change how the iterators work. I personally think of
tables as smart data structures that know how to perform column
operations, but are otherwise just a collection of Rows. JEG2 obviously
disagrees.

James, can you fill in some good examples of using the by column
access and how it works, so people can get an idea of what it looks
like? Where appropriate, I'll translate to Ruport.
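A rough sketch of the by-row / by-column switching, using Ruby's standard CSV library (FasterCSV's successor since Ruby 1.9) on a made-up two-row table. by_row!, by_col!, and by_col_or_row! change what each() yields and how an Integer index is interpreted:

```ruby
require "csv"

# Made-up sample data; converters: :numeric turns "1" into 1, etc.
table = CSV.parse("a,b\n1,2\n3,4\n", headers: true, converters: :numeric)

# Default mode (:col_or_row): Integers select rows, headers select columns.
table[0]    # => a CSV::Row (the first data row)
table["b"]  # => [2, 4]

# Row mode: each yields CSV::Row objects.
rows = []
table.by_row!.each { |row| rows << row.fields }
rows        # => [[1, 2], [3, 4]]

# Column mode: the same each yields [header, values] pairs instead.
cols = []
table.by_col!.each { |header, values| cols << [header, values] }
cols        # => [["a", [1, 3]], ["b", [2, 4]]]

# In column mode an Integer index selects a column, not a row.
table[0]    # => [1, 3]

table.by_col_or_row!  # back to the default mixed mode
```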

* FasterCSV only implements a lightweight set of features

There is no direct support for
renaming, reordering, replacing, swapping, or otherwise messing with
columns. There is no support for doing sums. There is also no
sub_table / reduce method.

There is a #values_at method that could serve as a building block for
a lot of these features.
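As a sketch of building such features on #values_at, here is column reordering, summing, and renaming against Ruby's standard CSV library (FasterCSV as merged into Ruby 1.9). rename_column is a hypothetical helper written for this example, not part of either library, and the data is made up:

```ruby
require "csv"

table = CSV.parse("a,b,c\n1,2,3\n4,5,6\n", headers: true, converters: :numeric)

# Reorder / subset columns: pull "c" then "a" out of every row with
# Row#values_at and wrap the results in a new table.
new_headers = %w[c a]
reordered = CSV::Table.new(
  table.map { |row| CSV::Row.new(new_headers, row.values_at(*new_headers)) }
)
reordered.to_a  # => [["c", "a"], [3, 1], [6, 4]]

# Sum a column: fetch it by header, then use Array#sum.
table["b"].sum  # => 7

# Hypothetical helper: rename one header, keeping field order.
def rename_column(table, from, to)
  CSV::Table.new(table.map { |row|
    CSV::Row.new(row.headers.map { |h| h == from ? to : h }, row.fields)
  })
end

rename_column(table, "b", "bee").headers  # => ["a", "bee", "c"]
```

These helpers return new tables and leave the original untouched; a destructive version would have to go through Table#[]= or delete instead.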

* Ruport's Records do not handle Rows in a way FasterCSV needs

table = [[1,2,3],[4,5,6]].to_table(%w[a b a])

In Ruport, table[0]["a"] will return 3, because the second "a" column
overwrites the first. We do consider it a bug, more or less, but it
stems from an implementation detail: Data::Record expects attribute
names to be unique, whereas FasterCSV::Row does not.

In FCSV, you could do something like:

table[0]["a"] #=> 1
table[0]["a",1] #=> 3

This is necessary for properly representing CSV data. How important
is this to Ruport users?
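For comparison, Ruby's standard CSV library (FasterCSV as merged into Ruby 1.9) keeps both "a" columns; the optional second argument to Row#[] is the minimum index to start searching from. A small sketch using the same values as the Ruport example:

```ruby
require "csv"

# Same data as the Ruport example: headers a, b, a.
table = CSV.parse("a,b,a\n1,2,3\n4,5,6\n", headers: true, converters: :numeric)

table[0]["a"]     # first field named "a"          => 1
table[0]["a", 1]  # first "a" at or after index 1  => 3
table[0].fields   # nothing is dropped             => [1, 2, 3]
table.headers     # both "a" headers are kept      => ["a", "b", "a"]
```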

== More to come ==

This may not be the only set of issues, but it'll hopefully be enough
to spark ideas. James and I will probably trade FCSV vs. Ruport
examples in this thread to try to come up with a good feel for how
close or far apart our needs are.

Mike, Dudley, et al., I'm especially interested in your feedback on
the column examples once James sends them, since I've always seen them
as gross :)

James Edward Gray II

Apr 11, 2007, 8:11:39 PM
to Ruby Reports
On Apr 11, 5:21 pm, "Gregory Brown" <gregory.t.br...@gmail.com> wrote:
> James, can you fill in some good examples of using the by column
> access and how it works, so people can get an idea of what it looks
> like?

http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/56ede4ca5ecd741d/d87c3033d105cf5b?lnk=gst&q=FasterCSV+1.0.0&rnum=1#d87c3033d105cf5b

In short, FasterCSV tries to let you work with your data in whatever
way is convenient for you.

> * FasterCSV only implements a lightweight set of features
>
> There is no direct support for
> renaming, reordering, replacing, swapping, or otherwise messing with
> columns. There is no support for doing sums. There is also no
> sub_table / reduce method.
>
> There is a #values_at method that allows you to possibly build up a
> lot of these features.

I also have delete() and delete_if(), which might be what you refer to
as "reduce." I'm not sure.
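A quick sketch of delete and delete_if with Ruby's standard CSV library (FasterCSV's successor in Ruby 1.9+), on made-up data. Both follow the table's access mode, so the same calls can remove rows or columns:

```ruby
require "csv"

table = CSV.parse("a,b,c\n1,2,3\n4,5,6\n7,8,9\n", headers: true, converters: :numeric)

# Default mode: an Integer deletes a row, a header deletes a column.
table.delete(0)    # drop the first data row
table.delete("b")  # drop column "b"
table.to_a         # => [["a", "c"], [4, 6], [7, 9]]

# Row mode: delete_if yields each CSV::Row.
table.by_row!.delete_if { |row| row["a"] > 5 }
table.to_a         # => [["a", "c"], [4, 6]]

# Column mode: delete_if yields [header, values] pairs.
table.by_col!.delete_if { |header, _values| header == "c" }
table.to_a         # => [["a"], [4]]
```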

James Edward Gray II

Gregory Brown

Apr 11, 2007, 8:20:20 PM
to ruby-r...@googlegroups.com
On 4/11/07, James Edward Gray II <ja...@grayproductions.net> wrote:
>
> On Apr 11, 5:21 pm, "Gregory Brown" <gregory.t.br...@gmail.com> wrote:
> > James, can you fill in some good examples of using the by column
> > access and how it works, so people can get an idea of what it looks
> > like?
>
> http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/56ede4ca5ecd741d/d87c3033d105cf5b?lnk=gst&q=FasterCSV+1.0.0&rnum=1#d87c3033d105cf5b

Thanks

> In short, FasterCSV tries to let you work with your data in whatever
> way is convenient for you.

It also makes your each() work differently depending on what's
convenient for you. That is my primary beef with it, for those
curious.

> > * FasterCSV only implements a lightweight set of features
> >
> > There is no direct support for
> > renaming, reordering, replacing, swapping, or otherwise messing with
> > columns. There is no support for doing sums. There is also no
> > sub_table / reduce method.
> >
> > There is a #values_at method that allows you to possibly build up a
> > lot of these features.
>
> I also have delete() and delete_if(), which might be what you refer to
> as "reduce." I'm not sure.

Sort of, but reduce() is more powerful because it works on both rows
and columns, and can reorder columns on the fly if needed. reduce is
just an alias for sub_table! (well, actually vice versa); it's a
destructive sub-table operation.

That said, we might want delete and delete_if, and could implement
sub_table/reduce on top of them.

The tests below show how it works. I'm going to add single
argument support for a range, e.g. table.reduce(1..20)

table.reduce(%w[a b c]) will already work for column reduction.

def test_reduce
  table = [ [1,2,3,4],[5,6,7,9],
            [10,11,12,13],[14,15,16,17] ].to_table(%w[a b c d])

  table.reduce(%w[b c],1..-2)
  assert_equal [[6,7],[11,12]].to_table(%w[b c]), table

  table = [ [1,2,3,4],[5,6,7,9],
            [10,11,12,13],[14,15,16,17] ].to_table(%w[a b c d])
  table.reduce(%w[c d a]) { |r| r.a < 10 }

  assert_equal [[3,4,1],[7,9,5]].to_table(%w[c d a]), table
end
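Following the idea of implementing reduce on top of delete_if, here is a minimal sketch against Ruby's standard CSV::Table (FasterCSV's successor) that replays both cases from test_reduce. reduce! is a hypothetical helper, not an API of FasterCSV, Ruby's CSV, or Ruport. Since delete_if can only drop things, column reordering is done by rebuilding each remaining row with Row#values_at:

```ruby
require "csv"

# Hypothetical destructive reduce!: keep rows in `range` (if given) and rows
# for which the block is true (if given), then keep only `columns`, in order.
def reduce!(table, columns, range = nil, &keep)
  table.by_row!
  if range
    wanted = table[range] || []  # row mode: Array-style Range slice
    table.delete_if { |row| wanted.none? { |w| w.equal?(row) } }
  end
  table.delete_if { |row| !keep.call(row) } if keep
  # Rebuild rows so the surviving columns come out in the requested order.
  rebuilt = table.map { |row| CSV::Row.new(columns, row.values_at(*columns)) }
  rebuilt.each_with_index { |row, i| table[i] = row }
  table.by_col_or_row!
end

data    = [[1, 2, 3, 4], [5, 6, 7, 9], [10, 11, 12, 13], [14, 15, 16, 17]]
headers = %w[a b c d]
build   = -> { CSV::Table.new(data.map { |fields| CSV::Row.new(headers, fields) }) }

t1 = build.call
reduce!(t1, %w[b c], 1..-2)
t1.to_a  # => [["b", "c"], [6, 7], [11, 12]]

t2 = build.call
reduce!(t2, %w[c d a]) { |r| r["a"] < 10 }
t2.to_a  # => [["c", "d", "a"], [3, 4, 1], [7, 9, 5]]
```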

James Edward Gray II

Apr 12, 2007, 3:12:00 PM
to Ruby Reports
On Apr 11, 7:20 pm, "Gregory Brown" <gregory.t.br...@gmail.com> wrote:

> On 4/11/07, James Edward Gray II <j...@grayproductions.net> wrote:
>
> > On Apr 11, 5:21 pm, "Gregory Brown" <gregory.t.br...@gmail.com> wrote:
> > > James, can you fill in some good examples of using the by column
> > > access and how it works, so people can get an idea of what it looks
> > > like?
>
> >http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/56...

>
> Thanks
>
> > In short, FasterCSV tries to let you work with your data in whatever
> > way is convenient for you.
>
> It also makes your each() work differently depending on what's
> convenient for you. That is my primary beef with it, for those
> curious.

Yes, it works in a similar fashion to Ruby's IO and String iterators.

> The tests below show how it works. I'm going to add single
> argument support for a range, e.g. table.reduce(1..20)

Interesting. I'm trying to envision what that is for. Any good
example of where that came in handy?

James Edward Gray II

Gregory Brown

Apr 12, 2007, 10:08:21 PM
to ruby-r...@googlegroups.com
On 4/12/07, James Edward Gray II <ja...@grayproductions.net> wrote:

> > The tests below show how it works. I'm going to add single
> > argument support for a range, e.g. table.reduce(1..20)
>
> Interesting. I'm trying to envision what that is for. Any good
> example of where that came in handy?

The range use case? It's just that table[0..19] gives you 20 rows
as an Array; if you want a table instead, you can use reduce or
sub_table and it'll wrap the rows for you.

It's not *super* common; I typically use the block form:

table.reduce(%w[column sub set]) { |r| r.foo < 10 }

But, I filed that ticket during a job because it came up more than once.
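The same idea sketched with Ruby's standard CSV library (FasterCSV's successor): slicing a table's rows with a Range gives a plain Array, which can be wrapped back into a CSV::Table. The 30-row table is made up for the example:

```ruby
require "csv"

# Made-up table: 30 rows with a single "n" column holding 1..30.
table = CSV::Table.new((1..30).map { |n| CSV::Row.new(%w[n], [n]) })

# by_row returns a row-mode copy, so [] with a Range slices rows.
slice = table.by_row[0..19]
slice.class                      # => Array

first20 = CSV::Table.new(slice)  # wrap the rows back up as a table
first20.size                     # => 20
first20["n"].last                # => 20
```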

Gregory Brown

Apr 12, 2007, 11:40:50 PM
to ruby-r...@googlegroups.com
On 4/11/07, Gregory Brown <gregory...@gmail.com> wrote:
> On 4/11/07, James Edward Gray II <ja...@grayproductions.net> wrote:
> >
> > On Apr 11, 5:21 pm, "Gregory Brown" <gregory.t.br...@gmail.com> wrote:
> > > James, can you fill in some good examples of using the by column
> > > access and how it works, so people can get an idea of what it looks
> > > like?
> >
> > http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/56ede4ca5ecd741d/d87c3033d105cf5b?lnk=gst&q=FasterCSV+1.0.0&rnum=1#d87c3033d105cf5b
>
> Thanks
>
> > In short, FasterCSV tries to let you work with your data in whatever
> > way is convenient for you.

I've ported all these examples to Ruport's Table. They're more
verbose but for the most part we can do all of these things.

If people like the FCSV interface better in places, let me know so I
can think about it.

# fcsv row access
table[0].class # => FasterCSV::Row

# ruport
table[0].class # => Ruport::Data::Record

#---------------------

# fcsv
table[0].fields # => ["zaphod", "beeblebrox", "42"]

# ruport
table[0].attributes # => ["zaphod","beeblebrox","42"]

#--------------------

# fcsv column access
table[:first_name] # => ["zaphod", "ara"]

#ruport
table.column("first_name") # => ["zaphod","ara"]

#--------------------

# cell access

#ruport/fcsv
table[1][0] # => "ara"

#--------------------

#fcsv
table[1][:first_name] # => "ara"

#ruport, by []
table[1]["first_name"] # => "ara"

#ruport, by get() [indifferent access]
table[1].get(:first_name) # => "ara"

#-------------------

#fcsv
table[:first_name][1] # => "ara"

#ruport
table.column("first_name")[1] # => "ara"

#--------------------

#ruport / fcsv
table << %w[james gray 30]

# fcsv
table[-1].fields # => ["james", "gray", "30"]

# ruport

table[-1].to_a # => ["james", "gray", "30"]

#---------------


# fcsv
table[:type] = "name"
table[:type] # => ["name", "name", "name"]

# ruport
# also supports :before, :after, and :position for column insertion,
# and also accepts a block, which is yielded each row object
table.add_column(:type, :default => "name")

#----------------

# fcsv
table[:ssn] = %w[123-456-7890 098-765-4321]
table[:ssn] # => ["123-456-7890", "098-765-4321", nil]

#ruport (not as pretty)
ssn = %w[123-456-7890 098-765-4321]
table.replace_column("ssn") { |r| ssn.shift }

#----------------------

# iteration ruport/fcsv
table.each do |row|
  # ...
end

#---------------------

# FCSV
table.by_col!
table.each do |col_name, col_values|
  # ...
end

# Ruport: not supported directly, but the equivalent looks like this:
table.column_names.each do |col_name|
  col_values = table.column(col_name)
  # ...
end
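To check the fcsv side of the comparison, here is a runnable sketch using Ruby's standard CSV library, which is FasterCSV merged into Ruby 1.9 (so FasterCSV::Table becomes CSV::Table). The zaphod row and the first names come from the examples above; the last_name and age header names and the second row's last_name and age are placeholders:

```ruby
require "csv"

# Placeholder data: only "zaphod beeblebrox 42" and "ara" come from the thread.
data = <<~TEXT
  first_name,last_name,age
  zaphod,beeblebrox,42
  ara,smith,30
TEXT

# The :symbol header converter turns "first_name" into :first_name.
table = CSV.parse(data, headers: true, header_converters: :symbol)

table[:first_name]     # column access   => ["zaphod", "ara"]
table[1][:first_name]  # row, then field => "ara"

# Appending an Array wraps it in a CSV::Row with the table's headers.
table << %w[james gray 30]
table[-1].fields       # => ["james", "gray", "30"]

# Assigning a scalar to a new header fills the whole column.
table[:type] = "name"
table[:type]           # => ["name", "name", "name"]

# Assigning a shorter Array leaves the remaining rows nil.
table[:ssn] = %w[123-456-7890 098-765-4321]
table[:ssn]            # => ["123-456-7890", "098-765-4321", nil]

# Column mode: each yields [header, values] pairs.
table.by_col!
table.each do |col_name, col_values|
  puts "#{col_name}: #{col_values.inspect}"
end
```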
