Noob question about performance

Edgar Ortega

Nov 5, 2018, 12:40:25 PM
to wwwmake
Hey! I've been checking out your library recently, and the first thing that comes to mind is: how does its reading performance compare against the std library?

I know I'm lazy, but I wanted to know if you have already benchmarked both reading a standard csv.

Gerald Bauer

Nov 5, 2018, 12:47:07 PM
to www...@googlegroups.com
Hello,
Welcome. Thanks for your interest in the (new) csv library /
reader. The best way is to try it :-)

Sorry I haven't run any benchmarks yet (and do not claim that it's
faster). The best way for a faster csv library is to use a
c-extension :-). See the fastcsv, fasterercsv, fastestcsv and so on
libraries.

The (new) csv library has a "front-end" and a "back-end" - the idea
is that you can plug in new parsers - and one would be with a
c-extension. Since I currently have no need, that's months off on the
roadmap.
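
To make the front-end / back-end idea concrete, here is a minimal sketch of a reader that accepts any parser object. All class names (CommaParser, TabParser, Reader) are hypothetical and chosen for illustration; this is not csvreader's actual code, and the naive split-based parsers ignore quoting.

```ruby
# Hypothetical names throughout; this is NOT csvreader's real API.

# Back-end: anything that responds to #parse(text) and returns rows.
class CommaParser
  def parse(text)
    # Naive split on commas; no quote handling.
    text.each_line.map { |line| line.chomp.split(",") }
  end
end

class TabParser
  def parse(text)
    # Naive split on tabs.
    text.each_line.map { |line| line.chomp.split("\t") }
  end
end

# Front-end: one reader interface; the parser is injected.
class Reader
  def initialize(parser)
    @parser = parser
  end

  def read(text)
    @parser.parse(text)
  end
end

comma_rows = Reader.new(CommaParser.new).read("a,b\n1,2\n")
tab_rows   = Reader.new(TabParser.new).read("a\tb\n1\t2\n")
p comma_rows
p tab_rows
```

With a design like this, a parser backed by a c-extension would only need to provide the same parse method to be swapped in.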

Anyways, if you try the (new) csv library and compare / benchmark,
let us know! All the best. Cheers. Prost.

Steffen Roller

Nov 6, 2018, 3:16:42 PM
to wwwmake
I'm fiddling with some weather data from the University of Waterloo in Ontario, Canada.

I just did some very basic testing and found CsvReader 2-3 times slower than the builtin gem,
*BUT* the builtin gem couldn't even read the file on Windows!
My tests were executed on a Linux system (Ubuntu 18.04 Bionic) with Ruby 2.5.3.

You can find my test script on PasteBin https://pastebin.com/XvPeuP3M
-st
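
For anyone who wants to repeat this kind of comparison, the following is a minimal timing sketch. It is not the PasteBin script above. The sample file, its column names, and the time_it helper are made up for illustration, and the csvreader part only runs if the gem is installed.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: runs the block and returns [result, elapsed seconds].
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [result, elapsed]
end

# Made-up sample data, loosely shaped like the weather file.
sample = Tempfile.new(["weather", ".csv"])
sample.write("year,month,day,Temperature\n")
1_000.times { |i| sample.write("2017,1,#{i % 28 + 1},-27.95667\n") }
sample.close

# Std library reader with all converters enabled.
std_table, std_secs = time_it do
  CSV.read(sample.path, headers: true, converters: :all)
end
puts format("std CSV:   %d rows in %.4fs", std_table.size, std_secs)

# csvreader is optional here; skip the comparison if it is not installed.
begin
  require "csvreader"
  new_rows, new_secs = time_it { Csv.read(sample.path) }
  puts format("csvreader: %d rows in %.4fs", new_rows.size, new_secs)
rescue LoadError
  puts "csvreader gem not installed; skipping"
end
```

A single run on a small file is noisy, so repeat the measurement several times on the real data before drawing conclusions.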

Gerald Bauer

Nov 6, 2018, 3:30:07 PM
to www...@googlegroups.com
Hello,
Thanks for testing and sharing the benchmark results.

I looked at the weather data e.g.:

year,month,day,hour,minute,Temperature,Precipitation - Tipping Bucket,Precipitation - Weighing,Solar - Incoming,Solar - Outgoing,Wind Speed - Average 4.4,Wind Speed - Gust 4.4,Wind Speed - Average 2.0,Wind Speed - Gust 2.0,Wind Direction,RH,Pressure,Soil Moisture - 5 cm,Soil Moisture - 10 cm,Soil Moisture - 20 cm,Battery,Dew Point
2017, 1, 1, 0, 0, -27.95667, 0.00000, 0.00000, 1.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 329.00000, 80.50667, 103.26868, -9999.90039, -9999.90039, 0.28500, 4.11733, -30.30333,

One plus of the new csvreader is that it supports many flavors /
formats / dialects out-of-the-box without any configuration. In the
case above that would be the CSV <3 Numerics format [1].

Try changing:

Csv.read(FILE, { headers: true, converters: :all })

to

Csv.num.read(FILE, headers: true )   ## or Csv.numeric.read(FILE, headers: true )

and it should be faster (in theory - always benchmark, of course)
because the data conversion pipeline is seriously broken (and will get
replaced / redone ). See What's Your Type? [2] on the inside story /
details.
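
To illustrate why a numeric-only reader can do less work per field, here is a minimal sketch contrasting a generic "try each converter in turn" pipeline with a direct float conversion. It is illustrative only and is neither csvreader's nor the std library's implementation; CONVERTERS, convert_generic and convert_numeric are made-up names.

```ruby
# Illustrative only: NOT csvreader's or the std library's actual code.

# Generic pipeline: try each converter in turn until one succeeds.
CONVERTERS = [
  ->(s) { Integer(s) },
  ->(s) { Float(s) },
]

def convert_generic(field)
  CONVERTERS.each do |conv|
    begin
      return conv.call(field)
    rescue ArgumentError
      next   # this converter did not match; try the next one
    end
  end
  field      # no converter matched; keep the original string
end

# Numeric-only pipeline: every field is assumed to be a number.
def convert_numeric(field)
  Float(field)
end

line   = "2017, 1, 1, 0, 0, -27.95667, 0.00000"
fields = line.split(",")

generic = fields.map { |f| convert_generic(f) }
numeric = fields.map { |f| convert_numeric(f) }

p generic
p numeric
```

The generic path pays for a failed Integer attempt (and a rescued exception) on every decimal value, while the numeric path makes one call per field. Whether that difference matters on real data still has to be measured.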

Thanks again. Cheers. Prost.

[1] https://github.com/csvspecs/csv-numerics
[2] https://github.com/csvreader/docs/blob/master/csv-types.md
Message has been deleted

Gerald Bauer

Nov 7, 2018, 3:22:31 PM
to www...@googlegroups.com
Hello,
Thanks for the benchmark updates. Greatly appreciated. Great to see
that the theory holds up to reality (that is, that the numeric is
faster than the :converters => :all option) :-). Cheers. Prost.


PS: Note: The parser, that is, ParserStd [1], is the same for numeric
and "default" (with or without converters). For numeric, the "data
converter pipeline" is different. The parser is different, however,
for tab, table, strict and others. See the base.rb script [2] for all
parser configurations (if interested).

[1] https://github.com/csvreader/csvreader/blob/master/lib/csvreader/parser_std.rb
[2] https://github.com/csvreader/csvreader/blob/master/lib/csvreader/base.rb