Why is Ruby so slow?
Venherm Borchers  
Newsgroups: comp.lang.ruby
From: Venherm.Borch...@t-online.de (Venherm Borchers)
Date: Mon, 18 Mar 2002 18:47:26 GMT
Local: Mon, Mar 18 2002 1:47 pm
Subject: Why is Ruby so slow?
WHY IS RUBY SO SLOW?

I implemented a _DataReader_ class in Ruby and Python. The reader:

  - reads in a CSV file, in this case tab-separated,
  - gets variable names from the header line,
  - splits up each row into single items,
  - checks for and counts missing values,
  - determines the type of the item - using regular expressions -
    (integer, float, or else classified as string), and
  - counts the number of unique items in each column

finally outputting a short report on what it found. This report is
quite useful even if you later perform data mining tasks on the same
data with other tools.
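The type-detection step in the list above can be sketched in current Ruby as follows. This is a minimal illustration, not the poster's code: it reuses the two regular expressions from `prelyze` (anchored with `\A`/`\z` instead of `^`/`$`), gives a column the highest-ranked type found among its items, and leaves out the "Set" classification, which depends on the unique-item count. The names `item_rank` and `column_type` are hypothetical.

```ruby
# Minimal sketch (hypothetical helper names, not the poster's code).
# Ranks follow prelyze: 0 = NA, 1 = Integer, 2 = Continuous, 3 = String.
INTEGER_RE    = /\A\s*[+-]?\d+\s*\z/
CONTINUOUS_RE = /\A\s*[+-]?(?:\d+\.\d*|\d*\.\d+)\s*\z/
TYPE_NAMES    = %w[NA Integer Continuous String].freeze

# Rank of a single item; missing values rank lowest.
def item_rank(item, missing)
  return 0 if item == missing
  return 1 if INTEGER_RE.match?(item)
  return 2 if CONTINUOUS_RE.match?(item)
  3
end

# A column takes the highest rank found among its items,
# e.g. one float in an integer column makes it Continuous.
def column_type(items, missing = "?")
  rank = items.map { |item| item_rank(item, missing) }.max || 0
  TYPE_NAMES[rank]
end

p column_type(%w[1 2 -3])    # prints "Integer"
p column_type(%w[1 2.5 ?])   # prints "Continuous"
p column_type(%w[1 abc])     # prints "String"
```

Taking the maximum rank means a single non-numeric value turns the whole column into a String column, which mirrors the `[ctype, n].max` updates in the original loop.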

The implementation is straightforward, with no attempt at optimization
in this first version. I tested it on a fairly large data file of 4.3 MB
containing 1.6 million data items, most of them integers.

Here are the running times for some available Ruby implementations
under Windows:

Implementation                  1,600,000 items    320,000 items

Ruby 1.6.5-2                    17:10 min          46 sec
Ruby 1.6.6-0                    18:43 min          58 sec
Ruby 1.7.2 (i586-mswin32)       18:05 min          54 sec

As a comparison, I implemented the method in Python too, with the
following results:

Python 2.1.1 (Zope)                58 sec          10 sec
Python 2.2                         49 sec           9 sec
Active State Python 2.2a           49 sec          11 sec

I also tested the data with the _read.table_ function of the free
statistical package *R*, which has almost the same functionality (in a
way, I modeled my reader on it):

R::read.table                      30 sec           2 sec

One can see that the Python implementation compares reasonably with
such a well-known package.  Unfortunately, the Ruby implementation of
the same method is *unacceptably* slow.

I have had experience with similar text analysis tasks, where I split
some 5,000 news messages into words and then counted and stored these
words for retrieval and for determining the similarity between the
news articles.

Ruby was 20-30% slower than Python in that task, which I could readily
accept because Ruby is such a nice language. But the time differences
above will kill my project, I'm afraid.

The tests were done on a 1.1 GHz Pentium III PC under Windows 2000 with
512 MB of main memory. I didn't try Linux because the final application
has to run under MS Windows anyway.

So for me the question remains: why is Ruby so unbelievably slow
(roughly 5-20 times slower than Python) in this task, especially for
larger data sets?
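The slowdown factors behind this question can be computed directly from the timing tables. The sketch below only re-uses the posted numbers (converted to seconds) and compares each Ruby build with Python 2.2; the constant names are hypothetical.

```ruby
# Times in seconds, copied from the tables above:
# [1,600,000 items, 320,000 items].
RUBY_TIMES = {
  "Ruby 1.6.5-2" => [17 * 60 + 10, 46],
  "Ruby 1.6.6-0" => [18 * 60 + 43, 58],
  "Ruby 1.7.2"   => [18 * 60 + 5,  54],
}.freeze
PYTHON_22 = [49, 9].freeze # Python 2.2

# For each Ruby build: slowdown vs. Python 2.2 on each data set, and how
# much its own time grows when the input is 5 times larger.
RESULTS = RUBY_TIMES.map do |name, (big, small)|
  [name, [big.fdiv(PYTHON_22[0]), small.fdiv(PYTHON_22[1]), big.fdiv(small)]]
end.to_h

RESULTS.each do |name, (vs_big, vs_small, growth)|
  printf("%-13s %4.1fx and %4.1fx slower; time grows %4.1fx\n",
         name, vs_big, vs_small, growth)
end
printf("%-13s time grows %4.1fx\n", "Python 2.2", PYTHON_22[0].fdiv(PYTHON_22[1]))
```

With these numbers Ruby is about 21-23 times slower on the large file and 5-6.5 times slower on the small one. Ruby's own time grows roughly 19-22 times for 5 times more data, while Python 2.2's grows about 5.4 times, which suggests (but does not prove) a cost that grows faster than linearly with the data rather than a constant interpreter overhead.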

Many thanks,  Hans Werner.
______________________________________________________________________

Loading data set test.dat...
10001 rows loaded, of required length 32.
2.824 secs needed.

  0              Id:    TYPE Integer (10000 items, 0 missing).
  1              V1:    TYPE Set (2 items, 0 missing).
  2              V2:    TYPE Integer (75 items, 0 missing).
  3              V3:    TYPE Set (2 items, 0 missing).
  4              V4:    TYPE Set (6 items, 0 missing).
  5              V5:    TYPE Integer (885 items, 0 missing).
  6              V6:    TYPE Integer (467 items, 0 missing).
  7              V7:    TYPE Integer (402 items, 0 missing).
  8              V8:    TYPE Set (9 items, 0 missing).
  9              V9:    TYPE Integer (19 items, 0 missing).
 10             V10:    TYPE Integer (70 items, 0 missing).
 11             V11:    TYPE Integer (1653 items, 0 missing).
 12             V12:    TYPE Integer (1316 items, 0 missing).
 13             V13:    TYPE Integer (52 items, 0 missing).
 14             V14:    TYPE Set (6 items, 0 missing).
 15             V15:    TYPE Set (2 items, 0 missing).
 16             V16:    TYPE Integer (29 items, 0 missing).
 17             V17:    TYPE Integer (49 items, 0 missing).
 18             V18:    TYPE Integer (69 items, 0 missing).
 19             V19:    TYPE Integer (13 items, 0 missing).
 20             V20:    TYPE Set (11 items, 0 missing).
 21             V21:    TYPE Set (9 items, 0 missing).
 22             V22:    TYPE Integer (15 items, 0 missing).
 23             V23:    TYPE Integer (19 items, 0 missing).
 24             V24:    TYPE Set (10 items, 0 missing).
 25             V25:    TYPE Integer (15 items, 0 missing).
 26             V26:    TYPE Set (12 items, 0 missing).
 27             V27:    TYPE Integer (17 items, 0 missing).
 28             V28:    TYPE Integer (15 items, 0 missing).
 29             V29:    TYPE Integer (25 items, 0 missing).
 30             V30:    TYPE Set (2 items, 0 missing).
 31          Target:    TYPE Set (2 items, 0 missing).

48.655 secs needed.
______________________________________________________________________

module CSV

def parse_line(line, sep="\t", missing='?', comment='#')
    line.chomp!
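    # line[0, 1] is a one-character String in every Ruby version;
    # line[0] returned an Integer before Ruby 1.9, so comments never matched.
    if line == '' or line[0, 1] == comment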
        fields  = []
        nfields = 0
    else
        fields  = line.split(sep)
        nfields = fields.length
    end
    return nfields, fields
end

end #module

### --  c l a s s  DataReader  ---------------------------------------

class DataReader

include CSV

def initialize(fname, header=true, sep="\t", missing="?", comment="#")
### ------------------------------------------------
    @fname    = fname;
    @header   = header;         @hfields = []
    @dtypes   = [];             @dfields = []
    @nrows    = 0;              @ncols   = 0
    @sep      = sep;            @missing  = missing
    @comment  = comment
### ------------------------------------------------
end

def load(logging=false)
    t1 = Time.now
    if logging
        puts
        puts "---------------------------------------------- LOADING DATA ----"
        puts "Loading data set #{@fname}..."
    end
    csvFile = File.open(@fname, 'r')

    if @header
        @ncols, @hfields = parse_line(csvFile.gets, \
                            sep=@sep, missing=@missing, comment=@comment)
    else
        raise "Not Implemented Error."
    end
    @row = []; @col = []
    @row[0] = @hfields
    (0...@ncols).each { |j| @col << [] }

    no_short = 0;  no_long = 0
    ln_short = []; ln_long = []

    n = 0
    while line = csvFile.gets
        n += 1
        m, fields = parse_line(line, \
                            sep=@sep, missing=@missing, comment=@comment)
        if m == 0 then next end
        # fill row up with NA character or cut if too long
        if m < @ncols
            no_short +=1; ln_short << n+1
            (@ncols - m).times { fields << @missing }
        elsif m > @ncols
            no_long += 1; ln_long << n+1
            fields = fields[0...@ncols]
        end

        @row[n] = fields
        (0...@ncols).each { |j| @col[j] << fields[j] }
    end
    csvFile.close
    @nrows = @row.size

    t2 = Time.now
    if logging
        puts "#{@nrows} rows loaded, of required length #{@ncols}."
        if no_short > 0
            puts "#{no_short} rows too short: #{ln_short[0]}, ..."
        end
        if no_long > 0
            puts "#{no_long} rows too long: #{ln_long[0]}, ..."
        end
        puts "#{t2 - t1} secs needed."
        puts
    end

end

def prelyze(logging=false, missing=@missing)
    t1 = Time.now
    dtypes = {0 => 'NA', 1 => 'Integer', 2 => 'Continuous',
              3 => 'String', 4 => 'Set'}
    @dtypes = []
    for j in (0...@ncols) do
        ctype = 0; mitms = 0
        @col[j].each { |item|
            if item == missing
                ctype = [ctype, 0].max
                mitms += 1
            elsif item =~ /^\s*[+\-]?\d+\s*$/
                ctype = [ctype, 1].max
            elsif item =~ /^\s*[+\-]?(?:\d+\.\d*|\d*\.\d+)\s*$/
                ctype = [ctype, 2].max
            else
                ctype = [ctype, 3].max
            end
        }

        nitms = (@col[j]-['']).nitems
        if 0 < nitms and nitms <= 12 and nitms <= 0.1*(@nrows-mitms)
            ctype = 4
        end
        ctype = dtypes[ctype]
        @dtypes << ctype

        if logging
            puts "#{j.to_s.rjust(3)} #{(@row[0][j]).rjust(15)}:\tTYPE #{ctype} (#{nitms} items, #{mitms} missing)."
        end
    end

    t2 = Time.now
    if logging
        puts
        puts "#{t2 - t1} secs needed."
        puts "----------------------------------------------------------------"
        puts "    Copyright (C) 2001, Data Mining Center."
        puts
    end

end

### --  accessor functions --

attr_reader :nrows, :ncols
attr_reader :dtypes

def nrow(); @nrows; end
def ncol(); @ncols; end
def hfields(); @row[0]; end
def [](i, j); @row[i][j]; end
def col(j); @col[j]; end
def row(i); @row[i]; end

end #class

### --  m a i n ( )  ------------------------------------------------#

    tData = DataReader.new("test2.dat", header=true, \
                sep="\t", missing="", comment="%")
    tData.load(logging=true)
    tData.prelyze(logging=true)


 
Kent Dahl  
Newsgroups: comp.lang.ruby
From: Kent Dahl <ken...@stud.ntnu.no>
Date: Mon, 18 Mar 2002 22:02:58 +0100
Local: Mon, Mar 18 2002 4:02 pm
Subject: Re: Why is Ruby so slow?

Venherm Borchers wrote:

> WHY IS RUBY SO SLOW?

Please see the "file reading impossibly slow?" thread.

 http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/35367

HTH

--
<[ Kent Dahl ]>================<[ http://www.stud.ntnu.no/~kentda/ ]>
  )__(stud.techn.;  ind. econ & management: computer technology)__(
 /"Opinions expressed are mine and not those of my Employer,      "\
( "the University, my girlfriend, stray cats, banana fruitflies,  " )
 \"nor the frontal lobe of my left cerebral hemisphere.           "/


 
Curt Hibbs  
Newsgroups: comp.lang.ruby
From: "Curt Hibbs" <c...@hibbs.com>
Date: Mon, 18 Mar 2002 20:28:56 GMT
Local: Mon, Mar 18 2002 3:28 pm
Subject: RE: Why is Ruby so slow?

Kent Dahl wrote:

> Venherm Borchers wrote:

> > WHY IS RUBY SO SLOW?

> Please see the "file reading impossibly slow?" thread.

>  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/35367

The thread you referenced above said that this problem is fixed in 1.7, but
Venherm provided a 1.7.2 test which showed it to be nearly as slow as the
other versions of Ruby.

If Venherm's numbers are correct, it looks like there is still a problem
here.

Curt


 
Kent Dahl  
Newsgroups: comp.lang.ruby
From: Kent Dahl <ken...@stud.ntnu.no>
Date: Mon, 18 Mar 2002 21:32:38 +0100
Local: Mon, Mar 18 2002 3:32 pm
Subject: Re: Why is Ruby so slow?

Curt Hibbs wrote:
> The thread you referenced above said that this problem is fixed in 1.7, but
> Venherm provided a 1.7.2 test which showed it to be nearly as slow as the
> other versions of Ruby.

IIRC, the problem was very dependent on how Ruby is compiled on
Windows, with regard to Cygwin, MinGW etc. I was under the impression
that it was fixed as far as Linux goes, but that portability to Windows
left something to be desired. The last thing I remember was a post that
commented on some "tricks" Cygwin was apparently doing. (Not too sure, as I
started reading it with only half a brain, after the problem sounded
Winblows-specific :-)



 
Jim Freeze  
Newsgroups: comp.lang.ruby
From: Jim Freeze <j...@freeze.org>
Date: Mon, 18 Mar 2002 20:45:34 GMT
Local: Mon, Mar 18 2002 3:45 pm
Subject: Re: Why is Ruby so slow?

On Tue, Mar 19, 2002 at 03:41:31AM +0900, Venherm Borchers wrote:
> WHY IS RUBY SO SLOW?

>     if @header
>         @ncols, @hfields = parse_line(csvFile.gets, \
>                             sep=@sep, missing=@missing, comment=@comment)

Have you tried using something other than gets?

Jim


 
Phil Tomson  
Newsgroups: comp.lang.ruby
From: pt...@shell1.aracnet.com (Phil Tomson)
Date: 18 Mar 2002 21:08:18 GMT
Local: Mon, Mar 18 2002 4:08 pm
Subject: Re: Why is Ruby so slow?
In article <INEGJNJOFAMNDPNEABNEOEFEDCAA.c...@hibbs.com>,

Perhaps, but as I recall he showed that the 1.7 test he ran was under
Cygwin (Windows and Cygwin); once again, could Cygwin be the culprit in
that case?

Phil


 
Matt Armstrong  
Newsgroups: comp.lang.ruby
From: Matt Armstrong <m...@lickey.com>
Date: Mon, 18 Mar 2002 21:44:11 GMT
Local: Mon, Mar 18 2002 4:44 pm
Subject: Re: Why is Ruby so slow?

Kent Dahl <ken...@stud.ntnu.no> writes:
> Curt Hibbs wrote:
>> The thread you referenced above said that this problem is fixed in 1.7, but
>> Venherm provided a 1.7.2 test which showed it to be nearly as slow as the
>> other versions of Ruby.

> IIRC, the problem was very dependant on how it is compiled up on
> Windows, with regards to Cygwin, MinGW etc. I was under the
> impression that it was fixed as far as Linux goes, but that
> portability to Windows left something to be desired. Last I remember
> was a post that commented on some "tricks" Cygwin apparently was
> doing. (Not too sure, as I started reading it with only
> half-a-brain, after the problem sounded Winblows specific :-)

With respect to IO, 1.7.* native windows should be as fast as cygwin
windows or Unix.  That is what I find with empirical tests.

--
matt


 
Matt Armstrong  
Newsgroups: comp.lang.ruby
From: Matt Armstrong <m...@lickey.com>
Date: Mon, 18 Mar 2002 21:49:44 GMT
Local: Mon, Mar 18 2002 4:49 pm
Subject: Re: Why is Ruby so slow?

Venherm.Borch...@t-online.de (Venherm Borchers) writes:
> WHY IS RUBY SO SLOW?

We probably can't tell just by reading your code. You may have
identified an area where Ruby can be improved, or where its speed
under Windows isn't good.

To rule out the IO slowness mentioned earlier, you could write a
simple program like this:

    datafile = File.open("test.dat")    # your data file
    save = []
    while line = datafile.gets
        save << line
    end
    datafile.close

And time that on your data files.  That'll tell you how much time it
takes Ruby to read in the file and store it in an array.  The rest of
the overhead will likely be in your string manipulation code.
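A self-contained version of this timing test, using the standard `benchmark` and `tempfile` libraries, might look like the sketch below. It generates a hypothetical tab-separated sample file instead of using the poster's data (the row count and columns are assumptions), and also times `IO.readlines`, which reads every line in one call. Absolute numbers will depend on the Ruby version, platform and disk.

```ruby
require "benchmark"
require "tempfile"

# Hypothetical test data: a header plus `rows` tab-separated lines.
def make_sample(path, rows = 100_000)
  File.open(path, "w") do |f|
    f.puts(%w[Id V1 V2].join("\t"))
    rows.times { |i| f.puts([i, i % 2, i % 75].join("\t")) }
  end
end

# The loop suggested above: read with gets and keep every line.
def read_with_gets(path)
  save = []
  File.open(path, "r") do |datafile|
    while (line = datafile.gets)
      save << line
    end
  end
  save
end

# Alternative: let IO.readlines build the array in one call.
def read_with_readlines(path)
  IO.readlines(path)
end

Tempfile.create("sample") do |tmp|
  tmp.close
  make_sample(tmp.path)
  Benchmark.bm(10) do |bm|
    bm.report("gets:")      { read_with_gets(tmp.path) }
    bm.report("readlines:") { read_with_readlines(tmp.path) }
  end
end
```

If reading alone takes only a small fraction of the total run time, the remaining time is spent in the splitting and type-checking code rather than in IO.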

Also, Ruby has a built-in profiler. Run your program like "ruby -r
profile your_program.rb". Make sure to use a much smaller data set,
since profiling will really slow your program down.

But I find that I like RBProf much better:

    http://aspectr.sourceforge.net/rbprof/

It is a little quirky, but it gives you more information and doesn't
profile every built-in function, so the program doesn't run anywhere
near as slowly.

--
matt


 
Stathy G. Touloumis  
Newsgroups: comp.lang.ruby
From: "Stathy G. Touloumis" <stathy.toulou...@edventions.com>
Date: Mon, 18 Mar 2002 22:03:37 GMT
Local: Mon, Mar 18 2002 5:03 pm
Subject: RE: Why is Ruby so slow?
Can the test be provided online so that we can test with other builds?


 
nobu.nokada  
Newsgroups: comp.lang.ruby
From: nobu.nok...@softhome.net
Date: Tue, 19 Mar 2002 02:29:48 GMT
Local: Mon, Mar 18 2002 9:29 pm
Subject: Re: Why is Ruby so slow?
Hi,

At Tue, 19 Mar 2002 03:41:31 +0900,

Venherm Borchers wrote:
> WHY IS RUBY SO SLOW?

> I implemented a _DataReader_ class in Ruby and Python. The reader:
(snip)
> Here are the running times for some available Ruby implementations
> under Windows:

>         ________data items______1,600,000_________320,000_______

> Ruby 1.6.5-2                    17:10 min          46 sec
> Ruby 1.6.6-0                    18:43 min          58 sec
> Ruby 1.7.2 (i586-mswin32)       18:05 min          54 sec

tData.load ran in 2.824 secs, but tData.prelyze spent 48.655
secs; that is definitely too slow.

The problem is possibly here: Array#- builds a hash internally, so
it is a little expensive. Try:

        nitms = @col[j].nitems - @col[j].grep(/^$/).nitems

Or it may be better to count nitms inside the @col[j].each block.
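Counting inside the loop could look like the following sketch in current Ruby. It assumes the intended figure is the number of distinct non-empty values (which is what the "items" column in the report of the original post appears to show) and uses a Hash as a set, so each column is scanned once. `column_counts` is a hypothetical name, not part of the original DataReader.

```ruby
# Hypothetical helper: one pass over a column, counting missing items and
# collecting distinct non-empty values as Hash keys.
def column_counts(col, missing = "")
  seen = {}
  missing_count = 0
  col.each do |item|
    if item == missing
      missing_count += 1
    elsif item != ""
      seen[item] = true # repeated values overwrite the same key
    end
  end
  [seen.size, missing_count] # [distinct items, missing items]
end

nitms, mitms = column_counts(["1", "0", "1", "", "0"])
puts "#{nitms} items, #{mitms} missing"   # prints "2 items, 1 missing"
```

In prelyze the missing count is already computed in the type-detection loop, so the Hash update could be merged into that same block instead of scanning the column a second time.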

--
Nobu Nakada


 
Jim Freeze  
Newsgroups: comp.lang.ruby
From: Jim Freeze <j...@freeze.org>
Date: Tue, 19 Mar 2002 13:59:56 GMT
Local: Tues, Mar 19 2002 8:59 am
Subject: Re: Why is Ruby so slow?
On Tue, Mar 19, 2002 at 06:48:43AM +0900, Matt Armstrong wrote:
> To rule out the IO slowness mentioned earlier, you could write a
> simple program like this:

>     save = []
>     while line = datafile.gets
>         save << line
>     end

On a Sun machine, I changed the above to

   save = IO.readlines(file)

and got about a 50% improvement.

7.81u 1.73s 0:13.95 68.3%  # 4 line method
4.48u 1.54s 0:06.25 96.3%  # 1 line method

dir bigfile
-rw-rw-r--   1 jfn      cad      40500000 Mar 19 08:47 bigfile
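The "about 50%" figure can be checked from the two timing lines. This small sketch assumes they are csh-style `time` output, where the first field is user CPU seconds and the third is elapsed (wall-clock) time; the constant and method names are hypothetical.

```ruby
# Measurements from the two runs above, in seconds.
GETS_LOOP  = { user: 7.81, elapsed: 13.95 }.freeze
READLINES  = { user: 4.48, elapsed: 6.25 }.freeze
FILE_BYTES = 40_500_000 # size of bigfile from the listing

# Fraction of time saved by switching from the gets loop to IO.readlines.
def reduction(before, after)
  1 - after / before
end

%i[user elapsed].each do |kind|
  printf("%-8s time reduced by %.0f%%\n", kind,
         reduction(GETS_LOOP[kind], READLINES[kind]) * 100)
end
printf("readlines throughput: %.1f MB/s (elapsed)\n",
       FILE_BYTES / READLINES[:elapsed] / 1e6)
```

User time drops by about 43% and elapsed time by about 55%, so "about 50%" is a fair summary of both.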

--
Jim Freeze


 