Gmail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
TumbleDRYer (#53)
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  13 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Ruby Quiz  
View profile  
 More options Oct 28 2005, 4:58 pm
Newsgroups: comp.lang.ruby
From: Ruby Quiz <ja...@grayproductions.net>
Date: Sat, 29 Oct 2005 05:58:50 +0900
Local: Fri, Oct 28 2005 4:58 pm
Subject: [QUIZ] TumbleDRYer (#53)
The three rules of Ruby Quiz:

1.  Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2.  Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3.  Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =-=-=

by Hugh Sasse

The Pragmatic Programmers (and others) recommend that you remove repetition from
code.  They call this principle DRY: Don't Repeat Yourself.  Benefits include
not having to track down multiple places to make a change affecting the whole
system, prevention of inconsistency, as well as reduced debugging time.
Sometimes code ends up being repetitive.  Take this MySQL example from:

        http://manuals.rubyonrails.com/read/chapter/48

        CREATE TABLE `authors` (
          `id` int(11) NOT NULL auto_increment,
          `firstname` varchar(50) NOT NULL default '',
          `name` varchar(50) NOT NULL default '',
          `nickname` varchar(50) NOT NULL default '',
          `contact` varchar(50) NOT NULL default '',
          `password` varchar(50) NOT NULL default '',
          `description` text NOT NULL,
          PRIMARY KEY  (`id`)
        ) TYPE=MyISAM AUTO_INCREMENT=3 ;

        CREATE TABLE `categories` (
          `id` int(11) NOT NULL auto_increment,
          `name` varchar(20) NOT NULL default '',
          `description` varchar(70) NOT NULL default '',
          PRIMARY KEY  (`id`)
        ) TYPE=MyISAM AUTO_INCREMENT=3 ;

        CREATE TABLE `categories_documents` (
          `category_id` int(11) NOT NULL default '0',
          `document_id` int(11) NOT NULL default '0',
        ) TYPE=MyISAM ;

        CREATE TABLE `documents` (
          `id` int(11) NOT NULL auto_increment,
          `title` varchar(50) NOT NULL default '',
          `description` text NOT NULL,
          `author_id` int(11) NOT NULL default '0',
          `date` date NOT NULL default '0000-00-00',
          `filename` varchar(50) NOT NULL default '',
          PRIMARY KEY  (`id`),
          KEY `document` (`title`)
        ) TYPE=MyISAM AUTO_INCREMENT=14 ;

clearly there's a lot of repetition in there.  It would be nice if it were
possible to eliminate it.  This quiz is about writing a program which we'll call
TumbleDRYer, which, given repetitive input, will create ruby code to generate
that input, such that the generating program has much less repetition in it.
One way might be to generate a program like this:

        # Undoes the work of TumbleDRYer :-)
        class WashingMachine
          def initialize
            @create = 'CREATE TABLE'
            @type = 'TYPE=MyISAM'
            @varchar_50 = "varchar(50) NOT NULL default '',"
            @varchar_20 = "varchar(20) NOT NULL default '',"
            @int_11 = "int(11) NOT NULL auto_increment,"
            @int_11_0 = "int(11) NOT NULL default '0',"
            @pk = "PRIMARY KEY  (`id`)"
            @auto = "AUTO_INCREMENT="
          end

          def output
            print <<EOF
           #{@create} `authors` (
             `id` #{@int_11}
             `firstname` #{@varchar_50}
             `name` #{@varchar_50}
             `nickname` #{@varchar_50}
             `contact` #{@varchar_50}
             `password` #{@varchar_50}
             `description` text NOT NULL,
             #{@pk}
           ) #{@type} #{@auto}3 ;

           #{@create} `categories` (
             `id` #{@int_11}
             `name` #{@varchar_20}
             `description` varchar(70) NOT NULL default '',
             #{@pk}
           ) #{@type} #{@auto}3 ;

           #{@create} `categories_documents` (
             `category_id` #{@int_11_0}
             `document_id` #{@int_11_0}
           ) #{@type} ;

           #{@create} `documents` (
             `id` #{@int_11}
             `title` #{@varchar_50}
             `description` text NOT NULL,
             `author_id` #{@int_11_0}
             `date` date NOT NULL default '0000-00-00',
             `filename` #{@varchar_50}
             #{@pk},
             KEY `document` (`title`)
           ) #{@type} #{@auto}14 ;
 EOF
          end
        end

        WashingMachine.new.output

or something of the kind.   Or it might be worth using ERB.

Obviously the more human readable to the resulting program is the better: the
point is to produce something that simplifies maintenance, rather than data
compression.  Techniques from data compression would seem to be useful, however.
Building up a dictionary of character groups and their frequencies might be
useful, and various strategies are possible for how to fragment the string into
repeated chunks, such as: the longest repeated substring; splitting on some kind
of boundary; or techniques from Ziv-Lempel coding to create the groups.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Bill Guindon  
View profile  
 More options Oct 28 2005, 7:06 pm
Newsgroups: comp.lang.ruby
From: Bill Guindon <agori...@gmail.com>
Date: Sat, 29 Oct 2005 08:06:32 +0900
Local: Fri, Oct 28 2005 7:06 pm
Subject: Re: [QUIZ] TumbleDRYer (#53)
On 10/28/05, Ruby Quiz <ja...@grayproductions.net> wrote:

[snip]

> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =-=-=

> by Hugh Sasse

[snip]

>         CREATE TABLE `documents` (
>           `id` int(11) NOT NULL auto_increment,
>           `title` varchar(50) NOT NULL default '',
>           `description` text NOT NULL,
>           `author_id` int(11) NOT NULL default '0',
>           `date` date NOT NULL default '0000-00-00',
>           `filename` varchar(50) NOT NULL default '',
>           PRIMARY KEY  (`id`),
>           KEY `document` (`title`)
>         ) TYPE=MyISAM AUTO_INCREMENT=14 ;

This looks like a fun one to me, as it's one of the main things I use
Ruby for.  Looking forward to seeing what others do with it -- and
hoping to clean up an example of my own.

Since I had to rtfm [1] for a couple simple SQL questions (yeah, my
SQL is probably rustier than most), thought I'd save others the time.

q) Why doesn't the 'description' field have a 'default'?
a) "BLOB and TEXT columns cannot be assigned a default value."

q) How should you treat columns that allow NULLs (none of the examples do)?
a) "If neither NULL nor NOT NULL is specified, the column is treated
as though NULL had been specified."

[1] http://dev.mysql.com/doc/refman/5.0/en/create-table.html

--
Bill Guindon (aka aGorilla)


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Bill Guindon  
View profile  
 More options Oct 28 2005, 8:02 pm
Newsgroups: comp.lang.ruby
From: Bill Guindon <agori...@gmail.com>
Date: Sat, 29 Oct 2005 09:02:03 +0900
Local: Fri, Oct 28 2005 8:02 pm
Subject: Re: [QUIZ] TumbleDRYer (#53)
On 10/28/05, Bill Guindon <agori...@gmail.com> wrote:

nm, I seriously misread the challenge.  either way, looking forward to
it, and might take a stab at it.

> --
> Bill Guindon (aka aGorilla)

--
Bill Guindon (aka aGorilla)

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dave Burt  
View profile  
 More options Oct 28 2005, 8:29 pm
Newsgroups: comp.lang.ruby
From: "Dave Burt" <d...@burt.id.au>
Date: Sat, 29 Oct 2005 00:29:40 GMT
Local: Fri, Oct 28 2005 8:29 pm
Subject: Re: [QUIZ] TumbleDRYer (#53)

Bill Guindon wrote:
> Since I had to rtfm [1] for a couple simple SQL questions (yeah, my
> SQL is probably rustier than most), thought I'd save others the time.

> q) Why doesn't the 'description' field have a 'default'?
> a) "BLOB and TEXT columns cannot be assigned a default value."

I'm pretty sure that's DBMS-specific (MySQL), and I'm pretty sure SQLite
(for example) allows this. I'm actually not sure about other major DBMSes.

> q) How should you treat columns that allow NULLs (none of the examples
> do)?
> a) "If neither NULL nor NOT NULL is specified, the column is treated
> as though NULL had been specified."

That's standard SQL. Most schema-DDLs include optional columns; I like to
omit both the NULL and the NOT.

While we're looking at the MySQL DDL:

q) What's with the backticks (`)?
a) That's MySQL's way of quoting column names. They're unnecessary as long
as the name is alphabetic and not a keyword. I don't know of any other DBMS
that uses backticks for this.

q) What's with the TYPE=?
a) Again, a MySQL feature - MySQL supports different storage engines (and
here was I thinking a DBMS was supposed to manage storage...) and this
specifies which to use. MyISAM is the crippled default one which doesn't
support referential integrity.

q) Why doesn't auto_increment work with other databases?
a) Because defining an auto-increment field is not specified in SQL-92. They
can all do it, though.

q) Why are data types, "default" and "auto_increment" in lowercase when
other keywords are uppercase?
a) I don't know.

For what it's worth, MySQL 5 is a lot better than MySQL 4 (now with in-line
views, I believe), but I'd still recommend Postgresql or really anything
else over it.

Cheers,
Dave


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "what version of ruby will run on osx 10.2" by Donald Huebschman
Donald Huebschman  
View profile  
 More options Oct 31 2005, 10:34 am
Newsgroups: comp.lang.ruby
From: Donald Huebschman <dhueb...@mac.com>
Date: Tue, 1 Nov 2005 00:34:11 +0900
Local: Mon, Oct 31 2005 10:34 am
Subject: what version of ruby will run on osx 10.2
I have a beige g3 that apple will not support past 10.2. What version  
of ruby is the latest version that will run on a 10.2 system and  
where can I get it. 1.8.3 does not compile. Is there anything else I  
would need to run ruby?

Thanks,
Donald


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michal Suchanek  
View profile  
 More options Oct 31 2005, 3:17 pm
Newsgroups: comp.lang.ruby
From: Michal Suchanek <hramr...@gmail.com>
Date: Tue, 1 Nov 2005 05:17:46 +0900
Local: Mon, Oct 31 2005 3:17 pm
Subject: Re: what version of ruby will run on osx 10.2
On 10/31/05, Donald Huebschman <dhueb...@mac.com> wrote:

> I have a beige g3 that apple will not support past 10.2. What version
> of ruby is the latest version that will run on a 10.2 system and
> where can I get it. 1.8.3 does not compile. Is there anything else I
> would need to run ruby?

Of course it does not. Look at the patches included in the fink
package for 10.3.
I guess the package would build on 10.2 as well. But I cannot try myself.
Specifically I am not sure the environ patch would work on 10.2 as well.

hth

Michal Suchanek


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "[SOLUTION] TumbleDRYer (#53)" by Bob Showalter
Bob Showalter  
View profile  
 More options Nov 1 2005, 2:10 pm
Newsgroups: comp.lang.ruby
From: Bob Showalter <bob_showal...@taylorwhite.com>
Date: Wed, 2 Nov 2005 04:10:12 +0900
Local: Tues, Nov 1 2005 2:10 pm
Subject: [QUIZ][SOLUTION] TumbleDRYer (#53)
My approach was to look for repeated "phrases" made up of one or more
"words" within a line. Running my solution against the sample input data
produces the following output:

class WashingMachine
   def initialize
     @id = '`id` int(11) NOT NULL auto_increment,'
     @va = 'varchar(50) NOT NULL default \'\','
     @in = 'int(11) NOT NULL default \'0\','
     @no = 'NOT NULL default'
     @ty = ') TYPE=MyISAM'
     @de = '`description`'
     @cr = 'CREATE TABLE'
     @pr = 'PRIMARY KEY'
   end
   def output
print <<EOF
#{@cr} `authors` (
   #{@id}
   `firstname` #{@va}
   `name` #{@va}
   `nickname` #{@va}
   `contact` #{@va}
   `password` #{@va}
   #{@de} text NOT NULL,
   #{@pr} (`id`)
#{@ty} AUTO_INCREMENT=3 ;

#{@cr} `categories` (
   #{@id}
   `name` varchar(20) #{@no} '',
   #{@de} varchar(70) #{@no} '',
   #{@pr} (`id`)

#{@ty} AUTO_INCREMENT=3 ;
   #{@cr} `categories_documents` (
   `category_id` #{@in}
   `document_id` #{@in}
#{@ty} ;

#{@cr} `documents` (
   #{@id}
   `title` #{@va}
   #{@de} text NOT NULL,
   `author_id` #{@in}
   `date` date #{@no} '0000-00-00',
   `filename` #{@va}
   #{@pr} (`id`),
   KEY `document` (`title`)
#{@ty} AUTO_INCREMENT=14 ;
EOF
   end
end

Here's my solution:

require 'strscan'
require 'abbrev'

class TumbleDRYer

   # minimum length of a phrase to consider
   MIN_PHRASE = 10

   # minimum times a phrase must occur to consider
   MIN_OCCUR = 3

   # minimum length for abbreviation
   MIN_ABBR = 2

   def initialize(string)
     @input = string
   end

   def dry

     # this will accumulate a list of repeated phrases to condense
     phrases = Array.new

     # this will receive the abbreviation for each phrase
     abbr = Hash.new

     lines = @input.to_a
     loop do

       # process the input data by lines. we find "phrases" by
       # first finding the  start and end of each "word" in the line,
       # and then combining those words into longer phrases. for
       # each phrase, we count the number of times it occurs in the
       # total input.
       phr = Hash.new
       lines.each do |line|
         s = StringScanner.new(line)
         words = Array.new
         loop do
           s.scan_until(/(?=\S)/) or break
           beg = s.pos
           s.scan(/\S+/)
           words << [ beg, s.pos ]
         end
         require 'pp'

         # combine words to make 'phrases'
         combos(words)

         # accumulate phrases, counting their occurences.
         # skip phrases that are too short.
         words.each do |from, to|
           p = line[from, to - from]
           next unless p.length >= MIN_PHRASE
           phr[p] ||= 0
           phr[p] += 1
         end
       end

       # get the longest phrase that occurs the most times
       longest = phr.sort_by { |k,v| -(k.length * 1000 + v)
         }.find { |k,v| v >= MIN_OCCUR } or break
       phrase, occurs = longest

       # save the phrase, and then blank it out of the input data
       # so we can search for more phrases
       phrases << phrase
       lines.each { |line| line.gsub!(phrase, ' ' * phrase.length) }

     end

     # now we have all the phrases we want to replace.
     # find unique abbreviations for each phrase.
     temp = Hash.new
     phrases.each do |phrase|
       key = phrase.scan(/\w+/).flatten.to_s.downcase
       key = '_' + key unless key =~ /^[_a-zA-Z]/
       key += '_' while temp.has_key? key
       temp[key] = phrase
     end
     temp.keys.abbrev.sort.each do |s, key|
       phrase = temp[key]
       abbr[phrase] = s if abbr[phrase].nil? ||
         abbr[phrase].length < MIN_ABBR
     end

     # generate the output class
     puts "class WashingMachine"
     puts "  def initialize"
     phrases.each do |phrase|
       puts '    @' + abbr[phrase] + " = '" +
         phrase.gsub("'", "\\\\'") + "'"
       @input.gsub!(phrase, '#{@' + abbr[phrase] + '}')
     end
     puts "  end\n"
     puts "  def output\nprint <<EOF"
     puts @input
     puts "EOF\n  end\n"
     puts "end"

   end

   private

   def combos(arr, max = arr.size - 1, i = 0)
     (i+1..max).each do |j|
       arr << [ arr[i][0], arr[j][1] ]
     end
     combos(arr, max, i + 1) if i < max - 1
   end

end

TumbleDRYer.new(ARGF.read).dry


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hugh Sasse  
View profile  
 More options Nov 1 2005, 2:42 pm
Newsgroups: comp.lang.ruby
From: Hugh Sasse <h...@dmu.ac.uk>
Date: Wed, 2 Nov 2005 04:42:12 +0900
Local: Tues, Nov 1 2005 2:42 pm
Subject: Re: [QUIZ][SOLUTION] TumbleDRYer (#53)
Interesting solution.  I've overlooked a lot of the functionality of
strscan, so that is worth knowing about.  The approach for
generating mnemonics was nice as well, as well as the reverse sort
by negation.
        Thank you,
        Hugh

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "TumbleDRYer (#53)" by Brian Schröder
Brian Schröder  
View profile  
 More options Nov 1 2005, 4:34 pm
Newsgroups: comp.lang.ruby
From: Brian Schröder <ruby.br...@gmail.com>
Date: Wed, 2 Nov 2005 06:34:08 +0900
Local: Tues, Nov 1 2005 4:34 pm
Subject: Re: [QUIZ] TumbleDRYer (#53)
I decided to cheat, here is my solution:

bschroed@oerfi:~/svn/projekte/tumblezip$ ./tumblezip.rb tumblezip.rb >
tumblezip.tumbled.rb
bschroed@oerfi:~/svn/projekte/tumblezip$ cat tumblezip.tumbled.rb
require "zlib"
puts Zlib::Inflate.inflate(DATA.read)
__END__
xSVÔ/-.ÒOÊÌÓ/*Mªäâ*J-,Í,JUP¯ÊÉLRçâ*(-)VPÕ      
+4u¸@@U,³²òÌKËI,IÕËÐ▒.!zE(c))0åJññ(r)~.ññJ:
FT­á▒äîÑÈ1}/
bschroed@oerfi:~/svn/projekte/tumblezip$ ruby tumblezip.tumbled.rb
#!/usr/bin/ruby

require 'zlib'

puts %(require "zlib"),
     %(puts Zlib::Inflate.inflate(DATA.read)),
     "__END__",
     Zlib::Deflate.deflate(ARGF.read)

It definitely fails on human readability, though compression and dry
is quite good ;-)

cheers,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hugh Sasse  
View profile  
 More options Nov 2 2005, 5:31 am
Newsgroups: comp.lang.ruby
From: Hugh Sasse <h...@dmu.ac.uk>
Date: Wed, 2 Nov 2005 19:31:31 +0900
Local: Wed, Nov 2 2005 5:31 am
Subject: Re: [QUIZ] TumbleDRYer (#53)

[ multipart_mixed_part < 1K ]
  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

On Wed, 2 Nov 2005, Brian Schröder wrote:
> I decided to cheat, here is my solution:

        [...]

> xSVÔ/-.Ã’OÊÌÓ/*Mªäâ*J-,à ,JUP¯ÊÉLRçâ*(-)VPÕ      
> +4u¸@@U,³²òÌKËI,IÕËà ▒.!zE(c))0åJññ(r)~.ññJ:
> FT­á▒äîÑÈ1}/
> bschroed@oerfi:~/svn/projekte/tumblezip$ ruby tumblezip.tumbled.rb
> #!/usr/bin/ruby

> require 'zlib'

> puts %(require "zlib"),
>      %(puts Zlib::Inflate.inflate(DATA.read)),
>      "__END__",
>      Zlib::Deflate.deflate(ARGF.read)

> It definitely fails on human readability, though compression and dry

:-)  Well, as the reason for this idea was clearly an aid to code
maintenance, I think I'd fail this, but you'd have to get marks for
subversion, in both the subversIVE and svn senses of the word :-)

> is quite good ;-)

> cheers,

> Brian

        Thank you,
        Hugh

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ruby Quiz  
View profile  
 More options Nov 3 2005, 8:44 am
Newsgroups: comp.lang.ruby
From: Ruby Quiz <ja...@grayproductions.net>
Date: Thu, 3 Nov 2005 22:44:29 +0900
Local: Thurs, Nov 3 2005 8:44 am
Subject: [SUMMARY] TumbleDRYer (#53)
Great quiz Hugh!  This is one of those unique ideas that was a lot of fun to
play with.  There's always some neat things to be accomplished with code that
writes code.

I want to take a look at Bob Showalter's solution below.  It's really just one
linear procedure, so I'll show all the code and then break it down:

        require 'strscan'
        require 'abbrev'

        class TumbleDRYer

          # minimum length of a phrase to consider
          MIN_PHRASE = 10

          # minimum times a phrase must occur to consider
          MIN_OCCUR = 3

          # minimum length for abbreviation
          MIN_ABBR = 2

          def initialize(string)
            @input = string
          end

          def dry

            # this will accumulate a list of repeated phrases to condense
            phrases = Array.new

            # this will receive the abbreviation for each phrase
            abbr = Hash.new

            lines = @input.to_a
            loop do

              # process the input data by lines. we find "phrases" by
              # first finding the start and end of each "word" in the line,
              # and then combining those words into longer phrases. for
              # each phrase, we count the number of times it occurs in the
              # total input.
              phr = Hash.new
              lines.each do |line|
                s = StringScanner.new(line)
                words = Array.new
                loop do
                  s.scan_until(/(?=\S)/) or break
                  beg = s.pos
                  s.scan(/\S+/)
                  words << [ beg, s.pos ]
                end

                # combine words to make 'phrases'
                combos(words)

                # accumulate phrases, counting their occurences.
                # skip phrases that are too short.
                words.each do |from, to|
                  p = line[from, to - from]
                  next unless p.length >= MIN_PHRASE
                  phr[p] ||= 0
                  phr[p] += 1
                end
              end

              # get the longest phrase that occurs the most times
              longest = phr.sort_by { |k,v| -(k.length * 1000 + v)
                }.find { |k,v| v >= MIN_OCCUR } or break
              phrase, occurs = longest

              # save the phrase, and then blank it out of the input data
              # so we can search for more phrases
              phrases << phrase
              lines.each { |line| line.gsub!(phrase, ' ' * phrase.length) }

            end

            # now we have all the phrases we want to replace.
            # find unique abbreviations for each phrase.
            temp = Hash.new
            phrases.each do |phrase|
              key = phrase.scan(/\w+/).flatten.to_s.downcase
              key = '_' + key unless key =~ /^[_a-zA-Z]/
              key += '_' while temp.has_key? key
              temp[key] = phrase
            end
            temp.keys.abbrev.sort.each do |s, key|
              phrase = temp[key]
              abbr[phrase] = s if abbr[phrase].nil? ||
                abbr[phrase].length < MIN_ABBR
            end

            # generate the output class
            puts "class WashingMachine"
            puts "  def initialize"
            phrases.each do |phrase|
              puts '    @' + abbr[phrase] + " = '" +
                phrase.gsub("'", "\\\\'") + "'"
              @input.gsub!(phrase, '#{@' + abbr[phrase] + '}')
            end
            puts "  end\n"
            puts "  def output\nprint <<EOF"
            puts @input
            puts "EOF\n  end\n"
            puts "end"

          end

          private

          def combos(arr, max = arr.size - 1, i = 0)
            (i+1..max).each do |j|
              arr << [ arr[i][0], arr[j][1] ]
            end
            combos(arr, max, i + 1) if i < max - 1
          end

        end

        TumbleDRYer.new(ARGF.read).dry

That's the code, save one unused line I removed.

The first thing I do when reading code is to try and understand the process.  To
get the lay of the land, we need to know what's called where.  The path of
execution is trivial to find here.  The last line is the only code, besides a
couple of require statements, outside of a class definition.  We know that's
what Ruby will run.

That line tells us that we need to be examining two methods
TumbleDRYer#initialize and TumbleDRYer#dry.  Don't be shy with that scrollbar
now.  Zip around until you find them.  TumbleDRYer#initialize is easy enough,
all it does is record the passed input.  TumbleDRYer#dry is a monster, so let's
come back to it.

Anything else in this file?  Well, we can see that two standard libraries are
used, strscan and abbrev.  We can also see three constants at the top of the
class that start to give us ideas about how the code will break down the
sections it is meant to simplify.

Beyond that, there's only one other method, which we can assume is a helper to
TumbleDRYer#dry.  It's smaller, so let's see if we can figure it out first.  Uh
oh, recursion!  My brain just melted and leaked out of my ear.  Okay, let's see
if we can cheat and just call it.  Copy and paste and irb to the rescue:

        >> def combos(arr, max = arr.size - 1, i = 0)
        >>   (i+1..max).each do |j|
        ?>     arr << [ arr[i][0], arr[j][1] ]
        >>   end
        >>   combos(arr, max, i + 1) if i < max - 1
        >> end
        => nil
        >> combos( (1..10).to_a )
        => nil

Or not.  I was hoping for a more informative return value obviously.

Okay, it looks like I actually need to read a little.  I see that `arr` is a
parameter of the method.  That's how I decided on a sample data set.  A little
farther in I can see `arr << ...`.  Bingo.  It modifies the array.  Now I know
how to check it:

        >> test_data = (1..10).to_a
        => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        >> combos test_data
        => nil
        >> test_data
        => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, [1, 1], [1, 1], [1, 0], [1, 0], [1, 1],
        [1, 1], [1, 0], [1, 0], [1, 1], [0, 1], [0, 0], [0, 0], [0, 1], [0, 1],
        [0, 0], [0, 0], [0, 1], [1, 0], [1, 0], [1, 1], [1, 1], [1, 0], [1, 0],
        [1, 1], [0, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 1], [1, 1], [1, 1],
        [1, 0], [1, 0], [1, 1], [0, 1], [0, 0], [0, 0], [0, 1], [1, 0], [1, 0],
        [1, 1], [0, 0], [0, 1], [1, 1]]

Yuck.  In the dictionary under "unhelpful", you will find the above listing.
More reading is required.

Well, if I finish the line I stopped at last time, I find this little nugget:
`[ arr[i][0], arr[j][1] ]`.  That makes me think `arr` is expected to be an
Array of Arrays.  It looks like the subarrays only need two items, since the
indexes are 0 and 1.  Third try is a charm:

        >> test_data = [1, 3, 5, 7, 9].map { |n| [n, n + 1] }
        => [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
        >> combos test_data
        => nil
        >> test_data
        => [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [1, 4], [1, 6], [1, 8], [1, 10],
        [3, 6], [3, 8], [3, 10], [5, 8], [5, 10], [7, 10]]

That looks promising.  Now what are we seeing?  Obviously it added some
groupings to the end of the Array.  1 was already seen with 2, but it also
paired it with 4, 6, 8, and 10.  That's all of the other second elements.  Then
3 is paired with 6, 8, and 10 and it was already paired with 4, which leaves
only 2 missing.  Now I see the pattern.  Each first element is paired with all
of the second elements that come after it, in the original Array.

Even knowing how that works, I'm not sure I understand what that is actually for
yet.  I'll file it away in the back of my mind and see if I'm ready to tackle
TumbleDRYer#dry.

Here's the first chunk of that method, to save you some scrolling:

        def dry

          # this will accumulate a list of repeated phrases to condense
          phrases = Array.new

          # this will receive the abbreviation for each phrase
          abbr = Hash.new

          lines = @input.to_a
          loop do

            # process the input data by lines. we find "phrases" by
            # first finding the start and end of each "word" in the line,
            # and then combining those words into longer phrases. for
            # each phrase, we count the number of times it occurs in the
            # total input.
            phr = Hash.new
            lines.each do |line|
              s = StringScanner.new(line)
              words = Array.new
              loop do
                s.scan_until(/(?=\S)/) or break
                beg = s.pos
                s.scan(/\S+/)
                words << [ beg, s.pos ]
              end

              # combine words to make 'phrases'
              combos(words)

              # accumulate phrases, counting their occurrences.
              # skip phrases that are too short.
              words.each do |from, to|
                p = line[from, to - from]
                next unless p.length >= MIN_PHRASE
                phr[p] ||= 0
                phr[p] += 1
              end
            end

            # get the longest phrase that occurs the most times
            longest = phr.sort_by { |k,v| -(k.length * 1000 + v)
              }.find { |k,v| v >= MIN_OCCUR } or break
            phrase, occurs = longest

            # save the phrase, and then blank it out of the input data
            # so we can search for more phrases
            phrases << phrase
            lines.each { |line| line.gsub!(phrase, ' ' * phrase.length) }

          end

          # ...

Luckily, that's well commented code.  The comments easily explain what the
variables are for, and then we get an introduction to the process in `loop do
.. end`.  As soon as I read that, TumbleDRYer#combos clicks into place.

The Array of Arrays passed to TumbleDRYer#combos holds a start and end position
for each word.  The start positions are combined with all of the end positions
after that word to give the possible phrases.

Before the call to TumbleDRYer#combos, we can see the Array of Arrays being
created, with a little StringScanner work.  Words are found by scanning
non-whitespace characters, because replacing at any other boundaries pretty much
kills the human readable goal of the output.

Just beneath that call to TumbleDRYer#combos the phrases are assembled, checked
for the minimum length, and tallied in the phrase count.  The chunk of code
right after that selects the longest phrase that occurs the most times for
substitution.  Note the clever use of negation in #sort_by here, to reverse the
order.  The last couple of lines save the phrase and replace it with whitespace,
so it won't match in future iterations.

Remember, this is all inside of a big `loop do ... end`, so by the time we break
out of here, all the replacement phrases will have been selected.

          # ...

          # now we have all the phrases we want to replace.
          # find unique abbreviations for each phrase.
          temp = Hash.new
          phrases.each do |phrase|
...

read more »


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "Late Ruby Quiz Notice (was Re: [SUMMARY] TumbleDRYer (#53))" by James Edward Gray II
James Edward Gray II  
View profile  
 More options Nov 4 2005, 7:56 pm
Newsgroups: comp.lang.ruby
From: James Edward Gray II <ja...@grayproductions.net>
Date: Sat, 5 Nov 2005 09:56:00 +0900
Subject: Late Ruby Quiz Notice (was Re: [SUMMARY] TumbleDRYer (#53))
On Nov 3, 2005, at 7:44 AM, Ruby Quiz wrote:

> Ruby Quiz will take a week off now...

Just FYI, there will be a Ruby Quiz on the 11th, but it will be  
later.  My return flight doesn't land until that evening and I can't  
launch it until I'm back home.  It will make the right day, but not  
at the usual morning time.  Thanks for your patients.

James Edward Gray II


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Anthony Moralez  
View profile  
 More options Nov 5 2005, 7:57 pm
Newsgroups: comp.lang.ruby
From: Anthony Moralez <anthony.mora...@gmail.com>
Date: Sun, 6 Nov 2005 09:57:55 +0900
Local: Sat, Nov 5 2005 7:57 pm
Subject: Re: Late Ruby Quiz Notice (was Re: [SUMMARY] TumbleDRYer (#53))
On 11/4/05, James Edward Gray II <ja...@grayproductions.net> wrote:

> Thanks for your patients.

But I'm not a doctor ;)
Thanks James for all of your hard work making the Quz happen.

Anthony Moralez


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2009 Google