So, would anyone object to this strategy?
Is the problem expressed below sufficiently challenging to be of
interest to those who know they could tackle it? Does it make a
good quiz? James said I should ask, so I am.
<quote>
Date: Wed, 26 Oct 2005 18:11:12 +0100 (WEST)
From: Hugh Sasse <>
To: James Edward Gray II <>
Cc: Hugh Sasse <>
Subject: Another Ruby Quiz suggestion: generic diff/patch
I don't know if this is too big for the Ruby quiz. I'm not sure how
to tackle it (though my maximum string length code might be of some
use) but I can tell you why it would be useful. First: the
problem:
Given 2 versions of a file, which may be binary, generate the
difference in GDIFF format, detailed here:
http://www.w3.org/TR/NOTE-gdiff-19970901
For extra marks, implement the patch program that will generate the
new file from the old file and the GDIFF file.
The reason this is needed:
RFC3229 Delta encoding in HTTP
ftp://ftp.rfc-editor.org/in-notes/rfc3229.txt
may make use of this format, and it seems there are no rights to it
as there are for vcdiff referenced in the same document.
Why is this of interest to us?
Rubygems is facing the problem that as more gems are added the index
is getting far too big to handle. Implementing RFC3229 seems
feasible, given the existence of a pure ruby - and thus as portable
as rubygems - differencing library. The logic of the protocol is
fairly clear, but without some means to handle the differences it is
a non-starter.
So, is this too hard for a quiz? Would the Ruby black-belts relish
the challenge?
Hugh
</quote>
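For anyone weighing up the "extra marks" half of the problem, here is a minimal sketch of a patch routine, assuming the opcode layout given in the W3C GDIFF note (opcode 0 is EOF, 1 to 246 are short DATA runs, 247 and 248 are DATA with an explicit length, 249 to 255 are COPY with differently sized operands, all integers big-endian). The names gdiff_patch, read_int, COPY_FORMATS and SIZES are made up for illustration and come from no existing library. Truncated or malformed input is not handled gracefully.

```ruby
require 'stringio'

# Magic number 0xd1ffd1ff followed by the version byte 4.
GDIFF_MAGIC = "\xD1\xFF\xD1\xFF\x04".b

# Operand layouts for the COPY opcodes: [position format, length format].
# "C" = 8-bit, "n" = 16-bit, "N" = 32-bit, "Q>" = 64-bit, all big-endian.
COPY_FORMATS = {
  249 => ["n", "C"], 250 => ["n", "n"], 251 => ["n", "N"],
  252 => ["N", "C"], 253 => ["N", "n"], 254 => ["N", "N"],
  255 => ["Q>", "N"]
}.freeze
SIZES = { "C" => 1, "n" => 2, "N" => 4, "Q>" => 8 }.freeze

# Read one big-endian integer of the given pack format from the stream.
def read_int(io, fmt)
  io.read(SIZES[fmt]).unpack1(fmt)
end

# Rebuild the new file contents from the old contents and a GDIFF delta.
def gdiff_patch(old, delta)
  io = StringIO.new(delta.b)
  raise ArgumentError, "not a GDIFF version 4 stream" unless io.read(5) == GDIFF_MAGIC
  old = old.b
  out = String.new(encoding: Encoding::BINARY)
  loop do
    op = io.getbyte
    raise ArgumentError, "missing EOF command" if op.nil?
    case op
    when 0      then return out                          # EOF
    when 1..246 then out << io.read(op)                  # DATA, length is the opcode
    when 247    then out << io.read(read_int(io, "n"))   # DATA, 16-bit length
    when 248    then out << io.read(read_int(io, "N"))   # DATA, 32-bit length
    else                                                 # COPY from the old file
      pos_fmt, len_fmt = COPY_FORMATS[op]
      pos = read_int(io, pos_fmt)
      len = read_int(io, len_fmt)
      out << old.byteslice(pos, len)
    end
  end
end

# Usage: copy 6 bytes from offset 0 of the old data, then append 5 literal bytes.
delta = GDIFF_MAGIC + [249, 0, 6].pack("CnC") + [5].pack("C") + "Ruby!" + [0].pack("C")
puts gdiff_patch("hello world", delta)
```

The patch side is the easy half; the real work in the problem is finding good COPY ranges when generating the delta.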
So I open this to the floor, so to speak...
Hugh
I agree with Jacob. This particular problem might be a bit involved for
RubyQuiz. Perhaps we could break it down into smaller problems. I really
would love a native Ruby diff. Having said that, I think Hugh's idea is a
good one.
Dave
Jacob
I'm sure there are many better ways of doing this - indeed my algorithm
is very naive, it doesn't scan for matches in an efficient way (just
barely scans :)
The approach I was going for was to have a GDiffFile class, and then
create various GDiff command objects and pump them into the file.
At the moment it will spit out a semi-compliant gdiff file, but it's
horrendously bloated, and I couldn't find a nice way to quote strings as
in the example on
http://www.w3.org/TR/NOTE-gdiff-19970901
Kev
GDiff composer (just bytes, not longs etc), not in class, just hacking
at the script really
gdiff.rb
require 'lib/gdiff_copy'
require 'lib/gdiff_data'
require 'lib/gdiff_file'
diff_file = GDiff::GDiffFile.new("d:\\ruby_projects\\gdiff\\test")
diff_file.put_header
# get data into arrays of single characters (/m so that newlines are kept)
old = IO.read("d:\\ruby_projects\\gdiff\\gdiff.rb").scan(/./m)
new = IO.read("d:\\ruby_projects\\gdiff\\gdiff_new.rb").scan(/./m)
# dumb scan comparing single chars, should make it compare matching sequences
pos = 0
old.each do |oldb|
  if new[pos] == oldb
    diff_file.put_cmd_and_data(GDiff::GDiffCopy.new(pos, 1))
  else
    diff_file.put_cmd_and_data(GDiff::GDiffData.new(1, new[pos]))
  end
  pos += 1
end
diff_file.put_trailer
diff_file.write_diff
lib/gdiff_cmd.rb
module GDiff
  class GDiffCmd
    attr_accessor :cmd
    attr_accessor :data
    def initialize(cmd, data)
      @cmd = cmd
      @data = data
    end
  end
end
lib/gdiff_copy.rb
require 'lib/gdiff_cmd'
module GDiff
  class GDiffCopy < GDiffCmd
    def initialize(position, length)
      @cmd = 249
      @data = [0]
      @data << position
      @data << length
    end
  end
end
lib/gdiff_data.rb
require 'lib/gdiff_cmd'
module GDiff
  class GDiffData < GDiffCmd
    def initialize(cmd, data)
      #if cmd < 1 or cmd > 248
      #  raise DataError
      #end
      @cmd = cmd
      @data = data
    end
  end
end
lib/gdiff_file.rb
module GDiff
  class GDiffFile < File
    @@magic = 0xd1ffd1ff
    @@version = 0x04
    @@EOF = 0
    def initialize(filename)
      @filename = filename
      @filedata = []
    end
    def put_header
      put_data(@@magic)
      put_data(@@version)
    end
    def put_trailer
      put_data(@@EOF)
    end
    def copy_byte(start)
      write_diff()
    end
    def put_data(data)
      if data.respond_to?(:each_byte) && data.length > 1
        # multi-byte string: append each byte value followed by a separator
        data.each_byte do |b|
          @filedata << b.to_s + ","
        end
      else
        # integers, single characters and arrays are appended as they are
        @filedata << data
      end
    end
    def put_cmd(cmd)
      @filedata << cmd.cmd << ","
    end
    def put_cmd_and_data(cmd)
      put_cmd(cmd)
      put_data(cmd.data)
    end
    def write_diff
      p @filedata
      File.open(@filename, "w") { |f|
        # join explicitly so the output does not depend on Array#to_s
        f << @filedata.flatten.join
      }
    end
  end
end
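The two weak spots mentioned above (one command per character, and text output instead of bytes) can be addressed without changing the basic idea. What follows is only a sketch under stated assumptions: it keeps the same naive same-offset comparison, merges neighbouring matches into a single COPY and neighbouring mismatches into a single DATA, and writes real bytes with Array#pack. The method names gdiff_copy, gdiff_data and naive_gdiff are hypothetical, and the opcodes (254 for COPY with two 32-bit operands, 248 for DATA with a 32-bit length) are taken from the W3C GDIFF note.

```ruby
# COPY: opcode 254, 32-bit position, 32-bit length, big-endian.
def gdiff_copy(pos, len)
  [254, pos, len].pack("CNN")
end

# DATA: opcode 248, 32-bit length, then the literal bytes.
def gdiff_data(bytes)
  [248, bytes.bytesize].pack("CN") + bytes
end

# Build a GDIFF delta by comparing old and new byte by byte at equal offsets.
def naive_gdiff(old, new)
  old = old.b
  new = new.b
  out = "\xD1\xFF\xD1\xFF\x04".b   # magic number plus version 4
  pos = 0
  while pos < new.bytesize
    same  = pos < old.bytesize && old.getbyte(pos) == new.getbyte(pos)
    start = pos
    # extend the run for as long as the same/different state does not change
    pos += 1 while pos < new.bytesize &&
                   (pos < old.bytesize && old.getbyte(pos) == new.getbyte(pos)) == same
    len = pos - start
    out << (same ? gdiff_copy(start, len) : gdiff_data(new.byteslice(start, len)))
  end
  out << "\x00".b                  # EOF command
end

# Usage: two COPY commands around one two-byte DATA command.
delta = naive_gdiff("abcdef", "abXYef")
puts delta.unpack("C*").inspect
```

Always using the widest opcodes wastes a few bytes per command; picking 249 to 253 or 1 to 247 when the operands are small would shrink the output further. An insertion near the start of the file still defeats this scan, because every later byte shifts to a different offset, which is where a proper matching algorithm comes in.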
> Well I've got to dash off early today, but here's something I've hacked
> together (very rough)
>
Well, I was asking if it would be acceptable as a quiz. I wasn't
expecting any attempts at a solution yet! Thank you. I'll let
James decide how he'd like to fit this into the quizzes, be that in
parallel with the normal ones or otherwise.
Thank you again
Hugh
Hello Hugh,
I know you did not ask for an implementation right now, but with the
help of Zed Shaw's wonderful suffix-tree implementation I whipped up a
gdiff / gpatch release.
It can be found at
http://ruby.brian-schroeder.de/gdiff/
If there is interest in continuing this project I will register a
RubyForge project for it and also package it as a gem. ATM the release is
packaged with setup.rb
The suffix tree seems to be licensed under the GPL; I don't know if Zed
would accept a Ruby-licence release too. I will write him an email.
best regards,
Brian
--
http://ruby.brian-schroeder.de/
Stringed instrument chords: http://chordlist.brian-schroeder.de/
http://blog.davebalmain.com/pages/rdiff
Enjoy,
Dave
I think it's a good idea. Quizzes are great for fun and learning, but
that doesn't mean the results have to be thrown away.
The only thing to worry about is licensing. As long as the intentions
of the quiz are clear, and the participants agree to including the
code under a Ruby license (or whatever), we should be fine.
> Is the problem expressed below sufficiently challenging to be of
> interest to those who know they could tackle it? Does it make a
> good quiz? James said I should ask, so I am.
The day before you posted, I was thinking about starting a diff
library for Ruby, as the ones I found were pure ports and/or out of
date and undocumented (in English). I just started thinking about it,
so don't commit me to anything. :-)
I think the quiz would be on par with the other quizzes, and should be
considered for inclusion in the regular cycle. I would participate,
time allowing.
--
Rob
> I think it's a good idea. Quizzes are great for fun and learning, but
> that doesn't mean the results have to be thrown away.
>
> The only thing to worry about is licensing. As long as the intentions
> of the quiz are clear, and the participants agree to including the
> code under a Ruby license (or whatever), we should be fine.
I'm all for quizzes that help build Ruby technologies. I strongly
encourage library authors to farm some work out to us. Just try to
get it down to a small enough piece and announce that you intend to
steal any posted solutions for whatever purposes. I'll run them.
> I think the quiz would be on par with the other quizzes, and should be
> considered for inclusion in the regular cycle. I would participate,
> time allowing.
This is all my fault. I told Hugh I was concerned you guys wouldn't
work this problem. Clearly I was wrong.
What's the opinion now that we've seen solutions? Go ahead and make it
official, or consider it a lesson learned and hope James is more
intuitive next time?
James Edward Gray II
It pretty much already was a quiz, without the waiting period. If
Hugh already has what he needs, I'd say skip it. C'est la vie...
I just printed off Zed's paper and McIlroy's "An algorithm for
differential file comparison" for some fun weekend reading.
--
Rob
I think it is a good idea to use quizzes to advance certain libraries.
But I would also like it if someone could put up a website with these
kinds of requests, where people like the gem people would write up small
"wished-for" projects or subprojects. Or maybe just use the RubyForge
"help wanted" feature for this. I do not have the capacity to join a
big project, but submitting something once in a while is a great
thing.
I agree with this. I think it would be a great idea to ping the
community often and openly, and if it fits a quiz, that's a great way
to do it. This [the quizzes and the application of projects to the
quiz] sort of seems like a whole new take on open source software too
-- go to the community rather than having them come to the project. I like
it!