[QUIZ] SerializableProc (#38)

Ruby Quiz

Jul 8, 2005, 3:17:49 PM

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

I'm a Proc addict. I use them all over the place in my code. Because of that,
whenever I end up needing persistence and I call Marshal.dump() or YAML.dump()
on some object hierarchy, I get to watch everything explode (since Procs cannot
be serialized).
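Here is that failure in its smallest form. This quick check is not from the original post and was written against a current Ruby. The exact error message may differ between versions.

```ruby
# Minimal check: Marshal refuses to dump a core Proc.
code = proc { puts "Hello world" }

marshal_error = nil
begin
  Marshal.dump(code)
rescue TypeError => e
  marshal_error = e # e.g. "no _dump_data is defined for class Proc"
end

puts "Marshal raised: #{marshal_error.class}"
```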

This week's Ruby Quiz is to build a Proc that can be serialized.

I'm not aware of any possible way to add serialization capabilities to Ruby's
core Proc, which rules out a complete solution. However, even if what we build
is a hack, at least one person finds it super useful.

The task then is to build SerializableProc. It should support being serialized
by Marshal, PStore, and YAML and otherwise behave as close to a Proc as
possible. Put another way, make the following code run for your creation:

require "pstore"
require "yaml"

code = # Build your SerializableProc here!

File.open("proc.marshalled", "w") { |file| Marshal.dump(code, file) }
code = File.open("proc.marshalled") { |file| Marshal.load(file) }

code.call

store = PStore.new("proc.pstore")
store.transaction do
  store["proc"] = code
end
store.transaction do
  code = store["proc"]
end

code.call

File.open("proc.yaml", "w") { |file| YAML.dump(code, file) }
code = File.open("proc.yaml") { |file| YAML.load(file) }

code.call

Robin Stocker

Jul 10, 2005, 3:25:36 PM

Hi,

This is the second solution that I could finish in time. Well, it was
pretty easy.

I imagine my solution is not very fast, as each time a method on the
SerializableProc is called, a new Proc object is created.
The object could be saved in an instance variable @proc so that speed is
only low on the first execution. But that would require the definition
of custom dump methods for each Dumper so that it would not attempt to
dump @proc.

Here's my solution...
(Question: Is it better if I attach it or just paste it like this?)

class SerializableProc

  def initialize( block )
    @block = block
    # Test if block is valid.
    to_proc
  end

  def to_proc
    # Raises exception if block isn't valid, e.g. SyntaxError.
    eval "Proc.new{ #{@block} }"
  end

  def method_missing( *args )
    to_proc.send( *args )
  end

end

if $0 == __FILE__

  require 'yaml'
  require 'pstore'

  code = SerializableProc.new %q{ |a,b| [b,a] }

  # Marshal

  File.open('proc.marshalled', 'w') { |file| Marshal.dump(code, file) }
  code = File.open('proc.marshalled') { |file| Marshal.load(file) }

  p code.call( 1, 2 )

  # PStore

  store = PStore.new('proc.pstore')
  store.transaction do
    store['proc'] = code
  end
  store.transaction do
    code = store['proc']
  end

  p code.call( 1, 2 )

  # YAML

  File.open('proc.yaml', 'w') { |file| YAML.dump(code, file) }
  code = File.open('proc.yaml') { |file| YAML.load(file) }

  p code.call( 1, 2 )

  p code.arity

end

Ryan Leavengood

Jul 10, 2005, 4:57:22 PM

Florian Groß wrote:
> And mine's attached to this mail.
>
> I wrote this a while ago and it works by extracting a proc's origin file
> name and line number from its .inspect string and using the source code
> (which usually does not have to be read from disc) -- it works with
> procs generated in IRB, eval() calls and regular files. It does not work
> from ruby -e and stuff like "foo".instance_eval "lambda {}".source
> probably doesn't work either.
>
> Usage:
>
> code = lambda { puts "Hello World" }
> puts code.source
> Marshal.load(Marshal.dump(code)).call
> YAML.load(code.to_yaml).call

Interesting. I was considering taking this approach until I realized I'd
have to implement a partial Ruby parser, which is what I see you did.
Still, it is pretty cool, though obviously a bit hackish.

I wonder if YARV and Ruby byte-code will make it easier for procs to be
serialized? I'm not sure how the binding would work (hmmm, if it is just
objects maybe they could be serialized as normal), but the proc itself
could just be serialized as is if it is self-contained Ruby byte-code.

Does anyone know if this is how YARV will be? Because I'm just guessing
here.

Ryan

Dominik Bathon

Jul 10, 2005, 6:07:50 PM

On Sun, 10 Jul 2005 21:25:36 +0200, Robin Stocker
<robin-list...@nibor.org> wrote:

> def to_proc
> # Raises exception if block isn't valid, e.g. SyntaxError.
> eval "Proc.new{ #{@block} }"
> end
>
> def method_missing( *args )
> to_proc.send( *args )
> end

Nice idea, to avoid storing the Proc object in an instance variable and so
being able to just use the default serializing. But I guess this is quite
slow ;-)

So, here is my solution. It should be almost as fast as normal procs, but
I had to implement custom serializing methods. I also implemented a custom
==, because that doesn't really work with method_missing/delegate.

require "delegate"
require "yaml"

class SProc < DelegateClass(Proc)

  attr_reader :proc_src

  def initialize(proc_src)
    super(eval("Proc.new { #{proc_src} }"))
    @proc_src = proc_src
  end

  def ==(other)
    @proc_src == other.proc_src rescue false
  end

  def inspect
    "#<SProc: #{@proc_src.inspect}>"
  end
  alias :to_s :inspect

  def marshal_dump
    @proc_src
  end

  def marshal_load(proc_src)
    initialize(proc_src)
  end

  def to_yaml(opts = {})
    YAML::quick_emit(self.object_id, opts) { |out|
      out.map("!rubyquiz.com,2005/SProc" ) { |map|
        map.add("proc_src", @proc_src)
      }
    }
  end

end

YAML.add_domain_type("rubyquiz.com,2005", "SProc") { |type, val|
  SProc.new(val["proc_src"])
}

if $0 == __FILE__
  require "pstore"

  code = SProc.new %q{ |*args|
    puts "Hello world"
    print "Args: "
    p args
  }

  orig = code

  code.call(1)

  File.open("proc.marshalled", "w") { |file| Marshal.dump(code, file) }
  code = File.open("proc.marshalled") { |file| Marshal.load(file) }

  code.call(2)

  store = PStore.new("proc.pstore")
  store.transaction do
    store["proc"] = code
  end
  store.transaction do
    code = store["proc"]
  end

  code.call(3)

  File.open("proc.yaml", "w") { |file| YAML.dump(code, file) }
  code = File.open("proc.yaml") { |file| YAML.load(file) }

  code.call(4)

  p orig == code
end

Christian Neukirchen

Jul 11, 2005, 8:21:14 AM

Robin Stocker <robin-list...@nibor.org> writes:

> Hi,
>
> This is the second solution that I could finish in time. Well, it was
> pretty easy.
>
> I imagine my solution is not very fast, as each time a method on the
> SerializableProc is called, a new Proc object is created.
> The object could be saved in an instance variable @proc so that speed
> is only low on the first execution. But that would require the
> definition of custom dump methods for each Dumper so that it would not
> attempt to dump @proc.
>
> Here's my solution...

My code is very similar, but only eval()s once:

require 'delegate'
require 'yaml'

class SerializableProc < DelegateClass(Proc)
  attr_reader :__code

  def initialize(code)
    @__code = code.lstrip
    super eval("lambda { #@__code }")
  end

  def marshal_dump; @__code; end
  def marshal_load(code); initialize code; end

  def to_yaml
    Object.instance_method(:to_yaml).bind(self).call
  end

  def to_yaml_properties; ["@__code"]; end
  def to_yaml_type; "!ruby/serializableproc"; end
end

# .oO(Is there no easier way to do this?)
YAML.add_ruby_type( /^serializableproc/ ) { |type, val|
  type, obj_class = YAML.read_type_class( type, SerializableProc )
  o = YAML.object_maker( obj_class, val )
  o.marshal_load o.__code
  o # return the rebuilt object, not initialize's return value
}

Usage:

code = SerializableProc.new %{
  puts "this is serialized!"
  p binding; p caller
}

Obvious problems of this approach are the lack of closures and editor
support (depending on the inverse quality of your editor :P), better
results can be reached with flgr's hack to look for the source on disk
or by using nodewrap to serialize the AST. See
http://rubystuff.org/nodewrap/ for details.

That was a nice quiz.

--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org

James Edward Gray II

Jul 11, 2005, 8:57:49 AM

On Jul 10, 2005, at 2:25 PM, Robin Stocker wrote:

> Here's my solution...

Here's what I came up with while building the quiz:

class SerializableProc
  def self._load( proc_string )
    new(proc_string)
  end

  def initialize( proc_string )
    @code = proc_string
    @proc = nil
  end

  def _dump( depth )
    @code
  end

  def method_missing( method, *args )
    if to_proc.respond_to? method
      @proc.send(method, *args)
    else
      super
    end
  end

  def to_proc( )
    return @proc unless @proc.nil?

    # /m lets .* span the newlines in multi-line source strings.
    if @code =~ /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\Z/m
      @proc = eval @code
    elsif @code =~ /\A\s*(?:\{|do).*(?:\}|end)\s*\Z/m
      @proc = eval "lambda #{@code}"
    else
      @proc = eval "lambda { #{@code} }"
    end
  end

  def to_yaml( )
    @proc = nil
    super
  end
end

> (Question: Is it better if I attach it or just paste it like this?)

It doesn't much matter, but I favor inlining it when it's a single file.

James Edward Gray II

Dave Burt

Jul 12, 2005, 9:34:13 AM

Proc's documentation tells us that "Proc objects are blocks of code that
have been bound to a set of local variables." (That is, they are "closures"
with "bindings".) Do any of the proposed solutions so far store local
variables?

# That is, can the following Proc be serialized?
local_var = 42
code = proc { local_var += 1 } # <= what should that look like in YAML?
code.call #=> 43

File.open("proc.marshalled", "w") { |file| Marshal.dump(code, file) }

# New context, e.g. new file:

code = File.open("proc.marshalled") { |file| Marshal.load(file) }

code.call #=> 44
local_var #=> NameError - undefined here

AFAICT, the only one is Christian Neukirchen's Nodewrap suggestion, which
looks very cool. From <http://rubystuff.org/nodewrap/>:

Sample code
This will dump the class Foo (including its instance methods, class
variables, etc.) and re-load it as an anonymous class:
class Foo
  def foo; puts "this is a test..."; end
end

s = Marshal.dump(Foo)
p Marshal.load(s) #=> #<Class:0x4027be20>

Here's another, trickier test for SerializableProcs. Can multiple Procs
sharing context, as returned by the following method, be made to behave
consistently across serialization? If the Procs are serialized
independently, I believe this is impossible - an inherent problem with the
idea of serializing Procs (or anything with shared context).
def two_procs
  x = 1
  [proc { x }, proc { x += 1 }]
end

p1, p2 = two_procs
[p1.call, p2.call, p1.call, p2.call] #=> [1, 2, 2, 3]
q1, q2 = Marshal.load(Marshal.dump(p1)), Marshal.load(Marshal.dump(p2))
[q1.call, q2.call, q1.call, q2.call] #=> [3, 4, 4, 5] if x is still shared
# I expect Nodewrap can get [3, 4, 3, 5] (two independent copies of x)
# for this last result.

Dave

Ryan Davis

Jul 14, 2005, 4:13:07 AM

Granted, we cheated, quite a bit at that, but I think the solution we
came up with is pretty:

require 'r2c_hacks'

class ProcStore # We have to have this because yaml calls allocate on Proc
  def initialize(&proc)
    @p = proc.to_ruby
  end

  def call(*args)
    eval(@p).call(*args)
  end
end

code = ProcStore.new { |x| return x+1 }
=> #<ProcStore:0x3db25c @p="proc do |x|\n return (x + 1)\nend">

The latest release of ZenHacks added Proc.to_ruby among other things.
Granted, it doesn't preserve the actual closure, just the code, but
it looks like that is a limitation of the other solutions as well, so
we aren't crying too much.

Our original solution just patched Proc and added _load/_dump to it,
but it choked on the YAML serialization side of things. Not entirely
sure why, and we were too tired to care at the time.

To see what we do to implement Proc.to_ruby:

class Proc
  ProcStoreTmp = Class.new unless defined? ProcStoreTmp
  def to_ruby
    ProcStoreTmp.send(:define_method, :myproc, self)
    m = ProcStoreTmp.new.method(:myproc)
    # sub (not sub!) so a non-matching header doesn't turn the result into nil
    result = m.to_ruby.sub(/def myproc\(([^\)]+)\)/, 'proc do |\1|')
    return result
  end
end

--
ryand...@zenspider.com - Seattle.rb - http://www.zenspider.com/seattle.rb
http://blog.zenspider.com/ - http://rubyforge.org/projects/ruby2c

Ruby Quiz

Jul 14, 2005, 8:51:25 AM

The solutions this time show some interesting differences in approach, so I want
to walk through a handful of them below. The very first solution was from Robin
Stocker and that's a fine place to start. Here's the class:

class SerializableProc

  def initialize( block )
    @block = block
    # Test if block is valid.
    to_proc
  end

  def to_proc
    # Raises exception if block isn't valid, e.g. SyntaxError.
    eval "Proc.new{ #{@block} }"
  end

  def method_missing( *args )
    to_proc.send( *args )
  end

end

It can't get much simpler than that. The main idea here, and in all the
solutions, is that we need to capture the source of the Proc. The source is
just a String so we can serialize that with ease and we can always create a new
Proc if we have the source. In other words, Robin's main idea is to go
(syntactically) from this:

Proc.new {
  puts "Hello world!"
}

To this:

SerializableProc.new %q{
  puts "Hello world!"
}

In the first pure Ruby version we're building a Proc with the block of code to
define the body. In the second SerializableProc version, we're just passing a
String to the constructor that can be used to build a block. Christian
Neukirchen had something very interesting to say about the change:

Obvious problems of this approach are the lack of closures and editor
support (depending on the inverse quality of your editor :P)...

We'll get back to the lack of closures issue later, but I found the "inverse
quality of your editor" claim interesting. The meaning is that a poor editor
may not consider %q{...} equivalent to '...'. If it doesn't realize a String is
being entered, it may continue to syntax highlight the code inside. Of course,
you could always remove the %q whenever you want to see the code highlighting,
but that's tedious.

Getting back to Robin's class, initialize() just stores the String and creates a
Proc from it, so an Exception will be raised at construction time if it is fed
invalid code. The method to_proc() is what builds the Proc object by wrapping the
String in "Proc.new { ... }" and calling eval(). Finally, method_missing() makes
SerializableProc behave close to a Proc. Any time it receives a message that
SerializableProc doesn't handle itself, it creates a Proc object and forwards
the message.

We don't see anything specific to serialization in Robin's code, because both
Marshal (which PStore uses under the hood) and YAML can handle a custom class
with String instance data. Like magic, it all just works.
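A stripped-down sketch of the same idea shows why no hooks are needed. The only instance variable is a String, so Marshal's default format handles it. The class name SourceProc is hypothetical. To keep the example short, it defines call() directly instead of using Robin's method_missing().

```ruby
# Sketch in the spirit of Robin's class: the only instance data is the
# source String, so Marshal needs no custom hooks at all.
class SourceProc
  def initialize(src)
    @src = src
    to_proc # fail early (e.g. SyntaxError) on bad source
  end

  def to_proc
    eval "Proc.new { #{@src} }"
  end

  def call(*args)
    to_proc.call(*args)
  end
end

original = SourceProc.new("|a, b| [b, a]")
copy     = Marshal.load(Marshal.dump(original))
p copy.call(1, 2) # => [2, 1]
```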

Robin had a complaint though:

I imagine my solution is not very fast, as each time a method on the
SerializableProc is called, a new Proc object is created.
The object could be saved in an instance variable @proc so that speed is
only low on the first execution. But that would require the definition of
custom dump methods for each Dumper so that it would not attempt to dump
@proc.
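The problem Robin anticipates is easy to reproduce. The hypothetical NaiveCachedProc below keeps its cache in @proc and defines no dump hooks. Marshal works until the first call fills the cache.

```ruby
# Hypothetical caching variant with no custom dump hooks.
class NaiveCachedProc
  def initialize(src)
    @src  = src
    @proc = nil
  end

  def call(*args)
    @proc ||= eval("Proc.new { #{@src} }") # cache on first call
    @proc.call(*args)
  end
end

code = NaiveCachedProc.new("42")
before = Marshal.dump(code) # fine: @proc is still nil
code.call                   # fills the cache

cached_error = nil
begin
  Marshal.dump(code)        # the default format now tries to dump @proc
rescue TypeError => e
  cached_error = e
end
puts "after caching: #{cached_error.class}"
```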

My own solution (like some of the others) caches the Proc and defines some custom
dump methods. It's the SerializableProc class I posted earlier in this thread, and
here's how it comes out:

My initialize() is the same, save that I create a variable to hold the Proc
object and I wasn't clever enough to trigger the early Exception when the code
is bad. My to_proc() looks scary but I just try to accept a wider range of
Strings, wrapping them in only what they need. The end result is the same.
Note that any Proc created is cached. My method_missing() is also very similar.
If the Proc object responds to the method, it is forwarded. The first line of
method_missing() calls to_proc() to ensure we've created one. After that, it
can safely use the @proc variable.

The _load() class method and _dump() instance method are what it takes to support
Marshal. First, _dump() is expected to return a String that could be used to
rebuild the instance. Then, _load() is passed that String on reload and
expected to return the recreated instance. The String choice is simple in this
case, since we're using the source.
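Here is a trimmed sketch of that _dump()/_load() pair. The CachedProc name is hypothetical, and it calls the Proc directly instead of going through method_missing(). Because _dump() returns only the source, the cached Proc never reaches Marshal. PStore benefits too, since it stores values with Marshal.

```ruby
# Minimal sketch of Marshal's _dump/_load pair: _dump returns a String,
# and the class-level _load rebuilds an instance from that String.
class CachedProc
  def self._load(str)
    new(str)
  end

  def initialize(src)
    @src  = src
    @proc = nil # cache; never written out by _dump
  end

  def _dump(_depth)
    @src
  end

  def to_proc
    @proc ||= eval("lambda { #{@src} }")
  end

  def call(*args)
    to_proc.call(*args)
  end
end

code = CachedProc.new("|x| x * 2")
code.call(21)                           # cache is now populated
copy = Marshal.load(Marshal.dump(code)) # still works
p copy.call(5) # => 10
```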

There are multiple ways to support YAML serialization, but I opted for the super
simple cheat. YAML can't serialize a Proc, but it's just a cache that can
always be restored. I just override to_yaml() and clear the cache before
handing serialization back to the default method. My code is unaffected by the
Proc's absence and it will recreate it when needed.
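The to_yaml() override relies on the Syck-based YAML library that Ruby shipped at the time. Later Rubies provide YAML through Psych instead. There, a comparable approach is to define encode_with() and init_with(), so that only the source is written and the cache starts empty after loading. This is a swapped-in technique, not code from this thread. Newer Psych releases also make YAML.load refuse arbitrary classes, so the sketch uses YAML.unsafe_load when it exists. YamlProc is a hypothetical name.

```ruby
require "yaml"

# Hypothetical Psych-era equivalent: emit only the source, never the Proc.
class YamlProc
  def initialize(src)
    @src  = src
    @proc = nil
  end

  # Psych calls this when dumping; only "src" is written out.
  def encode_with(coder)
    coder["src"] = @src
  end

  # Psych calls this when loading, instead of restoring ivars directly.
  def init_with(coder)
    @src  = coder["src"]
    @proc = nil
  end

  def call(*args)
    (@proc ||= eval("lambda { #{@src} }")).call(*args)
  end
end

code = YamlProc.new("|a, b| a + b")
code.call(1, 2)        # populate the cache before dumping
text = YAML.dump(code) # contains only the src key
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
copy = YAML.public_send(loader, text)
p copy.call(3, 4) # => 7
```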

Taking one more step, Dominik Bathon builds the Proc in the constructor and
never has to recreate it:

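require "delegate"
require "yaml"

class SProc < DelegateClass(Proc)
  attr_reader :proc_src

  def initialize(proc_src)
    super(eval("Proc.new { #{proc_src} }"))
    @proc_src = proc_src
  end

  def marshal_dump
    @proc_src
  end

  def marshal_load(proc_src)
    initialize(proc_src)
  end

  def to_yaml(opts = {})
    YAML::quick_emit(self.object_id, opts) { |out|
      out.map("!rubyquiz.com,2005/SProc") { |map|
        map.add("proc_src", @proc_src)
      }
    }
  end
end

YAML.add_domain_type("rubyquiz.com,2005", "SProc") { |type, val|
  SProc.new(val["proc_src"])
}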

Dominik uses the delegate library instead of the method_missing() trick.
That's a two-step process. You can see the first step when SProc is defined to
inherit from DelegateClass(Proc), which generates forwarding methods for Proc's
public instance methods. The second step is the first line of the constructor,
which passes the freshly built Proc up to the DelegateClass via super(). That's
the instance that will receive forwarded messages. Dominik also defined a custom
==(), "because that doesn't really work with method_missing/delegate."
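The following compact sketch of the delegate pattern is modeled on Dominik's SProc, trimmed and renamed (DelegatedProc is hypothetical). It shows the forwarding, a source-based ==(), and the marshal_dump()/marshal_load() pair working together. Like the original, it calls eval() inside initialize(), so the evaluated source can see the local variable src. Treat that as a known wart of the sketch.

```ruby
require "delegate"

# DelegateClass(Proc) forwards Proc's methods (call, arity, ...) to the
# Proc handed to super in initialize.
class DelegatedProc < DelegateClass(Proc)
  attr_reader :src

  def initialize(src)
    @src = src
    super(eval("Proc.new { #{src} }")) # the real Proc receives forwarded calls
  end

  # Compare by source, since rebuilt Procs are always distinct objects.
  def ==(other)
    other.respond_to?(:src) && src == other.src
  end

  # Overrides Delegator's own marshal hooks so the Proc is never dumped.
  def marshal_dump
    @src
  end

  def marshal_load(src)
    initialize(src)
  end
end

code = DelegatedProc.new("|a, b| a * b")
copy = Marshal.load(Marshal.dump(code))
p copy.call(6, 7) # => 42
p copy.arity      # => 2
```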

Dominik's code uses a different interface to support Marshal, but does the same
thing I did, as you can see. The YAML support is different. SProc.to_yaml()
spits out a new YAML type, that basically just emits the source. The code
outside of the class adds the YAML support to read this type back in, whenever
it is encountered. Here's what the class looks like when it's resting in a YAML
file:

!rubyquiz.com,2005/SProc
proc_src: |2-

|*args|
puts "Hello world"
print "Args: "
p args

The advantage here is that the YAML export procedure never touches the Proc so
it doesn't need to be hidden or removed and rebuilt.

Florian's solution is also worth mentioning, though it takes a completely different
road to solving the problem. Time and space don't allow me to recreate and
annotate the code here, but Florian described the premise well in the submission
message:

I wrote this a while ago and it works by extracting a proc's origin file
name and line number from its .inspect string and using the source code
(which usually does not have to be read from disc) -- it works with
procs generated in IRB, eval() calls and regular files. It does not work
from ruby -e and stuff like "foo".instance_eval "lambda {}".source
probably doesn't work either.

Usage:

code = lambda { puts "Hello World" }
puts code.source
Marshal.load(Marshal.dump(code)).call
YAML.load(code.to_yaml).call

The code itself is a fascinating read. It uses the relatively unknown
SCRIPT_LINES__ Hash, has great tricks like overriding eval() to capture that
source, and even implements a partial Ruby parser with standard libraries. I'm
telling you, that code reads like a good mystery novel for programmers. Don't
miss it!
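Florian's code predates it, but Ruby 1.9 added Proc#source_location, which returns the file name and line number that Florian had to dig out of inspect(). The deliberately naive sketch below writes a small file, loads it, and reads the proc's line back. naive_proc_source is a hypothetical helper, and it only handles a proc whose whole body sits on the reported line.

```ruby
require "tempfile"

# Hypothetical helper: recover a one-line proc's source from its file.
def naive_proc_source(pr)
  file, line = pr.source_location # [path, line number]
  text = File.readlines(file)[line - 1]
  text[/(lambda|proc)\s*\{.*\}/]  # naive: assumes a single-line block
end

source = nil
Tempfile.create(["proc_demo", ".rb"]) do |f|
  f.puts '$loaded = lambda { |x| x + 1 }'
  f.flush
  load f.path
  source = naive_proc_source($loaded)
end

puts source          # lambda { |x| x + 1 }
rebuilt = eval(source)
p rebuilt.call(41)   # => 42
```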

One last point. I said in the quiz all this is just a hack, no matter how
useful it is. Dave Burt sent a message to Ruby Talk along these lines:

Proc's documentation tells us that "Proc objects are blocks of code that
have been bound to a set of local variables." (That is, they are "closures"
with "bindings".) Do any of the proposed solutions so far store local
variables?

# That is, can the following Proc be serialized?
local_var = 42
code = proc { local_var += 1 } # <= what should that look like in YAML?
code.call #=> 43

An excellent point. These toys we're creating have serious limitations to be
sure. I assume this is the very reason Ruby's Procs cannot be serialized.
Using binding() might make it possible to work around this problem in some
instances, but there are clearly some Procs that cannot be cleanly serialized.
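Here is a rough illustration of the binding() idea. The hypothetical ClosureProc copies named local variables out of a Binding when it is built. When it is dumped, it reads their current values back from the live closure via Proc#binding. It only handles locals whose values Marshal can dump, and it needs Ruby 2.1 or later for Binding#local_variable_get.

```ruby
# Hypothetical sketch: carry selected locals along with the source.
class ClosureProc
  # A binding with no local variables, used to eval the generated code.
  def self.blank_binding
    binding
  end

  def initialize(src, bind, *names)
    @src   = src
    @names = names
    @vars  = names.map { |n| [n, bind.local_variable_get(n)] }.to_h
    @proc  = nil
  end

  def to_proc
    @proc ||= begin
      # e.g. "local_var = __captured[:local_var]"
      assigns = @names.map { |n| "#{n} = __captured[#{n.inspect}]" }.join("; ")
      outer = ClosureProc.blank_binding.eval(
        "lambda { |__captured| #{assigns}; lambda { #{@src} } }"
      )
      outer.call(@vars) # the inner lambda closes over the restored locals
    end
  end

  def call(*args)
    to_proc.call(*args)
  end

  # Dump the source plus the *current* values held by the closure.
  def marshal_dump
    live = to_proc.binding
    [@src, @names, @names.map { |n| [n, live.local_variable_get(n)] }.to_h]
  end

  def marshal_load(data)
    @src, @names, @vars = data
    @proc = nil
  end
end

local_var = 42
code  = ClosureProc.new("local_var += 1", binding, :local_var)
first = code.call                       # 43
copy  = Marshal.load(Marshal.dump(code))
second = copy.call                      # 44
```

With this sketch, Dave's first example behaves as he hoped, but his two_procs case still does not. Each ClosureProc carries its own copy of x, so procs that shared one variable before serialization no longer share it afterwards.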

My thanks to all who committed such wonderful code and discussion to this week's
quiz. I know I learned multiple new things and I hope others did too.

Tomorrow we have a quiz to sample some algorithmic fun...

why the lucky stiff

Jul 14, 2005, 11:33:10 AM

Ruby Quiz wrote:

>My thanks to all who committed such wonderful code and discussion to this week's
>quiz. I know I learned multiple new things and I hope others did too.
>
>

Good stuff, JEGII, Robin, Chris2, Dave.

I can also really sympathize with Chris' disgust over the
YAML.add_ruby_type methods... It is undergoing deprecation in favor of:

class SerializableProc
  yaml_type "tag:rubyquiz.org,2005:SerializableProc"
end

_why

Christian Neukirchen

Jul 14, 2005, 11:53:45 AM

And then #yaml_dump and #yaml_load? That would rule.

> _why

why the lucky stiff

Jul 14, 2005, 11:57:47 AM

Christian Neukirchen wrote:

> And then #yaml_dump and #yaml_load? That would rule.

Class.yaml_new or Object.yaml_initialize. And Object.to_yaml.

If folks prefer the Marshal setup, though, I'll change it. It's only
been like this for a handful of minor releases.

_why

Jeffrey Moss

Jul 14, 2005, 12:34:06 PM

Has anybody thought about serialized closures? I was thinking of a way to
use closures across multiple Apache requests, and came to the conclusion
that it was too much trouble. In this case I just use a standard proc object
that gets re-initialized on each request and isn't serialized, but I
always thought it would be nice to maintain some sort of persistent state
across requests.

Wouldn't it be possible to write a C extension for serializable closures?

-Jeff

Florian Groß

Jul 14, 2005, 12:58:25 PM

Jeffrey Moss wrote:

> Wouldn't it be possible to write a C extension for serializable closures?

I think NodeWrap does this. See http://rubystuff.org/nodewrap/

It's pretty cool stuff.

Christian Neukirchen

Jul 14, 2005, 1:54:14 PM

why the lucky stiff <ruby...@whytheluckystiff.net> writes:

Very good too, I'm looking forward to that.

Does this get into 1.8.3 (if that version will ever appear)?