deep cloning, how?

Rajinder Yadav

Oct 14, 2009, 2:05:41 AM

I am trying to figure out how to perform a deep clone

class A
  attr_accessor :name
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup
a1.name.chop!
puts a2.name

I found the following way to write a deep clone method

class A
  attr_accessor :name
  def dup
    Marshal::load(Marshal.dump(self))
  end
end
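
For reference, the Marshal round trip can also live in a standalone helper instead of an override of dup. This is only a minimal sketch: the name deep_copy is invented for illustration (it is not a core method), and the approach only works for objects that Marshal can serialize, which rules out Procs, IO objects and objects with singleton methods.

```ruby
# Hypothetical helper (not part of Ruby core): deep-copies any object
# that Marshal is able to serialize.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

class A
  attr_accessor :name
end

a1 = A.new
a1.name = "yoyoma"
a2 = deep_copy(a1)

a1.name.chop!     # mutate the original's string in place
puts a1.name      # yoyom
puts a2.name      # yoyoma (the copy owns a separate String)
```

Keeping the helper separate leaves the default shallow behavior of dup and clone untouched for other callers.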

If I wanted to write my own specialized deep cloner, how would I do this? If I
try the obvious way to do it,

def dup
  @name = self.name.dup
end

I get an error when 'puts a2.name' is executed saying:

NoMethodError: undefined method `name' for "yoyom":String

Can someone explain what's going on when dup is called? What gets passed to dup
(I assume self), and why is my code wrong?

--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely

Brian Candler

Oct 14, 2009, 3:54:35 AM

Rajinder Yadav wrote:
> Can someone explain what's going on when dup is called. What gets passes
> to dup
> (i assume self) and why my code is wrong?

dup is a method of your existing object, and should return the new
object instance.

class A
  attr_accessor :name
  def dup
    res = self.class.new
    res.name = name.dup
    res
  end
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup
a1.name.chop!
puts a2.name
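
To make the original failure concrete, here is a small sketch of what the first attempt actually returns. The class name Broken is invented for illustration. A Ruby method returns the value of its last expression, so this dup hands back a String, and that String is the very object now stored in the receiver.

```ruby
class Broken
  attr_accessor :name

  # The original attempt: no new object is built, and the return value
  # is the result of the assignment, i.e. a String.
  def dup
    @name = name.dup
  end
end

b1 = Broken.new
b1.name = "yoyoma"
b2 = b1.dup

puts b2.class             # String
puts b2.equal?(b1.name)   # true

b1.name.chop!             # also changes b2, since it is the same object
puts b2                   # yoyom

begin
  b2.name                 # a String has no #name method
rescue NoMethodError => e
  puts e.class            # NoMethodError
end
```

That is why the error message mentions "yoyom":String rather than an instance of the class.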

There is a subtle distinction between 'dup' and 'clone' which I'll leave
someone else to explain...
--
Posted via http://www.ruby-forum.com/.

Robert Klemme

Oct 14, 2009, 6:57:55 AM

2009/10/14 Rajinder Yadav <devg...@gmail.com>:

> I am trying to figure out how to perform a deep clone
>
>
> class A
> attr_accessor :name
> end
>
> a1 = A.new
> a1.name = "yoyoma"
> a2 = a1.dup
> a1.name.chop!
> puts a2.name
>
>
> I found the following way to write a deep clone method
>
> class A
> attr_accessor :name
> def dup
> Marshal::load(Marshal.dump(self))
> end
> end
>
> If I wanted to write my own specialize deep_cloner, how would I do this. If
> I try the obvious way to do it,
>
> def dup
> @name = self.name.dup
> end

No, this is far from the obvious way, since you would at least have
to make sure a copy of self is created and returned from dup. I
would rather do

def dup
  copy = super
  copy.name = @name.dup
  copy
end

or even

def dup
  self.class.new(@name.dup)
end

Although that approach is fragile, since it depends on the code in #initialize.
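
A minimal sketch of both variants against the class A from the question, which has no custom #initialize. The super-based version works as is; the constructor-based version fails here because the default constructor takes no arguments.

```ruby
class A
  attr_accessor :name

  # Variant 1: let Object#dup make the shallow copy, then replace the
  # one attribute that must not be shared.
  def dup
    copy = super
    copy.name = @name.dup
    copy
  end
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup
a1.name.chop!
puts a1.name   # yoyom
puts a2.name   # yoyoma

# Variant 2 needs an #initialize that accepts the name. A has none,
# so passing an argument to new raises ArgumentError.
begin
  A.new(a1.name.dup)
rescue ArgumentError => e
  puts "constructor-based copy failed: #{e.class}"
end
```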

> I get an error when 'puts a2.name' is executed saying:
>
> NoMethodError: undefined method `name' for "yoyom":String
>
>
> Can someone explain what's going on when dup is called. What gets passes to
> dup (i assume self) and why my code is wrong?

Hope the above sheds some light.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Caleb Clausen

Oct 14, 2009, 1:03:23 PM

The documentation of Object#dup seems to suggest that subclasses
should not override dup, preferring to override clone instead. I'm not
sure why this should be or why overriding dup would be bad. But
anyway, I would suggest this:

def deep_clone
  copy = clone
  copy.name = @name.clone
  copy
end

just so you can keep the existing semantics of clone as a shallow copy.

I'm really not sure why there are 2 methods to create shallow copies
in ruby and what all the differences are supposed to be. Other than
not overriding dup(?), the only other difference between them that I
can discover is that clone copies the metaclass of the object, whereas
dup reverts the copy's metaclass to being just its class. I've been
wondering about the difference between the 2 recently; I hope someone
out there can provide some enlightenment on why there are 2 and what
the differences are.

Robert Klemme

Oct 14, 2009, 2:27:46 PM

On 14.10.2009 19:03, Caleb Clausen wrote:

> The documentation of Object#dup seems to suggest that subclasses
> should not override dup, preferring to override clone instead.

Where do you take that from? In the docs referenced below I cannot find
anything like that. The only indication I can see is that #dup uses
#initialize_copy and we should probably override that instead of #dup
itself.

> I'm not
> sure why this should be or why overriding dup would be bad. But
> anyway, I would suggest this:
>
> def deep_clone
> copy=clone
> copy.name=@name.clone
> copy
> end
>
> just so you can keep the existing semantics of clone as a shallow copy.
>
> I'm really not sure why there are 2 methods to create shallow copies
> in ruby and what all the differences are supposed to be. Other than
> not overriding dup(?), the only other difference between them that I
> can discover is that clone copies the metaclass of the object, whereas
> dup reverts the copy's metaclass to being just its class. I've been
> wondering about the difference between the 2 recently; I hope someone
> out there can provide some enlightenment on why there are 2 and what
> the differences are.

There are more differences, namely in the area of frozen and tainted state.

http://www.ruby-doc.org/core/classes/Object.html#M000351
http://www.ruby-doc.org/core/classes/Object.html#M000352
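
A short sketch of the two differences that are easy to observe from Ruby code: the frozen state and singleton methods. Taint is deliberately left out, since newer Ruby versions no longer support it.

```ruby
obj = Object.new

# A singleton method lives in obj's singleton class (its "metaclass").
def obj.greet
  "hi"
end

obj.freeze

c = obj.clone
d = obj.dup

puts c.frozen?               # true
puts d.frozen?               # false
puts c.respond_to?(:greet)   # true
puts d.respond_to?(:greet)   # false
```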

Paul Smith

Oct 14, 2009, 4:16:37 PM

Rajinder Yadav

Oct 14, 2009, 9:06:25 PM

Brian Candler wrote:
> Rajinder Yadav wrote:
>> Can someone explain what's going on when dup is called. What gets passes
>> to dup
>> (i assume self) and why my code is wrong?
>
> dup is a method of your existing object, and should return the new
> object instance.
>
> class A
> attr_accessor :name
> def dup
> res = self.class.new
> res.name = name.dup
> res
> end
> end

Brian, I am starting to see where I went wrong, this clears it up, thanks!

>
> a1 = A.new
> a1.name = "yoyoma"
> a2 = a1.dup
> a1.name.chop!
> puts a2.name
>
> There is a subtle distinction between 'dup' and 'clone' which I'll leave
> someone else to explain...


Rajinder Yadav

Oct 14, 2009, 9:12:04 PM

Cool, do a shallow clone and then do specialized deep cloning, I like this! I
did not get as far as you did on dup vs. clone and on not redefining dup; this
is news to me but something worth looking into.

> just so you can keep the existing semantics of clone as a shallow copy.
>
> I'm really not sure why there are 2 methods to create shallow copies
> in ruby and what all the differences are supposed to be. Other than
> not overriding dup(?), the only other difference between them that I
> can discover is that clone copies the metaclass of the object, whereas
> dup reverts the copy's metaclass to being just its class. I've been
> wondering about the difference between the 2 recently; I hope someone
> out there can provide some enlightenment on why there are 2 and what
> the differences are.
>
>

Rajinder Yadav

Oct 14, 2009, 9:39:54 PM

Thanks for the links and solutions Robert. This one got some good replies.

> Kind regards
>
> robert

Caleb Clausen

Oct 15, 2009, 2:07:19 PM

On 10/14/09, Robert Klemme <short...@googlemail.com> wrote:

> On 14.10.2009 19:03, Caleb Clausen wrote:
>
>> The documentation of Object#dup seems to suggest that subclasses
>> should not override dup, preferring to override clone instead.
>
> Where do you take that from? In the docs referenced below I cannot find
> anything like that. The only indication I can see is that #dup uses
> #initialize_copy and we should probably override that instead of #dup
> itself.

I'm looking at these 2 sentences:

In general, +clone+ and +dup+ may have different
semantics in descendent classes. While +clone+ is used to duplicate
an object, including its internal state, +dup+ typically uses the
class of the descendent object to create the new instance.

Frankly, I've never been real sure what this is supposed to mean, so
my reading may well be wrong. In fact, it probably is.

initialize_copy apparently is used by both dup and clone. You're
right, that should be defined (overridden?) instead of dup/clone
themselves. I rarely remember that.

> There are more differences namely in the area of frozen and tainted state.
>
> http://www.ruby-doc.org/core/classes/Object.html#M000351
> http://www.ruby-doc.org/core/classes/Object.html#M000352

Ah, yes. But only frozen state, not tainted.

Robert Klemme

Oct 15, 2009, 4:23:22 PM

On 10/15/2009 08:07 PM, Caleb Clausen wrote:
> On 10/14/09, Robert Klemme <short...@googlemail.com> wrote:
>> On 14.10.2009 19:03, Caleb Clausen wrote:
>>
>>> The documentation of Object#dup seems to suggest that subclasses
>>> should not override dup, preferring to override clone instead.
>> Where do you take that from? In the docs referenced below I cannot find
>> anything like that. The only indication I can see is that #dup uses
>> #initialize_copy and we should probably override that instead of #dup
>> itself.
>
> I'm looking at these 2 sentences:
>
> In general, +clone+ and +dup+ may have different
> semantics in descendent classes. While +clone+ is used to duplicate
> an object, including its internal state, +dup+ typically uses the
> class of the descendent object to create the new instance.
>
> Frankly, I've never been real sure what this is supposed to mean, so
> my reading may well be wrong. In fact, it probably is.
>
> initialize_copy apparently is used by both dup and clone. You're
> right, that should be defined (overridden?) instead of dup/clone
> themselves. I rarely remember that.

Me, too. :-) Just for reference:

irb(main):019:0> class X
irb(main):020:1> def initialize_copy(*a) p [self,a] end
irb(main):021:1> end
=> nil
irb(main):022:0> x=X.new
=> #<X:0x8670108>
irb(main):023:0> x.dup
[#<X:0x865d9cc>, [#<X:0x8670108>]]
=> #<X:0x865d9cc>
irb(main):024:0> x.clone
[#<X:0x864ddb0>, [#<X:0x8670108>]]
=> #<X:0x864ddb0>

And it's noteworthy that frozen state is only established _after_
initialize_copy has returned:

irb(main):025:0> class X
irb(main):026:1> def initialize_copy(old)
irb(main):027:2> @x=1
irb(main):028:2> end
irb(main):029:1> end
=> nil
irb(main):030:0> X.new.freeze.clone
=> #<X:0x86451b0 @x=1>

>> There are more differences namely in the area of frozen and tainted state.
>>
>> http://www.ruby-doc.org/core/classes/Object.html#M000351
>> http://www.ruby-doc.org/core/classes/Object.html#M000352
>
> Ah, yes. But only frozen state, not tainted.

Right you are.

Rajinder Yadav

Oct 15, 2009, 11:51:37 PM

Caleb, Robert, thanks for bringing this point home. I keep having to update my
Ruby notes =) .... let me try the initialize_copy way!

> irb(main):025:0> class X
> irb(main):026:1> def initialize_copy(old)
> irb(main):027:2> @x=1
> irb(main):028:2> end
> irb(main):029:1> end
> => nil
> irb(main):030:0> X.new.freeze.clone
> => #<X:0x86451b0 @x=1>
>
>>> There are more differences namely in the area of frozen and tainted
>>> state.
>>>
>>> http://www.ruby-doc.org/core/classes/Object.html#M000351
>>> http://www.ruby-doc.org/core/classes/Object.html#M000352
>>
>> Ah, yes. But only frozen state, not tainted.
>
> Right you are.
>
> Kind regards
>
> robert
>


Brian Candler

Oct 16, 2009, 4:03:38 AM

Caleb Clausen wrote:

> In general, +clone+ and +dup+ may have different
> semantics in descendent classes. While +clone+ is used to duplicate
> an object, including its internal state, +dup+ typically uses the
> class of the descendent object to create the new instance.
>
> Frankly, I've never been real sure what this is supposed to mean

I *think* what it means is: clone just copies all the instance
variables, whilst dup calls self.class.new().

It's quite common for initialize() to have all sorts of side effects,
creating new objects and so on. So you can expect dup to do all this,
whilst you can expect clone to create an identical object with all the
instance variables pointing at the same objects.

Nothing enforces that of course, so it's just a convention.

The only *real* differences I can see are:
- clone also copies the frozen state of the object
- clone makes a copy of the singleton class

(whereas in dup, by default the newly-created object has an empty
singleton class; it's assumed that if there are any methods to be added
to that, your own dup method will do that for you, possibly with the
assistance of your initialize method)

> initialize_copy apparently is used by both dup and clone. You're
> right, that should be defined (overridden?) instead of dup/clone
> themselves. I rarely remember that.

I'm not sure I agree with that. The *default* implementation of both dup
and clone does this, as it's the only reasonable thing for Object to do
without any knowledge of its subclasses. But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

Since I never use clone, it's a moot point for me as to what it should
do in a subclass.

Regards,

Brian.

lith

Oct 16, 2009, 7:11:49 AM

> But I think the spirit of dup
> described above is that dup defined in a subclass should initialize it
> using its constructor.

I'd understand the description in such a way that the user should override
neither #dup nor #clone but instead create an #initialize_copy method to
implement anything class-specific (including a non-shallow copy). Since
that method is called by #clone and #dup and the frozen/tainted state
could be easily reset, I personally still don't quite understand why
there are two methods.

Robert Klemme

Oct 16, 2009, 7:25:44 AM

2009/10/16 lith <mini...@gmail.com>:

>> But I think the spirit of dup
>> described above is that dup defined in a subclass should initialize it
>> using its constructor.

Brian, I disagree. The proper way is to implement #initialize_copy.
That way you can make sure you do not get aliasing effects even if
source and copy are frozen because in #initialize_copy frozen state is
not applied.

> I'd understand the description in such a way that user should
> override
> neither #dup not #clone but instead create a #initialize_copy method
> to
> implement anything class-specific (including a non-shallow copy).

Also for shallow copy in order to avoid aliasing! IMHO a proper setup
looks like this:

class A
  attr_reader :x
  attr_accessor :y

  def initialize
    @x = []
    @y = 10
  end

  def initialize_copy(source)
    super
    # p self
    @x = source.x.dup
  end
end

class B < A
  attr_accessor :z

  def initialize
    super()
    @z = {}
  end

  def initialize_copy(source)
    super
    @z = source.z.dup
  end
end

Note that the copy is initialized with the same set of references when
entering #initialize_copy so you need only deal with members that
could cause aliasing issues (unfrozen strings and collections for
example).
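
A usage sketch for this setup. The two classes are repeated (with the debugging comment dropped) only so the snippet runs on its own; the part to look at is the end, where changes to the original's collections no longer show up in the copy.

```ruby
class A
  attr_reader :x
  attr_accessor :y

  def initialize
    @x = []
    @y = 10
  end

  def initialize_copy(source)
    super
    @x = source.x.dup   # give the copy its own array
  end
end

class B < A
  attr_accessor :z

  def initialize
    super()
    @z = {}
  end

  def initialize_copy(source)
    super
    @z = source.z.dup   # give the copy its own hash
  end
end

b1 = B.new
b2 = b1.dup             # b1.clone goes through the same hook

b1.x << 1
b1.z[:k] = :v
b1.y = 20

p b2.x   # []
p b2.z   # {}
p b2.y   # 10
```

The integer in @y needs no special handling: the copy starts with the same reference, and assigning a new value on b1 does not affect b2.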

> Since
> that method is called by #clone and #dup and the frozen/tainted state
> could be easily reset, I personally still don't quite understand why
> there are two methods.

You cannot reset frozen state - for good reasons.

Caleb Clausen

Oct 16, 2009, 12:30:19 PM

On 10/16/09, Brian Candler <b.ca...@pobox.com> wrote:
> Caleb Clausen wrote:
>> In general, +clone+ and +dup+ may have different
>> semantics in descendent classes. While +clone+ is used to duplicate
>> an object, including its internal state, +dup+ typically uses the
>> class of the descendent object to create the new instance.
>>
>> Frankly, I've never been real sure what this is supposed to mean
>
> I *think* what it means is: clone just copies all the instance
> variables, whilst dup calls self.class.new().
>
> It's quite common for initialize() to have all sorts of side effects,
> creating new objects and so on. So you can expect dup to do all this,
> whilst you can expect clone to create an identical object with all the
> instance variables pointing at the same objects.

Object#dup does not call new; I think it's more like:
self.class.allocate.initialize_copy(self). See what happens here:

irb(main):001:0> class K
irb(main):002:1> def initialize
irb(main):003:2> p :initialize
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> k=K.new
:initialize
=> #<K:0xb7ce8ee0>
irb(main):008:0> k2=k.dup
=> #<K:0xb7ce0f38>

My reading of those 2 sentences I quoted has now changed. Now I
believe that all it's saying is that clone copies the singleton class
whereas dup reverts the copy to the object's original class. Tho I
still don't fully understand what 'internal state' is supposed to
mean. Are instance variables not part of the internal state? Yet both
dup and clone copy them.

>> initialize_copy apparently is used by both dup and clone. You're
>> right, that should be defined (overridden?) instead of dup/clone
>> themselves. I rarely remember that.
>
> I'm not sure I agree with that. The *default* implementation of both dup
> and clone does this, as it's the only reasonable thing for Object to do
> without any knowledge of its subclasses. But I think the spirit of dup
> described above is that dup defined in a subclass should initialize it
> using its constructor.
>
> Since I never use clone, it's a moot point for me as to what it should
> do in a subclass.

I never used to use clone either, til I discovered a case where I
needed to copy the singleton class. Now I'm of the opinion that one
should default to clone when a copy is needed, and fall back to dup
only when clone is unsuitable.

Rajinder Yadav

Oct 16, 2009, 3:36:46 PM

This is exactly the approach I am now following after the various discussions
and insights. I just leave dup and clone to keep their *default* behavior
intact, so they both end up calling initialize_copy and you don't get some
bizarre Frankenstein clone if you were to redefine dup or clone. I am thinking
about someone else using my code and what will cause less headache for them in
the end.

Rajinder Yadav

Oct 16, 2009, 3:45:22 PM

Robert Klemme wrote:
> 2009/10/16 lith <mini...@gmail.com>:
>>> But I think the spirit of dup
>>> described above is that dup defined in a subclass should initialize it
>>> using its constructor.
>
> Brian, I disagree. The proper way is to implement #initialize_copy.
> That way you can make sure you do not get aliasing effects even if
> source and copy are frozen because in #initialize_copy frozen state is
> not applied.
>
>> I'd understand the description in such a way that user should
>> override
>> neither #dup not #clone but instead create a #initialize_copy method
>> to
>> implement anything class-specific (including a non-shallow copy).
>
> Also for shallow copy in order to avoid aliasing! IMHO a proper setup
> looks like this:
>

Robert, I like this setup, thanks for the sample code to look over, just
discovered why adding 'super' is important, which was missing from my notes and
an oversight on my part.

Is it sufficient to call 'super' and not 'super source' if you are passing
stuff up the hierarchy construction chain?

I am going to conjecture 'super' ends up becoming 'super self', which makes sense
because the parent constructor doesn't care about subclass data members. Does
that make any sense to you?


Robert Klemme

Oct 16, 2009, 4:02:40 PM

On 10/16/2009 09:45 PM, Rajinder Yadav wrote:
> Robert Klemme wrote:
>> 2009/10/16 lith <mini...@gmail.com>:
>>>> But I think the spirit of dup
>>>> described above is that dup defined in a subclass should initialize it
>>>> using its constructor.
>> Brian, I disagree. The proper way is to implement #initialize_copy.
>> That way you can make sure you do not get aliasing effects even if
>> source and copy are frozen because in #initialize_copy frozen state is
>> not applied.
>>
>>> I'd understand the description in such a way that user should
>>> override
>>> neither #dup not #clone but instead create a #initialize_copy method
>>> to
>>> implement anything class-specific (including a non-shallow copy).
>> Also for shallow copy in order to avoid aliasing! IMHO a proper setup
>> looks like this:
>
> Robert, I like this setup, thanks for the sample code to look over, just
> discovered why adding 'super' is important, which was missing from my notes and
> an oversight on my part.
>
> It is sufficient to call 'super' and not 'super source'? if you are passing
> stuff up the hierarchy construction chain.

You seem to be mixing two things: super in #initialize and
#initialize_copy. In #initialize_copy you can simply write "super"
(without brackets) because that will make sure the argument list is
propagated. You can do this because #initialize_copy will always only
have one argument, the object that was duped / cloned.

In the constructor I explicitly wrote "super()" because the super class
#initialize does not have arguments and "super" will break as soon as
you add parameters to the sub class constructor. Of course, if you
change both classes in parallel you can stick with "super".

> I am going to conjecture 'super' ends up becoming 'super self', which make sense
> because the parent constructor don't care about sub class data members. Does
> that make any sense to you?

No. Neither for #initialize nor for #initialize_copy do you want self as
the argument to super.
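
The difference between a bare "super" and "super()" can be seen in a small sketch. Parent, Child and BrokenChild are made-up names for illustration only.

```ruby
class Parent
  def initialize
    @created = true
  end

  def initialize_copy(source)
    super
    @copied_from = source
  end
end

class Child < Parent
  attr_reader :label, :copied_from

  def initialize(label)
    super()          # explicit empty list: Parent#initialize takes no arguments
    @label = label
  end

  def initialize_copy(source)
    super            # bare super forwards `source` unchanged
    @label = source.label.dup
  end
end

c1 = Child.new("first")
c2 = c1.dup
puts c2.label                    # first
puts c2.copied_from.equal?(c1)   # true

# With a bare `super` in #initialize, the label is forwarded to
# Parent#initialize, which accepts no arguments.
class BrokenChild < Parent
  def initialize(label)
    super            # same as super(label)
    @label = label
  end
end

begin
  BrokenChild.new("oops")
rescue ArgumentError => e
  puts e.class                   # ArgumentError
end
```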

Rick DeNatale

Oct 17, 2009, 10:19:00 AM

On Fri, Oct 16, 2009 at 12:30 PM, Caleb Clausen <vik...@gmail.com> wrote:

> Object#dup does not call new; I think it's more like:
> self.class.allocate.initialize_copy(self). See what happens here:
>
> irb(main):001:0> class K
> irb(main):002:1> def initialize
> irb(main):003:2> p :initialize
> irb(main):004:2> end
> irb(main):005:1> end
> => nil
> irb(main):006:0> k=K.new
> :initialize
> => #<K:0xb7ce8ee0>
> irb(main):008:0> k2=k.dup
> => #<K:0xb7ce0f38>

And clone doesn't call initialize EITHER:

class A
  def initialize(iv)
    @iv = iv
    puts "initialize called"
  end

  def initialize_copy(arg)
    puts "initialize copy called, my iv is #{@iv}"
  end
end

puts "Creating original"
a = A.new(42)
puts "calling dup"
a1 = a.dup
puts "calling clone"
a2 = a.clone

outputs

Creating original
initialize called
calling dup
initialize copy called, my iv is 42
calling clone
initialize copy called, my iv is 42

If you look at the source code in object.c, it becomes apparent that
Object#dup and Object#clone do pretty much the same thing except for
propagating the frozen bit and singleton classes:

VALUE
rb_obj_clone(obj)
    VALUE obj;
{
    VALUE clone;

    if (rb_special_const_p(obj)) {
        rb_raise(rb_eTypeError, "can't clone %s", rb_obj_classname(obj));
    }
    clone = rb_obj_alloc(rb_obj_class(obj));
    RBASIC(clone)->klass = rb_singleton_class_clone(obj);
    RBASIC(clone)->flags = (RBASIC(obj)->flags | FL_TEST(clone, FL_TAINT)) & ~(FL_FREEZE|FL_FINALIZE);
    init_copy(clone, obj);
    RBASIC(clone)->flags |= RBASIC(obj)->flags & FL_FREEZE;

    return clone;
}

VALUE
rb_obj_dup(obj)
    VALUE obj;
{
    VALUE dup;

    if (rb_special_const_p(obj)) {
        rb_raise(rb_eTypeError, "can't dup %s", rb_obj_classname(obj));
    }
    dup = rb_obj_alloc(rb_obj_class(obj));
    init_copy(dup, obj);

    return dup;
}

static void
init_copy(dest, obj)
    VALUE dest, obj;
{
    if (OBJ_FROZEN(dest)) {
        rb_raise(rb_eTypeError, "[bug] frozen object (%s) allocated",
                 rb_obj_classname(dest));
    }
    RBASIC(dest)->flags &= ~(T_MASK|FL_EXIVAR);
    RBASIC(dest)->flags |= RBASIC(obj)->flags & (T_MASK|FL_EXIVAR|FL_TAINT);
    if (FL_TEST(obj, FL_EXIVAR)) {
        rb_copy_generic_ivar(dest, obj);
    }
    rb_gc_copy_finalizer(dest, obj);
    switch (TYPE(obj)) {
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (ROBJECT(dest)->iv_tbl) {
            st_free_table(ROBJECT(dest)->iv_tbl);
            ROBJECT(dest)->iv_tbl = 0;
        }
        if (ROBJECT(obj)->iv_tbl) {
            ROBJECT(dest)->iv_tbl = st_copy(ROBJECT(obj)->iv_tbl);
        }
    }
    rb_funcall(dest, id_init_copy, 1, obj);
}

This code is from 1.8.6 just cuz that's what I happened to grab.

In both cases the same subroutine is used to create the state of the
new object prior to calling initialize_copy, and that subroutine
basically allocates the new object, copies instance variables "under
the table" and then invokes initialize_copy; no initialize method is
ever called on the result object.

Which makes me think that the whole "+dup+ typically uses the class
of the descendent object to create the new instance" is meaningless,
or untrue. Probably this is a vestige of an older implementation.
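
The sequence described above can be imitated in plain Ruby. This is only a rough approximation of what the C code does (it ignores flags, finalizers and singleton classes), and ManualDup, manual_dup and Point are names invented for this sketch.

```ruby
module ManualDup
  # Rough Ruby-level imitation of Object#dup: allocate without calling
  # #initialize, copy the instance variable references, then hand over
  # to #initialize_copy.
  def manual_dup
    copy = self.class.allocate
    instance_variables.each do |ivar|
      copy.instance_variable_set(ivar, instance_variable_get(ivar))
    end
    copy.send(:initialize_copy, self)   # initialize_copy is private
    copy
  end
end

class Point
  include ManualDup
  attr_reader :coords

  def initialize(x, y)
    puts "initialize called"
    @coords = [x, y]
  end

  def initialize_copy(source)
    puts "initialize_copy called"
    super
  end
end

pt = Point.new(1, 2)                   # prints "initialize called"
copy = pt.manual_dup                   # prints "initialize_copy called" only
puts copy.coords.equal?(pt.coords)     # true: still a shallow copy
```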

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

Rick DeNatale

Oct 17, 2009, 10:35:50 AM

And after another experiment or two, it would appear that the depth of
the copy produced by either dup or clone is the same and depends
entirely on what initialize_copy does.

Robert Klemme

Oct 17, 2009, 11:10:04 AM

That's what I'd guess, too. Basically the documentation should state
for #dup and #clone something like this: "It is not normally necessary
to override this method in subclasses. Customization of copying is done
via method #initialize_copy."

Rajinder Yadav

Oct 17, 2009, 5:05:07 PM

Rick DeNatale wrote:
> On Fri, Oct 16, 2009 at 12:30 PM, Caleb Clausen <vik...@gmail.com> wrote:
>
>> Object#dup does not call new; I think it's more like:
>> self.class.allocate.initialize_copy(self). See what happens here:
>>
>> irb(main):001:0> class K
>> irb(main):002:1> def initialize
>> irb(main):003:2> p :initialize
>> irb(main):004:2> end
>> irb(main):005:1> end
>> => nil
>> irb(main):006:0> k=K.new
>> :initialize
>> => #<K:0xb7ce8ee0>
>> irb(main):008:0> k2=k.dup
>> => #<K:0xb7ce0f38>
>
> And clone doesn't call initialize EITHER:

I had made this assumption; otherwise it would have made more sense to
overload initialize to accept the source object that gets passed to
initialize_copy ... the code would be ugly, as you would need to do type checking
at runtime (if iv.class == A, using your sample) to execute the correct code.

My C++ copy-constructor mindset got in the way earlier. Ruby doesn't quite
do what a C++ developer would expect; initialize_copy is a cleaner way to do
this in the absence of static type checking.

thanks for the sample code to validate this point


Brian Candler

Oct 19, 2009, 4:32:29 AM

Robert Klemme wrote:
> 2009/10/16 lith <mini...@gmail.com>:
>>> But I think the spirit of dup
>>> described above is that dup defined in a subclass should initialize it
>>> using its constructor.
>
> Brian, I disagree. The proper way is to implement #initialize_copy.
> That way you can make sure you do not get aliasing effects even if
> source and copy are frozen because in #initialize_copy frozen state is
> not applied.

I don't understand what you mean by that. If #dup calls self.class.new
then you obviously get a new and hence unfrozen object.

It is certainly true that the *default* implementation of both #dup and
#clone (defined in Object) calls initialize_copy. A generic #dup must
behave this way; it doesn't know what the new() method arguments are in
any particular subclass of Object. I don't think this should be taken as
necessarily implying that you are expected to leave #dup alone in your
own classes, and only override #initialize_copy instead.

The way I read the documentation implies to me that #dup in user defined
classes *should* call new. Silly example:

class NewsReader
  def initialize(url, state_filename)
    @url = url
    @http_client = HTTPClient.new(@url)
    @state_filename = state_filename
    @state_file = File.open(@state_filename)
  end
  def dup
    self.class.new(@url, @state_filename.dup)
  end
end

Here the logic of how to build a NewsReader, including building all the
associated helper objects, is built into the #initialize method. I don't
think you would want to duplicate all this logic in #initialize_copy.
Furthermore, I think I would expect #clone only to copy the top object,
and leave all the instance variables aliased.

Obviously there are no hard-and-fast rules here, and with Ruby there are
many ways to achieve the same goal.

I'd certainly agree this is an area where Ruby's documentation falls
short.

Taking another example: I don't think you'll disagree that 99% of the
time you are expected to leave Object.new alone and instead define
#initialize in your own classes. But you wouldn't find that out from the
documentation:

$ ri Object.new
------------------------------------------------------------ Object::new
Object::new()
------------------------------------------------------------------------
Not documented

$ ri Object#initialize
Nothing known about Object#initialize

Robert Klemme

Oct 19, 2009, 11:07:22 AM

Brian, the approach shown above does not work well with subclasses. The
code attempts to be safe with regard to inheritance (by doing
self.class.new instead of NewsReader.new) but it will fail miserably as
soon as a sub class constructor has a different argument list (which is
not too uncommon).
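
A small sketch of that failure mode. Reader and AuthReader are hypothetical stand-ins for NewsReader and a subclass of it, stripped of the HTTP and file handling so the snippet runs on its own.

```ruby
class Reader
  attr_reader :url

  def initialize(url)
    @url = url
  end

  # Constructor-based copy, in the style of the NewsReader example.
  def dup
    self.class.new(@url.dup)
  end
end

# A subclass whose constructor takes a different argument list.
class AuthReader < Reader
  def initialize(url, user, password)
    super(url)
    @user = user
    @password = password
  end
end

r = Reader.new("http://example.org/feed")
puts r.dup.url                    # http://example.org/feed

begin
  # The inherited #dup calls AuthReader.new with a single argument.
  AuthReader.new("http://example.org/feed", "u", "p").dup
rescue ArgumentError => e
  puts "dup failed: #{e.class}"   # dup failed: ArgumentError
end
```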

I completely agree with Rick here: the comment in Object#dup is probably
outdated. The most reasonable way to customize object cloning *and*
dupping is to implement #initialize_copy in a way to at least ensure no
aliasing of unfrozen members takes place.

> I don't
> think you would want to duplicate all this logic in #initialize_copy.

You would not duplicate the logic from #initialize in #initialize_copy
because #initialize_copy does a completely different job: it copies
state of an instance which is known to be consistent and just needs to
ensure that aliasing of object references does not break your class
invariants later accidentally. This is the reason why in
#initialize_copy different logic should be applied - even for shallow
copies! Method #initialize OTOH needs to work with its arguments which
were provided from the outside (outside of this class that is) and may
not meet expectations or valid ranges.

> Furthermore, I think I would expect #clone only to copy the top object,
> and leave all the instance variables aliased.

As far as I can see both #clone and #dup are meant to do shallow copies
but I may be wrong here. At least this is what the contract of Object
promises and I tend to be cautious about changing such things. Even if
you redefine semantics to being deep copy for certain classes then
implementing it in #initialize_copy is superior to other approaches IMHO.

> Obviously there are no hard-and-fast rules here, and with Ruby there are
> many ways to achieve the same goal.

That's true. But I would say at least when considering inheritance some
ways are better than others. In fact I have been doing self.class.new
most of the time in #dup because I completely forgot about
#initialize_copy. But I will certainly change that habit from now on.

> I'd certainly agree this is an area where Ruby's documentation falls
> short.

Right.

> Taking another example: I don't think you'll disagree that 99% of the
> time you are expected to leave Object.new alone and instead define
> #initialize in your own classes. But you wouldn't find that out from the
> documentation:
>
> $ ri Object.new
> ------------------------------------------------------------ Object::new
> Object::new()
> ------------------------------------------------------------------------
> Not documented
>
> $ ri Object#initialize
> Nothing known about Object#initialize

Funny that you mention it: #new and #initialize on one side and #dup /
#clone and #initialize_copy on the other side have one thing in common:
object allocation is separated from initialization. I believe this was
a wise decision because that way allocation policies can be implemented
easier than in languages like C++ and Java where both are inseparable.

For example, you can add your own #deep_dup to the language:

class Object
def deep_dup
cp = self.class.allocate

instance_variables.each do |var|
cp.instance_variable_set(instance_variable_get(var))
end

cp.initialize_deep_copy(self)

cp
end

def initialize_deep_copy(source)
# nothing to do here
end
end

class String
  def initialize_deep_copy(source)
    replace source
  end
end

# note this implementation is not robust against
# cycles in the object graph!
class Array
  def initialize_deep_copy(source)
    source.each do |y|
      self << y.deep_dup
    end
  end
end

a = %w{foo bar baz}
b = a.dup
b[2].replace "CHANGED"

p a, b

a = %w{foo bar baz}
b = a.deep_dup
b[2].replace "CHANGED"

p a, b
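One possible way to make such a deep copy robust against cycles is to remember which originals have already been copied. The following is only a sketch under that assumption: deep_copy is a hypothetical stand-alone helper (not part of Ruby and not the #deep_dup above), and it handles just String, Array and Hash.

```ruby
# Hypothetical helper: deep copy that survives cycles by tracking,
# by object identity, which originals already have a copy.
def deep_copy(obj, seen = {}.compare_by_identity)
  return seen[obj] if seen.key?(obj)

  case obj
  when String
    seen[obj] = obj.dup
  when Array
    # register the copy before descending, so a self-reference finds it
    copy = seen[obj] = []
    obj.each { |e| copy << deep_copy(e, seen) }
    copy
  when Hash
    copy = seen[obj] = {}
    obj.each { |k, v| copy[deep_copy(k, seen)] = deep_copy(v, seen) }
    copy
  else
    # Immediates (Integer, Symbol, nil, ...) and any other class are
    # returned as-is in this sketch.
    obj
  end
end

a = %w{foo bar}
a << a                    # the array now contains itself
b = deep_copy(a)
b[0].replace "CHANGED"

p a[0]                    # "foo"
p b[0]                    # "CHANGED"
p b[2].equal?(b)          # true: the cycle points at the copy
```

The important detail is that the copy is registered in seen before its elements are visited; otherwise the self-reference would recurse forever.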

Rajinder Yadav

Oct 19, 2009, 12:14:57 PM

On Mon, Oct 19, 2009 at 11:10 AM, Robert Klemme
<short...@googlemail.com> wrote:
> On 10/19/2009 10:32 AM, Brian Candler wrote:
>>
>> Robert Klemme wrote:
>>>
>>> 2009/10/16 lith <mini...@gmail.com>:
>>>>>

>> Taking another example: I don't think you'll disagree that 99% of the time
>> you are expected to leave Object.new alone and instead define #initialize in
>> your own classes. But you wouldn't find that out from the documentation:
>>
>> $ ri Object.new
>> ------------------------------------------------------------ Object::new
>> Object::new()
>> ------------------------------------------------------------------------
>> Not documented
>>
>> $ ri Object#initialize
>> Nothing known about Object#initialize
>
> Funny that you mention it: #new and #initialize on one side and #dup /
> #clone and #initialize_copy on the other side have one thing in common:
> object allocation is separated from initialization. I believe this was a
> wise decision because that way allocation policies can be implemented more easily
> than in languages like C++ and Java where both are inseparable.
>

As I mentioned already, I wonder whether this design has more to do
with the fact that Ruby does not perform static type checking at
compile time the way C++ and Java do. In C++ you just declare a copy
constructor (the counterpart of initialize). If you have other
(overloaded) constructors, static type checking ensures the correct
one is executed, which lets you write a cleaner clone method. In Ruby,
if initialize were called during cloning, you would need to add logic
to perform the type check dynamically using Object#class. Who would
want to write this boilerplate code over and over? So I am going to
assume that Ruby's way around this was to use initialize_copy.

I think cloning in the initializer code would be a better design if
Ruby did static type checking. The fact Ruby still does (dynamic) type
checking at runtime, means Ruby code gets penalized for performance.

It seems the way Ruby does dup/clone/initialize/initialize_copy (throw
in subclassing) is a source of confusion for many and not really
intuitive, regardless of whether the design is good or bad. The length
of this thread would seem to indicate this is a weakness in Ruby's
design, or am I simply biased by my C++ background? Better, updated
documentation would definitely help establish the correct policy to
follow in Ruby.

>
> Kind regards
>
> robert
>
> --
> remember.guy do |as, often| as.you_can - without end
> http://blog.rubybestpractices.com/

--
Kind Regards,
Rajinder Yadav

http://devmentor.org
Do Good! ~ Share Freely

Robert Klemme

Oct 20, 2009, 3:45:13 PM

C++'s copy constructor is not a "clone method". For example, a base
class copy constructor will happily "clone" any subclass instance into a
plain base class object (slicing). Cloning typically ensures that at
least the class of the new instance is the same as that of the original.

Static typing is just one reason why Ruby and C++ differ here: another
important reason is the memory model of both languages. In Ruby you
only have object references which can only be copied by value. In C++
on the other hand you have a whole toolbox of options (value objects,
pointers, references - plus constant variants). You can see that when
looking at Java: it has static typing but just one way to access objects
- via references. This is the same model as in Ruby and, alas, Java also
has a method clone() which behaves similarly (although the programming
model is different), i.e. it creates a new instance of the same class
with all members set to the same references as the original.
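A small sketch of what this means on the Ruby side, assuming two made-up classes Base and Derived: the default #dup yields a new object of the same class, but its instance variables still refer to the very same objects as the original.

```ruby
class Base
  attr_accessor :name
end

class Derived < Base
end

d1 = Derived.new
d1.name = "yoyoma"
d2 = d1.dup

p d2.class                  # Derived: the class is preserved
p d2.equal?(d1)             # false: a new object was allocated
p d2.name.equal?(d1.name)   # true: both share one String
```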

Side note: I find Java's cloning is broken in several ways. If you want
to make a class Cloneable you can only use "final" for primitive value
members because otherwise you cannot prevent aliasing between old and
new instance. Then, interface Cloneable does not declare method clone(),
so the compiler cannot catch a missing public method clone().
Lastly, I would have preferred the return type to be generic; although
I do have to admit that I did not think this through completely. I
guess Sun's engineers had good reasons not to change this.

> In Ruby,
> if initialize were called during cloning, you would need to add logic
> to perform the type check dynamically using Object#class.

In other words: you would have to manually implement method overloading.
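To illustrate, here is a sketch of what such a hand-written "overload" might look like if #initialize had to double as a copy constructor. Point is a hypothetical class and the argument checks are just one possible way to dispatch:

```ruby
# Hypothetical class: #initialize has to tell "construct from values"
# apart from "construct from another Point" by inspecting its arguments.
class Point
  attr_reader :x, :y

  def initialize(*args)
    if args.size == 1 && args.first.is_a?(Point)
      other = args.first
      @x, @y = other.x, other.y      # the "copy constructor" branch
    elsif args.size == 2
      @x, @y = args                  # the ordinary constructor branch
    else
      raise ArgumentError, "expected (x, y) or (point)"
    end
  end
end

p1 = Point.new(1, 2)
p2 = Point.new(p1)
p [p2.x, p2.y]   # [1, 2]
```

Every class would need a branch like this, which is the boilerplate that a separate #initialize_copy hook avoids.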

> Who would want to write this boilerplate code over and over? So I am
> going to assume that Ruby's way around this was to use
> initialize_copy.

You make it sound like a workaround but it isn't. For a language like
Ruby this is a good solution - and compared to Java's it's almost
perfect. It just lacks the public recognition. :-)

> I think cloning in the initializer code would be a better design if
> Ruby did static type checking. The fact Ruby still does (dynamic) type
> checking at runtime, means Ruby code gets penalized for performance.

I don't follow you here. If you want a language with static type
checking you'll have to look elsewhere. We don't have static type
checking in Ruby - in fact dynamic typing is one of the core assets of
the language.
Ruby with static typing would not be Ruby. Reasoning about which
approach would be best if Ruby had static typing is pretty useless.

> It seems the way Ruby does dup/clone/initialize/initialize_copy (throw
> in subclassing) is a source of confusion for many and not really
> intuitive, regardless of whether the design is good or bad. The length
> of this thread would seem to indicate this is a weakness in Ruby's
> design, or am I simply biased by my C++ background? Better, updated
> documentation would definitely help establish the correct policy to
> follow in Ruby.

I would attribute this confusion to the documentation and to the fact
that this is a rare topic to come up. I cannot remember a "how to
properly clone objects" thread in recent years that would have covered
the topic as thoroughly as we did here.

I don't think we are facing a weakness in Ruby's design here. C++
cannot be a role model for Ruby (regardless of whether you consider
C++'s approach good or bad) because both languages are very different as
I have tried to show above. It may be that your "C++ background" clouds
your view on Ruby. :-)

Thanks for the interesting discussion!

lith

Oct 20, 2009, 4:53:07 PM

> Side note: I find Java's cloning is broken in several ways.

You're not alone:
http://www.artima.com/intv/bloch13.html

Robert Klemme

Oct 20, 2009, 5:51:27 PM

He must have copied it from me. :-)

Seriously, although I agree with almost everything he says I would like to
add that cloning (done properly, for example as done in Ruby) does have
advantages over copy construction as well (just to name the most
prominent one: you do not need to know the class of the object to
clone). In fact, they are two different concepts and sometimes one is
more appropriate and sometimes the other one.
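A short sketch of that advantage, assuming two unrelated made-up classes Circle and Square: the code doing the copying never names a class, yet each copy comes out with the class of its original.

```ruby
class Circle
  attr_accessor :radius

  def initialize(radius)
    @radius = radius
  end
end

class Square
  attr_accessor :side

  def initialize(side)
    @side = side
  end
end

shapes = [Circle.new(1), Square.new(2)]

# No class is mentioned here; #dup picks the right one per object.
copies = shapes.map { |s| s.dup }

p copies.map { |c| c.class }   # [Circle, Square]
```

With copy construction the caller would have to write Circle.new(...) or Square.new(...) and therefore know the concrete class of each element.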

Cheers