JRuby disabling ObjectSpace: what implications?

Charles Oliver Nutter

Oct 28, 2007, 2:53:24 AM

As some of you may have heard, we're considering disabling
ObjectSpace.each_object by default in JRuby. Primarily, this is for
performance; to support each_object, we have to bend over backwards,
maintaining lists of weak references to all objects in the system and
periodically cleaning out those lists. Here's some example performance,
from a fractal benchmark in the JRuby source:

With ObjectSpace: Ruby Elapsed 45.967000
Without ObjectSpace: Ruby Elapsed 4.280000

What's most frustrating about this is that almost *no* libraries or apps
use each_object, and it's a terrible performance hit for us.

The one really visible use of each_object is in test/unit, where the
default console-based runner does each_object(Class) to find all
subclasses of TestCase. Because this is a heavily-used library (to say
the least), I've made modifications to JRuby to always support
each_object(Class) by maintaining a bidirectional graph of parent and
child classes. So that much wouldn't go away (but I'd prefer an
implementation that uses Class#inherited, since it would be cleaner,
faster, and deterministic).
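Here is a minimal sketch of the Class#inherited approach, written in Ruby for illustration. FakeTestCase, TC_Parser and TC_StrictParser are hypothetical names standing in for Test::Unit::TestCase and user test classes. This is not JRuby's or test/unit's actual code.

```ruby
# Hypothetical stand-in for Test::Unit::TestCase.
class FakeTestCase
  # Test classes normally live for the whole run, so strong references
  # are acceptable here: no weak references and no heap scan needed.
  SUBCLASSES = []

  # Ruby calls this hook whenever a class inherits from FakeTestCase,
  # directly or indirectly, so the list is complete and ordered by
  # definition time.
  def self.inherited(subclass)
    super
    SUBCLASSES << subclass
  end
end

class TC_Parser < FakeTestCase; end
class TC_StrictParser < TC_Parser; end

p FakeTestCase::SUBCLASSES  # prints [TC_Parser, TC_StrictParser]
```

Because the hook fires at class-definition time, the collected list does not depend on when, or whether, the garbage collector has run.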

So...I'm writing this to see what the general Ruby world thinks of us
having ObjectSpace disabled by default, enableable via a command line
option (or perhaps through a library? -robjectspace?).

I think more and more of you may want to give JRuby another look over
the next few months, so I think we need to involve you in such decisions.

- Charlie

Bill Kelly

Oct 28, 2007, 2:59:19 AM

From: "Charles Oliver Nutter" <charles...@sun.com>

>
> As some of you may have heard, we're considering disabling
> ObjectSpace.each_object by default in JRuby. Primarily, this is for
> performance; to support each_object, we have to bend over backwards,
> maintaining lists of weak references to all objects in the system and
> periodically cleaning out those lists.

Is this also true for ObjectSpace#_id2ref ?

Regards,

Bill

Charles Oliver Nutter

Oct 28, 2007, 3:06:32 AM

ara.t.howard wrote:
> hmmm. ok i'm brainstorming here which you can ignore if you like as i
> know less than nothing about jvms or implementing ruby but here goes:
> what if you could invert the problem? what if objects knew about the
> global ObjectSpaceThang and could be forced to register themselves on
> demand somehow? without a reference i've no idea how, just throwing
> that out there. or, another stupid idea, what if the objects themselves
> were the tree/graph of weak references parent -> children. crawling it
> would be, um, fun - but you could prune dead objects *only* when walking
> the graph. this should be possible in ruby since you always have the
> notion of a parent object - which is Object - so all objects should be
> either reachable or leaks. now back to drinking my regularly scheduled
> beer...

Continuing this discussion here...

Please, continue to brainstorm. I don't claim to have thought out every
aspect of this problem or every possible solution. I'd *love* to
discover I've missed an obvious fix.

Your idea has come up in the past, and it would probably eliminate the
cost of an ObjectSpace list. However that doesn't appear to be where we
pay the highest cost.

The two items that (we believe) cost the most for us on the JVM are:

- Constructing an extra object for every Ruby object...namely, the
WeakReference object to point to it. So we pay a
memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks,
so it can notify the WeakReference that the object it points at has gone
away. So that slows down the legendary HotSpot GC and we pay again.
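To make those costs concrete, here is a rough Ruby sketch of a weak-reference registry. WeakRegistry is a hypothetical class built on the weakref standard library. JRuby's real implementation is Java code using Java WeakReference objects, so treat this only as an illustration of the shape of the problem.

```ruby
require 'weakref'

# Hypothetical registry mimicking the "list of weak references" approach.
class WeakRegistry
  def initialize
    @refs = []
  end

  # One extra wrapper object is allocated for every tracked object.
  def track(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  # Yield only objects that are still alive when we reach them.
  def each_live
    @refs.each do |ref|
      begin
        obj = ref.__getobj__  # strong reference held while yielding
      rescue WeakRef::RefError
        next                  # already collected: skip it
      end
      yield obj
    end
  end

  # Periodic cleanup: dead wrappers linger until the list is swept.
  def prune!
    @refs.select!(&:weakref_alive?)
    @refs.size
  end
end
```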

I believe the parent -> weakref -> children algorithm is used in some
implementations of ObjectSpace-like behavior, so it's perfectly valid.
But again, there are certain aspects of ObjectSpace that are just
problematic...

- threading or concurrency of any kind? No, you can't have
multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace doesn't have to be
deterministic"...but when it starts getting wired into libraries like
test/unit, it seems like people expect it to be. If we can say OS isn't
deterministic, then *nobody* should be relying on its contents for core
libraries, and we could reasonably claim that each_object will never
return *anything*.

- Charlie

Charles Oliver Nutter

Oct 28, 2007, 3:16:25 AM

Not directly. _id2ref is handled in a similar way, but we have an event
we can trigger off to start tracking an object; namely, Object#id.

When you request an id, we start tracking that object for purposes of
_id2ref. Not until. So that would not be affected by disabling ObjectSpace.

In actuality, however, _id2ref is primarily used for things like weak
references, so you can hold a virtual reference to an object without
preventing it from being collected. We could provide an implementation
of Ruby's weak references using Java's weak references that would allow
us to escape _id2ref entirely for that use case.

Are there other places _id2ref is used?

- Charlie

Bill Kelly

Oct 28, 2007, 3:59:20 AM

From: "Charles Oliver Nutter" <charles...@sun.com>

> Bill Kelly wrote:
>>
>> Is this also true for ObjectSpace#_id2ref ?
>
> Not directly. _id2ref is handled in a similar way, but we have an event
> we can trigger off to start tracking an object; namely, Object#id.
>
> When you request an id, we start tracking that object for purposes of
> _id2ref. Not until. So that would not be affected by disabling ObjectSpace.

I see, thanks. Nifty. :)

> In actuality, however, _id2ref is primarily used for things like weak
> references, so you can hold a virtual reference to an object without
> preventing it from being collected. We could provide an implementation
> of Ruby's weak references using Java's weak references that would allow
> us to escape _id2ref entirely for that use case.
>
> Are there other places _id2ref is used?

I think I've used _id2ref exactly twice. I can't recall the first
usage; I don't think it made it into production code. The most
recent use was to store some ruby object ids in a separate C++
process, which was able to fire an event back to ruby and provide
the object id for the object to receive the event.

(I suppose DRb might do something similar?)

Regards,

Bill

Nobuyoshi Nakada

Oct 28, 2007, 8:25:25 AM

Hi,

At Sun, 28 Oct 2007 16:16:25 +0900,
Charles Oliver Nutter wrote in [ruby-talk:276236]:

> Are there other places _id2ref is used?

drb.

--
Nobu Nakada

Robert Klemme

Oct 28, 2007, 9:06:50 AM

On 28.10.2007 08:06, Charles Oliver Nutter wrote:
> ara.t.howard wrote:
> > hmmm. ok i'm brainstorming here which you can ignore if you like as i
> > know less than nothing about jvms or implementing ruby but here goes:
> > what if you could invert the problem? what if objects knew about the
> > global ObjectSpaceThang and could be forced to register themselves on
> > demand somehow? without a reference i've no idea how, just throwing
> > that out there. or, another stupid idea, what if the objects themselves
> > were the tree/graph of weak references parent -> children. crawling it
> > would be, um, fun - but you could prune dead objects *only* when walking
> > the graph. this should be possible in ruby since you always have the
> > notion of a parent object - which is Object - so all objects should be
> > either reachable or leaks. now back to drinking my regularly scheduled
> > beer...
>
>
> Continuing this discussion here...
>
> Please, continue to brainstorm. I don't claim to have thought out every
> aspect of this problem or every possible solution. I'd *love* to
> discover I've missed an obvious fix.

IMHO ObjectSpace should not be implemented in Java land. Why? The JVM
has to keep track of instances anyway and implementing this in Java via
WeakReferences seems to duplicate functionality that is already there.
Did you consider using "Java Virtual Machine Tools Interface"?

http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/gbmmt.html#gbmls

You could either follow the same approach of the heapTracker presented
on that page and use a flag or require a lib that enables ObjectSpace
(because of the overhead of instrumentation).

Alternatively there may be another method that does not need
instrumentation and that can give you access to every (reachable) object
in the JVM.

> Your idea has come up in the past, and it would probably eliminate the
> cost of an ObjectSpace list. However that doesn't appear to be where we
> pay the highest cost.
>
> The two items that (we believe) cost the most for us on the JVM are:
>
> - Constructing an extra object for every Ruby object...namely, the
> WeakReference object to point to it. So we pay a
> memory/allocation/initialization cost.
> - WeakReference itself causes Java's GC to have to do additional checks,
> so it can notify the WeakReference that the object it points at has gone
> away. So that slows down the legendary HotSpot GC and we pay again.
>
> I believe the parent -> weakref -> children algorithm is used in some
> implementations of ObjectSpace-like behavior, so it's perfectly valid.
> But again, there are certain aspects of ObjectSpace that are just
> problematic...
>
> - threading or concurrency of any kind? No, you can't have
> multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
> potentially excludes other advanced GC designs too).
> - determinism? Matz told me that "ObjectSpace doesn't have to be
> deterministic"...but when it starts getting wired into libraries like
> test/unit, it seems like people expect it to be. If we can say OS isn't
> deterministic, then *nobody* should be relying on its contents for core
> libraries, and we could reasonably claim that each_object will never
> return *anything*.

I'd reformulate the requirement here: ObjectSpace.each_object must yield
every object that was existent before the invocation and that is
strongly reachable. I believe for the typical use case (e.g. traversing
all class instances) this is enough while leaving enough flexibility for
the implementation (i.e. create a snapshot of some form, iterate through
some internal structure that may change due to new objects being created
during #each_object etc.).
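A sketch of the snapshot semantics proposed above, assuming a hypothetical SnapshotRegistry built on the weakref library. It makes one deliberate trade-off: because the snapshot holds strong references, an object that becomes unreachable during the walk is still yielded, which is slightly weaker than "strongly reachable".

```ruby
require 'weakref'

# Hypothetical registry: each_object yields objects that were alive when
# the call began; objects created while the block runs are not visited.
class SnapshotRegistry
  def initialize
    @refs = []
    @lock = Mutex.new
  end

  def track(obj)
    @lock.synchronize { @refs << WeakRef.new(obj) }
    obj
  end

  # Returns the number of objects yielded, like ObjectSpace.each_object.
  def each_object
    snapshot = []
    # Phase 1: under the lock, turn every still-alive weak reference into
    # a strong one. The local array keeps these objects reachable for the
    # rest of the iteration, so none can vanish mid-walk.
    @lock.synchronize do
      @refs.each do |ref|
        begin
          snapshot << ref.__getobj__
        rescue WeakRef::RefError
          # collected before the snapshot was taken: skip it
        end
      end
    end
    # Phase 2: iterate without the lock, so the block may create and track
    # new objects; they land in @refs but not in this snapshot.
    snapshot.each { |obj| yield obj }
    snapshot.size
  end
end
```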

Kind regards

robert

Daniel Berger

Oct 28, 2007, 9:13:38 AM

On Oct 28, 12:53 am, Charles Oliver Nutter <charles.nut...@sun.com>
wrote:
<snip>

> So...I'm writing this to see what the general Ruby world thinks of us
> having ObjectSpace disabled by default, enableable via a command line
> option (or perhaps through a library? -robjectspace?).

ext\common\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
ext\dl\test\test.rb:187: ObjectSpace.define_finalizer(fp) {File.unlink("tmp.txt")}
ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|obj|
ext\Win32API\lib\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
lib\cgi\session.rb:299: ObjectSpace::define_finalizer(self, Session::callback(@dbprot))
lib\drb\drb.rb:337:# object's ObjectSpace id as its dRuby id. This means that the dRuby
lib\drb\drb.rb:361: # This, the default implementation, uses an object's local ObjectSpace
lib\drb\drb.rb:375: ObjectSpace._id2ref(ref)
lib\finalize.rb:59: ObjectSpace.call_finalizer(obj)
lib\finalize.rb:169: ObjectSpace.remove_finalizer(@proc)
lib\finalize.rb:173: ObjectSpace.add_finalizer(@proc)
lib\finalize.rb:180: # registering function to ObjectSpace#add_finalizer
lib\finalize.rb:192: ObjectSpace.add_finalizer(@proc)
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|
lib\irb\ext\save-history.rb:69: ObjectSpace.define_finalizer(obj, HistorySavingAbility.create_finalizer)
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do |io|
lib\singleton.rb:23:# ObjectSpace.each_object(OtherKlass){} # => 0.
lib\singleton.rb:190: "#{ObjectSpace.each_object(klass){}} #{klass} instance(s)"
lib\tempfile.rb:53: ObjectSpace.define_finalizer(self, @clean_proc)
lib\tempfile.rb:105: ObjectSpace.undefine_finalizer(self)
lib\tempfile.rb:118: ObjectSpace.undefine_finalizer(self)
lib\test\unit\autorunner.rb:17: ObjectSpace.each_object(Class) do |klass|
lib\test\unit\autorunner.rb:54: :objectspace => proc do |r|
lib\test\unit\autorunner.rb:55: require 'test/unit/collector/objectspace'
lib\test\unit\autorunner.rb:56: c = Collector::ObjectSpace.new
lib\test\unit\autorunner.rb:80: @collector = COLLECTORS[(standalone ? :dir : :objectspace)]
lib\test\unit\collector\dir.rb:13: def initialize(dir=::Dir, file=::File, object_space=::ObjectSpace, req=nil)
lib\test\unit\collector\objectspace.rb:10: class ObjectSpace
lib\test\unit\collector\objectspace.rb:13: NAME = 'collected from the ObjectSpace'
lib\test\unit\collector\objectspace.rb:15: def initialize(source=::ObjectSpace)
lib\test\unit.rb:252: # the ObjectSpace and wrap them up into a suite for you. It then runs
lib\weakref.rb:16:# ObjectSpace.garbage_collect
lib\weakref.rb:62: ObjectSpace._id2ref(@__id)
lib\weakref.rb:74: ObjectSpace.define_finalizer obj, @@final
lib\weakref.rb:75: ObjectSpace.define_finalizer self, @@final
lib\weakref.rb:98: ObjectSpace.garbage_collect
test\dbm\test_dbm.rb:45: ObjectSpace.each_object(DBM) do |obj|
test\gdbm\test_gdbm.rb:42: ObjectSpace.each_object(GDBM) do |obj|
test\ruby\test_objectspace.rb:3:class TestObjectSpace < Test::Unit::TestCase
test\ruby\test_objectspace.rb:10: o = ObjectSpace._id2ref(obj.object_id);\
test\sdbm\test_sdbm.rb:15: ObjectSpace.each_object(SDBM) do |obj|
test\testunit\collector\test_dir.rb:62: class ObjectSpace
test\testunit\collector\test_dir.rb:81: @object_space = ObjectSpace.new
test\testunit\collector\test_objectspace.rb:6:require 'test/unit/collector/objectspace'
test\testunit\collector\test_objectspace.rb:11: class TC_ObjectSpace < TestCase
test\testunit\collector\test_objectspace.rb:41: @c = ObjectSpace.new(@object_space)
test\testunit\collector\test_objectspace.rb:44: def full_suite(name=ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:51: TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:83: expected = TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:89: expected = TestSuite.new(ObjectSpace::NAME)
test\yaml\test_yaml.rb:1279: ObjectSpace.each_object(Class) do |klass|

So, in summary, if we exclude those libraries where only tests are
affected, this would affect:

win32-registry
tk
cgi
drb
finalize
irb
shell
singleton
tempfile
test-unit
weakref

Some comments on each of these as they relate to JRuby:

win32-registry: You have no hope of implementing this without JNA
anyway, unless there's some Java binding I don't know about. Besides,
I couldn't tell you why on Earth win32-registry would need a
finalizer.

tk: No one will care. They'll use SWT or Swing bindings. Besides, you
would need JNA.

cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.

drb: This could be a big deal.

finalize: Did anyone even know about this? Does anyone use it?

irb: You've got jirb.

shell: This could be a problem.

singleton: Ditto.

tempfile: Meh, I'm guessing Java has its own library for temp files. I
never liked the current implementation anyway (which is why I wrote
file-temp).

test-unit: Already mentioned.

weakref: You've stated that Java has its own implementation.

Regards,

Dan

ara.t.howard

Oct 28, 2007, 10:27:19 AM

On Oct 28, 2007, at 1:16 AM, Charles Oliver Nutter wrote:

>
> Are there other places _id2ref is used?

i use it quite often as a way to have meta-programming 'storage'
without polluting instances:

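foo = instance_method :foo  # UnboundMethod, so #bind below works

# foo must stay referenced somewhere, or _id2ref may fail after it is collected
module_eval <<-code
  def foo(*a, &b)
    ObjectSpace._id2ref(#{ foo.object_id }).bind(self).call(*a, &b)
  end
code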

which is fabricated - but you get the concept: string in eval maps to
live object at run time. when #define_method takes a block this
won't be used much i think though...
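With a closure, the object never has to round-trip through an id at all. A minimal sketch, assuming Ruby 1.9 or later (where a block given to define_method can declare its own &block parameter). Greeter and greet are hypothetical names.

```ruby
class Greeter
  def greet(name)
    "hello, #{name}"
  end

  # Grab the original implementation as an UnboundMethod. The block below
  # closes over this local, which keeps it alive: no id lookup needed.
  original = instance_method(:greet)

  define_method(:greet) do |*args, &blk|
    original.bind(self).call(*args, &blk).upcase
  end
end

puts Greeter.new.greet("bob")  # prints HELLO, BOB
```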

cheers.

a @ http://codeforpeople.com/
--
it is not enough to be compassionate. you must act.
h.h. the 14th dalai lama

Charles Oliver Nutter

Oct 28, 2007, 12:10:00 PM

Bill Kelly wrote:
> I think I've used _id2ref exactly twice. I can't recall the first
> usage; I don't think it made it into production code. The most
> recent use was to store some ruby object ids in a separate C++
> process, which was able to fire an event back to ruby and provide
> the object id for the object to receive the event.
>
> (I suppose DRb might do something similar?)

Yeah, sounds like that's mostly a "poor man's remote hash". I'd expect
that just creating a hash specifically for that purpose and passing a
key around would be a "better" way to do it.
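A minimal sketch of that dedicated-hash alternative. HandleTable and its methods are hypothetical. The table holds a strong reference for as long as the external process holds the key, so a lookup cannot hit a collected object, and an unknown key raises KeyError rather than possibly resolving to whatever object now occupies a recycled id.

```ruby
# Hypothetical handle table for the "send an id to another process,
# get it back later" pattern, replacing ObjectSpace._id2ref.
class HandleTable
  def initialize
    @objects = {}
    @next_key = 0
    @lock = Mutex.new
  end

  # Store a strong reference and hand out a plain integer key.
  def register(obj)
    @lock.synchronize do
      key = (@next_key += 1)
      @objects[key] = obj
      key
    end
  end

  # Raises KeyError for keys that were never issued or already released.
  def lookup(key)
    @lock.synchronize { @objects.fetch(key) }
  end

  # Call when the external side is done with the key.
  def release(key)
    @lock.synchronize { @objects.delete(key) }
  end
end
```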

_id2ref is just another one of those features that gets rarely used, and
whose use cases can often be implemented in "better" ways.

- Charlie

Charles Oliver Nutter

Oct 28, 2007, 12:19:07 PM

Robert Klemme wrote:
> IMHO ObjectSpace should not be implemented in Java land. Why? The JVM
> has to keep track of instances anyway and implementing this in Java via
> WeakReferences seems to duplicate functionality that is already there.
> Did you consider using "Java Virtual Machine Tools Interface"?
>
> http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/gbmmt.html#gbmls
>
> You could either follow the same approach of the heapTracker presented
> on that page and use a flag or require a lib that enables ObjectSpace
> (because of the overhead of instrumentation).

You just hit on exactly why we don't use JVMTI for ObjectSpace. It would
certainly work, but it would add a lot of overhead we'd never expect
people to accept in a real application. Plus, it would track far more
object instances than we actually want tracked. We'd love to include a
JVMTI-based ObjectSpace implementation, however...it just hasn't been a
high priority to implement since 99% of users never actually need
ObjectSpace.

> Alternatively there may be another method that does not need
> instrumentation and that can give you access to every (reachable) object
> in the JVM.

If there is...we haven't found it. The "linked weakref list" has been
the least overhead so far, and it's still a lot of overhead.

The problem here is "strongly reachable". During ObjectSpace processing,
the last strong reference to an object may go away and the garbage
collector may run. Should ObjectSpace prevent GC from running if it's
traversed and now references that object? If not, how should it be
handled if immediately before you return an object from each_object, it
gets garbage collected? There's no way to catch that, so each_object may
end up returning a reference to an object that's gone away, or
reconstituting an object whose finalization has already fired. Bad
things happen.

ObjectSpace is just not compatible with any GC that requires the ability
to move objects around in memory, run in parallel, and so on. It can
*never* be deterministic unless it can "stop the world", so it should
not be used for algorithms that require any level of determinism, such
as the test search in test/unit.

- Charlie

Charles Oliver Nutter

Oct 28, 2007, 12:31:55 PM

Daniel Berger wrote:
> On Oct 28, 12:53 am, Charles Oliver Nutter <charles.nut...@sun.com>
> wrote:
> <snip>
>
>> So...I'm writing this to see what the general Ruby world thinks of us
>> having ObjectSpace disabled by default, enableable via a command line
>> option (or perhaps through a library? -robjectspace?).
>

> ext\common\win32\registry.rb:569: ObjectSpace.define_finalizer

Of these, only the following would be affected, since only each_object
would be disabled by default:

> tk: No one will care. They'll use SWT or Swing bindings. Besides, you
> would need JNA.

> ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|

Quite right, and there are currently no plans (or demand) for Tk support
in JRuby. Swing is a far better GUI API, especially when wrapped in Ruby.

> irb: You've got jirb.

> lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|

This could still be supported through a similar mechanism as
each_object(Class), by keeping a weak hash of all Module instances.

> shell: This could be a problem.

> lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do

I'd be surprised if shell worked 100% correctly right now anyway, due to
process-control requirements we can't support well on JVM. But I would
also expect this use of each_object to have a "better" implementation,
and if not it could again be a specific-purpose weak hash for IO streams
(which we almost have already since we want to be able to clean them up
on exit).
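A sketch of such a specific-purpose weak table, written in present-day Ruby for illustration. TrackInstances, Stream and each_live_instance are hypothetical names. ObjectSpace::WeakMap is a real class, but only in MRI 2.0 and later, well after this thread.

```ruby
# Hypothetical opt-in mixin: a class keeps a weak table of its own
# instances, so callers can ask that class directly instead of
# scanning the whole heap with each_object.
module TrackInstances
  def self.prepended(base)
    # ObjectSpace::WeakMap holds both keys and values weakly.
    base.instance_variable_set(:@live_instances, ObjectSpace::WeakMap.new)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def each_live_instance(&block)
      @live_instances.each_value(&block)
    end
  end

  def initialize(*args, &block)
    super
    # Only the class that prepended the module owns the table; a
    # subclass would need its own table or a lookup up the ancestry.
    self.class.instance_variable_get(:@live_instances)[self] = self
  end
end

class Stream
  prepend TrackInstances

  attr_reader :name

  def initialize(name)
    @name = name
  end
end
```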

> singleton: Ditto.

I'd have to look at this one. This could be another good candidate for
reimplementation in a lot less Java code; singleton support would be
pretty easy to write up in a few lines of Java.

> test-unit: Already mentioned.

So very few libraries would be affected, and I think all of them could
be dealt with in other ways. And to reiterate: finalizers and
_id2ref wouldn't be affected (though I'd prefer to find alternative
mechanisms for _id2ref).

- Charlie

Ken Bloom

Oct 28, 2007, 12:49:55 PM

I don't think they're making ObjectSpace go away. Just
ObjectSpace#each_object.

(I'm not a Jruby developer, so I don't trust the correctness of anything
I say.)

On Sun, 28 Oct 2007 22:13:38 +0900, Daniel Berger wrote:
> drb: This could be a big deal.

> weakref: You've stated that Java has its own implementation.

This uses _id2ref, which doesn't appear to be going away.

> cgi: This could be a problem. Then again, some people say this library
> should be refactored or tossed.

> finalize: Did anyone even know about this? Does anyone use it?

> tempfile: Meh, I'm guessing Java has its own library for temp files. I
> never liked the current implementation anyway (which is why I wrote
> file-temp).

Finalizers could be implemented using Java's finalize() method for
classes that need it. This method of implementing finalizers could
probably be compatibly exposed using ObjectSpace.
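A minimal sketch of the finalizer pattern that lib/tempfile.rb follows in the listing above: define_finalizer at creation, then undefine_finalizer on an explicit close. TempResource and its log array are hypothetical, and no file is actually touched.

```ruby
# Hypothetical resource using ObjectSpace finalizers the way tempfile does.
class TempResource
  # Build the finalizer in a class method so the proc does not capture
  # the instance. The proc receives the object's id, not the object.
  def self.cleanup_proc(path, log)
    proc { |_object_id| log << "finalized #{path}" }
  end

  def initialize(path, log)
    @path = path
    @log = log
    ObjectSpace.define_finalizer(self, self.class.cleanup_proc(path, log))
  end

  # Explicit close: clean up now and drop the finalizer so it cannot
  # fire a second time later.
  def close
    @log << "closed #{@path}"
    ObjectSpace.undefine_finalizer(self)
  end
end
```

The class-method construction is deliberate: a proc created inside initialize would capture self, keep the object reachable from the finalizer table, and the finalizer would not fire before program exit.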

> shell: This could be a problem.

This looks broken anyway since it uses fork.

> singleton: Ditto.

One is in a documentation comment, giving an example of a specific behavior
of the library. The other is in the same example, included executably
after an if __FILE__ == $0 condition. So no actual problem here.

> irb: You've got jirb.

jirb is (at its core) still the Ruby 1.8 IRB codebase. The example you
pointed out is class (Module) iteration, used for completion, but it's
more general iteration than test/unit uses, and #inherited techniques
that can be used for test/unit may not work here.

--Ken

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

Robert Klemme

Oct 28, 2007, 1:11:35 PM

On 28.10.2007 17:19, Charles Oliver Nutter wrote:
> Robert Klemme wrote:
>> IMHO ObjectSpace should not be implemented in Java land. Why? The
>> JVM has to keep track of instances anyway and implementing this in
>> Java via WeakReferences seems to duplicate functionality that is
>> already there. Did you consider using "Java Virtual Machine Tools
>> Interface"?
>>
>> http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/gbmmt.html#gbmls
>>
>>
>> You could either follow the same approach of the heapTracker presented
>> on that page and use a flag or require a lib that enables ObjectSpace
>> (because of the overhead of instrumentation).
>
> You just hit on exactly why we don't use JVMTI for ObjectSpace. It would
> certainly work, but it would add a lot of overhead we'd never expect
> people to accept in a real application. Plus, it would track far more
> object instances than we actually want tracked.

Why is that? I mean, you could selectively decide which instances to track.

> We'd love to include a
> JVMTI-based ObjectSpace implementation, however...it just hasn't been a
> high priority to implement since 99% of users never actually need
> ObjectSpace.
>
>> Alternatively there may be another method that does not need
>> instrumentation and that can give you access to every (reachable)
>> object in the JVM.
>
> If there is...we haven't found it. The "linked weakref list" has been
> the least overhead so far, and it's still a lot of overhead.

Hmm, but there are iteration methods like #each_object:
http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#Heap

Did you put them down because of the "stop the world" approach? I'd say
that would be ok - at least it's better than not having ObjectSpace.
And also, there would be no overhead. Question is only whether it's ok
to invoke arbitrary byte code (which would happen during the iteration
callback).

You are right: objects can "disappear" (i.e. lose their strong
reachability) during traversal. Obviously my suggested requirement was
still too strong.

> There's no way to catch that, so each_object may
> end up returning a reference to an object that's gone away, or
> reconstituting an object whose finalization has already fired. Bad
> things happen.

Recreation is a bad idea. I agree, objects that are no longer strongly
reachable at the moment they are about to be passed to the block should
*not* be passed.

> ObjectSpace is just not compatible with any GC that requires the ability
> to move objects around in memory,

I don't think that moving is an issue. If it were, JVMs would not work
the way they do (object references are not pointers to memory locations).
In other words, all programs would have the same problems #each_object
had.

> run in parallel, and so on. It can
> *never* be deterministic unless it can "stop the world", so it should
> not be used for algorithms that require any level of determinism, such
> as the test search in test/unit.

Right you are. #each_object should not be used in regular code - it's
more for ad hoc statistics ("how many instances of a class?") and the like.
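For that kind of ad hoc question, a call like the one in lib/singleton.rb from the listing above is enough on an implementation that keeps each_object enabled (MRI, for example). Widget is a hypothetical class used only to have something to count.

```ruby
# Hypothetical class used only to have something to count.
class Widget; end

widgets = Array.new(5) { Widget.new }

# With a block, each_object returns the number of objects it yielded.
# Unreachable but not yet collected Widgets may also be counted, which is
# acceptable for statistics and is why it is unfit for program logic.
live = ObjectSpace.each_object(Widget) { }
puts "#{live} Widget instance(s)"
```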

Kind regards

robert

Charles Oliver Nutter

Oct 28, 2007, 1:13:30 PM

Ken Bloom wrote:
> I don't think they're making ObjectSpace go away. Just
> ObjectSpace#each_object.

Correct.

> On Sun, 28 Oct 2007 22:13:38 +0900, Daniel Berger wrote:
>> drb: This could be a big deal.
>> weakref: You've stated that Java has its own implementation.
>
> This uses _id2ref, which doesn't appear to be going away.

Not that I wouldn't like it to :)

>> cgi: This could be a problem. Then again, some people say this library
>> should be refactored or tossed.
>> finalize: Did anyone even know about this? Does anyone use it?
>> tempfile: Meh, I'm guessing Java has its own library for temp files. I
>> never liked the current implementation anyway (which is why I wrote
>> file-temp).
>
> Finalizers could be implemented using Java's finalize() method for
> classes that need it. This method of implementing finalizers could
> probably be compatibly exposed using ObjectSpace.

Correct; we do support finalizers already. They weren't actually that
hard to support, since as you say Java already supports finalization.

>> shell: This could be a problem.
>
> This looks broken anyway since it uses fork.

Ahh yes, fork is a killer. We will never, ever support fork.

>> singleton: Ditto.
>
> One is in a documentation comment, giving an example of a specific behavior
> of the library. The other is in the same example, included executably
> after an if __FILE__ == $0 condition. So no actual problem here.

Whew, that's good to hear. I know singleton is used a bit in Rails, and
most people run JRuby on Rails with ObjectSpace disabled...so this seems
to fit with your findings.

>> irb: You've got jirb.
>
> jirb is (at its core) still the Ruby 1.8 IRB codebase. The example you
> pointed out is class (Module) iteration, used for completion, but it's
> more general iteration than test/unit uses, and #inherited techniques
> that can be used for test/unit may not work here.

inherited, perhaps not. But a JRuby-internal weak list of Module
instances would put this one to rest.

- Charlie

Charles Oliver Nutter

Oct 28, 2007, 1:39:02 PM

Robert Klemme wrote:
> On 28.10.2007 17:19, Charles Oliver Nutter wrote:
>> You just hit on exactly why we don't use JVMTI for ObjectSpace. It
>> would certainly work, but it would add a lot of overhead we'd never
>> expect people to accept in a real application. Plus, it would track
>> far more object instances than we actually want tracked.
>
> Why is that? I mean, you could selectively decide which instances to
> track.

Actually, we do that a bit already. For example, we do not track arrays
constructed during argument processing, since they are typically
transient. The problem is that we could only choose to track all Ruby
objects, for example...which would cripple other JRuby apps running in
the same process.

In general, though, we haven't explored JVMTI because we want JRuby to
be the best production environment for deploying apps, and nobody will
EVER turn on JVMTI on their production servers.

>>> Alternatively there may be another method that does not need
>>> instrumentation and that can give you access to every (reachable)
>>> object in the JVM.
>>
>> If there is...we haven't found it. The "linked weakref list" has been
>> the least overhead so far, and it's still a lot of overhead.
>
> Hmm, but there are iteration methods like #each_object:
> http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#Heap

I was referring to non-JVMTI solutions, but you're right, JVMTI does
provide this capability.

> Did you put them down because of the "stop the world" approach? I'd say
> that would be ok - at least it's better than not having ObjectSpace. And
> also, there would be no overhead. Question is only whether it's ok to
> invoke arbitrary byte code (which would happen during the iteration
> callback).

Is it really ok? You need to remember that JRuby opens up the
possibility of running many, many applications in the same process, as
well as asynchronous algorithms with true parallel threads. We can't
expect people to cripple all that so they can walk EVERY object in the
system. "Stop the world" is awful when you start breaking the ability to
do many things in parallel, as you can in JRuby.

But it may be that for cases where each_object is needed, this is a
reasonable thing to do. I think if someone were to submit an
implementation of each_object that uses JVMTI, we would certainly accept
it :)

>> ObjectSpace is just not compatible with any GC that requires the
>> ability to move objects around in memory,
>
> I don't think that moving is an issue. If it were, JVMs would not work
> the way they do (object references are not pointers to memory locations).
> In other words, all programs would have the same problems #each_object
> had.

The problem is not so much that the object references move as that you
would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that's *bad* especially when
we're looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can't lock things down like
that.

- Charlie

Konrad Meyer

Oct 28, 2007, 3:07:57 PM

Quoth Daniel Berger:
> ...

> shell: This could be a problem.

> ...

As far as I know, shell isn't used extensively. From reading the source, it
appears to be very much linked to the host system's processes, files, etc,
which may be inappropriate for JRuby anyways (I'm guessing here).

Regards,
--
Konrad Meyer <kon...@tylerc.org> http://konrad.sobertillnoon.com/


mortee

Oct 28, 2007, 11:17:15 PM

Charles Oliver Nutter wrote:
> Actually, we do that a bit already. For example, we do not track arrays
> constructed during argument processing, since they are typically
> transient. The problem is that we could only choose to track all Ruby
> objects, for example...which would cripple other JRuby apps running in
> the same process.

[...]

> The problem is not so much that the object references move as that you
> would have to lock the memory locations for some period of time to be
> able to walk the object table. And I think that's *bad* especially when
> we're looking at JRuby allowing folks to run dozens of apps in the same
> process and memory space out of the box. We can't lock things down like
> that.

Sorry for the extremely uninitiated and naive question - but when you're
about to enumerate each object in an application, aren't you interested
only in this application's objects anyway? So why would you have to lock
anything about the other ruby apps in the same process? Is that kind of
distinguishing objects impossible on the GC/enumeration level?

mortee

Oct 28, 2007, 11:25:26 PM

Charles Oliver Nutter wrote:

> Daniel Berger wrote:
>> irb: You've got jirb.
>> lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|
>
> This could still be supported through a similar mechanism as
> each_object(Class), by keeping a weak hash of all Module instances.
>
>> shell: This could be a problem.
>> lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do
>
> I'd be surprised if shell worked 100% correctly right now anyway, due to
> process-control requirements we can't support well on JVM. But I would
> also expect this use of each_object to have a "better" implementation,
> and if not it could again be a specific-purpose weak hash for IO streams
> (which we almost have already since we want to be able to clean them up
> on exit.

Speaking of the several cases of possible class-specific instance
tracking... isn't it possible to register interest in a given class
explicitly from program code at some point, so that any class could be
made enumerable?
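One way to picture that opt-in idea in plain Ruby is a module that a class extends to record its own instances in an `ObjectSpace::WeakMap`, so they can be enumerated without walking the whole heap. This is a minimal sketch under stated assumptions: `TrackedInstances`, `each_tracked_instance` and `Widget` are hypothetical names (not a JRuby or MRI API), it assumes a Ruby recent enough to have `ObjectSpace::WeakMap`, and only instances created through `new` are seen.

```ruby
# Hypothetical opt-in instance tracking (not a JRuby or MRI API).
# A class that extends TrackedInstances records every instance created
# through .new in a weak map, so tracking never keeps an instance alive.
module TrackedInstances
  def new(*args, **kwargs, &block)
    obj = super
    # Key and value are both held weakly; the entry disappears once
    # the instance is garbage collected.
    tracked_instances[obj] = obj
    obj
  end

  # Yields each tracked instance that is still alive.
  def each_tracked_instance(&block)
    return enum_for(:each_tracked_instance) unless block
    tracked_instances.each_key(&block)
    self
  end

  private

  def tracked_instances
    @tracked_instances ||= ObjectSpace::WeakMap.new
  end
end

# Example class that opts in to tracking.
class Widget
  extend TrackedInstances

  attr_reader :name

  def initialize(name)
    @name = name
  end
end

a = Widget.new("a")
b = Widget.new("b")
Widget.each_tracked_instance.map(&:name).sort # => ["a", "b"]
```

Because the cost is paid only by classes that explicitly opt in, the rest of the runtime carries no tracking overhead, which is the property the thread is after.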

mortee

evan...@gmail.com

Oct 29, 2007, 1:16:05 AM

On Oct 28, 9:19 am, Charles Oliver Nutter <charles.nut...@sun.com>
wrote:
<snip>
>

> ObjectSpace is just not compatible with any GC that requires the ability
> to move objects around in memory, run in parallel, and so on. It can
> *never* be deterministic unless it can "stop the world", so it should
> not be used for algorithms that require any level of determinism, such
> as the test search in test/unit.

This is the exact reason we haven't implemented each_object in Rubinius
yet.

With a generational GC that moves objects, iterating over all objects is
very, very non-deterministic unless the GC is turned off entirely while
the objects are walked.

That's at least an option we have, and we may roll with it for the
initial release, but it's less than ideal.

I think of each_object as very much an MRI implementation feature that
the rest of us implementors struggle to support. Because of this, the
community and the core members of each implementation really need to
begin discussing whether each_object is a Ruby feature or an MRI feature.

- Evan

Robert Klemme

Oct 29, 2007, 6:39:16 AM

2007/10/28, Charles Oliver Nutter <charles...@sun.com>:

> Robert Klemme wrote:
> > On 28.10.2007 17:19, Charles Oliver Nutter wrote:
> >> You just hit on exactly why we don't use JVMTI for ObjectSpace. It
> >> would certainly work, but it would add a lot of overhead we'd never
> >> expect people to accept in a real application. Plus, it would track
> >> far more object instances than we actually want tracked.
> >
> > Why is that? I mean, you could selectively decide which instances to
> > track.

> In general, though, we haven't explored JVMTI because we want JRuby to
> be the best production environment for deploying apps, and nobody will
> EVER turn on JVMTI on their production servers.

Well, it depends on the overhead and on the invocation model. I
assumed you would be starting a JVM per process but your other remarks
sound more like there is one JVM for JRuby programs...

> > Did you put them down because of the "stop the world" approach? I'd say
> > that would be ok - at least it's better than not having ObjectSpace. And
> > also, there would be no overhead. Question is only whether it's ok to
> > invoke arbitrary byte code (which would happen during the iteration
> > callback).
>
> Is it really ok? You need to remember that JRuby opens up the
> possibility of running many, many applications in the same process, as
> well as asynchronous algorithms with true parallel threads. We can't
> expect people to cripple all that so they can walk EVERY object in the
> system. "Stop the world" is awful when you start breaking the ability to
> do many things in parallel, as you can in JRuby.

Ok, I see I need to dive further into JRuby before I discuss this further. :-)

> But it may be that for cases where each_object is needed, this is a
> reasonable thing to do. I think if someone were to submit an
> implementation of each_object that uses JVMTI, we would certainly accept
> it :)

Hint, hint... :-)

> >> ObjectSpace is just not compatible with any GC that requires the
> >> ability to move objects around in memory,
> >
> > I don't think that moving is an issue. If it were, JVM's would not work
> > the way they do (object references are no pointers to memory locations).
> > In other words, all programs would have the same problems #each_object
> > had.
>
> The problem is not so much that the object references move as that you
> would have to lock the memory locations for some period of time to be
> able to walk the object table. And I think that's *bad* especially when
> we're looking at JRuby allowing folks to run dozens of apps in the same
> process and memory space out of the box. We can't lock things down like
> that.

I don't understand this remark of yours. If you implement this in Java
land (as you apparently did with WeakReferences), then there is no need
to lock anything. You just traverse the list (or a copy of the list),
and if a ref has been set to null you do not pass it to the callback.
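The approach described here (keep weak references, walk a copy of the list, skip any reference whose target has been collected) can be sketched in plain Ruby with the standard `weakref` library. `WeakRegistry` is a hypothetical class for illustration only, not JRuby's internal implementation, which lives in Java. The mutex is held just long enough to prune and copy the list, so the callback itself runs without blocking threads that are registering new objects.

```ruby
require 'weakref'

# Hypothetical registry of weakly referenced objects (illustration only).
class WeakRegistry
  def initialize
    @refs = []
    @lock = Mutex.new
  end

  # Records a weak reference; the registry never keeps obj alive.
  def register(obj)
    @lock.synchronize { @refs << WeakRef.new(obj) }
    obj
  end

  # Yields every registered object that is still alive.
  def each
    return enum_for(:each) unless block_given?

    # Lock only while pruning dead refs and taking a snapshot.
    snapshot = @lock.synchronize do
      @refs.select!(&:weakref_alive?)
      @refs.dup
    end

    snapshot.each do |ref|
      begin
        obj = ref.__getobj__
      rescue WeakRef::RefError
        next # collected between the snapshot and now: skip it
      end
      yield obj
    end
    self
  end
end
```

A GC that runs during the walk can only cause some entries to be skipped; it never hands the callback a dangling object, which is the point being made about ordinary collection traversal.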

If it is some kind of native code (possibly via JNI or other
interfaces), probably more care has to be taken, although I'd assume
that JNI takes care of this (i.e. once the callback is invoked with a
non-null argument, the object stays alive until after the callback
returns, unless you clear that reference of course).

Traversal during #each_object is in that respect similar to traversal
through an ordinary collection: a GC can occur during it just the same,
but that does not affect the traversal in any way.

What am I missing?

Kind regards

robert

Charles Oliver Nutter

Oct 29, 2007, 8:02:56 AM

As far as I know there's no way to have JVMTI enumerate only objects
created by a specific application in a given JVM. So any sort of
ObjectSpace impl based on it would have to take that into consideration.

- Charlie

Charles Oliver Nutter

Oct 29, 2007, 8:08:09 AM

evan...@gmail.com wrote:
> I think of each_object as very much a MRI implementation feature that
> the rest of us implementors struggle to implement. Because of this, the
> community and core members of each implementation need to really
> beginning discussing whether or not each_object is a Ruby feature or an
> MRI feature.

That's actually a really good point. each_object is more a feature of an
individual implementation's memory model than a general feature that can
be applied to every Ruby implementation. In many cases, like ours, you
simply don't have enough control over that memory model to provide a
real each_object implementation (and _id2ref requires tricks too, but
it's at least bounded and explicit). So it may be fair to say that
each_object is an MRI feature we emulate, but cannot simulate well
enough for it to translate appropriately.

Robert Klemme

Oct 29, 2007, 8:48:07 AM

2007/10/29, Charles Oliver Nutter <charles...@sun.com>:

Hm, if you host different applications in the same JVM you probably
need separate class loaders anyway to keep their changes to classes apart.
Maybe you can use that to partition the heap. Alternatively you could
use IterateOverObjectsReachableFromObject() and start from main. Just
a few wild guesses.

By the way, the stop-the-world issue would still not go away. Too bad.
A possible solution would be to implement the callback so that it places
all references in a Java collection. Only after it finishes is the
Ruby-land callback invoked for each instance. The downside is that you
need more space (i.e. for the collection, which could become largish),
but on the plus side you do not have any overhead (other than that
incurred by JVMTI) during "normal" operation, and you can limit the
stop-the-world time to just the copying phase, which might be
acceptable. Charles, what do you think?
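In MRI terms, the two-phase idea (pause collection, copy the references, then run the Ruby callbacks) might look roughly like the sketch below. It is only an approximation under stated assumptions: `each_object_two_phase` is a hypothetical helper, `GC.disable` merely suppresses collection rather than stopping other threads the way a JVMTI heap walk would, and the snapshot array holds strong references until the loop ends.

```ruby
# Hypothetical helper: phase 1 copies matching objects while GC is
# disabled; phase 2 runs the caller's block after GC is re-enabled.
# Returns the number of objects yielded.
def each_object_two_phase(mod)
  snapshot = []
  was_disabled = GC.disable # true if GC was already disabled
  begin
    # Phase 1: the short "paused" window, used only for copying.
    ObjectSpace.each_object(mod) { |obj| snapshot << obj }
  ensure
    GC.enable unless was_disabled # restore the caller's GC state
  end
  # Phase 2: the (possibly slow) callback runs outside the paused window.
  snapshot.each { |obj| yield obj }
  snapshot.size
end
```

The trade-off is exactly the one named above: a shorter pause in exchange for a temporary list that keeps every matched object reachable until phase 2 finishes.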

Kind regards

robert

Helder Ribeiro

Oct 29, 2007, 9:26:13 AM

On Oct 28, 3:39 pm, Charles Oliver Nutter <charles.nut...@sun.com>
wrote:

Exactly. I think that each_object rarely has to go into production
code, but it is very handy (and, to be honest, just fun, really) in
debugging/testing/experimenting. For those kinds of situations, I don't
really think a "stop the world" approach is so terrible. I find it
less of a disturbance than having this off-code switch.

charles...@sun.com

Nov 5, 2007, 12:53:08 PM

Charles Oliver Nutter wrote:
> As some of you may have heard, we're considering disabling
> ObjectSpace.each_object by default in JRuby.

I brought this up at RubyConf, and got about 50% of people saying "I
agree" and 50% of people saying "I do not agree". As it stands now, we
will proceed with having ObjectSpace.each_object disabled by default in
JRuby 1.1 final. See the rest of this thread for the backstory and notes
on test/unit.

The folks who disagree appear to disagree only on principle, rather than
based on any real demonstrable problem with turning each_object off. On
the other hand, the folks who want to disable it have real-world
concerns: performance on the apps they're running. Until there's a
compelling, real-world, non-ideological reason to leave each_object
enabled by default, it will be disabled in JRuby (enable with +O flag or
jruby.objectspace.enabled=true property).
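A library that relies on `each_object` for arbitrary types could check up front whether the walk is available and point users at the switch. This is a hedged sketch: `object_space_walkable?` is a hypothetical helper, and it assumes only that a disabled `each_object` raises some `StandardError` subclass on JRuby; the exact exception class and message may differ between versions. On MRI it simply returns true.

```ruby
# Hypothetical helper: returns true if ObjectSpace.each_object can walk
# ordinary (non-Class) objects in this runtime.
def object_space_walkable?
  # Ask for Strings but stop at the first one; `break` ends the walk.
  ObjectSpace.each_object(String) { break }
  true
rescue StandardError
  # Assumption: JRuby with ObjectSpace disabled raises here.
  false
end

unless object_space_walkable?
  warn "ObjectSpace.each_object is disabled; on JRuby, rerun with +O " \
       "or set jruby.objectspace.enabled=true"
end
```

Note that each_object(Class) keeps working with ObjectSpace disabled, so a probe like this only matters for code that walks other types.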

This change is already there in 1.1b1, released on Friday evening.

- Charlie

charles...@sun.com

Nov 5, 2007, 1:08:49 PM

Robert Klemme wrote:
> Btw, but the issue with stopping the world would still not go away.
> Too bad. A possible solution would be to implement the callback in a
> way that it places all references in a Java collection. Only after it
> finishes the Ruby land callback is invoked for each instance. The
> downside is that you need more space (i.e. for the collection which
> could become largish) but on the plus side is that you do not have any
> overhead (other than incurred by JVMTI) during "normal" operation and
> you can limit the stop the world time to just the copying phase which
> might be acceptable. Charles, what do you think?

It's certainly possible to do this, but it would probably need to create
a giant strong-referenced list of objects for iteration. One of my hard
rules for implementing ObjectSpace is that it MUST NOT interfere with an
object's normal lifecycle.

- Charlie

charles...@sun.com

Nov 5, 2007, 1:11:47 PM

Yes, that is possible...but it solves only part of the problem. Just
having ObjectSpace.each_object enableable through a flag allows it to be
fully functional when you want it and out of the way the rest of the time.

- Charlie