As some of you may have heard, we're considering disabling ObjectSpace.each_object by default in JRuby. Primarily, this is for performance; to support each_object, we have to bend over backwards, maintaining lists of weak references to all objects in the system and periodically cleaning out those lists. Here's some example performance, from a fractal benchmark in the JRuby source:
With ObjectSpace: Ruby Elapsed 45.967000
Without ObjectSpace: Ruby Elapsed 4.280000
What's most frustrating about this is that almost *no* libraries or apps use each_object, and it's a terrible performance hit for us.
The one really visible use of each_object is in test/unit, where the default console-based runner does each_object(Class) to find all subclasses of TestCase. Because this is a heavily-used library (to say the least), I've made modifications to JRuby to always support each_object(Class) by maintaining a bidirectional graph of parent and child classes. So that much wouldn't go away (but I'd prefer an implementation that uses Class#inherited, since it would be cleaner, faster, and deterministic).
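To make the Class#inherited idea concrete, here is a minimal sketch of how a test framework could record its subclasses without touching ObjectSpace. The names TrackedTestCase and known_subclasses are made up for illustration; this is not how test/unit is actually written.

```ruby
# Hypothetical stand-in for Test::Unit::TestCase; not the real library code.
class TrackedTestCase
  SUBCLASSES = []

  # Ruby calls this hook every time a class inherits from the receiver,
  # directly or through an intermediate subclass.
  def self.inherited(subclass)
    super
    SUBCLASSES << subclass
  end

  # Subclasses in definition order, so the result is deterministic.
  def self.known_subclasses
    SUBCLASSES.dup
  end
end

class AlphaTest < TrackedTestCase; end
class BetaTest < TrackedTestCase; end
class GammaTest < AlphaTest; end   # indirect subclasses are recorded too

TrackedTestCase.known_subclasses.each { |klass| puts klass.name }
```

Note that the array holds strong references, so recorded classes are never collected; for a test runner that is probably acceptable.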
So...I'm writing this to see what the general Ruby world thinks of us having ObjectSpace disabled by default, enableable via a command line option (or perhaps through a library? -robjectspace?).
I think more and more of you may want to give JRuby another look over the next few months, so I think we need to involve you in such decisions.
From: "Charles Oliver Nutter" <charles.nut...@sun.com>
> As some of you may have heard, we're considering disabling > ObjectSpace.each_object by default in JRuby. Primarily, this is for > performance; to support each_object, we have to bend over backwards, > maintaining lists of weak references to all objects in the system and > periodically cleaning out those lists.
> hmmm. ok i'm brainstorming here which you can ignore if you like as i > know less that nothing about jvms or implementing ruby but here goes: > what if you could invert the problem? what i objects knew about the > global ObjectSpaceThang and could be forced to register themselves on > demand somehow? without a reference i've no idea how, just throwing > that out there. or, another stupid idea, what if the objects themselves > were the tree/graph of weak references parent -> children. crawling it > would be, um, fun - but you could prune dead objects *only* when walking > the graph. this should be possible in ruby since you always have the > notion of a parent object - which is Object - so all objects should be > either reachable or leaks. now back to drinking my regularly scheduled > beer...
Continuing this discussion here...
Please, continue to brainstorm. I don't claim to have thought out every aspect of this problem or every possible solution. I'd *love* to discover I've missed an obvious fix.
Your idea has come up in the past, and it would probably eliminate the cost of an ObjectSpace list. However, that doesn't appear to be where we pay the highest cost.
The two items that (we believe) cost the most for us on the JVM are:
- Constructing an extra object for every Ruby object...namely, the WeakReference object to point to it. So we pay a memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks, so it can notify the WeakReference that the object it points at has gone away. So that slows down the legendary HotSpot GC and we pay again.
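For readers who want to see the shape of that bookkeeping, here is a rough sketch written in Ruby with the standard weakref library. It is only an analogy: JRuby's real implementation is Java code using java.lang.ref.WeakReference, and the class name WeakRegistry and its methods are invented for this example.

```ruby
require 'weakref'

# Hypothetical registry illustrating the bookkeeping; not JRuby's code.
class WeakRegistry
  def initialize
    @refs = []
  end

  # One extra WeakRef is allocated per tracked object (the first cost above).
  def track(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  # Yield every tracked object that is still alive; return how many were yielded.
  def each_object(klass = Object)
    count = 0
    @refs.each do |ref|
      begin
        obj = ref.__getobj__       # raises WeakRef::RefError once collected
      rescue WeakRef::RefError
        next
      end
      next unless klass === obj
      yield obj
      count += 1
    end
    count
  end

  # Periodic cleanup: drop entries whose target has been collected.
  def prune
    @refs.select! { |ref| ref.weakref_alive? }
    @refs.size
  end
end

registry = WeakRegistry.new
# Keep strong references so nothing is collected during the demo.
kept = [registry.track("alpha"), registry.track("beta"), registry.track(Object.new)]

seen = []
registry.each_object(String) { |s| seen << s }
puts seen.inspect
```

Every allocation pays for the extra WeakRef and the list entry whether or not anyone ever calls each_object, which is the overhead being discussed.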
I believe the parent -> weakref -> children algorithm is used in some implementations of ObjectSpace-like behavior, so it's perfectly valid. But again, there's certain aspects of ObjectSpace that are just problematic...
- threading or concurrency of any kind? No, you can't have multithreading with ObjectSpace, nor a concurrent/parallel GC (and it potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace doesn't have to be deterministic"...but when it starts getting wired into libraries like test/unit, it seems like people expect it to be. If we can say OS isn't deterministic, then *nobody* should be relying on its contents for core libraries, and we could reasonably claim that each_object will never return *anything*.
> From: "Charles Oliver Nutter" <charles.nut...@sun.com>
>> As some of you may have heard, we're considering disabling >> ObjectSpace.each_object by default in JRuby. Primarily, this is for >> performance; to support each_object, we have to bend over backwards, >> maintaining lists of weak references to all objects in the system and >> periodically cleaning out those lists.
> Is this also true for ObjectSpace#_id2ref ?
Not directly. _id2ref is handled in a similar way, but we have an event we can trigger off to start tracking an object; namely, Object#id.
When you request an id, we start tracking that object for purposes of _id2ref. Not until. So that would not be affected by disabling ObjectSpace.
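A small Ruby sketch of that "track only once an id is requested" idea follows. It is an illustration under assumptions, not JRuby's implementation: LazyIdTable, id_for, ref_for and tracked? are hypothetical names, and Ruby's own object_id is reused as the key for simplicity.

```ruby
require 'weakref'

# Hypothetical sketch of on-demand tracking; names are illustrative only.
class LazyIdTable
  def initialize
    @refs = {}   # id => WeakRef, so the table never keeps objects alive
  end

  # The object becomes known to the table only when its id is asked for.
  def id_for(obj)
    id = obj.object_id
    ref = @refs[id]
    # Replace a stale entry in case an id was reused after collection.
    @refs[id] = WeakRef.new(obj) unless ref && ref.weakref_alive?
    id
  end

  # Rough analogue of _id2ref: nil if the id is unknown or the object is gone.
  def ref_for(id)
    ref = @refs[id]
    return nil unless ref
    ref.__getobj__
  rescue WeakRef::RefError
    @refs.delete(id)
    nil
  end

  def tracked?(obj)
    @refs.key?(obj.object_id)
  end
end

table = LazyIdTable.new
a = Object.new
b = Object.new
id = table.id_for(a)              # only now does the table learn about a
puts table.ref_for(id).equal?(a)  # a can be looked up by id
puts table.tracked?(b)            # b never asked for an id, so it costs nothing
```

Objects that never have their id requested pay no tracking cost at all, which is why this path is unaffected by disabling each_object.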
In actuality, however, _id2ref is primarily used for things like weak references, so you can hold a virtual reference to an object without preventing it from being collected. We could provide an implementation of Ruby's weak references using Java's weak references that would allow us to escape _id2ref entirely for that use case.
From: "Charles Oliver Nutter" <charles.nut...@sun.com>
> Bill Kelly wrote:
>> Is this also true for ObjectSpace#_id2ref ?
> Not directly. _id2ref is handled in a similar way, but we have an event > we can trigger off to start tracking an object; namely, Object#id.
> When you request an id, we start tracking that object for purposes of > _id2ref. Not until. So that would not be affected by disabling ObjectSpace.
I see, thanks. Nifty. :)
> In actually, however, _id2ref is primarily used for things like weak > references, so you can hold a virtual reference to an object without > preventing it from being collected. We could provide an implementation > of Ruby's weak references using Java's weak references that would allow > us to escape _id2ref entirely for that use case.
> Are there other places _id2ref is used?
I think I've used _id2ref exactly twice. I can't recall the first usage; I don't think it made it into production code. The most recent use was to store some ruby object id's in a separate C++ process, which was able to fire an event back to ruby and provide the object id for the object to receive the event.
> ara.t.howard wrote: > > hmmm. ok i'm brainstorming here which you can ignore if you like as i > > know less that nothing about jvms or implementing ruby but here goes: > > what if you could invert the problem? what i objects knew about the > > global ObjectSpaceThang and could be forced to register themselves on > > demand somehow? without a reference i've no idea how, just throwing > > that out there. or, another stupid idea, what if the objects themselves > > were the tree/graph of weak references parent -> children. crawling it > > would be, um, fun - but you could prune dead objects *only* when walking > > the graph. this should be possible in ruby since you always have the > > notion of a parent object - which is Object - so all objects should be > > either reachable or leaks. now back to drinking my regularly scheduled > > beer...
> Continuing this discussion here...
> Please, continue to brainstorm. I don't claim to have thought out every > aspect of this problem or every possible solution. I'd *love* to > discover I've missed an obvious fix.
IMHO ObjectSpace should not be implemented in Java land. Why? The JVM has to keep track of instances anyway and implementing this in Java via WeakReferences seems to duplicate functionality that is already there. Did you consider using "Java Virtual Machine Tools Interface"?
You could either follow the same approach of the heapTracker presented on that page and use a flag or require a lib that enables ObjectSpace (because of the overhead of instrumentation).
Alternatively there may be another method that does not need instrumentation and that can give you access to every (reachable) object in the JVM.
> Your idea has come up in the past, and it would probably eliminate the > cost of an ObjectSpace list. However that doesn't appear to be where we > pay the highest cost.
> The two items that (we believe) cost the most for us on the JVM are:
> - Constructing an extra object for every Ruby object...namely, the > WeakReference object to point to it. So we pay a > memory/allocation/initialization cost. > - WeakReference itself causes Java's GC to have to do additional checks, > so it can notify the WeakReference that the object it points at has gone > away. So that slows down the legendary HotSpot GC and we pay again.
> I believe the parent -> weakref -> children algorithm is used in some > implementations of ObjectSpace-like behavior, so it's perfectly valid. > But again, there's certain aspects of ObjectSpace that are just > problematic...
> - threading or concurrency of any kind? No, you can't have > multithreading with ObjectSpace, nor a concurrent/parallel GC (and it > potentially excludes other advanced GC designs too). > - determinism? Matz told me that "ObjectSpace doesn't have to be > deterministic"...but when it starts getting wired into libraries like > test/unit, it seems like people expect it to be. If we can say OS isn't > deterministic, then *nobody* should be relying in its contents for core > libraries, and we could reasonably claim that each_object will never > return *anything*.
I'd reformulate the requirement here: ObjectSpace.each_object must yield every object that was existent before the invocation and that is strongly reachable. I believe for the typical use case (e.g. traversing all class instances) this is enough while leaving enough flexibility for the implementation (i.e. create a snapshot of some form, iterate through some internal structure that may change due to new objects being created during #each_object etc.).
On Oct 28, 12:53 am, Charles Oliver Nutter <charles.nut...@sun.com> wrote: <snip>
> So...I'm writing this to see what the general Ruby world thinks of us > having ObjectSpace disabled by default, enableable via a command line > option (or perhaps through a library? -robjectspace?).
ext\common\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
ext\dl\test\test.rb:187: ObjectSpace.define_finalizer(fp) {File.unlink("tmp.txt")}
ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|obj|
ext\Win32API\lib\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
lib\cgi\session.rb:299: ObjectSpace::define_finalizer(self, Session::callback(@dbprot))
lib\drb\drb.rb:337:# object's ObjectSpace id as its dRuby id. This means that the dRuby
lib\drb\drb.rb:361: # This, the default implementation, uses an object's local ObjectSpace
lib\drb\drb.rb:375: ObjectSpace._id2ref(ref)
lib\finalize.rb:59: ObjectSpace.call_finalizer(obj)
lib\finalize.rb:169: ObjectSpace.remove_finalizer(@proc)
lib\finalize.rb:173: ObjectSpace.add_finalizer(@proc)
lib\finalize.rb:180: # registering function to ObjectSpace#add_finalizer
lib\finalize.rb:192: ObjectSpace.add_finalizer(@proc)
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|
lib\irb\ext\save-history.rb:69: ObjectSpace.define_finalizer(obj, HistorySavingAbility.create_finalizer)
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do |io|
lib\singleton.rb:23:# ObjectSpace.each_object(OtherKlass){} # => 0.
lib\singleton.rb:190: "#{ObjectSpace.each_object(klass){}} #{klass} instance(s)"
lib\tempfile.rb:53: ObjectSpace.define_finalizer(self, @clean_proc)
lib\tempfile.rb:105: ObjectSpace.undefine_finalizer(self)
lib\tempfile.rb:118: ObjectSpace.undefine_finalizer(self)
lib\test\unit\autorunner.rb:17: ObjectSpace.each_object(Class) do |klass|
lib\test\unit\autorunner.rb:54: :objectspace => proc do |r|
lib\test\unit\autorunner.rb:55: require 'test/unit/collector/objectspace'
lib\test\unit\autorunner.rb:56: c = Collector::ObjectSpace.new
lib\test\unit\autorunner.rb:80: @collector = COLLECTORS[(standalone ? :dir : :objectspace)]
lib\test\unit\collector\dir.rb:13: def initialize(dir=::Dir, file=::File, object_space=::ObjectSpace, req=nil)
lib\test\unit\collector\objectspace.rb:10: class ObjectSpace
lib\test\unit\collector\objectspace.rb:13: NAME = 'collected from the ObjectSpace'
lib\test\unit\collector\objectspace.rb:15: def initialize(source=::ObjectSpace)
lib\test\unit.rb:252: # the ObjectSpace and wrap them up into a suite for you. It then runs
lib\weakref.rb:16:# ObjectSpace.garbage_collect
lib\weakref.rb:62: ObjectSpace._id2ref(@__id)
lib\weakref.rb:74: ObjectSpace.define_finalizer obj, @@final
lib\weakref.rb:75: ObjectSpace.define_finalizer self, @@final
lib\weakref.rb:98: ObjectSpace.garbage_collect
test\dbm\test_dbm.rb:45: ObjectSpace.each_object(DBM) do |obj|
test\gdbm\test_gdbm.rb:42: ObjectSpace.each_object(GDBM) do |obj|
test\ruby\test_objectspace.rb:3:class TestObjectSpace < Test::Unit::TestCase
test\ruby\test_objectspace.rb:10: o = ObjectSpace._id2ref(obj.object_id);\
test\sdbm\test_sdbm.rb:15: ObjectSpace.each_object(SDBM) do |obj|
test\testunit\collector\test_dir.rb:62: class ObjectSpace
test\testunit\collector\test_dir.rb:81: @object_space = ObjectSpace.new
test\testunit\collector\test_objectspace.rb:6:require 'test/unit/collector/objectspace'
test\testunit\collector\test_objectspace.rb:11: class TC_ObjectSpace < TestCase
test\testunit\collector\test_objectspace.rb:41: @c = ObjectSpace.new(@object_space)
test\testunit\collector\test_objectspace.rb:44: def full_suite(name=ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:51: TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:83: expected = TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:89: expected = TestSuite.new(ObjectSpace::NAME)
test\yaml\test_yaml.rb:1279: ObjectSpace.each_object(Class) do |klass|
So, in summary, if we exclude those libraries where only tests are affected, this would affect:
Some comments on each of these as they relate to JRuby:
win32-registry: You have no hope of implementing this without JNA anyway, unless there's some Java binding I don't know about. Besides, I couldn't tell you why on Earth win32-registry would need a finalizer.
tk: No one will care. They'll use SWT or Swing bindings. Besides, you would need JNA.
cgi: This could be a problem. Then again, some people say this library should be refactored or tossed.
drb: This could be a big deal.
finalize: Did anyone even know about this? Does anyone use it?
irb: You've got jirb.
shell: This could be a problem.
singleton: Ditto.
tempfile: Meh, I'm guessing Java has its own library for temp files. I never liked the current implementation anyway (which is why I wrote file-temp).
test-unit: Already mentioned.
weakref: You've stated that Java has its own implementation.
which is fabricated - but you get the concept: string in eval maps to live object at run time. when #define_method takes a block this won't be used much i think though...
cheers.
a @ http://codeforpeople.com/ -- it is not enough to be compassionate. you must act. h.h. the 14th dalai lama
Bill Kelly wrote: > I think I've used _id2ref exactly twice. I can't recall the first > usage; I don't think it made it into production code. The most > recent use was to store some ruby object id's in a separate C++ > process, which was able to fire an event back to ruby and provide > the object id for the object to receive the event.
> (I suppose DRb might do something similar?)
Yeah, sounds like that's mostly a "poor man's remote hash". I'd expect that just creating a hash specifically for that purpose and passing a key around would be a "better" way to do it.
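As a sketch of what such a purpose-built hash might look like (HandleTable and its method names are invented for illustration, and the external process is only simulated here by passing the key around):

```ruby
# Hypothetical handle table; the other process only ever sees the integer keys.
class HandleTable
  def initialize
    @objects = {}
    @next_key = 0
  end

  # Store the object and hand back an opaque key to send to the remote side.
  def register(obj)
    key = (@next_key += 1)
    @objects[key] = obj
    key
  end

  # Look up the object when the remote side sends the key back.
  def fetch(key)
    @objects.fetch(key) { raise KeyError, "unknown handle #{key}" }
  end

  # Explicitly drop the entry once the remote side is done with it.
  def release(key)
    @objects.delete(key)
  end
end

handles = HandleTable.new
listener = ->(event) { "got #{event}" }
key = handles.register(listener)

# Later, an event arrives carrying only the key.
puts handles.fetch(key).call("ping")
handles.release(key)
```

Unlike _id2ref, the table keeps its objects alive until release is called, so a stale key fails with a clear error instead of resolving to a recycled object.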
_id2ref is just another one of those features that gets rarely used, and whose use cases can often be implemented in "better" ways.
Robert Klemme wrote: > IMHO ObjectSpace should not be implemented in Java land. Why? The JVM > has to keep track of instances anyway and implementing this in Java via > WeakReferences seems to duplicate functionality that is already there. > Did you consider using "Java Virtual Machine Tools Interface"?
> You could either follow the same approach of the heapTracker presented > on that page and use a flag or require a lib that enables ObjectSpace > (because of the overhead of instrumentation).
You just hit on exactly why we don't use JVMTI for ObjectSpace. It would certainly work, but it would add a lot of overhead we'd never expect people to accept in a real application. Plus, it would track far more object instances than we actually want tracked. We'd love to include a JVMTI-based ObjectSpace implementation, however...it just hasn't been a high priority to implement since 99% of users never actually need ObjectSpace.
> Alternatively there may be another method that does not need > instrumentation and that can give you access to every (reachable) object > in the JVM.
If there is...we haven't found it. The "linked weakref list" has been the least overhead so far, and it's still a lot of overhead.
>> Your idea has come up in the past, and it would probably eliminate the >> cost of an ObjectSpace list. However that doesn't appear to be where >> we pay the highest cost.
>> The two items that (we believe) cost the most for us on the JVM are:
>> - Constructing an extra object for every Ruby object...namely, the >> WeakReference object to point to it. So we pay a >> memory/allocation/initialization cost. >> - WeakReference itself causes Java's GC to have to do additional >> checks, so it can notify the WeakReference that the object it points >> at has gone away. So that slows down the legendary HotSpot GC and we >> pay again.
>> I believe the parent -> weakref -> children algorithm is used in some >> implementations of ObjectSpace-like behavior, so it's perfectly valid. >> But again, there's certain aspects of ObjectSpace that are just >> problematic...
>> - threading or concurrency of any kind? No, you can't have >> multithreading with ObjectSpace, nor a concurrent/parallel GC (and it >> potentially excludes other advanced GC designs too). >> - determinism? Matz told me that "ObjectSpace doesn't have to be >> deterministic"...but when it starts getting wired into libraries like >> test/unit, it seems like people expect it to be. If we can say OS >> isn't deterministic, then *nobody* should be relying in its contents >> for core libraries, and we could reasonably claim that each_object >> will never return *anything*.
> I'd reformulate the requirement here: ObjectSpace.each_object must yield > every object that was existent before the invocation and that is > strongly reachable. I believe for the typical use case (e.g. traversing > all class instances) this is enough while leaving enough flexibility for > the implementation (i.e. create s snapshot of some form, iterate through > some internal structure that may change due to new objects being created > during #each_object etc.).
The problem here is "strongly reachable". During ObjectSpace processing, the last strong reference to an object may go away and the garbage collector may run. Should ObjectSpace prevent GC from running if it's traversed and now references that object? If not, how should it be handled if immediately before you return an object from each_object, it gets garbage collected? There's no way to catch that, so each_object may end up returning a reference to an object that's gone away, or reconstituting an object whose finalization has already fired. Bad things happen.
ObjectSpace is just not compatible with any GC that requires the ability to move objects around in memory, run in parallel, and so on. It can *never* be deterministic unless it can "stop the world", so it should not be used for algorithms that require any level of determinism, such as the test search in test/unit.
Daniel Berger wrote: > On Oct 28, 12:53 am, Charles Oliver Nutter <charles.nut...@sun.com> > wrote: > <snip>
>> So...I'm writing this to see what the general Ruby world thinks of us >> having ObjectSpace disabled by default, enableable via a command line >> option (or perhaps through a library? -robjectspace?).
> .ext\common\win32\registry.rb:569: ObjectSpace.define_finalizer > self, @@final.call(@hkeyfinal) > ext\dl\test\test.rb:187: ObjectSpace.define_finalizer(fp) > {File.unlink("tmp.txt")} > ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){| > obj| > ext\Win32API\lib\win32\registry.rb:569: > ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal) > lib\cgi\session.rb:299: ObjectSpace::define_finalizer(self, > Session::callback(@dbprot)) > lib\drb\drb.rb:337:# object's ObjectSpace id as its dRuby id. This > means that the dRuby > lib\drb\drb.rb:361: # This, the default implementation, uses an > object's local ObjectSpace > lib\drb\drb.rb:375: ObjectSpace._id2ref(ref) > lib\finalize.rb:59: ObjectSpace.call_finalizer(obj) > lib\finalize.rb:169: ObjectSpace.remove_finalizer(@proc) > lib\finalize.rb:173: ObjectSpace.add_finalizer(@proc) > lib\finalize.rb:180: # registering function to > ObjectSpace#add_finalizer > lib\finalize.rb:192: ObjectSpace.add_finalizer(@proc) > lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m| > lib\irb\ext\save-history.rb:69: ObjectSpace.define_finalizer(obj, > HistorySavingAbility.create_finalizer) > lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do > |io| > lib\singleton.rb:23:# ObjectSpace.each_object(OtherKlass){} # => > 0. > lib\singleton.rb:190: "#{ObjectSpace.each_object(klass){}} #{klass} > instance(s)" > lib\tempfile.rb:53: ObjectSpace.define_finalizer(self, @clean_proc) > lib\tempfile.rb:105: ObjectSpace.undefine_finalizer(self) > lib\tempfile.rb:118: ObjectSpace.undefine_finalizer(self) > lib\test\unit\autorunner.rb:17: ObjectSpace.each_object(Class) > do |klass| > lib\test\unit\autorunner.rb:54: :objectspace => proc do |r| > lib\test\unit\autorunner.rb:55: require 'test/unit/collector/ > objectspace' > lib\test\unit\autorunner.rb:56: c = > Collector::ObjectSpace.new > lib\test\unit\autorunner.rb:80: @collector = > COLLECTORS[(standalone ? 
:dir : :objectspace)] > lib\test\unit\collector\dir.rb:13: def initialize(dir=::Dir, > file=::File, object_space=::ObjectSpace, req=nil) > lib\test\unit\collector\objectspace.rb:10: class ObjectSpace > lib\test\unit\collector\objectspace.rb:13: NAME = 'collected > from the ObjectSpace' > lib\test\unit\collector\objectspace.rb:15: def > initialize(source=::ObjectSpace) > lib\test\unit.rb:252: # the ObjectSpace and wrap them up into a suite > for you. It then runs > lib\weakref.rb:16:# ObjectSpace.garbage_collect > lib\weakref.rb:62: ObjectSpace._id2ref(@__id) > lib\weakref.rb:74: ObjectSpace.define_finalizer obj, @@final > lib\weakref.rb:75: ObjectSpace.define_finalizer self, @@final > lib\weakref.rb:98: ObjectSpace.garbage_collect > test\dbm\test_dbm.rb:45: ObjectSpace.each_object(DBM) do |obj| > test\gdbm\test_gdbm.rb:42: ObjectSpace.each_object(GDBM) do |obj| > test\ruby\test_objectspace.rb:3:class TestObjectSpace < > Test::Unit::TestCase > test\ruby\test_objectspace.rb:10: o = > ObjectSpace._id2ref(obj.object_id);\ > test\sdbm\test_sdbm.rb:15: ObjectSpace.each_object(SDBM) do |obj| > test\testunit\collector\test_dir.rb:62: class ObjectSpace > test\testunit\collector\test_dir.rb:81: @object_space = > ObjectSpace.new > test\testunit\collector\test_objectspace.rb:6:require 'test/unit/ > collector/objectspace' > test\testunit\collector\test_objectspace.rb:11: class > TC_ObjectSpace < TestCase > test\testunit\collector\test_objectspace.rb:41: @c = > ObjectSpace.new(@object_space) > test\testunit\collector\test_objectspace.rb:44: def > full_suite(name=ObjectSpace::NAME) > test\testunit\collector\test_objectspace.rb:51: > TestSuite.new(ObjectSpace::NAME) > test\testunit\collector\test_objectspace.rb:83: expected = > TestSuite.new(ObjectSpace::NAME) > test\testunit\collector\test_objectspace.rb:89: expected = > TestSuite.new(ObjectSpace::NAME) > test\yaml\test_yaml.rb:1279: ObjectSpace.each_object(Class) do | > klass|
> So, in summary, if we exclude those libraries where only tests are > affected, this would affect:
This could still be supported through a similar mechanism as each_object(Class), by keeping a weak hash of all Module instances.
> shell: This could be a problem.
> lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do
I'd be surprised if shell worked 100% correctly right now anyway, due to process-control requirements we can't support well on JVM. But I would also expect this use of each_object to have a "better" implementation, and if not it could again be a specific-purpose weak hash for IO streams (which we almost have already since we want to be able to clean them up on exit).
> singleton: Ditto.
I'd have to look at this one. This could be another good candidate for reimplementation in a lot less Java code; singleton support would be pretty easy to write up in a few lines of Java.
> test-unit: Already mentioned.
So pretty few libraries would be affected, and I don't think any couldn't be dealt with in other ways. And to reiterate: finalizers and _id2ref wouldn't be affected (though I'd prefer to find alternative mechanisms for _id2ref).
I don't think they're making ObjectSpace go away. Just ObjectSpace#each_object.
(I'm not a JRuby developer, so I don't trust the correctness of anything I say.)
On Sun, 28 Oct 2007 22:13:38 +0900, Daniel Berger wrote: > drb: This could be a big deal. > weakref: You've stated that Java has its own implementation.
This uses _id2ref, which doesn't appear to be going away.
> cgi: This could be a problem. Then again, some people say this library > should be refactored or tossed. > finalize: Did anyone even know about this? Does anyone use it? > tempfile: Meh, I'm guessing Java has its own library for temp files. I > never liked the current implementation anyway (which is why I wrote > file-temp).
Finalizers could be implemented using Java's finalize() method for classes that need it. This method of implementing finalizers could probably be compatibly exposed using ObjectSpace.
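For reference, the Ruby-side finalizer API that the cgi and tempfile hits rely on looks roughly like the sketch below. ScratchFile is a made-up class loosely modeled on the tempfile.rb lines in the grep output, and it only appends to a log instead of deleting a real file.

```ruby
# Hypothetical resource class, loosely modeled on the tempfile.rb hits above.
class ScratchFile
  attr_reader :path

  # Build the callback in a class method so the proc does not capture the
  # instance; a finalizer that references its own object keeps it alive.
  def self.cleanup_proc(path, log)
    proc { |_object_id| log << "removed #{path}" }
  end

  def initialize(path, log)
    @path = path
    @cleanup = self.class.cleanup_proc(path, log)
    ObjectSpace.define_finalizer(self, @cleanup)
  end

  # Explicit cleanup: cancel the finalizer and run the callback now.
  def close!
    ObjectSpace.undefine_finalizer(self)
    @cleanup.call(object_id)
  end
end

log = []
file = ScratchFile.new("/tmp/example.txt", log)
file.close!
puts log.inspect
```

Nothing here depends on each_object, which matches the point that finalizers are a separate part of ObjectSpace.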
> shell: This could be a problem.
This looks broken anyway since it uses fork.
> singleton: Ditto.
One is in a documentation comment, giving an example of a specific behavior of the library. The other is in the same example, included executably after an if __FILE__ == $0 condition. So no actual problem here.
> irb: You've got jirb.
jirb is (at its core) still the Ruby 1.8 IRB codebase. The example you pointed out is class (Module) iteration, used for completion, but it's more general iteration than test/unit uses, and #inherited techniques that can be used for test/unit may not work here.
--Ken
-- Ken Bloom. PhD candidate. Linguistic Cognition Laboratory. Department of Computer Science. Illinois Institute of Technology. http://www.iit.edu/~kbloom1/
> Robert Klemme wrote: >> IMHO ObjectSpace should not be implemented in Java land. Why? The >> JVM has to keep track of instances anyway and implementing this in >> Java via WeakReferences seems to duplicate functionality that is >> already there. Did you consider using "Java Virtual Machine Tools >> Interface"?
>> You could either follow the same approach of the heapTracker presented >> on that page and use a flag or require a lib that enables ObjectSpace >> (because of the overhead of instrumentation).
> You just hit on exactly why we don't use JVMTI for ObjectSpace. It would > certainly work, but it would add a lot of overhead we'd never expect > people to accept in a real application. Plus, it would track far more > object instances than we actually want tracked.
Why is that? I mean, you could selectively decide which instances to track.
> We'd love to include a > JVMTI-based ObjectSpace implementation, however...it just hasn't been a > high priority to implement since 99% of users never actually need > ObjectSpace.
>> Alternatively there may be another method that does not need >> instrumentation and that can give you access to every (reachable) >> object in the JVM.
> If there is...we haven't found it. The "linked weakref list" has been > the least overhead so far, and it's still a lot of overhead.
Did you put them down because of the "stop the world" approach? I'd say that would be ok - at least it's better than not having ObjectSpace. And also, there would be no overhead. Question is only whether it's ok to invoke arbitrary byte code (which would happen during the iteration callback).
>>> Your idea has come up in the past, and it would probably eliminate >>> the cost of an ObjectSpace list. However that doesn't appear to be >>> where we pay the highest cost.
>>> The two items that (we believe) cost the most for us on the JVM are:
>>> - Constructing an extra object for every Ruby object...namely, the >>> WeakReference object to point to it. So we pay a >>> memory/allocation/initialization cost. >>> - WeakReference itself causes Java's GC to have to do additional >>> checks, so it can notify the WeakReference that the object it points >>> at has gone away. So that slows down the legendary HotSpot GC and we >>> pay again.
>>> I believe the parent -> weakref -> children algorithm is used in some >>> implementations of ObjectSpace-like behavior, so it's perfectly >>> valid. But again, there's certain aspects of ObjectSpace that are >>> just problematic...
>>> - threading or concurrency of any kind? No, you can't have >>> multithreading with ObjectSpace, nor a concurrent/parallel GC (and it >>> potentially excludes other advanced GC designs too). >>> - determinism? Matz told me that "ObjectSpace doesn't have to be >>> deterministic"...but when it starts getting wired into libraries like >>> test/unit, it seems like people expect it to be. If we can say OS >>> isn't deterministic, then *nobody* should be relying in its contents >>> for core libraries, and we could reasonably claim that each_object >>> will never return *anything*.
>> I'd reformulate the requirement here: ObjectSpace.each_object must >> yield every object that was existent before the invocation and that is >> strongly reachable. I believe for the typical use case (e.g. >> traversing all class instances) this is enough while leaving enough >> flexibility for the implementation (i.e. create s snapshot of some >> form, iterate through some internal structure that may change due to >> new objects being created during #each_object etc.).
> The problem here is "strongly reachable". During ObjectSpace processing, > the last strong reference to an object may go away and the garbage > collector may run. Should ObjectSpace prevent GC from running if it's > traversed and now references that object? If not, how should it be > handled if immediately before you return an object from each_object, it > gets garbage collected?
You are right: objects can "disappear" (i.e. lose their strong reachability) during traversal. Obviously my suggested requirement was still too strong.
> There's no way to catch that, so each_object may > end up returning a reference to an object that's gone away, or > reconstituting an object whose finalization has already fired. Bad > things happen.
Recreation is a bad idea. I agree, objects that are no longer strongly reachable at the moment they are about to be passed to the block should *not* be passed.
> ObjectSpace is just not compatible with any GC that requires the ability > to move objects around in memory,
I don't think that moving is an issue. If it were, JVMs would not work the way they do (object references are not pointers to memory locations). In other words, all programs would have the same problems #each_object had.
> run in parallel, and so on. It can > *never* be deterministic unless it can "stop the world", so it should > not be used for algorithms that require any level of determinism, such > as the test search in test/unit.
Right you are. #each_object should not be used in regular code - it's more for ad hoc statistics ("how many instances of a class?") and the like.
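A quick example of that kind of ad hoc statistic, assuming an implementation where each_object is available for arbitrary classes (Widget is a throwaway class for the demo):

```ruby
# Hypothetical class used only for counting.
class Widget; end

# Keep strong references so none of the instances can be collected.
widgets = Array.new(3) { Widget.new }

# With a block, each_object returns the number of objects it yielded.
count = ObjectSpace.each_object(Widget) {}
puts "#{count} Widget instance(s)"
```

The count is a snapshot at best: instances that are unreachable but not yet collected may or may not be included, so it is fine for diagnostics and unsuitable for program logic.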
Ken Bloom wrote: > I don't think they're making ObjectSpace go away. Just > ObjectSpace#each_object.
Correct.
> On Sun, 28 Oct 2007 22:13:38 +0900, Daniel Berger wrote: >> drb: This could be a big deal. >> weakref: You've stated that Java has its own implementation.
> This uses _id2ref, which doesn't appear to be going away.
Not that I wouldn't like it to :)
>> cgi: This could be a problem. Then again, some people say this library >> should be refactored or tossed. >> finalize: Did anyone even know about this? Does anyone use it? >> tempfile: Meh, I'm guessing Java has its own library for temp files. I >> never liked the current implementation anyway (which is why I wrote >> file-temp).
> Finalizers could be implemented using Java's finalize() method for > classes that need it. This method of implementing finalizers could > probably be compatibly exposed using ObjectSpace.
Correct; we do support finalizers already. They weren't actually that hard to support, since as you say Java already supports finalization.
>> shell: This could be a problem.
> This looks broken anyway since it uses fork.
Ahh yes, fork is a killer. We will never, ever support fork.
>> singleton: Ditto.
> one is in documentation comment, giving an example of a specific behavior > of the library. the other is in the same example, included executably > after an if __FILE__ == $0 condition. So no actual problem here.
Whew, that's good to hear. I know singleton is used a bit in Rails, and most people run JRuby on Rails with ObjectSpace disabled...so this seems to fit with your findings.
>> irb: You've got jirb.
> jirb is (at its core) still the Ruby 1.8 IRB codebase. The example you > pointed out is class (Module) iteration, used for completion, but it's > more general iteration than test/unit uses, and #inherited techniques > that can be used for test/unit may not work here.
inherited, perhaps not. But a JRuby-internal weak list of Module instances would put this one to rest.
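For reference, the #inherited technique mentioned in the quoted text looks roughly like the sketch below. `BaseCase` is a hypothetical stand-in for a test base class and `known_cases` is an invented accessor; nothing here is taken from test/unit itself.

```ruby
# Hypothetical base class standing in for a test-case superclass.
class BaseCase
  @known_cases = []

  class << self
    attr_reader :known_cases

    # Ruby calls this hook whenever a class inherits from BaseCase,
    # directly or through another subclass, so discovery is deterministic.
    def inherited(subclass)
      super
      BaseCase.known_cases << subclass
    end
  end
end

class LoginCase < BaseCase; end
class SignupCase < LoginCase; end

# A plain module never triggers the hook, which is the gap that
# irb-style Module iteration would fall into.
module Helper; end

p BaseCase.known_cases  # => [LoginCase, SignupCase]
```

The hook only sees classes that descend from the hooked class, which is why it covers the test/unit case but not a general walk over every Module.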
Robert Klemme wrote: > On 28.10.2007 17:19, Charles Oliver Nutter wrote: >> You just hit on exactly why we don't use JVMTI for ObjectSpace. It >> would certainly work, but it would add a lot of overhead we'd never >> expect people to accept in a real application. Plus, it would track >> far more object instances than we actually want tracked.
> Why is that? I mean, you could selectively decide which instances to > track.
Actually, we do that a bit already. For example, we do not track arrays constructed during argument processing, since they are typically transient. The problem is that we could only choose to track all Ruby objects, for example...which would cripple other JRuby apps running in the same process.
In general, though, we haven't explored JVMTI because we want JRuby to be the best production environment for deploying apps, and nobody will EVER turn on JVMTI on their production servers.
>>> Alternatively there may be another method that does not need >>> instrumentation and that can give you access to every (reachable) >>> object in the JVM.
>> If there is...we haven't found it. The "linked weakref list" has been >> the least overhead so far, and it's still a lot of overhead.
I was referring to non-JVMTI solutions, but you're right, JVMTI does provide this capability.
> Did you put them down because of the "stop the world" approach? I'd say > that would be ok - at least it's better than not having ObjectSpace. And > also, there would be no overhead. Question is only whether it's ok to > invoke arbitrary byte code (which would happen during the iteration > callback).
Is it really ok? You need to remember that JRuby opens up the possibility of running many, many applications in the same process, as well as asynchronous algorithms with true parallel threads. We can't expect people to cripple all that so they can walk EVERY object in the system. "Stop the world" is awful when you start breaking the ability to do many things in parallel, as you can in JRuby.
But it may be that for cases where each_object is needed, this is a reasonable thing to do. I think if someone were to submit an implementation of each_object that uses JVMTI, we would certainly accept it :)
>> ObjectSpace is just not compatible with any GC that requires the >> ability to move objects around in memory,
> I don't think that moving is an issue. If it were, JVM's would not work > the way they do (object references are no pointers to memory locations). > In other words, all programs would have the same problems #each_object > had.
The problem is not so much that the object references move as that you would have to lock the memory locations for some period of time to be able to walk the object table. And I think that's *bad* especially when we're looking at JRuby allowing folks to run dozens of apps in the same process and memory space out of the box. We can't lock things down like that.
As far as I know, shell isn't used extensively. From reading the source, it appears to be very much linked to the host system's processes, files, etc, which may be inappropriate for JRuby anyways (I'm guessing here).
Charles Oliver Nutter wrote: > Actually, we do that a bit already. For example, we do not track arrays > constructed during argument processing, since they are typically > transient. The problem is that we could only choose to track all Ruby > objects, for example...which would cripple other JRuby apps running in > the same process.
[...]
> The problem is not so much that the object references move as that you > would have to lock the memory locations for some period of time to be > able to walk the object table. And I think that's *bad* especially when > we're looking at JRuby allowing folks to run dozens of apps in the same > process and memory space out of the box. We can't lock things down like > that.
Sorry for the extremely uninitiated and naive question - but when you're about to enumerate each object in an application, aren't you interested only in this application's objects anyway? So why would you have to lock anything about the other ruby apps in the same process? Is that kind of distinguishing objects impossible on the GC/enumeration level?
Charles Oliver Nutter wrote: > Daniel Berger wrote: >> irb: You've got jirb. >> lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|
> This could still be supported through a similar mechanism as > each_object(Class), by keeping a weak hash of all Module instances.
>> shell: This could be a problem. >> lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do
> I'd be surprised if shell worked 100% correctly right now anyway, due to > process-control requirements we can't support well on JVM. But I would > also expect this use of each_object to have a "better" implementation, > and if not it could again be a specific-purpose weak hash for IO streams > (which we almost have already since we want to be able to clean them up > on exit).
Speaking of multiple cases of possible class-specific instance tracking... isn't it possible to register your interest in some such class explicitly from program code at some point? Then any class could be made enumerable.
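A rough sketch of what such explicit registration could look like in plain Ruby, assuming Ruby 2.7 or later. `TrackedInstances`, `instance_registry` and `each_instance` are hypothetical names, not an existing JRuby or MRI API; `ObjectSpace::WeakMap` is used so the registry does not keep instances alive.

```ruby
# Hypothetical opt-in mixin: a class that extends it can enumerate its
# live instances without a global ObjectSpace walk.
module TrackedInstances
  def self.extended(klass)
    # WeakMap keys do not prevent the instances from being collected.
    registry = ObjectSpace::WeakMap.new
    klass.define_singleton_method(:instance_registry) { registry }
  end

  def new(...)
    instance = super(...)
    instance_registry[instance] = true
    instance
  end

  def each_instance
    return enum_for(:each_instance) unless block_given?

    # keys returns only entries whose objects are still alive.
    instance_registry.keys.each { |obj| yield obj if obj.is_a?(self) }
  end
end

class Session
  extend TrackedInstances
  attr_reader :user

  def initialize(user)
    @user = user
  end
end

sessions = [Session.new("alice"), Session.new("bob")]
Session.each_instance { |session| puts session.user }
```

Only classes that opt in pay the bookkeeping cost, which is the trade-off the question is pointing at.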
On Oct 28, 9:19 am, Charles Oliver Nutter <charles.nut...@sun.com> wrote: <snip>
> ObjectSpace is just not compatible with any GC that requires the ability > to move objects around in memory, run in parallel, and so on. It can > *never* be deterministic unless it can "stop the world", so it should > not be used for algorithms that require any level of determinism, such > as the test search in test/unit.
This is the exact reason we haven't implemented each_object in Rubinius yet.
With a generational GC that moves objects, iterating over all objects is very, very non-deterministic unless the GC is totally turned off while objects are walked.
That's at least an option we have that we may roll with for the initial release, but it's less than ideal.
I think of each_object as very much an MRI implementation feature that the rest of us implementors struggle to implement. Because of this, the community and core members of each implementation really need to begin discussing whether or not each_object is a Ruby feature or an MRI feature.
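To illustrate the "GC off while walking" option, here is a minimal sketch using MRI's `GC.disable` and `GC.enable`. This is an assumption for illustration only and says nothing about how Rubinius would do it internally; `with_gc_paused` and `Node` are hypothetical names.

```ruby
# Hypothetical helper: run a block with the collector switched off, then
# restore whatever state the collector was in beforehand.
def with_gc_paused
  already_disabled = GC.disable # returns true if GC was already off
  yield
ensure
  GC.enable unless already_disabled
end

class Node; end
nodes = Array.new(5) { Node.new }

seen = []
with_gc_paused do
  # No collection can run while the heap is being walked.
  ObjectSpace.each_object(Node) { |node| seen << node }
end

puts "walked #{seen.size} nodes"
```

The cost is that memory can only grow for as long as the block runs, which is presumably part of why this is described as less than ideal.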
2007/10/28, Charles Oliver Nutter <charles.nut...@sun.com>:
> Robert Klemme wrote: > > On 28.10.2007 17:19, Charles Oliver Nutter wrote: > >> You just hit on exactly why we don't use JVMTI for ObjectSpace. It > >> would certainly work, but it would add a lot of overhead we'd never > >> expect people to accept in a real application. Plus, it would track > >> far more object instances than we actually want tracked.
> > Why is that? I mean, you could selectively decide which instances to > > track. > In general, though, we haven't explored JVMTI because we want JRuby to > be the best production environment for deploying apps, and nobody will > EVER turn on JVMTI on their production servers.
Well, it depends on the overhead and on the invocation model. I assumed you would be starting a JVM per process but your other remarks sound more like there is one JVM for JRuby programs...
> > Did you put them down because of the "stop the world" approach? I'd say > > that would be ok - at least it's better than not having ObjectSpace. And > > also, there would be no overhead. Question is only whether it's ok to > > invoke arbitrary byte code (which would happen during the iteration > > callback).
> Is it really ok? You need to remember that JRuby opens up the > possibility of running many, many applications in the same process, as > well as asynchronous algorithms with true parallel threads. We can't > expect people to cripple all that so they can walk EVERY object in the > system. "Stop the world" is awful when you start breaking the ability to > do many things in parallel, as you can in JRuby.
Ok, I see I need to dive further into JRuby before I discuss this further. :-)
> But it may be that for cases where each_object is needed, this is a > reasonable thing to do. I think if someone were to submit an > implementation of each_object that uses JVMTI, we would certainly accept > it :)
Hint, hint... :-)
> >> ObjectSpace is just not compatible with any GC that requires the > >> ability to move objects around in memory,
> > I don't think that moving is an issue. If it were, JVM's would not work > > the way they do (object references are no pointers to memory locations). > > In other words, all programs would have the same problems #each_object > > had.
> The problem is not so much that the object references move as that you > would have to lock the memory locations for some period of time to be > able to walk the object table. And I think that's *bad* especially when > we're looking at JRuby allowing folks to run dozens of apps in the same > process and memory space out of the box. We can't lock things down like > that.
I don't understand this remark of yours. If you implement this in Java land (as you did apparently with WeakReferences) then there is no need to lock anything. You just traverse the list (or a copy of the list) and if a ref has been set to null you do not pass it to the callback.
If it is some kind of native code (possibly via JNI or other interfaces) probably more care has to be taken, although I'd assume that JNI takes care of this (i.e. once the callback is invoked with a non-null argument the object stays alive until after the callback returns unless you clear that reference of course).
Traversal during #each_object in that respect is similar to traversal through an ordinary collection - during that a GC can occur just the same but that does not affect the traversal in any way.
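The traversal described above can be sketched in Ruby, with the standard `weakref` library standing in for Java's WeakReference (an assumption made for illustration). `WeakRegistry`, `each_live` and `Job` are hypothetical names.

```ruby
require "weakref"

# Hypothetical registry: a list of weak references plus a traversal that
# walks a copy of the list and skips referents that have been collected.
class WeakRegistry
  def initialize
    @refs = []
  end

  def register(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  def each_live
    return enum_for(:each_live) unless block_given?

    @refs.dup.each do |ref| # copy: later registrations are not visited
      begin
        obj = ref.__getobj__ # strong reference held for the callback
      rescue WeakRef::RefError
        next # referent already collected, so do not pass it on
      end
      yield obj
    end
  end

  # Drop dead entries, analogous to periodically cleaning out the list.
  def prune
    @refs.select!(&:weakref_alive?)
    self
  end
end

Job = Struct.new(:name)
registry = WeakRegistry.new
kept = [registry.register(Job.new("alpha")), registry.register(Job.new("beta"))]

registry.each_live do |job|
  registry.register(Job.new("late")) # registered mid-walk, not visited
  puts job.name
end
```

No locking is involved: a collection may run at any point during the walk, and the only visible effect is that some entries are skipped.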
mortee wrote: > Charles Oliver Nutter wrote: >> Actually, we do that a bit already. For example, we do not track arrays >> constructed during argument processing, since they are typically >> transient. The problem is that we could only choose to track all Ruby >> objects, for example...which would cripple other JRuby apps running in >> the same process.
> [...]
>> The problem is not so much that the object references move as that you >> would have to lock the memory locations for some period of time to be >> able to walk the object table. And I think that's *bad* especially when >> we're looking at JRuby allowing folks to run dozens of apps in the same >> process and memory space out of the box. We can't lock things down like >> that.
> Sorry for the extremely uninitiated and naive question - but when you're > about to enumerate each object in an application, aren't you interested > only in this application's objects anyway? So why would you have to lock > anything about the other ruby apps in the same process? Is that kind of > distinguishing objects impossible on the GC/enumeration level?
As far as I know there's no way to have JVMTI enumerate only objects created by a specific application in a given JVM. So any sort of ObjectSpace impl based on it would have to take that into consideration.
evanw...@gmail.com wrote: > I think of each_object as very much an MRI implementation feature that > the rest of us > implementors struggle to implement. Because of this, the community and > core members of > each implementation really need to begin discussing whether or not > each_object is a > Ruby feature or an MRI feature.
That's actually a really good point. each_object is more a feature of an individual implementation's memory model than a general feature that can be applied to every Ruby implementation. In many cases, like ours, you simply don't have control over that memory model enough to provide a real each_object implementation (and _id2ref requires tricks too, but it's at least bounded and explicit). So it may be fair to say that each_object is an MRI feature we emulate, but cannot simulate well enough for it to translate appropriately.
> mortee wrote: > > Charles Oliver Nutter wrote: > >> Actually, we do that a bit already. For example, we do not track arrays > >> constructed during argument processing, since they are typically > >> transient. The problem is that we could only choose to track all Ruby > >> objects, for example...which would cripple other JRuby apps running in > >> the same process.
> > [...]
> >> The problem is not so much that the object references move as that you > >> would have to lock the memory locations for some period of time to be > >> able to walk the object table. And I think that's *bad* especially when > >> we're looking at JRuby allowing folks to run dozens of apps in the same > >> process and memory space out of the box. We can't lock things down like > >> that.
> > Sorry for the extremely uninitiated and naive question - but when you're > > about to enumerate each object in an application, aren't you interested > > only in this application's objects anyway? So why would you have to lock > > anything about the other ruby apps in the same process? Is that kind of > > distinguishing objects impossible on the GC/enumeration level?
> As far as I know there's no way to have JVMTI enumerate only objects > created by a specific application in a given JVM. So any sort of > ObjectSpace impl based on it would have to take that into consideration.
Hm, if you host different applications in the same JVM you probably need separate class loaders anyway to separate changes on classes. Maybe you can use that to partition the heap. Alternatively you could use IterateOverObjectsReachableFromObject() and start from main. Just a few wild guesses.
Btw, the issue with stopping the world would still not go away. Too bad. A possible solution would be to implement the callback in a way that it places all references in a Java collection. Only after it finishes is the Ruby-land callback invoked for each instance. The downside is that you need more space (i.e. for the collection, which could become largish), but on the plus side you do not have any overhead (other than that incurred by JVMTI) during "normal" operation, and you can limit the stop-the-world time to just the copying phase, which might be acceptable. Charles, what do you think?
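The two-phase idea can be sketched in plain Ruby, using `ObjectSpace.each_object` in place of the JVMTI heap walk (an assumption for illustration; no JVMTI calls are shown). `each_object_snapshot` and `Account` are hypothetical names.

```ruby
# Hypothetical two-phase walk: phase 1 only copies references into an
# array; phase 2 runs the caller's block with no heap walk in progress.
def each_object_snapshot(klass)
  return enum_for(:each_object_snapshot, klass) unless block_given?

  snapshot = []
  # Phase 1: the only part that would need the world stopped.
  ObjectSpace.each_object(klass) { |obj| snapshot << obj }

  # Phase 2: arbitrary user code; objects it creates are not visited.
  snapshot.each { |obj| yield obj }
  snapshot.size
end

class Account; end
accounts = Array.new(3) { Account.new }
created_during_walk = []

visited = each_object_snapshot(Account) do |_account|
  created_during_walk << Account.new
end

puts "visited #{visited}, created #{created_during_walk.size} during the walk"
```

The snapshot array holds strong references, so every object in it stays alive until the callbacks finish; that is the extra space cost mentioned above.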
> Robert Klemme wrote: > > On 28.10.2007 17:19, Charles Oliver Nutter wrote: > >> You just hit on exactly why we don't use JVMTI for ObjectSpace. It > >> would certainly work, but it would add a lot of overhead we'd never > >> expect people to accept in a real application. Plus, it would track > >> far more object instances than we actually want tracked.
> > Why is that? I mean, you could selectively decide which instances to > > track.
> Actually, we do that a bit already. For example, we do not track arrays > constructed during argument processing, since they are typically > transient. The problem is that we could only choose to track all Ruby > objects, for example...which would cripple other JRuby apps running in > the same process.
> In general, though, we haven't explored JVMTI because we want JRuby to > be the best production environment for deploying apps, and nobody will > EVER turn on JVMTI on their production servers.
> >>> Alternatively there may be another method that does not need > >>> instrumentation and that can give you access to every (reachable) > >>> object in the JVM.
> >> If there is...we haven't found it. The "linked weakref list" has been > >> the least overhead so far, and it's still a lot of overhead.
> I was referring to non-JVMTI solutions, but you're right, JVMTI does > provide this capability.
> > Did you put them down because of the "stop the world" approach? I'd say > > that would be ok - at least it's better than not having ObjectSpace. And > > also, there would be no overhead. Question is only whether it's ok to > > invoke arbitrary byte code (which would happen during the iteration > > callback).
> Is it really ok? You need to remember that JRuby opens up the > possibility of running many, many applications in the same process, as > well as asynchronous algorithms with true parallel threads. We can't > expect people to cripple all that so they can walk EVERY object in the > system. "Stop the world" is awful when you start breaking the ability to > do many things in parallel, as you can in JRuby.
> But it may be that for cases where each_object is needed, this is a > reasonable thing to do.
Exactly. I think that each_object rarely has to go into production code, but is very handy (and, to be honest, just fun, really) in debugging/testing/experimenting. For those types of situations, I don't really think a "stop the world" approach is so terrible. I find it less of a disturbance than having this off-code switch.
> implementation of each_object that uses JVMTI, we would certainly accept > it :)
> >> ObjectSpace is just not compatible with any GC that requires the > >> ability to move objects around in memory,
> > I don't think that moving is an issue. If it were, JVM's would not work > > the way they do (object references are no pointers to memory locations). > > In other words, all programs would have the same problems #each_object > > had.
> The problem is not so much that the object references move as that you > would have to lock the memory locations for some period of time to be > able to walk the object table. And I think that's *bad* especially when > we're looking at JRuby allowing folks to run dozens of apps in the same > process and memory space out of the box. We can't lock things down like > that.