Mariano,
many of the lessons learned from LOOM still apply, in my opinion. Just
imagine that you need an image larger than 4GB on a 32 bit machine
(there are still many of those around). I don't think pointer sizes
are as much of a problem for directly adopting LOOM today as the
widespread use of direct pointers instead of object tables. But it
would certainly be possible to adapt things using the "pointer
swizzling" technique from the Texas group:
http://portal.acm.org/citation.cfm?id=1267991.1267999
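To make the idea concrete, here is a rough Python sketch of what swizzling does. The Texas system works lazily at page granularity via memory protection; this toy version is eager, and all the names here are invented for illustration:

```python
# Hypothetical sketch of pointer swizzling: on disk, objects refer to
# each other by persistent object IDs (OIDs); when a record is faulted
# into memory, those OIDs are replaced ("swizzled") by direct references.
# A real system would use tagged pointers to tell OIDs from plain ints.

class Store:
    def __init__(self):
        self.disk = {}      # OID -> dict of field -> OID (persistent form)
        self.resident = {}  # OID -> dict of field -> object (swizzled form)

    def fault(self, oid):
        """Bring an object into memory, swizzling its OID fields."""
        if oid in self.resident:
            return self.resident[oid]
        record = self.disk[oid]
        obj = {}
        self.resident[oid] = obj  # register first so cycles terminate
        for field, ref in record.items():
            obj[field] = self.fault(ref) if isinstance(ref, int) else ref
        return obj

store = Store()
store.disk = {1: {"name": "root", "child": 2},
              2: {"name": "leaf", "child": None}}
root = store.fault(1)
assert root["child"]["name"] == "leaf"   # OID 2 became a direct reference
```

The point is that once an object is resident, following a reference costs nothing extra; only the first fault pays the translation price.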
LOOM was created to fix several problems of the previous design,
OOZE, but OOZE was itself used in production for several years, and
you might find some interesting ideas there:
http://www-cs-students.stanford.edu/~eswierk/misc/kaehler81/
A very advanced virtual memory system for Smalltalk was the one
developed for the Mushroom project in the early 1990s. Several papers
describe different aspects of it at:
http://www.wolczko.com/mushroom/index.html
-- Jecel
Hi Mariano:
I was not able to find a copy of that paper. The ACM link offers only
the bibliographic data.
Do you have an alternative link, or could you provide us at least with
the abstract?
How does your interest relate to for instance NXTalk?
http://www.hpi.uni-potsdam.de/hirschfeld/projects/nxtalk/index.html
If I remember correctly, these Lego devices also use two different
kinds of memory: a small, fast RAM and some flash memory. But I don't
remember exactly how that was solved in NXTalk.
Best regards
Stefan
On Feb 9, 4:16 pm, Mariano Martinez Peck <marianop...@gmail.com> wrote:
> However, my idea is not to
> have a complete different VM neither image. It would be cool to have a
> "standard" VM that can have a mechanism to act as Virtual Memory. This can
> be VERY valuated in such devices, but probably also useful for a desktop
> application. Suppose you are deploying a seaside app. The image takes 50 mb.
> Why ? ok, that's not too much....but if you have several images
> running...that can be a lot. And what percentage of the image is really used
> ? why I should use 50 if maybe I just use 10 ?
Hm, I am not entirely sure, but that sounds much more like you are
looking for a garbage collector which is aware of the paging/virtual
memory provided by the operating system. Even though I think that VMs
should take on more responsibilities and the OS should act more like
an extended hypervisor, wouldn't it be a practical solution to have
something like a generational GC which also takes the history of
reads/writes into account when it moves objects in memory? I would
guess that such a GC would allow the OS to easily swap unused stuff
out to disk.
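A toy sketch of that idea in Python (all names and thresholds are invented, and a real collector would sample page access bits rather than instrument every read):

```python
# Hypothetical sketch: a heap that records when each object was last
# touched and, at compaction time, segregates "hot" objects from "cold"
# ones, so the cold region forms contiguous pages the OS can swap out.

class HistoryAwareHeap:
    def __init__(self):
        self.tick = 0
        self.objects = {}   # object id -> [payload, last_access_tick]

    def allocate(self, oid, payload):
        self.objects[oid] = [payload, self.tick]

    def read(self, oid):
        self.tick += 1
        entry = self.objects[oid]
        entry[1] = self.tick         # remember the access history
        return entry[0]

    def compact(self, max_age):
        """Partition live objects into hot and cold regions by recency."""
        hot, cold = [], []
        for oid, (payload, last) in self.objects.items():
            (hot if self.tick - last <= max_age else cold).append(oid)
        return hot, cold

heap = HistoryAwareHeap()
for i in range(4):
    heap.allocate(i, "obj%d" % i)
heap.read(2); heap.read(3)           # only 2 and 3 are recently used
hot, cold = heap.compact(max_age=1)
assert sorted(hot) == [2, 3] and sorted(cold) == [0, 1]
```

A copying collector already moves every live object anyway, so clustering by access history like this adds little extra cost to a collection it was going to do regardless.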
Off the top of my head, I don't remember any GC which did such a
thing, but the lectures I followed on the IBM Metronome GC gave me the
impression that they do very similar stuff to what I vaguely remember
from my OS lectures
(http://www.cs.uoregon.edu/research/summerschool/summer09/curriculum.html,
see David Bacon).
> I know you are developing VMs also (or similar). I have your rss and I
> saw your paper the other day....it is in my toreads
Well, kind of related to your question, too: http://doi.acm.org/10.1145/1640134.1640149
Here, David has divided the heap into a read/write heap and a
read-mostly heap. For our use case that works well; for yours, I think
you need a more sophisticated solution. And, well, I have ported that
VM to standard multi-core hardware; nothing more to report for the
moment :-/
On Feb 9, 9:07 pm, Mariano Martinez Peck <marianop...@gmail.com> wrote:
Mariano,
Gemstone was something that I considered mentioning, but since I don't
have a good reference for it handy, I didn't. It is certainly worth
looking into.
Virtual memory will only help with the problem of limited RAM if you
have a device to swap to, either locally or over a fast network. That
would be the case for a PDA if it had something like 8MB of RAM and
64MB of Flash but you couldn't directly address the Flash for some
reason. It certainly isn't true of something like the NXT and many
other embedded devices.
When using virtual memory to deal with limited RAM, having an
intermediate level using compressed data in RAM normally helps.
Craig Latta's work on Spoon might be interesting for you, though there
is little documentation about it, unfortunately.
http://netjam.org/projects/spoon/
His patches to the Squeak VM are very small. The idea is that two or
more images can talk to each other and move objects between them. So
you can have something like a virtual memory where the "swap device"
is actually another, much larger and fully running Squeak image
instead of a simple file.
I have done a lot of work on really small implementations, but those
involved hardware, virtual machine and image format changes. Just
going from 32 bits to 16 saves about 33% of the RAM. I used OOZE style
pointer encoding to eliminate the object headers. I don't think any of
that stuff would help with your goals.
About the LOOM paper, the system is also described in chapter 14
(pages 251 to 271) of "Smalltalk-80: Bits of History, Words of
Advice" (the "green book") which is available online at
http://stephane.ducasse.free.fr/FreeBooks/BitsOfHistory/
-- Jecel
> > When using virtual memory to deal with limited RAM, having an
> > intermediate level using compressed data in RAM normally helps.
>
> compressing which kind of data ? internal representation of the object model
> ?
In the case of a paged virtual memory system, it would be the pages
that would be compressed. This works well if the processor is fast
compared to the speeds of the network or the local swap device. The
main RAM is divided into two regions: normal data and a cache for
compressed pages. When a page is supposed to be sent back to the
backing store, it is instead compressed and saved in the cache. If it
is not used for a while then it actually goes to the swap file, but if
it is requested again before that then it can simply be uncompressed.
The sizes of the two regions have to be carefully chosen or this
scheme can perform worse than a simple virtual memory. If done right,
however, it can perform the same as a machine with twice as much
physical RAM.
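A toy model of the two-region scheme in Python, with zlib standing in for whatever compressor a real system would use and a dict standing in for the swap device (sizes and names are invented):

```python
import zlib
from collections import OrderedDict

# Hypothetical sketch: when a page leaves main memory it is compressed
# and kept in a RAM cache; only when that cache overflows does the page
# really go to the slow "swap device".

class CompressedPageCache:
    def __init__(self, cache_slots):
        self.cache = OrderedDict()   # page id -> compressed bytes (LRU order)
        self.swap = {}               # page id -> compressed bytes on "disk"
        self.cache_slots = cache_slots
        self.disk_reads = 0

    def evict(self, page_id, data):
        """A page leaves main memory: compress it, keep it in RAM."""
        self.cache[page_id] = zlib.compress(data)
        self.cache.move_to_end(page_id)
        if len(self.cache) > self.cache_slots:
            victim, blob = self.cache.popitem(last=False)
            self.swap[victim] = blob     # only now touch the slow device

    def fault(self, page_id):
        """Page fault: decompress from cache if possible, else from swap."""
        if page_id in self.cache:
            blob = self.cache.pop(page_id)
        else:
            self.disk_reads += 1
            blob = self.swap.pop(page_id)
        return zlib.decompress(blob)

vm = CompressedPageCache(cache_slots=2)
vm.evict(1, b"a" * 4096)
vm.evict(2, b"b" * 4096)
vm.evict(3, b"c" * 4096)                 # pushes page 1 out to swap
assert vm.fault(2) == b"b" * 4096 and vm.disk_reads == 0
assert vm.fault(1) == b"a" * 4096 and vm.disk_reads == 1
```

Faulting page 2 costs only a decompression, while page 1, which sat unused long enough to overflow the cache, pays the full trip to the backing store.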
For the design I am working on for Squeak, the "swap file" is actually
a large number of smaller files stored in a compressed format. If any
object in one of these is needed, the whole file is brought into a
special "compressed segments" region of RAM and the individual objects
are expanded as needed. In LOOM terms, this is like converting a whole
group of objects from "not resident" to leaves. Unlike pages, objects
have different sizes so changing their contents and compressing will
probably result in something that won't fit where the original
compressed object was. No problem - I just save it elsewhere like all
good journalling systems should.
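A toy model of that segment scheme (dicts stand in for the compressed segment files, JSON stands in for the real object format, and the "journal" is just the save-it-elsewhere part; all names are invented):

```python
import json
import zlib

# Hypothetical sketch: objects live in compressed segments; faulting any
# one object brings its whole segment into RAM. A changed object that no
# longer fits its old compressed slot is journalled instead of rewritten.

class SegmentStore:
    def __init__(self):
        self.segments = {}   # segment id -> compressed JSON of {oid: object}
        self.resident = {}   # segment id -> decoded dict, once faulted in
        self.where = {}      # oid -> segment id
        self.journal = {}    # oid -> updated object, awaiting a new segment

    def write_segment(self, seg_id, objects):
        self.segments[seg_id] = zlib.compress(json.dumps(objects).encode())
        for oid in objects:
            self.where[oid] = seg_id

    def fault(self, oid):
        if oid in self.journal:              # newest version wins
            return self.journal[oid]
        seg_id = self.where[oid]
        if seg_id not in self.resident:      # bring in the whole segment
            raw = zlib.decompress(self.segments[seg_id])
            self.resident[seg_id] = json.loads(raw)
        return self.resident[seg_id][oid]

    def update(self, oid, obj):
        # A grown object may not fit its old slot; save it elsewhere.
        self.journal[oid] = obj

store = SegmentStore()
store.write_segment("seg0", {"a": [1, 2], "b": [3]})
assert store.fault("b") == [3]           # faulting "b" loaded all of seg0
store.update("a", [1, 2, 3, 4])
assert store.fault("a") == [1, 2, 3, 4]  # served from the journal
```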
> > [work on tiny images - 16 bits]
>
> This work you did was "recently" or several years ago ? I ask because I
> wonder how you manage to work with 16 bits pointers while the world of the
> OS and processor is 32 or even 64.
This was done in 2001 and was a real product rather than a research
project. The machine had only 512KB of Flash but a full 8MB of RAM, so
some of my constraints were the opposite of the usual ones. I used a
full class/method table for lookups, for example. That is extremely
wasteful (normally more than 80% of the entries will point to "Does
Not Understand") but it gave me a guaranteed 1 clock cycle message
send. Even with compression, I wouldn't get more than 32K objects in
512KB, so 16 bit pointers were a nice fit. The processor was one that
I designed myself, so it didn't matter to me that other people were
doing 32 and 64 bits.
http://www.merlintec.com:8080/hardware/Oliver
-- Jecel
Something like this, I imagine:
http://www.phdcomics.com/comics/archive.php?comicid=1275
There is indeed a lot of material to study, and just looking at the
most recent ones isn't always best. As Alan Kay likes to say, it is
important to also know the original ideas and not just what someone
many years later thinks they were.
> > For the design I am working on for Squeak,
>
> You are working now on that ? Is that something related to SiliconSqueak ?
It is more or less related: the first few machines with SiliconSqueak
will just run normal images with a patched up SqueakNOS.
Unfortunately, I don't think what we have now can be considered end
user friendly. There needs to be a lot of improvement before I can
seriously tell someone to consider using Smalltalk instead of Mac OS X.
One way to do this is to just add some fancy layers on top of the
current system, but I prefer the alternative which is to clean up the
foundation so that making the higher levels friendlier is a reasonably
small task.
> Did you take a look to parcels ? It is in my to-do also :)
I have read a bit about it and used VisualWorks a little, but I
haven't studied the details of their implementation.
> > If any
> > object in one of these is needed, the whole file is brought into a
> > special "compressed segments" region of RAM and the individual objects
> > are expanded as needed.
>
> I thought having something similar, to swap unused objects into an
> ImageSegment, but one of the problems I thought is that, if one of those
> object is needed, is likely the whole segment will be loaded in memory.
Exactly. But as technology changes, some of the rules we are used to
also change. For the Amoeba operating system in the early 1990s, the
designers noticed that, given the sizes of most files and the RAM in
workstations, it was better to transfer whole files to and from the
file server than individual blocks as NFS and such did. It seemed to
me that their case was even more compelling if all of the files
happened to be compressed.
http://www.cs.vu.nl/pub/amoeba/
> > [full selector/class table for 16 bit 2001 project]
>
> Can you explain me please a bit more this ? Nowadays, there is no such a
> table for method loockups, isn't it ? They just ask the parent class and
> goes up in the chain. I know there is a shared cache somewhere.
> What did you put in that table ?
Yes, there is normally a very small global cache to speed up the
message lookups. If you eliminate the cache, things will be slower but
will still work. This is just one way to do things, however. The Self,
Java HotSpot and Strongtalk VMs (among others) use what are called
"inline caches" and "polymorphic inline caches" (PICs), which are not
global. We will probably see this in Squeak pretty soon too.
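The mechanism of such a call-site cache can be sketched like this (a toy Python model with invented class names; real PICs are patched machine-code stubs, not lists):

```python
# Hypothetical sketch of a polymorphic inline cache: each call site
# remembers the (class, method) pairs it has already seen, so repeated
# sends to the same receiver class skip the full superclass-chain lookup.

def full_lookup(klass, selector):
    """The slow path: walk the class hierarchy."""
    for c in klass.__mro__:
        if selector in c.__dict__:
            return c.__dict__[selector]
    raise AttributeError("does not understand: " + selector)

class CallSite:
    def __init__(self, selector):
        self.selector = selector
        self.entries = []            # list of (class, method): the PIC

    def send(self, receiver, *args):
        klass = type(receiver)
        for cached_class, method in self.entries:   # hit: no lookup at all
            if cached_class is klass:
                return method(receiver, *args)
        method = full_lookup(klass, self.selector)  # miss: slow path
        self.entries.append((klass, method))        # grows mono -> polymorphic
        return method(receiver, *args)

class Circle:
    def area(self): return 3
class Square:
    def area(self): return 4

site = CallSite("area")
site.send(Circle()); site.send(Square()); site.send(Circle())
assert len(site.entries) == 2        # polymorphic: two classes cached
```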
A full selector/class table is just a 2D array where each row
corresponds to one selector and each column to one class. So, given
the selector and the receiver's class you can just go directly to the
corresponding element in the array. That contains the pointer to the
actual code to be executed, or it can point to the "does not
understand" code if that combination is not valid. This table can take
up several megabytes in a small, 16 bit system and has 80% or more of
its elements pointing to DNU, so it is a fast but extremely wasteful
solution. It so happened that I had a few MB that would have been
wasted anyway.
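A toy model of the table itself, with tiny invented numbers and names (the real one, of course, had thousands of rows and columns):

```python
# Hypothetical sketch of a full selector/class dispatch table: a 2D array
# indexed by (selector row, class column), with every unused entry
# pointing at the "does not understand" handler.

def does_not_understand(receiver):
    return "DNU"

selectors = {"area": 0, "grow": 1}       # selector -> row index
classes = {"Circle": 0, "Square": 1}     # class -> column index

table = [[does_not_understand] * len(classes) for _ in selectors]
table[selectors["area"]][classes["Circle"]] = lambda r: "circle area"
table[selectors["area"]][classes["Square"]] = lambda r: "square area"

def send(class_name, selector, receiver=None):
    # One table lookup, no superclass walk: constant-time dispatch.
    return table[selectors[selector]][classes[class_name]](receiver)

assert send("Circle", "area") == "circle area"
assert send("Square", "grow") == "DNU"   # most slots look like this one
```

Here 3 of the 4 slots hold real methods; in a real image the proportions invert, which is exactly the 80%-plus waste described above.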
For the Squeak 3.9 image I use for email, there are 3529 classes and
45800 selectors (ByteSymbols, actually, but for a quick calculation
that is good enough). So we would need an array with 161 million
entries, only a tiny fraction of which we would actually use. I don't
recommend this as a general solution. There are several great papers
comparing the alternatives and how well they deal with different
hardware in the "Dynamic Dispatch" section of this page:
http://www.cs.ucsb.edu/~urs/oocsb/papers.shtml
I don't think that will help your own research very much, but might be
very useful for others.
> Very interesting project! you even designed the processor ? hahahaha
> Cool.
Alan Kay says "People who are really serious about software should
make their own hardware." near the bottom of
http://folklore.org/StoryView.py?project=Macintosh&story=Creative_Think.txt
Sounds about right to me. Of course, he warns us against "reverse
vandalism" where you do stuff because you can independently of whether
it is a good idea or not. So just because I *can* design my own
processor doesn't mean that I *should*. If an ARM or a PIC will get
the job done, then it is much better to use one of those. Though I
ended up cancelling that 16 bit Forth-like Smalltalk processor for
non-technical reasons, I feel it was a worthy alternative to anything
that
is out there. I hope that will be the case for SiliconSqueak as well.
-- Jecel
> On Feb 11, 11:19 am, Mariano Martinez Peck wrote:
>>> [compressed objects idea]
>>
>> This sounds very interesting. Even more if I combine it to the scheme I have
>> been reading about GC with different memory regions, for different kind of
>> memories and having policies to allocate objects. Thanks a lot. I am
>> creating a big list of interesting ideas hahaha.
>
> Something like this, I imagine:
>
> http://www.phdcomics.com/comics/archive.php?comicid=1275
>
> There is indeed a lot of material to study, and just looking at the
> most recent ones isn't always best. As Alan Kay likes to say, it is
> important to also know the original ideas and not just what someone
> many years later thinks they were.
Yes, but ignoring all papers published after 1970 is not a good idea, either.
Marcus
--
Marcus Denker -- http://www.marcusdenker.de
INRIA Lille -- Nord Europe. Team RMoD.
Indeed! Chronological snobbery (as C. S. Lewis calls it) goes both
ways. In the past you could argue that references would only take you
back, but today it is just as easy to go forward. If you want to
research about subjective programming (you can give it your own name,
like "worlds", before you look into what others are doing) then you
might want to start with one of the classic "PIE" papers:
http://portal.acm.org/citation.cfm?id=802792
Right there you can see 15 other papers that cite this one, so you can
not only see where stuff came from but also what it evolved into. You
can get a list of 46 papers citing the classic easily enough:
http://scholar.google.com/scholar?cites=5668487159081112111&hl=en&as_sdt=2000
There is no excuse for not being aware of what is being done today.
-- Jecel
> On Feb 15, 7:00 pm, Marcus Denker wrote:
>> On Feb 15, 2010, at 9:51 PM, Jecel wrote:
>>> [Alan says: read the original papers]
>>
>> Yes, but ignoring all papers published after 1970 is not a good idea, either.
>
...
>
> There is no excuse for not being aware of what is being done today.
>
Well said!
But I need to stress: Only of what is done *and* published...
(just trying to find energy for starting up my latex editor again...)
I know that feeling :)