Instance Descriptors (DescriptorArray)


sqrt...@googlemail.com

Jun 6, 2012, 4:14:37 AM
to v8-...@googlegroups.com
Hi guys,

Can anyone tell me what the instance descriptors are used for and how they are initialized? I want to store additional information on objects, which is how I came across bit_field3. That would be fine for me, since (as far as I can tell) I only need one bit and not an int. While investigating further I stumbled across the descriptors, but the comment only states that they store instance descriptors, which did not help me all that much :-)

I appreciate the help,
 Ben

Toon Verwaest

Jun 6, 2012, 4:51:19 AM
to v8-...@googlegroups.com
Hi Ben,

the instance descriptors obviously contain descriptors that describe instances ;-) They are (currently) used to store two different concepts: property descriptors and map transitions.

The property descriptors describe what properties look like, and how they are stored, within instances of the current map.

Transitions mean that you used an object in a way that its current map did not support, so we have to transition to a new map. The transitions are stored in the descriptors so that maps with the same semantics can be shared between instances, which, in addition to reducing memory overhead, is necessary for effective inline caching. The transitions can be map transitions (a property was added), callback transitions (setters and/or getters were added) and elements transitions (for example, a double was stored in an array that until then only contained Smis).

Basically, whenever you do something like "obj.property = value" on an object that previously didn't have "property", a new map is created that contains the new property in its descriptor array. From then on, obj uses this map as its map. At the same time, the descriptor array of the old map is modified to contain a map transition to this new map under the name "property", so that instances similar to the old obj can also use the new map if "property" gets added to them. Finally, the new map gets a BackPointer to the old map (stored where the prototype transitions are potentially stored) for incremental marking.
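The transition mechanism described above can be sketched as a toy model. Everything here (`ToyMap`, `ToyMapRegistry`, `AddProperty`) is a hypothetical simplification, not V8's real `Map` or `DescriptorArray`: property descriptors are just an ordered list of names, transitions live in a separate table rather than inside the descriptor array, and the back pointer is a plain field. What it does show is the sharing: the second object that gains a given property from the same starting map reuses the map created for the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a hidden class ("map"). Hypothetical: not V8's layout.
struct ToyMap {
  std::vector<std::string> properties;         // property descriptors: names in field order
  std::map<std::string, ToyMap*> transitions;  // added name -> map that has it
  ToyMap* back_pointer = nullptr;              // the map this one was reached from
};

class ToyMapRegistry {
 public:
  ToyMap* root() { return &root_; }

  // Returns the map an object with map `from` uses after gaining `name`.
  // A new map is created only the first time; afterwards the recorded
  // transition is followed, so all such objects share one map.
  ToyMap* AddProperty(ToyMap* from, const std::string& name) {
    assert(from != nullptr);
    auto it = from->transitions.find(name);
    if (it != from->transitions.end()) return it->second;

    auto next = std::make_unique<ToyMap>();
    next->properties = from->properties;
    next->properties.push_back(name);
    next->back_pointer = from;  // lets us walk back to the old map

    ToyMap* raw = next.get();
    from->transitions[name] = raw;  // the old map now records the transition
    owned_.push_back(std::move(next));
    return raw;
  }

  std::size_t map_count() const { return owned_.size() + 1; }  // +1 for root

 private:
  ToyMap root_;
  std::vector<std::unique_ptr<ToyMap>> owned_;
};
```

Because both objects end up with the same `ToyMap*`, a cache keyed on that pointer would hit for both of them, which is the kind of sharing inline caching relies on.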

Since the descriptor array is stored in the location where bit_field3 is stored, we move bit_field3 into the descriptor array when such an array is present. 

To support enumeration of properties in the order of addition (for for-in loops), the descriptor array also keeps track of the order in which its properties were added. For this reason it also contains an enumeration index and, potentially, an enumeration cache.
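A minimal sketch of how an enumeration index lets for-in recover addition order, assuming (for this sketch only) that descriptors are kept sorted by key so they can be binary-searched. `ToyDescriptorArray` and its members are hypothetical; the "enumeration cache" here is just a memoized vector that is cleared whenever a property is added.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical descriptor: a property name plus the order in which it was added.
struct ToyDescriptor {
  std::string key;
  int enumeration_index;  // 1 for the first property added, 2 for the next, ...
};

class ToyDescriptorArray {
 public:
  void Add(const std::string& key) {
    // Keep descriptors sorted by key so that lookups could use binary search.
    auto pos = std::lower_bound(
        descriptors_.begin(), descriptors_.end(), key,
        [](const ToyDescriptor& d, const std::string& k) { return d.key < k; });
    assert(pos == descriptors_.end() || pos->key != key);  // no duplicates
    descriptors_.insert(pos, ToyDescriptor{key, next_enumeration_index_++});
    enum_cache_.clear();  // invalidate the cached for-in order
  }

  // Keys in order of addition, as for-in needs them; computed once, then cached.
  const std::vector<std::string>& EnumerationOrder() {
    if (enum_cache_.empty() && !descriptors_.empty()) {
      std::vector<ToyDescriptor> by_index = descriptors_;
      std::sort(by_index.begin(), by_index.end(),
                [](const ToyDescriptor& a, const ToyDescriptor& b) {
                  return a.enumeration_index < b.enumeration_index;
                });
      for (const ToyDescriptor& d : by_index) enum_cache_.push_back(d.key);
    }
    return enum_cache_;
  }

  const std::vector<ToyDescriptor>& descriptors() const { return descriptors_; }

 private:
  std::vector<ToyDescriptor> descriptors_;  // sorted by key
  std::vector<std::string> enum_cache_;     // sorted by enumeration index
  int next_enumeration_index_ = 1;
};
```

Adding "zeta", "alpha", "mid" stores them as alpha, mid, zeta, while `EnumerationOrder()` still yields zeta, alpha, mid.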

If all you want to do is add information to bit_field3, you should be able to do so without knowing much about all this machinery, however.
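Storing a single extra flag in a 32-bit word such as bit_field3 comes down to masking. The helper below is written from scratch for this sketch (V8 has its own bit-field utilities, whose exact names and signatures vary by version), and the choice of bit 2 for the hypothetical `TaintBit` is arbitrary; check which bits are actually free in the revision you are patching. Keep in mind that a flag stored in a map's word is shared by every object that has that map.

```cpp
#include <cassert>
#include <cstdint>

// A `size`-bit field of type T stored at bit `shift` of a 32-bit word.
// Written for this sketch; V8's own bit-field helpers differ by version.
template <class T, int shift, int size>
struct ToyBitField {
  static_assert(shift >= 0 && size > 0 && size < 32 && shift + size <= 32,
                "field must fit in a 32-bit word");
  static constexpr std::uint32_t kMask =
      ((std::uint32_t{1} << size) - 1) << shift;

  static constexpr std::uint32_t encode(T value) {
    return (static_cast<std::uint32_t>(value) << shift) & kMask;
  }
  static constexpr T decode(std::uint32_t word) {
    return static_cast<T>((word & kMask) >> shift);
  }
  // Replaces only this field's bits, leaving the rest of the word intact.
  static constexpr std::uint32_t update(std::uint32_t word, T value) {
    return (word & ~kMask) | encode(value);
  }
};

// Hypothetical taint flag at bit 2; pick a bit that is really unused.
using TaintBit = ToyBitField<bool, 2, 1>;

inline std::uint32_t MarkTainted(std::uint32_t bit_field3) {
  std::uint32_t result = TaintBit::update(bit_field3, true);
  assert(TaintBit::decode(result));
  return result;
}
```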

I hope that helps,
Toon

sqrt...@googlemail.com

Jun 6, 2012, 9:57:03 AM
to v8-...@googlegroups.com
Hi Toon,

I greatly appreciate the input - that helped me, thank you. 

The basic idea is to add a taint-tracking tag to the unused 30 bits of bit_field3. I also want to pass that tag on when, e.g., appending strings (I'm just at the start of the project and this is a nice little test). Can you point me in the direction of the proper function for that? I added debug logging to Heap::AllocateStringFromAscii and Heap::AllocateConsString, but those did not trigger. The JS example below shows what I want to achieve.

a=document.title;
a+=" some other string is appended";
document.title=a;

In the DOM, I want to be able to see that what I wrote to document.title partly came from document.title itself :-)
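The propagation Ben describes for `a += "..."` can be modelled independently of V8: keep a list of tainted character ranges with each string and shift the right operand's ranges when concatenating. `ToyTaintedString`, `FromSource`, `FromLiteral` and `Concat` are hypothetical names; real V8 strings carry no such field, and a single taint bit per string would record only whether any part is tainted, not which part.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical taint-carrying string, for illustration only.
// Each range is [begin, end) in character offsets.
struct ToyTaintedString {
  std::string chars;
  std::vector<std::pair<std::size_t, std::size_t>> tainted_ranges;

  bool IsTainted() const { return !tainted_ranges.empty(); }
};

// A value read from a taint source (e.g. document.title): fully tainted.
ToyTaintedString FromSource(const std::string& s) {
  ToyTaintedString t{s, {}};
  if (!s.empty()) t.tainted_ranges.push_back({0, s.size()});
  return t;
}

// A string literal written in the script: not tainted.
ToyTaintedString FromLiteral(const std::string& s) { return ToyTaintedString{s, {}}; }

// a + b: characters are appended and b's ranges are shifted by a's length.
ToyTaintedString Concat(const ToyTaintedString& a, const ToyTaintedString& b) {
  ToyTaintedString result{a.chars + b.chars, a.tainted_ranges};
  const std::size_t offset = a.chars.size();
  for (const auto& r : b.tainted_ranges) {
    result.tainted_ranges.push_back({r.first + offset, r.second + offset});
  }
  assert(result.chars.size() == a.chars.size() + b.chars.size());
  return result;
}
```

With a title of "Home", `Concat(FromSource("Home"), FromLiteral(" some other string is appended"))` reports characters [0, 4) as tainted, which is the "in parts came from there" information.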

Cheers,
 Ben


Erik Corry

Jun 7, 2012, 8:07:22 AM
to v8-...@googlegroups.com
On Wed, Jun 6, 2012 at 3:57 PM, <sqrt...@googlemail.com> wrote:
> Hi Toon,
>
> I greatly appreciate the input - that helped me, thank you.
>
> The basic idea is to have a taint tracking tag added to the unused 30bits of
> bit_field3. I also want to pass on that tag when e.g. appending strings (I'm
> just at the start of the project and this is a nice little test) - can you
> point in the direction of the proper function for that? I added debug
> logging to Heap::AllocateStringFromAscii and Heap::AllocateConsString but

We don't always go into these C++ routines. The generated code can
create strings. Search for string_map in the src/ia32 subdirectory to
see examples.

> those did not trigger. The idea of what I want to get can be grasped in the
> JS example below.

The instance descriptors that Toon describes are for JS Objects.
These are 'real objects' in the JS sense that you can attach arbitrary
properties to.

The strings are primitive objects that you cannot attach arbitrary
properties to. They do not have identity: the == and === operators
just test character-for-character equivalence; they don't tell you
whether two strings are the same object, the way == and === do for real
JS objects. The strings have their own maps. There are currently a
lot of different maps for different kinds of strings:

7-bit vs. 16-bit
Sequential, cons, slice and external strings
Symbols and non-symbols

That's 16 different string maps.

You probably want to double that by having a tainted and a non-tainted
map for each.

Symbols may be tricky for you. They are internally canonicalized so
that there cannot be two different symbols that have the same sequence
of characters from start to end. When a string is used for a property
name it can be turned into a symbol if there was not already a symbol
with those characters.

So for example:

var key = "x" + tainted_foo; // key is tainted (tainted_foo holds "foo").

hash[key] = 0; // key is now a symbol.

// The following now happens in a completely different, unrelated part
// of the program:

var key2 = "x" + "foo"; // Not tainted.

hash2[key2] = 0; // hash2 reuses the existing symbol "xfoo", which is tainted.

for (var k in hash2) {
  do_something(k); // k is tainted, this may fail.
}

Perhaps you just want to forbid using tainted strings for property
names. It's often a bug, due to things like hash-collision DoS attacks or
untrusted sources of the __proto__ string.
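The symbol problem can be reproduced with a toy interning table: because there is exactly one canonical entry per character sequence, a taint flag stored on that entry is whatever the first inserter had, so the later, untainted "x" + "foo" gets back a tainted symbol. `ToySymbolTable` and its methods are hypothetical; `TryInternPropertyName` shows the stricter policy of refusing tainted strings as property names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy canonical symbol: one entry per character sequence.
struct ToySymbol {
  std::string chars;
  bool tainted;
};

class ToySymbolTable {
 public:
  // Returns the canonical symbol for `chars`, creating it on first use.
  // Later callers get the first caller's taint flag, whatever their own is.
  const ToySymbol& Intern(const std::string& chars, bool tainted) {
    auto it = table_.find(chars);
    if (it == table_.end()) {
      it = table_.emplace(chars, ToySymbol{chars, tainted}).first;
    }
    assert(it->second.chars == chars);
    return it->second;
  }

  // Stricter policy: tainted strings are never used as property names.
  bool TryInternPropertyName(const std::string& chars, bool tainted) {
    if (tainted) return false;
    Intern(chars, false);
    return true;
  }

 private:
  std::unordered_map<std::string, ToySymbol> table_;
};
```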

Note that all 1-character and most 2-character strings are symbols.
--
Erik Corry, Software Engineer
Google Denmark ApS - Frederiksborggade 20B, 1 sal,
1360 København K - Denmark - CVR nr. 28 86 69 84

dim...@gmail.com

Mar 13, 2013, 11:57:36 AM
to v8-...@googlegroups.com, sqrt...@googlemail.com
Hi Ben,

Did you get taint propagation working? Do you have your modified version of V8/Chromium posted anywhere? I would be interested in taking a look and trying it out.

Thanks,

Dmitri