Dex files and the Dalvik bytecode


msg555

Jul 4, 2010, 3:58:08 PM
to android-platform
I want to make a tool for editing dex files so I've been reading
through much of the documentation for the dex file format and related
documents on the dalvik bytecode. Before I continue I want to see if
I can confirm a few things.

1. It seems that all of the iget, iput, sget, sput, and invoke
instructions take a 16-bit field/method index. As far as I can tell,
there is no way to access fields/methods directly with an
index >= 2^16, and the only way the compiler could work around this
if more fields/methods needed to be accessed is to insert
reflection code. Is this accurate?

On the other hand, it seems that strings with a large index (< 2^32)
can be accessed.

2. All references to data within the bytecode (jumps, packed-switch
and family) are relative offsets to data within the same method's
bytecode, correct? Really what I'm wondering is if a method's
bytecode is self contained and if changing it could have an effect on
anything outside of the method.

3. In the dex file format, static fields can be initialized with a
list of encoded values. None of the encoded values seem to allow for
the static field to be initialized to an instance of an object;
however, this is possible in the Java language. How is this kind of
initialization done?

I might have more questions but that's all I can think of for now.

Thanks
-msg555

Dan Bornstein

Jul 7, 2010, 4:11:36 PM
to android-...@googlegroups.com
On Sun, Jul 4, 2010 at 12:58 PM, msg555 <msg...@gmail.com> wrote:
> 1.  It seems that all of the iget, iput, sget, sput, and invoke
> methods take a 16 bit field/method identifier.  As far as I can tell
> there doesn't seem to be any way to access fields/methods directly
> with an index >= 2^16.  As far as I can tell the only way the compiler
> could work around this if more fields/methods needed to be accessed is
> to insert reflection code.  Is this accurate?

That's correct. At some point, there will be an extension to the
bytecode format to allow for wider field and method references. When
we were developing the dex format originally, it looked like we were
far enough away from that being necessary that we decided not to
bother in the short term. It looks like the time is finally drawing
close: I've been told that the Scala core library has to be split in
two exactly because of this issue.

With extended versions of these opcodes, the remaining 16-bit limits
will be on (a) class references and (b) method prototypes. That is,
you will still be limited to no more than 65536 classes in a dex file,
and though you will be able to have 2^32 methods, you will only be
able to have 65536 different prototypes (list of arguments and return
type). You might be concerned about the latter, but the last time I
checked it seemed that even method-heavy code shared enough prototypes
that this won't become an issue for a while, still.
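
To make the 16-bit limit concrete, here is a minimal sketch of how an iget could be encoded by an editing tool. It assumes the documented 22c instruction format (opcode 0x52 for iget, two 4-bit registers, one 16-bit field index); the function name encode_iget is made up for illustration and is not part of any Android tool.

```python
import struct

IGET = 0x52  # iget vA, vB, field@CCCC (instruction format 22c)


def encode_iget(dest_reg, obj_reg, field_idx):
    """Encode an iget as two little-endian 16-bit code units.

    Hypothetical helper for illustration only.
    """
    if not (0 <= dest_reg <= 0xF and 0 <= obj_reg <= 0xF):
        raise ValueError("format 22c has only 4 bits per register")
    if not 0 <= field_idx <= 0xFFFF:
        # This is the limit discussed in the thread: the operand is
        # a single 16-bit code unit, so larger indexes cannot be encoded.
        raise ValueError("field index does not fit in 16 bits")
    # First code unit: opcode in the low byte, A in bits 8-11, B in bits 12-15.
    first_unit = IGET | (dest_reg << 8) | (obj_reg << 12)
    return struct.pack("<HH", first_unit, field_idx)
```

With this sketch, encode_iget(1, 2, 0x1234) yields the four bytes 52 21 34 12, while a field index of 0x10000 is rejected because there is nowhere in the instruction to put the extra bit.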

> On the other hand it seems that strings with large index (< 2^32) can
> be accessed.

Yep. The writing was already on the wall about that in the 1.0
timeframe. Hence, const-string/jumbo.
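
As a rough illustration of the two string instructions, the sketch below picks between const-string (opcode 0x1a, format 21c, 16-bit index) and const-string/jumbo (opcode 0x1b, format 31c, 32-bit index) based on the index size. The helper name encode_const_string is hypothetical.

```python
import struct

CONST_STRING = 0x1A        # const-string vAA, string@BBBB (format 21c)
CONST_STRING_JUMBO = 0x1B  # const-string/jumbo vAA, string@BBBBBBBB (format 31c)


def encode_const_string(reg, string_idx):
    """Return the instruction bytes, using the jumbo form only when needed.

    Hypothetical helper for illustration only.
    """
    if not 0 <= reg <= 0xFF:
        raise ValueError("register must fit in 8 bits")
    if not 0 <= string_idx <= 0xFFFFFFFF:
        raise ValueError("string index must fit in 32 bits")
    if string_idx <= 0xFFFF:
        # Two code units: opcode byte, register byte, 16-bit index.
        return struct.pack("<BBH", CONST_STRING, reg, string_idx)
    # Three code units: opcode byte, register byte, 32-bit index (low half first).
    return struct.pack("<BBI", CONST_STRING_JUMBO, reg, string_idx)
```

The jumbo form is one code unit (two bytes) longer, which is why a tool would normally prefer the short form whenever the index allows it.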

> 2.  All references to data within the bytecode (jumps, packed-switch
> and family) are relative offsets to data within the same method's
> bytecode, correct?

Correct.

> Really what I'm wondering is if a method's
> bytecode is self contained and if changing it could have an effect on
> anything outside of the method.

The only thing that would matter is the overall size, since that would
end up affecting the file offsets of everything in the file after the
code in question.
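
For what it's worth, the relative addressing can be sketched as follows. This assumes the documented encodings of goto (opcode 0x28, format 10t, signed 8-bit offset) and goto/16 (opcode 0x29, format 20t, signed 16-bit offset), with offsets counted in 16-bit code units from the address of the branch instruction itself. The function goto_target is a made-up name for illustration.

```python
import struct

GOTO = 0x28     # goto +AA      (format 10t, signed 8-bit offset)
GOTO_16 = 0x29  # goto/16 +AAAA (format 20t, signed 16-bit offset)


def goto_target(insns, addr):
    """Return the branch target of the goto at `addr`.

    `insns` holds one method's instruction bytes; `addr` and the result
    are both measured in 16-bit code units from the start of that method.
    Hypothetical helper for illustration only.
    """
    # Low byte of the first code unit is the opcode, high byte is AA.
    opcode, short_offset = struct.unpack_from("<Bb", insns, addr * 2)
    if opcode == GOTO:
        return addr + short_offset
    if opcode == GOTO_16:
        (offset,) = struct.unpack_from("<h", insns, addr * 2 + 2)
        return addr + offset
    raise ValueError("not a goto or goto/16 instruction")
```

Nothing in the computation depends on where the method sits in the file, only on positions within the method's own instruction array, which matches the answer above.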

> 3.  In the dex file format static fields can be initialized with a
> list of encoded-values.  None of the encoded values seem to allow for
> the static field to be initialized to an instance of an object,
> however this is possible in the java language.  How is this kind
> initialization done?

It's done by a <clinit> method.

Cheers,

-dan

msg555

Jul 8, 2010, 12:24:45 PM
to android-platform
Thanks a lot for the help.

On Jul 7, 4:11 pm, Dan Bornstein <danf...@android.com> wrote:
> On Sun, Jul 4, 2010 at 12:58 PM, msg555 <msg...@gmail.com> wrote:
> > 1.  It seems that all of the iget, iput, sget, sput, and invoke
> > methods take a 16 bit field/method identifier.  As far as I can tell
> > there doesn't seem to be any way to access fields/methods directly
> > with an index >= 2^16.  As far as I can tell the only way the compiler
> > could work around this if more fields/methods needed to be accessed is
> > to insert reflection code.  Is this accurate?
>
> That's correct. At some point, there will be an extension to the
> bytecode format to allow for wider field and method references. When
> we were developing the dex format originally, it looked like we were
> far enough away from that being necessary that we decided not to
> bother in the short term. It looks like the time is finally drawing
> close: I've been told that the Scala core library has to be split in
> two exactly because of this issue.

Interesting. I wonder how this kind of access will be facilitated, as
there doesn't seem to be room for many more opcodes. From a quick
glance over the docs it looks like there are 32 opcodes left, though
perhaps pseudo-opcodes like those used in the packed-switch table
could be used. Perhaps there could be some mechanism for temporarily
making these larger-indexed things accessible at lower indexes.
That might create some verification problems, I suppose.

>
> With extended versions of these opcodes, the remaining 16-bit limits
> will be on (a) class references and (b) method prototypes. That is,
> you will still be limited to no more than 65536 classes in a dex file,
> and though you will be able to have 2^32 methods, you will only be
> able to have 65536 different prototypes (list of arguments and return
> type). You might be concerned about the latter, but the last time I
> checked it seemed that even method-heavy code shared enough prototypes
> that this won't become an issue for a while, still.

Seems reasonable. I wonder whether libraries that actually exceeded
these limits would be too large to be put on Android phones at all.
>
> > On the other hand it seems that strings with large index (< 2^32) can
> > be accessed.
>
> Yep. The writing was already on the wall about that in the 1.0
> timeframe. Hence, const-string/jumbo.
>
> > 2.  All references to data within the bytecode (jumps, packed-switch
> > and family) are relative offsets to data within the same method's
> > bytecode, correct?
>
> Correct.
>
> > Really what I'm wondering is if a method's
> > bytecode is self contained and if changing it could have an effect on
> > anything outside of the method.
>
> The only thing that would matter is the overall size, since that would
> end up affecting the file offsets of everything in the file after the
> code in question.

OK, that's good. I suspect that at least most of the time I can keep
the bytecode the same size, just replacing some indexes into the
tables. The one annoying case might be when I need to change a 16-bit
string index access to a 32-bit one. Even then, though, I think only
the code offset in encoded_method would need to change, as well as
the try_items associated with that method.
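
The same-size case could look roughly like the sketch below, which overwrites the 16-bit string index of a const-string (opcode 0x1a, format 21c) in place. The function name patch_const_string is hypothetical, and the code assumes the caller already knows the instruction's address in code units.

```python
import struct

CONST_STRING = 0x1A  # const-string vAA, string@BBBB (format 21c)


def patch_const_string(insns, addr, new_idx):
    """Overwrite the string index of the const-string at `addr` in place.

    `insns` is a bytearray of one method's instructions and `addr` is in
    16-bit code units. Hypothetical helper for illustration only.
    """
    if insns[addr * 2] != CONST_STRING:
        raise ValueError("no const-string instruction at this address")
    if not 0 <= new_idx <= 0xFFFF:
        # Would need const-string/jumbo, which is one code unit longer.
        raise ValueError("index needs the jumbo form; cannot patch in place")
    # The index is the second code unit of the instruction.
    struct.pack_into("<H", insns, addr * 2 + 2, new_idx)
```

When the new index does not fit, the instruction has to grow by one code unit, and at that point anything counted in code units across it, such as branch offsets and try ranges within the method, would need adjusting too.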
>
> > 3.  In the dex file format static fields can be initialized with a
> > list of encoded-values.  None of the encoded values seem to allow for
> > the static field to be initialized to an instance of an object,
> > however this is possible in the java language.  How is this kind
> > initialization done?
>
> It's done by a <clinit> method.

Thanks
>
> Cheers,
>
> -dan

I still have one small question about how native code is handled. In
the encoded_method format description it says that code_off will be 0
if the method is abstract or native. If the method is native, how does
the VM know what native code to call? The native code in question
would be stored in a dynamically linked library, right? Are this
library and the method name somehow recorded in the dex file? I didn't
see anything on this.

Dan Bornstein

Jul 8, 2010, 7:00:13 PM
to android-...@googlegroups.com
On Thu, Jul 8, 2010 at 9:24 AM, msg555 <msg...@gmail.com> wrote:
> On Jul 7, 4:11 pm, Dan Bornstein <danf...@android.com> wrote:
>> That's correct. At some point, there will be an extension to the
>> bytecode format to allow for wider field and method references. [...]

>
> Interesting.  I wonder how this kind of access will be facilitated as
> there doesn't seem to be room for many more op codes.

The nominal plan is to introduce the concept of "wide opcodes," where,
e.g. an 0xff in the opcode position of the first code unit of an
instruction would mean that the other 8 bits are taken to be part of
the opcode and not used as other arguments. Basically, if we will have
to burn 16 more bits for a member reference, we can afford to burn 8
more bits for an opcode too, but we only impose this penalty on code
that's already going to be huge enough that this won't make a
meaningful difference.
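
Purely as an illustration of that nominal plan (not of any shipped format), a decoder could split the first code unit along these lines. Representing the extended opcode as the full 16-bit unit is an assumption made here for simplicity, and split_first_unit is a made-up name.

```python
WIDE_MARKER = 0xFF  # value in the opcode position that signals a wide opcode


def split_first_unit(unit):
    """Split the first 16-bit code unit into (opcode, argument byte).

    Sketch of the "wide opcode" idea only: if the low byte is 0xff, the
    high byte is treated as part of the opcode and there is no argument
    byte. Returning the whole unit as the opcode is an assumption.
    """
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("a code unit is 16 bits")
    low = unit & 0xFF
    if low == WIDE_MARKER:
        return unit, None   # all 16 bits belong to the opcode
    return low, unit >> 8   # ordinary case: 8-bit opcode, 8-bit argument
```

For an ordinary unit such as 0x2152 this gives opcode 0x52 with argument byte 0x21, whereas 0x01ff would be read as a single 16-bit opcode with no argument byte left over.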

> I wonder if you started to get libraries that
> actually exceeded these criteria if they would be too large to be put
> on android phones at all.

Moore's Law does seem to apply to all aspects of portable devices
except battery capacity.

> I still have one small question about how native code is handled.  In
> the encoded_method format description it says that code_off will be 0
> if the method is abstract or native.  If the method is native how does
> the vm know what native code to call?  The native code in question
> would be stored in a dynamic linked library, right?  Is this library
> and the method name somehow in the dex file?  I didn't see anything on
> this.

Native libraries get hooked in by calling System.loadLibrary(). Look
for the jni-tips.html document in the Dalvik docs directory for more
details.

-dan
