Thanks, Johan
Rene
>Is there a solution for decompiling a dcu-file (so I can e.g. learn
>something more about how to create components myself)
No, there is no real solution, since the DCU file format is not
documented. A dirty work-around is to compile the DCU into an EXE with
TD32 symbol information turned on, then throw all that at a
disassembler and try to reverse the code.
A bit too much for just learning how to create components, I suppose.
--
Stefan Hoffmeister http://www.econos.de/
No private email, please, unless expressly invited.
Rene Tschaggelar wrote:
> No, but there are dozens of examples.
Ok, but I spoke at school with our System Programming teacher (as we had to
decompile modula-2 files) and he said that almost every
programming language has decompilers, so he thought the dcu-files
could also be reverse engineered. What if I lose some pas files when my
computer crashes? If I could decompile the dcu's that would save a lot of
time! So I think it can be quite annoying to be unable to decompile those
files!
Greetings Johan Korten
>Rene Tschaggelar wrote:
>
>> No, but there are dozens of examples.
>
>Ok, but I spoke at school with our System Programming teacher (as we had to
>decompile modula-2 files) and he said that almost every
>programming language has decompilers, so he thought the dcu-files
>could also be reverse engineered. What if I lose some pas files when my
>computer crashes? If I could decompile the dcu's that would save a lot of
>time! So I think it can be quite annoying to be unable to decompile those
>files!
There are dissertations on decompiling **binary** code. These
dissertations pretty much show how difficult it is to get back at
least some gibberish from a decompiler.
The one dissertation I am thinking of is by Cristina Cifuentes and can
be found at the Decompilation Home page as a downloadable PS including
a DOS based C decompiler. Now, back in the old DOS days C compilers
were not that ugly, nowadays they are, with optimization interfering
frequently.
Result: it is pretty much infeasible to create a fully-fledged
decompiler for any of the "better" languages that compile to machine
code. Visual Basic 3.0 (and perhaps 4.0?) *could* be decompiled
perfectly. Delphi CANNOT, even if you have extremely accurate
information due to TD32 information.
Now, the Delphi code generator, as I have experienced it, uses some
sort of a code template approach, so it might be possible to
automatically recover quite a number of bits and pieces. But that IMO
falls into the "how to waste your spare time on a fun project"
category.
Please pass this along to your teacher: there is *no* such thing as a
*decompiler*.
The process of compiling takes source code and converts it to a binary
executable. In a truly compiled language identifiers like variable
and procedure names are lost forever.
It is possible to *disassemble* a binary executable into assembly
language, but that's not the same thing as the original source code.
The other post mentioned work done to supposedly decompile C programs.
The idea is that knowing what kind of code a particular compiler
produces it would be possible to reconstruct the original source code
*structure*. For instance, it might be able to look at binary code
and determine whether a particular piece of it was originally a for
loop or a while loop. However, you would still lose identifiers. I
am not aware of any such work having been done for Delphi.
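
To illustrate the for/while point (an editorial sketch, not code from the thread; the names SumFor and SumWhile are made up): the two Delphi functions below are written differently but compute the same result. A compiler may emit very similar loop code for both, so a decompiler working from the binary can at best guess which form the author wrote.

```pascal
program LoopShapes;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}  // lets the sketch also build with Free Pascal
{$APPTYPE CONSOLE}

// Sums 1..N with a for loop.
function SumFor(N: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to N do
    Result := Result + I;
end;

// Sums 1..N with a while loop; same behaviour, different source form.
function SumWhile(N: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= N do
  begin
    Result := Result + I;
    Inc(I);
  end;
end;

begin
  WriteLn(SumFor(10));    // prints 55
  WriteLn(SumWhile(10));  // prints 55
end.
```

Whether the generated machine code for the two functions is actually identical depends on the compiler version and optimization settings; the sketch only shows why the source form is not uniquely recoverable.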
-Mike
Here is one for "C":
http://www.cs.uq.edu.au/groups/csm/dcc.html
--
Joe C. Hecht
http://home1.gte.net/joehecht/index.htm
Taz Higgins...
>At one time, there was a dcu uncompiler that produced
>some interesting results.
Yep. I have that one and it works up to TP 7, I believe (possibly it
could be extended to cover D1, too). I have not yet come across a
complete specification of the 32 bit DCU file format, although I know
of a few people who have been poking around this a bit.
>Here is one for "C":
>
>http://www.cs.uq.edu.au/groups/csm/dcc.html
That should be Cristina's page... Looks similar - thanks for digging
that out!
>I
>am not aware of any such work having been done for Delphi.
I am playing around with it, just for fun, and in my spare time (of
which there is increasingly less left over).
I don't really expect any results to pop up (let alone a dissertation), but
it's always a very interesting challenge to figure out how the
compiler works. Delphi output can be analyzed pretty well, though.
Let me know when you've got something. In the mean time I'll continue
to have good backups. <g>
-Mike
> >Ok, but I spoke at school with our System Programming teacher (as we had to
> >decompile modula-2 files) and he said that almost every
> >programming language has decompilers, so he thought the dcu-files
> >could also be reverse engineered.
>
> Please pass this along to your teacher: there is *no* such thing as a
> *decompiler*.
>
> The process of compiling takes source code and converts it to a binary
> executable. In a truly compiled language identifiers like variable
> and procedure names are lost forever.
But a DCU is not an executable. All identifier names and type definitions
are contained inside the DCU. For example, I could write something like
dcu2int, a 99% extractor of the interface part of a unit (all before
'interface'), if there were at least 50 people who wanted to buy such a
program. I already wrote such a program over a year ago, but lost all the
sources on a crashed server.
99%, because comments and formatting information really are lost :)
A decompiler for DCU is also feasible, but very, very hard work.
Bye.
ps. Sorry for my English.
> But a DCU is not an executable. All identifier names and type definitions
> are contained inside the DCU. For example, I could write something like
> dcu2int, a 99% extractor of the interface part of a unit (all before
> 'interface'), if there were at least 50 people who wanted to buy such a
> program.
How many Dollars ?
Martin
It was a theoretical question :)
I can't think what such a program might be needed for.
... For example, $35?
Bye.
Add "interface" after "all".
PhR
Does this mean "private" interface variables are stored in the DCU?
If so that refutes a previous thread (remember the "promote") whereby
someone said that they weren't.
Thanks,
--------------------------------------------
Brad, Rose & Tia
ALL names if $L+,$D+ (even the names of local variables in local functions);
all interface names (except private sections in classes) and
_some_ local names if $L-,$D-.
And even with $L-, a name like "LocalFunc4" can be generated
instead of "MyLocalFunc". :)
A DCU can't be decompiled to a PAS 100% equal to the source, but it can be
decompiled to a PAS that compiles to the same DCU.
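
For readers unfamiliar with the switches discussed here: $D controls debug information and $L controls local symbol information, and both can be set in a unit's source. A minimal sketch follows; the unit name DemoUnit and the functions AddOne and Bump are hypothetical and exist only for illustration.

```pascal
unit DemoUnit;  // hypothetical unit name

{$D+,L+}  // debug information and local symbols on; per the post above,
          // this is the case where the most names survive in the DCU

interface

function AddOne(Value: Integer): Integer;

implementation

function AddOne(Value: Integer): Integer;

  // A local (nested) function: the kind of name the post says may be
  // replaced by a generated one when the unit is compiled with $L-.
  function Bump(X: Integer): Integer;
  begin
    Result := X + 1;
  end;

begin
  Result := Bump(Value);
end;

end.
```

Changing the directive line to `{$D-,L-}` gives the other case described in the post, where only interface names and some local names are said to remain.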
You mean class attributes, not vars. No, I don't think they are. Didn't
think of those. Thanks for the reminder.
PhR
Well, yes, if the DCU was compiled in debugging mode, and you nevertheless
don't have the source.
>>
A DCU can't be decompiled to a PAS 100% equal to the source, but it can be
decompiled to a PAS that compiles to the same DCU.
<<
That's a good technical statement, true or false. With $D-, it's definitely
false. I wonder what Stefan would say in the case of $D+.
PhR
Oops.. Yea that's what I mean :)
As mentioned earlier sans comments & formatting, so not 100%.
Although I would go so far as to say anything more than 70% would be
great for those units that don't come with source (or source is
unattainable). What would not be of any help would be obtuse "C"ish
code.
>Dmitryi: >>ALL names if $L+,$D+ (even names of local variables in local
>functions)
>and all (except private sections in classes) interface names and
>_some_ local names if $L-,$D-.
><<
>
>Well, yes, if the DCU was compiled in debugging mode, and you nevertheless
>don't have the source.
>
>>>
>A DCU can't be decompiled to a PAS 100% equal to the source, but it can be
>decompiled to a PAS that compiles to the same DCU.
><<
>
>That's a good technical statement, true or false. With $D-, it's definitely
>false. I wonder what Stefan would say in the case of $D+.
I cannot comment on DCUs directly, but I can comment on the case where
a DCU has been compiled with D+ *and* that DCU has been linked into
a binary with Turbo Debugger information generated. (Technically,
these are not too different.)
Under these assumptions you see an incredible amount of information in
the Turbo Debugger info, including local stack vars, scoping and so
on. My approach is to "attack" the compiled binary rather than the
unknown beast DCU. Attacking a DCU built with the maximum amount of
information, including D+, my wild guess is that about 70-80%, and in
extreme cases up to 100%, of the source code can be recovered.
That this amount of code is recoverable is mainly due to the Delphi
optimizer not garbling the output too much - pretty much in contrast
to Borland C++ Builder (!), for instance, with each and every
optimization switch turned on.
In short, yes, I think that with D+ on average it should be possible
to get back about 70-80% of code, most of the code structure (but
perhaps not matching the original structure) and most of the data
structures, although (obviously) without comments. The problems
really, really start with D- and O+. In that case I am rather
pessimistic, as it is extremely difficult to recover data structures
in a meaningful manner. Chances of recovering something meaningful
should be at, say, 30-40%, in particular for non-trivial cases.
I cannot currently prove my point, since the tool I am using for that
is under reconstruction - the disassembler needs to be rewritten in a
much more modular manner, the TD32 reader expanded and so on. If I
find some time...
> >>
> A DCU can't be decompiled to a PAS 100% equal to the source, but it can be
> decompiled to a PAS that compiles to the same DCU.
> <<
>
> That's a good technical statement, true or false. With $D-, it's
> definitely false. I wonder what Stefan would say in the case of $D+.
Why?! See example.
Original local function:

  function MyShowMessage( Caption, Title: String ): Boolean;
  begin
    Result := MessageBox( 0, PChar(Caption), PChar(Title), MB_OKCANCEL ) = 1;
  end;

If it was compiled with $L+, it will be decompiled as is. But with $L-, the
decompiler will produce something like this:

  function LocalFunction15( Arg1, Arg2: String ): Boolean;
  begin
    Result := MessageBox( 0, PChar(Arg1), PChar(Arg2), 1 ) = 1;
  end;

and after compiling it will give the same code in the DCU.
Bye.
Whetting our appetites again!
PhR
The problem is that you're generalizing from that example. The challenge
you've set is to decompile (not disassemble) in such a way that when
recompiled the code works the same. This breaks if ONE thing in there
doesn't yield complete decompile info (and I don't care about the names,
that's another question).
PhR
Ha, but it's the Gobi desert here, currently!
I am currently running my own hacked, timed-out (that really was my
latest build) compiled copy and the new version just does not even
compile yet... And I have to study Cristina Cifuentes' decompilation
thesis in more detail to get some ideas for the structural and data
analysis from it.
In short, this might take some time - of course, if someone can offer
me a 48 hour extension of the day, matters would improve vastly.
Chances are quite low though, and I am not yet in the right frame of
mind to make that OSS.