How To Decompile Software

1 view

Skip to first unread message

Kizzy Burnworth

unread,

Aug 5, 2024, 11:31:07 AM8/5/24

to thewindsori

SoI've just realized how easy it is to decompile my Java code. I've been searching around the net and I can't seem to figure out WHY its so easy. Every time I google something like "Why can I decomilple .class files?" or "Why does Java decompile so easily", all I get is links to software that can easily deompile my code. So I turn to you StackOverflow: why is it that Java can be converted back to easlily readable source code while C++ and other languages aren't very friendly to decompiling?

In particular, .class files include metadata for classnames, method names, field & parameter types, etc...

All a Java (or .Net) decompiler needs to do is look at the instructions in each method body, and turn them into the appropriate syntactic constructs.

Java is compiled into an intermediate form, JVM bytecode, that retains a large amount of the information contained in the original Java code. A language like C++ compiles into assembly code, with looks a lot different from the original code, and is, therefore, harder to reverse.

A friend of mine downloaded some malware from Facebook, and I'm curious to see what it does without infecting myself. I know that you can't really decompile an .exe, but can I at least view it in Assembly or attach a debugger?

With a debugger you can step through the program assembly interactively.

With a disassembler, you can view the program assembly in more detail.

With a decompiler, you can turn a program back into partial source code, assuming you know what it was written in (which you can find out with free tools such as PEiD - if the program is packed, you'll have to unpack it first OR Detect-it-Easy if you can't find PEiD anywhere. DIE has a strong developer community on github currently).

Additionally, if you are doing malware analysis (or use SICE), I wholeheartedly suggest running everything inside a virtual machine, namely VMware Workstation. In the case of SICE, it will protect your actual system from BSODs, and in the case of malware, it will protect your actual system from the target program. You can read about malware analysis with VMware here.

psoul's excellent post answers to your question so I won't replicate his good work, but I feel it'd help to explain why this is at once a perfectly valid but also terribly silly question. After all, this is a place to learn, right?

Modern computer programs are produced through a series of conversions, starting with the input of a human-readable body of text instructions (called "source code") and ending with a computer-readable body of instructions (called alternatively "binary" or "machine code").

The way that a computer runs a set of machine code instructions is ultimately very simple. Each action a processor can take (e.g., read from memory, add two values) is represented by a numeric code. If I told you that the number 1 meant scream and the number 2 meant giggle, and then held up cards with either 1 or 2 on them expecting you to scream or giggle accordingly, I would be using what is essentially the same system a computer uses to operate.

Now, assembly language is a computer language where each command word in the language represents exactly one op-code on the processor. There is a direct 1:1 translation between an assembly language command and a processor op-code. This is why coding assembly for an x386 processor is different than coding assembly for an ARM processor.

Disassembly is simply this: a program reads through the binary (the machine code), replacing the op-codes with their equivalent assembly language commands, and outputs the result as a text file. It's important to understand this; if your computer can read the binary, then you can read the binary too, either manually with an op-code table in your hand (ick) or through a disassembler.

Disassemblers have some new tricks and all, but it's important to understand that a disassembler is ultimately a search and replace mechanism. Which is why any EULA which forbids it is ultimately blowing hot air. You can't at once permit the computer reading the program data and also forbid the computer reading the program data.

However, there are caveats to the disassembly approach. Variable names are non-existent; such a thing doesn't exist to your CPU. Library calls are confusing as hell and often require disassembling further binaries. And assembly is hard as hell to read in the best of conditions.

If you are just trying to figure out what a malware does, it might be much easier to run it under something like the free tool Process Monitor which will report whenever it tries to access the filesystem, registry, ports, etc...

You may get some information viewing it in assembly, but I think the easiest thing to do is fire up a virtual machine and see what it does. Make sure you have no open shares or anything like that that it can jump through though ;)

Immunity Debugger is a powerful tool to write exploits, analyze malware, and reverse engineer binary files. It was initially based on Ollydbg 1.0 source code, but with names resoution bug fixed. It has a well supported Python API for easy extensibility, so you can write your python scripts to help you out on the analysis.

It really depends on what kind of code you are writing. If you use a lot of the new features in C# 3 and above like lambda expressions, automatic properties, and yield, the decompiled source code is not runnable and requires quite a bit of work to get it to compile.

There are many obfuscators these days that protect your .NET applications from decompilation. One such obfuscator is -gate.com/products/smartassembly/index.htm . They try to make your well structured .NET IL code into spegatti code (which still works) that decompilers cannot generate original code. It's not like 100% sure piraters cannot get the recompilable code but it will not be easy for them to decompile when using obfuscator.

I don't think there's anything that fully automates this for an app made up of multiple assemblies, but I can say that it's really not that hard to stitch the pieces together into a solution yourself. Perhaps a bit tedious for a large app, but if you really want to it's certainly doable.

Yes, you simply Export from Reflector and get a complete runnable Project for your assembly.I've done it a couple of times.Usually, I have to migrate the project to my version of VS and some times it requires some minor fixes, but in general it works.

Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

The decompiler adds some noise mostly in the form of comments, however I've found it to be surprisingly clean and faithful to my original code. You will have to remove a little line of text beginning with +++ near the end of the recovered file to be able to run your code.

Ned Batchelder has posted a short script that will unmarshal a .pyc file and disassemble any code objects within, so you'll be able to see the Python bytecode. It looks like with newer versions of Python, you'll need to comment out the lines that set modtime and print it (but don't comment the line that sets moddate).

A decompiler is a computer program that translates an executable file to high-level source code. It does therefore the opposite of a typical compiler, which translates a high-level language to a low-level language. While disassemblers translate an executable into assembly language, decompilers go a step further and translate the code into a higher level language such as C or Java, requiring more sophisticated techniques. Decompilers are usually unable to perfectly reconstruct the original source code, thus will frequently produce obfuscated code. Nonetheless, they remain an important tool in the reverse engineering of computer software.

The term decompiler is most commonly applied to a program which translates executable programs (the output from a compiler) into source code in a (relatively) high level language which, when compiled, will produce an executable whose behavior is the same as the original executable program. By comparison, a disassembler translates an executable program into assembly language (and an assembler could be used for assembling it back into an executable program).

Decompilation is the act of using a decompiler, although the term can also refer to the output of a decompiler. It can be used for the recovery of lost source code, and is also useful in some cases for computer security, interoperability and error correction.[1] The success of decompilation depends on the amount of information present in the code being decompiled and the sophistication of the analysis performed on it. The bytecode formats used by many virtual machines (such as the Java Virtual Machine or the .NET Framework Common Language Runtime) often include extensive metadata and high-level features that make decompilation quite feasible. The application of debug data, i.e. debug-symbols, may enable to reproduce the original names of variables and structures and even the line numbers. Machine language without such metadata or debug data is much harder to decompile.[2]

Some compilers and post-compilation tools produce obfuscated code (that is, they attempt to produce output that is very difficult to decompile, or that decompiles to confusing output). This is done to make it more difficult to reverse engineer the executable.

The success level achieved by decompilers can be impacted by various factors. These include the abstraction level of the source language, if the object code contains explicit class structure information, it aids the decompilation process. Descriptive information, especially with naming details, also accelerates the compiler's work. Moreover, less optimized code is quicker to decompile since optimization can cause greater deviation from the original code.[5]