I've been contemplating how to protect my C/C++ code from disassembly and reverse engineering. Normally I would never condone this behavior myself in my code; however the current protocol I've been working on must not ever be inspected or understandable, for the security of various people.
If you give people a program that they are able to run, then they will also be able to reverse-engineer it given enough time. That is the nature of programs. As soon as the binary is available to someone who wants to decipher it, you cannot prevent eventual reverse-engineering. After all, the computer has to be able to decipher it in order to run it, and a human is simply a slower computer.
That said, the best anti-reverse-engineering techniques that I've seen focused not on obfuscating the code, but instead on breaking the tools that people usually use to understand how code works. Finding creative ways to break disassemblers, debuggers, etc is both likely to be more effective and also more intellectually satisfying than just generating reams of horrible spaghetti code. This does nothing to block a determined attacker, but it does increase the likelihood that J Random Cracker will wander off and work on something easier instead.
I've used their hardware protection method (Sentinel HASP HL) for many years. It requires a proprietary USB key fob which acts as the 'license' for the software. Their SDK encrypts and obfuscates your executable & libraries, and allows you to tie different features in your application to features burned into the key. Without a USB key provided and activated by the licensor, the software can not decrypt and hence will not run. The Key even uses a customized USB communication protocol (outside my realm of knowledge, I'm not a device driver guy) to make it difficult to build a virtual key, or tamper with the communication between the runtime wrapper and key. Their SDK is not very developer friendly, and is quite painful to integrate adding protection with an automated build process (but possible).
Before we implemented the HASP HL protection, there were 7 known pirates who had stripped the dotfuscator 'protections' from the product. We added the HASP protection at the same time as a major update to the software, which performs some heavy calculation on video in real time. As best I can tell from profiling and benchmarking, the HASP HL protection only slowed the intensive calculations by about 3%. Since that software was released about 5 years ago, not one new pirate of the product has been found. The software which it protects is in high demand in it's market segment, and the client is aware of several competitors actively trying to reverse engineer (without success so far). We know they have tried to solicit help from a few groups in Russia which advertise a service to break software protection, as numerous posts on various newsgroups and forums have included the newer versions of the protected product.
Recently we tried their software license solution (HASP SL) on a smaller project, which was straightforward enough to get working if you're already familiar with the HL product. It appears to work; there have been no reported piracy incidents, but this product is a lot lower in demand..
Effective obfuscation isn't something you can do purely at the compilation stage. Whatever the compiler can do, a decompiler can do. Sure, you can increase the burden on the decompilers, but it's not going to go far. Effective obfuscation techniques, inasmuch as they exist, involve writing obfuscated source from day 1. Make your code self-modifying. Litter your code with computed jumps, derived from a large number of inputs. For example, instead of a simple call
If you're serious about obfuscation, add several months to your planning; obfuscation doesn't come cheap. And do consider that by far the best way to avoid people reverse-engineering your code is to make it useless so that they don't bother. It's a simple economic consideration: they will reverse-engineer if the value to them is greater than the cost; but raising their cost also raises your cost a lot, so try lowering the value to them.
Take, for example, the AES algorithm. It's a very, very public algorithm, and it is VERY secure. Why? Two reasons: It's been reviewed by lots of smart people, and the "secret" part is not the algorithm itself - the secret part is the key which is one of the inputs to the algorithm. It's a much better approach to design your protocol with a generated "secret" that is outside your code, rather than to make the code itself secret. The code can always be interpreted no matter what you do, and (ideally) the generated secret can only be jeopardized by a massive brute force approach or through theft.
I think an interesting question is "Why do you want to obfuscate your code?" You want to make it hard for attackers to crack your algorithms? To make it harder for them to find exploitable bugs in your code? You wouldn't need to obfuscate code if the code were uncrackable in the first place. The root of the problem is crackable software. Fix the root of your problem, don't just obfuscate it.
Also, the more confusing you make your code, the harder it will be for YOU to find security bugs. Yes, it will be hard for hackers, but you need to find bugs too. Code should be easy to maintain years from now, and even well-written clear code can be difficult to maintain. Don't make it worse.
The disassembler has to resolve the problem that a branch destination is the second byte in a multi byte instruction. An instruction set simulator will have no problem though. Branching to computed addresses, which you can cause from C, also make the disassembly difficult to impossible. Instruction set simulator will have no problem with it. Using a simulator to sort out branch destinations for you can aid the disassembly process. Compiled code is relatively clean and easy for a disassembler. So I think some assembly is required.
I think it was near the beginning of Michael Abrash's Zen of Assembly Language where he showed a simple anti disassembler and anti-debugger trick. The 8088/6 had a prefetch queue what you did was have an instruction that modified the next instruction or a couple ahead. If single stepping then you executed the modified instruction, if your instruction set simulator did not simulate the hardware completely, you executed the modified instruction. On real hardware running normally the real instruction would already be in the queue and the modified memory location wouldnt cause any damage so long as you didnt execute that string of instructions again. You could probably still use a trick like this today as pipelined processors fetch the next instruction. Or if you know that the hardware has a separate instruction and data cache you can modify a number of bytes ahead if you align this code in the cache line properly, the modified byte will not be written through the instruction cache but the data cache, and an instruction set simulator that did not have proper cache simulators would fail to execute properly. I think software only solutions are not going to get you very far.
The above are old and well known, I dont know enough about the current tools to know if they already work around such things. The self modifying code can/will trip up the debugger, but the human can/will narrow in on the problem and then see the self modifying code and work around it.
It used to be that the hackers would take about 18 months to work something out, dvds for example. Now they are averaging around 2 days to 2 weeks (if motivated) (blue ray, iphones, etc). That means to me if I spend more than a few days on security, I am likely wasting my time. The only real security you will get is through hardware (for example your instructions are encrypted and only the processor core well inside the chip decrypts just before execution, in a way that it cannot expose the decrypted instructions). That might buy you months instead of days.
Also, read Kevin Mitnick's book The Art of Deception. A person like that could pick up a phone and have you or a coworker hand out the secrets to the system thinking it is a manager or another coworker or hardware engineer in another part of the company. And your security is blown. Security is not all about managing the technology, gotta manage the humans too.
Many a times, fear of your product getting reverse engineered is misplaced. Yes, it can get reverse engineered; but will it become so famous over a short period of time, that hackers will find it worth to reverse engg. it ? (this job is not a small time activity, for substantial lines of code).
IMHO, take the basic precautions you are going to take and release it. If it becomes a point of reverse engineering that means you have done a really good job, you yourself will find better ways to overcome it. Good luck.
It should be entirely possible, using modern cryptographic techniques, to have your system be open (I'm not saying it should be open, just that it could be), and still have total security, so long as the cryptographic algorithm doesn't have a hole in it (not likely if you choose a good one), your private keys/passwords remain private, and you don't have security holes in your code (this is what you should be worrying about).
Since July 2013, there is renewed interest in cryptographically robust obfuscation (in the form of Indistinguishability Obfuscation) which seems to have spurred from original research from Amit Sahai.
I say this very casually, but to everyone who's used to instinctively dismiss obfuscation technology -- this is different. If it's proven to be truly working and made practical, this is major indeed, and not just for obfuscation.
To inform yourself, read the academic literature on code obfuscation. Christian Collberg of the University of Arizona is a reputable scholar in this field; Salil Vadhan of Harvard University has also done some good work.
c80f0f1006