Whatwe're looking at here is how to convert the bytecode into symbolic operations, and then evaluate these to figure out what the code is doing. The bulk of this is something that you'll need to do yourself by trial and error, but I'm hoping these notes will help anyone who gets stuck. Let's get started.
I've seen other people trying to "lift" vmprotect bytecode to a higher level intermediate language, but I've found this has limited my options when it comes to decompilation. Rolf Rolles recommends converting each bytecode instruction to a set of asm-like instructions (fairly similar to what the VM handlers do), and this has made the later steps a lot easier. It makes the code a little bulky, but we can always fix this later with copy and constant propagation.
The big advantage I've found with this approach is that it's kept me focused on what each individual instruction is doing: What does a push mean here? What does a mov mean here? There are a couple of edge cases but focusing on individual commands has helped me come up with some consistent rules that make life a lot easier when decompiling/emulating.
These don't exactly match how the VM handlers have been done, but what matters is that they return the same results. Care needs to be taken with a couple of operations. Even the bitwise NOR which is done back-to-front (NOT on both operands and then an AND) doesn't really matter, but in the VM handler having the AND at the end gives you the flags for free (the NOT opcode doesn't set any flags).
For the most part you can manage the stack with objects rather than trying to be clever and convert into bytes (like a complex symbolic execution engine like Triton would do). There are a few cases to consider though:
Rolles mentions this as a useful optimization step and I agree. There's an edge case with stack pointers (we'll cover that in a second) but otherwise the stack is just a pain. Something gets pushed onto the stack? Call it a variable. You pop 2 things off, add them, then put them back on? That's just c = a + b. Get this part right and the rest is very easy.
You can choose how you do this, I create a new symbol with every push (unless it's already a symbol, e.g. coming from a scratch register), but ultimately having nested symbols isn't the end of the world. The big bonus with creating symbol objects is that you can store them in a table and use them to cache evaluated symbols to speed things up later.
If you were simply writing an emulator then you'd put real values in the top and run the whole way through and get your results at the bottom. We can do the same thing when symbolizing, we can put placeholders in at the top, run through and build up a symbol table, and then we're left with our outputs at the bottom. Once we have these though, it's trivial to go backwards and resolve all of the symbols that were used to get to the result. The nice thing with this approach is that you don't need to think too hard about dead store removal as dead-end symbols just don't get referenced from the bottom so they get ignored implicitly.
Because I disassemble into an x86-like language, I end up just handling push [eax] and push esp as Dereference and Pointer objects. These are fine to leave hanging around when symbolizing, but when emulating we need to resolve them differently.
As a rule, we should only see Pointer objects generated when we see push esp, so these are always going to be stack pointers. We can dereference both memory and stack pointers though, so I check for these at emulation time; if we have an immediate value then I pull this from the emulated memory of the app, otherwise we have a pointer and we just get them to cancel each other out.
VMProtect emulates branching by loading two function addresses onto the stack, executing the opcode that would set the flags (e.g. a CMP to compare registers is emulated as a SUB which in practice is ADD a, -b), converting the flags into an offset (either 0 or 4), and then advancing the stack pointer by that much, reading that value, then sometimes overwriting another stack pointer (?!).
The way I handle this is that when I process a dereference (pop [eax]) I go and wrap everything in the stack inside a new object called a ConditionalOverride which includes the old value, the potential new value, the pointer, and the stack depth of this object. When it comes time to evaluate the result, I evaluate the pointer, and test the stack size of the pointer against the stack depth of this object. If they're the same then I evaluate the overwritten value, otherwise I evaluate the original value. The code looks like this:
The other option is creating a new entity every time something hits the stack and overwriting it just-in-time, but that would mean going back through and updating everything in order (as well as global state, ew). This way keeps a bit more state but means every terminal symbol contains all the information it needs, provided any global memory has been updated correctly.
Every other command moves a symbol to somewhere, be it a register, a stack register, or the stack. Here we store some information somewhere but because we haven't resolved anything we don't actually know where it's going.
I store these in a simple list as pairs of (location, value). Depending on how complicated your code is, it's probably worth storing the height of the symbol table here in case this gets read in between writes. Then when it comes to emulation we just emulate these. If the location is a stack pointer then we discard it, otherwise we log a write at that point in time. I haven't built this part yet, but a copy-on-write setup would be really useful here.
3a8082e126