[Proposal] Bytecode decompiler

Martin Eden

Dec 4, 2025, 8:25:30 AM
to lu...@googlegroups.com
Hello list,

On 2025-12-03 15:15, Roberto Ierusalimschy wrote:
> All internal data structures in Lua are there to support its official
> API. You may want to propose some way to have this knowledge through
> the debug API.

Part 1

"Imagine you're a scrambler."

(
1b4c7561540019930d0a1a0a0408087856000000000000000000000028774001
80808000010a94510000004f000000cf80000013010007520000008101008001
82008081020180018301808103028001840280810403804e0107008001010000
020200c40102018001000000020200c4010201c6010101808101000082808992
01000796810000803401000081010080ca0008008b0200008e0205010c030004
c4020201b402000039020500380200808b0200008e02050103030100c4020201
b80100808b0200008e02050103830100c4020201c9800800c700010084048369
6f0486777269746504820a0482208100000080808080808095a5010005888800
00004f0100008000020000010100810100803402000044010301468101008080
818098a202000699390001003800008047010100090100008901000009020000
0c020401890200008c0205009001010510010004220100012e0001061c010200
3001000c890101000002000080020200c401030189010100150202802f018006
80020100c4010301c70101008103020000000000000082010000010100808080
80808080808080808080
)

Imagine your task is to add more structure to this data.
You can add () around data to create a part. You can move parts.
You can't delete parts.

Okay, somehow I know this is compiled Lua 5.4 bytecode.

Running "luac -l -l" on it gives me a lot of low-entropy text like:

([=[
function <?:24,34> (25 instructions at 0x55ed31b990c0)
2 params, 6 slots, 2 upvalues, 0 locals, 1 constant, 0 functions
  1 [-] EQ        0 1 0
  2 [-] JMP       1 ; to 4
  3 [-] RETURN0
  4 [-] GETUPVAL  2 0 ; -
  5 [-] GETUPVAL  3 0 ; -
  6 [-] GETUPVAL  4 0 ; -
  7 [-] GETTABLE  4 4 1
  8 [-] GETUPVAL  5 0 ; -
  9 [-] GETTABLE  5 5 0
  10  [-] SETTABLE  3 1 5
  11  [-] SETTABLE  2 0 4
  12  [-] ADD       2 0 1
  13  [-] MMBIN     0 1 6 ; __add
  14  [-] IDIVK     2 2 0 ; 2
  15  [-] MMBINK    2 0 12 0  ; __idiv 2
  16  [-] GETUPVAL  3 1 ; -
  17  [-] MOVE      4 0
  18  [-] MOVE      5 2
  19  [-] CALL      3 3 1 ; 2 in 0 out
  20  [-] GETUPVAL  3 1 ; -
  21  [-] ADDI      4 2 1
  22  [-] MMBINI    2 1 6 0 ; __add
  23  [-] MOVE      5 1
  24  [-] CALL      3 3 1 ; 2 in 0 out
  25  [-] RETURN0
constants (1) for 0x55ed31b990c0:
  0 I 2
locals (0) for 0x55ed31b990c0:
upvalues (2) for 0x55ed31b990c0:
  0 - 1 0
  1 - 1 1
]=])

Maybe good for humans, but not for me. Where is this part in the code?

For that code I want output with structure, without names.

For example, the first structure I expect is the signature:

(1b4c75615400)
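
A minimal sketch of splitting off that header, assuming Lua 5.3+ (for
string.unpack) and the stock 5.4 header layout from lundump.c: signature,
version, format, six check bytes, three size bytes, then a check integer
and a check float. The names hex_to_bytes and parse_header are made up
for this illustration, and lua_Integer and lua_Number are assumed to be
8 bytes each, as they are in the dump above. Under that layout the fixed
header runs past the six bytes guessed above.

```lua
-- Convert a hex string into the binary string it encodes.
local function hex_to_bytes(hex)
  return (hex:gsub('%x%x', function(h)
    return string.char(tonumber(h, 16))
  end))
end

-- Parse the fixed-size header of a Lua 5.4 dump (assumed layout).
local function parse_header(s)
  local sig, version, format, check, isize, intsize, numsize, pos =
    string.unpack('<c4 B B c6 B B B', s)
  assert(sig == '\x1bLua', 'not a Lua dump')
  assert(check == '\x19\x93\r\n\x1a\n', 'corrupted check bytes')
  -- Assumption: 8-byte integer and 8-byte float, little-endian.
  assert(intsize == 8 and numsize == 8, 'unsupported number sizes')
  local check_int, check_num
  check_int, check_num, pos = string.unpack('<i8 d', s, pos)
  assert(check_int == 0x5678, 'integer format mismatch')
  assert(check_num == 370.5, 'float format mismatch')
  return {
    version = version,          -- 0x54 for Lua 5.4
    format = format,            -- 0 for the official format
    instruction_size = isize,
    integer_size = intsize,
    number_size = numsize,
    next_pos = pos,             -- first byte after the header
  }
end

-- First 32 bytes of the dump posted above.
local head = hex_to_bytes(
  '1b4c7561540019930d0a1a0a0408087856000000000000000000000028774001')
local header = parse_header(head)
print(string.format('version %x, format %d, next byte at %d',
  header.version, header.format, header.next_pos))
```

The byte at next_pos is the upvalue count of the main function; the
function list starts right after it.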

Next there should be a list (or tree?) of functions.
Each function uses lists of "constants", "locals" and "upvalues".
And of course a list of "instructions".

I have no information on "locals", but it looks like "constants" and
"upvalues" are zero-indexed lists of two-tuples:

((I 2))()((1 0)(1 1))

The structure of an "instruction" depends on its "operation code".
Generally it's an opcode and an arglist.

(MMBINI (2 1 6 0))(MOVE (5 1))
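
To make "opcode and arglist" concrete, here is a hedged sketch that
decodes two instruction words taken from the dump above into that shape.
It assumes the Lua 5.4 field layout from lopcodes.h (7-bit opcode, 8-bit
A, 1-bit k, 8-bit B, 8-bit C), little-endian words, and the 5.4 opcode
numbers 0 (MOVE) and 47 (MMBINI). The names decode_fields, formats and
decode are hypothetical, and the table covers only these two opcodes.

```lua
-- Split a 32-bit instruction word into its raw iABC fields.
local function decode_fields(word)
  return {
    op = word & 0x7F,
    A = (word >> 7) & 0xFF,
    k = (word >> 15) & 0x1,
    B = (word >> 16) & 0xFF,
    C = (word >> 24) & 0xFF,
  }
end

-- Signed 8-bit arguments are stored with an offset of 127.
local OFFSET_S = 127

-- Per-opcode knowledge: name and how to build the arglist.
-- Partial on purpose; a real table would list every opcode.
local formats = {
  [0] = { 'MOVE', function(f) return { f.A, f.B } end },
  [47] = { 'MMBINI',
    function(f) return { f.A, f.B - OFFSET_S, f.C, f.k } end },
}

-- Decode one 4-byte little-endian instruction into {name, arglist}.
local function decode(bytes)
  local word = string.unpack('<I4', bytes)
  local f = decode_fields(word)
  local fmt = formats[f.op]
  if not fmt then
    -- Unknown opcode: fall back to the number and the raw fields.
    return { f.op, { f.A, f.k, f.B, f.C } }
  end
  return { fmt[1], fmt[2](f) }
end

-- Bytes 2f018006 and 80020100 from the dump above.
local mmbini = decode('\x2f\x01\x80\x06')
local move = decode('\x80\x02\x01\x00')
print(mmbini[1], table.concat(mmbini[2], ' '))
print(move[1], table.concat(move[2], ' '))
```

The per-opcode table is where the "depends on operation code" part
lives: the bit fields are the same for both instructions, only the
interpretation differs.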


Part 2

At present Lua has no stock function to give bytecode structure.

I would be happy to see something like

  debug.parse_bytecode(string) -> table

It takes a string with binary data and returns a table with structure
(and maybe even with names):

{
  -- functions =
  {
    -- ...
    [3] =
      {
        -- constants =
        {
          [0] = {'I', 2},
        },
        -- locals =
        {},
        -- upvalues =
        {
          [0] = {1, 0},
          [1] = {1, 1},
        },
        -- instructions =
        {
          -- ...
          {'MMBINI', {2, 1, 6, 0}},
          {'MOVE', {5, 1}},
          -- ...
        },
      },
  }
}

With this partial information and additional hardcoded knowledge,
me-scrambler can create a call graph, compile to bytecode and even
execute this code.
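
As a small illustration of consuming such a table, here is a sketch that
prints it in the parenthesised notation from Part 1. The function name
to_sexpr is hypothetical, and it assumes the positional, possibly
zero-indexed lists shown above. It puts a space between every pair of
elements, so the spacing differs slightly from the hand-written samples.

```lua
-- Serialize nested lists as "(a b (c d))". Lists may start at
-- index 0 (constants, upvalues) or at index 1 (everything else).
local function to_sexpr(value)
  if type(value) ~= 'table' then
    return tostring(value)
  end
  local first = (value[0] ~= nil) and 0 or 1
  local parts = {}
  for i = first, #value do
    parts[#parts + 1] = to_sexpr(value[i])
  end
  return '(' .. table.concat(parts, ' ') .. ')'
end

-- The function entry from the table above, without the elided parts.
local func = {
  { [0] = { 'I', 2 } },                   -- constants
  {},                                     -- locals
  { [0] = { 1, 0 }, [1] = { 1, 1 } },     -- upvalues
  { { 'MMBINI', { 2, 1, 6, 0 } }, { 'MOVE', { 5, 1 } } },  -- instructions
}
print(to_sexpr(func))
```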

-- Martin

Luiz Henrique de Figueiredo

Dec 4, 2025, 4:15:20 PM
to lu...@googlegroups.com
> For that code I want output with structure, without names.

It should be fairly easy to write variants of ldump.c and lundump.c to
handle other formats.
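
As a rough illustration of what such a variant has to handle, here is a
sketch in Lua of the variable-length size encoding that lundump.c in 5.4
uses for counts and line numbers: 7 bits per byte, most significant
group first, with the high bit set on the last byte. The function name
read_size is hypothetical; the bytes 98 and a2 are the "24,34" line
range of the function in the luac listing earlier in the thread.

```lua
-- Read one variable-length unsigned integer from s starting at pos.
-- Returns the value and the position of the next unread byte.
local function read_size(s, pos)
  local value = 0
  while true do
    local b = s:byte(pos)
    if not b then
      error('truncated size field')
    end
    pos = pos + 1
    value = (value << 7) | (b & 0x7F)
    if (b & 0x80) ~= 0 then  -- high bit marks the final byte
      return value, pos
    end
  end
end

local linedefined, pos = read_size('\x98\xa2', 1)
local lastlinedefined = read_size('\x98\xa2', pos)
print(linedefined, lastlinedefined)
```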

Martin Eden

Dec 6, 2025, 4:53:02 AM
to lu...@googlegroups.com
Hello Luiz,

Yeah, it can be done. I can do it. But in simple words "proposal" means
at least "I want you to think about it." (Implying "I want to get an
analysis response" and even "I want _you_ to do it" in some cases. But
my flavor of this proposal is just "think".)


What is inside the "standard" library mainly determines what kind
of projects are written in this language.

You wrote "string.unpack"? I wrote a ZIP file parser using it.

We have metatables? People use them for fancier code.

We have coroutines? People create whole frameworks on them.

On the other hand we have "table.reverse". We had hyperbolic cosine.

Once a thing is included it's almost impossible to remove it, because
code from a couple of other guys relies on it. And "breaking changes"
are bad for reputation. So you have to maintain it forever.

And also Lua's design refuses to "eat its own dog food". It refuses to
use tables. See "package.path" and "package.config". (But I think such
a design is for purer C functions.)


So that's a problem with my proposal, as it suggests returning a list
of lists.

And for sure the next thing somebody will ask for is a bytecode compiler
for that structure. And then some promising student will configure a
fuzzer for it and will try to issue dozens of CVEs and become a respected
security expert. But the "debug" library's "unsafe" clause will protect
us, as it protected us from the "setupvalue" fuzzers of other promising
students.


I believe several people have already written their own bytecode
compiler/decompiler. LuaJIT and Fengari at least. TinyLuaCompiler uses
v5.1 opcodes. luac.nl is probably scanning text output to extract them.

It's always project-specific, and so fragile and without guarantees.
And writing such a codec is not light work.

If it were already in the standard library, authors would probably
spend their energy on writing higher-level code.

And having a bytecode parser with machine-friendly output increases
portability, because execution can be delegated and thus managed.


-- Martin

Bas Groothedde

Dec 6, 2025, 5:14:34 AM
to lu...@googlegroups.com

> On 6 Dec 2025, at 10:53, 'Martin Eden' via lua-l <lu...@googlegroups.com> wrote:
>
> On 2025-12-04 23:14, Luiz Henrique de Figueiredo wrote:
>>> For that code I want output with structure, without names.
>> It should be fairly easy to write variants of ldump.c and lundump.c to
>> handle other formats.
>
> I believe several people have already written their own bytecode
> compiler/decompiler. LuaJIT and Fengari at least. TinyLuaCompiler uses
> v5.1 opcodes. luac.nl is probably scanning text output to extract them.

luac.nl works with a combination of text output and full bytecode parsing for all information other than the instructions. The bytecode parsing is done in pure Lua currently.

I’m reworking the back end and will be replacing what’s there with full bytecode parsing; I don’t like my current codebase. That version, in due time, will be fully open source. It will allow you to parse bytecode and serialise it to e.g. JSON. The parsing library will be a separate library, written in Rust because I use it in an unsafe context, but it could very well bind to Lua as a library too.

I will have a public beta for its front end, hopefully around February. I have implemented full parsers for Lua 5.3 through 5.5 so far.

~b