Hi There,I'm excited about the prospect of potentially using MLIR for symbolic execution on blockchain, but my understanding of MLIR is not yet good enough to fully evaluate this use case. The practical problem I'm facing is the following.Problem descriptionWe want to build a tool that takes a smart contract (i.e., a program) and can:1) interpret it2) convert it into a set of formulae for symbolic execution (the formulas would be expressions for an SMT solver like Z3). For example a program (a) -> { x = a; x += 1; return x } could be turned into these formulas { x1 = a; x2 = x1 + 1; return = x2; }This is how I would be interested in using MLIR for this problem:DiagramAST -> (a) Smart contract dialect -> (b) IR for ops that are easy to model with SMT solver -> LLVM dialect-> SMT formulas- Create an MLIR dialect (call it (a)) for the smart contract programming language/bytecode in SSA form- Create another dialect (call it (b)) that is closer to what SMT solvers can understand- Implement conversion (a) -> (b)- Get (1) by writing a (b) -> LLVM dialect converter- Get (2) by implementing a rewriter from (b) to the SMT solver outside of MLIR or even build the SMT solver as a dialect (c) using MLIR
Why use MLIR in this case?- Get debugging, analysis, verification tools and a type system for free / almost using MLIR as a general framework for building an IR- LLVM already implemented so easier to write an interpreter for (1)- By virtue of leveraging the common language (b), it's easier to achieve correctness for (1) and (2)- If any optimizations are needed, they are easy to implement as rewrites
Where I see challenges/questions- Can MLIR deal with a re-definition of "state" for a dialect? For example, the K framework for formal semantics supports the introduction of new state objects called "cells". Smart contract platforms like Ethereum have a specific architecture with their own memory, storage, that doesn't map directly to traditional computers. Is MLIR relying on low-level state model of machine RAM, etc. as implicit in LLVM, or does it support broader concepts of "state"?
- If not is there a way to "hack" this?- Another way of asking this is: could MLIR be a good way to describe the canonical semantics of a programming language?- Is MLIR powerful enough to express concepts like mappings and sets?
- What is the status/coverage of Python bindings right now?- What sorts of debugging, analysis, tools are already working/planned at this point/Appreciate your insight.
Best
Peteris
--
You received this message because you are subscribed to the Google Groups "MLIR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mlir+uns...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/mlir/a768c2b4-5b53-454f-bd14-5860c77fe4c5%40tensorflow.org.
syntax BinStackOp ::= "SSTORE" // ------------------------------ rule <k> SSTORE INDEX NEW => . ... </k> <id> ACCT </id> <account> <acctID> ACCT </acctID> <storage> ... (INDEX |-> (CURR => NEW)) ... </storage> <origStorage> ORIGSTORAGE </origStorage> ... </account> <refund> R => R +Int Rsstore(SCHED, NEW, CURR, #lookup(ORIGSTORAGE, INDEX)) </refund> <schedule> SCHED </schedule>
Hi Mehdi,
Thank you for your response.In terms of mappings, one thing to model in the Ethereum Virtual Machine is its storage. It's "hard drive" is a hash map from 256-bit accounts to hash maps of 256-bit keys and 256-bit values, i.e., map<uint256, map<uint256, uint256>>. There are some opcodes in the Ethereum Virtual Machine that modify this storage map.Here are the semantics (expressed in K framework) of an operation that modifies storage called SSTORE:syntax BinStackOp ::= "SSTORE" // ------------------------------ rule <k> SSTORE INDEX NEW => . ... </k> <id> ACCT </id> <account> <acctID> ACCT </acctID> <storage> ... (INDEX |-> (CURR => NEW)) ... </storage> <origStorage> ORIGSTORAGE </origStorage> ... </account> <refund> R => R +Int Rsstore(SCHED, NEW, CURR, #lookup(ORIGSTORAGE, INDEX)) </refund> <schedule> SCHED </schedule>The way K framework works is that there is a general, custom definition of global state and each operation is a rewrite of that state. As you see, there are several modifications to the global state that are specialized to this virtual machine: the stack is modified, the underlying storage is modified, and a gas refund is potentially allocated.I'd like to understand:- How does MLIR support expressing novel types of virtual machine state that are non-trivial to map to existing memory architecture. In particular, given MLIR is SSA by default, I wonder if the global state would therefore have to be implicit in every single function/block as a custom type/argument?func simple_f(ethereum_global_state, funct_arguments):ethereum_global_state2 = ...return ethereum_global_state2, func_return_values
- In this case, would it be fair to say that MLIR is like a "general framework for implementing IRs that will eventually be executed on a standard architecture".
So any representation of "semantics" in MLIR would not be direct/mathematically clean, it would represent how a given virtual machine/language could be translated for execution on standard computers. So it would be defining a language indirectly by detailing how it should eventually be lowered to machine level code.
- What types of data structures are encodable? EVM is a stack-based VM. Can we express a stack? Can we express a hash table like above?
--
You received this message because you are subscribed to the Google Groups "MLIR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mlir+uns...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/mlir/b0adcb9d-1224-46ac-958d-e01c26da0abb%40tensorflow.org.