Hello,
As you know, one of the major objectives of TP4 is to generalize the virtual machine so that it supports any data structure (not just graphs).
Here is an idea that Kuppitz and I batted around yesterday and that I spent this morning implementing on the tp4/ branch.
From the Stream Ring Theory paper [https://zenodo.org/record/2565243], we know that universal computation is possible with the stream-based functions branch, initial, map, flatmap, filter, and reduce. If this is the case, why not make those instructions the TP4 VM instruction set?
If
arg = constant | bytecode | method call,
then the general pattern for each instruction type is:
[branch, (arg, bytecode)*]
[initial, arg]
[map, arg]
[flatmap, arg]
[filter, ?predicate, arg]
[reduce, operator, arg]
Let this be called the “core instruction set.”
Now check this out:
g.inject(7L).choose(is(7L), incr()).sum()
[initial(7), branch([filter(eq,7)],[map(number::add,1)]), reduce(sum,0)]
g.inject(Map.of("name", "marko", "age", 29)).hasKey(regex("[a].*[e]")).has("name", "marko").value("age");
[initial({age=29, name=marko}), filter([flatmap(map::keys), filter(regex,[a].*[e])]), filter([map(map::get,name), filter(eq,marko)]), map(map::get,age)]
These core bytecode chunks currently execute on Pipes and Beam processors as expected.
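To make the two examples above concrete, here is a minimal Python sketch of an interpreter for the six core instruction types. It is not the tp4/ branch implementation. The tuple encoding, the run() function, and the METHODS/PREDICATES/OPERATORS registries are all hypothetical names chosen for illustration. It also assumes that branch routes an element into every branch whose condition bytecode yields a result (dropping it if none does), and that regex means a full match. The email does not pin down either semantic.

```python
import re
from functools import reduce as fold

# Hypothetical registries; the names mirror the references used in the examples.
METHODS = {
    "number::add": lambda x, n: x + n,
    "map::get": lambda m, k: m[k],
    "map::keys": lambda m: list(m.keys()),
}
PREDICATES = {
    "eq": lambda x, v: x == v,
    "regex": lambda x, p: re.fullmatch(p, x) is not None,  # assumption: full match
}
OPERATORS = {"sum": lambda a, b: a + b}


def function(args):
    """Resolve a map/flatmap argument: nested bytecode or a method call."""
    head, *rest = args
    if isinstance(head, list):  # nested bytecode, returns its list of results
        return lambda x: run(head, [x])
    return lambda x: METHODS[head](x, *rest)


def predicate(args):
    """Resolve a filter argument: nested bytecode or a named predicate."""
    head, *rest = args
    if isinstance(head, list):  # keep x if the nested bytecode yields anything
        return lambda x: len(run(head, [x])) > 0
    return lambda x: PREDICATES[head](x, *rest)


def run(bytecode, stream=()):
    """Push a stream through each core instruction in turn."""
    stream = list(stream)
    for op, *args in bytecode:
        if op == "initial":
            stream = [args[0]]
        elif op == "map":
            f = function(args)
            stream = [f(x) for x in stream]
        elif op == "flatmap":
            f = function(args)
            stream = [y for x in stream for y in f(x)]
        elif op == "filter":
            p = predicate(args)
            stream = [x for x in stream if p(x)]
        elif op == "reduce":
            operator, seed = args
            stream = [fold(OPERATORS[operator], stream, seed)]
        elif op == "branch":
            # Arguments alternate (condition bytecode, body bytecode).
            pairs = list(zip(args[0::2], args[1::2]))
            stream = [y for x in stream for cond, body in pairs
                      if run(cond, [x]) for y in run(body, [x])]
        else:
            raise ValueError("unknown core instruction: " + op)
    return stream


# g.inject(7L).choose(is(7L), incr()).sum()
choose_sum = [
    ("initial", 7),
    ("branch", [("filter", "eq", 7)], [("map", "number::add", 1)]),
    ("reduce", "sum", 0),
]
# g.inject(Map.of(...)).hasKey(regex(...)).has("name", "marko").value("age")
has_value = [
    ("initial", {"name": "marko", "age": 29}),
    ("filter", [("flatmap", "map::keys"), ("filter", "regex", "[a].*[e]")]),
    ("filter", [("map", "map::get", "name"), ("filter", "eq", "marko")]),
    ("map", "map::get", "age"),
]
print(run(choose_sum))  # -> [8]
print(run(has_value))   # -> [29]
```

The sketch is eager and list-based for brevity; a real processor would presumably pull lazily or distribute the stream.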
Pretty trippy, eh?
Now the beautiful thing about this is:
1. Implementing a TP4 VM is trivial. All you have to do is support 6 instruction types.
- You could rip out a TP4 VM implementation in 1-2 days.
- We can create a foundational C#, Python, C/C++, etc. TP4 VM implementation.
- This foundation can then be evolved over time at our leisure (see next point).
2. More advanced TP4 VMs will compile the core bytecode to a TP4 VM-native bytecode.
- This is just like Java’s JIT compiler. For example, the core instruction:
filter([map(dictionary::get,name), filter(eq,marko)])
is compiled to the TP4-Java instruction:
has(name,marko)
- Every processor must be able to work with core bytecode, but can support VM native instructions such as has(), is(), path(), loops(), groupCount(), etc.
- These instructions automatically work for all integrating processors (e.g. Pipes, Beam, Akka — on the TP4-Java VM).
- These higher-level instructions don’t require any updates to the processors as these are still (abstractly) filter, flatmap, reduce, etc. functions.
3. Core bytecode is as data agnostic as you can possibly get.
- Data structures are accessed via method call references — e.g. map::keys, list::get, vertex::outEdges, etc.
- Adding new data structures is simply a matter of adding new datatypes.
- The TP4 VM can be used as a general purpose, universal stream-based VM.
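The compilation step in point 2 could be sketched as a simple pattern rewrite over core bytecode. This is only an illustration under assumptions, not the TP4-Java compiler. The tuple encoding and the functions is_has_pattern(), compile_native(), and apply_has() are hypothetical. The method reference is spelled map::get, as in the earlier example. apply_has() is there to show that a native has() is still (abstractly) a filter from the processor's point of view.

```python
def is_has_pattern(inst):
    """True if inst looks like filter([map(map::get, k), filter(eq, v)])."""
    if len(inst) != 2 or inst[0] != "filter":
        return False
    nested = inst[1]
    if not isinstance(nested, list) or len(nested) != 2:
        return False
    get, test = nested
    return (len(get) == 3 and get[:2] == ("map", "map::get")
            and len(test) == 3 and test[:2] == ("filter", "eq"))


def compile_native(bytecode):
    """Rewrite every matching core filter into a native has(k, v) instruction."""
    out = []
    for inst in bytecode:
        if is_has_pattern(inst):
            (_, _, key), (_, _, value) = inst[1]
            out.append(("has", key, value))
        else:
            out.append(inst)  # anything unrecognized stays as core bytecode
    return out


def apply_has(stream, key, value):
    """Native has() executed as a plain filter over a stream of dicts."""
    return [x for x in stream if x.get(key) == value]


core = [
    ("initial", {"name": "marko", "age": 29}),
    ("filter", [("map", "map::get", "name"), ("filter", "eq", "marko")]),
    ("map", "map::get", "age"),
]
native = compile_native(core)
print(native[1])  # -> ('has', 'name', 'marko')
```

Because unmatched instructions pass through untouched, a VM written this way can start with no rewrites at all and add native instructions one at a time.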
Here is the conceptual mapping between Java and TP4 terminology:
Java sourcecode <=> Gremlin traversal
Java bytecode <=> Core bytecode
JIT trees <=> TP4-Java-native bytecode
Machine code <=> Processor execution plan
It’s a pretty intense move and all the kinks haven’t been fully worked out, but it’s definitely something to consider.
Your questions and comments are welcome.