Hi all,
I would appreciate technical feedback on a Java 21 runtime I have been building:
https://github.com/ElevatedDev/Lattice

Lattice is aimed at a specific shape of service that I kept seeing in low-latency JVM work: the processing topology is known before startup, the work stays inside one JVM, and the runtime shape is effectively fixed — ingest, validate, enrich, route, join, sink.
In practice I often saw those systems implemented as either:
- hand-rolled queues and worker conventions,
- one ordered ring with application-specific structure built around it,
- or a stream processor / broker that was larger than the in-process problem.
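For concreteness, the first of those patterns often reduces to something like the following plain-JDK sketch (no Lattice involved): bounded `ArrayBlockingQueue`s between stage threads, with backpressure falling out of the blocking `put`, and shutdown handled by convention (here, a poison-pill value):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal hand-rolled two-stage pipeline: ingest -> transform -> sink.
// Bounded queues give implicit backpressure via blocking put().
public class HandRolledPipeline {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> ingestToTransform = new ArrayBlockingQueue<>(1024);
        BlockingQueue<Integer> transformToSink = new ArrayBlockingQueue<>(1024);

        Thread transform = new Thread(() -> {
            try {
                while (true) {
                    int v = ingestToTransform.take();
                    if (v < 0) { transformToSink.put(v); break; } // forward poison pill
                    transformToSink.put(v * 2);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread sink = new Thread(() -> {
            try {
                long sum = 0;
                while (true) {
                    int v = transformToSink.take();
                    if (v < 0) break; // poison pill terminates the worker
                    sum += v;
                }
                System.out.println("sum=" + sum);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        transform.start();
        sink.start();
        for (int i = 1; i <= 1000; i++) ingestToTransform.put(i);
        ingestToTransform.put(-1); // poison pill
        transform.join();
        sink.join();
    }
}
```

The topology here is real but invisible to the runtime: it lives in thread-wiring conventions rather than in anything a compiler could inspect, which is the gap Lattice is aimed at.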
Lattice is an attempt to make the middle case explicit.
The model is a typed static graph. You declare sources, stages, routing nodes, joins, sinks, and bounded SPSC/MPSC edges. Before workers start, Lattice validates the graph and compiles it into a fixed worker plan. The compiler can then make decisions around source specialization, edge-local backpressure, eligible linear-chain fusion, worker placement, and preallocated payload reuse. It also emits a compilation report so the user can see what fused, what stayed physical, and why.
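To make the validate-before-start idea concrete, here is a toy sketch of the shape of that check. This is NOT Lattice's actual API; the records, names, and validation rules below are invented for illustration. It only shows the general idea: the graph is fully declared up front, so cycles and unbounded edges can be rejected before any worker exists:

```java
import java.util.*;

// Toy illustration of validate-before-start (NOT Lattice's actual API):
// a static graph of named nodes and bounded edges, checked for cycles and
// zero-capacity edges before any worker would be launched.
public class StaticGraphCheck {
    record Edge(String from, String to, int capacity) {}

    static void validate(Set<String> nodes, List<Edge> edges) {
        Map<String, List<String>> adj = new HashMap<>();
        for (Edge e : edges) {
            if (e.capacity() <= 0)
                throw new IllegalStateException("edge must be bounded and non-empty: " + e);
            if (!nodes.contains(e.from()) || !nodes.contains(e.to()))
                throw new IllegalStateException("edge references unknown node: " + e);
            adj.computeIfAbsent(e.from(), k -> new ArrayList<>()).add(e.to());
        }
        // Kahn's algorithm: if the nodes cannot all be topologically
        // ordered, the declared topology contains a cycle.
        Map<String, Integer> indeg = new HashMap<>();
        nodes.forEach(n -> indeg.put(n, 0));
        adj.values().forEach(ts -> ts.forEach(t -> indeg.merge(t, 1, Integer::sum)));
        Deque<String> ready = new ArrayDeque<>();
        indeg.forEach((n, d) -> { if (d == 0) ready.add(n); });
        int ordered = 0;
        while (!ready.isEmpty()) {
            String n = ready.poll();
            ordered++;
            for (String t : adj.getOrDefault(n, List.of()))
                if (indeg.merge(t, -1, Integer::sum) == 0) ready.add(t);
        }
        if (ordered != nodes.size())
            throw new IllegalStateException("topology contains a cycle");
    }

    public static void main(String[] args) {
        Set<String> nodes = Set.of("ingest", "validate", "enrich", "sink");
        List<Edge> edges = List.of(
            new Edge("ingest", "validate", 1024),
            new Edge("validate", "enrich", 1024),
            new Edge("enrich", "sink", 1024));
        validate(nodes, edges);
        System.out.println("graph ok: " + nodes.size() + " nodes, " + edges.size() + " edges");
    }
}
```

The point is only that a statically declared DAG gives the runtime something it can check and specialize against before the first event flows, which is what the compilation report then documents.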
The scope is intentionally narrow:
- Java 21
- one JVM
- static topology
- bounded edges
- explicit ownership
- no durability
- no replay
- no distributed delivery
- no runtime topology changes
- no exactly-once external effects
So this is not trying to be Kafka, Flink, a broker, a persistence layer, or a general queue replacement.
The obvious comparison is LMAX Disruptor. I still think Disruptor is excellent when the problem is fundamentally one ordered ring. Lattice is making a different bet: if the application is really a fixed typed DAG, the runtime should be able to see that DAG directly and specialize around it instead of reconstructing it from queue plumbing and handler conventions.
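For readers who have not used Disruptor: "one ordered ring" reduces, in spirit, to a single preallocated array plus producer and consumer sequence counters. The following is a deliberately simplified single-producer/single-consumer sketch of that shape, not Disruptor's implementation (Disruptor adds batching, multiple consumers, wait strategies, and far more careful memory layout):

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified SPSC ring: one preallocated array, two sequence counters.
// This is the shape Disruptor generalizes, not its implementation.
public class TinyRing {
    final long[] slots;
    final int mask;
    final AtomicLong producerSeq = new AtomicLong(-1); // last published slot
    final AtomicLong consumerSeq = new AtomicLong(-1); // last consumed slot

    TinyRing(int capacityPow2) {
        slots = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    void publish(long value) {
        long next = producerSeq.get() + 1;
        while (next - consumerSeq.get() > slots.length) Thread.onSpinWait(); // ring full
        slots[(int) (next & mask)] = value;
        producerSeq.set(next); // volatile write publishes the slot to the consumer
    }

    long consume() {
        long next = consumerSeq.get() + 1;
        while (producerSeq.get() < next) Thread.onSpinWait(); // ring empty
        long v = slots[(int) (next & mask)];
        consumerSeq.set(next);
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        TinyRing ring = new TinyRing(8);
        Thread consumer = new Thread(() -> {
            long sum = 0;
            for (int i = 0; i < 100; i++) sum += ring.consume();
            System.out.println("sum=" + sum);
        });
        consumer.start();
        for (long i = 1; i <= 100; i++) ring.publish(i);
        consumer.join();
    }
}
```

When the whole application fits in this shape, Disruptor is hard to beat; Lattice's bet is about applications whose real structure is a typed DAG layered on top of queues like this.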
I included a Disruptor comparison, but tried to keep the claim scoped rather than promotional:
https://github.com/ElevatedDev/Lattice/blob/master/docs/disruptor-comparison.md

The current checked-in benchmark snapshot is here:
https://github.com/ElevatedDev/Lattice/tree/master/docs/benchmark-results/2026-05-02-per-graph-refresh

That baseline uses OpenJDK 21.0.10, JMH 1.36, Disruptor 4.0.0, and an Intel Core i9-14900HX under WSL2, with:
-Xms2g -Xmx2g -XX:+AlwaysPreTouch -XX:+UnlockDiagnosticVMOptions -XX:+UseParallelGC
Some of the static-graph rows are favorable to Lattice:
- physical three-stage publish: 31.9M vs 21.7M ops/s
- inline-fused publish: 127.9M vs 35.7M ops/s
- equal-call-site reference row: 209.2M vs 31.1M ops/s
- completion-gated source-inline path: 77.9M vs 3.62M ops/s
But the matrix also keeps Disruptor-favorable rows visible. Disruptor wins or does better on some simple source/sink completed-operation rows, physical pipeline completed rows, broadcast, and dependency/join completed cases. So the intended claim is not “Lattice is always faster than Disruptor.” The claim is narrower: a static typed graph gives the runtime extra information, and that information can be useful in specific shapes.
I would particularly value criticism on:
- whether the benchmark shapes are fair and useful,
- which benchmark cases are missing,
- whether the Disruptor comparison is framed correctly,
- whether the ownership/backpressure model is clear enough,
- whether the fusion and source-specialization rules make sense,
- and where this abstraction would break down in real low-latency JVM systems.
The core runtime is Java. There is an optional Rust JNI backend, but it is only for placement/topology diagnostics and is not required for normal Java use. The project is Apache 2.0 and includes runnable examples, Javadocs, JCStress tests, JMH JSON/stdout artifacts, and release checks.
It is also on Maven Central:
io.github.elevateddev:lattice:1.0.0
Repository:
https://github.com/ElevatedDev/Lattice

I am not looking for soft launch feedback so much as technical pushback. If you have maintained JVM pipelines built from raw queues, Disruptor, Aeron-adjacent components, executor graphs, or heavier stream processors, I would be very interested in where you think this design is useful, where it is confused, and what you would test before trusting it.