Hi,
I have been looking through the specifications, trying to get a better grasp of the scope of the project. What strikes me is that a lot of important mechanisms (fault tolerance, async messaging, stream processing) are discussed and specified, but there seems to be no coherent programming model to fit these mechanisms into (see Erlang for the extreme end of strict programming models).
My developer's intuition tells me that over time one might end up in a situation where the executing container lacks the overall knowledge to schedule efficiently, leading to hard-to-debug performance issues, e.g. from the same bean being used in both synchronous and asynchronous contexts.
Stepping back a bit, there are a number of use cases one might want to handle in a container, and the container ought to be given the best possible information about how to optimally allocate resources for and schedule them:
- controller logic
- in a synchronous flow, e.g. a typical controller in a web MVC framework that hits the DB, does some logic and returns a new view; network bound if it hits the DB.
- in asynchronous invocation (message passing), typically state machines used to reason about reactive invocations; these normally execute quickly.
- long-running batch jobs
- batch DB CRUD operations; blocking, network bound.
- data processing, polling data from a queue or otherwise
- stream/message queue processing; reactive, might be CPU bound for the data processing itself.
There are probably lots more such higher-level patterns.
How a container would optimally schedule these varies a lot depending on whether the work is CPU, local I/O (e.g. a persistent queue journal) or network bound:
- CPU bound - the number of threads depends on the number of CPU cores, to avoid unnecessary context switching.
- local I/O bound - one thread executes at a time to avoid thrashing disk access (SSDs may fare better).
- network I/O bound - almost any number of threads, since they will be blocking on I/O most of the time.
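As a rough sketch of how those three strategies could map onto Java executors (the pool names and the choice of executor factory are mine, purely for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkloadPools {
    // CPU bound: one thread per core, minimising context switching
    static final ExecutorService CPU_POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Local I/O bound: a single thread serialises disk access to avoid thrashing
    static final ExecutorService LOCAL_IO_POOL = Executors.newSingleThreadExecutor();

    // Network I/O bound: threads mostly block on the wire, so the pool may grow
    static final ExecutorService NETWORK_POOL = Executors.newCachedThreadPool();

    public static void main(String[] args) throws Exception {
        // Trivial task just to show the CPU pool in action
        int result = CPU_POOL.submit(() -> 6 * 7).get();
        System.out.println(result);
        CPU_POOL.shutdown();
        LOCAL_IO_POOL.shutdown();
        NETWORK_POOL.shutdown();
    }
}
```

A real container would of course size and share these pools dynamically rather than hard-code them.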
From this I'd say one would want to give the container as many hints as possible about the behaviour of the managed entity, even to the point of the container refusing to perform an operation (like blocking I/O) unless the component has explicitly asked for it.
This could be done mainly declaratively, similar to the @Stereotype annotations in the CDI spec (?).
@MessageDriven - message-driven bean; stateless, asynchronously invoked, can do synchronous calls.
@Actor - stateful message processor; asynchronously invoked, useful for event processing logic (events from multiple sources that must be synchronised via state machines).
@Stateless - existing stateless session bean; synchronously invoked.
@Stateful - existing stateful bean; not many use cases for these, as state normally resides persistently in the DB.
@MVCController
@DAO
@GET, @POST for REST services
@BatchJob - e.g. blocking reads from a queue with writes to the DB.
@Polling - at intervals, call some service (like an external FTP server) to check for new data.
@LongPolling - mostly used from the client side, I guess, and probably not an appropriate pattern to use in a container.
This gives the container hints about the role of the bean within the system. In addition, there could be properties that tell the container how to schedule:
@Blocking(constraint=NETWORK, roundtrips=4)
Here roundtrips would be a hint that a single invocation of a method will do four network round-trip calls. This is just for illustration;
it is probably something a container might figure out anyway by keeping stats on any incoming bean call that leads to outbound network calls, and then adapting scheduling dynamically at runtime.
@CPUIntensive - could tell the container to instrument execution time and adjust scheduling, e.g. just pass the work to one of the per-CPU-core bound threads to optimise for CPU cache locality.
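A minimal sketch of what such a hint annotation could look like, and how a container might read it via reflection (the annotation, the Constraint enum and the bean method are all invented here for illustration; nothing of this is in the CDI or MicroProfile specs):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class BlockingHintDemo {

    enum Constraint { CPU, LOCAL_IO, NETWORK }

    // Hypothetical scheduling hint, retained at runtime so the
    // container can inspect it at deployment time
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Blocking {
        Constraint constraint();
        int roundtrips() default 1;
    }

    // Example bean method declaring that one invocation does
    // four network round trips
    @Blocking(constraint = Constraint.NETWORK, roundtrips = 4)
    void fetchOrderWithDetails() { /* DAO logic would go here */ }

    public static void main(String[] args) throws Exception {
        Blocking hint = BlockingHintDemo.class
                .getDeclaredMethod("fetchOrderWithDetails")
                .getAnnotation(Blocking.class);
        // A container would use these values to pick a pool and thread count
        System.out.println(hint.constraint() + " " + hint.roundtrips());
    }
}
```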
Specifying these high-level patterns also helps the developer think clearly about architecture, and might improve readability for other developers.
Intuition tells me one would want to avoid beans that are both CPU intensive and spend a long time blocking in the same call chain; rather, do these in separate specialised beans with execution scheduled independently.
This is a good case for message passing/reactive programming.
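To sketch that separation concretely (pool sizes, names and the simulated DB call are all illustrative): a blocking stage hands completed work to a CPU-bound stage over a queue, so neither pool's threads sit idle inside the other's workload:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PipelineDemo {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        // "Blocking" pool: threads may block freely on I/O
        ExecutorService ioPool = Executors.newCachedThreadPool();
        // "CPU" pool: one thread per core, should never block on I/O
        ExecutorService cpuPool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        // Blocking stage: pretend to fetch a row over the network, then enqueue it
        ioPool.submit(() -> {
            Thread.sleep(50);          // stands in for a DB round trip
            queue.put("row-1");
            return null;
        });

        // CPU stage: take from the queue and do pure computation
        String processed = cpuPool.submit(() -> queue.take().toUpperCase()).get();
        System.out.println(processed);

        ioPool.shutdown();
        cpuPool.shutdown();
    }
}
```

The queue is exactly the point where a container could step in and schedule the CPU stage independently of the blocking stage.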
For reactive programming, an advantage is of course that it gives the container more leeway in how to schedule. Message passing or CSPs
allow the container to put tasks on queues and schedule them when appropriate, while a fully synchronous stack requires a lot of threads, since most of them will block against some external DB or the like.
However, for there to be a large advantage, I believe it has to be reactive all the way down to whatever takes time to execute, e.g. making the network call async; simple if-this-do-that logic executes so fast anyway that whether it is reactive or not does not matter for performance.
All network interaction, even DB calls, can be made asynchronous all the way down to the network wire. Client-side Ajax calls already do that (via REST): the caller passes a continuation function and the browser can schedule the REST HTTP calls at will, even in parallel. Below, the web browser could do all 10 calls to the server in parallel, utilising only one thread by doing asynchronous I/O for HTTP.
for (let i = 0; i < 10; ++i) {
    // 'let' gives each callback its own binding of i; with 'var' every
    // callback would log the final value of i (10)
    sendAsyncQueryForSomeInfoFromDB(url, i, param2, param3, function() {
        console.log('got HTTP response ' + i + ' from server side');
    });
}
This is entirely doable from Java too (easy with lambdas), but very few existing blocking APIs allow it. A Google search does turn up this, though:
https://blogs.oracle.com/java/jdbc-next:-a-new-asynchronous-api-for-connecting-to-a-database
JPA does not support this; it is entirely synchronous, though continuation APIs could be added in the future.
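The browser loop above can be mirrored in today's Java with CompletableFuture. Since the async JDBC API linked above never shipped, the "query" below is a stand-in I made up; a real non-blocking version would sit on top of NIO or an async HTTP/DB client:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AsyncCallsDemo {

    // Hypothetical stand-in for a non-blocking DB/HTTP call; it completes
    // on the common pool instead of doing real network I/O
    static CompletableFuture<String> sendAsyncQuery(int i) {
        return CompletableFuture.supplyAsync(() -> "response " + i);
    }

    public static void main(String[] args) {
        // Fire all 10 calls without blocking the caller
        List<CompletableFuture<String>> calls = IntStream.range(0, 10)
                .mapToObj(AsyncCallsDemo::sendAsyncQuery)
                .collect(Collectors.toList());

        // Join only at the end; the calls themselves ran concurrently
        CompletableFuture.allOf(calls.toArray(new CompletableFuture[0])).join();
        calls.forEach(f -> System.out.println(f.join()));
    }
}
```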
I suppose that in theory, if all I/O is done asynchronously, one can get full CPU utilisation with only as many threads as there are cores, and avoid a lot of thread context switching, since threads would never block for I/O and would hence always be doing something. Async processing is, however, more challenging programming-wise, whether using continuations or state machines.
Oh well, just comments for further thought.
Regards,
Nils Henrik Lorentzen
In addition to making concurrent applications simpler and/or more scalable, this will make life easier for library authors, as there will no longer be a need to provide both synchronous and asynchronous APIs for a different simplicity/performance tradeoff.
The implementations of the networking APIs in the java.net and java.nio.channels packages have been updated so that fibers doing blocking I/O operations park, rather than block in a system call, when a socket is not ready for I/O. When a socket is not ready for I/O, it is registered with a background multiplexer thread. The fiber is then unparked when the socket is ready for I/O. These same blocking I/O operations have also been updated to support cancellation: if a fiber is cancelled while in a blocking I/O operation, the operation will abort with an IOException.
Fiber fiber1 = new Fiber(() -> new URL("http://foo.com").openConnection());
Fiber fiber2 = new Fiber(() -> new URL("http://bar.com").openConnection());
fiber1.start();
fiber2.start();
URLConnection openConnection() {
    final Fiber fiber = getCurrentFiber();
    final URLConnection result;
    // Called from within a fiber?
    if (fiber != null) {
        // simple value holder to store the result
        final Value<URLConnection> connection = new Value<>();
        // Schedule the connect operation asynchronously
        scheduleOpenConnectionAsynchronously(urlConnection -> {
            // store the result so it can be retrieved from the calling scope
            connection.value = urlConnection;
            // resume the fiber from below yield()
            fiber.resume();
        });
        // saves the current call stack and adds it to the fiber scheduler
        fiber.yield();
        // here after resume(); the scheduler has restored stack and registers.
        // note that all of this happens on the same thread, which is what
        // allows for better performance
        result = connection.value;
    }
    else {
        result = alreadyExistingBlockingImplementation();
    }
    return result;
}
--
You received this message because you are subscribed to the Google Groups "Eclipse MicroProfile" group.
To unsubscribe from this group and stop receiving emails from it, send an email to microprofile...@googlegroups.com.
To post to this group, send email to microp...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/microprofile/b711a145-607c-467c-9b58-e6bc2193eba0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Even though admittedly I haven't written a single line of Erlang code myself, I'd recommend looking at its runtime model and also its error handling. The lightweight processes (actors) there are much like operating-system processes but with only software isolation; I guess Akka does something similar.
Errors, however, propagate upwards as in OS processes and are cleaned up properly. I'm not sure how applicable this is, but it could give a different perspective.
Hello,

Thanks for this great write-up! We have been discussing these topics in the reactive hangout (happening every Tuesday at 11:00 AM, Paris time zone). You are more than welcome to join and work with us there. Let us know if the time does not work for you.

As Emily said, we are moving towards the first version of Reactive Messaging, which would provide the base of the message-driven / reactive model. What you described is something we had been covering a little bit in the current spec. Typically, @Blocking (not part of the spec) is something I want to experiment with in the SmallRye implementation (I like the constraint idea).

About Loom, you are right: Loom does not remove the need for async, it just eases user code, as developers would be able to write "imperative" (synchronous) code and execute it in an asynchronous manner. But for this it relies on non-blocking I/O. I heavily recommend the talk from Martin Thompson about interaction protocols. I had the chance to see it last week at JBCN, and his take on Loom is the best metaphor ever.

Clement