Non-blocking SQL high-level overview + difference between async MSSQL?

470 views
Skip to first unread message

Róbert Kohányi

unread,
Jan 17, 2019, 4:50:43 PM1/17/19
to r2dbc
Hello!

I'd like to better understand the high-level theory of how a non-blocking database driver such as R2DBC could work.
I have my theories that either needs reinforcement or debunking.
Primarily I'm looking for some pointers to related articles or similar on this topic.

Firstly: I think I understand the problem with JDBC's blocking nature for example in the context of serving a page that needs some data to be fetched from a DB. A request comes in, a thread is fetched to handle it from a pool, a DB query is initiated, the DB handles the query, creates the result and sends it back over the wire, the data is then sent back in the response in a view. The problem being the thread is blocked while the DB processes and serves the query result and that's a waste of resources.

In my head, this is the problem that needs to be fixed and I was looking for solutions to this problem for a time. While searching for one I've stumbled upon various articles about "seemingly" non-blocking JDBC facades or replacements which all ended-up being creating connection pools somewhere under the hood.

So a question: is that right that R2DBC doesn't use a connection pool? That's my understanding, but I'm not just after all the information out there.

Also, while searching for a solution I've found out that in C# people can use an async SQL API to query stuff (unfortunately I'm not intimate with this technology). I dug into this by reading articles, and I did found one (which I can't find for the love of my life anymore - if someone knows about this I'd appreciate a link to it) where the author basically described what happens when someone issues an async SQL query using MSSQL, going as deep as the Windows kernel (where it registers basically a callback in a funky, low-level way) to show that no threads will be blocked along the way. I might remember incorrectly, but this author (or possibly another) also mentioned that MSSQL supports async queries at the database level which is why this works so well for them.

So a few questions:
  • doesn't async queries require database level support at some point or another? This was my impression, but after seeing that there is a PostgreSQL and a H2 variant of R2DBC I'm not sure.
  • non-blocking SQL was available for quite some time for MSSQL, is that true? (Might not be the right place to ask this, but just for kicks.)
Now I've just watched Ben Hale's talk (https://www.youtube.com/watch?v=idApf9DMdfk) and heard and seen that he specifically mentions that async is not reactive which I kinda get, but not totally since R2DBC says about itself on the current homepage is that it is founded on Reactive Streams providing an asynchronous, non-blocking API. Bottom line: I'm confused all this mumbo-jumbo (I hope I'm not the only one.) 

I get the non-blocking part, but I'm not sold on the reactiveness - yet. Although I understand how to use reactive streams I've yet to utilize this concept to its fullest "in the wild". The one place I could have been using it already is relational databases, but as R2DBC just starts to get footing this has to wait.

So, question, if we forget about reactiveness R2DBC for Java is what the async SQL driver is for MSSQL?

Well. That's all. I hope I made sense. Looking forward to any insights given. Thank you very much!

Best,
Robert

Mark Paluch

unread,
Jan 17, 2019, 5:43:28 PM1/17/19
to r2dbc, Róbert Kohányi
Hi Robert, 

Thanks for your questions. Some aspects are beyond R2DBC but let’s take a look in depth.

> What is the issue with JDBC?

So JDBC is a blocking API. This means, a caller is required to wait until a called method returns. JDBC is also an imperative API and imperative APIs typically do not distinguish between methods that perform some sort of computation and these that perform I/O (well, maybe they do, if we look at the exceptions they throw). 
So the main part we’re talking are methods that initiate some sort of I/O because the caller has to wait until a response comes back and gets processed. During that time, a calling Thread goes into I/O wait and is blocked for a long time (from a computer perspective where a couple of milliseconds are like ages).

It’s not possible to turn a blocking API into a non-blocking one. The only thing you can do is offloading blocking calls onto a ThreadPool and emulate non-blocking behavior to the caller. What in fact happens, is that you keep more threads busy and once the thread pool and its worker queue are full, you end up with blocking behavior again.

> So a question: is that right that R2DBC doesn't use a connection pool? 

R2DBC will require a non-blocking connection pool. Currently we don’t have one. On a related note: Most JDBC pools are exposed through the DataSource API which is blocking. 

>  R2DBC says about itself on the current homepage is that it is founded on Reactive Streams providing an asynchronous, non-blocking API. Bottom line: I'm confused all this mumbo-jumbo

Point taken. So there are a lot of ways to think about reactive programming. For us, Reactive means:

* Using an API that allows to issue non-blocking calls
* Switching to a functional/declarative programming model
* Applying an event-driven programming model
* Treat application control as series of events (data, errors, completion)
* Consider data as stream
* Use an API that allows deferred execution
* No longer assume control of resources but leave resource scheduling to the runtime/platform („React to resource availability“)
* Efficient usage of resources
* Leave flow control to be handled by the runtime (such as loops or forking asynchronous calls is part of the platform and no longer part of the application code) as part of execution scheduling
* Remove assumptions over concurrency from the programming model and leave this aspect up the runtime
* Use back-pressure to allow flow control, to defer the actual execution and to not overwhelm consumers

Reading the list might cause more confusion in the first place. After digesting the essence we end up with a programming model that is non-blocking and that does not express how concurrency happens in the code. In imperative programming we spin up threads to do things in parallel. In reactive programming we declare the unit of work and leave execution (i.e. concurrency) up to the runtime. 

What reactive infrastructure typically does, is dispatching I/O work to non-blocking sockets and switch threads to perform work where it can be done. So we’re basically getting asynchronicity for free because that’s the modus operandi.

Feel free to hook in if this explanation causes more confusion than it explains.

async is not reactive 

This is true, at least for us. There are a couple of ways to think about async and reactive. I think the understanding what reactive means is key. I tend to think about async vs. reactive as some sort of maturity model:

0. Imperative Programming
1. Non-blocking code using callbacks or Futures to control concurrency
2. Message-driven Programming
3. Reactive Programming

> doesn't async queries require database level support at some point or another

If we just look on the asynchronous aspect, we need to draw a line somewhere. For R2DBC we can draw the line within the drivers as R2DBC drivers use the database vendor-specific wire protocol to communicate with the databases. 

Postgres and SQL Server use a request/response scheme in their protocol which means that asynchronous terminates at the I/O socket as these database servers do not support command multiplexing over a single connection. This is sufficient to improve the efficiency of resource usage within the JVM. 

R2DBC drivers write database commands to the transport, after that the thread is free to perform other work. Threads are no longer blocked until the response arrives. Once a response is received, a response processing thread starts to decode the response and then it emits the received data.

The same is true for other reactive drivers such as MongoDB, Apache Cassandra, Redis, and Couchbase.

> non-blocking SQL was available for quite some time for MSSQL

Not sure. I’m not aware of such a feature/driver library.

> R2DBC for Java is what the async SQL driver is for MSSQL?

Not sure I follow as I do not understand the question.

Hope these answers shed a bit more light and help to resolve the confusion. Feel free to follow up if you need some more details or clarification.

Cheers, 
Mark
--
You received this message because you are subscribed to the Google Groups "r2dbc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r2dbc+un...@googlegroups.com.
To post to this group, send email to r2...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/r2dbc/91b742f3-17d8-4c70-a94c-36dcd6d18b8f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Róbert Kohányi

unread,
Jan 18, 2019, 1:15:38 PM1/18/19
to Mark Paluch, r2dbc

Hi Mark!

Thanks for the in-depth answer. A few things are clarified in my mind and new questions formed :) Which is great!


> R2DBC will require a non-blocking connection pool. Currently we don’t have one. On a related note: Most JDBC pools are exposed through the DataSource API which is blocking.

By blocking DataSource API you mean that when getConnection() is called by application code it'll wait/block until one connection is ready, right?
I'm not sure what's a non-blocking connection pool is tho'. Do you refer to a pool with an API that would allow me to do something like Pool.tryBorrow(TimeUnit)?


> leave execution (i.e. concurrency) up to the runtime.
> leave resource scheduling to the runtime/platform

You refer to "runtime" a few times. Just to clarify, what does runtime (or platform) mean in this context exactly? What does it mean to leave concurrency up to the runtime/ platform?

> What reactive infrastructure typically does, is dispatching I/O work to non-blocking sockets and switch threads to perform work where it can be done.

As an exercise when java.nio came to life I wrote a little sample program that was able to download files from multiple URLs on a single thread (the main thread) using selectors so I can kinda understand this (this piece of information is really valuable).


> > non-blocking SQL was available for quite some time for MSSQL
> Not sure. I’m not aware of such a feature/driver library.

Well, I found the article I was referring to: https://blog.stephencleary.com/2013/11/there-is-no-thread.html
It does not deal with SQL per se, but with async calls in C#/.NET (mentioning on the side write operations to a network stream)
I've no experience with this API, but it seems that async queries are in place for some time in SQL. Not sure. This topic isn't important, I just wanted to mention it as something of possible interest.

> >  R2DBC for Java is what the async SQL driver is for MSSQL?
> Not sure I follow as I do not understand the question.
What I meant is that if I'm right about the MSSQL driver being able to issue async (truly async) queries against a DB, then is R2DBC essentially the same thing for the Java world? (Hope this make more sense, if not, ignore it.)

Well, thanks for any insight in advance.

Best,
Robert

Ben Hale

unread,
Jan 18, 2019, 2:55:07 PM1/18/19
to Róbert Kohányi, Mark Paluch, r2dbc
I'll jump in for Mark as Europe heads to sleep.

> > R2DBC will require a non-blocking connection pool. Currently we don’t have one. On a related note: Most JDBC pools are exposed through the DataSource API which is blocking.
>
> By blocking DataSource API you mean that when getConnection() is called by application code it'll wait/block until one connection is ready, right?
> I'm not sure what's a non-blocking connection pool is tho'. Do you refer to a pool with an API that would allow me to do something like Pool.tryBorrow(TimeUnit)?

Imagine an API `<T> T Pool.acquire(Duration)`. When a caller invokes `T t = pool.acquire(Duration.ofSeconds(5))` and no object is available, the thread that the caller is running in goes into a blocked state waiting on an object to be returned to the pool and provided to the caller. This thread cannot be used for (and its associated stack memory) cannot be used by any other actor in the JVM until acquisition or timeout. Now imagine an alternate API `<T> Mono<T> pool.acquire()`. When a caller invokes `Mono<T> t = pool.acquire()` this is guaranteed to return immediately with no opportunity for blocking. But the key here is that you get this `Mono<T>` which is a reactive type, not the T itself. In a reactive system you then _subscribe_ to that `Mono` so that at some indeterminate point in the future you can act on what it provides you; `t.map(t -> String.valueOf(t)).subscribe()`. While that closure is waiting for a T to be provided **it does not block a thread from being used**. It will be notified when a new object is available and act in an available thread at this time.

This is the key point, that while waiting, threads are not occupied.


> > leave execution (i.e. concurrency) up to the runtime.
> > leave resource scheduling to the runtime/platform
>
> You refer to "runtime" a few times. Just to clarify, what does runtime (or platform) mean in this context exactly? What does it mean to leave concurrency up to the runtime/ platform?

Imagine the following code from Project Reactor (a Reactive runtime):

```
Flux<Integer> odds = generator.odds();
Flux<Integer> evens = generator.evens();

Flux.merge(odds, evens)
.doOnNext(System.out::println)
.subscribe()
```

How many threads do you think are required to execute this merging of numbers in the most efficient way possible? Is it 1 where we drain all odds, then all evens? Is it 2 so that odds and evens are interleaved in a random order? If the generator retrieved odds via a web service and evens locally in the VM does that change your answer? This is what we mean by leaving concurrence and resource scheduling up to the runtime (Reactor). What we're *actually* saying is that `odds` and `evens` can be executed in parallel with one another and `System.out::println` must be executed sequentially after the generation. Beyond that, we don't care about threading models so long as we get the behavior and order we've declared.

```
odds evens
\ /
|
System.out::println
```

You'd be surprised exactly how sophisticated these reactive platforms are at determining scheduling. There's event loops, thread-stealing, micro-batching, and more. It's absolutely fascinating.

> > > R2DBC for Java is what the async SQL driver is for MSSQL?
> > Not sure I follow as I do not understand the question.
> What I meant is that if I'm right about the MSSQL driver being able to issue async (truly async) queries against a DB, then is R2DBC essentially the same thing for the Java world? (Hope this make more sense, if not, ignore it.)

Well, R2DBC is really an API for reactive (non-blocking, asynchronous, pull-push back pressure) access to _any database_ including MSSQL. Even in that Microsoft driver, the API describes asynchronicity (IAsyncResult), the implementation likely (but certainly not guaranteed) is non-blocking, and it doesn't include the third point, back pressure. At this time, Java doesn't have a spec that address any of the three and in this way .Net is ahead of us. We're going to jump beyond Microsoft with R2DBC.


Don't worry about being confused by reactive programming and why we focus on such obscure things like the distinctions between non-blocking, async, and reactive back pressure; it's tough for everyone to get started. In the mean time I highly recommend you check out Reactor's primer[1] for general (non R2DBC-specifc) reactive concepts.


-Ben

[1]: https://projectreactor.io/docs/core/release/reference/#intro-reactive

Róbert Kohányi

unread,
Jan 19, 2019, 1:08:10 AM1/19/19
to Ben Hale, Mark Paluch, r2dbc
Hi Ben!

Thanks for jumping in!

>  `<T> Mono<T> pool.acquire()`

Great example, I think I _mostly_ got this part.

> we don't care about threading models so long as we get the behavior and order we've declared  

Okay, so R2DBC's code doesn't (won't) deal with concurrency at all _directly_, only indirectly via Flux and Mono? (And the underlying "magic".) So basically the runtime is Reactor? And as the application code's writer, I don't have to worry about these things too much, only about writing my queries and (at some point or another) declaring where (same thread, another thread) do I want to process the results as they get available. Something like that?

> You'd be surprised exactly how sophisticated these reactive platforms are at determining scheduling. There's event loops, thread-stealing, micro-batching, and more.  It's absolutely fascinating.

I'm surprised and yeah, it's fascinating, that's why I'm here seeking answers :) I'll leaving micro-batching for another day tho' :D

I encountered Flux and Mono when I read WebFlux's docs, but at that point, I decided I'm not really interested since my queries will block anyways :) 
On the frontend I've used rx with Angular, but somehow when you have an explicitly single-threaded model as in the browser and you can only make HTTP calls it gets easier to understand how to use the tools at hand (Observable, subscribe, etc.).
For me (on the backend) R2DBC is a game changer and opens up a whole lot of new perspectives.

Thanks for the input. I'll go and process the Reactor docs to get a clearer picture in the meantime.

Best,
Robert

Ben Hale

unread,
Jan 19, 2019, 11:07:27 AM1/19/19
to Róbert Kohányi, Mark Paluch, r2dbc
> > we don't care about threading models so long as we get the behavior and order we've declared
>
> Okay, so R2DBC's code doesn't (won't) deal with concurrency at all _directly_, only indirectly via Flux and Mono? (And the underlying "magic".) So basically the runtime is Reactor? And as the application code's writer, I don't have to worry about these things too much, only about writing my queries and (at some point or another) declaring where (same thread, another thread) do I want to process the results as they get available. Something like that?

Nearly perfect. The runtime is Reactor and as the application author you don't _have to_ deal with threading. The one clarification is that when you say "... declaring where (same thread, another thread) do I want to process the results...", you typically don't do that. What you're declaring is that processing must happen sequentially after the query has been sent, not what thread you like to processing to occur on. Threading continues to be invisible to you as the application author.

Note, that for the **rare** occasion that you care about threading, Reactor has `.publishOn()` and `.subscribeOn()` operators, but they are almost never used.

> I encountered Flux and Mono when I read WebFlux's docs, but at that point, I decided I'm not really interested since my queries will block anyways :)

> For me (on the backend) R2DBC is a game changer and opens up a whole lot of new perspectives.

This is precisely right. In order to gain significant benefit from Reactive programming, the entire stack (at least within a single process) needs to be reactive; you lose important data like back pressure otherwise. The Spring and Reactor teams have spent the last three years progressively rolling out reactive components (WebFlux, WebClient, etc.) and in 2018 we decided to review what was still impeding wide-spread adoption of reactive programming. As you observed, relational database access was the number one roadblock (much to the chagrin of the NoSQL community, RDBMS's still hold the most, and most important data). These observations led to our investments in R2DBC for relational database access and RSocket[1] for a reactive network protocol.

Good luck in your reactive journey!


-Ben


[1]: http://rsocket.io

Róbert Kohányi

unread,
Jan 19, 2019, 1:09:31 PM1/19/19
to Ben Hale, Mark Paluch, r2dbc
Thank you for all the insights Mark, Ben, much appreciated.

RSocket seems interesting too ... save to my lately fattened Read It Later lists :)

Keep it up!

Best,
Robert
Reply all
Reply to author
Forward
0 new messages