Hi Shay,
Responses inline:
On Mon, Jul 9, 2018 at 8:02 AM, Shay Rojansky <ro...@roji.org> wrote:
> Hi Michael, see some comments below.
>
>> The thing we want to avoid in the multi-query test is implementations
>> that are written like this:
>>
>> * For each incoming HTTP request, construct a batch of queries.
>> * In one operation, send the entire batch as a unit to the database.
>> * Don't do anything else until the response for the entire batch is
>> received.
>>
>> The multi-query test is a stand-in for a "real" application where such
>> an approach isn't possible. For the dependent query requirement,
>> imagine the result of each query has to be read *by the application
>> server* before it can know what the next query should be. Or you
>> could imagine it is interleaving queries between multiple databases or
>> other external services.
>
>
> This is quite confusing to me... First, why do you say that such an approach
> isn't possible in a real application? Assuming there's no dependency between
> your queries, this actually seems like quite a sensible way to execute - in
> fact I've written code like this many times. Most database drivers indeed
> give you "exclusive" rights to a pooled connection: after you're assigned a
> connection, indeed nobody else can use it. This is how Npgsql works, for
> example. When using such a driver, why would you *not* batch all your
> queries rather than wait for each query's results before sending the next
> (again, assuming there's no dependency)?
I was not saying that batch queries aren't possible in any
application. I can see why you'd find that opinion confusing. :)
I was saying the particular real application we're simulating is one
where batch queries aren't a solution. Why? Because that's how we've
defined it. I gave a couple of examples of problems that can't be
solved by batch queries. Imagine that our multi-query application is
solving those problems.
To make it easier for people to contribute solutions, we've simplified
the actual problem somewhat. But we retain some requirements from the
more complex theoretical problems.
>
> I suspect that there's some confusion here around terminology. In your 2nd
> mail you write that you don't think the goal of pipelining is to avoid
> round-trips, but rather to "use the connections to the database more
> efficiently". At least if you look at what HTTP pipelining is, "avoiding
> round-trips" is exactly what pipelining is about: it's simply about not
> waiting for a previous response before sending another request. In fact, the
> PostgreSQL protocol really is very similar to how HTTP pipelining works: you
> can send requests before waiting for responses, but responses will always
> come back in FIFO order.
Yes, we're probably using the term "round-trips" differently. A batch
query solution might issue 4 queries by sending a single packet to the
database and receiving all 4 result sets in a single packet, which I
was calling one round-trip. Meanwhile, a pipelining solution might
issue 4 queries by sending out 4 packets and receiving 4 back, which I
was calling 4 round-trips.
I suspect that your definition of "round-trips" is the more popular
one, so I'll try to avoid using my definition in the future.
(Aside: A pipelining solution might opportunistically squash the
queries and/or result sets into fewer packets. This was the
contentious part of what vertx-postgres was doing, which started the
debate about whether to allow pipelining. I don't think anyone was
concerned about *when* queries/results were sent/received or how many
connections were used.)
> Now, a small minority of drivers do seem to allow
> the same connection to be shared by several "users", allowing user B to send
> query 2 just after user A sent query 1, but before the results of query 1
> have been received. I preferred to call this "multiplexing" as opposed to
> "pipelining" (because pipelining does seem to have the somewhat standard
> meaning of HTTP pipelining), but terminology is secondary here. The
> important thing is to know whether it's about having multiple users having
> queries in flight on the same connection.
Right. I saw your definitions of batching, pipelining, and
multiplexing in the GitHub issue you linked. They're great.
For what it's worth, in all our discussions I have been saying
"pipelining" when I mean "multiplexing" in your terminology. The
client that vertx-postgres uses, reactive-pg-client, is doing
multiplexing.
>
> But to go back to what TechEmpower allows or does not allow... Once again,
> from my point of view:
>
> If the goal of the multi-query benchmark is to measure multiple roundtrips
> (i.e. one roundtrip per query), that's fine and it makes some sense. It can
> be explicitly required in the benchmark requirements, and it can be
> "enforced" by introducing a dependency between the query: query 2 contains a
> parameter that we only get from the resultset of query 1.
It *could* be enforced like you say, but we'd have to rewrite all of
the implementations, which would be a lot of work. Instead, we're
enforcing it manually.
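
For illustration only, here is a minimal sketch of what such a
dependency could look like. It is not how any existing implementation
works. It assumes an in-memory SQLite table named world with columns
id and randomNumber, and the helper names make_world and
dependent_queries are hypothetical. Each query's id parameter is read
from the previous query's result, so the application cannot build the
queries up front and send them as one batch.

```python
import sqlite3


def make_world(rows=10):
    """Build a small in-memory stand-in for the benchmark's world table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE world (id INTEGER PRIMARY KEY, randomNumber INTEGER NOT NULL)"
    )
    # Deterministic data: each row's randomNumber is itself a valid id in 1..rows.
    conn.executemany(
        "INSERT INTO world (id, randomNumber) VALUES (?, ?)",
        [(i, (i * 7) % rows + 1) for i in range(1, rows + 1)],
    )
    return conn


def dependent_queries(conn, start_id, count):
    """Run `count` queries where each id comes from the previous row's randomNumber."""
    results = []
    next_id = start_id
    for _ in range(count):
        row = conn.execute(
            "SELECT id, randomNumber FROM world WHERE id = ?", (next_id,)
        ).fetchone()
        results.append(row)
        # The application must read this result before it knows the next query.
        next_id = row[1]
    return results


if __name__ == "__main__":
    print(dependent_queries(make_world(), start_id=1, count=4))
```

The point of the sketch is only that the loop body has to see a row
before it can issue the next SELECT, which is the property a
dependency would enforce mechanically.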
> If the goal of the multi-query benchmark is to allow programs and drivers to
> execute multiple queries as fast as possible, that also makes sense, but
> then why not allow whatever form of speed-up (batching/pipelining) is
> available?
If this were the goal, we'd allow "SELECT * FROM world WHERE id IN
(...)" and all of the implementations would be written that way.
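
To make the contrast concrete, here is a rough sketch of the two
shapes, again assuming an in-memory SQLite table named world with
columns id and randomNumber. The function names are hypothetical.
Both functions return the same rows, but only the first one issues
one query per id, which is the shape the multi-query test is meant
to exercise.

```python
import sqlite3


def make_world(rows=10):
    """Build a small in-memory stand-in for the benchmark's world table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE world (id INTEGER PRIMARY KEY, randomNumber INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO world (id, randomNumber) VALUES (?, ?)",
        [(i, (i * 7) % rows + 1) for i in range(1, rows + 1)],
    )
    return conn


def one_query_per_id(conn, ids):
    """One SELECT per id: the shape the multi-query test expects."""
    return [
        conn.execute(
            "SELECT id, randomNumber FROM world WHERE id = ?", (i,)
        ).fetchone()
        for i in ids
    ]


def single_in_query(conn, ids):
    """One SELECT for all ids: the shortcut the test forbids."""
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT id, randomNumber FROM world "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(ids)).fetchall()


if __name__ == "__main__":
    conn = make_world()
    ids = [2, 5, 9]
    print(one_query_per_id(conn, ids))
    print(single_in_query(conn, ids))
```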
> Why do you care how the wire protocol messages look like, and
> forbid Npgsql's style of batching?
We really don't want to care how the wire protocol messages look.
That was part of my original announcement that explained why we're
allowing pipelining!
However, we do care about the general solution being used from the
application's point of view. We forbid *all* styles of batching. The
whole discussion about Sync messages was a red herring. Forget the
points I made about failed queries; that was in response to a
theoretical solution that no one actually proposed. (Imagine a
framework that solves the single-query test using batch queries. Or,
if you have no idea what I mean by that, then just forget it - it's
not important.)
-Michael