The Results Dashboard indicates an issue


b...@facil.io

Mar 7, 2018, 12:53:58 PM
to framework-benchmarks
The Results Dashboard for the continuous testing isn't updating or, more likely, it seems that the latest test is hanging.

The last update to the status was sometime yesterday (March 6th).

What would be the correct way to communicate that the continuous testing is experiencing issues?

I'm pretty sure that this group is only a slightly better option than a GitHub issue.

B.

Michael Hixson

Mar 7, 2018, 1:27:44 PM
to b...@facil.io, framework-benchmarks
Thanks! Its benchmark.cfg file was lost somehow. I restored it and
restarted. I wouldn't be too surprised if the issue repeats itself
later, since I don't know what caused it.

Reporting these issues on the mailing list is fine for now.

-Michael
> --
> You received this message because you are subscribed to the Google Groups
> "framework-benchmarks" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to framework-benchm...@googlegroups.com.
> Visit this group at https://groups.google.com/group/framework-benchmarks.
> For more options, visit https://groups.google.com/d/optout.

b...@facil.io

Mar 7, 2018, 11:27:34 PM
to framework-benchmarks
I'm happy to help...

...and the issue might have just repeated (it's been more than an hour since the last update).

B.

zloster

Mar 8, 2018, 4:01:27 AM
to b...@facil.io, framework-benchmarks

Hi all,

I don't know if someone else already noted the problem with the MySQL installation: it seems it just hangs until the build times out on Travis.

See here for example:
I think there are several other examples in the Travis build logs, but I'm too lazy to search for them. The jobs with this problem can be spotted by their "!" status, not "x".

Cheers,
zloster

b...@facil.io

Mar 12, 2018, 11:44:41 PM
to framework-benchmarks
Hi Zloster,

Do you think this might be related to the hanging CI benchmarks in the dashboard?

Maybe the lack of a timeout is the cause of the long delays in the CI benchmarking. The last test seems to have been hanging for 19 hours or so.

B.

zloster

Mar 13, 2018, 9:38:43 AM
to b...@facil.io, framework-benchmarks
On 13.03.2018 05:44, b...@facil.io wrote:

> Hi Zloster,
>
> Do you think this might be related to the hanging CI benchmarks in the
> dashboard?
>
I wouldn't have bothered to write the message and gather the information if I
didn't think it could be related.

Cheers,
zloster

Michael Hixson

Mar 13, 2018, 12:38:56 PM
to zloster, b...@facil.io, framework-benchmarks
We're looking into it.

One issue we're having in the Citrine environment is that we're
running out of disk space. We have a couple of PRs open right now
that should help with that.

-Michael

Gelin Luo

Mar 16, 2018, 5:21:17 PM
to framework-benchmarks
Hi

It looks like most of the Fortune tests in the latest few test runs have failed. However, I didn't see anything wrong with the out file, e.g. https://tfb-status.techempower.com/unzip/results.2018-03-16-12-34-28-192.zip/actframework-eclipselink-mysql-rythm/out.txt

Does anyone know what's going on with those Fortune tests?

Thanks,
Green

Michael Hixson

Mar 16, 2018, 5:45:53 PM
to Gelin Luo, framework-benchmarks
Does act rely on the capitalization of the table being "Fortune" and
not "fortune"? If so, it may be running into the problem I described
here: https://github.com/TechEmpower/FrameworkBenchmarks/pull/3396#issuecomment-373103181

To summarize, there is a problem with our docker toolset right now
such that the "Fortune" table is not always there. We don't know why
this is happening.

Also, in case you're wondering why none of the recent runs are
completing: there is a separate problem in our toolset related to
logging that's causing it to crash. We have a work in progress PR to
improve how logging is done in general so we're hoping that stops
these crashes.
https://github.com/TechEmpower/FrameworkBenchmarks/pull/3416

-Michael

green

Mar 16, 2018, 5:50:59 PM
to framework-benchmarks
On Sat, Mar 17, 2018 at 8:45 AM, Michael Hixson <michael...@gmail.com> wrote:
> Does act rely on the capitalization of the table being "Fortune" and
> not "fortune"? If so, it may be running into the problem I described
> here: https://github.com/TechEmpower/FrameworkBenchmarks/pull/3396#issuecomment-373103181


> To summarize, there is a problem with our docker toolset right now
> such that the "Fortune" table is not always there. We don't know why
> this is happening.

Act relies on lowercase table names, including 'fortune' and `world`, so this is not the issue.

> Also, in case you're wondering why none of the recent runs are
> completing: there is a separate problem in our toolset related to
> logging that's causing it to crash. We have a work in progress PR to
> improve how logging is done in general so we're hoping that stops
> these crashes.
> https://github.com/TechEmpower/FrameworkBenchmarks/pull/3416
Cool!


> -Michael

On Fri, Mar 16, 2018 at 2:21 PM, Gelin Luo <green...@gmail.com> wrote:
> Hi
>
> It looks like most of the Fortune tests on the latest few test runs have
> been failed. However I didn't see any wrong with the out file, e.g.
> https://tfb-status.techempower.com/unzip/results.2018-03-16-12-34-28-192.zip/actframework-eclipselink-mysql-rythm/out.txt
>
> Anyone knows what's going on with those Fortune tests?
>
> Thanks,
> Green
>

Michael Hixson

Mar 16, 2018, 6:48:30 PM
to green, framework-benchmarks
Oh interesting. So we have another problem in the toolset related to
fortunes. When we verify each fortune implementation's output, we are
accidentally accumulating the output across all runs, so all but the
first verification fails. You can see the expected output "piling up"
in the {framework}/fortune/verification.txt files:

2x - https://tfb-status.techempower.com/unzip/results.2018-03-16-12-34-28-192.zip/actframework-ebean-mysql/fortune/verification.txt
3x - https://tfb-status.techempower.com/unzip/results.2018-03-16-12-34-28-192.zip/actframework-ebean-mysql-rythm/fortune/verification.txt
4x - https://tfb-status.techempower.com/unzip/results.2018-03-16-12-34-28-192.zip/actframework-ebean-pgsql/fortune/verification.txt

We'll try to get that fixed up soon.
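The kind of accumulation bug described above can be sketched in a few lines. This is a hypothetical Python illustration, not the actual toolset code: writing the expected output in append mode piles up one extra copy per verification run.

```python
import os
import tempfile

EXPECTED = "expected fortune output\n"

def write_expected(path: str, mode: str) -> None:
    # mode "a" appends (the bug); mode "w" truncates (the fix).
    with open(path, mode) as f:
        f.write(EXPECTED)

path = os.path.join(tempfile.mkdtemp(), "verification.txt")

# Buggy flow: append mode, so a second verification run adds another copy.
write_expected(path, "a")
write_expected(path, "a")
copies_after_bug = open(path).read().count(EXPECTED)

# Fixed flow: truncate on each run, so exactly one copy survives.
write_expected(path, "w")
copies_after_fix = open(path).read().count(EXPECTED)

print(copies_after_bug, copies_after_fix)  # 2 1
```

The same "2x, 3x, 4x" pattern in the linked verification.txt files is what such per-run accumulation looks like from the outside.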

-Michael

Gelin Luo

Mar 17, 2018, 4:41:41 PM
to framework-benchmarks
Looks like "visualize this run on TFB website" is missing the act result in the visualisation, although the result itself includes act test data:

Gelin Luo

Mar 17, 2018, 6:07:58 PM
to framework-benchmarks
I think the problem is inside https://www.techempower.com/benchmarks/js/tfb-lookup.js?m=49, which hardcodes "actframework" into the framework list. Can we add `act` to that list?

Gelin Luo

Mar 20, 2018, 2:42:39 PM
to framework-benchmarks
Hi Michael or any TFB guys,

What do you think about this?

Thanks,
Green

Michael Hixson

Mar 20, 2018, 5:56:38 PM
to Gelin Luo, framework-benchmarks
We'd like to come up with a fix for this that also lets newly-added frameworks show up without us having to edit tfb-lookup.js.  We have a particular fix in mind, but it will take some time to implement.

-Michael


Michael Hixson

Mar 23, 2018, 12:02:20 PM
to Gelin Luo, framework-benchmarks
We deployed the fix.  New and renamed frameworks should show up now in the "Visualize this run on the TFB website" links without us having to update anything manually in that tfb-lookup.js file.

A side effect of this is that, since tfb-status doesn't have all the framework metadata until a run completes, we can no longer show the "visualize" links for in-progress runs, so those links will be provided for completed runs only.

-Michael


Anton Kirilov

Mar 23, 2018, 2:02:56 PM
to framework-benchmarks
Hello,

It may be related, but when I look at the pages for older rounds, the results for frameworks that have been removed from the source tree don't appear. For example, look for lwan in round 12.

Tony

Michael Hixson

Mar 23, 2018, 6:27:21 PM
to Anton Kirilov, framework-benchmarks
Hi Tony,

Thanks for bringing that to our attention. It turns out that problem
is not new (it is unrelated to the fix for continuous runs we deployed
today), and it's an unintended behavior of the tool we use for
managing that tfb-lookup.js file. We're not currently planning to
spend any time on implementing a fix, though. Maybe we can revisit it
later when we're done with the transition to docker.

-Michael

Anton Kirilov

Apr 19, 2018, 10:09:18 PM
to framework-benchmarks
Hi,

The results dashboard has proved to be really useful for performance tuning, so thank you very much for adding that functionality! However, I noticed two more issues. One is that if I try to visualize one of the runs, the filters (e.g. for language) do not seem to work, and the results Web site shows the full data set. The other, more interesting, problem is that the fastest implementations in the JSON test for run 8de314d3-b970-4295-a3ca-3682f3905a3b scored around 1.9 million RPS, while in the later runs they hover around 1.1 million RPS. On the other hand, the plaintext results have increased from approximately 7 million to 9.2 million RPS (and the rest of the tests don't seem to be affected much). Is the reason for those differences well understood?

Best wishes,
Tony

msm...@techempower.com

Apr 20, 2018, 10:57:26 AM
to framework-benchmarks
Hi Tony,

We are aware of the filters panel being broken on visualization, and we will get that fixed up.

Run 8de314d3-b970-4295-a3ca-3682f3905a3b was performed while we were still on the experimental `docker` branch, meaning we had not yet merged into master and started any runs we would consider representative. I browsed the logs of the two runs for a bit, but did not see anything abnormal. My guess is that we still had some kinks in the tooling that we were working out. The next run (45a20e4b-1d25-461f-b841-ca2b0ebaf601) and beyond should be considered more representative, the toolset being more firmly tested and fleshed out.

I will say that there was a bug in the `wrk` script we were using prior to 45a20e4b-1d25-461f-b841-ca2b0ebaf601 and that MAY have produced results like you see in 8de314d3-b970-4295-a3ca-3682f3905a3b.

Anton Kirilov

May 12, 2018, 11:44:24 AM
to framework-benchmarks
Hello,

Another interesting bit that I noticed on the dashboard: vertx-postgres and vertx-web-postgres perform significantly better than the rest on some of the database benchmarks, and I have been curious to see what techniques allow them to achieve that. Apparently the reactive Postgres client they use supports command pipelining (up to 256 commands by default, which can be changed via the setPipeliningLimit() method of the PgPoolOptions class), and a quick check with Wireshark shows that the database server may return several results in a single TCP packet. Doesn't that conflict with the requirement that every query results in a full round-trip to the server?
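For intuition about why pipelining matters so much here, a toy latency model helps (all numbers below are made up for illustration, not TFB measurements): N sequential queries each pay a full round trip, while N pipelined queries share one.

```python
# Toy latency model comparing N sequential round trips with N pipelined
# queries. RTT_MS and SERVICE_MS are hypothetical illustrative values.
RTT_MS = 0.5       # assumed network round-trip time per exchange
SERVICE_MS = 0.05  # assumed per-query processing time on the server

def sequential_latency(n_queries: int) -> float:
    # Each query waits for its own full round trip plus service time.
    return n_queries * (RTT_MS + SERVICE_MS)

def pipelined_latency(n_queries: int) -> float:
    # All queries are sent back-to-back and share one round trip;
    # the server still processes each query in turn.
    return RTT_MS + n_queries * SERVICE_MS

print(f"sequential: {sequential_latency(20):.1f} ms")  # sequential: 11.0 ms
print(f"pipelined:  {pipelined_latency(20):.1f} ms")   # pipelined:  1.5 ms
```

With these assumed numbers, the 20-query case is dominated by round trips when run sequentially, which is exactly the cost the multiple-queries test is meant to exercise.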

Best wishes,
Tony

Michael Hixson

May 14, 2018, 10:46:27 PM
to Anton Kirilov, framework-benchmarks
Ah-ha! I was also wondering about vertx-postgres. I tried using the
same Postgres client with the Undertow framework and wasn't able to
get anything close to the same performance. If it's just a matter of
pipelining, and that's enabled by default, I think I should've gotten
similar results. I wonder what I did wrong...

Anyway, thanks for solving that puzzle. You're right, that does sound
like a violation of the round-trip requirement. Perhaps we can add a
setPipeliningLimit(1) line and call it a day. Now I wonder if other
implementations are doing this but they're slow enough (for other
reasons) that no one has noticed. It's a cool feature.

-Michael

Anton Kirilov

May 15, 2018, 7:32:49 PM
to framework-benchmarks
Yes, it is a cool feature, which the standard C client library, libpq, for example, does not support, unfortunately. As for Undertow, if it is not really prepared to deal with fully non-blocking operation on the database side (which the reactive Postgres client provides), then I doubt that it will benefit from switching the client. I am not really fluent in Java, but I had a quick look, and Undertow seems to be using blocking request handlers, which doesn't sound too encouraging to me. Anyway, all framework implementations should be on a level playing field.

Tony

Julien Viet

May 16, 2018, 3:39:23 AM
to Michael Hixson, Anton Kirilov, framework-benchmarks
Hi,

can you elaborate on why it violates Requirement #6?

each query is sent as an individual Postgres command (in the PG protocol) and has its own round trip to the database; besides, there is no query aggregation or use of batching.

in asynchronous libraries it is very common to implement it this way because it is more efficient, and that's what other databases do as well (like https://redis.io/topics/pipelining)

regards

Julien

Anton Kirilov

May 16, 2018, 12:05:16 PM
to framework-benchmarks
Hello,

> ... has its own round trip to the database...
This is the contentious part - it is very easy to check with Wireshark that the database returns several query results together in a single TCP packet. Consider the multiple queries test, for example - the behaviour I have seen when the parameter is 20 is that there are 20 packets from the client to the database, but there is one packet with a single result from the database to the client, and the other 19 results arrive together. So, I wouldn't say that each query has its own round trip with certainty - obviously it shares with the others to a certain extent. In fact, even the Redis page you have linked to clearly states that pipelining makes only the first command (in their example) pay the full RTT cost.

Anyway, it is a little bit of a gray area, hence I have asked for a clarification.

Tony

Daniel Nicoletti

May 16, 2018, 1:50:56 PM
to Anton Kirilov, framework-benchmarks
2018-05-16 13:05 GMT-03:00 Anton Kirilov <antonv...@gmail.com>:
> In fact, even the Redis page you have
> linked to clearly states that pipelining makes only the first command (in
> their example) pay the full RTT cost.

Surely the number of packets will be reduced, but to me this means it is an
improved implementation, not an attempt to cheat the test.

When the tests say you can't batch your queries, it is because the idea behind
it is that different clients will make different queries, say:
select name from users where id = :id;
where each client will have a different id, and thus you can't batch them.

What they are doing with pipelining is that, as each request arrives, they
do a query like that, but the queries pile up and are eventually sent to the
database. The number of TCP packets shouldn't matter: 20 requests created 20
queries, and this wouldn't be different in the real world. So I believe this is
totally valid, and in fact it is one of the things I plan to implement for my
framework as soon as I fix its async processing.



--
Daniel Nicoletti

KDE Developer - http://dantti.wordpress.com

INADA Naoki

May 16, 2018, 2:01:51 PM
to Daniel Nicoletti, Anton Kirilov, framework-benchmarks
This idea was declined, as far as I remember.
Imagine that one query depends on the result of the previous query. Pipelining shouldn't work in such a scenario.

2018年5月17日(木) 2:50 Daniel Nicoletti <dant...@gmail.com>:

Daniel Nicoletti

May 16, 2018, 2:06:25 PM
to INADA Naoki, Anton Kirilov, framework-benchmarks
Of course it would work: you will have a "callback" for the previous query, and
once it arrives you will do the second query, which will also be pipelined.
I'm not sure how this would work for transactions, but I believe that, if the
db allows it, you can have a transaction id to which queries keep arriving.

Still, this would be up to the db to support, and our tests don't
exercise query dependency.

Julien Viet

May 16, 2018, 5:34:20 PM
to Daniel Nicoletti, INADA Naoki, Anton Kirilov, framework-benchmarks
yes, I agree with you

when you have dependent queries, any implementation needs to wait for the response before sending the next query (that's an obvious fact)

from the perspective of the Postgres protocol there is no violation or misuse; the packets are simply sent more efficiently to make better use of TCP.

INADA Naoki

May 17, 2018, 2:11:10 AM
to Daniel Nicoletti, Anton Kirilov, framework-benchmarks
On Thu, May 17, 2018 at 3:06 AM Daniel Nicoletti <dant...@gmail.com> wrote:

> Of course it would work, you will have a "callback" for the previous one
> once it arrives you will do the second query, that will also be pipelined,
> I'm not sure how this would work for transactions, but I believe that if the
> db allows you can have a transaction id where queries go arriving.
>
> Still this would be up to the db to support that and in our test we don't
> exercise query dependency.


A test implementation should show performance as if each query depends on the
previous query. In other words, you must prevent pipelining.

In the case of vertx-postgres, you need to wait for the result of the previous
query before sending the next query. The current code seems to violate the regulation.

See also:
https://github.com/TechEmpower/FrameworkBenchmarks/issues/2204#issuecomment-241571077

Regards,

--
INADA Naoki <songof...@gmail.com>

Julien Viet

May 17, 2018, 3:12:22 AM
to INADA Naoki, Daniel Nicoletti, Anton Kirilov, framework-benchmarks
hi Inada,

this is a clarification that I was unaware of.

it should be written imho in the Requirements section and not be buried in a comment on GitHub :-)

So we should change the test so that it is written in a synchronous manner (with asynchronous code).

I will make a patch soon for the multiple queries benchmark.

that being said, I think the actual TFB setup really favors blocking / synchronous implementations, because either:

- benchmarks use up to 16k connections but no backend, or
- benchmarks use a back-end but don't use many concurrent requests (up to 256 afair), so there is effectively a 1-to-1 mapping between application server code and the backend connection pool, i.e. it is an ideal situation where each application server thread has a 1-to-1 mapping with a DB connection. As the database queries are trivial, this ends up with very small latency, giving very good results for synchronous servers.

In the real world you instead have more concurrent requests using a backend with smaller concurrency (when it's a non-multiplexed backend like Postgres or MySQL).

I believe it would be very interesting to have a couple of benchmarks with greater front-end concurrency that use a possibly asynchronous backend (like another HTTP server that fakes a small latency) or a database with a multiplexed protocol.

Julien

Michael Hixson

May 17, 2018, 4:02:32 AM
to Julien Viet, INADA Naoki, Daniel Nicoletti, Anton Kirilov, framework-benchmarks
Well, before you write that patch...

I don't agree that it's a requirement for the queries to be run
sequentially. Without looking, I'm guessing we have many
implementations that run the queries in parallel. I bet that most or
all of the JavaScript implementations are this way.

"Conditional branching queries" is just an example. The followup
discussion to that GitHub issue that happened on our mailing list
clarifies this. See this post especially:
https://groups.google.com/d/msg/framework-benchmarks/nePDNY9jp-4/VRc_YZx5FQAJ

-Michael

INADA Naoki

May 17, 2018, 4:36:58 AM
to michael...@gmail.com, jul...@julienviet.com, Daniel Nicoletti, Anton Kirilov, framework-benchmarks
On Thu, May 17, 2018 at 5:02 PM Michael Hixson <michael...@gmail.com>
wrote:

> Well, before you write that patch...

> I don't agree that it's a requirement for the queries to be run
> sequentially. Without looking, I'm guessing we have many
> implementations that run the queries in parallel. I bet that most or
> all of the JavaScript implementations are this way.

> "Conditional branching queries" is just an example. The followup
> discussion to that GitHub issue that happened on our mailing list
> clarifies this. See this post especially:

> https://groups.google.com/d/msg/framework-benchmarks/nePDNY9jp-4/VRc_YZx5FQAJ

> -Michael


OK, concurrent queries may be valid. But in that message:

"Another example reason is that you need to interact with N separate
databases on different servers."

Clearly, automatic pipelining shouldn't happen. N roundtrips must not be
batched or pipelined, explicitly or automatically.

--
INADA Naoki <songof...@gmail.com>

Daniel Nicoletti

May 17, 2018, 9:09:17 AM
to INADA Naoki, Michael Hixson, jul...@julienviet.com, Anton Kirilov, framework-benchmarks
Since you are quoting: "We want to exercise the framework and platform's
database driver code (and ORM, where applicable) repeatedly during the
scope of each request."

This is exactly exercising driver code. Interaction with N separate servers
still has nothing to do with pipelining: if your application needs to talk to
N servers, each query must be explicit about which server it will run on, and
this doesn't forbid pipelining. You will still pipeline to those servers.

I believe this might be an endless discussion if the TE guys don't step in
and make an explicit definition on this, maybe using some kind of voting...


Brian Hauer

May 17, 2018, 12:45:31 PM
to framework-benchmarks
We will ultimately make a decision on this and clarify the requirements.  We had wanted to solicit additional input in case there was a compelling argument that balanced the objectives of this test alongside the reality of the Postgres protocol's capability.  This isn't likely to be something that has a definitive solution, so I don't expect someone to arrive with a silver bullet.  Nevertheless, I'd like to keep this conversation open for a little longer before we make a decision since others may still join in.

Internally, we debated this for a while and arrived at a tentative agreement that we prefer prohibiting pipelining, because this test type is intended to exercise repeated full round-trips to an external database server, one per iteration.  The Postgres protocol's capability to pipeline those was a curve ball.  For the record, I was not previously even aware of the pipelining feature provided by the Postgres protocol.  It leads to theoretical questions such as, "What would we do if a new database platform arrived and the protocol pipelined by default?"  But as a practical matter, this test type serves both as an exercise of database connectivity and as a vague proxy for generic round-trip communication with an external service.  We are therefore instinctively resistant to any implementations that actively or passively avoid the network impact of a full round-trip per iteration.  That is the case despite the fact that this Postgres feature strikes me as fairly awesome and definitely something I would use.

In summary, we are leaning toward prohibiting pipelining to keep consistent with the original intent of this test type.  If there are more opinions, we'd like to hear them.  We'll aim to have a decision one way or the other by roughly the end of the week.  Also, bear in mind that any decision could be later reversed.

Daniel Nicoletti

May 17, 2018, 3:01:38 PM
to Brian Hauer, framework-benchmarks
2018-05-17 13:45 GMT-03:00 Brian Hauer <teona...@gmail.com>:
> "What would we do if a new database platform arrived and the protocol pipelined by default?"

I can also think that some future driver implementation might only
support doing pipelined requests.

IMO what should draw the line here is how close this is to real-world
apps. This isn't like the stripped implementations that were hand-crafted
to perform better; any user of such a driver will benefit without writing
different code.

My driver, Qt Sql, doesn't support this, and in fact its implementation
is a blocking one; I'd need to write a complete new module to do Sql
async. Still, thanks to knowing about this feature, I'm tempted to start
writing an improved implementation. This should be the feeling of other
FW authors that want better performance.

--
Daniel Nicoletti

KDE Developer - http://dantti.wordpress.com




Philip

May 17, 2018, 3:25:45 PM
to Daniel Nicoletti, Brian Hauer, framework-benchmarks
We are simulating requests from multiple independent clients. The idea is to see how many clients we can support at once. It doesn't make any sense to send multiple responses in the same packet because in a real scenario, they would be sent to different IPs.

It might be overkill, but we could try to spoof IPs for each client request. Then instead of a rule and a discussion about pipelining, we just have a structural feature of the testing that obviates the need or benefit of pipelining responses.


Daniel Nicoletti

May 17, 2018, 3:32:29 PM
to Philip, Brian Hauer, framework-benchmarks
2018-05-17 16:25 GMT-03:00 Philip <bran...@gmail.com>:
> We are simulating requests from multiple independent clients. The idea is to
> see how many clients we can support at once. It doesn't make any sense to
> send multiple responses in the same packet because in a real scenario, they
> would be sent to different IPs.

So you connect your client request directly to the database? Sorry, but
that doesn't make any sense.

> It might be overkill, but we could try to spoof IPs for each client request.
> Then instead of a rule and a discussion about pipelining, we just have a
> structural feature of the testing that obviates the need or benefit of
> pipelining responses.

I'm sorry, but IP doesn't matter at all in this issue. The framework will do
the multiplexing: you can have 10 different clients arriving at your FW
with different IPs; their queries will pile up in a TCP packet and be sent,
and when the DB sends the reply you know which client each query was for.
If you don't, your framework is just broken.

Julien Viet

May 17, 2018, 5:28:37 PM
to Brian Hauer, framework-benchmarks
Hi Brian,

in the case you opt for non-pipelining, I believe that the benchmark should actually be modified to create a data dependency, so that each next query has to effectively wait for the database response before being executed.

This way it would universally work for all frameworks and guarantee the semantics the benchmark exhibits, whatever the technology used.

regards

Julien


INADA Naoki

May 17, 2018, 8:22:06 PM
to Daniel Nicoletti, Michael Hixson, jul...@julienviet.com, Anton Kirilov, framework-benchmarks
> > OK, concurrent query may be valid. But in that message,
> >
> > "Another example reason is that you need to interact with N separate
> > databases on different servers."
> >
> > Clearly, automatic pipelining shouldn't happened. N roundtrips must not be
> > batched or pipelined explicitly or automatically.
>
> Since you are quoting: "We want to exercise the framework and platform's
> database driver code (and ORM, where applicable) repeatedly during the
> scope of each request."
>
> This is exactly exercising driver code. Interaction with N separate servers
> still has nothing to do with pipelining, if your application needs to talk to N
> servers each query must be explicit on which server it will run, and this doesn't
> forbid pipelining. You will still pipeline to those servers.


How can it use pipelining?
Doesn't pipelining mean packing multiple queries in one TCP (or UDP) packet?

--
INADA Naoki <songof...@gmail.com>

INADA Naoki

May 17, 2018, 8:35:00 PM
to Daniel Nicoletti, teona...@gmail.com, framework-benchmarks
On Fri, May 18, 2018 at 4:01 AM Daniel Nicoletti <dant...@gmail.com> wrote:

> 2018-05-17 13:45 GMT-03:00 Brian Hauer <teona...@gmail.com>:
> > "What would we do if a new database platform arrived and the protocol
pipelined by default?"

> I can also think that, some future driver implementation might only
> support doing pipelined requests.

My opinion is:

* If the DB doesn't support transactions, or the protocol supports
multiplexing concurrent transactions, it's OK to multiplex (and pipeline)
queries from concurrent requests.

* For queries in one HTTP request, the previous result must be read before
sending the next query. Pipelining queries within a single HTTP request is
not allowed.

--
INADA Naoki <songof...@gmail.com>
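The distinction INADA draws here can be sketched with a fake async "database" (all names below are hypothetical, not a real driver API): awaiting each result before issuing the next query forces one round trip per query, while issuing them all at once leaves an async driver free to pipeline them onto the wire.

```python
import asyncio

async def fake_query(i: int) -> int:
    # Stand-in for a database query; asyncio.sleep(0) mimics async I/O.
    await asyncio.sleep(0)
    return i * 2

async def sequential(n: int) -> list:
    # Await each result before sending the next query: one full
    # round trip per query, as the proposed rule requires.
    results = []
    for i in range(n):
        results.append(await fake_query(i))
    return results

async def concurrent(n: int) -> list:
    # Issue all queries at once; an async driver is then free to
    # pipeline them onto the wire, which is the behavior under debate.
    return list(await asyncio.gather(*(fake_query(i) for i in range(n))))

print(asyncio.run(sequential(3)))  # [0, 2, 4]
print(asyncio.run(concurrent(3)))  # [0, 2, 4]
```

Both styles produce the same results; they differ only in how many round trips the wire actually sees, which is exactly what the benchmark rule would constrain.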

Daniel Nicoletti

May 17, 2018, 10:06:10 PM
to INADA Naoki, Michael Hixson, jul...@julienviet.com, Anton Kirilov, framework-benchmarks
Yes, the number of packets is up to the OS to decide,
but why is that a problem with multiple DB servers?

You seem to think that one can't know which query in the pile
of queries belongs to which HTTP request.

Pipelining can reduce the number of TCP packets, but IMO
most importantly it can keep the DB server busier, and that
is the reason why I believe this should be allowed.

Daniel Nicoletti

May 17, 2018, 10:11:27 PM
to INADA Naoki, Brian Hauer, framework-benchmarks
2018-05-17 21:34 GMT-03:00 INADA Naoki <songof...@gmail.com>:
> On Fri, May 18, 2018 at 4:01 AM Daniel Nicoletti <dant...@gmail.com> wrote:
>
>> 2018-05-17 13:45 GMT-03:00 Brian Hauer <teona...@gmail.com>:
>> > "What would we do if a new database platform arrived and the protocol
>> > pipelined by default?"
>
>> I can also think that, some future driver implementation might only
>> support doing pipelined requests.
>
> My opinion is:
>
> * If the DB doesn't support transaction or the protocol support
> multiplexing concurrent transactions,
> it's OK to multiplexing (and pipelining) queries from concurrent request.

But this is what is being done. Queries from many different HTTP clients
are processed and placed on the socket buffer to be sent; there is no query
dependency because each query comes from a different client.

After all, the DB tests are all run without HTTP pipelining.

> * For queries in one HTTP request, previous result must be read before
> sending next query.
> Pipelining queries in single HTTP request is not allowed.

The only test where one HTTP request generates multiple queries
is the "Multiple Queries" test. And this one explicitly allows for batching.

Daniel Nicoletti

unread,
May 17, 2018, 10:18:03 PM5/17/18
to INADA Naoki, Brian Hauer, framework-benchmarks
Even SMTP servers support pipelining. Forbidding it
would penalize async frameworks: as opposed to a sync
framework, which executes the query immediately, an async
framework puts the query on a queue and only processes it
when control returns to the event loop. Forbidding pipelining
essentially makes async frameworks slower.

b...@facil.io

unread,
May 17, 2018, 11:03:44 PM5/17/18
to framework-benchmarks
Although my framework avoids pipelining in its Redis implementation (which supports pipelining), I believe pipelining should be allowed.

While my framework prefers to validate receipt of each query / command before sending the next query / command (minimizing the risk of repeated commands in cases of disconnection), others might prefer speed over connection validation.

I'm not sure which should be preferred, but I'm pretty sure I would like to see the price I'm paying on the dashboard.

Kindly,
   B.

INADA Naoki

unread,
May 18, 2018, 1:15:31 AM5/18/18
to b...@facil.io, framework-benchmarks
In the case of Redis, I think it's OK to enable pipelining.
Redis transactions are very different from RDB transactions,
and comparing Redis performance against RDB performance is nonsense.

On the other hand, we have prohibited batching / pipelining for a long time.
We have many frameworks, and the point is to **compare** their performance.
In this project, fairness is important.

If you want to measure peak performance, you can do it yourself.
But in this project, with the existing test scenarios, I'm a strong -1 on allowing it.

Daniel Nicoletti

unread,
May 18, 2018, 8:53:04 AM5/18/18
to INADA Naoki, Brian Hauer, framework-benchmarks
2018-05-17 23:11 GMT-03:00 Daniel Nicoletti <dant...@gmail.com>:
> The only test where one HTTP request generates multiple queries
> is the "Multiple Queries" test. And this one explicit allows for batching.

Hmm, sorry, my bad - the only test that allows for batching is /updates:

"Use of batch updates is acceptable but not required. To be clear:
batches are not permissible for selecting/reading the rows, but
batches are acceptable for writing the updates."

Brian Hauer

unread,
May 18, 2018, 11:08:46 AM5/18/18
to framework-benchmarks
Incidentally, I (and I think others) feel some regret about the prior concession that we made to allow implementations of the Updates test type to use a batch for writes.  It's not a huge regret—not sufficient to yet consider rolling it back.  But it's there, and we're reminded of it when we consider the underlying bifurcation in the Updates results between those implementations that use individual round-trips to write (as originally conceived) and those that use a batch operation to write.

That colors my current perspective on this matter: will an additional concession eventually be something we regret?  On the flip side, enabling pipelining in communication with external services is probably a good idea outside of our context—I had an immediate reaction of "that's awesome, I want that!" when I read about it.

Whatever the outcome, remember that in order to balance providing a useful proxy for real world applications while also allowing for widespread implementation, we have formulated constraints that, taken out of context, seem funny: Who wouldn't use batching for this? Who wouldn't use a single SELECT...IN query? Who needs all of these meaningless random numbers from a database anyway? :)

I think it comes down to two issues:
  1. How much we believe the individual network round-trips per iteration were intended to be an immutable characteristic of this test type.  That was what we had in mind, but how immutable did we really intend this to be?  We have previously disallowed implementations that use SQLite because doing so puts the database local/in-process and avoids all network traffic.
  2. We already have batch updates versus non-batch updates in the Updates test causing an invisible confounding variable in the results, and we believe it's non-trivial.  Are we comfortable with another similar confounding variable, and one that affects other test types (not just Updates)?  With some work, we could add attributes to distinguish these approach variances (e.g., something that succinctly says "Realistic with Database Pipelining").

Michael Hixson

unread,
May 22, 2018, 1:43:55 PM5/22/18
to Brian Hauer, framework-benchmarks
Thanks for your comments, everyone.

We (TechEmpower) have decided to allow pipelining between the
application and database. We'll clarify the requirements to call this
out specifically.

Pipelining does avoid some of the per-request network overhead that we
originally intended to be part of all the database tests. But what it
avoids is just that: overhead. It uses the network more efficiently
while still doing the essential work. It issues a query over the
network to a database server and receives a result back. We don't
care what the individual packets look like.

Also, pipelining doesn't "take advantage" of our test scenarios
relative to real scenarios. As far as we can tell, pipelining would
be just as effective in a real application doing many different kinds
of queries - the kind of application our tests are meant to simulate.
It doesn't care about the particular structure of our queries. It
works just as well in the multiple-query test regardless of whether
the queries depend upon results from previous queries, as individual
queries from multiple HTTP requests can still be pipelined together.

We like to think that this project encourages web frameworks and
related tools to be faster. Disallowing this form of pipelining would
seem to do the opposite. It's a novel and useful feature that we'd
like to see appear in more database drivers. Hopefully, by allowing
pipelining in our test implementations, we're pushing towards that
eventual outcome. We're willing to accept that, at least for the time
being, our database test results may show a large split between
implementations that use pipelining and those that don't, heavily
favoring pipelining.

Decisions about requirements like this are never final, but this is
the decision we're running with for now.

-Michael

Anton Kirilov

unread,
May 22, 2018, 6:25:53 PM5/22/18
to framework-benchmarks
Hello,

Thanks for the clarification! Note that this decision may create some further controversial cases: For example, the PostgreSQL protocol allows multiple statements in a simple query ( https://www.postgresql.org/docs/devel/static/protocol-flow.html#PROTOCOL-FLOW-MULTI-STATEMENT ), and if each statement is surrounded by "BEGIN;" and "COMMIT;", then this is functionally quite similar to a pipeline. A clever asynchronous/non-blocking database driver that keeps a queue of queries to execute could automatically merge several of them before the next communication with the database, and the observable behaviour will probably be indistinguishable from true pipelining unless one inspects the traffic with Wireshark. In fact, IMHO it is feasible to implement that with, for instance, libpq, which doesn't support pipelining per se.
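The thought experiment above might look like this as a sketch (libpq has no such helper; `merge_queued` is hypothetical): a driver drains its queue of pending statements and joins them into one simple-query string, wrapping each in BEGIN/COMMIT so every statement keeps its own transaction boundary despite sharing a round-trip.

```python
def merge_queued(statements):
    # Wrap each queued statement in its own BEGIN/COMMIT so the combined
    # simple-query string preserves per-statement transaction boundaries.
    return " ".join(f"BEGIN; {s.rstrip(';')}; COMMIT;" for s in statements)

queue = [
    "SELECT * FROM world WHERE id = 4",
    "SELECT * FROM world WHERE id = 9",
]
print(merge_queued(queue))
```

As noted above, only traffic inspection (or transaction-aware logging) would distinguish this from true protocol-level pipelining.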

More importantly, I would like to report another issue - the cached queries test type seems to be currently broken because the parameter specifying the number of queries is not set, i.e. the URL ends with "?queries=". The last good run is 8de314d3-b970-4295-a3ca-3682f3905a3b ( https://tfb-status.techempower.com/results/8de314d3-b970-4295-a3ca-3682f3905a3b ).

Best wishes,
Tony

rik...@ngs.hr

unread,
May 24, 2018, 5:40:07 PM5/24/18
to framework-benchmarks
Hi Michael,


On Tuesday, May 22, 2018 at 7:43:55 PM UTC+2, Michael Hixson wrote:
Thanks for your comments, everyone.

Pipelining does avoid some of the per-request network overhead that we
originally intended to be part of all the database tests.  But what it
avoids is just that: overhead.  It uses the network more efficiently
while still doing the essential work.  It issues a query over the
network to a database server and receives a result back.  We don't
care what the individual packets look like.


The Multiple Queries and Updates tests are bound by network roundtrips.
Optimized frameworks are mostly idling in those tests.
 
We like to think that this project encourages web frameworks and
related tools to be faster.  Disallowing this form of pipelining would
seem to do the opposite.  It's a novel and useful feature that we'd
like to see appear in more database drivers.  Hopefully, by allowing
pipelining in our test implementations, we're pushing towards that
eventual outcome.  We're willing to accept that, at least for the time
being, our database test results may show a large split between
implementations that use pipelining and those that don't, heavily
favoring pipelining.

 
Regards,
Rikard

Michael Hixson

unread,
May 24, 2018, 5:58:42 PM5/24/18
to rik...@ngs.hr, framework-benchmarks
On Thu, May 24, 2018 at 2:40 PM, <rik...@ngs.hr> wrote:
>
> Does this mean Revenj can now enable application level pipelining:
> https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/CSharp/revenj/Revenj.Bench/Context.cs#L32
> ?

I don't know what Revenj does there. If I enabled that commented-out
code and I enabled postgres's query log, would the query log show a
separate "SELECT * FROM world WHERE id = ?" statement for each query,
or would it show a bulk query like "SELECT * FROM world WHERE id IN
...?" If it's the latter, then no, that is still disallowed.

-Michael

rik...@ngs.hr

unread,
May 24, 2018, 6:03:32 PM5/24/18
to framework-benchmarks
It would show neither of those.
It would show an aggregation of the specified queries in a single complex query, meaning that if you request

1) select * from world where id = 1
2) select * from fortunes
3) select * from world where id in (2,3)

it would show one query which executes all those expressions in a single roundtrip.
Note that such an expression is nontrivial to parse, so it's not practical to write it manually.

Regards,
Rikard

Michael Hixson

unread,
May 24, 2018, 6:27:36 PM5/24/18
to rik...@ngs.hr, framework-benchmarks
Based on that description, I would say no, we still don't allow that.
We decided to allow pipelining as a network-level optimization, but we
still do not allow bulk selects to occur at the database level.

-Michael

rik...@ngs.hr

unread,
May 24, 2018, 6:56:54 PM5/24/18
to framework-benchmarks
Ok. Tnx for the clarification.
It's a shame, though, that only certain kinds of real-world optimizations are allowed.
Especially since an optimization which cannot be replicated by any other framework
or manual implementation (in a generic manner) is disallowed ;(

Anyway, regarding Vertx implementation:

"It works just as well in the multiple-query test regardless of whether
the queries depend upon results from previous queries, as individual
queries from multiple HTTP requests can still be pipelined together."

This is not true: if their implementation were forced to respect dependencies across the
request/response, that pipeline would not be as effective.
Also, it's not really true that you will benefit from pipelining across different HTTP requests,
because sharing a connection across different HTTP requests is not common at all.

Regards,
Rikard

Shay Rojansky

unread,
Jun 2, 2018, 4:47:56 AM6/2/18
to framework-benchmarks
Hi Michael, Shay here, the maintainer of the PostgreSQL .NET driver (Npgsql).

I'd just like to make sure with regards to one thing. Npgsql allows you to pipeline (or batch) several statements in a single roundtrip by including them in the same "command" (commands are a .NET database API concept, not a PostgreSQL concept):

var cmd = new NpgsqlCommand("SELECT 1; SELECT 2", connection);
var reader = cmd.ExecuteReader();

Npgsql implements this by parsing the provided string client-side, splitting on the semicolon, and sending separate protocol messages for each query (potentially combined in a single TCP packet). From the PostgreSQL perspective these are two completely different SQL statements, and will show up as such in the logs; it looks exactly the same as executing two commands with two separate API calls, apart from the fact that the latter would result in two roundtrips (and will be much slower). In other words, the above seems to correspond to what you've defined as a pure "network-level optimization", rather than a database-level bulk select (where a single complex SQL statement is involved).

There have been some doubts about this, since the Npgsql API for this involves packing the two SQL statements into a single string (command), so I'd like to be crystal clear that this is allowed. Again, the string which contains "SELECT 1; SELECT 2" does not seem to be a "single complex query": it is broken down client-side and sent as two completely separate PostgreSQL queries.

Thanks for your time and feedback!

Anton Kirilov

unread,
Jun 2, 2018, 8:34:32 AM6/2/18
to framework-benchmarks
Hi Shay,

I don't know how much this applies to Npgsql, but if you do the same with libpq (the C driver), i.e. send several statements separated by semicolons, then all of them will be executed in a single transaction, which is definitely not a network-level optimization. Actually, I was discussing the same thing in my previous message in this thread, and the work-around is to use explicit "BEGIN;" and "COMMIT;" statements, but I am also not sure how acceptable that would be.

On an unrelated note, the cached queries test still seems to have problems, and I have opened an issue on GitHub, but I don't know if it has received any attention:

Best wishes,
Tony

Michael Hixson

unread,
Jun 2, 2018, 1:56:28 PM6/2/18
to framework-benchmarks
We stopped short of putting this detail in the requirements text, but
maybe we'll need to do that after all.

For the kind of "pipelining" we do allow, it was my intent that
multiple queries do NOT get batched together in such a way that the
batching itself affects query semantics at all, especially with
respect to failed queries. For example, what would happen if you
changed that Npgsql command to "SELECT 1; SELECT foo bar; SELECT 2",
where "foo bar" is nonsense that will cause that middle SELECT to
fail? Will the "SELECT 2" query still be executed? If not, then that
sort of batching violates requirements as I see them. Why? Because
that's exploiting simplifications in the test scenarios in a way that
goes against the spirit of the tests, where the same optimization
wouldn't necessarily be a good idea in the kind of real production
applications we're trying to simulate. Meanwhile, as I understand the
pipelining that vertx-postgres is doing, there's no reason not to
enable that kind of pipelining.

The Postgres JDBC driver also allows for multiple queries separated by
semicolons, each producing its own result set. I tried this with one
of the Java frameworks in TFB, where I had the multi-query endpoint
construct its mega-query like this:

    String sql = Strings.repeat(
        "select * from world where id = ?;",
        getQueryCount(httpRequest));

The performance was something like 3x better that way than it was when
it made each query separately.

I didn't notice a difference in Postgres's query logs until I changed
the "log_line_prefix" setting, and then I saw that the individual
queries in each mega-query were sharing the same "virtual transaction
id":
https://www.postgresql.org/docs/9.5/static/runtime-config-logging.html#GUC-LOG-LINE-PREFIX

That's another thing you could look for if you try this with Npgsql.
vertx-postgres's queries do not share a virtual transaction id, for
what it's worth.

Anton, when I first read your previous message about using "BEGIN" and
"COMMIT" I didn't understand what you were getting at. Were you
basically trying to isolate failures? When I tried this with Java,
the queries each then had a separate virtual transaction id, but the
"SELECT foo bar" failure still killed the remainder of the mega-query.
I was left confused about how that works. What does Postgres call
that unit of execution if not a transaction, and is there any way to
identify it in the logs?

If the explanation about query semantics and failures seems really
lame to most people, if the general feeling in the community becomes
that we're trying to contort the requirements to favor vertx-postgres
(or the reactive-pg-client library that it uses), that could cause us
to reverse our decision to allow pipelining. I thought the
"network-level optimization" text and concept was sufficient, but
perhaps it is not.

---------------

With regard to the cached queries test, Anton: thanks for reporting
those issues. We'll get to them eventually. That test type is still
not displayed on the main results website, so issues with that test
type's requirements or implementation haven't been super high priority
for us.

-Michael

Anton Kirilov

unread,
Jun 2, 2018, 8:57:35 PM6/2/18
to framework-benchmarks
Hi Michael,

What I was trying to do with the "BEGIN" and "COMMIT" trick was to come up with a way to achieve the effects of pipelining without actually using pipelining (so that approach would work without special support in the database driver), while at the same time preserving the query semantics. As far as I can tell, the transaction semantics are the same, but I must admit that I forgot about the failure case. I suppose that the clever database driver I was talking about could replay the queries that came after the failed one (so in the worst case the number of packets sent would be the same as without any query combining or pipelining, though the transferred data would be larger), but, anyway, it was mainly a thought experiment (and one that would not be acceptable in the project, it seems). As for the term that PostgreSQL uses for that unit of execution - I don't have much more information than what is written in the documentation page I have linked to before.

Concerning the cached queries test, are there any plans to make it visible on the results Web site?

Best wishes,
Tony

Shay Rojansky

unread,
Jun 4, 2018, 11:48:07 AM6/4/18
to framework-benchmarks
Thanks for the details Michael and Anton. I'd definitely add more documentation on this requirement.

But I'd like to better understand the reasoning behind this, as it seems a bit odd to me - why should the error-handling semantics of a batch of statements have any influence on whether batching is permitted in a benchmark? I do understand the reasons behind disallowing "SELECT * FROM world WHERE id IN (x, y, z)", as this reduces multiple SQL statements into one, and may or may not be supported across database types. However, with batching we really are sending several distinct SQL statements.

I'll go into some details even though it may be a bit excessive. To implement batching/pipelining, Npgsql sends wire protocol messages in a way which causes later messages to be skipped after an error. A single SQL statement is sent to PostgreSQL via the following chain of messages: Parse/Bind/Describe/Execute/Sync. If any message provokes an error (say the Parse one in this example, because of malformed SQL), PostgreSQL will skip all messages until the next Sync, allowing the client and server to continue communicating. A batch is sent by sending the following chain: Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync. The single Sync at the end produces the same behavior: if any error occurred with any messages, everything else is skipped up to the Sync. Npgsql's batching *could* have been implemented by inserting a Sync after Execute1, reversing the behavior and allowing later queries to execute even after failure. It's quite odd for this to be a decisive factor in deciding whether batching is allowed or not...
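The skip-until-Sync behaviour described above can be modelled with a toy simulation (this is not real protocol code: the Parse/Bind/Describe/Execute chain per statement is collapsed into a single `Execute:<id>` token for brevity): after an error, every message up to the next Sync is skipped, so the placement of Sync messages decides whether later statements survive an earlier failure.

```python
# Toy model of PostgreSQL's extended-protocol error handling (simplified).
def process(messages, failing):
    executed, skipping = [], False
    for msg in messages:
        if msg == "Sync":
            skipping = False        # Sync resynchronizes the connection
            continue
        if skipping:
            continue                # skipped: an earlier statement failed
        stmt = msg.split(":")[1]
        if stmt in failing:
            skipping = True         # error: skip everything to the next Sync
        else:
            executed.append(msg)
    return executed

# Batch with a single trailing Sync: a failure in statement 2 kills statement 3.
batch = ["Execute:1", "Execute:2", "Execute:3", "Sync"]
print(process(batch, failing={"2"}))      # ['Execute:1']

# A Sync after each Execute isolates the failure: statement 3 still runs.
isolated = ["Execute:1", "Sync", "Execute:2", "Sync", "Execute:3", "Sync"]
print(process(isolated, failing={"2"}))   # ['Execute:1', 'Execute:3']
```

Both variants cost one round-trip in this model; only the error semantics differ, which is exactly the point being argued.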

To summarize, pipelining (obviously) has a decisive impact on the performance results of any benchmark. If the intent is to isolate single-query performance - which is completely fine and implies disabling pipelining completely - the best way seems to be to send queries which depend on the previous query's return values. On the other hand, if you want to allow pipelining where it's supported, why not allow it regardless of what happens when a failure occurs in the batch? Put another way, if I now changed Npgsql's behavior to insert a Sync message in the middle of the above batch, does it make sense for it to "start performing much better" because its pipelining starts to qualify for your benchmark?

Michael Hixson

unread,
Jun 5, 2018, 7:27:49 PM6/5/18
to Anton Kirilov, framework-benchmarks
On Sat, Jun 2, 2018 at 5:57 PM, Anton Kirilov <antonv...@gmail.com> wrote:
>
> Concerning the cached queries test, are there any plans to make it visible
> on the results Web site?
>

I don't think there are any concrete plans, someone just needs to do
it. I'd say that round 17 would be a reasonable target for this, but
then again we don't have anyone on TFB full time right now, so maybe
not.

-Michael

Michael Hixson

unread,
Jun 5, 2018, 8:30:44 PM6/5/18
to Shay Rojansky, framework-benchmarks
Hi Shay,

Thanks for giving us some of those low-level details from the database
driver perspective.

I agree that it would be odd for us to permit a batch statement like
"select * from world where id = ?; select * from world where id = ?;"
only on the condition that it kept executing through failures. That's
not what I meant to imply in my previous message. I would want to
disallow that form of batching regardless of how failures are handled.
That feature seems clearly different to me than the pipelining
implemented in reactive-pg-client (which vertx-postgres uses). I take
it we're having a disconnect there - you don't think those two
features are different, at least not in any way that we should care
about?

Since a few people have suggested making queries depend on previous
queries (including the author of reactive-pg-client?), let's explore
that a little bit. That would be a change to the multi-query test
requirements, so you would not expect that to affect the single-query
results at all, correct? Therefore, vertx-postgres would likely
remain in first place by a large margin in the single-query test, with
the key difference between it and other frameworks being pipelining?
In that theoretical future, I would expect vertx-postgres to remain in
first place in the multi-query test as well. It wouldn't be able to
issue queries #1-20 concurrently within the scope of an individual
HTTP request, but the wrk client would be making 512 concurrent HTTP
requests, which is more than enough to saturate that database driver's
current/default pipelining limit of 256. And I don't expect the
dependent queries / independent queries requirement to affect the
throughput of other frameworks either, at least not significantly.
Latency numbers might change some but I wouldn't be surprised if they
stayed the same just because of where current performance bottlenecks
for these tests are. Do you agree with my guesses about how the
dependent query requirement would affect results?

The dependent query requirement would prevent people from writing
semicolon-separated batch queries, but it wouldn't prevent
vertx-postgres from pipelining. That's kind of where I want to end
up, except I'd rather not expend all that development capital on
changing all the implementations. I'd rather clarify the requirements
in a way that satisfies everyone and where all the existing
implementations are compliant already. That's what I tried to do with
our recent small edit to the requirements, but apparently I didn't do
a good enough job.

I hope that better explains where I'm coming from. Let me know if I'm
not making sense.

-Michael

Shay Rojansky

unread,
Jun 6, 2018, 4:57:11 AM6/6/18
to framework-benchmarks
Hi Michael, thanks for the attention, this is quite an interesting discussion.

Thanks for giving us some of those low-level details from the database
driver perspective.

Sure thing, I'll be happy to provide any more information if you guys need it - don't hesitate to ask.

I agree that it would be odd for us to permit a batch statement like
"select * from world where id = ?; select * from world where id = ?;"
only on the condition that it kept executing through failures.  That's
not what I meant to imply in my previous message.  I would want to
disallow that form of batching regardless of how failures are handled.
That feature seems clearly different to me than the pipelining
implemented in reactive-pg-client (which vertx-postgres uses).  I take
it we're having a disconnect there - you don't think those two
features are different, at least not in any way that we should care
about?

OK, so from the paragraph above, would it be true to say that it's less the error semantics that poses a problem for you guys, and more the fact that the two statements in question are executed via an explicit batching API, rather than simply by calling the regular, single-statement API twice? That does shift the discussion a bit.

To answer your question, yes - those features seem (almost) equivalent to me, and specifically I don't understand why we'd want to exclude one and not the other from a given benchmark (again, excluding both seems totally fine). To make sure we're synchronized on naming, I'll call the former batching (some sort of driver API is used to send multiple statements as a batch), and the latter pipelining (the regular single-statement driver API is used, but a second statement can be sent before the first one has completed and been consumed).

My first reason for thinking the above are equivalent is that in the context of a single HTTP request, they produce the same thing on the wire (modulo the error semantics which we've already put aside); I tend to see them as two different APIs for producing the same result, which is to send two statements at once, without waiting for the first to complete before sending the second. From this point of view, it seems that you are disallowing a specific type of API for doing something while allowing another.

Now, I do admit that pipelining and batching aren't exactly identical. Specifically, in a scenario where a program listens to requests (of some sort) as they come in and executes database operations for each one, pipelining does have an obvious advantage - database requests can be sent (enqueued) whenever needed, whereas with batching you can pack several statements up-front, but then you're blocked and have to wait for the batch to complete before sending any more. However, in the context of an HTTP request which produces multiple database operations, this advantage doesn't come into play: when the HTTP request arrives, we fully know the database operations we'll need to perform, and can batch them just as well as pipeline them - there's no difference. So while batching and pipelining aren't exactly the same, they do produce the same effect in the context of our benchmarks, unless I'm mistaken. It's definitely possible to imagine another kind of benchmark where multiple HTTP requests are sent gradually by the client, each triggering a database request as it comes in, at which point the pipelining vs. batching difference would become apparent - but that seems to me like a completely different benchmark.

(Note that in addition to batching and pipelining, there's what I'd call "multiplexing", which is sending database statements from *multiple* clients (e.g. from different HTTP requests) over the same physical connection(s). I have no idea exactly what vertx-postgres does and whether it multiplexes.)

I think we may have a slight terminology mismatch here (or I may be misunderstanding entirely)... In my mind, dependent queries would prevent what I refer to as pipelining: since we have to wait for the result of query A in order to send query B, how can we pipeline both of them? Maybe you're thinking of pipelining across multiple "clients" (HTTP requests), i.e. what I referred to above as multiplexing?

To be clear, I'm not saying that you should necessarily switch to dependent queries - the question is what you want to benchmark. If you want to measure one-statement-one-roundtrip, i.e. disabling any form of batching/pipelining/multiplexing, then dependent queries seem like a good way to achieve that (but not the only way), at least in part (you may also need to disable multiplexing in vertx or wherever that's supported). If you want to allow multiple-statements-per-roundtrip, then of course dependent queries don't make sense, but I also don't see the sense of disallowing batching while allowing pipelining.

Anton Kirilov

unread,
Jun 9, 2018, 10:10:21 AM6/9/18
to framework-benchmarks
Hi Shay,

As far as I can tell, the reactive Postgres client as used by Vert.x does both multiplexing and pipelining (using your terminology); that is, HTTP requests and database connections are completely decoupled (and it doesn't matter whether an HTTP request results in one query or more), which is the reason why it has the best performance in all database tests except fortunes, single query in particular. However, in the updates test it seems that those techniques provide just a slight edge. That's why requiring dependent queries for the multiple queries test would not be sufficient in general.

As for the fortunes test, I suspect that the reason for the difference in behaviour is the fact that the data volume associated with each query is an order of magnitude larger.

For the record, the h2o implementation (which I am the author of) also decouples HTTP requests and database connections completely, and given that one Web application worker thread may handle several database connections at the same time (this is the current configuration in fact), it is quite possible to end up in a situation where the queries from one HTTP request are spread over many connections to the database (in the multiple queries test).

The reactive Postgres client also uses another optimization - as you have mentioned, the protocol messages sent for each query are Parse/Bind/Describe/Execute/Sync. Now, I am not that familiar with the PostgreSQL protocol, but the Parse message is used only when creating a new prepared statement, right? Well, by default prepared statements have the same lifetime as the connection, which means that message is necessary only once per prepared statement (when establishing the connection). In other words, each query can result in Bind/Describe/Execute/Sync. However, I couldn't find any specification that requires the Describe message, so we can reduce that further to only Bind/Execute/Sync, which is what the reactive Postgres client does.
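The message-count argument above, as simple arithmetic (assuming, as the paragraph does, that Parse happens once per connection and that Describe can be skipped for a known statement):

```python
# Message sequences per query (names from the PostgreSQL extended protocol):
naive    = ["Parse", "Bind", "Describe", "Execute", "Sync"]  # new statement each time
prepared = ["Bind", "Describe", "Execute", "Sync"]           # Parse done once at connect
minimal  = ["Bind", "Execute", "Sync"]                       # Describe skipped as well
print(len(naive), len(prepared), len(minimal))  # 5 4 3
```

So the per-query protocol overhead drops from five messages to three, independent of any pipelining.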

> Now, I do admit that pipelining and batching aren't exactly identical. Specifically, in a scenario where a program listens
> to requests (of some sort) as they come in and executes database operations for each one, pipelining does have an
> obvious advantage - database requests can be sent (enqueued) whenever needed, where with batching you can pack
> several statements up-front, but then you're blocked and have to wait for the batch to complete before sending any more.
Yes, but with batching you can enqueue several incoming requests, and when the current batch completes, send all requests in the queue as a single batch, and so on - in fact, this is what I alluded to with my example of a clever driver above, and I expect that in the limit (i.e. many incoming requests) batching will behave quite similarly to pipelining.
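As a rough model of that queue-then-batch behaviour (hypothetical code, not any particular driver - I'm just modeling arrivals as a schedule of what comes in while each batch executes):

```python
def batched_drain(arrival_schedule):
    """Hypothetical model of the batching loop described above:
    arrival_schedule[t] holds the requests that arrive while batch t
    is executing; everything queued when a batch completes is sent
    together as the next batch."""
    if not arrival_schedule:
        return []
    queue = list(arrival_schedule[0])
    batches = []
    t = 1
    while queue:
        batches.append(queue)  # send the whole queue as one batch
        # requests that arrived during that batch form the next queue
        queue = list(arrival_schedule[t]) if t < len(arrival_schedule) else []
        t += 1
    return batches
```

The key point is that however many requests arrive while a batch is in flight, they all go out together as the next batch, so under sustained load the number of round trips grows much more slowly than the number of requests.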

Best wishes,
Tony

P.S. Concerning the fortunes test, previously I was asked to provide some additional verification for the h2o implementation. I posted a comment in the pull request where the request originated, but there were no replies, so I don't know if anyone noticed it.

Shay Rojansky

Jun 9, 2018, 11:58:43 AM6/9/18
to Anton Kirilov, framework-benchmarks
Hi Anton, thanks for the details! My answers below.

As far as I can tell, the reactive Postgres client as used by Vert.x does both multiplexing and pipelining (using your terminology); that is, HTTP requests and database connections are completely decoupled (and it doesn't matter whether an HTTP request results in one query or more), which is the reason why it has the best performance in all database tests, single query in particular, except fortunes. However, in the updates test it seems that those techniques provide just a slight edge. That's why requiring dependent queries for the multiple queries test would not be sufficient in general.

As for the fortunes test, I suspect that the reason for the difference in behaviour is the fact that the data volume associated with each query is an order of magnitude larger.

OK, thanks for clarifying that. I don't really see why pipelining/multiplexing should provide more of a benefit for selecting vs. for updating... It may be that in the case of updates the perf improvement is negligible since the update itself takes a long time...

For the record, the h2o implementation (which I am the author of) also decouples HTTP requests and database connections completely, and given that one Web application worker thread may handle several database connections at the same time (this is the current configuration in fact), it is quite possible to end up in a situation where the queries from one HTTP request are spread over many connections to the database (in the multiple queries test).

OK. I'll be looking at the possibility to evolve the Npgsql driver in this direction (https://github.com/npgsql/npgsql/issues/1982).

The reactive Postgres client also uses another optimization - as you have mentioned, the protocol messages sent for each query are Parse/Bind/Describe/Execute/Sync. Now, I am not that familiar with the PostgreSQL protocol, but the Parse message is used only when creating a new prepared statement, right? Well, by default prepared statements have the same lifetime as the connection, which means that message is necessary only once per prepared statement (when establishing the connection). In other words, each query can result in Bind/Describe/Execute/Sync. However, I couldn't find any specification that required the Describe message, so we can reduce that further to only Bind/Execute/Sync, which is what the reactive Postgres client does.

What you say corresponds to what I know as well. The Parse/Bind/Describe/Execute/Sync chain corresponds to a single, unprepared statement. However, it is possible to first prepare a statement by sending Parse/Describe/Sync, which has PostgreSQL parse the SQL and create a named server-side statement that can be used later. Then, a single prepared statement execution involves sending Bind/Execute/Sync, "binding" the previously-created server-side statement (to a set of parameters) and executing it.

The Describe message returns a description of the resultset (which columns, which types) which in general is required in order to properly parse the results. When preparing statements this "resultset description" can be cached client-side and reused later, so you need to Describe only once.

Npgsql also has prepared statements with the same lifetime as physical connections, so the above works in the benchmarks and is actually quite important for performance (in PostgreSQL using prepared statements gives a pretty significant boost).
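A toy sketch of that per-connection behaviour (hypothetical code, not Npgsql's actual API; `send` just records which protocol messages would go on the wire):

```python
class StatementCache:
    """Hypothetical per-connection prepared-statement cache, as
    described above: Parse/Describe/Sync happen once per SQL string,
    and the cached result-set description is reused afterwards, so
    each later execution costs only Bind/Execute/Sync."""
    def __init__(self, send):
        self._send = send             # records outgoing protocol messages
        self._descriptions = {}       # sql -> cached result-set description

    def execute(self, sql, params=()):
        if sql not in self._descriptions:
            for msg in ("Parse", "Describe", "Sync"):
                self._send(msg)       # prepare once per connection
            self._descriptions[sql] = f"description-of({sql})"  # client-side cache
        for msg in ("Bind", "Execute", "Sync"):
            self._send(msg)
        return self._descriptions[sql]
```

The first execution of a given SQL string pays the Parse/Describe cost; every later execution on the same connection is just Bind/Execute/Sync.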
 
> Now, I do admit that pipelining and batching aren't exactly identical. Specifically, in a scenario where a program listens
> to requests (of some sort) as they come in and executes database operations for each one, pipelining does have an
> obvious advantage - database requests can be sent (enqueued) whenever needed, where with batching you can pack
> several statements up-front, but then you're blocked and have to wait for the batch to complete before sending any more.
Yes, but with batching you can enqueue several incoming requests, and when the current batch completes, send all requests in the queue as a single batch, and so on - in fact, this is what I alluded to with my example of a clever driver above, and I expect that in the limit (i.e. many incoming requests) batching will behave quite similarly to pipelining.

I'm not so sure about that - when you're batching you're needlessly waiting for the current batch to complete before pushing the next batch to the network... How significant that can be is going to depend on a bunch of factors (e.g. batch execution time, network latency) but I suspect that in a truly saturated scenario it could be quite significant (on the other hand, you're #1 :)). In addition, pipelining allows application code to be simplified by simply sending statements as they come in, rather than having to batch them yourself.
 
P.S. Concerning the fortunes test, previously I was asked to provide some additional verification for the h2o implementation. I posted a comment in the pull request where the request originated, but there were no replies, so I don't know if anyone noticed it.

Very interesting read, thanks! There's definitely lots of good stuff there. I'm especially interested in the thread-local memory and database connection pools... Npgsql has a classical global connection pool, and in fact one of the main perf boosters for round 16 was to rewrite it to be completely lock-free. But it's still an application-global synchronization point which simply doesn't seem to exist in your case. The presentation linked to is also very interesting... I'm only the database driver guy (so the HTTP parts are less relevant) but still lots of food for thought, thanks!

Anton Kirilov

Jun 9, 2018, 5:41:24 PM6/9/18
to framework-benchmarks
Hi Shay,

I am glad that you found the information interesting. A large part of the reason to go for thread-local connection pools was that they simplified the implementation - no need for synchronization, obviously, but also no fairness issues (i.e. a thread starving for database connections because the other threads consume them all), no need to implement a notification mechanism for connection availability (by design the application uses only non-blocking operations, so it can't just wait for a database connection to become available - that's essentially blocking), and so on. Actually, I thought that the synchronization costs would be dwarfed by the time necessary to communicate with a remote database server, but if you saw gains in lock-free code, then I can only be glad for choosing such a design. On the other hand, a global connection pool allows you to easily fine tune the number of database connections (in my case I can only use multiples of 28 in the Citrine environment), so I wouldn't say that one approach is better in all cases.
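A minimal sketch of the per-thread design (hypothetical Python standing in for the actual C implementation; the class and factory are mine):

```python
import threading

class ThreadLocalPool:
    """Hypothetical sketch of the thread-local pools described above:
    each worker thread lazily creates and owns its own connections, so
    acquire/release never need locks or cross-thread fairness logic."""
    def __init__(self, connections_per_thread, connect):
        self._local = threading.local()
        self._n = connections_per_thread
        self._connect = connect       # factory for new connections

    def _pool(self):
        if not hasattr(self._local, "conns"):
            # first use on this thread: build its private pool
            self._local.conns = [self._connect() for _ in range(self._n)]
        return self._local.conns

    def acquire(self):
        pool = self._pool()
        return pool.pop() if pool else None   # non-blocking: no waiting

    def release(self, conn):
        self._pool().append(conn)
```

Because each thread only ever touches its own list, there is no synchronization point; the trade-off, as noted, is that the total connection count can only be tuned in multiples of the thread count.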

In addition, pipelining allows application code to be simplified by simply sending statements as they come in, rather than having to batch them yourself.
I agree with this 100%, especially if you consider the failure cases in which the database driver will have to replay the queries starting from the middle of the batch, which can get tricky implementation-wise.

Best wishes,
Tony

Brian Hauer

Jun 10, 2018, 10:49:07 AM6/10/18
to framework-benchmarks
In addition, pipelining allows application code to be simplified by simply sending statements as they come in, rather than having to batch them yourself.
I agree with this 100%, especially if you consider the failure cases in which the database driver will have to replay the queries starting from the middle of the batch, which can get tricky implementation-wise.

This conversation is fascinating and those of us who are not in the thick of it are learning a lot from you guys!  Or at least I am.  So thank you!

For what it's worth, the point above played a role in our decision to permit this functionality.  The feature seems to be the sort of enhancement to the protocol and drivers that simultaneously improves application developer ergonomics and performance.  We had to debate what precisely was the "spirit" of our test, and we ultimately decided that the spirit was that from the application developer's point of view, N queries were being executed and there was no need to manually batch them and deal with a batch failure.  Obviously, this was a gray area but the balance was tipped by our feeling this was precisely the sort of clever improvement we want to see in frameworks and infrastructure software.

Julien Viet

Jun 10, 2018, 11:38:56 AM6/10/18
to Anton Kirilov, framework-benchmarks
Hi,

The number of connections is interesting.

Fewer connections are better from a TCP standpoint, but they imply synchronisation on the library side; I think the trade-off between using a connection per thread and using fewer connections is interesting to study, and it greatly depends on the use case.

You can see in the h2o benchmark (which has great performance in Fortunes - ranked #1) that it uses 3 connections for physical and 4 for cloud, which is even fewer: https://github.com/TechEmpower/FrameworkBenchmarks/pull/3751

Julien


Shay Rojansky

Jun 22, 2018, 7:00:40 AM6/22/18
to framework-benchmarks
Sorry Brian, I think I somehow missed the message below that you sent two weeks ago.


On Sunday, June 10, 2018 at 3:49:07 PM UTC+1, Brian Hauer wrote:
In addition, pipelining allows application code to be simplified by simply sending statements as they come in, rather than having to batch them yourself.
I agree with this 100%, especially if you consider the failure cases in which the database driver will have to replay the queries starting from the middle of the batch, which can get tricky implementation-wise.

This conversation is fascinating and those of us who are not in the thick of it are learning a lot from you guys!  Or at least I am.  So thank you!

It's my pleasure, I'm also learning a lot from seeing other drivers perform so well and understanding the techniques they use (Npgsql definitely would not be in the optimized state it is currently without TechEmpower). In fact I'll probably take a look at what it would mean to provide pipelining/multiplexing in Npgsql (see https://github.com/npgsql/npgsql/issues/1982).

For what it's worth, the point above played a role in our decision to permit this functionality.  The feature seems to be the sort of enhancement to the protocol and drivers that simultaneously improves application developer ergonomics and performance.  We had to debate what precisely was the "spirit" of our test, and we ultimately decided that the spirit was that from the application developer's point of view, N queries were being executed and there was no need to manually batch them and deal with a batch failure.  Obviously, this was a gray area but the balance was tipped by our feeling this was precisely the sort of clever improvement we want to see in frameworks and infrastructure software.

I understand. Again, I think it's worth reconsidering this distinction: in my mind whether application developers need to manually batch statements or not isn't very relevant in the context of these performance benchmarks; it's more an API detail than anything else really. This decision seems to unnecessarily penalize drivers which do support batching but not pipelining, although any real-world usage would obviously take advantage of batching to arrive at identical perf.
 