[serf] r2360 committed - Don't report requests as 'died/cancelled' to the application if our...

987 views
Skip to first unread message

se...@googlecode.com

unread,
May 28, 2014, 4:48:53 AM5/28/14
to serf...@googlegroups.com
Revision: 2360
Author: b.huijben
Date: Wed May 28 08:48:40 2014 UTC
Log: Don't report requests as 'died/cancelled' to the application if
our
caller will requeue the request under the assumption that we hit the
max requests per connection limit.

* trunk/outgoing.c
(APR_STATUS_IMPLIES_HANGUP): New macro.
(handle_response): Don't report requests as cancelled if the status
implies a hangup.
(read_from_connection): Add comment. Use APR_STATUS_IMPLIES_HANGUP.

http://code.google.com/p/serf/source/detail?r=2360

Modified:
/trunk/outgoing.c

=======================================
--- /trunk/outgoing.c Fri Mar 7 17:30:30 2014 UTC
+++ /trunk/outgoing.c Wed May 28 08:48:40 2014 UTC
@@ -24,6 +24,13 @@

#include "serf_private.h"

+/* Some implementations -like Windows- report some hangup errors via a
+ different event than the specific HUP event. */
+#define APR_STATUS_IMPLIES_HANGUP(status) \
+ (APR_STATUS_IS_ECONNRESET(status) || \
+ APR_STATUS_IS_ECONNABORTED(status) || \
+ status == SERF_ERROR_REQUEST_LOST)
+
/* cleanup for sockets */
static apr_status_t clean_skt(void *data)
{
@@ -973,8 +980,12 @@
request->resp_bkt,
pool);

- if (SERF_BUCKET_READ_ERROR(status)) {
- /* Report the request as 'died'/'cancelled' to the application
*/
+ if (SERF_BUCKET_READ_ERROR(status)
+ && !APR_STATUS_IMPLIES_HANGUP(status) {
+
+ /* Report the request as 'died'/'cancelled' to the application,
+ but only if our caller doesn't handle this status
specifically,
+ with something like a retry */
(void)(*request->handler)(request,
NULL,
request->handler_baton,
@@ -1187,10 +1198,12 @@

/* Some systems will not generate a HUP poll event so we have to
* handle the ECONNRESET issue and ECONNABORT here.
+ *
+ * ### Update similar code in handle_response() if this condition
+ * changes, or requests will get lost and/or accidentally
reported
+ * cancelled.
*/
- if (APR_STATUS_IS_ECONNRESET(status) ||
- APR_STATUS_IS_ECONNABORTED(status) ||
- status == SERF_ERROR_REQUEST_LOST) {
+ if (APR_STATUS_IMPLIES_HANGUP(status)) {
/* If the connection had ever been good, be optimistic & try
again.
* If it has never tried again (incl. a retry), fail.
*/

Lieven Govaerts

unread,
May 28, 2014, 5:37:06 AM5/28/14
to serf...@googlegroups.com
Hi Bert,

for the record, I have decided to not back port any changes to the
1.3.x branch anymore without test that demonstrates the issue it
solves, where technically possible (so everything except the
NTLM/GSSAPI code).

So, I'd really (really really) appreciate it if you'd take a bit of
time to learn how serf's unit testing framework works, and add a test
for the original issue that triggered all these changes. I'm available
in the coming days to assist you where needed, I'll also commit a mock
socket bucket tomorrow that allows us to simulate ECONNABORTED errors
on non-Windows platforms.

regards,

Lieven

Bert Huijben

unread,
May 28, 2014, 5:49:46 AM5/28/14
to serf...@googlegroups.com


> -----Original Message-----
> From: serf...@googlegroups.com [mailto:serf...@googlegroups.com]
> On Behalf Of Lieven Govaerts
> Sent: woensdag 28 mei 2014 11:37
> To: serf...@googlegroups.com
> Subject: Re: [serf-dev] [serf] r2360 committed - Don't report requests as
> 'died/cancelled' to the application if our...
>
> Hi Bert,
>
> for the record, I have decided to not back port any changes to the
> 1.3.x branch anymore without test that demonstrates the issue it
> solves, where technically possible (so everything except the
> NTLM/GSSAPI code).
>
> So, I'd really (really really) appreciate it if you'd take a bit of
> time to learn how serf's unit testing framework works, and add a test
> for the original issue that triggered all these changes. I'm available
> in the coming days to assist you where needed, I'll also commit a mock
> socket bucket tomorrow that allows us to simulate ECONNABORTED errors
> on non-Windows platforms.

This problem is easily reproducable by just running
$ svn co https://svn0.us-west.freebsd.org/base/releng/8.4/contrib crash1 -q

on a Windows machine.
(It takes quite some time, but it invoked the problem every time for me)

See reports on TortoiseSVN and users@ for the crashes generated by Subversion 1.8.9/Serf 1.3.5.
(I'm glad I didn't push a new AnkhSVN version with Serf 1.3.5 yet)

The problem is just that the reports don't produce a useful stacktrace as in most cases the problem triggers releasing memory of the failed request, that will be used only after a new serf connection receives data.

I don't think you are helping the users by delaying a fix (or a revert of the fix in 1.3.5).


By adding more and more developer requirements (First scons -essentially dead project- which doesn't even support Windows development tools released a year ago, now yet a new test framework) you are not making it easier for developers on other platforms to join the serf development.

Bert
> --
> You received this message because you are subscribed to the Google Groups
> "Serf Development List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to serf-dev+u...@googlegroups.com.
> To post to this group, send email to serf...@googlegroups.com.
> Visit this group at http://groups.google.com/group/serf-dev.
> For more options, visit https://groups.google.com/d/optout.

Lieven Govaerts

unread,
May 28, 2014, 6:11:57 AM5/28/14
to serf...@googlegroups.com
We should have delayed the previous fix and 1.3.5 until we had a
decent test case (like I asked you btw), adding more untested fixes is
not the solution.
I'd rather release a serf 1.3.6 that's serf 1.3.5 minus the
problematic patch(es).

>
> By adding more and more developer requirements (First scons -essentially dead project- which doesn't even support Windows development tools released a year ago, now yet a new test framework) you are not making it easier for developers on other platforms to join the serf development.
>

Not sure why you want to discuss the need to write a test for an
issue, it's not that this hasn't been a best practice in sw
development for the last 15 years.

Anyway, if you don't have the/want to make time to write a test for
these issues, then this has to wait until I have the time to write
one.

Lieven

Lieven Govaerts

unread,
Jun 2, 2014, 3:19:32 PM6/2/14
to serf...@googlegroups.com
Hi Bert,


I had a look at the problem you're trying to solve here (thx for the
private feedback btw).

(for the uninformed: Bert is trying to solve the Subversion 1.8.x
crashes caused by a change introduced in serf 1.3.5.)

I'm seeing the following situation:

A. Serf 0.3.1 and older
------------------------------
Before serf had its own authn implementation, it passed every (part of
an) incoming response to the application by calling the
handle_response callback.
If there was an error while reading the response, the application
always knew about it.

There are basically two ways serf uses the handle_respone callback:
1. there's more data incoming: serf passes the response bucket created
in the accept_response callback in its call to handle_response.
2. the request got cancelled (during a full reset of the connection):
serf calls handle_response with a NULL response and removes the
request from the queue.

B. Serf 0.4 and newer
-----------------------------
Since we implemented an authn layer, serf itself will read at least
the headers of each response (*). If an error happens while serf is
reading the response status line and headers, we have a problem. The
problem is that serf needs to inform the application about these
errors, just like it did before serf had the authn layer, or currently
when authn is disabled.
(*) authn layer is only activated when the application has set the
credentials callback.

You have been trying to implement that notification from serf to the
application, and I totally agree it's a promise serf should keep.


We should stick to the two ways of calling handle_response as
described above: either pass a response bucket, or pass a NULL
response and remove the request from the queue. Your r2258/r2319 tries
to introduce a third way, by calling handle_response with a NULL
response but then not removing the request from the queue. I now think
that was the wrong approach, and therefore I also think that this
r2360 is wrong.

There are 5 places where an error can happen when serf is reading the response:
1. While reading the response status line.
2. While reading the response headers.
3. While discarding the body of a 401/407 response.
4a. In case of 401/407 status code, when trying to find a matching authn scheme
b. When validation authentication with one of serf's authn scheme handlers
c. As a result of a call to the application's credentials callback
5. In case of a 2xx status code, with authn headers (ie. Digest or Negotiate)

I propose the following solution:
For 1, 2 and 3: just call the application's handle_response callback
and pass the response. The application will read from the response and
get the same error status code as serf already read before, but
there's nothing wrong with that. At least this way the application
knows there's an error.

For 4a, if serf doesn't have a matching authentication scheme
implementation, we could just call the application with the 401/407
response and let it try. This enables an application to implement
authn scheme's not implemented by serf.

5 is trickier. When an error happens when parsing the
Authentication-Info header from a 2xx response, we should treat the
response body as untrusted. In this scenario serf must not pass the
response to the application, because it has now way to tell the
application that this body should not be trusted.
I think we can solve this one by discarding the response body, and
then wrapping the response in a new bucket that always returns
SERF_ERROR_AUTHN_FAILED when one of the read or peek functions is
called. We then call the application with this wrapped response
bucket, and basically fallback in the same scenario as 1,2 and 3.

4b and 4c can IMO implemented just like 5. If there's an error during
e.g. Negotiate validation or from the authn callback, we can pass this
as an error to the application and let it do any error
handling/cleanup tasks needed.


In summary: serf will always call the application with a response, but
where an error occurred during authn handling, the response will be
nothing more than a means to provide the application with an error
status.


Does this makes sense to you, from the pov of Subversion/ra_serf also?

Now, I'm not convinced we should do all this in the 1.3.x line. I'm
inclined to release 1.3.6 as (1.3.5 minus r2319) and then implement
the above in serf 1.4.0, for which my plan is to release it before svn
1.9.

Feedback welcome.

Lieven

On Wed, May 28, 2014 at 10:48 AM, <se...@googlecode.com> wrote:

Bert Huijben

unread,
Jun 3, 2014, 4:24:37 PM6/3/14
to serf...@googlegroups.com
Hi,

This plan is very good, but tere might be a few problems when we go this way.

  1. Some SSL scenarios may result in errors in a few more places. (Probably easy to solve)
  2. Subversion used to be very bad in checking http result codes and I wouldn't be surprised if other clients have the same problem. (Subversion trunk/1.9 is strict)
    1. In many cases <= 1.8.x will just parse the error output as xml and will start returning parser errors if we just return the 40X reports. (Not to hard to fix for new versions, but newer serf versions might break older Subversion releases).

+1 on reverting the old patch for 1.3.6.

But if we want to release 1.4.X for Subversion 1.9, we should have at least an alpha or beta release soon or it will slow down the Subversion 1.9 release process.

Bert



Sent from Windows Mail

Alexander Stohr

unread,
Jun 13, 2014, 3:57:11 AM6/13/14
to serf...@googlegroups.com, se...@googlecode.com
it got reported by people that TortoiseSVN is affected by this problem.
when checking out
http://code.google.com/p/tortoisesvn/issues/detail?id=647

Lieven Govaerts

unread,
Jun 13, 2014, 8:46:50 AM6/13/14
to serf...@googlegroups.com
Yes, that's why we released serf 1.3.6 last Monday to revert the
original problematic change from the 1.3.x line.
A new TortoiseSVN release with serf 1.3.6 should resolve the issue.

regards,

Lieven
Reply all
Reply to author
Forward
0 new messages