Heap Use After Free


Zachary Grafton

Mar 15, 2016, 09:39:18
To: onion-dev
David,

While running the onion test suite after compiling with clang's address sanitizer, I came across another bug in the poller. It appears when running test/01-internal/06-onion. I've attached the log, but if you need more info, let me know. I'm thinking it could be related to some of the data race conditions as well.

Thanks,
Zack
log

David Moreno Montero

Mar 15, 2016, 13:47:34
To: Zachary Grafton, onion-dev

Hi,

I'm testing with valgrind, and it also shows a use after free caused by a race:

1. Thread A: Start handling slot
2. Thread B: Check timeouts, mark for removal
3. Thread C: Picks up the removal, removes it
4. Thread A: Uses it.

The problem seems to be that the mark for removal (close) is triggered even though the epoll entry is marked EPOLLONESHOT, maybe because it is a different kind of event.
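
For reference, the one-shot flow this relies on looks roughly like this (just a sketch; the slot struct and field names are illustrative, not the actual poller code):

    #include <sys/epoll.h>

    struct slot { int fd; void (*f)(void *); void *data; };   /* illustrative */

    /* One worker iteration under EPOLLONESHOT: after the event is delivered,
       the fd stays disarmed in the epoll set until we re-arm it, so no other
       thread should get an event for this slot in between. */
    static void handle_one(int epfd, int timeout_ms) {
        struct epoll_event ev;
        if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
            return;
        struct slot *s = ev.data.ptr;
        s->f(s->data);                              /* process the event */
        ev.events = EPOLLIN | EPOLLONESHOT;         /* re-arm only now   */
        ev.data.ptr = s;
        epoll_ctl(epfd, EPOLL_CTL_MOD, s->fd, &ev);
    }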

Still working on it.

David


Zachary Grafton

Mar 15, 2016, 13:58:03
To: onion-dev, zachary...@gmail.com
Cool (kind of). So it was related. Clang's thread sanitizer run is showing similar results. I have a run saved somewhere that shows the same behavior; if you need it, let me know. Good luck.

Zack

David Moreno Montero

Mar 15, 2016, 19:40:21
To: Zachary Grafton, onion-dev
Hi,

after trying harder, and some restructuring, I think the problems are finally over.

As described in the patch:

    Slots now use an id system for locating through epoll

    There were problems with using the epoll data pointer as-is, or the fd,
    as both can imply data reuse after free, for example:

    1. Thread A: Get epoll signal, get fd or data ptr
    2. Thread B: Timeout, mark for deletion
    3. Thread C: Delete
    4. Thread D: Reuse fd
    5. Thread A: The data is expired; the fd is no longer valid, nor is the data ptr.

    Using a sequential id as ev->data.u64, and a resizable array, it is possible
    to always look up by this id, and if it is not available, just don't return
    the el. Also by marking the id as in use, no two threads will try to use the
    slot at the same time.

    Finally, removing by id is necessary as the fd can be reused, so searching
    for the slot by fd could return an old or in-process slot.
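
In rough shape, the idea is something like this (just a sketch, not the committed code; the names and the array indexing are illustrative, and locking is left out):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct { uint64_t id; int fd; void *data; } slot;   /* illustrative */

    static slot    *slots;       /* resizable array of slots; allocation omitted */
    static size_t   nslots;
    static uint64_t next_id = 1; /* sequential; 0 means "free"                   */

    /* Adding: the sequential id goes both into the slot and into epoll
       (ev.data.u64), so an event always carries the id it was armed with. */
    static uint64_t slot_add(int fd, void *data) {
        uint64_t id = next_id++;
        slots[id % nslots] = (slot){ .id = id, .fd = fd, .data = data };
        return id;               /* caller stores it in ev.data.u64 */
    }

    /* On an event: look the slot up by the id from ev.data.u64.  If the slot
       was removed, or reused for a newer id, the ids no longer match and the
       event is simply dropped. */
    static slot *slot_get(uint64_t id) {
        slot *s = &slots[id % nslots];
        return (s->id == id) ? s : NULL;
    }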


Can you try the clang address sanitizer with the fixtimeouts branch?

Good night,
David.

Solar Designer

Mar 15, 2016, 19:54:05
To: onion-dev, Rich Felker
Hi David,

On Wed, Mar 16, 2016 at 12:40:01AM +0100, David Moreno Montero wrote:
> after trying harder, and some restructuring, I think the problems are
> finally over.
>
> As described in the patch:
>
> Slots now use an id system for locating through epoll
>
> There were problems with using the epoll data pointer as-is, or the fd,
> as both can imply data reuse after free, for example:
>
> 1. Thread A: Get epoll signal, get fd or data ptr
> 2. Thread B: Timeout, mark for deletion
> 3. Thread C: Delete
> 4. Thread D: Reuse fd
> 5. Thread A: The data is expired; the fd is no longer valid, nor is the data ptr.

I was under the impression that we had expected "3. Thread C: Delete"
wouldn't be happening until after "5. Thread A: ..." because of the
one-shot nature of the epoll instance, since the deletion would be
triggered by an epoll event too, and Thread A wouldn't reset the epoll
instance until it's done with the first event.

Apparently, this was not the case in your testing?

Anyhow, relying on epoll properties like this, even if guaranteed by its
current spec, for avoiding data races is like playing with fire - the
risk impact is unaffordable. So I agree that a more robust solution was
needed (or the impact reduced).

> Using a sequential id as ev->data.u64, and a resizable array, it is possible
> to always look up by this id, and if it is not available, just don't return
> the el. Also by marking the id as in use, no two threads will try to use the
> slot at the same time.
>
> Finally, removing by id is necessary as the fd can be reused, so searching
> for the slot by fd could return an old or in-process slot.

Unique IDs were an approach I started thinking of as well. I haven't yet
looked at your implementation, but I intend to.

Thank you, David and Zack, for staying on top of these issues!

Alexander

Zachary Grafton

Mar 16, 2016, 07:12:01
To: onion-dev, zachary...@gmail.com
David,

The address sanitizer run on the hello example looks good now. Just for good measure, I ran the thread sanitizer against it as well; I've attached the log from it. I think we're definitely a lot closer than we were before. The only things I'm seeing on the hello example are warnings about errno being potentially trashed by the SIGINT/SIGTERM handler and a missed locking of the mutex on the poller, but the latter was fixed in one of my commits from the data_races branch, which you pulled in. I'll run the tests through the sanitizers, see what they pick up there, and let you know what I find.

Zack
tsan_output

Solar Designer

Mar 16, 2016, 16:34:37
To: onion-dev, Rich Felker
Hi David,

On Wed, Mar 16, 2016 at 12:40:01AM +0100, David Moreno Montero wrote:
> Using a sequential id as ev->data.u64, and a resizable array, it is possible
> to always look up by this id, and if it is not available, just don't return
> the el.

I am now looking at this commit in the fixtimeouts branch. Frankly, it
doesn't make me feel confident yet.

There are still places where the code operates on shared data without
locking, right? Do you make assumptions about atomicity of
reads/writes? Perhaps at least for ev->data.u64? Do you make
assumptions about memory ordering? If so, then even on x86 you'll need
volatile and compiler-level barriers (so that the compiler will know not
to reorder operations in a manner that will violate your assumptions),
and on many other archs you'll need hardware memory barriers.
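
For example, in C11 terms the intended cross-thread handoff could at least be made explicit, along these lines (a sketch of the idea, not a statement about what your current code does):

    #include <stdatomic.h>
    #include <stdint.h>

    /* A plain uint64_t read and written from several threads carries no
       ordering guarantees; the compiler and the CPU may reorder around it. */
    static _Atomic uint64_t slot_id;

    static void publish_id(uint64_t id) {
        /* release: everything written to the slot before this store becomes
           visible to a thread that observes the new id via the load below */
        atomic_store_explicit(&slot_id, id, memory_order_release);
    }

    static uint64_t read_id(void) {
        /* acquire: pairs with the release store above */
        return atomic_load_explicit(&slot_id, memory_order_acquire);
    }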

The realloc() thing is probably incompatible with the lockless portion
of the code. I'd rather preallocate for RLIMIT_NOFILE and not ever
adjust the allocation. (BTW, to make this code robust, you need to
check for potential integer overflow on calculating the allocation size.
And make sure you use unsigned integers if you post-check rather than
pre-check for it, since signed overflow would be UB already.)
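
Something along these lines is what I have in mind (a sketch; the slot type and the fallback cap are illustrative):

    #include <sys/resource.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct { uint64_t id; int fd; void *data; } slot;   /* illustrative */

    /* Size the slot array once from RLIMIT_NOFILE instead of realloc()ing
       an array that other threads may be reading concurrently. */
    static slot *alloc_slots(size_t *out_n) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return NULL;
        size_t n = (rl.rlim_cur == RLIM_INFINITY) ? 65536 : (size_t)rl.rlim_cur;
        if (n == 0 || n > SIZE_MAX / sizeof(slot))   /* overflow pre-check */
            return NULL;
        *out_n = n;
        return calloc(n, sizeof(slot));              /* zeroed: id 0 == free */
    }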

> Also by marking the id as in use, no two threads will try to use the
> slot at the same time.

Where is this implemented?

> Finally, removing by id is necessary as the fd can be reused, so searching
> for the slot by fd could return an old or in-process slot.

Sure, but your code would be able to check the id right away - wouldn't
that be sufficient? And you wouldn't need to "search" by fd (you do
when looking up by id), because you'd be able to have the array indexed
by fd.
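
I.e., something like this (a sketch; packing the fd and a generation id into ev->data.u64 is just one way to do it, and the slot type is as in the sketch above):

    #include <sys/epoll.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Slots indexed directly by fd; the id is only a staleness check. */
    static slot *slot_for_event(slot *slots, size_t nslots,
                                const struct epoll_event *ev) {
        int      fd = (int)(ev->data.u64 >> 32);              /* illustrative packing */
        uint32_t id = (uint32_t)(ev->data.u64 & 0xffffffffu);
        if (fd < 0 || (size_t)fd >= nslots)
            return NULL;
        slot *s = &slots[fd];
        if ((uint32_t)s->id != id)   /* fd was closed and reused: stale event */
            return NULL;
        return s;
    }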

Alexander

Solar Designer

Mar 16, 2016, 17:16:39
To: onion-dev, Rich Felker
David,

On Wed, Mar 16, 2016 at 11:34:33PM +0300, Solar Designer wrote:
> On Wed, Mar 16, 2016 at 12:40:01AM +0100, David Moreno Montero wrote:
> > Using a sequential id as ev->data.u64, and a resizable array, it is possible
> > to always look up by this id, and if it is not available, just don't return
> > the el.
>
> I am now looking at this commit in the fixtimeouts branch. Frankly, it
> doesn't make me feel confident yet.

Testing it didn't go any better, either.

First, it doesn't build with RHEL6's default gcc anymore. With default
CFLAGS, I am getting errors like:

src/onion/poller.c:241: error: 'for' loop initial declarations are only allowed in C99 mode

With -std=c99 added, I am getting errors like:

src/onion/url.c:77: error: 'onion_url_data' has no member named 'str'

What works is building with newer gcc (such as 4.9.1 from devtoolset-3)
_and_ also adding -std=c99.

While requiring C99 may be fine per se, I suggest that you maintain the
ability to build with RHEL6's gcc 4.4.7.

Now to testing the new build. It survived a manual web page request
with "elinks", but failed as soon as I started curl-loader:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd13be700 (LWP 11141)]
0x0000000000418878 in onion_poller_poll ()
(gdb) bt
#0 0x0000000000418878 in onion_poller_poll ()
#1 0x000000000040feac in onion_poller_poll_start ()
#2 0x000000000041be34 in start_thread ()
#3 0x0000000000496a49 in clone ()
(gdb) disass
Dump of assembler code for function onion_poller_poll:
[...]
0x000000000041884d <+61>: callq 0x41d7e0 <pthread_mutex_lock>
0x0000000000418852 <+66>: mov 0x4c(%rbx),%edx
0x0000000000418855 <+69>: mov 0x40(%rbx),%rax
0x0000000000418859 <+73>: test %edx,%edx
0x000000000041885b <+75>: jle 0x418a77 <onion_poller_poll+615>
0x0000000000418861 <+81>: sub $0x1,%edx
0x0000000000418864 <+84>: lea 0x8(%rax,%rdx,8),%rdi
0x0000000000418869 <+89>: mov $0x7fffffff,%edx
0x000000000041886e <+94>: xchg %ax,%ax
0x0000000000418870 <+96>: mov (%rax),%rcx
0x0000000000418873 <+99>: test %rcx,%rcx
0x0000000000418876 <+102>: je 0x418885 <onion_poller_poll+117>
=> 0x0000000000418878 <+104>: mov 0x40(%rcx),%rcx
0x000000000041887c <+108>: movslq %edx,%rsi
0x000000000041887f <+111>: cmp %rsi,%rcx
0x0000000000418882 <+114>: cmovl %ecx,%edx
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) p/x $rcx
$1 = 0x301
(gdb) p/d $rcx
$2 = 769

I haven't tried investigating this yet. Need to remove or bypass
realloc() first.

Alexander

Solar Designer

Mar 16, 2016, 17:35:18
To: onion-dev, Rich Felker
On Thu, Mar 17, 2016 at 12:16:32AM +0300, Solar Designer wrote:
> I haven't tried investigating this yet. Need to remove or bypass
> realloc() first.

With realloc() avoided by preallocating enough slots right away (5000
instead of 128; my RLIMIT_NOFILE is 4096), it is running fine so far,
but the scalability is much worse than before:

Previously, 300 "clients" in one instance of curl-loader would result in
90% CPU on the server and 10.5k req/s, and 2x300 in two curl-loaders at
once would result in 100% CPU on the server and 11.5k req/s.

Now 300 is similar (90% and 10k+), but 2x300 is 92% and ~10.6k. What's
worse, the first time I tried this, with some other tests before it, the
CPU usage dropped to ~50% and request rate to ~7k the moment I started
the second instance of curl-loader. Restarting the onion-based service
helped mostly regain the speed, but not to the level of the version from
prior to the fixtimeouts patch.

(The maximum request rate of 11.5k at 100% CPU is correct for this
service, due to the computation it is doing per request. It's not an
onion thing. But it introduces delays that expose onion bugs and
limitations, as compared to serving static content. Also, I am listing
the CPU time percentages relative to the maximum for this server, total
across all CPUs.)

Alexander

Solar Designer

Mar 16, 2016, 17:45:07
To: onion-dev, Rich Felker
On Thu, Mar 17, 2016 at 12:34:14AM +0300, Solar Designer wrote:
> With realloc() avoided by preallocating enough slots right away (5000
> instead of 128; my RLIMIT_NOFILE is 4096), it is running fine so far,
> but the scalability is much worse than before:
>
> Previously, 300 "clients" in one instance of curl-loader would result in
> 90% CPU on the server and 10.5k req/s, and 2x300 in two curl-loaders at
> once would result in 100% CPU on the server and 11.5k req/s.
>
> Now 300 is similar (90% and 10k+), but 2x300 is 92% and ~10.6k. What's
> worse, the first time I tried this, with some other tests before it, the
> CPU usage dropped to ~50% and request rate to ~7k the moment I started
> the second instance of curl-loader. Restarting the onion-based service
> helped mostly regain the speed, but not to the level of the version from
> prior to the fixtimeouts patch.

By restarting the service again, I just triggered the performance
reduction with client count increase again - this time it's 2x300 doing
only 80% CPU and 9.8k req/s. This variability in performance and
scalability across service restarts wasn't appearing before the
fixtimeouts patch.

Alexander

David Moreno Montero

Mar 17, 2016, 05:07:36
To: Solar Designer, onion-dev, Rich Felker
Hi, I fixed the realloc problem.

I will look into the performance problem, but quite probably it is some lock contention.

About using the id as a mark for "do not use" and memory barriers: the change and check of the id is always inside the mutex (I just pushed fixes for two places that were not). If you detect any place where it's not locked, let me know.

Regards,
David.



Solar Designer

Mar 19, 2016, 18:33:01
To: David Moreno Montero, onion-dev
David,

On Tue, Mar 15, 2016 at 06:47:33PM +0100, David Moreno Montero wrote:
> I'm testing with valgrind and also shows a use after free because of a race:
>
> 1. Thread A: Start handling slot
> 2. Thread B: Check timeouts, mark for removal
> 3. Thread C: Picks up the removal, removes it
> 4. Thread A: Uses it.
>
> The problem seems to be that the mark for removal (close) is launched even
> if the epoll is marked EPOLLONESHOT, maybe because it is another kind of
> event.

Your code as currently committed to the default branch does not appear
to fully implement the approach suggested by Rich yet (although it gets
close), so I think we can't arrive at the above conclusion from testing
it yet.

Specifically, these two lines:

// closed, do not even try to call it.
cur->f=NULL;
cur->data=NULL;

appear to violate the approach, because they're performed on onion's own
timeout detection without necessarily having received any epoll event
for this slot. Thus, these NULL's may be assigned to the fields while
another thread is proceeding to call f(), after having already received
an event for this slot.

You say you're seeing a use after free rather than a NULL pointer
dereference, but a possible explanation is that a third thread sees
!el->f and proceeds to n = -1 and to onion_poller_remove(), all without
having received any epoll event.

Simply removing these two lines might fix it, or maybe there are more
issues like this. Maybe try it?

Also related at least to my slots indexed by fd approach:

Where is the fd closed? I tried to locate the close(), but it's tricky -
there are many of them and I am not entirely sure which one(s) are
right. There doesn't appear to be one that is timed correctly for this
approach: it must be after the slot is removed from the list, not before.
Is your code easy to adjust to make this so?

I think the slots-by-fd approach is both more robust and faster than the
malloc/free approach - but only once the above requirement is met.

Alexander

Solar Designer

Mar 19, 2016, 19:12:47
To: David Moreno Montero, onion-dev
David,

On Sun, Mar 20, 2016 at 01:32:58AM +0300, Solar Designer wrote:
> Your code as currently committed to the default branch does not appear
> to fully implement the approach suggested by Rich yet (although it gets
> close), so I think we can't arrive at the above conclusion from testing
> it yet.
>
> Specifically, these two lines:
>
> // closed, do not even try to call it.
> cur->f=NULL;
> cur->data=NULL;
>
> appear to violate the approach, because they're performed on onion's own
> timeout detection without necessarily having received any epoll event
> for this slot. Thus, these NULL's may be assigned to the fields while
> another thread is proceeding to call f(), after having already received
> an event for this slot.
>
> You say you're seeing a use after free rather than a NULL pointer
> dereference, but a possible explanation is that a third thread sees
> !el->f and proceeds to n = -1 and to onion_poller_remove(), all without
> having received any epoll event.
>
> Simply removing these two lines might fix it, or maybe there are more
> issues like this. Maybe try it?

I am trying this now, but I also removed a few lines prior to the above,
replacing them with explicit shutdown(cur->fd, SHUT_RDWR) like this:

#if 0
  if (cur->shutdown){
    cur->shutdown(cur->shutdown_data);
    onion_poller_slot_set_shutdown(cur,NULL,NULL);
  }
  // closed, do not even try to call it.
  cur->f=NULL;
  cur->data=NULL;
#else
  shutdown(cur->fd, SHUT_RDWR);
#endif

This is passing my testing so far (still running as I type this),
including along with fd-indexed slots, whereas your #if 0'ed code failed
those tests quickly. Performance is good, too.

> Also related at least to my slots indexed by fd approach:
>
> Where is the fd closed? I tried to locate the close(), but it's tricky -
> there are many of them and I am not entirely sure which one(s) are
> right. There doesn't appear to be one that is timed correctly for this
> approach: it must be after the slot is removed from the list, not before.
> Is your code easy to adjust to make this so?

It appears that cur->shutdown(cur->shutdown_data) involved
close(cur->fd), right? I found that onion_listen_point_accept() uses:

onion_poller_slot_set_shutdown(slot, (void*)onion_request_free, req);

and onion_request_free does:

if (req->connection.listen_point!=NULL && req->connection.listen_point->close)
req->connection.listen_point->close(req);

but its references to listen_point made me think this was about the
listen()ing fd rather than about accept()ed fd's - but perhaps they are
about the latter (as well)?

I am still concerned that there are many other close() calls, which I
think may be invoked on per-request transient errors, resulting in the
fd getting closed earlier than we require to avoid races. Am I correct?
Would you refactor the code to consistently postpone those close() calls
(pun unintended, but appropriate)?

Alexander

Solar Designer

Mar 19, 2016, 22:52:36
To: David Moreno Montero, onion-dev
On Sun, Mar 20, 2016 at 02:12:43AM +0300, Solar Designer wrote:
> #if 0
>   if (cur->shutdown){
>     cur->shutdown(cur->shutdown_data);
>     onion_poller_slot_set_shutdown(cur,NULL,NULL);
>   }
>   // closed, do not even try to call it.
>   cur->f=NULL;
>   cur->data=NULL;
> #else
>   shutdown(cur->fd, SHUT_RDWR);
> #endif

SHUT_RDWR causes extra SIGPIPE's (which onion normally ignores, but
still) and appears unnecessary. SHUT_RD appears to be sufficient.

> This is passing my testing so far (still running as I type this),
> including along with fd-indexed slots, whereas your #if 0'ed code failed
> those tests quickly. Performance is good, too.

This is still the case.

> I am still concerned that there are many other close() calls, which I
> think may be invoked on per-request transient errors, resulting in the
> fd getting closed earlier than we require to avoid races. Am I correct?
> Would you refactor the code to consistently postpone those close() calls
> (pun unintended, but appropriate)?

This concern still applies.

Attached is the patch I am currently using. In it, you can see that I
also optimized timeout handling so that the mutex is only acquired and
the list traversed once per second per thread, rather than once per
event per thread. This is an optimization for busy servers, but it adds
overhead for mostly idle servers. You might want to do it smarter, but
for me it works well - I got a measurable speedup under the load that the
server is actually expected to see, and the thread count required to
fully utilize the server's capacity is now lower (down from 40+ to 34).
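
The shape of the change is roughly this (a sketch; the struct fields and the helper name are illustrative, not the exact patch):

    #include <time.h>
    #include <pthread.h>

    struct poller {
        pthread_mutex_t mutex;
        time_t          last_timeout_check;
        /* ... */
    };

    /* provided elsewhere; walks the timeout list with the mutex held */
    void check_timeouts_locked(struct poller *p, time_t now);

    /* Called once per event: time() is still per event, which is nasty but
       cheap; the mutex and the list walk happen at most once per second.
       The unlocked test may let two threads check in the same second,
       which is harmless here. */
    static void maybe_check_timeouts(struct poller *p) {
        time_t now = time(NULL);
        if (now == p->last_timeout_check)
            return;
        p->last_timeout_check = now;
        pthread_mutex_lock(&p->mutex);
        check_timeouts_locked(p, now);
        pthread_mutex_unlock(&p->mutex);
    }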

Alexander
onion.diff

Solar Designer

Mar 19, 2016, 22:58:46
To: David Moreno Montero, onion-dev
On Sun, Mar 20, 2016 at 05:52:32AM +0300, Solar Designer wrote:
> Attached is the patch I am currently using. In it, you can see that I
> also optimized timeout handling so that the mutex is only acquired and
> the list traversed once per second per thread, rather than once per
> event per thread.

Correction: before this patch, it was twice per event per thread. One
time in onion_poller_get_next_timeout() called from onion_poller_poll()
and the other in onion_poller_poll() itself.

With the patch, I am still calling time() per event per thread, which is
nasty, but not as bad as the two mutex acquisitions/releases and list
traversals.

Alexander

Solar Designer

Mar 20, 2016, 14:51:50
To: David Moreno Montero, onion-dev
David,

I think this change:

int onion_poller_remove(onion_poller *poller, int fd){
  if (epoll_ctl(poller->fd, EPOLL_CTL_DEL, fd, NULL) < 0){
-   ONION_ERROR("Error remove descriptor to listen to. %s", strerror
+   if (errno!=ENOENT && errno!=EBADF){
+     ONION_ERROR("Error remove descriptor to listen to. %s (%
+   }
  }

in:

commit 7fb66d1df141fc36c7694954e4fc8cd78c9f3884
Author: David Moreno <dmo...@coralbits.com>
Date: Tue Mar 15 16:34:39 2016 +0100

Do not show expected errors on poller, remove verbose debug at listen_point

is wrong in that those errors are only expected because of the bug
elsewhere in the poller: namely, that cur->shutdown(cur->shutdown_data)
closes the fd (as far as I currently understand from the observed
behavior). Replacing cur->shutdown(cur->shutdown_data) on timeout with
shutdown(cur->fd, SHUT_RD) as in the patch I posted should solve this,
and you will need to remove the above change then (so as not to hide
more bugs of this sort, if any remain or are reintroduced).

My wording above was still wrong: I should have said just "per event"
rather than "per event per thread", because each event is only received
by one thread.

Alexander

David Moreno Montero

Mar 20, 2016, 14:53:50
To: Solar Designer, onion-dev
Hi,

I will apply this patch about not checking the timeouts so aggressively. I saw this performance opportunity (or bug, depending on your point of view), but decided to keep it, as it was helping me expose the premature memory free on the 06-onion / 06-timeouts test. As that looks resolved, I think it's a good idea to fix this performance opportunity.

Also, I will fix the use of SHUT_RD. About listen_point->close: it is about closing the listen point's clients. Maybe it needs a better name. Closing the listen fd is done at onion_listen_point_listen_stop, which uses a predefined close or a custom one (https). Also, it needs both a shutdown and a close (I'm not sure if you wanted to leave only the shutdown). And it needs to go via the el->shutdown callback, because other listen points may need extra code to close the fd, such as a proper close for SSL.

Also, I think everybody agrees that the new fixtimeouts poller is better than the old one, so unless there is any objection I will merge it into master later today.

Regards,
David.

David Moreno Montero

Mar 20, 2016, 15:11:32
To: Solar Designer, onion-dev
Hi,

the socket must be closed as well as shut down, especially if we only shut it down for RD. They do different things, and I don't remember the details right now, but doing only shutdown keeps the fd open some more time. I prefer to do both, but I will check it out more carefully.

http://stackoverflow.com/questions/4160347/close-vs-shutdown-socket#4160356 gives some answers, and I think the point is that with shutdown we can still read something, and it keeps the connection open at the kernel level for some more time.

About the ONION_ERROR and the EBADF and ENOENT check, I think it must stay. For example, sometimes it's the client that closes the connection at the same time; this catches that situation, which does not require showing an error.

David.

Solar Designer

Mar 20, 2016, 15:14:34
To: David Moreno Montero, onion-dev
On Sun, Mar 20, 2016 at 07:53:30PM +0100, David Moreno Montero wrote:
> I will apply this patch about not checking the timeouts so aggressively, I
> saw this performance opportunity (or bug, depends on you point of view),
> but decided to keep it as it was helping me expose the memory free before
> time on the 06-onion/ 06-timeouts test. As it looks resolved, I think its
> good idea to fix this performance opportunity.

Yes, I agree with your reasoning here. I also postponed making this
change until after the rest of the patch passed my stress-testing.

> Also I will fix the use of SHUT_RD. About the listen_point->close, it is
> about the closing of listened clients. Maybe it needs a better name. Close
> of the listen fd is done at onion_listen_point_listen_stop, that uses a
> predefined one or a custom one (https). Also it needs a shutdown and a
> close (I'm not sure if you wanted to leave only the shutdown). Also it
> needs to be via the el->shutdown callback, because other listen points may
> need extra code to close the fd, as a proper close in SSL.

Thanks for the explanation.

Yes, where you currently perform this call on timeout, I "wanted to
leave only the shutdown" - so this is literally what I did in that patch.

> Also I think everybody agrees that the new fixtimeouts poller is better and
> than the old one, so unless there is any objection I will merge later today
> into master.

I am currently not convinced that it is better. Prior to this commit
today:

commit 08871851636c02d38c189a2762098ca8a0b46956
Author: David Moreno <dmo...@coralbits.com>
Date: Sun Mar 20 19:29:52 2016 +0100

Force thread lock on id use at poller

it was both still unreliable and much slower than master - in fact, its
performance for my use case was unacceptable.

Now maybe it's just slower, or maybe it's also still unreliable - it'd
take thorough review and testing to find out, which I haven't done yet
(the above commit just arrived), and I think I am done patching and
testing libonion's poller for a while now, with no intent to move to a
different approach in it yet, unless I become aware of major issues with
the patch I posted that affect my application. No time for more stuff
like that now - I need to move on to other tasks.

Even if I had more time for code review and testing now, which I don't,
the unacceptable performance makes those changes irrelevant for me.

IOW, this might end up being the point where I'll stay with my (trivial)
fork of libonion, planning to hopefully replace libonion altogether later.
I might or might not be merging my poller changes with other changes you
will be making to upstream libonion from that point on.

Of course, you don't have to care about the above - it's just some user
feedback for you. :-)

Alexander

David Moreno Montero

Mar 20, 2016, 15:17:20
To: Solar Designer, onion-dev
I appreciate the feedback. Thanks for the effort on using onion, and the patches.

Do you think a thread pool would give better performance?

David.

Solar Designer

Mar 20, 2016, 15:29:57
To: David Moreno Montero, onion-dev
On Sun, Mar 20, 2016 at 08:11:11PM +0100, David Moreno Montero wrote:
> the socket must be close as well as shutdown, specially if we only shutdown
> it for RD. They do different things, and I dont remember properly right
> now, but doing only shutdown keeps the fd open some more time. I prefer to
> do both, but will check it out better.
>
> http://stackoverflow.com/questions/4160347/close-vs-shutdown-socket#4160356
> gives some answers, and I think that it is that with shutdown we still can
> read something, and keeps the connection at kernel level open some more
> time.

There's some misunderstanding between us on this. Here's what I am
trying to say:

Yes, we must close() the fd eventually, not only shutdown() it.

However, we must not close() too soon. Your code currently close()s too
soon, and this leads to a variety of problems through fd reuse
potentially happening before we're ready for it. My slots-indexed-by-fd
approach exposes those problems more easily, but the problems are
present without that approach as well: e.g., we don't want one event's
read() called on a reused fd that already belongs to another slot.
(The event may have occurred and returned from epoll_wait(), but not yet
been fully processed in userspace, before you have closed the fd.)

Yes, "doing only shutdown keeps the fd open some more time" - that's
what we need! As soon as we've closed the fd, it's free for reuse, and
this is a source of some of the race conditions we're dealing with.
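
In code terms, the ordering I am after on the timeout path is roughly this (a sketch; onion_poller_remove() is the real function, the rest is schematic):

    /* On timeout: only stop further I/O on the connection.  The fd number
       stays allocated, so it cannot be reused out from under a thread that
       is still processing an already-received event for it. */
    shutdown(cur->fd, SHUT_RD);

    /* ...later, once the slot has been removed from the poller's list and
       no thread can still be handling an event for this slot: only now is
       it safe to release the fd number for reuse. */
    onion_poller_remove(poller, cur->fd);
    close(cur->fd);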

> About the ONION_ERROR and the EBADF and ENOENT check I think it must stay.
> For example sometimes its the client that closes the connection at the same
> time, and this catches that situation, but does not require to show error.

My understanding is that the only reason the client's behavior may
sometimes cause those errors is that the server code is buggy.

A client closing the connection should not directly cause EBADF - that
would indicate the fd is no longer open, and the only way this can
legitimately happen is if libonion has called close(). ENOENT is
trickier, but I'd expect it to be similar - an action by libonion code
(perhaps EPOLL_CTL_DEL) causing it, even if in response to a client
event, rather than the client directly causing it.

None of these errors occur in my last day of stress-testing with the
patch I posted.

So I disagree.

Alexander

Solar Designer

Mar 20, 2016, 15:49:58
To: David Moreno Montero, onion-dev
On Sun, Mar 20, 2016 at 08:16:59PM +0100, David Moreno Montero wrote:
> Do you think a thread pool would give better performance?

Don't we currently use a thread pool? Do you mean something else?

For my use case, I am considering these two approaches to achieving the
desired performance level (which includes not only the req/s capacity, but
also reaching it with fewer concurrent connections and at lower latency):

1. Same thread pool for handling HTTP connections and for handling the
application-level requests (and producing responses). This is what we
have now with libonion, where each thread alternates between libonion
HTTP code and application code.

2. Two thread pools: one for handling HTTP connections (possibly just
two threads or so, for my target request rate) and another for handling
the application-level requests (and producing responses). The two
categories of threads would communicate via a multi-producer,
multi-consumer ring buffer, or via multiple producer-consumer rings.

Approach 2 isn't currently implementable in an app using libonion
because the per-request callbacks into the app expect to return
responses right away, and the corresponding libonion thread won't
return to servicing HTTP until a response has been computed and sent to
the socket. I've been envisioning my own poller handling both reads and
writes, and having per-fd data buffers for both reads and writes, and
being driven by both socket events and availability of responses in the
ring buffer(s).

I think it's fine that this is beyond libonion, unless you see a way to
enhance libonion to be suitable for that approach without cluttering
libonion too much. (There are other good reasons to enhance the poller
with handling of writes, though - I posted about that separately.)

Luckily, for now approach #1 works well (with the patch I posted), even
if precisely because libonion uses the maybe-fragile approach of
lock-free sharing of an epoll instance, relying heavily on the one-shot
property. Essentially, this property of epoll distributes work between
the threads much like the ring buffer(s) would, saving the need to
implement those. (But this only works for small responses that are
expected to fit in the kernel's socket write buffers, as well as be
within TCP window size.)

Alexander

David Moreno Montero

Mar 20, 2016, 16:14:11
To: Solar Designer, onion-dev
> Don't we currently use a thread pool?  Do you mean something else?

Sorry, of course we use one; I meant a more classic one based on work queues: one thread listening for connections, and as they arrive, adding the work to accept them; when some data arrives, adding the work to process the request, and so on... which is basically what we do, but instead of using epoll on every thread, we would use it only on one. Something like https://github.com/Pithikos/C-Thread-Pool
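
Roughly this shape, i.e. a sketch of a classic work-queue pool (not that library's code; names are illustrative):

    #include <pthread.h>
    #include <stdlib.h>

    /* One thread runs epoll/accept and enqueues work; N workers dequeue
       and run it.  Synchronization is a single mutex + condition variable. */
    typedef struct task { void (*run)(void *); void *arg; struct task *next; } task;

    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;
    static task *head, *tail;

    static void enqueue(void (*run)(void *), void *arg) {
        task *t = malloc(sizeof *t);
        if (!t) return;
        *t = (task){ .run = run, .arg = arg, .next = NULL };
        pthread_mutex_lock(&qlock);
        if (tail) tail->next = t; else head = t;
        tail = t;
        pthread_cond_signal(&qcond);
        pthread_mutex_unlock(&qlock);
    }

    static void *worker(void *unused) {
        (void)unused;
        for (;;) {
            pthread_mutex_lock(&qlock);
            while (!head)
                pthread_cond_wait(&qcond, &qlock);
            task *t = head;
            head = t->next;
            if (!head) tail = NULL;
            pthread_mutex_unlock(&qlock);
            t->run(t->arg);             /* run outside the lock */
            free(t);
        }
        return NULL;
    }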

The poller could be used for writes, but right now that would also mean changing how onion_response works, as it has a small (1500-byte) buffer that the onion_response_write family fills, and once filled or flushed it is sent to the user. This is all done in the handler code, or at most right at return. To have the option of a non-blocking write we would need either to keep a bigger buffer, probably unbounded, and do the non-blocking write when the handler code finishes, or (and there is code preparing this direction) to signal a non-blocking mode with a special return code (OCS_NON_BLOCK or similar), setting in the response object a callback to call whenever it's possible to write.

Anyway, as always, time is limited, and unless somebody requests this feature, we will stay with blocking writes and, at most, non-blocking writes for 1500-byte answers... which are normally small enough not to block.
 
> Luckily, for now approach #1 works well (with the patch I posted), even
> if precisely because libonion uses the maybe-fragile approach of
> lock-free sharing of an epoll instance, relying heavily on the one-shot
> property.  Essentially, this property of epoll distributes work between
> the threads much like the ring buffer(s) would, saving the need to
> implement those.  (But this only works for small responses that are
> expected to fit in the kernel's socket write buffers, as well as be
> within TCP window size.)

Yes, it's fragile in the details, but I still think it can provide all we need, and we can make it robust.

Solar Designer

Mar 21, 2016, 06:19:53
To: onion-dev
On Sun, Mar 20, 2016 at 05:52:32AM +0300, Solar Designer wrote:
> Attached is the patch I am currently using.

Further stress-testing (500 threads, timeout 500 ms - so triggering
spurious timeouts when a request just happens to cross a whole second
boundary) revealed that this had an fd leak (which I then reproduced in
less extreme conditions as well). I've attached the corrected version.

Specifically, I've now #if 0'ed this piece:

#if 0
  int i;
  for (i=0;i<nfds;i++){
    onion_poller_slot *el=(onion_poller_slot*)event[i].data.ptr;
    if (cur==el){ // If removed just one with event, make it ignore the event later.
      ONION_DEBUG0("Ignoring event as it timeouted: %d", cur->fd);
      event[i].data.ptr=NULL;
    }
  }
#endif

It was not safe to ignore those events, because they might have been the
last events for their fd's (perhaps if the client closed the connection
at about the same time we detected timeout), and we absolutely have to
handle the last event for each fd, so that we can close the fd. Luckily,
there was little reason for us to special-case this timeout vs. event
collision anyway.

Alexander
onion.diff

Solar Designer

Mar 27, 2016, 13:17:20
To: onion-dev
On Mon, Mar 21, 2016 at 01:19:14PM +0300, Solar Designer wrote:
> On Sun, Mar 20, 2016 at 05:52:32AM +0300, Solar Designer wrote:
> > Attached is the patch I am currently using.
>
> Further stress-testing (500 threads, timeout 500 ms - so triggering
> spurious timeouts when a request just happens to cross a whole second
> boundary) revealed that this had an fd leak (which I then reproduced in
> less extreme conditions as well). I've attached the corrected version.

I made this into a pull request:

https://github.com/davidmoreno/onion/pull/167

I split the patch into two commits. Ideally, it would be three commits,
but I was lazy.

Also, I found that the timeout vs. event race was reported in mid-2015
as issue #116, still open:

https://github.com/davidmoreno/onion/issues/116

This patch fixes it.

Alexander