two bugs


Jay Kreps

Jun 22, 2009, 2:01:06 AM
to project-...@googlegroups.com
Hey Guys,

I found and fixed two pretty serious bugs.

1. It turns out that the version of commons-pool we were using for
connection pooling synchronizes the entire object creation. Since
the object we were creating was a socket connection, this means the
entire creation of the TCP/IP connection was synchronized on a
global lock. The same problem affects DBCP. As a result, if the
connection attempt timed out, the lock was held the entire time,
preventing any new connections. On top of that, we were not setting an
explicit timeout on the connect, and soTimeout is not used during
the connect either. The problem shows up if you try to
connect to a nonexistent IP: you get no response, and no other
connections can be created during that period. The fix is to upgrade
to commons-pool 1.5, which appears to fix the problem.
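
For illustration, an explicit connect timeout could be set along these
lines. This is only a minimal sketch, not the actual Voldemort code; the
class name TimedConnect and its parameters are made up for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Voldemort or commons-pool.
final class TimedConnect {

    // Opens a socket with separate limits for connecting and for reading.
    static Socket open(String host, int port, int connectTimeoutMs, int soTimeoutMs)
            throws IOException {
        Socket socket = new Socket(); // created unconnected
        try {
            // Throws SocketTimeoutException if the connect does not complete in time.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // SO_TIMEOUT only bounds blocking reads, not the connect above.
            socket.setSoTimeout(soTimeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the descriptor on failure
            throw e;
        }
    }
}
```

With something like this, a connect to an unreachable address fails after
connectTimeoutMs instead of waiting for the OS default.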

This is the second time we have gotten badly burned by commons pool,
and I wonder if it doesn't make sense to just swap in a properly
written ConcurrentMap+BlockingQueue as a substitute. Most of the
configuration options commons pool offers seem to do more harm than
good, and the implementation is quite frightening. I have also been
thinking that the dynamic growth of the pool is kind of pointless; it
might make sense to just create all the connections on startup so that
things are immediately in a steady state.
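
A rough sketch of what such a ConcurrentMap+BlockingQueue substitute
might look like, with all resources created up front. This is an
assumption about the shape of the idea, not an existing implementation;
FixedKeyedPool and its method names are invented for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical fixed-size pool keyed by destination; names are illustrative only.
final class FixedKeyedPool<K, V> {
    private final ConcurrentMap<K, BlockingQueue<V>> queues = new ConcurrentHashMap<>();
    private final int sizePerKey;

    FixedKeyedPool(int sizePerKey) {
        this.sizePerKey = sizePerKey;
    }

    // Creates every resource for a key at startup, without holding any pool-wide lock.
    // Sketch only: resources already created are not cleaned up if this fails.
    void prefill(K key, Callable<V> factory) throws Exception {
        BlockingQueue<V> queue = new ArrayBlockingQueue<>(sizePerKey);
        for (int i = 0; i < sizePerKey; i++) {
            queue.add(factory.call());
        }
        if (queues.putIfAbsent(key, queue) != null) {
            throw new IllegalStateException("already prefilled: " + key);
        }
    }

    // Waits up to the given timeout for a free resource.
    V checkout(K key, long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        V resource = queueFor(key).poll(timeout, unit);
        if (resource == null) {
            throw new TimeoutException("no free resource for " + key);
        }
        return resource;
    }

    // Returns a resource to its queue so another caller can use it.
    void checkin(K key, V resource) {
        if (!queueFor(key).offer(resource)) {
            throw new IllegalStateException("pool overfull for " + key);
        }
    }

    private BlockingQueue<V> queueFor(K key) {
        BlockingQueue<V> queue = queues.get(key);
        if (queue == null) {
            throw new IllegalArgumentException("unknown key: " + key);
        }
        return queue;
    }
}
```

The only blocking point is the per-key queue, so a slow connect to one
destination cannot stall checkouts for another.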

2. We discovered that interrupting socket IO does not actually have
any effect. As a result the shutdown() method on the SocketServer was
allowing requests on existing connections to continue while the server
was shutting down. This led to request timeouts during the shutdowns.
The fix is to keep a Map of active sessions and forcefully close the
socket for each active session when shutdown is called to ensure no
active connections are present when we begin shutting down the storage
engines.
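
To make the idea concrete, here is a minimal sketch of a session map
that force-closes sockets on shutdown. It is not the actual
SocketServer change; ActiveSessions and its methods are hypothetical
names. It relies on the documented behavior of Socket.close(), which
makes any thread blocked in IO on that socket throw a SocketException.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry for illustration; not the actual SocketServer code.
final class ActiveSessions {
    private final ConcurrentMap<Long, Socket> sessions = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Called when a connection is accepted; returns an id for later removal.
    long register(Socket socket) {
        long id = nextId.incrementAndGet();
        sessions.put(id, socket);
        return id;
    }

    // Called when a session ends normally.
    void unregister(long id) {
        sessions.remove(id);
    }

    // Called from shutdown(), after the acceptor has stopped taking new connections.
    // Closing a socket unblocks any thread stuck in a read or write on it.
    int closeAll() {
        int closed = 0;
        for (Long id : sessions.keySet()) {
            Socket socket = sessions.remove(id);
            if (socket == null) {
                continue; // the session ended on its own in the meantime
            }
            try {
                socket.close();
                closed++;
            } catch (IOException e) {
                // Best effort: keep closing the remaining sessions.
            }
        }
        return closed;
    }
}
```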

Cheers,

-Jay

ijuma

Jun 22, 2009, 3:04:46 AM
to project-voldemort
Hey Jay,

On Jun 22, 7:01 am, Jay Kreps <jay.kr...@gmail.com> wrote:
> I found and fixed two pretty serious bugs.

Great.

> This is the second time we have gotten badly burned by commons pool,
> and I wonder if it doesn't make sense to just swap in a properly
> written ConcurrentMap+BlockingQueue as a substitute.

It may be the way forward, but do we have tests for the issues with
commons pool that have burned us? If not, we may end up with similar
bugs in our implementation (and different ones, of course). But maybe
with a simpler implementation, this is less likely to happen (which
I believe is your point).

You mentioned in another thread that you generally create a release
after LinkedIn has upgraded to a given version of Voldemort as it
provides regression and load testing. Do you have a rough schedule in
mind for the next release?

Best,
Ismael

Jay Kreps

Jun 22, 2009, 10:32:06 AM
to project-...@googlegroups.com
Yeah it is a fair point. It turns out all the complication comes from
supporting a very configurable pool. I did add a simple integration
test against commons pool that reproduces the problem against 1.4 and
not against 1.5.

-Jay

Jay Kreps

Jun 22, 2009, 10:45:30 AM
to project-...@googlegroups.com
As to releasing, I had planned to do so in a few weeks. However
because of these bugs I am going to try to get it out in the next few
days. At that point we can decide whether it is worth calling it an
official pv release.

-Jay


ijuma

Jun 22, 2009, 1:55:53 PM
to project-voldemort
On Jun 22, 3:45 pm, Jay Kreps <jay.kr...@gmail.com> wrote:
> As to releasing, I had planned to do so in a few weeks. However
> because of these bugs I am going to try to get it out in the next few
> days.

Sounds good.

Ismael

Tatu Saloranta

Jun 22, 2009, 5:55:00 PM
to project-...@googlegroups.com
On Sun, Jun 21, 2009 at 11:01 PM, Jay Kreps<jay....@gmail.com> wrote:
...

> 2. We discovered that interrupting socket IO does not actually have
> any effect. As a result the shutdown() method on the SocketServer was

I think this is platform-dependent, and mostly due to the difficulty of
interrupting the underlying OS-dependent blocking calls.
Worse, it is not implemented (at least not across the board, for
the most commonly used blocking operations) on enough platforms to make it
usable.

So yes, unfortunately one cannot count on thread.interrupt() to wake
up blocked threads in Java. :-/
(I think this was also something that NIO was hoped to help resolve)

-+ Tatu +-

bhupesh bansal

Jun 30, 2009, 4:45:56 PM
to project-...@googlegroups.com
LinkedIn saw some socket issues with commons-pool-1.5.1 that go away after rolling back to commons-pool-1.4.
So far I have not been able to pinpoint the issue or reproduce it in an isolated test case.

Just a heads up: please be cautious when using commons-pool-1.5.1. I will report back and, if needed, revert commons-pool
once we have identified the issue.

Best
Bhupesh

ijuma

Jun 30, 2009, 5:14:16 PM
to project-voldemort
Hi Bhupesh,

On Jun 30, 9:45 pm, bhupesh bansal <bbansal....@gmail.com> wrote:
> LinkedIn saw some socket issues with commons-pool-1.5.1 that go away after
> rolling back to commons-pool-1.4.
> So far I have not been able to pinpoint the issue or reproduce it in an
> isolated test case.
>
> Just a heads up: please be cautious when using commons-pool-1.5.1. I
> will report back and, if needed, revert commons-pool
> once we have identified the issue.

Thanks for the heads-up. Can you describe the symptoms (even if
vaguely)?

Ismael

bhupesh bansal

Jun 30, 2009, 6:34:39 PM
to project-...@googlegroups.com
Ismael,

I have filed a bug with the commons-pool folks to investigate: https://issues.apache.org/jira/browse/POOL-146

I am still trying to come up with a reproducible test case. So far I don't think I have a good test that reproduces it; this is my latest attempt:
http://github.com/voldemort/voldemort/commit/bbcb6e2ec7b5e7f1a44a24d6a92df7b764415164


Best
Bhupesh

bhupesh bansal

Jul 5, 2009, 5:16:58 PM
to project-...@googlegroups.com
Good news folks,

The commons-pool team has identified the issue and closed the bug. I will try out their changes and report back later.

https://issues.apache.org/jira/browse/POOL-146

Best
Bhupesh