Why IOException is thrown when socketChannel.read() returns -1 ?

170 views
Skip to first unread message

ilkinulas

unread,
May 26, 2010, 10:20:31 AM5/26/10
to spymemcached
Hi,

We are continuously seeing the logs below in our application.

2010-05-26 15:29:38.170 INFO net.spy.memcached.MemcachedConnection:
Reconnecting due to exception on {QA sa=/172.28.183.28:2009, #Rops=21,
#Wops=148, #iq=2, topRop=net.spy.memcac
hed.protocol.ascii.StoreOperationImpl@6217525e,
topWop=net.spy.memcached.protocol.ascii.StoreOperationImpl@7a15d86f,
toWrite=0, interested=5}
java.io.IOException: Disconnected
at
net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:
353)
at
net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:
306)
at
net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:
193)
at net.spy.memcached.MemcachedClient.run(MemcachedClient.java:
1458)
2010-05-26 15:29:38.172 WARN net.spy.memcached.MemcachedConnection:
Closing, and reopening {QA sa=/172.28.183.28:2009, #Rops=21,
#Wops=148, #iq=9, topRop=net.spy.memcached.protoc
ol.ascii.StoreOperationImpl@6217525e,
topWop=net.spy.memcached.protocol.ascii.StoreOperationImpl@7a15d86f,
toWrite=0, interested=5}, attempt 0.

Is it necessary to close the connection instead of reusing it ?
Why not only close the connection when IOException is thrown from the
underlying SocketChannel?

Environment :
SunOS 5.10 Generic_142900-02 sun4v sparc SUNW,T5240
spy memcached 2.4.2
memcached : 1.4.3
jdk1.6.0_13

Matt Ingenthron

unread,
May 26, 2010, 12:08:50 PM5/26/10
to spymem...@googlegroups.com
Hi,

This is an INFO level log, and it's probably more the verbosity of the
information that is concerning you rather than anything else. This is
could happen every once in a while.

More below...

spymemcached uses IOException in a great many cases with a SocketChannel
to indicate that the connection needs to be reopened. In this case, the
section of code in question has received an end-of-stream from the TCP
socket, so the simple, safe thing done is throw an IOException which
will cause spy to build a new TCP socket next time there is an operation
for this server.

Unless the rate of these is very high, I wouldn't worry about it. It's
just info logging.

- Matt

ilkinulas

unread,
May 26, 2010, 1:35:37 PM5/26/10
to spymemcached
Thanks for your reply
The application executes approximately 300 transactions per second,
which means nearly 300 cache operations(set,get or delete) in a
second.
While spymemcached client closes and reopens socket connections, the
spy API throws timeout exceptions which are very unpleasant.
Is there a way of telling :
"do not close the undelying socket connection unless an IOException is
thrown by the SocketChannel"

-ilkin

Matt Ingenthron

unread,
May 26, 2010, 4:59:03 PM5/26/10
to spymem...@googlegroups.com
ilkinulas wrote:
> Thanks for your reply
> The application executes approximately 300 transactions per second,
> which means nearly 300 cache operations(set,get or delete) in a
> second.
>

Sorry for not being more specific, but I was actually asking about the
rate of these disconnects you were seeing in the log.

> While spymemcached client closes and reopens socket connections, the
> spy API throws timeout exceptions which are very unpleasant.
> Is there a way of telling :
> "do not close the undelying socket connection unless an IOException is
> thrown by the SocketChannel"
>

Currently, staging a reconnection is perfectly appropriate. I
understand what you're asking for, but spy's MemcachedConnection class
is only throwing the IOException when the read() on the SocketChannel
indicates an end of stream. In spy's operating model, the connection
shouldn't be dropped so it considers this an exception case, though to
Java the socket being closed remotely is not an exception case.

If we did what you suggest, and only reconnect upon SocketChannel's
IOException, then once the SocketChannel appears to be closed by the
remote side (from Java's point of view) everything would timeout.

See:
http://github.com/dustin/java-memcached-client/blob/master/src/main/java/net/spy/memcached/MemcachedConnection.java#L389

If you're seeing more than one of these a second (or maybe every few
seconds or minute) I'd have some concern. The thing to investigate
though is why the connection appears to be getting closed. This
probably means looking into packet capture on both the client and server
side.

In a normal operating case the connection should be persistent between
the client and the server.

With regard to the timeouts, how long are you giving the operation to
succeed? It can take a few milliseconds to build the connection and get
some operations done. Any time there is a disconnection, it's likely
that some operations will be canceled or time out. If it's acceptable
to wait longer, you can certainly retry them a couple of times from the
application level.

- Matt

ilkinulas

unread,
May 28, 2010, 1:51:24 PM5/28/10
to spymemcached
> If you're seeing more than one of these a second (or maybe every few
> seconds or minute) I'd have some concern.  The thing to investigate
> though is why the connection appears to be getting closed.  This
> probably means looking into packet capture on both the client and server
> side.
IOException (thrown when read() returns -1) occurs 3-4 times an hour
during peak hours.
We check the network between the application and memcached server and
could not find any clue explaining these IOExceptions.

Application uses only one MemcachedClient, and when it is disconnected
all the requests fail.
It takes max 2 seconds to establish the connection under traffic,
which means ~500 requests are handled through the database during
reconnection phase.

To handle these temporary disconnections, I will create a pool of
memcached clients.
When a client is disconnected I'will remove it from the pool (until
it is reconnected again.)
So the requests will be routed to the memcached server through another
client( which is connected)
And I hope all the Memcached Clients do not disconnect at the same
time.

Matt Ingenthron

unread,
May 28, 2010, 7:08:19 PM5/28/10
to spymem...@googlegroups.com
Hi,

ilkinulas wrote:
>> If you're seeing more than one of these a second (or maybe every few
>> seconds or minute) I'd have some concern. The thing to investigate
>> though is why the connection appears to be getting closed. This
>> probably means looking into packet capture on both the client and server
>> side.
>>
> IOException (thrown when read() returns -1) occurs 3-4 times an hour
> during peak hours.
> We check the network between the application and memcached server and
> could not find any clue explaining these IOExceptions.
>

While it's not easy to explain, that's not a really high rate and
probably has some mundane explanation.


> Application uses only one MemcachedClient, and when it is disconnected
> all the requests fail.
> It takes max 2 seconds to establish the connection under traffic,
> which means ~500 requests are handled through the database during
> reconnection phase.
>

2 seconds sounds like a very long time to establish these connections.
This may be worth investigating. Perhaps you need more client resources
during these peak periods?


> To handle these temporary disconnections, I will create a pool of
> memcached clients.
> When a client is disconnected I'will remove it from the pool (until
> it is reconnected again.)
> So the requests will be routed to the memcached server through another
> client( which is connected)
> And I hope all the Memcached Clients do not disconnect at the same
> time.
>

Just to offer another option, you can observe connection changes in
spymemcached. I believe it wouldn't be too hard to handle the timeout,
decide if it's because a connection is waiting to be established, then
either block or do something application level intelligent.

A pool of clients is also an option, but whatever is causing the close
of the connection could affect all of them.

Hope that helps,

- Matt

ilkinulas

unread,
May 29, 2010, 3:26:08 PM5/29/10
to spymemcached
Hi again
> A pool of clients is also an option, but whatever is causing the close
> of the connection could affect all of them.

I made a test with a pool of 5 clients, and 1 or 2 of them were
disconnected at a time.
Other connected clients handled the requests during these
disconnections.
This solves my problem.
I will tell if I find something which explains these "end of stream
reached" cases.

--ilkin

Matt Ingenthron

unread,
May 29, 2010, 3:45:47 PM5/29/10
to spymem...@googlegroups.com
Hi,

Excellent! Perhaps in the future we need to consider connection pooling
within spy.

Perhaps it has something to do with workload and timeouts? I wouldn't
think so, as TCP is pretty reliable in this regard. I think you'll have
to look at this with network or platform level tools. To java, it's
just an end of stream.

- Matt

Reply all
Reply to author
Forward
0 new messages