ThriftBroker: Connection reset by peer

Kasey

Apr 6, 2011, 4:26:35 PM
to Hypertable User
I am using Hypertable 0.9.4.3 on a local filesystem (for development
purposes) on Ubuntu. I am using the Java Thrift API with a mutator set
to flush every 5000 ms, and the mutator is closed with flush set to
true at application shutdown.

My application listens for certain types of network data and writes
cells to a table using the set_cell method. My usage patterns vary
greatly throughout the day, from several hundred thousand writes per
minute at peak to a stretch of about 40 minutes with no writes at all.
Over the last several days, each time I hit this low period, I see the
following in the ThriftBroker logs:

Thrift: Tue Apr 5 15:47:16 2011 TSocket::peek() recv() <Host: Port: 0>Connection reset by peer
Thrift: Tue Apr 5 15:47:16 2011 TThreadedServer client died: recv(): Connection reset by peer

followed some time later by:

Thrift: Tue Apr 5 16:41:40 2011 TSocket::peek() recv() <Host: Port: 0>Resource temporarily unavailable
Thrift: Tue Apr 5 16:41:40 2011 TThreadedServer client died: recv(): Resource temporarily unavailable

Further, when the data comes back after the lull, my application
throws the following exception:

ERROR {1} - Problem
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
    at org.hypertable.thriftgen.ClientService$Client.recv_set_cell(ClientService.java:1715)
    at org.hypertable.thriftgen.ClientService$Client.set_cell(ClientService.java:1699)

A restart of my application fixes the problem.
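From the client's side, the trace above is what a blocking read looks like once the server has already closed the connection, and a restart "fixes" it simply because it opens a fresh connection. Below is a minimal plain-socket sketch of that situation; it is not Thrift or Hypertable code, and PeerClosedDemo and callAfterServerClosed are made-up names for illustration. It shows that writing the request still appears to succeed and the failure only surfaces on the read, as either end of stream or a reset. The assumption tying it back to the trace is that Thrift's TIOStreamTransport.read turns an end-of-stream or I/O error from the socket into a TTransportException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

public class PeerClosedDemo {

    /**
     * Opens a loopback connection, lets the "server" close its end, then sends a
     * request and tries to read the reply. Returns "EOF" or "RESET"; both mean the call failed.
     */
    public static String callAfterServerClosed() throws IOException, InterruptedException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo)) {
            Thread broker = new Thread(() -> {
                try (Socket accepted = server.accept()) {
                    // Closing straight away stands in for the broker dropping an idle client.
                } catch (IOException ignored) {
                    // nothing to clean up in this demo
                }
            });
            broker.start();
            try (Socket client = new Socket(lo, server.getLocalPort())) {
                client.setSoTimeout(5_000); // safety net so the demo cannot hang
                broker.join();              // the server side has now closed its end
                OutputStream out = client.getOutputStream();
                out.write(new byte[] {0, 0, 0, 1, 42}); // arbitrary request bytes; the write still succeeds
                out.flush();
                try {
                    int b = client.getInputStream().read();
                    return b == -1 ? "EOF" : "DATA";
                } catch (SocketTimeoutException e) {
                    return "TIMEOUT";
                } catch (SocketException e) {
                    return "RESET"; // the closed peer answered our bytes with RST
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reply to a request on a closed connection: " + callAfterServerClosed());
    }
}
```

Whether the read reports end of stream or a reset depends on timing between the server's FIN and its RST; either way no reply arrives.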

I was able to reproduce this in a test that:
- Creates a connection.
- Creates a namespace.
- Creates the table.
- Opens the mutator (with a 5000 ms flush interval).
- Inserts 1 record into the table.
- Sleeps for 30 minutes (note: a 25-minute sleep does not cause an error).
- Inserts 1 record into the table.
- Sleeps for 10 seconds.
- Flushes the mutator.
- Closes the mutator.
- Drops the table.
- Drops the namespace.
- Closes the connection.
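The steps above reduce to one write, a long idle gap, and a second write. Here is a minimal plain-socket sketch of that pattern with minutes scaled down to milliseconds. A hypothetical echo server stands in for the ThriftBroker, and serverRecvTimeoutMs is an assumed stand-in for a server-side receive timeout (the cause Doug identifies later in the thread). None of this is Hypertable or Thrift code; IdleTimeoutRepro and its methods are illustrative names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class IdleTimeoutRepro {

    /**
     * One request, an idle gap of idleMs, then a second request.
     * serverRecvTimeoutMs models the server's receive timeout (0 = no timeout).
     * Returns true if the second request still gets its reply.
     */
    public static boolean secondRequestSucceeds(int serverRecvTimeoutMs, long idleMs)
            throws IOException, InterruptedException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo)) {
            Thread broker = new Thread(() -> {
                try (Socket conn = server.accept()) {
                    conn.setSoTimeout(serverRecvTimeoutMs); // 0 means "block forever"
                    InputStream in = conn.getInputStream();
                    OutputStream out = conn.getOutputStream();
                    int b;
                    while ((b = in.read()) != -1) { // echo each byte back
                        out.write(b);
                        out.flush();
                    }
                } catch (IOException e) {
                    // A SocketTimeoutException lands here: the server gives up on the
                    // idle client and closes the connection ("client died").
                }
            });
            broker.setDaemon(true);
            broker.start();
            try (Socket client = new Socket(lo, server.getLocalPort())) {
                client.setSoTimeout(5_000); // safety net so the demo cannot hang
                if (!roundTrip(client, 1)) { // first insert
                    return false;
                }
                Thread.sleep(idleMs);        // the lull
                return roundTrip(client, 2); // second insert
            }
        }
    }

    /** Sends one byte and checks that the same byte comes back. */
    private static boolean roundTrip(Socket socket, int value) {
        try {
            OutputStream out = socket.getOutputStream();
            out.write(value);
            out.flush();
            return socket.getInputStream().read() == value;
        } catch (IOException e) {
            return false; // end of stream, reset, or client-side timeout
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("short idle, 200 ms timeout: " + secondRequestSucceeds(200, 50));
        System.out.println("long idle, 200 ms timeout:  " + secondRequestSucceeds(200, 600));
        System.out.println("long idle, no timeout:      " + secondRequestSucceeds(0, 600));
    }
}
```

With a 200 ms server timeout, a 50 ms gap should still get its reply while a 600 ms gap should not. Passing 0 disables the timeout (Socket.setSoTimeout treats 0 as infinite), so the long gap succeeds again.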

In this case I get the same messages in the ThriftBroker logs, but my
client application throws the above exception at the drop namespace
call.

I was beginning to suspect some sort of TCP/IP timeout issue when
today I encountered this error in the middle of a busy period
(~100K writes per minute). With today's problem I got the same errors
in the ThriftBroker log, but my client application did not throw an
exception; it just silently stopped processing data. It is possible
that this last part was due to a thread-management issue on my side (I
am still investigating).

This error message is not terribly clear to me. Are there known issues
with the Java API and timeouts/socket management? If this is a timeout
issue, could I fix it with a keep-alive message of some sort, and is
that the recommended approach? And if it is a timeout issue, what
explains the similar problem in the middle of a busy period?
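One way to act on the keep-alive idea is an application-level ping: a cheap request sent over the same connection at an interval comfortably shorter than the server's idle timeout. TCP keep-alive probes carry no application data, so they would not be expected to reset a server-side receive timeout. The Wireshark capture reported later in this thread shows keep-alive traffic right up to the disconnect. The sketch below is hypothetical: KeepAlive is not a Hypertable or Thrift class, and which call to use as the ping is left to the caller. Because generated Thrift clients are not thread-safe, the ping runs under a lock that every other call on the connection must also hold.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical application-level keep-alive (not part of Hypertable or Thrift).
 * Periodically runs a cheap request on the shared connection so the server never
 * sees it idle for longer than its receive timeout.
 */
public final class KeepAlive implements AutoCloseable {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(task -> {
                Thread t = new Thread(task, "keep-alive");
                t.setDaemon(true); // do not keep the JVM alive just for pings
                return t;
            });

    /**
     * @param connectionLock lock that every other call on the same connection also holds,
     *                       because a generated Thrift client must not be used concurrently
     * @param ping           the cheap request to send (choice of call is up to the caller)
     * @param intervalMs     ping period; keep it well below the server's idle timeout
     */
    public KeepAlive(Object connectionLock, Runnable ping, long intervalMs) {
        scheduler.scheduleAtFixedRate(() -> {
            synchronized (connectionLock) {
                try {
                    ping.run();
                } catch (RuntimeException e) {
                    // A periodic task that throws is never run again, so swallow the
                    // failure here; a real client would log it and reconnect.
                }
            }
        }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        int[] count = {0};
        try (KeepAlive keepAlive = new KeepAlive(lock, () -> count[0]++, 50)) {
            Thread.sleep(300);
        }
        synchronized (lock) {
            System.out.println("pings sent: " + count[0]);
        }
    }
}
```

A keep-alive only works around an idle timeout; it would not by itself explain a failure during a busy period.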

Thanks,
Kasey

Doug Judd

Apr 6, 2011, 6:13:09 PM
to hyperta...@googlegroups.com, Kasey
Hi Kasey,

Thanks for the detailed error report.  We'll try to reproduce this on our end and get to the bottom of it.

- Doug


--
You received this message because you are subscribed to the Google Groups "Hypertable User" group.
To post to this group, send email to hyperta...@googlegroups.com.
To unsubscribe from this group, send email to hypertable-us...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/hypertable-user?hl=en.


Doug Judd

Apr 7, 2011, 1:30:07 AM
to hyperta...@googlegroups.com, Kasey
Hi Kasey,

What does your setup look like?  Do you have your client program communicating with the ThriftBroker over the network, or is this all on a single host?  Also, do you have iptables running?  If so, take a look at this post regarding iptables connection timeouts: http://documents.made-it.com/iptables-timeout.html

- Doug

Kasey

Apr 7, 2011, 3:31:40 PM
to Hypertable User
Everything is running on the same box. I am running iptables, and
following the advice in that post, I tried changing my TCP keep-alive
settings. For testing purposes I ramped the keep-alive way up and ran
my test several times while capturing the traffic with Wireshark. A few
things came out of this testing:

1. In my test, the error is not actually happening on drop namespace;
it happens on set_cell, the same as in my application. My
exception-handling code was obscuring this.

2. The timeout is consistently around 26.5 minutes.

3. My test is actually using 2 ports. I assume one is for reading and
one is for writing and that this is normal behavior.

4. In the Wireshark capture (I can provide the file if that is
convenient), you can see my first push, then 26.5 minutes of TCP
keep-alive messages, then a [FIN, ACK] sent from the server port
(38080) to both client ports. About 30 seconds later (matching my
27-minute timeout) comes my set_cell attempt, which now clearly fails.

Any advice?

I'm not sure if it is important, but I also have other clients on
other machines on the network doing read-only operations against
Hypertable, though they are never idle long enough to hit a long
timeout.

Kasey

Doug Judd

Apr 8, 2011, 8:25:15 AM
to hyperta...@googlegroups.com, Kasey
Hi Kasey,

Can you send me your test program?  I'd like to be sure to replicate the problem exactly.  Also, it would be good to know whether the problem persists in 0.9.5.0.pre, which is built with a newer version of Thrift (0.6.0).

- Doug

Doug Judd

Apr 8, 2011, 8:56:57 PM
to hyperta...@googlegroups.com, Kasey
Hi Kasey,

I was able to reproduce this problem with your test program and believe I have tracked down the cause.  There is a ThriftBroker.Timeout config property which defaults to 1600 seconds.  This value gets passed into TServerSocket, which in turn sets the receive timeout on the socket (SO_RCVTIMEO) to this value, and that is what causes recvfrom to time out.  I've modified the code to have no timeout by default and I'm running the test now.  If all goes well, I'll get the fix into the 0.9.5.0 "pre2" release, which should go out on Monday.
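The roughly 26.5-minute timeout Kasey measured lines up with a 1600-second default; a quick conversion:

```latex
\[
  t_{\text{timeout}} = \frac{1600\ \text{s}}{60\ \text{s/min}} \approx 26.67\ \text{min} = 26\ \text{min}\ 40\ \text{s}
\]
\[
  25\ \text{min} \;<\; t_{\text{timeout}} \;<\; 30\ \text{min}
\]
```

That also fits the original repro: a 25-minute sleep stays under the limit, while a 30-minute sleep exceeds it.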

- Doug
