Doubts on buffer handling (Buffers, message duplicates, EAGAIN and timeouts)

Stefano Alberto Russo

Sep 19, 2011, 11:26:28 AM
to Scribe Server
Hi all,

I have been testing Scribe here at CERN for six months, and thanks
to Gautam and Travis I was able to configure it perfectly for our
needs (thank you again). Right now it is installed on the main storage
cluster, where it collects the logs generated by our storage manager.
There are roughly 1500 Scribe clients writing to a main Scribe
aggregator, which stores the logs locally and on Hadoop and forwards
them to another server (also via Scribe) for real-time analysis.

I have run into an annoying problem several times. I can more or
less handle it by tuning some configuration parameters, but since I
think the issue goes deeper than parameter tuning, I would
like to discuss it.

The scenario is a network or system failure in which one or more Scribe
clients can no longer communicate with the next Scribe in the chain
(in my case, the Scribe aggregator, the Scribe Hadoop instance,
or the Scribe server for real-time analysis).

When this happens, the client starts writing to the buffer file and keeps
trying to connect to the Scribe server. Once the Scribe server is
online again, the client tries to send a buffer file of at most
'max_size' every 'check_interval'. The problem is that if the buffer
file cannot be transferred before the 'timeout', an EAGAIN error is
generated:

[Mon Sep 19 16:07:53 2011] "Failed to send <8083064> messages to
remote scribe server <scribeaggregator:1464> " "error <EAGAIN (timed
out)>"

This error causes the client to stop the transfer and to retry
sending the *entire* buffer file after a 'retry_interval'. In the
meantime, however, part of the file has already been written on the
server before the timeout expired. Now there are two possible outcomes:

1) The file could not be transferred before the timeout because of
network congestion: the next time the client tries to send the
buffer it will be transferred successfully, but this will leave
duplicate messages on the server.

2) The file could not be transferred before the timeout because the
network is simply too slow for the configured 'max_size' and
'timeout'. This leads to a never-ending loop in which the same
initial part of the first buffer file is written to the server over
and over again.
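
The two outcomes above can be illustrated with a toy model. This is only a sketch of the behaviour described in this post, not Scribe's actual code: the function name `replay_buffer` and the idea of a fixed bandwidth per attempt are assumptions made for the example.

```python
def replay_buffer(file_bytes, bandwidth_per_attempt, timeout_s):
    """Toy model: the sender restarts the whole file after every timeout.

    bandwidth_per_attempt lists the bytes/second available on each
    successive attempt (a hypothetical input, not a Scribe parameter).
    Returns (delivered, bytes_written_downstream, attempts_used).
    """
    written_downstream = 0
    for attempt, bandwidth in enumerate(bandwidth_per_attempt, start=1):
        sendable = bandwidth * timeout_s  # what fits before the timeout fires
        if sendable >= file_bytes:
            written_downstream += file_bytes
            return True, written_downstream, attempt
        # The partial data has already landed downstream and is not rolled back.
        written_downstream += sendable
    return False, written_downstream, len(bandwidth_per_attempt)


# Case 1: congestion on the first attempt only, so the retry succeeds,
# but the server has received more bytes than the file contains.
congested = replay_buffer(100, [10, 50], timeout_s=5)

# Case 2: the link is always too slow, so every attempt times out.
too_slow = replay_buffer(100, [10, 10, 10, 10], timeout_s=5)
```

In case 1 the difference between the bytes written downstream and the file size is the duplicated data; in case 2 the transfer never completes, however many attempts are made.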


I ran into the first case after a network outage (and the consequent
resynchronization of many services over the network, which caused
congestion) and into the second one because I didn't pay enough
attention when I installed the Scribe server for real-time analysis.

Now what I would like to ask is: is this a known problem (or limitation)
of Scribe, or did I miss something in the philosophy of how to use it?

Thank you,
Stefano Alberto Russo.

Gautam Roy

Sep 21, 2011, 1:57:53 AM
to scribe...@googlegroups.com
Yes, what you mention is a known limitation. Your best option is to keep the max_size small and use the buffer_send_rate parameter to increase the number of files attempted to transfer every check_interval.

e.g. if currently you have max_size=10000000 and buffer_send_rate=1 try changing it to max_size=1000000 and buffer_send_rate=10. 

Are you using scribe 2.2?
This simple patch in the master branch: https://github.com/facebook/scribe/commit/0f32992c27a52fe19f26a521c9f...
was introduced to alleviate this situation. I.e. in scribe 2.2, scribe will attempt to transfer the entire file, wasting network bandwidth when the file is eventually rejected via a 'denied for queue size'. With this patch, scribe first tests the downstream queues by sending an empty message to see whether the downstream is accepting messages before attempting to transfer the entire file.

Best,
Gautam

Stefano Alberto Russo

Oct 28, 2011, 6:07:31 AM
to Scribe Server, gauta...@gmail.com
Thank you for the reply. I've tested it a bit more over the last month;
here are my observations:

On Sep 21, 7:57 am, Gautam Roy <gautam....@gmail.com> wrote:
> Yes, what you mention is a known limitation. Your best option is to keep the
> max_size small and use the buffer_send_rate parameter to increase the number
> of files attempted to transfer every check_interval.
>
> e.g. if currently you have max_size=10000000 and buffer_send_rate=1 try
> changing it to max_size=1000000 and buffer_send_rate=10.

...but what is the difference between this approach and just setting a
timeout 10 times longer? I found no difference, except that if the
transfer fails because of a temporary problem (and not an intrinsic
one like the one I ran into), there are fewer duplicated messages.
Am I right?
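
A rough way to see the duplicate-message point: if a transient failure interrupts the replay at some offset into the backlog, only the file in flight has to be restarted. The sketch below assumes that files already sent in full are not sent again; that assumption, and the name `duplicated_bytes`, are illustrative and not taken from Scribe itself.

```python
def duplicated_bytes(failure_offset, file_size):
    """Bytes re-sent after a failure at failure_offset into the backlog,
    assuming only the in-flight file is restarted from its beginning."""
    return failure_offset % file_size


# A failure hits after 7.5 MB of a 10 MB backlog has been sent.
one_big_file = duplicated_bytes(7_500_000, 10_000_000)    # max_size=10000000
ten_small_files = duplicated_bytes(7_500_000, 1_000_000)  # max_size=1000000
```

Under that assumption the single large file repeats everything sent so far, while the smaller files repeat only the part of the current file that was already on the wire.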


> Are you using scribe 2.2?
> This simple patch in the master branch: https://github.com/facebook/scribe/commit/0f32992c27a52fe19f26a521c9f...
> was introduced to alleviate this situation. i.e. in scribe 2.2 scribe will
> attempt to transfer the entire file, causing waste of network bandwidth when
> the file is eventually rejected via a 'denied for queue size'. With this
> patch, scribe first tests the downstream queues by sending an empty message
> to see whether the downstream is accepting messages before attempting to
> transfer the entire file.

Yes, I applied this patch a long time ago. In any case, I ran into an
intrinsic network problem, not a Scribe one. The default values
are:

timeout: 5 secs
file_size: ~1 GB

so if there is a fairly long network outage and the buffer fills up
to 1 GB, by default Scribe tries to transfer this 1 GB in 5
seconds, which means a minimum required bandwidth of 1.6 Gbit/s
for every concurrent transfer to avoid running into EAGAIN errors
that cause loops and duplicate messages. I really think that these
defaults should be reviewed.
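
The arithmetic behind that figure, as a small sketch. The helper names are made up for the example, and 1 GB is taken as 10^9 bytes; the second function turns the question around and gives the largest file that fits in the timeout on a given link.

```python
def min_bandwidth_gbit_per_s(file_size_bytes, timeout_s):
    """Bandwidth needed to push the whole file before the timeout."""
    return file_size_bytes * 8 / timeout_s / 1e9


def max_file_size_bytes(bandwidth_bit_per_s, timeout_s):
    """Largest file that can be sent within the timeout on a given link."""
    return bandwidth_bit_per_s * timeout_s / 8


# Defaults quoted above: ~1 GB file, 5 second timeout.
required = min_bandwidth_gbit_per_s(1_000_000_000, 5)

# On a 100 Mbit/s link with the same 5 second timeout.
largest = max_file_size_bytes(100_000_000, 5)
```

With the quoted defaults `required` comes out at 1.6 Gbit/s per transfer, and on a 100 Mbit/s link the file would have to stay below about 62.5 MB, ignoring protocol overhead and any competing traffic.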


But the most important point in the whole story is that, in my
opinion, one expects some kind of heartbeat monitoring during the
transfer, so that the timeout occurs only if there is no response from
the other side, not while a transfer is in progress. We should write
about this behaviour in capital letters in the Scribe documentation! :)

Thank you and cheers,
Stefano

