Hi,
We use the following type of configuration when writing to HDFS files.
It is basically a buffer store with another buffer store as its primary.
We use two HDFS clusters on the same physical cluster to deal with the
namenode being a single point of failure.
With the replay_buffer=no configuration, scribe does not try to
transfer data from the secondary back to the primary when, say,
hdfs://dfsscribe3 comes back up after being down for a while.
So the behavior becomes: try writing to the first HDFS cluster;
failing that, try writing to the second HDFS cluster; and if both fail,
buffer on local disk.
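That fallback order can be sketched roughly as follows. This is illustrative Python, not scribe's actual C++ store code, and the store classes and function name are invented for the example:

```python
# Illustrative sketch of the buffer-in-buffer fallback described above.
# The store classes are stand-ins invented for this example, not scribe's
# actual stores.

class HdfsStore:
    """Pretend HDFS writer; raises IOError when its cluster is down."""
    def __init__(self, url, up=True):
        self.url, self.up, self.data = url, up, []

    def write(self, msg):
        if not self.up:
            raise IOError("cannot reach " + self.url)
        self.data.append(msg)

class LocalBuffer:
    """Pretend local-disk buffer; always succeeds."""
    def __init__(self):
        self.data = []

    def write(self, msg):
        self.data.append(msg)

def write_with_failover(msg, primary, secondary, local_buffer):
    """Try the primary cluster, then the secondary, then local disk."""
    for name, store in (("primary", primary), ("secondary", secondary)):
        try:
            store.write(msg)
            return name          # this cluster accepted the message
        except IOError:
            continue             # cluster unreachable; fall through
    local_buffer.write(msg)      # both clusters down: buffer locally
    return "local"

if __name__ == "__main__":
    p = HdfsStore("hdfs://dfsscribe3:9000", up=False)  # primary down
    s = HdfsStore("hdfs://dfsscribe4:9000")
    print(write_with_failover("a log line", p, s, LocalBuffer()))
```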
With this kind of setup you will have to set up the right copier scripts
to collect your data from two logical clusters.
Hope that helps,
Gautam
port=1456
max_msg_per_second=1000000
check_interval=1
max_queue_size=100000000
num_thrift_server_threads=3
# DEFAULT
<store>
  category=default
  type=buffer
  max_write_interval=1
  retry_interval=120
  buffer_send_rate=5
  must_succeed=yes
  <primary>
    type=buffer
    retry_interval=600
    replay_buffer=no
    <primary>
      type=file
      fs_type=hdfs
      file_path=hdfs://dfsscribe3:9000/user/scribe
      create_symlink=no
      use_hostname_sub_directory=yes
      base_filename=thisisoverwritten
      max_size=1000000000
      rotate_period=hourly
      add_newlines=1
      write_stats=no
      rotate_on_reopen=yes
    </primary>
    <secondary>
      type=file
      fs_type=hdfs
      file_path=hdfs://dfsscribe4:9000/user/scribe
      create_symlink=no
      use_hostname_sub_directory=yes
      base_filename=thisisoverwritten
      max_size=1000000000
      rotate_period=hourly
      add_newlines=1
      write_stats=no
      rotate_on_reopen=yes
    </secondary>
  </primary>
  <secondary>
    type=file
    file_path=/mnt/d0/scribe
    base_filename=thisisoverwritten
    max_size=40000000
  </secondary>
</store>
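For the copier scripts mentioned above, something along these lines could work. This is a minimal sketch, assuming the `hadoop fs -copyToLocal` shell command and the paths/ports from the config; the function name and the injectable `run` parameter are my own additions so the logic can be exercised without a real Hadoop installation:

```python
# Hypothetical copier sketch: with replay_buffer=no, messages stay on
# whichever cluster received them, so a collection job has to pull from
# BOTH logical clusters. Cluster URLs and the /user/scribe path come from
# the config above.
import subprocess

CLUSTERS = ["hdfs://dfsscribe3:9000", "hdfs://dfsscribe4:9000"]

def copy_from_clusters(dest, run=subprocess.call):
    """Invoke `hadoop fs -copyToLocal` once per cluster.

    A cluster that is down simply makes its copy command exit non-zero,
    which we ignore here; returns the commands issued.
    """
    cmds = []
    for nn in CLUSTERS:
        cmd = ["hadoop", "fs", "-copyToLocal", nn + "/user/scribe", dest]
        cmds.append(cmd)
        run(cmd)  # non-zero exit (cluster unreachable) is not fatal
    return cmds
```

In practice you would run something like this from cron, then merge the per-host subdirectories (created by use_hostname_sub_directory=yes) downstream.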
On 5/7/10 4:07 PM, Travis Crawford wrote:
> On Fri, May 7, 2010 at 8:09 AM, Wouter de Bie <pru...@gmail.com> wrote:
>> Hi all,
>>
>> We're currently having some problems when writing to HDFS if the
>> connection to the namenode becomes unavailable. hdfsWrite() always
>> returns the number of bytes written, even if it never actually wrote
>> them. The HDFS client tries to reconnect, retrying for 15 minutes.
>> This is done in Client.java, line 307:
>>
>>     } catch (SocketTimeoutException toe) {
>>       /* The max number of retries is 45,
>>        * which amounts to 20s*45 = 15 minutes retries.
>>        */
>>       handleConnectionFailure(timeoutFailures++, 45, toe);
>>     }
>>
>>
>> There is some code in hdfs.c that tries to catch an exception from the
>> Java client, but it seems it never gets that exception (or maybe only
>> after the 15 minutes of retries). This is in hdfs.c, line 1005:
>>
>> if (invokeMethod(env, NULL, &jExc, INSTANCE, jOutputStream,