Sure thing! Hopefully the highlighting comes through in logs below, as that helps identify the important parts. Let me know if you have any questions about how/why/wtf, or Zookeeper stuff if you're not familiar.
The following logs and examples are from my testing of the following two patches:
In addition to the following tests, I run this in production on at least several hundred machines.
All testing thus far has been from looking at logs, and verifying the logs look correct in the analysis system.
######################################################################
### Test case: Zookeeper already running, scribe registers itself. ###
######################################################################
SUMMARY
At startup, a new ZKClient object is created, and because the config has a zk_registration_prefix a Zookeeper connection is established. When the connection and session are established we receive a callback which then registers the task in Zookeeper. Note the parent directories are created if necessary and an ephemeral registration node in host:port format is created.
LOGS
[travis@hoth ~/src/scribe]$ cat zk-agg.conf
port=4444
zk_server=localhost:2181
zk_registration_prefix=/scribe/aggregator
<store>
category=travis
type=null
</store>
[travis@hoth ~/src/scribe]$ src/scribed -c zk-agg.conf
[Thu May 6 11:34:25 2010] "setrlimit error (setting max fd size)"
[Thu May 6 11:34:25 2010] "STATUS: STARTING"
[Thu May 6 11:34:25 2010] "STATUS: configuring"
[Thu May 6 11:34:25 2010] "got configuration data from file <zk-agg.conf>"
[Thu May 6 11:34:25 2010] "Creating new ZKClient."
2010-05-06 11:34:25,744:9223(0x7fff70a75be0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.0
2010-05-06 11:34:25,744:9223(0x7fff70a75be0):ZOO_INFO@log_env@662: Client environment:
host.name=hoth.local
2010-05-06 11:34:25,744:9223(0x7fff70a75be0):ZOO_INFO@log_env@669: Client environment:
os.name=Darwin
2010-05-06 11:34:25,744:9223(0x7fff70a75be0):ZOO_INFO@log_env@670: Client environment:os.arch=10.3.0
2010-05-06 11:34:25,744:9223(0x7fff70a75be0):ZOO_INFO@log_env@671: Client environment:os.version=Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386
2010-05-06 11:34:25,744:9223(0x7fff70a75be0):ZOO_INFO@log_env@679: Client environment:
user.name=travis
2010-05-06 11:34:25,745:9223(0x7fff70a75be0):ZOO_INFO@log_env@687: Client environment:user.home=/Users/travis
2010-05-06 11:34:25,745:9223(0x7fff70a75be0):ZOO_INFO@log_env@699: Client environment:user.dir=/Users/travis/src/scribe
2010-05-06 11:34:25,745:9223(0x7fff70a75be0):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x10004b3b0 sessionId=0 sessionPasswd=<null> context=0x0 flags=0
2010-05-06 11:34:25,746:9223(0x7fff70a75be0):ZOO_DEBUG@start_threads@152: starting threads...
2010-05-06 11:34:25,746:9223(0x100481000):ZOO_DEBUG@do_completion@326: started completion thread
[Thu May 6 11:34:25 2010] "CATEGORY : travis"
2010-05-06 11:34:25,747:9223(0x1002c0000):ZOO_DEBUG@do_io@279: started IO thread
[Thu May 6 11:34:25 2010] "configured <1> stores"
2010-05-06 11:34:25,748:9223(0x1002c0000):ZOO_INFO@check_events@1585: initiated connection to server [::1:2181]
[Thu May 6 11:34:25 2010] "store thread starting"
[Thu May 6 11:34:25 2010] "STATUS: "
[Thu May 6 11:34:25 2010] "STATUS: ALIVE"
[Thu May 6 11:34:25 2010] "Starting scribe server on port 4444"
Thrift: Thu May 6 11:34:25 2010 libevent 1.4.13-stable method kqueue
2010-05-06 11:34:25,755:9223(0x1002c0000):ZOO_INFO@check_events@1632: session establishment complete on server [::1:2181], sessionId=0x1286ee218490001, negotiated timeout=10000
2010-05-06 11:34:25,755:9223(0x1002c0000):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2010-05-06 11:34:25,755:9223(0x100481000):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
[Thu May 6 11:34:25 2010] "Registering task in Zookeeper."
2010-05-06 11:34:25,755:9223(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30bb2 for path [/scribe] to ::1:2181
2010-05-06 11:34:25,756:9223(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30bb2 rc=-110
2010-05-06 11:34:25,756:9223(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30bb3 for path [/scribe/aggregator] to ::1:2181
2010-05-06 11:34:25,757:9223(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30bb3 rc=-110
2010-05-06 11:34:25,757:9223(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30bb4 for path [/scribe/aggregator/hoth.local:4444] to ::1:2181
2010-05-06 11:34:25,758:9223(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30bb4 rc=0
2010-05-06 11:34:29,092:9223(0x1002c0000):ZOO_DEBUG@zookeeper_process@1974: Got ping response in 0 ms
^C
[travis@hoth ~/src/scribe]$
####################################################################################
### Test case: Znode already exists, set watch and register when node goes away. ###
####################################################################################
SUMMARY
In the case where scribe restarts quickly the previous ephemeral node may not have been garbage collected yet, so we set a watch on that znode and register when it goes away. Scribe can still receive messages during this time, and technically its still discoverable since clients would see the stale ephemeral znode.
LOGS
2010-05-06 11:41:45,997:9251(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30d6a for path [/scribe/aggregator/hoth.local:4444] to fe80::1:2181
2010-05-06 11:41:45,998:9251(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30d6a rc=0
2010-05-06 11:41:49,331:9251(0x1002c0000):ZOO_DEBUG@zookeeper_process@1974: Got ping response in 0 ms
^C
[travis@hoth ~/src/scribe]$ src/scribed -c zk-agg.conf
[Thu May 6 11:41:51 2010] "setrlimit error (setting max fd size)"
[Thu May 6 11:41:51 2010] "STATUS: STARTING"
[Thu May 6 11:41:51 2010] "STATUS: configuring"
[Thu May 6 11:41:51 2010] "got configuration data from file <zk-agg.conf>"
[Thu May 6 11:41:51 2010] "Creating new ZKClient."
2010-05-06 11:41:51,270:9252(0x7fff70a75be0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.0
2010-05-06 11:41:51,270:9252(0x7fff70a75be0):ZOO_INFO@log_env@662: Client environment:
host.name=hoth.local
2010-05-06 11:41:51,270:9252(0x7fff70a75be0):ZOO_INFO@log_env@669: Client environment:
os.name=Darwin
2010-05-06 11:41:51,270:9252(0x7fff70a75be0):ZOO_INFO@log_env@670: Client environment:os.arch=10.3.0
2010-05-06 11:41:51,270:9252(0x7fff70a75be0):ZOO_INFO@log_env@671: Client environment:os.version=Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386
2010-05-06 11:41:51,270:9252(0x7fff70a75be0):ZOO_INFO@log_env@679: Client environment:
user.name=travis
2010-05-06 11:41:51,271:9252(0x7fff70a75be0):ZOO_INFO@log_env@687: Client environment:user.home=/Users/travis
2010-05-06 11:41:51,271:9252(0x7fff70a75be0):ZOO_INFO@log_env@699: Client environment:user.dir=/Users/travis/src/scribe
2010-05-06 11:41:51,271:9252(0x7fff70a75be0):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x10004b3b0 sessionId=0 sessionPasswd=<null> context=0x0 flags=0
2010-05-06 11:41:51,271:9252(0x7fff70a75be0):ZOO_DEBUG@start_threads@152: starting threads...
2010-05-06 11:41:51,271:9252(0x100481000):ZOO_DEBUG@do_completion@326: started completion thread
2010-05-06 11:41:51,271:9252(0x1002c0000):ZOO_DEBUG@do_io@279: started IO thread
[Thu May 6 11:41:51 2010] "CATEGORY : travis"
[Thu May 6 11:41:51 2010] "configured <1> stores"
[Thu May 6 11:41:51 2010] "STATUS: "
[Thu May 6 11:41:51 2010] "STATUS: ALIVE"
[Thu May 6 11:41:51 2010] "Starting scribe server on port 4444"
2010-05-06 11:41:51,272:9252(0x1002c0000):ZOO_INFO@check_events@1585: initiated connection to server [
127.0.0.1:2181]
[Thu May 6 11:41:51 2010] "store thread starting"
Thrift: Thu May 6 11:41:51 2010 libevent 1.4.13-stable method kqueue
2010-05-06 11:41:51,282:9252(0x1002c0000):ZOO_INFO@check_events@1632: session establishment complete on server [
127.0.0.1:2181], sessionId=0x1286ee218490003, negotiated timeout=10000
2010-05-06 11:41:51,283:9252(0x1002c0000):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2010-05-06 11:41:51,283:9252(0x100481000):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
[Thu May 6 11:41:51 2010] "Registering task in Zookeeper."
2010-05-06 11:41:51,283:9252(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30d70 for path [/scribe] to
127.0.0.1:21812010-05-06 11:41:51,285:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30d70 rc=-110
2010-05-06 11:41:51,285:9252(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30d71 for path [/scribe/aggregator] to
127.0.0.1:21812010-05-06 11:41:51,287:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30d71 rc=-110
2010-05-06 11:41:51,287:9252(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30d72 for path [/scribe/aggregator/hoth.local:4444] to
127.0.0.1:21812010-05-06 11:41:51,289:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30d72 rc=-110
2010-05-06 11:41:51,299:9252(0x100481000):ZOO_DEBUG@zoo_awexists@2573: Sending request xid=0x4be30d73 for path [/scribe/aggregator/hoth.local:4444] to
127.0.0.1:21812010-05-06 11:41:51,342:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@2016: Calling COMPLETION_STAT for xid=0x4be30d73 rc=0
[Thu May 6 11:41:51 2010] "Set watch on znode that already exists: /scribe/aggregator/hoth.local:4444"
2010-05-06 11:41:54,633:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@1974: Got ping response in 0 ms
2010-05-06 11:41:57,967:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@1974: Got ping response in 0 ms
2010-05-06 11:42:00,015:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@1916: Processing WATCHER_EVENT
2010-05-06 11:42:00,015:9252(0x100481000):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [/scribe/aggregator/hoth.local:4444], type = -1 event=ZOO_DELETED_EVENT
[Thu May 6 11:42:00 2010] "Registering task in Zookeeper."
2010-05-06 11:42:00,015:9252(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30d74 for path [/scribe] to
127.0.0.1:21812010-05-06 11:42:00,016:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30d74 rc=-110
2010-05-06 11:42:00,016:9252(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30d75 for path [/scribe/aggregator] to
127.0.0.1:21812010-05-06 11:42:00,017:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30d75 rc=-110
2010-05-06 11:42:00,017:9252(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be30d76 for path [/scribe/aggregator/hoth.local:4444] to
127.0.0.1:21812010-05-06 11:42:00,018:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be30d76 rc=0
2010-05-06 11:42:03,352:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@1974: Got ping response in 0 ms
2010-05-06 11:42:06,686:9252(0x1002c0000):ZOO_DEBUG@zookeeper_process@1974: Got ping response in 0 ms
^C
[travis@hoth ~/src/scribe]$
################################################################################################
### Test case: Zookeeper down when scribe starts, Scribe registers when Zookeeper is started ###
################################################################################################
SUMMARY
The Zookeeper client library takes care of retries when Zookeeper server is not available. More specifically, the client starts it its ZOO_CONNECTING_STATE, and the library retries when in this state. This is the same flow when a ZK server goes away -- the state machines goes back to ZOO_CONNECTING_STATE and we follow the same reconnect flow.
LOGS
[travis@hoth ~/src/scribe]$ src/scribed -c zk-agg.conf
[Thu May 6 11:56:04 2010] "setrlimit error (setting max fd size)"
[Thu May 6 11:56:04 2010] "STATUS: STARTING"
[Thu May 6 11:56:04 2010] "STATUS: configuring"
[Thu May 6 11:56:04 2010] "got configuration data from file <zk-agg.conf>"
[Thu May 6 11:56:04 2010] "Creating new ZKClient."
2010-05-06 11:56:04,476:9291(0x7fff70a75be0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.0
2010-05-06 11:56:04,476:9291(0x7fff70a75be0):ZOO_INFO@log_env@662: Client environment:
host.name=hoth.local
2010-05-06 11:56:04,476:9291(0x7fff70a75be0):ZOO_INFO@log_env@669: Client environment:
os.name=Darwin
2010-05-06 11:56:04,476:9291(0x7fff70a75be0):ZOO_INFO@log_env@670: Client environment:os.arch=10.3.0
2010-05-06 11:56:04,476:9291(0x7fff70a75be0):ZOO_INFO@log_env@671: Client environment:os.version=Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386
2010-05-06 11:56:04,476:9291(0x7fff70a75be0):ZOO_INFO@log_env@679: Client environment:
user.name=travis
2010-05-06 11:56:04,477:9291(0x7fff70a75be0):ZOO_INFO@log_env@687: Client environment:user.home=/Users/travis
2010-05-06 11:56:04,477:9291(0x7fff70a75be0):ZOO_INFO@log_env@699: Client environment:user.dir=/Users/travis/src/scribe
2010-05-06 11:56:04,477:9291(0x7fff70a75be0):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x10004b3b0 sessionId=0 sessionPasswd=<null> context=0x0 flags=0
2010-05-06 11:56:04,478:9291(0x7fff70a75be0):ZOO_DEBUG@start_threads@152: starting threads...
2010-05-06 11:56:04,478:9291(0x1002c0000):ZOO_DEBUG@do_io@279: started IO thread
2010-05-06 11:56:04,478:9291(0x100481000):ZOO_DEBUG@do_completion@326: started completion thread
[Thu May 6 11:56:04 2010] "CATEGORY : travis"
[Thu May 6 11:56:04 2010] "configured <1> stores"
2010-05-06 11:56:04,478:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [fe80::1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
[Thu May 6 11:56:04 2010] "store thread starting"
[Thu May 6 11:56:04 2010] "STATUS: "
2010-05-06 11:56:04,479:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [::1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
[Thu May 6 11:56:04 2010] "STATUS: ALIVE"
2010-05-06 11:56:04,479:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [
127.0.0.1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
[Thu May 6 11:56:04 2010] "Starting scribe server on port 4444"
Thrift: Thu May 6 11:56:04 2010 libevent 1.4.13-stable method kqueue
2010-05-06 11:56:07,813:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [fe80::1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2010-05-06 11:56:07,813:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [::1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2010-05-06 11:56:07,813:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [
127.0.0.1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2010-05-06 11:56:11,146:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [fe80::1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2010-05-06 11:56:11,147:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [::1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2010-05-06 11:56:11,147:9291(0x1002c0000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [
127.0.0.1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2010-05-06 11:56:14,481:9291(0x1002c0000):ZOO_INFO@check_events@1585: initiated connection to server [fe80::1:2181]
2010-05-06 11:56:14,505:9291(0x1002c0000):ZOO_INFO@check_events@1632: session establishment complete on server [fe80::1:2181], sessionId=0x1286ef99f260000, negotiated timeout=10000
2010-05-06 11:56:14,506:9291(0x1002c0000):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2010-05-06 11:56:14,506:9291(0x100481000):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
[Thu May 6 11:56:14 2010] "Registering task in Zookeeper."
2010-05-06 11:56:14,506:9291(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be310c5 for path [/scribe] to fe80::1:2181
2010-05-06 11:56:14,517:9291(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be310c5 rc=-110
2010-05-06 11:56:14,517:9291(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be310c6 for path [/scribe/aggregator] to fe80::1:2181
2010-05-06 11:56:14,526:9291(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be310c6 rc=-110
2010-05-06 11:56:14,526:9291(0x100481000):ZOO_DEBUG@zoo_acreate@2489: Sending request xid=0x4be310c7 for path [/scribe/aggregator/hoth.local:4444] to fe80::1:2181
2010-05-06 11:56:14,527:9291(0x1002c0000):ZOO_DEBUG@zookeeper_process@2049: Calling COMPLETION_STRING for xid=0x4be310c7 rc=0
2010-05-06 11:56:17,860:9291(0x1002c0000):ZOO_DEBUG@zookeeper_process@1974: Got ping response in 0 ms
2010-05-06 11:56:21,194:9291(0x1002c0000):ZOO_DEBUG@zookeeper_process@1974: Got ping response in 0 ms
^C
[travis@hoth ~/src/scribe]$
#######################################################################
### Test case: Discover remote_host and remote_port from Zookeeper. ###
#######################################################################
SUMMARY
When remote_host specifies a znode we randomly select a child registered at that location, overriding the remote_host and remote_port config options. Using this method means the existing code paths are reusued, so we have the same failure modes. Note that ZKClient is created, then a store, then we see the ZK connection start. This is because this config does not register a znode so we only open the connection when needed. After discovering a remote scribe we establis a connection to it as usual. Note we also disconnect if not registering ourselves, since the connection is not needed.
LOGS
[travis@hoth ~/src/scribe]$ cat zk.conf
port=3333
zk_server=localhost:2181
<store>
category=travis
type=buffer
retry_interval=5
retry_interval_range=2
<primary>
type=network
remote_host=zk:///scribe/aggregator
remote_port=2222
</primary>
<secondary>
type=file
file_path=/tmp
base_filename=travis
</secondary>
</store>
[travis@hoth ~/src/scribe]$ src/scribed -c zk.conf
[Thu May 6 12:03:28 2010] "setrlimit error (setting max fd size)"
[Thu May 6 12:03:28 2010] "STATUS: STARTING"
[Thu May 6 12:03:28 2010] "STATUS: configuring"
[Thu May 6 12:03:28 2010] "got configuration data from file <zk.conf>"
[Thu May 6 12:03:28 2010] "Creating new ZKClient."
[Thu May 6 12:03:28 2010] "CATEGORY : travis"
[Thu May 6 12:03:28 2010] "configured <1> stores"
[Thu May 6 12:03:28 2010] "STATUS: "
[Thu May 6 12:03:28 2010] "STATUS: ALIVE"
[Thu May 6 12:03:28 2010] "store thread starting"
2010-05-06 12:03:28,115:9321(0x1002c0000):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.0
2010-05-06 12:03:28,115:9321(0x1002c0000):ZOO_INFO@log_env@662: Client environment:
host.name=hoth.local
2010-05-06 12:03:28,115:9321(0x1002c0000):ZOO_INFO@log_env@669: Client environment:
os.name=Darwin
2010-05-06 12:03:28,115:9321(0x1002c0000):ZOO_INFO@log_env@670: Client environment:os.arch=10.3.0
2010-05-06 12:03:28,115:9321(0x1002c0000):ZOO_INFO@log_env@671: Client environment:os.version=Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386
2010-05-06 12:03:28,115:9321(0x1002c0000):ZOO_INFO@log_env@679: Client environment:
user.name=travis
[Thu May 6 12:03:28 2010] "Starting scribe server on port 3333"
Thrift: Thu May 6 12:03:28 2010 libevent 1.4.13-stable method kqueue
2010-05-06 12:03:28,117:9321(0x1002c0000):ZOO_INFO@log_env@687: Client environment:user.home=/Users/travis
2010-05-06 12:03:28,117:9321(0x1002c0000):ZOO_INFO@log_env@699: Client environment:user.dir=/Users/travis/src/scribe
2010-05-06 12:03:28,117:9321(0x1002c0000):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x10004b3b0 sessionId=0 sessionPasswd=<null> context=0x0 flags=0
2010-05-06 12:03:28,117:9321(0x1002c0000):ZOO_DEBUG@start_threads@152: starting threads...
2010-05-06 12:03:28,117:9321(0x101184000):ZOO_DEBUG@do_completion@326: started completion thread
2010-05-06 12:03:28,117:9321(0x100787000):ZOO_DEBUG@do_io@279: started IO thread
2010-05-06 12:03:28,118:9321(0x100787000):ZOO_INFO@check_events@1585: initiated connection to server [::1:2181]
[Thu May 6 12:03:28 2010] "Getting the best remote scribe."
2010-05-06 12:03:28,118:9321(0x1002c0000):ZOO_DEBUG@zoo_awget_children_@2612: Sending request xid=0x4be31281 for path [/scribe/aggregator] to ::1:2181
2010-05-06 12:03:28,122:9321(0x100787000):ZOO_INFO@check_events@1632: session establishment complete on server [::1:2181], sessionId=0x1286ef99f260003, negotiated timeout=10000
2010-05-06 12:03:28,122:9321(0x100787000):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2010-05-06 12:03:28,123:9321(0x100787000):ZOO_DEBUG@zookeeper_process@2026: Calling COMPLETION_STRINGLIST for xid=0x4be31281 rc=0
2010-05-06 12:03:28,123:9321(0x101184000):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
2010-05-06 12:03:28,123:9321(0x100787000):ZOO_DEBUG@do_io@317: IO thread terminated
2010-05-06 12:03:28,123:9321(0x101184000):ZOO_DEBUG@do_completion@336: completion thread terminated
2010-05-06 12:03:28,123:9321(0x1002c0000):ZOO_INFO@zookeeper_close@2290: Closing zookeeper sessionId=0x1286ef99f260003 to [::1:2181]
[Thu May 6 12:03:28 2010] "Got remote scribe from zookeeper <hoth.local:4444>"
[Thu May 6 12:03:28 2010] "Opened connection to remote scribe server <hoth.local:4444>"
[Thu May 6 12:03:28 2010] "[travis] Opened file </tmp/travis_00000> for writing"
[Thu May 6 12:03:28 2010] "[travis] Changing state from <DISCONNECTED> to <SENDING_BUFFER>"
[Thu May 6 12:03:28 2010] "[travis] successfully read <0> entries of <0> bytes from file </tmp/travis_00000>"
[Thu May 6 12:03:28 2010] "[travis] No more buffer files to send, switching to streaming mode"
[Thu May 6 12:03:28 2010] "[travis] Changing state from <SENDING_BUFFER> to <STREAMING>"
^C
[travis@hoth ~/src/scribe]$
###################################################
### Test case: Unable to discover remote scribe ###
###################################################
SUMMARY
Zookeeper outages will happen. When unable to get a remote scribe the secondary store is used, if configured. Later when switching to the primary store we go through the discovery code path and potentially get a place to connect.
LOGS
[travis@hoth ~/src/scribe]$ src/scribed -c zk.conf
[Thu May 6 14:27:51 2010] "setrlimit error (setting max fd size)"
[Thu May 6 14:27:51 2010] "STATUS: STARTING"
[Thu May 6 14:27:51 2010] "STATUS: configuring"
[Thu May 6 14:27:51 2010] "got configuration data from file <zk.conf>"
[Thu May 6 14:27:51 2010] "Creating new ZKClient."
[Thu May 6 14:27:51 2010] "CATEGORY : travis"
[Thu May 6 14:27:51 2010] "configured <1> stores"
[Thu May 6 14:27:51 2010] "STATUS: "
[Thu May 6 14:27:51 2010] "STATUS: ALIVE"
[Thu May 6 14:27:51 2010] "store thread starting"
2010-05-06 14:27:51,523:21643(0x1002c0000):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.0
2010-05-06 14:27:51,523:21643(0x1002c0000):ZOO_INFO@log_env@662: Client environment:
host.name=hoth.local
[Thu May 6 14:27:51 2010] "Starting scribe server on port 3333"
2010-05-06 14:27:51,523:21643(0x1002c0000):ZOO_INFO@log_env@669: Client environment:
os.name=Darwin
2010-05-06 14:27:51,523:21643(0x1002c0000):ZOO_INFO@log_env@670: Client environment:os.arch=10.3.0
2010-05-06 14:27:51,523:21643(0x1002c0000):ZOO_INFO@log_env@671: Client environment:os.version=Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386
2010-05-06 14:27:51,523:21643(0x1002c0000):ZOO_INFO@log_env@679: Client environment:
user.name=travis
Thrift: Thu May 6 14:27:51 2010 libevent 1.4.13-stable method kqueue
2010-05-06 14:27:51,524:21643(0x1002c0000):ZOO_INFO@log_env@687: Client environment:user.home=/Users/travis
2010-05-06 14:27:51,524:21643(0x1002c0000):ZOO_INFO@log_env@699: Client environment:user.dir=/Users/travis/src/scribe
2010-05-06 14:27:51,524:21643(0x1002c0000):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x10004b3b0 sessionId=0 sessionPasswd=<null> context=0x0 flags=0
2010-05-06 14:27:51,526:21643(0x1002c0000):ZOO_DEBUG@start_threads@152: starting threads...
2010-05-06 14:27:51,526:21643(0x101181000):ZOO_DEBUG@do_completion@326: started completion thread
2010-05-06 14:27:51,526:21643(0x10078a000):ZOO_DEBUG@do_io@279: started IO thread
[Thu May 6 14:27:51 2010] "Getting the best remote scribe."
2010-05-06 14:27:51,527:21643(0x1002c0000):ZOO_DEBUG@zoo_awget_children_@2612: Sending request xid=0x4be33458 for path [/scribe/aggregator] to
127.0.0.1:21812010-05-06 14:27:51,527:21643(0x10078a000):ZOO_INFO@check_events@1585: initiated connection to server [
127.0.0.1:2181]
2010-05-06 14:27:51,530:21643(0x10078a000):ZOO_INFO@check_events@1632: session establishment complete on server [
127.0.0.1:2181], sessionId=0x1286ef99f260019, negotiated timeout=10000
2010-05-06 14:27:51,530:21643(0x10078a000):ZOO_DEBUG@check_events@1638: Calling a watcher for a ZOO_SESSION_EVENT and the state=ZOO_CONNECTED_STATE
2010-05-06 14:27:51,530:21643(0x101181000):ZOO_DEBUG@process_completions@1765: Calling a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
2010-05-06 14:27:51,530:21643(0x10078a000):ZOO_DEBUG@zookeeper_process@2026: Calling COMPLETION_STRINGLIST for xid=0x4be33458 rc=0
2010-05-06 14:27:51,531:21643(0x10078a000):ZOO_DEBUG@do_io@317: IO thread terminated
2010-05-06 14:27:51,531:21643(0x101181000):ZOO_DEBUG@do_completion@336: completion thread terminated
2010-05-06 14:27:51,531:21643(0x1002c0000):ZOO_INFO@zookeeper_close@2290: Closing zookeeper sessionId=0x1286ef99f260019 to [
127.0.0.1:2181]
Thrift: Thu May 6 14:27:51 2010 TSocket::open() getaddrinfo() <Host:
zk:///scribe/aggregator Port: 2222>nodename nor servname provided, or not known
[Thu May 6 14:27:51 2010] "failed to open connection to remote scribe server <
zk:///scribe/aggregator:2222> thrift error <Could not resolve host for client socket.>"
[Thu May 6 14:27:51 2010] "[travis] Opened file </tmp/travis_00000> for writing"
[Thu May 6 14:27:51 2010] "[travis] choosing new retry interval <4> seconds"
[Thu May 6 14:27:51 2010] "[travis] Changing state from <DISCONNECTED> to <DISCONNECTED>"
^C
[travis@hoth ~/src/scribe]$