HYPERDEX_CLIENT_OFFLINE

pulpfree

Aug 30, 2014, 10:43:50 AM

to hyperdex...@googlegroups.com

Hello:

I got the error below when doing a put from the Python client. It definitely worked before, and I'm not sure what changed; maybe I shut down the coordinator first and then the daemon. hyperdex show-config shows the servers as shut down, but I am not sure how to bring them online again. Does "server" refer to the coordinator or the daemon? Both of them were running when I did the put operation.

Traceback (most recent call last):
File "phonebookAdd.py", line 3, in <module>
    c.put('phonebook', 'JohnSmith1', {'first': 'John', 'last': 'Smith', 'phone': '6075551024'})
File "client.pyx", line 1392, in hyperdex.client.Client.put (bindings/python/hyperdex/client.c:15636)
File "client.pyx", line 1390, in hyperdex.client.Client.async_put (bindings/python/hyperdex/client.c:15519)
File "client.pyx", line 1210, in hyperdex.client.Client.asynccall__spacename_key_attributes__status (bindings/python/hyperdex/client.c:13219)
hyperdex.client.HyperDexClientException: HyperDexClientException: all servers for key "JohnSmith1" in space "phonebook" are offline: bring one or more online to remedy the issue [HYPERDEX_CLIENT_OFFLINE]

cluster 1913914489913495245
version 123
flags 0
server 491779001291425117 127.0.0.1:2012 AVAILABLE
server 1363041888570235046 127.0.0.1:2012 SHUTDOWN
server 1802128086941680806 127.0.0.1:2012 SHUTDOWN
server 2145261057056824332 128.205.39.36:2012 AVAILABLE
server 2490897627649210004 127.0.0.1:2012 SHUTDOWN
server 3076798709681432226 127.0.0.1:2012 NOT_AVAILABLE
server 3319047087161579476 127.0.0.1:2012 SHUTDOWN
server 3615746286359306737 127.0.0.1:2014 SHUTDOWN
server 8134970455907940130 127.0.0.1:2012 SHUTDOWN
server 10160014588334731561 127.0.0.1:2012 SHUTDOWN
server 10544309857607091003 127.0.0.1:2012 SHUTDOWN
server 11279569542217133710 127.0.0.1:2012 SHUTDOWN
server 11643725703686172679 127.0.0.1:2012 SHUTDOWN
server 13384306817260474973 127.0.0.1:2012 SHUTDOWN
server 14115043375538333379 127.0.0.1:2012 SHUTDOWN
server 15395286616284170484 127.0.0.1:2012 SHUTDOWN
server 16196212010585472750 127.0.0.1:2012 NOT_AVAILABLE
server 16692864235313841091 127.0.0.1:2012 SHUTDOWN
server 17506619312234666687 127.0.0.1:2012 SHUTDOWN
space 858 phonebook
fault_tolerance 1
predecessor_width 1
schema
    attribute username HYPERDATATYPE_STRING
    attribute first HYPERDATATYPE_STRING
    attribute last HYPERDATATYPE_STRING
    attribute phone HYPERDATATYPE_STRING
subspace 859
    attributes username
    region 860 lower=(0,) upper=(2305843009213693951,) replicas=[]
    region 861 lower=(2305843009213693952,) upper=(4611686018427387903,) replicas=[]
    region 862 lower=(4611686018427387904,) upper=(6917529027641081855,) replicas=[]
    region 863 lower=(6917529027641081856,) upper=(9223372036854775807,) replicas=[]
    region 864 lower=(9223372036854775808,) upper=(11529215046068469759,) replicas=[]
    region 865 lower=(11529215046068469760,) upper=(13835058055282163711,) replicas=[]
    region 866 lower=(13835058055282163712,) upper=(16140901064495857663,) replicas=[]
    region 867 lower=(16140901064495857664,) upper=(18446744073709551615,) replicas=[]
subspace 868
    attributes first last phone
    region 869 lower=(0,0,0,) upper=(6148914691236517203,9223372036854775807,9223372036854775807,) replicas=[]
    region 870 lower=(0,0,9223372036854775808,) upper=(6148914691236517203,9223372036854775807,18446744073709551615,) replicas=[]
    region 871 lower=(0,9223372036854775808,0,) upper=(6148914691236517203,18446744073709551615,9223372036854775807,) replicas=[]
    region 872 lower=(0,9223372036854775808,9223372036854775808,) upper=(6148914691236517203,18446744073709551615,18446744073709551615,) replicas=[]
    region 873 lower=(6148914691236517204,0,0,) upper=(12297829382473034407,9223372036854775807,9223372036854775807,) replicas=[]
    region 874 lower=(6148914691236517204,0,9223372036854775808,) upper=(12297829382473034407,9223372036854775807,18446744073709551615,) replicas=[]
    region 875 lower=(6148914691236517204,9223372036854775808,0,) upper=(12297829382473034407,18446744073709551615,9223372036854775807,) replicas=[]
    region 876 lower=(6148914691236517204,9223372036854775808,9223372036854775808,) upper=(12297829382473034407,18446744073709551615,18446744073709551615,) replicas=[]
    region 877 lower=(12297829382473034408,0,0,) upper=(18446744073709551615,9223372036854775807,9223372036854775807,) replicas=[]
    region 878 lower=(12297829382473034408,0,9223372036854775808,) upper=(18446744073709551615,9223372036854775807,18446744073709551615,) replicas=[]
    region 879 lower=(12297829382473034408,9223372036854775808,0,) upper=(18446744073709551615,18446744073709551615,9223372036854775807,) replicas=[]
    region 880 lower=(12297829382473034408,9223372036854775808,9223372036854775808,) upper=(18446744073709551615,18446744073709551615,18446744073709551615,) replicas=[]
index 881 attribute 1
index 882 attribute 2
index 883 attribute 3

The coordinator log:
I0830 10:30:43.506476 7799 daemon.cc:307] running in the foreground
I0830 10:30:43.506631 7799 daemon.cc:308] no log will be generated; instead, the log messages will print to the terminal
I0830 10:30:43.506654 7799 daemon.cc:309] provide "--daemon" on the command-line if you want to run in the background
I0830 10:30:43.888217 7799 daemon.cc:455] restoring previous instance: 463607186017582943
I0830 10:30:43.905333 7803 object_manager.cc:1032] spawning worker thread for object 7528171833139684728
I0830 10:30:47.210224 7803 object_manager.cc:1104] hyperdex:alarm @ 51743: establishing checkpoint 1741
I0830 10:30:47.210247 7803 object_manager.cc:1104] hyperdex:alarm @ 51743: checkpoint 1741 done
I0830 10:30:47.210253 7803 object_manager.cc:1104] hyperdex:alarm @ 51743: garbage collect <= checkpoint 1621
I0830 10:30:47.211592 7803 object_manager.cc:1104] hyperdex:alarm @ 51746: establishing checkpoint 1742
I0830 10:30:47.211609 7803 object_manager.cc:1104] hyperdex:alarm @ 51746: checkpoint 1742 done
I0830 10:30:47.211616 7803 object_manager.cc:1104] hyperdex:alarm @ 51746: garbage collect <= checkpoint 1622
I0830 10:30:47.212847 7803 object_manager.cc:1104] hyperdex:alarm @ 51749: establishing checkpoint 1743
I0830 10:30:47.212864 7803 object_manager.cc:1104] hyperdex:alarm @ 51749: checkpoint 1743 done
I0830 10:30:47.212870 7803 object_manager.cc:1104] hyperdex:alarm @ 51749: garbage collect <= checkpoint 1623
I0830 10:30:47.213995 7803 object_manager.cc:1104] hyperdex:alarm @ 51752: establishing checkpoint 1744
I0830 10:30:47.214009 7803 object_manager.cc:1104] hyperdex:alarm @ 51752: checkpoint 1744 done
I0830 10:30:47.214015 7803 object_manager.cc:1104] hyperdex:alarm @ 51752: garbage collect <= checkpoint 1624
I0830 10:30:47.215157 7803 object_manager.cc:1104] hyperdex:alarm @ 51755: establishing checkpoint 1745
I0830 10:30:47.215169 7803 object_manager.cc:1104] hyperdex:alarm @ 51755: checkpoint 1745 done
I0830 10:30:47.215174 7803 object_manager.cc:1104] hyperdex:alarm @ 51755: garbage collect <= checkpoint 1625
I0830 10:30:47.216267 7803 object_manager.cc:1104] hyperdex:alarm @ 51758: establishing checkpoint 1746
I0830 10:30:47.216281 7803 object_manager.cc:1104] hyperdex:alarm @ 51758: checkpoint 1746 done
I0830 10:30:47.216286 7803 object_manager.cc:1104] hyperdex:alarm @ 51758: garbage collect <= checkpoint 1626
I0830 10:30:47.227761 7799 daemon.cc:625] resuming normal operation
I0830 10:30:47.227788 7799 daemon.cc:1284] deploying configuration configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
I0830 10:30:47.227967 7799 daemon.cc:1341] the latest stable configuration is configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
I0830 10:30:47.227993 7799 daemon.cc:1342] the latest proposed configuration is configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
W0830 10:30:47.228008 7799 daemon.cc:1348] the most recently deployed configuration can tolerate at most 0 failures which is less than the 2 failures the cluster is expected to tolerate; bring 4 more servers online to restore 2-fault tolerance
I0830 10:30:47.228020 7799 daemon.cc:2078] we are chain_node(bind_to=127.0.0.1:1982, token=463607186017582943) and here's some info: issued <=51761 | acked <=51761
I0830 10:30:47.228031 7799 daemon.cc:2081] our stable configuration is configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
I0830 10:30:47.228044 7799 daemon.cc:2082] the suffix of the chain stabilized through 0
I0830 10:30:47.228082 7799 daemon.cc:2346] command tail stabilizes at configuration 1
I0830 10:30:49.745877 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51761: changing server(491779001291425117) from AVAILABLE to NOT_AVAILABLE because we suspect it failed
I0830 10:30:49.745916 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51761: issuing new configuration version 120
I0830 10:30:49.745939 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51761: acked through version 120
I0830 10:30:49.745959 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51761: stable through version 120
I0830 10:30:49.746017 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51762: changing server(2145261057056824332) from AVAILABLE to NOT_AVAILABLE because we suspect it failed
I0830 10:30:49.746044 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51762: issuing new configuration version 121
I0830 10:30:49.746063 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51762: acked through version 121
I0830 10:30:49.746083 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51762: stable through version 121
I0830 10:31:00.000340 7799 daemon.cc:2078] we are chain_node(bind_to=127.0.0.1:1982, token=463607186017582943) and here's some info: issued <=51763 | acked <=51763
I0830 10:31:00.000391 7799 daemon.cc:2081] our stable configuration is configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
I0830 10:31:00.000421 7799 daemon.cc:2082] the suffix of the chain stabilized through 1
I0830 10:31:17.252115 7803 object_manager.cc:1104] hyperdex:alarm @ 51763: establishing checkpoint 1747
I0830 10:31:17.252164 7803 object_manager.cc:1104] hyperdex:alarm @ 51763: checkpoint 1747 done
I0830 10:31:17.252187 7803 object_manager.cc:1104] hyperdex:alarm @ 51763: garbage collect <= checkpoint 1627
I0830 10:31:18.000455 7799 daemon.cc:2560] session for 13151792766968793896 timed out
I0830 10:31:18.000515 7799 daemon.cc:2560] session for 15674829714483218388 timed out
I0830 10:31:18.000612 7799 daemon.cc:2039] disconnecting client 13151792766968793896
I0830 10:31:18.000689 7799 daemon.cc:2039] disconnecting client 15674829714483218388
I0830 10:31:47.501719 7803 object_manager.cc:1104] hyperdex:alarm @ 51766: establishing checkpoint 1748
I0830 10:31:47.501765 7803 object_manager.cc:1104] hyperdex:alarm @ 51766: checkpoint 1748 done
I0830 10:31:47.501787 7803 object_manager.cc:1104] hyperdex:alarm @ 51766: garbage collect <= checkpoint 1628
W0830 10:31:50.052884 7799 daemon.cc:1577] rejecting "CONDITION_WAIT" that came from dead client 15674829714483218388
W0830 10:31:50.052975 7799 daemon.cc:1577] rejecting "CONDITION_WAIT" that came from dead client 15674829714483218388
W0830 10:31:50.053038 7799 daemon.cc:1577] rejecting "CONDITION_WAIT" that came from dead client 15674829714483218388
W0830 10:31:50.053091 7799 daemon.cc:1577] rejecting "CONDITION_WAIT" that came from dead client 15674829714483218388
I0830 10:32:00.000462 7799 daemon.cc:2078] we are chain_node(bind_to=127.0.0.1:1982, token=463607186017582943) and here's some info: issued <=51767 | acked <=51767
I0830 10:32:00.000602 7799 daemon.cc:2081] our stable configuration is configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
I0830 10:32:00.000633 7799 daemon.cc:2082] the suffix of the chain stabilized through 1
I0830 10:32:05.887670 7799 daemon.cc:746] providing configuration to 1/127.0.0.1:43655 as part of the bootstrap process
I0830 10:32:05.888164 7799 daemon.cc:2031] registering client 4408941022089249666
I0830 10:32:05.890859 7803 object_manager.cc:1104] hyperdex:server_online @ 51770: changing server(491779001291425117) from NOT_AVAILABLE to AVAILABLE
I0830 10:32:05.890900 7803 object_manager.cc:1104] hyperdex:server_online @ 51770: issuing new configuration version 122
I0830 10:32:05.890921 7803 object_manager.cc:1104] hyperdex:server_online @ 51770: acked through version 122
I0830 10:32:05.890941 7803 object_manager.cc:1104] hyperdex:server_online @ 51770: stable through version 122
I0830 10:32:17.751667 7803 object_manager.cc:1104] hyperdex:alarm @ 51776: establishing checkpoint 1749
I0830 10:32:17.751709 7803 object_manager.cc:1104] hyperdex:alarm @ 51776: checkpoint 1749 done
I0830 10:32:17.751730 7803 object_manager.cc:1104] hyperdex:alarm @ 51776: garbage collect <= checkpoint 1629
I0830 10:32:48.002405 7803 object_manager.cc:1104] hyperdex:alarm @ 51778: establishing checkpoint 1750
I0830 10:32:48.002449 7803 object_manager.cc:1104] hyperdex:alarm @ 51778: checkpoint 1750 done
I0830 10:32:48.002470 7803 object_manager.cc:1104] hyperdex:alarm @ 51778: garbage collect <= checkpoint 1630
I0830 10:33:00.000219 7799 daemon.cc:2078] we are chain_node(bind_to=127.0.0.1:1982, token=463607186017582943) and here's some info: issued <=51780 | acked <=51780
I0830 10:33:00.000339 7799 daemon.cc:2081] our stable configuration is configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
I0830 10:33:00.000372 7799 daemon.cc:2082] the suffix of the chain stabilized through 1
I0830 10:33:12.730511 7799 daemon.cc:746] providing configuration to 2/127.0.0.1:43657 as part of the bootstrap process
I0830 10:33:12.730998 7799 daemon.cc:2031] registering client 13549665462073974740
I0830 10:33:18.252177 7803 object_manager.cc:1104] hyperdex:alarm @ 51782: establishing checkpoint 1751
I0830 10:33:18.252202 7803 object_manager.cc:1104] hyperdex:alarm @ 51782: checkpoint 1751 done
I0830 10:33:18.252213 7803 object_manager.cc:1104] hyperdex:alarm @ 51782: garbage collect <= checkpoint 1631
I0830 10:33:43.000922 7799 daemon.cc:2560] session for 13549665462073974740 timed out
I0830 10:33:43.001055 7799 daemon.cc:2039] disconnecting client 13549665462073974740
I0830 10:33:48.502138 7803 object_manager.cc:1104] hyperdex:alarm @ 51785: establishing checkpoint 1752
I0830 10:33:48.502181 7803 object_manager.cc:1104] hyperdex:alarm @ 51785: checkpoint 1752 done
I0830 10:33:48.502202 7803 object_manager.cc:1104] hyperdex:alarm @ 51785: garbage collect <= checkpoint 1632
I0830 10:34:00.000462 7799 daemon.cc:2078] we are chain_node(bind_to=127.0.0.1:1982, token=463607186017582943) and here's some info: issued <=51787 | acked <=51787
I0830 10:34:00.000519 7799 daemon.cc:2081] our stable configuration is configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
I0830 10:34:00.000547 7799 daemon.cc:2082] the suffix of the chain stabilized through 1
I0830 10:34:15.356878 7799 daemon.cc:746] providing configuration to 3/127.0.0.1:43663 as part of the bootstrap process
I0830 10:34:15.357327 7799 daemon.cc:2031] registering client 14686801427653678224
I0830 10:34:18.751518 7803 object_manager.cc:1104] hyperdex:alarm @ 51788: establishing checkpoint 1753
I0830 10:34:18.751566 7803 object_manager.cc:1104] hyperdex:alarm @ 51788: checkpoint 1753 done
I0830 10:34:18.751588 7803 object_manager.cc:1104] hyperdex:alarm @ 51788: garbage collect <= checkpoint 1633
I0830 10:34:29.594558 7799 daemon.cc:746] providing configuration to 4/127.0.0.1:43666 as part of the bootstrap process
I0830 10:34:29.594988 7799 daemon.cc:2031] registering client 5949567176976530518
I0830 10:34:46.001003 7799 daemon.cc:2560] session for 14686801427653678224 timed out
I0830 10:34:46.001119 7799 daemon.cc:2039] disconnecting client 14686801427653678224
I0830 10:34:49.002576 7803 object_manager.cc:1104] hyperdex:alarm @ 51792: establishing checkpoint 1754
I0830 10:34:49.002617 7803 object_manager.cc:1104] hyperdex:alarm @ 51792: checkpoint 1754 done
I0830 10:34:49.002640 7803 object_manager.cc:1104] hyperdex:alarm @ 51792: garbage collect <= checkpoint 1634
I0830 10:35:00.001041 7799 daemon.cc:2560] session for 5949567176976530518 timed out
I0830 10:35:00.001087 7799 daemon.cc:2078] we are chain_node(bind_to=127.0.0.1:1982, token=463607186017582943) and here's some info: issued <=51794 | acked <=51794
I0830 10:35:00.001116 7799 daemon.cc:2081] our stable configuration is configuration(cluster=8408733372378132265, prev_token=0, this_token=15769795986360367231, version=1, command=[463607186017582943], config=[463607186017582943], members=[chain_node(bind_to=127.0.0.1:1982, token=463607186017582943)])
I0830 10:35:00.001142 7799 daemon.cc:2082] the suffix of the chain stabilized through 1
I0830 10:35:00.001252 7799 daemon.cc:2039] disconnecting client 5949567176976530518
I0830 10:35:06.194485 7799 daemon.cc:746] providing configuration to 5/127.0.0.1:43668 as part of the bootstrap process
I0830 10:35:06.194885 7799 daemon.cc:2031] registering client 674487290940983852
I0830 10:35:06.196992 7803 object_manager.cc:1104] hyperdex:server_online @ 51800: changing server(2145261057056824332) from NOT_AVAILABLE to AVAILABLE
I0830 10:35:06.197029 7803 object_manager.cc:1104] hyperdex:server_online @ 51800: cannot transfer state to new servers in region(867) because all servers are offline
I0830 10:35:06.197051 7803 object_manager.cc:1104] hyperdex:server_online @ 51800: cannot transfer state to new servers in region(880) because all servers are offline
I0830 10:35:06.197070 7803 object_manager.cc:1104] hyperdex:server_online @ 51800: issuing new configuration version 123
I0830 10:35:06.197090 7803 object_manager.cc:1104] hyperdex:server_online @ 51800: acked through version 123
I0830 10:35:06.197110 7803 object_manager.cc:1104] hyperdex:server_online @ 51800: stable through version 123
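The W0830 10:30:47 warning above can be checked with simple arithmetic. Assuming the coordinator follows the usual majority-quorum rule (n members tolerate (n - 1) // 2 failures, so tolerating f failures takes 2f + 1 members), the single coordinator at 127.0.0.1:1982 tolerates 0 failures, and 2-fault tolerance needs 5 members, i.e. 4 more, which matches the log. The helper below is a hypothetical sketch of that rule, not part of HyperDex.

```python
from typing import Tuple


def coordinator_shortfall(members: int, target_failures: int) -> Tuple[int, int]:
    """Return (failures tolerated now, extra members needed to reach target_failures).

    Assumes a majority-quorum rule: n members tolerate (n - 1) // 2 failures,
    so tolerating f failures requires 2f + 1 members.
    """
    tolerated = (members - 1) // 2
    required = 2 * target_failures + 1
    return tolerated, max(0, required - members)


# Numbers from the warning above: one coordinator member, 2-fault target.
print(coordinator_shortfall(1, 2))  # (0, 4)
```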

The daemon log:
I0830 10:32:05.708066 11301 daemon.cc:298] running in the foreground
I0830 10:32:05.708223 11301 daemon.cc:299] no log will be generated; instead, the log messages will print to the terminal
I0830 10:32:05.708248 11301 daemon.cc:300] provide "--daemon" on the command-line if you want to run in the background
I0830 10:32:05.708268 11301 daemon.cc:307] initializing local storage
I0830 10:32:05.845888 11301 daemon.cc:338] changing coordinator address from 127.0.0.1:9881 to 127.0.0.1:1982
I0830 10:32:05.846019 11306 background_thread.cc:154] indexing thread started
I0830 10:32:05.846026 11307 background_thread.cc:154] wiping thread started
I0830 10:32:05.846045 11305 background_thread.cc:154] checkpointer thread started
I0830 10:32:05.888679 11301 daemon.cc:387] starting checkpoint process with checkpoint=1748 checkpoint_stable=1748 checkpoint_gc=1628
W0830 10:32:05.888985 11301 daemon.cc:1612] cannot determine device name for reporting io.* stats
W0830 10:32:05.889014 11301 daemon.cc:1613] iostat-like statistics will not be reported
I0830 10:32:05.889590 11308 replication_manager.cc:951] replication background thread started
I0830 10:32:05.890130 11312 daemon.cc:612] network thread 2 started on core 2
I0830 10:32:05.890228 11311 daemon.cc:612] network thread 1 started on core 1
I0830 10:32:05.890467 11309 background_thread.cc:154] state transfer thread started
I0830 10:32:05.890522 11310 daemon.cc:612] network thread 0 started on core 0
I0830 10:32:05.890609 11313 daemon.cc:612] network thread 3 started on core 3
I0830 10:32:05.890862 11301 daemon.cc:504] moving to configuration version=121; pausing all activity while we reconfigure
I0830 10:32:05.891494 11301 daemon.cc:514] reconfiguration complete; resuming normal operation
I0830 10:32:05.891983 11301 daemon.cc:504] moving to configuration version=122; pausing all activity while we reconfigure
I0830 10:32:05.892110 11301 daemon.cc:514] reconfiguration complete; resuming normal operation

pulpfree

Sep 1, 2014, 4:28:53 PM

to hyperdex...@googlegroups.com

Well, it is working now.

Robert Escriva

Sep 2, 2014, 11:54:12 AM

to hyperdex...@googlegroups.com

What's going on here is that a server that has been shut down (turned off)
can no longer serve the data it holds. Looking at your config, the
majority of your servers are turned off (as if you shut them down), so
the error message is telling you that all servers which hold
"JohnSmith1" have been turned off and cannot process your request. You
should turn them back on to serve the data.
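One detail in the config above is consistent with this: every region line ends in replicas=[], which appears to mean that no server is currently listed as holding any part of the phonebook space. The sketch below, assuming the output of hyperdex show-config has been saved as text, counts servers by state and lists regions with an empty replica list. The function and regex names are hypothetical, and the parsing is based only on the line formats visible in this thread.

```python
import re
from collections import Counter
from typing import List, Tuple

# "server <id> <address> <STATE>", e.g. "server 491779001291425117 127.0.0.1:2012 AVAILABLE"
SERVER_RE = re.compile(r"^\s*server\s+(\d+)\s+(\S+)\s+(\w+)\s*$")
# "region <id> lower=(...) upper=(...) replicas=[...]"; group 2 is the replica list
REGION_RE = re.compile(r"^\s*region\s+(\d+)\s.*replicas=\[([^\]]*)\]")


def summarize_config(text: str) -> Tuple[Counter, List[int]]:
    """Count servers by state and collect ids of regions with no replicas."""
    states: Counter = Counter()
    empty_regions: List[int] = []
    for line in text.splitlines():
        server = SERVER_RE.match(line)
        if server:
            states[server.group(3)] += 1
            continue
        region = REGION_RE.match(line)
        if region and not region.group(2).strip():
            empty_regions.append(int(region.group(1)))
    return states, empty_regions


# A few lines copied from the show-config output in this thread.
sample = """server 491779001291425117 127.0.0.1:2012 AVAILABLE
server 1363041888570235046 127.0.0.1:2012 SHUTDOWN
server 3076798709681432226 127.0.0.1:2012 NOT_AVAILABLE
    region 860 lower=(0,) upper=(2305843009213693951,) replicas=[]
"""
states, empty = summarize_config(sample)
print(dict(states))  # {'AVAILABLE': 1, 'SHUTDOWN': 1, 'NOT_AVAILABLE': 1}
print(empty)         # [860]
```

Run against the full config pasted above, this would report mostly SHUTDOWN servers and all twenty regions with empty replica lists.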


peiyuan...@gmail.com

Oct 10, 2016, 4:42:11 AM

to hyperdex-discuss

Hi Robert,

I have the same problem when I do a put, but I found that the cause is that I can't start a daemon. The error message in the daemon log is below:

================================================================================
Exiting because the coordinator says it doesn't know about this node.
Check the coordinator logs for details, but it's most likely the case that
this server was killed, or this server tried reconnecting to a different
coordinator. You may just have to restart the daemon with a different
coordinator address or this node may be dead and you can simply erase it.
================================================================================

Operations on the space itself work correctly. Can you suggest a solution?


Robert Escriva

Oct 10, 2016, 8:41:43 AM

to hyperdex...@googlegroups.com

What do the coordinator logs say?

The error you encountered says one of two things happened, and the
coordinator log will give you insight into which one it is:

1) You've connected it to a different coordinator than it was
originally connected to. This would be a totally new invocation of
the coordinator command, and not just a different member of the
coordinator ensemble. This can easily happen if you launch multiple
coordinator processes and forget the --connect option for them.

The fix here is to reconnect to the right cluster.

2) You've used the hyperdex-kill command or some other process caused
the daemon to be forgotten by the coordinator.

The fix here is to erase the node.
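Because the first step is reading the coordinator log, here is a small sketch that pulls every logged state change for one server id out of a saved coordinator log. It only recognizes the "changing server(ID) from X to Y" lines that appear in the coordinator log posted earlier in this thread; the function name is hypothetical, and other events (for example whatever hyperdex-kill records) may be logged in a format this pattern does not match. An empty result for the daemon's id may suggest that this coordinator has never seen the node, which would point toward case 1.

```python
import re
from typing import List, Tuple

# Matches coordinator lines such as:
# "hyperdex:server_suspect @ 51761: changing server(491779001291425117) from AVAILABLE to NOT_AVAILABLE ..."
TRANSITION_RE = re.compile(
    r"hyperdex:(\w+) @ \d+: changing server\((\d+)\) from (\w+) to (\w+)"
)


def server_history(log_text: str, server_id: int) -> List[Tuple[str, str, str]]:
    """Return (event, old_state, new_state) for every logged transition of server_id."""
    history: List[Tuple[str, str, str]] = []
    for match in TRANSITION_RE.finditer(log_text):
        if int(match.group(2)) == server_id:
            history.append((match.group(1), match.group(3), match.group(4)))
    return history


# Two lines copied from the coordinator log earlier in this thread.
sample_log = """\
I0830 10:30:49.745877 7803 object_manager.cc:1104] hyperdex:server_suspect @ 51761: changing server(491779001291425117) from AVAILABLE to NOT_AVAILABLE because we suspect it failed
I0830 10:32:05.890859 7803 object_manager.cc:1104] hyperdex:server_online @ 51770: changing server(491779001291425117) from NOT_AVAILABLE to AVAILABLE
"""
for event, old, new in server_history(sample_log, 491779001291425117):
    print(f"{event}: {old} -> {new}")
# server_suspect: AVAILABLE -> NOT_AVAILABLE
# server_online: NOT_AVAILABLE -> AVAILABLE
```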

-Robert
