configuring Exhibitor for a cluster of 3 ZooKeepers


Antonio Terreno

Jan 7, 2014, 11:23:50 AM
to exhibit...@googlegroups.com
Hi all, 

I am finding it hard to configure Exhibitor on 3 boxes. 

I am trying to start Exhibitor on the 3 machines with this command (shared, file-based config):  

java -jar /opt/molsfw/exhibitor/exhibitor-1.5.1.jar --headingtext MOL --nodemodification true --filesystembackup true -c file --fsconfigdir /opt/molsfw/zooman --defaultconfig /opt/molsfw/zooman/exhibitor.properties

They all start fine. /opt/molsfw/zooman/ is a folder shared between the 3 boxes; it lives on a NAS. 

However, when I try to get ZK running, I run into the typical error ZooKeeper throws when the myid file is missing. 

I've seen a previous post on the subject, but it is still not clear to me how the ensemble would start if the file is not there at all, and how the different ZK instances can see each other if they are not configured as a cluster. 

It looks like a nasty race condition, and I can't see which initial configuration would make the ensemble work.

Thanks for your help. 
Antonio

Other info: 
-bash-4.1$ uname -a
SunOS mol-hsk-zookeeper7.local 5.11 joyent_20130226T234312Z i86pc i386 i86pc Solaris

-bash-4.1$ /opt/molsfw/java/jdk7u17/bin/java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) Server VM (build 23.7-b01, mixed mode)

Stack trace from ZK:
2014-01-07 15:50:57,686 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /opt/molsfw/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
2014-01-07 15:50:57,692 [myid:] - WARN  [main:QuorumPeerConfig@290] - Non-optimial configuration, consider an odd number of servers.
2014-01-07 15:50:57,693 [myid:] - INFO  [main:QuorumPeerConfig@334] - Defaulting to majority quorums
2014-01-07 15:50:57,694 [myid:] - ERROR [main:QuorumPeerMain@85] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /opt/molsfw/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
        at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:121)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.lang.IllegalArgumentException: /opt/moldata/zookeeper/snapshot/myid file is missing
        at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:344)
        at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:117)
        ... 2 more
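The WARN line above ("Non-optimial configuration, consider an odd number of servers.") is most likely triggered because the generated zoo.cfg posted below lists four servers (server.1 to server.4). With majority quorums, an ensemble of n voting servers needs floor(n/2) + 1 of them to agree, so a fourth server adds one more vote to wait for without adding fault tolerance. A minimal Java sketch of that arithmetic; QuorumMath is an illustrative name, not part of ZooKeeper:

```java
// Majority-quorum arithmetic for a ZooKeeper ensemble (illustrative helper, not ZooKeeper API).
class QuorumMath {
    // Votes needed for a majority among n voting servers: floor(n/2) + 1.
    static int quorumSize(int n) {
        if (n < 1) throw new IllegalArgumentException("ensemble must have at least one server");
        return n / 2 + 1;
    }

    // How many servers can be down while a majority is still reachable.
    static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.printf("servers=%d quorum=%d tolerated failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

For three, four and five servers this prints quorum sizes 2, 3, 3 and tolerated failures 1, 1, 2: four servers survive exactly as many failures as three.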

Config of any of the ZK nodes:
-bash-4.1$ cat /opt/molsfw/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
#Auto-generated by Exhibitor - Tue Jan 07 15:50:57 UTC 2014
#Tue Jan 07 15:50:57 UTC 2014
server.3=10.250.76.189\:2888\:3888
server.2=10.250.76.188\:2888\:3888
server.1=10.250.76.187\:2888\:3888
initLimit=10
syncLimit=5
clientPort=2181
tickTime=2000
dataDir=/opt/moldata/zookeeper/snapshot
dataLogDir=/opt/moldata/zookeeper/log
server.4=10.250.76.190\:2888\:3888
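For context on the "myid file is missing" error: when zoo.cfg lists server.N entries, ZooKeeper starts in replicated mode and expects a file named myid in dataDir (here /opt/moldata/zookeeper/snapshot) containing only that node's own N. The Java sketch below writes such a file by matching a host against the server.N lines of a zoo.cfg like the one above. It is a hypothetical helper for illustration, not how Exhibitor manages myid, and it assumes the node's address is passed in explicitly rather than detected.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: derive this node's id from zoo.cfg and write <dataDir>/myid.
class MyIdWriter {
    // Returns N for the server.N entry whose host part equals `host`, or -1 if none matches.
    static int findServerId(Properties cfg, String host) {
        for (String key : cfg.stringPropertyNames()) {
            if (!key.startsWith("server.")) continue;
            // Properties.load has already turned "\:" into ":", so values look like host:2888:3888
            String entryHost = cfg.getProperty(key).trim().split(":", 2)[0];
            if (entryHost.equals(host)) {
                return Integer.parseInt(key.substring("server.".length()));
            }
        }
        return -1;
    }

    // Writes just the id (plus a newline) to <dataDir>/myid and returns the file's path.
    static Path writeMyId(Properties cfg, String host) throws IOException {
        int id = findServerId(cfg, host);
        if (id < 0) throw new IllegalStateException(host + " is not listed in zoo.cfg");
        Path dataDir = Paths.get(cfg.getProperty("dataDir"));
        Files.createDirectories(dataDir);
        Path myId = dataDir.resolve("myid");
        Files.write(myId, (id + "\n").getBytes(StandardCharsets.UTF_8));
        return myId;
    }

    static Properties load(Reader reader) throws IOException {
        Properties cfg = new Properties();
        cfg.load(reader);
        return cfg;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MyIdWriter /path/to/zoo.cfg 10.250.76.188
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println("wrote " + writeMyId(load(r), args[1]));
        }
    }
}
```

Run on 10.250.76.188 against this zoo.cfg, it would write 2, matching server.2.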

Config in the shared folder:
-bash-4.1$ cat /opt/molsfw/zooman/exhibitor.properties
#Auto-generated by Exhibitor
#Tue Jan 07 15:20:00 UTC 2014
com.netflix.exhibitor-rolling-hostnames=
com.netflix.exhibitor-rolling.zookeeper-data-directory=/opt/moldata/zookeeper/snapshot
com.netflix.exhibitor-rolling.servers-spec=1\:10.250.76.187,2\:10.250.76.188,3\:10.250.76.189,4\:10.250.76.190
com.netflix.exhibitor.java-environment=/opt/molsfw/zookeeper/zookeeper-3.4.5/conf/java.env
com.netflix.exhibitor.zookeeper-data-directory=/opt/moldata/zookeeper/snapshot
com.netflix.exhibitor-rolling-hostnames-index=0
com.netflix.exhibitor-rolling.java-environment=/opt/molsfw/zookeeper/zookeeper-3.4.5/conf/java.env
com.netflix.exhibitor-rolling.observer-threshold=999
com.netflix.exhibitor.servers-spec=1\:10.250.76.187,2\:10.250.76.188,3\:10.250.76.189,4\:10.250.76.190
com.netflix.exhibitor.cleanup-period-ms=43200000
com.netflix.exhibitor.auto-manage-instances-fixed-ensemble-size=4
com.netflix.exhibitor.zookeeper-install-directory=/opt/molsfw/zookeeper/*
com.netflix.exhibitor.check-ms=30000
com.netflix.exhibitor.zookeeper-log-directory=/opt/moldata/zookeeper/log
com.netflix.exhibitor-rolling.auto-manage-instances=0
com.netflix.exhibitor-rolling.cleanup-period-ms=43200000
com.netflix.exhibitor-rolling.auto-manage-instances-settling-period-ms=180000
com.netflix.exhibitor-rolling.check-ms=30000
com.netflix.exhibitor.log-index-directory=/opt/moldata/zookeeper/log-index
com.netflix.exhibitor-rolling.log-index-directory=/opt/moldata/zookeeper/log-index
com.netflix.exhibitor.backup-period-ms=60000
com.netflix.exhibitor-rolling.connect-port=2888
com.netflix.exhibitor-rolling.election-port=3888
com.netflix.exhibitor-rolling.backup-extra=directory\=%2Fopt%2Fmoldata%2Fzookeeper%2Fbackup-extra
com.netflix.exhibitor.client-port=2181
com.netflix.exhibitor-rolling.zoo-cfg-extra=syncLimit\=5&tickTime\=2000&initLimit\=10
com.netflix.exhibitor-rolling.zookeeper-install-directory=/opt/molsfw/zookeeper/*
com.netflix.exhibitor.cleanup-max-files=3
com.netflix.exhibitor-rolling.auto-manage-instances-fixed-ensemble-size=4
com.netflix.exhibitor-rolling.backup-period-ms=60000
com.netflix.exhibitor-rolling.client-port=2181
com.netflix.exhibitor.backup-max-store-ms=86400000
com.netflix.exhibitor-rolling.cleanup-max-files=3
com.netflix.exhibitor-rolling.backup-max-store-ms=86400000
com.netflix.exhibitor.connect-port=2888
com.netflix.exhibitor.backup-extra=directory\=%2Fopt%2Fmoldata%2Fzookeeper%2Fbackup-extra
com.netflix.exhibitor.observer-threshold=999
com.netflix.exhibitor.log4j-properties=
com.netflix.exhibitor.auto-manage-instances-apply-all-at-once=1
com.netflix.exhibitor.election-port=3888
com.netflix.exhibitor-rolling.auto-manage-instances-apply-all-at-once=1
com.netflix.exhibitor.zoo-cfg-extra=syncLimit\=5&tickTime\=2000&initLimit\=10
com.netflix.exhibitor-rolling.zookeeper-log-directory=/opt/moldata/zookeeper/log
com.netflix.exhibitor.auto-manage-instances-settling-period-ms=180000
com.netflix.exhibitor-rolling.log4j-properties=
com.netflix.exhibitor.auto-manage-instances=0
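Once the \: escapes are removed on load, the servers-spec above (1:10.250.76.187 through 4:10.250.76.190) lines up one-to-one with the server.N=host:2888:3888 entries in the generated zoo.cfg, using connect-port=2888 and election-port=3888. A sketch of that mapping in Java, assuming the plain id:host form used in this thread; ServersSpec is an illustrative name, and this is not Exhibitor's actual generator:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: maps an Exhibitor-style servers-spec ("id:host,id:host,...")
// to zoo.cfg server.N lines. Assumes the plain id:host form shown in this thread.
class ServersSpec {
    static List<String> toZooCfgLines(String spec, int connectPort, int electionPort) {
        List<String> lines = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) continue;
            String[] parts = trimmed.split(":", 2);
            if (parts.length != 2) throw new IllegalArgumentException("bad entry: " + trimmed);
            int id = Integer.parseInt(parts[0].trim());
            String host = parts[1].trim();
            lines.add("server." + id + "=" + host + ":" + connectPort + ":" + electionPort);
        }
        return lines;
    }

    public static void main(String[] args) {
        String spec = "1:10.250.76.187,2:10.250.76.188,3:10.250.76.189,4:10.250.76.190";
        for (String line : toZooCfgLines(spec, 2888, 3888)) {
            System.out.println(line);
        }
    }
}
```

With the spec from this thread it yields four lines, server.1 through server.4, which matches the four server.N lines in the generated zoo.cfg.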



Matthew Hooker

Jan 7, 2014, 11:52:48 AM
to exhibit...@googlegroups.com
Hi Antonio,

when you say "When I try to get ZK running", how are you doing that?

-- 
Matthew Hooker

--
You received this message because you are subscribed to the Google Groups "exhibitor-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to exhibitor-use...@googlegroups.com.
To post to this group, send email to exhibit...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/exhibitor-users/7baeb36b-02fe-4a13-9e48-0c3b8501ae56%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jordan Zimmerman

Jan 7, 2014, 12:26:47 PM
to exhibit...@googlegroups.com, exhibit...@googlegroups.com
It depends on what's in the default config. Check the Config tab in the UI. If the first few fields aren't set, Exhibitor stays in "latent" mode. 

====================
Jordan Zimmerman

Antonio Terreno

Jan 8, 2014, 6:14:58 AM
to exhibit...@googlegroups.com
Hi, 
I click 'restart' in the Exhibitor UI, which, as far as I understand, tries to start zkServer via the standard zkServer.sh start command, as the same user that runs Exhibitor. 

Antonio Terreno

Jan 8, 2014, 6:19:41 AM
to exhibit...@googlegroups.com
Hi, no, it's not in latent mode, since all the necessary fields are populated. 
As shown in the Exhibitor file I pasted in my first post, I've set them in /opt/molsfw/zooman/exhibitor.properties, which is passed via the --defaultconfig flag. 

com.netflix.exhibitor.zookeeper-install-directory=/opt/molsfw/zookeeper/*
com.netflix.exhibitor.zookeeper-data-directory=/opt/moldata/zookeeper/snapshot
com.netflix.exhibitor.log-index-directory=/opt/moldata/zookeeper/log-index

All those folders exist and are owned by the same user that runs the Exhibitor process. 
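Taking Antonio's reading of "the first few fields" at face value (zookeeper-install-directory, zookeeper-data-directory and log-index-directory; Jordan does not name them, so this list is an assumption), a quick way to double-check the shared defaults file is to load it with java.util.Properties and report any of those keys that are missing or blank. A hypothetical checker in Java:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical check: which of the (assumed) required Exhibitor keys are missing or blank?
class ExhibitorConfigCheck {
    // Assumption: the fields Antonio lists in this thread, not an official Exhibitor list.
    static final List<String> REQUIRED = Arrays.asList(
            "com.netflix.exhibitor.zookeeper-install-directory",
            "com.netflix.exhibitor.zookeeper-data-directory",
            "com.netflix.exhibitor.log-index-directory");

    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) missing.add(key);
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ExhibitorConfigCheck /opt/molsfw/zooman/exhibitor.properties
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            props.load(r);
        }
        List<String> missing = missingKeys(props);
        System.out.println(missing.isEmpty() ? "all assumed-required keys are set" : "missing: " + missing);
    }
}
```

An empty result only says the keys are set; it does not check that the directories exist or are writable, which Antonio verified by hand.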

Jordan Zimmerman

Jan 8, 2014, 8:16:16 AM
to exhibit...@googlegroups.com, Antonio Terreno
Please send the Exhibitor logs. Also the ZK logs would be helpful.

-Jordan


From: Antonio Terreno
Reply: exhibit...@googlegroups.com
Date: January 8, 2014 at 6:19:42 AM
To: exhibit...@googlegroups.com
Subject: Re: configuring Exhibitor for a cluster of 3 ZooKeepers

Antonio Terreno

Jan 8, 2014, 10:14:43 AM
to exhibit...@googlegroups.com, Antonio Terreno
Interesting, 
I started the whole cluster without Exhibitor in order to keep going with the setup, and now I have successfully spawned an Exhibitor instance on all the machines. 

I now wonder: is this the required procedure to get a ZK/Exhibitor cluster up and running? 

- set up ZK as if Exhibitor didn't exist
- start all the ZK instances
- start all the Exhibitor instances 
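If the cluster is brought up in that order, one way to gate step 3 on step 2 is ZooKeeper's ruok four-letter command, which the zookeeper.out excerpt further down shows being served on the client port (2181). A server running in a non-error state answers imok; per the ZooKeeper admin docs, that does not prove it has joined a quorum. A minimal Java probe; RuokProbe and the timeout value are illustrative choices:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sends ZooKeeper's "ruok" four-letter command and checks that the reply is "imok".
class RuokProbe {
    static boolean isOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // The server writes its reply and closes the connection, so read until EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[16];
            int n;
            while ((n = in.read(buf)) != -1) reply.write(buf, 0, n);
            return "imok".equals(new String(reply.toByteArray(), StandardCharsets.US_ASCII).trim());
        } catch (IOException e) {
            return false; // refused, timed out or reset: treat as not healthy
        }
    }

    public static void main(String[] args) {
        // Usage: java RuokProbe 10.250.76.187 10.250.76.188 10.250.76.189
        for (String host : args) {
            System.out.println(host + " ruok -> " + isOk(host, 2181, 2000));
        }
    }
}
```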


The only other change is that I am starting Exhibitor with a plain nohup instead of as an SVC, but I doubt that is the root cause. 

I will now look into this further and try restarting the ZK instances from the web UI, to see whether that causes the myid files to be lost or fails to bring the instances back up. 

Thanks 

Antonio Terreno

Jan 8, 2014, 10:48:56 AM
to exhibit...@googlegroups.com, Antonio Terreno
I think I've nailed the problem. 

- Have ZK running fine, all green on the panel
- Click restart on one of the nodes (box .188)  
- Exhibitor successfully kills the instance and the myid file gets deleted, but then no new process is started. 

Attaching logs: 

-bash-4.1$ jps
70836 Jps
70826 QuorumPeerMain
70129 exhibitor-1.5.1.jar

= ZK up and running fine

==> zookeeper.out <==
2014-01-08 15:42:00,388 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:37414
2014-01-08 15:42:00,388 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing ruok command from /127.0.0.1:37414
2014-01-08 15:42:00,388 [myid:2] - INFO  [Thread-155:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:37414 (no session established for client)
2014-01-08 15:42:01,296 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.100.6.24:54780
2014-01-08 15:42:01,296 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing stat command from /10.100.6.24:54780
2014-01-08 15:42:01,296 [myid:2] - INFO  [Thread-156:NIOServerCnxn$StatCommand@655] - Stat command output
2014-01-08 15:42:01,297 [myid:2] - INFO  [Thread-156:NIOServerCnxn@1001] - Closed socket connection for client /10.100.6.24:54780 (no session established for client)
2014-01-08 15:42:03,390 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:58689
2014-01-08 15:42:03,391 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing ruok command from /127.0.0.1:58689
2014-01-08 15:42:03,391 [myid:2] - INFO  [Thread-157:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:58689 (no session established for client)
2014-01-08 15:42:04,332 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.100.6.24:54784
2014-01-08 15:42:04,333 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing stat command from /10.100.6.24:54784
2014-01-08 15:42:04,333 [myid:2] - INFO  [Thread-158:NIOServerCnxn$StatCommand@655] - Stat command output
2014-01-08 15:42:04,334 [myid:2] - INFO  [Thread-158:NIOServerCnxn@1001] - Closed socket connection for client /10.100.6.24:54784 (no session established for client)
2014-01-08 15:42:06,393 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:47980
2014-01-08 15:42:06,393 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing ruok command from /127.0.0.1:47980
2014-01-08 15:42:06,394 [myid:2] - INFO  [Thread-159:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:47980 (no session established for client)
2014-01-08 15:42:07,371 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.100.6.24:54788
2014-01-08 15:42:07,372 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing stat command from /10.100.6.24:54788
2014-01-08 15:42:07,372 [myid:2] - INFO  [Thread-160:NIOServerCnxn$StatCommand@655] - Stat command output
2014-01-08 15:42:07,373 [myid:2] - INFO  [Thread-160:NIOServerCnxn@1001] - Closed socket connection for client /10.100.6.24:54788 (no session established for client)
2014-01-08 15:42:08,750 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.250.76.191:38112
2014-01-08 15:42:08,750 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing mntr command from /10.250.76.191:38112
2014-01-08 15:42:08,751 [myid:2] - INFO  [Thread-161:NIOServerCnxn@1001] - Closed socket connection for client /10.250.76.191:38112 (no session established for client)
2014-01-08 15:42:08,756 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.250.76.191:55892
2014-01-08 15:42:08,756 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing mntr command from /10.250.76.191:55892
2014-01-08 15:42:08,757 [myid:2] - INFO  [Thread-162:NIOServerCnxn@1001] - Closed socket connection for client /10.250.76.191:55892 (no session established for client)
2014-01-08 15:42:09,386 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:53816
2014-01-08 15:42:09,387 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing ruok command from /127.0.0.1:53816
2014-01-08 15:42:09,387 [myid:2] - INFO  [Thread-163:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:53816 (no session established for client)

Then, after clicking restart in the UI on .188: 

==> exhibitor.out <==
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to stop instance [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Kill attempted result: 0 [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Kill attempted result: 0 [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Starting in standalone mode [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: JMX enabled by default [pool-2-thread-1]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Process started via: /opt/molsfw/zookeeper/zookeeper-3.4.5/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Using config: /opt/molsfw/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg [pool-2-thread-1]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  State: down [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to stop instance [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Attempting to start/restart ZooKeeper [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  jps didn't find instance - assuming ZK is not running [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Starting in standalone mode [ActivityQueue-0]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  Process started via: /opt/molsfw/zookeeper/zookeeper-3.4.5/bin/zkServer.sh [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: JMX enabled by default [pool-2-thread-3]
ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Using config: /opt/molsfw/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg [pool-2-thread-3]
INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]

My only concern is those two ERRORs, and I don't see what the problem is: JMX is enabled and works fine (remotely too, which we need), and the config is correct (in fact I don't need to change it to restart ZK). 
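A plausible explanation for those two ERROR lines, consistent with the log but not confirmed in this thread: in ZooKeeper 3.4.x, zkServer.sh appears to print "JMX enabled by default" and the "Using config" line to stderr, while "Starting zookeeper ... STARTED" goes to stdout, and the log suggests Exhibitor tags whatever the script writes to stderr as ERROR. The Java sketch below (not Exhibitor's code; StreamLabeler is a made-up name) shows that pattern: identical informational text gets an ERROR label purely because of the stream it arrived on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative wrapper: runs a command and tags the child's stdout lines INFO and its
// stderr lines ERROR, the way the Exhibitor log above appears to. Not Exhibitor's code.
class StreamLabeler {
    static List<String> run(String... command) throws IOException, InterruptedException {
        final List<String> out = Collections.synchronizedList(new ArrayList<String>());
        Process p = new ProcessBuilder(command).start();
        Thread stdout = pump(p.getInputStream(), "INFO", out);
        Thread stderr = pump(p.getErrorStream(), "ERROR", out);
        p.waitFor();
        stdout.join();
        stderr.join();
        return out;
    }

    // Reads one stream line by line on its own thread, prefixing each line with `level`.
    private static Thread pump(final InputStream stream, final String level, final List<String> sink) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try (BufferedReader r = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = r.readLine()) != null) sink.add(level + " " + line);
                } catch (IOException ignored) {
                    // stream closed: nothing more to read
                }
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // Harmless text written to stderr still comes out tagged ERROR.
        for (String line : run("sh", "-c", "echo 'JMX enabled by default' >&2; echo STARTED")) {
            System.out.println(line);
        }
    }
}
```

Because the two streams are drained on separate threads, the relative order of INFO and ERROR lines is not guaranteed, much like the interleaving in the Exhibitor log.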

There are no new entries in zookeeper.out, and if I run jps, of course I see no process: 

-bash-4.1$ jps
70754 Jps
70129 exhibitor-1.5.1.jar
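Both jps listings in this message, and Exhibitor's own "jps didn't find instance - assuming ZK is not running" line, come down to one question: is there a JVM whose main class is QuorumPeerMain? A small Java sketch that answers it from jps output; the parsing is illustrative, and Exhibitor's actual check is not shown in this thread:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative: find ZooKeeper server PIDs in `jps` output ("<pid> <main class or jar>" per line).
class JpsScan {
    static List<Long> quorumPeerPids(String jpsOutput) {
        List<Long> pids = new ArrayList<>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2 && parts[1].trim().equals("QuorumPeerMain")) {
                pids.add(Long.parseLong(parts[0]));
            }
        }
        return pids;
    }

    public static void main(String[] args) {
        // The two jps listings posted in this thread, before and after the restart.
        String before = "70836 Jps\n70826 QuorumPeerMain\n70129 exhibitor-1.5.1.jar\n";
        String after = "70754 Jps\n70129 exhibitor-1.5.1.jar\n";
        System.out.println("before restart: " + quorumPeerPids(before)); // [70826]
        System.out.println("after restart:  " + quorumPeerPids(after));  // []
    }
}
```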

At that point the ZK myid file is gone, so I have to put it back manually to get the instance up again. :-(

I hope this helps... 

Jordan Zimmerman

Jan 8, 2014, 12:12:01 PM
to exhibit...@googlegroups.com, Antonio Terreno, Antonio Terreno
What was the problem?

Date: January 8, 2014 at 10:48:58 AM

Antonio Terreno

Jan 8, 2014, 12:30:15 PM
to exhibit...@googlegroups.com, Antonio Terreno
Sorry, I wrote that line when I *thought* I'd nailed it, but then I fixed a configuration issue, and the logs attached previously are the result. 

So no luck. 

The main culprit seems to be Exhibitor's inability to spawn a new ZooKeeper. From what I see (I should probably dig into its code a little), it does manage to kill the running instance, so jps works fine, and it does manage to delete the myid file, but then it fails to start the process again, without leaving any error logs. 

As I said earlier, the only two lines with an ERROR in all the logs are these: 

ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: JMX enabled by default [pool-2-thread-3]
ERROR com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Using config: /opt/molsfw/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg [pool-2-thread-3]

They don't ring any bells for me, since that ZK config is the right one to use and JMX is enabled, as I need it to be. 

I hope they ring a bell for you. 

Thanks again Jordan. 

Antonio Terreno

Jan 8, 2014, 12:32:12 PM
to exhibit...@googlegroups.com, Antonio Terreno
One last note: we use SmartOS, which has proven to be a little 'peculiar' on several occasions. 

I do wonder whether that could be an issue. 

-bash-4.1$ uname -a
SunOS mol-hsk-zookeeper4.local 5.11 joyent_20130226T234312Z i86pc i386 i86pc Solaris
-bash-4.1$ java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) Server VM (build 23.7-b01, mixed mode)
-bash-4.1$