Deployment of a 3 Node CoprHD Cluster

Stephen McElroy

Apr 14, 2016, 6:21:44 PM
to coprHD
Can someone point me to the documentation for this, if any exists? I've searched for a while now and haven't been able to find anything.

Jim DeWaard

Apr 14, 2016, 7:39:00 PM
to coprHD
Hey Steven,
Are you deploying the CoprHD OVA or installing via the RPM (or from source)?

I'm working on some documentation on deploying the OVA in a cluster configuration. I'll post the steps and some example config files when I have a few minutes.

Thanks,
Jim D

Jim DeWaard

Apr 14, 2016, 7:44:30 PM
to coprHD
Sorry. Stephen, not Steven.

Stephen McElroy

Apr 14, 2016, 7:48:02 PM
to Jim DeWaard, coprHD

I'm deploying from an OVF. So far I've gotten 3 nodes going, but the only things that seem to have propagated are the root/svcuser password and the basic settings from my node 1. So I'm sure I messed something up lol.

Jim DeWaard

Apr 14, 2016, 7:58:33 PM
to coprHD

Yeah, I hear ya. I think I deployed 5 different times before things started coming together.

Would you mind posting the contents of the ovfenv.properties file from each node?

cat /etc/ovfenv.properties

Thanks!

Stephen McElroy

Apr 14, 2016, 8:19:08 PM
to coprHD
After the deployment, all that was changed was ovfenv.properties, followed by a reboot. Here are all 3 ovfenv.properties files:

#node 1
network_1_ipaddr6=::0
network_2_ipaddr6=::0
network_3_ipaddr6=::0
network_1_ipaddr=10.9.94.44
network_2_ipaddr=10.9.94.45
network_3_ipaddr=10.9.94.46
network_gateway6=::0
network_gateway=10.9.92.1
network_netmask=255.255.252.0
network_prefix_length=64
network_vip6=::0
network_vip=10.9.92.200
node_count=3
node_id=vipr1

#node 2
network_1_ipaddr6=::0
network_2_ipaddr6=::0
network_3_ipaddr6=::0
network_1_ipaddr=10.9.94.45
network_2_ipaddr=10.9.94.44
network_3_ipaddr=10.9.94.46
network_gateway6=::0
network_gateway=10.9.92.1
network_netmask=255.255.252.0
network_prefix_length=64
network_vip6=::0
network_vip=10.9.92.200
node_count=3
node_id=vipr2

#node 3
network_1_ipaddr6=::0
network_2_ipaddr6=::0
network_3_ipaddr6=::0
network_1_ipaddr=10.9.94.46
network_2_ipaddr=10.9.94.44
network_3_ipaddr=10.9.94.45
network_gateway6=::0
network_gateway=10.9.92.1
network_netmask=255.255.252.0
network_prefix_length=64
network_vip6=::0
network_vip=10.9.92.200
node_count=3
node_id=vipr3

Jim DeWaard

Apr 14, 2016, 9:07:01 PM
to coprHD
Ok, so I think I see the issue.  In your 'ovfenv.properties' files, I believe the 'network_#_ipaddr' lines need to be the same across all nodes.  The only variation in the configuration between nodes will be the "node_id" value.

Here's what the config file would look like on each node:
network_1_ipaddr6=::0
network_2_ipaddr6=::0
network_3_ipaddr6=::0
network_1_ipaddr=10.9.94.44
network_2_ipaddr=10.9.94.45
network_3_ipaddr=10.9.94.46
network_gateway6=::0
network_gateway=10.9.92.1
network_netmask=255.255.252.0
network_prefix_length=64
network_vip6=::0
network_vip=10.9.92.200
node_count=3
node_id=vipr#    (the node_id value would be 'vipr1' on the first node, 'vipr2' on the second, and so on)


The only difference in the config files across all nodes is the node_id line.  Let me know if that works.
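One way to avoid drift between the files is to generate all of them from a single set of shared values, so that only node_id can vary. The following is a minimal sketch under that assumption, not a CoprHD tool: the helper name render_ovfenv is hypothetical, and the values are simply the ones posted in this thread.

```python
# Hypothetical helper, not part of CoprHD: builds the text of
# /etc/ovfenv.properties for each node from one shared set of values.
SHARED = {
    "network_1_ipaddr6": "::0",
    "network_2_ipaddr6": "::0",
    "network_3_ipaddr6": "::0",
    "network_1_ipaddr": "10.9.94.44",
    "network_2_ipaddr": "10.9.94.45",
    "network_3_ipaddr": "10.9.94.46",
    "network_gateway6": "::0",
    "network_gateway": "10.9.92.1",
    "network_netmask": "255.255.252.0",
    "network_prefix_length": "64",
    "network_vip6": "::0",
    "network_vip": "10.9.92.200",
    "node_count": "3",
}


def render_ovfenv(node_number, shared=SHARED):
    """Return the file text for one node; only node_id depends on the node."""
    lines = [f"{key}={value}" for key, value in shared.items()]
    lines.append(f"node_id=vipr{node_number}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print one block per node so each can be copied to the matching VM.
    for n in range(1, int(SHARED["node_count"]) + 1):
        print(f"# --- node {n} ---")
        print(render_ovfenv(n))
```

Each rendered block would still have to be copied to /etc/ovfenv.properties on the matching node by hand, followed by a reboot as described above.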

Thanks,
Jim D.

Stephen McElroy

Apr 15, 2016, 10:32:54 AM
to coprHD
Ok, I'll give this a go. I think I'm going to start from scratch, document my steps, and report back with the results. I'd like to try this from a fresh-deployment perspective. Since I'm unable to find any OVAs, I'll be using the OVF for CoprHD-3.0.0.0.306.

Jim DeWaard

Apr 15, 2016, 1:12:06 PM
to coprHD
Yeah sorry, I use OVA/OVF terms interchangeably sometimes.  This is sort of a silly question, but is CoprHD 3.0 considered a stable build?  I can't seem to find any release documentation on the 3.0 release at this point.

Stephen McElroy

Apr 15, 2016, 2:26:07 PM
to coprHD
Not silly at all, I couldn't find anything either. But it did fix a lot of issues I was running into with 2.4. So far I'm testing it out to see how it goes, and honestly, so far so good. It should be said that I couldn't get 2.4 to work in a 3-node cluster either.

I redeployed everything from scratch with the new ovfenv config; still a no-go. After changing the files and performing a reboot, I ended up with this error on all nodes:

[main] WARN HostSupplierImpl.java (line 65) hostsupplier is empty. May be dbsvc hasn't started yet. waiting for 10000 msec

I'll try to take a look into this a little more later. For now I need to go and sled 400 drives and install a SAN array :(.

Jim DeWaard

Apr 17, 2016, 2:33:15 PM
to coprHD
Hmmm. I think I've seen that error before when the cluster is running through its initial setup. Is that message in the dbsvc.log file? Would you be able to post the contents of your logs directory when you get a free moment?

Thanks!
Jim D

Stephen McElroy

Apr 18, 2016, 4:56:01 PM
to coprHD
Looks like I fat-fingered the ovfenv file when I made it like you suggested. After I corrected it and rebooted all the machines, they reported the UI as up, and it has been working since.
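A typo like this is easy to miss by eye. Since the advice earlier in the thread was that the files should match on every node apart from node_id, the comparison can be automated. The sketch below is only an illustration: parse_properties and find_mismatches are hypothetical names, and it assumes the file contents have already been collected from each node as plain text.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def find_mismatches(node_texts, ignore=("node_id",)):
    """Return {key: [value on each node]} for keys whose values differ.

    A key missing on a node shows up as None, which also catches
    misspelled key names.
    """
    parsed = [parse_properties(text) for text in node_texts]
    all_keys = set().union(*parsed) - set(ignore)
    mismatches = {}
    for key in sorted(all_keys):
        values = [props.get(key) for props in parsed]
        if len(set(values)) > 1:
            mismatches[key] = values
    return mismatches


if __name__ == "__main__":
    # Two shortened example files with the IP lines swapped between nodes.
    node1 = "network_1_ipaddr=10.9.94.44\nnetwork_2_ipaddr=10.9.94.45\nnode_id=vipr1\n"
    node2 = "network_1_ipaddr=10.9.94.45\nnetwork_2_ipaddr=10.9.94.44\nnode_id=vipr2\n"
    for key, values in find_mismatches([node1, node2]).items():
        print(key, values)
```

An empty result means the files agree on everything except node_id; it does not prove that the shared values themselves are right.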

Ben Perkins

Apr 18, 2016, 10:19:04 PM
to coprHD
I'm glad you're having better luck now.  As to your question about 3.0, you can always refer to https://coprhd.atlassian.net/wiki/display/COP/CoprHD-Controller+Repository for information on the various branches and what's officially been released.  As for 3.0, it's stabilizing now but has not yet reached release.

Jim DeWaard

Apr 19, 2016, 8:14:47 AM
to coprHD
Excellent! Just as a reminder, there is a modification that needs to be made to the GeoStorageOS keyspace in the Cassandra DB before the cluster will be highly available (at least in the 2.4 release). Let me know if this is also true for the 3.0 release.

Thanks!
Jim D

Ben Perkins

Apr 24, 2016, 11:08:18 AM
to coprHD
Jim,

Can you elaborate on the keyspace modification or point me toward the thread where it's discussed?  Is there a JIRA item for whatever it is?

Thanks,
Ben

Jim DeWaard

May 4, 2016, 10:31:14 AM
to coprHD
Hey Ben,
Sorry for the delay.  Here's where the issue was initially discussed.

https://groups.google.com/d/msg/coprhd/NsnUg7Gf5bs/oL58EQCsLAAJ

I have not created a Jira item for this yet, but I will look into doing that today.

Thanks,
Jim DeWaard

Ben Perkins

May 5, 2016, 10:00:54 AM
to coprHD
Thanks Jim.  The steps of manually updating the database configuration as referenced in that thread definitely don't sound like something that should be required.  If you run into any trouble creating the JIRA, let us know.  You can reference these threads in the JIRA, and then once it's created put a reference to the JIRA here so folks running into this issue can find it and be aware of whatever the resolution is.  Thanks for helping make CoprHD better!

Stephen

May 17, 2016, 9:46:26 AM
to coprHD
Hi guys,

I've downloaded the 3.0.0.0.310 build from Jenkins and am trying to install it to replace an older 2.4.x system (single node) that refuses to upgrade. I'd like to move to a 3-node config, so I used storageos-deployment-template.sh to try to deploy a build. I've read both storageos-deployment-template.sh and storageos-deployment-template.ps1, but I can't figure out how these scripts are supposed to locate the vmdk files that I've downloaded; so far the script fails to find any disk files. I've scoured the site but can't find any guide to deploying a 3-node cluster. Am I really supposed to manually edit files in the VM after deploying?

thanks
Stephen

Salman Riaz

Jul 27, 2016, 12:55:41 PM
to coprHD
Hi,

I am also facing this error. I have tried reinstalling these machines, without success. diagtool shows the following output. I have created three VMs (openSUSE 13.2) on Citrix XenServer 6.5. Can you please guide me in this regard?

coprhd1:~ # /etc/diagtool -v
* Network interface: [OK]
         network_ipaddr=10.10.50.3
         network_netmask=255.255.255.248
         network_ipaddr6=
         network_rx_packets=102080
         network_tx_packets=53627
         number_of_errors=0
         network_status=RUNNING
* IP uniqueness: [OK]
         network_vip=10.10.50.2,[OK]
         network_1_ipaddr=10.10.50.3,[OK]
         network_2_ipaddr=10.10.50.4,[OK]
         network_3_ipaddr=10.10.50.5,[OK]
* Network routing: [OK]
         network_gw6=UNCONFIGURED
         network_gw=10.10.50.1,REACHABLE
* Nodes connectivity: [REACHABLE]
         vipr1=10.10.50.3,REACHABLE
         vipr2=10.10.50.4,REACHABLE
         vipr3=10.10.50.5,REACHABLE
* Network VIP: [IPV4_ONLY, REACHABLE]
         ipv4_vip=10.10.50.2
         ipv4_vip_status=REACHABLE
* VDC Status: [REACHABLE]
* Peer synchronization: [No peer exists]
* IP subnets: [SAME]
         network_1_ipaddr=10.10.50.3, subnet is 10.10.50.0
         network_2_ipaddr=10.10.50.4, subnet is 10.10.50.0
         network_3_ipaddr=10.10.50.5, subnet is 10.10.50.0
         gateway=10.10.50.1, subnet is 10.10.50.0
* Db connection: [UNREACHABLE]
* ZK connection: [OK]
* Firewall: [CONFIGURED, ACTIVE]
* DNS: [OK]
         network_nameserver=8.8.8.8 [OK]
         network_nameserver=8.8.4.4. [OK]
* NTP: [CONFIGURED, DEGRADED]
         network_ntpserver=0.opensuse.pool.ntp.org [OK]
         network_ntpserver=1.opensuse.pool.ntp.org [OK]
         network_ntpserver=3.opensuse.pool.ntp.org [OK]
         network_ntpserver=2.opensuse.pool.ntp.org [CONFIGURED, UNREACHABLE]
* EMC upgrade repository: [OK]
         system_update_repo=https://colu.emc.com/soap/rpc
* connectEMC: [OK]
         FTPS server=corpusfep3.emc.com, REACHABLE
         SMTP server=NOT SPECIFIED
/etc/diagtool: line 489: let: avail_percentage=100 - : syntax error: operand expected (error token is "- ")
/etc/diagtool: line 491: [: -ge: unary operator expected
/etc/diagtool: line 494: [: -le: unary operator expected
/etc/diagtool: line 897: 10*/100: syntax error: operand expected (error token is "/100")
* Memory usage: [OK]
         total=11781M
         used=3079M
         free=8702M
         shared=1M
         buffers=55M
         cached=851M
* CPU usage: [OK]
         cpu_util_percentage=2.4%
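In a long report like this, the failing checks are easy to overlook. As a rough sketch, the top-level "* name: [status]" lines can be filtered for anything that looks unhealthy. The function name suspect_checks is hypothetical, and the list of suspect tokens is an assumption based only on the output pasted above, not on any diagtool documentation.

```python
import re

# Matches top-level check lines such as "* Db connection: [UNREACHABLE]".
CHECK_LINE = re.compile(r"^\* (?P<name>[^:]+): \[(?P<status>[^\]]*)\]\s*$")

# Assumption: these status tokens indicate trouble. The list is a guess
# drawn from the pasted output, not from diagtool itself.
SUSPECT = ("UNREACHABLE", "DEGRADED")


def suspect_checks(output):
    """Return (check name, status) pairs whose status has a suspect token."""
    found = []
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if not match:
            continue  # indented detail lines and shell errors are skipped
        tokens = [token.strip() for token in match.group("status").split(",")]
        if any(token in SUSPECT for token in tokens):
            found.append((match.group("name"), match.group("status")))
    return found


if __name__ == "__main__":
    sample = """\
* Network routing: [OK]
* Nodes connectivity: [REACHABLE]
* Db connection: [UNREACHABLE]
* ZK connection: [OK]
* NTP: [CONFIGURED, DEGRADED]
         network_ntpserver=2.opensuse.pool.ntp.org [CONFIGURED, UNREACHABLE]
"""
    for name, status in suspect_checks(sample):
        print(f"{name}: {status}")
```

Run against the report above, this would single out the Db connection and NTP checks, which fits the dbsvc-related errors reported elsewhere in this thread.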

Salman Riaz

Jul 28, 2016, 4:20:01 AM
to coprHD
Hi,

I have reinstalled these VMs. I first installed CoprHD on each machine as a separate entity, then updated /etc/ovfenv.properties and rebooted the servers. After this I tried to open the web page, but it gives me the following page. Can you guys please advise?

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<error>
<code>6503</code>
<description>
Unable to connect to the service. The service is unavailable, try again later.
</description>
<details>
The service is currently unavailable because a connection failed to a core component. Please contact an administrator or try again later.
</details>
<retryable>true</retryable>
</error>

Salman Riaz

Jul 28, 2016, 6:32:36 AM
to coprHD
Hi,

One more thing: the "diagtool -v" output shows everything as okay, but when I run "diagtool -r" I get the following output on each node:


coprhd1:~ # /etc/diagtool -r
* Resource allocation: [OK]
         Resources details for vipr1(localhost) is:
         memory size: 11781M
         disk size: 125110568M
         processor count: 4
         total cpu frequency: 9178.46MHz

         vipr2 is down


         vipr3 is down

coprhd2:~ # /etc/diagtool -r
* Resource allocation: [OK]
         Resources details for vipr2(localhost) is:
         memory size: 11781M
         disk size: 125110560M
         processor count: 4
         total cpu frequency: 9179.48MHz

         vipr1 is down


         vipr3 is down

coprhd3:~ # /etc/diagtool -r
* Resource allocation: [OK]
         Resources details for vipr3(localhost) is:
         memory size: 11781M
         disk size: 125110576M
         processor count: 4
         total cpu frequency: 9179.1MHz

         vipr1 is down


         vipr2 is down


Why is this so?

Salman Riaz

Jul 28, 2016, 10:10:41 AM
to coprHD
Hi,

I am facing this error in apisvc.log:

2016-07-28 19:07:05,099 [main]  WARN  HostSupplierImpl.java (line 73) hostsupplier is empty. May be dbsvc hasn't started yet. waiting for 10000 msec
2016-07-28 19:07:15,100 [main]  WARN  HostSupplierImpl.java (line 107) no dbsvc instance running. Coordinator exception message: The coordinator cannot locate any service with path /sites/59597760-5493-11e6-8828-cbd944af77a6/service/dbsvc/3.5
2016-07-28 19:07:15,101 [main]  WARN  HostSupplierImpl.java (line 73) hostsupplier is empty. May be dbsvc hasn't started yet. waiting for 10000 msec


And receiving this in dbsvc.log:
2016-07-28 19:01:46,906 [main]  INFO  SchemaUtil.java (line 315) try scan and setup db ...
2016-07-28 19:01:46,906 [main]  INFO  SchemaUtil.java (line 335) keyspace exist already
2016-07-28 19:01:46,923 [main]  INFO  SchemaUtil.java (line 554) Current strategyOptions={vdc1=1}
2016-07-28 19:01:46,926 [main]  INFO  DrUtil.java (line 492) Cassandra DC Name is vdc1
2016-07-28 19:01:46,926 [main]  INFO  DrUtil.java (line 492) Cassandra DC Name is vdc1
2016-07-28 19:01:46,926 [main]  INFO  DrUtil.java (line 492) Cassandra DC Name is vdc1
2016-07-28 19:01:46,929 [main]  INFO  SchemaUtil.java (line 348) Current db schema version 3.5
2016-07-28 19:01:46,950 [main]  INFO  SchemaUtil.java (line 351) scan and setup db schema succeed
2016-07-28 19:01:46,950 [main]  INFO  StartupMode.java (line 116) DB schema validated
2016-07-28 19:01:46,956 [main]  INFO  DbServiceStatusChecker.java (line 132) Waiting for all cluster nodes to become state: joined
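To see how often each warning repeats, log lines in this format can be parsed and counted by source location. This is a minimal sketch: the regular expression is inferred only from the lines pasted above, so it is an assumption about the log layout, and summarize is a hypothetical name.

```python
import re
from collections import Counter

# Layout inferred from the pasted lines:
# "<date> <time>,<ms> [thread]  LEVEL  File.java (line N) message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+"
    r"(?P<source>\S+) \(line (?P<line>\d+)\) (?P<msg>.*)$"
)


def summarize(log_text, level="WARN"):
    """Count entries of the given level, keyed by (source file, line number)."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw)
        if match and match.group("level") == level:
            counts[(match.group("source"), int(match.group("line")))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "2016-07-28 19:07:05,099 [main]  WARN  HostSupplierImpl.java (line 73) "
        "hostsupplier is empty. May be dbsvc hasn't started yet. waiting for 10000 msec\n"
        "2016-07-28 19:01:46,956 [main]  INFO  DbServiceStatusChecker.java (line 132) "
        "Waiting for all cluster nodes to become state: joined\n"
    )
    for (source, line), count in summarize(sample).items():
        print(f"{source}:{line} x{count}")
```

Counting only tells you which messages dominate; in the excerpts above, the last dbsvc.log entry (waiting for all cluster nodes to join) is still the most telling line.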

And the page is redirected to https://VIP/maintenance?targetUrl=%2F

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<error>
<code>6503</code>
<description>
Unable to connect to the service. The service is unavailable, try again later.
</description>
<details>
The service is currently unavailable because a connection failed to a core component. Please contact an administrator or try again later.
</details>
<retryable>true</retryable>
</error>



Kindly guide me in this regard.

Stephen McElroy

Jul 28, 2016, 10:16:39 AM
to coprHD
I had all these same exact issues, and at one point I thought I had it going well, but after a couple of weeks I gave up on it and ran it as a single-node application. I'll revisit this eventually, but for now I have no idea why they won't talk to each other.

Salman Riaz

Jul 28, 2016, 10:43:41 AM
to coprHD
Hi Stephen,

I hope someone will reply :)

I have been trying to build this thing multiple times over the last few days. I have also applied different workarounds mentioned in different forums. I have tried to create the KEYSPACE as svcuser, without success. I'm not able to telnet to localhost (cqlsh localhost Connection error: Could not connect to localhost:9160) on vipr2 and vipr3.
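A quick way to tell "nothing is listening" apart from a cqlsh problem is a plain TCP connection test against the same port. The sketch below uses only the Python standard library; port_is_open is a hypothetical helper name, and 9160 is simply the port from the error message above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


if __name__ == "__main__":
    # 9160 is the port reported in the cqlsh connection error.
    state = "open" if port_is_open("localhost", 9160) else "closed"
    print(f"localhost:9160 is {state}")
```

If the port is closed on vipr2 and vipr3, that would be consistent with dbsvc not having finished starting on those nodes, as the dbsvc.log excerpt suggests.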

I'm trying to get it done.

Salman Riaz

Jul 28, 2016, 3:27:59 PM
to coprHD
Stephen,

How's your experience with single-node CoprHD? Is it stable? How many times has it crashed?

Regards,

Salman Riaz.