Failed During GPInitSystem


David

Jan 3, 2017, 2:18:43 PM
to Greenplum Users
Hey Guys,

Clearly some progress going on here. I've just run gpinitsystem, and it failed during segment creation.


This looks like the first set of errors I can find (notice there are some successes right before it).

Any ideas here? From the best I can see, these are the errors (the full log is included as an attachment as well).

Thanks for all your help getting going on this.


/usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg5 \[greenplum-segment-02\]:/data/mirror/gpseg5
20170103:18:43:58:092072 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Start Function RUN_COMMAND_REMOTE
20170103:18:43:58:092347 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Start Function RUN_COMMAND_REMOTE
20170103:18:43:58:092928 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-End Function BACKOUT_COMMAND
20170103:18:43:58:092639 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-End Function BACKOUT_COMMAND
20170103:18:43:58:092072 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Commencing remote /bin/ssh greenplum-segment-01 export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg6 \[greenplum-segment-02\]:/data/mirror/gpseg6
20170103:18:43:58:092347 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Commencing remote /bin/ssh greenplum-segment-01 export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg7 \[greenplum-segment-02\]:/data/mirror/gpseg7
20170103:18:43:58:092928 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO][23]:-Completed to start segment instance database greenplum-segment-03 /data/mirror/gpseg9
20170103:18:43:58:092639 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO][22]:-Completed to start segment instance database greenplum-segment-03 /data/mirror/gpseg8
ssh_exchange_identification: Connection closed by remote host
[FATAL]:-Unexpected EOF on RemotePysync output stream
20170103:18:43:58:092928 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Copying data for mirror on greenplum-segment-03 using remote copy from primary greenplum-segment-02 ...
20170103:18:43:58:092639 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Copying data for mirror on greenplum-segment-03 using remote copy from primary greenplum-segment-02 ...
20170103:18:43:58:092928 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Start Function RUN_COMMAND_REMOTE
20170103:18:43:58:092639 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Start Function RUN_COMMAND_REMOTE
20170103:18:43:58:087289 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL]:- Command export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg11 \[greenplum-segment-03\]:/data/mirror/gpseg11 on greenplum-segment-02 failed with error status 3
20170103:18:43:58:092928 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Commencing remote /bin/ssh greenplum-segment-02 export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg9 \[greenplum-segment-03\]:/data/mirror/gpseg9
20170103:18:43:58:092639 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-Commencing remote /bin/ssh greenplum-segment-02 export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg8 \[greenplum-segment-03\]:/data/mirror/gpseg8
20170103:18:43:58:087289 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-End Function RUN_COMMAND_REMOTE
20170103:18:43:58:087289 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL][3]:-Failed remote copy of segment data directory from greenplum-segment-02 to greenplum-segment-03
Killed by signal 1.
Killed by signal 1.
Killed by signal 1.
Traceback (most recent call last):
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 669, in <module>
Traceback (most recent call last):
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 669, in <module>
Traceback (most recent call last):
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 669, in <module>
    sys.exit(LocalPysync(sys.argv, progressTimestamp=True).run())
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 647, in run
    sys.exit(LocalPysync(sys.argv, progressTimestamp=True).run())
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 647, in run
    sys.exit(LocalPysync(sys.argv, progressTimestamp=True).run())
    code = self.work()
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 611, in work
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 647, in run
    code = self.work()
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 611, in work
    code = self.work()
  File "/usr/local/greenplum-db/./bin/lib/pysync.py", line 611, in work
    self.socket.connect(self.connectAddress)
  File "<string>", line 1, in connect
    self.socket.connect(self.connectAddress)
  File "<string>", line 1, in connect
    self.socket.connect(self.connectAddress)
  File "<string>", line 1, in connect
socket.error: socket.error: [Errno 110] Connection timed out
[Errno 110] Connection timed out
socket.error: [Errno 110] Connection timed out

init.log

Robert Mcphail

Jan 3, 2017, 2:36:36 PM
to David, Greenplum Users
Hi David,

Happy to help.

Could you please show what your /etc/hosts file looks like?

Also are you using a 1 Gb or 10 Gb network?

Thank you,
Bob McPhail

--
You received this message because you are subscribed to the Google Groups "Greenplum Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-users+unsubscribe@greenplum.org.
To post to this group, send email to gpdb-...@greenplum.org.
Visit this group at https://groups.google.com/a/greenplum.org/group/gpdb-users/.
For more options, visit https://groups.google.com/a/greenplum.org/d/optout.



--

Bob McPhail  |  Partner Engineering  |  Pivotal 

yuwei...@gmail.com

Jan 3, 2017, 2:39:04 PM
to Robert Mcphail, David, Greenplum Users
Also, can you share "/etc/security/limits.conf" and "/etc/sysctl.conf"?
Did you run gpcheckperf to test out the network and I/O of your cluster?
Yu-wei Sung
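[Editor's note: for reference, the gpcheckperf runs that exercise the interconnect and the data mounts look roughly like this. A sketch only; the hostfile name is an assumption, and it should list one segment hostname per line.]

```shell
# Parallel-pair network test across the interconnect
# (hostfile_segs is a hypothetical name for a file listing the segment hosts):
gpcheckperf -f hostfile_segs -r N -d /tmp

# Disk I/O and memory-bandwidth (stream) tests against the data mounts:
gpcheckperf -f hostfile_segs -r ds -D -d /data/primary -d /data/mirror
```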

Keaton Adams

Jan 3, 2017, 3:02:06 PM
to Greenplum Users
So this continues to look like a network issue: hostname/routing, access/permissions, network performance, or something of the sort. I see from the init log that the DB is named "AWS Greenplum DW". If the goal is to get a GPDB cluster up and running in AWS, have you tried the Pivotal-supplied AMIs to get something running first, and then worked from there to configure a proper AWS environment for Greenplum?

Search the Public AMIs for GPDB and there should be a full list of options available:

gpdb-cloud-sandbox 4.3.11.0-aws-20161214
gpdb-cloud-sandbox 4.3.10.0-aws-20161024
gpdb-bootstrap 4.3.9.1-aws-20161115a

 
The latest GPDB documentation also has a section specific to Amazon AWS that might be helpful, with details on networking, including VPC configuration and the specific ports that need to be opened:


See if these resources help out. Otherwise we would need specifics on instance type, OS selected, network setup, etc.

Thanks,

Keaton




20170103:18:46:07:092347 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL]:- Command export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg7 \[greenplum-segment-02\]:/data/mirror/gpseg7 on greenplum-segment-01 failed with error status 1
20170103:18:46:07:090898 gpcreateseg.sh:greenplum-master-01:gpadmin-[INFO]:-End Function RUN_COMMAND_REMOTE
20170103:18:46:07:091172 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL]:- Command export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg3 \[greenplum-segment-02\]:/data/mirror/gpseg3 on greenplum-segment-01 failed with error status 1
20170103:18:46:07:092072 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL]:- Command export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg6 \[greenplum-segment-02\]:/data/mirror/gpseg6 on greenplum-segment-01 failed with error status 1
20170103:18:46:07:091475 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL]:- Command export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg4 \[greenplum-segment-02\]:/data/mirror/gpseg4 on greenplum-segment-01 failed with error status 1
20170103:18:46:07:091773 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL]:- Command export GPHOME=/usr/local/greenplum-db/.; . /usr/local/greenplum-db/./greenplum_path.sh; /usr/local/greenplum-db/./bin/lib/pysync.py -x pg_log -x postgresql.conf -x postmaster.pid /data/primary/gpseg5 \[greenplum-segment-02\]:/data/mirror/gpseg5 on greenplum-segment-01 failed with error status 1
20170103:18:46:07:090622 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL][15]:-Failed remote copy of segment data directory from greenplum-segment-03 to greenplum-segment-01
20170103:18:46:07:088589 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL][8]:-Failed remote copy of segment data directory from greenplum-segment-03 to greenplum-segment-01

David

Jan 3, 2017, 4:58:09 PM
to Greenplum Users
Ok to answer everyone's questions:

This is a cluster of d2.8xlarge instances (3 segment nodes for now).
Storage is:
 RAID 10 (22 drives) + 2 hot spares for primary (ephemeral)
 RAID 0 (2 x 11 TB) for secondary (EBS)

Local storage gets 1300 MB/sec I/O; EBS is closer to 1000 MB/sec.

Networking: these are in a placement group and should get 10 Gb, but I did just notice they do not have enhanced networking on... which is likely why I'm seeing slightly worse performance (I think the hosts averaged 900 MB/sec). I will fix that.

Firewalls (iptables and firewalld) are not present.

This is CentOS 7.2.

The hosts file is:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.0.151.138 greenplum-master-01
10.0.151.109 greenplum-master-02
10.0.151.7   greenplum-segment-01
10.0.151.83  greenplum-segment-02
10.0.151.12  greenplum-segment-03
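[Editor's note: as an aside, a hosts file like this can be sanity-checked for conflicting name-to-IP mappings with a few lines of Python. A sketch only; the data is the segment entries shown above.]

```python
# Quick /etc/hosts consistency check: every hostname should map to exactly
# one IP. (Illustrative sketch; entries copied from the hosts file above.)
HOSTS = """\
10.0.151.138 greenplum-master-01
10.0.151.109 greenplum-master-02
10.0.151.7   greenplum-segment-01
10.0.151.83  greenplum-segment-02
10.0.151.12  greenplum-segment-03
"""

def parse_hosts(text):
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if name in mapping and mapping[name] != ip:
                raise ValueError(f"{name} maps to both {mapping[name]} and {ip}")
            mapping[name] = ip
    return mapping

hosts = parse_hosts(HOSTS)
print(hosts["greenplum-segment-03"])  # -> 10.0.151.12
```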

limits.conf looks like this:

* soft nofile 65536
* hard nofile 65536

gpadmin soft nproc 131072
gpadmin hard nproc 131072

#end of file
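[Editor's note: as a side check, whether those limits actually took effect for the gpadmin session can be confirmed at runtime. A minimal sketch; the thresholds mirror the values shown above.]

```python
# Verify the limits.conf values were applied to the current session
# (limits only take effect on a fresh login after editing the file).
import resource

soft_files, hard_files = resource.getrlimit(resource.RLIMIT_NOFILE)
soft_proc, hard_proc = resource.getrlimit(resource.RLIMIT_NPROC)

print(f"nofile: soft={soft_files} hard={hard_files}")
print(f"nproc:  soft={soft_proc} hard={hard_proc}")

# On a correctly configured segment host both of these would hold:
ok = soft_files >= 65536 and soft_proc >= 131072
print("limits look OK" if ok else "limits NOT applied -- re-login after editing")
```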


As far as security groups and ports:
I have 6000-6400, 7000-7400, 8000-8400, and 9000-9400 all open for TCP, as well as whatever else is in the attached image.


So it definitely seems like enabling enhanced networking is next, but I'm not sure if that is it... or what else I should be testing.

Thank you all for the help
securityGroups.PNG

Robert Mcphail

Jan 3, 2017, 6:01:04 PM
to David, Greenplum Users
Hi David,

That all looks pretty good to me.

To verify at least that the GPDB software successfully deployed across your network to all the servers in the cluster, check /usr/local.  You should see the greenplum-db link.

Not to put out a red herring, but it seems most of the errors have something to do with greenplum-segment-03. For example:

20170103:18:43:58:087289 gpcreateseg.sh:greenplum-master-01:gpadmin-[FATAL][3]:-Failed remote copy of segment data directory from greenplum-segment-02 to greenplum-segment-03

On 03, check /usr/local to make sure the GPDB software got there. Check /var/log/messages for info that might help, verify /data/ directory permissions, even free disk space, etc.

See if you can ssh from 02 to 03. Then try that on the other servers just to make sure they all act the same.

Also since this is AWS, I'd definitely look at the info Keaton Adams sent.






David

Jan 3, 2017, 7:00:37 PM
to Greenplum Users, dco...@yieldmo.com
So here's one more thing... do we think the networking could be a red herring?
I only ask because, without getting lost in the logs and just looking at the initial output, the primary segments all succeed and the mirror segments all fail:

20170103:23:56:00:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Building the Master instance database, please wait...
20170103:23:56:06:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Starting the Master in admin mode
20170103:23:56:24:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Commencing parallel build of primary segment instances
20170103:23:56:24:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
........................
20170103:23:56:24:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
.................
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:------------------------------------------------
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Parallel process exit status
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:------------------------------------------------
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Total processes marked as completed           = 24
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Total processes marked as killed              = 0
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Total processes marked as failed              = 0
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:------------------------------------------------
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Commencing parallel build of mirror segment instances
20170103:23:56:42:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...

20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:------------------------------------------------
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Parallel process exit status
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:------------------------------------------------
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Total processes marked as completed           = 0
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Total processes marked as killed              = 0
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[WARN]:-Total processes marked as failed              = 24 <<<<<
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:------------------------------------------------
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[FATAL]:-Errors generated from parallel processes
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Dumped contents of status file to the log file
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Building composite backout file
20170103:23:58:53:gpinitsystem:greenplum-master-01:gpadmin-[FATAL]:-Failures detected, see log file /home/gpadmin/gpAdminLogs/gpinitsystem_20170103.log for more detail Script Exiting!
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[WARN]:-Script has left Greenplum Database in an incomplete state
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[WARN]:-Run command /bin/bash /home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_20170103_235455 to remove these changes
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-Start Function BACKOUT_COMMAND
20170103:23:58:53:098657 gpinitsystem:greenplum-master-01:gpadmin-[INFO]:-End Function BACKOUT_COMMAND

Robert Mcphail

Jan 3, 2017, 7:10:04 PM
to David, Greenplum Users
You could test by commenting out the mirror spec in the gpinitsystem file and then redeploying.

Did you create the /data/mirror directories?   With proper permissions?

Could you show this file?
MACHINE_LIST_FILE=/tmp/host_exkeys_segments



David

Jan 3, 2017, 7:25:24 PM
to Greenplum Users, dco...@yieldmo.com

/data/mirror is the mount for a RAID;
/data/mirror/data is a directory I made.

When I look at /data/mirror/data it definitely does have new directories, so I think it did have the right permissions.

[root@greenplum-segment-02 data]# dir
gpseg0  gpseg1  gpseg2  gpseg3  gpseg4  gpseg5  gpseg6  gpseg7




The file is below:

greenplum-segment-01
greenplum-segment-02
greenplum-segment-03

With no mirror segments, the system started up correctly.

David

Jan 3, 2017, 7:30:57 PM
to Greenplum Users, dco...@yieldmo.com
I suppose the next thing would be to try to run gpaddmirrors?

Robert Mcphail

Jan 3, 2017, 7:37:14 PM
to David, Greenplum Users
That could be worth a try. It would be interesting to see if that works. gpinitsystem creates the mirrors (when specified, as you have done) as well as the primaries during GPDB initialization.

Assuming you also used a mount for the primaries, I'd check to make sure both mounts and directories are configured the same.  For the names, I see this in the init log:
declare -a DATA_DIRECTORY=(/data/primary /data/primary /data/primary /data/primary /data/primary /data/primary /data/primary /data/primary)
declare -a MIRROR_DATA_DIRECTORY=(/data/mirror /data/mirror /data/mirror /data/mirror /data/mirror /data/mirror /data/mirror /data/mirror)

Maybe check the parameters for those mounts in fstab too.
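[Editor's note: those declare -a lines from the init log can be parsed to confirm every segment points at the intended mount. A sketch using the two log lines above.]

```python
# Parse the bash array declarations from the gpinitsystem log and report
# how many segments per host use each mount.
import re

LOG_LINES = [
    "declare -a DATA_DIRECTORY=(/data/primary /data/primary /data/primary /data/primary /data/primary /data/primary /data/primary /data/primary)",
    "declare -a MIRROR_DATA_DIRECTORY=(/data/mirror /data/mirror /data/mirror /data/mirror /data/mirror /data/mirror /data/mirror /data/mirror)",
]

def parse_declare(line):
    # Matches: declare -a NAME=(path path ...)
    name, body = re.match(r"declare -a (\w+)=\((.*)\)", line).groups()
    return name, body.split()

for line in LOG_LINES:
    name, dirs = parse_declare(line)
    # 8 entries = 8 primary (or mirror) segments per segment server
    print(f"{name}: {len(dirs)} segments, mounts used: {sorted(set(dirs))}")
```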




David Cohen

Jan 3, 2017, 7:47:43 PM
to Robert Mcphail, Greenplum Users
Thanks for the suggestion to look at /etc/fstab. It looks like I copy-pasted wrong the last time through, and only on the mirror.

UUID="f42b64aa-6347-48a1-a307-25a4363dfa64" /data/primary xfs nodev,noatime,inode64,allocsize=16m 0 0
UUID="4ba9e40d-49cf-4cee-aab1-d724710c0777" /data/mirror xfs nodev,noatime,inode64,allocsize=16m 0 0xfs nodev,noatime,inode64,allocsize=16m 0 0
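[Editor's note: a valid fstab entry has exactly six whitespace-separated fields (device, mount point, fstype, options, dump, pass), so a quick field-count check would have flagged the pasted-twice line. A sketch using the two entries above.]

```python
# Flag fstab entries that do not have exactly 6 fields:
# device  mountpoint  fstype  options  dump  pass
# (The two lines below are the ones from the broken fstab above.)
FSTAB = [
    'UUID="f42b64aa-6347-48a1-a307-25a4363dfa64" /data/primary xfs nodev,noatime,inode64,allocsize=16m 0 0',
    'UUID="4ba9e40d-49cf-4cee-aab1-d724710c0777" /data/mirror xfs nodev,noatime,inode64,allocsize=16m 0 0xfs nodev,noatime,inode64,allocsize=16m 0 0',
]

def bad_entries(lines):
    return [l for l in lines
            if l.strip() and not l.lstrip().startswith("#")
            and len(l.split()) != 6]

for entry in bad_entries(FSTAB):
    print("malformed fstab line:", entry)
```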


Guess I will just delete the system and redo it with the corrected mount.
--
David Cohen
VP Business Information Architecture

218 West 18th St, 2nd FL
New York, NY 10011

David

Jan 4, 2017, 1:00:17 PM
to Greenplum Users, rmcp...@pivotal.io
So, this was not the actual problem.

I installed it fine with primary only.
When I ran gpaddmirrors, I got a different error: /bin/bash couldn't find perl. So I installed perl, and then gpaddmirrors completed fine.
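[Editor's note: for anyone hitting the same thing, a minimal pre-flight check for the interpreters the utility scripts shell out to might look like this. A sketch; adjust the binary list to your release.]

```shell
# Verify the interpreters the Greenplum utility scripts call are on PATH.
# Run this on every host before gpinitsystem/gpaddmirrors.
for bin in bash perl python; do
    if command -v "$bin" >/dev/null 2>&1; then
        echo "$bin: $(command -v "$bin")"
    else
        echo "$bin: MISSING" >&2
    fi
done
```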

Robert Mcphail

Jan 4, 2017, 1:28:27 PM
to David, Greenplum Users
Hi David,

Glad to hear you got everything working!

