Warnings inside the output of gpcheckperf


Jamie Cox

unread,
Apr 19, 2017, 11:56:16 AM4/19/17
to Greenplum Users
I am running gpcheckperf -f hostfile_gpcheckperf -r M -d /tmp > netcheck.out
and I am getting these lines in the output along with the results...

[Warning] netperf failed on gpseg10 -> gpseg4
[Warning] netperf failed on gpseg10 -> gpseg20
[Warning] netperf failed on gpseg11 -> gpseg2

I am wondering if these messages are something I need to fix, or are they ok? 

Thank You so much
Jamie

Keaton Adams

unread,
Apr 19, 2017, 11:58:34 AM4/19/17
to Greenplum Users
"To run a full-matrix bandwidth test, you can specify -r M which will cause every host to send and receive data from every other host specified. This test is best used to validate if the switch fabric can tolerate a full-matrix workload."

What is the network configuration on your GP cluster?  Are these physical machines / network switches or virtual?

Jamie Cox

unread,
Apr 19, 2017, 12:03:24 PM4/19/17
to Greenplum Users
Hi Keaton,
These are physical machines, each with 10Gb Ethernet attached to a physical switch.
I ran this test after switching to 10Gb to make sure everything was running well.

Thank You 

Luis Macedo

unread,
Apr 19, 2017, 12:07:23 PM4/19/17
to Jamie Cox, Greenplum Users
Looks like you have some issue with your network.

Can you ping gpseg4 from gpseg10 and so on?
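
A quick loop from the master covers all the pairs, something like this (hostfile name assumed from your earlier command, passwordless ssh assumed to be in place):

for src in $(cat hostfile_gpcheckperf); do
  for dst in $(cat hostfile_gpcheckperf); do
    [ "$src" = "$dst" ] && continue
    # two quiet pings per pair; print only the pairs that fail
    ssh "$src" "ping -c 2 -q $dst" >/dev/null 2>&1 || echo "FAILED: $src -> $dst"
  done
done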

Rgds,


Luis Macedo | Sr Platform Architect | Pivotal Inc 

Mobile: +55 11 97616-6438

Take care of the customers and the rest takes care of itself


Jamie Cox

unread,
Apr 19, 2017, 12:14:10 PM4/19/17
to Greenplum Users, cpu...@gmail.com
Hi Luis,
Yes, I can ping all nodes from each other. I see no problems with the network so far. 
Thank You

Mayur Mahadeshwar

unread,
Apr 19, 2017, 12:47:54 PM4/19/17
to Jamie Cox, Greenplum Users
Hi Jamie ,
Can you run it with the -v flag, and look for the [Info] lines just before the [Warning] 'netperf failed' messages? They will show you the actual command that was run.
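
For example, something along these lines (same hostfile assumed):

gpcheckperf -f hostfile_gpcheckperf -r M -d /tmp -v > netcheck_verbose.out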

Regards,
Mayur Mahadeshwar
Data Engineer 



Keaton Adams

unread,
Apr 19, 2017, 12:50:44 PM4/19/17
to Greenplum Users
Also, what does a regular gpcheckperf on the network show?

gpcheckperf -f hostfile_gpcheckperf -r N -d /tmp > netcheck.out





Jamie Cox

unread,
Apr 19, 2017, 12:58:44 PM4/19/17
to Greenplum Users
Hi Keaton,
I see some of these in there as well.
[Warning] connection between gpseg1 and gpseg2 is no good
[Warning] connection between gpseg5 and gpseg6 is no good
[Warning] connection between gpseg7 and gpseg8 is no good

Jamie Cox

unread,
Apr 19, 2017, 1:00:18 PM4/19/17
to Greenplum Users, cpu...@gmail.com
Hi Mayur,
I see these before the warnings...

[Info] gpseg10 -> gpseg7 : ['0', '0', '32768', '14.80', '49.34']
[Info] Connected to server
0     0        32768       14.74     52.50

[Info] gpseg10 -> gpseg4 : ['0', '0', '32768', '14.74', '52.50']
[Warning] netperf failed on gpseg10 -> gpseg5
[Warning] netperf failed on gpseg10 -> gpseg24
[Info] Connected to server
0     0        32768       14.84     35.20

[Info] gpseg10 -> gpseg21 : ['0', '0', '32768', '14.84', '35.20']
[Warning] netperf failed on gpseg10 -> gpseg20


Luis Macedo

unread,
Apr 19, 2017, 1:16:44 PM4/19/17
to Jamie Cox, Greenplum Users
From a distance it looks like a firewall issue.

Did you exchange keys between all nodes?
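
If you need to redo it, the usual step is something like (hostfile name assumed):

gpssh-exkeys -f hostfile_gpcheckperf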

--- Sent from my Nexus 5x


Jamie Cox

unread,
Apr 19, 2017, 1:22:09 PM4/19/17
to Greenplum Users, cpu...@gmail.com
All firewalls are currently off and I re-ran the key exchange to make sure that was not the issue. 
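
For anyone double-checking, something like this from the master shows the firewall state on every host at once (gpssh and the hostfile name are assumptions):

gpssh -f hostfile_gpcheckperf 'systemctl is-active firewalld'
gpssh -f hostfile_gpcheckperf 'iptables -L -n | wc -l'    # a near-empty rule count suggests no filtering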

Keaton Adams

unread,
Apr 19, 2017, 2:03:08 PM4/19/17
to Greenplum Users
There really should be no errors, especially on a "regular" gpcheckperf network test.  The fact that there are links between some of the servers that are reported as "no good" indicates that most likely, something in the network layer itself is having issues.  From the cluster configuration guide, this is what the basic requirements for the Interconnect are:

"Two layer-2/layer-3 managed switches per rack. All ports must have full bandwidth, be able to operate at line rate, and be non-blocking."  10 GiB bandwidth.


Also see if these resources are helpful at all:


You might also want to check with your network admin to see if there are errors at the switch level to help troubleshoot the problem.  If nothing else, if your company has a support contract with Pivotal, open a case and a support specialist will be able to help troubleshoot further.

Thanks.

Jamie Cox

unread,
Apr 19, 2017, 2:10:06 PM4/19/17
to Greenplum Users
Hi Keaton,
I wouldn't say that they are errors.
Inside that netcheck.out I see results...
gpseg1 -> gpseg2 = 537.830000
gpseg2 -> gpseg1 = 545.800000
and I also see the Warning...
[Warning] connection between gpseg1 and gpseg2 is no good
The results for these systems are lower than the rest, but they are working. Do you think this is why they are warnings instead of errors?
I will look through the documents provided.
Thank You

Luis Macedo

unread,
Apr 19, 2017, 3:52:06 PM4/19/17
to Jamie Cox, Greenplum Users
Oh yeah... that's it: the warning means that the performance is below the recommended level.

Can you post the entire output? Try copying it from the screen, please.


Thanks,


Luis Macedo | Sr Platform Architect | Pivotal Inc 

Mobile: +55 11 97616-6438

Take care of the customers and the rest takes care of itself


David Gordon

unread,
Apr 19, 2017, 4:01:18 PM4/19/17
to Luis Macedo, Jamie Cox, Greenplum Users

Jamie,

Is the network between nodes dedicated to the Greenplum cluster?

David
--
David Gordon | Senior Account Data Engineer | Pivotal

Mobile: 917-699-2181 | dgordon@pivotal.io


Jamie Cox

unread,
Apr 19, 2017, 4:43:57 PM4/19/17
to Greenplum Users, cpu...@gmail.com
Here is the output from one of the commands...
/usr/local/greenplum-db/./bin/gpcheckperf -f hostfile_gpcheckperf -r N -d /tmp

-------------------
--  NETPERF TEST
-------------------

====================
==  RESULT
====================
Netperf bisection bandwidth test
gpseg1 -> gpseg2 = 537.830000
gpseg3 -> gpseg4 = 983.700000
gpseg5 -> gpseg6 = 684.670000
gpseg7 -> gpseg8 = 703.440000
gpseg9 -> gpseg10 = 748.910000
gpseg11 -> gpseg12 = 707.880000
gpseg13 -> gpseg14 = 480.850000
gpseg15 -> gpseg16 = 746.280000
gpseg17 -> gpseg18 = 615.130000
gpseg19 -> gpseg20 = 936.580000
gpseg21 -> gpseg22 = 1013.510000
gpseg23 -> gpseg24 = 501.940000
gpseg2 -> gpseg1 = 545.800000
gpseg4 -> gpseg3 = 407.490000
gpseg6 -> gpseg5 = 735.750000
gpseg8 -> gpseg7 = 430.670000
gpseg10 -> gpseg9 = 997.230000
gpseg12 -> gpseg11 = 630.550000
gpseg14 -> gpseg13 = 449.080000
gpseg16 -> gpseg15 = 405.970000
gpseg18 -> gpseg17 = 579.260000
gpseg20 -> gpseg19 = 1057.610000
gpseg22 -> gpseg21 = 795.780000
gpseg24 -> gpseg23 = 539.350000

Summary:
sum = 16235.26 MB/sec
min = 405.97 MB/sec
max = 1057.61 MB/sec
avg = 676.47 MB/sec
median = 684.67 MB/sec

[Warning] connection between gpseg1 and gpseg2 is no good
[Warning] connection between gpseg5 and gpseg6 is no good
[Warning] connection between gpseg7 and gpseg8 is no good
[Warning] connection between gpseg9 and gpseg10 is no good
[Warning] connection between gpseg11 and gpseg12 is no good
[Warning] connection between gpseg13 and gpseg14 is no good
[Warning] connection between gpseg15 and gpseg16 is no good
[Warning] connection between gpseg17 and gpseg18 is no good
[Warning] connection between gpseg19 and gpseg20 is no good
[Warning] connection between gpseg23 and gpseg24 is no good
[Warning] connection between gpseg2 and gpseg1 is no good
[Warning] connection between gpseg4 and gpseg3 is no good
[Warning] connection between gpseg6 and gpseg5 is no good
[Warning] connection between gpseg8 and gpseg7 is no good
[Warning] connection between gpseg12 and gpseg11 is no good
[Warning] connection between gpseg14 and gpseg13 is no good
[Warning] connection between gpseg16 and gpseg15 is no good
[Warning] connection between gpseg18 and gpseg17 is no good
[Warning] connection between gpseg22 and gpseg21 is no good
[Warning] connection between gpseg24 and gpseg23 is no good

Jamie Cox

unread,
Apr 19, 2017, 4:45:22 PM4/19/17
to Greenplum Users, lma...@pivotal.io, cpu...@gmail.com
Hi David,
Yes, this is a dedicated 10Gb interconnect used only for the Greenplum cluster.

Keaton Adams

unread,
Apr 19, 2017, 4:55:58 PM4/19/17
to Greenplum Users, cpu...@gmail.com
This is what it should look like, if it is a full 10 Gb interconnect with all ports having full bandwidth, operating at line rate, non-blocking. So something is not right in the output of your gpcheckperf, if the expectation is roughly a full 1 GB/sec between hosts on the dedicated interconnect network.


-------------------
--  NETPERF TEST
-------------------
 
====================
==  RESULT
====================
Netperf bisection bandwidth test
mdw -> smdw = 1122.390000
sdw1 -> sdw2 = 1122.440000
sdw3 -> sdw4 = 1122.400000
sdw5 -> sdw6 = 1122.400000
sdw7 -> sdw8 = 1122.400000
sdw9 -> sdw10 = 1122.400000
sdw11 -> sdw12 = 1122.390000
smdw -> mdw = 1122.390000
sdw2 -> sdw1 = 1122.390000
sdw4 -> sdw3 = 1122.380000
sdw6 -> sdw5 = 1122.400000
sdw8 -> sdw7 = 1122.460000
sdw10 -> sdw9 = 1122.390000
sdw12 -> sdw11 = 1122.380000
 
Summary:
sum = 15713.61 MB/sec
min = 1122.38 MB/sec
max = 1122.46 MB/sec
avg = 1122.40 MB/sec
median = 1122.40 MB/sec

Luis Macedo

unread,
Apr 19, 2017, 5:10:03 PM4/19/17
to Keaton Adams, cpu...@gmail.com, Greenplum Users
Jamie,

How about the memory and disk tests? Did they both run fine?

The results now make me think that, for some reason, your switch is not able to handle the load...

Maybe run the test with, say, 4 servers and see if you get better network performance then.
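
For instance, with a trimmed host file (file names here are just placeholders):

head -4 hostfile_gpcheckperf > hostfile_4hosts
gpcheckperf -f hostfile_4hosts -r M -d /tmp > netcheck_4hosts.out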


--- Sent from my Nexus 5x

yuwei...@gmail.com

unread,
Apr 19, 2017, 5:28:54 PM4/19/17
to Keaton Adams, Luis Macedo, Greenplum Users, cpu...@gmail.com
How about the sysctl settings?
Can you show those settings?
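
For example, pulling just the network-related ones from every host at once (gpssh and the hostfile name assumed):

gpssh -f hostfile_gpcheckperf 'sysctl net.core.rmem_max net.core.wmem_max net.core.netdev_max_backlog net.ipv4.tcp_max_syn_backlog'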
--
Yu-wei Sung

Scott Kahler

unread,
Apr 19, 2017, 5:31:41 PM4/19/17
to Greenplum Users, cpu...@gmail.com
Do you know what model the dedicated switch is?




--

Scott Kahler | Pivotal, Greenplum Product Management  | ska...@pivotal.io | 816.237.0610

Jamie Cox

unread,
Apr 19, 2017, 5:33:48 PM4/19/17
to Greenplum Users, kad...@pivotal.io, cpu...@gmail.com
Yes those ran fine. 

Jamie Cox

unread,
Apr 19, 2017, 5:34:42 PM4/19/17
to Greenplum Users, cpu...@gmail.com
This is running on an Arista DCS-7050 switch.

Jamie Cox

unread,
Apr 19, 2017, 5:35:40 PM4/19/17
to Greenplum Users, kad...@pivotal.io, lma...@pivotal.io, cpu...@gmail.com
Hi Yu-wei,
Here is my sysctl.conf file:
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.all.disable_ipv6 = 1

#Set for use with GreenPlum
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.default.accept_redirects=0
net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.default.secure_redirects=0
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
vm.swappiness = 0

..: Mark Sloan :..

unread,
Apr 19, 2017, 5:52:11 PM4/19/17
to Jamie Cox, Greenplum Users, Keaton Adams, lma...@pivotal.io
Check BIOS power-saving settings, etc., although if it were that I would also expect to see variance in the stream (memory) benchmark.


AFAIK the warning comes up when there is a large enough variance between the links.


as an example:

$ gpcheckperf -f ~/hosts/hosts.all-segments-single -rN -d /tmp -D
/usr/local/greenplum-db/./bin/gpcheckperf -f /home/gpadmin/hosts/hosts.all-segments-single -rN -d /tmp -D


-------------------
--  NETPERF TEST
-------------------

====================
==  RESULT
====================
Netperf bisection bandwidth test
sdw1 -> sdw2 = 1122.460000
sdw3 -> sdw4 = 1122.340000
sdw5 -> sdw6 = 1104.390000
sdw7 -> sdw8 = 671.500000

sdw2 -> sdw1 = 1122.390000
sdw4 -> sdw3 = 1122.510000
sdw6 -> sdw5 = 1122.620000
sdw8 -> sdw7 = 566.110000

Summary:
sum = 7954.32 MB/sec
min = 566.11 MB/sec
max = 1122.62 MB/sec
avg = 994.29 MB/sec
median = 1122.39 MB/sec

[Warning] connection between sdw7 and sdw8 is no good
[Warning] connection between sdw8 and sdw7 is no good



yuwei...@gmail.com

unread,
Apr 19, 2017, 6:00:47 PM4/19/17
to ..: Mark Sloan :.., Jamie Cox, Greenplum Users, Keaton Adams, lma...@pivotal.io
Check if you have backlog or packet drops on the NICs between sdw7 and sdw8. Are they on the same host?
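
Something like this on each host will show the counters (interface name is just an example):

ip -s link show eth0               # RX/TX "dropped" counters
ethtool -S eth0 | grep -i drop     # per-driver drop/discard counters, if the driver exposes them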
--
Yu-wei Sung

Robert Mcphail

unread,
Apr 19, 2017, 6:11:14 PM4/19/17
to ..: Mark Sloan :.., Jamie Cox, Greenplum Users, Keaton Adams, lma...@pivotal.io
Hi Jamie,

If you have access to the hardware (or if someone can do this for you) I'd reseat the cables both on the server and switch side just to be certain.  You could also swap a cable from the slower server with a faster one to test the cable.

If you are using twinax cables rather than fiber, twinax can be touchy regarding outside interference, cable length, etc., even NIC brand.  I've seen more than one occurrence of this causing flaky issues.

Also might be good to verify you have updated drivers on the network cards.  At least make sure the driver versions are consistent on all servers in the cluster.
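
A quick consistency check across the cluster could look like this (interface name and hostfile are assumptions):

gpssh -f hostfile_gpcheckperf 'ethtool -i eth0 | egrep "driver|version|firmware"'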

The segment servers in the cluster should all have the exact same hardware configuration, BIOS, firmware and driver versions.

By the way, whenever the network test result is less than 1000 MB/sec (the 10 Gb threshold) it will always say the connection is no good.

--

Bob McPhail  |  Partner Engineering  |  Pivotal 

Jamie Cox

unread,
Apr 19, 2017, 6:14:34 PM4/19/17
to Greenplum Users, mark.a...@gmail.com, cpu...@gmail.com, kad...@pivotal.io, lma...@pivotal.io
These two hosts are on different switches in different racks. I don't see any dropped packets from the host to the switch, but I do see some from the switch to the core. I will look into these dropped packets.
It is possible that the uplink is getting saturated.
Thank You so much. 

Jamie Cox

unread,
Apr 19, 2017, 6:21:51 PM4/19/17
to Greenplum Users, mark.a...@gmail.com, cpu...@gmail.com, kad...@pivotal.io, lma...@pivotal.io
Hi Robert,
I am using 10Gb over copper RJ-45 to the switch and 40Gb twinax bonded to the core. This is the way I read it in the documentation from Greenplum. The only difference is I don't have 2 TOR switches, I used 1.
I will update my drivers and make sure all are the same. 

Thank You

Luis Macedo

unread,
Apr 19, 2017, 6:35:51 PM4/19/17
to Jamie Cox, Greenplum Users, mark.a...@gmail.com, kad...@pivotal.io
Usually GPDB servers have dedicated switches. There is too much chatter between the nodes, and that is usually a problem for the core network.

If your network between servers is running through the core, you should change that, since you have physical servers.


--- Sent from my Nexus 5x

Jamie Cox

unread,
Apr 19, 2017, 6:42:04 PM4/19/17
to Greenplum Users, cpu...@gmail.com, mark.a...@gmail.com, kad...@pivotal.io
Hi Luis,
We plan on having multiple racks of Greenplum servers that are connected together by this core. These are the only servers using this core, and it will stay this way. I am wondering if I need to add more uplinks to the core since I have dropped packets there.
Thank You

Luis Macedo

unread,
Apr 19, 2017, 8:02:01 PM4/19/17
to Jamie Cox, Greenplum Users, mark.a...@gmail.com, kad...@pivotal.io
Nice! 

Then you should check your core throughput as you mentioned. It looks like you are reaching capacity; that is why I suggested trying the tests with fewer servers.

--- Sent from my Google Pixel


Jamie Cox

unread,
Apr 20, 2017, 10:56:46 AM4/20/17
to Greenplum Users, cpu...@gmail.com, mark.a...@gmail.com, kad...@pivotal.io
I have changed my file to include 8 nodes total and that test comes out perfect. It looks like this is an oversubscription problem.
Currently I have 1 x 10Gb link from each server to the TOR switch, and 1 x 40Gb link from the TOR to each of the core switches (80Gb total). Is this the type of design expected for this topology? I have read through the Pivotal literature and this is what I believe they suggested.
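
As a rough sanity check on the uplink math (the 12-hosts-per-rack figure is my assumption, not something stated above):

12 hosts x 10Gb each into the TOR         = 120Gb of edge bandwidth per rack
2 x 40Gb uplinks from the TOR to the core =  80Gb toward the core
worst case, every host sending off-rack   = 120Gb offered vs 80Gb available, about 1.5:1 oversubscribed

The full-matrix test pushes a lot of traffic off-rack at the same time, so it is the test most likely to hit that limit, which would line up with the drops seen on the uplinks.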

Thank You so much for all of your help in this
Jamie

Luis Macedo

unread,
Apr 24, 2017, 9:50:20 AM4/24/17
to Jamie Cox, Greenplum Users, mark.a...@gmail.com, Keaton Adams
Hey Jamie,

Check this manual, pg 4-5. 


It has a picture of what the recommendation is for 1 rack and 2+ racks.

It sounds like you have the correct setup... We might be misreading the test. Maybe the full-matrix test runs all connections at the same time, and what you are seeing is the real limitation that anyone would see with a cluster larger than one rack.

That is why Infiniband exists right? :) 


Rgds,


Luis Macedo | Sr Platform Architect | Pivotal Inc 

Mobile: +55 11 97616-6438

Take care of the customers and the rest takes care of itself


Jamie Cox

unread,
Apr 24, 2017, 2:06:33 PM4/24/17
to Greenplum Users, cpu...@gmail.com, mark.a...@gmail.com, kad...@pivotal.io
Thank You so much Luis. I will play around a little with my uplink to see if I can get the results to turn out as desired. 