Warnings inside the output of gpcheckperf


Jamie Cox

unread,
Apr 19, 2017, 11:56:16 AM4/19/17
to Greenplum Users
I am running gpcheckperf -f hostfile_gpcheckperf -r M -d /tmp > netcheck.out
and I am getting these lines in the output along with the results...

[Warning] netperf failed on gpseg10 -> gpseg4
[Warning] netperf failed on gpseg10 -> gpseg20
[Warning] netperf failed on gpseg11 -> gpseg2

I am wondering if these messages are something I need to fix, or are they ok? 

Thank You so much
Jamie

Keaton Adams

unread,
Apr 19, 2017, 11:58:34 AM4/19/17
to Greenplum Users
"To run a full-matrix bandwidth test, you can specify -r M which will cause every host to send and receive data from every other host specified. This test is best used to validate if the switch fabric can tolerate a full-matrix workload."

What is the network configuration on your GP cluster?  Are these physical machines / network switches or virtual?

Jamie Cox

unread,
Apr 19, 2017, 12:03:24 PM4/19/17
to Greenplum Users
Hi Keaton,
These are physical machines, each with 10Gb Ethernet attached to a physical switch.
I ran this test after switching to 10Gb to make sure everything was running well.

Thank You 

Luis Macedo

unread,
Apr 19, 2017, 12:07:23 PM4/19/17
to Jamie Cox, Greenplum Users
Looks like you have some issue with your network.

Can you ping gpseg4 from gpseg10 and so on?
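
A quick loop from the master covers all the pairs, something like this (hostfile name assumed from your earlier command, passwordless ssh assumed to be in place):

for src in $(cat hostfile_gpcheckperf); do
  for dst in $(cat hostfile_gpcheckperf); do
    [ "$src" = "$dst" ] && continue
    # two quiet pings per pair; print only the pairs that fail
    ssh "$src" "ping -c 2 -q $dst" >/dev/null 2>&1 || echo "FAILED: $src -> $dst"
  done
done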

Rgds,


Luis Macedo | Sr Platform Architect | Pivotal Inc 

Mobile: +55 11 97616-6438

Take care of the customers and the rest takes care of itself


Jamie Cox

unread,
Apr 19, 2017, 12:14:10 PM4/19/17
to Greenplum Users, cpu...@gmail.com
Hi Luis,
Yes, I can ping all nodes from each other. I see no problems with the network so far. 
Thank You

Mayur Mahadeshwar

unread,
Apr 19, 2017, 12:47:54 PM4/19/17
to Jamie Cox, Greenplum Users
Hi Jamie ,
Can you run it with the -v flag, and look for the [Info] lines just before the [Warning] 'netperf failed' messages? They will show you the actual command that was run.
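
For example, something along these lines (same hostfile assumed):

gpcheckperf -f hostfile_gpcheckperf -r M -d /tmp -v > netcheck_verbose.out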

Regards,
Mayur Mahadeshwar
Data Engineer 



Keaton Adams

unread,
Apr 19, 2017, 12:50:44 PM4/19/17
to Greenplum Users
Also, what does a regular gpcheckperf on the network show?

gpcheckperf -f hostfile_gpcheckperf -r N -d /tmp > netcheck.out





Jamie Cox

unread,
Apr 19, 2017, 12:58:44 PM4/19/17
to Greenplum Users
Hi Keaton,
I see some of these in there as well.
[Warning] connection between gpseg1 and gpseg2 is no good
[Warning] connection between gpseg5 and gpseg6 is no good
[Warning] connection between gpseg7 and gpseg8 is no good

Jamie Cox

unread,
Apr 19, 2017, 1:00:18 PM4/19/17
to Greenplum Users, cpu...@gmail.com
Hi Mayur,
I see these before the warnings...

[Info] gpseg10 -> gpseg7 : ['0', '0', '32768', '14.80', '49.34']
[Info] Connected to server
0     0        32768       14.74     52.50

[Info] gpseg10 -> gpseg4 : ['0', '0', '32768', '14.74', '52.50']
[Warning] netperf failed on gpseg10 -> gpseg5
[Warning] netperf failed on gpseg10 -> gpseg24
[Info] Connected to server
0     0        32768       14.84     35.20

[Info] gpseg10 -> gpseg21 : ['0', '0', '32768', '14.84', '35.20']
[Warning] netperf failed on gpseg10 -> gpseg20


Luis Macedo

unread,
Apr 19, 2017, 1:16:44 PM4/19/17
to Jamie Cox, Greenplum Users
From a distance it looks like a firewall issue.

Did you exchange keys between all nodes?
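
If you need to redo it, the usual step is something like (hostfile name assumed):

gpssh-exkeys -f hostfile_gpcheckperf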

--- Sent from my Nexus 5x


Jamie Cox

unread,
Apr 19, 2017, 1:22:09 PM4/19/17
to Greenplum Users, cpu...@gmail.com
All firewalls are currently off and I re-ran the key exchange to make sure that was not the issue. 
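
For anyone double-checking, something like this from the master shows the firewall state on every host at once (gpssh and the hostfile name are assumptions):

gpssh -f hostfile_gpcheckperf 'systemctl is-active firewalld'
gpssh -f hostfile_gpcheckperf 'iptables -L -n | wc -l'    # a near-empty rule count suggests no filtering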

Keaton Adams

unread,
Apr 19, 2017, 2:03:08 PM4/19/17
to Greenplum Users
There really should be no errors, especially on a "regular" gpcheckperf network test.  The fact that there are links between some of the servers that are reported as "no good" indicates that most likely, something in the network layer itself is having issues.  From the cluster configuration guide, this is what the basic requirements for the Interconnect are:

"Two layer-2/layer-3 managed switches per rack. All ports must have full bandwidth, be able to operate at line rate, and be non-blocking."  10 GiB bandwidth.


Also see if these resources are helpful at all:


You might also want to check with your network admin to see if there are errors at the switch level to help troubleshoot the problem.  If nothing else, if your company has a support contract with Pivotal, open a case and a support specialist will be able to help troubleshoot further.

Thanks.

Jamie Cox

unread,
Apr 19, 2017, 2:10:06 PM4/19/17
to Greenplum Users
Hi Keaton,
I wouldn't say that they are errors.
Inside that netcheck.out I see results...
gpseg1 -> gpseg2 = 537.830000
gpseg2 -> gpseg1 = 545.800000
and I also see the Warning...
[Warning] connection between gpseg1 and gpseg2 is no good
The results for these systems are lower than the rest, but they are working. Do you think this is why they are warnings instead of errors?
I will look through the documents provided.
Thank You

Luis Macedo

unread,
Apr 19, 2017, 3:52:06 PM4/19/17
to Jamie Cox, Greenplum Users
Oh yeah... that's it: the warning means that the performance is below the recommended level.

Can you post the entire output? Try copying it from the screen, please.


Thanks,


Luis Macedo | Sr Platform Architect | Pivotal Inc 

Mobile: +55 11 97616-6438

Take care of the customers and the rest takes care of itself


David Gordon

unread,
Apr 19, 2017, 4:01:18 PM4/19/17
to Luis Macedo, Jamie Cox, Greenplum Users

Jamie,

Is the network between nodes dedicated to the Greenplum cluster?

David
--
David Gordon | Senior Account Data Engineer | Pivotal

Mobile: 917-699-2181 | dgordon@pivotal.io


Jamie Cox

unread,
Apr 19, 2017, 4:43:57 PM4/19/17
to Greenplum Users, cpu...@gmail.com
Here is the output from one of the commands...
/usr/local/greenplum-db/./bin/gpcheckperf -f hostfile_gpcheckperf -r N -d /tmp

-------------------
--  NETPERF TEST
-------------------

====================
==  RESULT
====================
Netperf bisection bandwidth test
gpseg1 -> gpseg2 = 537.830000
gpseg3 -> gpseg4 = 983.700000
gpseg5 -> gpseg6 = 684.670000
gpseg7 -> gpseg8 = 703.440000
gpseg9 -> gpseg10 = 748.910000
gpseg11 -> gpseg12 = 707.880000
gpseg13 -> gpseg14 = 480.850000
gpseg15 -> gpseg16 = 746.280000
gpseg17 -> gpseg18 = 615.130000
gpseg19 -> gpseg20 = 936.580000
gpseg21 -> gpseg22 = 1013.510000
gpseg23 -> gpseg24 = 501.940000
gpseg2 -> gpseg1 = 545.800000
gpseg4 -> gpseg3 = 407.490000
gpseg6 -> gpseg5 = 735.750000
gpseg8 -> gpseg7 = 430.670000
gpseg10 -> gpseg9 = 997.230000
gpseg12 -> gpseg11 = 630.550000
gpseg14 -> gpseg13 = 449.080000
gpseg16 -> gpseg15 = 405.970000
gpseg18 -> gpseg17 = 579.260000
gpseg20 -> gpseg19 = 1057.610000
gpseg22 -> gpseg21 = 795.780000
gpseg24 -> gpseg23 = 539.350000

Summary:
sum = 16235.26 MB/sec
min = 405.97 MB/sec
max = 1057.61 MB/sec
avg = 676.47 MB/sec
median = 684.67 MB/sec

[Warning] connection between gpseg1 and gpseg2 is no good
[Warning] connection between gpseg5 and gpseg6 is no good
[Warning] connection between gpseg7 and gpseg8 is no good
[Warning] connection between gpseg9 and gpseg10 is no good
[Warning] connection between gpseg11 and gpseg12 is no good
[Warning] connection between gpseg13 and gpseg14 is no good
[Warning] connection between gpseg15 and gpseg16 is no good
[Warning] connection between gpseg17 and gpseg18 is no good
[Warning] connection between gpseg19 and gpseg20 is no good
[Warning] connection between gpseg23 and gpseg24 is no good
[Warning] connection between gpseg2 and gpseg1 is no good
[Warning] connection between gpseg4 and gpseg3 is no good
[Warning] connection between gpseg6 and gpseg5 is no good
[Warning] connection between gpseg8 and gpseg7 is no good
[Warning] connection between gpseg12 and gpseg11 is no good
[Warning] connection between gpseg14 and gpseg13 is no good
[Warning] connection between gpseg16 and gpseg15 is no good
[Warning] connection between gpseg18 and gpseg17 is no good
[Warning] connection between gpseg22 and gpseg21 is no good
[Warning] connection between gpseg24 and gpseg23 is no good

Jamie Cox

unread,
Apr 19, 2017, 4:45:22 PM4/19/17
to Greenplum Users, lma...@pivotal.io, cpu...@gmail.com
Hi David,
Yes, this is a dedicated 10Gb interconnect used only for the Greenplum cluster.

Keaton Adams

unread,
Apr 19, 2017, 4:55:58 PM4/19/17
to Greenplum Users, cpu...@gmail.com
This is what it should look like, if it is a full 10 Gb interconnect with all ports having full bandwidth, operating at line rate, non-blocking. So something is not right in the output of your gpcheckperf, if the expectation is roughly a full 1 GB/sec between hosts on the dedicated interconnect network.


-------------------
--  NETPERF TEST
-------------------
 
====================
==  RESULT
====================
Netperf bisection bandwidth test
mdw -> smdw = 1122.390000
sdw1 -> sdw2 = 1122.440000
sdw3 -> sdw4 = 1122.400000
sdw5 -> sdw6 = 1122.400000
sdw7 -> sdw8 = 1122.400000
sdw9 -> sdw10 = 1122.400000
sdw11 -> sdw12 = 1122.390000
smdw -> mdw = 1122.390000
sdw2 -> sdw1 = 1122.390000
sdw4 -> sdw3 = 1122.380000
sdw6 -> sdw5 = 1122.400000
sdw8 -> sdw7 = 1122.460000
sdw10 -> sdw9 = 1122.390000
sdw12 -> sdw11 = 1122.380000
 
Summary:
sum = 15713.61 MB/sec
min = 1122.38 MB/sec
max = 1122.46 MB/sec
avg = 1122.40 MB/sec
median = 1122.40 MB/sec

Luis Macedo

unread,
Apr 19, 2017, 5:10:03 PM4/19/17
to Keaton Adams, cpu...@gmail.com, Greenplum Users
Jamie,

How about the memory and disk tests? Did they both run fine?

The results now make me think that, for some reason, your switch is not able to handle the load...

Maybe run the test with, say, 4 servers and see if you get better network performance then.
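
For instance, with a trimmed host file (file names here are just placeholders):

head -4 hostfile_gpcheckperf > hostfile_4hosts
gpcheckperf -f hostfile_4hosts -r M -d /tmp > netcheck_4hosts.out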


--- Sent from my Nexus 5x

yuwei...@gmail.com

unread,
Apr 19, 2017, 5:28:54 PM4/19/17
to Keaton Adams, Luis Macedo, Greenplum Users, cpu...@gmail.com
How about the sysctl settings?
Can you show those settings?
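
For example, pulling just the network-related ones from every host at once (gpssh and the hostfile name assumed):

gpssh -f hostfile_gpcheckperf 'sysctl net.core.rmem_max net.core.wmem_max net.core.netdev_max_backlog net.ipv4.tcp_max_syn_backlog'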
--
Yu-wei Sung

Scott Kahler

unread,
Apr 19, 2017, 5:31:41 PM4/19/17
to Greenplum Users, cpu...@gmail.com
Do you know what model the dedicated switch is?




--

Scott Kahler | Pivotal, Greenplum Product Management  | ska...@pivotal.io | 816.237.0610

Jamie Cox

unread,
Apr 19, 2017, 5:33:48 PM4/19/17
to Greenplum Users, kad...@pivotal.io, cpu...@gmail.com
Yes those ran fine. 

Jamie Cox

unread,
Apr 19, 2017, 5:34:42 PM4/19/17
to Greenplum Users, cpu...@gmail.com
This is running on an Arista DCS-7050 switch.

Jamie Cox

unread,
Apr 19, 2017, 5:35:40 PM4/19/17
to Greenplum Users, kad...@pivotal.io, lma...@pivotal.io, cpu...@gmail.com
Hi Yu-wei,
Here is my sysctl.conf file:
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.all.disable_ipv6 = 1

#Set for use with GreenPlum
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.default.accept_redirects=0
net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.default.secure_redirects=0
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
vm.swappiness = 0

..: Mark Sloan :..

unread,
Apr 19, 2017, 5:52:11 PM4/19/17
to Jamie Cox, Greenplum Users, Keaton Adams, lma...@pivotal.io
Check BIOS power-saving settings, etc., although if it were that I would also expect to see variance in the stream (memory) benchmark.


AFAIK the warning comes up when there is a large enough variance between the links.


as an example:

$ gpcheckperf -f ~/hosts/hosts.all-segments-single -rN -d /tmp -D
/usr/local/greenplum-db/./bin/gpcheckperf -f /home/gpadmin/hosts/hosts.all-segments-single -rN -d /tmp -D


-------------------
--  NETPERF TEST
-------------------

====================
==  RESULT
====================
Netperf bisection bandwidth test
sdw1 -> sdw2 = 1122.460000
sdw3 -> sdw4 = 1122.340000
sdw5 -> sdw6 = 1104.390000
sdw7 -> sdw8 = 671.500000

sdw2 -> sdw1 = 1122.390000
sdw4 -> sdw3 = 1122.510000
sdw6 -> sdw5 = 1122.620000
sdw8 -> sdw7 = 566.110000

Summary:
sum = 7954.32 MB/sec
min = 566.11 MB/sec
max = 1122.62 MB/sec
avg = 994.29 MB/sec
median = 1122.39 MB/sec

[Warning] connection between sdw7 and sdw8 is no good
[Warning] connection between sdw8 and sdw7 is no good



yuwei...@gmail.com

unread,
Apr 19, 2017, 6:00:47 PM4/19/17
to ..: Mark Sloan :.., Jamie Cox, Greenplum Users, Keaton Adams, lma...@pivotal.io
Check if you have backlog or packet drops on the NICs between sdw7 and sdw8. Are they on the same host?
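
Something like this on each host will show the counters (interface name is just an example):

ip -s link show eth0               # RX/TX "dropped" counters
ethtool -S eth0 | grep -i drop     # per-driver drop/discard counters, if the driver exposes them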
--
Yu-wei Sung

Robert Mcphail

unread,
Apr 19, 2017, 6:11:14 PM4/19/17
to ..: Mark Sloan :.., Jamie Cox, Greenplum Users, Keaton Adams, lma...@pivotal.io
Hi Jamie,

If you have access to the hardware (or if someone can do this for you) I'd reseat the cables both on the server and switch side just to be certain.  You could also swap a cable from the slower server with a faster one to test the cable.

If you are using twinax cables rather than fiber, twinax can be touchy regarding outside interference, cable length, etc., even NIC brand.  I've seen more than one occurrence of this causing flaky issues.

Also might be good to verify you have updated drivers on the network cards.  At least make sure the driver versions are consistent on all servers in the cluster.
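
A quick consistency check across the cluster could look like this (interface name and hostfile are assumptions):

gpssh -f hostfile_gpcheckperf 'ethtool -i eth0 | egrep "driver|version|firmware"'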

The segment servers in the cluster should all have the exact same hardware configuration, BIOS, firmware and driver versions.

By the way, whenever the network test result is less than 1000 MB/sec (the 10 Gb threshold) it will always say the connection is no good.

--

Bob McPhail  |  Partner Engineering  |  Pivotal 

Jamie Cox

unread,
Apr 19, 2017, 6:14:34 PM4/19/17
to Greenplum Users, mark.a...@gmail.com, cpu...@gmail.com, kad...@pivotal.io, lma...@pivotal.io
These two hosts are on different switches in different racks. I don't see any dropped packets from the host to the switch, but I do see some from the switch to the core. I will look into these dropped packets.
It is possible that the uplink is getting saturated.
Thank You so much. 

Jamie Cox

unread,
Apr 19, 2017, 6:21:51 PM4/19/17
to Greenplum Users, mark.a...@gmail.com, cpu...@gmail.com, kad...@pivotal.io, lma...@pivotal.io
Hi Robert,
I am using 10Gb over copper RJ-45 to the switch and 40Gb twinax bonded to the core. This is the way I read it in the documentation from Greenplum. The only difference is I don't have 2 TOR switches, I used 1.
I will update my drivers and make sure all are the same. 

Thank You

Luis Macedo

unread,
Apr 19, 2017, 6:35:51 PM4/19/17
to Jamie Cox, Greenplum Users, mark.a...@gmail.com, kad...@pivotal.io
Usually GPDB servers have dedicated switches. There is too much chatter between the nodes, and that is usually a problem for the core network.

If your network between servers is running through the core, you should change that, since you have physical servers.


--- Sent from my Nexus 5x

Jamie Cox

unread,
Apr 19, 2017, 6:42:04 PM4/19/17
to Greenplum Users, cpu...@gmail.com, mark.a...@gmail.com, kad...@pivotal.io
Hi Luis,
We plan on having multiple racks of Greenplum servers that are connected together by this core. These are the only servers using this core, and it will stay this way. I am wondering if I need to add more uplinks to the core since I have dropped packets there.
Thank You

Luis Macedo

unread,
Apr 19, 2017, 8:02:01 PM4/19/17
to Jamie Cox, Greenplum Users, mark.a...@gmail.com, kad...@pivotal.io
Nice! 

Then you should check your core throughput as you mentioned. It looks like you are reaching capacity; that is why I suggested trying the tests with fewer servers.

--- Sent from my Google Pixel


Jamie Cox

unread,
Apr 20, 2017, 10:56:46 AM4/20/17
to Greenplum Users, cpu...@gmail.com, mark.a...@gmail.com, kad...@pivotal.io
I have changed my file to include 8 nodes total and that test comes out perfect. It looks like this is an oversubscription problem.
Currently I have 1 x 10Gb link from each server to the TOR switch, and 1 x 40Gb link from the TOR to each of the core switches (80Gb total). Is this the type of design expected for this topology? I have read through the Pivotal literature and this is what I believe they suggested.
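
As a rough sanity check on the uplink math (the 12-hosts-per-rack figure is my assumption, not something stated above):

12 hosts x 10Gb each into the TOR         = 120Gb of edge bandwidth per rack
2 x 40Gb uplinks from the TOR to the core =  80Gb toward the core
worst case, every host sending off-rack   = 120Gb offered vs 80Gb available, about 1.5:1 oversubscribed

The full-matrix test pushes a lot of traffic off-rack at the same time, so it is the test most likely to hit that limit, which would line up with the drops seen on the uplinks.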

Thank You so much for all of your help in this
Jamie

Luis Macedo

unread,
Apr 24, 2017, 9:50:20 AM4/24/17
to Jamie Cox, Greenplum Users, mark.a...@gmail.com, Keaton Adams
Hey Jamie,

Check this manual, pg 4-5. 


It has a picture of what the recommendation is for 1 rack and 2+ racks.

It sounds like you have the correct setup... We might be misreading the test. Maybe the full-matrix test runs all connections at the same time, and what you are seeing is the real limitation that anyone would see with a cluster larger than one rack.

That is why Infiniband exists right? :) 


Rgds,


Luis Macedo | Sr Platform Architect | Pivotal Inc 

Mobile: +55 11 97616-6438

Take care of the customers and the rest takes care of itself


Jamie Cox

unread,
Apr 24, 2017, 2:06:33 PM4/24/17
to Greenplum Users, cpu...@gmail.com, mark.a...@gmail.com, kad...@pivotal.io
Thank You so much Luis. I will play around a little with my uplink to see if I can get the results to turn out as desired. 