Procedure to Package GPDB OSS for Distribution to Non-Dev Servers?


kad...@pivotal.io

unread,
Apr 5, 2017, 9:40:44 AM4/5/17
to Greenplum Developers
So I built GPDB 5.0.alpha (including ORCA) and brought up the database on my Dev machine.  Great start.

I then tar-gzipped up /usr/local/gpdb and copied it to another CentOS 6.8 server for deployment.  This went well until I attempted to initialize Greenplum, then I ran into errors:

20170405:07:26:47:009622 gpinitsystem:mdw:gpadmin-[INFO]:-Starting the Master in admin mode
Error: unable to import module: No module named psutil
20170405:07:26:50:gpinitsystem:mdw:gpadmin-[FATAL]:-Unknown host  Script Exiting!
20170405:07:26:50:009622 gpinitsystem:mdw:gpadmin-[WARN]:-Script has left Greenplum Database in an incomplete state


So I attempted to install the psutil python module, but that failed because this is not a Dev machine with all of the dependencies installed:

    psutil/_psutil_linux.c:12:20: error: Python.h: No such file or directory
    psutil/_psutil_linux.c:40: error: ‘CHAR_BIT’ undeclared here (not in a function)


How are companies deploying the OSS version of GPDB on servers that are not full Dev boxes?  Or does setting up their "production" clusters involve building them as essentially Dev servers with the full list of -devel packages installed in order to run OSS GPDB?

Is there an easy way to package up a compiled /usr/local/gpdb OSS release with all of its dependencies, for the purpose of deploying it on other (non-Dev) machines elsewhere?

If there is something I can just go read to get the details of how to package up something like this, I would be happy to be pointed in the right direction and read up on the details myself.

Thanks,

Keaton




Dave Cramer

unread,
Apr 5, 2017, 1:23:14 PM4/5/17
to Keaton Adams, Greenplum Developers
On Wed, Apr 5, 2017 at 9:40 AM, <kad...@pivotal.io> wrote:
So I built GPDB 5.0.alpha (including ORCA) and brought up the database on my Dev machine.  Great start.

I then tar-gzipped up /usr/local/gpdb and copied it to another CentOS 6.8 server for deployment.  This went well until I attempted to initialize Greenplum, then I ran into errors:

20170405:07:26:47:009622 gpinitsystem:mdw:gpadmin-[INFO]:-Starting the Master in admin mode
Error: unable to import module: No module named psutil
20170405:07:26:50:gpinitsystem:mdw:gpadmin-[FATAL]:-Unknown host  Script Exiting!
20170405:07:26:50:009622 gpinitsystem:mdw:gpadmin-[WARN]:-Script has left Greenplum Database in an incomplete state


So I attempted to install the psutil python module, but that failed because this is not a Dev machine with all of the dependencies installed:

    psutil/_psutil_linux.c:12:20: error: Python.h: No such file or directory
    psutil/_psutil_linux.c:40: error: ‘CHAR_BIT’ undeclared here (not in a function)


How are companies deploying the OSS version of GPDB on servers that are not full Dev boxes?  Or does setting up their "production" clusters involve building them as essentially Dev servers with the full list of -devel packages installed in order to run OSS GPDB?

Not sure these are either/or. 

How did you attempt to install psutil?

Dave Cramer
 

kad...@pivotal.io

unread,
Apr 5, 2017, 1:44:19 PM4/5/17
to Greenplum Developers, kad...@pivotal.io
Like so:

[root@mdw ~]# cat /etc/system-release
CentOS release 6.9 (Final)

[root@mdw ~]# uname -a
Linux mdw 2.6.32-696.el6.x86_64 #1 SMP Tue Mar 21 19:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


warning: /var/tmp/rpm-tmp.X3Iw1u: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing...                ########################################### [100%]
   1:epel-release           ########################################### [100%]
[root@mdw ~]# yum install python-pip -y
Loaded plugins: fastestmirror, security
Setting up Install Process
Loading mirror speeds from cached hostfile
epel/metalink                                                                                                                                    |  11 kB     00:00     
epel                                                                                                                                             | 4.3 kB     00:00     
epel/primary_db                                                                                                                                  | 5.9 MB     00:01     
Resolving Dependencies
--> Running transaction check
---> Package python-pip.noarch 0:7.1.0-1.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================================================================
 Package                                   Arch                                  Version                                      Repository                           Size
========================================================================================================================================================================
Installing:
 python-pip                                noarch                                7.1.0-1.el6                                  epel                                1.5 M

Transaction Summary
========================================================================================================================================================================
Install       1 Package(s)

Total download size: 1.5 M
Installed size: 6.6 M
Downloading Packages:
python-pip-7.1.0-1.el6.noarch.rpm                                                                                                                | 1.5 MB     00:00     
warning: rpmts_HdrFromFdno: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
Importing GPG key 0x0608B895:
 Userid : EPEL (6) <ep...@fedoraproject.org>
 Package: epel-release-6-8.noarch (installed)
 From   : /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Warning: RPMDB altered outside of yum.
  Installing : python-pip-7.1.0-1.el6.noarch                                                                                                                        1/1 
  Verifying  : python-pip-7.1.0-1.el6.noarch                                                                                                                        1/1 

Installed:
  python-pip.noarch 0:7.1.0-1.el6                                                                                                                                       

Complete!



[root@mdw ~]# pip install --upgrade psutil
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
You are using pip version 7.1.0, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting psutil
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Downloading psutil-5.2.1.tar.gz (347kB)
    100% |████████████████████████████████| 348kB 892kB/s 
Installing collected packages: psutil
  Running setup.py install for psutil
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-H6S_e5/psutil/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-2OO_nP-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.6
    creating build/lib.linux-x86_64-2.6/psutil
    copying psutil/_pssunos.py -> build/lib.linux-x86_64-2.6/psutil
    copying psutil/__init__.py -> build/lib.linux-x86_64-2.6/psutil
    copying psutil/_pswindows.py -> build/lib.linux-x86_64-2.6/psutil
    copying psutil/_compat.py -> build/lib.linux-x86_64-2.6/psutil
    copying psutil/_psposix.py -> build/lib.linux-x86_64-2.6/psutil
    copying psutil/_common.py -> build/lib.linux-x86_64-2.6/psutil
    copying psutil/_psosx.py -> build/lib.linux-x86_64-2.6/psutil
    copying psutil/_psbsd.py -> build/lib.linux-x86_64-2.6/psutil
    copying psutil/_pslinux.py -> build/lib.linux-x86_64-2.6/psutil
    creating build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_sunos.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/__init__.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_memory_leaks.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_process.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_system.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_linux.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_osx.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/runner.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_windows.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_misc.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_bsd.py -> build/lib.linux-x86_64-2.6/psutil/tests
    copying psutil/tests/test_posix.py -> build/lib.linux-x86_64-2.6/psutil/tests
    running build_ext
    building 'psutil._psutil_linux' extension
    creating build/temp.linux-x86_64-2.6
    creating build/temp.linux-x86_64-2.6/psutil
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_VERSION=521 -DPSUTIL_LINUX=1 -I/usr/include/python2.6 -c psutil/_psutil_linux.c -o build/temp.linux-x86_64-2.6/psutil/_psutil_linux.o
    psutil/_psutil_linux.c:12:20: error: Python.h: No such file or directory
    psutil/_psutil_linux.c:40: error: ‘CHAR_BIT’ undeclared here (not in a function)
    psutil/_psutil_linux.c: In function ‘ioprio_get’:
    psutil/_psutil_linux.c:72: warning: implicit declaration of function ‘syscall’
    psutil/_psutil_linux.c: At top level:
    psutil/_psutil_linux.c:91: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
    psutil/_psutil_linux.c:111: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
    psutil/_psutil_linux.c:192: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
    psutil/_psutil_linux.c:242: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
    psutil/_psutil_linux.c:270: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
    psutil/_psutil_linux.c:377: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
    psutil/_psutil_linux.c:435: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
    psutil/_psutil_linux.c:484: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
    psutil/_psutil_linux.c:545: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘PsutilMethods’
    psutil/_psutil_linux.c:584: error: expected specifier-qualifier-list before ‘PyObject’
    psutil/_psutil_linux.c: In function ‘init_psutil_linux’:
    psutil/_psutil_linux.c:630: error: ‘PyObject’ undeclared (first use in this function)
    psutil/_psutil_linux.c:630: error: (Each undeclared identifier is reported only once
    psutil/_psutil_linux.c:630: error: for each function it appears in.)
    psutil/_psutil_linux.c:630: error: ‘v’ undeclared (first use in this function)
    psutil/_psutil_linux.c:634: error: ‘module’ undeclared (first use in this function)
    psutil/_psutil_linux.c:634: warning: implicit declaration of function ‘Py_InitModule’
    psutil/_psutil_linux.c:634: error: ‘PsutilMethods’ undeclared (first use in this function)
    psutil/_psutil_linux.c:637: warning: implicit declaration of function ‘PyModule_AddIntConstant’
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-H6S_e5/psutil/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-2OO_nP-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-H6S_e5/psutil

Shubham Sharma

unread,
Apr 5, 2017, 2:06:00 PM4/5/17
to kad...@pivotal.io, Greenplum Developers
Hey Keaton,

Try installing the Python development libraries: yum install python-devel. This should help.
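Roughly, the sequence on the non-dev box would look like this (package names are for CentOS 6 with EPEL enabled and may differ elsewhere; gcc is needed to compile psutil's C extension):

    # Assumption: CentOS 6 + EPEL; adjust package names for other releases.
    yum install -y python-devel gcc
    pip install psutil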


Dave Cramer

unread,
Apr 5, 2017, 2:20:20 PM4/5/17
to Shubham Sharma, Keaton Adams, Greenplum Developers
On Wed, Apr 5, 2017 at 2:05 PM, Shubham Sharma <topologi...@gmail.com> wrote:
Hey Keaton,

Try installing Python development libraries, yum install python-devel. This should help.


I think his point was: is there a way to install GPDB without having the development libraries on the box?

It appears not...


Dave Cramer

kad...@pivotal.io

unread,
Apr 5, 2017, 2:31:34 PM4/5/17
to Greenplum Developers, topologi...@gmail.com, kad...@pivotal.io
Right.  That is the point.

It has been a very long while since I did anything with packaging software for distribution.  Apologies if the answer is obvious and posted somewhere I am not looking.

How can I build / package GPDB with all of its dependencies into a single rpm to deploy on servers that do not have the development libraries installed?  Is there a fairly easy way to do this?

Thanks again,

Keaton

Jimmy Yih

unread,
Apr 5, 2017, 2:32:24 PM4/5/17
to Dave Cramer, Shubham Sharma, Keaton Adams, Greenplum Developers
Without installing the libraries on each server, you would have to compile and package a standalone Python with the dependencies into your Greenplum tarball (and also make an edit to greenplum_path.sh).  This is what I think happens for Pivotal Greenplum.  Hopefully this disappears quickly with the switch to Golang.
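As a rough sketch of that approach, done on the build/dev box and then shipped inside the tarball (the paths, Python version, and configure flags below are assumptions, not the actual Pivotal recipe):

    # On the dev/build machine: build a self-contained Python under the GPDB prefix.
    cd Python-2.7.13
    ./configure --prefix=/usr/local/gpdb/ext/python
    make && make install
    /usr/local/gpdb/ext/python/bin/python -m ensurepip
    /usr/local/gpdb/ext/python/bin/pip install psutil

    # Hand-edit greenplum_path.sh so the management scripts use the bundled interpreter
    # (hypothetical lines; the real file may arrange this differently):
    export PATH="/usr/local/gpdb/ext/python/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/gpdb/ext/python/lib:$LD_LIBRARY_PATH"

    # Then tar up /usr/local/gpdb as before; the target hosts no longer need python-devel.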

Stanley Sung

unread,
Apr 5, 2017, 2:34:39 PM4/5/17
to Jimmy Yih, Dave Cramer, Shubham Sharma, Keaton Adams, Greenplum Developers
This reminds me of what Ansible is capable of. We can write a playbook to install all of the dependencies and get gpdb installed on the slaves.
--
Regards,
--
Stanley Sung | Pivotal Data Engineering

Scott Kahler

unread,
Apr 5, 2017, 2:54:34 PM4/5/17
to Jimmy Yih, Dave Cramer, Shubham Sharma, Keaton Adams, Greenplum Developers
With the OSS build, some of the dependencies are externalized, such as the Python pieces. So you would need to make sure those Python additions were on the non-build server. You could satisfy that by building an rpm that calls them out as prerequisites, or by building a conda bundle that you drop in your tarball.

There are a few projects out there taking a swipe at the bundling aspect.

I feel that if we do too much of that bundling in the OSS project, we start to lock ourselves in and become opinionated about something where we should be more neutral.
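For anyone who wants to try the rpm-with-prerequisites route themselves, one quick sketch uses the third-party fpm tool (my suggestion, not anything the GPDB build produces; the package name, version string, and dependency name below are assumptions):

    # Hypothetical: wrap an existing /usr/local/gpdb tree in an rpm that declares
    # its runtime prerequisites, using fpm (gem install fpm).
    fpm -s dir -t rpm \
        -n greenplum-db-oss -v 5.0.0_alpha \
        --depends python-psutil \
        --prefix /usr/local/gpdb \
        -C /usr/local/gpdb .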
--

Scott Kahler | Pivotal, Greenplum Product Management  | ska...@pivotal.io | 816.237.0610

kad...@pivotal.io

unread,
Apr 5, 2017, 3:06:14 PM4/5/17
to Greenplum Developers, jy...@pivotal.io, dcr...@pivotal.io, topologi...@gmail.com, kad...@pivotal.io
I installed the Python development libraries and the specific extensions required by GPDB and that worked. Thanks for the info.

I like what the Postgres community does. They have pre-built binary packages available from their website.  They also have links to third-party sites that have done their own packaging and added their own specifics to extend / enhance a given release.

This seems the most flexible approach for those who want to just download the basics, packaged in a way that is as easily installable as PG is, with links to other forms of packaging / distribution that other teams have found valuable to produce.

Thanks for all of the input. Appreciated.

K

Roman Shaposhnik

unread,
Apr 5, 2017, 3:08:31 PM4/5/17
to Scott Kahler, Jimmy Yih, Dave Cramer, Shubham Sharma, Keaton Adams, Greenplum Developers
On Wed, Apr 5, 2017 at 11:54 AM, Scott Kahler <ska...@pivotal.io> wrote:
> With the OSS build some of the dependancies are externalized such as the
> Python pieces. So you would need to make sure those python additions were on
> the non-build server. You could satisfy that by building an rpm that call it
> out as a pre-req or building a conda bundle you drop in your tarball.
>
> There's a few projects out there that are taking a swipe at the bundling
> aspect.
>
> I feel if we do too much of that bundling aspect in the OSS we start to lock
> ourselves and be opinionated on something where we should be more neutral.

FYI: this is what installing Greenplum packaged by Bigtop looks like
on CentOS 7:
# curl https://www.apache.org/dist/bigtop/stable/repos/centos7/bigtop.repo > /etc/yum.repos.d/bigtop.repo
# yum install gpdb
Dependencies Resolved

=========================================================================================
 Package                      Arch      Version                     Repository    Size
=========================================================================================
Installing:
 gpdb                         x86_64    5.0.0+alpha.0-1.el7.centos  bigtop       7.8 M
Installing for dependencies:
 apr                          x86_64    1.4.8-3.el7                 base         103 k
 groff-base                   x86_64    1.22.2-8.el7                base         942 k
 libevent                     x86_64    2.0.21-4.el7                base         214 k
 perl                         x86_64    4:5.16.3-291.el7            base         8.0 M
 perl-Carp                    noarch    1.26-244.el7                base          19 k
 perl-Data-Dumper             x86_64    2.145-3.el7                 base          47 k
 perl-Encode                  x86_64    2.51-7.el7                  base         1.5 M
 perl-Env                     noarch    1.04-2.el7                  base          16 k
 perl-Exporter                noarch    5.68-3.el7                  base          28 k
 perl-File-Path               noarch    2.09-2.el7                  base          26 k
 perl-File-Temp               noarch    0.23.01-3.el7               base          56 k
 perl-Filter                  x86_64    1.49-3.el7                  base          76 k
 perl-Getopt-Long             noarch    2.40-2.el7                  base          56 k
 perl-HTTP-Tiny               noarch    0.033-3.el7                 base          38 k
 perl-PathTools               x86_64    3.40-5.el7                  base          82 k
 perl-Pod-Escapes             noarch    1:1.04-291.el7              base          51 k
 perl-Pod-Perldoc             noarch    3.20-4.el7                  base          87 k
 perl-Pod-Simple              noarch    1:3.28-4.el7                base         216 k
 perl-Pod-Usage               noarch    1.63-3.el7                  base          27 k
 perl-Scalar-List-Utils       x86_64    1.27-248.el7                base          36 k
 perl-Socket                  x86_64    2.010-4.el7                 base          49 k
 perl-Storable                x86_64    2.45-3.el7                  base          77 k
 perl-Text-ParseWords         noarch    3.29-4.el7                  base          14 k
 perl-Time-HiRes              x86_64    4:1.9725-3.el7              base          45 k
 perl-Time-Local              noarch    1.2300-2.el7                base          24 k
 perl-constant                noarch    1.27-2.el7                  base          19 k
 perl-libs                    x86_64    4:5.16.3-291.el7            base         688 k
 perl-macros                  x86_64    4:5.16.3-291.el7            base          43 k
 perl-parent                  noarch    1:0.225-244.el7             base          12 k
 perl-podlators               noarch    2.5.1-3.el7                 base         112 k
 perl-threads                 x86_64    1.87-4.el7                  base          49 k
 perl-threads-shared          x86_64    1.43-6.el7                  base          39 k

Transaction Summary
=====================================================================================================================================
Install 1 Package (+32 Dependent packages)

Total download size: 20 M
Installed size: 72 M
Is this ok [y/d/N]:

Thanks,
Roman.

Roman Shaposhnik

unread,
Apr 5, 2017, 3:17:04 PM4/5/17
to Keaton Adams, Greenplum Developers
On Wed, Apr 5, 2017 at 6:40 AM, <kad...@pivotal.io> wrote:
> So I built GPDB 5.0.alpha (including ORCA) and brought up the database on my
> Dev machine. Great start.
>
> I then tar-gzipped up /usr/local/gpdb and copied it to another CentOS 6.8
> server for deployment. This went well until I attempted to initialize
> Greenplum, then I ran into errors:
>
> 20170405:07:26:47:009622 gpinitsystem:mdw:gpadmin-[INFO]:-Starting the
> Master in admin mode
> Error: unable to import module: No module named psutil
> 20170405:07:26:50:gpinitsystem:mdw:gpadmin-[FATAL]:-Unknown host Script
> Exiting!
> 20170405:07:26:50:009622 gpinitsystem:mdw:gpadmin-[WARN]:-Script has left
> Greenplum Database in an incomplete state
>
>
> So I attempted to install the psutil python module, but that failed because
> this is not a Dev machine with all of the dependencies installed:
>
> psutil/_psutil_linux.c:12:20: error: Python.h: No such file or directory
> psutil/_psutil_linux.c:40: error: ‘CHAR_BIT’ undeclared here (not in a
> function)
>
>
> How are companies deploying the OSS version of GPDB on servers that are not
> full Dev boxes?

One way this is happening with some of the companies is that they avoid our Python scripts and integrate Greenplum into their overall orchestration systems, such as Puppet and Chef. In fact, EPAM has recently donated the very same Puppet code that they use for their GPDB clients to Bigtop:
https://github.com/apache/bigtop/tree/master/bigtop-deploy/puppet/modules/gpdb

This approach works great, especially since Puppet code integrates well with cloud deployment systems like Juju and Docker Swarm/Compose, whereas our Python code has a lot of problems in those environments.
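For anyone who wants to poke at that module outside of a full Bigtop deployment, a standalone run would look roughly like this (the class name and any required Hiera parameters are assumptions on my part; check the module's manifests for the real entry points):

    # Hypothetical standalone invocation of the Bigtop gpdb Puppet module.
    git clone https://github.com/apache/bigtop
    puppet apply --modulepath=bigtop/bigtop-deploy/puppet/modules \
                 -e 'include gpdb'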

Note, however, that Alpha1 broke the Puppet code. I'll be working on fixing that this week and next, so we can have single-command GPDB deployment working with Puppet again.

> Is there an easy way to package up a compiled /usr/local/gpdb OSS release
> with all of its dependencies, for the purpose of deploying it on other
> (non-Dev) machines elsewhere?
>
> If there is something I can just go read to get the details of how to
> package up something like this, I would be happy to be pointed in the right
> direction and read up on the details myself.

Personally I feel that we need to coalesce around just a few deployment/packaging options. Our Python scripts are not going anywhere -- so that's option #1, and it's here to stay. As for option #2, I felt that what EPAM did was actually the right thing to do. GPDB existing independently of the rest of the IT infrastructure is getting rarer these days. All of those accounts are running things like Puppet/Chef/Ansible and demand that THAT is the single source of truth for configuration and deployment of every single node in that IT department (not some random configuration files of individual software packages).

I would really like to encourage the GPDB community to consider this as option #2 and to help maintain Bigtop's Puppet code.

Thoughts?

Thanks,
Roman.

Dave Cramer

unread,
Apr 5, 2017, 4:45:42 PM4/5/17
to Roman Shaposhnik, Andreas Scherbaum, Keaton Adams, Greenplum Developers
+Andreas

Andreas has ansible scripts somewhere... can you link to them?

Dave Cramer

Scott Kahler

unread,
Apr 5, 2017, 4:57:11 PM4/5/17
to Greenplum Developers, kad...@pivotal.io


On Wednesday, April 5, 2017 at 2:17:04 PM UTC-5, rshaposhnik wrote:

Personally I feel that we need to coalesce around just a few
deployment/packaging
options. Our Python scripts are not going anywhere -- so that's option #1 that's
here to stay. As for option #2 I felt that what EPAM did was actually
the right thing
to do. GPDB existing independently of the rest of the IT
infrastructure is getting
more rare these days. All those accounts are running things like
Puppet/Chef/Ansible
and demand that THAT is the single source of truth for configuration
and deployment
of every single node in that IT department (not some random
configuration files of
individual software packages).

I would really like to encourage GPDB community to consider this as
option #2 and
help maintain Bigtop's puppet code.

Thoughts?

Thanks,
Roman.

I agree that Greenplum needs to move in a direction that makes it more installable and governable by the configuration systems out there (Chef, Puppet, Ansible), and the existing tooling needs to change to allow for that. EPAM's approach of initializing all the nodes and then creating a master that the information is pushed to, rather than using gpinitsystem to create all the endpoints from the master, is definitely the direction in which I see things going. We do need to move in a direction that accommodates that deployment model better.

However, I don't think we (GPDB OSS) should align with a specific deployment system; we should try to remain as agnostic to them as possible. While I fully support Bigtop's Puppet code and helping maintain it, the choice to support Puppet specifics should stay at Bigtop's level and not be part of Greenplum. While it might drive changes to Greenplum, those changes should be added in such a way that they are generic, or so that modules can be written for orchestration in other systems.

Roman Shaposhnik

unread,
Apr 5, 2017, 5:08:01 PM4/5/17
to Scott Kahler, Greenplum Developers, Keaton Adams
We're on the same page then. What I'm asking here is a stable API that said
Puppet code can rely on. Case in point: recent commits broke Bigtop's Puppet
code. More specifically the following scripts no longer work:
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/gpdb/templates/start-master-db-in-admin-mode.sh
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/gpdb/templates/postmaster.opts

This simply has to do with the fact that certain command line options are no longer recognized.

So what I'm asking for is this:
1. Can the GPDB community consider this deployment practice somehow valid? (Sounds like your answer is yes.)
2. Can the GPDB community come up with a stable, very minimalistic API that would then allow the Puppet code on the Bigtop side not to break as often?

IOW, the Bigtop community is more than willing to champion packaging and orchestration-driven deployment. However, if that runs against the guidance of the GPDB community, it is doomed to failure.

Thanks,
Roman.

Andreas Scherbaum

unread,
Apr 6, 2017, 9:16:38 AM4/6/17
to Stanley Sung, Jimmy Yih, Dave Cramer, Shubham Sharma, Keaton Adams, Greenplum Developers

I have a set of Ansible scripts which can install GPDB4 (roles: common & gpdb4) and a GPDB5 build environment (roles: common, git2, gpdb5-dev, buildclient).

Find everything here:

https://github.com/andreasscherbaum/gpdb-ansible
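For anyone who hasn't used Ansible before, running a role set like that generally looks something like the following (the inventory and playbook file names here are assumptions; check the repository's README for the actual invocation):

    # Hypothetical invocation; adjust inventory and playbook names to the repo.
    git clone https://github.com/andreasscherbaum/gpdb-ansible
    cd gpdb-ansible
    ansible-playbook -i inventory.yml site.yml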


Regards,
Andreas

--

Andreas Scherbaum

Principal Software Engineer

GoPivotal Deutschland GmbH


Hauptverwaltung und Sitz: Am Kronberger Hang 2a, 65824 Schwalbach/Ts., Deutschland

Amtsgericht Königstein im Taunus, HRB 8433

Geschäftsführer: Andrew Michael Cohen, Paul Thomas Dacier

C.J. Jameson

unread,
Apr 6, 2017, 3:30:52 PM4/6/17
to Roman Shaposhnik, Scott Kahler, Greenplum Developers, Keaton Adams
My sense, given the codebase, is that it'd be very hard to identify a comprehensive API that's stable enough for you to rely on. Even if we characterized it 98%, the small missing parts would be unknowns and risks. Greenplum's "surface area", so to speak, isn't well organized like an API, I feel, because of the various entry points.

Rather than prescribing the API and trying to stabilize it, we can describe the needs of BigTop and automate checks that it still works. So the idea is: codify BigTop smoke-testing as a downstream check in our pipeline ... not necessarily to block further pushes but to know when it breaks.

Thoughts? Valuable enough?

C.J.

Roman Shaposhnik

unread,
Apr 6, 2017, 3:40:13 PM4/6/17
to C.J. Jameson, Scott Kahler, Greenplum Developers, Keaton Adams
On Thu, Apr 6, 2017 at 12:30 PM, C.J. Jameson <cjam...@pivotal.io> wrote:
> My sense, given the codebase, is that it'd be very hard to identify a
> comprehensive API that's stable enough for you to rely on. Even if we
> characterized it 98%, the small missing parts would be unknown and risks.
> Greenplum's "surface area", so to speak, isn't well organized like an API, I
> feel, because of the various entry points.

I think we mean different things when we say "API". What I mean by this is literally GPDB taking responsibility for maintaining a few scripts that Bigtop now has to maintain on its own (and risk breakage with every release of GPDB). These are far from rocket science:
https://github.com/apache/bigtop/tree/master/bigtop-deploy/puppet/modules/gpdb/templates
Basically all of the files under there aside from gpssh.conf and gp_dbid.

Take a look for yourself - those are trivial. In fact, we could even merge them into a single uber script on the GPDB side with the actions that a system like Puppet expects.

That script IS the API I'm suggesting here -- nothing more.

> Rather than prescribing the API and trying to stabilize it, we can describe
> the needs of BigTop and automate checks that it still works. So the idea is:
> codify BigTop smoke-testing as a downstream check in our pipeline ... not
> necessarily to block further pushes but to know when it breaks.

Sure. Testing comes next, and Bigtop would be more than happy to participate, but let's start with the basics -- GPDB offering a single script with a fixed set of actions as a stable API for all consumers like Puppet/Chef/Ansible/etc.

Thanks,
Roman.

Dave Cramer

unread,
Apr 6, 2017, 3:44:15 PM4/6/17
to Roman Shaposhnik, C.J. Jameson, Scott Kahler, Greenplum Developers, Keaton Adams
On Thu, Apr 6, 2017 at 3:40 PM, Roman Shaposhnik <rshap...@pivotal.io> wrote:
On Thu, Apr 6, 2017 at 12:30 PM, C.J. Jameson <cjam...@pivotal.io> wrote:
> My sense, given the codebase, is that it'd be very hard to identify a
> comprehensive API that's stable enough for you to rely on. Even if we
> characterized it 98%, the small missing parts would be unknown and risks.
> Greenplum's "surface area", so to speak, isn't well organized like an API, I
> feel, because of the various entry points.

I suspect this is an artefact of being closed source and not actually wanting to interface with anyone else. Once one starts viewing the world from the open source perspective, keeping an "API" static should be a priority.

I think we mean different things when we say "API". What I mean by this
is literally, GPDB taking responsibility of maintaining a few scripts that
Bigtop now has to maintain on its own (and risk breakage with every
release of GPDB). These are far from rocket science:
    https://github.com/apache/bigtop/tree/master/bigtop-deploy/puppet/modules/gpdb/templates
Basically all of the files under there aside from gpssh.conf and gp_dbid

Take a look for yourself - those are trivial. In fact, we can even merge them
into a single uber script on the GPDB side with actions that a system
like Puppet expects.

That script IS the API I'm suggesting here -- nothing more.

+1 
> Rather than prescribing the API and trying to stabilize it, we can describe
> the needs of BigTop and automate checks that it still works. So the idea is:
> codify BigTop smoke-testing as a downstream check in our pipeline ... not
> necessarily to block further pushes but to know when it breaks.

Sure. Testing comes next and Bigtop would be more than happy to participate,
but lets start with the basics -- GPDB offering a single script with a fixed set
of actions as a stable API for all consumers like Puppet/Chef/Ansible/etc.

+1



Dave Cramer

Roman Shaposhnik

unread,
Apr 6, 2017, 3:47:32 PM4/6/17
to Andreas Scherbaum, Stanley Sung, Jimmy Yih, Dave Cramer, Shubham Sharma, Keaton Adams, Greenplum Developers
On Thu, Apr 6, 2017 at 6:16 AM, Andreas Scherbaum <asche...@pivotal.io> wrote:
>
> I have a set of Ansible scripts which can install GPDB4 (role: common &
> gpdb4) and GPDB5 build environment (role: common, git2, gpdb5-dev,
> buildclient).
>
> Find everything here:
>
> https://github.com/andreasscherbaum/gpdb-ansible

Thanks Andreas, this is super useful as a reference point, but it seems to still rely on high-level Python scripts and, most sadly, passwordless ssh:
https://github.com/andreasscherbaum/gpdb-ansible/blob/master/roles/gpdb4/tasks/redhat.yml#L403
which is a deal breaker in a lot of environments where Bigtop's Puppet has to operate.

Thanks,
Roman.

Andreas Scherbaum

unread,
Apr 6, 2017, 6:27:21 PM4/6/17
to Roman Shaposhnik, Stanley Sung, Jimmy Yih, Dave Cramer, Shubham Sharma, Keaton Adams, Greenplum Developers
Well, it's Ansible, not Puppet. Different automation systems. I use Ansible to quickly spin up test VMs and install either GPDB 4 or 5 on them, to run tests or show something at a conference. It's agent-less, so I don't have to worry about running a Puppet Master somewhere that the agent on the new VM can reach.

You can get the idea of how to deploy GPDB, but if you want to use another deployment automation tool, you have to rewrite the Playbook into a Manifest.

kad...@pivotal.io

unread,
Apr 7, 2017, 12:09:20 PM4/7/17
to Greenplum Developers, kad...@pivotal.io
Here are my final (non-Developer) thoughts on this, as a production DBA of many years, many of them spent with PostgreSQL in a production environment.

If there is an effort to actually cut a stable, releasable branch of GPOSS 5.x, then for non-Devs there should be a fairly easy way to download an "installer" to facilitate the installation of /usr/local/gpdb. Compiling and installing the product is not "simple" for someone who is not familiar with the Linux Dev environment, or who has no interest in working with Puppets all day, and has no desire or ability to be a master Chef in a kitchen.

It would be very helpful to Data Architects and DBAs who want to use GPDB OSS if the greenplum.org website had a link to download the latest stable release of the product, including all dependencies, etc., so a simple install can be done on their end. With documentation / installation instructions on greenplum.org, of course. Either something packaged by the GP community directly, such as a yum repository, an rpm, a binary, or similar, or links to third-party sites (such as Apache Bigtop possibly) that make it "that easy" to install and run. Just like they can with PostgreSQL, MongoDB, MySQL Community Edition, Cassandra, Hadoop from Hortonworks & Apache Bigtop, etc.

I believe there would be wider adoption of GPDB OSS if there were a distribution of the product such as this: adoption by small companies, startups, not-for-profit organizations, and educational institutions that would never really consider (mainly for funding reasons) purchasing Pivotal GP as a commercial / enterprise customer.

Thanks.


 


Heikki Linnakangas

unread,
Apr 7, 2017, 12:37:09 PM4/7/17
to Roman Shaposhnik, C.J. Jameson, Scott Kahler, Greenplum Developers, Keaton Adams
On 04/06/2017 10:40 PM, Roman Shaposhnik wrote:
> On Thu, Apr 6, 2017 at 12:30 PM, C.J. Jameson <cjam...@pivotal.io> wrote:
>> My sense, given the codebase, is that it'd be very hard to identify a
>> comprehensive API that's stable enough for you to rely on. Even if we
>> characterized it 98%, the small missing parts would be unknown and risks.
>> Greenplum's "surface area", so to speak, isn't well organized like an API, I
>> feel, because of the various entry points.
>
> I think we mean different things when we say "API". What I mean by this
> is literally, GPDB taking responsibility of maintaining a few scripts that
> Bigtop now has to maintain on its own (and risk breakage with every
> release of GPDB). These are far from rocket science:
> https://github.com/apache/bigtop/tree/master/bigtop-deploy/puppet/modules/gpdb/templates
> Basically all of the files under there aside from gpssh.conf and gp_dbid
>
> Take a look for yourself - those are trivial. In fact, we can even merge them
> into a single uber script on the GPDB side with actions that a system
> like Puppet expects.
>
> That script IS the API I'm suggesting here -- nothing more.

FWIW, I totally agree. Looking at those scripts, I'm not sure what
exactly the API is. But yes, something like that.

What are the operations that the puppet scripts need to perform? I'm
guessing it's something like:

* Install the software on a node (probably handled by rpm install or
similar)

* Initialize a data directory on a node

* Configure the data directory so that it becomes part of a cluster.

* Start/stop the server on a node

The crucial point here is that all of those operations operate on a single node. The magic to decide which node is the dispatcher and which ones are segments, which segments they are, and which one is a master and which one a standby, is all in the Puppet configuration. It could be Chef, or it could be hand-written shell scripts, or it could be the existing gpMgmt scripts as well, but the point is to expose and document how to perform those operations on each node, so that you end up with a working cluster.
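Purely as an illustration of that per-node surface (these command names are hypothetical, not existing GPDB utilities), it might look something like:

    # Hypothetical per-node commands -- illustrative only, not actual GPDB tools.
    gpdb-node init   --role segment --content 0 --port 40000 --datadir /data/primary/gpseg0
    gpdb-node join   --master-host mdw --dbid 2
    gpdb-node start  --datadir /data/primary/gpseg0
    gpdb-node stop   --datadir /data/primary/gpseg0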

To get started, I'm OK with saying that those scripts are the API. Until we document it better, I don't think we can promise to keep it stable, certainly not across major releases, but that's OK. We can update the scripts as necessary, as long as we don't break them accidentally.

>> Rather than prescribing the API and trying to stabilize it, we can describe
>> the needs of BigTop and automate checks that it still works. So the idea is:
>> codify BigTop smoke-testing as a downstream check in our pipeline ... not
>> necessarily to block further pushes but to know when it breaks.

+1

> Sure. Testing comes next and Bigtop would be more than happy to participate,
> but lets start with the basics -- GPDB offering a single script with a fixed set
> of actions as a stable API for all consumers like Puppet/Chef/Ansible/etc.

Well, it doesn't necessarily need to be a single script. "pg_ctl start/stop" can be used to start/stop a cluster, for example, and that's separate from "initdb". But I agree with the spirit. Perhaps a README or some other document that lists the operations and how to perform them?

- Heikki

Scott Kahler

unread,
Apr 10, 2017, 9:46:20 AM4/10/17
to Greenplum Developers, asche...@pivotal.io, ys...@pivotal.io, jy...@pivotal.io, dcr...@pivotal.io, topologi...@gmail.com, kad...@pivotal.io
I think the issue you are running into, Roman, is that what you are trying to do there isn't a workflow that has been supported for getting a cluster set up. Thus far it has been about pushing everything through gpinitsystem rather than working outside that process to set everything up and then coalescing that into the catalog. It's a workflow we should come up with a way to handle, as it is more cluster-friendly.