Input File format for cluster initialization utility


Piyush Chandwadkar

Feb 2, 2023, 1:04:45 PM
to 'Piyush Chandwadkar' via Greenplum Developers

Hello Community, 

As part of improving the user experience when creating a GPDB cluster, we are planning to redesign the cluster initialization utility.

We are currently looking for suggestions on the format of the utility's input configuration file.

The existing gpinitsystem utility reads its input from a plain-text config file of key-value pairs that depends heavily on bash shell script formatting. Some options can be passed on the command line, while others, such as heap-checksum and hosts-list, must come from the input file. The current config file is also not self-contained: it points to other files, such as the host list and the add-on configuration.

The new utility will support reading the configuration from a file or from STDIN, and will also support generating a sample input file. The file format will be JSON. The configuration file must provide all the information required to create the cluster. The key point is that this input file is fully self-contained and does not reference any other configuration file. All cluster configuration, including the coordinator and the segments, lives in a single place to make configuration management easy.

The cluster creation command will look as follows.

General format:

gpctl init [input/output options]

Command options:

gpctl init <input-config-file.json>: reads the given config file and creates the cluster

gpctl init -: reads the input config from STDIN and creates the cluster

gpctl init -o <output-config-file>: creates a sample config file for cluster initialization. Prompts the user for details of the cluster configuration, such as hosts, data directory, and segment prefix.

gpctl init -o -: creates a sample config file for cluster initialization and prints it to STDOUT. Prompts the user for the same cluster details.
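To make the four invocations concrete, here is a minimal Python sketch of how the arguments after `gpctl init` could be classified. It is only an illustration of the proposed behavior, not the utility's actual code: `InitRequest`, `parse_init_args` and `read_config_text` are hypothetical names, and the rule that `-` means STDIN or STDOUT is taken from the option list above.

```python
import sys
from typing import NamedTuple, Optional, Sequence


class InitRequest(NamedTuple):
    mode: str            # "create" (build a cluster) or "generate" (write a sample)
    path: Optional[str]  # None means STDIN for "create", STDOUT for "generate"


def parse_init_args(args: Sequence[str]) -> InitRequest:
    """Classify the arguments that follow `gpctl init` (hypothetical helper)."""
    if len(args) == 1 and args[0] != "-o":
        # `gpctl init <file>` or `gpctl init -`
        return InitRequest("create", None if args[0] == "-" else args[0])
    if len(args) == 2 and args[0] == "-o":
        # `gpctl init -o <file>` or `gpctl init -o -`
        return InitRequest("generate", None if args[1] == "-" else args[1])
    raise ValueError("usage: gpctl init <config.json | -> or gpctl init -o <file | ->")


def read_config_text(request: InitRequest, stdin=sys.stdin) -> str:
    """Return the raw config text from the named file or from STDIN."""
    if request.mode != "create":
        raise ValueError("only 'create' requests read a config")
    if request.path is None:
        return stdin.read()
    with open(request.path, encoding="utf-8") as handle:
        return handle.read()
```

Keeping the file-or-STDIN decision in one place means the rest of the tool only ever sees the config text, which is what makes piping a generated config straight into cluster creation possible.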

 

Sample cluster-input-config.json file: 


{
  "cluster-name": "GPDB",
  "encoding": "Unicode",
  "locale": {
    "lc-all": "utf-8",
    "lc-collate": "utf-8",
    "lc-ctype": "utf-8",
    "lc-messages": "utf-8",
    "lc-monetary": "utf-8",
    "lc-numeric": "utf-8",
    "lc-time": "utf-8"
  },
  "heap-checksum": true,
  "hba-hostnames": true,
  "su-password": "gparray",
  "shared-buffer": "128000kB",
  "postgresAddOnConfig": [
    "debug_pretty_print = off",
    "log_min_messages = warning",
    "lc_time = utf8"
  ],
  "coordinator-config": {"maxConnections": 50},
  "segment-config": {"maxConnections": 150},
  "coordinator": {"hostname": "cdw", "port": 7000, "directory": "/data/datadirs/qddir/"},
  "primary-segments-array": [
    {"hostname": "sdw1", "address": "192.168.1.5", "port": 7002, "data-directory": "/data/datadirs/gpseg1"},
    {"hostname": "sdw1", "address": "192.168.1.5", "port": 7003, "data-directory": "/data/datadirs/gpseg2"},
    {"hostname": "sdw2", "address": "192.168.1.6", "port": 7002, "data-directory": "/data/datadirs/gpseg3"},
    {"hostname": "sdw2", "address": "192.168.1.6", "port": 7003, "data-directory": "/data/datadirs/gpseg4"}
  ]
}
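Because everything lives in one file, basic sanity checks can run before any host is touched. The Python sketch below is one possible shape for such a check, not part of the proposal: `validate_cluster_config` and the `REQUIRED_KEYS` list are assumptions, and the key names (`coordinator`, `primary-segments-array`, `hostname`, `port`, `data-directory`) are taken from the sample above.

```python
import json

# Assumed minimum set of top-level keys; the proposal does not define one.
REQUIRED_KEYS = ("coordinator", "primary-segments-array")


def validate_cluster_config(text):
    """Return a list of problems found in the config text (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if problems:
        return problems

    coordinator = config["coordinator"]
    # A port or data directory may be used only once per host.
    seen_ports = {(coordinator.get("hostname"), coordinator.get("port"))}
    seen_dirs = set()
    for index, segment in enumerate(config["primary-segments-array"]):
        host = segment.get("hostname")
        port_key = (host, segment.get("port"))
        dir_key = (host, segment.get("data-directory"))
        if port_key in seen_ports:
            problems.append(f"segment {index}: port {port_key[1]} already used on {host}")
        if dir_key in seen_dirs:
            problems.append(f"segment {index}: data directory {dir_key[1]} already used on {host}")
        seen_ports.add(port_key)
        seen_dirs.add(dir_key)
    return problems
```

Returning a list of problems rather than raising on the first one lets the user fix a hand-edited file in a single pass.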

 

Inputs/Thoughts? 

 

 

Regards,
Piyush 

 

Kirill Reshke

Feb 2, 2023, 2:54:49 PM
to Piyush Chandwadkar, 'Piyush Chandwadkar' via Greenplum Developers
Hi! Thanks for working on this.
In general, the config file design looks acceptable to me. No issues,
just two thoughts:
1) The config file format is planned to be JSON. But, as you are going to
write some golang code, it would require zero effort to support YAML
or TOML input at the same time (you just need to define an
additional tag in your golang config structure). So, maybe it's worth
it? IMO, it would make the ```gpctl``` utility more flexible.
2) Should we support a segment log file path as an additional field in
the ```primary-segments-array``` section? In some cases, storing segment
log files in the PostgreSQL data directory is undesirable.
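The idea behind point 1 is that every format is parsed into the same in-memory structure, so only the loader needs to know which format was used. In Go that would be done with struct tags; the sketch below shows the same idea in Python as a language-neutral illustration. `PARSERS` and `parse_config` are hypothetical names, choosing the parser by file extension is an assumption, and YAML is left out only because Python has no YAML parser in its standard library.

```python
import json
from pathlib import PurePath

# Map file extensions to functions that turn text into a dict.
PARSERS = {".json": json.loads}

try:
    import tomllib  # in the standard library from Python 3.11 onwards
    PARSERS[".toml"] = tomllib.loads
except ImportError:
    pass  # older interpreter: only JSON is registered


def parse_config(filename, text):
    """Parse config text with the parser registered for the file's extension."""
    suffix = PurePath(filename).suffix.lower()
    try:
        parser = PARSERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported config format: {suffix or filename!r}") from None
    return parser(text)
```

Everything downstream of `parse_config` works on a plain dictionary, so adding another format would mean registering one more parser and nothing else.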

Ashwin Agrawal

Feb 3, 2023, 12:32:31 PM
to Kirill Reshke, Piyush Chandwadkar, 'Piyush Chandwadkar' via Greenplum Developers
On Thu, Feb 2, 2023 at 11:54 AM Kirill Reshke <reshke...@gmail.com> wrote:
> Hi! Thanks for working on this.
> In general, the config file design looks acceptable to me.

Thanks for the feedback, appreciate it.
 
> No issues,
> just two thoughts:
> 1) The config file format is planned to be JSON. But, as you are going to
> write some golang code, it would require zero effort to support YAML
> or TOML input at the same time (you just need to define an
> additional tag in your golang config structure). So, maybe it's worth
> it? IMO, it would make the ```gpctl``` utility more flexible.

Sure, we plan to look into it in a later iteration. We are starting with JSON
to stay focused and keep moving forward.
 
> 2) Should we support a segment log file path as an additional field in
> the ```primary-segments-array``` section? In some cases, storing segment
> log files in the PostgreSQL data directory is undesirable.

We will first have to build the feature into the core server; as you said, the
log location is currently hard-coded in GPDB. Most likely a GUC will control the
behavior, and it can be set via the input config presented here like any other GUC.
There is a high possibility it will be implemented as a symbolic link from the log
directory inside the data directory to the specified location, similar to the
tablespace implementation. So it should not have much impact on the initsystem config.
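To illustrate what "set via input-config like any other GUC" could look like on the utility side, here is a small Python sketch that splits `postgresAddOnConfig` entries into name/value pairs. `parse_guc_entries` is a hypothetical helper, the entries are the ones from the sample config in the first message, and how the utility would actually apply the settings is not specified in this thread.

```python
def parse_guc_entries(entries):
    """Split 'name = value' strings into a dict, rejecting malformed entries."""
    settings = {}
    for entry in entries:
        name, separator, value = entry.partition("=")
        name, value = name.strip(), value.strip()
        if not separator or not name or not value:
            raise ValueError(f"malformed setting: {entry!r}")
        settings[name] = value
    return settings


# Entries copied from the sample config, including their uneven spacing.
sample = [" debug_pretty_print = off", "log_min_messages = warning", "lc_time= utf8"]
print(parse_guc_entries(sample))
```

A future log-location setting would arrive as one more string in the same list, so the input file format itself would not need to change.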

--
Ashwin Agrawal (VMware)