Do not pollute PYTHONPATH and get rid of annoying import _pg error


Hao zhang

Jul 20, 2023, 9:52:47 PM
to gpdb...@greenplum.org

Problem: greenplum-path.sh sets PYTHONPATH so that the gp management utilities work, but this pollutes the user's environment and causes several problems:

  • plpython needs a plpython3.python_path GUC so that plpython UDFs don't search for libraries in the wrong location.
  • The upcoming postgres-ml calls the Python interpreter, which will be polluted in the same way.
  • The PYTHONPATH set by greenplum-path.sh can conflict with other development utilities and break them.
  • gpstart does not report accurate error messages. For example, if the pygresql and Python versions mismatch, gpstart won't print a visible message about it. (Interestingly, gpstop reports the correct error.)

Analysis: After https://github.com/greenplum-db/gpdb/pull/15988 is implemented, the only Python library that still depends on PYTHONPATH being set is gppylib, which is used by gpMgmt/bin and gpMgmt/sbin.

Solution: remove PYTHONPATH from greenplum-path.sh and introduce a setup_env() function that is called at the top of each gp management Python script. In this function, we can set sys.path so that the Python interpreter finds the Python libraries that gp uses.
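
To illustrate why moving the path from PYTHONPATH into sys.path keeps the user's environment clean, here is a minimal sketch (not part of the proposal itself). It compares what a freshly started child interpreter sees in the two cases. The temporary directory is only a stand-in for $GPHOME/lib/python, and the helper name child_sees is made up for this demo.

```python
import os
import subprocess
import sys
import tempfile

def child_sees(path, env):
    """Return True if a freshly started Python has `path` on its sys.path."""
    code = "import sys; print(sys.argv[1] in sys.path)"
    out = subprocess.run(
        [sys.executable, "-c", code, path],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return out.strip() == "True"

# Stand-in for $GPHOME/lib/python (assumption: any existing directory works).
vendored = os.path.realpath(tempfile.mkdtemp())

# Start from an environment with no PYTHONPATH at all.
clean_env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}

# Case 1: path added inside this process, as setup_env() would do.
sys.path.append(vendored)
leaks_via_sys_path = child_sees(vendored, clean_env)

# Case 2: path exported through PYTHONPATH, as greenplum-path.sh does today.
leaks_via_pythonpath = child_sees(vendored, dict(clean_env, PYTHONPATH=vendored))

print("child sees path via sys.path:  ", leaks_via_sys_path)
print("child sees path via PYTHONPATH:", leaks_via_pythonpath)
```

On a stock CPython, only the PYTHONPATH case should reach the child process; a change to sys.path stays inside the process that made it.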

Benefit:

  • The system PYTHONPATH won't be polluted; only the gp processes see the path to the Python libraries we vendor.
  • We can check the Python environment and raise an error that tells the user how to fix it. For example, we can give users a hint like sudo yum install python3-yaml.
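
As a rough sketch of what such a check could look like (the details are assumptions, not a settled design): the table of required modules and the helper find_missing are hypothetical, and only the python3-yaml hint comes from this thread.

```python
import importlib.util
import sys

# Hypothetical table: top-level module name -> install hint shown to the user.
# Only the python3-yaml hint comes from this thread; extend as needed.
REQUIRED_MODULES = {
    "yaml": "sudo yum install python3-yaml",
}

def find_missing(required):
    """Return (module, hint) pairs for top-level modules that cannot be located."""
    return [
        (name, hint)
        for name, hint in required.items()
        if importlib.util.find_spec(name) is None
    ]

def check_env_and_throw_error(required=REQUIRED_MODULES):
    """Exit with a readable message if a required module is missing."""
    missing = find_missing(required)
    if missing:
        lines = ["Missing Python modules for Python %d.%d:" % sys.version_info[:2]]
        lines += ["  %s (try: %s)" % (name, hint) for name, hint in missing]
        sys.exit("\n".join(lines))
```

A presence check like this does not execute the module, so a load failure in a compiled extension would only surface with an actual import wrapped in try/except ImportError.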

Example:

We can have a setup_env.py:

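import os
import sys

def setup_env():
    # Placeholder for the environment check described under "Benefit".
    check_env_and_throw_error()
    gphome = os.environ.get("GPHOME")
    if not gphome:
        sys.exit("GPHOME is not set; source greenplum-path.sh first")
    sys.path.append(os.path.join(gphome, "lib", "python"))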

Then import and call this function at the top of the Python scripts:

from setup_env import setup_env
setup_env()
