Problem: greenplum-path.sh sets PYTHONPATH so that the gp management utilities work, but this pollutes the user's environment and causes a lot of trouble:
- plpython needs a plpython3.python_path GUC so that plpython UDFs don't search for libraries in the wrong location.
- The upcoming postgres-ml also calls the Python interpreter, which will be polluted in the same way.
- The PYTHONPATH set by greenplum-path.sh can conflict with other development tools and make them unusable.
- gpstart does not report errors accurately. For example, if the pygresql and Python versions mismatch, gpstart prints no visible error message (interestingly, gpstop does report the correct error).
Analysis: Once https://github.com/greenplum-db/gpdb/pull/15988 is implemented, the only Python library that still relies on PYTHONPATH is gppylib, which is used by the scripts in gpMgmt/bin and gpMgmt/sbin.
Solution: remove PYTHONPATH from greenplum-path.sh and introduce a setup_env() function that runs at the top of each gp management Python script. This function sets sys.path so that the Python interpreter can find the libraries gp uses.
Benefits:
- The global PYTHONPATH is no longer polluted; only gp processes see the path to the Python libraries we vendor.
- We can check the Python environment and fail with an error that tells the user how to fix it, for example a hint such as "sudo yum install python3-yaml".
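The environment check described in the second benefit could look roughly like the sketch below. It is an illustration, not a settled design: MIN_PYTHON, REQUIRED_MODULES and EnvError are hypothetical names, the minimum version is a placeholder, and only the python3-yaml hint comes from this proposal. The idea is to fail fast, before any gppylib import, with a readable message instead of an obscure traceback.

```python
import importlib
import os
import sys

# Hypothetical values for illustration; the real minimum version and module
# list would come from the gp packaging requirements.
MIN_PYTHON = (3, 9)
REQUIRED_MODULES = {
    # module name -> hint shown to the user when it cannot be imported
    "yaml": "sudo yum install python3-yaml",
}


class EnvError(RuntimeError):
    """Raised when the Python environment cannot run gp management utilities."""


def check_env_and_throw_error(required=REQUIRED_MODULES, min_python=MIN_PYTHON):
    problems = []
    if not os.environ.get("GPHOME"):
        problems.append("GPHOME is not set; source greenplum-path.sh first")
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]} or newer is required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for module, hint in required.items():
        try:
            importlib.import_module(module)
        except ImportError as exc:
            # Missing packages, and typically extension modules built for a
            # different Python version, both surface as ImportError.
            problems.append(f"cannot import {module} ({exc}); try: {hint}")
    if problems:
        # Report every problem at once instead of stopping at the first one.
        raise EnvError("\n".join(problems))
```

A script could catch EnvError at the top, print the message to stderr and exit non-zero, which would also give gpstart the visible error it currently lacks.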
Example:
We could add a setup_env.py module along these lines:
import os
import sys

def setup_env():
    check_env_and_throw_error()  # validate GPHOME, Python version and dependencies; to be implemented
    sys.path.append(os.path.join(os.environ["GPHOME"], "lib", "python"))
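Then import and call this function at the top of each Python script, before any gppylib import (a script's own directory is on sys.path, so setup_env.py can be found if it sits next to the scripts):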
from setup_env import setup_env
setup_env()  # must run before importing gppylib
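To illustrate why this keeps the user's environment clean, the sketch below builds a throwaway GPHOME containing a stand-in package, calls a setup_env() like the one above (without the environment check), and imports the package. The package name fake_gppylib is a hypothetical placeholder, not the real gppylib. Because setup_env() only changes sys.path of the current interpreter, PYTHONPATH stays unset and a child Python process does not see the vendored directory.

```python
import importlib
import os
import subprocess
import sys
import tempfile


def setup_env():
    # Same idea as setup_env.py above, without the environment check.
    gphome = os.environ.get("GPHOME")
    if not gphome:
        raise RuntimeError("GPHOME is not set; source greenplum-path.sh first")
    lib_dir = os.path.join(gphome, "lib", "python")
    if lib_dir not in sys.path:  # stay idempotent if called more than once
        sys.path.append(lib_dir)
    return lib_dir


def demo():
    """Return (module file, lib dir, child sees lib dir?, PYTHONPATH set?)."""
    saved = {k: os.environ.get(k) for k in ("GPHOME", "PYTHONPATH")}
    try:
        with tempfile.TemporaryDirectory() as gphome:
            # Hypothetical stand-in for a vendored package in $GPHOME/lib/python.
            pkg = os.path.join(gphome, "lib", "python", "fake_gppylib")
            os.makedirs(pkg)
            with open(os.path.join(pkg, "__init__.py"), "w") as f:
                f.write("VENDORED = True\n")

            os.environ["GPHOME"] = gphome
            os.environ.pop("PYTHONPATH", None)  # as with the proposed greenplum-path.sh
            lib_dir = setup_env()

            # Found through this process's sys.path, not through PYTHONPATH.
            importlib.invalidate_caches()
            sys.modules.pop("fake_gppylib", None)
            mod = importlib.import_module("fake_gppylib")

            # A child interpreter inherits os.environ, but not our sys.path.
            child = subprocess.run(
                [sys.executable, "-c", "import sys; print(sys.argv[1] in sys.path)", lib_dir],
                capture_output=True, text=True, check=True,
            )
            return (mod.__file__, lib_dir, child.stdout.strip() == "True",
                    "PYTHONPATH" in os.environ)
    finally:
        # Restore the caller's environment.
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


if __name__ == "__main__":
    print(demo())
```

When run, the last two values should both be False: the vendored path exists only inside the interpreter that called setup_env(), which is exactly the isolation the proposal is after.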