organizing many python scripts, in a large corporate environment.

Message has been deleted

bukzor

unread,

Mar 12, 2011, 12:25:20 AM3/12/11

to

We've been doing a fair amount of Python scripting, and now we have a
directory with almost a hundred loosely related scripts. It's
obviously time to organize this, but there's a problem. These scripts
import freely from each other and although code reuse is generally a
good thing it makes it quite complicated to organize them into
directories.

There's a few things that you should know about our corporate
environment:

1) I don't have access to the users' environment. Editing the
PYTHONPATH is out, unless it happens in the script itself.
2) Users don't install things. Systems are expected to be *already*
installed and working, so setup.py is not a solution.

I'm quite willing to edit my import statements and do some minor
refactoring, but the solutions I see currently require me to divide
all the code strictly between "user runnable scripts" and "libraries",
which isn't feasible, considering the amount of code.

Has anyone out there solved a similar problem? Are you happy with it?
--Buck

eryksun ()

unread,

Mar 12, 2011, 1:14:17 AM3/12/11

to

I'm not an expert at Python packaging, but assuming a structure such as

folder1
\
__init__.py
module1.py
folder2
\
__init__.py
module2.py

Then from the root folder I can run

python -m folder1.module1

and within module1, I can import from module2, e.g.:

from folder2.module2 import foo

The __init__.py files are empty. They make Python treat the folder as a package namespace from which you can import, or execute a module directly with the -m option.

--
EOF

bukzor

unread,

Mar 12, 2011, 11:50:46 AM3/12/11

to

On Mar 11, 10:14 pm, "eryksun ()" <eryk...@gmail.com> wrote:
> I'm not an expert at Python packaging, but assuming a structure such as
>
> folder1
> \
> __init__.py
> module1.py
> folder2
> \
> __init__.py
> module2.py
>

> within module1, I can import from module2, e.g.:

This only works if you can edit the PYTHONPATH. With thousands of
users and dozens of groups each with their own custom environments,
this is a herculean effort.

$ ./folder1/module1.py
module 1 !
Traceback (most recent call last):
File "./folder1/module1.py", line 4, in <module>
import folder2.module2
ImportError: No module named folder2.module2

$ export PYTHONPATH=$PWD
$ ./folder1/module1.py
module 1 !
module 2 !

eryksun ()

unread,

Mar 12, 2011, 2:01:20 PM3/12/11

to

bukzor wrote:
> This only works if you can edit the PYTHONPATH. With thousands of
> users and dozens of groups each with their own custom environments,
> this is a herculean effort.

It works for me without setting PYTHONPATH. Again, I run the module from the root folder with the -m option as a package module (no .py extension), which is easy enough to do with a Windows shortcut.

No doubt there's a way to hack things to make direct execution work, but I don't think it's recommended to directly run a script that's part of a package namespace. Experts on the subject can feel free to chime in.

If you need a site-wide solution that works with Unix shebang, you could symlink the names of all the scripts to a dispatcher. Extract the name from argv[0] based on the symlink filename, and then find the script in the package (or periodically update a stored hash of all scripts). Have the dispatcher run it as a package module with `python -m path.to.module [args]`.

I hope someone else has a better idea...

--
EOF

Tim Johnson

unread,

Mar 12, 2011, 2:37:02 PM3/12/11

to pytho...@python.org

* Phat Fly Alanna <flanne...@gmail.com> [110312 07:22]:

Slightly similar - time doesn't permit details, but I used among
other things 4 methods that worked well for me:
1)'Site modules' with controlling constants,including paths
2)Wrappers for the __import__ function that enabled me to fine -
tune where I was importing from.
3)Cut down on the number of executables by using 'loaderers'.
4)I modified legacy code to take lessons from the MVC architecture,
and in fact my architecture following these changes could be
called 'LMVCC' for
loader
model
view
controller
config

I hope I've made some sense with these brief sentences.
--
Tim
tim at johnsons-web.com or akwebsoft.com
http://www.akwebsoft.com

Tim Johnson

unread,

Mar 12, 2011, 2:54:18 PM3/12/11

to pytho...@python.org

* Tim Johnson <t...@johnsons-web.com> [110312 10:41]:
<...> 3)Cut down on the number of executables by using 'loaderers'.
Sheesh! Typo, meant to say 'loaders'..
sorry

bukzor

unread,

Mar 13, 2011, 7:27:47 PM3/13/11

to

On Mar 12, 12:01 pm, "eryksun ()" <eryk...@gmail.com> wrote:
> bukzor wrote:
> > This only works if you can edit the PYTHONPATH. With thousands of
> > users and dozens of groups each with their own custom environments,
> > this is a herculean effort.
>

> ... I don't think it's recommended to directly run a script that's part of a package namespace. Experts on the subject can feel free to chime in.
>

I think this touches on my core problem. It's dead simple (and
natural) to use .py files simultaneously as both scripts and
libraries, as long as they're in a flat organization (all piled into a
single directory). Because of this, I never expected it to be so
difficult do do the same in a tiered organization. In fact the various
systems, syntaxes, and utilities for import seem to be conspiring to
disallow it. Is there a good reason for this?

Let's walk through it, to make it more concrete:
1) we have a bunch of scripts in a directory
2) we organize these scripts into a hierarchy of directories. This
works except for where scripts use code that exists in a different
directory.
3) we move the re-used code causing issues in #2 to a central 'lib'
directory. For this centralized area to be found by our scipts, we
need to do one of the following
a) install the lib to site-packages. This is unfriendly for
development, and impossible in a corporate environment where the IT-
blessed python installation has a read-only site-packages.
b) put the lib's directory on the PYTHONPATH. This is somewhat
unfriendly for development, as the environment will either be
incorrect or unset sometimes. This goes double so for users.
c) change the cwd to the lib's directory before running the tool.
This is heinous in terms of usability. Have you ever seen a tool that
requires you to 'cd /usr/bin' before running it?
d) (eryksun's suggestion) create symlinks to a "loader" that
exist in the same directory as the lib. This effectively puts us back
to #1 (flat organization), with the added disadvantage of obfuscating
where the code actually exists.
e) create custom boilerplate in each script that addresses the
issues in a-d. This seems to be the best practice at the moment...

Please correct me if I'm wrong. I'd like to be.

--Buck

bukzor

unread,

Mar 13, 2011, 7:35:15 PM3/13/11

to

On Mar 12, 12:37 pm, Tim Johnson <t...@johnsons-web.com> wrote:
> * Phat Fly Alanna <flannelsau...@gmail.com> [110312 07:22]:

Thanks Tim.

I believe I understand it. You create loaders in a flat organization,
in the same directory as your shared library, so that it's found
naturally. These loaders use custom code to find and run the "real"
scripts. This seems to be a combination of solutions d) and e) in my
above post.

This is a solution I hadn't considered.

It seems to have few disadvantages, although it does obfuscate where
to find the "real" code somewhat. It also has the implicit requirement
that all of your scripts can be categorized into a few top-level
categories. I'll have to think about whether this applies in my
situation...

Thanks again,
--Buck

Tim Johnson

unread,

Mar 13, 2011, 8:10:22 PM3/13/11

to pytho...@python.org

* bukzor <workit...@gmail.com> [110313 15:48]:

>
> Thanks Tim.
>
> I believe I understand it. You create loaders in a flat organization,
> in the same directory as your shared library, so that it's found

Not in the same directory as shared libraries.

> naturally. These loaders use custom code to find and run the "real"
> scripts. This seems to be a combination of solutions d) and e) in my
> above post.

In my case, the loader
1)Executes code that would otherwise be duplicated.
2)Loads modules (usually from a lower-level directory) based on
keywords passed as a URL segment.

>
> It seems to have few disadvantages, although it does obfuscate where
> to find the "real" code somewhat. It also has the implicit requirement
> that all of your scripts can be categorized into a few top-level
> categories.

Correct. In my case that is desirable.

> I'll have to think about whether this applies in my
> situation...

cheers

Terry Reedy

unread,

Mar 13, 2011, 9:50:08 PM3/13/11

to pytho...@python.org

On 3/13/2011 7:27 PM, bukzor wrote:

> I think this touches on my core problem. It's dead simple (and
> natural) to use .py files simultaneously as both scripts and
> libraries, as long as they're in a flat organization (all piled into a
> single directory). Because of this, I never expected it to be so
> difficult do do the same in a tiered organization. In fact the various
> systems, syntaxes, and utilities for import seem to be conspiring to
> disallow it. Is there a good reason for this?
>
> Let's walk through it, to make it more concrete:
> 1) we have a bunch of scripts in a directory
> 2) we organize these scripts into a hierarchy of directories. This
> works except for where scripts use code that exists in a different
> directory.
> 3) we move the re-used code causing issues in #2 to a central 'lib'
> directory. For this centralized area to be found by our scipts, we
> need to do one of the following
> a) install the lib to site-packages. This is unfriendly for
> development,

I find it very friendly for development. I am testing in the same
environment as users will have. I do intra-package imports with absolute
imports. I normally run from IDLE edit windows, so I just tied running
'python -m pack.sub.mod' from .../Python32 (WinXp, no PATH addition for
Python) and it seems to work fine.

> impossible in a corporate environment where the IT-
> blessed python installation has a read-only site-packages.

My package is intended for free individuals, not straight-jacketed
machines in asylums ;-).

--
Terry Jan Reedy

eryksun ()

unread,

Mar 14, 2011, 1:52:54 AM3/14/11

to

On Sunday, March 13, 2011 7:27:47 PM UTC-4, bukzor wrote:

> e) create custom boilerplate in each script that addresses the
> issues in a-d. This seems to be the best practice at the moment...

The boilerplate should be pretty simple. For example, if the base path is the parent directory, then the following works for me:

import os.path
import sys
base = os.path.dirname(os.path.dirname(__file__))
sys.path.insert(0, base)

Of course, you need to have the __init__.py in each subdirectory.

Frank Millman

unread,

Mar 14, 2011, 2:09:20 AM3/14/11

to pytho...@python.org

"bukzor" <workit...@gmail.com> wrote

> Let's walk through it, to make it more concrete:
> 1) we have a bunch of scripts in a directory
> 2) we organize these scripts into a hierarchy of directories. This
> works except for where scripts use code that exists in a different
> directory.
> 3) we move the re-used code causing issues in #2 to a central 'lib'
> directory. For this centralized area to be found by our scipts, we
> need to do one of the following
> a) install the lib to site-packages. This is unfriendly for
> development, and impossible in a corporate environment where the IT-
> blessed python installation has a read-only site-packages.
> b) put the lib's directory on the PYTHONPATH. This is somewhat
> unfriendly for development, as the environment will either be
> incorrect or unset sometimes. This goes double so for users.
> c) change the cwd to the lib's directory before running the tool.
> This is heinous in terms of usability. Have you ever seen a tool that
> requires you to 'cd /usr/bin' before running it?
> d) (eryksun's suggestion) create symlinks to a "loader" that
> exist in the same directory as the lib. This effectively puts us back
> to #1 (flat organization), with the added disadvantage of obfuscating
> where the code actually exists.
> e) create custom boilerplate in each script that addresses the
> issues in a-d. This seems to be the best practice at the moment...

Disclaimers -

1. I don't know if this will solve your problem.
2. Even if it does, I don't know if this is good practice - I suspect not.

I put the following lines at the top of __init__.py in my package
directory -
import os
import sys
sys.path.insert(0, os.path.dirname(__file__))

This causes the package directory to be placed in the search path.

In your scripts you have to 'import' the package first, to ensure that these
lines get executed.

My 2c

Frank Millman

bukzor

unread,

Mar 14, 2011, 2:38:50 AM3/14/11

to

I've written this many times. It has issues. In fact, I've created a
library for this purpose, for the following reasons.

What if your script is compiled? You add an rstrip('c') I guess.
What if your script is optimized (python -O). You add an rstrip('o')
probably.
What if someone likes your script enough to simlink it elsewhere? You
add a realpath().
In some debuggers (pudb), __file__ is relative, so you need to add
abspath as well.
Since you're putting this in each of your scripts, it's wise to check
if the directory is already in sys.path before inserting.
We're polluting the global namespace a good bit now, so it would be
good to wrap this in a function.
To make our function more general (especially since we're going to be
copying it everywhere), we'd like to use relative paths.

In total, it looks like below after several years of trial, error, and
debugging. That's a fair amount of code to put in each script, and is
a maintenance headache whenever something needs to be changed. Now the
usual solution to that type of problem is to put it in a library, but
the purpose of this code is to give access to the libraries... What I
do right now is to symlink this library to all script directories to
allow them to bootstrap and gain access to libraries not in the local
directory.

I'd really love to delete this code forever. That's mainly why I'm
here.

#!/not/executable/python
"""
This module helps your code find your libraries more simply, easily,
reliably.

No dependencies apart from the standard library.
Tested in python version 2.3 through 2.6.
"""

DEBUG = False

#this is just enough code to give the module access to the libraries.
#any other shared code should go in a library

def normfile(fname):
"norm the filename to account for compiled and symlinked scripts"
from os.path import abspath, islink, realpath
if fname.endswith(".pyc") or fname.endswith(".pyo"):
fname = fname[:-1]
if fname.startswith('./'):
#hack to get pudb to work in a script that changes directory
from os import environ
fname = environ['PWD'] + '/' + fname
if islink(fname):
fname = realpath(fname)
return abspath(fname)

def importer(depth=1):
"get the importing script's __file__ variable"
from inspect import getframeinfo
frame = prev_frame(depth + 1)
if frame is None:
return '(none)'
else:
return normfile( getframeinfo(frame)[0] )

def prev_frame(depth=1):
"get the calling stack frame"
from inspect import currentframe
frame = currentframe()
depth += 1
try:
while depth:
frame = frame.f_back
depth -= 1
except (KeyError, AttributeError):
frame = None
return frame

def here(depth=1):
"give the path to the current script's directory"
from os.path import dirname
return dirname(importer(depth))

def use(mydir, pwd=None):
"""
add a directory to the Python module search path
relative paths are relative to the currently executing script,
unless specified otherwise
"""
from os.path import join, normpath
from sys import path

if not pwd:
pwd = here(2)
if not mydir.startswith("/"):
mydir = join(pwd, mydir)
mydir = normpath(mydir)
path.insert(1, mydir)
return mydir

bukzor

unread,

Mar 14, 2011, 2:40:38 AM3/14/11

to

virtualenv would let me install into site-packages without needing to
muck with IT. Perhaps I should look closer at that..

--Buck

Jean-Michel Pichavant

unread,

Mar 14, 2011, 6:32:44 AM3/14/11

to bukzor, pytho...@python.org

Short story:
- Too late ! haha

Long story:
- Windows: no clue
- Unix like:
* There was a time I didn't have root access to my machine. In
that case what most people do is create a ~/bin and ~/lib directory
where libraries and executable will be placed in. Works for everything
including python and most installers have a --prefix options to change
the installation root directory. I have a ~/lib/python2.5/site-packages
for instance where python package are installed, and I don't need root
access to install official packages.
* Ask your IT to install on every user python site a symbolic
link to a network directory where you'll install your package

I'm sopping here 'cause from your OP, I have the feeling it's a locked
Windows environment (troll free statement).

JM

eryksun ()

unread,

Mar 14, 2011, 9:45:51 AM3/14/11

to

On Monday, March 14, 2011 2:38:50 AM UTC-4, bukzor wrote:

> I've written this many times. It has issues. In fact, I've created a
> library for this purpose, for the following reasons.

If you're linking to a common file, couldn't you just add in the base folder there? I don't think it's a bad practice to hard-code an absolute path in a single file. If the path changes you only have to update one line. For example:

# script.py
import _path # _path.py is a symbolic link

# _path.py:
base = '/absolute/path/to/base'
import site
site.addsitedir(base)

This also adds paths from any .pth files located in the base folder, but that shouldn't be necessary if all of the scripts are located within one package and sub-packages.

On a Posix-compliant system, you should be able to use the following to avoid hard-coding the path (untested, though):

# _path.py:
import os.path
import site
base = os.path.dirname(os.path.realpath(__file__))
site.addsitedir(base)

Possibly it has to be abspath(realpath(__file__)), but the docs say realpath returns a 'canonical format', which should be the absolute path. I can't check right now, and this doesn't work for me at all on Windows. Python can't resolve the symbolic links created by mklink to the real path, which is odd since mklink has been standard in Windows for many years now. Maybe Python 3.x handles it better.

eryksun ()

unread,

Mar 14, 2011, 3:56:15 PM3/14/11

to

On Monday, March 14, 2011 9:45:51 AM UTC-4, eryksun () wrote:
>
> If you're linking to a common file, couldn't you just add in
> the base folder there?
>

> ...

>
> # script.py
> import _path # _path.py is a symbolic link
>
> # _path.py:
> base = '/absolute/path/to/base'
> import site
> site.addsitedir(base)

To be clear on the file structure, I'm picturing that 'base' is a path on each user's shell path where all the accessible scripts are linked, and that this is also the package directory. So when a linked script runs "import _path" it will import the _path.py that's located in base, which adds the base path to Python's sys.path. On the other hand, any subsequently imported modules will be found by searching sys.path. Thus each subdirectory needs the additional symbolic link back to the base _path.py. It's a bit convoluted, but so are the constraints of this problem -- at least to me.

eryksun ()

unread,

Mar 14, 2011, 4:30:14 PM3/14/11

to

On Monday, March 14, 2011 3:56:15 PM UTC-4, eryksun () wrote:
> To be clear on the file structure, I'm picturing that 'base' is a
> path on each user's shell path where all the accessible scripts
> are linked, and that this is also the package directory.

Wait, this won't work when the script is linked to from somewhere else, which means the code still has to be based on __file__ or sys.argv[0] or sys.path[0], and have to get the absolute/real path in case it's a link.

Along those lines, you (bukzor) wrote that

> What I do right now is to symlink this library to all script
> directories to allow them to bootstrap and gain access to
> libraries not in the local directory.

Won't this also fail if it's running from a link? The link to the library won't necessarily be in the current directory.

Terry Reedy

unread,

Mar 14, 2011, 10:08:01 PM3/14/11

to pytho...@python.org

On 3/14/2011 4:31 PM, bruce bushby wrote:
>
> but has anybody seen any
> efforts to allow python to "import modules via a socket"

I do not remember any such thing.

--
Terry Jan Reedy

bukzor

unread,

Mar 15, 2011, 12:44:48 AM3/15/11

to

You're right! QQ
Currently it requires either: 1) no symlinks to scripts or 2)
installation of the pathtools to site-packages.

Mostly I came here because I felt this was a real pain with no good
solution, and felt that I must be missing something essential if
nobody else is thinking or talking about it. When looking at google
code search, this kind of code is rampant (below). Is everyone really
happy with this?

sys.path.insert(0,
os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))

see: http://www.google.com/codesearch?hl=en&lr=&q=sys%5C.path.*__file__

eryksun ()

unread,

Mar 15, 2011, 6:45:05 AM3/15/11

to

On Tuesday, March 15, 2011 12:44:48 AM UTC-4, bukzor wrote:
> When looking at google code search, this kind of code is
> rampant (below). Is everyone really happy with this?
>
> sys.path.insert(0,
> os.path.dirname(os.path.dirname(
> os.path.dirname(os.path.abspath(__file__)))))

Applications and libraries should be properly packaged (e.g. distutils, egg, rpm, deb, msi, etc). If that can't be done for some reason, then we're stuck with kludges.

If you could just get a .pth file symlink'd to site-packages.

booklover

unread,

Mar 15, 2011, 3:24:43 PM3/15/11

to

> Is everyone really happy with this?

I'm not happy with this. In fact, if Python 3.3 came out with a
solution for this problem, it would be a major motivation for me to
migrate.

I don't think that it would take much to fix either. Perhaps if Python
looked in the current directory for ".pth" files? Instead of having
some boiler-plate at the top of every file, you could specify your
paths there. I haven't thought about it enough to know that this idea
specifically is the best way to go, but it would be nice if there were
some way of solving this cleanly.

bukzor

unread,

Mar 16, 2011, 2:32:23 AM3/16/11

to

I'm going to try to get our solution open-sourced, then I'll get your
feedback on it.

eryksun ()

unread,

Mar 16, 2011, 2:57:00 AM3/16/11

to

On Tuesday, March 15, 2011 12:44:48 AM UTC-4, bukzor wrote:
>

> Currently it requires either: 1) no symlinks to scripts or 2)
> installation of the pathtools to site-packages.

An executable with a unique name on the system PATH could be executed it as a subprocess that pipes the configured base directory back via stdout. Then you can just do site.addsitedir(base).

booklover

unread,

Mar 16, 2011, 10:42:04 AM3/16/11

to

> I'm going to try to get our solution open-sourced, then I'll get your
> feedback on it.

Thanks bukzor! I think that it would be very helpful to have a library
like this available.

In the longer term, what do people think about the possibility of
writing up a PEP to fix this problem in the core language? Anyone have
any ideas on the cleanest way to achieve this goal?

bukzor

unread,

Mar 16, 2011, 6:28:07 PM3/16/11

to

If we had relative imports for scripts as well as package modules,
that would be the end of it.
When the relative import escapes the current package (or module) start
using filesystem semantics, where another dot means a parent
directory, and something that looks like a package is just a
directory. This replaces the "relative import from non-package" error,
so backward-compatibility is a non-issue.

bin/parrot/speak.py:
def say(text):
print text

bin/parrot/mccaw.py
#this is a script, not a module!
from ..speak import say

bin/guy/__init__.py
from ..parrot.speak import say

bukzor

unread,

Mar 16, 2011, 6:29:37 PM3/16/11

to

I'm not able to picture this. Could you add detail?

bukzor

unread,

Mar 16, 2011, 9:03:19 PM3/16/11

to

I finally understand. You mean something along the lines of `kde-
config`: an executable to help figure out the configuration at
runtime. This requires either installation or control of the $PATH
environment variable to work well, which makes not so useful to me.

eryksun ()

unread,

Mar 17, 2011, 8:46:38 AM3/17/11

to

On Wednesday, March 16, 2011 9:03:19 PM UTC-4, bukzor wrote:
>
> I finally understand. You mean something along the lines of `kde-
> config`: an executable to help figure out the configuration at
> runtime. This requires either installation or control of the $PATH
> environment variable to work well, which makes not so useful to me.

There's always the virtualenv option in a POSIX system. Just change the shebang to point to your package's virtual Python installation. That obviously won't work for Windows, which uses system-wide file-type associations. I think the closest to shebang-style control on Windows would require changing the file extension to something like 'py_' and having an admin add the appropriate ftype/assoc to the system, the same as is done for the pyw extension. That would be obscene.

If the problem with installation is merely continuing to have central access to the scripts for development, you could deploy with setuptools in development mode:

http://packages.python.org/distribute/setuptools.html#development-mode
http://packages.python.org/distribute/setuptools.html#develop

However, I think over time you may come to regret this as other group's projects become dependent on your code's behavior (including bugs) and a rigid interface.