What's the story for packaging generated content?


Tim Allen

Mar 29, 2010, 8:20:19 AM
to packagi...@googlegroups.com
The other day I watched the official video of the PyCon 2010 tutorial on
Sphinx documentation, and the guy leading the session spent a couple of
minutes discussing the pros and cons of keeping generated documentation
around. This reminded me of similar problems I'd faced while wrestling
with Twisted's distutils integration, so I thought I'd lay out the
problem space and see if anybody had some Best Practices for dealing
with it, and/or ideas for if and how distutils2 should support it.

Most packages have two levels of distribution: the bare minimum of files
are kept in source-control, an "sdist" package is a verbatim copy of
source-control (except without the working-directory metadata), and
a "bdist" comes with everything needed to get the package working on
a particular platform - .pyc files, compiled C extensions, whatever. The
problem arises with packages that want to have an intermediate stage
between these two extremes. Some use-cases:

- My package includes a C acceleration module written in Pyrex, but
  I don't want to force people to install Pyrex to use it, so I'd like
  to distribute the generated .c files.
- My package includes documentation written in Sphinx, but I don't
  want to force potential users to learn Sphinx before they can use my
  code; I'd like to distribute the generated HTML docs.
- My package's source-control version contains important tools useless
  to end-users, such as release-automation scripts and documentation.
  I'd like to keep them around, but remove them from distributed
- My package contains a core and various libraries that depend on the
  core but not each other. I'd like to keep them together in
  source-control for automated testing and release synchronisation, but
  I'd like to give users separate distributions so they're not
- My package works with Python 2.x, and works with Python 3.x after
  running 2to3 over it, but for simplicity I'd prefer to distribute
  separate 2.x and 3.x packages.
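
To make the first use-case concrete, here is a minimal sketch of how
a setup.py might choose between the real Pyrex source and a shipped,
pre-generated .c file. The function names and the "_speedups" module
name are hypothetical, made up for illustration; this is not what
Twisted or any other project actually does.

```python
import importlib
import os


def generator_available(name="Pyrex"):
    """Return True if the code generator can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def pick_extension_source(module_path, have_generator):
    """Choose which source file to hand to the extension build.

    module_path is the path without a suffix, for example
    "mypkg/_speedups" (a hypothetical name).
    """
    pyx = module_path + ".pyx"
    c_file = module_path + ".c"
    if have_generator and os.path.exists(pyx):
        # Developer checkout with the generator installed:
        # build from the real source.
        return pyx
    if os.path.exists(c_file):
        # No generator (or no .pyx shipped): fall back to the
        # pre-generated C file.
        return c_file
    raise FileNotFoundError(
        "neither %s nor %s is available" % (pyx, c_file))
```

The result would go into the extension's list of sources. Building
from the .pyx branch would still need a build_ext command that
understands .pyx files, and none of this answers the question of how
the .c file gets into the sdist in the first place.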

Firstly: are these problems common enough that distutils should make it
easy to solve them? If not, are they at least common enough that
distutils should make it *possible* to solve them? (I suspect most large
open-source projects will suffer at least some of these - Twisted
certainly suffers the first four - but very few projects are that large.)

Some potential workarounds include:

- Abandon the idea of source-control containing a viable Python
  package; it should just contain source-files and a script that
  creates a directory and populates it with a setup.py and source-code
  and generated documentation and whatever other modifications are
  - pro:
    - No generated material in source-control
    - No messing about with the bowels of distutils
    - You can set up whatever generation scheme you want; you're not
      limited to processing provided by distutils.
  - con:
    - If a user wants to use "version X with patch Y", they can
      either patch against the sdist and use it, or patch against
      source-control and submit it upstream, but not both.
    - No easy way to use a fresh checkout in-place. (no "setup.py
      develop", etc.)
    - Potential contributors might be scared away by the difference
      between what's in source-control and the package structure
      they're familiar with.
- Override distutils' sdist to perform the modifications and processing
  required to generate an sdist package.
  - pro:
    - A fresh checkout looks and mostly works like an sdist package.
    - Integrates with other tools that use sdist internally, like
    - No generated material in source-control
    - You have reasonable flexibility within the scope of distutils'
      sdist command, depending on how much distutils code you want
      to replace.
  - con:
    - You have to mess about with the innards of distutils if you
      want to do anything non-trivial.
    - Since you're changing files, not just copying them, this
      probably means leaving behind the helpful MANIFEST/MANIFEST.in
    - Impossible to generate multiple sdists from the same
      source-control repository.
- Commit generated files to source-control, include steps to regenerate
  generated files in your project's "release check-list" documentation.
  - pro:
    - distutils machinery works exactly as intended, no fuss.
    - Potential contributors see a source-control repository that
      looks exactly as they expect.
    - Developers can use a checkout for development and testing
      without a problem.
  - con:
    - You have generated files in your source-control.
    - No really, you have generated files in your source-control.
    - Still limited to exactly one sdist.
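
As a rough illustration of the first workaround (a script that builds
a package directory from the checkout), something like the following
might do. It is only a sketch under my own assumptions: the function
name, its parameters and the idea of passing generators in as
callables are all invented for this example, and it uses nothing from
distutils.

```python
import shutil
from pathlib import Path


def stage_distribution(checkout, staging, packages, generated=None):
    """Populate a fresh staging tree from a source-control checkout.

    checkout:  root of the working copy
    staging:   directory to create; it must not exist yet
    packages:  relative paths (files or directories) to copy verbatim
    generated: mapping of relative output path -> callable returning
               the text to write there (a setup.py, built docs, ...)
    """
    checkout, staging = Path(checkout), Path(staging)
    # Fails if the directory already exists, so stale files from an
    # earlier run cannot leak into the release.
    staging.mkdir(parents=True)
    for rel in packages:
        src, dst = checkout / rel, staging / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
    # Anything not listed in `packages` (release scripts, say) is
    # simply never copied.
    for rel, make_text in (generated or {}).items():
        out = staging / rel
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(make_text())
    return staging
```

Calling it once per sub-project with a different `packages` list
would give one staging tree per distribution, which is the one thing
the other two workarounds can't offer. The cons listed above still
apply in full: the checkout itself is not an installable package.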

Are there any other workarounds? Have I missed any pros or cons for any
of those items, or claimed pros or cons that don't actually apply?

Is there a Best Way to handle this scenario, and does this Best Way
change in the shiny new world of distutils2?

Thanks for your feedback. :)
