[Web-SIG] A Python Web Application Package and Format

Ian Bicking

Apr 1, 2011, 4:55:59 PM

to Web SIG

Hi all. I wrote a blog post. I would be interested in reactions from this crowd.

http://blog.ianbicking.org/2011/03/31/python-webapp-package/

Copied to allow responses:

At PyCon there was an open space about deployment, and the idea of drop-in applications (Java-WAR-style).

I generally get pessimistic about 80% solutions, and dropping in a WAR file feels like an 80% solution to me. I’ve used the Hudson/Jenkins installer (which I think is specifically a project that got WARs on people’s minds), and in a lot of ways that installer is nice, but it’s also kind of wonky: it makes configuration unclear, it’s not always clear when it installs or configures itself through the web and when you have to do this at the system level, and it’s not clear where it puts files and data, etc. So a great initial experience doesn’t feel like a great ongoing experience to me, and it doesn’t have to be that way. If those were necessary compromises, sure, but they aren’t. And because Python doesn’t already have WAR files, if we’re proposing to make something new, we have every opportunity to make things better.

So the question then is what we’re trying to make. To me: we want applications that are easy to install, that are self-describing, self-configuring (or at least guide you through configuration), reliable with respect to their environment (not dependent on system tweaking), upgradable, and respectful of persistence (the data that outlives the application install). A lot of this can be done by the "container" (to use Java parlance; or "environment") — if you just have the app packaged in a nice way, the container (server environment, hosting service, etc) can handle all the system-specific things to make the application actually work.

At which point I am of course reminded of my Silver Lining project, which defines something very much like this. Silver Lining isn’t just an application format, and things aren’t fully extracted along these lines, but it’s pretty close and it addresses a lot of important issues in the lifecycle of an application. To be clear: Silver Lining is an application packaging format, a server configuration library, a cloud server management tool, a persistence management tool, and a tool to manage the application with respect to all these services over time. It is a bunch of things, maybe too many things, so it is not unreasonable to pick out a smaller subset to focus on. Maybe an easy place to start (and good for Silver Lining itself) would be to separate at least the application format (and tools to manage applications in that state, e.g., installing new libraries) from the tools that make use of such applications (deploy, etc).

Some opinions I have on this format, exemplified in Silver Lining:

It’s not zipped or a single file, unlike WARs. Uploading zip files is not a great API. Geez. I know there’s this desire to "just drop in a file", but there’s no getting around the fact that "dropping a file" becomes a deployment protocol, and it’s an incredibly impoverished protocol. The format is also not subtly git-based (à la Heroku) because git push is not a good deployment protocol.
But of course there isn’t really any deployment protocol implied by a format anyway, so maybe I’m getting ahead of myself ;) I’m saying a tool that deploys should take a directory as an argument, not a single file. (If the tool then zips it up and uploads it, fine!)
Configuration "comes from the outside". That is, an application requests services, and the container tells the application where those services are. For Silver Lining I’ve used environmental variables. I think this one point is really important — the container tells the application. As a counter-example, an application that comes with a Puppet deployment recipe is essentially telling the server how to arrange itself to suit the application. This will never be reliable or simple!
The application indicates what "services" it wants; for instance, it may want to have access to a MySQL database. The container then provides this to the application. In practice this means installing the actual packages, but also creating a database and setting up permissions appropriately. The alternative is never having any dependencies, meaning you have to use SQLite databases or ad hoc structures, etc. But in fact installing databases really isn’t that hard these days.
All persistence has to use a service of some kind. If you want to be able to write to files, you need to use a file service. This means the container is fully aware of everything the application is leaving behind. All the various paths an application should use are given in different environmental variables (many of which don’t need to be invented anew, e.g., $TMPDIR).
It uses vendor libraries exclusively for Python libraries. That means the application bundles all the libraries it requires. Nothing ever gets installed at deploy-time. This is in contrast to using a requirements.txt list of packages at deployment time. If you want to use those tools for development that’s fine, just not for deployment.
There is also a way to indicate other libraries you might require; e.g., you might need lxml, or even something that isn’t quite a library, like git (if you are making a GitHub clone). You can’t do those as vendor libraries (they include non-portable binaries). Currently in Silver Lining the application description can contain a list of Ubuntu package names to install. Of course that would have to be abstracted some.
You can ask for scripts or a request to be invoked for an application after an installation or deployment. It’s lame to test whether the app is installed on every request, which is the frequent alternative. Also, it gives the application the chance to signal that the installation failed.
It has a very simple (possibly/probably too simple) sense of configuration. You don’t have to use this if you make your app self-configuring (i.e., build in a web-accessible settings screen), but in practice it felt like some simple sense of configuration would be helpful.
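The "configuration comes from the outside" rule can be sketched as an application that only reads what the container hands it. This is a minimal illustration under assumptions: the `SERVICE_MYSQL_*` variable names are hypothetical (they are not Silver Lining's actual names); only `$TMPDIR` is an existing convention.

```python
from dataclasses import dataclass
import tempfile
from typing import Mapping, Optional


@dataclass(frozen=True)
class MySQLService:
    host: str
    port: int
    dbname: str
    user: str
    password: str


def mysql_from_env(environ: Mapping[str, str]) -> Optional[MySQLService]:
    """Return the MySQL settings the container advertised, or None if absent.

    The SERVICE_MYSQL_* names are hypothetical; a real format would have to
    specify them.
    """
    if "SERVICE_MYSQL_HOST" not in environ:
        return None  # the container did not provide this service
    return MySQLService(
        host=environ["SERVICE_MYSQL_HOST"],
        port=int(environ.get("SERVICE_MYSQL_PORT", "3306")),
        dbname=environ["SERVICE_MYSQL_DBNAME"],
        user=environ["SERVICE_MYSQL_USER"],
        password=environ.get("SERVICE_MYSQL_PASSWORD", ""),
    )


def temp_dir_from_env(environ: Mapping[str, str]) -> str:
    """Use the container's $TMPDIR, falling back to the platform default."""
    return environ.get("TMPDIR") or tempfile.gettempdir()
```

At startup the app would call `mysql_from_env(os.environ)`; nothing in the application hard-codes where the database lives, and the container stays aware of every service the app depends on.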

Things that could be improved:

There are some places where you might be encouraged to use routines from the silversupport package. There are very few! But maybe an alternative could be provided for these cases.
A little convention-over-configuration is probably suitable for the bundled libraries; silver includes tools to manage things, but it gets a little twisty. When creating a new project I find myself creating several .pth files, special customizing modules, etc. Managing vendor libraries is also not obvious.
Services are IMHO quite important and useful, but also need to be carefully specified.
There’s a bunch of runtime expectations that aren’t part of the format, but in practice would be part of how the application is written. For instance, I make sure each app has its own temporary directory, and that it is cleared on update. If you keep session files in that location, and you expect the environment to clean up old sessions — well, either all environments should do that, or none should.
The process model is not entirely clear. I tried to simply define one process model (unthreaded, multiple processes), but I’m not sure that’s suitable — most notably, multiple processes have a significant memory impact compared to threads. An application should at least be able to indicate what process models it accepts and prefers.
Static files are all convention over configuration — you put static files under static/ and then they are available. So static/style.css would be at /style.css. I think this is generally good, but putting all static files under one URL path (e.g., /media/) can be good for other reasons as well. Maybe there should be conventions for both.
Cron jobs are important. Though maybe they could just be yet another kind of service? Many extra features could be new services.
Logging is also important; Silver Lining attempts to handle that somewhat, but it could be specified much better.
Silver Lining also supports PHP, which seemed to cause a bit of stress. But just ignore that. It’s really easy to ignore.
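The static-file convention (static/style.css served at /style.css) might look like this as WSGI middleware. This is a sketch, not Silver Lining's implementation; `static_middleware` is a hypothetical name, and a production container would more likely hand these files to the front-end web server.

```python
import mimetypes
import os


def static_middleware(app, static_root):
    """Serve files under static_root at the same URL path (static/x.css -> /x.css).

    Requests for anything not found on disk fall through to the wrapped app.
    """
    root = os.path.realpath(static_root)

    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        candidate = os.path.realpath(os.path.join(root, path.lstrip("/")))
        # Refuse paths that escape the static directory, e.g. "/../secret".
        inside = candidate.startswith(root + os.sep)
        if inside and os.path.isfile(candidate):
            ctype = mimetypes.guess_type(candidate)[0] or "application/octet-stream"
            with open(candidate, "rb") as f:
                body = f.read()
            start_response("200 OK", [("Content-Type", ctype),
                                      ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)

    return wrapper
```

Supporting a single prefix such as /media/ instead would only mean stripping that prefix from `PATH_INFO` before the lookup.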

There is a description of the configuration file for apps. The environmental variables are also notably part of the application’s expectations. The file layout is explained (together with a bunch of Silver Lining-specific concepts) in Development Patterns. Besides all that there is admittedly some other stuff that is only really specified in code; but in Silver Lining’s defense, specified in code is better than unspecified ;) App Engine provides another example of an application format, and would be worth using as a point of discussion or contrast (I did that myself when writing Silver Lining).

Discussing WSGI stuff with Ben Bangert at PyCon he noted that he didn’t really feel like the WSGI pieces needed that much more work, or at least that’s not where the interesting work was — the interesting work is in the tooling. An application format could provide a great basis for building this tooling. And I honestly think that the tooling has been held back more by divergent patterns of development than by the difficulty of writing the tools themselves; and a good, general application format could fix that.

Daniel Holth

Apr 8, 2011, 10:32:23 AM

to python-...@googlegroups.com, Web SIG

+1

I think this is a fantastic idea. In the same way that distutils2 intends to specify a static configuration format for packages, having a good static configuration format for web applications could make deployment easier while encouraging healthy competition among 'paste deploy' type projects.

I think this is much more interesting than WSGI since as an application programmer I use WebOb; if WSGI suddenly changes to look like RACK or JACK a 5-line wrapper will prevent me from needing to care.

Daniel

Daniel Holth

Apr 8, 2011, 10:42:21 AM

to web...@python.org

I think this is much more interesting than WSGI, since a 5-line back-to-WSGI adapter will likely make caring about any changes entirely optional.
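To illustrate how small such an adapter can be, here is a sketch that assumes a hypothetical Rack-style Python interface, where an app takes the environ and returns a (status, headers, body) triple. No such standard exists in Python; the interface and the `to_wsgi` name are assumptions.

```python
from http import HTTPStatus
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical Rack-style interface: environ in, (status, headers, body) out.
RackLikeApp = Callable[[dict], Tuple[int, Dict[str, str], Iterable[bytes]]]


def to_wsgi(app: RackLikeApp):
    """Wrap a Rack-style callable so any WSGI server can run it."""
    def wsgi_app(environ, start_response):
        status, headers, body = app(environ)
        start_response(f"{status} {HTTPStatus(status).phrase}", list(headers.items()))
        return body
    return wsgi_app
```

`HTTPStatus(status)` raises `ValueError` for codes outside the standard list, so a real adapter would need a fallback reason phrase.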

Daniel

James Mills

Apr 10, 2011, 7:25:21 PM

to web-sig

+1 too. I would however like to see this idea developed in a generic
and usable way, i.e. no zope/twisted deps or making it fit around
Django :)
Ideally it should be usable by the most basic (plain old WSGI).

cheers
James

--
-- James Mills
--
-- "Problems are solved by method"
_______________________________________________
Web-SIG mailing list
Web...@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/python-web-sig-garchive-9074%40googlegroups.com

Alice Bevan–McGregor

Apr 10, 2011, 7:40:36 PM

to web...@python.org

On 2011-04-10 16:25:21 -0700, James Mills said:

> +1 too. I would however like to see this idea developed in a generic
> and usable way, i.e. no zope/twisted deps or making it fit around
> Django :)
> Ideally it should be usable by the most basic (plain old WSGI).

The following are the collected ideas of myself and a few other users
in the WebCore chat room:

https://gist.github.com/911991

Being generic (i.e. using WSGI under-the-hood) and allowing generic
port assignments for other (non-web) networked applications is a design
goal.

The aversion to packaged zips is not entirely understandable to us; in
this case, a packaged copy of the application is produced via a
setup.py command, though in theory one could develop with that model
and just zip everything up in the end by hand.

Silver Lining seems to require too much in the way of hacking
(modifying .pth files, etc) to be reasonable.

— Alice.

James Mills

Apr 10, 2011, 8:25:19 PM

to web-sig

On Mon, Apr 11, 2011 at 9:40 AM, Alice Bevan–McGregor
<al...@gothcandy.com> wrote:
> The following are the collected ideas of myself and a few other users in the
> WebCore chat room:
>
> https://gist.github.com/911991

A couple of comments:

4. It would be nice to also support web applications that provide
their own web server (for whatever reason). chroot/jail them into a
virtualenv of their own (maybe?)

6. It would be nice to also support other standard UNIX-ish logging, e.g. syslog
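One way to honor the syslog request while keeping configuration "from the outside" is to let the container choose the log target. This is a minimal sketch using the standard library's `logging.handlers.SysLogHandler`; the `LOG_TARGET` variable and the `configure_logging` function are hypothetical.

```python
import logging
import logging.handlers
import sys
from typing import Mapping


def configure_logging(environ: Mapping[str, str], name: str = "webapp") -> logging.Logger:
    """Attach exactly one handler, chosen by the (hypothetical) LOG_TARGET variable.

    "syslog" sends records to a syslog daemon on localhost over UDP;
    anything else writes to stderr.
    """
    logger = logging.getLogger(name)
    logger.handlers.clear()  # make repeated calls idempotent
    if environ.get("LOG_TARGET") == "syslog":
        handler = logging.handlers.SysLogHandler(address=("localhost", 514))
    else:
        handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```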

> Being generic (i.e. using WSGI under-the-hood) and allowing generic port
> assignments for other (non-web) networked applications is a design goal.

Good :)

cheers
James

Ian Bicking

Apr 10, 2011, 10:06:52 PM

to Alice Bevan–McGregor, web...@python.org

On Sun, Apr 10, 2011 at 6:40 PM, Alice Bevan–McGregor <al...@gothcandy.com> wrote:

> On 2011-04-10 16:25:21 -0700, James Mills said:
>
>> +1 too. I would however like to see this idea developed in a generic
>> and usable way, i.e. no zope/twisted deps or making it fit around
>> Django :)
>> Ideally it should be usable by the most basic (plain old WSGI).
>
> The following are the collected ideas of myself and a few other users in the WebCore chat room:
>
> https://gist.github.com/911991
>
> Being generic (i.e. using WSGI under-the-hood) and allowing generic port assignments for other (non-web) networked applications is a design goal.

There's a significant danger that you'll be creating a configuration management tool at that point, not simply a web application description. The escape valve in Silver Lining for these sorts of things is services, which can implement more or less anything, and presumably ad hoc services could be allowed for.

> The aversion to packaged zips is not entirely understandable to us; in this case, a packaged copy of the application is produced via a setup.py command, though in theory one could develop with that model and just zip everything up in the end by hand.

You create a build process as part of the deployment (and development and everything else), which I think is a bad idea. My model does not use setup.py as the basis for the process (you could build a tool that uses setup.py, but it would be more a development methodology than a part of the packaging).

Also lots of libraries don't work when zipped, and an application is typically an aggregate of many libraries, so zipping everything just adds a step that probably has to be undone later. If a deploy process uses a zip file, that's fine, but adding zipping to deployment processes that don't care for zip files is needless overhead. A directory of files is the most general case. It's also something a developer can manipulate, so you don't get a mismatch between developers of applications and people deploying applications -- they can use the exact same system and format.

> Silver Lining seems to require too much in the way of hacking (modifying .pth files, etc) to be reasonable.

The pattern that it implements is fairly simple, and in several models you have to lay things out somewhat manually. I think some more convention and tool support (e.g., in pip) would be helpful.

Though there are quite a few details, the result is more reliable, stable, and easier to audit than anything based on a build process (which any use of "dependencies" would require -- there are *no* dependencies in a Silver Lining package, only the files that are *part* of the package).

Some notes from your link:

- There seems to be both the description of a format, and a program based on that format, but it's not entirely clear where the boundary is. I think it's useful to think in terms of a format and a reference implementation of particular tools that use that format (development management tools, like installing into the format; deployment tools; testing tools; local serving tools; etc).

- In Silver Lining I felt no need at all for shared libraries. Some disk space can be saved with clever management (hard links), but only when it's entirely clear that it's just an optimization. Adding a concept like "server-packages" adds a lot of operational complexity and room for bugs without any real advantages.

- I avoided exposing the concept of daemonization because it's not really an application concern; or at least it certainly is not appropriate for a WSGI application. There are other applications that might need this, mostly because they have no standard protocol equivalent to WSGI, but a generic container is almost certain to be of higher quality and better situated to its environment than a generic daemon. (PID files, ugh) At least supervisord I think has a better representation of how to express daemon configuration, but still I'm not a big fan of exposing this until it really feels necessary.

- All dependencies are always version-sensitive; I think it's delusional that people think otherwise. Build the tooling to manage that process (e.g., finding and testing newer versions), not the deployment.

- I try to avoid error conditions in the deployment, which is a big part of not having any build process involved, as build processes are a source of constant errors -- you can do a stage deployment, then five minutes later do a production deployment, and if you have a build process there is a significant chance that the two won't match.
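The stage/production mismatch described above can at least be detected mechanically by comparing content digests of the two deployed trees. This is an illustrative sketch with hypothetical function names, not part of Silver Lining or the gist: it hashes every file under a directory and reports the differences.

```python
import hashlib
import os
from typing import Dict


def manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its SHA-256 digest."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            result[rel] = h.hexdigest()
    return result


def diff_manifests(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """Report files missing on one side or whose content differs."""
    changes = {}
    for path in sorted(a.keys() | b.keys()):
        if path not in b:
            changes[path] = "only in first"
        elif path not in a:
            changes[path] = "only in second"
        elif a[path] != b[path]:
            changes[path] = "content differs"
    return changes
```

With no build step, the tree that was tested on stage is byte-for-byte what should appear on production, so an empty diff is the expected result; a build step is where a non-empty diff would come from.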

Ian

Alice Bevan–McGregor

Apr 10, 2011, 11:29:48 PM

to web...@python.org

Howdy!

On 2011-04-10 19:06:52 -0700, Ian Bicking said:

> There's a significant danger that you'll be creating a configuration
> management tool at that point, not simply a web application description.

Unless you have the tooling to manage the applications, there's no
point having a "standard" for them. Part of that tooling will be some
form of configuration management allowing you to determine the
requirements and configuration of an application /prior/ to
installation. Better to have an application rejected up-front ("Hey,
this needs my social insurance number? Hells no!") than after it's
already been extracted and potentially littered the landscape with its
children.

> The escape valve in Silver Lining for these sort of things is services,
> which can kind of implement anything, and presumably ad hoc services
> could be allowed for.

Generic services are useful, but not useful enough.

> You create a build process as part of the deployment (and development
> and everything else), which I think is a bad idea.

Please elaborate. There is no requirement for you to use the
"application packaging format" and associated tools (such as an
application server) during development. In fact, like 2to3, that type
of process would only slow things down to the point of uselessness.
That's not what I'm suggesting at all.

> My model does not use setup.py as the basis for the process (you could
> build a tool that uses setup.py, but it would be more a development
> methodology than a part of the packaging).

I know. And the end result is you may have to massage .pth files
yourself. If a tool requires you to, at any point during "normal
operation", hand modify internal files… that tool has failed at its
job. One does not go mucking about in your Git repo's .git/ folder, as
an example.

How do you build a release and upload it to PyPI? Upload docs to
packages.python.org? setup.py commands. It's a convenient hook with
access to metadata in a convenient way that would make an excellent
"let's make a release!" type of command.

> Also lots of libraries don't work when zipped, and an application is
> typically an aggregate of many libraries, so zipping everything just
> adds a step that probably has to be undone later.

Of course it has to be un-done later. I had thought I had made that
quite clear in the gist. (Core Operation, point 1, possibly others.)

> If a deploy process uses zip file that's fine, but adding zipping to
> deployment processes that don't care for zip files is needless
> overhead. A directory of files is the most general case. It's also
> something a developer can manipulate, so you don't get a mismatch
> between developers of applications and people deploying applications --
> they can use the exact same system and format.

So, how do you push the updated application around? Using a full
directory tree leaves you with Rsync and SFTP, possibly various SCM
methods, but then you'd need a distinct repo (or rootless branch) just
for releasing and you've already mentioned your dislike for SCM-based
deployment models.

Zip files are universal -- to the point that most modern operating
systems treat zip files /as folders/. If you have to, consider it a
transport encoding.
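Treating the zip purely as a transport encoding keeps the directory as the canonical format: a deploy tool packs it, ships it, and unpacks an identical tree on the other side. A minimal sketch using `shutil.make_archive` and `shutil.unpack_archive` from the standard library; the function names are hypothetical.

```python
import os
import shutil


def pack_for_transport(app_dir: str, out_dir: str) -> str:
    """Zip an application directory and return the path of the .zip file."""
    base = os.path.join(out_dir, os.path.basename(os.path.normpath(app_dir)))
    return shutil.make_archive(base, "zip", root_dir=app_dir)


def unpack_on_server(archive: str, dest_dir: str) -> str:
    """Restore the directory tree on the receiving side."""
    shutil.unpack_archive(archive, dest_dir, "zip")
    return dest_dir
```

Both positions in the thread fit this shape: the format stays a directory, and whether the wire carries a zip, rsync deltas, or something else is a property of the deploy tool.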

> The pattern that it implements is fairly simple, and in several models
> you have to lay things out somewhat manually. I think some more
> convention and tool support (e.g., in pip) would be helpful.

> Though there are quite a few details, the result is more reliable,
> stable, and easier to audit than anything based on a build process
> (which any use of "dependencies" would require -- there are *no*
> dependencies in a Silver Lining package, only the files that are *part*
> of the package).

It might be just me (and the other people who seem to enjoy WebCore and
Marrow) but it is fully possible to do install-time dependencies in
such a way as things won't break accidentally. Also, you missed
Application Spec #4.

> Some notes from your link:
>
> - There seems to be both the description of a format, and a program
> based on that format, but it's not entirely clear where the boundary
> is. I think it's useful to think in terms of a format and a reference
> implementation of particular tools that use that format (development
> management tools, like installing into the format; deployment tools;
> testing tools; local serving tools; etc).

Indeed; this gist was some really quickly hacked together ideas.

> - In Silver Lining I felt no need at all for shared libraries. Some
> disk space can be saved with clever management (hard links), but only
> when it's entirely clear that it's just an optimization. Adding a
> concept like "server-packages" adds a lot of operational complexity and
> room for bugs without any real advantages.

±0

> - I try to avoid error conditions in the deployment, which is a big
> part of not having any build process involved, as build processes are a
> source of constant errors -- you can do a stage deployment, then five
> minutes later do a production deployment, and if you have a build
> process there is a significant chance that the two won't match.

I have never, in my life, encountered that particular problem. I may
be more careful than most in defining dependencies with version number
boundaries, I may be more careful in utilizing my own package
repository (vs. the public PyPI), but I don't think I'm unique in
having few to no issues in development/sandbox/production deployment
processes.

Hell, I'm still able to successfully deploy a TurboGears 0.9
application without dependency issues.

However, the package format I describe in that gist does include the
source for the dependencies as "snapshotted" during bundling. If your
application is working in development, after snapshotting it /will/
work on sandbox or production deployments.

Eric Larson

Apr 11, 2011, 3:53:02 AM

to Alice Bevan–McGregor, web...@python.org

Hi,

On Apr 10, 2011, at 10:29 PM, Alice Bevan–McGregor wrote:

> However, the package format I describe in that gist does include the source for the dependencies as "snapshotted" during bundling. If your application is working in development, after snapshotting it /will/ work on sandbox or production deployments.

I wanted to chime in on this one aspect b/c I think the concept is somewhat flawed. If your application is working in development and you "snapshot" the dependencies, that is no guarantee that things will work in production. The only way to say that a snapshot or bundle is guaranteed to work is if you snapshot the entire system and make it available as a production system.

Using a real world example, say you develop your application on OS X and you deploy on Ubuntu 8.04 LTS. Right away you are dealing with two different operating systems with entirely different system calls. If you use something like lxml and simplejson, you have no choice but to repackage or install from source on the production server. While it is fair to say that generally you could avoid packages that don't use C, both lxml and simplejson are rather obvious choices for web development. You could use the json module and ElementTree, but if you want more speed (and who doesn't like to go fast!), lxml and simplejson are both better options.

It sounds like Ian doesn't want to have any build steps, which I think is a bad mantra. A build step lets you prepare things for deployment. A deployment package is different from a development package, and mixing the two by forcing builds on the server seems like asking for trouble. I'm not saying this is what you (Alice) are suggesting, but rather pointing out that as a model, depending on virtualenv + pip's bundling capabilities seems slightly flawed.

Personally, and I don't expect folks to take my opinions very seriously b/c I haven't offered any code, what I'd like to see is a simple format that helps install and uninstall web applications. I think it should offer hooks for running tests, learning basic status and allow simple configuration for typical sysadmin needs (logging via syslog, process management, nagios checks, etc.). Instead of focusing on what format that should take in terms of packages, it seems more effective to spend time defining a standard means of managing WSGI apps and piggyback or plain old copy some format like RPMs or dpkg.

Just my .02. Again, I haven't offered code, so feel free to ignore me. But I do hope that others who suspect this model of putting source on the server is a problem will pipe up. If I were to add a requirement, it would be that Python web applications help system administrators become more effective. That means finding consistent ways of deploying apps that play well with other languages / platforms. After all, keeping a C compiler on a public server is rarely a good idea.

Eric

Ionel Maries Cristian

Apr 11, 2011, 3:56:40 AM

to Ian Bicking, Web SIG

Hello,

I have few comments:

That file layout basically forces you to keep your development environment as close as possible to the production environment. This is especially visible if you're relying on Python C extensions. Since you don't want to have the same environment constraints as App Engine, it should be more flexible in this regard and offer a way to build the project dependencies somewhere other than the developer's machine.
There's no built-in support for logging configuration.
The update_fetch feels like a hack, as it's not extensible to the rest of the lifecycle (hooks for shutdown, start, etc.). Also, it shouldn't be an application URL, because you'd want to run a hook before starting the app or after stopping it. I guess you could accomplish that with a WSGI wrapper, but there should be a clear separation between the app and the hooks that manage the app.
I'm not entirely clear on why you avoid a build process (WAR-like) prior to deployment. It works fine for App Engine, but you don't have its constraints.
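The separation asked for here, management hooks that the container invokes directly rather than through an application URL, could look something like the registry below. It is a hypothetical sketch: the phase names and the `Lifecycle` class are assumptions, not part of Silver Lining.

```python
from typing import Callable, Dict, List

Hook = Callable[[], None]


class Lifecycle:
    """Hypothetical registry of management hooks, kept apart from the WSGI app.

    The container, not an HTTP request, decides when each phase runs.
    """

    PHASES = ("install", "start", "stop", "update")

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = {p: [] for p in self.PHASES}

    def on(self, phase: str) -> Callable[[Hook], Hook]:
        """Decorator that registers a hook for one phase."""
        if phase not in self._hooks:
            raise ValueError(f"unknown phase: {phase}")

        def register(func: Hook) -> Hook:
            self._hooks[phase].append(func)
            return func

        return register

    def run(self, phase: str) -> bool:
        """Run hooks in registration order; return False if any hook raises."""
        for hook in self._hooks[phase]:
            try:
                hook()
            except Exception:
                return False
        return True
```

A False result from `run("install")` gives the container the same "installation failed" signal an update request would, but it works before the app starts serving and after it stops.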

-- Ionel

Daniel Holth

Apr 11, 2011, 3:01:55 PM

to python-...@googlegroups.com, web...@python.org

We have more than three implementations of this idea, the Python Web Application Package and Format (WAPAF), including Java's WAR files, Google App Engine, and Silver Lining. Let's review the WAR file, approximately:

(static files, .jsp)
WEB-INF/web.xml
WEB-INF/classes/org/example/myapplication.class
WEB-INF/lib/some-library.jar
WEB-INF/lib/145-other-libraries.jar

Build the .war file, copy to server, done (ideally). Your program should require a standard Java installation plus whatever's in the .war file. The .war file is a .zip that follows certain conventions.

In practice you might develop in and deploy exploded .war files which are exactly the same thing but unzipped.

Since it's Java there is no classes/SQLAlchemy/src/sqlalchemy/__init__.py; the path for the code always starts at classes/, not at some arbitrary set of subdirectories under classes/

> installation. Better to have an application rejected up-front ("Hey,
> this needs my social insurance number? Hells no!") than after it's
> already been extracted and potentially littered the landscape with its
> children.

Part of the potential win here is that the application need not litter anything. Like GAE, the server might keep all the previous versions you've uploaded and let you pick which one you want today. You shouldn't have to think about the state of the server.

>> My model does not use setup.py as the basis for the process (you could
>> build a tool that uses setup.py, but it would be more a development
>> methodology than a part of the packaging).
> I know. And the end result is you may have to massage .pth files
> yourself. If a tool requires you to, at any point during "normal
> operation", hand modify internal files… that tool has failed at its
> job. One does not go mucking about in your Git repo's .git/ folder, as
> an example.

If I read the silverlining documentation correctly the .pth is created manually in the example only because there was no 'setup.py' to 'pip install -e'. As an alternative the spec could only add particular directories to PYTHONPATH. This might be a distutils2 thing.
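A minimal sketch of the "only add particular directories" alternative, assuming the app description lists directories relative to the app root (a hypothetical field; `activate_app_paths` is likewise a hypothetical name). The standard library's `site.addsitedir` would also work, but it additionally processes any `.pth` files it finds, which is the hidden state this approach avoids.

```python
import os
import sys
from typing import List, Sequence


def activate_app_paths(app_root: str, relative_dirs: Sequence[str]) -> List[str]:
    """Put the application's own directories at the front of sys.path.

    relative_dirs would come from the app description, e.g. ["src", "vendor"];
    no .pth files are read or written. Returns the paths actually added.
    """
    added = []
    for rel in reversed(relative_dirs):  # reversed so the first entry ends up first
        path = os.path.abspath(os.path.join(app_root, rel))
        if os.path.isdir(path) and path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    added.reverse()
    return added
```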

> How do you build a release and upload it to PyPI? Upload docs to
> packages.python.org? setup.py commands. It's a convenient hook with
> access to metadata in a convenient way that would make an excellent
> "let's make a release!" type of command.

setup.py should go away. The distutils2 talk from pycon 2011 explains. http://blip.tv/file/4880990

> It might be just me (and the other people who seem to enjoy WebCore and
> Marrow) but it is fully possible to do install-time dependencies in
> such a way as things won't break accidentally. Also, you missed
> Application Spec #4.

It is important that the WAPAF work with rsync. Just move the install-time dependencies part into the 'building the WAPAF' stage of the process and we are on the same page. This supports e.g. the 'server notices application is popular, spins up a new instance, and uses rsync to deploy the application onto the new server' use case, or perhaps 'the server isn't running at all, but you can deploy, and it will get around to starting your application when it is turned on'.

>> if you have a build process there is a significant chance that the two won't match.
> I have never, in my life, encountered that particular problem.

It does exist.

Alice Bevan–McGregor

Apr 11, 2011, 3:48:24 PM

to web...@python.org

On 2011-04-11 00:53:02 -0700, Eric Larson said:

> Hi,
> On Apr 10, 2011, at 10:29 PM, Alice Bevan–McGregor wrote:
>
>> However, the package format I describe in that gist does include the
>> source for the dependencies as "snapshotted" during bundling. If your
>> application is working in development, after snapshotting it /will/
>> work on sandbox or production deployments.
>
> I wanted to chime in on this one aspect b/c I think the concept is
> somewhat flawed. If your application is working in development and
> "snapshot" the dependencies that is no guarantee that things will work
> in production. The only way to say that snapshot or bundle is
> guaranteed to work is if you snapshot the entire system and make it
> available as a production system.

`pwaf bundle` bundles the source tarballs, effectively, of your
application and dependencies into a single file. Not unlike a certain
feature of pip.
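As a rough illustration of bundling "the source tarballs, effectively, of your application and dependencies into a single file", here is a minimal standard-library sketch. The `bundle` function and the `app/` + `deps/` layout inside the archive are assumptions for illustration only, not how `pwaf` actually lays things out.

```python
import os
import zipfile


def bundle(app_dir, sdist_paths, out_path):
    """Write the app tree under app/ and each dependency sdist under deps/.

    Hypothetical layout: the real pwaf format may differ.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Application source, stored relative to the app root.
        for root, _dirs, files in os.walk(app_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, app_dir).replace(os.sep, "/")
                zf.write(full, "app/" + rel)
        # Dependency source tarballs, snapshotted as-is during bundling.
        for sdist in sdist_paths:
            zf.write(sdist, "deps/" + os.path.basename(sdist))
    return out_path
```

Keeping the dependencies as source tarballs (rather than built eggs) is what lets C extensions be compiled on the target machine later.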

And… wait, am I the only one who uses built-from-snapshot virtual
servers for sandbox and production deployment? I can't be the only one
who likes things to work as expected.

> Using a real world example, say you develop your application on OS X
> and you deploy on Ubuntu 8.04 LTS. Right away you are dealing with two
> different operating systems with entirely different system calls. If
> you use something like lxml and simplejson, you have no choice but to
> repackage or install from source on the production server.

Installing from source is what I was suggesting. Also, Ubuntu on a
server? All your `linux single` (root) are belong to me. ;^P

> While it is fair to say that generally you could avoid packages that
> don't use C, both lxml and simplejson are rather obvious choices for
> web development.

Except that json is built-in in 2.6 (admittedly with fewer features,
but I've never needed the extras) and there are alternate xml parsers,
too.

> It sounds like Ian doesn't want to have any build steps which I think
> is a bad mantra. A build step lets you prepare things for deployment. A
> deployment package is different than a development package and mixing
> the two by forcing builds on the server seems like asking for
> trouble.

I'm having difficulty following this statement: build steps good,
building on server bad? So I take it you know the exact target
architecture and have cross-compilers installed in your development
environment? That's not practical (or simple) at all!

> I'm not saying this is what you (Alice) are suggesting, but rather
> pointing out that as a model, depending on virtualenv + pip's bundling
> capabilities seems slightly flawed.

Virtualenv (or something utilizing a similar Python path 'chrooting'
capability) and pip using the extracted "deps" as the source for
"offline" installation actually seems quite reasonable to me. You get the
benefit of a known set of working packages (i.e. specific version
numbers, tested in development) and the ability to compile C extensions
in-place. (Because sure as hell you can't reliably compile them
before-hand if they have any form of system library dependency!)
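The "offline" installation described above maps onto pip's real `--no-index` and `--find-links` options. A minimal sketch that builds the command, assuming the bundle was extracted so that the sdists sit in a `deps/` directory and the tested pins are in a `requirements.txt` (both names are assumptions); the helper name `offline_install_argv` is hypothetical:

```python
import sys


def offline_install_argv(deps_dir, requirements="requirements.txt"):
    """Build a pip command that installs only from the bundled deps directory."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",              # never contact PyPI or any other index
        "--find-links", deps_dir,  # resolve packages from the local sdists only
        "-r", requirements,        # pinned versions tested in development
    ]
```

Run inside the target virtualenv with something like `subprocess.run(offline_install_argv("deps"), check=True)`; C extensions are then compiled in place against the host's own system libraries.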

> I think it should offer hooks for running tests, learning basic status
> and allow simple configuration for typical sysadmin needs (logging via
> syslog, process management, nagios checks, etc.). Instead of focusing
> on what format that should take in terms of packages, it seems more
> effective to spend time defining a standard means of managing WSGI apps
> and piggyback or plain old copy some format like RPMs or dpkg.

RPMs are terrible, dpkg is terrible. Binary package distribution, in
general, is terrible. I got the distinct impression at PyCon that
binary distributable .eggs were thought of as terrible and should be
phased out.

Also, nobody so far seems to have noticed the centralized logging
management or daemon management lines from my notes.

> Just my .02. Again, I haven't offered code, so feel free to ignore me.
> But I do hope that others who suspect this model of
> putting source on the server is a problem will pipe up. If I were to add a
> requirement it would be that Python web applications help system
> administrators become more effective. That means finding consistent
> ways of deploying apps that play well with other languages /
> platforms. After all, keeping a C compiler on a public server is rarely
> a good idea.

If you could demonstrate a fool-proof way to install packages with
system library dependencies using cross-compilation from a remote
machine, I'm all ears. ;)

— Alice.

_______________________________________________
Web-SIG mailing list
Web...@python.org
Web SIG: http://www.python.org/sigs/web-sig

Unsubscribe: http://mail.python.org/mailman/options/web-sig/python-web-sig-garchive-9074%40googlegroups.com

Alex Grönholm


Apr 11, 2011, 4:49:20 PM4/11/11

to web...@python.org

I use Ubuntu on all my servers, and "linux single" does not work with
it, I can tell you ;P


Ian Bicking


Apr 11, 2011, 6:22:11 PM4/11/11

to Alice Bevan–McGregor, web...@python.org

On Sun, Apr 10, 2011 at 10:29 PM, Alice Bevan–McGregor <al...@gothcandy.com> wrote:

Howdy!

On 2011-04-10 19:06:52 -0700, Ian Bicking said:

There's a significant danger that you'll be creating a configuration management tool at that point, not simply a web application description.

Unless you have the tooling to manage the applications, there's no point having a "standard" for them. Part of that tooling will be some form of configuration management allowing you to determine the requirements and configuration of an application /prior/ to installation. Better to have an application rejected up-front ("Hey, this needs my social insurance number? Hells no!") than after it's already been extracted and potentially littered the landscape with its children.
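Rejecting an application "up-front" only needs the manifest, not the whole tree. A minimal sketch, assuming the package is a zip with an `app.ini` at its root containing a hypothetical `[app]` section with a whitespace-separated `services` key (neither Silver Lining's nor the gist's actual schema); `check_before_extract` is likewise a made-up name:

```python
import configparser
import zipfile


def check_before_extract(package_path, allowed_services, manifest="app.ini"):
    """Return the services an app requests that this host does not allow.

    Only the manifest member is read; nothing is written to disk.
    """
    with zipfile.ZipFile(package_path) as zf:
        text = zf.read(manifest).decode("utf-8")
    parser = configparser.ConfigParser()
    parser.read_string(text)
    requested = parser.get("app", "services", fallback="").split()
    return [s for s in requested if s not in allowed_services]
```

An empty result means the app can proceed to extraction; anything else can be reported to the deployer before any files land on the server.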

I... think we are misunderstanding each other or something.

A nice tool that could use this format, for instance, would be a tool that takes an app and creates a puppet recipe to set up a server to host the application. A different tool (maybe better, maybe not?) would be a puppet plugin (if that's the terminology) that uses this format to tell puppet about all the requirements an application has, perhaps translating some notions to puppet-native concepts, or adding high-level recipes that set up an appropriate container (which can be as simple as a properly configured Nginx or Apache server).

What I mean when I say there's a danger of becoming a configuration management tool, is that if you include hooks for the application to configure its environment you are probably stepping on the toes of whatever other tool you might use. And once you start down that path things tend to cascade.

The escape valve in Silver Lining for these sort of things is services, which can kind of implement anything, and presumably ad hoc services could be allowed for.

Generic services are useful, but not useful enough.

You create a build process as part of the deployment (and development and everything else), which I think is a bad idea.

Please elaborate. There is no requirement for you to use the "application packaging format" and associated tools (such as an application server) during development. In fact, like 2to3, that type of process would only slow things down to the point of uselessness. That's not what I'm suggesting at all.

If you include something in the packaging format that indicates the libraries to be installed, then you are encouraging and perhaps requiring that the server install libraries during a deployment.

Realistically this can't be entirely avoided, but I think it is a pretty workable separation to declare only those dependencies that can't reasonably be included directly in the application itself (e.g., lxml, MySQLdb, git, and so on). In Silver Lining those dependencies were expressed as Debian package names, installed via dpkg, but for a more general system it would need to be somewhat more abstract. But several configuration management tools have managed that abstraction already, so it seems feasible to handle this declaratively.
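One way to make declared system dependencies "somewhat more abstract" than Debian package names is a lookup table from abstract names to per-platform packages, resolved by the deployment tool. This is a hypothetical sketch: the table, the platform keys, and the function name are illustrative, and the package names should be checked against each distribution.

```python
# Illustrative mapping from abstract requirement names to distro packages.
SYSTEM_PACKAGES = {
    "libxml2": {"debian": "libxml2", "redhat": "libxml2"},
    "libxslt": {"debian": "libxslt1.1", "redhat": "libxslt"},
    "git": {"debian": "git", "redhat": "git"},
}


def resolve_system_packages(requirements, platform):
    """Translate an app's abstract system requirements for one platform.

    Fails loudly if any requirement has no known package, so the
    deployment can be refused before anything is installed.
    """
    missing = [r for r in requirements if platform not in SYSTEM_PACKAGES.get(r, {})]
    if missing:
        raise LookupError(f"no {platform} package known for: {', '.join(missing)}")
    return [SYSTEM_PACKAGES[r][platform] for r in requirements]
```

The application only ever names `libxml2`; whether that becomes an `apt-get`, `yum`, or puppet resource is left to the tool.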

My model does not use setup.py as the basis for the process (you could build a tool that uses setup.py, but it would be more a development methodology than a part of the packaging).

I know. And the end result is you may have to massage .pth files yourself. If a tool requires you to, at any point during "normal operation", hand modify internal files… that tool has failed at its job. One does not go mucking about in your Git repo's .git/ folder, as an example.

.pth files aren't exactly an "internal file" -- they are a documented feature of Python. And .git/config is also a human-readable/editable file!

But I did note that the setup in Silver Lining was a bit too primitive. Not *quite* as primitive as App Engine, but close. I think it would be better to have a convention like adding lib/python/ to the path automatically. If you want, for example, src/myapp to also be added to the path then I don't think there's anything wrong with using a .pth file to do that; that's what they were created to do!
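The convention of adding `lib/python/` to the path automatically, while still honouring `.pth` files, is essentially what the standard library's `site.addsitedir` does: it appends the directory to `sys.path` and then processes every `*.pth` file in it. A minimal sketch, assuming that directory layout; `activate_app_paths` is a hypothetical name:

```python
import os
import site


def activate_app_paths(app_root):
    """Add <app_root>/lib/python to sys.path and process any .pth files in it.

    Relative lines in a .pth file are resolved against the directory that
    holds the .pth file (lib/python here), and are only added if they exist.
    """
    lib = os.path.join(app_root, "lib", "python")
    site.addsitedir(lib)
    return lib
```

So to also expose `src/myapp`, an app would ship `lib/python/app.pth` containing the line `../../src/myapp`.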

How do you build a release and upload it to PyPI? Upload docs to packages.python.org? setup.py commands. It's a convenient hook with access to metadata in a convenient way that would make an excellent "let's make a release!" type of command.

Also lots of libraries don't work when zipped, and an application is typically an aggregate of many libraries, so zipping everything just adds a step that probably has to be undone later.

Of course it has to be un-done later. I had thought I had made that quite clear in the gist. (Core Operation, point 1, possibly others.)

If a deploy process uses a zip file, that's fine, but adding zipping to deployment processes that don't care for zip files is needless overhead. A directory of files is the most general case. It's also something a developer can manipulate, so you don't get a mismatch between developers of applications and people deploying applications -- they can use the exact same system and format.

So, how do you push the updated application around? Using a full directory tree leaves you with Rsync and SFTP, possibly various SCM methods, but then you'd need a distinct repo (or rootless branch) just for releasing and you've already mentioned your dislike for SCM-based deployment models.

Zip files are universal -- to the point that most modern operating systems treat zip files /as folders/. If you have to, consider it a transport encoding.
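Treating zip "as a transport encoding" means the directory stays canonical and the archive exists only in transit. A minimal sketch with the standard library's `shutil.make_archive` and `shutil.unpack_archive`; the `pack`/`unpack` names are hypothetical:

```python
import shutil


def pack(app_dir, archive_base):
    """Zip a directory tree for transport; returns the path of the .zip."""
    return shutil.make_archive(archive_base, "zip", root_dir=app_dir)


def unpack(archive_path, dest_dir):
    """Restore the directory form, which remains the canonical format."""
    shutil.unpack_archive(archive_path, dest_dir, "zip")
    return dest_dir
```

A tool built this way accepts a directory everywhere and only zips at the edges, which keeps both camps in the thread happy.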

I guess I envision tools that specifically understand this format, not using ad hoc tools to move stuff around. A tool that "understands" this format could be as simple as:

#!/bin/sh
set -e
# webpkg is a hypothetical parser for the proposed format
NAME=$(python -c "import sys, webpkg; print(webpkg.parse(sys.argv[1]).name)" "$1")
T=$(mktemp -d)/"$NAME".zip
zip -r "$T" "$1"
scp "$T" "$2":/var/server/apps/ && rm -r "$(dirname "$T")"

Now there are a lot more features that I'd want than that script could handle, but it might be nice for some people. But I would not be opposed to asking tools to understand zip files, so long as they understand directories too. That would introduce a few open issues, like whether symlinks are supported, or perhaps other details where zip is less expressive than files.
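The symlink question is one concrete place where zip is less expressive: `zipfile` follows a link and stores the target's contents (or, for a linked directory, just an empty directory entry) rather than the link itself. A tool accepting both forms could at least detect the problem before zipping. A minimal sketch; `find_symlinks` is a hypothetical helper:

```python
import os


def find_symlinks(app_dir):
    """List symlinks under app_dir (relative paths, sorted).

    os.walk does not descend into symlinked directories by default, but
    they still appear in the directory listing, so both kinds are caught.
    """
    links = []
    for root, dirs, files in os.walk(app_dir):
        for name in dirs + files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                links.append(os.path.relpath(full, app_dir))
    return sorted(links)
```

Whether a non-empty result should be a warning or a hard error is exactly the kind of open issue the message mentions.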

The pattern that it implements is fairly simple, and in several models you have to lay things out somewhat manually. I think some more convention and tool support (e.g., in pip) would be helpful.

+1

Though there are quite a few details, the result is more reliable, stable, and easier to audit than anything based on a build process (which any use of "dependencies" would require -- there are *no* dependencies in a Silver Lining package, only the files that are *part* of the package).

It might be just me (and the other people who seem to enjoy WebCore and Marrow) but it is fully possible to do install-time dependencies in such a way as things won't break accidentally. Also, you missed Application Spec #4.

OK; then #4 is the only thing I would choose to support, as it is the most general and easiest for tools to support, and least likely to lead to different behavior with different tools. And not to just defer to authority, but having written a half dozen tools in this area, not all of them successful, I feel strongly that including dependencies is best -- simplest for both producer and consumer, and most reliable.

Some notes from your link:

- There seems to be both the description of a format, and a program based on that format, but it's not entirely clear where the boundary is. I think it's useful to think in terms of a format and a reference implementation of particular tools that use that format (development management tools, like installing into the format; deployment tools; testing tools; local serving tools; etc).

Indeed; this gist was some really quickly hacked together ideas.

- In Silver Lining I felt no need at all for shared libraries. Some disk space can be saved with clever management (hard links), but only when it's entirely clear that it's just an optimization. Adding a concept like "server-packages" adds a lot of operational complexity and room for bugs without any real advantages.
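The hard-link idea can stay "entirely clear that it's just an optimization": each installed version remains a complete tree, and identical files simply share an inode. A minimal sketch assuming two version directories on the same filesystem; `link_identical` is hypothetical, and this is only safe if installed files are treated as read-only, since editing one linked copy edits both:

```python
import filecmp
import os


def link_identical(old_version, new_version):
    """Hard-link files in new_version to byte-identical files in old_version.

    Returns the number of files replaced by links.
    """
    linked = 0
    for root, _dirs, files in os.walk(new_version):
        for name in files:
            new_path = os.path.join(root, name)
            rel = os.path.relpath(new_path, new_version)
            old_path = os.path.join(old_version, rel)
            if (os.path.isfile(old_path) and not os.path.islink(new_path)
                    and filecmp.cmp(old_path, new_path, shallow=False)):
                tmp = new_path + ".linktmp"
                os.link(old_path, tmp)
                os.replace(tmp, new_path)  # swap in the link in one rename
                linked += 1
    return linked
```

Deleting an old version later just drops a link count; nothing in the newer version changes.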

±0

- I try to avoid error conditions in the deployment, which is a big part of not having any build process involved, as build processes are a source of constant errors -- you can do a stage deployment, then five minutes later do a production deployment, and if you have a build process there is a significant chance that the two won't match.
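Without a build step, "do the stage and production deployments match?" becomes a mechanical check. A minimal sketch of one way to do it, hashing relative paths and file contents in a stable order; `tree_digest` is a hypothetical helper, not part of Silver Lining:

```python
import hashlib
import os


def tree_digest(path):
    """SHA-256 over relative file paths and contents, in a stable order."""
    h = hashlib.sha256()
    for root, dirs, files in os.walk(path):
        dirs.sort()  # make traversal order deterministic
        for name in sorted(files):
            full = os.path.join(root, name)
            rel = os.path.relpath(full, path).replace(os.sep, "/")
            h.update(rel.encode("utf-8") + b"\0")
            with open(full, "rb") as f:
                h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()
```

If the stage and production trees produce the same digest, they are byte-for-byte the same application; with a build process on each server, there is no equivalent guarantee.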

I have never, in my life, encountered that particular problem. I may be more careful than most in defining dependencies with version number boundaries, I may be more careful in utilizing my own package repository (vs. the public PyPI), but I don't think I'm unique in having few to no issues in development/sandbox/production deployment processes.

Well, lots (and lots and lots) of other people have ;) Also lots of these other techniques require consistency between development and deployment (for instance, using the same private package repository). This is fine when you are careful and consider any failures to be of your own making, but if you are deploying someone else's application you won't feel the same way, and may make different kinds of mistakes.

A perhaps implicit goal in my mind is allowing people to deploy applications that they did not write and whose implementation they don't care about. A lot of things are different when you separate out the developer's knowledge from the deployer's.

Alice Bevan–McGregor


Apr 11, 2011, 6:31:29 PM4/11/11

to web...@python.org

On 2011-04-11 13:49:20 -0700, Alex Grönholm said:

> I use Ubuntu on all my servers, and "linux single" does not work with
> it, I can tell you ;P

The number of poorly configured Ubuntu servers I have seen (and
replaced) is staggering. Any time the barrier to entry is lowered,
quality suffers: having a compiler on the server is nothing compared to
having a complete X graphical environment running as root, with root
and a single user sharing the same password. ;^D

— Alice.


Ian Bicking


Apr 11, 2011, 6:37:31 PM4/11/11

to Ionel Maries Cristian, Web SIG

On Mon, Apr 11, 2011 at 2:56 AM, Ionel Maries Cristian <ionel.mc@gmail.com> wrote:

Hello,

I have few comments:
That file layout basically forces you to have your development environment as close as possible to the production environment. This is especially visible if you're relying on Python C extensions. Since you don't want to have the same environment constraints as App Engine, it should be more flexible in this regard and offer a way to generate the project dependencies somewhere other than the developer's machine.

Yes; in this case in Silver Lining I have allowed non-portable libraries to be declared as dependencies, and then the deployment tool ensures they are installed.

There's no builtin support for logging configuration.

This would be useful, yes; though I think the format itself would mostly want to declare how it logs and then deployment tools could try to configure that. E.g., it would be useful to have a list of logging names that an app uses. The actual configuration is deployment-specific, so shouldn't be inside the application format itself.
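The split Ian describes (the app declares which logger names it uses; the deployment decides where they go) fits the standard `logging.config.dictConfig`. A minimal sketch: `APP_LOGGERS` stands in for a hypothetical manifest field, and `configure_app_logging` is a hypothetical deployment-side helper. A site wanting syslog could pass `{"class": "logging.handlers.SysLogHandler"}` as the handler.

```python
import logging
import logging.config

# Declared by the application (hypothetical manifest field): names only.
APP_LOGGERS = ["myapp", "myapp.db", "sqlalchemy.engine"]


def configure_app_logging(logger_names, level="INFO", handler=None):
    """Deployment side: attach a site-chosen handler to the app's declared loggers."""
    handler = handler or {"class": "logging.StreamHandler"}
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {"site": handler},
        "loggers": {
            name: {"level": level, "handlers": ["site"]} for name in logger_names
        },
    }
    logging.config.dictConfig(config)
```

Nothing deployment-specific (file paths, syslog addresses, levels) ends up inside the application format itself.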

The update_fetch feels like a hack as it's not extensible to other lifecycle events (hooks for shutdown, start, etc). Also, it shouldn't be an application URL because you'd want to run a hook before starting it or after stopping it. I guess you could accomplish that with a WSGI wrapper but there should be a clear separation between the app and hooks that manage the app.

In Silver Lining you can also do scripts; I started with URLs because it was simpler on the implementation side, but scripts have generally been easier to develop, so at least the default could be revisited.

At least in the case of mod_wsgi there isn't a very good definition of shutdown and start. There's the runner itself, that imports the WSGI application -- this is always run on start, but it's the start of the worker process, not necessarily the server process (IMHO "starting the server process" is an internal implementation detail we should not expose). Silver Lining also tries to import a silvercustomize module, which is kind of a universal initialization (also imported for tests, etc). atexit can be used to run stuff on process shutdown. I don't really see a compelling benefit to another process shutdown technique. It seems perhaps reasonable to have something that is run when the actual application instance is shut down, but I've never personally needed that in practice. Of course other configuration settings could be added for different states if they were reasonably universal states and there was a real need for those.
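The two mechanisms mentioned, an optional per-app initialization module (Silver Lining's `silvercustomize`) and `atexit` for process shutdown, can be sketched in a few lines. The helper names are hypothetical, and this is not Silver Lining's actual loader code:

```python
import atexit
import importlib


def run_customization(module_name="silvercustomize"):
    """Import an optional per-app initialization module if one exists.

    Returns True if it was imported, False if the app doesn't provide it.
    """
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            raise  # the module exists, but one of *its* imports failed
        return False
    return True


def on_process_exit(func):
    """Register process-shutdown cleanup via the standard atexit hook."""
    atexit.register(func)
    return func
```

The same `run_customization()` call can be made by the WSGI runner, the test runner, and any maintenance scripts, which is what makes it "universal initialization".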

I'm not entirely clear on why you avoid a build process (WAR-like) prior to deployment. It works fine for App Engine, but you don't have its constraints.

In my own experience with App Engine I found it to be a useful constraint -- it was not particularly hard to get around (at least if you understand the relevant tools) and while App Engine has annoying constraints this wasn't one of them. Of course I couldn't use lxml at all on App Engine, and I agree we shouldn't accept that constraint, but for the majority of libraries that are portable this isn't a constraint.

Ian

Eric Larson


Apr 11, 2011, 7:04:01 PM4/11/11

to Alice Bevan–McGregor, web...@python.org

On Apr 11, 2011, at 2:48 PM, Alice Bevan–McGregor wrote:

On 2011-04-11 00:53:02 -0700, Eric Larson said:

Hi,
On Apr 10, 2011, at 10:29 PM, Alice Bevan–McGregor wrote:
However, the package format I describe in that gist does include the source for the dependencies as "snapshotted" during bundling. If your application is working in development, after snapshotting it /will/ work on sandbox or production deployments.
I wanted to chime in on this one aspect b/c I think the concept is somewhat flawed. If your application is working in development and you "snapshot" the dependencies, that is no guarantee that things will work in production. The only way to say that snapshot or bundle is guaranteed to work is if you snapshot the entire system and make it available as a production system.

`pwaf bundle` bundles the source tarballs, effectively, of your application and dependencies into a single file. Not unlike a certain feature of pip.

And… wait, am I the only one who uses built-from-snapshot virtual servers for sandbox and production deployment? I can't be the only one who likes things to work as expected.

Using a real world example, say you develop your application on OS X and you deploy on Ubuntu 8.04 LTS. Right away you are dealing with two different operating systems with entirely different system calls. If you use something like lxml and simplejson, you have no choice but to repackage or install from source on the production server.

Installing from source is what I was suggesting. Also, Ubuntu on a server? All your `linux single` (root) are belong to me. ;^P

I realize your intent was to install from source and I'm saying that is the problem. Not from the standpoint of a Python web application of course. But instead, from the standpoint of a Python web application that is working within the context of a larger system. A sandbox is nice b/c it gives you a place to do whatever you want and be somewhat oblivious of the rest of the world. My point is not that it's incorrect to install Python packages from source, but that assuming all dependencies should be installed from source is flawed. Just b/c a C library needs some library to compile, it doesn't mean that the same library is necessary to run. It is generally a good idea to keep compilers off of production machines.

While it is fair to say that generally you could avoid packages that don't use C, both lxml and simplejson are rather obvious choices for web development.

Except that json is built-in in 2.6 (admittedly with fewer features, but I've never needed the extras) and there are alternate xml parsers, too.

Ok, you are correct that there are other parsers and that the json module is builtin. But we've already made a conscious decision to use lxml and simplejson instead of other tools (including the json module) because the alternatives are slower. These compiled packages have been very frustrating to deal with in production because they need to be compiled on the server. Along similar lines, we have our own Python apps that use C and these are similarly very difficult to deploy. This is because our deployment system is built off of setuptools and eggs (no zip). This is generally not a bad thing and speaks to the quality of Python as a platform. But, the pain of having a very Python centric system is substantial. My point is that we recognize that while it is very convenient to install Python packages and let pip (and setuptools) handle our dependencies, it also doesn't allow a way to interact with the host system that is housing our sandbox.

It sounds like Ian doesn't want to have any build steps which I think is a bad mantra. A build step lets you prepare things for deployment. A deployment package is different than a development package and mixing the two by forcing builds on the server seems like asking for trouble.

I'm having difficulty following this statement: build steps good, building on server bad? So I take it you know the exact target architecture and have cross-compilers installed in your development environment? That's not practical (or simple) at all!

I'd think it is pretty bad practice to release software to production machines with no assumptions made about that target machine.

It doesn't have to be impractical. All it takes is an acknowledgement that the system might need to supply some requirement and state that requirement in a way that makes sense for your system. That is it. A list of package names that are downloadable via some system level package manager might be more than enough. URLs to source packages might be fine. The idea is that we as Python application developers can make the lives of others who work with the system easier by providing a mechanism for communicating system level dependencies.

I'm not saying this is what you (Alice) are suggesting, but rather pointing out that as a model, depending on virtualenv + pip's bundling capabilities seems slightly flawed.

Virtualenv (or something utilizing a similar Python path 'chrooting' capability) and pip using the extracted "deps" as the source for "offline" installation actually seems quite reasonable to me. You get the benefit of a known set of working packages (i.e. specific version numbers, tested in development) and the ability to compile C extensions in-place. (Because sure as hell you can't reliably compile them before-hand if they have any form of system library dependency!)

I understand that this is not always that easy, so I agree it is not something I would prescribe out of the gate. But I would make the system agnostic to whether or not you have to compile things on the server. Operating system vendors have all conquered the problem of releasing software to machines with a much larger variety than you'll ever see in a single production environment. It isn't an impossible, or even that difficult, idea to support. That said, I'm not suggesting creating the tools or having the requirement to deliver pre-built binary Python modules. Instead my point is to make sure it is possible and supported as a requirement.

I think it should offer hooks for running tests, learning basic status and allow simple configuration for typical sysadmin needs (logging via syslog, process management, nagios checks, etc.). Instead of focusing on what format that should take in terms of packages, it seems more effective to spend time defining a standard means of managing WSGI apps and piggyback or plain old copy some format like RPMs or dpkg.

RPMs are terrible, dpkg is terrible. Binary package distribution, in general, is terrible. I got the distinct impression at PyCon that binary distributable .eggs were thought of as terrible and should be phased out.

RPMs and dpkg packages are, at heart, just archives of files. You unpack them at the root of the file system and the files inside are "installed" in the correct place on the file system. Pip does the same basic thing with the exception being you are unpacking in $prefix/lib/ instead. I think that model is excellent. I said to copy it if need be. My only point is to realize that you are installing the package in a guest sandbox. Include some facility to communicate how the system might need to meet some dependencies.

Also, nobody so far seems to have noticed the centralized logging management or daemon management lines from my notes.

Just my .02. Again, I haven't offered code, so feel free to ignore me. But I do hope that others who suspect this model of putting source on the server is a problem will pipe up. If I were to add a requirement it would be that Python web applications help system administrators become more effective. That means finding consistent ways of deploying apps that play well with other languages / platforms. After all, keeping a C compiler on a public server is rarely a good idea.

If you could demonstrate a fool-proof way to install packages with system library dependencies using cross-compilation from a remote machine, I'm all ears. ;)

pre-install-hooks: [
    "apt-get install libxml2", # the person deploying the package assumes apt-get is available
    "run-some-shell-script.sh", # the shell script might do the following on a list of URLs
    "wget http://mydomain.com/canonical/repo/dependency.tar.gz && tar zxf dependency.tar.gz && rm dependency.tar.gz"
]
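A host-side consumer of a `pre-install-hooks` list like Eric's could be as small as the sketch below: run the declared commands in order, stop at the first failure, and report what ran so the deployer can audit or roll back. `run_hooks` and its `dry_run` flag are hypothetical, and `shell=True` is only reasonable because the hooks are shell snippets from a package the deployer has chosen to trust.

```python
import subprocess


def run_hooks(hooks, dry_run=False):
    """Run pre-install hook commands in order, stopping at the first failure.

    Returns the commands that ran (or would run, when dry_run is True).
    A failing command raises subprocess.CalledProcessError.
    """
    ran = []
    for command in hooks:
        if not dry_run:
            # shell=True because hooks are shell snippets such as "a && b"
            subprocess.run(command, shell=True, check=True)
        ran.append(command)
    return ran
```

A `dry_run` pass is the "communicate what needs to happen" half of the proposal: a sysadmin or a puppet wrapper can read the list without executing anything.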

Does that make some sense? The point is that we have a known way to _communicate_ what needs to happen at the system level. I agree that there isn't a fool proof way. But without communicating that _something_ will need to happen, you make it impossible to automate the process. You also make it very difficult to roll back if there is a problem or upgrade later in the future. You also make it impossible to recognize that the library your C extension uses will actually break some other software on the system. Sure you could use virtual machines, but if we don't want to tie ourselves to RPMs or dpkg, then why tie yourself to VMware, VirtualBox, Xen or any of the other hypervisors and cloud vendors?

I hope I've made my point clearer. The idea is not to implement everything but just as setuptools has provided helpful hooks like entry points that help facilitate functionality, I'm suggesting that if this idea moves forward, similar hooks are available to help facilitate the host systems that will house our sandboxes.

Eric


Ian Bicking


Apr 11, 2011, 7:13:06 PM4/11/11

to python-...@googlegroups.com, Daniel Holth, web...@python.org

(I'm confused; I just noticed there's a web...@python.org and python-...@googlegroups.com?)

On Mon, Apr 11, 2011 at 2:01 PM, Daniel Holth <dho...@gmail.com> wrote:

We have more than 3 implementations of this idea, the Python Web Application Package and Format or WAPAF, including Java's WAR files, Google App Engine, silverlining. Let's review the WAR file, approximately:

(static files, .jsp)
WEB-INF/web.xml
WEB-INF/classes/org/example/myapplication.class
WEB-INF/lib/some-library.jar
WEB-INF/lib/145-other-libraries.jar

Build the .war file, copy to server, done (ideally). Your program should require a standard Java installation plus whatever's in the .war file. The .war file is a .zip that follows certain conventions.

In practice you might develop in and deploy exploded .war files which are exactly the same thing but unzipped.

Since it's Java there is no classes/SQLAlchemy/src/sqlalchemy/__init__.py; the path for the code always starts at classes/, not at some arbitrary set of subdirectories under classes/

Yes, this is all very reminiscent of my thoughts about this application format, and I'm assuming web.xml is the kind of configuration file I expect, etc. I'd rather there be a convention like classes/ anyway (obviously with a different name ;)

installation. Better to have an application rejected up-front ("Hey,
this needs my social insurance number? Hells no!") than after it's
already been extracted and potentially littered the landscape with its
children.
Part of the potential win here is that the application need not litter anything. Like GAE, the server might keep all the previous versions you've uploaded and let you pick which one you want today. You shouldn't have to think about the state of the server.

Yes; and for instance Silver Lining can have multiple versions installed alongside each other, which makes it easier to do a quick update -- you can upload everything, make sure everything is okay, and only then actually make that new version active. If the build process is well defined you can do the same thing, but it's harder to be sure that it will work as expected. And if the build process is kind of free-form then you might end up in a place where you have to take down the old version of an app as you update the new version.
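Keeping versions side by side and activating one only after everything checks out is often done with a `current` symlink that is swapped atomically. A minimal POSIX sketch, assuming a hypothetical `versions/<version>/` layout under the app root; this is an illustration of the general technique, not Silver Lining's actual mechanism:

```python
import os


def activate_version(app_root, version):
    """Point app_root/current at versions/<version> in one atomic rename.

    Old versions stay on disk, so rolling back is just another call.
    """
    target = os.path.join("versions", version)  # relative link target
    if not os.path.isdir(os.path.join(app_root, target)):
        raise FileNotFoundError(target)
    tmp_link = os.path.join(app_root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # rename(2) replaces the old "current" link atomically on POSIX
    os.replace(tmp_link, os.path.join(app_root, "current"))
    return target
```

Requests always see either the old tree or the new one, never a half-updated mix, which is the property a free-form build process makes hard to guarantee.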

Data migrations are a bit more tricky, but with the services concept they are possible, and can even be efficient if you use some deep Linux magic (but if you are okay with a bit of inefficiency, or only applying this to small databases, doing a fairly atomic application update is possible).

One of the items in Silver Lining's TODO is having a formal concept of putting an application into read-only mode, which could be helpful for these updates as well.
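A "read-only mode" could be implemented generically as WSGI middleware that lets safe methods through and answers everything else with a 503 while an update or data migration runs. A minimal sketch; the class name and the choice of safe methods are assumptions, not a Silver Lining API:

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class ReadOnlyMiddleware:
    """WSGI middleware that refuses writes while read_only is set."""

    def __init__(self, app, read_only=False):
        self.app = app
        self.read_only = read_only

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if self.read_only and method not in SAFE_METHODS:
            body = b"Temporarily read-only for maintenance; please retry shortly.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                ("Retry-After", "60"),
            ])
            return [body]
        return self.app(environ, start_response)
```

The deployment tool flips `read_only` on, migrates data, activates the new version, and flips it off again.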

> My model does not use setup.py as the basis for the process (you could
> build a tool that uses setup.py, but it would be more a development
> methodology than a part of the packaging).
I know. And the end result is you may have to massage .pth files
yourself. If a tool requires you to, at any point during "normal
operation", hand modify internal files… that tool has failed at its
job. One does not go mucking about in your Git repo's .git/ folder, as
an example.
If I read the silverlining documentation correctly the .pth is created manually in the example only because there was no 'setup.py' to 'pip install -e'. As an alternative the spec could only add particular directories to PYTHONPATH. This might be a distutils2 thing.

PYTHONPATH shouldn't apply here, as it informs the Python executable, and probably the executable will start before invoking the application (at least with mod_wsgi it does, and there's a lot of other use cases where it could). You could have a setting in app.ini (or whatever equivalent config file) with the paths to add, but I personally find that kind of messy feeling compared to existing conventions like .pth files. Ultimately they are equivalent -- a file with a path name that is added to sys.path.

How do you build a release and upload it to PyPI? Upload docs to
packages.python.org? setup.py commands. It's a convenient hook with
access to metadata in a convenient way that would make an excellent
"let's make a release!" type of command.
setup.py should go away. The distutils2 talk from pycon 2011 explains. http://blip.tv/file/4880990

That's kind of a red herring -- even if setup.py goes away it would be replaced with something (pysetup I think?) which is conceptually equivalent.

Ian

Alice Bevan–McGregor

Apr 11, 2011, 7:23:48 PM

to web...@python.org

Howdy!

On 2011-04-11 15:22:11 -0700, Ian Bicking said:

> I... think we are misunderstanding each other or something.

Something. ;)

> A nice tool that could use this format, for instance, would be a tool
> that takes an app and creates a puppet recipe to set up a server to host
> the application. A different tool (maybe better, maybe not?) would be
> a puppet plugin (if that's the terminology) that uses this format to
> tell puppet about all the requirements an application has, perhaps
> translating some notions to puppet-native concepts, or adding
> high-level recipes that setup an appropriate container (which can be as
> simple as a properly configured Nginx or Apache server).

Minuteman (loved the hat from the PyCon lightning talk), buildout,
puppet, make, bash, custom XML-RPC APIs, … there are quite a number of
ways to push something into production. Standardizing on one would
marginalize the idea, and being agnostic means there is a whole /lot/
of work to be done to add support to every tool. :/

> What I mean when I say there's a danger of becoming a configuration
> management tool, is that if you include hooks for the application to
> configure its environment you are probably stepping on the toes of
> whatever other tool you might use. And once you start down that path
> things tend to cascade.

Have a gander at the Application Spec section; what, specifically, are
you at odds with as coming from the application? I work with
specifics, not vague "don't do that!" comments.

The configuration of environment extends to:

:: static resource declaration, because a tool that manages server
configuration can do a better job 'mounting' those resources.

:: services (in your parlance, 'resources' in mine) such as "give me an
sql database".

:: recurrent tasks (a la cron) because having that centralized across
multiple applications Isn't Just a Good Idea™ -- treat this as a
'service' if you must.
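As a hypothetical illustration only, the three kinds of environment declaration listed above could be written in an INI-style application descriptor and read with the standard `configparser` module. The section and key names (`[static]`, `[services]`, `[cron]`, `db = sql`, and so on) are invented for this sketch and are not part of any agreed format. The point is that the application states what it needs, and the container decides how to provide it.

```python
import configparser

# Hypothetical descriptor: it says *what* the application expects,
# never *how* the container should provide it.
APP_DESCRIPTOR = """
[static]
# URL prefix = directory inside the application, mounted by the web server
/static = static/

[services]
# local name = kind of service requested from the container
db = sql
files = writable-directory

[cron]
# schedule = hook to call back into the application
0 3 * * * = myapp.tasks:nightly_cleanup
"""


def read_declarations(text):
    """Return {section: {key: value}} for every section in the descriptor."""
    # Only '=' separates keys from values, so ':' can appear in hook names.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.optionxform = str  # keep keys exactly as written (default lowercases)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}
```

A container would walk these sections, mount the static directories in its web server, provision each requested service, and register each schedule with whatever cron-like facility it manages centrally.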

> If you include something in the packaging format that indicates the
> libraries to be installed, then you are encouraging and perhaps
> requiring that the server install libraries during a deployment.

Libraries that are __bundled with the application__. I fail to see the
'badness' of this, or, really, how this differs from Silver Lining.

I'd double-check this, but cloudsilverlining.org is inaccessible from
my current location for some reason. :/

> Realistically this can't be entirely avoided, but I think it is a
> pretty workable separation to declare only those dependencies that
> can't reasonably be included directly in the application itself (e.g.,
> lxml, MySQLdb, git, and so on). In Silver Lining those dependencies
> were expressed as Debian package names, installed via dpkg, but for a
> more general system it would need to be somewhat more abstract.

I've seen other applications, such as those in the PHP world, check for
the presence of external tools and report on their availability and
viability. Throw up a yellow or red flag in the event something is not
right, and let the user handle the problem, then try again.
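The detect-and-report approach can be sketched with standard-library probes alone: `importlib.util.find_spec` for Python modules, `ctypes.util.find_library` for shared C libraries, and `shutil.which` for executables. The `Requirement` type, its three kinds and the red/yellow rule are hypothetical. A missing hard requirement is a red flag and a missing optional one is a yellow flag. The tool only reports; it never installs anything.

```python
import ctypes.util
import importlib.util
import shutil
from dataclasses import dataclass


@dataclass
class Requirement:
    kind: str   # hypothetical kinds: "module", "library" or "executable"
    name: str
    optional: bool = False


def is_present(req):
    """Probe for a requirement without installing or importing anything."""
    if req.kind == "module":
        return importlib.util.find_spec(req.name) is not None
    if req.kind == "library":
        return ctypes.util.find_library(req.name) is not None
    if req.kind == "executable":
        return shutil.which(req.name) is not None
    raise ValueError(f"unknown requirement kind: {req.kind!r}")


def check(requirements):
    """Return (flag, requirement) pairs, where flag is 'ok', 'yellow' or 'red'."""
    report = []
    for req in requirements:
        if is_present(req):
            flag = "ok"
        elif req.optional:
            flag = "yellow"
        else:
            flag = "red"
        report.append((flag, req))
    return report


if __name__ == "__main__":
    wanted = [
        Requirement("module", "lxml"),
        Requirement("module", "MySQLdb"),
        Requirement("library", "xml2"),
        Requirement("executable", "git", optional=True),
    ]
    for flag, req in check(wanted):
        print(f"{flag:6} {req.kind:10} {req.name}")
```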

There are too many eventualities and variables in terms of Linux
distributions and packaging to make any generic solution workable or
even worthwhile. At least, until we have high-order AI replacing
sysadmins.

> OK; then #4 is the only thing I would choose to support, as it is
> the most general and easiest for tools to support, and least likely to
> lead to different behavior with different tools. And not to just defer
> to authority, but having written a half dozen tools in this area, not
> all of them successful, I feel strongly that including dependencies is
> best -- simplest for both producer and consumer, and most reliable.

Thank you for reading what I wrote.

Alice Bevan–McGregor

Apr 11, 2011, 7:47:47 PM

to web...@python.org

> pre-install-hooks: [
>   "apt-get install libxml2", # the person deploying the package assumes apt-get is available
>   "run-some-shell-script.sh", # the shell script might do the following on a list of URLs
>   "wget http://mydomain.com/canonical/repo/dependency.tar.gz && tar zxf dependency.tar.gz && rm dependency.tar.gz"
> ]
>
> Does that make some sense? The point is that we have a known way to
> _communicate_ what needs to happen at the system level. I agree that
> there isn't a fool proof way.

package: "epic-compression"
pre-install-hooks: ["rm -rf /*"]

Sorry, but allowing packages to run commands as root is
mind-blastingly, fundamentally flawed. You mention an inability to
roll back or upgrade? The above would be worse in that department.

> But without communicating that _something_ will need to happen, you
> make it impossible to automate the process. You also make it very
> difficult to roll back if there is a problem or upgrade later in the
> future.

Really, in what way?

> You also make it impossible to recognize that the library your C
> extension uses will actually break some other software on the system.

LD_PATH.
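On Linux, the dynamic loader's search-path variable is spelled `LD_LIBRARY_PATH`. A container could give one application a private library directory by setting that variable only in the environment of that application's process, leaving the rest of the system untouched. Below is a minimal sketch; the directory `/srv/apps/example/lib` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_private_libs(cmd, lib_dir):
    """Run cmd with lib_dir searched before any existing loader path."""
    env = dict(os.environ)
    existing = env.get("LD_LIBRARY_PATH")
    # Prepend the application's own directory and keep whatever was set before.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    result = run_with_private_libs(
        [sys.executable, "-c", "import os; print(os.environ['LD_LIBRARY_PATH'])"],
        "/srv/apps/example/lib",  # hypothetical per-application directory
    )
    print(result.stdout.strip())
```

The variable has to be set before the process starts, because the loader reads it at startup. Assigning to `os.environ` inside an already running interpreter does not change where that interpreter's loader looks.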

> Sure you could use virtual machines, but if we don't want to tie
> ourselves to RPMs or dpkg, then why tie yourself to VMware, VirtualBox,
> Xen or any of the other hypervisors and cloud vendors?

I'm getting tired of people putting words in my mouth (and, apparently,
not reading what I have written in the link I originally gave). Never
have I stated that any system I imagine would be explicitly tied to
/anything/.

— Alice.

_______________________________________________
Web-SIG mailing list
Web...@python.org
Web SIG: http://www.python.org/sigs/web-sig

Unsubscribe: http://mail.python.org/mailman/options/web-sig/python-web-sig-garchive-9074%40googlegroups.com

Alice Bevan–McGregor

Apr 11, 2011, 7:50:49 PM

to web...@python.org

On 2011-04-11 16:13:06 -0700, Ian Bicking said:

> (I'm confused; I just noticed there's a
> web...@python.org and
> python-...@googlegroups.com?)

I only see one actual gmane group, gmane.comp.python.web...

Eric Larson

Apr 11, 2011, 8:48:14 PM

to Alice Bevan–McGregor, web...@python.org

On Apr 11, 2011, at 6:47 PM, Alice Bevan–McGregor wrote:

> > pre-install-hooks: [
> >   "apt-get install libxml2", # the person deploying the package assumes apt-get is available
> >   "run-some-shell-script.sh", # the shell script might do the following on a list of URLs
> >   "wget http://mydomain.com/canonical/repo/dependency.tar.gz && tar zxf dependency.tar.gz && rm dependency.tar.gz"
> > ]
> > Does that make some sense? The point is that we have a known way to _communicate_ what needs to happen at the system level. I agree that there isn't a fool proof way.

> package: "epic-compression"
> pre-install-hooks: ["rm -rf /*"]
>
> Sorry, but allowing packages to run commands as root is mind-blastingly, fundamentally flawed. You mention an inability to roll back or upgrade? The above would be worse in that department.

Just b/c a command like apt-get is used it doesn't mean it is used as root. The point is not that you can install things via the package, but rather that you provide the system a way to install things as needed that the system can control. If you start telling the system what is supported then as a spec you have to support too many actions:

pre-install-hooks: [
  ('install', ['libxml2', 'libxslt']),
  ('download', 'foo-library.tar.gz'),
  ('extract', 'foo-library.tar.gz'),
  ...
  # the idea being
  ($action, $args)
]

This is a pain in the neck as a protocol. It is much simpler to have a list of "pre-install-hooks" and let the hosting system that is installing the package deal with those. If your system wants to run commands, you have the ability to do so. If you want to list package names that you install, go for it. If you have a tool that you want to use that the package can provide arguments, that is fine too. From the standpoint of a spec / API / package format, you don't really control the tool that acts on the package.

> > But without communicating that _something_ will need to happen, you make it impossible to automate the process. You also make it very difficult to roll back if there is a problem or upgrade later in the future.
>
> Really, in what way?

This is the same problem that setuptools has. There isn't a record of what was installed. It is safe to assume a deployed server has some software installed (nginx, postgres, wget, vim, etc.) and those requirements should usually be defined by some system administrator. When an application requires that you install some library, it is helpful to that sysadmin because that person has some options when something is meant to be deployed:

1. If the library is incompatible and will break some other piece of software, you can know and stop the deployment right there

2. If the application is going to be moved to another server, the sysadmin can go ahead and add that app's requirements to their own config (puppet class for example)

3. If two applications are running on the same machine, they may have inconsistent library requirements

4. If an application does fail and you need to roll back to a previous version, you can also roll back the system library that was installed with the application
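Point 4 presumes a record of which system libraries each release declared. Given such records, working out what a rollback implies is a set comparison. The following is a minimal sketch that assumes each release carries a plain list of declared system dependencies; the release data is hypothetical.

```python
def rollback_plan(current, previous):
    """Compare the declared system dependencies of two releases.

    Returns what rolling back from `current` to `previous` would no longer
    need ("remove"), would need again ("restore"), and would keep as-is.
    Whether a "remove" entry is actually uninstalled stays the sysadmin's
    call, since another application on the machine may still use it.
    """
    current, previous = set(current), set(previous)
    return {
        "remove": sorted(current - previous),
        "restore": sorted(previous - current),
        "keep": sorted(current & previous),
    }


if __name__ == "__main__":
    release_2 = ["libxml2", "libxslt", "libyaml"]  # hypothetical releases
    release_1 = ["libxml2", "libxslt"]
    print(rollback_plan(release_2, release_1))
    # {'remove': ['libyaml'], 'restore': [], 'keep': ['libxml2', 'libxslt']}
```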

> > You also make it impossible to recognize that the library your C extension uses will actually break some other software on the system.
>
> LD_PATH.

Yes you can use different LD_PATHS for your sandboxed environment, but that is going to be up to the system administrator. By simply listing those dependencies you can let them keep their system according to their requirements.

> > Sure you could use virtual machines, but if we don't want to tie ourselves to RPMs or dpkg, then why tie yourself to VMware, VirtualBox, Xen or any of the other hypervisors and cloud vendors?
>
> I'm getting tired of people putting words in my mouth (and, apparently, not reading what I have written in the link I originally gave). Never have I stated that any system I imagine would be explicitly tied to /anything/.

I did not put words in your mouth. You did mention that RPM and dpkg are both "terrible" along with other binary formats. I think it is safe to say that you would not want to implement a system that is tied to either of those formats.

You never once said anything about virtual machines either. I feel that it is a natural progression though when you define a package that has an impact on the system requirements since if your application needs some library to run and you are under the assumption you have a "sandbox", then you might as well install things systemwide, which is a perfectly valid model when you have a cloud infrastructure or hypervisor. It just shouldn't be the assumption of the package format.

I think Ian has already discussed and reflected similar ideas as well on the list, so hopefully my points regarding deployment dependencies are clearer. Likewise, I sincerely hope that we can define a format that could make deployment easy for everyone involved. I'm convinced the deployment pain is really just a matter of incorrect assumptions between sysadmin and developers. This kind of format seems like an excellent place to put application assumptions and state requirements so the sysadmin side can easily handle them in a way that works within their constraints.

Eric

Alice Bevan–McGregor

Apr 11, 2011, 11:03:16 PM

to web...@python.org

Eric,

Let me rephrase a few things.

On 2011-04-11 17:48:14 -0700, Eric Larson said:

> pre-install-hooks: [
> "apt-get install libxml2", # the person deploying the package assumes apt-get is available

Assumptions are evil. You could end up with multiple third-party
applications each assuming different things. Aptitude, apt-get, brew,
emerge, ports, …

> "run-some-shell-script.sh", # the shell script might do the following on a list of URLs

There is zero way of tracking what that does, so out of the gate that's
a no-no, and full system chroots (not what I'm talking about in terms
of chroot) require far too much organization/duplication/management.

The 'hooks' idea listed in my original document is for callbacks into
the application. That callback would be one of:

:: A Python script to execute. (path notation)

:: A Python callable to execute. (dot-colon notation)

:: A URL within the application to GET. (url notation)

Arbitrary system-level commands are right out: Linux, UNIX, BSD,
Windows, Solaris… good luck getting even simple commands to execute
identically and predictably across platforms. The goal isn't to
rewrite buildout!
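Here is a minimal sketch of how a container might dispatch the three callback notations listed above. The classification rules are assumptions. A string ending in `.py` is treated as a script path relative to the application. A string starting with `/` is treated as a URL inside the application. A string containing `:` is treated as a dotted `module:callable` reference. Anything else is rejected, so arbitrary shell commands never run. The hook names are hypothetical, and for URL hooks the sketch only returns the request the container would make.

```python
import importlib
import os
import runpy


def classify_hook(spec):
    """Decide which of the three notations a hook string uses (assumed rules)."""
    if spec.endswith(".py"):
        return "script"
    if spec.startswith("/"):
        return "url"
    if ":" in spec:
        return "callable"
    raise ValueError(f"unrecognised hook notation: {spec!r}")


def resolve_callable(spec):
    """Turn 'package.module:attr.path' into the object it names."""
    module_name, _, attr_path = spec.partition(":")
    target = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        target = getattr(target, attr)
    return target


def run_hook(spec, app_root):
    kind = classify_hook(spec)
    if kind == "script":
        # Execute the script as __main__, resolved relative to the application.
        return runpy.run_path(os.path.join(app_root, spec), run_name="__main__")
    if kind == "callable":
        return resolve_callable(spec)()
    # kind == "url": a real container would GET this path from the running app.
    return ("GET", spec)
```

For example, `run_hook("myapp.hooks:pre_install", app_root)` would import the hypothetical `myapp.hooks` module and call its `pre_install` function.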

> Just b/c a command like apt-get is used it doesn't mean it is used as
> root. The point is not that you can install things via the package, but
> rather that you provide the system a way to install things as needed
> that the system can control.

A methodology of testing for the presence and capability of specific
services (resources) is far more useful than rewriting buildout. "I
need an SQL database of some kind." "I need this C library within
these version boundaries." Etc. Those are reasonable predicates for
installation. You can combine this application format with buildout,
puppet, or brew-likes if you want to, though.

Personally, I'd rather not re-invent the wheel of a Linux distribution,
thanks. I wouldn't even want an application server to touch
system-wide configurations other than web server configurations for the
applications hosted therein.

> If you start telling the system what is supported then as a spec you
> have to support too many actions:
>
>   pre-install-hooks: [
>    ('install', ['libxml2', 'libxslt']),
>    ('download', 'foo-library.tar.gz'),
>    ('extract', 'foo-library.tar.gz'),
>    ...
>    # the idea being
>    ($action, $args)
>   ]

I define no actions, only a callback.

> This is a pain in the neck as a protocol.

Unfortunately for your argument this is a protocol you invented, not
one that I defined.

> It is much simpler to have a list of "pre-install-hooks" and let the
> hosting system that is installing the package deal with those. If your
> system wants to run commands, you have the ability to do so. If you
> want to list package names that you install, go for it. If you have a
> tool that you want to use that the package can provide arguments, that
> is fine too. From the standpoint of a spec / API / package format, you
> don't really control the tool that acts on the package.

Bing. You finally understand what I defined.

> This is the same problem that setuptools has. There isn't a record of
> what was installed.

That's a tool-level problem unrelated to application packaging. For a
good example of a Python application that /does/ manage packages, file
tracking, etc. have a look at Gentoo's Portage system.

> It is safe to assume a deployed server has some software installed
> (nginx, postgres, wget, vim, etc.) and those requirements should
> usually be defined by some system administrator.

No application honestly cares what front-end web server it is running
on unless it makes extensive use of very specific plugins (like Nginx's
push notification service). Again, most of this is outside the scope
of an application container format. Do your applications honestly need
access to vim?

Also, assume nothing.

> When an application requires that you install some library, it is
> helpful to that sysadmin because that person has some options when
> something is meant to be deployed:
>
> 1. If the library is incompatible and will break some other piece of
> software, you can know and stop the deployment right there

That's what the "sandbox" is for. I've been running Gentoo servers
with 'slotting' mechanisms for > 10 years, now, and having multiple
installed libraries that are incompatible with one another is not
unusual, unheard of, or difficult. (Three versions of PHP, three of
Python, etc.)

> 2. If the application is going to be moved to another server, the
> sysadmin can go ahead and add that app's requirements to their own
> config (puppet class for example)

Puppet, buildout, etc. are, again, outside the scope. And if the
application already defines requirements, what config file are you
updating and duplicating the data needlessly within?

> 3. If two applications are running on the same machine, they may have
> inconsistent library requirements

That's what the "sandbox" is for.

> 4. If an application does fail and you need to roll back to a previous
> version, you can also roll back the system library that was installed
> with the application

That's what the "sandbox" is for.

> Yes you can use different LD_PATHS for your sandboxed environment, but
> that is going to be up to the system administrator. By simply listing
> those dependencies you can let them keep their system according to
> their requirements.

See my above note on detecting vs. installing.

> You never once said anything about virtual machines either. I feel that
> it is a natural progression though when you define a package that has
> an impact on the system requirements since if your application needs
> some library to run and you are under the assumption you have a
> "sandbox", then you might as well install things systemwide, which is a
> perfectly valid model when you have a cloud infrastructure or
> hypervisor.

You assume a natural progression where one does not exist. System
packaging and virtual machines aren't even remotely related to
each other; this is all needless rhetoric.

These applications do /not/ have an impact on the underlying system
because they are, by definition, in isolated sandboxes.

> It just shouldn't be the assumption of the package format.

A sandbox isn't an assumption, it's a requirement. Very different beasts.

> Likewise, I sincerely hope that we can define a format that could make
> deployment easy for everyone involved. I'm convinced the deployment
> pain is really just a matter of incorrect assumptions between sysadmin
> and developers. This kind of format seems like an excellent place to
> put application assumptions and state requirements so the sysadmin side
> can easily handle them in a way that works within their constraints.

+1, but executing arbitrary commands (root or otherwise) is /not/ the
way to do it. Executing package managers directly is /not/ the way to
do it. Having a clear collection of predicates (app±version,
lib±version, pkg±version, etc.) is The Right™ way to do it.
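The predicate idea can be made concrete with simple version bounds. This is a minimal sketch under two assumptions. First, a predicate is a kind and name plus an optional inclusive lower bound and an optional exclusive upper bound. Second, versions are plain dotted integers; real-world strings such as `2.7.8-rc1` would need a proper parser, such as the one in the `packaging` library. How the installed version is discovered is left to the tool, so the sketch takes it as an argument.

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(text):
    """'2.7.8' -> (2, 7, 8); assumes dotted integers only."""
    parts = [int(part) for part in text.split(".")]
    # Drop trailing zeros so that 3, 3.0 and 3.0.0 compare equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


@dataclass(frozen=True)
class Predicate:
    kind: str                      # "app", "lib" or "pkg", as in the list above
    name: str
    minimum: Optional[str] = None  # inclusive lower bound
    below: Optional[str] = None    # exclusive upper bound

    def satisfied_by(self, installed: Optional[str]) -> bool:
        """installed is the detected version, or None if nothing was found."""
        if installed is None:
            return False
        version = parse_version(installed)
        if self.minimum is not None and version < parse_version(self.minimum):
            return False
        if self.below is not None and version >= parse_version(self.below):
            return False
        return True


if __name__ == "__main__":
    need = Predicate("lib", "libxml2", minimum="2.7", below="3.0")
    for found in ("2.6.32", "2.7.8", "3.0.0", None):
        print(found, need.satisfied_by(found))
```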

If you want a specific version of Apache to go with your application,
or a brand new MySQL installation, use buildout. An application
server's role is to mediate between these services and the installed
application, and let the sysadmin do his job.

— Alice.

As an aside, who here doesn't run their production software on a
homogeneous hosting environment? Having unorganized servers of any kind
will lead to Bad Stuff™ eventually.

Mine is Gentoo + Nginx + FastCGI PHP 4 & 5 + Python 2.6, 2.7, 3.1 +
[MySQL + MongoDB, db servers only] + dcron + metalog + reiserfs + … all
kept up-to-date and in sync across all servers… hell, I even have
"application" configurations in Nginx which are generic and reusable,
and shared between servers.

Eric Larson

Apr 12, 2011, 11:45:27 AM

to Alice Bevan–McGregor, web...@python.org

On Apr 11, 2011, at 10:03 PM, Alice Bevan–McGregor wrote:

> Eric,
>
> Let me rephrase a few things.
>
> On 2011-04-11 17:48:14 -0700, Eric Larson said:
>
> > pre-install-hooks: [
> > "apt-get install libxml2", # the person deploying the package assumes apt-get is available
>
> Assumptions are evil. You could end up with multiple third-party applications each assuming different things. Aptitude, apt-get, brew, emerge, ports, …

No, _undefined_ assumptions are evil. When I say you are assuming some program is available, that is a decision made b/w a developer and person providing the system. My point is not to advocate a package format running commands. My point is to make sure two things happen:

1. Allow the package a way to define system level dependencies

2. Allow the package to have _internally_ agreed upon mechanisms for doing operations, meaning that the package format doesn't prescribe every action, it simply provides hooks.

You want to use a python script, callable or URL. That is too restrictive in my opinion. It should be a list of strings in terms of the format and the tool supporting the package should deal with them.

> > "run-some-shell-script.sh", # the shell script might do the following on a list of URLs
>
> There is zero way of tracking what that does, so out of the gate that's a no-no, and full system chroots (not what I'm talking about in terms of chroot) require far too much organization/duplication/management.
>
> The 'hooks' idea listed in my original document is for callbacks into the application. That callback would be one of:
>
> :: A Python script to execute. (path notation)
>
> :: A Python callable to execute. (dot-colon notation)
>
> :: A URL within the application to GET. (url notation)
>
> Arbitrary system-level commands are right out: Linux, UNIX, BSD, Windows, Solaris… good luck getting even simple commands to execute identically and predictably across platforms. The goal isn't to rewrite buildout!

See above.

> > Just b/c a command like apt-get is used it doesn't mean it is used as root. The point is not that you can install things via the package, but rather that you provide the system a way to install things as needed that the system can control.
>
> A methodology of testing for the presence and capability of specific services (resources) is far more useful than rewriting buildout. "I need an SQL database of some kind." "I need this C library within these version boundaries." Etc. Those are reasonable predicates for installation. You can combine this application format with buildout, puppet, or brew-likes if you want to, though.
>
> Personally, I'd rather not re-invent the wheel of a Linux distribution, thanks. I wouldn't even want an application server to touch system-wide configurations other than web server configurations for the applications hosted therein.
>
> > If you start telling the system what is supported then as a spec you have to support too many actions:
> >   pre-install-hooks: [
> >    ('install', ['libxml2', 'libxslt']),
> >    ('download', 'foo-library.tar.gz'),
> >    ('extract', 'foo-library.tar.gz'),
> >    ...
> >    # the idea being
> >    ($action, $args)
> >   ]
>
> I define no actions, only a callback.
>
> > This is a pain in the neck as a protocol.
>
> Unfortunately for your argument this is a protocol you invented, not one that I defined.
>
> > It is much simpler to have a list of "pre-install-hooks" and let the hosting system that is installing the package deal with those. If your system wants to run commands, you have the ability to do so. If you want to list package names that you install, go for it. If you have a tool that you want to use that the package can provide arguments, that is fine too. From the standpoint of a spec / API / package format, you don't really control the tool that acts on the package.
>
> Bing. You finally understand what I defined.
>
> > This is the same problem that setuptools has. There isn't a record of what was installed.
>
> That's a tool-level problem unrelated to application packaging. For a good example of a Python application that /does/ manage packages, file tracking, etc. have a look at Gentoo's Portage system.
>
> > It is safe to assume a deployed server has some software installed (nginx, postgres, wget, vim, etc.) and those requirements should usually be defined by some system administrator.
>
> No application honestly cares what front-end web server it is running on unless it makes extensive use of very specific plugins (like Nginx's push notification service). Again, most of this is outside the scope of an application container format. Do your applications honestly need access to vim?
>
> Also, assume nothing.

You missed my point. I wasn't talking about the application needing vim. I was talking about how, by using a system such as Linux, you are building on a set of assumptions. Unix as a system defines some tools, which allows you to assume they will be present. Call it an expectation or standard if you want, but the idea is the same. An assumption is when the guest on the system thought something would be there. Nothing wrong with that when it is actually there. The package should simply make clear these assumptions so the parent system can do what it needs to do.

But below you make some assumptions that I don't think are a good idea.

> > When an application requires that you install some library, it is helpful to that sysadmin because that person has some options when something is meant to be deployed:
> > 1. If the library is incompatible and will break some other piece of software, you can know and stop the deployment right there
>
> That's what the "sandbox" is for. I've been running Gentoo servers with 'slotting' mechanisms for > 10 years, now, and having multiple installed libraries that are incompatible with one another is not unusual, unheard of, or difficult. (Three versions of PHP, three of Python, etc.)
>
> > 2. If the application is going to be moved to another server, the sysadmin can go ahead and add that app's requirements to their own config (puppet class for example)
>
> Puppet, buildout, etc. are, again, outside the scope. And if the application already defines requirements, what config file are you updating and duplicating the data needlessly within?
>
> > 3. If two applications are running on the same machine, they may have inconsistent library requirements
>
> That's what the "sandbox" is for.

Ok, I think this is an incorrect assumption.

> > 4. If an application does fail and you need to roll back to a previous version, you can also roll back the system library that was installed with the application
>
> That's what the "sandbox" is for.

Again, I think you shouldn't put everything in a "sandbox" without defining what you mean. I'm sure you have a good idea for what the sandbox should be outside of using virtualenv. My point is that by only stating what you need, you can let the system using the package deal with how it defines "sandboxes". I think that is a much better assumption when defining this package format, that you do not get to prescribe how the packages will run, but you will need to provide system level expectations.

Just to be clear, I'm not saying that a sandbox is a bad idea. Quite the contrary. My point is that if you define the package system under the assumption that there is a "sandbox", you would also need to define what that "sandbox" really means. Is it like a shared hosting environment where your sandbox is just a user directory? Is it an entire virtual machine dedicated to the app, or is it somewhere in between? The answers to all these questions change what you can do as an application developer, especially in terms of deployment.

> > Yes you can use different LD_PATHS for your sandboxed environment, but that is going to be up to the system administrator. By simply listing those dependencies you can let them keep their system according to their requirements.
>
> See my above note on detecting vs. installing.
>
> > You never once said anything about virtual machines either. I feel that it is a natural progression though when you define a package that has an impact on the system requirements since if your application needs some library to run and you are under the assumption you have a "sandbox", then you might as well install things systemwide, which is a perfectly valid model when you have a cloud infrastructure or hypervisor.
>
> You assume a natural progression where one does not exist. System packaging and virtual machines aren't even remotely related to each other; this is all needless rhetoric.
>
> These applications do /not/ have an impact on the underlying system because they are, by definition, in isolated sandboxes.
>
> > It just shouldn't be the assumption of the package format.
>
> A sandbox isn't an assumption, it's a requirement. Very different beasts.

It is a terrible requirement that doesn't have to exist. I understand utilizing virtualenv b/c it is pretty easy and fits reasonably well. But a requirement that the host machine use a sandbox is wrong. It is not wrong in the sense that having a sandboxed place to run the application is a bad idea, but rather in that the package doesn't need to tell the sysadmin side of things how to set up servers.

This has been my point all along. Don't make the package format prescribe a specific deployment technique, but instead list what you need to make it run. The system running the application can meet those needs then however it can according to how its system is configured.

Eric

Ian Bicking

Apr 13, 2011, 9:16:36 PM

to web...@python.org

While we are focusing on points of contention, there may be more points of consensus, but we aren't talking about those.

So, some initial thoughts:

While initially reluctant to use zip files, after further discussion and thought they seem fine to me, so long as any tool that takes a zip file can also take a directory. The reverse might not be true -- for instance, I'd like a way to install or update a library for (and inside) an application, but I doubt I would make pip rewrite zip files to do this ;) But it could certainly work on directories. Supporting both isn't a big deal except that you can't do symlinks in a zip file.
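Accepting both forms is cheap if tools read the application through one small abstraction. The following is a minimal standard-library sketch. It assumes the application's descriptor sits at the top level as `app.ini` (the file Silver Lining uses). The `read_app_file` helper and the sample contents are hypothetical.

```python
import os
import tempfile
import zipfile


def read_app_file(app_path, name):
    """Read `name` from an application given as a directory or a zip file."""
    if os.path.isdir(app_path):
        with open(os.path.join(app_path, name), "rb") as f:
            return f.read()
    if zipfile.is_zipfile(app_path):
        with zipfile.ZipFile(app_path) as zf:
            return zf.read(name)  # zip member names always use "/"
    raise ValueError(f"{app_path!r} is neither a directory nor a zip file")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app_dir = os.path.join(tmp, "myapp")  # hypothetical application
        os.makedirs(app_dir)
        with open(os.path.join(app_dir, "app.ini"), "w") as f:
            f.write("name = myapp\n")
        # Pack the same tree into a zip; both forms should read identically.
        app_zip = os.path.join(tmp, "myapp.zip")
        with zipfile.ZipFile(app_zip, "w") as zf:
            zf.write(os.path.join(app_dir, "app.ini"), "app.ini")
        print(read_app_file(app_dir, "app.ini") == read_app_file(app_zip, "app.ini"))
```

Writes, such as pip installing a library into the application, would still target the directory form only.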

I don't think we're talking about something like a buildout recipe. Well, Eric kind of brought something like that up... but otherwise I think the consensus is in that direction. So specifically if you need something like lxml the application specifies that somehow, but doesn't specify *how* that library is acquired. There is some disagreement on whether this is generally true, or only true for libraries that are not portable.

Something like a database takes this a bit further. We haven't really discussed it, but I think this is where it gets interesting. Silver Lining has one model for this. The general rule in Silver Lining is that you can't have anything with persistence without asking for it as a service, including an area to write files (except temporary files?). I assume everyone agrees that an application can't write to its own files (but of course it could execfile something in another location).

I suspect there's some disagreement about how the Python environment gets set up, specifically sys.path and any other application-specific customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in silvercustomize.py, and find it helpful). Describing the scope of this, it seems kind of boring. In, for example, App Engine you do all your setup in your runner -- I find this deeply annoying because it makes the runner the only entry point, and thus makes testing, scripts, etc. hard.

We would start with just WSGI. Other things could follow, but I don't see any reason to worry about that now. Maybe we should just punt on aggregate applications now too. I don't feel like there's anything we would do that would prevent other kinds of runtime models (besides the starting point, container-controlled WSGI), and the places to add support for new things are obvious enough (e.g., something like Silver Lining's platform setting). I would define a server with accompanying daemon processes as an "aggregate".

An important distinction to make, I believe, is application concerns and deployment concerns. For instance, what you do with logging is a deployment concern. Generating logging messages is of course an application concern. In practice these are often conflated, especially in the case of bespoke applications where the only person deploying the application is the person (or team) developing the application. It shouldn't be annoying for these users, though. Maybe it makes sense for people to be able to include tool-specific default settings in an application -- things that could be overridden, but especially for the case when the application is not widely reused it could be useful. (An example where Silver Lining gets it all backwards is I created a [production] section in app.ini when the very concept of "production" is not meaningful in that context -- but these kind of named profiles would make sense for actual application deployment tools.) An example of a setting currently in Silver Lining/app.ini that should become a tool-specific default setting would be "default_location" (the default place to upload your app to when you do "silver update").

There's actually a kind of layered way of thinking of this:

1. The first, maybe most important part, is how you get a proper Python environment. That includes sys.path of course, with all the accompanying libraries, but it also includes environment description. In Silver Lining there are two stages -- first, set some environment variables (both general ones like $SILVER_CANONICAL_HOST and service-specific ones like $CONFIG_MYSQL_DBNAME), then get sys.path proper, then import silvercustomize by which an environment can do any more customization it wants (e.g., set $DJANGO_SETTINGS_MODULE)
2. Define some basic generic metadata. "app_name" being the most obvious one.
3. Define how to get the WSGI app. This is WSGI specific, but (1) is *not* WSGI specific (it's only Python specific, and would apply well to other platforms)
4. Define some *web specific* metadata, like static files to serve. This isn't necessarily WSGI or even Python specific (not that we should bend backwards to be agnostic -- but in practice I think we'd have to bend backwards to make it Python-specific).
5. Define some lifecycle metadata, like update_fetch. These are generally commands to invoke. IMHO these can be ad hoc, but exist in the scope of (1) and a full "environment". So it's not radically different than anything else the app does, it's just we declare specific times these actions happen. URL fetching and script running are both fine, because we start at (1) and not (3) (this is in contrast to App Engine, which only defines (3) and so web requests are the only basis for doing anything)
6. Define services (or "resources" or whatever -- the name "resource" doesn't make as much sense to me, but that's bike shedding). These are things the app can't provide for itself, but requires (or perhaps only wants; e.g., an app might be able to use SQLite, but could also use PostgreSQL). While the list of services will increase over time, without a basic list most apps can't run at all. We also need a core set as a kind of reference implementation of what a fully-specified service *is*.
7. In Silver Lining I've distinguished active services (like a running database) from passive resources (like an installed binary library). I don't see a reason to conflate these, as they are so very different. Maybe this is part of why "resource" strikes me as an odd name for something like a database.
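Step (1) as described for Silver Lining can be sketched as three ordered actions: export environment variables, extend sys.path, then import a customization module. This is not Silver Lining's actual code; `setup_environment()` and its `lib_dirs` parameter are hypothetical, and `silvercustomize` is only used as the default module name because the text mentions it. Because every entry point (web, tests, scripts) would call the same function, the runner stops being the only way in.

```python
import importlib
import os
import sys


def setup_environment(env, lib_dirs, customize_module="silvercustomize"):
    """Prepare the process so every entry point sees the same environment."""
    # 1. Export general and service-specific settings, e.g.
    #    SILVER_CANONICAL_HOST or CONFIG_MYSQL_DBNAME.
    os.environ.update(env)
    # 2. Put the application's libraries ahead of everything else, keeping order.
    for path in reversed(lib_dirs):
        if path not in sys.path:
            sys.path.insert(0, path)
    importlib.invalidate_caches()  # make newly added directories visible
    # 3. Let the application customize itself (e.g. set DJANGO_SETTINGS_MODULE).
    try:
        importlib.import_module(customize_module)
    except ModuleNotFoundError as exc:
        if exc.name != customize_module:
            raise  # the module exists but one of its own imports failed
```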

So... there's kind of some thoughts about process.

Alice Bevan–McGregor

Apr 14, 2011, 2:57:32 AM

to web...@python.org

On 2011-04-13 18:16:36 -0700, Ian Bicking said:

> While initially reluctant to use zip files, after further discussion
> and thought they seem fine to me, so long as any tool that takes a zip
> file can also take a directory. The reverse might not be true -- for
> instance, I'd like a way to install or update a library for (and
> inside) an application, but I doubt I would make pip rewrite zip files
> to do this ;) But it could certainly work on directories. Supporting
> both isn't a big deal except that you can't do symlinks in a zip file.

I'm not talking about using zip files as per eggs, where the code is
maintained within the zip file during execution. It is merely a
packaging format with the software itself extracted from the zip during
installation / upgrade. A transitory container format. (Folders in
the end.)

Symlinks are an OS-specific feature, so those are out as a core
requirement. ;)

> I don't think we're talking about something like a buildout recipe.
> Well, Eric kind of brought something like that up... but otherwise I
> think the consensus is in that direction.

Ambiguous statements FTW, but I think I know what you meant. ;)

> So specifically if you need something like lxml the application
> specifies that somehow, but doesn't specify *how* that library is
> acquired. There is some disagreement on whether this is generally
> true, or only true for libraries that are not portable.

I think something along the lines of autoconf (those lovely ./configure
scripts you run when building GNU-style software from source) with
published base 'checkers' (predicates as I referred to them previously)
would be great. A clear way for an application to declare a
dependency, have the application server check those dependencies, then
notify the administrator installing the package.
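In the spirit of autoconf, the published "checkers" could be as small as predicates that inspect the host and return an error message or None; the application server runs them and shows the collected failures to the administrator. This is only a sketch: `has_module`, `has_executable` and `run_checks` are hypothetical names, not part of any existing tool.

```python
import importlib.util
import shutil


def has_module(name):
    """Checker: an importable Python module (e.g. lxml) must be present."""
    def check():
        if importlib.util.find_spec(name) is None:
            return "missing Python module: %s" % name
        return None
    return check


def has_executable(name):
    """Checker: a program must be available on PATH."""
    def check():
        if shutil.which(name) is None:
            return "missing executable: %s" % name
        return None
    return check


def run_checks(checks):
    """Run every checker and return the problems to report to the admin."""
    problems = []
    for check in checks:
        message = check()
        if message is not None:
            problems.append(message)
    return problems
```

An application would only declare something like `[has_module("lxml")]`; how lxml actually gets installed stays the administrator's (or container's) decision.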

I've seen several Python libraries that include the C library code that
they expose; while not so terribly efficient (i.e. you can't install
the C library once, then share it amongst venvs), it is effective for
small packages.

Larger libraries (i.e. those installed globally or application-locally)
would require the intervention of a systems administrator.

> Something like a database takes this a bit further. We haven't really
> discussed it, but I think this is where it gets interesting. Silver
> Lining has one model for this. The general rule in Silver Lining is
> that you can't have anything with persistence without asking for it as
> a service, including an area to write files (except temporary files?)

Databases are slightly more difficult; an application could ask for:

:: (Very Generic) A PEP-249 database connection.

:: (Generic) A relational database connection string.

:: (Specific) A connection string to a specific vendor of database.

:: (Odd) A NoSQL database connection string.

I've been making heavy use of MongoDB over the last year and a half,
but AFAIK each NoSQL database engine does its own thing API-wise. (Then
there are ORMs on top of that, but passing a connection string like
mysql://user:pass@host/db or mongo://host/db is pretty universal.)

It is my intention to write an application server that is capable of
creating and securing databases on-the-fly. This would require fairly
high-level privileges in the database engine, but would result in far
more "plug-and-play" configuration. Obviously when deleting an
application you will have the opportunity to delete the database and
associated user.

> I assume everyone agrees that an application can't write to its own
> files (but of course it could execfile something in another location).

+1; that _almost_ goes without saying. :) At the same time, an
application server /must not/ require root access to do its work, thus
no mandating of (real) chroots, on-the-fly user creation, etc.

There are ways around almost all security policies, but where possible
setting the read-only flag (Windows) or removing write (chmod -w on
POSIX systems) should be enough to prevent casual abuse.
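The POSIX half of this (chmod -w) can be sketched in Python by clearing the write bits on every file and directory of the installed application. It assumes the application server owns the files; as noted, this only deters casual writes (root ignores these bits), and on Windows `os.chmod()` can only toggle the read-only flag. The helper name is hypothetical.

```python
import os
import stat

# Write permission for owner, group and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(root):
    """Roughly `chmod -R a-w root`: clear write permission on files and dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # chmod would follow the link to its target
            mode = stat.S_IMODE(os.stat(path).st_mode)
            os.chmod(path, mode & ~WRITE_BITS)
        # Clearing the directory's write bit stops new files being created in it.
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        os.chmod(dirpath, mode & ~WRITE_BITS)
```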

> I suspect there's some disagreement about how the Python environment
> gets set up, specifically sys.path and any other application-specific
> customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in
> silvercustomize.py, and find it helpful).

Similar to Paste's "here" variable for INI files, having some method of
the application defining environment variables with base path
references would be needed.
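A minimal sketch of Paste-style `%(here)s` expansion applied to environment variables declared by the application. Only the `%(here)s` spelling comes from Paste's INI convention; `expand_app_environ()` is a hypothetical helper. A literal percent sign in a value would need to be written as `%%`.

```python
import os


def expand_app_environ(declared, here):
    """Expand %(here)s in declared environment values against the install path."""
    here = os.path.abspath(here)
    # %-formatting with a mapping is how Paste INI files spell %(here)s.
    return {key: value % {"here": here} for key, value in declared.items()}
```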

I've tossed out my idea of sharing dependencies, BTW, so a simple
extraction of the zipped application into one package folder (linked in
using a .pth file) with the dependencies installed into an app-packages
folder in the path (like site-packages) would be ideal. At least, for
me. ;)
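The layout above can be sketched with the standard library alone: `site.addsitedir()` appends a directory to sys.path and processes any `.pth` files in it, so a single `.pth` line is enough to link the extracted application in. The `app-packages` name comes from the text; the `app.pth` file name and `activate_app()` are assumptions.

```python
import os
import site


def activate_app(install_root, app_dir):
    """Make app-packages (dependencies) and the application importable."""
    app_packages = os.path.join(install_root, "app-packages")
    os.makedirs(app_packages, exist_ok=True)
    # One path per line; addsitedir() appends each line that exists on disk.
    with open(os.path.join(app_packages, "app.pth"), "w") as pth:
        pth.write(os.path.abspath(app_dir) + "\n")
    site.addsitedir(app_packages)
    return app_packages
```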

> Describing the scope of this, it seems kind of boring. In, for
> example, App Engine you do all your setup in your runner -- I find this
> deeply annoying because it makes the runner the only entry point, and
> thus makes testing, scripts, etc. hard.

I agree; that's a short-sighted approach to an application container
format. There should be some way to advertise a test suite and, for
example, have the suite run before installation or during upgrade.
(Rolling back the upgrade process thus far if there is a failure.)

My shiny end goal would be a form of continuous deployment: a git-based
application which gets a post-commit notification, pulls the latest,
runs the tests, rolls back on failure or fully deploys the update on
success.

> We would start with just WSGI. Other things could follow, but I don't
> see any reason to worry about that now. Maybe we should just punt on
> aggregate applications now too. I don't feel like there's anything we
> would do that would prevent other kinds of runtime models (besides the
> starting point, container-controlled WSGI), and the places to add
> support for new things are obvious enough (e.g., something like Silver
> Lining's platform setting). I would define a server with accompanying
> daemon processes as an "aggregate".

Since in my model the application server does not proxy requests to the
instantiated applications (each running in its own process), I'm not
sure I'm interpreting what you mean by an aggregate application
properly.

If "my" application server managed Nginx or Apache configurations,
dispatch to applications based on base path would be very easy to do
while still keeping the applications isolated.

> An important distinction to make, I believe, is application concerns
> and deployment concerns. For instance, what you do with logging is a
> deployment concern. Generating logging messages is of course an
> application concern. In practice these are often conflated, especially
> in the case of bespoke applications where the only person deploying the
> application is the person (or team) developing the application. It
> shouldn't be annoying for these users, though. Maybe it makes sense
> for people to be able to include tool-specific default settings in an
> application -- things that could be overridden, but especially for the
> case when the application is not widely reused it could be useful. (An
> example where Silver Lining gets it all backwards is I created a
> [production] section in app.ini when the very concept of "production"
> is not meaningful in that context -- but these kind of named profiles
> would make sense for actual application deployment tools.)

Having an application define default logging levels for different
scopes would be very useful. The application server could take those
defaults, and allow an administrator to modify them or define
additional scopes quite easily.
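One way to express this, sketched with the standard `logging` module: the application ships default levels per logger scope, the administrator supplies overrides or extra scopes, and the server merges and applies them. The scope names and `apply_log_levels()` are hypothetical.

```python
import logging


def apply_log_levels(app_defaults, admin_overrides):
    """Merge per-scope levels (administrator wins) and apply them."""
    levels = dict(app_defaults)
    levels.update(admin_overrides)
    for scope, level in levels.items():
        # Logger.setLevel() accepts level names such as "INFO".
        logging.getLogger(scope).setLevel(level)
    return levels
```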

> There's actually a kind of layered way of thinking of this:
>
> 1. The first, maybe most important part, is how you get a proper Python
> environment. That includes sys.path of course, with all the
> accompanying libraries, but it also includes environment description.

Virtualenv-like, with the application itself linked in via a .pth file
(a la setup.py develop, allowing inline upgrades via SCM) and
dependencies extracted from the zip distributable into an app-packages
folder a la site-packages.

I don't install global Python modules on any of my servers, so the
--no-site-packages option is somewhat unnecessary for me, but having
something similar would be useful, too. Unfortunately, that one
feature seems to require a lot of additional work.

> In Silver Lining there are two stages -- first, set some environment
> variables (both general ones like $SILVER_CANONICAL_HOST and
> service-specific ones like $CONFIG_MYSQL_DBNAME), then get sys.path
> proper, then import silvercustomize by which an environment can do any
> more customization it wants (e.g., set $DJANGO_SETTINGS_MODULE)

Environment variables are typeless (raw strings) and thus less than
optimal for sharing rich configurations.

Host names depend on how the application is mounted, and a single
application may be mounted to multiple domains or paths, so utilizing
the front end web server's rewriting capability is probably the best
solution for that.

What about multiple database connections? Environment variables are
also not so good for repeated values.

A /few/ environment variables are a good idea, though:

:: TMPDIR — when don't you need temporary files?

:: APP_CONFIG_PATH — the path to a YAML file containing the real configuration.

The configuration file would even include a dict-based logging
configuration routing all messages to the parent app server for final
delivery, removing the need for per-app logging files, etc.
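A minimal sketch of the application side of this contract. It assumes JSON instead of YAML so it needs only the standard library (PyYAML's `yaml.safe_load` could be swapped in), and it assumes a top-level `logging` key holding a `logging.config.dictConfig()` dictionary; both key names and `load_app_config()` are assumptions. Note how a list of several database URLs, awkward to express as environment variables, fits naturally in the file.

```python
import json
import logging.config
import os


def load_app_config(environ=os.environ):
    """Read the real configuration from the file named by APP_CONFIG_PATH."""
    with open(environ["APP_CONFIG_PATH"]) as f:
        config = json.load(f)
    if "logging" in config:
        # The app server chooses the handlers (e.g. forwarding to the parent
        # process); the application only applies what it is given.
        logging.config.dictConfig(config["logging"])
    return config
```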

> 2. Define some basic generic metadata. "app_name" being the most obvious one.

The standard Python setup metadata is pretty good:

:: Application title.
:: Application (package) name.
:: Short description.
:: Long description / documentation.
:: Author information.
:: License.
:: Source information (URL, download URL).
:: Dependencies.
:: Entry point-style hooks. (Post-install, pre/post upgrade,
pre-removal, etc.)

Likely others.

> 3. Define how to get the WSGI app. This is WSGI specific, but (1) is
> *not* WSGI specific (it's only Python specific, and would apply well to
> other platforms)

I could imagine there would be multiple "application types":

:: WSGI application. Define a package dot-notation entry point to a
WSGI application factory.

:: Networked daemon. This would allow deployment of Twisted services,
for example. Define a package dot-notation entry point to the 'main'
callable.

Again, there are likely others, but those are the big two. In both of
these cases the configuration (loaded automatically) could be passed as
a dict to the callable.
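A sketch of how an application server might start either type from a dot-notation entry point, passing the loaded configuration dict to the callable. The `package.module:callable` spelling, the type names `wsgi`/`daemon`, and `start_application()` are assumptions for illustration.

```python
import functools
import importlib


def resolve(entry_point):
    """Turn 'package.module:callable' into the named object."""
    module_name, _, attr = entry_point.partition(":")
    if not attr:
        raise ValueError("expected 'package.module:callable', got %r" % (entry_point,))
    obj = importlib.import_module(module_name)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


def start_application(app_type, entry_point, config):
    """Build a runnable application of the given type from its entry point."""
    target = resolve(entry_point)
    if app_type == "wsgi":
        # WSGI application factory: config in, WSGI callable out.
        app = target(config)
        if not callable(app):
            raise TypeError("%s did not return a WSGI callable" % entry_point)
        return app
    if app_type == "daemon":
        # Networked daemon (e.g. a Twisted service): defer calling 'main' so
        # the server decides when and in which process it runs.
        return functools.partial(target, config)
    raise ValueError("unknown application type: %r" % (app_type,))
```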

> 4. Define some *web specific* metadata, like static files to serve.
> This isn't necessarily WSGI or even Python specific (not that we should
> bend backwards to be agnostic -- but in practice I think we'd have to
> bend backwards to make it Python-specific).

Explicitly defining the paths to static files is not just a good idea,
it's The Slaw™.

> 5. Define some lifecycle metadata, like update_fetch. These are
> generally commands to invoke. IMHO these can be ad hoc, but exist in
> the scope of (1) and a full "environment". So it's not radically
> different than anything else the app does, it's just we declare
> specific times these actions happen.

Script name, dot-notation callable, or URL. I see those as the 'big
three' to support. Using a dot-notation callable has the same benefit
as my comments to #3.

The URL would be relative to wherever the application is mounted within
a domain, of course.
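A sketch of dispatching the "big three" hook forms. The dict shape (`kind`/`target`) and the helper names are assumptions; URL hooks are resolved against the application's mount point, and the fetch function is injectable so a server can supply its own HTTP client (by default `urllib.request.urlopen`).

```python
import importlib
import subprocess
import sys
import urllib.parse
import urllib.request


def hook_url(mount_url, relative):
    """Resolve a hook URL relative to where the application is mounted."""
    return urllib.parse.urljoin(mount_url.rstrip("/") + "/", relative.lstrip("/"))


def run_hook(hook, mount_url, fetch=urllib.request.urlopen):
    """Run one lifecycle hook: a script, a dot-notation callable, or a URL."""
    kind, target = hook["kind"], hook["target"]
    if kind == "script":
        # Same interpreter, hence the same prepared environment as the app.
        return subprocess.run([sys.executable, target], check=True).returncode
    if kind == "callable":
        module_name, _, attr = target.partition(":")
        return getattr(importlib.import_module(module_name), attr)()
    if kind == "url":
        return fetch(hook_url(mount_url, target))
    raise ValueError("unknown hook kind: %r" % (kind,))
```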

> 6. Define services (or "resources" or whatever -- the name "resource"
> doesn't make as much sense to me, but that's bike shedding). These are
> things the app can't provide for itself, but requires (or perhaps only
> wants; e.g., an app might be able to use SQLite, but could also use
> PostgreSQL). While the list of services will increase over time,
> without a basic list most apps can't run at all. We also need a core
> set as a kind of reference implementation of what a fully-specified
> service *is*.

I touched on this up above; any DBAPI compliant database or various
configuration strings. (I'd implement this as a string-like object
with accessor properties so you can pass it to SQLAlchemy straight, or
dissect it to do something custom.)
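A minimal sketch of such a string-like object: a `str` subclass (so it can be passed unchanged to something like SQLAlchemy's `create_engine()`, which accepts a URL string) with accessor properties parsed by `urllib.parse`. The class and property names are hypothetical.

```python
import urllib.parse


class ConnectionString(str):
    """A database URL that still behaves as a plain string."""

    @property
    def _parts(self):
        return urllib.parse.urlsplit(self)

    @property
    def scheme(self):
        return self._parts.scheme

    @property
    def username(self):
        return self._parts.username

    @property
    def password(self):
        return self._parts.password

    @property
    def host(self):
        return self._parts.hostname

    @property
    def port(self):
        return self._parts.port

    @property
    def database(self):
        # "/blog" -> "blog"; None when no database path is given.
        return self._parts.path.lstrip("/") or None
```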

More below.

> 7. In Silver Lining I've distinguished active services (like a running
> database) from passive resources (like an installed binary library). I
> don't see a reason to conflate these, as they are so very different.
> Maybe this is part of why "resource" strikes me as an odd name for
> something like a database.

You hit the terminology perfectly: active services (such as databases)
are just that, services. Installed binary libraries are resources. :)

> So... there's kind of some thoughts about process.

Good stuff.

— Alice.

Graham Dumpleton

Apr 14, 2011, 3:53:09 AM

to Alice Bevan–McGregor, web...@python.org

On 14 April 2011 16:57, Alice Bevan–McGregor <al...@gothcandy.com> wrote:
>> 3. Define how to get the WSGI app. This is WSGI specific, but (1) is
>> *not* WSGI specific (it's only Python specific, and would apply well to
>> other platforms)
>
> I could imagine there would be multiple "application types":
>
> :: WSGI application. Define a package dot-notation entry point to a WSGI
> application factory.

Why can't it be a path to a WSGI script file? This actually works more
universally as it works for servers which map URLs to file-based
resources as well. It also allows extensions other than .py and
allows the basename of the file to be arbitrarily named, both of which
help with those same servers which map URLs to file-based resources. It
also allows a WSGI script file of the same name to exist in multiple
locations managed by the same server without having to create an
overarching package structure with __init__.py files everywhere.

For WSGI servers which currently require a dotted path, eg gunicorn:

gunicorn myapp

Then it changes to also allow:

gunicorn --script myapp.wsgi

The server just has to construct a new Python module with a __name__
which relates to the absolute file system path and exec code within
that context to create the module itself. Nothing too difficult.
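What this describes can be sketched in a few lines: create a fresh module whose `__name__` is derived from the absolute path, then exec the script's code inside it. The naming scheme (a prefix plus a SHA-256 of the path) and `load_wsgi_script()` are assumptions; the point is only that the name is unique per file, so same-named scripts in different directories never collide.

```python
import hashlib
import os
import sys
import types


def load_wsgi_script(path, callable_name="application"):
    """Load a WSGI script file (any name or extension) as its own module."""
    path = os.path.abspath(path)
    # Unique per absolute path, so two 'myapp.wsgi' files never share a module.
    name = "_wsgi_script_" + hashlib.sha256(path.encode("utf-8")).hexdigest()
    module = types.ModuleType(name)
    module.__file__ = path
    with open(path, "rb") as f:
        code = compile(f.read(), path, "exec")
    sys.modules[name] = module
    try:
        exec(code, module.__dict__)
    except BaseException:
        del sys.modules[name]  # don't leave a half-initialized module behind
        raise
    return getattr(module, callable_name)
```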

Because the WSGI script file is identified by explicit filesystem
path, you don't have to worry about what current working directory is
or otherwise set sys.path to allow it to be imported initially. The
WSGI script file then can itself even be responsible for further setup
of sys.path as appropriate and so be more self contained and not
dependent on an external launch system.

I have also always seen it as a PITA that with various WSGI
servers you always had to do:

python myapp.py

and at the end of myapp.py add boilerplate like:

from wsgiref.simple_server import make_server

httpd = make_server('', 8000, application)
print("Serving on port 8000...")
httpd.serve_forever()

Use a different server which required such boilerplate and you had to change it.

Even where WSGI servers allowed you to specify a Python module as a
command line argument, the options all differed and you also needed to
know where the WSGI server was installed to run it.

Using a WSGI script file as the lowest common denominator, it would
also be nice to be able to do something like:

python -m gunicorn.server myapp.wsgi
python -m wsgiref.server myapp.wsgi

Ie., use the '-m' option for Python command line to have the installed
module act as the processor for the WSGI script file, thereby avoiding
the need to modify the script. This lowest common denominator option
could handle a few common options which all servers would need to
accept such as listener host, port and perhaps even concepts of
processes/threads.
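A sketch of such a lowest-common-denominator runner using only `wsgiref` and `runpy` (`runpy.run_path()` executes a source file regardless of its extension). The module names in the message are proposals, and the option names here (`--host`, `--port`, `--callable`) and helpers are assumptions; a real runner would call `main()` from its `__main__.py`.

```python
import argparse
import runpy
from wsgiref.simple_server import make_server


def load_application(script, callable_name="application"):
    """Execute a WSGI script file (any extension) and return its callable."""
    # A custom run_name keeps any `if __name__ == "__main__"` block from running.
    namespace = runpy.run_path(script, run_name="__wsgi_script__")
    return namespace[callable_name]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Serve a WSGI script file.")
    parser.add_argument("script")
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--callable", default="application")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    app = load_application(args.script, args.callable)
    httpd = make_server(args.host, args.port, app)
    print("Serving %s on %s:%d..." % (args.script, args.host, args.port))
    httpd.serve_forever()
```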

If you really wanted to tie the script to a particular method, but
still make it easy to use something else instead, then do it with a #!
line.

#!/usr/bin/env python -m gunicorn -- --host localhost --port 8000

with the rest of the file being the normal WSGI script file contents,
without any special __main__ section as that is handled by the #!
line.

FWIW, I did bring this up a couple of years back, but there was
little interest then in trying to standardise deployment setup so
there was some measure of commonality between WSGI servers.

Graham

Roberto De Ioris

Apr 14, 2011, 4:09:23 AM

to Graham Dumpleton, web...@python.org

On 14 Apr 2011, at 09:53, Graham Dumpleton wrote:

> On 14 April 2011 16:57, Alice Bevan–McGregor <al...@gothcandy.com> wrote:
>>> 3. Define how to get the WSGI app. This is WSGI specific, but (1) is
>>> *not* WSGI specific (it's only Python specific, and would apply well to
>>> other platforms)
>>
>> I could imagine there would be multiple "application types":
>>
>> :: WSGI application. Define a package dot-notation entry point to a WSGI
>> application factory.
>
> Why can't it be a path to a WSGI script file. This actually works more
> universally as it works for servers which map URLs to file based
> resources as well. Also allows alternate extensions than .py and also
> allows basename of file name to be arbitrarily named, both of which
> help with those same servers which map URLs to file base resources. It
> also allows same name WSGI script file to exist in multiple locations
> managed by same server without having to create an overarching package
> structure with __init__.py files everywhere.
>

+1 for this

uWSGI started with module-approach configuration only (like gunicorn), but
I added support for wsgi-file as soon as I realized that the file-based approach is a lot more useful/handy
(no need to mess with the PYTHONPATH or add __init__.py files all over the place, as Graham said).

Pinax (as an example) has a deploy/pinax.wsgi file that you can use as an entry point for your app independently of your filesystem/PYTHONPATH choices.
It worked (at least for my company, where we host hundreds of WSGI apps) 100% of the time and without user pain. I cannot say the same for the module approach
(yes, a lot of users are not very comfortable with PYTHONPATH/sys.path.... probably they should change jobs, but why ruin their lives when we have had a
working solution for years :P )

--
Roberto De Ioris
http://unbit.it

Alice Bevan–McGregor

Apr 14, 2011, 4:22:11 AM

to web...@python.org

Howdy!

I suspect you're thinking a little too low-level.

On 2011-04-14 00:53:09 -0700, Graham Dumpleton said:

> On 14 April 2011 16:57, Alice Bevan–McGregor
> <al...@gothcandy.com> wrote:
>>> 3. Define how to get the WSGI app. This is WSGI specific, but (1) is
>>> *not* WSGI specific (it's only Python specific, and would apply well to
>>> other platforms)
>>
>> I could imagine there would be multiple "application types":
>>
>> :: WSGI application. Define a package dot-notation entry point to a
>> WSGI application factory.
>

> Why can't it be a path to a WSGI script file?

No reason it couldn't be.

app.type = wsgi
app.target = /myapp.wsgi:application

(Paths relative to the folder the application is installed into, and
dots after a slash are filename parts, not module separators.)

But then, how do you configure it? Using a factory (which is passed
the from-appserver configuration) makes a lot of sense.

> This actually works more universally as it works for servers which map
> URLs to file based
> resources as well.

First, .wsgi files (after a few quick Google searches) are only used by
mod_wsgi. I wouldn't call that "universal", unless you can point out
the other major web servers that support that format.

You'll have to describe the "map URLs to file based resources" issue,
since every web server I've ever encountered (Apache, Nginx, Lighttpd,
etc.) works that way. Only if someone is willing to get really hokey
with the system described thus far would any application-scope web
servers be running.

> Also allows alternate extensions than .py and also allows basename of
> file name to be arbitrarily named, both of which help with those same
> servers which map URLs to file base resources.

Again, you'll have to elaborate or at least point to some existing
documentation on this.

I've never encountered a problem with that, nor do any of my scripts
end in .py.

> It also allows same name WSGI script file to exist in multiple
> locations managed by same server without having to create an
> overarching package structure with __init__.py files everywhere.

Packages aren't a bad thing. In fact, as described so far, a top level
package is required.

> For WSGI servers which currently require a dotted path, eg gunicorn:

See my note above; choice of Python-level HTTP interface is not up to
the application, though by all means there should be some simple way to
"launch" a development server.

> The WSGI script file then can itself even be responsible for further
> setup of sys.path as appropriate and so be more self contained and not
> dependent on an external launch system.

The -point- (AFAIK/IMHO) is to be dependent on an external launch system.

> and at the end of myapp.py add boilerplate like:
>
> from wsgiref.simple_server import make_server
>
> httpd = make_server('', 8000, application)
> print("Serving on port 8000...")
> httpd.serve_forever()

Again, I've never described anything that would require that nonsense.
WSGI callable, preferably a factory callable, that's it.

> Use a different server which required such boilerplate and you had to
> change it.

Not the problem of the application.

> Using a WSGI script file as the lowest common denominator, it would
> also be nice to be able to do something like:
>
> python -m gunicorn.server myapp.wsgi
> python -m wsgiref.server myapp.wsgi

Not a half bad idea, but again, no reason to restrict it to .wsgi
files. (That's also a completely different problem than the
"application format" currently under discussion.)

I've written and rewritten my dot-colon-notation system enough that it
supports:

:: /path[/sub[...]][:object[.property]] (even if it has to execfile it)
:: package[.module[...]][/folder[...]][:object[.property]]

I think that syntax pretty much covers everything, including .wsgi
files (/path/to/foo.wsgi:application). The implementation of the above
is fully unit tested, and I really don't mind people stealing it. ;)
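That implementation is not included in the thread; the following is an independent, simplified sketch of resolving the two forms, not Alice's code. It handles `/path/to/file:object.property` (executing the file, so dots before the colon stay part of the file name) and `package.module:object.property`, but not `/folder` segments inside a package, and it assumes POSIX-style absolute paths. `resolve_target()` is a hypothetical name.

```python
import importlib
import runpy


def resolve_target(spec):
    """Resolve '/path/file[:obj[.attr]]' or 'package.module[:obj[.attr]]'."""
    location, _, obj_path = spec.partition(":")
    if location.startswith("/"):
        # Filesystem form: dots are part of the file name, not module separators.
        namespace = runpy.run_path(location)
        if not obj_path:
            return namespace  # no object named: hand back the file's globals
        first, *rest = obj_path.split(".")
        obj = namespace[first]
    else:
        obj = importlib.import_module(location)
        rest = obj_path.split(".") if obj_path else []
    for attr in rest:
        obj = getattr(obj, attr)
    return obj
```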

— Alice.

Graham Dumpleton

Apr 14, 2011, 5:02:28 AM

to Alice Bevan–McGregor, web...@python.org

On 14 April 2011 18:22, Alice Bevan–McGregor <al...@gothcandy.com> wrote:
> Howdy!
>
> I suspect you're thinking a little too low-level.

Exactly, I am trying to walk before running. Things always fall down
here because people try and take too large a leap rather than an
incremental approach, solving one small problem at a time.

Thus please don't think that because I am replying to your message
that I am specifically commenting about your plans. See this as a side
comment and don't try and evaluate it only in the context of your
ideas.

> On 2011-04-14 00:53:09 -0700, Graham Dumpleton said:
>
>> On 14 April 2011 16:57, Alice Bevan–McGregor <al...@gothcandy.com> wrote:
>>>>
>>>> 3. Define how to get the WSGI app. This is WSGI specific, but (1) is
>>>> *not* WSGI specific (it's only Python specific, and would apply well to
>>>> other platforms)
>>>
>>> I could imagine there would be multiple "application types":
>>>
>>> :: WSGI application. Define a package dot-notation entry point to a WSGI
>>> application factory.
>>
>> Why can't it be a path to a WSGI script file?
>
> No reason it couldn't be.
>
> app.type = wsgi
> app.target = /myapp.wsgi:application
>
> (Paths relative to the folder the application is installed into, and dots
> after a slash are filename parts, not module separators.)
>
> But then, how do you configure it? Using a factory (which is passed the
> from-appserver configuration) makes a lot of sense.
>
>> This actually works more universally as it works for servers which map
>> URLs to file based
>> resources as well.
>
> First, .wsgi files (after a few quick Google searches) are only used by
> mod_wsgi. I wouldn't call that "universal", unless you can point out the
> other major web servers that support that format.

The WSGI module for nginx used them, as does uWSGI, and either
Phusion Passenger or the new Mongrel WSGI support relies on a script file.

You also have CGI, FASTCGI, SCGI and AJP using script files.

Don't get hung up on the extension of .wsgi, it is the concept of a
script file which is stored in the file system in an arbitrary
location to which a URL maps.

> You'll have to describe the "map URLs to file based resources" issue, since
> every web server I've ever encountered (Apache, Nginx, Lighttpd, etc.) works
> that way.

Which supports what I am saying, but you for some reason decided to
focus on '.wsgi' as an extension which wasn't the point.

> Only if someone is willing to get really hokey with the system
> described thus far would any application-scope web servers be running.

Forget for a moment trying to tie this to your larger designs and see
it as more of a basic underlying concept. Ie., the baby step before
you try and run.

>> Also allows alternate extensions than .py and also allows basename of file
>> name to be arbitrarily named, both of which help with those same servers
>> which map URLs to file base resources.
>
> Again, you'll have to elaborate or at least point to some existing
> documentation on this.
>
> I've never encountered a problem with that, nor do any of my scripts end in
> .py.

Lack of an extension is fine if you have configured Apache with a
dedicated cgi-bin or fastcgi-bin directory where an extension is
irrelevant because you have:

SetHandler cgi-script

But many Apache server configurations use:

AddHandler cgi-script .py

Ie., handler dispatch is based off extension, the .py extension quite
often being associated with CGI script execution.

You often see:

AddHandler fcgid-script .fcgid

Which says that a given resource is to be started up as a FASTCGI process.

For both these it expects those scripts to be self contained programs
which fire up the mechanics of interfacing with CGI or FASTCGI
protocols.

This means that you usually have to stick that boilerplate at the end
of the script.

This, though, is where FASTCGI deployment usually sucks badly. This is
because it is put on the user to get the boilerplate and remainder of the
WSGI script perfect from the outset. If you don't, because FASTCGI
technically doesn't allow for stdout/stderr at the point of startup, any
error on import is lost and the user has no idea. So many
times you see people whinging about setting up stuff on the likes of
DreamHost because of FASTCGI being a pain like this.

In the PHP world they don't have to deal with this boilerplate
nonsense. Instead there is a PHP launcher script associated with
FASTCGI module. So you have:

AddHandler fcgid-script .php

but also a mapping in the FASTCGI module configuration that says that, rather
than executing the .php script, it runs the launcher script instead. That way
the launcher script can get everything set up properly to then call
into the script.

Nothing like that exists for Python, but if it did, it would make no
sense to use .py because of the mapping that extension often already
has in Apache. Instead you would have the .wsgi script file mapped to
FASTCGI, with FASTCGI configured to run a WSGI launcher. That launcher
script would set up stdout/stderr, ensure flup is loaded properly and
only then load the WSGI script file and execute it. This way the system
administrators could ensure the launcher is working, and users would only
have to worry about dropping a WSGI script file with the right extension
into a directory, and it would work without all the pain. It also allows
the system admins to properly control the number of processes/threads,
whereas at present users can override what system admins would like to
restrict them to.
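A minimal Python sketch of the loading half of such a launcher, assuming the mod_wsgi convention that the script file defines a module-level `application`. The function name `load_wsgi_script` is hypothetical, and handing the result to a FastCGI server (for example flup's `WSGIServer`) is left out; the point is only that import-time errors get written somewhere visible instead of being lost.

```python
import runpy
import sys
import traceback


def load_wsgi_script(path, error_log=None):
    """Execute a WSGI script file and return its ``application`` object.

    Any error raised while executing the script is written to
    ``error_log`` (stderr by default) before being re-raised, so it is
    not silently lost at startup.
    """
    error_log = error_log or sys.stderr
    try:
        # run_path() compiles any file, whatever its extension or name.
        # A run_name other than "__main__" keeps "if __name__ == '__main__'"
        # blocks in the script from running.
        namespace = runpy.run_path(path, run_name="_wsgi_script")
    except Exception:
        traceback.print_exc(file=error_log)
        error_log.flush()
        raise
    application = namespace.get("application")
    if application is None:
        error_log.write("%s does not define 'application'\n" % path)
        error_log.flush()
        raise LookupError("no 'application' in %s" % path)
    if not callable(application):
        raise TypeError("'application' in %s is not callable" % path)
    return application
```

A launcher would call this once at startup with the path of the script to serve; because `runpy.run_path` accepts any file name, a `.wsgi` extension or dashes in the name cause no trouble.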

So the concept of a script file simply works better with Apache, and to
some degree other servers, because of how such servers determine which
handler to use from the extension.

As to the file name, you can't stop people using arbitrary stuff in
file names, e.g., dashes as a prime example. So when using servers
which map URLs to file system resources you have to deal with it.

>> It also allows same name WSGI script file to exist in multiple locations
>> managed by same server without having to create an overarching package
>> structure with __init__.py files everywhere.
>
> Packages aren't a bad thing. In fact, as described so far, a top level
> package is required.

You are thinking ahead to your bigger ideas. That isn't what I am
talking about. When a web server maps URLs to resources within a
hierarchical directory structure, you can't make that structure a
package with __init__.py files in every directory. It just doesn't work,
as the scripts could be totally unrelated and not part of one
application.
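Graham's point about arbitrary names and same-named scripts can be illustrated with `importlib`: `import my-app` is a syntax error, and two unrelated `app.wsgi` files cannot share one module name, but loading each file by path avoids both problems. A sketch, assuming a hypothetical naming scheme where the module name is a hash of the script's absolute path:

```python
import hashlib
import importlib.machinery
import importlib.util
import os


def load_script_module(path):
    """Load a script file as a module, whatever its name or extension.

    The module name is derived from the absolute path (a hypothetical
    scheme), so identically named files in different directories never
    collide and no __init__.py files are needed anywhere.
    """
    path = os.path.abspath(path)
    name = "_script_" + hashlib.sha256(path.encode("utf-8")).hexdigest()[:16]
    # An explicit loader is needed because ".wsgi" is not a suffix the
    # import system recognises as Python source by default.
    loader = importlib.machinery.SourceFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```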

Graham

Daniel Holth

Apr 14, 2011, 11:01:13 AM

to python-...@googlegroups.com

+1 on the .wsgi file. I certainly have scripts that are a .wsgi file and nothing else. How about this:

PWAPF level 0:

directory.wsgi/app.wsgi

PWAPF level 1:

directory.wsgi/app.wsgi
directory.wsgi/lib/python/(what would normally be in site-packages)
directory.wsgi/src/
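A runner could tell Daniel's two levels apart simply by checking for `lib/python`. A minimal sketch, assuming that `lib/python` should behave like site-packages (hence `site.addsitedir`, which also processes `.pth` files) and that `src/` needs no special treatment, which the proposal does not spell out; `prepare_pwapf` is a hypothetical name.

```python
import os
import site


def prepare_pwapf(root):
    """Return (level, path_to_app_wsgi) for a PWAPF-style directory.

    Level 0: just directory.wsgi/app.wsgi.
    Level 1: also lib/python, which is added to sys.path the way
    site-packages would be (including processing any .pth files).
    """
    app_script = os.path.join(root, "app.wsgi")
    if not os.path.isfile(app_script):
        raise FileNotFoundError(app_script)
    lib_dir = os.path.join(root, "lib", "python")
    if os.path.isdir(lib_dir):
        # addsitedir() appends, so anything already on sys.path wins on
        # conflicts; a real runner might prefer to prepend instead.
        site.addsitedir(lib_dir)
        return 1, app_script
    return 0, app_script
```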

Eric Larson

Apr 14, 2011, 11:45:59 AM

to Graham Dumpleton, Alice Bevan–McGregor, web...@python.org

Hi,

I want to give a big +1 for Graham's suggestion. Using a script is a great way to make the communication between the larger system and the application trivial. The larger system needs to know how to run the app. If it is a script then you just run the script. There should still be some information regarding apache/nginx config if necessary, but basing that on the expectation there is a single script is a better approach than presuming a config can provide enough information to eventually create some script that apache/nginx/etc. might need to use.

Eric


Randy Syring

Apr 14, 2011, 11:53:55 AM

to web...@python.org

On 04/14/2011 11:45 AM, Eric Larson wrote:
> ...regarding apache/nginx config if necessary, but basing that on the expectation there is a single script is a better approach than presuming a config can provide enough information to eventually create some script that apache/nginx/etc. might need to use.
Just wondering if Windows/IIS is being kept in mind as this discussion
is going on. I am having a hard time conceptualizing the things being
discussed, so can't really tell myself.

Thanks.

--------------------------------------
Randy Syring
Intelicom

Ian Bicking

Apr 14, 2011, 12:21:39 PM

to Graham Dumpleton, Alice Bevan–McGregor, web...@python.org

On Thu, Apr 14, 2011 at 2:53 AM, Graham Dumpleton <graham.d...@gmail.com> wrote:

On 14 April 2011 16:57, Alice Bevan–McGregor <al...@gothcandy.com> wrote:
>> 3. Define how to get the WSGI app. This is WSGI specific, but (1) is
>> *not* WSGI specific (it's only Python specific, and would apply well to
>> other platforms)
>
> I could imagine there would be multiple "application types":
>
> :: WSGI application. Define a package dot-notation entry point to a WSGI
> application factory.

Why can't it be a path to a WSGI script file? This actually works more
universally as it works for servers which map URLs to file based
resources as well. Also allows alternate extensions than .py and also
allows basename of file name to be arbitrarily named, both of which
help with those same servers which map URLs to file base resources. It
also allows same name WSGI script file to exist in multiple locations
managed by same server without having to create an overarching package
structure with __init__.py files everywhere.

The main way to load applications in Silver Lining is basically like a wsgi script; or more specifically a file that is exec'd, after which it looks specifically for a variable named application. Silver Lining also supports Paste Deploy .ini files, but in practice this doesn't seem that important (after all you can run paste.deploy.loadapp in the script).

In this case the mapping of filenames and use of extensions doesn't matter, as applications would not be compelled to use any particular extension, and traversing into the application wouldn't make sense.

Another thing that is common with .wsgi files (and similarly for App Engine script handlers) is that developers do all sorts of initialization (like changing sys.path etc). This makes it hard to access the application except through that entry point, thus requiring all access to be in the form of URL fetching (again like App Engine). So on one hand I like the .wsgi file technique; on the other hand I don't ;)

Most of what we're talking about is, in Silver Lining, implemented in silversupport.appconfig. Particular pieces:

Loading the application:
https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-310
Set up sys.path:
https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-298
Set up services:
https://bitbucket.org/ianb/silverlining/src/tip/silversupport/appconfig.py#cl-223

There's going to have to be a bit of indirection with services, as an application is asking in effect for an interface, and each tool may implement that interface differently (maybe a package could provide sort of an abstract base class for these, but the specific implementation is going to be very deployment-tool-specific).

Also, generally more is set up before the .wsgi-like script is executed in Silver Lining than in mod_wsgi. Well, here's the actual mod_wsgi .wsgi script that Silver Lining uses:
https://bitbucket.org/ianb/silverlining/src/8597f52305be/silverlining/mgr-scripts/master-runner.py
But it's a bit confusing because it translates a bunch of variables set by the rather obtuse Apache config to figure out what application to run and how. But sys.path is fixed up, services are "activated" (mostly meaning they set their environmental variables), stderr/stdout is fixed up (since there's some sense of logging in the system, I felt there was no reason to bar use of those streams), and then some tool-specific stuff is done (e.g., fixing up the request URL given the Varnish setup). These are the examples of the kind of detailed specification of parts of the environment that I guess we need to have -- it's really how the entire process is setup that we need to specify, not just the WSGI request portion (which at least we don't have to specify much since that's done).

Ian

Ian Bicking

Apr 14, 2011, 1:34:59 PM

to Alice Bevan–McGregor, web...@python.org

I think there's a general concept we should have, which I'll call a "script" -- but basically it's a script to run (__main__-style), a callable to call (module:name), or a URL to fetch internally. I want to keep this distinct from anything long-running, which is a much more complex deal. I think given the three options, and for general simplicity, the script can be successful or have an error (for Python code: exception or no; for __main__: zero exit code or no; for a URL: 2xx code or no), and can return some text (which may only be informational, not structured?)

An application configuration could refer to scripts under different names, to be invoked at different stages.
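Ian's three kinds of "script" and his success rules fit in one small dispatcher. A minimal sketch; the reference syntax (URL, existing file path, otherwise `module:name`), the function name `run_script` and the `(succeeded, text)` return value are assumptions made for illustration.

```python
import importlib
import os
import subprocess
import sys
import urllib.request


def run_script(ref, timeout=60):
    """Run a "script" reference and return (succeeded, text).

    * "http://..." / "https://..."  -> fetch; success means a 2xx status
    * path to an existing file      -> run as __main__; success means exit code 0
    * "package.module:name"         -> call; success means no exception
    """
    if ref.startswith(("http://", "https://")):
        try:
            with urllib.request.urlopen(ref, timeout=timeout) as resp:
                text = resp.read().decode("utf-8", "replace")
                return 200 <= resp.status < 300, text
        except OSError as exc:  # URLError and HTTPError (non-2xx) are OSErrors
            return False, str(exc)
    if os.path.isfile(ref):
        proc = subprocess.run([sys.executable, ref], capture_output=True,
                              text=True, timeout=timeout)
        return proc.returncode == 0, proc.stdout + proc.stderr
    module_name, sep, attr = ref.partition(":")
    if not sep:
        raise ValueError("not a URL, file or module:name reference: %r" % ref)
    try:
        result = getattr(importlib.import_module(module_name), attr)()
    except Exception as exc:
        return False, "%s: %s" % (type(exc).__name__, exc)
    return True, "" if result is None else str(result)
```

Checking for an existing file before splitting on the colon keeps Windows paths such as `C:\app\migrate.py` from being mistaken for `module:name` references.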

On Thu, Apr 14, 2011 at 1:57 AM, Alice Bevan–McGregor <al...@gothcandy.com> wrote:

On 2011-04-13 18:16:36 -0700, Ian Bicking said:

While initially reluctant to use zip files, after further discussion and thought they seem fine to me, so long as any tool that takes a zip file can also take a directory. The reverse might not be true -- for instance, I'd like a way to install or update a library for (and inside) an application, but I doubt I would make pip rewrite zip files to do this ;) But it could certainly work on directories. Supporting both isn't a big deal except that you can't do symlinks in a zip file.

I'm not talking about using zip files as per eggs, where the code is maintained within the zip file during execution. It is merely a packaging format with the software itself extracted from the zip during installation / upgrade. A transitory container format. (Folders in the end.)

Symlinks are an OS-specific feature, so those are out as a core requirement. ;)
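Combining Ian's requirement that a tool accepting a zip also accept a directory with Alice's view of the zip as a transitory container, an installer can normalize both inputs to an ordinary folder. A minimal sketch; `install_app` and the rule that the target must not already exist are assumptions.

```python
import os
import shutil
import zipfile


def install_app(source, target):
    """Install an application from a zip file or a directory into `target`.

    Either way the result is a plain directory: the zip is only a
    transport format and nothing is imported from it at runtime.
    """
    if os.path.exists(target):
        raise FileExistsError(target)
    if os.path.isdir(source):
        shutil.copytree(source, target)
    elif zipfile.is_zipfile(source):
        with zipfile.ZipFile(source) as archive:
            # extractall() strips absolute paths and ".." components
            # from member names, so files stay inside `target`.
            archive.extractall(target)
    else:
        raise ValueError("%s is neither a directory nor a zip file" % source)
    return target
```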

I don't think we're talking about something like a buildout recipe. Well, Eric kind of brought something like that up... but otherwise I think the consensus is in that direction.

Ambiguous statements FTW, but I think I know what you meant. ;)

So specifically if you need something like lxml the application specifies that somehow, but doesn't specify *how* that library is acquired. There is some disagreement on whether this is generally true, or only true for libraries that are not portable.

+1

I think something along the lines of autoconf (those lovely ./configure scripts you run when building GNU-style software from source) with published base 'checkers' (predicates as I referred to them previously) would be great. A clear way for an application to declare a dependency, have the application server check those dependencies, then notify the administrator installing the package.
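In the autoconf spirit Alice describes, checkers can be plain predicates that the server runs in bulk, so the administrator sees every missing dependency at once rather than just the first `ImportError`. A minimal sketch; the checker names and the `(ok, message)` convention are hypothetical.

```python
import importlib.util
import shutil
import sys


def python_at_least(*version):
    def check():
        return sys.version_info >= version, "Python >= %s" % ".".join(map(str, version))
    return check


def has_module(name):
    # Intended for top-level module names; find_spec() on "a.b" would
    # import "a" first.
    def check():
        return importlib.util.find_spec(name) is not None, "module %s importable" % name
    return check


def has_executable(name):
    def check():
        return shutil.which(name) is not None, "executable %s on PATH" % name
    return check


def run_checks(checks):
    """Run every predicate and return the messages of those that failed."""
    failures = []
    for check in checks:
        ok, message = check()
        if not ok:
            failures.append(message)
    return failures
```

An application might then declare something like `[python_at_least(2, 6), has_module("lxml"), has_executable("pg_config")]` and leave it to the server to report the failures.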

There could be an optional self-test script, where the application could do a last self-check -- import whatever it wanted, check db settings, etc. Of course we'd want to know what it needed *before* the self-check to try to provide it, but double-checking is of course good too.

One advantage to a separate script instead of just one script-on-install is that you can more easily indicate *why* the installation failed. For instance, script-on-install might fail because it can't create the database tables it needs, which is a different kind of error than a library not being installed, or being fundamentally incompatible with the container it is in. In some sense maybe that's because we aren't proposing a rich error system -- but realistically a lot of these errors will be TypeError, ImportError, etc., and trying to normalize those errors to some richer meaning is unlikely to be done effectively (especially since error cases are hard to test, since they are the things you weren't expecting).

I've seen several Python libraries that include the C library code that they expose; while not so terribly efficient (i.e. you can't install the C library once, then share it amongst venvs), it is effective for small packages.

Generally compiling seems fairly reliable these days, but it does typically require more system-level packages be installed (e.g., python-dev). Actually invoking these installations in an automated and reliable way seems hard to me. I find debs/rpms to work well for these cases. There is some challenge when you need something that isn't packaged, but in many ways the work you need to do is always going to be the same work you'd need to do to package that library or the new version of that library. So I'm inclined to ask people to lean on the existing OS-level tooling for dealing with these libraries.

Larger (i.e. global or application-local) would require the intervention of a systems administrator.

Something like a database takes this a bit further. We haven't really discussed it, but I think this is where it gets interesting. Silver Lining has one model for this. The general rule in Silver Lining is that you can't have anything with persistence without asking for it as a service, including an area to write files (except temporary files?)

+1

Databases are slightly more difficult; an application could ask for:

:: (Very Generic) A PEP-249 database connection.

:: (Generic) A relational database connection string.

:: (Specific) A connection string to a specific vendor of database.

:: (Odd) A NoSQL database connection string.

I've been making heavy use of MongoDB over the last year and a half, but AFAIK each NoSQL database engine does its own thing API-wise. (Then there are ORMs on top of that, but passing a connection string like mysql://user:pass@host/db or mongo://host/db is pretty universal.)

It is my intention to write an application server that is capable of creating and securing databases on-the-fly. This would require fairly high-level privileges in the database engine, but would result in far more "plug-and-play" configuration. Obviously when deleting an application you will have the opportunity to delete the database and associated user.

Categorizing services seems unnecessary. I'd like to see maybe an | operator, and a distinction between required and optional services. E.g.:

require_service:
    - mysql | postgresql | firebird

Or:

require_service:
    - files
optional_service:
    - mysql | postgresql

And then there's a lot more you could do... which one do you prefer, for instance. Or things like GIS extensions to databases are... tricky, as they are somewhat orthogonal to other aspects of the database (for Silver Lining I have a postgis service, which does extra setup and installation to give you a GIS-enabled database). But we can also just ask that when things get tricky people just make fancier services (e.g., you could make a "dbapi" service that itself figured out what specific backend to install).
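One possible reading of the `|` syntax: for each entry the first alternative the container can provide wins, a required entry with no match is an error, and an optional one is skipped. A minimal sketch; treating left-to-right order as the preference order is an assumption (which backend to prefer is left open above), and `available` stands in for whatever a given container offers.

```python
def resolve_services(required, optional, available):
    """Pick one concrete service for each "a | b | c" entry.

    Alternatives are tried left to right. Raises LookupError if a
    required entry has no alternative in `available`.
    """
    chosen = []
    entries = [(entry, True) for entry in required] + [(entry, False) for entry in optional]
    for entry, is_required in entries:
        alternatives = [name.strip() for name in entry.split("|")]
        match = next((name for name in alternatives if name in available), None)
        if match is not None:
            chosen.append(match)
        elif is_required:
            raise LookupError("no available service for %r" % entry)
    return chosen
```

With the YAML examples loaded into Python lists, `resolve_services(["files"], ["mysql | postgresql"], {"files", "postgresql"})` returns `["files", "postgresql"]`.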

Tricky things:
- You need something funny like multiple databases. This is very service-specific anyway, and there might sometimes need to be a way to configure the service. It's also a fairly obscure need.
- You need multiple applications to share data. This is hard, not sure how to handle it. Maybe punt for now.

I suspect there's some disagreement about how the Python environment gets setup, specifically sys.path and any other application-specific customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in silvercustomize.py, and find it helpful).

Similar to Paste's "here" variable for INI files, having some method of the application defining environment variables with base path references would be needed.

I always assume everything must be relative to the root of the directory.
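Paste's "here" variable is easy to reproduce with the standard library, since `configparser` interpolates `%(name)s` from its defaults. A minimal sketch that makes paths in an INI file relative to the file's own directory; `read_app_config` and the `[app]` section used in the test are hypothetical.

```python
import configparser
import os


def read_app_config(path):
    """Parse an INI file in which %(here)s expands to the file's directory."""
    here = os.path.dirname(os.path.abspath(path))
    parser = configparser.ConfigParser(defaults={"here": here})
    with open(path) as f:
        parser.read_file(f)
    return parser
```

With this interpolation style, a literal `%` in a value has to be written as `%%`.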

I've tossed out my idea of sharing dependencies, BTW, so a simple extraction of the zipped application into one package folder (linked in using a .pth file) with the dependencies installed into an app-packages folder in the path (like site-packages) would be ideal. At least, for me. ;)
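Alice's layout can be sketched with a `.pth` file: dependencies go into a per-application `app-packages` directory, and a `.pth` file there points back at the extracted application so it is importable without being copied, similar in spirit to `setup.py develop`. A minimal sketch; `link_app` and the `app.pth` file name are hypothetical.

```python
import os
import site


def link_app(app_dir, app_packages):
    """Make `app_dir` importable via a .pth file inside `app_packages`.

    `app_packages` plays the role of site-packages for one application:
    its dependencies are installed there, and the .pth file links in the
    application source so an in-place update needs no reinstall.
    """
    os.makedirs(app_packages, exist_ok=True)
    with open(os.path.join(app_packages, "app.pth"), "w") as f:
        f.write(os.path.abspath(app_dir) + "\n")
    # addsitedir() appends app_packages to sys.path and then processes
    # the .pth files it finds there, adding the listed directories too.
    site.addsitedir(app_packages)
```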

Describing the scope of this, it seems kind of boring. In, for example, App Engine you do all your setup in your runner -- I find this deeply annoying because it makes the runner the only entry point, and thus makes testing, scripts, etc. hard.

I agree; that's a short-sighted approach to an application container format. There should be some way to advertise a test suite and, for example, have the suite run before installation or during upgrade. (Rolling back the upgrade process thus far if there is a failure.)

My shiny end goal would be a form of continuous deployment: a git-based application which gets a post-commit notification, pulls the latest, runs the tests, rolls back on failure or fully deploys the update on success.

We would start with just WSGI. Other things could follow, but I don't see any reason to worry about that now. Maybe we should just punt on aggregate applications now too. I don't feel like there's anything we would do that would prevent other kinds of runtime models (besides the starting point, container-controlled WSGI), and the places to add support for new things are obvious enough (e.g., something like Silver Lining's platform setting). I would define a server with accompanying daemon processes as an "aggregate".

Since in my model the application server does not proxy requests to the instantiated applications (each running in its own process), I'm not sure I'm interpreting what you mean by an aggregate application properly.

You mean, the application provides its own HTTP server? I certainly wouldn't expect that...?

Anyway, in terms of aggregate, I mean something like a "site" that is made up of many "applications", and maybe those applications are interdependent in some fashion. That adds lots of complications, and though there are lots of use cases for that, I think it's easier to think in terms of apps as simpler building blocks for now.

If "my" application server managed Nginx or Apache configurations, dispatch to applications based on base path would be very easy to do while still keeping the applications isolated.

Sure; these would be tool options, and if you set everything up you are requiring the deployer to invoke the tools correctly to get everything in place. Which is a fine starting point before formalizing anything.

An important distinction to make, I believe, is between application concerns and deployment concerns. For instance, what you do with logging is a deployment concern. Generating logging messages is of course an application concern. In practice these are often conflated, especially in the case of bespoke applications where the only person deploying the application is the person (or team) developing the application. It shouldn't be annoying for these users, though. Maybe it makes sense for people to be able to include tool-specific default settings in an application -- things that could be overridden, but especially for the case when the application is not widely reused it could be useful. (An example where Silver Lining gets it all backwards: I created a [production] section in app.ini when the very concept of "production" is not meaningful in that context -- but these kinds of named profiles would make sense for actual application deployment tools.)

Having an application define default logging levels for different scopes would be very useful. The application server could take those defaults, and allow an administrator to modify them or define additional scopes quite easily.

Hm... I guess this is an ordering question. You could import logging and setup defaults, but that doesn't give the container a chance to overwrite those defaults. You could have the container setup logging, then make sure the app sets defaults only when the container hasn't -- but I'm not sure if it's easy to use the logging module that way.

Well, maybe that's not hard -- if you have something like silvercustomize.py that is always imported, and imported fairly early on, then have the container overwrite logging settings before it *does* anything (e.g., sends a request) then you should be okay?
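One simple answer to the ordering question, assuming the container configures logging before the application's defaults are applied: the application only sets levels on loggers the container left at `NOTSET`. A minimal sketch; `apply_default_levels` is hypothetical, and a container that deliberately wants a logger left at `NOTSET` cannot be told apart from one that never touched it.

```python
import logging


def apply_default_levels(defaults):
    """Apply application-supplied default levels without clobbering the container.

    `defaults` maps logger names to level names, e.g. {"myapp.sql": "WARNING"}.
    A level is only set on loggers that are still NOTSET.
    """
    for name, level in defaults.items():
        logger = logging.getLogger(name)
        if logger.level == logging.NOTSET:
            logger.setLevel(level)  # setLevel() accepts names like "INFO"
```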

There's actually a kind of layered way of thinking of this:

1. The first, maybe most important part, is how you get a proper Python environment. That includes sys.path of course, with all the accompanying libraries, but it also includes environment description.

Virtualenv-like, with the application itself linked in via a .pth file (a la setup.py develop, allowing inline upgrades via SCM) and dependencies extracted from the zip distributable into an app-packages folder a la site-packages.

I don't install global Python modules on any of my servers, so the --no-site-packages option is somewhat unnecessary for me, but having something similar would be useful, too. Unfortunately, that one feature seems to require a lot of additional work.

In Silver Lining there's two stages -- first, set some environmental variables (both general ones like $SILVER_CANONICAL_HOST and service-specific ones like $CONFIG_MYSQL_DBNAME), then get sys.path proper, then import silvercustomize by which an environment can do any more customization it wants (e.g., set $DJANGO_SETTINGS_MODULE)
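The Silver Lining ordering Ian describes (environment variables, then `sys.path`, then `silvercustomize`) might look like this in outline. It is a sketch of the described order, not Silver Lining's actual code; `prepare_process` is a hypothetical name.

```python
import importlib
import os
import sys


def prepare_process(env_vars, lib_dirs, customize_module="silvercustomize"):
    """Prepare a process in three steps.

    1. Export general and service-specific environment variables.
    2. Put the application's library directories at the front of sys.path.
    3. Import the customization module, if the app has one, so it can
       adjust anything else (e.g. DJANGO_SETTINGS_MODULE).
    """
    os.environ.update(env_vars)
    for path in reversed(lib_dirs):
        if path not in sys.path:
            sys.path.insert(0, path)
    try:
        importlib.import_module(customize_module)
    except ModuleNotFoundError as exc:
        if exc.name != customize_module:
            raise  # the module exists, but one of its own imports failed
```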

Environment variables are typeless (raw strings) and thus less than optimum for sharing rich configurations.

Rich configurations are problematic in their own ways. While the str-key/str-value of os.environ is somewhat limited, I wouldn't want anything richer than JSON (list, dict, str, numbers, bools). And then we have to figure out a place to drop the configuration. Because we are configuring the *process*, not a particular application or request handler, a callable isn't great (unless we expect the callable to drop the config somewhere and other things to pick it up?)
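One middle ground between raw strings and a separate config file is to keep using `os.environ` but store a single JSON document in it; repeated values such as several database URLs then become a JSON list. A minimal sketch; the variable name `APP_CONFIG_JSON` is hypothetical, and very large configurations can run into platform limits on environment size.

```python
import json
import os

CONFIG_VAR = "APP_CONFIG_JSON"  # hypothetical variable name


def export_config(config, environ=os.environ):
    """Container side: store a JSON-compatible config in one environment variable."""
    environ[CONFIG_VAR] = json.dumps(config, sort_keys=True)


def load_config(environ=os.environ):
    """Application side: read the config back; a missing variable means {}."""
    raw = environ.get(CONFIG_VAR)
    return json.loads(raw) if raw else {}
```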

Host names depend on how the application is mounted, and a single application may be mounted to multiple domains or paths, so utilizing the front end web server's rewriting capability is probably the best solution for that.

I found at least giving one valid hostname (and yes, should include a path) was important for many applications. E.g., a bunch of apps have tendencies to put hostnames in the database.

What about multiple database connections? Environment variables are also not so good for repeated values.

A /few/ environment variables are a good idea, though:

:: TMPDIR — when don't you need temporary files?

:: APP_CONFIG_PATH — the path to a YAML file containing the real configuration.

I'm not psyched about pointing to a file, though I guess it could work -- it's another kind of peculiar drop-the-config-somewhere-and-wait-for-someone-to-pick-it-up. At least dropping it directly in os.environ is easy to use directly (many things allow os.environ interpolation already) and doesn't require any temporary files. Maybe there's a middle ground.

The configuration file would even include a dict-based logging configuration routing all messages to the parent app server for final delivery, removing the need for per-app logging files, etc.
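A dict-based logging configuration of this kind could route everything to the application server with the standard `SocketHandler`, which pickles each record and sends it over TCP. A minimal sketch; the host and port are assumptions, the dict could equally be loaded from the YAML file, and the application server would need a matching receiver (the Python logging cookbook has an example of one).

```python
import logging
import logging.config
import logging.handlers

# Hypothetical address where the application server collects log records.
APP_SERVER_LOG_HOST = "127.0.0.1"
APP_SERVER_LOG_PORT = logging.handlers.DEFAULT_TCP_LOGGING_PORT  # 9020

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "app_server": {
            # Extra keys (host, port) are passed to SocketHandler(); the
            # TCP connection is only opened when the first record is sent.
            "class": "logging.handlers.SocketHandler",
            "host": APP_SERVER_LOG_HOST,
            "port": APP_SERVER_LOG_PORT,
        },
    },
    "root": {"level": "INFO", "handlers": ["app_server"]},
}

logging.config.dictConfig(LOGGING)
```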

2. Define some basic generic metadata. "app_name" being the most obvious one.

The standard Python setup metadata is pretty good:

:: Application title.

:: Short description.
:: Long description / documentation.
:: Author information.
:: License.
:: Source information (URL, download URL).

Sure.

:: Application (package) name.

This doesn't seem meaningful to me -- there's no need for a one-to-one mapping between these applications and a particular package. Unless you mean some attempt at a unique name that can be used for indexing?

:: Dependencies.

Will require some more discussion, but something like this, sure.

:: Entry point-style hooks. (Post-install, pre/post upgrade, pre-removal, etc.)

Yes; I just made each entry point a top-level setting, instead of embedding them into another setting.

Likely others.

3. Define how to get the WSGI app. This is WSGI specific, but (1) is *not* WSGI specific (it's only Python specific, and would apply well to other platforms)

I could imagine there would be multiple "application types":

:: WSGI application. Define a package dot-notation entry point to a WSGI application factory.

:: Networked daemon. This would allow deployment of Twisted services, for example. Define a package dot-notation entry point to the 'main' callable.

It would also need a way to specify things like what port to run on, public or private interface, maybe indicate what kind of proxying is valid (if any), maybe process management parameters, ways to inspect the process itself (since *maybe* you can't send internal HTTP requests into it), etc.

Again, there are likely others, but those are the big two. In both of these cases the configuration (loaded automatically) could be passed as a dict to the callable.

PHP! ;) Anyway, personally I'd like to keep in mind the idea of entirely different platforms, but that's something I'm willing to just personally keep in the back of my head and leave out of the discussion. My experience supporting PHP is that it was easier than I expected. Obviously all tools need not support all platforms.

4. Define some *web specific* metadata, like static files to serve. This isn't necessarily WSGI or even Python specific (not that we should bend over backwards to be agnostic -- but in practice I think we'd have to bend over backwards to make it Python-specific).

Explicitly defining the paths to static files is not just a good idea, it's The Slaw™.

I'm not personally that happy with how App Engine does it, as an example -- it requires a regex-based dispatch.
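As a contrast to regex-based dispatch, explicit static declarations can be resolved by plain longest-prefix matching. A minimal sketch; the `STATIC_MAP` format and `resolve_static` are hypothetical and not taken from App Engine or Silver Lining.

```python
import os

# Hypothetical static-file declaration: URL prefix -> path inside the app.
STATIC_MAP = {
    "/static/": "static",
    "/favicon.ico": "static/favicon.ico",
}


def resolve_static(url_path, app_root, static_map=STATIC_MAP):
    """Map a request path to a file path using longest-prefix matching.

    Returns None when the path is not static (so the WSGI app handles
    it) or when it would escape the declared location.
    """
    for prefix in sorted(static_map, key=len, reverse=True):
        if url_path == prefix or (prefix.endswith("/") and url_path.startswith(prefix)):
            rest = url_path[len(prefix):]
            target = os.path.abspath(os.path.join(app_root, static_map[prefix]))
            candidate = os.path.abspath(os.path.join(target, *rest.split("/")))
            # Reject "../" tricks that resolve outside the declared location.
            if candidate != target and not candidate.startswith(target + os.sep):
                return None
            return candidate
    return None
```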

5. Define some lifecycle metadata, like update_fetch. These are generally commands to invoke. IMHO these can be ad hoc, but exist in the scope of (1) and a full "environment". So it's not radically different than anything else the app does, it's just we declare specific times these actions happen.

Script name, dot-notation callable, or URL. I see those as the 'big three' to support. Using a dot-notation callable has the same benefit as my comments to #3.

The URL would be relative to wherever the application is mounted within a domain, of course.

6. Define services (or "resources" or whatever -- the name "resource" doesn't make as much sense to me, but that's bike shedding). These are things the app can't provide for itself, but requires (or perhaps only wants; e.g., an app might be able to use SQLite, but could also use PostgreSQL). While the list of services will increase over time, without a basic list most apps can't run at all. We also need a core set as a kind of reference implementation of what a fully-specified service *is*.

I touched on this up above; any DBAPI compliant database or various configuration strings. (I'd implement this as a string-like object with accessor properties so you can pass it to SQLAlchemy straight, or dissect it to do something custom.)

Anything "string-like" or otherwise fancy requires more support libraries for the application to actually be able to make use of the environment. Maybe necessary, but it should be done with great reluctance IMHO.
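Alice's string-like connection object could be as small as a `str` subclass with read-only accessors built on `urllib.parse`. A minimal sketch; the class and property names are hypothetical. Since the object is still a `str`, an application that only wants the URL needs nothing beyond it.

```python
from urllib.parse import unquote, urlsplit


class ConnectionString(str):
    """A str with read-only accessors for the parts of a database URL."""

    @property
    def _parts(self):
        return urlsplit(self)

    @property
    def scheme(self):
        return self._parts.scheme

    @property
    def username(self):
        return self._parts.username

    @property
    def password(self):
        return self._parts.password

    @property
    def host(self):
        return self._parts.hostname

    @property
    def port(self):
        return self._parts.port  # int, or None if absent

    @property
    def database(self):
        # The path is "/dbname"; drop the slash and undo percent-encoding.
        return unquote(self._parts.path.lstrip("/")) or None
```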

Ian

Alice Bevan–McGregor

Apr 14, 2011, 1:56:05 PM

to web...@python.org

On 2011-04-14 08:53:55 -0700, Randy Syring said:

> Just wondering if Windows/IIS is being kept in mind as this discussion
> is going on. I am having a hard time conceptualizing the things being
> discussed, so can't really tell myself.

I'm trying pretty hard to ensure that non-compatible OS features don't
make it in here. Things like symlinks, chroots, etc.

— Alice.

Daniel Holth

Apr 14, 2011, 5:38:04 PM

to python-...@googlegroups.com

Abusing Python's user-site feature:

export PYTHONUSERBASE=/tmp/myapp
mkdir $PYTHONUSERBASE
~/opt/python2.6/bin/pip install --user -r requirements.txt

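import os, site
# site.USER_SITE honours the PYTHONUSERBASE exported above
pth = os.path.join(site.USER_SITE, 'easy-install.pth')
for line in open(pth):
    line = line.strip()
    if not line or line.startswith('import'): continue
    revised = os.path.relpath(line, site.USER_SITE)
    # output revised easy-install.pth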

Now $PYTHONUSERBASE/lib/python2.6 contains site-packages and nothing else. With relative paths in easy-install.pth (why aren't they always relative?) there is a fighting chance of copying this directory elsewhere and running it.

pip/setuptools/distutils ought to honor an environment variable equivalent to typing '--user'.

Simply say 'the default runner is a Python script called app.wsgi at the root of this tree, if not specified in app.ini, even if there is no app.ini'.
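Daniel's fallback rule is short enough to state as code. A minimal sketch, assuming an INI-style `app.ini` with the runner under a `[production]` section and a `runner` option (modelled loosely on Silver Lining's app.ini; treat both names as assumptions); `find_runner` is hypothetical.

```python
import configparser
import os

DEFAULT_RUNNER = "app.wsgi"


def find_runner(app_root, section="production", option="runner"):
    """Return the runner path: the one named in app.ini, else app.wsgi at the root."""
    parser = configparser.ConfigParser()
    # read() silently skips files that don't exist, so no app.ini is fine.
    parser.read(os.path.join(app_root, "app.ini"))
    # fallback= also covers a missing section, not just a missing option.
    runner = parser.get(section, option, fallback=DEFAULT_RUNNER)
    return os.path.join(app_root, runner)
```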

Éric Araujo

Apr 15, 2011, 11:15:00 AM

to web...@python.org

>>> How do you build a release and upload it to PyPI? Upload docs to
>>> packages.python.org? setup.py commands. It's a convenient hook with
>>> access to metadata in a convenient way that would make an excellent
>>> "let's make a release!" type of command.
>> setup.py should go away. The distutils2 talk from pycon 2011
>> explains.
>> http://blip.tv/file/4880990
> That's kind of a red herring -- even if setup.py goes away it would be
> replaced with something (pysetup I think?) which is conceptually
> equivalent.

Correct. pysetup will replace python setup.py, and using extra commands
(site-specific or project-specific) will be even easier than with
distutils.

Regards

Éric Araujo

Apr 15, 2011, 1:32:45 PM

to web...@python.org

Hi,

As an aside, I wonder why people use dot+colon notation instead of just
dots to reference callables. In distutils2 for example we resolve
dotted names to find command classes, command hooks and compilers. So
what’s the benefit, marginally easier parsing?
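To make the trade-off in this question concrete: with dots only, a resolver has to guess where the module path ends, typically by trying the longest importable prefix. A minimal sketch of that approach (not necessarily how distutils2 does it; `resolve_dotted` is hypothetical). Note the care needed to tell "this prefix is not a module" apart from "the module exists but one of its own imports failed".

```python
import importlib


def resolve_dotted(name):
    """Resolve a purely dotted name such as "package.module.Class.method"."""
    parts = name.split(".")
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except ModuleNotFoundError as exc:
            # Skip only if the missing module is this prefix (or one of its
            # parents); any other missing module is a genuine error.
            if exc.name and (module_name == exc.name or module_name.startswith(exc.name + ".")):
                continue
            raise
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError("cannot resolve %r" % name)
```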

Regards

Jim Fulton

Apr 15, 2011, 2:02:17 PM

to Éric Araujo, web...@python.org

On Fri, Apr 15, 2011 at 1:32 PM, Éric Araujo <mer...@netwok.org> wrote:
> As an aside, I wonder why people use dot+colon notation instead of just
> dots to reference callables. In distutils2 for example we resolve
> dotted names to find command classes, command hooks and compilers. So
> what’s the benefit, marginally easier parsing?

An opportunity of using a colon is that it allows::

dotted.module.name:expression

where expression may be more than just a name::

foo.bar:Bar()

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton

Alice Bevan–McGregor

Apr 15, 2011, 2:22:11 PM

to web...@python.org

On 2011-04-15 11:02:17 -0700, Jim Fulton said:

> On Fri, Apr 15, 2011 at 1:32 PM, Éric Araujo
> <mer...@netwok.org> wrote:
>> As an aside, I wonder why people use dot+colon notation instead of just
>> dots to reference callables. In distutils2 for example we resolve
>> dotted names to find command classes, command hooks and compilers. So
>> what’s the benefit, marginally easier parsing?
>
> An opportunity of using a colon is that it allows::
>
> dotted.module.name:expression
>
> where expression may be more than just a name::
>
> foo.bar:Bar()

Or foo.bar:Baz.factory.

I wouldn't go so far as to eval() what's after the colon. The real
difference is this:

[foo.bar]:[Baz.factory]
  [foo.bar]      -> module lookup
  [Baz.factory]  -> attribute lookup

You can't do this:

import foo.bar.Baz.factory

Thus the difference. However, the syntax is actually more flexible than that:

[foo.bar]/[subfolder/file]
  [foo.bar]         -> module
  [subfolder/file]  -> sub-path

/[foo/bar]
  [foo/bar]         -> just a path
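With a colon there is nothing to guess: everything before it is imported as a module and everything after it is walked with `getattr()`, so no `eval()` is needed. A minimal sketch of the module/attribute form only; the sub-path and bare-path forms would need separate handling, and `resolve_reference` is a hypothetical name.

```python
import importlib


def resolve_reference(ref):
    """Resolve "module.path:attr.path" without eval().

    "foo.bar:Baz.factory" means
    getattr(getattr(import_module("foo.bar"), "Baz"), "factory").
    """
    module_name, sep, attr_path = ref.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError("expected 'module:attribute', got %r" % ref)
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj
```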

— Alice.

Alice Bevan–McGregor

Apr 15, 2011, 3:05:32 PM

to web...@python.org

On 2011-04-14 10:34:59 -0700, Ian Bicking said:

> I think there's a general concept we should have, which I'll call a
> "script" -- but basically it's a script to run (__main__-style), a
> callable to call (module:name), or a URL to fetch internally.

Agreed. The reference notation I mentioned in my reply to Graham, with
the addition of URI syntax, covers all of those options.

> I want to keep this distinct from anything long-running, which is a
> much more complex deal.

The primary application is only potentially long-running. (You could,
in theory, deploy an app as CGI, but that way lies madness.) However,
the reference syntax mentioned (excepting URL) works well for
identifying this.

> I think given the three options, and for general simplicity, the script
> can be successful or have an error (for Python code: exception or no;
> for __main__: zero exit code or no; for a URL: 2xx code or no), and can
> return some text (which may only be informational, not structured?)

For the simple cases (script / callable), it's pretty easy to trap
STDOUT and STDERR, deliver INFO log messages to STDOUT, everything else
to STDERR, then display that to the administrator in some form. Same
for HTTP, except that it can include full HTML formatting information.
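Ian's success rules (no exception for a callable, a zero exit code for a `__main__` script), combined with trapping output for the administrator, might look like this for the two non-HTTP cases. This is a sketch only. The function names are hypothetical, and a real container would also want timeouts and separate log levels.

```python
import contextlib
import importlib
import io
import subprocess
import sys
import traceback


def run_callable(reference):
    """Run a "module:name" hook; success means no exception was raised."""
    module_name, _, attr = reference.partition(":")
    out = io.StringIO()
    try:
        # Trap both streams so the output can be shown to the administrator.
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(out):
            hook = getattr(importlib.import_module(module_name), attr)
            hook()
        return True, out.getvalue()
    except Exception:
        # Keep the traceback: a human can often read it even if a tool can't.
        return False, out.getvalue() + traceback.format_exc()


def run_main_script(path):
    """Run a __main__-style script; success means a zero exit code."""
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr
```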

> An application configuration could refer to scripts under different
> names, to be invoked at different stages.

A la the already mentioned post-install, pre-upgrade, post-upgrade,
pre-removal, and cron-like. Any others?

> There could be an optional self-test script, where the application
> could do a last self-check -- import whatever it wanted, check db
> settings, etc. Of course we'd want to know what it needed *before* the
> self-check to try to provide it, but double-checking is of course good
> too.

Unit and functional tests are the most obvious. In which case we'll
need to be able to provide a localhost-only 'mounted' location for the
application even though it hasn't been installed yet.

> One advantage to a separate script instead of just one
> script-on-install is that you can more easily indicate *why* the
> installation failed. For instance, script-on-install might fail
> because it can't create the database tables it needs, which is a
> different kind of error than a library not being installed, or being
> fundamentally incompatible with the container it is in. In some sense
> maybe that's because we aren't proposing a rich error system -- but
> realistically a lot of these errors will be TypeError, ImportError,
> etc., and trying to normalize those errors to some richer meaning is
> unlikely to be done effectively (especially since error cases are hard
> to test, since they are the things you weren't expecting).

Humans are potentially better at reading tracebacks than machines are,
so my previous logging idea (script output stored and displayed to the
administrator in a readable form) combined with a modicum of reasonable
exception handling within the script should lead to fairly clear errors.
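Separate stages make it easier to say *why* an installation failed. One way to sketch that is to run named hooks in order and stop at the first failure. The stage names and the `run_stages` helper are hypothetical, and the hooks here are plain callables.

```python
def run_stages(stages):
    """Run (stage_name, hook) pairs in order, stopping at the first failure.

    Returns (failed_stage, log). failed_stage is None when every hook
    succeeded, and log holds one human-readable line per attempted stage.
    """
    log = []
    for name, hook in stages:
        try:
            hook()
        except Exception as exc:
            # The stage name plus the exception type is often enough for a
            # deployer to decide whether to roll back.
            log.append("%s: FAILED (%s: %s)" % (name, type(exc).__name__, exc))
            return name, log
        log.append("%s: ok" % name)
    return None, log
```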

> Categorizing services seems unnecessary.

The descriptions of the different database options were for
illustration, not actual separation and categorization.

> I'd like to see maybe an | operator, and a distinction between required
> and optional services. E.g.:

No need for some new operator, YAML already supports lists.

services:
- [mysql, postgresql, dburl]

Or:

services:
  required:
    - files

  optional:
    - [mysql, postgresql]

> And then there's a lot more you could do... which one do you prefer,
> for instance.

The order of services within one of these lists would indicate
preference, thus MySQL is preferred over PostgreSQL in the second
example, above.
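Once the YAML is loaded, the preference rule reduces to picking the first available entry in each list. This sketch assumes the loaded structure is a dict with `required` and `optional` keys, as in the second example, and that the container knows which service names it can provide. `choose_services` is a hypothetical name.

```python
def choose_services(spec, available):
    """Pick one provider per entry; each list is ordered by preference."""
    chosen = []
    for kind in ("required", "optional"):
        for entry in spec.get(kind, []):
            # A bare string is a single choice; a list holds alternatives.
            alternatives = entry if isinstance(entry, list) else [entry]
            match = next((name for name in alternatives if name in available), None)
            if match is not None:
                chosen.append(match)
            elif kind == "required":
                raise LookupError("no available provider for %r" % (alternatives,))
    return chosen
```

Optional entries with no match are skipped silently. Required ones make the install fail early, before any hook runs.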

> Tricky things:
> - You need something funny like multiple databases. This is very
> service-specific anyway, and there might sometimes need to be a way to
> configure the service. It's also a fairly obscure need.

I'm not convinced that connecting to a legacy database /and/ current
database is that obscure. It's also not as hard as Django makes it
look (with a 1M SLoC change to add support)… WebCore added support in
three lines.

> - You need multiple applications to share data. This is hard, not sure
> how to handle it. Maybe punt for now.

That's what higher-level APIs are for. ;)

> You mean, the application provides its own HTTP server? I certainly
> wouldn't expect that...?

Nor would I; running an HTTP server would be daft. Running mod_wsgi,
FastCGI on-disk sockets, or other persistent connector makes far more
sense, and is what I plan.

Unless you have a very, very specific need (e.g. Tornado), running a
Python HTTP server in production then HTTP proxying to it is
inefficient and a terrible idea. (Easy deployment model, terrible
overhead/performance.)

> Anyway, in terms of aggregate, I mean something like a "site" that is
> made up of many "applications", and maybe those applications are
> interdependent in some fashion. That adds lots of complications, and
> though there's lots of use cases for that I think it's easier to think
> in terms of apps as simpler building blocks for now.

That's not complicated at all; I do those types of aggregate sites
fairly regularly. E.g.

/ - CMS
/location - Location & image database.
/resource - Business database.
/admin - Flex administration interface.

That's done at the Nginx/Apache level, where it's most efficient to do
so, not in Python.

> Sure; these would be tool options, and if you set everything up you are
> requiring the deployer to invoke the tools correctly to get everything
> in place. Which is a fine starting point before formalizing anything.

What? Not even close—the person deploying an application is relying on
the application server/service to configure the web server of choice;
there is no need for deployer action after the initial "Nginx, include
all .conf files from folder X" where folder X is managed by the app
server. (That's one line in /etc/nginx/nginx.conf.)
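The folder-of-includes arrangement can be sketched as the app server writing one nginx fragment per mounted application. This assumes the `include /path/to/folder/*.conf;` line sits inside a `server { }` block and that each app listens on a FastCGI Unix socket. The paths, file names and helper names are hypothetical. A real fragment would also need to map SCRIPT_NAME and PATH_INFO for prefixes other than `/`.

```python
import os


def nginx_location(prefix, socket_path):
    """Render one location block that hands requests to a FastCGI socket."""
    return (
        "location %s {\n"
        "    include fastcgi_params;\n"
        "    fastcgi_pass unix:%s;\n"
        "}\n" % (prefix, socket_path)
    )


def write_site_confs(mounts, conf_dir, socket_dir):
    """Write one .conf file per mounted application into conf_dir."""
    os.makedirs(conf_dir, exist_ok=True)
    written = []
    for prefix, app_name in mounts.items():
        path = os.path.join(conf_dir, app_name + ".conf")
        socket_path = os.path.join(socket_dir, app_name + ".sock")
        with open(path, "w") as f:
            f.write(nginx_location(prefix, socket_path))
        written.append(path)
    return written
```

After writing the files, the app server would reload nginx. The deployer never edits these files by hand.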

> Hm... I guess this is an ordering question. You could import logging
> and setup defaults, but that doesn't give the container a chance to
> overwrite those defaults. You could have the container setup logging,
> then make sure the app sets defaults only when the container hasn't --
> but I'm not sure if it's easy to use the logging module that way.

The logging configuration, in dict form, is passed from the app server
to the container. The default logging levels are read by the app
server from the container. It's trivially easy, esp. when INI and YAML
files can be programmatically created.

> Well, maybe that's not hard -- if you have something like
> silvercustomize.py that is always imported, and imported fairly early
> on, then have the container overwrite logging settings before it *does*
> anything (e.g., sends a request) then you should be okay?

Indeed; container-setup.py or whatever.
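One way to get the ordering right: the application ships a default `dictConfig` dictionary, and the container-setup step merges its overrides in before anything else runs. This is a minimal sketch using the standard `logging.config.dictConfig`. The one-level merge rule and the names `APP_DEFAULTS`, `merged_logging_config` and `configure_logging` are assumptions, not part of any proposal in the thread.

```python
import logging
import logging.config

# Defaults shipped by the application (hypothetical).
APP_DEFAULTS = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"console": {"class": "logging.StreamHandler", "level": "DEBUG"}},
    "root": {"level": "WARNING", "handlers": ["console"]},
}


def merged_logging_config(app_defaults, container_overrides):
    """Merge one level deep: within each top-level section, container keys win."""
    merged = {}
    for key in set(app_defaults) | set(container_overrides):
        app_value = app_defaults.get(key)
        container_value = container_overrides.get(key)
        if isinstance(app_value, dict) and isinstance(container_value, dict):
            merged[key] = {**app_value, **container_value}
        elif key in container_overrides:
            merged[key] = container_value
        else:
            merged[key] = app_value
    return merged


def configure_logging(container_overrides):
    """Called early (e.g. from container-setup.py), before any request."""
    logging.config.dictConfig(merged_logging_config(APP_DEFAULTS, container_overrides))
```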

> Rich configurations are problematic in their own ways. While the
> str-key/str-value of os.environ is somewhat limited, I wouldn't want
> anything richer than JSON (list, dict, str, numbers, bools).

JSON is a subset of YAML. I honestly believe YAML meets the
requirements for richness, simplicity, flexibility, and portability
that a configuration format really needs.

> And then we have to figure out a place to drop the configuration.
> Because we are configuring the *process*, not a particular application
> or request handler, a callable isn't great (unless we expect the
> callable to drop the config somewhere and other things to pick it up?)

I've already mentioned an environment variable identifying the path to
the on-disk configuration file—APP_CONFIG_PATH—which would then be read
in and acted upon by the container-setup.py file which is initially
imported before the rest of the application. Also, the application
factory idea of passing the already read-in configuration dictionary is
quite handy, here.
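Read literally, the APP_CONFIG_PATH proposal amounts to a few lines in the setup script plus an application factory. To stay dependency-free, this sketch assumes the container writes the file as JSON, which YAML 1.2 parsers also accept. A YAML file in general would need a parser such as PyYAML's `yaml.safe_load`. The factory name `make_app` and the `greeting` key are hypothetical.

```python
import json
import os


def load_app_config(environ=None):
    """Load the configuration file named by APP_CONFIG_PATH, if any."""
    environ = os.environ if environ is None else environ
    path = environ.get("APP_CONFIG_PATH")
    if not path:
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def make_app(config):
    """Hypothetical application factory that receives the parsed dictionary."""
    greeting = config.get("greeting", "hello")

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [greeting.encode("utf-8")]

    return app
```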

> I found at least giving one valid hostname (and yes, should include a
> path) was important for many applications. E.g., a bunch of apps have
> tendencies to put hostnames in the database.

Luckily, that's a bad habit we can discourage. ;)

> I'm not psyched about pointing to a file, though I guess it could work
> -- it's another kind of peculiar
> drop-the-config-somewhere-and-wait-for-someone-to-pick-it-up. At least
> dropping it directly in os.environ is easy to use directly (many things
> allow os.environ interpolation already) and doesn't require any
> temporary files. Maybe there's a middle ground.

Picked up by the container-setup.py site-customize script. What's the
limit on the size of a variable in the environ? (Also, that memory
gets permanently allocated for the life of the application; not very
efficient if we're just going to convert it to a rich internal
structure.)

> :: Application (package) name.
>
> This doesn't seem meaningful to me -- there's no need for a one-to-one
> mapping between these applications and a particular package. Unless
> you mean some attempt at a unique name that can be used for indexing?

You're mixing something up, here. Each application is a single primary
package with dependencies. One container per application.

> It would also need a way to specify things like what port to run on

Automatically allocated by the app server.

> public or private interface

Chosen by the deployer during deployment time configuration.

> maybe indicate if something like what proxying is valid (if any)

If it's WSGI, it's irrelevant. If it's a network service, it shouldn't
be HTTP.

> maybe process management parameters

For WSGI apps, it's transparent. Each app server would have its own
preference (e.g. mine will prefer FastCGI on-disk sockets) and the
application will be blissfully unaware of that.

> ways to inspect the process itself (since *maybe* you can't send
> internal HTTP requests into it), etc.

Interesting idea, not sure how that would be implemented or used, though.

> PHP! ;)

PHP can be deployed as a WSGI application. :P

> I'm not personally that happy with how App Engine does it, as an
> example -- it requires a regex-based dispatch.

Regex dispatch is terrible. (I've actually encountered Python's 56KiB
regular expression size limit on one project!) Simply exporting
folders as "top level" webroots would be sufficient, methinks.
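Exporting a folder as a webroot, instead of dispatching on regular expressions, can be sketched as WSGI middleware. It serves any file found under the folder and otherwise calls the application. The name `static_first` is hypothetical. A production setup would more likely let the front-end web server serve the folder directly.

```python
import mimetypes
import os


def static_first(app, webroot):
    """Serve existing files under webroot; fall through to the WSGI app otherwise."""
    root = os.path.realpath(webroot)

    def wrapper(environ, start_response):
        relative = environ.get("PATH_INFO", "/").lstrip("/")
        path = os.path.realpath(os.path.join(root, relative))
        # realpath plus the prefix check rejects "../" escapes out of root.
        if path.startswith(root + os.sep) and os.path.isfile(path):
            content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
            with open(path, "rb") as f:
                body = f.read()
            start_response("200 OK", [("Content-Type", content_type),
                                      ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)

    return wrapper
```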

> Anything "string-like" or otherwise fancy requires more support
> libraries for the application to actually be able to make use of the
> environment. Maybe necessary, but it should be done with great
> reluctance IMHO.

I've had great success with string-likes in WebCore/Marrow and
TurboMail for things like e-mail address lists, e-mail addresses, and
URLs.

P.J. Eby

Apr 15, 2011, 3:55:53 PM

to Jim Fulton, Eric Araujo, web...@python.org

At 02:02 PM 4/15/2011 -0400, Jim Fulton wrote:
>On Fri, Apr 15, 2011 at 1:32 PM, Éric Araujo <mer...@netwok.org> wrote:
> > As an aside, I wonder why people use dot+colon notation instead of just
> > dots to reference callables. In distutils2 for example we resolve
> > dotted names to find command classes, command hooks and compilers. So
> > what's the benefit, marginally easier parsing?
>
>An opportunity of using a colon is that it allows::
>
> dotted.module.name:expression
>
>where expression may be more than just a name::
>
>
> foo.bar:Bar()

The reason setuptools uses ':' is that it allows you to unambiguously
reference object attributes, e.g.:

some.module:SomeClass.some_method_or_attribute

(It doesn't allow expressions, just dotted "paths".)

Fred Drake

Apr 15, 2011, 4:11:03 PM

to Éric Araujo, web...@python.org

On Fri, Apr 15, 2011 at 1:32 PM, Éric Araujo <mer...@netwok.org> wrote:

> As an aside, I wonder why people use dot+colon notation instead of just
> dots to reference callables. In distutils2 for example we resolve
> dotted names to find command classes, command hooks and compilers.

I advocated using the just-dotted notation. These references are found in
configurations, usually constructed by users of the components rather than
implementors of the components. This is different than for entry points,
where the entry point specification uses module:object, but is provided by
the package maintainer.

These end users don't really care if the object identified is a class or
function in module, a nested attribute on a class, or anything else, so
long as it does what it's advertised to do. By not pushing implementation
details into the identifier, the package maintainer is free to change the
implementation in more ways, without creating backward incompatibility.

Jim's note about having an expression after the colon is interesting;
not sure if that's a helpful case for packaging's use or not.

-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
"Give me the luxuries of life and I will willingly do without the necessities."
--Frank Lloyd Wright

exa...@twistedmatrix.com

Apr 15, 2011, 6:23:52 PM

to web...@python.org

On 06:22 pm, al...@gothcandy.com wrote:
>On 2011-04-15 11:02:17 -0700, Jim Fulton said:
>>On Fri, Apr 15, 2011 at 1:32 PM, Éric Araujo <mer...@netwok.org>
>>wrote:
>>>As an aside, I wonder why people use dot+colon notation instead of
>>>just dots to reference callables. In distutils2 for example we
>>>resolve dotted names to find command classes, command hooks and

>>>compilers. So what 19s the benefit, marginally easier parsing?

>>
>>An opportunity of using a colon is that it allows::
>>
>> dotted.module.name:expression
>>
>>where expression may be more than just a name::
>>
>> foo.bar:Bar()
>
>Or foo.bar:Baz.factory.
>
>I wouldn't go so far as to eval() what's after the colon. The real
>difference is this:
>
>    [foo.bar]:[Baz.factory]
>     |         ^- Attribute lookup.
>     ^- Module lookup.
>
>You can't do this:
>
>import foo.bar.Baz.factory

But you can certainly imagine a function `foo` which accepts
"foo.bar.Baz.factory" and returns the appropriate object. The ":"
doesn't really buy you anything.

Jean-Paul

P.J. Eby

Apr 15, 2011, 6:33:08 PM

to web...@python.org

At 04:11 PM 4/15/2011 -0400, Fred Drake wrote:
>These end users don't really care if the object identified is a class or
>function in module, a nested attribute on a class, or anything else, so
>long as it does what it's advertised to do. By not pushing implementation
>details into the identifier, the package maintainer is free to change the
>implementation in more ways, without creating backward incompatibility.

That would be one advantage of using entry points
instead. ;-) (i.e., the user doesn't specify the object location,
the package author does.)

Note, however, that one must perform considerably more work to
resolve a name, when you don't know whether each part of the name is
a module or an attribute.

Either you have to get an AttributeError first, and then fall back to
importing, or get an ImportError first, and fall back to getattr.

If the syntax is explicit, OTOH, then you don't have to guess,
thereby saving lots of work and wasteful exceptions.
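Both Jean-Paul's colon-free resolver and P.J.'s cost argument show up in a small sketch. Without the colon, the resolver has to guess at each segment and recover from an exception when the guess is wrong. `resolve_dotted` is a hypothetical name. Since Python 3.9, the standard library's `pkgutil.resolve_name` accepts both the `pkg.mod:obj.attr` form and the plain dotted form.

```python
import importlib


def resolve_dotted(name):
    """Resolve "pkg.mod.Class.attr" with no ':' marking where the module ends.

    Each segment is first tried as an attribute. Only when that raises
    AttributeError is it tried as a submodule import. This guessing, and the
    exception it costs, is what an explicit ':' avoids.
    """
    parts = name.split(".")
    module_path = parts[0]
    obj = importlib.import_module(module_path)
    for part in parts[1:]:
        module_path += "." + part
        try:
            obj = getattr(obj, part)
        except AttributeError:
            # Not an attribute (yet): perhaps a submodule not imported so far.
            obj = importlib.import_module(module_path)
    return obj
```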

Fred Drake

Apr 15, 2011, 7:02:14 PM

to P.J. Eby, web...@python.org

On Fri, Apr 15, 2011 at 6:06 PM, P.J. Eby <p...@telecommunity.com> wrote:
> That would be one advantage of using entry points instead. ;-) (i.e., the
> user doesn't specify the object location, the package author does.)

Definitely! I'm certainly all in favor of having something very akin to entry
points, but I'm not sure where that stands in the current plans. I'm not so
worried about the efficiency, but making it explicit the way entry points do
is a clear win.

And more extensible to additional resolution methods in the future.
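With entry points, the package author writes the `module:object` location and users refer to a stable name. The standard `importlib.metadata.EntryPoint` class can represent and load such a declaration. In a real installation, the declarations come from package metadata via `importlib.metadata.entry_points()`, whose selection API differs between Python versions. The group and entry-point names below are hypothetical.

```python
from importlib.metadata import EntryPoint

# What a package author might declare in its metadata (hypothetical names).
DECLARED = [
    EntryPoint(name="default-app",
               value="wsgiref.simple_server:demo_app",
               group="hypothetical.wsgi_apps"),
]


def load_by_name(name, declared=DECLARED):
    """Look up a user-facing name; the author is free to move the object."""
    for entry_point in declared:
        if entry_point.name == name:
            return entry_point.load()
    raise LookupError("no entry point named %r" % name)
```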

Ian Bicking

Apr 16, 2011, 12:20:05 AM

to Alice Bevan–McGregor, web...@python.org

On Fri, Apr 15, 2011 at 2:05 PM, Alice Bevan–McGregor <al...@gothcandy.com> wrote:

>> I want to keep this distinct from anything long-running, which is a much more complex deal.

> The primary application is only potentially long-running. (You could, in theory, deploy an app as CGI, but that way lies madness.) However, the reference syntax mentioned (excepting URL) works well for identifying this.

Right -- just one long-running thing (but no promises how long).

>> I think given the three options, and for general simplicity, the script can be successful or have an error (for Python code: exception or no; for __main__: zero exit code or no; for a URL: 2xx code or no), and can return some text (which may only be informational, not structured?)

> For the simple cases (script / callable), it's pretty easy to trap STDOUT and STDERR, deliver INFO log messages to STDOUT, everything else to STDERR, then display that to the administrator in some form. Same for HTTP, except that it can include full HTML formatting information.

For Silver Lining I set "Accept: text/plain", to at least suggest that plain text was preferred, since typically HTML isn't easily displayed. But of course a tool could change that, probably usefully? But that only applies to HTTP. Anyway, seems easy enough.
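For the URL case, the request described here (ask for plain text, treat any 2xx as success) might look like this with the standard library. It is only a sketch: `build_hook_request` and `run_url_hook` are hypothetical names, and it handles neither retries nor authentication.

```python
import urllib.error
import urllib.request


def build_hook_request(url):
    """Ask for plain text, since HTML output is awkward to show an administrator."""
    return urllib.request.Request(url, headers={"Accept": "text/plain"})


def run_url_hook(url, timeout=30):
    """Fetch a URL hook internally; success means a 2xx response."""
    try:
        with urllib.request.urlopen(build_hook_request(url), timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return 200 <= response.status < 300, body
    except urllib.error.HTTPError as err:
        # Error responses arrive as HTTPError; keep the body for the log.
        return False, err.read().decode("utf-8", "replace")
    except urllib.error.URLError as err:
        # Connection-level failure (refused, DNS, timeout).
        return False, str(err.reason)
```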

>> An application configuration could refer to scripts under different names, to be invoked at different stages.

> A la the already mentioned post-install, pre-upgrade, post-upgrade, pre-removal, and cron-like. Any others?

test-environment, test-alive, test-functional are all possible
test-alive could be used by, e.g., Nagios to monitor (it might actually have structured output?)

>> There could be an optional self-test script, where the application could do a last self-check -- import whatever it wanted, check db settings, etc. Of course we'd want to know what it needed *before* the self-check to try to provide it, but double-checking is of course good too.

> Unit and functional tests are the most obvious. In which case we'll need to be able to provide a localhost-only 'mounted' location for the application even though it hasn't been installed yet.

For local functional HTTP tests you might want that, but if you are doing non-HTTP functional tests (e.g., just WSGI) or unit tests then the environment should always be sufficient without actually serving anything up. You'd probably want a "test" set of local services (as opposed to a "development" set of services). I think this will all be another kind of tooling around development.
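WSGI-level functional tests need no server. The standard `wsgiref.util.setup_testing_defaults` fills in a plausible environ so an application can be called directly, as this sketch shows. `call_wsgi` and `hello_app` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults


def call_wsgi(app, path="/", method="GET"):
    """Call a WSGI app in-process and return (status, headers, body)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # adds SERVER_NAME, wsgi.input, etc.
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from ", environ["PATH_INFO"].encode("latin-1")]
```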

>> One advantage to a separate script instead of just one script-on-install is that you can more easily indicate *why* the installation failed. For instance, script-on-install might fail because it can't create the database tables it needs, which is a different kind of error than a library not being installed, or being fundamentally incompatible with the container it is in. In some sense maybe that's because we aren't proposing a rich error system -- but realistically a lot of these errors will be TypeError, ImportError, etc., and trying to normalize those errors to some richer meaning is unlikely to be done effectively (especially since error cases are hard to test, since they are the things you weren't expecting).

> Humans are potentially better at reading tracebacks than machines are, so my previous logging idea (script output stored and displayed to the administrator in a readable form) combined with a modicum of reasonable exception handling within the script should lead to fairly clear errors.

Deployers aren't very good at reading developer tracebacks, so it is kind of nice if you at least have a sense of the stage. One advantage to multiple testing stages is that you might roll back before, e.g., having to deal with database migrations. But easy enough to skip for now.

>> I'd like to see maybe an | operator, and a distinction between required and optional services. E.g.:

> No need for some new operator, YAML already supports lists.

> services:
> - [mysql, postgresql, dburl]

> Or:

> services:
>   required:
>     - files
>
>   optional:
>     - [mysql, postgresql]

>> And then there's a lot more you could do... which one do you prefer, for instance.

> The order of services within one of these lists would indicate preference, thus MySQL is preferred over PostgreSQL in the second example, above.

Sure

>> Tricky things:
>> - You need something funny like multiple databases. This is very service-specific anyway, and there might sometimes need to be a way to configure the service. It's also a fairly obscure need.

> I'm not convinced that connecting to a legacy database /and/ current database is that obscure. It's also not as hard as Django makes it look (with a 1M SLoC change to add support)… WebCore added support in three lines.

Well, then you are getting into specific configurations fitting into legacy environments, not containers and encapsulated applications. There's nothing that actually *stops* you from trying to connect to any database you want, so ad hoc configuration can handle these requirements. If you have a legacy database you also aren't looking for the container to allocate you a database.

>> - You need multiple applications to share data. This is hard, not sure how to handle it. Maybe punt for now.

> That's what higher-level APIs are for. ;)

>> You mean, the application provides its own HTTP server? I certainly wouldn't expect that...?

> Nor would I; running an HTTP server would be daft. Running mod_wsgi, FastCGI on-disk sockets, or other persistent connector makes far more sense, and is what I plan.

> Unless you have a very, very specific need (e.g. Tornado), running a Python HTTP server in production then HTTP proxying to it is inefficient and a terrible idea. (Easy deployment model, terrible overhead/performance.)

>> Anyway, in terms of aggregate, I mean something like a "site" that is made up of many "applications", and maybe those applications are interdependent in some fashion. That adds lots of complications, and though there's lots of use cases for that I think it's easier to think in terms of apps as simpler building blocks for now.

> That's not complicated at all; I do those types of aggregate sites fairly regularly. E.g.

> / - CMS
> /location - Location & image database.
> /resource - Business database.
> /admin - Flex administration interface.

> That's done at the Nginx/Apache level, where it's most efficient to do so, not in Python.

Yes, and you could use your deployment tool to manage that. E.g., with Silver Lining you might do:

silver update http://mysite.com/ apps/cms
silver update http://mysite.com/location apps/location-image
silver update http://mysite.com/resource apps/business
silver update http://mysite.com/admin apps/flex-admin

And that's fine, but if you wanted to have them automatically know about each other and perhaps interact then you need something more.

>> Sure; these would be tool options, and if you set everything up you are requiring the deployer to invoke the tools correctly to get everything in place. Which is a fine starting point before formalizing anything.

> What? Not even close—the person deploying an application is relying on the application server/service to configure the web server of choice; there is no need for deployer action after the initial "Nginx, include all .conf files from folder X" where folder X is managed by the app server. (That's one line in /etc/nginx/nginx.conf.)

I'm not thinking about conf files. I'm thinking about something like a login app that you mount at /login that sets a signed cookie, and your main app needs to know the proper place to redirect to, and both apps need a shared secret for the signed cookie. The two of them together, with a formalized connection, is what I'm thinking of as an "aggregate" app.
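The shared-secret part of the login example can be sketched with the standard `hmac` module: the login app signs a value, and the main app verifies it with the same secret. How the container would hand both apps that secret is exactly the open question, so here it is simply a parameter. The `value|signature` cookie format and the helper names are hypothetical. A real cookie would also carry an expiry time.

```python
import hashlib
import hmac


def sign(value, secret):
    """Login app: append an HMAC-SHA256 signature to the cookie value."""
    mac = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value + "|" + mac


def verify(cookie, secret):
    """Main app: return the value if the signature matches, else None."""
    value, sep, mac = cookie.rpartition("|")
    if not sep:
        return None
    expected = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    return value if hmac.compare_digest(mac, expected) else None
```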

>> Hm... I guess this is an ordering question. You could import logging and setup defaults, but that doesn't give the container a chance to overwrite those defaults. You could have the container setup logging, then make sure the app sets defaults only when the container hasn't -- but I'm not sure if it's easy to use the logging module that way.

> The logging configuration, in dict form, is passed from the app server to the container. The default logging levels are read by the app server from the container. It's trivially easy, esp. when INI and YAML files can be programmatically created.

How does the app server pass it to the container? Just point to the dict or INI/YAML config in the app config?

>> Well, maybe that's not hard -- if you have something like silvercustomize.py that is always imported, and imported fairly early on, then have the container overwrite logging settings before it *does* anything (e.g., sends a request) then you should be okay?

> Indeed; container-setup.py or whatever.

That would be a different model than I think you propose above -- the app automatically sets up defaults always, and the container has a chance to override those.

>> Rich configurations are problematic in their own ways. While the str-key/str-value of os.environ is somewhat limited, I wouldn't want anything richer than JSON (list, dict, str, numbers, bools).

> JSON is a subset of YAML. I honestly believe YAML meets the requirements for richness, simplicity, flexibility, and portability that a configuration format really needs.

>> And then we have to figure out a place to drop the configuration. Because we are configuring the *process*, not a particular application or request handler, a callable isn't great (unless we expect the callable to drop the config somewhere and other things to pick it up?)

> I've already mentioned an environment variable identifying the path to the on-disk configuration file—APP_CONFIG_PATH—which would then be read in and acted upon by the container-setup.py file which is initially imported before the rest of the application. Also, the application factory idea of passing the already read-in configuration dictionary is quite handy, here.

I'm still unhappy with the indirection, and with serializing configuration to a YAML file. I think the container is always going to have to run its own Python code to setup the environment, at which point having that Python code write a YAML file and then set an environmental variable to say where it wrote it and then in the same process have the app load that YAML file... I dunno.

>> I found at least giving one valid hostname (and yes, should include a path) was important for many applications. E.g., a bunch of apps have tendencies to put hostnames in the database.

> Luckily, that's a bad habit we can discourage. ;)

I would disagree with this on principle -- this format should support applications as they are written now. And in my experience most Django apps need a hostname on process startup.

Anyway, it's not that big a deal -- with WSGI you only get a hostname on the first request, but with a container I can't think of a situation where it wouldn't know at least one reasonable and valid hostname at process startup time.
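The difference between a hostname known at process start and one learned from the first request can be made concrete. In this sketch, `APP_BASE_URL` is a hypothetical variable a container might set. `wsgiref.util.application_uri` is the standard way to rebuild the base URL from a request's environ.

```python
import os
from wsgiref.util import application_uri


def startup_base_url(environ=None):
    """Base URL known at process start, if the container provides one."""
    environ = os.environ if environ is None else environ
    return environ.get("APP_BASE_URL")  # hypothetical variable name


def request_base_url(wsgi_environ):
    """Base URL rebuilt from a request; plain WSGI offers nothing earlier."""
    return application_uri(wsgi_environ)
```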

>> I'm not psyched about pointing to a file, though I guess it could work -- it's another kind of peculiar drop-the-config-somewhere-and-wait-for-someone-to-pick-it-up. At least dropping it directly in os.environ is easy to use directly (many things allow os.environ interpolation already) and doesn't require any temporary files. Maybe there's a middle ground.

> Picked up by the container-setup.py site-customize script. What's the limit on the size of a variable in the environ? (Also, that memory gets permanently allocated for the life of the application; not very efficient if we're just going to convert it to a rich internal structure.)

Realistically I always ended up setting os.environ['FOO'] = value, and then something else did os.environ['FOO']. An in-memory structure would be pretty much equivalent.

>>> :: Application (package) name.

>> This doesn't seem meaningful to me -- there's no need for a one-to-one mapping between these applications and a particular package. Unless you mean some attempt at a unique name that can be used for indexing?

> You're mixing something up, here. Each application is a single primary package with dependencies. One container per application.

Well, here we're entering into the dependency disagreement, and maybe something more. To me an application is a way to setup the Python environment, and a pointer to the WSGI application object. Packages don't enter into it at all.

>> It would also need a way to specify things like what port to run on

> Automatically allocated by the app server.

Yes, that's what I was thinking.

>> public or private interface

> Chosen by the deployer during deployment time configuration.

Generally it seems like a daemon might either desire or need a public or private interface. Celery needs a private interface, Tornado apps probably a public interface.

>> maybe indicate if something like what proxying is valid (if any)

> If it's WSGI, it's irrelevant. If it's a network service, it shouldn't be HTTP.

Again, Tornado or Twisted, which are typically used for things you don't want to proxy (though e.g., Nginx proxying might be okay when Apache proxying isn't).

>> maybe process management parameters

> For WSGI apps, it's transparent. Each app server would have its own preference (e.g. mine will prefer FastCGI on-disk sockets) and the application will be blissfully unaware of that.

Yes, but for a daemon not. A WSGI app should be able to require threaded (single-process) or multiprocess (no threads), though most would work with either.
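WSGI already tells an application how it is being run, through the `wsgi.multithread` and `wsgi.multiprocess` environ keys (PEP 3333). A threaded-only or multiprocess-only requirement can therefore at least be enforced at request time. This is a sketch. `require_mode` is a hypothetical wrapper, and a container would more usefully read such a requirement from the application's configuration before starting any processes.

```python
def require_mode(app, threads_ok=True, processes_ok=True):
    """Wrap a WSGI app so that an unsupported server mode fails loudly."""
    def wrapper(environ, start_response):
        if environ.get("wsgi.multithread") and not threads_ok:
            raise RuntimeError("this application cannot run in a threaded server")
        if environ.get("wsgi.multiprocess") and not processes_ok:
            raise RuntimeError("this application cannot run in a multiprocess server")
        return app(environ, start_response)
    return wrapper
```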

>> ways to inspect the process itself (since *maybe* you can't send internal HTTP requests into it), etc.

> Interesting idea, not sure how that would be implemented or used, though.

Monitoring.

>> PHP! ;)

> PHP can be deployed as a WSGI application. :P

>> I'm not personally that happy with how App Engine does it, as an example -- it requires a regex-based dispatch.

> Regex dispatch is terrible. (I've actually encountered Python's 56KiB regular expression size limit on one project!) Simply exporting folders as "top level" webroots would be sufficient, methinks.

Having / be a static file is also nice (good for Javascript/RPC-backend apps too), but doesn't work well with webroots.

Silver Lining's writable-root service is fairly closely integrated with static files. The... name is weird now that I look at it. But anyway, it's a space where the app can write static files, and those static files get preference. I found it to be a nice feature to have available, but pretty closely coupled with everything else.

>> Anything "string-like" or otherwise fancy requires more support libraries for the application to actually be able to make use of the environment. Maybe necessary, but it should be done with great reluctance IMHO.

> I've had great success with string-likes in WebCore/Marrow and TurboMail for things like e-mail address lists, e-mail addresses, and URLs.

It's not that I don't think they could be useful or convenient *now*, but how they develop over time -- this container format should aspire to be stable and conservative fairly quickly, which means keeping things simple and relying on applications to use support libraries if they want something more convenient (not unlike WSGI). Applications can then use and upgrade these support libraries at their own convenience.

Services are in particular an issue, as each container will have to reimplement much of the service code on its own. Though maybe with good abstract base classes it wouldn't be too hard.

Ian

Daniel Holth

Apr 18, 2011, 5:11:21 PM

to python-...@googlegroups.com, Alice Bevan–McGregor, web...@python.org

The file format discussion seems utterly pointless. Roberto de Ioris's uWSGI seems to make do with every file format. Would it be more useful to talk about what the deserialized configuration looks like in Python?

If you want the format to specify cron jobs and services and non-wsgi servers, why not go the whole way and use the Linux filesystem hierarchy standard. The entry point is an executable called `init`, configuration goes in /etc/, cron jobs go in /etc/cron.d etc. This should be flexible enough.

I hope most applications won't need to look at the contents of app.yaml (the application container config) at all. It certainly would not be generally useful for an application to inspect GAE's app.yaml. Whether the application container mucks around in the application's config is another messy issue, apart from the necessary 'mechanism to connect to deployment database' or other resources that are unique to the production environment.

Paste Deploy configures logging by passing the .ini to logging before invoking the app's entry point. This is the application container configuring the logging. For example a cool application container feature would be to have a little web application that manipulated logging configuration in a database, or reconfigured logging between requests without restarting the application.

One way to pass 'services' information would be to specify a support package with abstract base classes and have a procedure for proposing new standard services to the web-sig. The container would have to populate a registry of named implementations of those services it is able to support:

class support.databases.PostgreSQL(support.databases.SQLAlchemy):
    sqlalchemy_connection_string

support.get_default(support.databases.PostgreSQL)
support.get_named(support.databases.PostgreSQL, 'secondary')
support.get_all(support.databases.PostgreSQL) -> [(PostgreSQL(), 'default'), (PostgreSQL(), 'secondary')]

silverlining would have to specify and register:

class PostgreSQL:
    @property
    def sqlalchemy_connection_string(self):
        return os.environ['...']
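The registry idea can be made concrete in plain Python. This is a hypothetical sketch: no `support` package exists, and the class names, the module-level `_registry`, and the `register`/`get_default`/`get_named`/`get_all` helpers only mirror the names used in the message above. Abstract base classes define what a service must provide; the container registers named implementations.

```python
import abc
import os


class SQLAlchemyDatabase(abc.ABC):
    """Hypothetical abstract service: anything that can hand out a DB URL."""

    @property
    @abc.abstractmethod
    def sqlalchemy_connection_string(self):
        """Return a URL such as 'postgresql://user@host/dbname'."""


class PostgreSQL(SQLAlchemyDatabase):
    """Still abstract: marks implementations that speak PostgreSQL."""


# Hypothetical registry filled in by the container:
# service type -> {registration name: implementation instance}
_registry = {}


def register(service_type, name, instance):
    if not isinstance(instance, service_type):
        raise TypeError("%r does not implement %s" % (instance, service_type.__name__))
    _registry.setdefault(service_type, {})[name] = instance


def get_named(service_type, name):
    return _registry[service_type][name]


def get_default(service_type):
    return get_named(service_type, "default")


def get_all(service_type):
    # (instance, name) pairs, in registration order.
    return [(inst, name) for name, inst in _registry.get(service_type, {}).items()]


class EnvPostgreSQL(PostgreSQL):
    """What a container might register: reads the URL from the environment."""

    def __init__(self, env_var):
        self.env_var = env_var  # variable name chosen by the container (hypothetical)

    @property
    def sqlalchemy_connection_string(self):
        return os.environ[self.env_var]
```

The application depends only on `PostgreSQL` and the lookup helpers. How the URL is obtained stays with the container.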

I would really like to see a basic specification with no support for services or 'spending an hour running apt-get to reconfigure the server before eventually getting around to running the application', and a procedure for extending the format. The only goals would be to avoid running 'pip install -r' during deployment, and to point the WSGI runner at a directory instead of a specific script. In that case sandboxing or server/hardware abstraction concerns would be version 2.

Alice Bevan–McGregor

Apr 18, 2011, 7:09:33 PM

to web...@python.org

On 2011-04-18 14:11:21 -0700, Daniel Holth said:

> The file format discussion seems utterly pointless.

That's a pity.

> If you want the format to specify cron jobs and services and non-wsgi
> servers, why not go the whole way and use the Linux filesystem
> hierarchy standard. The entry point is an executable called `init`,
> configuration goes in /etc/, cron jobs go in /etc/cron.d etc. This
> should be flexible enough.

Because that would be… less than good. Let me illustrate:

a) The LFS is intended for complete operating system installations.

b) You sure as hell wouldn't want the init process to be Python.

c) Operating-system specific features are a no-go for portability.

d) We don't want developers to have to suddenly become sysadmins, too.

e) /etc is terrible for configuration organization.

There are other, lower-level reasons not to do that.

One big point is that the application server / container writes a
single configuration file which is then read in by the application.
One file, not a tree of them.

> I hope most applications won't need to look at the contents of app.yaml
> (the application container config) at all.

No-one has said that an application /would/ have to look at the
application metadata, or that after installation the file was anywhere
app accessible, even.

> Paste Deploy configures logging by passing the .ini to logging before
> invoking the app's entry point. This is the application container
> configuring the logging.

I've already defined that. RTFM or many ML messages about logging.

> For example a cool application container feature would be to have a
> little web application that manipulated logging configuration in a
> database, or reconfigured logging between requests without restarting
> the application.

The former is already defined. That's what the application server
does, database or no. The latter is broadly unnecessary, but easily
implementable within the application you are deploying.
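As a rough illustration of "implementable within the application", here is a small WSGI middleware that reapplies a logging .ini between requests whenever the file's modification time changes. It uses only `logging.config.fileConfig` and `os.stat`. The class name `LoggingReloader` is hypothetical and not part of any specification discussed in this thread.

```python
import logging.config
import os


class LoggingReloader:
    """Hypothetical WSGI middleware: reapply a logging config when it changes."""

    def __init__(self, app, ini_path):
        self.app = app
        self.ini_path = ini_path
        self._mtime = None  # forces a load on the first request

    def _maybe_reload(self):
        mtime = os.stat(self.ini_path).st_mtime
        if mtime != self._mtime:
            # Keep loggers the application already created alive across reloads.
            logging.config.fileConfig(self.ini_path, disable_existing_loggers=False)
            self._mtime = mtime

    def __call__(self, environ, start_response):
        # Checked between requests, so no process restart is needed.
        self._maybe_reload()
        return self.app(environ, start_response)
```

Polling `st_mtime` on every request is the simplest trigger. A real deployment might throttle the check or use a signal instead.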

> One way to pass 'services' information would be to specify a support
> package with abstract base classes and have a procedure for proposing
> new standard services to the web-sig. The container would have to
> populate a registry of named implementations of those services it is
> able to support:

That seems… excessive and ugly. You would also have code mixing
between the application server level and application level which will
encourage nothing but madness. Simple, named services with optional
configurations are more than enough.

> I would really like to see a basic specification with no support for
> services or 'spending an hour running apt-get to reconfigure the server
> before eventually getting around to running the application', and a
> procedure for extending the format.

apt-get has already been thrown out, and was, in fact, never part of
the quick summary I made, either.

Daniel Holth

Apr 18, 2011, 7:36:28 PM

to python-...@googlegroups.com, web...@python.org

>> If you want the format to specify cron jobs and services and non-wsgi
>> servers, why not go the whole way and use the Linux filesystem
>> hierarchy standard. The entry point is an executable called `init`,
>> configuration goes in /etc/, cron jobs go in /etc/cron.d etc. This
>> should be flexible enough.
>
> Because that would be… less than good. Let me illustrate:

Clearly, in the original e-mail I missed an opportunity to use a Unicode character from the new Emoji block ;-)

> I've already defined that. RTFM or many ML messages about logging.

Please remain friendly and patient.

Eric Larson

Apr 18, 2011, 7:46:12 PM

to Alice Bevan–McGregor, web...@python.org

On Apr 18, 2011, at 6:09 PM, Alice Bevan–McGregor wrote:

> On 2011-04-18 14:11:21 -0700, Daniel Holth said:
>
>> If you want the format to specify cron jobs and services and non-wsgi servers, why not go the whole way and use the Linux filesystem hierarchy standard. The entry point is an executable called `init`, configuration goes in /etc/, cron jobs go in /etc/cron.d etc. This should be flexible enough.
>
> Because that would be… less than good. Let me illustrate:
>
> a) The LFS is intended for complete operating system installations.
>
> b) You sure as hell wouldn't want the init process to be Python.
>
> c) Operating-system specific features are a no-go for portability.
>
> d) We don't want developers to have to suddenly become sysadmins, too.
>
> e) /etc is terrible for configuration organization.
>
> There are other, lower-level reasons not to do that.
>
> One big point is that the application server / container writes a single configuration file which is then read in by the application. One file, not a tree of them.

So, I'm going to throw this out there. Instead of assuming "/etc" always means "the root of the filesystem" we should consider it the "root of the sandbox" where the system providing the "sandbox" defines what that is. It is _a_ filesystem in that there is a place that an application will be run. For argument's sake, we'll say it is a directory on some server. Now, within that directory we choose to take some known bits from the LFS standard such as /etc, /bin, /var, etc for the placement of our application.
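The "root of the sandbox" reading can be expressed as a tiny path-mapping helper. FHS-style paths such as /etc/app.conf are interpreted relative to whatever directory the container designates. The function name `sandbox_path` and the rule that paths may not escape the sandbox are assumptions of this sketch, not something proposed in the thread.

```python
import os


def sandbox_path(sandbox_root, fhs_path):
    """Map an FHS-style path (e.g. '/etc/app.conf') into a sandbox directory.

    Hypothetical helper: the leading '/' means the sandbox root, not the
    system root.
    """
    root = os.path.realpath(sandbox_root)
    candidate = os.path.realpath(os.path.join(root, fhs_path.lstrip("/")))
    # Refuse paths such as '/../../etc/passwd' that would leave the sandbox.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes sandbox: %r" % fhs_path)
    return candidate
```

Under this reading, "/etc" is just a conventional subdirectory name. The container decides where the sandbox actually lives on disk.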

With that in mind, I think using things like LFS makes a ton of sense. We can piggyback on or copy (since previous discussions of .debs or RPMs seem not to sit well... even though they would fit this model very well...) systems like RPM rather directly and hopefully allow our Python web apps to play very nicely with applications in other languages.

Please do not get hung up on the fact that I've said RPMs here. The fact is distros have been doing package management for quite a long while. It is insanely convenient to say apt-get install couchdb and when it is done, having a couchdb server running. Copying the model seems like a good option in that we get to learn from the mistakes of others while inheriting a wild variety of tools and concepts.

Eric

> — Alice.

Alice Bevan–McGregor

Apr 18, 2011, 9:01:26 PM

to web...@python.org

On 2011-04-18 16:36:28 -0700, Daniel Holth said:
> On Apr 18, 2011, at 6:09 PM, Alice Bevan–McGregor wrote:

>> I've already defined that. RTFM or many ML messages about logging.
>
> Please remain friendly and patient.

That depends on how you define the F in RTFM. In this instance, I
meant "read the fine manual". ;)

You can understand my frustration, however, that > 10% of the posts in
this thread demonstrate a lack of understanding of (or lack of even a
cursory glance at) a) my initial post and associated document, and b)
the rest of the mailing list posts.

Asking for things already agreed upon or questions already resolved
wastes everyone's time.

On 2011-04-18 16:46:12 -0700, Eric Larson said:
> Instead of assuming "/etc" always means "the root of the filesystem" we
> should consider it the "root of the sandbox" where the system providing
> the "sandbox" defines what that is.

While /etc certainly wouldn't be the root of anything (insert sarcastic
smiley here ;), it was already agreed upon that / would refer to the
application container root, not system root. I share Ian's sentiment,
see: (search for 'root' on that page)

http://mail.python.org/pipermail/web-sig/2011-April/005041.html

> It is _a_ filesystem in that there is a place that an application will
> be run. For argument's sake, we'll say it is a directory on some
> server. Now, within that directory we choose to take some known bits
> from the LFS standard such as /etc, /bin, /var, etc for the placement
> of our application.

Again, not such a great idea.

> With that in mind, I think using things like LFS makes a ton of sense.
> We can piggyback on or copy (since previous discussions of .debs or RPMs
> seem not to sit well... even though they would fit this model very
> well...) systems like RPM rather directly and hopefully allow our
> Python web apps to play very nicely with applications in other
> languages.

I can't fully grok this paragraph. FHS (my bad calling it LFS
earlier!) = good because we won't confuse systems administrators and it
matches other binary packaging models?

I doubt an isolated web application will have a need for more than 6%
(3) of these:

http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard#Directory_structure

While I personally have a FHS-like application deployment model using
Git, I would rather not see that level of complexity as a requirement
for deploying basic applications.

> Please do not get hung up on the fact that I've said RPMs here. The
> fact is distros have been doing package management for quite a long
> while. It is insanely convenient to say apt-get install couchdb and
> when it is done, having a couchdb server running.

It may be convenient, but it's also quite the risk. You're letting
someone else configure your server. Also, do binary installation
systems automatically start the service post-installation before you
can configure them? I have difficulty believing that, which means a
whole whack-ton of effort under a systems administrator hat has been
glossed over.

> Copying the model seems like a good option in that we get to learn from
> the mistakes of others while inheriting a wild variety of tools and
> concepts.

The on-disk structure which the application lives within (the
"application container") is up to the application server in use. The
underlying application should, and, IMHO, -must- be agnostic to it.
Passing paths to configuration files, TMPDIR, etc. in the environment
is a fairly trivial way to do that, at which point the FHS discussion
is nearly moot.
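Here is a minimal sketch of the application side of this. The app never assumes a directory layout. It only reads paths the container exports. `APP_CONFIG` is a hypothetical variable name, because the thread had not settled on one. JSON stands in for the YAML file discussed elsewhere so that only the standard library is needed. `TMPDIR` is the ordinary variable that Python's `tempfile` module already honours.

```python
import json
import os
import tempfile


def load_container_config(environ=os.environ):
    """Read the single config file written by the container, if any."""
    path = environ.get("APP_CONFIG")  # hypothetical variable name
    if not path:
        return {}
    with open(path) as f:
        return json.load(f)


def scratch_dir():
    # tempfile.gettempdir() consults TMPDIR (then TEMP, TMP) on first use
    # and caches the result.
    return tempfile.gettempdir()
```

Because the path arrives through the environment, the same application code works whether the container keeps its files under /etc, a sandbox directory, or anywhere else.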

If you want a complete (complete enough for a simple web application)
FHS structure within the redistributable, I don't see the point of
having that many empty directories. ;)

As an aside, I -do- have an application in production using a FHS-like
file structure:

https://gist.github.com/926617

But again, I'm not suggesting something like that for the
redistributable application!

Daniel Holth

Apr 19, 2011, 9:43:27 AM

to python-...@googlegroups.com, web...@python.org

I have read it all but to me the consensus so far is unclear. I'm not sure we are talking about Ian's original suggestion 'make changes to the Silver Lining format to make it more general'. I thought the idea meant 'make a few changes to a virtualenv so it doesn't depend on the absolute filesystem path' or 'define a directory tree with a site-packages subdirectory (a place where .pth files are evaluated) and a .wsgi script'.

I think I'll understand your proposal if you can walk me through the packaging and deployment of this application, with all bundled dependencies, metadata, the application lifecycle, lifecycle event callback scripts and so forth:

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World!\n']
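The application above can be exercised without any server by using the standard library's `wsgiref.util.setup_testing_defaults` to build a plausible environ. This sketch only checks the WSGI callable and says nothing about the packaging questions being asked. The helper name `call_app` is hypothetical. The `b''` body matters on Python 3, where WSGI response bodies must be bytes.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World!\n']


def call_app(app, path='/'):
    """Hypothetical helper: invoke a WSGI app in-process.

    Returns (status, headers, body).
    """
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)
    captured = {}
    chunks = []

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return chunks.append  # the legacy write() callable PEP 3333 requires

    result = app(environ, start_response)
    try:
        chunks.extend(result)
    finally:
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], captured['headers'], b''.join(chunks)
```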

Daniel Holth

Apr 20, 2011, 10:49:02 AM

to python-...@googlegroups.com, web...@python.org

I am fine with the idea of passing a standard [YAML] container-resources configuration file to applications, but I mostly care about the orthogonal underlying 'copy a virtualenv to another server' use case. My proposal:

site.py should honor a new environment variable PYTHONAPPBASE, similar to PYTHONUSERBASE or PYTHONHOME.

PYTHONAPPBASE causes site.py to add an additional site-packages directory in PYTHONAPPBASE/lib/python2.7/site-packages or Lib/site-packages etc. in the same way the current site-packages paths are chosen.

(Optional) If that's not good enough, PYTHONAPPSITE specifies a directory that is given to site.addsitedir(). A maximum of 1 directory within PYTHONAPPBASE can be added.

site.py gains APP_BASE and APP_SITE in the same way it currently contains USER_BASE and USER_SITE.

All other paths such as PYTHONAPPBASE/src/mypackage-0.1/src/__init__.py are resolved with .pth files that must contain relative paths only.
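PYTHONAPPBASE is only a proposal, and today's site.py ignores it. The existing `site.addsitedir()` function already does the interesting part. It appends a directory to `sys.path` and processes any .pth files in it, resolving relative lines against that directory. That makes it possible to sketch the proposed behaviour on top of it. The helper names `app_site_packages` and `add_app_base` are hypothetical, and the lib/pythonX.Y layout is assumed to match a normal install.

```python
import os
import site
import sys


def app_site_packages(app_base):
    """site-packages under app_base, using the usual per-platform layout."""
    if os.name == "nt":
        return os.path.join(app_base, "Lib", "site-packages")
    version = "python%d.%d" % sys.version_info[:2]
    return os.path.join(app_base, "lib", version, "site-packages")


def add_app_base(environ=os.environ):
    """Emulate the *proposed* PYTHONAPPBASE behaviour (not real site.py)."""
    app_base = environ.get("PYTHONAPPBASE")
    if not app_base:
        return None
    sitedir = app_site_packages(app_base)
    # addsitedir() appends sitedir to sys.path and processes its .pth files;
    # relative lines in those files are resolved against sitedir.
    site.addsitedir(sitedir)
    return sitedir
```

The relative .pth resolution is what lets the whole tree be copied to another server without rewriting absolute paths.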

Definitions:

runtime: Python libraries available to but outside the webapp. Chosen by setting PYTHONHOME. The application metadata might choose a runtime by name, and the application server might display the results of 'pip freeze' for the runtime to help you build a matching development environment.

app.wsgi is the initial entry point /for the WSGI application/ but scripts can be run by setting up the environment in the same way.

To configure such an application with uWSGI then, if the Python runtime is in /runtimeA and the web application package is in /application, you would provide the following uWSGI configuration:

[uwsgi]
pyhome=/runtimeA
chdir=/application
file=/application/app.wsgi
env=PYTHONAPPBASE=/application
# to be specified later:
env=STANDARD_CONFIGURATION_FILE=/some-container-services.yaml

When app.wsgi starts it can access any libraries from /runtimeA/lib/python2.7/site-packages and any libraries from /application/lib/python2.7/site-packages

Since the extra sys.path entries come from site.py, not app.wsgi, it's very easy to perform the same setup for any other scripts within /application/

Is this anything?

Daniel Holth

Daniel Holth

Apr 27, 2011, 6:21:32 PM

to python-...@googlegroups.com, Alice Bevan–McGregor, web...@python.org

I stumbled across https://apphosted.com as more web application package and format 'prior art'. It appears to be an App Engine competitor. According to their API documentation, their deployment format is an archive containing a single directory with your WSGI program and a metro.config. They put the database configuration in a settings.py written into the application's root with defined DB_URI, etc.

Ian Bicking

Apr 27, 2011, 6:46:28 PM

to python-...@googlegroups.com, Alice Bevan–McGregor, web...@python.org

On Wed, Apr 27, 2011 at 5:21 PM, Daniel Holth <dho...@gmail.com> wrote:

> I stumbled across https://apphosted.com as more web application package and format 'prior art'. It appears to be an App Engine competitor. According to their API documentation, their deployment format is an archive containing a single directory with your WSGI program and a metro.config. They put the database configuration in a settings.py written into the application's root with defined DB_URI, etc.

There's something that bothers me about using settings.py, though I guess it's not that different from a YAML file or whatever, though with a cleverness danger. Conveniently you could do sys.modules['settings'] = new.module('settings') and avoid ever making a real file.

Using the name "settings" *specifically* is likely to cause name clashes with existing Django applications.
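Here is the in-memory module trick in current Python. The `new` module mentioned above no longer exists in Python 3, and `types.ModuleType` is its replacement. To sidestep the Django clash, the sketch uses a hypothetical name, `container_settings`, instead of `settings`. The helper `install_settings_module` and the `DB_URI` value are illustrative only.

```python
import sys
import types


def install_settings_module(name, values):
    """Hypothetical helper: create a module in memory, importable as `name`."""
    module = types.ModuleType(name, "Settings injected by the container")
    for key, value in values.items():
        setattr(module, key, value)
    # Registering in sys.modules lets 'import <name>' find it; no file needed.
    sys.modules[name] = module
    return module


install_settings_module(
    "container_settings",
    {"DB_URI": "postgresql://db.example/app"},  # hypothetical value
)

import container_settings  # resolved from sys.modules, not from disk
```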

Ian

Eric Larson

Apr 27, 2011, 7:09:28 PM

to Ian Bicking, python-...@googlegroups.com, Alice Bevan–McGregor, web...@python.org

On Wednesday, April 27, 2011 at 5:46 PM, Ian Bicking wrote:

> On Wed, Apr 27, 2011 at 5:21 PM, Daniel Holth <dho...@gmail.com> wrote:
>> I stumbled across https://apphosted.com as more web application package and format 'prior art'. It appears to be an App Engine competitor. According to their API documentation, their deployment format is an archive containing a single directory with your WSGI program and a metro.config. They put the database configuration in a settings.py written into the application's root with defined DB_URI, etc.
>
> There's something that bothers me about using settings.py, though I guess it's not that different from a YAML file or whatever, though with a cleverness danger. Conveniently you could do sys.modules['settings'] = new.module('settings') and avoid ever making a real file.

I would think something like YAML would be better simply b/c it might fit in more easily with an implementation that may not use Python. A dashboard app might want to loop through a bunch of sandboxes and read the listening ports/hosts for example to help find what it needs to monitor. The converse of course is that an environment could prescribe something like a port, host, database string, etc. by updating the YAML file. Again, by using something like YAML, whatever is provisioning the environment doesn't necessarily need to be written in Python.
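Here is the dashboard example as a sketch. It uses JSON instead of YAML so it needs only the standard library, and the argument is the same: any language can parse the file. The layout is entirely hypothetical. It assumes one `config.json` per sandbox under a common parent directory, with `host` and `port` keys. The function name `monitored_endpoints` is likewise illustrative.

```python
import json
import os


def monitored_endpoints(sandboxes_root):
    """Collect (sandbox, host, port) from each sandbox's config file."""
    endpoints = []
    for name in sorted(os.listdir(sandboxes_root)):
        path = os.path.join(sandboxes_root, name, "config.json")
        if not os.path.isfile(path):
            continue  # not a sandbox, or not provisioned yet
        with open(path) as f:
            config = json.load(f)
        endpoints.append((name, config["host"], int(config["port"])))
    return endpoints
```

Nothing here imports the applications themselves, which is the point: the monitoring tool only needs to read data files.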

Eric

> Using the name "settings" *specifically* is likely to cause name clashes with existing Django applications.
>
> Ian

Alice Bevan–McGregor

Jul 6, 2011, 2:33:34 PM

to web...@python.org

On 2011-04-15 22:33:08 +0000, P.J. Eby said:

> At 04:11 PM 4/15/2011 -0400, Fred Drake wrote:
>> These end users don't really care if the object identified is a class or
>> function in module, a nested attribute on a class, or anything else, so
>> long as it does what it's advertised to do. By not pushing implementation
>> details into the identifier, the package maintainer is free to change the
>> implementation in more ways, without creating backward incompatibility.
>
> That would be one advantage of using entry points
> instead. ;-) (i.e., the user doesn't specify the object location,
> the package author does.)
>
> Note, however, that one must perform considerably more work to
> resolve a name, when you don't know whether each part of the name is
> a module or an attribute.

Not if, as you mention, you use an explicit format. The format my
resolver code uses (and this code is utilized in marrow.mailer for
manager/transport lookup, marrow.server.http's command-line script to
resolve WSGI applications, and marrow.templating to resolve templates)
covers the following:

:: <object>
:: entrypoint_name
:: ../relative/path/to/something
:: ./relative/path/to/something
:: /absolute/path/to/something
:: package.relative/path/to/something
:: package.absolute.path
:: package.submodule:object
:: package.submodule:object.attribute

What is allowed on any given resolution depends on if the resolver
request is looking for an on-disk path or object.

Using the above as an example, you can define the use of the SMTP
transport within marrow.mailer in three ways:

from marrow.mailer.transport.smtp import SMTPTransport
config = dict(transport=SMTPTransport) # direct reference
config = dict(transport="smtp") # entry point
config = dict(  # object lookup
    transport="marrow.mailer.transport.smtp:SMTPTransport"
)

When configuring m.s.http to load an app, you can:

# p-code
HTTPServer.serve("project.application:WSGIApp.factory")

When choosing templates, OTOH, you can do the following:

return "./templates/foo.html", dict()
return "/var/www/foo.html", dict()
return "myapp.templates.foo", dict()
return "myapp/templates/foo.html", dict()
return "myapp.stemplates:email.welcome", dict()

> Either you have to get an AttributeError first, and then fall back to
> importing, or get an ImportError first, and fall back to getattr.

If you examine the above closely, the differing formats are easily
identifiable using a few == and 'in' conditionals:

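import importlib

def resolve(ref):  # function name is illustrative
    if not isinstance(ref, basestring):  # use str on Python 3
        return ref

    if ref[0] == '.': pass  # relative
    if ref[0] == '/': pass  # absolute
    if '/' not in ref and '.' not in ref and ':' not in ref:
        pass  # entrypoint
    if ':' in ref:
        import_, _, attrs = ref.partition(':')
        # import_module() returns the submodule itself; __import__('a.b')
        # would return only the top-level package 'a'.
        base = importlib.import_module(import_)
        for attr in attrs.split('.'):
            base = getattr(base, attr)
        return base  # the resolved object
    if '/' in ref:
        import_, _, path = ref.partition('/')
        pass  # use pkg_resources + path to pull file from package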
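To make the "few == and in conditionals" concrete, here is a self-contained classifier for the reference forms listed earlier in this message. It only reports which form a string uses and does no importing. The category labels and the function name `classify` are illustrative. The order of the checks is an assumption chosen so that each listed example lands in exactly one bucket: relative, then absolute, then object lookup, then package file, then dotted import.

```python
def classify(ref):
    """Return which reference form `ref` uses (labels are illustrative)."""
    if not isinstance(ref, str):
        return "object"            # already resolved: a class, function, ...
    if ref.startswith(("./", "../")):
        return "relative path"
    if ref.startswith("/"):
        return "absolute path"
    if ":" in ref:
        return "object lookup"     # package.submodule:object.attribute
    if "/" in ref:
        return "package file"      # package.relative/path/to/something
    if "." in ref:
        return "dotted import"     # package.absolute.path
    return "entry point"           # bare name such as "smtp"
```

Each form is identified by cheap string tests alone, without speculative imports or catching `ImportError`/`AttributeError`.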

> If the syntax is explicit, OTOH, then you don't have to guess, thereby
> saving lots of work and wasteful exceptions.

— Alice.

Daniel Holth

Oct 27, 2011, 4:27:45 PM

to python-...@googlegroups.com, web...@python.org

http://www.activestate.com/cloud

Runs Python apps distributed as packages with a wsgi.py "application=..." and requires.txt to point to dependencies, some yaml configuration, and JSON-in-an-environment-variable to get to the container's database services.
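Here is a sketch of the consuming side of "JSON in an environment variable". The variable name `HYPOTHETICAL_SERVICES`, the `databases`/`url` key layout, and the helper `database_url` are all assumptions for illustration. The message does not give ActiveState's actual names or schema.

```python
import json
import os


def database_url(name="default", environ=os.environ):
    """Look up a database URL from JSON stored in an environment variable."""
    raw = environ.get("HYPOTHETICAL_SERVICES", "{}")  # hypothetical variable
    databases = json.loads(raw).get("databases", {})
    if name not in databases:
        raise LookupError("no database service named %r" % name)
    return databases[name]["url"]
```

Compared with a config file, this avoids any shared filesystem path, at the cost of environment-size limits and values that are visible to every child process.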
