How to let every WSGI application have its own process group

Daan Davidsz

Nov 16, 2009, 6:50:27 AM

to modwsgi

At first I would like to say I want to run in daemon mode.

My current global configuration is this:
WSGIProcessGroup abc
WSGIDaemonProcess abc threads=8 display-name=mod_wsgi:abc

The problem with this is that all sorts of things start interfering
with each other. For example, the os.chdir() function causes havoc
because different threads change the working directory at different times.

Now I think this configuration should solve these issues:
WSGIProcessGroup abc
WSGIDaemonProcess abc threads=1 processes=8 display-name=mod_wsgi:abc

But I don't think that is very scalable or lean.

I've tried some config with %{ENV:SCRIPT_FILENAME} but it doesn't seem
to work. Is there a way that I can give every script its own
processgroup and threads from a global configuration file?

Graham Dumpleton

Nov 16, 2009, 6:37:42 PM

to mod...@googlegroups.com

2009/11/16 Daan Davidsz <daand...@gmail.com>:

>
> At first I would like to say I want to run in daemon mode.
>
> My current global configuration is this:
> WSGIProcessGroup abc
> WSGIDaemonProcess abc threads=8 display-name=mod_wsgi:abc
>
> The problem with this is that all sorts of things start interfering
> with each other. For example the os.chdir() function causes havoc
> because of all the different threads changing it at different times.

Web applications assuming they can change directories for individual
requests or which only work when run from a specific directory are
arguably poorly designed. Such techniques will never work properly
where multithreading is used or where multiple instances of the
application need to run in same process.

It is much better that file system accesses always be by absolute pathname.
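
As a minimal sketch of that advice, paths can be anchored to the location of the script file itself rather than to the process working directory. The helper names and the example paths below are made up for illustration and are not part of mod_wsgi:

```python
import os

def base_dir_of(module_file):
    """Absolute directory containing the given module or script file."""
    return os.path.dirname(os.path.abspath(module_file))

def resource_path(base_dir, *parts):
    """Join parts onto base_dir, giving an absolute path."""
    return os.path.join(base_dir, *parts)

# In a WSGI script file one would typically write:
#   BASE_DIR = base_dir_of(__file__)
#   template = resource_path(BASE_DIR, 'templates', 'index.html')
# and never call os.chdir() at all.
```

Because the base directory is computed once from the file location, the result does not depend on which thread or application last changed the working directory.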

> Now I think this configuration should solve these issues:
> WSGIProcessGroup abc
> WSGIDaemonProcess abc threads=1 processes=8 display-name=mod_wsgi:abc
>
> But I don't think that is very scalable or lean.

Not scalable and lean in what way?

You will use just as much memory if you were to create as many
distinct process groups as you have applications and with each having
8 processes each running a single thread.

The only thing wrong I can see in the above is that it will still not
solve the problem where a web application expects to always be run out
of a specific directory and only sets that directory once, when the web
application is first loaded. This is because the last such application
to load will override the location for the others, since the web
application instances are still in the same process.
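
The underlying reason is that the working directory belongs to the process, not to a thread or to an individual application instance. A small standalone sketch shows the effect; it uses ordinary threads rather than mod_wsgi itself, which is an assumption made to keep it runnable on its own:

```python
import os
import tempfile
import threading

def cwd_seen_by_other_thread(new_dir):
    """Change directory in a worker thread, then report the working
    directory as seen from the calling thread."""
    worker = threading.Thread(target=os.chdir, args=(new_dir,))
    worker.start()
    worker.join()
    return os.getcwd()

original = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    seen = cwd_seen_by_other_thread(tmp)
    # realpath() guards against the temporary directory being a symlink.
    changed = os.path.realpath(seen) == os.path.realpath(tmp)
    os.chdir(original)  # restore before the temporary directory is removed
print(changed)
```

The change made in the worker thread is visible in the calling thread, which is why two applications sharing a process cannot each rely on their own working directory.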

> I've tried some config with %{ENV:SCRIPT_FILENAME} but it doesn't seem
> to work. Is there a way that I can give every script its own
> processgroup and threads from a global configuration file?

At the moment only by duplicating WSGIDaemonProcess/WSGIProcessGroup
configuration for each application.
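
If there are many applications, that duplicated configuration could be generated rather than typed by hand. The following is only a sketch: it assumes each application is mounted with WSGIScriptAlias under a URL prefix matching its name, and the script directory and application names are hypothetical.

```python
def daemon_config(app_names, threads=8, script_dir='/srv/wsgi'):
    """Build Apache configuration text giving each named application
    its own daemon process group. Mount points and script_dir are
    illustrative assumptions."""
    blocks = []
    for name in app_names:
        blocks.append(
            'WSGIDaemonProcess {n} threads={t} display-name=%{{GROUP}}\n'
            'WSGIScriptAlias /{n} {d}/{n}.wsgi\n'
            '<Location /{n}>\n'
            '    WSGIProcessGroup {n}\n'
            '</Location>\n'.format(n=name, t=threads, d=script_dir)
        )
    return '\n'.join(blocks)

print(daemon_config(['blog', 'shop']))
```

The output would still have to be included in the main Apache configuration by someone with the rights to edit it.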

Because I don't know whether you are currently using WSGIScriptAlias
on specific WSGI script file, or mapping it against a directory of
WSGI script files, or using AddHandler, can't guide you as to exactly
what you need to do.

Short answer is though that you aren't restricted to one
WSGIDaemonProcess/WSGIProcessGroup configuration.

Can you provide the rest of your mod_wsgi related configuration?

Can you also explain why your web applications are changing the
working directory in the first place and not using absolute path names
for file system access as would be regarded as being best practice?

Graham

Daan Davidsz

Nov 16, 2009, 7:25:40 PM

to modwsgi

Graham, thank you very much for your comments.

On Nov 17, 12:37 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:

> Web applications assuming they can change directories for individual
> requests or which only work when run from a specific directory are
> arguably poorly designed. Such techniques will never work properly
> where multithreading is used or where multiple instances of the
> application need to run in same process.
>
> It is much better that file system accesses always be by absolute pathname.

I'm pretty new to mod_wsgi, but I can see now how it is bad practice
in a multithreaded environment. In the future I will use absolute
pathnames.

> Not scalable and lean in what way?
>
> You will use just as much memory if you were to create as many
> distinct process groups as you have applications and with each having
> 8 processes each running a single thread.
>
> The only thing wrong I can see in above is that it will still not
> solve problem where a web application expects to always be run out of
> a specific directory and only sets that directory location once when
> web application first loaded. This is because last such similar
> application to load will override location for others due to web
> application instances still being in same process.

Not scalable because it currently uses a global configuration file for
the server. This means all different sites on that server will be run
using those 8 processes. When the amount of traffic increases 8
processes may not be enough.

>
> At the moment only by duplicating WSGIDaemonProcess/WSGIProcessGroup
> configuration for each application.
>
> Because I don't know whether you are currently using WSGIScriptAlias
> on specific WSGI script file, or mapping it against a directory of
> WSGI script files, or using AddHandler, can't guide you as to exactly
> what you need to do.

I control the mod_wsgi flow using .htaccess files and AddHandler. The
problem is that I don't have full control (at the moment) of the
system administration. The global configuration file was set by the
administrators.

> Short answer is though that you aren't restricted to one
> WSGIDaemonProcess/WSGIProcessGroup configuration.
>
> Can you provide the rest of your mod_wsgi related configuration?

My first post contains all the configuration except the .htaccess
configurations.

> Can you also explain why your web applications are changing the
> working directory in the first place and not using absolute path names
> for file system access as would be regarded as being best practice?

Just my ignorance.

> Graham

I'm afraid that my current configuration will also combine the PATH
variables for different websites. Using a process group per website
would solve that, but I don't know how to accomplish that with my
limited server administration rights. Is there any other way or should
I just ask for more rights on the server because it isn't workable
otherwise?

On another note, great work on mod_wsgi. I really like its speed and
functionality.

Daan

Graham Dumpleton

Nov 16, 2009, 9:10:57 PM

to mod...@googlegroups.com

2009/11/17 Daan Davidsz <daand...@gmail.com>:

>> Not scalable and lean in what way?
>>
>> You will use just as much memory if you were to create as many
>> distinct process groups as you have applications and with each having
>> 8 processes each running a single thread.
>

> Not scalable because it currently uses a global configuration file for
> the server. This means all different sites on that server will be run
> using those 8 processes. When the amount of traffic increases 8
> processes may not be enough.

Most people overestimate how much traffic their site will
realistically receive. It also isn't necessarily the number of
concurrent users you have to worry about so much as how long each
request takes. If your request handling is well optimised and short,
then even a single threaded mod_wsgi daemon process is generally going
to be more than adequate for most people's needs. More
processes/threads are only going to be warranted where you have many
requests which take more than a short amount of time, as they will tie
up the available threads, thereby reducing the number of requests/sec
you can sustain.
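
A rough back-of-the-envelope version of that argument, as a sketch only: it assumes every thread is kept busy and ignores queueing, the GIL and network latency, so it gives an idealised upper bound rather than a prediction. The function name is made up for illustration.

```python
def max_requests_per_second(processes, threads, avg_request_seconds):
    """Idealised upper bound on sustained throughput: each thread can
    complete 1 / avg_request_seconds requests per second."""
    return processes * threads / avg_request_seconds

# One process with a single thread and 5 ms requests.
fast_single = max_requests_per_second(1, 1, 0.005)
# One process with eight threads, but half-second requests.
slow_pool = max_requests_per_second(1, 8, 0.5)
print(fast_single, slow_pool)
```

Under these assumptions the single thread with short requests comes out far ahead of the eight threads with slow requests, which is the point being made about request duration mattering more than thread count.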

If your site is going to receive the amounts of traffic that would
trouble such a setup then you are already in a poor hosting setup in
as much as you have no control of the Apache configuration and
management. If your site is going to see substantial amounts of
traffic then you should be on a VPS where you have full control or
with a hosting company such as WebFaction where you can offload static
file serving to their infrastructure and still have full control of
your own Apache instance.

>> At the moment only by duplicating WSGIDaemonProcess/WSGIProcessGroup
>> configuration for each application.
>>
>> Because I don't know whether you are currently using WSGIScriptAlias
>> on specific WSGI script file, or mapping it against a directory of
>> WSGI script files, or using AddHandler, can't guide you as to exactly
>> what you need to do.
>
> I control the mod_wsgi flow using .htaccess files and AddHandler. The
> problem is that I don't have full control (at the moment) of the
> system administration. The global configuration file was set by the
> administrators.
>
>> Short answer is though that you aren't restricted to one
>> WSGIDaemonProcess/WSGIProcessGroup configuration.
>>
>> Can you provide the rest of your mod_wsgi related configuration?
>
> My first post contains all the configuration except the .htaccess
> configurations.
>

> I'm afraid that my current configuration will also combine the PATH
> variables for different websites.

Relying on PATH setup is also again a bad idea. References to
executable programs should also be by absolute path to avoid problems.
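
One way to enforce that in application code is to refuse anything that is not an absolute path before running it. This is a sketch with a hypothetical helper name; sys.executable merely stands in for whatever external program the application really needs.

```python
import os
import subprocess
import sys

def run_absolute(executable, *args):
    """Run a program only if it is given by absolute path, so the
    result does not depend on the PATH inherited by the daemon process."""
    if not os.path.isabs(executable):
        raise ValueError('executable must be an absolute path: %r' % executable)
    return subprocess.check_output([executable] + list(args))

# sys.executable is used here only as an absolute path that is known to exist.
output = run_absolute(sys.executable, '-c', 'print("ok")')
```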

> Using a process group per website
> would solve that, but I don't know how to accomplish that with my
> limited server administration rights. Is there any other way or should
> I just ask for more rights on the server because it isn't workable
> otherwise?

If the administrators aren't going to constrain you as far as how many
processes you have, then I would suggest you have them do something like
the following. This presumes all your stuff is under one VirtualHost
which you own.

<VirtualHost *:80>
ServerName daan.example.com

WSGIDaemonProcess daan-1 threads=5 display-name=%{GROUP} user=daan group=daan
WSGIDaemonProcess daan-2 threads=5 display-name=%{GROUP} user=daan group=daan
WSGIDaemonProcess daan-3 threads=1 processes=5 display-name=%{GROUP} user=daan group=daan

WSGIRestrictProcess daan-1 daan-2 daan-3

WSGIProcessGroup %{ENV:site.process_group}
WSGIApplicationGroup %{ENV:site.application_group}

SetEnv site.process_group daan-1
SetEnv site.application_group %{RESOURCE}
</VirtualHost>

What this does is set up three different process groups for you to
use, two multi threaded ones and one single threaded one. The default
is set to be the first of the multi threaded ones.

Because the delegation to process and application groups is handled
indirectly via SetEnv variables, you can override them in your
.htaccess file provided you have the FileInfo override, which you would
have since you can use AddHandler.

The WSGIRestrictProcess directive allows administrators though to
constrain you to only being able to delegate to the process groups set
up for you.

In your .htaccess file you would then have:

AddHandler wsgi-script .wsgi

If you then wanted to delegate a specific application to the single
threaded process you could use:

<Files django.wsgi>
SetEnv site.process_group daan-3
</Files>

That is, for the WSGI script file for the site, use SetEnv to override
the site.process_group variable from which the process group to use is
read.

If you had a specific site that needed to be run in main Python
interpreter within a process, eg. Trac with Python subversion
wrappers, you could also set the application group to change which
interpreter is used:

<Files trac.wsgi>
SetEnv site.process_group daan-2
SetEnv site.application_group %{GLOBAL}
</Files>

So, this still requires the administrators to set up the initial
process groups, but you then have the flexibility to move applications
between the process groups provided.

The issue then becomes how many different process groups and processes
within them the administrators are prepared to let you have.

Graham

Daan Davidsz

Nov 17, 2009, 11:09:38 AM

to modwsgi

Thank you very much, this seems like the configuration I was looking
for. I've tweaked it a bit and e-mailed it to the administrators.

Daan

Jason Garber

Nov 17, 2009, 3:43:06 PM

to mod...@googlegroups.com

Hi Daan,

Just a note on performance. I'm running a WSGI 3 application with a page which includes dynamic HTML generation and a couple PostgreSQL queries per request, etc...

With 1 process and 5 threads, the server hosting it will sustain 800-1000 requests per second, using: ab -n 10000 -c 30 http://<site-running-on-localhost>/path/to/page

Keep in mind that latency could affect those numbers significantly in the real world, but still, you can handle a LOT of traffic with few processes/threads if your application is well written.

Sincerely,

Jason Garber

Graham Dumpleton

Nov 17, 2009, 6:52:22 PM

to mod...@googlegroups.com

2009/11/18 Jason Garber <bo...@gahooa.com>:

> Hi Daan,
> Just a note on performance. I'm running a WSGI 3 application with a page
> which includes dynamic HTML generation and a couple PostgreSQL queries per
> request, etc...
> With 1 process and 5 threads, the server hosting it will sustain 800-1000
> requests per second, using: ab -n 10000 -c 30

In hindsight the default of 1 process with 15 threads is indeed
more than what well designed sites would need, albeit that it does
provide a good buffer. For mod_wsgi 2.X that was traded off against
slightly higher thread memory usage because of the way threads were used.
For mod_wsgi 3.0 a new scheme is used whereby it always attempts to use
the most recently used thread for a new request. This means that unless
actually needed, extra threads in the pool will never call into Python
and will not incur any additional per thread memory overhead that
Python may impose.

As mentioned before, how many threads you actually use can depend a lot
on what percentage of long requests you have.

> http://<site-running-on-localhost>/path/to/page
> Keep in mind that latency could affect those numbers significantly in the
> real world, but still, you can handle a LOT of traffic with few
> processes/threads if your application is well written.

Such latency issues around slow clients can be largely eliminated
through use of an nginx front end proxy to Apache. Although nginx will
also handle static files better than Apache and leave Apache/mod_wsgi
to handle just the dynamic requests, that isn't even the benefit I am
talking about in this case.

Specifically, nginx helps with latency and slow clients because for
POST requests it will buffer up the request content, so long as it is
not over a default of 1MB (I think), and only when it has the whole
request headers and content will it pass the request on to
Apache/mod_wsgi. This means that if a client is slow to deliver its
request, that doesn't affect Apache/mod_wsgi.

Similarly on the response, the implicit buffering within sockets from
the daemon process back through the Apache worker process and through
to nginx allows Apache/mod_wsgi to release the request and the
connection quicker, with nginx then doing the potentially slow job of
dribbling the response back to the slow client.

The effect of the two means that Apache/mod_wsgi is involved for as
little time as possible and thus can better utilise more limited
resources. Thus you don't need to configure as many processes/threads
to handle the same load, as nginx will take on the burden of slow
clients and, being asynchronous, does a better job of handling many
connections in the startup or responding state.

In the OP's setup though, since they don't have control of their
Apache, there is even less likelihood they can get an nginx front end
proxy going for it. :-)

Graham

Daan Davidsz

Nov 18, 2009, 4:35:05 AM

to modwsgi

On Nov 18, 12:52 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:

> In OP's setup though since they don't have control of their Apache
> even less likelihood they can get a nginx front end proxy for it
> going. :-)
>
> Graham

That could be true, although the current workload is next to nothing
so the current setup will be fine. It is not so much that things
aren't configurable, it's just that bothering the admins too much will
cost my partner money :) Today my new mod_wsgi configuration was
accepted and implemented. When things really get rough I could use
memcached and I'm confident that the installation would be no issue.

The current goal is to develop one CMS system - in Python of course -
which basically is a frontend for every user's MySQL database. The
users will get a custom site also in Python. Does anybody have any
tips for this type of setup? I'm not really sure what is the right way
to share libraries. At the moment the communication is very simple,
webservice like. A site will issue a request to the CMS system and the
system will query the right database and return a nice JSON model of
variables the site can use. For the easy stuff this works fine,
although I'm afraid this may not be enough for more complicated
situations.

Daan

Graham Dumpleton

Nov 18, 2009, 4:47:43 AM

to mod...@googlegroups.com

2009/11/18 Daan Davidsz <daand...@gmail.com>:

> On Nov 18, 12:52 am, Graham Dumpleton <graham.dumple...@gmail.com>
> wrote:
>> In OP's setup though since they don't have control of their Apache
>> even less likelihood they can get a nginx front end proxy for it
>> going. :-)
>>
>> Graham
>
> That could be true, although the current workload is next to nothing
> so the current setup will be fine. It is not so much that things
> aren't configurable, it's just that bothering the admins too much will
> cost my partner money :) Today my new mod_wsgi configuration was
> accepted and implemented.

I am curious what mix of groups, processes and threads you thought
might give you flexibility you need for what you have in mind. Can you
post the updated configuration?

> When things really get rough I could use
> memcached and I'm confident that the installation would be no issue.
>
> The current goal is to develop one CMS system - in Python of course -
> which basically is a frontend for every user's MySQL database. The
> users will get a custom site also in Python. Does anybody have any
> tips for this type of setup? I'm not really sure what is the right way
> to share libraries. At the moment the communication is very simple,
> webservice like. A site will issue a request to the CMS system and the
> system will query the right database and return a nice JSON model of
> variables the site can use. For the easy stuff this works fine,
> although I'm afraid this may not be enough for more complicated
> situations.

How hard or easy that is is going to be governed by what Python web
framework you are using. Some can't handle multiple databases, others
can.

Graham

Daan Davidsz

Nov 18, 2009, 5:09:04 AM

to modwsgi

On Nov 18, 10:47 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:

> I am curious what mix of groups, processes and threads you thought
> might give you flexibility you need for what you have in mind. Can you
> post the updated configuration?

Sure, here it is:

####################################################################
WSGIDaemonProcess cms threads=8 display-name=%{GROUP}
WSGIDaemonProcess sites threads=8 display-name=%{GROUP}
WSGIDaemonProcess other threads=8 display-name=%{GROUP}
WSGIDaemonProcess single threads=1 processes=4 display-name=%{GROUP}

WSGIRestrictProcess cms sites other single

WSGIProcessGroup %{ENV:site.process_group}
WSGIApplicationGroup %{ENV:site.application_group}

SetEnv site.process_group sites
SetEnv site.application_group %{RESOURCE}
####################################################################

By default all scripts (probably one per website) will be assigned to
the "sites" process group and will each get their own application
group. Only for the CMS will I use the cms process group.

> How hard or easy that is is going to be governed by what Python web
> framework you are using. Some can't handle multiple databases, others
> can.

That's the kicker; I'm not using any other framework than my own
directly on top of mod_wsgi. Multiple database support, dynamic "CMS
Module" loading, URL parsing and a simple Document/ORM DB model are
some things that have been implemented already.

Daan

Graham Dumpleton

Nov 18, 2009, 5:15:32 AM

to mod...@googlegroups.com

2009/11/18 Daan Davidsz <daand...@gmail.com>:

>
>
> On Nov 18, 10:47 am, Graham Dumpleton <graham.dumple...@gmail.com>
> wrote:
>> I am curious what mix of groups, processes and threads you thought
>> might give you flexibility you need for what you have in mind. Can you
>> post the updated configuration?
>
> Sure, here it is:
>
> ####################################################################
> WSGIDaemonProcess cms threads=8 display-name=%{GROUP}
> WSGIDaemonProcess sites threads=8 display-name=%{GROUP}
> WSGIDaemonProcess other threads=8 display-name=%{GROUP}
> WSGIDaemonProcess single threads=1 processes=4 display-name=%{GROUP}
>
> WSGIRestrictProcess cms sites other single
>
> WSGIProcessGroup %{ENV:site.process_group}
> WSGIApplicationGroup %{ENV:site.application_group}
>
> SetEnv site.process_group sites
> SetEnv site.application_group %{RESOURCE}
> ####################################################################
>
> By default all scripts (probably one per website) will be assigned to
> the "sites" processgroup and will get their own applicationgroup. Only
> for the CMS I will use the cms processgroup.

BTW, I presume this configuration works, or is it still being set up?

Suggest you use the following WSGI test script to verify what process
group and application group is being used.

import StringIO
import mod_wsgi

# Python 2 syntax, as used with mod_wsgi 2.x.
def application(environ, start_response):
    status = '200 OK'

    # Collect the output in memory, then extract it as one string.
    output = StringIO.StringIO()
    print >> output, 'process_group: %s' % mod_wsgi.process_group
    print >> output, 'application_group: %s' % mod_wsgi.application_group
    output = output.getvalue()

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]

This presumes mod_wsgi 2.4 or later being used and would have to be
modified for older mod_wsgi versions to get the process group and
application group information out of 'environ' dictionary.
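
A variant that reads the values from the 'environ' dictionary might look like the sketch below. It relies on the mod_wsgi.process_group and mod_wsgi.application_group keys that mod_wsgi adds to the WSGI environment; the '(unknown)' fallback is an addition so that the script also runs outside mod_wsgi.

```python
def application(environ, start_response):
    # mod_wsgi places these keys in the WSGI environ; the fallback text
    # is only there for running the script outside mod_wsgi.
    lines = [
        'process_group: %s' % environ.get('mod_wsgi.process_group', '(unknown)'),
        'application_group: %s' % environ.get('mod_wsgi.application_group', '(unknown)'),
    ]
    output = '\n'.join(lines).encode('utf-8') + b'\n'

    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(output)))])
    return [output]
```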

>> How hard or easy that is is going to be governed by what Python web
>> framework you are using. Some can't handle multiple databases, others
>> can.
>
> That's the kicker; I'm not using any other framework than my own
> directly on top of mod_wsgi. Multiple database support, dynamic "CMS
> Module" loading, URL parsing and a simple Document/ORM DB model are
> some things that have been implemented already.

That certainly helps, but at the same time means more work for you.

I might suggest you have a look at Werkzeug. It doesn't mandate a
specific database adapter or ORM layer but still provides a lot of
useful infrastructure for constructing web applications so you aren't
doing everything from scratch.

Graham

Daan Davidsz

Nov 18, 2009, 5:49:29 AM

to modwsgi

On Nov 18, 11:15 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:

> BTW, I presume this configuration works, or is it still being set up?
>
> Suggest you use the following WSGI test script to verify what process
> group and application group is being used.

Yes, the configuration is implemented and works fine. I've verified
the process group and application group using your script and that
gives the expected result.
Is there a rationale for using StringIO by the way? I just use a list
and return that.

> That certainly helps, but at the same time means more work for you.
>
> I might suggest you have a look at Werkzeug. It doesn't mandate a
> specific database adapter or ORM layer but still provides a lot of
> useful infrastructure for constructing web applications so you aren't
> doing everything from scratch.

I am aware of the existence of Werkzeug. Thanks for the heads up
nonetheless. There is a good possibility I will use some of its
functionality in the future. I've looked at a lot of other "full
featured" frameworks like Django, Pylons and web2py, but decided
"rolling my own" would be the best choice in my situation.

Graham Dumpleton

Nov 18, 2009, 5:53:58 AM

to mod...@googlegroups.com

2009/11/18 Daan Davidsz <daand...@gmail.com>:

> On Nov 18, 11:15 am, Graham Dumpleton <graham.dumple...@gmail.com>
> wrote:
>> BTW, I presume this configuration works, or is it still being set up?
>>
>> Suggest you use the following WSGI test script to verify what process
>> group and application group is being used.
>
> Yes, the configuration is implemented and works fine. I've verified
> the process group and application group using your script and that
> gives the expected result.
> Is there a rationale for using StringIO by the way? I just use a list
> and return that.

Formatting into strings yourself using % operator can just be a pain sometimes.

The StringIO internally captures them all as individual strings and
only collates them into one string when the final value is extracted.
The way it does this is more efficient than continually appending to a
single string, and performance wise it is technically better than the
alternative of returning a list of strings from the WSGI application,
which would see the WSGI adapter doing a forced flush, as required by
the WSGI specification, between each string in the list.
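
For those who prefer building a list, the same effect can be had by joining the pieces once and returning a single item. A minimal sketch, with made-up content:

```python
def application(environ, start_response):
    pieces = []
    pieces.append('first line\n')
    pieces.append('second line\n')

    # Join once and return a single-item list, so the WSGI adapter
    # writes one block instead of flushing after every item.
    body = ''.join(pieces).encode('utf-8')

    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

Returning the unjoined list directly would still be valid WSGI; it just gives the adapter one flush per item.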

Graham

>> That certainly helps, but at the same time means more work for you.
>>
>> I might suggest you have a look at Werkzeug. It doesn't mandate a
>> specific database adapter or ORM layer but still provides a lot of
>> useful infrastructure for constructing web applications so you aren't
>> doing everything from scratch.
>
> I am aware of the existence of Werkzeug. Thanks for the heads up
> nonetheless. There is a good possibility I will use some of its
> functionality in the future. I've looked at a lot of other "full
> featured" frameworks like Django, Pylons and web2py, but decided
> "rolling my own" would be the best choice in my situation.
>

Jason Garber

Nov 18, 2009, 3:39:01 PM

to mod...@googlegroups.com

Hi Daan,

I'm not sure what number of client sites you are talking about, but I did hear you say they each have a separate database.

Over the years of developing custom software for clients, we've created a handful of modules that are quite useful for re-use. However, modules are complicated because they have live data behind them, whereas libraries are just static code imported into the project.

The real danger is that an upgrade of component X will break sites A and B. As you get more sites and modules, testing becomes next to impossible.

We started with a shared model. Essentially, there was a "Billing System" running as a python process on the server, waiting for connections. Each other (web) app would connect to it, and process transactions or retrieve data. We also did this with order management, email delivery, content management, and others. However, before long, what started as a good idea turned into a nightmare. We could not effectively upgrade the modules because of the dependencies on them. When running business grade web apps, "oops" downtime is not acceptable.

So we made a big paradigm shift. Share nothing. Each application now gets a copy of (a) the code for the module, and (b) the tables in its own database to support the modules.

I have to point out that we are *really* heavy users of git, and if it were not for git's excellent submodules, I don't know how we would do what we are doing.

Within the directory structure of a given project, we have a Python directory, which is the first thing on the Python Path (as specified in the apache configuration). Both the project code, and the module code (via git submodules) are placed there.

In this way, we

(a) never break another app by making a change to a module

(b) are in complete control of the upgrade process of a given app's modules

(c) can test the given app at that time, when upgrading the modules

We are careful to ensure that WSGI apps all run in their own process groups, because if not, it could lead to strange import conflicts. But mod_wsgi made that easy.

I'm not sure if these "tips" are applicable, but I hope that it at least gives you another perspective to consider when moving forward with your design.

Thanks!

Jason Garber

Daan Davidsz

Nov 18, 2009, 5:11:55 PM

to modwsgi

Hi Jason,

Thank you very much for your extensive overview. I'll try to give my
thoughts on the subjects mentioned.

I've separated all modules into administration, backend and frontend
modules. The administration modules are for us, the backend modules
basically are the CMS system and the frontend modules will provide the
datamodel for the website. Because of this separation, an error in a
backend module generally won't affect a frontend module. Right now the
real meat is in the backend modules and the frontend modules are
basically "dumb" data generators. The website will decide what to do
with the data it has received from a frontend module.

Every website will indeed get its own database. I've made a general
datamodel that will gracefully allow updates.

I also use git quite a lot, but currently don't see it as a good
update/package management tool for my needs. I'm not looking forward
to going through all the websites manually and doing a pull/update of
the submodules. Maybe I am missing the knowledge of some git features
to grasp its usefulness in this regard. Could you please elaborate on
your git usage?
My idea was to build a simple version control system into the CMS
itself. Each module has different versions like 'stable', 'testing',
'dev', '1.0' etc. That way I can manage the versions through some
management interface in the CMS. All websites get the 'stable' version
of some module, but I can make an exception for site A, which gets
'dev', the one I am working on currently. When the 'dev' version is
stable enough it gets pushed to 'stable' and the current stable will
be pushed to 'deprecated' or 'old' or something. This is a sort of
rolling release system with per site exceptions.
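
The lookup and promotion rules described here could be sketched roughly as follows. All names (the function names, the 'old' channel, the site and module names in the usage note) are hypothetical and only illustrate the idea:

```python
DEFAULT_CHANNEL = 'stable'

def resolve_channel(site, module, overrides):
    """Return the release channel a site should use for a module.
    overrides maps (site, module) pairs to a channel name."""
    return overrides.get((site, module), DEFAULT_CHANNEL)

def promote(channels, source='dev', target='stable', retired='old'):
    """Return a new channel-to-version mapping in which the source
    version becomes the target and the previous target is kept as
    retired."""
    updated = dict(channels)
    if target in channels:
        updated[retired] = channels[target]
    updated[target] = channels[source]
    return updated
```

For example, with overrides of {('site-a', 'gallery'): 'dev'}, site A would get 'dev' for the gallery module while every other site falls back to 'stable'.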

I should point out that I strive for very contained modules. Except
for some data, they don't have dependencies.

Daan

Jason Garber

Nov 20, 2009, 12:20:53 AM

to mod...@googlegroups.com

Hi Daan,

On Wed, Nov 18, 2009 at 5:11 PM, Daan Davidsz <daand...@gmail.com> wrote:

> Hi Jason,
> ...
>
> Every website will indeed get its own database. I've made a general
> datamodel that will gracefully allow updates.
>
> I also use git quite a lot, but currently don't see it as a good
> update/package management tool for my needs. I'm not looking forward
> to going through all the websites manually and doing a pull/update of
> the submodules. Maybe I am missing the knowledge of some git features
> to grasp its usefulness in this regard. Could you please elaborate on
> your git usage?

The model of using git submodules to pull updates into specific projects works well for us. Generally, we are working with a smaller number (eg dozens) of large and complex web apps that are totally customized to a diverse range of client needs. Furthermore, clients are typically paying us to "change something", and that is an ideal time to do an upgrade (otherwise, we may leave it alone unless security or other major bugs are found).

If you have a lot more sites, or desire to upgrade them all at once, then this is most likely NOT the way to go. But I do suggest you establish a really thorough automated testing procedure for any shared components. The problem with bugs is that we missed something in the first place :)

> My idea was to build a simple version control system into the CMS
> itself. Each module has different versions like 'stable', 'testing',
> 'dev', '1.0' etc. That way I can manage the versions through some
> management interface in the CMS. All websites get the 'stable' version
> of some module, but I can make an exception for site A, which gets
> 'dev', the one I am working on currently. When the 'dev' version is
> stable enough it gets pushed to 'stable' and the current stable will
> be pushed to 'deprecated' or 'old' or something. This is a sort of
> rolling release system with per site exceptions.

That sounds cool. I love doing stuff like that. Have you considered using symbolic links in some fashion, similar to how System V init scripts are "pointed to" ?

(that may be totally nonsense --- I don't know your project structure).
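
To make the symbolic link idea concrete, here is a sketch that assumes one directory per module version with channel names as links beside them. That layout and the function name are guesses for illustration only.

```python
import os
import tempfile

def point_channel(modules_dir, channel, version):
    """Make modules_dir/<channel> a symlink to modules_dir/<version>.
    The link is created under a temporary name and renamed into place,
    which replaces an existing link atomically on POSIX systems."""
    tmp_link = os.path.join(modules_dir, '.%s.tmp' % channel)
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(version, tmp_link)  # relative target within modules_dir
    os.replace(tmp_link, os.path.join(modules_dir, channel))

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, '1.0'))
    os.mkdir(os.path.join(root, '1.1'))
    point_channel(root, 'stable', '1.0')
    point_channel(root, 'stable', '1.1')  # a later promotion
    current = os.readlink(os.path.join(root, 'stable'))
```

A site would then import from the 'stable' path and pick up a new version as soon as the link is switched, subject to the daemon processes being reloaded.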

> I should point out that I strive for very contained modules. Except
> for some data, they don't have dependencies.

At each database upgrade (which are all scripted) we have a file like:

Version-100-DDL.sql

-- the sql to upgrade from DB version 99 to 100

Version-100-ALL.sql

-- a full sql dump (--no-data) of the database after the change (for historical record)
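
With that naming scheme, working out which upgrade scripts a given database still needs is straightforward. A sketch, where the function name is hypothetical and the current version is assumed to be known from somewhere else (for instance a version table):

```python
import re

DDL_NAME = re.compile(r'^Version-(\d+)-DDL\.sql$')

def pending_upgrades(filenames, current_version):
    """Return DDL script names newer than current_version, in the
    order they need to be applied."""
    found = []
    for name in filenames:
        match = DDL_NAME.match(name)
        if match and int(match.group(1)) > current_version:
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]
```

The "-ALL.sql" dumps are ignored by the pattern, since they are a historical record rather than something to apply.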