Project: Heroku Buildpack for Web2py


Louis Amon

Mar 12, 2015, 10:02:54 AM
to web...@googlegroups.com
I'm trying to create a Buildpack designed specifically for deploying Web2py-based applications on Heroku.

Buildpacks are shell programs that are used to build & deploy slugs on Heroku.
The buildpack runs before the web2py application is launched.

It is basically a kind of deploy hook.


The features I'd like to include in the web2py buildpack are:
  • Byte-compilation of the web2py application
  • Migration and/or re-creation of the .table files
  • Installation of pip dependencies
This would automatically optimize run speed and help tackle Heroku's very tricky ephemeral filesystem.


I could really use some insights about how I should go about building this.

Specifically, I need to know:
  1. how to byte-compile directly from the shell or from a Python script
  2. what strategy I should use to handle migrations at deploy-time
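For point 1, the standard library can already do plain byte-compilation from a script. A minimal sketch (web2py's own admin compile step also compiles views; this only covers ordinary .py files, and the temp directory stands in for an application folder):

```python
import compileall
import pathlib
import tempfile

# Stand-in for an application folder (assumption: any directory of .py files)
app_dir = pathlib.Path(tempfile.mkdtemp())
(app_dir / "models.py").write_text("db_name = 'demo'\n")

# Byte-compile every .py file under the directory; quiet=1 hides the listing.
# Compiled files land in __pycache__/ next to the sources.
ok = compileall.compile_dir(str(app_dir), quiet=1)
```

The same call can be run from a buildpack via `python -c "import compileall; ..."`.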

This is the official buildpack for Python, which I intend to build mine upon: https://github.com/heroku/heroku-buildpack-python

This is the documentation for the Buildpack API provided by Heroku: https://devcenter.heroku.com/articles/buildpack-api

Massimo Di Pierro

Mar 12, 2015, 2:49:24 PM
to web...@googlegroups.com
1. You do not need to byte-compile from the shell. The code is byte-compiled when requested.

2. Big can of worms. The problem in a multi-server environment is that only one of the servers can run migrations at a time, but all of them need the correct .table files. I do not have a simple answer. Make sure you look into gluon/contrib/heroku.py
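One common workaround for this (a sketch, not something web2py does on its own; WEB2PY_MIGRATE is an invented variable name) is to gate migrations behind an environment variable, so only one designated process, e.g. the deploy step, has them enabled:

```python
import os

def migrate_enabled(environ=os.environ):
    # Only the process launched with WEB2PY_MIGRATE=1 runs migrations;
    # every other dyno opens the DAL with migrate=False and just reads
    # the .table files that were produced at deploy time.
    return environ.get('WEB2PY_MIGRATE', '0') == '1'

# In db.py this would then look something like (not executed here):
#   db = DAL(uri, migrate=migrate_enabled())
```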

Louis Amon

Mar 14, 2015, 2:10:48 AM
to web...@googlegroups.com
You do not need to byte-compile from the shell. The code is byte-compiled when requested.

I was referring to admin/compile_app. Isn’t that supposed to help improve performance?

Make sure you look into gluon/contrib/heroku.py

I’ve been looking at it for the past few months, yeah… and ended up rewriting most of it.
The problem with web2py and PaaS so far is that most of the cloud/multi-server logic in web2py is hard-coded and dedicated to Google App Engine (in dal.py, for instance).

That’s cool and all, but many little features end up missing for other types of PaaS services.
As far as Heroku is concerned, I think the recommended way to go beyond gluon/contrib/heroku.py is to build a proper buildpack.

The problem in a multi-server environment is that only one of the servers can run migrations at a time, but all of them need the correct .table files
 
If the buildpack runs before the deployment of the main server (or any additional instance, for that matter), it would build all the correct .table files.
The only tricky part I see is how to run the actual migrations on the shared database only once.

So far the only solution I’ve found is manually running SQL code to alter or create the tables I need.
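The run-only-once part can be sketched as a race: every instance tries to insert a single "lock" row, and only the winner migrates. This example is mine, using sqlite3 so it is self-contained; on Heroku Postgres an advisory lock (pg_advisory_lock) would play the same role:

```python
import sqlite3

def acquire_migration_lock(conn):
    # A one-row table acts as the lock: the PRIMARY KEY + CHECK constraint
    # guarantees at most one row with id=1 can ever exist.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrate_lock "
        "(id INTEGER PRIMARY KEY CHECK (id = 1))"
    )
    try:
        conn.execute("INSERT INTO migrate_lock (id) VALUES (1)")
        conn.commit()
        return True   # we won the race: run migrations, then delete the row
    except sqlite3.IntegrityError:
        return False  # another process is (or was) migrating
```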

--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to a topic in the Google Groups "web2py-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/web2py/uNYUnvZSxqs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to web2py+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Massimo Di Pierro

Mar 14, 2015, 12:27:23 PM
to web...@googlegroups.com
I was referring to admin/compile_app. Isn’t that supposed to help improve performance?

A little. Not significant compared to DB I/O. It helps if you have complex templates and little to no controller/model logic.
 

Make sure you look into gluon/contrib/heroku.py

I’ve been looking at it for the past few months, yeah… and ended up rewriting most of it.

:-) It would be great if you could share it.
 

Louis Amon

Mar 14, 2015, 3:35:17 PM
to web...@googlegroups.com
:-) It would be great if you could share it.

Sure!

The logic I ended up using for Heroku is more explicit than gluon/contrib/heroku.py, and closer to how the docs handle GAE:

import os

# Heroku is detected by the presence of a HEROKU_POSTGRESQL_<COLOR>_URL variable
detect_heroku = lambda: any(n.startswith('HEROKU_POSTGRESQL_') and n.endswith('_URL')
                            for n in os.environ)

# use environment variables to enable multiple deployment environments
request.step = os.getenv('STEP')

HEROKU_DATABASES = {'production': 'HEROKU_POSTGRESQL_BLACK_URL',
                    'staging': 'HEROKU_POSTGRESQL_PURPLE_URL'}

if detect_heroku():
    # find the database URI for the current deployment environment
    database_name = HEROKU_DATABASES[request.step]
    heroku_uri = os.environ[database_name]
    db = DAL(heroku_uri)
    if request.step == 'production':
        # store sessions in the database
        session.connect(request, response, db=db, migrate='web2py_session.table')
    else:
        # store sessions in the filesystem
        session.connect(request, response)
else:
    db = DAL('postgres://...')
    session.connect(request, response, db=db, migrate='web2py_session.table')

To run the server locally, I use a shell script:

export STEP=local
python web2py.py ...

When I first configure my Heroku applications, I always set the "STEP" environment variable, either in the web interface or using the CLI.

With this code you tackle several issues: using multiple databases, using multiple deployment environments, and storing sessions.
Heroku's pricing gives you only 10,000 rows on the free database tier, so if you're not in production you're better off storing sessions in files (or cookies, for that matter).

As for migrations, I track all .table files in my Git repository.
I don't use the "HerokuPostgresAdapter" because some of its migration logic is broken. From what I recall it has to do with PostgresAdapter logic versus Google App Engine logic in dal.py, which is hard-coded in the framework.

Tracking .table files allows me to know when migrations are to be done, so I end up altering my tables with pgAdmin3 (after creating a backup with Heroku's CLI tool).
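To make "knowing when migrations are to be done" mechanical, a deploy script could fingerprint the tracked .table files and compare against the previous deploy. A sketch (the databases/ folder matches web2py's default layout; the function name and demo directory are mine):

```python
import hashlib
import pathlib
import tempfile

def table_manifest(databases_dir):
    # Map each .table file name to a digest of its contents; comparing two
    # manifests tells you exactly which tables changed between deploys.
    return {path.name: hashlib.sha256(path.read_bytes()).hexdigest()
            for path in sorted(pathlib.Path(databases_dir).glob('*.table'))}

# Demo with a throwaway directory standing in for applications/<app>/databases
demo = pathlib.Path(tempfile.mkdtemp())
(demo / 'db_thing.table').write_text('schema v1')
before = table_manifest(demo)
(demo / 'db_thing.table').write_text('schema v2')
after = table_manifest(demo)
```

Storing the previous manifest alongside the app (or in the database) is enough to turn "did anything change?" into a dict comparison.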


This isn't the cleanest way to handle migrations, but it worked for me over the past few months.

I'm now trying to put all of this logic in a better-suited solution: a buildpack dedicated to running web2py on Heroku.

Louis Amon

Apr 5, 2015, 3:29:52 PM
to web...@googlegroups.com
After much thought, and given that I've been using the Heroku/web2py stack for more than a year now, I think I'll try to build a scaffolding app dedicated to Heroku, to demonstrate exactly how one should go about building a web2py-based application on a PaaS cloud with no persistent filesystem.
I could also contribute a doc section on Heroku, but I'm not quite sure I'm senior enough just yet.

That being said, the most difficult issue I've been having so far with web2py on Heroku (and still haven't solved) is how to handle migration files.

If you guys can help me do that then it's pretty much all clear down the road ;)


So, basically, there are 4 approaches to storing data on cloud systems:
  1. Use a remote bucket (e.g. Amazon S3)
  2. Store data in the database
  3. Postdeploy hooks
  4. VCS (Git/Mercurial...)
Let's go over those:

1. Bucket

If you use a library like pyfs, you can replace the filesystem used to access local files with a remote system (a bucket). That's very handy for instance if you want to manage uploads (e.g. pictures) in web2py using a remote storage service like Amazon S3.

HOWEVER, since migration files are files web2py needs to access often and fast, any latency added to loading them would drastically impact your app's responsiveness.
Even with a CDN service like CloudFront, I would highly recommend not going for that option.

2. Database

That is the preferred option so far, and the only one offered in web2py's doc.
gluon/contrib/heroku.py uses the UseDatabaseStoredFiles class to do most of the heavy lifting here.

The problem is: this class was built with GAE in mind, and you can find many explicit GAE-specific code paths (e.g. if gae: ...).
I'm sure this can be improved to fix inconsistencies when the class is used with Heroku (for example this issue: https://groups.google.com/forum/#!topic/web2py/w2RJBqKIwRE)

Using the database is a solid option when using an ephemeral filesystem.

3. Postdeploy hook

This option hasn't been explored at all so far.

Based on Heroku's docs, one can specify explicitly which shell commands to run when building an app into a slug.

If there were a Python script that could run the migrations (migrate_enabled=True, fake_migrate_all=False, lazy_tables=False), you could run it at postdeploy time and then disable migrations consistently throughout your project, without having to worry about how a given changeset affects your production DB.
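A sketch of what the relevant lines of such a buildpack hook might look like (myapp, the migrate.py script, and the layout are assumptions; web2py's -S/-M/-R flags open an app shell, load its models, and run a script):

```shell
# Inside the buildpack's bin/compile (runs once while the slug is built).
# BUILD_DIR is the staging directory Heroku passes to the hook.
cd "$BUILD_DIR"

# Load myapp's models with migrations enabled and run a one-off script;
# the .table files written here end up baked into the slug.
python web2py.py -S myapp -M -R applications/myapp/private/migrate.py
```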

4. VCS

Including .table files in my Git changesets is the solution I'm currently using.
It ensures your local and production environments are on the same page migration-wise, and it's a good reminder that your changeset may trigger migrations.

I'm not 100% sure it's the best way to do things, though, and there are times when it breaks: if you don't specify explicit names for your migration files, you'll end up with tons of .table files in your changesets because the hashed prefix keeps changing (why is it changed all the time anyway??).


These are the only options I know of, and none of them is fully satisfactory so far.

What do you think? What's the best way to handle migrations on a git-based deploy system that has an ephemeral filesystem?
