[Python-il] Apache with mod_wsgi (for Django) crashes when you "import nltk"


Avishalom Shalit

Jan 31, 2013, 11:44:34 AM
to python-il
As title.

It just silently hangs.

As far as I could find on Google, other people have run into it,
but nobody has posted a solution.

Has anybody overcome this before?

thanks


-- vish

asaf greenberg

Jan 31, 2013, 12:30:59 PM
to python-il

I don't know enough Django, but I have worked with NLTK.
NLTK is a very heavy module, so some lag on import is expected, especially if you're using certain modules.

AFAIK you should `import` it only once, on server (re)start, and it costs about 10-30 secs (did you optimize with .pyc or .pyo files?). Unless you're short on RAM... but I hope that's not the case.

NLTK also has many sub-modules; the ones you don't need can and should be left out, for performance.

Does it hang anywhere else (apart from server startup)?
Does the delay last longer than 20-30 secs?
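
One way to answer the two questions above is to time the import outside Apache first. The following is a minimal sketch for Python 3; `timed_import` is a hypothetical helper name, and `json` is only a stand-in so the snippet runs anywhere (pass "nltk" on a machine that has it installed).

```python
import importlib
import time


def timed_import(module_name):
    """Import a module by name and return (module, seconds_taken).

    A module that is already in sys.modules comes back almost instantly,
    so run this in a fresh interpreter to see the real cost.
    """
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    return module, elapsed


# "json" is a stand-in; replace it with "nltk" to measure the real thing.
module, seconds = timed_import("json")
print("import %s took %.3f s" % (module.__name__, seconds))
```

If the import finishes in a plain interpreter but never returns under Apache, the problem is more likely the hosting setup than the raw import cost.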
_______________________________________________
Python-il mailing list
Pyth...@hamakor.org.il
http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il

Emanuel Ilyayev

Jan 31, 2013, 6:48:16 PM
to asaf greenberg, python-il
I don't know enough NLTK but I work with django :)

From Asaf's description it looks like you have to change your architecture. Apache, in its default configuration, is not efficient with heavy processes because, with mod_wsgi's default embedded mode, each Apache child process loads its own copy of the application. There are better setups, like Gunicorn or uWSGI, that load n workers and distribute the work between them (usually n = number of cores × 2 + 1).
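
The worker-count rule of thumb above can be written down as a small Gunicorn config file. Gunicorn config files are plain Python, and `workers` and `preload_app` are real Gunicorn settings. The helper `suggested_workers` and the idea of saving this as `gunicorn.conf.py` are assumptions for illustration, not something from this thread.

```python
import multiprocessing


def suggested_workers(cores=None):
    """Return cores * 2 + 1, the rule of thumb quoted above."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return cores * 2 + 1


# Gunicorn reads module-level names such as `workers` from its config file.
workers = suggested_workers()

# Load the application (and so its heavy imports) once in the master
# process before the workers are forked, instead of once per worker.
preload_app = True
```

For example, a 4-core machine would get 9 workers under this rule.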

A more robust and scalable setup would include separate workers that answer the NLTK requests asynchronously, with Django reaching these workers via a message queue. This setup lets you put your NLTK workers on a separate machine, so that your web server does not compete with them for limited resources (CPU and RAM).
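
The shape of that queue-and-workers setup can be sketched with the standard library alone. This is only an in-process illustration: threads and `queue.Queue` stand in for a real broker and for worker processes on another machine (which is what Celery would provide), and `heavy_nlp_task` is a hypothetical placeholder for an actual NLTK call.

```python
import queue
import threading


def heavy_nlp_task(text):
    # Hypothetical stand-in for an expensive NLTK call.
    return text.split()


def worker(jobs, results):
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        job_id, text = job
        results[job_id] = heavy_nlp_task(text)


def run_jobs(texts, worker_count=2):
    """Queue every text, let the workers drain the queue, return results."""
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job_id, text in enumerate(texts):
        jobs.put((job_id, text))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


print(run_jobs(["it just silently hangs", "import nltk"]))
```

The point of the design is that the web process only enqueues work and collects results, so the heavy library is loaded by the workers and never by the request handler.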

Even if you eventually find a way to configure Apache to load NLTK without crashing, the URL that handles NLTK requests would be a perfect point to attack your server and bring it into a DoS (denial of service) situation, using only a couple of strong machines hitting this URL...

I urge you to read a little about gevent and Celery to understand what I'm talking about.

HTH

--
Emanuel


Avishalom Shalit

Feb 1, 2013, 5:49:00 AM
to Emanuel Ilyayev, asaf greenberg, python-il
thanks.
Actually, this is an internal app, only available on our VPN,
so security is not an issue,
and I only expect a maximum of 4 users.

i will look at the other setups.
thanks

-- vish

Shai Berger

Feb 1, 2013, 4:45:56 PM
to pyth...@hamakor.org.il, asaf greenberg, Emanuel Ilyayev
Hi,

On Friday 01 February 2013, Emanuel Ilyayev wrote:
>
> From Asaf's description it looks like you have to change your
> architecture.
>

... before you go to all that trouble, make sure you are using mod_wsgi in
daemon mode and not embedded mode. In daemon mode, the Python server (Django,
NLTK, whatever) runs in a separate process, and Apache talks to it through
sockets. It may not give you all the benefits of Emanuel's suggestions, but it
will give you a significant part of them at a very small fraction of the effort.
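
To check which mode a deployment is actually in, a tiny WSGI script can inspect the `mod_wsgi.process_group` key that mod_wsgi puts in the request environment: it is an empty string in embedded mode and holds the daemon process group name otherwise. This is a sketch for Python 3, meant to be mounted temporarily in place of the real application; daemon mode itself is switched on in the Apache config with the `WSGIDaemonProcess` and `WSGIProcessGroup` directives.

```python
def application(environ, start_response):
    """Report whether mod_wsgi is serving this request in daemon mode."""
    # Empty string means embedded mode; a name means a daemon process group.
    process_group = environ.get("mod_wsgi.process_group", "")
    mode = "daemon mode" if process_group else "embedded mode"
    body = ("mod_wsgi is running in %s\n" % mode).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```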

Shai.

grit...@gmail.com

Oct 14, 2013, 2:26:58 PM
to pyth...@googlegroups.com, avis...@gmail.com, asaf greenberg, python-il, Emanuel Ilyayev
Hi Vish - Did you manage to get this working somehow? I am also having problems using the NLTK in a django app. I'd appreciate any tips you have on setting things up correctly.

Thanks,

Graham

Avishalom Shalit

Oct 14, 2013, 2:59:34 PM
to grit...@gmail.com, pyth...@googlegroups.com, asaf greenberg, python-il, Emanuel Ilyayev
Well, it was solved by someone else, and it was a while ago.
For our needs we run it not under Apache, but with `python manage.py runserver 0:8000`,
and it works fine.

I think at some point we were able to get it running under Apache after updating our Linux distribution and Apache,
but we fell back to the Django dev server.

-- vish

Avishalom Shalit

Oct 14, 2013, 3:02:43 PM
to gritchie, python-il, asaf greenberg, python-il, Emanuel Ilyayev
btw python-il is rejecting my post, so if anyone wants to forward it there feel free


-- vish

Omer Zak

Oct 14, 2013, 3:23:49 PM
to pyth...@hamakor.org.il
Seems that there are two Python-IL mailing lists:
pyth...@googlegroups.com
pyth...@hamakor.org.il
I get messages sent to the @hamakor.org.il list.
--
Make love not war.
More cleavage, less carnage.
My own blog is at http://www.zak.co.il/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html

grit...@gmail.com

Oct 15, 2013, 5:33:29 AM
to pyth...@googlegroups.com, avis...@gmail.com, Emanuel Ilyayev, asaf greenberg, python-il, grit...@gmail.com
Thanks Vish. I eventually got it to work by including the following line in my Apache VirtualHost config:

WSGIApplicationGroup %{GLOBAL}
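
A likely explanation, going by the mod_wsgi documentation rather than anything confirmed in this thread: by default mod_wsgi runs each application in its own Python sub-interpreter, and some C extension modules only work properly in the main interpreter and can deadlock elsewhere. `WSGIApplicationGroup %{GLOBAL}` forces the application into the main interpreter. To confirm the directive took effect, a small diagnostic can look at `mod_wsgi.application_group` in the request environment, which is an empty string for the global group. `describe_interpreter` is a hypothetical helper name.

```python
def describe_interpreter(environ):
    """Say which interpreter mod_wsgi is using for this request."""
    group = environ.get("mod_wsgi.application_group")
    if group is None:
        return "not running under mod_wsgi"
    if group == "":
        return "main interpreter (global application group)"
    return "sub-interpreter %r" % group


def application(environ, start_response):
    body = (describe_interpreter(environ) + "\n").encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```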