Running pattern in Google App Engine

Romain

Apr 10, 2011, 7:15:37 PM
to Pattern
Hi all,

I am trying to run a Pattern-based app on GAE and I am struggling with
the quota of 10MB max per file.
I have zipped the whole pattern package, but the zip is 12MB.
I tried using zipsplit to split it into 2 zips of max 10MB each and then
ran this program:


import sys
sys.path.insert(0, 'pattern.zip')
import pattern
pattern.__path__.append('pattern2.zip/pattern')

from google.appengine.ext import webapp
from google.appengine.ext.webapp import util
from pattern.vector import Document
from pattern.search import Pattern, Constraint, Classifier, taxonomy, search
from pattern.en import Sentence, parse


etc...

then I get the following error:

Traceback (most recent call last):
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 3858, in _HandleRequest
    self._Dispatch(dispatcher, self.rfile, outfile, env_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 3792, in _Dispatch
    base_env_dict=env_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 580, in Dispatch
    base_env_dict=base_env_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 2918, in Dispatch
    self._module_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 2822, in ExecuteCGI
    reset_modules = exec_script(handler_path, cgi_path, hook)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 2702, in ExecuteOrImportScript
    exec module_code in script_module.__dict__
  File "/Users/romain/Documents/sites/testrvestr/main.py", line 28, in <module>
    from pattern.en import Sentence, parse
ImportError: No module named en

It looks like the loading of the libs is a bit broken.

Any ideas, or a better approach to using Pattern on GAE?

Thanks in advance (and apologies, because I am new to Python).

Romain.
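
For reference, a minimal sketch of the split-zip technique described above, run under plain CPython rather than the GAE dev_appserver. The package name splitdemo, the zip names and the module contents are hypothetical and have nothing to do with Pattern. It only shows that a package can be spread over two zips when the entry appended to __path__ matches the directory layout inside the second zip. Whether the 2011 dev_appserver import hooks honour such entries is not established in this thread, and the layout produced by zipsplit is worth checking with zipfile.ZipFile(...).namelist().

```python
import os
import sys
import tempfile
import zipfile

# Hypothetical demo package "splitdemo"; none of these names come from Pattern.
workdir = tempfile.mkdtemp()
zip1 = os.path.join(workdir, "part1.zip")
zip2 = os.path.join(workdir, "part2.zip")

# First zip: the package itself plus one module.
with zipfile.ZipFile(zip1, "w") as z:
    z.writestr("splitdemo/__init__.py", "")
    z.writestr("splitdemo/core.py", "VALUE = 1\n")

# Second zip: a subpackage stored under the same top-level directory.
with zipfile.ZipFile(zip2, "w") as z:
    z.writestr("splitdemo/extra/__init__.py", "NAME = 'extra'\n")

# Make the first zip importable, then extend the package search path
# with the matching directory inside the second zip.
sys.path.insert(0, zip1)
import splitdemo
splitdemo.__path__.append(os.path.join(zip2, "splitdemo"))

from splitdemo import core, extra
print(core.VALUE, extra.NAME)
```

If the second zip stored its files under a different prefix (for example without the top-level splitdemo/ directory), the appended __path__ entry would have to change accordingly, otherwise the subpackage import fails with an ImportError similar to the one above.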

Ross M Karchner

Apr 11, 2011, 8:56:24 AM
to pattern-f...@googlegroups.com
Do you have to zip it?

--
Ross M Karchner

Romain

Apr 11, 2011, 9:16:53 AM
to Pattern
Maybe I was not clear (or I don't understand your question), so I will
clarify:

The zipped pattern lib is larger than 10MB.

So I am splitting it into 2 zips, each less than 10MB, but then the
loading of the lib gets broken.

Tom De Smedt

Apr 11, 2011, 9:32:32 AM
to pattern-f...@googlegroups.com
I have no experience with Google App Engine yet, but if the issue is
about file size: the biggest files are in pattern/en/wordnet/. Leaving
out the wordnet folder should keep file size below 10MB. You then have
a Pattern module without WordNet, but all other functionality should
work as documented.
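
A quick way to confirm which files exceed a per-file limit is to walk the package directory and list the offenders. This is a minimal sketch; the function name oversized_files and the demo files are made up for illustration, and for the real check you would pass the pattern folder and 10 * 1024 * 1024 instead.

```python
import os
import tempfile

def oversized_files(root, limit_bytes):
    """Return (size, relative path) for every file under root above the limit, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                hits.append((size, os.path.relpath(path, root)))
    return sorted(hits, reverse=True)

# Demo on a throwaway directory (hypothetical file names, tiny limit).
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "sub"))
with open(os.path.join(demo_root, "small.txt"), "wb") as f:
    f.write(b"x" * 10)
with open(os.path.join(demo_root, "sub", "big.bin"), "wb") as f:
    f.write(b"x" * 100)

found = oversized_files(demo_root, 50)
print(found)
```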

Romain

Apr 11, 2011, 10:34:52 AM
to Pattern
Yep, you are correct, the problem is the big data.noun file.
Of course, one option is to not use WordNet at all but ... WordNet is
what I need :-o

Any thoughts on whether data.noun could be split in 2 and then
loaded in 2 steps at run time?

I don't know the WordNet implementation or how it gets loaded.

Tom De Smedt

Apr 11, 2011, 1:18:06 PM
to pattern-f...@googlegroups.com
Rather than loading the files into memory, pywordnet (by Oliver
Steele) will use a binary search on the index files, and then directly
retrieve the offset it needs from the corresponding data file. This
happens in the _lineAt() function in wordnet/pywordnet/wordnet.py. The
solution would be to split data.noun into two files of 7MB. If it
reads from the first file and EOF is encountered, it should instead
read from the second file, something along the lines of:


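import os

p1 = "dict/data.noun1"
p2 = "dict/data.noun2"
f1 = open(p1, "rb")
f2 = open(p2, "rb")

offset = 20000000

# Try the first file; seeking past its end makes readline() return an empty string.
f1.seek(offset)
line = f1.readline()
if len(line) == 0:
    # The offset lies in the second file: subtract the size of the first.
    f2.seek(offset - os.path.getsize(p1))
    line = f2.readline()

print(line)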


So is this a hack or a new feature? I can implement it in the next
revision, but then I'll need some more time to do it carefully so
there is no performance drop.
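
The same idea can be sketched as a small helper that works for any number of parts. This is only an illustration under stated assumptions, not the pywordnet code: the name line_at and the demo data are hypothetical. It also covers one case the snippet above leaves open, namely a line that was cut in two by the split (assuming the remainder fits in the next part).

```python
import os
import tempfile

def line_at(parts, offset):
    """Return the line starting at byte `offset` of the logical file
    formed by concatenating the files listed in `parts`."""
    for i, path in enumerate(parts):
        size = os.path.getsize(path)
        if offset < size:
            with open(path, "rb") as f:
                f.seek(offset)
                line = f.readline()
            # A line cut by the split continues at the start of the next part.
            if not line.endswith(b"\n") and i + 1 < len(parts):
                with open(parts[i + 1], "rb") as f:
                    line += f.readline()
            return line
        # Offset lies beyond this part: make it relative to the next one.
        offset -= size
    return b""  # offset is past the end of all parts

# Demo: split some data in the middle of a line (hypothetical content).
data = b"alpha\nbravo\ncharlie\n"
demo_dir = tempfile.mkdtemp()
demo_parts = [os.path.join(demo_dir, "data.demo1"), os.path.join(demo_dir, "data.demo2")]
with open(demo_parts[0], "wb") as f:
    f.write(data[:8])   # b"alpha\nbr"
with open(demo_parts[1], "wb") as f:
    f.write(data[8:])   # b"avo\ncharlie\n"

print(line_at(demo_parts, 6))
```

Splitting the data file on a line boundary avoids the straddling case altogether, at the cost of parts that are not exactly equal in size.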

Ross M Karchner

Apr 11, 2011, 5:47:06 PM
to pattern-f...@googlegroups.com
I was trying to feel out why you need to zip the library at all--
zipped libraries are a nice option if you're running into GAE's
file-count limit, but it's generally not a requirement.

I also *think* (not 100% sure) zipped libraries can't take advantage
of GAE's python pre-compilation. At least, there's a ticket asking for
that:

http://code.google.com/p/googleappengine/issues/detail?id=4634

--
Ross M Karchner

Tom De Smedt

Apr 13, 2011, 7:54:54 AM
to pattern-f...@googlegroups.com
I've made some changes to pywordnet to support partitioned data files.
Pattern now uses a data.noun1 + data.noun2 instead of a single
data.noun. Both files are below 10MB so this should enable you to
upload Pattern to GAE. I've also upgraded to WordNet 3.0. You can grab
the latest source code from
http://code.google.com/p/pattern-for-python
or wait for the official new release (this should be available
in the coming days).
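
For anyone who needs to partition a large data file themselves, here is a minimal sketch that splits on line boundaries so that no line straddles two parts. It is not the code used in Pattern; the function name split_file is hypothetical, and the numeric suffixes only mirror the data.noun1 / data.noun2 naming mentioned above. A single line longer than max_bytes would still produce an oversized part.

```python
import os
import tempfile

def split_file(src, max_bytes):
    """Split src into src1, src2, ... on line boundaries, each at most max_bytes."""
    paths = []
    out = None
    written = 0
    with open(src, "rb") as f:
        for line in f:
            # Start a new part when the current one would exceed the limit.
            if out is None or written + len(line) > max_bytes:
                if out is not None:
                    out.close()
                paths.append("%s%d" % (src, len(paths) + 1))
                out = open(paths[-1], "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths

# Demo with a tiny file and a tiny limit (hypothetical content).
split_dir = tempfile.mkdtemp()
split_src = os.path.join(split_dir, "data.demo")
original = b"".join(("line-%03d\n" % i).encode("ascii") for i in range(5))
with open(split_src, "wb") as f:
    f.write(original)

split_parts = split_file(split_src, 25)
print([os.path.basename(p) for p in split_parts])
```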

David Young

Apr 7, 2014, 7:22:48 PM
to pattern-f...@googlegroups.com
I hear Google has now increased the limit. Has anyone managed to get this to work? I am using GAE with Django, and if I paste the pattern folder into my app folder I still can't import pattern.en. The error message is: No module named pattern.text