I am trying to run a Pattern-based app on GAE and I am struggling with
the quota of 10MB max per file.
I have zipped the whole Pattern library, but the zip is 12MB.
I tried to zipsplit it into 2 zips of max 10MB each and then ran this program:
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util
from pattern.vector import Document
from pattern.search import Pattern, Constraint, Classifier, taxonomy, search
from pattern.en import Sentence, parse
etc...
then I get the following error:
Traceback (most recent call last):
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 3858, in _HandleRequest
    self._Dispatch(dispatcher, self.rfile, outfile, env_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 3792, in _Dispatch
    base_env_dict=env_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 580, in Dispatch
    base_env_dict=base_env_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 2918, in Dispatch
    self._module_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 2822, in ExecuteCGI
    reset_modules = exec_script(handler_path, cgi_path, hook)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 2702, in ExecuteOrImportScript
    exec module_code in script_module.__dict__
  File "/Users/romain/Documents/sites/testrvestr/main.py", line 28, in <module>
    from pattern.en import Sentence, parse
ImportError: No module named en
It looks like the loading of the libs is a bit broken.
Any ideas, or a better approach to using Pattern on GAE?
Thanks in advance (and apologies, because I am new to Python).
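One plausible explanation for the ImportError, offered as a sketch and not as a diagnosis of this exact setup: when a regular package is split across two zips, Python takes the package from the first zip on sys.path that contains its __init__.py and then looks for subpackages only inside that zip. The example below reproduces this with a made-up package (demo_pkg, part1.zip and part2.zip are hypothetical names, not Pattern's real layout) and shows that appending the second zip's folder to the package's __path__ makes the subpackage importable. It is written for Python 3.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_split_zips(folder):
    """Create two zips that each hold one half of the hypothetical demo_pkg."""
    first = os.path.join(folder, "part1.zip")
    second = os.path.join(folder, "part2.zip")
    with zipfile.ZipFile(first, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", "")
    with zipfile.ZipFile(second, "w") as archive:
        archive.writestr("demo_pkg/en/__init__.py", "GREETING = 'hello'\n")
    return first, second


def import_subpackage(first, second, extend_path):
    """Try to import demo_pkg.en; return its GREETING, or None on ImportError."""
    # Forget earlier imports so every call starts from a clean state.
    for name in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
        del sys.modules[name]
    sys.path[:0] = [first, second]
    try:
        package = importlib.import_module("demo_pkg")
        if extend_path:
            # Also search the second zip's copy of the package folder.
            package.__path__.append(os.path.join(second, "demo_pkg"))
        return importlib.import_module("demo_pkg.en").GREETING
    except ImportError:
        return None
    finally:
        sys.path.remove(first)
        sys.path.remove(second)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    first, second = build_split_zips(folder)
    print(import_subpackage(first, second, extend_path=False))
    print(import_subpackage(first, second, extend_path=True))
```

Whether this trick is worth using on GAE is a separate question; keeping the library unzipped, or making each single file smaller, avoids the problem altogether.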
On Sun, Apr 10, 2011 at 7:15 PM, Romain <romain.e...@gmail.com> wrote:
> Hi all,
> I am trying to run a pattern based app in GAE and I am struggling with
> the quotas of 10MB max per file.
> I have zipped the whole pattern but this is 12MB.
> I tried to zipsplit in 2 zips of max 10MB and then run this program :
I have no experience with Google App Engine yet, but if the issue is about file size: the biggest files are in pattern/en/wordnet/. Leaving out the wordnet folder should keep file size below 10MB. You then have a Pattern module without WordNet, but all other functionality should work as documented.
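To see which files actually exceed the limit before uploading, a small script along these lines may help. It is only a sketch: the folder name "pattern" and the reading of "10MB" as 10 * 1024 * 1024 bytes are assumptions, and files_over_limit is a made-up helper name.

```python
import os

# Assumed interpretation of the 10MB per-file quota mentioned in this thread.
LIMIT_BYTES = 10 * 1024 * 1024


def files_over_limit(root, limit=LIMIT_BYTES):
    """Return (size, path) pairs for files under root larger than limit, biggest first."""
    oversized = []
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((size, path))
    return sorted(oversized, reverse=True)


if __name__ == "__main__":
    # "pattern" is the assumed location of the unzipped library.
    for size, path in files_over_limit("pattern"):
        print("%8.1f MB  %s" % (size / (1024.0 * 1024.0), path))
```

If the only offenders are under pattern/en/wordnet/, that narrows the problem down to the WordNet data files.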
On 11 Apr 2011, at 15:16, Romain wrote:
> Maybe I was not clear (or I don't understand your question), so I will
> clarify:
> the zipped pattern lib is larger than 10MB.
> So I am splitting it into 2 zips, each less than 10MB, but then the
> loading of the lib gets broken.
> On Apr 11, 1:56 pm, Ross M Karchner <rosskarch...@gmail.com> wrote:
>> Do you have to zip it?
Rather than loading the files into memory, pywordnet (by Oliver Steele) will use a binary search on the index files, and then directly retrieve the offset it needs from the corresponding data file. This happens in the _lineAt() function in wordnet/pywordnet/wordnet.py. The solution would be to split data.noun into two files of 7MB. If it reads from the first file and EOF is encountered, it should instead read from the second file, something along the lines of:
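import os, stat
# f1, f2: open file handles on the two halves of data.noun
# p1: path of the first half
f1.seek(offset)
line = f1.readline()
if len(line) == 0:  # offset lies past the end of the first file
    f2.seek(offset - os.stat(p1)[stat.ST_SIZE])
    line = f2.readline()
print line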
So is this a hack or a new feature? I can implement it in the next revision, but then I'll need some more time to do it carefully so there is no performance drop.
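The idea above can be tried out in isolation with a self-contained sketch. This is not pywordnet's actual code: split_at_line_boundary and line_at are hypothetical helpers, and the sketch assumes the file is cut exactly at a line boundary so that no line spans both parts. It is written for Python 3.

```python
import os


def split_at_line_boundary(path, part_paths):
    """Split path into two files, cutting just after the first newline past the midpoint."""
    with open(path, "rb") as source:
        data = source.read()
    newline = data.find(b"\n", len(data) // 2)
    cut = len(data) if newline == -1 else newline + 1
    first, second = part_paths
    with open(first, "wb") as target:
        target.write(data[:cut])
    with open(second, "wb") as target:
        target.write(data[cut:])
    return [first, second]


def line_at(offset, part_paths):
    """Return the line that starts at offset in the original, unsplit file."""
    for part in part_paths:
        size = os.path.getsize(part)
        if offset < size:
            with open(part, "rb") as handle:
                handle.seek(offset)
                return handle.readline()
        # The offset lies beyond this part: make it relative to the next one.
        offset -= size
    return b""
```

Because the offsets stored in the index files refer to the unsplit file, only the lookup has to change; the index files themselves can stay as they are. A real implementation would keep the file handles open instead of reopening them on every call, which is where the performance concern comes in.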
> Yep, you are correct, the problem is the big data.noun file.
> Of course an option is to not use wordnet at all but ... wordnet is
> what I need :-o
> Any thoughts on whether the data.noun file could be split in 2 and then
> loaded in 2 steps at run time?
> I don't know the wordnet implementation or how it gets loaded.
> On Apr 11, 2:32 pm, Tom De Smedt <tomdesm...@gmail.com> wrote:
>> I have no experience with Google App Engine yet, but if the issue is
>> about file size: the biggest files are in pattern/en/wordnet/. Leaving
>> out the wordnet folder should keep file size below 10MB. You then have
>> a Pattern module without WordNet, but all other functionality should
>> work as documented.
I was trying to work out why you need to zip the library at all. Zipped libraries are a nice option if you're running into GAE's file-count limit, but zipping is generally not a requirement.
I also *think* (not 100% sure) that zipped libraries can't take advantage of GAE's Python pre-compilation. At least, there's a ticket asking for that:
I've made some changes to pywordnet to support partitioned data files. Pattern now uses data.noun1 + data.noun2 instead of a single data.noun. Both files are below 10MB so this should enable you to upload Pattern to GAE. I've also upgraded to WordNet 3.0. You can grab the latest source code from http://code.google.com/p/pattern-for-python or wait for the official new release (this should be available in the coming days).