Fwd: Porting Moin 2.0 to Google App Engine


Guido van Rossum

Jul 7, 2012, 9:37:11 AM
to who...@googlegroups.com
FYI. There are some whoosh patches in here; I'd love to hear what I
would have to do to get these accepted.

--Guido

---------- Forwarded message ----------
From: Guido van Rossum <gu...@python.org>
Date: Sat, Jul 7, 2012 at 3:34 PM
Subject: Porting Moin 2.0 to Google App Engine
To: moin...@lists.sourceforge.net


At today's EuroPython sprints I decided to try porting Moin 2.0 to
Google App Engine. After a lot of swearing and repeatedly nearly
giving up and being calmed each time by Thomas Waldmann, I have a
minimal set of patches to moin2 and whoosh that make this somewhat
working. Now, this doesn't mean the port is finished, and I don't
think these should be committed (not even to a branch). Many things
don't work yet, and I see several (apparently non-fatal) tracebacks
logged for each request. But it's a start for anyone who might want to
help.

Instructions:

- Start with a checkout of moin2 and one of whoosh.
- Apply the two attached files as patches.
- In moin2, create a directory named 'support' containing all the
dependencies. (I created this by running moin2's setup.py script and
capturing the contents of site-packages.)
- Symlink the whoosh package directory (whoosh/src/whoosh if you have
a checkout of whoosh) into the support directory as well (get rid of
the version downloaded by setup.py).
- You need to create some extra __init__.py files in some directories
in support/...: flaskext/, xstatic/, xstatic/pkg/.
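The last step above can be scripted. This is a minimal sketch, assuming the support directory layout described in the instructions; the function name `add_init_files` and the `PACKAGE_DIRS` list are made up for illustration and are not part of moin2.

```python
import os

# Package directories named in the instructions, relative to support/.
PACKAGE_DIRS = ["flaskext", "xstatic", os.path.join("xstatic", "pkg")]


def add_init_files(support_dir, package_dirs=PACKAGE_DIRS):
    """Create an empty __init__.py in each listed directory that exists.

    Returns the list of files that were newly created.
    """
    created = []
    for rel in package_dirs:
        pkg_dir = os.path.join(support_dir, rel)
        if not os.path.isdir(pkg_dir):
            continue  # this dependency is not present; nothing to do
        init_path = os.path.join(pkg_dir, "__init__.py")
        if not os.path.exists(init_path):
            open(init_path, "w").close()  # an empty file is enough
            created.append(init_path)
    return created
```

Calling `add_init_files("moin2/support")` from the directory above the checkout would then cover the three directories listed; running it twice is harmless because existing files are left alone.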

Now you can run it using dev_appserver.py moin2; after editing the
first line of app.yaml you can also upload it to App Engine
production.

You can play around with my version at http://moinmoin-hr.appspot.com/. Enjoy!

--
--Guido van Rossum (python.org/~guido)


WHOOSH.diff
MOIN2.diff

Matt Chaput

Jul 7, 2012, 3:23:38 PM
to who...@googlegroups.com
> FYI. There are some whoosh patches in here; I'd love to hear what I
> would have to do to get these accepted.
>
> --Guido

I'll apply the patches in compound.py (guard if mmap is unavailable) and gae (add file_modified method).

The patches in whoosh.index that make DatastoreStorage the default are less generally useful ;) I assume you modified whoosh.index.create_in() and whoosh.index.open_dir() because Moin is using them to create and open the index?

If so, I think Moin should be modified to not use these convenience functions, which are hardwired to use FileStorage. Instead Moin should create/accept a storage object (through some configuration or API) and then use

ix = storage.create_index(schema)

and

ix = storage.open_index()

to create and open the index.
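To illustrate the pattern (the application receives a storage object and only ever calls `create_index` and `open_index` on it), here is a minimal sketch. `InMemoryStorage` and `get_index` are hypothetical stand-ins written for this example so that it runs without Whoosh; in a real application the object passed in would be a Whoosh storage such as FileStorage or the GAE DatastoreStorage.

```python
class InMemoryStorage(object):
    """Hypothetical stand-in exposing only the two methods named above."""

    def __init__(self):
        self._index = None

    def create_index(self, schema):
        # A real storage would build index files; a dict is enough here.
        self._index = {"schema": schema, "docs": []}
        return self._index

    def open_index(self):
        if self._index is None:
            raise LookupError("no index has been created in this storage")
        return self._index


def get_index(storage, schema=None, create=False):
    """Create or open an index through whatever storage object is passed in.

    The caller decides which backend to use; this function never names one.
    """
    if create:
        return storage.create_index(schema)
    return storage.open_index()
```

The point of the design is that the backend choice lives in one place (wherever the storage object is constructed), and the rest of the application is unaffected by it.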

Matt

PS: Oops, the code in whoosh.index.create_in() is goofy; I think I got distracted in the middle of changing how it worked! Another thing to patch ;)

Guido van Rossum

Jul 7, 2012, 5:55:05 PM
to who...@googlegroups.com
On Sat, Jul 7, 2012 at 9:23 PM, Matt Chaput <ma...@whoosh.ca> wrote:
>> FYI. There are some whoosh patches in here; I'd love to hear what I
>> would have to do to get these accepted.
>>
>> --Guido
>
> I'll apply the patches in compound.py (guard if mmap is unavailable) and gae (add file_modified method).

Great! Thanks for getting back to me so quickly.

Some remarks about these patches:

- The mmap problem is probably an indication of some confusion about
the abstraction that is implemented in filedb/compound.py. It seems
you actually use the mmap module only in one place; perhaps you could
move the import there so it fails more clearly when it is needed? (If
you get to that place when mmap is unavailable you'd get a rather
confusing AttributeError: 'NoneType' object has no attribute ...)

- I didn't immediately see any negative effects in Moin from always
returning 0 from file_modified(), but perhaps we can do better. It
shouldn't be hard to extend the DatastoreFile model class with a
'modified' property giving an int or float that is set to time.time()
whenever the entity is written, so we can return the actual mtime.
(I'd propose to use DateTimeProperty(auto_now=True), which
automatically takes care of this, but it returns a datetime value
which is somewhat problematic to convert to seconds as you expect.)

- Let me know if you want me to send you improved patches for either
of these. (Do you use codereview.appspot.com or some other code review
tool?)
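On the datetime-to-seconds conversion mentioned in the second point: a minimal sketch, assuming the property hands back a naive datetime expressed in UTC (that assumption should be checked against the datastore documentation). The helper name `utc_datetime_to_seconds` is made up for this example.

```python
import calendar
from datetime import datetime


def utc_datetime_to_seconds(dt):
    """Convert a naive datetime holding UTC into float seconds since the epoch."""
    # calendar.timegm() treats the time tuple as UTC. time.mktime() would
    # instead apply the local timezone, which is the usual pitfall here.
    return calendar.timegm(dt.utctimetuple()) + dt.microsecond / 1e6


if __name__ == "__main__":
    print(utc_datetime_to_seconds(datetime(1970, 1, 2)))  # one day after the epoch
```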

> The patches in whoosh.index that make DatastoreStorage the default are less generally useful ;) I assume you modified whoosh.index.create_in() and whoosh.index.open_dir() because Moin is using them to create and open the index?

Heh, this was definitely a hack to get it working while I was fighting
other fires... TBH I had initially expected there to be some
configuration option that determines what storage backend to use. Are
there others besides the regular file storage and gae.py? I'm not sure
that whoosh has configuration of that kind, or whether the caller is
supposed to handle this by simply not calling those convenience
functions, or whether perhaps it would make sense to have an option
passed into all three functions (and their callers...?) to determine
what kind of storage to create or expect.

> If so, I think Moin should be modified to not use these convenience functions, which are hardwired to use FileStorage. Instead Moin should create/accept a storage object (through some configuration or API) and then use
>
> ix = storage.create_index(schema)
>
> and
>
> ix = storage.open_index()
>
> to create and open the index.

If that's your final answer I'll look into implementing this in Moin.

> Matt
>
> PS: Oops, the code in whoosh.index.create_in() is goofy; I think I got distracted in the middle of changing how it worked! Another thing to patch ;)

I'm not sure I understand the goofiness. But I'm not sure that matters. :)

Matt Chaput

Jul 7, 2012, 6:29:41 PM
to who...@googlegroups.com
> - The mmap problem is probably an indication of some confusion about
> the abstraction that is implemented in filedb/compound.py. It seems
> you actually use the mmap module only in one place; perhaps you could
> move the import there so it fails more clearly when it is needed? (If
> you get to that place when mmap is unavailable you'd get a rather
> confusing AttributeError: 'NoneType' object has no attribute ...)

The CompoundStorage class reads a file containing concatenated sub-files and presents them as a Storage object. It uses mmap and BytesIO-wrapped memoryviews to read sub-files if available, and if not falls back to pure-Python "SubFile" objects that translate calls to seek, tell, read, etc. onto the underlying file. As part of the change I'll just make it fall back automatically when mmap == None.
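The fallback described here can be sketched roughly as follows. This is not the Whoosh implementation: the `SubFile` class below is a simplified read-only version written for illustration, `open_subfile` is a hypothetical helper, and the mmap path copies the bytes into a BytesIO rather than wrapping a memoryview.

```python
import io

try:
    import mmap
except ImportError:  # e.g. a sandbox that does not provide the module
    mmap = None


class SubFile(object):
    """Read-only view of `length` bytes starting at `offset` in a parent file."""

    def __init__(self, parent, offset, length):
        self._parent = parent
        self._offset = offset
        self._length = length
        self._pos = 0

    def seek(self, pos):
        # Clamp the position to the bounds of the sub-file.
        self._pos = max(0, min(pos, self._length))

    def tell(self):
        return self._pos

    def read(self, size=-1):
        remaining = self._length - self._pos
        if size < 0 or size > remaining:
            size = remaining
        # Translate the sub-file position onto the underlying file.
        self._parent.seek(self._offset + self._pos)
        data = self._parent.read(size)
        self._pos += len(data)
        return data


def open_subfile(parent, offset, length):
    """Return a file-like object for one sub-file of a compound file."""
    if mmap is not None and hasattr(parent, "fileno"):
        try:
            mapped = mmap.mmap(parent.fileno(), 0, access=mmap.ACCESS_READ)
        except (ValueError, OSError):
            mapped = None  # not a real file, or it cannot be mapped
        if mapped is not None:
            try:
                return io.BytesIO(mapped[offset:offset + length])
            finally:
                mapped.close()
    # mmap is missing or unusable: fall back to the pure-Python wrapper.
    return SubFile(parent, offset, length)
```

Because the check is `mmap is not None` at the point of use, the code never reaches an attribute access on `None`, which avoids the confusing AttributeError mentioned in the quoted text.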

> - I didn't immediately see any negative effects in Moin from always
> returning 0 from file_modified(), but perhaps we can do better. It
> shouldn't be hard to extend the DatastoreFile model class with a
> 'modified' property giving an int or float that is set to time.time()
> whenever the entity is written, so we can return the actual mtime.
> (I'd propose to use DateTimeProperty(auto_now=True), which
> automatically takes care of this, but it returns a datetime value
> which is somewhat problematic to convert to seconds as you expect.)

I'll use the current return 0 behaviour for now and look into doing it right. The funny thing is I can't think offhand where the code would care about a file's modification time... I'll have to do some searching.

> - Let me know if you want me to send you improved patches for either
> of these. (Do you use codereview.appspot.com or some other code review
> tool?)

I've never even heard of it before :) TBH I haven't received enough contributed code to look into it.

> Heh, this was definitely a hack to get it working while I was fighting
> other fires... TBH I had initially expected there to be some
> configuration option that determines what storage backend to use. Are
> there others besides the regular file storage and gae.py? I'm not sure
> that whoosh has configuration of that kind, or whether the caller is
> supposed to handle this by simply not calling those convenience
> functions, or whether perhaps it would make sense to have an option
> passed into all three functions (and their callers...?) to determine
> what kind of storage to create or expect.

Whoosh is just a library, without configuration files or anything like that, just API. I'm not sure whether you're going to fork Moin or add GAE-enabling configuration options to it, but at some point Moin is going to have to create a Whoosh storage object and tell Whoosh to use it.

Theoretically there could be a convenience option like open_index(storage="gae"), but I distrust such things. I'd rather the user do it with code, if you see my meaning; otherwise you end up stuffing configuration options into that string and it becomes a mess.

> I'm not sure I understand the goofiness. But I'm not sure that matters. :)

It doesn't :) It just has an unused import and variable, because I'm easily distra--SQUIRREL!

Matt

Guido van Rossum

Jul 9, 2012, 8:20:51 AM
to who...@googlegroups.com
On Sun, Jul 8, 2012 at 12:29 AM, Matt Chaput <ma...@whoosh.ca> wrote:
>> - The mmap problem is probably an indication of some confusion about
>> the abstraction that is implemented in filedb/compound.py. It seems
>> you actually use the mmap module only in one place; perhaps you could
>> move the import there so it fails more clearly when it is needed? (If
>> you get to that place when mmap is unavailable you'd get a rather
>> confusing AttributeError: 'NoneType' object has no attribute ...)
>
> The CompoundStorage class reads a file containing concatenated sub-files and presents them as a Storage object. It uses mmap and BytesIO-wrapped memoryviews to read sub-files if available, and if not falls back to pure-Python "SubFile" objects that translate calls to seek, tell, read, etc. onto the underlying file. As part of the change I'll just make it fall back automatically when mmap == None.
>
>> - I didn't immediately see any negative effects in Moin from always
>> returning 0 from file_modified(), but perhaps we can do better. It
>> shouldn't be hard to extend the DatastoreFile model class with a
>> 'modified' property giving an int or float that is set to time.time()
>> whenever the entity is written, so we can return the actual mtime.
>> (I'd propose to use DateTimeProperty(auto_now=True), which
>> automatically takes care of this, but it returns a datetime value
>> which is somewhat problematic to convert to seconds as you expect.)
>
> I'll use the current return 0 behaviour for now and look into doing it right. The funny thing is I can't think offhand where the code would care about a file's modification time... I'll have to do some searching.

Actually you should probably return -1, as you do in the other driver
that doesn't support mtime (filestore.py). But attached is a version
that stores the mtime as a separate property.
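The attached diff is not reproduced in this archive. As a rough illustration of the idea only (record time.time() on every write, return -1 when no mtime is known), here is a sketch with hypothetical stand-in classes `StoredFile` and `TinyStorage`; it does not use the App Engine datastore API.

```python
import time


class StoredFile(object):
    """Hypothetical stand-in for a datastore entity holding one file."""

    def __init__(self):
        self.value = b""
        self.modified = None  # seconds since the epoch, set on every write

    def put(self, value):
        self.value = value
        self.modified = time.time()


class TinyStorage(object):
    """Hypothetical storage that tracks a modification time per file."""

    def __init__(self):
        self._files = {}

    def write_file(self, name, value):
        self._files.setdefault(name, StoredFile()).put(value)

    def file_modified(self, name):
        entity = self._files.get(name)
        if entity is None or entity.modified is None:
            return -1  # same "unknown" convention as discussed above
        return entity.modified
```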

>> - Let me know if you want me to send you improved patches for either
>> of these. (Do you use codereview.appspot.com or some other code review
>> tool?)
>
> I've never even heard of it before :) TBH I haven't received enough contributed code to look into it.
>
>> Heh, this was definitely a hack to get it working while I was fighting
>> other fires... TBH I had initially expected there to be some
>> configuration option that determines what storage backend to use. Are
>> there others besides the regular file storage and gae.py? I'm not sure
>> that whoosh has configuration of that kind, or whether the caller is
>> supposed to handle this by simply not calling those convenience
>> functions, or whether perhaps it would make sense to have an option
>> passed into all three functions (and their callers...?) to determine
>> what kind of storage to create or expect.
>
> Whoosh is just a library, without configuration files or anything like that, just API. I'm not sure whether you're going to fork Moin or add GAE-enabling configuration options to it, but at some point Moin is going to have to create a Whoosh storage object and tell Whoosh to use it.
>
> Theoretically there could be a convenience option like open_index(storage="gae"), but I distrust such things. I'd rather the user do it with code, if you see my meaning; otherwise you end up stuffing configuration options into that string and it becomes a mess.

Agreed, actually. I'll come up with a proper fix to moin.

>> I'm not sure I understand the goofiness. But I'm not sure that matters. :)
>
> It doesn't :) It just has an unused import and variable, because I'm easily distra--SQUIRREL!
>
> Matt
gae.py.diff

Thomas Waldmann

Sep 7, 2012, 8:48:30 PM
to who...@googlegroups.com, gu...@python.org
On Monday, July 9, 2012 2:20:51 PM UTC+2, Guido van Rossum wrote:
> Agreed, actually. I'll come up with a proper fix to moin.

Hi Guido, hi Matt!

I must admit that it took me a while to find this thread here on the whoosh mailing list.
I read it now and then, but not that much recently as I was rather busy with moin2 and Google Summer of Code students.

I'll have a look at the patches now. Are they still the latest versions available?

Matt: about codereview.appspot.com: it's really nice (if one has a use case for it). We used it a lot in GSOC to review students' code and also use it now and then to discuss non-trivial code changes between developers. As easy to use as a pastebin, but much more powerful / useful.

Cheers,

Thomas

Thomas Waldmann

Sep 8, 2012, 11:27:21 AM
to who...@googlegroups.com, gu...@python.org
For the moin2 / GAE related stuff, I opened an issue on our tracker (so we can discuss there rather than using whoosh ML for moin2 stuff):

https://bitbucket.org/thomaswaldmann/moin-2.0/issue/255/google-appengine-gae-support

Thomas Waldmann

Sep 22, 2012, 8:30:29 AM
to who...@googlegroups.com, gu...@python.org


On Saturday, September 8, 2012 5:27:21 PM UTC+2, Thomas Waldmann wrote:
> For the moin2 / GAE related stuff, I opened an issue on our tracker (so we can discuss there rather than using whoosh ML for moin2 stuff):
>
> https://bitbucket.org/thomaswaldmann/moin-2.0/issue/255/google-appengine-gae-support

Just to update on that: the issue was resolved yesterday. Thanks to Guido's work, we now have some basic GAE support in the "gae" branch of the moin-2.0 repo, and some fixes were also applied to whoosh (repo, default branch).

https://moin2-test.appspot.com/ is a test site I installed using that code.

The moin2 docs were also updated with "moin2 on gae" installation instructions and whoosh/moin GAE storage configuration information (note: as these changes are still only in the "gae" branch, the online docs on rtd do not reflect them yet).

What's still needed is much more testing, plus fixes for specific functionality. I'll try to update moin2's and whoosh's issue trackers with everything I find.

I must admit that I am very new to GAE (and as I run my own internet servers, I probably won't personally use it very much), so help with this stuff is very much appreciated. I think it's very useful though, as many people and projects are looking for an easy and reliable hosting solution for their moin or whoosh based applications.
