Hi Carlos,
i'm glad to see interest in gae front ends and backends. one comment on version
control usage: mirror.gae is effectively a branch/fork of contentmirror. it
would be nicer if you could make it one, i.e. copy the trunk as mirror.gae
and check in your changes. that makes it easier to merge from the trunk as
needed and makes changes easier to diff; importing it effectively removes
all history from the code base.
as we've already discussed in person, i think this is the wrong approach to
gae integration. sdk desktop integration isn't particularly useful in and of
itself; it's not a scalable content store, and was never meant to be. afaics
the primary sdk bulk mechanism for transferring content directly to google is
csv upload. pushing data directly from plone to google from a synchronous
content mirror sounds like a recipe for disaster: it's a slow network
transfer of object serialization that blocks the request.
i think a better approach to contentmirror appengine integration, and a more
generically useful bit, would be to define an rdbms sync to a gae datastore.
it would need a schema mapping and networking, preferably twisted or thread
based (concurrent data transfers), and would just sync the bits as directed,
defining index or sync state columns as needed in the mapping. or, for the
impatient ;-), dumping the rdbms to csv and using the sdk's included bulk
load tools in a cron job.
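A rough sketch of what such a sync could look like, assuming a `synced` state column on each mirrored table and a small mapping from table columns to datastore properties. Everything named here is hypothetical: `SCHEMA_MAPPING`, `to_entity`, `sync_table`, and in particular `push_entity`, which stands in for whatever network call would actually write an entity to gae. It uses a thread pool rather than twisted, and sqlite3 only so the example is self-contained.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema mapping: source table -> datastore kind, key column,
# sync state column, and column -> property names.
SCHEMA_MAPPING = {
    "content": {
        "kind": "Content",
        "key": "content_id",
        "state": "synced",
        "columns": {"content_id": "id", "title": "title", "body": "text"},
    },
}


def to_entity(mapping, row):
    """Translate one row (a dict) into a (kind, properties) pair."""
    return mapping["kind"], {
        prop: row[col] for col, prop in mapping["columns"].items()
    }


def sync_table(conn, table, push_entity, workers=4):
    """Push every unsynced row concurrently, then flag the ones that succeeded.

    `push_entity(kind, props)` is a placeholder for the real network call and
    is expected to return True on success.
    """
    mapping = SCHEMA_MAPPING[table]
    key, state = mapping["key"], mapping["state"]
    cur = conn.execute(f"SELECT * FROM {table} WHERE {state} = 0")
    names = [d[0] for d in cur.description]
    rows = [dict(zip(names, r)) for r in cur.fetchall()]
    entities = [to_entity(mapping, r) for r in rows]
    # Only the transfers run in worker threads; the database connection is
    # used from the calling thread alone.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda e: push_entity(*e), entities))
    done = [(r[key],) for r, ok in zip(rows, results) if ok]
    conn.executemany(f"UPDATE {table} SET {state} = 1 WHERE {key} = ?", done)
    conn.commit()
    return len(done)
```

Rows whose push fails keep `synced = 0`, so the next run (from cron, say) simply picks them up again.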
all that said, if you're interested in this approach, feel free to keep
developing it, i could be wrong. i think this approach is definitely more
feasible for transactional integration with gae once there is a
contentmirror async operation processing mode. even so i'd still clean up
the code to be an extension instead of a branch of the codebase. i'm
definitely interested in fixing whatever would help with that. afaics you
should just be able to make an operation factory subclass, and then use all
the same event subscribers and event coalescence, with an extension-provided
serialization / transform.
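To make the extension idea concrete, here is a minimal sketch of the shape it might take. None of these classes are the real contentmirror ones: `Operation`, `OperationFactory`, `JSONOperationFactory` and `coalesce` are stand-ins invented for illustration, and the coalescing rules are deliberately simplified. The point is only that the extension overrides the serialization / transform step while event handling and coalescence stay shared.

```python
import json


class Operation:
    """Stand-in for a mirror operation (not the real contentmirror class)."""

    def __init__(self, kind, uid, payload=None):
        self.kind = kind        # "add", "update" or "delete"
        self.uid = uid          # identifier of the content object
        self.payload = payload  # serialized content, if any


class OperationFactory:
    """Stand-in base factory: turns content events into operations."""

    def serialize(self, content):
        raise NotImplementedError

    def add(self, content):
        return Operation("add", content["uid"], self.serialize(content))

    def update(self, content):
        return Operation("update", content["uid"], self.serialize(content))

    def delete(self, content):
        return Operation("delete", content["uid"])


class JSONOperationFactory(OperationFactory):
    """Hypothetical extension: only the serialization is overridden."""

    def serialize(self, content):
        return json.dumps(content, sort_keys=True)


def coalesce(operations):
    """Collapse the operations queued for each object into at most one.

    Simplified rules: an add followed by updates stays an add carrying the
    latest payload, an add followed by a delete cancels out, and otherwise
    the last operation wins.
    """
    pending = {}
    for op in operations:
        prev = pending.get(op.uid)
        if prev is not None and prev.kind == "add":
            if op.kind == "delete":
                del pending[op.uid]  # created and removed: nothing to send
                continue
            op = Operation("add", op.uid, op.payload)  # still new remotely
        pending[op.uid] = op
    return list(pending.values())
```

With something like this, a gae backend would plug in its own factory and leave the shared subscriber and coalescing code untouched.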
cheers,
kapil
On Thu, Apr 2, 2009 at 2:06 PM, Carlos de la Guardia <
On Thu, Apr 2, 2009 at 6:49 PM, Kapil Thangavelu <kap...@gmail.com> wrote:
> Hi Carlos,
> i'm glad to see interest in gae front ends and backends. one comment on
> version control usage: mirror.gae is effectively a branch/fork of
> contentmirror. it would be nicer if you could make it one, i.e. copy the
> trunk as mirror.gae and check in your changes. that makes it easier to
> merge from the trunk as needed and makes changes easier to diff; importing
> it effectively removes all history from the code base.
I understand, sorry about the breach of etiquette. One reason I just imported now was to have something up sooner because the guys who saw it working wanted to try it out. I'll do what you ask when I get a chance next week.
> as we've already discussed in person, i think this is the wrong approach to
> gae integration. sdk desktop integration isn't particularly useful in and of
> itself; it's not a scalable content store, and was never meant to be. afaics
> the primary sdk bulk mechanism for transferring content directly to google
> is csv upload. pushing data directly from plone to google from a
> synchronous content mirror sounds like a recipe for disaster: it's a slow
> network transfer of object serialization that blocks the request.
Don't think I take this too seriously. Part of my motivation is I needed to get into GAE development for business reasons and this project seemed a nice way to do it. I also took the chance to learn about creating zcml directives and some other ZCA stuff (thanks to your code!). I'm sure if I ever want to do it in a production setting, I'll have to rethink the whole strategy. I also think GAE is a nice poster boy for Plone and content mirror, even if it's not for real use cases. Ilia was the only front end builder that had no previous Plone experience and yet he found the project appealing.
> i think a better approach to contentmirror appengine integration, and a
> more generically useful bit, would be to define an rdbms sync to a gae
> datastore. it would need a schema mapping and networking, preferably twisted
> or thread based (concurrent data transfers), and would just sync the bits as
> directed, defining index or sync state columns as needed in the mapping. or,
> for the impatient ;-), dumping the rdbms to csv and using the sdk's included
> bulk load tools in a cron job.
> all that said, if you're interested in this approach, feel free to keep
> developing it, i could be wrong. i think this approach is definitely more
> feasible for transactional integration with gae once there is a
> contentmirror async operation processing mode. even so i'd still clean up
> the code to be an extension instead of a branch of the codebase. i'm
> definitely interested in fixing whatever would help with that. afaics you
> should just be able to make an operation factory subclass, and then use all
> the same event subscribers and event coalescence, with an extension-provided
> serialization / transform.
I also wanted to create an extension, but it was a bit easier to take out the db code and make this work. I really want to be able to plug other backends in a more generic way, but there was only so much time available at the sprint ;) When I make the branch properly it will be more clear what I needed to change and that will help us see where we can make stuff a bit more generic.
Thanks for all the work you've done with content mirror. I really like the concepts behind it.
> On Thu, Apr 2, 2009 at 2:06 PM, Carlos de la Guardia <
> carlos.delaguar...@gmail.com> wrote:
>> Hello,
>> Just a heads up. I checked in my GAE content mirror backend under the name
>> mirror.gae. Installation instructions are in install.txt.
thanks carlos, that's great. incidentally i started looking into the bulk
uploader in the appengine sdk; it's got quite a lot of work in it and
survives interrupts, so for apps looking to transfer content data from a
contentmirror db, i'd suggest using the included sdk tools, at least for an
initial bulk upload via a csv dump of the database tables.
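For the csv dump side, something along these lines would probably do. This is only a sketch: `dump_table_to_csv` is a made-up helper, sqlite3 stands in for whatever rdbms the mirror writes to, and whether the sdk loader wants a header row or a particular column order is an assumption to check against the sdk before relying on it.

```python
import csv
import sqlite3


def dump_table_to_csv(conn, table, columns, out, header=False):
    """Write the given columns of `table` to the file-like object `out`.

    Returns the number of data rows written. The header row is off by
    default; turn it on only if the loader in use expects one.
    """
    writer = csv.writer(out)
    if header:
        writer.writerow(columns)
    cur = conn.execute(f"SELECT {', '.join(columns)} FROM {table}")
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count
```

Run from cron with one output file per table, the resulting files can then be handed to the sdk's bulk load tools.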
cheers,
kapil
On Tue, Apr 7, 2009 at 12:50 AM, Carlos de la Guardia <
> I did as you asked and deleted my import to replace it with a branch. You
> can now easily see what I changed using diff.
> Thanks,
> Carlos de la Guardia