Amara 2.x, which is currently under heavy development, includes a lot of
optimization, and should handle such cases much better. It's still
pre-alpha, but you could try it out by parsing some of your large docs and
running XPath queries to see whether the speed-up is sufficient for your
needs. See my message here from earlier today.
> The thing is, though, I don't actually need the contents of the Atom
> content element parsed at all. I just need easy access to the rest of
> the elements (essentially all the elements in the Atom namespace). So I
> tried looking at the rules options. simple_string_element_rule isn't
> going to work because the XML within the content element is not a
> simple string; it's a completely different XML doc as well.
> omit_element_rule removes the content element entirely, but I need the
> content back in place when I serialize the doc back into XML.
>
> So my question is: is there a way to bind only certain elements of the
> XML, leaving the others as strings that will be inserted back when I
> serialize the XML again? The pushdom method, although promising at the
> start, seems more useful as a way to incrementally parse segments of
> an XML document. I need to be able to access elements dynamically.
>
> So is writing a custom binding class the only way to do this, or is
> that also another rabbit hole? Is there an easier way of doing this?
> Or should I completely rethink my approach?
>
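The request above (leave the Atom content element as an opaque string and splice it back in on serialization) can be approximated at the string level without any binding-layer support. The following is a minimal sketch using only Python's standard library (`re` and `xml.etree.ElementTree`), not Amara. It assumes that content elements are written unprefixed in the default Atom namespace, are not nested, and contain no literal `</content>` inside CDATA or comments; self-closing `<content/>` elements are left alone. The helper names `stash_content` and `restore_content` and the `@@RAWn@@` token format are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Matches an unprefixed, non-self-closing <content ...>...</content> element.
# Assumes no nested <content> elements and no literal "</content>" inside
# CDATA sections or comments (a string-level sketch, not a real parser).
CONTENT_RE = re.compile(r"(<content\b[^>]*(?<!/)>)(.*?)(</content>)", re.DOTALL)
TOKEN_RE = re.compile(r"@@RAW(\d+)@@")


def stash_content(xml_text):
    """Swap each content body for a placeholder token; return (text, stash)."""
    stash = []

    def repl(match):
        stash.append(match.group(2))  # keep the raw markup untouched
        return f"{match.group(1)}@@RAW{len(stash) - 1}@@{match.group(3)}"

    return CONTENT_RE.sub(repl, xml_text), stash


def restore_content(xml_text, stash):
    """Put the stashed markup back, verbatim, in place of each token."""
    return TOKEN_RE.sub(lambda m: stash[int(m.group(1))], xml_text)


feed = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><title>Hi</title>"
    '<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    "<p>raw <b>markup</b></p></div></content>"
    "</entry></feed>"
)

safe, stash = stash_content(feed)
root = ET.fromstring(safe)  # the content body is now just a short token

# Work with the remaining Atom elements as usual.
title = root.find(f"{{{ATOM_NS}}}entry/{{{ATOM_NS}}}title")
title.text = "Hello"

# Registering "" makes ElementTree write Atom elements without an ns0: prefix.
# Note: this mutates ElementTree's global namespace registry.
ET.register_namespace("", ATOM_NS)
out = restore_content(ET.tostring(root, encoding="unicode"), stash)
print(out)
```

The trade-off is fragility: because the markup is cut out before parsing, it is never checked or bound, which is exactly the point here, but the regex will misbehave on inputs that break the stated assumptions.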
The store-markup-as-simple-string rule idea is a good one, and I'll see
if we can add it into the default set for Amara 2.x. Depending on your
time constraints, following along with that development, and even
helping if possible, might be your best bet. If you do choose to do so,
you're already in the right spot (this list is just for Amara 2.x). But
if you have to stick to Amara 1.x, please continue discussion on the
4Suite mailing list
http://lists.fourthought.com/mailman/listinfo/4suite
Don't worry, I try to keep up there as well :-)
Thanks.
--
Uche Ogbuji http://uche.ogbuji.net
Founding Partner, Zepheira http://zepheira.com
Linked-in profile: http://www.linkedin.com/in/ucheogbuji
Articles: http://uche.ogbuji.net/tech/publications/
I checked out Amara2 from trunk. The parsing is faster; however, the
bindery component isn't ready yet? I see that we have XPath support, but
the lack of the bindery is probably going to be a deal breaker for my use case.
> > The thing is, though, I don't actually need the contents of the Atom
> > content element parsed at all. I just need easy access to the rest of
> > the elements (essentially all the elements in the Atom namespace). So I
> > tried looking at the rules options. simple_string_element_rule isn't
> > going to work because the XML within the content element is not a
> > simple string; it's a completely different XML doc as well.
> > omit_element_rule removes the content element entirely, but I need the
> > content back in place when I serialize the doc back into XML.
> >
> > So my question is: is there a way to bind only certain elements of the
> > XML, leaving the others as strings that will be inserted back when I
> > serialize the XML again? The pushdom method, although promising at the
> > start, seems more useful as a way to incrementally parse segments of
> > an XML document. I need to be able to access elements dynamically.
> >
> > So is writing a custom binding class the only way to do this, or is
> > that also another rabbit hole? Is there an easier way of doing this?
> > Or should I completely rethink my approach?
> >
>
> The store-markup-as-simple-string rule idea is a good one, and I'll see
> if we can add it into the default set for Amara 2.x. Depending on your
> time constraints, following along with that development, and even
> helping if possible, might be your best bet.
I am more than willing to help!
> If you do choose to do so,
> you're already in the right spot (this list is just for Amara 2.x).
Time is a constraint, but I would really like to get something working
in Amara1 and then help bring it over to Amara2. The challenge I am
having now is making sense of it all. So if I had to implement this
in Amara1, I should be able to pull this off with a rule and not
necessarily a custom binding class? Can you give me some pointers as
to what I should use as an example to implement this?
If I wanted to do this in Amara2, can you give me pointers as to
where I should be looking?
> But if you have to stick to Amara 1.x, please continue discussion on the
> 4Suite mailing list http://lists.fourthought.com/mailman/listinfo/4suite
OK, I'll move this over based on your reply.
Thank you for your time.
Mohan
Mohanaraj Gopala Krishnan wrote:
> On Sun, Apr 20, 2008 at 4:01 AM, Uche Ogbuji <uc...@ogbuji.net> wrote:
>
>> Mohanaraj wrote:
>>
>> Amara 2.x, which is currently under heavy development, includes a lot of
>> optimization, and should handle such cases much better. It's still
>> pre-alpha, but you could try it out by parsing some of your large docs and
>> running XPath queries to see whether the speed-up is sufficient for your
>> needs. See my message here from earlier today.
>>
>>
>
> I checked out Amara2 from trunk. The parsing is faster; however, the
> bindery component isn't ready yet? I see that we have XPath support, but
> the lack of the bindery is probably going to be a deal breaker for my use case.
>
Here is the approximate order I expect for the work:
* Port core parsing from 4Suite 1.x
* Port XPath from 4Suite 1.x
* Port XSLT from 4Suite 1.x
* Port XUpdate from 4Suite 1.x
* Port Bindery from Amara 1.x
* Port Schema components (RELAX NG, Schematron) from 4Suite 1.x and Amara 1.x
* Port remaining bits from 4Suite 1.x and Amara 1.x
We've got as far as the XSLT step, but we also have the XUpdate bit
primed, and with some help with testing, those should go pretty
quickly. I'm about to call for more help porting tests.
Anyway, I expect we'll be porting Bindery in a week or two, so I hope
that's not too long.
>> The store-markup-as-simple-string rule idea is a good one, and I'll see
>> if we can add it into the default set for Amara 2.x. Depending on your
>> time constraints, following along with that development, and even
>> helping if possible, might be your best bet.
>>
>
> I am more than willing to help!
>
Great! I'm trying to work up some tasks for folks to help with. The
first set of hands really helped in porting the test suites, and Luis is
helping do some of the mechanical work for the XSLT port. Please join
the amara-dev group, because that's where I'm coordinating volunteer
efforts.
http://groups.google.com/group/amara-dev
>> If you do choose to do so,
>> you're already in the right spot (this list is just for Amara 2.x).
>>
>
> Time is a constraint, but I would really like to get something working
> in Amara1 and then help bring it over to Amara2. The challenge I am
> having now is making sense of it all. So if I had to implement this
> in Amara1, I should be able to pull this off with a rule and not
> necessarily a custom binding class? Can you give me some pointers as
> to what I should use as an example to implement this?
>
> If I wanted to do this in Amara2, can you give me pointers as to
> where I should be looking?
>
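It will probably look like this (again, I expect to start on that in a
week or two, so now is a great time for feedback):
import amara
# Proposed rule: treat each atom:content element as unparsed markup
rules = [amara.rules.unparsed_text_rule(u'atom:entry/atom:content')]
# XML stands for the source document to be parsed
doc = amara.parse(XML, rules)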
Then doc.feed.entry.content would be a special object that's little more
than a string plus a flag saying that, upon reserialization, it should be
emitted verbatim rather than escaped, as regular Unicode representing
character data would be.
Should be just that simple :-)
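To make the "little more than a string plus a flag" idea concrete, here is a minimal, library-independent Python sketch. It is not Amara's implementation (the rule above was still only a proposal at this point); `RawMarkup`, `serialize_text`, and `serialize_element` are hypothetical names, and the tuple-based tree merely stands in for a real bound document. Ordinary text is escaped on output, while a `RawMarkup` value is written out verbatim.

```python
from xml.sax.saxutils import escape


class RawMarkup(str):
    """Hypothetical "string plus a flag": text to be emitted verbatim."""

    verbatim = True


def serialize_text(value):
    """Escape ordinary character data; pass RawMarkup through untouched."""
    if getattr(value, "verbatim", False):
        return str(value)
    return escape(value)


def serialize_element(tag, children):
    """Serialize a (tag, [child, ...]) tree; children are str or tuples."""
    parts = [f"<{tag}>"]
    for child in children:
        if isinstance(child, tuple):
            parts.append(serialize_element(*child))
        else:
            parts.append(serialize_text(child))
    parts.append(f"</{tag}>")
    return "".join(parts)


entry = ("entry", [
    ("title", ["Fish & Chips"]),
    ("content", [RawMarkup("<p>raw <b>markup</b></p>")]),
])
print(serialize_element(*entry))
# <entry><title>Fish &amp; Chips</title><content><p>raw <b>markup</b></p></content></entry>
```

Subclassing `str` keeps the value usable anywhere a string is expected, but operations such as concatenation return a plain `str` and silently drop the flag, so a real binding would need to guard against that.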