Translator problems: digesting Atom feeds as XML

Rintze Zelle

Apr 18, 2009, 9:33:22 AM
to zotero-dev
I am trying to upgrade the Google Books translator to use the Atom
feeds that the Google Books API provides (e.g.
http://code.google.com/apis/books/docs/gdata/developers_guide_protocol.html#SearchResultFeed).

I've been working on the translator, but I can't find a way to access
the data in the atom feed. The example feed I'm testing is available
here: http://books.google.com/books/feeds/volumes/WDll8hA006AC

The code within the Code tab of the Scaffold is available here (it's
all that is needed to test this, as I hard-coded a URL to the Google
Books feed mentioned above):
http://gist.github.com/97601
and the debug window gives this as output:
http://gist.github.com/97607

The code is mostly copied from the NCBI PubMed translator, as I knew
that one reads XML. I've been trying to modify it step by step to get
it to work with Google Books. The problematic area has a few comments
("//RZ: blah blah"). I'd be most interested to learn what I'm doing
wrong. Maybe it has something to do with namespaces (the feed contains
elements like <dc:identifier>text</dc:identifier>)?

Rintze

Dan Stillman

Apr 18, 2009, 4:28:00 PM
to zoter...@googlegroups.com
On 4/18/09 9:33 AM, Rintze Zelle wrote:
> I am trying to upgrade the Google Books translator to use the atom
> feeds that the Google Book API provides (e.g.
> http://code.google.com/apis/books/docs/gdata/developers_guide_protocol.html#SearchResultFeed).
>
> [...]

>
> The code is mostly copied from the NCBI Pubmed translator, as I knew
> that that one reads XML. I've been trying to modify it in steps to get
> it to work with Google Books. The problematic area has a few comments
> ("//RZ: bladieblah"). I'd be most interested to learn what I'm doing
> wrong.

You're using xml.entry.id, but <entry> is the root element.
xml.id.toString() should give you what you want.
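To make the root-element point concrete: E4X has since been removed from Firefox, so this is a minimal sketch of the same situation using Python's standard `xml.etree.ElementTree` instead of E4X. The entry below is a trimmed, hypothetical stand-in for the real feed, and its `<id>` value is only a placeholder. Because parsing returns the `<entry>` element itself, a path that starts with `entry` looks for a nested `<entry>` and finds nothing, while asking for `id` directly works.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down entry; the real feed has many more fields.
# The <id> text is a placeholder, not copied from the live feed.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://books.google.com/books/feeds/volumes/WDll8hA006AC</id>
</entry>"""

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

root = ET.fromstring(ENTRY)  # root *is* the <entry> element
print(root.tag)              # {http://www.w3.org/2005/Atom}entry

# Wrong: looks for an <entry> child inside the root <entry>, so nothing matches.
wrong = root.find("atom:entry/atom:id", ATOM)

# Right: <id> is a direct child of the root element.
right = root.find("atom:id", ATOM)

print(wrong)       # None
print(right.text)  # http://books.google.com/books/feeds/volumes/WDll8hA006AC
```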

> Maybe it has something to do with namespaces (the feed contains
> elements like <dc:identifier>text</dc:identifier>)?

No, but E4X is namespace-aware, so you'll need to handle the namespaces
to get the data out of those elements:

https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Processing_XML_with_E4X#Handling_namespaces
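For readers without E4X (it is no longer available in Firefox), the same namespace rule can be sketched with Python's `xml.etree.ElementTree`. This illustrates namespace-aware lookup in general, not the E4X API from the linked guide, and the `dc:title` value is made up. A bare name only matches elements in no namespace, so `dc:` elements have to be looked up through a prefix-to-URI mapping or the `{uri}name` form.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample entry using the Atom and Dublin Core namespaces.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dc="http://purl.org/dc/terms">
  <dc:title>Example title</dc:title>
</entry>"""

root = ET.fromstring(ENTRY)

# ElementTree stores qualified names as "{namespace-uri}localname".
print(root.tag)  # {http://www.w3.org/2005/Atom}entry

# A bare name is looked up in no namespace, so it does not match dc:title.
print(root.find("title"))  # None

# Map a prefix to the namespace URI and use that prefix in the path.
NS = {"dc": "http://purl.org/dc/terms"}
print(root.find("dc:title", NS).text)  # Example title

# Equivalent lookup, spelling the URI out in "{uri}name" form.
print(root.find("{http://purl.org/dc/terms}title").text)  # Example title
```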

Rintze Zelle

Apr 19, 2009, 3:40:06 AM
to zotero-dev
You're my hero once again, thanks. I got it going. The main trick was
to set default namespaces (e.g. default xml namespace = "http://purl.org/dc/terms";).
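A rough Python analogue of setting a default namespace, using the standard `xml.etree.ElementTree` (E4X itself no longer runs in Firefox). Passing a mapping with the empty-string key `''` to `find()` (Python 3.8 or later) makes unprefixed names resolve to that namespace, much like `default xml namespace` does in E4X. The sample entry is hypothetical. Note the side effect: once Dublin Core is the default, an unprefixed `title` no longer reaches the Atom `<title>`, which then has to be named with its namespace explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry with an Atom <title> and a Dublin Core <dc:title>.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dc="http://purl.org/dc/terms">
  <title>Atom title</title>
  <dc:title>Dublin Core title</dc:title>
</entry>"""

root = ET.fromstring(ENTRY)

# The '' key makes unprefixed names in the path resolve to Dublin Core
# (supported since Python 3.8), similar in spirit to E4X's default namespace.
DC_DEFAULT = {"": "http://purl.org/dc/terms"}
print(root.find("title", DC_DEFAULT).text)  # Dublin Core title

# The Atom <title> is now only reachable by naming its namespace explicitly.
print(root.find("{http://www.w3.org/2005/Atom}title").text)  # Atom title
```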

Rintze

Rintze Zelle

Apr 19, 2009, 8:49:28 AM
to zotero-dev
I've got two more problems/questions. The code (for the Code tab of
Scaffold) is available here: http://gist.github.com/98053

a) In some cases multiple elements exist in the atom feed with the
same name (e.g. for creators and identifiers):

<entry xmlns="http://www.w3.org/2005/Atom" xmlns:gbs="http://schemas.google.com/books/2008"
       xmlns:dc="http://purl.org/dc/terms" xmlns:gd="http://schemas.google.com/g/2005">
       <dc:identifier>WDll8hA006AC</dc:identifier>
       <dc:identifier>ISBN:0849304857</dc:identifier>
       <dc:identifier>ISBN:9780849304859</dc:identifier>
</entry>

For some reason, though, I can't loop through the different elements.
See the debug output (also available here: http://gist.github.com/98054).
Instead, I get an array of length 1 for "var authors = new Array(xml.creator);",
with all the different identifiers in that single element, instead of an
array of length 3 with one identifier in each array element.

b) As I said earlier, I based my translator on the NCBI PubMed
translator. Something strange is going on that baffles me. There is a
line I can't remove (if I do, I get the error "message =>
xml.title[0] is undefined"). It's within an if-statement whose condition
is never true, so I don't understand what goes wrong there.

In both cases the code is commented with "//RZ: ..."

Rintze

Bruce D'Arcus

Apr 19, 2009, 8:56:59 AM
to zoter...@googlegroups.com

That means you'll only have access to those elements.

Bruce

Bruce D'Arcus

Apr 19, 2009, 9:02:26 AM
to zoter...@googlegroups.com
On Sun, Apr 19, 2009 at 8:49 AM, Rintze Zelle <rintze...@gmail.com> wrote:
>
> I've got two more problems/questions. The code (for the Code tab of
> Scaffold) is available here: http://gist.github.com/98053
>
> a) In some cases multiple elements exist in the atom feed with the
> same name (e.g. for creators and identifiers):
>
> <entry xmlns="http://www.w3.org/2005/Atom" xmlns:gbs="http://schemas.google.com/books/2008"
>        xmlns:dc="http://purl.org/dc/terms" xmlns:gd="http://schemas.google.com/g/2005">
>        <dc:identifier>WDll8hA006AC</dc:identifier>
>        <dc:identifier>ISBN:0849304857</dc:identifier>
>        <dc:identifier>ISBN:9780849304859</dc:identifier>
> </entry>
>
> For some reason though I can't loop through the different elements.
> See the debug-output (also available here: http://gist.github.com/98054).
> Instead, I get an array of length 1 for
> "var authors = new Array(xml.creator);", with all the different identifiers in that single
> element, instead of an array of length 3 with one identifier in each
> array element.

I don't use E4X, but your code here seems wrong to me. Shouldn't you
be iterating through a loop to gather the identifiers? See, for
example:

<http://www.morearty.com/blog/2007/03/13/common-e4x-pitfalls/>

Bruce

Dan Stillman

Apr 19, 2009, 4:48:27 PM
to zoter...@googlegroups.com
What Bruce said.

I'm not sure why you're using "new Array(xml...)" everywhere, but, well, don't. E4X already provides access to XML data via standard JS syntax, so you don't need to stuff the E4X elements into arrays before accessing them.

// Single child
var foo = <foo><bar>blah</bar></foo>;
Zotero.debug(foo.bar.toString());  // outputs "blah"

// Multiple children
var foo = <foo><bar>blah</bar><bar>blah2</bar></foo>;
for each(var bar in foo.bar) {
    Zotero.debug(bar.toString());  // outputs "blah", then "blah2"
}

You also generally don't need text(). The toString() method is enough.
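Since E4X is no longer available in Firefox, here is a sketch of the same loop in Python with the standard `xml.etree.ElementTree`, using the `<dc:identifier>` snippet quoted earlier in this thread. Wrapping the whole match list in another list reproduces the length-1 problem, much as `new Array(xml.creator)` does; iterating over the matches gives one identifier per element. The ISBN filtering at the end is only an illustration of what a translator might do with the values.

```python
import xml.etree.ElementTree as ET

# The <entry> snippet from the thread, with the namespace URIs as given there.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:gbs="http://schemas.google.com/books/2008"
       xmlns:dc="http://purl.org/dc/terms"
       xmlns:gd="http://schemas.google.com/g/2005">
  <dc:identifier>WDll8hA006AC</dc:identifier>
  <dc:identifier>ISBN:0849304857</dc:identifier>
  <dc:identifier>ISBN:9780849304859</dc:identifier>
</entry>"""

NS = {"dc": "http://purl.org/dc/terms"}
root = ET.fromstring(ENTRY)

# Pitfall: wrapping the match list in another list yields a single element.
wrapped = [root.findall("dc:identifier", NS)]
print(len(wrapped))  # 1

# Instead, iterate over the matches directly and read each element's text.
identifiers = [el.text for el in root.findall("dc:identifier", NS)]
print(identifiers)  # ['WDll8hA006AC', 'ISBN:0849304857', 'ISBN:9780849304859']

# Example post-processing: keep only the ISBNs, without the "ISBN:" prefix.
isbns = [i[len("ISBN:"):] for i in identifiers if i.startswith("ISBN:")]
print(isbns)  # ['0849304857', '9780849304859']
```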


> b) As said earlier, I based my translator on the NCBI Pubmed
> translator. Something strange is going on which baffles me. There is a
> line which I can't remove (if I do, I get the error "message =>
> xml.title[0] is undefined"). It's within an if-statement that is never
> true, so I don't understand what goes wrong there.

It's a Mozilla bug with "default xml namespace". We add "with ({});" after every use of "default xml namespace" to get around it.

- Dan

Rintze Zelle

Apr 20, 2009, 4:55:01 AM
to zotero-dev
> I'm not sure why you're using "new Array(xml...)" everywhere, but, well,
> don't. E4X already provides access to XML data via standard JS syntax,
> so you don't need to stuff the E4X elements into arrays before accessing it.

Thanks, that works.

> > b) As said earlier, I based my translator on the NCBI Pubmed
> > translator. Something strange is going on which baffles me. There is a
> > line which I can't remove (if I do, I get the error "message =>
> > xml.title[0] is undefined"). It's within an if-statement that is never
> > true, so I don't understand what goes wrong there.
>
> It's a Mozilla bug with "default xml namespace". We add "with ({});"
> after every use of "default xml namespace" to get around it.

How is this supposed to work, exactly? Do I have to put all the code
that uses that namespace inside those brackets after "with"? Can you
point me to some code where this workaround is used?

Rintze

Frank Bennett

Apr 20, 2009, 5:13:34 AM
to zoter...@googlegroups.com
You can see this idiom used at line 31 of chrome/content/zotero/xpcom/csl.js.
The "with" and empty braces are just tacked onto the default declaration.
If you parse the XML of the style after issuing the declaration, you
should be able to access the tags in the parsed object with E4X. You
probably have this or something better already, but it's one of my
favourites:

http://wso2.org/project/mashup/1.5.1/docs/e4xquickstart.html

Frank

Rintze Zelle

Apr 20, 2009, 12:11:26 PM
to zotero-dev
Thanks for all the help. I added some finishing touches, and I would
like to post the new Google Books translator for review (it works for
me): http://gist.github.com/98596

It should fix tickets 818, 963, and 1299 in one go, and it enables saving
items from the Google Books cover view (e.g. on
http://books.google.com/books?q=+subject:%22+Science+/+Chemistry+/+General+%22&as_brr=3&rview=1&source=gbs_hplp_nofict).

Rintze