Papers Past translator submission

staplegun

Aug 16, 2010, 7:12:34 PM
to zotero-dev
I have just submitted a translator "Papers Past.js" for the digitised
New Zealand historic newspapers website hosted by the National Library
of New Zealand - http://paperspast.natlib.govt.nz/

Luckily the website has good embedded metadata.

Avram Lyon

Aug 20, 2010, 6:08:47 PM
to zoter...@googlegroups.com
2010/8/17 staplegun <stapl...@gmail.com>:

This translator is a good start, and it has worked quite well in my testing.

Since you are interacting with a proper database that provides the
full text of the articles, it would be good to add page snapshots of
the image and computer-generated text pages. That way, Zotero users
will have the full text of the items they save handy at all times,
much like PDF saving from journal databases.

It would also be very good to see support for saving from search results.

Please let me know if you have any questions about my suggestions. The
current translator is worth adding to Zotero as is, but it would be
great if we could make it even more useful before distributing it.

Best wishes,

Avram

staplegun

Aug 23, 2010, 12:21:36 AM
to zotero-dev
On Aug 21, 10:08 am, Avram Lyon <ajl...@gmail.com> wrote:
> Since you are interacting with a proper database that provides the
> full text of the articles, it would be good to add page snapshots of
> the image and computer-generated text pages. That way, Zotero users
> will have the full text of the items they save handy at all times,
> much like PDF saving from journal databases.
>
> It would also be very good to see support for saving from search results.

I like the sound of saving the image of the full text automatically
(the scanned text is less effective - often only the article title is
included and other snippets are missing). I will have a look at other
translators for how to do this and see if I can at least add the
article image files as attachments. Should I be saving a
snapshot of the webpage containing the images, or adding just the
actual GIF images (what is 'normal')?

I avoided the search results page as that doesn't have embedded
metadata so will need screenscraping (and I'm unlikely to be able to
maintain that over time if the site changes). Though, I hadn't
realised that translators usually do a GET for each multi-result - is
this done before or after the user selects which ones they want to
save? (Am worried about the server hitrate.)

Avram Lyon

Aug 23, 2010, 12:48:52 AM
to zotero-dev
2010/8/23 staplegun <stapl...@gmail.com>:

> I like the sound of saving the image of the full text automatically
> (the scanned text is less effective - often only the article title is
> included and other snippets are missing).  I will have a look at other
> translators for how to do this and see if I can at least add the
> article image files as attachments.  Should I be saving a
> snapshot of the webpage containing the images, or adding just the
> actual GIF images (what is 'normal')?

The recently-submitted National Archives of Australia translator added
the images themselves. I would add the full-text image if possible,
then a snapshot of the item page.

> I avoided the search results page as that doesn't have embedded
> metadata so will need screenscraping (and I'm unlikely to be able to
> maintain that over time if the site changes).  Though, I hadn't
> realised that translators usually do a GET for each multi-result - is
> this done before or after the user selects which ones they want to
> save? (Am worried about the server hitrate.)

Search results are usually parsed by grabbing the title and URL for
each individual item, then presenting the user with the select items
dialog. Most translators then call the individual item code to get the
pages for each desired result. This means n + 1 requests for a user
who selects n items from the search results (or 2n + 1 if you are
saving full-text images as well). Since you are only getting the item
names and URLs of their individual records, it is usually quite
straightforward to produce search result translators, and rather easy
to maintain them.
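
A minimal sketch of that flow, using hypothetical helper names
(`collectResults`, `requestCount`) and plain objects standing in for the
links scraped from a results page. In a real translator the resulting map
would presumably be passed to the select items dialog rather than returned.

```javascript
// Hypothetical helper: turn scraped result links into a { url: title }
// map, the shape a select-items dialog would be given.
function collectResults(links) {
  var items = {};
  for (var i = 0; i < links.length; i++) {
    var title = (links[i].title || "").trim();
    // Skip entries with no URL or no usable title.
    if (links[i].href && title) {
      items[links[i].href] = title; // duplicate URLs collapse onto one key
    }
  }
  return items;
}

// Requests for n selected items, counted as in the message above:
// one for the results page, one per item page, and one more per item
// when a full-text image is fetched as well.
function requestCount(n, withImages) {
  return 1 + n * (withImages ? 2 : 1);
}
```

With three items selected, `requestCount(3, false)` gives 4 and
`requestCount(3, true)` gives 7, matching the n + 1 and 2n + 1 figures.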

Let me know if I can provide any further help.

Best wishes,

Avram

staplegun

Aug 23, 2010, 11:35:18 AM
to zotero-dev
Great, I see now how to deal with multiple records in search results
and have that working.

However, I'm having a problem downloading page snapshots. I don't
want to start asking debugging questions here, but where are the
manuals for these commands?

This doesn't work:
newItem.attachments = [{title:"Article scan", mimeType:"text/html", url:doc.location.href }];

But if I add a snapshot parameter it does work (though just adds a
link attachment):
newItem.attachments = [{title:"Article scan", mimeType:"text/html", url:doc.location.href, snapshot:false }];

I'm using Zotero 2.0.3 in Firefox 3.5.10.

Avram Lyon

Aug 23, 2010, 7:47:29 PM
to zotero-dev
2010/8/23 staplegun <stapl...@gmail.com>:

> This doesn't work:
>    newItem.attachments = [{title:"Article scan", mimeType:"text/html", url:doc.location.href }];
>
> But if I add a snapshot parameter it does work (though just adds a
> link attachment):
>    newItem.attachments = [{title:"Article scan", mimeType:"text/html", url:doc.location.href, snapshot:false }];

Make sure that you have set Zotero to save attachments in your
preferences. The snapshot system respects that setting and will
silently refuse to cooperate if it is not set. Otherwise, your code
looks essentially correct.

Avram

staplegun

Aug 24, 2010, 10:31:59 AM
to zotero-dev
On Aug 24, 11:47 am, Avram Lyon <ajl...@gmail.com> wrote:
> 2010/8/23 staplegun <stapleg...@gmail.com>:
>
> Make sure that you have set Zotero to save attachments in your
> preferences. The snapshot system respects that setting and will
> silently refuse to cooperate if it is not set. Otherwise, your code
> looks essentially correct.

Aha! That seemed to cure it.

I have uploaded a revised version that:
a. Attaches a snapshot of the entire webpage for the article
b. Attaches a GIF image file for each part of the scan of the article
c. Provides a selection folder icon on the search results page to save
multiple items
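
As a rough illustration of points a and b, here is a sketch of how such an
attachment list might be assembled. The helper name `buildAttachments` and
its arguments are hypothetical, not taken from the submitted translator;
the field names follow the attachment objects quoted earlier in this
thread.

```javascript
// Hypothetical helper: one page snapshot plus one GIF per scan part.
function buildAttachments(pageUrl, scanUrls) {
  // a. snapshot of the whole article page
  var attachments = [
    { title: "Article page", mimeType: "text/html", url: pageUrl }
  ];
  // b. one image attachment for each part of the article scan
  for (var i = 0; i < scanUrls.length; i++) {
    attachments.push({
      title: "Article scan (part " + (i + 1) + ")",
      mimeType: "image/gif",
      url: scanUrls[i]
    });
  }
  return attachments;
}
```

The result could then be assigned to `newItem.attachments` in place of the
single-element array used before.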

I have struck a problem with Zotero refusing to save a few items - I'm
assuming it is caused by odd characters, probably when converting the
title to sentence case.

Thanks Avram for your help, I think it is basically ready now.

Avram Lyon

Sep 3, 2010, 1:28:08 AM
to zotero-dev
2010/8/24 staplegun <stapl...@gmail.com>:

> I have uploaded a revised version that:
> a. Attaches a snapshot of the entire webpage for the article
> b. Attaches a GIF image file for each part of the scan of the article
> c. Provides a selection folder icon on the search results page to save
> multiple items

I'm trying to get back to reviewing the current backlog, and I noticed
that I can't actually access the updated Papers Past translator in the
files section. Can you delete Papers Past.js and Papers Past.js (2)
from the files section and reupload the newest one?

You are also welcome to use a different service to upload--
gist.github.com has worked well for me.

Thank you for bearing with me as I try to catch up on reviewing the
wonderful work that you translator authors have been doing in the past
weeks.
Avram

staplegun

Sep 7, 2010, 10:08:25 AM
to zotero-dev
Sorry, I missed this reply. Have deleted and re-uploaded the
translator, and also added in GPL licence.
Thanks.

Avram Lyon

Sep 14, 2010, 11:13:09 AM
to zoter...@googlegroups.com
2010/9/7 staplegun <stapl...@gmail.com>:

> Sorry, I missed this reply.  Have deleted and re-uploaded the
> translator, and also added in GPL licence.
> Thanks.

The translator has been committed to Zotero SVN
(https://www.zotero.org/trac/changeset/6778). The translator
works like a charm -- it's translators like this and sites like Papers
Past that let Zotero really shine, and that make new kinds of research
possible.

If you have contact with the Papers Past administrators, it would be
nice to add place information to the Zotero items -- I understand that
it's not available at present, but they must have that metadata in
their database, so perhaps you could ask them to expose it.

Zotero is missing a volume field for newspaper entries, which is
unfortunate, since Papers Past provides it, and many newspapers have a
volume, which can sometimes be invaluable for looking up issues in
archives. This is not an issue for you to resolve, staplegun, but I'd
like to see it fixed in Zotero 2.1 (or sometime soon?) so that
translators can start saving such info.

Thanks for contributing!

Best wishes,

Avram

staplegun

Sep 15, 2010, 10:31:37 AM
to zotero-dev
That's good news, and thanks!

Is Place the city and country of publication? If so, in the meantime
I could look at adding it in manually - it's a fairly static list of
around 60 newspapers.


Avram Lyon

Sep 15, 2010, 2:35:40 PM
to zoter...@googlegroups.com
2010/9/15 staplegun <stapl...@gmail.com>:

> Is Place the city and country of publication?  If so, in the meantime
> I could look at adding it in manually - it's a fairly static list of
> around 60 newspapers.

It is usually understood to be the place of publication -- city and
country would be fine. As with most Zotero fields, you can use the
field however you like, but I understand that some citation styles
include the city and country/region for newspaper citations.

Adding a static list is sometimes the only or the best way to go -- I
did the same in my Radio Liberty translator. If you update the
translator, just post here and I'll review and commit the changes.
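
A static list of this kind could be as simple as the sketch below. The
names `PLACE_BY_NEWSPAPER` and `lookupPlace` are hypothetical, and the two
entries are only illustrative; the real table would need one entry for
each newspaper title on the site.

```javascript
// Illustrative entries only, not the translator's actual list.
var PLACE_BY_NEWSPAPER = {
  "Otago Daily Times": "Dunedin, New Zealand",
  "Evening Post": "Wellington, New Zealand"
};

// Return the place of publication, or undefined for an unlisted title.
function lookupPlace(publicationTitle) {
  var key = (publicationTitle || "").trim();
  // hasOwnProperty guards against inherited keys such as "toString".
  if (Object.prototype.hasOwnProperty.call(PLACE_BY_NEWSPAPER, key)) {
    return PLACE_BY_NEWSPAPER[key];
  }
  return undefined;
}
```

An unlisted title simply yields `undefined`, so the place field can be
left empty rather than filled with a guess.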

Avram
