simplifying writing translators?

Bruce D'Arcus

Feb 1, 2009, 11:58:20 AM
to zoter...@googlegroups.com
I believe this may have come up before, but I don't recall any
particular resolution ...

I'm fairly technically adept, and can even do some basic programming
(though not so much JavaScript), but every time I have in mind writing
a translator, I give up; I don't have enough time to learn how
to write one. I can't believe I'm alone.

Question: is there a way to make writing translators easier such that
users don't have to write code? Or where code writing could at least
be significantly limited?

I'm thinking of maybe a simple JSON-based configuration where you're
just itemizing the key components (title, url pattern, variable
mapping).
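
To make the idea concrete, here is a minimal sketch of what such a configuration might look like, together with the small piece of code that would pick the right configuration for a page. Everything in it is an assumption made for illustration: the key names (`label`, `urlPattern`, `itemType`, `fields`), the example.org address, the selectors, and the `findConfig` helper are hypothetical and are not an existing Zotero format or API.

```javascript
// Hypothetical translator config: every key name here is an assumption
// made for illustration, not an existing Zotero format.
const pressReleaseConfig = {
  label: "Example press releases",
  urlPattern: "^https?://(www\\.)?example\\.org/press/",
  itemType: "report",
  fields: {
    title: "h1.pagename", // CSS selector assumed to hold the title
    date: "span.date"     // CSS selector assumed to hold the date
  }
};

// Return the first config whose urlPattern matches the page URL, or null.
function findConfig(configs, url) {
  for (const config of configs) {
    if (new RegExp(config.urlPattern).test(url)) {
      return config;
    }
  }
  return null;
}

const found = findConfig([pressReleaseConfig], "http://example.org/press/2009/item");
console.log(found ? found.label : "no translator for this page");
```

The point of the sketch is that the per-site part is pure data; the matching logic would be written once and shared.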

Bruce

grieth

Feb 14, 2009, 9:22:01 PM
to zotero-dev
I would be content if there was a better on-line tutorial on how to do
it. I have done a lot of VB macro programming, but not java - where
is there a page for me to learn how to do translators?

grieth

Dan Stillman

Feb 15, 2009, 3:43:35 PM
to zoter...@googlegroups.com
On 2/14/09 9:22 PM, grieth wrote:
> I would be content if there was a better on-line tutorial on how to do
> it. I have done a lot of VB macro programming, but not java - where
> is there a page for me to learn how to do translators?

1) Translators use JavaScript, not Java (though the two use a similar
syntax).

2) http://www.zotero.org/support/dev/creating_translators_for_sites

3) You can look at existing translators here:
https://www.zotero.org/svn/extension/trunk/translators/

Bruce D'Arcus

Feb 15, 2009, 4:01:15 PM
to zoter...@googlegroups.com
Dan,

As I said originally, I don't think this is helpful at all. The
barrier to entry is too high. Is there really not a way to (greatly)
simplify this?

Bruce

Frank Bennett

Feb 15, 2009, 6:32:21 PM
to zotero-dev
On Feb 16, 6:01 am, "Bruce D'Arcus" <bdar...@gmail.com> wrote:
> Dan,
>
> On Sun, Feb 15, 2009 at 3:43 PM, Dan Stillman <dstill...@zotero.org> wrote:
> > On 2/14/09 9:22 PM, grieth wrote:
> >> I would be content if there was a better on-line tutorial on how to do
> >> it.  I have done a lot of VB macro programming, but not java - where
> >> is there a page for me to learn how to do translators?
>
> > 1) Translators use JavaScript, not Java (though the two use a similar
> > syntax).
>
> > 2) http://www.zotero.org/support/dev/creating_translators_for_sites
>
> > 3) You can look at existing translators here:
> > https://www.zotero.org/svn/extension/trunk/translators/
>
> As I said originally, I don't think this is helpful at all. The
> barrier to entry is too high. Is there really not a way to (greatly)
> simplify this?

This is a very tall order. But a starting point might be a tool to do
triage on a page. A lot of time can be consumed just figuring out
where the data in a page is coming from, so that you can lay some kind
of plan for digging it out. If you could do something like

    wgroke --tellabout="John H. Smith" --ris http://www.thatsite.com/

and get back a report telling what URL and XPath will access the name
"John H. Smith" that you see on the screen at that address, and what
URLs _might_ yield RIS data, that would cut out a lot of the early
uncertainty that dissuades people from the first attempt.
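
The "tell me where this text lives" half of that idea can be sketched in a few lines. This is only an illustration under loud assumptions: the page is modelled as a plain `{ tag, text, children }` object rather than a real DOM, `locateText` is a hypothetical helper, and nothing here fetches URLs or looks for RIS data.

```javascript
// Walk a simplified page tree ({ tag, text, children }) and collect an
// XPath-like path for every node whose own text contains the needle.
function locateText(node, needle, path = "/" + node.tag) {
  const hits = [];
  if (typeof node.text === "string" && node.text.includes(needle)) {
    hits.push(path);
  }
  const counts = {};
  for (const child of node.children || []) {
    // 1-based position among siblings with the same tag, as in XPath.
    counts[child.tag] = (counts[child.tag] || 0) + 1;
    const childPath = `${path}/${child.tag}[${counts[child.tag]}]`;
    hits.push(...locateText(child, needle, childPath));
  }
  return hits;
}

// Made-up page used only to exercise the function.
const page = {
  tag: "html",
  children: [
    {
      tag: "body",
      children: [
        { tag: "h1", text: "Annual Report" },
        { tag: "p", text: "Posted by staff" },
        { tag: "p", text: "Author: John H. Smith" }
      ]
    }
  ]
};

console.log(locateText(page, "John H. Smith")); // [ '/html/body[1]/p[2]' ]
```

A real tool would have to run the same walk over the live DOM, including frames, which is where most of the difficulty would lie.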

Frank


Bruce D'Arcus

Feb 15, 2009, 8:30:00 PM
to zoter...@googlegroups.com
On Sun, Feb 15, 2009 at 6:32 PM, Frank Bennett <bierc...@gmail.com> wrote:

...

>> As I said originally, I don't think this is helpful at all. The
>> barrier to entry is too high. Is there really not a way to (greatly)
>> simplify this?
>
> This is a very tall order. But a starting point might be a tool to do
> triage on a page. A lot of time can be consumed just figuring out
> where the data in a page is coming from, so that you can lay some kind
> of plan for digging it out. If you could do something like
> 'wgroke --tellabout="John H. Smith" --ris http://www.thatsite.com/', and get
> back a report telling what URL and XPath will access the name "John H.
> Smith" that you see on the screen at that address, and what URLs
> _might_ yield RIS data, that would cut out a lot of the early
> uncertainty that dissuades people from the first attempt.

That's one issue, but for me, that's actually a trivial problem. I can
go really far, really quickly, with just view source, or Firebug.

The bigger issue is the amount of redundant code that gets written for
every translator. I don't have the time nor the JS skills to figure
this out.

What I have in mind is either some standard functions that allow me to
do stuff like:

mapData('title', doc.head.title)

... or even, potentially, to have the mappings defined in simple JSON
maps/dictionaries. In either case, I'm imagining much shorter,
simpler, translator files.
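
As a rough sketch of the first option, a `mapData`-style helper could look something like the following. The `createItem` function and its `mapData` method are hypothetical names, not part of Zotero, and a plain object stands in for the browser document (in a real DOM the page title is available as `document.title`).

```javascript
// Hypothetical helper, not part of Zotero: createItem() returns an object
// whose mapData(field, value) method records cleaned, non-empty values.
function createItem(itemType) {
  const fields = { itemType };
  return {
    mapData(field, value) {
      if (value === undefined || value === null) return this;
      // Collapse runs of whitespace and trim the ends.
      const cleaned = String(value).replace(/\s+/g, " ").trim();
      if (cleaned !== "") fields[field] = cleaned;
      return this; // returning the object allows chaining
    },
    toJSON() {
      return { ...fields };
    }
  };
}

// A plain object stands in for the browser document here.
const doc = { title: "  Press Release:\n Example  ", URL: "http://example.org/press/1" };

const item = createItem("report")
  .mapData("title", doc.title)
  .mapData("url", doc.URL)
  .mapData("date", undefined); // skipped: nothing to record

console.log(JSON.stringify(item));
```

The whitespace cleanup and the skipping of empty values are the kind of repetitive work that would then live in one place instead of in every translator.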

Bruce

Frank Bennett

Feb 15, 2009, 9:05:36 PM
to zotero-dev
On Feb 16, 10:30 am, "Bruce D'Arcus" <bdar...@gmail.com> wrote:
> On Sun, Feb 15, 2009 at 6:32 PM, Frank Bennett <biercena...@gmail.com> wrote:
>
> ...
>
> >> As I said originally, I don't think this is helpful at all. The
> >> barrier to entry is too high. Is there really not a way to (greatly)
> >> simplify this?
>
> > This is a very tall order.  But a starting point might be a tool to do
> > triage on a page.  A lot of time can be consumed just figuring out
> > where the data in a page is coming from, so that you can lay some kind
> > of plan for digging it out.  If you could do something like
> > 'wgroke --tellabout="John H. Smith" --ris http://www.thatsite.com/', and get
> > back a report telling what URL and XPath will access the name "John H.
> > Smith" that you see on the screen at that address, and what URLs
> > _might_ yield RIS data, that would cut out a lot of the early
> > uncertainty that dissuades people from the first attempt.
>
> That's one issue, but for me, that's actually a trivial problem. I can
> go really far, really quickly, with just view source, or Firebug.
>
> The bigger issue is the amount of redundant code that gets written for
> every translator. I don't have the time nor the JS skills to figure
> this out.
>
> What I have in mind is either some standard functions that allow me to
> do stuff like:
>
>      mapData('title', doc.head.title)
>
> ... or even, potentially, to have the mappings defined in simple JSON
> maps/dictionaries. In either case, I'm imagining much shorter,
> simpler, translator files.

I don't know, it can be pretty chaotic out there. I've built three
little translators for use in our faculty so far. For one, I needed
to split tagged "field" content with regular expressions, and combine
fragments from several parts of the page to get full names matched
with author/editor hints. For another, the necessary information was
spread across three separate pages, two of them encased in frames,
with a third necessary but seemingly unrelated page (the one
containing the title) accessible only by using a query string that had
to be calculated (not copied, calculated) from a URL to which I
luckily had access in the target page. The third translator was an
even more entertaining case, in which there was no meaningful tagging
at all, and the only way forward was to convert the page to plain text
and scratch at it with regular expressions.

I hope that I have just had a bad run of luck, but it does kind of
invite the question of how much bang one could get for one's
automation buck. Maybe a good starting point would be to collect a
good batch of links to sites "in the wild" that you feel should be
easy to translate, as a pool of test data.



Kieren Diment

Feb 15, 2009, 9:42:22 PM
to zoter...@googlegroups.com

I tend to agree. There is a tutorial here: http://www.zotero.org/support/dev/scaffold_tutorial
which could do with a little more love (especially on attaching a PDF
and/or an attachment of a web page, adding from a list of multiple
items, and finding and obtaining data from properly structured data
sources such as RIS).

It would be nice to have a translator "Wizard". I suppose the first
step to developing that would be for an expert in creating translators
to take some existing translators and turn them into pseudocode, to
expose the repetitive bits of the translator code and abstract them
away.

Bruce D'Arcus

Feb 19, 2009, 11:39:32 AM
to zoter...@googlegroups.com
On Sun, Feb 15, 2009 at 9:05 PM, Frank Bennett <bierc...@gmail.com> wrote:

...

> I hope that I have just had a bad run of luck, but it does kind of
> invite the question of how much bang one could get for one's
> automation buck. Maybe a good starting point would be to collect a
> good batch of links to sites "in the wild" that you feel should be
> easy to translate, as a pool of test data.

I come across a lot of examples like this ...

<http://ccrjustice.org>

I want to create a translator for press releases, say.

I can derive the institution and type from the base URI.

I can derive the title from the selector 'h1.pagename' (or 'head
title' and then split the string on the '|' and take the first part).

The only tricky thing is the date, since it's not wrapped in any specific node.

That took me 60 seconds to figure out. Writing a translator would take
a whole lot longer; too much hassle for me to bother.
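
For illustration, the three derivations described above can be sketched as small string functions. This is a sketch under stated assumptions, not a working translator: the function names are hypothetical, the host name is used as a stand-in for the institution, and the date pattern assumes the page prints dates as "Month D, YYYY", which has not been checked against the actual site.

```javascript
// Take the part of a <title> string before the first "|" separator.
function titleFromHeadTitle(headTitle) {
  return headTitle.split("|")[0].trim();
}

// Use the host name of the page URL as a stand-in for the institution.
function institutionFromUrl(pageUrl) {
  return new URL(pageUrl).hostname.replace(/^www\./, "");
}

// Assumption: the date appears somewhere in the page text as "Month D, YYYY".
function findDate(text) {
  const months =
    "January|February|March|April|May|June|July|August|September|October|November|December";
  const match = text.match(new RegExp(`\\b(${months}) \\d{1,2}, \\d{4}\\b`));
  return match ? match[0] : null;
}

console.log(titleFromHeadTitle("Court Rules on Case | Example Org"));
console.log(institutionFromUrl("http://ccrjustice.org/"));
console.log(findDate("Contact: press office February 19, 2009, New York"));
```

Scanning the page text with a pattern is one way around the date not being wrapped in its own node, at the cost of breaking if the site changes its date format.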

Bruce

acrymble

Mar 30, 2009, 11:47:14 AM
to zotero-dev
I just finished a tutorial on writing Zotero translators. You can take
a look here: http://niche.uwo.ca/zotero-guide .

Hope that helps.

Adam


Bruce D'Arcus

Mar 30, 2009, 11:50:51 AM
to zoter...@googlegroups.com
On Mon, Mar 30, 2009 at 11:47 AM, acrymble <acry...@uwo.ca> wrote:

> I just finished a tutorial on writing Zotero translators. You can take
> a look here: http://niche.uwo.ca/zotero-guide .
>
> Hope that helps.

You know, I saw that yesterday. Mucho kudos on that; a great piece of work!

OTOH, it's 17 chapters/pages, which underlines my point; it's too hard
to write translators!

Bruce

BestYagna

Apr 3, 2009, 5:58:21 PM
to zotero-dev
Hello Adam,
Where can I find a list of all Zotero API functions with descriptions?
By API I mean
Zotero.Utilities.cleanString
Zotero.Utilities.cleanAuthor

and also descriptions of:

doc.evaluate(xpath_in, doc, null, XPathResult.ANY_TYPE,
null).iterateNext().value
doc.evaluate(xpath_in, doc, null, XPathResult.ANY_TYPE,
null).iterateNext().textContent etc

mcburton

Apr 6, 2009, 9:21:02 PM
to zoter...@googlegroups.com
BestYagna,

The best place for finding information about the translator utilities
is in the source itself:
https://www.zotero.org/trac/browser/extension/trunk/chrome/content/zotero/xpcom/utilities.js

for doc.evaluate() look at the Mozilla Developer Center docs:
https://developer.mozilla.org/en/DOM/document.evaluate

The XPATH tutorial is pretty helpful too:
https://developer.mozilla.org/en/Introduction_to_using_XPath_in_JavaScript
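
As a small illustration of the `doc.evaluate()` pattern asked about, the sketch below wraps it in a helper that returns the text of the first matching node, or null when nothing matches (the bare `.iterateNext().textContent` form throws in that case). The `firstText` name and the `fakeDoc` stand-in are assumptions made so the example runs outside a browser; in a translator you would pass the real document. For attribute nodes you would read `.value` rather than `.textContent`.

```javascript
// XPathResult.ANY_TYPE has the numeric value 0; it is spelled out here so
// the sketch also runs where the XPathResult global is not defined.
const ANY_TYPE = 0;

// Return the trimmed text of the first node matching xpath, or null.
function firstText(doc, xpath) {
  const node = doc.evaluate(xpath, doc, null, ANY_TYPE, null).iterateNext();
  return node ? node.textContent.trim() : null;
}

// Minimal stand-in for a browser document, for demonstration only.
const fakeDoc = {
  evaluate(xpath) {
    const nodes = xpath === "//h1" ? [{ textContent: " Example title " }] : [];
    let i = 0;
    return { iterateNext: () => (i < nodes.length ? nodes[i++] : null) };
  }
};

console.log(firstText(fakeDoc, "//h1")); // Example title
console.log(firstText(fakeDoc, "//h2")); // null
```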

--
mcb

BestYagna

Apr 13, 2009, 4:29:48 PM
to zotero-dev
Thanks. I wish someone would create online documentation of the API,
with example invocations and their results.

On Apr 6, 9:21 pm, mcburton <mcbur...@gmail.com> wrote:
> BestYagna,
>
> The best place for finding information about the translator utilities
> is in the source itself: https://www.zotero.org/trac/browser/extension/trunk/chrome/content/zo...
>
> for doc.evaluate() look at the Mozilla Developer Center docs: https://developer.mozilla.org/en/DOM/document.evaluate
>
> The XPATH tutorial is pretty helpful too: https://developer.mozilla.org/en/Introduction_to_using_XPath_in_JavaS...
>
> --
> mcb