Translator for Frontiers


Jason Friedman

Dec 13, 2010, 8:06:38 AM
to zotero-dev
Hi all,

I have put together a simple translator for the Frontiers open access
journals web site

www.frontiersin.org

I have uploaded it to the google group files:

http://groups.google.com/group/zotero-dev/web/Frontiers.js

I have tested it on a few sample pages, e.g.:

http://www.frontiersin.org/human_neuroscience/10.3389/fnhum.2010.00215/abstract

http://www.frontiersin.org/decision_neuroscience/10.3389/fnins.2010.00184/abstract

http://www.frontiersin.org/fractal_physiology/10.3389/fphys.2010.00128/abstract

I think I have captured all the required information. I would like to
download the pdf automatically also, but I can't work out the link (it
uses some javascript to get it).

I'll be happy to get feedback, and hopefully to have this included
with Zotero. I use Zotero regularly and this is pretty much the only
site that the translators do not work on (for me).

Thanks,

Jason

Avram Lyon

Dec 20, 2010, 4:24:59 AM
to zoter...@googlegroups.com
Jason,

2010/12/13 Jason Friedman <write.t...@gmail.com>:


> I have put together a simple translator for the Frontiers open access
> journals web site
> www.frontiersin.org

..


> I'll be happy to get feedback, and hopefully to have this included
> with zotero. I use zotero regularly and this is pretty much the only
> site that the translators do not work on (for me).

This is a good start. The PDF download will just require sending the
correct POST request -- see the Tamper Data Firefox add-on for a way
to easily figure out what to send.

You might also be able to parse the journal abbreviation out from the
citation information at the bottom of each article page, but otherwise
this is just about all you can do for the articles.

Optimally, the translator would support multiple saves from search
results, tables of contents, etc. Post here if you're running into
concrete problems writing such a feature.

Finally, we'll need some sort of license in order to distribute this
with Zotero. Most contributors have used the GPL; see "Japan Times
Online.js" for one such example.

Regards,

Avram

Jason Friedman

Dec 23, 2010, 8:51:14 PM
to zotero-dev
Thanks for the helpful suggestions.

I have included the PDF now, as well as dealing with search pages and
pages with multiple results (Table of contents, etc). The updated
version is here:

http://groups.google.com/group/zotero-dev/web/Frontiers%20%282%29.js

I have tested it now on a larger range of pages:

Some individual results:
http://www.frontiersin.org/Human_Neuroscience/10.3389/fnhum.2010.00223/abstract
http://www.frontiersin.org/Human_Neuroscience/10.3389/neuro.09.026.2009/abstract
http://www.frontiersin.org/Aging_Neuroscience/10.3389/fnagi.2010.00027/abstract
http://www.frontiersin.org/Alzheimer%27s_Disease/10.3389/fpsyt.2010.00152/abstract
http://www.frontiersin.org/Cognitive_Science/10.3389/fpsyg.2010.00007/full

Pending articles:
http://www.frontiersin.org/human_neuroscience/abstract/8806

Journal main page:
http://www.frontiersin.org/cognitive_science
http://www.frontiersin.org/Human_Neuroscience

Archive:
http://www.frontiersin.org/Human_Neuroscience/archive

Search for "cognitive":
http://www.frontiersin.org/SearchSite.aspx?sq=cognitive
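
As a rough illustration, the page types listed above could be told apart by URL pattern alone. This is only a sketch: classifyFrontiersUrl is a hypothetical helper, not code from the translator, and the regular expressions are assumptions based on the sample URLs in this message. Journal main pages cannot be recognised from the URL in this sketch, so it returns false for them; a real translator would need to inspect the page content as well.

```javascript
// Hypothetical helper, not taken from the actual Frontiers translator.
// Returns an item type the way a translator's detection step might:
// "journalArticle" for single articles, "multiple" for listings,
// false when the URL alone is not enough to decide.
function classifyFrontiersUrl(url) {
  // Published articles: .../10.3389/<suffix>/abstract or .../full
  if (/\/10\.3389\/[^/]+\/(abstract|full)$/i.test(url)) {
    return "journalArticle";
  }
  // Pending articles: .../<journal>/abstract/<numeric id>
  if (/\/abstract\/\d+$/i.test(url)) {
    return "journalArticle";
  }
  // Search results and archive pages list several articles
  if (/\/SearchSite\.aspx\?/i.test(url) || /\/archive$/i.test(url)) {
    return "multiple";
  }
  // Journal main pages and anything else: undecided in this sketch
  return false;
}
```

The "multiple" result is what would let the translator offer a selection dialog rather than saving a single item.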

Finally, I included the GPL license in it.

Is there anything else to do?

Thanks,

Jason



Avram Lyon

Dec 24, 2010, 11:37:09 AM
to zoter...@googlegroups.com
Jason,

2010/12/24 Jason Friedman <write.t...@gmail.com>:
> [..] I have included the PDF now, as well as dealing with search pages and
> pages with multiple results (Table of contents, etc). The updated
> version is here [..]

This looks much better -- it works quite nicely. However, I'm having
trouble with attached PDFs. Testing with the article at
http://www.frontiersin.org/neuroscience/10.3389/fnins.2010.00051/abstract
the article saves very nicely, but the attachment doesn't stick -- it
is rejected because the MIME type doesn't match:

(2)(+0008002): Downloaded PDF did not have MIME type 'application/pdf'
in Attachments.importFromURL()

I don't see what is causing this -- when I go to that URL myself, the
PDF downloads correctly, and when I try to request that URI using
curl, I get exactly what you'd expect:

ajlyon@kechkene:~/Zotero Data/translators$ curl -vv http://www.frontiersin.org/journal/downloadfile.aspx?fileid=2416731
* About to connect() to www.frontiersin.org port 80 (#0)
* Trying 94.236.98.247... connected
* Connected to www.frontiersin.org (94.236.98.247) port 80 (#0)
> GET /journal/downloadfile.aspx?fileid=2416731 HTTP/1.1
> User-Agent: curl/7.21.0 (i686-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
> Host: www.frontiersin.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: private
< Transfer-Encoding: chunked
< Content-Type: application/pdf
< Server: Microsoft-IIS/7.0
< X-AspNet-Version: 2.0.50727
< Content-Disposition: attachment; filename= fnins-04-00051.pdf
< X-Powered-By: ASP.NET
< Date: Fri, 24 Dec 2010 16:30:22 GMT

That said, PDF downloading is a great feature that frequently breaks,
so if it works for you, then we can let it go at that and commit this
to the repository regardless.

The only additional data that I think you should add are tags (use the
keywords) and URL, particularly since some pre-press items don't yet
have a DOI assigned. If you can, populate the URL field with a stable
URL. We don't do that for databases, but this is a publisher's site,
so their URLs are canonical and possibly useful.

Thanks for making this translator happen!

- Avram

Jason Friedman

Dec 28, 2010, 7:26:34 AM
to zoter...@googlegroups.com
Hi Avram,

> This looks much better -- it works quite nicely. However, I'm having
> trouble with attached PDFs. Testing with the article at
> http://www.frontiersin.org/neuroscience/10.3389/fnins.2010.00051/abstract
> , the article saves very nicely, but the attachment doesn't stick-- it
> is rejected because the MIME-type doesn't match:
>
> (2)(+0008002): Downloaded PDF did not have MIME type 'application/pdf'
> in Attachments.importFromURL()

I hunted down the function in the Zotero source; it calls
sniffForMimeType (in mime.js), which looks for %PDF- at the start of
the PDF file.

For some strange reason, the Frontiers PDF starts with the line
"1e1ef5" when downloaded using Firefox. The second line has the usual
"%PDF-". As the file does not start with %PDF-, sniffForMimeType
decides it is not a PDF, and so Zotero deletes it.

When downloaded using curl, the file does not have this strange first
line. Even with the strange first line, it seems to open OK in Adobe
Reader.

I don't know how to deal with this. For now, I have just commented out
the PDF attachment lines in the translator code. Any suggestions?

I fixed up the other two things you suggested - including tags
(keywords) and the url. The updated version is here:

http://groups.google.com/group/zotero-dev/web/Frontiers.3.js?hl=en

Thanks,

Jason

--
Jason Friedman
Postdoctoral scholar
Macquarie Centre for Cognitive Science
Macquarie University, NSW 2109 Australia
email: write.t...@gmail.com
web: http://curiousjason.com

Avram Lyon

Dec 29, 2010, 2:53:21 AM
to zoter...@googlegroups.com
2010/12/28 Jason Friedman <write.t...@gmail.com>:

> I hunted down the function in the zotero source, it calls
> sniffForMimeType (in mime.js), which looks
> for %PDF- at the start of the pdf file.
>
> For some strange reason, the frontiers PDF starts with the line
> "1e1ef5" when downloaded using firefox.
> The second line has the usual "%PDF-". As it does not start with %PDF,
> sniffForMimeType decides it is not
> a PDF and so it gets deleted by zotero.

That's very odd. I only see it for the Frontiers PDF, and the
six-digit hex value varies. This might be some sort of watermarking on
Frontiers' part?

Would it be possible to change sniffForMimeType to ignore an initial
six-digit hex value and linebreak, if present?
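
A minimal sketch of that idea, written outside of Zotero: startsWithPdfAfterHexLine is a hypothetical helper, not a function in mime.js, and it assumes the sniffer has the first part of the file available as a string.

```javascript
// Hypothetical helper, not part of Zotero's mime.js.
// Drops one leading line consisting of exactly six hex digits
// (e.g. "1e1ef5") and its line break, then checks whether the
// remainder starts with the PDF marker.
function startsWithPdfAfterHexLine(head) {
  const stripped = head.replace(/^[0-9a-fA-F]{6}\r?\n/, "");
  return stripped.startsWith("%PDF-");
}
```

A file that already begins with %PDF- passes unchanged, because the replacement only applies when the six-digit line is present.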

Avram

Jason Friedman

Dec 29, 2010, 7:32:38 AM
to zoter...@googlegroups.com
>> For some strange reason, the frontiers PDF starts with the line
>> "1e1ef5" when downloaded using firefox.
>> The second line has the usual "%PDF-". As it does not start with %PDF,
>> sniffForMimeType decides it is not
>> a PDF and so it gets deleted by zotero.
>
> That's very odd. I only see it for the Frontiers PDF, and the
> six-digit hex value varies. This might be some sort of watermarking on
> Frontiers' part?
>
> Would it be possible to change sniffForMimeType to ignore an initial
> six-digit hex value and linebreak, if present?

I think that changing the line in xpcom/mime.js from

["%PDF-", "application/pdf", 0],

to

["%PDF-", "application/pdf"],

will allow the %PDF- to be anywhere in the first 128 bytes, and so
this should work.
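
The difference between the two table entries can be sketched as follows. This is a simplified stand-in, not Zotero's actual code: the entry layout [marker, mimeType, offset] mirrors the lines above, but the sniffMimeType function, its signature, and the SNIFF_WINDOW name are assumptions made for illustration.

```javascript
// Simplified stand-in for a magic-number lookup; NOT Zotero's mime.js.
// Each table entry is [marker, mimeType, offset], with offset optional.
const SNIFF_WINDOW = 128; // only the first 128 characters are examined

function sniffMimeType(head, table) {
  const start = head.slice(0, SNIFF_WINDOW);
  for (const [marker, mimeType, offset] of table) {
    if (offset !== undefined) {
      // Fixed offset: the marker must sit exactly at that position
      if (start.startsWith(marker, offset)) {
        return mimeType;
      }
    } else if (start.includes(marker)) {
      // No offset: the marker may appear anywhere in the window
      return mimeType;
    }
  }
  return false;
}

// The two variants of the PDF entry discussed above
const strictTable = [["%PDF-", "application/pdf", 0]];
const relaxedTable = [["%PDF-", "application/pdf"]];
```

With a file that begins with a line such as "1e1ef5" followed by %PDF-, the strict table finds no match, while the relaxed table still reports application/pdf.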

Actually, according to the PDF specification, the first line is
supposed to start with %PDF-, but the Frontiers web site does not seem
to follow this. Although not technically valid, the PDF files work
fine with Adobe Reader and others.

I'd rather not set up a Zotero development environment to test this
change. Can this be posted as a bug for someone else to test / fix?

In the meantime, can the translator be considered with this feature
commented out (to be put back in when Zotero is changed, or if the
Frontiers server starts behaving better)?

Thanks,

Jason

Avram Lyon

Dec 29, 2010, 2:22:54 PM
to zoter...@googlegroups.com
2010/12/29 Jason Friedman <write.t...@gmail.com>:

> I think by changing the line in xpcom/mime.js from
> ["%PDF-", "application/pdf", 0],
> to
> ["%PDF-", "application/pdf"],
> will allow the %PDF- to be anywhere in the first 128 bytes, and so
> this should work.

See https://www.zotero.org/trac/attachment/ticket/1754/ . Good catch.

> In the meantime, can the translator be considered with this feature
> commented out (to be put back in when zotero is changed, or if the
> frontiers server starts behaving better)?

I've committed the translator with PDF saving disabled:
https://www.zotero.org/trac/changeset/7536

Avram

Jason Friedman

Jan 11, 2011, 8:17:21 AM
to zoter...@googlegroups.com
On Thu, Dec 30, 2010 at 6:22 AM, Avram Lyon <ajl...@gmail.com> wrote:
> 2010/12/29 Jason Friedman <write.t...@gmail.com>:
>> I think by changing the line in xpcom/mime.js from
>> ["%PDF-", "application/pdf", 0],
>> to
>> ["%PDF-", "application/pdf"],
>> will allow the %PDF- to be anywhere in the first 128 bytes, and so
>> this should work.
>
> See https://www.zotero.org/trac/attachment/ticket/1754/ . Good catch.

I noticed that the pdf bug was fixed in changeset 7614:

I tested the translator on the latest SVN version of Zotero, which
includes this patch, and PDF download now works with the Frontiers
website.

>> In the meantime, can the translator be considered with this feature
>> commented out (to be put back in when zotero is changed, or if the
>> frontiers server starts behaving better)?

I placed a new version of the translator, with the lines not commented out:
http://groups.google.com/group/zotero-dev/web/Frontiers.4.js

Jason

Avram Lyon

Jan 11, 2011, 12:24:44 PM
to zoter...@googlegroups.com
2011/1/11 Jason Friedman <write.t...@gmail.com>:

> I noticed that the pdf bug was fixed in changeset 7614:
..

> I tested the translator on the latest svn version of the zotero
> including this patch, and pdf download now works with the frontiers
> website.
..

> I placed a new version of the translator, with the lines not commented out:
> http://groups.google.com/group/zotero-dev/web/Frontiers.4.js

The updated translator is now in the repository (r7661). I'm not sure
how this should be pushed, since it will only work completely properly
with Zotero 2.1b3 and later, and conceivably with a future 2.0.x
release if r7614 is merged onto the 2.0 branch as well.

Avram

Dan Stillman

Jan 11, 2011, 12:32:34 PM
to zoter...@googlegroups.com
On 1/11/11 12:24 PM, Avram Lyon wrote:
> The updated translator is now in the repository (r7661). I'm not sure
> how this should be pushed, since it will only work completely properly
> with Zotero 2.1b3 and later, and conceivably with a future 2.0.x
> release if r7614 is merged onto the 2.0 branch as well.

It will go out with 2.1b3. There will be no future 2.0.x releases.
