2010/12/13 Jason Friedman <write.t...@gmail.com>:
> I have put together a simple translator for the Frontiers open access
> journals web site
> www.frontiersin.org
..
> I'll be happy to get feedback, and hopefully to have this included
> with zotero. I use zotero regularly and this is pretty much the only
> site that the translators do not work on (for me).
This is a good start. The PDF download will just require sending the
correct POST request -- see the Tamper Data Firefox add-on for a way
to easily figure out what to send.
You might also be able to parse the journal abbreviation out from the
citation information at the bottom of each article page, but otherwise
this is just about all you can do for the articles.
Optimally, the translator would support multiple saves from search
results, tables of contents, etc. Post here if you're running into
concrete problems writing such a feature.
Finally, we'll need some sort of license in order to distribute this
with Zotero. Most contributors have used the GPL; see "Japan Times
Online.js" for one such example.
Regards,
Avram
2010/12/24 Jason Friedman <write.t...@gmail.com>:
> [..] I have included the PDF now, as well as dealing with search pages and
> pages with multiple results (Table of contents, etc). The updated
> version is here [..]
This looks much better -- it works quite nicely. However, I'm having
trouble with attached PDFs. Testing with the article at
http://www.frontiersin.org/neuroscience/10.3389/fnins.2010.00051/abstract
, the article saves very nicely, but the attachment doesn't stick-- it
is rejected because the MIME-type doesn't match:
(2)(+0008002): Downloaded PDF did not have MIME type 'application/pdf'
in Attachments.importFromURL()
I don't see what is causing this -- when I go to that URL myself, the
PDF downloads correctly, and when I try to request that URI using
curl, I get exactly what you'd expect:
ajlyon@kechkene:~/Zotero Data/translators$ curl -vv http://www.frontiersin.org/journal/downloadfile.aspx?fileid=2416731
* About to connect() to www.frontiersin.org port 80 (#0)
* Trying 94.236.98.247... connected
* Connected to www.frontiersin.org (94.236.98.247) port 80 (#0)
> GET /journal/downloadfile.aspx?fileid=2416731 HTTP/1.1
> User-Agent: curl/7.21.0 (i686-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
> Host: www.frontiersin.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: private
< Transfer-Encoding: chunked
< Content-Type: application/pdf
< Server: Microsoft-IIS/7.0
< X-AspNet-Version: 2.0.50727
< Content-Disposition: attachment; filename= fnins-04-00051.pdf
< X-Powered-By: ASP.NET
< Date: Fri, 24 Dec 2010 16:30:22 GMT
That said, PDF downloading is a great feature that frequently breaks,
so if it works for you, then we can let it go at that and commit this
to the repository regardless.
The only additional data that I think you should add are tags (use the
keywords) and URL, particularly since some pre-press items don't yet
have a DOI assigned. If you can, populate the URL field with a stable
URL. We don't do that for databases, but this is a publisher's site,
so their URLs are canonical and possibly useful.
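
As a rough illustration of the tags and URL suggestion, here is a minimal sketch. It assumes the keywords are scraped as one string separated by commas or semicolons, which may not match the actual page markup. The helper names and the plain `item` object are hypothetical stand-ins, not part of the translator or of Zotero.

```javascript
// Hypothetical helper: split a scraped keyword string into clean tags.
// Assumes keywords are separated by commas or semicolons.
function keywordsToTags(keywordText) {
  return keywordText
    .split(/[,;]/)
    .map(function (keyword) { return keyword.trim(); })
    .filter(function (keyword) { return keyword.length > 0; });
}

// Hypothetical helper: "item" is a plain object standing in for the
// item the translator builds; only the tags and url fields are set.
function addTagsAndUrl(item, keywordText, pageUrl) {
  item.tags = keywordsToTags(keywordText);
  // Drop any fragment so the stored URL stays stable.
  item.url = pageUrl.split("#")[0];
  return item;
}
```

Empty entries are filtered out so that a trailing separator in the keyword list does not produce a blank tag.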
Thanks for making this translator happen!
- Avram
> This looks much better -- it works quite nicely. However, I'm having
> trouble with attached PDFs. Testing with the article at
> http://www.frontiersin.org/neuroscience/10.3389/fnins.2010.00051/abstract
> , the article saves very nicely, but the attachment doesn't stick-- it
> is rejected because the MIME-type doesn't match:
>
> (2)(+0008002): Downloaded PDF did not have MIME type 'application/pdf'
> in Attachments.importFromURL()
I hunted down the function in the Zotero source. It calls
sniffForMimeType (in mime.js), which looks for %PDF- at the start of
the PDF file.
For some strange reason, the Frontiers PDF starts with the line
"1e1ef5" when downloaded using Firefox. The second line has the usual
"%PDF-". As the file does not start with %PDF-, sniffForMimeType
decides it is not a PDF, and so it gets deleted by Zotero.
When downloaded using curl, the file does not have this strange first
line. Even with the strange first line, it seems to open OK in Adobe
Reader.
I don't know how to deal with this. For now, I have just commented out
the PDF attachment lines in the translator code. Any suggestions?
I fixed up the other two things you suggested - including tags
(keywords) and the url. The updated version is here:
http://groups.google.com/group/zotero-dev/web/Frontiers.3.js?hl=en
Thanks,
Jason
--
Jason Friedman
Postdoctoral scholar
Macquarie Centre for Cognitive Science
Macquarie University, NSW 2109 Australia
email: write.t...@gmail.com
web: http://curiousjason.com
That's very odd. I only see it for the Frontiers PDF, and the
six-digit hex value varies. This might be some sort of watermarking on
Frontiers' part?
Would it be possible to change sniffForMimeType to ignore an initial
six-digit hex value and linebreak, if present?
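
A minimal sketch of that idea, assuming the sniffer receives the start of the file as a string. The function names are hypothetical and this is not the code in mime.js. It only shows what skipping one leading six-digit hex line before the %PDF- check could look like.

```javascript
// Hypothetical helper: remove one leading line made of exactly six
// hex digits followed by a linebreak (LF or CRLF), if present.
function stripLeadingHexLine(sample) {
  return sample.replace(/^[0-9a-fA-F]{6}\r?\n/, "");
}

// Hypothetical check: does the sample start with the PDF magic string
// once an optional leading hex line has been skipped?
function looksLikePdf(sample) {
  return stripLeadingHexLine(sample).indexOf("%PDF-") === 0;
}
```

With this check, a sample beginning "1e1ef5" and a linebreak followed by "%PDF-1.4" would be accepted, and so would an ordinary PDF. Anything else before the magic string would still be rejected.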
Avram
I think changing the line in xpcom/mime.js from
["%PDF-", "application/pdf", 0],
to
["%PDF-", "application/pdf"],
will allow the %PDF- to be anywhere in the first 128 bytes, and so
this should work.
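
To show why dropping the offset should help, here is a simplified sketch of an offset-based sniffer. It is an illustration written under the assumptions described above (an entry with an offset must match at exactly that position, an entry without one may match anywhere in a 128-byte sample). It is not the actual contents of mime.js, and the names are made up.

```javascript
// Illustrative entry tables: [magic string, MIME type, optional offset].
var strictEntries = [["%PDF-", "application/pdf", 0]];
var relaxedEntries = [["%PDF-", "application/pdf"]];

// Assumed sample size, taken from the "first 128 bytes" mentioned above.
var SAMPLE_LENGTH = 128;

// Hypothetical sniffer: returns the MIME type of the first matching
// entry, or null if nothing matches.
function sniffSketch(data, entries) {
  var sample = data.slice(0, SAMPLE_LENGTH);
  for (var i = 0; i < entries.length; i++) {
    var magic = entries[i][0];
    var mimeType = entries[i][1];
    var offset = entries[i][2];
    if (typeof offset !== "undefined") {
      // Offset given: the magic string must sit exactly there.
      if (sample.slice(offset, offset + magic.length) === magic) {
        return mimeType;
      }
    } else if (sample.indexOf(magic) !== -1) {
      // No offset: the magic string may appear anywhere in the sample.
      return mimeType;
    }
  }
  return null;
}

// Made-up sample imitating the start of a Frontiers download.
var frontiersLike = "1e1ef5\n%PDF-1.4\n";
var strictResult = sniffSketch(frontiersLike, strictEntries);   // null
var relaxedResult = sniffSketch(frontiersLike, relaxedEntries); // "application/pdf"
```

The trade-off is that the relaxed entry also accepts any file that merely contains "%PDF-" somewhere in its first 128 bytes.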
Actually, according to the PDF specification, the first line is
supposed to start with %PDF-. The Frontiers web site does not seem to
follow this, and although the files are not technically valid, they
work fine with Adobe Reader and others.
I'd rather not set up a Zotero development environment to test this
change. Can this be posted as a bug for someone else to test / fix?
In the meantime, can the translator be considered with this feature
commented out (to be put back in when zotero is changed, or if the
frontiers server starts behaving better)?
Thanks,
Jason
See https://www.zotero.org/trac/attachment/ticket/1754/ . Good catch.
> In the meantime, can the translator be considered with this feature
> commented out (to be put back in when zotero is changed, or if the
> frontiers server starts behaving better)?
I've committed the translator with PDF saving disabled:
https://www.zotero.org/trac/changeset/7536
Avram
I noticed that the pdf bug was fixed in changeset 7614:
I tested the translator on the latest svn version of Zotero,
including this patch, and PDF download now works with the Frontiers
website.
>> In the meantime, can the translator be considered with this feature
>> commented out (to be put back in when zotero is changed, or if the
>> frontiers server starts behaving better)?
I placed a new version of the translator, with the lines not commented out:
http://groups.google.com/group/zotero-dev/web/Frontiers.4.js
Jason
The updated translator is now in the repository (r7661). I'm not sure
how this should be pushed, since it will only work completely properly
with Zotero 2.1b3 and later, and conceivably with a future 2.0.x
release if r7614 is merged onto the 2.0 branch as well.
Avram
It will go out with 2.1b3. There will be no future 2.0.x releases.