Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 13 by 98310b...@gmail.com: URL scraping misses certain PDF MIME types [fix included]
http://code.google.com/p/gpapers/issues/detail?id=13
What steps will reproduce the problem?
1. Open gpapers
2. Select File->Import DOI...
3. Enter 10.1021/jm049029u and press OK
What is the expected output? What do you see instead?
I expect the only PDF linked on the target page to be downloaded and added
to my library. Instead, the program fails to recognize the link as a PDF
because its MIME type is reported as "application/pdf; charset=UTF-8"
rather than just "application/pdf".
What version of the product are you using? On what operating system?
I am using a local copy of revision #1638d25e2632 on Ubuntu 12.04 with
Python 2.7.3.
Please provide any additional information below.
I changed the MIME type analysis so it looks for types which start
with "application/pdf", and this resolves the problem.
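For illustration, here is a minimal sketch of this kind of check. It is not
the contents of the attached patch. It assumes the MIME type arrives as a
Content-Type-style string, and the function name is_pdf_content_type is
hypothetical. Instead of a plain prefix match, it drops any parameters after
the first ";" and compares the bare media type, which also avoids matching
unrelated types that merely begin with "application/pdf".

```python
def is_pdf_content_type(content_type):
    """Return True if a Content-Type style string denotes a PDF.

    Hypothetical helper for illustration; not taken from the gpapers source.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=UTF-8", then normalize
    # surrounding whitespace and letter case before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"
```

With this check, both "application/pdf" and
"application/pdf; charset=UTF-8" are accepted, while
"text/html; charset=UTF-8" is rejected.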
It is also worth noting that if you don't have access to the full-text
article, the file downloaded for this DOI is a different PDF that also
happens to be linked on the article's page. That is a much more
complicated problem which might merit its own bug report.
Attachments:
mime_type_fix.patch 723 bytes