Issue 13 in gpapers: URL scraping misses certain PDF MIME types [fix included]

codesite...@google.com

Jul 29, 2012, 7:59:23 PM

to gpapers...@googlegroups.com

Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 13 by 98310b...@gmail.com: URL scraping misses certain PDF MIME
types [fix included]
http://code.google.com/p/gpapers/issues/detail?id=13

What steps will reproduce the problem?
1. Open gpapers
2. Select File->Import DOI...
3. Enter 10.1021/jm049029u and press OK

What is the expected output? What do you see instead?

I expect the only PDF URL on the target page to be downloaded and added to
my library. Instead, the program fails to locate the URL because it has
the MIME type "application/pdf; charset=UTF-8" rather than
just "application/pdf".

What version of the product are you using? On what operating system?

I am using a local copy of revision #1638d25e2632 on Ubuntu 12.04 with
Python 2.7.3

Please provide any additional information below.

I changed the MIME type analysis so it looks for types which start
with "application/pdf", and this resolves the problem.
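
For readers without the attachment, here is a minimal sketch of this kind of check. It is not the contents of mime_type_fix.patch, and the function name is_pdf_mime_type is hypothetical rather than taken from the gpapers source. Instead of a plain prefix test it strips any parameters (such as "; charset=UTF-8") and compares the bare media type, which is slightly stricter than a "starts with" comparison.

```python
def is_pdf_mime_type(mime_type):
    """Return True if a MIME type string denotes a PDF.

    Hypothetical helper for illustration; not part of gpapers.
    """
    if not mime_type:
        return False
    # Drop parameters such as "; charset=UTF-8" and compare only the
    # media type itself, ignoring case and surrounding whitespace.
    media_type = mime_type.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"


if __name__ == "__main__":
    for value in ("application/pdf",
                  "application/pdf; charset=UTF-8",
                  "text/html; charset=UTF-8"):
        print(value, "->", is_pdf_mime_type(value))
```

A bare prefix test would also accept a hypothetical type such as "application/pdfx"; comparing the media type after removing parameters avoids that.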

It is also worth noting that if you don't have access to the full-text
article, the file which gets downloaded for this DOI is another PDF which
happens to also be linked on the article's page. That is a much more
complicated problem which might merit its own bug report.

Attachments:
mime_type_fix.patch 723 bytes

codesite...@google.com

Jul 31, 2012, 6:09:42 PM

to gpapers...@googlegroups.com

Updates:
Status: Fixed

Comment #1 on issue 13 by marcelCo...@gmail.com: URL scraping misses
certain PDF MIME types [fix included]
http://code.google.com/p/gpapers/issues/detail?id=13

This issue was closed by revision ed11b79da7d8.

codesite...@google.com

Jul 31, 2012, 6:11:42 PM

to gpapers...@googlegroups.com

Comment #2 on issue 13 by marcelCo...@gmail.com: URL scraping misses
certain PDF MIME types [fix included]
http://code.google.com/p/gpapers/issues/detail?id=13

Again, many thanks for the report/patch. The issue about the wrong PDF
being downloaded is indeed a complicated one. I think at some point the GUI
should maybe present a list of potential PDFs for the user to choose from.

Please note that I added another fix for DOI downloads: after downloading
the PDF, the DOI is now used to download the paper's metadata.
