[gpapers] 8 new revisions pushed by marcelCo...@gmail.com on 2012-06-26 17:49 GMT

codesite...@google.com

Jun 26, 2012, 1:49:36 PM
to gpapers...@googlegroups.com
8 new revisions:

Revision: 0510151f8c7d
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 01:33:31 2012
Log: fix import_after_search signature and delete unnecessary
import_after_...
http://code.google.com/p/gpapers/source/detail?r=0510151f8c7d

Revision: 6946bf6dead1
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 04:09:00 2012
Log: towards cleaning up and simplifying the import code
http://code.google.com/p/gpapers/source/detail?r=6946bf6dead1

Revision: f4e4a5d2fa14
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 07:25:39 2012
Log: Merge WebSearchProvider and SimpleSearchProvider and move the
class to...
http://code.google.com/p/gpapers/source/detail?r=f4e4a5d2fa14

Revision: df9004aa48a5
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 07:29:35 2012
Log: remove some debug messages
http://code.google.com/p/gpapers/source/detail?r=df9004aa48a5

Revision: fde563c6982b
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 07:37:01 2012
Log: add documentation scaffolding generated with sphinx-quickstart
http://code.google.com/p/gpapers/source/detail?r=fde563c6982b

Revision: 0ef338bdea05
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 08:36:55 2012
Log: set up some basic reference documentation
http://code.google.com/p/gpapers/source/detail?r=0ef338bdea05

Revision: f737e13dd054
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 09:10:10 2012
Log: add some more documentation and remove some more unnecessary code
http://code.google.com/p/gpapers/source/detail?r=f737e13dd054

Revision: cadbc110d77c
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 09:22:12 2012
Log: add doc for models
http://code.google.com/p/gpapers/source/detail?r=cadbc110d77c

==============================================================================
Revision: 0510151f8c7d
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 01:33:31 2012
Log: fix import_after_search signature and delete unnecessary
import_after_search function in arxiv importer
http://code.google.com/p/gpapers/source/detail?r=0510151f8c7d

Modified:
/gpapers/__init__.py
/gpapers/importer/arxiv.py
/gpapers/importer/google_scholar.py
/gpapers/importer/pubmed.py

=======================================
--- /gpapers/__init__.py Mon Jun 25 13:50:30 2012
+++ /gpapers/__init__.py Tue Jun 26 01:33:31 2012
@@ -1698,6 +1698,7 @@
button.set_tooltip_text('Add this paper to your
library...')
button.connect('clicked',
lambda x: paper.provider.import_paper_after_search(paper.data,
+ paper,
self.document_imported))
paper_information_toolbar.insert(button, -1)
elif paper.id != -1:
=======================================
--- /gpapers/importer/arxiv.py Mon Jun 25 13:31:55 2012
+++ /gpapers/importer/arxiv.py Tue Jun 26 01:33:31 2012
@@ -115,41 +115,4 @@

return papers

- def import_paper_after_search(self, data, callback):
- """
- FIXME: Inconsistent signature for this function.
-
- gpapers.__init__:1702
- button.connect('clicked',
- lambda x:
paper.provider.import_paper_after_search(paper.data,
-
self.document_imported))
-
- gpapers.importer.__init__:625
- def import_paper_after_search(self, data, paper, callback)
-
- Nowehere appears to call the latter form, but it seems more
sensible -
- otherwise I have to set paper['data'] = paper otherwise I don't
seem to
- be able to return appropriate info to the callback.
-
- I am not clear on the correct delegation of operations here
- "import_url"
- is already supplied by the initial search operation, but it appears
- necessary to download it ourselves here or it doesn't get done.
-
- Note that arxiv returns 403 forbidden if no user-agent is set.
- """
-
- if 'import_url' in data:
- message = Soup.Message.new(method='GET',
uri_string=data['import_url'])
-
- def mycallback(session, message, user_data):
- if message.status_code == Soup.KnownStatusCode.OK:
- log_debug("arxiv: received pdf length %s" %
message.response_body.length)
- callback(data,
message.response_body.flatten().get_data(), user_data)
- else:
- log_error("arxiv: got status %s while trying to fetch
PDF" % (message.status_code))
- callback(data, None, user_data)
-
- log_debug("arxiv: trying to fetch %s" % data['import_url'])
- soup_session.queue_message(message, mycallback, (self.label,
data['arxiv_id']))
- else:
- callback(data, None, self.label)
+
=======================================
--- /gpapers/importer/google_scholar.py Sun Jun 24 05:30:19 2012
+++ /gpapers/importer/google_scholar.py Tue Jun 26 01:33:31 2012
@@ -127,7 +127,7 @@
paper_info = None
callback(paper_info, None, user_data)

- def import_paper_after_search(self, data, callback):
+ def import_paper_after_search(self, data, paper, callback):
log_info('Trying to import google scholar citation')
try:
citations = data.findAll('div', {'class': 'gs_fl'})[0]
=======================================
--- /gpapers/importer/pubmed.py Sun Jun 24 05:30:19 2012
+++ /gpapers/importer/pubmed.py Tue Jun 26 01:33:31 2012
@@ -189,7 +189,7 @@

callback(paper_info, None, user_data)

- def import_paper_after_search(self, pubmed_id, callback):
+ def import_paper_after_search(self, pubmed_id, paper, callback):
log_info('Trying to import pubmed citation with id %s' % pubmed_id)
query = BASE_URL + EFETCH_QUERY % pubmed_id
message = Soup.Message.new(method='GET', uri_string=query)
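
Note on revision 0510151f8c7d: after this change every provider takes the same three arguments, so the GUI can call any provider without knowing which one it is. The following is only a minimal sketch of that idea. PubmedLikeProvider, ScholarLikeProvider, FakePaper and on_add_clicked are hypothetical stand-ins, not classes from gPapers; only the signature import_paper_after_search(data, paper, callback) and the positional callback order (paper_info, paper_data, user_data) are taken from the diff above.

```python
# Hypothetical stand-ins for the real provider classes; only the shared
# signature import_paper_after_search(data, paper, callback) is from the commit.
class PubmedLikeProvider(object):
    label = 'pubmed'

    def import_paper_after_search(self, pubmed_id, paper, callback):
        # The real provider fetches metadata asynchronously before calling back.
        callback({'pubmed_id': pubmed_id}, None, self.label)


class ScholarLikeProvider(object):
    label = 'scholar'

    def import_paper_after_search(self, data, paper, callback):
        callback({'title': data}, None, self.label)


class FakePaper(object):
    """Hypothetical search result carrying its provider and provider data."""

    def __init__(self, provider, data):
        self.provider = provider
        self.data = data


results = []


def document_imported(paper_info, paper_data, user_data):
    # Same positional order the callbacks use at this revision.
    results.append((user_data, paper_info, paper_data))


def on_add_clicked(paper):
    # Mirrors the call in gpapers/__init__.py: the caller needs no
    # provider-specific knowledge once the signatures agree.
    paper.provider.import_paper_after_search(paper.data, paper,
                                             document_imported)


on_add_clicked(FakePaper(PubmedLikeProvider(), '12345'))
on_add_clicked(FakePaper(ScholarLikeProvider(), 'Some title'))
```

Both calls go through the same code path, which is the point of unifying the signature.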

==============================================================================
Revision: 6946bf6dead1
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 04:09:00 2012
Log: towards cleaning up and simplifying the import code
http://code.google.com/p/gpapers/source/detail?r=6946bf6dead1

Modified:
/gpapers/__init__.py
/gpapers/gPapers/models.py
/gpapers/importer/__init__.py
/gpapers/importer/arxiv.py
/gpapers/importer/google_scholar.py
/gpapers/importer/pubmed.py

=======================================
--- /gpapers/__init__.py Tue Jun 26 01:33:31 2012
+++ /gpapers/__init__.py Tue Jun 26 04:09:00 2012
@@ -136,40 +136,6 @@
renderer.set_property('ellipsize', Pango.EllipsizeMode.END)


-def fetch_citations_via_urls(urls):
- log_info('trying to fetch: %s' % str(urls))
- thread.start_new_thread(import_citations, (urls,))
-
-def fetch_citations_via_references(references):
- log_info('trying to fetch: %s' % str(references))
- thread.start_new_thread(import_citations_via_references, (references,))
-
-def import_citations(urls):
- for url in urls:
-
- # display status message and delete it afterwards
- main_gui.active_threads[url] = 'Importing %s' % url
- def my_callback():
- if url in main_gui.active_threads:
- del main_gui.active_threads[url]
- main_gui.refresh_middle_pane_search()
-
- importer.import_citation(url, callback=my_callback)
-
-
-def import_citations_via_references(references):
- for reference in references:
- if not reference.referenced_paper:
- if reference.url_from_referencing_paper:
- reference.referenced_paper =
importer.import_citation(reference.url_from_referencing_paper)
- reference.save()
- if not reference.referencing_paper:
- if reference.url_from_referenced_paper:
- reference.referenced_paper =
importer.import_citation(reference.url_from_referenced_paper)
- reference.save()
- main_gui.refresh_middle_pane_search()
-
-
def import_documents_via_filenames(filenames, callback):
'''
Adds existing files or directories to the database and copies the
documents
@@ -379,19 +345,32 @@
except Paper.DoesNotExist:
log_warn('No paper in the database has DOI %s -- aborting.' %
doi)

- def document_imported(self, paper_info, paper_data, user_data):
+ def document_imported(self, paper_obj=None, paper_info=None,
+ paper_data=None, user_data=None):
'''
- Should be called after a paper is imported. `paper_info` is a
- dictionary with document metadata, `paper_data` is the PDF itself.
+ Should be called after a paper is imported. `paper_obj` is a
+ :class:`gpapers.model.VirtualPaper` object (in case the document is
+ imported after a search), `paper_info` is a dictionary with
document
+ metadata, `paper_data` is the PDF itself.
'''

- if user_data in self.active_threads:
- del self.active_threads[str(user_data)]
-
- if paper_data is None and paper_info is None:
+ if paper_data is None and paper_info is None and paper_obj is None:
# FIXME: This should be handled via an error callback
return

+ if paper_obj is not None:
+ # This is a paper imported after a search, merge its info with
+ # any additional info in paper_info (overwriting infos in the
+ # paper object info -- e.g. google search gives very imprecise
+ # results for a search but the BibTeX contains more accurate
+ # info)
+ if paper_info is None:
+ paper_info = {}
+
+ for key in paper_obj.paper_info:
+ if not key in paper_info:
+ paper_info[key] = paper_obj.paper_info[key]
+
if paper_data is not None:

# Get some info from the PDF:
@@ -453,7 +432,6 @@
response = dialog.run()
if response == Gtk.ResponseType.OK:
url = entry.get_text()
- importer.active_threads[url] = 'Importing %s' % url
importer.import_from_url(url, self.document_imported)

dialog.destroy()
@@ -479,7 +457,6 @@
response = dialog.run()
if response == Gtk.ResponseType.OK:
url = 'http://dx.doi.org/' + entry.get_text().strip()
- importer.active_threads[url] = 'Importing DOI'
importer.import_from_url(url, self.document_imported)

dialog.destroy()
@@ -579,8 +556,7 @@
importer.import_from_url(url, self.document_imported,
paper_info=paper_info)
else:
- self.document_imported(paper_info, paper_data=None,
- user_data=None)
+ self.document_imported(paper_info=paper_info)

dialog.destroy()

@@ -1536,7 +1512,7 @@
# Get the actual file content
file_content =
file_object.load_contents_finish(asyncresult)[1]

- self.document_imported(paper_data=file_content,
paper_info=None,
+ self.document_imported(paper_data=file_content,
user_data=user_data)

import_documents_via_filenames([filename], mycallback)
@@ -1555,8 +1531,7 @@
importer.import_from_url(url,
self.document_imported,
paper_info=paper_info)
else:
- self.document_imported(paper_info,
paper_data=None,
- user_data=None)
+ self.document_imported(paper_info=paper_info)
gfile = Gio.File.new_for_uri(data.strip())
# first argument is the `cancellable` object
gfile.load_contents_async(None, mycallback,
data.strip())
@@ -1617,10 +1592,6 @@
playlist.save()
self.refresh_left_pane()

- def import_citation_via_middle_top_pane_row(self, row):
- paper_obj = row[0]
- importer.get_or_create_paper_via(paper_obj,
callback=self.document_imported)
-
def select_middle_top_pane_item(self, selection):
liststore, rows = selection.get_selected_rows()
self.paper_information_pane_model.clear()
@@ -1686,20 +1657,25 @@
button.connect('clicked',
lambda x: Gtk.show_uri(None, url,
Gdk.CURRENT_TIME))
paper_information_toolbar.insert(button, -1)
- if paper.id != -1:
- button = Gtk.ToolButton(stock_id=Gtk.STOCK_REFRESH)
- button.set_tooltip_text('Re-add this paper to your
library...')
-
- button.connect('clicked', lambda x:
self.import_citation_via_middle_top_pane_row(liststore[rows[0]]))
- paper_information_toolbar.insert(button, -1)
+ # TODO: Re-think this, possibly should either download a
PDF
+ # if this is missing but the import_url or DOI is known or
+ # download additional metadata via the import_url/DOI
+# if paper.id != -1:
+# button = Gtk.ToolButton(stock_id=Gtk.STOCK_REFRESH)
+# button.set_tooltip_text('Re-add this paper to your
library...')
+#
+# button.connect('clicked', lambda x:
self.import_citation_via_middle_top_pane_row(liststore[rows[0]]))
+# paper_information_toolbar.insert(button, -1)

if paper.id == -1 and hasattr(paper, 'provider'): # This is a
search result
button = Gtk.ToolButton(stock_id=Gtk.STOCK_ADD)
button.set_tooltip_text('Add this paper to your
library...')
- button.connect('clicked',
- lambda x:
paper.provider.import_paper_after_search(paper.data,
-
paper,
-
self.document_imported))
+
+ def import_paper_callback():
+ paper.provider.import_paper_after_search(paper,
+
self.document_imported)
+ button.connect('clicked', lambda x:
import_paper_callback())
+
paper_information_toolbar.insert(button, -1)
elif paper.id != -1:
importable_references = set()
@@ -1734,38 +1710,39 @@
button.connect('clicked', lambda x:
self.remove_papers_from_current_playlist([paper.id]))
paper_information_toolbar.insert(button, -1)

- if importable_references or importable_citations:
- import_button =
Gtk.MenuToolButton(stock_id=Gtk.STOCK_ADD)
- import_button.set_tooltip_text('Import all cited and
referenced documents...(%i)' %
len(importable_references.union(importable_citations)))
- import_button.connect('clicked', lambda x:
fetch_citations_via_references(importable_references.union(importable_citations)))
- paper_information_toolbar.insert(import_button, -1)
- import_button_menu = Gtk.Menu()
- if importable_citations:
- menu_item = Gtk.MenuItem('Import all cited
documents (%i)' % len(importable_citations))
- menu_item.connect('activate', lambda x:
fetch_citations_via_references(importable_citations))
- import_button_menu.append(menu_item)
- menu_item = Gtk.MenuItem('Import specific cited
document')
- import_button_submenu = Gtk.Menu()
- for citation in importable_citations:
- submenu_item =
Gtk.MenuItem(truncate_long_str(citation.line_from_referenced_paper))
- submenu_item.connect('activate', lambda x:
fetch_citations_via_references((citation,)))
- import_button_submenu.append(submenu_item)
- menu_item.set_submenu(import_button_submenu)
- import_button_menu.append(menu_item)
- if importable_references:
- menu_item = Gtk.MenuItem('Import all referenced
documents (%i)' % len(importable_references))
- menu_item.connect('activate', lambda x:
fetch_citations_via_references(importable_references))
- import_button_menu.append(menu_item)
- menu_item = Gtk.MenuItem('Import specific
referenced document')
- import_button_submenu = Gtk.Menu()
- for reference in importable_references:
- submenu_item =
Gtk.MenuItem(truncate_long_str(reference.line_from_referencing_paper))
- submenu_item.connect('activate', lambda x:
fetch_citations_via_references((reference,)))
- import_button_submenu.append(submenu_item)
- menu_item.set_submenu(import_button_submenu)
- import_button_menu.append(menu_item)
- import_button_menu.show_all()
- import_button.set_menu(import_button_menu)
+# TODO: Check how the references/citations are supposed to work
+# if importable_references or importable_citations:
+# import_button =
Gtk.MenuToolButton(stock_id=Gtk.STOCK_ADD)
+# import_button.set_tooltip_text('Import all cited and
referenced documents...(%i)' %
len(importable_references.union(importable_citations)))
+# import_button.connect('clicked', lambda x:
fetch_citations_via_references(importable_references.union(importable_citations)))
+# paper_information_toolbar.insert(import_button, -1)
+# import_button_menu = Gtk.Menu()
+# if importable_citations:
+# menu_item = Gtk.MenuItem('Import all cited
documents (%i)' % len(importable_citations))
+# menu_item.connect('activate', lambda x:
fetch_citations_via_references(importable_citations))
+# import_button_menu.append(menu_item)
+# menu_item = Gtk.MenuItem('Import specific cited
document')
+# import_button_submenu = Gtk.Menu()
+# for citation in importable_citations:
+# submenu_item =
Gtk.MenuItem(truncate_long_str(citation.line_from_referenced_paper))
+# submenu_item.connect('activate', lambda x:
fetch_citations_via_references((citation,)))
+# import_button_submenu.append(submenu_item)
+# menu_item.set_submenu(import_button_submenu)
+# import_button_menu.append(menu_item)
+# if importable_references:
+# menu_item = Gtk.MenuItem('Import all referenced
documents (%i)' % len(importable_references))
+# menu_item.connect('activate', lambda x:
fetch_citations_via_references(importable_references))
+# import_button_menu.append(menu_item)
+# menu_item = Gtk.MenuItem('Import specific
referenced document')
+# import_button_submenu = Gtk.Menu()
+# for reference in importable_references:
+# submenu_item =
Gtk.MenuItem(truncate_long_str(reference.line_from_referencing_paper))
+# submenu_item.connect('activate', lambda x:
fetch_citations_via_references((reference,)))
+# import_button_submenu.append(submenu_item)
+# menu_item.set_submenu(import_button_submenu)
+# import_button_menu.append(menu_item)
+# import_button_menu.show_all()
+# import_button.set_menu(import_button_menu)

button = Gtk.ToolButton() # GRAPH_ICON
icon = Gtk.Image()
@@ -1780,16 +1757,27 @@
self.update_bookmark_pane_from_paper(None)
self.paper_information_pane_model.append(('<b>Number of
papers:</b>', str(len(rows)) ,))

- downloadable_paper_urls = set()
+ papers = []
for row in rows:
paper = liststore[row][0]
- if paper.import_url and paper.id == -1:
- downloadable_paper_urls.add(paper.import_url)
- if len(downloadable_paper_urls):
- self.paper_information_pane_model.append(('<b>Number of
new papers:</b>', str(len(downloadable_paper_urls)) ,))
+ if paper.id == -1 and hasattr(paper, 'provider'):
+ papers.append(paper)
+ if len(papers):
+ self.paper_information_pane_model.append(('<b>Number of
new papers:</b>', str(len(papers)) ,))
button = Gtk.ToolButton(stock_id=Gtk.STOCK_ADD)
- button.set_tooltip_text('Add new papers (%i) to your
library...' % len(downloadable_paper_urls))
- button.connect('clicked', lambda x:
fetch_citations_via_urls(downloadable_paper_urls))
+ button.set_tooltip_text('Add new papers (%i) to your
library...' % len(papers))
+
+ # These should all be from the same search provider
+ provider = papers[0].provider
+ for paper in papers:
+ assert(paper.provider == provider)
+
+ # Download the papers
+ def import_papers_callback():
+ paper.provider.import_papers_after_search(papers,
+
self.document_imported)
+
+ button.connect('clicked', lambda x:
import_papers_callback())
paper_information_toolbar.insert(button, -1)

selected_valid_paper_ids = []
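
Note on the document_imported change above: when a paper comes from a search, its stored paper_info only fills in keys that the freshly fetched paper_info does not already have, so the more precise data (for example from BibTeX) wins. A minimal sketch of that merge rule follows. The helper name merge_paper_info is hypothetical, and unlike the committed code it returns a new dictionary instead of updating paper_info in place.

```python
def merge_paper_info(search_info, paper_info=None):
    """Combine search-result metadata with later, more precise metadata.

    Keys already present in `paper_info` are kept; `search_info` only
    supplies the keys that are still missing.
    """
    merged = dict(paper_info) if paper_info else {}
    for key in search_info:
        if key not in merged:
            merged[key] = search_info[key]
    return merged


# Illustrative values only.
from_search = {'title': 'A study of ...', 'year': 2011, 'import_url': 'http://example.org/x.pdf'}
from_bibtex = {'title': 'A Study of Something', 'doi': '10.1000/example'}
combined = merge_paper_info(from_search, from_bibtex)
```

Here combined keeps the BibTeX title and DOI and takes year and import_url from the search result.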
=======================================
--- /gpapers/gPapers/models.py Mon Jun 25 13:07:43 2012
+++ /gpapers/gPapers/models.py Tue Jun 26 04:09:00 2012
@@ -304,6 +304,7 @@
'''

self.id = -1
+ self.paper_info = paper_info # save the complete info for later
reuse
self.provider = provider # The origin of this search result
self.authors = []
self.source = VirtualSource()
=======================================
--- /gpapers/importer/__init__.py Mon Jun 25 13:31:55 2012
+++ /gpapers/importer/__init__.py Tue Jun 26 04:09:00 2012
@@ -20,6 +20,7 @@
import os
import re
import traceback
+import time
from htmlentitydefs import name2codepoint as n2cp
import urllib
import urlparse
@@ -80,215 +81,6 @@
return match.group()


-def get_or_create_paper_via(paper_obj, callback, full_text_md5=None):
-
- paper_id = paper_obj.id
- doi = paper_obj.doi
- pubmed_id = paper_obj.pubmed_id
- import_url = paper_obj.import_url
- title = paper_obj.title
- provider = paper_obj.provider
- data = paper_obj.data
-
- paper = None
-
- if paper_id >= 0:
- try: paper = Paper.objects.get(id=paper_id)
- except: pass
-
- if doi:
- if paper:
- if not paper.doi:
- paper.doi = doi
- else:
- try: paper = Paper.objects.get(doi=doi)
- except: pass
-
- if pubmed_id:
- if paper:
- if not paper.pubmed_id:
- paper.pubmed_id = pubmed_id
- else:
- try: paper = Paper.objects.get(pubmed_id=pubmed_id)
- except: pass
-
- if import_url:
- if paper:
- if not paper.import_url:
- paper.import_url = import_url
- else:
- try: paper = Paper.objects.get(import_url=import_url)
- except: pass
-
- if full_text_md5:
- if not paper:
- try: paper = Paper.objects.get(full_text_md5=full_text_md5)
- except: pass
-
- if title:
- if paper:
- if not paper.title:
- paper.title = title
- else:
- try: paper = Paper.objects.get(title=title)
- except: pass
-
- if not paper:
- # it looks like we haven't seen this paper before...
- if provider:
- # Get the paper from the search provider
- provider.import_paper_after_search(data, callback=callback)
- else:
- if not doi:
- doi = ''
- if not pubmed_id:
- pubmed_id = ''
- if not import_url:
- import_url = ''
- if not title:
- title = ''
- paper = Paper.objects.create(doi=doi, pubmed_id=pubmed_id,
- import_url=import_url,
title=title)
- # we are done, call the callback
- callback(paper)
-
-
-#TODO: Refactor import_pdf into a new function
-def import_citation(url, paper=None, callback=None):
-
- log_info('Importing URL: %s' % url)
-
- active_threads[ str(thread.get_ident()) ] = 'importing: ' + url
- try:
- response = urllib.urlopen(url)
- if response.getcode() != 200 and response.getcode() != 302:
- log_error('unable to download: %s (%i)' % (url,
response.getcode()))
- return
-
- data = response.read(-1)
- info = response.info()
-
- if info.gettype() == 'application/pdf' or info.gettype()
== 'application/octet-stream':
- # this is hopefully a PDF file
-
- #Try finding a PDF file name in the url
- parsed_url = urlparse.urlsplit(url)
- filename = os.path.split(parsed_url.path)[1]
-
- if os.path.splitext(filename)[1].lower() != '.pdf':
- filename = None
- #That didn't work, try to find a filename in the query
string
- query = urlparse.parse_qs(parsed_url.query)
- for key in query:
- print key, query[key]
- if os.path.splitext(query[key][0])[1].lower()
== '.pdf':
- filename = query[key].lower() # found a .pdf name
- break
-
- log_info('importing paper: %s' % filename)
-
- if not paper:
- md5_hexdigest = get_md5_hexdigest_from_data(data)
- # FIXME
- paper, created =
get_or_create_paper_via(full_text_md5=md5_hexdigest)
- if created:
- if not filename:
- filename = md5_hexdigest # last resort for filename
-
-
paper.save_file(defaultfilters.slugify(filename.replace('.pdf', ''))
+ '.pdf',
- data)
- paper.import_url = response.geturl()
- paper.save()
- log_info('imported paper: %s' % filename)
- else:
- log_info('paper already exists: %s' % str(paper))
- else:
-
paper.save_file(defaultfilters.slugify(filename.replace('.pdf', ''))
+ '.pdf',
- local_file.read())
- paper.import_url = response.geturl()
- paper.save()
- return paper
-
- # let's see if there's a pdf somewhere in here...
- paper = _import_unknown_citation(data, response.geturl(),
paper=paper)
- if paper and callback:callback()
- if paper: return paper
-
- except:
- traceback.print_exc()
- Gdk.threads_enter()
- error = Gtk.MessageDialog(type=Gtk.MessageType.ERROR,
buttons=Gtk.ButtonsType.OK, flags=Gtk.DialogFlags.MODAL)
- error.connect('response', lambda x, y: error.destroy())
- error.set_markup('<b>Unknown Error</b>\n\nUnable to download this
resource.')
- error.run()
- Gdk.threads_leave()
-
- Gdk.threads_enter()
- error = Gtk.MessageDialog(type=Gtk.MessageType.ERROR,
buttons=Gtk.ButtonsType.OK, flags=Gtk.DialogFlags.MODAL)
- error.connect('response', lambda x, y: error.destroy())
- error.set_markup('<b>No Paper Found</b>\n\nThe given URL does not
appear to contain or link to any PDF files. (perhaps you have it buy it?)
Try downloading the file and adding it using "File &gt;&gt;
Import..."\n\n%s' % pango_escape(url))
- error.run()
- Gdk.threads_leave()
- if active_threads.has_key(str(thread.get_ident())):
- del active_threads[str(thread.get_ident())]
-
-
-def _import_unknown_citation(data, orig_url, paper=None):
-
- # soupify
- soup = BeautifulSoup.BeautifulSoup(data)
-
- # search for bibtex link
- for a in soup.findAll('a'):
- for c in a.contents:
- if str(c).lower().find('bibtex') != -1:
- log_debug('found bibtex link: %s' % a)
- #TODO: Do something with bibtex link
-
- # search for ris link
- for a in soup.findAll('a'):
- if not a.has_key('href'):
- continue
- href = a['href']
- if href.find('?') > 0: href = href[ : href.find('?') ]
- if href.lower().endswith('.ris'):
- log_debug('found RIS link: %s' % a)
- break
- for c in a.contents:
- c = str(c).lower()
- if c.find('refworks') != -1 or c.find('procite') != -1 or
c.find('refman') != -1 or c.find('endnote') != -1:
- log_debug('found RIS link: %s' % a)
- #TODO: Do something with ris link
-
- # search for pdf link
- # TODO: If more than one link is found, present the choice to the user
- pdf_link = None
- for a in soup.findAll('a'):
- if pdf_link:
- break
- if not a.has_key('href'):
- continue
- href = a['href']
- if href.lower() == orig_url.lower(): #this is were we came from...
- continue
- if href.find('?') > 0: href = href[ : href.find('?') ]
- if href.lower().endswith('pdf'):
- pdf_link = a['href']
- log_debug('found PDF link: %s' % a)
- break
- for c in a.contents:
- c = str(c).lower()
- if c.find('pdf') != -1:
- log_debug('found PDF link: %s' % a)
- pdf_link = a['href']
- break
-
- if pdf_link:
- #Combine the base URL with the PDF link (necessary for relative
URLs)
- pdf_link = urlparse.urljoin(orig_url, pdf_link)
- return import_citation(pdf_link)
-
-
def get_bibtex_for_doi(doi, callback):
'''
Asynchronously retrieves the bibtex data for a document with a given
`doi`,
@@ -350,12 +142,16 @@
(binary data) as an argument.
'''

+ importer.active_threads[url] = 'Importing %s' % url
+
def data_received(session, message, user_data):
if not message.status_code == Soup.KnownStatusCode.OK:
# FIXME: Use error handler here
log_warn('URL %s responded with error code %d' % (user_data,

message.status_code))
- callback(None, None, user_data)
+ if url in importer.active_threads:
+ del importer.active_threads[url]
+ callback(user_data=user_data)
return

log_debug('Response received (status code OK)')
@@ -376,9 +172,14 @@

message.get_uri()))

if content_type == 'application/pdf':
- callback(paper_info, data, user_data)
+ if url in importer.active_threads:
+ del importer.active_threads[url]
+ callback(paper_info=paper_info, paper_data=data,
user_data=user_data)
elif (content_type == 'text/x-bibtex' or first_letter == '@') and
not paper_info:
- callback(bibtex.paper_info_from_bibtex(data), paper_data,
user_data)
+ if url in importer.active_threads:
+ del importer.active_threads[url]
+ callback(paper_info=bibtex.paper_info_from_bibtex(data),
+ paper_data=paper_data, user_data=user_data)
elif content_type == 'text/html':
log_debug('Searching page for links')
parsed = BeautifulSoup.BeautifulSoup(data)
@@ -419,13 +220,20 @@
#Combine the base URL with the PDF link (necessary for
relative URLs)
urls = [urlparse.urljoin(orig_url, url) for url in urls]
log_debug('Calling import_from_urls with %s' % str(urls))
+ if url in importer.active_threads:
+ del importer.active_threads[url]
import_from_urls(urls, callback, user_data)
else:
log_warn('Nothing found...')
- callback(paper_info, paper_data, user_data)
+ if url in importer.active_threads:
+ del importer.active_threads[url]
+ callback(paper_info=paper_info, paper_data=paper_data,
+ user_data=user_data)
else:
log_warn('Do not know what to do with content type %s of
URL %s' % (content_type, orig_url))
- callback(paper_info, paper_data, user_data)
+ if url in importer.active_threads:
+ del importer.active_threads[url]
+ callback(paper_info=paper_info, paper_data=paper_data,
user_data=user_data)

try:
message = Soup.Message.new(method='GET', uri_string=url)
@@ -438,6 +246,8 @@
soup_session.queue_message(message, data_received, url)
log_debug('Message queued')
else:
+ if url in importer.active_threads:
+ del importer.active_threads[url]
callback(paper_info, paper_data, url)


@@ -494,14 +304,15 @@
as an argument.
'''
if urls is None:
- callback(None, None, user_data)
+ callback(user_data=user_data)

log_info(('Starting to look for PDF and/or metadata '
'from %d possible URLs' % len(urls)))

def _import_from_urls_finished(paper_info, paper_data, user_data):
log_debug('_import_from_urls_finished')
- callback(paper_info, paper_data, user_data)
+ callback(paper_info=paper_info, paper_data=paper_data,
+ user_data=user_data)

_import_from_urls(urls, _import_from_urls_finished, user_data)

@@ -544,19 +355,15 @@
This method should not block but use the :class:`AsyncSoupSession`
object
`importer.soup_session` for getting the information.

- * `import_paper_after_search(self, data, paper, callback)`
- This method receives the `data` (if any) previously returned from the
- :method:`search_async` method and a :class:`Paper` object, already
+ * `import_paper_after_search(self, paper, callback)`
+ This method receives a :class:`VirtualPaper` object, already
filled with the information previously returned from the search. It
should
call the callback when it is finished processing (which may include
getting
more information from webpages -- in this case
the :class:`AsyncSoupSession`
object `importer.soup_session` should be used to asynchronously fetch
the
- page(s)). The callback function has to be called with the
same :class:`Paper`
- object (with any additional information now available filled in,
generated
- :class:`Author` objects, etc.) as the first argument. Optionally, a
list of
- URL strings can be given as the second argument, these URLs will be
used
- in the order they are given, i.e. if fetching the document from the
first
- one is not successful, the second one will be tried etc.
+ page(s)). The callback function has to be called with ...
+
+ TODO

The :method:`__init__` method of the subclass has to call the
:method:`__init__` method of the superclass.
@@ -610,9 +417,31 @@
except Exception as ex:
error_callback(ex, None)

- def import_paper_after_search(self, data, paper, callback):
+ def import_paper_after_search(self, paper, callback):
raise NotImplementedError()

+ def import_papers_after_search(self, papers, callback):
+
+ identifier = '%s_%f' % (self.label, time.time()) #only used for
status message
+
+ def import_all_papers(paper_list):
+ active_threads[identifier] = 'Importing %d papers' %
len(paper_list)
+ def my_callback(paper_obj=None, paper_data=None,
paper_info=None,
+ user_data=None):
+
+ callback(paper_obj=paper_obj, paper_data=paper_data,
+ paper_info=paper_info, user_data=user_data)
+ import_all_papers(paper_list)
+
+ if len(paper_list):
+ one_paper = paper_list.pop()
+ self.import_paper_after_search(one_paper, my_callback)
+ else:
+ if identifier in active_threads:
+ del active_threads[identifier]
+
+ import_all_papers(papers)
+
def search_async(self, search_string, callback, error_callback):
raise NotImplementedError()

@@ -663,9 +492,30 @@
else:
error_callback(message.status_code, None)

- def import_paper_after_search(self, data, paper, callback):
- # Nothing to add
- callback(paper, [], self.label)
+ def import_paper_after_search(self, paper_obj, callback):
+ # in case the paper already had an import URL, download from this
URL
+ if hasattr(paper_obj, 'import_url') and paper_obj.import_url:
+ message = Soup.Message.new(method='GET',
+ uri_string=paper_obj.import_url)
+
+ def mycallback(session, message, user_data):
+ if message.status_code == Soup.KnownStatusCode.OK:
+ log_debug("%s: received pdf length %s" %
(self.__class__.__name__,
+
message.response_body.length))
+ callback(paper_obj=paper_obj,
+
paper_data=message.response_body.flatten().get_data(),
+ user_data=user_data)
+ else:
+ log_error("%s: got status %s while trying to fetch PDF" % (self.__class__.__name__,
+ message.status_code))
+ callback(paper_obj=paper_obj, user_data=user_data)
+
+ log_debug("%s: trying to fetch %s" % (self.__class__.__name__,
+ paper_obj.import_url))
+ soup_session.queue_message(message, mycallback,
+ (self.label, paper_obj.import_url))
+ else:
+ callback(paper_obj=paper_obj, user_data=self.label)

#
-------------------------------------------------------------------------
# Methods to overwrite in sub classes
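
Note on import_papers_after_search above: the papers are imported one at a time, and each import starts only after the callback of the previous one has fired. A minimal sketch of that chaining pattern follows. SketchProvider is hypothetical and completes each import synchronously, whereas the real providers call back from an asynchronous Soup request. The sketch also works on a copy of the list, while the committed code pops from the list it is given.

```python
import time

# Status messages keyed by an identifier, as in the importer module.
active_threads = {}


class SketchProvider(object):
    """Hypothetical provider; each import completes immediately."""
    label = 'sketch'

    def import_paper_after_search(self, paper, callback):
        callback(paper_obj=paper, user_data=self.label)

    def import_papers_after_search(self, papers, callback):
        # Identifier is only used for the status message.
        identifier = '%s_%f' % (self.label, time.time())
        remaining = list(papers)

        def import_next():
            if remaining:
                active_threads[identifier] = ('Importing %d papers'
                                              % len(remaining))
                one_paper = remaining.pop()  # takes papers from the end
                self.import_paper_after_search(one_paper, one_done)
            else:
                # Nothing left: clear the status message.
                active_threads.pop(identifier, None)

        def one_done(paper_obj=None, paper_info=None, paper_data=None,
                     user_data=None):
            # Hand the result on, then start the next import.
            callback(paper_obj=paper_obj, paper_info=paper_info,
                     paper_data=paper_data, user_data=user_data)
            import_next()

        import_next()
```

Because pop() removes the last element, the papers are handed to the callback in reverse order of the input list.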
=======================================
--- /gpapers/importer/arxiv.py Tue Jun 26 01:33:31 2012
+++ /gpapers/importer/arxiv.py Tue Jun 26 04:09:00 2012
@@ -114,5 +114,3 @@
log_error("arxiv: error while reading item: %s" % ex[0])

return papers
-
-
=======================================
--- /gpapers/importer/google_scholar.py Tue Jun 26 01:33:31 2012
+++ /gpapers/importer/google_scholar.py Tue Jun 26 04:09:00 2012
@@ -127,9 +127,10 @@
paper_info = None
callback(paper_info, None, user_data)

- def import_paper_after_search(self, data, paper, callback):
+ def import_paper_after_search(self, paper, callback):
log_info('Trying to import google scholar citation')
try:
+ data = paper.data
citations = data.findAll('div', {'class': 'gs_fl'})[0]
log_debug('Citations: %s' % str(citations))
for link in citations.findAll('a'):
=======================================
--- /gpapers/importer/pubmed.py Tue Jun 26 01:33:31 2012
+++ /gpapers/importer/pubmed.py Tue Jun 26 04:09:00 2012
@@ -187,9 +187,10 @@
except:
pass

- callback(paper_info, None, user_data)
-
- def import_paper_after_search(self, pubmed_id, paper, callback):
+ callback(paper_info=paper_info, user_data=user_data)
+
+ def import_paper_after_search(self, paper, callback):
+ pubmed_id = paper.data
log_info('Trying to import pubmed citation with id %s' % pubmed_id)
query = BASE_URL + EFETCH_QUERY % pubmed_id
message = Soup.Message.new(method='GET', uri_string=query)
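
The hunks above move the provider-specific payload onto the paper object (`paper.data`) and switch the callback to keyword arguments. A minimal sketch of that convention follows. `FakePaper`, `FakeProvider` and `on_import` are hypothetical stand-ins for illustration, not gPapers classes, and no network request is made:

```python
class FakePaper(object):
    """Hypothetical stand-in for a search result object."""

    def __init__(self, title, data=None):
        self.title = title
        self.data = data  # provider-specific payload, e.g. a PubMed ID


class FakeProvider(object):
    """Hypothetical provider using the two-argument signature."""

    label = 'fake'

    def import_paper_after_search(self, paper, callback):
        # The payload is read from the paper itself instead of being
        # passed in as a separate first argument.
        record_id = paper.data
        paper_info = {'title': paper.title, 'record_id': record_id}
        # Keyword arguments let the receiver default the unused fields.
        callback(paper_info=paper_info, user_data=self.label)


def on_import(paper_obj=None, paper_data=None, paper_info=None,
              user_data=None):
    """Receiver that accepts any subset of the keyword arguments."""
    return (paper_obj, paper_data, paper_info, user_data)


received = []


def collect(**kwargs):
    received.append(on_import(**kwargs))


FakeProvider().import_paper_after_search(
    FakePaper('A paper title', data='12345'), collect)
```

Because every argument is passed by name, a provider that has no PDF data or no paper object can simply leave those arguments out.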

==============================================================================
Revision: f4e4a5d2fa14
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 07:25:39 2012
Log: Merge WebSearchProvider and SimpleSearchProvider and move the
class to its own module. Document the class.
http://code.google.com/p/gpapers/source/detail?r=f4e4a5d2fa14

Added:
/gpapers/importer/provider_base.py
Modified:
/gpapers/importer/__init__.py
/gpapers/importer/arxiv.py
/gpapers/importer/google_scholar.py
/gpapers/importer/jstor.py
/gpapers/importer/pubmed.py

=======================================
--- /dev/null
+++ /gpapers/importer/provider_base.py Tue Jun 26 07:25:39 2012
@@ -0,0 +1,289 @@
+# gPapers
+# Copyright (C) 2007-2009 Derek Anderson
+# 2012 Derek Anderson and Marcel Stimberg
+#
+# This file is part of gPapers.
+#
+# gPapers is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# gPapers is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with gPapers. If not, see <http://www.gnu.org/licenses/>.
+
+'''
+This module contains the base class for all search providers
+'''
+from gi.repository import Soup
+
+from gpapers.logger import log_info, log_debug, log_error
+from gpapers.importer import soup_session
+
+
+class WebSearchProvider(object):
+ '''
+ Base class for all web search providers, i.e. websites or web APIs that
+ return a number of search results for a search string.
+
+ Implementation of new search providers should derive from this class
and
+ have to provide the following attributes as class attributes:
+ * `name`: A human readable name that is used in the left column of the
GUI,
+ e.g. "Google Scholar".
+ * `label`: A simple name that is used for saving searches to the
database,
+ e.g. "gscholar"
+ * `icon` : The name of an icon (expected in the `icons` subdirectory
+ currently) used in the left column of the GUI, e.g.
+ "favicon_google.ico". If no icon name is provided, a
standard
+ icon is used.
+ * `unique_key`: Defines under what circumstances a search result should be
+ considered a duplicate of an existing paper in the
+ database. For a PubMed search, for example, this should be
+ 'pubmed_id'. If the unique key is not set, 'doi' is used
+ as a default.
+
+ In the simplest case (for the search, a single request to a website is
+ sufficient and that is all the data that is needed for an import), it
is
+ enough to overwrite two relatively simple methods (see
+ :class:`importer.jstor.JSTORSearch` for an example):
+ * prepare_search_message(self, search_string)
+ Has to construct and return a `Soup.Message` object using
+ `Soup.Message.new`, for example:
+ ..
+ return Soup.Message.new(method='GET',
+
uri_string='http://example.com/search?query=search_string')
+
+ * parse_response(self, response):
+ Receives the HTML response of the website and should return a list of
+ paper info dictionaries (see :method:`search_async`).
+
+ For more complex scenarios, the following methods can be overwritten:
+ * `search_async(self, search_string, callback, error_callback)`
+ This method receives a search string from the GUI and should call the
+ callback with a list of paper info dictionaries
+ (see :method:`search_async`)
+
+ * `import_paper_after_search(self, paper, callback)`
+ This method receives a :class:`VirtualPaper` object, already
+ filled with the information previously returned from the search. It should
+ call the callback when it is finished processing (which may include getting
+ more information from webpages -- in this case the :class:`AsyncSoupSession`
+ object `importer.soup_session` should be used to asynchronously fetch the
+ page(s)) (see :method:`import_paper_after_search`).
+
+ If the website supports downloading multiple papers at once, it may be
+ more efficient to use that operation by overwriting
+ :method:`import_papers_after_search`, which otherwise will call
+ :method:`import_paper_after_search` for each paper.
+
+ Note that if the subclass overwrites the :method:`__init__` method, it
has
+ to call the :method:`__init__` of its superclass.
+ '''
+
+ unique_key = 'doi'
+
+ def __init__(self):
+ '''
+ Initializes the cache for previous search results (should be called
+ by overriding implementations in subclasses).
+ '''
+ # Remember previous search results so that no new search is
necessary.
+ # Useful especially if switching between libraries/searches in the
left
+ # pane
+ self.search_cache = {}
+
+ def __str__(self):
+ '''
+ Return the name of this search provider.
+ '''
+ return self.name
+
+ def clear_cache(self, text):
+ '''
+ Delete search results for `text` from the cache.
+ '''
+ if text in self.search_cache:
+ del self.search_cache[text]
+
+ def search_async(self, search_string, callback, error_callback):
+ '''
+ Asynchronously search for `search_string` and hand over a list of
+ search results to the callback. Each single search result is a
+ dictionary containing all the information that could be fetched
from the
+ webpage, e.g.:
+ [{'title': 'A paper title', 'authors': ['Author A', 'Author B']},
+ {'title': 'Another paper', 'authors': ['Author C'],
+ 'import_url': 'http://example.com/paper.pdf'}]
+ In addition, each paper can also contain arbitrary additional data
as the
+ value for a 'data' key. This could for example be used to save the
full
+ HTML code of a search result (which might be useful for an import
of this
+ paper) as opposed to only the extracted information.
+ This method should not block but use the :class:`AsyncSoupSession`
object
+ `importer.soup_session` for getting the information.
+ '''
+
+ try:
+ # Call the method defined in the subclass
+ message = self.prepare_search_message(search_string)
+
+ def my_callback(session, message, user_data):
+ self.handle_response_received(message, callback,
error_callback)
+
+ soup_session.queue_message(message, my_callback, None)
+ except Exception as ex:
+ error_callback(ex, search_string)
+
+ def search(self, search_string, callback, error_callback):
+ '''
+ This method will be called by the GUI with the `search_string`
when a
+ search is initiated. Returns search results from the cache or
initiates
+ a new search using :method:`search_async` if the search has not
been
+ performed before. Before calling the `callback`, saves the search
+ results to the cache.
+
+ This method should normally not be overwritten.
+ '''
+ # A tuple identifying the search, making it possible for the
callback
+ # function to deal with the results properly (otherwise results
arriving
+ # out of order could lead to wrongly displayed results)
+ user_data = (self.label, search_string)
+
+ if not search_string:
+ callback(user_data, [])
+ return
+
+ if search_string in self.search_cache:
+ log_debug('Result for "%s" already in cache.' % search_string)
+ callback(user_data, self.search_cache[search_string])
+ return
+
+ log_info('Search for "%s" is not cached by this provider, starting
new search' % search_string)
+
+ try:
+ def callback_wrapper(search_results):
+ '''
+ Before calling the actual callback, save the result in the
+ cache and add `user_data` (tuple identifying request and
search
+ provider) to the call.
+ '''
+ log_debug('Saving %s in cache for "%s"' % (search_results,
search_string))
+ self.search_cache[search_string] = search_results
+ callback(user_data, search_results)
+
+ self.search_async(search_string, callback_wrapper,
error_callback)
+ except Exception as ex:
+ error_callback(ex, None)
+
+ def handle_response_received(self, message, callback, error_callback):
+ '''
+ Will be called when the server returns a response. If the server
+ returns an invalid response, the error callback is called. In case of
+ a valid response, the response will be parsed with
+ :method:`parse_response` and the resulting list is passed to the callback.
+ '''
+ if message.status_code == Soup.KnownStatusCode.OK:
+
callback(self.parse_response(message.response_body.flatten().get_data()))
+ else:
+ error_callback(message.status_code, None)
+
+ def import_papers_after_search(self, papers, callback):
+ '''
+ This method will be called if multiple `papers` are requested for
import.
+ In the default case, this imports one paper after the other, calling
+ :method:`import_paper_after_search` for each individual paper (which in
+ turn calls the `callback` for each result).
+ Some search providers allow downloading multiple papers in one bulk
+ operation; such providers should overwrite this method.
+ '''
+ identifier = '%s_%f' % (self.label, time.time()) #only used for
status message
+
+ def import_all_papers(paper_list):
+ active_threads[identifier] = 'Importing %d papers' %
len(paper_list)
+ def my_callback(paper_obj=None, paper_data=None,
paper_info=None,
+ user_data=None):
+
+ callback(paper_obj=paper_obj, paper_data=paper_data,
+ paper_info=paper_info, user_data=user_data)
+ import_all_papers(paper_list)
+
+ if len(paper_list):
+ one_paper = paper_list.pop()
+ self.import_paper_after_search(one_paper, my_callback)
+ else:
+ if identifier in active_threads:
+ del active_threads[identifier]
+
+ import_all_papers(papers)
+
+ def import_paper_after_search(self, paper_obj, callback):
+ '''
+ This method is called when a search result is requested to be
imported.
+ The given `paper_obj` is a :class:`VirtualPaper` which has all the
+ information previously returned by the search as attributes, e.g.
+ `paper_obj.doi` is its DOI. The special attribute `data` should be
used
+ for information that can be useful for importing the paper, in
addition
+ to the default paper attributes. For example,
+ :class:`GoogleScholarSearch` saves the complete HTML code for a
search
+ result, which contains a link to BibTeX data and possibly to a PDF
+ document.
+
+ If this method is not overwritten, it asynchronously downloads a
+ document given in import_url (if any) and returns the original
+ `paper_obj` and possibly the PDF document to the callback. In case
the
+ search provider does not have any info to add to the initial search
+ result, this is all that is needed. In cases where the search
provider
+ can add more information (e.g. the :class:`PubMedSearch` only
requests
+ summaries for the search, but when a specific paper is requested it
+ gets the full record), this method should be overwritten.
+ '''
+ # in case the paper already had an import URL, download from this
URL
+ if hasattr(paper_obj, 'import_url') and paper_obj.import_url:
+ message = Soup.Message.new(method='GET',
+ uri_string=paper_obj.import_url)
+
+ def mycallback(session, message, user_data):
+ if message.status_code == Soup.KnownStatusCode.OK:
+ paper_data = message.response_body.flatten().get_data()
+ callback(paper_obj=paper_obj,
+ paper_data=paper_data,
+ user_data=user_data)
+ else:
+ log_error("%s: got status %s while trying to fetch PDF" %
+ (self.__class__.__name__, message.status_code))
+ callback(paper_obj=paper_obj, user_data=user_data)
+
+ log_debug("%s: trying to fetch %s" % (self.__class__.__name__,
+ paper_obj.import_url))
+ soup_session.queue_message(message, mycallback,
+ (self.label, paper_obj.import_url))
+ else:
+ callback(paper_obj=paper_obj, user_data=self.label)
+
+ #
-------------------------------------------------------------------------
+ # Methods to overwrite in sub classes for the simple case, see class
+ # documentation
+ #
-------------------------------------------------------------------------
+ def prepare_search_message(self, search_string):
+ '''
+ If :method:`search_async` is not overwritten, this method should be
+ overwritten to return a :class:`Soup.Message`, representing the query
+ that should be sent to the website. In many cases, this is as simple as
+ ::
+ uri = 'http://example.com/search?query=%s' % search_string
+ return Soup.Message.new(method='GET', uri_string=uri)
+
+ '''
+ raise NotImplementedError()
+
+ def parse_response(self, response):
+ '''
+ In case :method:`handle_response_received` has not been
overwritten,
+ this method will be called to parse a response (typically HTML or
XML).
+ It is expected to return a list of search results (see
+ :method:`search_async` for further details).
+ '''
+ raise NotImplementedError()
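
The simple case described in the class docstring (overwrite `prepare_search_message` and `parse_response`, and let `search` handle caching) can be sketched without libsoup. `SketchProvider` below is a hypothetical, synchronous stand-in: it returns a plain URL string where the real class returns a `Soup.Message`, and it fakes the server response instead of queueing a request on `soup_session`.

```python
class SketchProvider(object):
    """Simplified, synchronous stand-in for a search provider."""

    label = 'sketch'

    def __init__(self):
        self.search_cache = {}
        self.requests_made = 0  # only here to make the caching visible

    def prepare_search_message(self, search_string):
        # The real method would wrap this URI in a Soup.Message.
        return 'http://example.com/search?query=%s' % search_string

    def parse_response(self, response):
        # One fake "paper info" dictionary per non-empty response line.
        return [{'title': line} for line in response.splitlines() if line]

    def search_async(self, search_string, callback, error_callback):
        try:
            uri = self.prepare_search_message(search_string)
            self.requests_made += 1
            fake_response = 'Result for %s\n' % uri  # pretend server reply
            callback(self.parse_response(fake_response))
        except Exception as ex:
            error_callback(ex, search_string)

    def search(self, search_string, callback, error_callback):
        # The tuple identifies provider and query for the receiver.
        user_data = (self.label, search_string)
        if not search_string:
            callback(user_data, [])
            return
        if search_string in self.search_cache:
            callback(user_data, self.search_cache[search_string])
            return

        def callback_wrapper(search_results):
            # Store the results before handing them to the caller.
            self.search_cache[search_string] = search_results
            callback(user_data, search_results)

        self.search_async(search_string, callback_wrapper, error_callback)


provider = SketchProvider()
seen = []
errors = []


def on_results(user_data, results):
    seen.append((user_data, results))


def on_error(error, data):
    errors.append((error, data))


provider.search('neuron', on_results, on_error)
provider.search('neuron', on_results, on_error)  # served from the cache
```

The second call never reaches `search_async`, so `requests_made` stays at 1 while the callback still receives the same results.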
=======================================
--- /gpapers/importer/__init__.py Tue Jun 26 04:09:00 2012
+++ /gpapers/importer/__init__.py Tue Jun 26 07:25:39 2012
@@ -315,213 +315,3 @@
user_data=user_data)

_import_from_urls(urls, _import_from_urls_finished, user_data)
-
-
-class WebSearchProvider(object):
- '''
- Base class for all web search providers, i.e. websites or web APIs that
- return a number of search results for a search string.
-
- Implementation of new search providers should derive from this class
and
- have to provide the following attributes as class attributes:
- * `name`: A human readable name that is used in the left column of the
GUI,
- e.g. "Google Scholar".
- * `label`: A simple name that is used for saving searches to the
database,
- e.g. "gscholar"
- * `icon` : The name of an icon (expected in the `icons` subdirectory
- currently) used in the left column of the GUI, e.g.
- "favicon_google.ico". If no icon name is provided, a
standard
- icon is used.
- * `unique_key`: Defining under what circumstances a search result
should be
- onsidered a duplicate of an existing paper in the
- database. For a PubMED search, for example, this
should be
- 'pubmed_id'. If the unique key is not set, 'doi' is
used
- as a default.
-
-
- A search provider has to provide the following methods:
- * `search_async(self, search_string, callback)`
- This method receives a search string from the GUI and should call the
- callback with a list of search results where each single result is a
- dictionary containing all the information that could be fetched from
the
- webpage, e.g.:
- [{'title': 'A paper title', 'authors': ['Author A', 'Author B']},
- {'title': 'Another paper', 'authors': ['Author C'],
- 'import_url': 'http://example.com/paper.pdf'}]
- In addition, each paper can also contain arbitrary additional data as
the
- value for a 'data' key. This could for example be used to save the full
- HTML code of a search result (which might be useful for an import of
this
- paper) as opposed to only the extracted information.
- This method should not block but use the :class:`AsyncSoupSession`
object
- `importer.soup_session` for getting the information.
-
- * `import_paper_after_search(self, paper, callback)`
- This method receives the a :class:`VirtualPaper` object, already
- filled with the information previously returned from the search. It
should
- call the callback when it is finished processing (which may include
getting
- more information from webpages -- in this case
the :class:`AsyncSoupSession`
- object `importer.soup_session` should be used to asynchronously fetch
the
- page(s)). The callback function has to be called with ...
-
- TODO
-
- The :method:`__init__` method of the subclass has to call the
- :method:`__init__` method of the superclass.
- '''
-
- unique_key = 'doi'
-
- def __init__(self):
- # Remember previous search results so that no new search is
necessary.
- # Useful especially if switching between libraries/searches in the
left
- # pane
- self.search_cache = {}
-
- def __str__(self):
- return self.name
-
- def clear_cache(self, text):
- if text in self.search_cache:
- del self.search_cache[text]
-
- def search(self, search_string, callback, error_callback):
-
- # A tuple identifying the search, making it possible for the
callback
- # function to deal with the results properly (otherwise results
arriving
- # out of order could lead to wrongly displayed results)
- user_data = (self.label, search_string)
-
- if not search_string:
- callback(user_data, [])
- return
-
- if search_string in self.search_cache:
- log_debug('Result for "%s" already in cache.' % search_string)
- callback(user_data, self.search_cache[search_string])
- return
-
- log_info('Search for "%s" is not cached by this provider, starting
new search' % search_string)
-
- try:
- def callback_wrapper(search_results):
- '''
- Before calling the actual callback, save the result in the
- cache and add `user_data` (tuple identifying request and
search
- provider) to the call.
- '''
- log_debug('Saving %s in cache for "%s"' % (search_results,
search_string))
- self.search_cache[search_string] = search_results
- callback(user_data, search_results)
-
- self.search_async(search_string, callback_wrapper,
error_callback)
- except Exception as ex:
- error_callback(ex, None)
-
- def import_paper_after_search(self, paper, callback):
- raise NotImplementedError()
-
- def import_papers_after_search(self, papers, callback):
-
- identifier = '%s_%f' % (self.label, time.time()) #only used for
status message
-
- def import_all_papers(paper_list):
- active_threads[identifier] = 'Importing %d papers' %
len(paper_list)
- def my_callback(paper_obj=None, paper_data=None,
paper_info=None,
- user_data=None):
-
- callback(paper_obj=paper_obj, paper_data=paper_data,
- paper_info=paper_info, user_data=user_data)
- import_all_papers(paper_list)
-
- if len(paper_list):
- one_paper = paper_list.pop()
- self.import_paper_after_search(one_paper, my_callback)
- else:
- if identifier in active_threads:
- del active_threads[identifier]
-
- import_all_papers(papers)
-
- def search_async(self, search_string, callback, error_callback):
- raise NotImplementedError()
-
-class SimpleWebSearchProvider(WebSearchProvider):
- '''
- Convenience class for web searches that do a single request to a
website for
- a search and do not have to perform additional web requests to get more
- detailed info for a paper chosen for import.
-
- Such web search providers need only to provide two simple functions
- (see :class:`importer.jstor.JSTORSearch` for an example):
- * prepare_search_message(self, search_string)
- Has to construct and return a `Soup.Message` object using
- `Soup.Message.new`, for example:
- ..
- return Soup.Message.new(method='GET',
-
uri_string='http://example.com/search?query=search_string')
-
- * parse_response(self, response):
- Receives the HTML response of the website and should return a paper
info
- dictionary (see :class:`WebSearchProvider`).
- '''
-
- def __init__(self):
- WebSearchProvider.__init__(self)
-
- def search_async(self, search_string, callback, error_callback):
- try:
- # Call the method defined in the subclass
- message = self.prepare_search_message(search_string)
-
- def my_callback(session, message, user_data):
- self.response_received(message, callback, error_callback)
-
- soup_session.queue_message(message, my_callback, None)
- except Exception as ex:
- error_callback(ex, search_string)
-
- def response_received(self, message, callback, error_callback):
- '''
- Will be called when the server returns a response.
- '''
- if message.status_code == Soup.KnownStatusCode.OK:
- #try:
-
callback(self.parse_response(message.response_body.flatten().get_data()))
- #except Exception as ex:
- # error_callback(ex, user_data)
- else:
- error_callback(message.status_code, None)
-
- def import_paper_after_search(self, paper_obj, callback):
- # in case the paper already had an import URL, download from this
URL
- if hasattr(paper_obj, 'import_url') and paper_obj.import_url:
- message = Soup.Message.new(method='GET',
- uri_string=paper_obj.import_url)
-
- def mycallback(session, message, user_data):
- if message.status_code == Soup.KnownStatusCode.OK:
- log_debug("%s: received pdf length %s" %
(self.__class__.__name__,
-
message.response_body.length))
- callback(paper_obj=paper_obj,
-
paper_data=message.response_body.flatten().get_data(),
- user_data=user_data)
- else:
- log_error("%: got status %s while trying to fetch
PDF" % (self.__class__.__name__,
-
message.status_code))
- callback(paper_obj=paper_obj, user_data=user_data)
-
- log_debug("%s: trying to fetch %s" % (self.__class__.__name__,
- paper_obj.import_url))
- soup_session.queue_message(message, mycallback,
- (self.label, paper_obj.import_url))
- else:
- callback(paper_obj=paper_obj, user_data=self.label)
-
- #
-------------------------------------------------------------------------
- # Methods to overwrite in sub classes
- #
-------------------------------------------------------------------------
- def prepare_search_message(self, search_string):
- raise NotImplementedError()
-
- def parse_response(self, response):
- raise NotImplementedError()
=======================================
--- /gpapers/importer/arxiv.py Tue Jun 26 04:09:00 2012
+++ /gpapers/importer/arxiv.py Tue Jun 26 07:25:39 2012
@@ -23,7 +23,8 @@
from gi.repository import Soup
import feedparser

-from gpapers.importer import SimpleWebSearchProvider, soup_session
+from gpapers.importer import soup_session
+from gpapers.importer.provider_base import WebSearchProvider
from gpapers.logger import log_debug, log_error

BASE_URL = 'http://export.arxiv.org/api/query?'
@@ -31,7 +32,7 @@
SORT_ORDER = "descending"
MAX_RESULTS = 100

-class ArxivSearch(SimpleWebSearchProvider):
+class ArxivSearch(WebSearchProvider):
"""
A search provider/importer for the arXiv.org e-print archive,
covering physics, mathematics, computer science, etc.
=======================================
--- /gpapers/importer/google_scholar.py Tue Jun 26 04:09:00 2012
+++ /gpapers/importer/google_scholar.py Tue Jun 26 07:25:39 2012
@@ -27,12 +27,13 @@

from gpapers.logger import log_debug, log_info, log_error
from gpapers.importer.bibtex import paper_info_from_bibtex
-from gpapers.importer import SimpleWebSearchProvider, html_strip,
soup_session
+from gpapers.importer import html_strip, soup_session
+from gpapers.importer.provider_base import WebSearchProvider

BASE_URL = 'http://scholar.google.com/'


-class GoogleScholarSearch(SimpleWebSearchProvider):
+class GoogleScholarSearch(WebSearchProvider):

name = 'Google Scholar'
label = 'google_scholar'
@@ -40,7 +41,7 @@
unique_key = 'import_url'

def __init__(self):
- SimpleWebSearchProvider.__init__(self)
+ WebSearchProvider.__init__(self)

# create a "google ID" used later for setting our preferences
instead
# of using a cookie
=======================================
--- /gpapers/importer/jstor.py Sun Jun 24 05:30:19 2012
+++ /gpapers/importer/jstor.py Tue Jun 26 07:25:39 2012
@@ -20,8 +20,8 @@
from gi.repository import Soup
from BeautifulSoup import BeautifulStoneSoup

-from gpapers.importer import SimpleWebSearchProvider
-from gpapers.logger import *
+from gpapers.importer.provider_base import WebSearchProvider
+from gpapers.logger import log_debug

QUERY_STRING = 'http://dfr.jstor.org/sru/?version=1.1&' + \
'operation=searchRetrieve&query=%(query)s&' + \
@@ -29,14 +29,14 @@
'recordSchema=info:srw/schema/srw_jstor'


-class JSTORSearch(SimpleWebSearchProvider):
+class JSTORSearch(WebSearchProvider):

name = 'JSTOR'
label = 'jstor'
icon = 'favicon_jstor.ico'

def __init__(self):
- SimpleWebSearchProvider.__init__(self)
+ WebSearchProvider.__init__(self)
# TODO: Make this configurable
self.max_results = 20

@@ -101,4 +101,3 @@
papers.append(paper)

return papers
-
=======================================
--- /gpapers/importer/pubmed.py Tue Jun 26 04:09:00 2012
+++ /gpapers/importer/pubmed.py Tue Jun 26 07:25:39 2012
@@ -24,7 +24,8 @@
from gi.repository import Soup # @UnresolvedImport

from gpapers.logger import log_debug, log_info, log_error
-from gpapers.importer import WebSearchProvider, soup_session
+from gpapers.importer import soup_session
+from gpapers.importer.provider_base import WebSearchProvider

BASE_URL = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/'
ESEARCH_QUERY = 'esearch.fcgi?db=pubmed&term=%s&usehistory=y'
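
The default `import_papers_after_search` in provider_base.py imports papers strictly one after the other: each import is started only from the callback of the previous one. A minimal sketch of that chaining pattern follows. `import_sequentially` and the lambda importer are hypothetical helpers, and unlike the original loop this version works on a copy of the list rather than consuming the caller's list:

```python
def import_sequentially(papers, import_one, callback):
    """Import papers one at a time, chaining through callbacks."""
    remaining = list(papers)  # copy, so the caller's list stays intact

    def import_next():
        if not remaining:
            return  # nothing left, the chain ends here
        one_paper = remaining.pop()  # takes from the end of the list

        def on_done(**result):
            # Forward the result, then start the next import.
            callback(**result)
            import_next()

        import_one(one_paper, on_done)

    import_next()


done = []
import_sequentially(
    ['paper A', 'paper B', 'paper C'],
    # Stand-in importer that finishes immediately.
    lambda paper, cb: cb(paper_obj=paper, user_data='sketch'),
    lambda **result: done.append(result['paper_obj']))
```

Since `pop()` removes the last element, the papers are processed in reverse order, which matches the behaviour of `paper_list.pop()` in the diff.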

==============================================================================
Revision: df9004aa48a5
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 07:29:35 2012
Log: remove some debug messages
http://code.google.com/p/gpapers/source/detail?r=df9004aa48a5

Modified:
/gpapers/__init__.py

=======================================
--- /gpapers/__init__.py Tue Jun 26 04:09:00 2012
+++ /gpapers/__init__.py Tue Jun 26 07:29:35 2012
@@ -1601,8 +1601,6 @@
paper_information_toolbar.remove(child)
self.displayed_paper = None

- log_debug('rows: %s' % str(rows))
-
if not rows or len(rows) == 0:
self.update_bookmark_pane_from_paper(None)
elif len(rows) == 1:
@@ -1648,7 +1646,6 @@

#self.ui.get_object('paper_information_pane').get_buffer().set_text( '\n'.join(description)
)

if paper.doi or paper.import_url:
- log_debug('URL or DOI exists')
button = Gtk.ToolButton(stock_id=Gtk.STOCK_HOME)
button.set_tooltip_text('Open this URL in your browser...')
url = paper.import_url

==============================================================================
Revision: fde563c6982b
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 07:37:01 2012
Log: add documentation scaffolding generated with sphinx-quickstart
http://code.google.com/p/gpapers/source/detail?r=fde563c6982b

Added:
/sphinx-dox/Makefile
/sphinx-dox/source/conf.py
/sphinx-dox/source/index.rst
Modified:
/.hgignore

=======================================
--- /dev/null
+++ /sphinx-dox/Makefile Tue Jun 26 07:37:01 2012
@@ -0,0 +1,153 @@
+# Makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS =
+SPHINXBUILD = sphinx-build
+PAPER =
+BUILDDIR = build
+
+# Internal variables.
+PAPEROPT_a4 = -D latex_paper_size=a4
+PAPEROPT_letter = -D latex_paper_size=letter
+ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER))
$(SPHINXOPTS) source
+# the i18n builder cannot share the environment and doctrees with the
others
+I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
+
+.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp
devhelp epub latex latexpdf text man changes linkcheck doctest gettext
+
+help:
+ @echo "Please use \`make <target>' where <target> is one of"
+ @echo " html to make standalone HTML files"
+ @echo " dirhtml to make HTML files named index.html in directories"
+ @echo " singlehtml to make a single large HTML file"
+ @echo " pickle to make pickle files"
+ @echo " json to make JSON files"
+ @echo " htmlhelp to make HTML files and a HTML help project"
+ @echo " qthelp to make HTML files and a qthelp project"
+ @echo " devhelp to make HTML files and a Devhelp project"
+ @echo " epub to make an epub"
+ @echo " latex to make LaTeX files, you can set PAPER=a4 or
PAPER=letter"
+ @echo " latexpdf to make LaTeX files and run them through pdflatex"
+ @echo " text to make text files"
+ @echo " man to make manual pages"
+ @echo " texinfo to make Texinfo files"
+ @echo " info to make Texinfo files and run them through makeinfo"
+ @echo " gettext to make PO message catalogs"
+ @echo " changes to make an overview of all changed/added/deprecated
items"
+ @echo " linkcheck to check all external links for integrity"
+ @echo " doctest to run all doctests embedded in the documentation (if
enabled)"
+
+clean:
+ -rm -rf $(BUILDDIR)/*
+
+html:
+ $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+ @echo
+ @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
+
+dirhtml:
+ $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
+ @echo
+ @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
+
+singlehtml:
+ $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
+ @echo
+ @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
+
+pickle:
+ $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
+ @echo
+ @echo "Build finished; now you can process the pickle files."
+
+json:
+ $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
+ @echo
+ @echo "Build finished; now you can process the JSON files."
+
+htmlhelp:
+ $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
+ @echo
+ @echo "Build finished; now you can run HTML Help Workshop with the" \
+ ".hhp project file in $(BUILDDIR)/htmlhelp."
+
+qthelp:
+ $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
+ @echo
+ @echo "Build finished; now you can run "qcollectiongenerator" with the" \
+ ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
+ @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/gPapers.qhcp"
+ @echo "To view the help file:"
+ @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/gPapers.qhc"
+
+devhelp:
+ $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
+ @echo
+ @echo "Build finished."
+ @echo "To view the help file:"
+ @echo "# mkdir -p $$HOME/.local/share/devhelp/gPapers"
+ @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/gPapers"
+ @echo "# devhelp"
+
+epub:
+ $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
+ @echo
+ @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
+
+latex:
+ $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+ @echo
+ @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
+ @echo "Run \`make' in that directory to run these through (pdf)latex" \
+ "(use \`make latexpdf' here to do that automatically)."
+
+latexpdf:
+ $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+ @echo "Running LaTeX files through pdflatex..."
+ $(MAKE) -C $(BUILDDIR)/latex all-pdf
+ @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
+
+text:
+ $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
+ @echo
+ @echo "Build finished. The text files are in $(BUILDDIR)/text."
+
+man:
+ $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
+ @echo
+ @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
+
+texinfo:
+ $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
+ @echo
+ @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
+ @echo "Run \`make' in that directory to run these through makeinfo" \
+ "(use \`make info' here to do that automatically)."
+
+info:
+ $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
+ @echo "Running Texinfo files through makeinfo..."
+ make -C $(BUILDDIR)/texinfo info
+ @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
+
+gettext:
+ $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
+ @echo
+ @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
+
+changes:
+ $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
+ @echo
+ @echo "The overview file is in $(BUILDDIR)/changes."
+
+linkcheck:
+ $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
+ @echo
+ @echo "Link check complete; look for any errors in the above output " \
+ "or in $(BUILDDIR)/linkcheck/output.txt."
+
+doctest:
+ $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
+ @echo "Testing of doctests in the sources finished, look at the " \
+ "results in $(BUILDDIR)/doctest/output.txt."
=======================================
--- /dev/null
+++ /sphinx-dox/source/conf.py Tue Jun 26 07:37:01 2012
@@ -0,0 +1,246 @@
+# -*- coding: utf-8 -*-
+#
+# gPapers documentation build configuration file, created by
+# sphinx-quickstart on Tue Jun 26 16:33:58 2012.
+#
+# This file is execfile()d with the current directory set to its containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+import sys, os
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#sys.path.insert(0, os.path.abspath('.'))
+
+# -- General configuration -----------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be
extensions
+# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.intersphinx', 'sphinx.ext.coverage', 'sphinx.ext.viewcode']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix of source filenames.
+source_suffix = '.rst'
+
+# The encoding of source files.
+#source_encoding = 'utf-8-sig'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = u'gPapers'
+copyright = u'2012, Derek Anderson and Marcel Stimberg'
+
+# The version info for the project you're documenting, acts as replacement
for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+version = '0.5dev'
+# The full version, including alpha/beta/rc tags.
+release = '0.5dev'
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#language = None
+
+# There are two options for replacing |today|: either, you set today to
some
+# non-false value, then it is used:
+#today = ''
+# Else, today_fmt is used as the format for a strftime call.
+#today_fmt = '%B %d, %Y'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+exclude_patterns = []
+
+# The reST default role (used for this markup: `text`) to use for all
documents.
+#default_role = None
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+#add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+#add_module_names = True
+
+# If true, sectionauthor and moduleauthor directives will be shown in the
+# output. They are ignored by default.
+#show_authors = False
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# A list of ignored prefixes for module index sorting.
+#modindex_common_prefix = []
+
+
+# -- Options for HTML output
---------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+html_theme = 'default'
+
+# Theme options are theme-specific and customize the look and feel of a
theme
+# further. For a list of options available for each theme, see the
+# documentation.
+#html_theme_options = {}
+
+# Add any paths that contain custom themes here, relative to this
directory.
+#html_theme_path = []
+
+# The name for this set of Sphinx documents. If None, it defaults to
+# "<project> v<release> documentation".
+#html_title = None
+
+# A shorter title for the navigation bar. Default is the same as
html_title.
+#html_short_title = None
+
+# The name of an image file (relative to this directory) to place at the
top
+# of the sidebar.
+#html_logo = None
+
+# The name of an image file (within the static path) to use as favicon of
the
+# docs. This file should be a Windows icon file (.ico) being 16x16 or
32x32
+# pixels large.
+#html_favicon = None
+
+# Add any paths that contain custom static files (such as style sheets)
here,
+# relative to this directory. They are copied after the builtin static
files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# If not '', a 'Last updated on:' timestamp is inserted at every page
bottom,
+# using the given strftime format.
+#html_last_updated_fmt = '%b %d, %Y'
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+#html_use_smartypants = True
+
+# Custom sidebar templates, maps document names to template names.
+#html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# template names.
+#html_additional_pages = {}
+
+# If false, no module index is generated.
+#html_domain_indices = True
+
+# If false, no index is generated.
+#html_use_index = True
+
+# If true, the index is split into individual pages for each letter.
+#html_split_index = False
+
+# If true, links to the reST sources are added to the pages.
+#html_show_sourcelink = True
+
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is
True.
+#html_show_sphinx = True
+
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is
True.
+#html_show_copyright = True
+
+# If true, an OpenSearch description file will be output, and all pages
will
+# contain a <link> tag referring to it. The value of this option must be
the
+# base URL from which the finished HTML is served.
+#html_use_opensearch = ''
+
+# This is the file name suffix for HTML files (e.g. ".xhtml").
+#html_file_suffix = None
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'gPapersdoc'
+
+
+# -- Options for LaTeX output
--------------------------------------------------
+
+latex_elements = {
+# The paper size ('letterpaper' or 'a4paper').
+#'papersize': 'letterpaper',
+
+# The font size ('10pt', '11pt' or '12pt').
+#'pointsize': '10pt',
+
+# Additional stuff for the LaTeX preamble.
+#'preamble': '',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title, author, documentclass
[howto/manual]).
+latex_documents = [
+ ('index', 'gPapers.tex', u'gPapers Documentation',
+ u'Derek Anderson and Marcel Stimberg', 'manual'),
+]
+
+# The name of an image file (relative to this directory) to place at the
top of
+# the title page.
+#latex_logo = None
+
+# For "manual" documents, if this is true, then toplevel headings are
parts,
+# not chapters.
+#latex_use_parts = False
+
+# If true, show page references after internal links.
+#latex_show_pagerefs = False
+
+# If true, show URL addresses after external links.
+#latex_show_urls = False
+
+# Documents to append as an appendix to all manuals.
+#latex_appendices = []
+
+# If false, no module index is generated.
+#latex_domain_indices = True
+
+
+# -- Options for manual page output
--------------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+ ('index', 'gpapers', u'gPapers Documentation',
+ [u'Derek Anderson and Marcel Stimberg'], 1)
+]
+
+# If true, show URL addresses after external links.
+#man_show_urls = False
+
+
+# -- Options for Texinfo output
------------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+# dir menu entry, description, category)
+texinfo_documents = [
+ ('index', 'gPapers', u'gPapers Documentation',
+ u'Derek Anderson and Marcel Stimberg', 'gPapers', 'One line description of project.',
+ 'Miscellaneous'),
+]
+
+# Documents to append as an appendix to all manuals.
+#texinfo_appendices = []
+
+# If false, no module index is generated.
+#texinfo_domain_indices = True
+
+# How to display URL addresses: 'footnote', 'no', or 'inline'.
+#texinfo_show_urls = 'footnote'
+
+
+# Example configuration for intersphinx: refer to the Python standard library.
+intersphinx_mapping = {'http://docs.python.org/': None}
=======================================
--- /dev/null
+++ /sphinx-dox/source/index.rst Tue Jun 26 07:37:01 2012
@@ -0,0 +1,22 @@
+.. gPapers documentation master file, created by
+ sphinx-quickstart on Tue Jun 26 16:33:58 2012.
+ You can adapt this file completely to your liking, but it should at
least
+ contain the root `toctree` directive.
+
+Welcome to gPapers's documentation!
+===================================
+
+Contents:
+
+.. toctree::
+ :maxdepth: 2
+
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
+
=======================================
--- /.hgignore Thu Feb 9 14:30:14 2012
+++ /.hgignore Tue Jun 26 07:37:01 2012
@@ -3,3 +3,4 @@
*~
*.pyc
.*
+sphinx-doc/build

==============================================================================
Revision: 0ef338bdea05
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 08:36:55 2012
Log: set up some basic reference documentation
http://code.google.com/p/gpapers/source/detail?r=0ef338bdea05

Added:
/sphinx-dox/source/reference-gpapers.rst
/sphinx-dox/source/reference-importer.rst
/sphinx-dox/source/reference-logger.rst
/sphinx-dox/source/reference.rst
Modified:
/.hgignore
/gpapers/__init__.py
/gpapers/importer/__init__.py
/gpapers/importer/bibtex.py
/gpapers/importer/provider_base.py
/sphinx-dox/source/conf.py
/sphinx-dox/source/index.rst

=======================================
--- /dev/null
+++ /sphinx-dox/source/reference-gpapers.rst Tue Jun 26 08:36:55 2012
@@ -0,0 +1,30 @@
+Main module
+===========
+
+Utility functions in the gpapers module
+---------------------------------------
+.. automodule:: gpapers
+ :members:
+
+GUI classes
+-----------
+.. autoclass:: MainGUI
+ :members:
+
+.. autoclass:: PaperEditGUI
+ :members:
+
+.. autoclass:: AuthorEditGUI
+ :members:
+
+.. autoclass:: OrganizationEditGUI
+ :members:
+
+.. autoclass:: SourceEditGUI
+ :members:
+
+.. autoclass:: ReferenceEditGUI
+ :members:
+
+.. autoclass:: CitationEditGUI
+ :members:
=======================================
--- /dev/null
+++ /sphinx-dox/source/reference-importer.rst Tue Jun 26 08:36:55 2012
@@ -0,0 +1,39 @@
+Importer
+========
+
+General import functions
+------------------------
+
+.. automodule:: gpapers.importer
+ :members:
+
+BibTeX parsing
+--------------
+.. automodule:: gpapers.importer.bibtex
+ :members:
+
+PDF file parsing
+----------------
+.. automodule:: gpapers.importer.pdf_file
+ :members:
+
+Base class for web searches/imports
+-----------------------------------
+.. autoclass:: gpapers.importer.provider_base.WebSearchProvider
+ :members:
+
+Web search/import classes
+-------------------------
+
+.. autoclass:: gpapers.importer.arxiv.ArxivSearch
+ :members:
+
+.. autoclass:: gpapers.importer.google_scholar.GoogleScholarSearch
+ :members:
+
+.. autoclass:: gpapers.importer.pubmed.PubMedSearch
+ :members:
+
+.. autoclass:: gpapers.importer.jstor.JSTORSearch
+ :members:
+
=======================================
--- /dev/null
+++ /sphinx-dox/source/reference-logger.rst Tue Jun 26 08:36:55 2012
@@ -0,0 +1,6 @@
+Logging facilities
+==================
+
+.. automodule:: gpapers.logger
+ :members:
+
=======================================
--- /dev/null
+++ /sphinx-dox/source/reference.rst Tue Jun 26 08:36:55 2012
@@ -0,0 +1,11 @@
+Reference
+=========
+
+.. toctree::
+ :maxdepth: 2
+
+ reference-gpapers
+ reference-logger
+ reference-importer
+
+
=======================================
--- /.hgignore Tue Jun 26 07:37:01 2012
+++ /.hgignore Tue Jun 26 08:36:55 2012
@@ -3,4 +3,4 @@
*~
*.pyc
.*
-sphinx-doc/build
+build/*
=======================================
--- /gpapers/__init__.py Tue Jun 26 07:29:35 2012
+++ /gpapers/__init__.py Tue Jun 26 08:36:55 2012
@@ -1995,8 +1995,8 @@
def delete_object(self, text, obj, update_function):
'''
Asks for confirmation before deleting an object and calling an
update
- function (e.g. :method:`refresh_left_pane`). Is called by the
- more specific methods like :method:`delete_playlist` etc.
+ function (e.g. :meth:`refresh_left_pane`). Is called by the
+ more specific methods like :meth:`delete_playlist` etc.
'''
dialog = Gtk.MessageDialog(type=Gtk.MessageType.QUESTION,
buttons=Gtk.ButtonsType.YES_NO,
=======================================
--- /gpapers/importer/__init__.py Tue Jun 26 07:25:39 2012
+++ /gpapers/importer/__init__.py Tue Jun 26 08:36:55 2012
@@ -137,7 +137,7 @@
Searches the given url (asynchronously) for a PDF and/or metadata
(currently it will only look for BibTeX data). If the URL is a standard
HTML page, it will be searched for potential links, then
- :function:`import_from_urls` will be called with this link. Finally,
the
+ :func:`import_from_urls` will be called with this link. Finally, the
callback is called with the `paper_info` (a dictionary) and
`paper_data`
(binary data) as an argument.
'''
=======================================
--- /gpapers/importer/bibtex.py Sun Jun 24 05:30:19 2012
+++ /gpapers/importer/bibtex.py Tue Jun 26 08:36:55 2012
@@ -149,20 +149,7 @@

def latex2unicode(s):
"""
- * \`{o} produces a grave accent
- * \'{o} produces an acute accent
- * \^{o} produces a circumflex
- * \"{o} produces an umlaut or dieresis
- * \H{o} produces a long Hungarian umlaut
- * \~{o} produces a tilde
- * \c{c} produces a cedilla
- * \={o} produces a macron accent (a bar over the letter)
- * \b{o} produces a bar under the letter
- * \.{o} produces a dot over the letter
- * \d{o} produces a dot under the letter
- * \u{o} produces a breve over the letter
- * \v{o} produces a "v" over the letter
- * \t{oo} produces a "tie" (inverted u) over the two letters
+ Not useful at the moment.
"""
# TODO: expand this to really work
return s
=======================================
--- /gpapers/importer/provider_base.py Tue Jun 26 07:25:39 2012
+++ /gpapers/importer/provider_base.py Tue Jun 26 08:36:55 2012
@@ -33,55 +33,63 @@

Implementation of new search providers should derive from this class
and
have to provide the following attributes as class attributes:
- * `name`: A human readable name that is used in the left column of the
GUI,
- e.g. "Google Scholar".
- * `label`: A simple name that is used for saving searches to the
database,
- e.g. "gscholar"
- * `icon` : The name of an icon (expected in the `icons` subdirectory
- currently) used in the left column of the GUI, e.g.
- "favicon_google.ico". If no icon name is provided, a
standard
- icon is used.
- * `unique_key`: Defining under what circumstances a search result
should be
- onsidered a duplicate of an existing paper in the
- database. For a PubMED search, for example, this
should be
- 'pubmed_id'. If the unique key is not set, 'doi' is
used
- as a default.
+
+ `name`
+ A human readable name that is used in the left column of the GUI,
+ e.g. "Google Scholar".
+ `label`
+ A simple name that is used for saving searches to the database,
+ e.g. "gscholar"
+ `icon`
+ The name of an icon (expected in the `icons` subdirectory
+ currently) used in the left column of the GUI, e.g.
+ "favicon_google.ico". If no icon name is provided, a standard
+ icon is used.
+ `unique_key`
+ Defining under what circumstances a search result should be
+ considered a duplicate of an existing paper in the
+ database. For a PubMED search, for example, this should be
+ 'pubmed_id'. If the unique key is not set, 'doi' is used
+ as a default.

In the simplest case (for the search, a single request to a website is
sufficient and that is all the data that is needed for an import), it
is
enough to overwrite two relatively simple methods (see
:class:`importer.jstor.JSTORSearch` for an example):
- * prepare_search_message(self, search_string)
- Has to construct and return a `Soup.Message` object using
- `Soup.Message.new`, for example:
- ..
- return Soup.Message.new(method='GET',
-
uri_string='http://example.com/search?query=search_string')
-
- * parse_response(self, response):
- Receives the HTML response of the website and should return a list of
- paper info dictionaries (see :method:`search_async`).
+
+ .. method:: prepare_search_message(self, search_string)
+
+ Has to construct and return a `Soup.Message` object using
+ `Soup.Message.new`.
+
+ .. method:: parse_response(self, response)
+
+ Receives the HTML response of the website and should return a list
of
+ paper info dictionaries (see :meth:`search_async`).

For more complex scenarios, the following methods can be overwritten:
- * `search_async(self, search_string, callback)`
- This method receives a search string from the GUI and should call the
- callback with a list of paper info dictionaries
(see :method:`search_async`)
-
- * `import_paper_after_search(self, paper, callback)`
- This method receives the a :class:`VirtualPaper` object, already
- filled with the information previously returned from the search. It
should
- call the callback when it is finished processing (which may include
getting
- more information from webpages -- in this case
the :class:`AsyncSoupSession`
- object `importer.soup_session` should be used to asynchronously fetch
the
- page(s)) (see :method:`import_paper_after_search`).
+
+ .. method:: search_async(self, search_string, callback)
+
+ This method receives a search string from the GUI and should call
the
+ callback with a list of paper info dictionaries
(see :meth:`search_async`)
+
+ .. method:: import_paper_after_search(self, paper, callback)
+
+ This method receives a :class:`VirtualPaper` object, already
+ filled with the information previously returned from the search. It
should
+ call the callback when it is finished processing (which may include
getting
+ more information from webpages -- in this case
the :class:`AsyncSoupSession`
+ object `importer.soup_session` should be used to asynchronously
fetch the
+ page(s)) (see :meth:`import_paper_after_search`).

In case that the website supports the downloading of multiple papers at
once, it may also be more efficient to use this operation by
overwriting
- :method:`import_papers_after_search` which otherwise will call
- :method:`import_paper_after_search` for each paper.
-
- Note that if the subclass overwrites the :method:`__init__` method, it
has
- to call the :method:`__init__` of its superclass.
+ :meth:`import_papers_after_search` which otherwise will call
+ :meth:`import_paper_after_search` for each paper.
+
+ Note that if the subclass overwrites the :meth:`__init__` method, it
has
+ to call the :meth:`__init__` of its superclass.
'''

unique_key = 'doi'
@@ -115,9 +123,12 @@
search results to the callback. Each single search result is a
dictionary containing all the information that could be fetched
from the
webpage, e.g.:
- [{'title': 'A paper title', 'authors': ['Author A', 'Author B']},
- {'title': 'Another paper', 'authors': ['Author C'],
- 'import_url': 'http://example.com/paper.pdf'}]
+ ..
+
+ [{'title': 'A paper title', 'authors': ['Author A', 'Author
B']},
+ {'title': 'Another paper', 'authors': ['Author C'],
+ 'import_url': 'http://example.com/paper.pdf'}]
+
In addition, each paper can also contain arbitrary additional data
as the
value for a 'data' key. This could for example be used to save the
full
HTML code of a search result (which might be useful for an import
of this
@@ -141,7 +152,7 @@
'''
This method will be called by the GUI with the `search_string`
when a
search is initiated. Returns search results from the cache or
initiates
- a new search using :method:`search_async` if the search has not
been
+ a new search using :meth:`search_async` if the search has not been
performed before. Before calling the `callback`, saves the search
results to the cache.

@@ -183,7 +194,7 @@
Will be called when the server returns a response. If the server
returns an invalid response, the error callback is called. In case
of
a valid response, the response will be parsed with
- :method:`parse_respose` and the resulting list returned to the
callback.
+ :meth:`parse_response` and the resulting list returned to the callback.
'''
if message.status_code == Soup.KnownStatusCode.OK:

callback(self.parse_response(message.response_body.flatten().get_data()))
@@ -194,7 +205,7 @@
'''
This method will be called if multiple `papers` are requested for
import.
In the default case, this imports one paper after the other calling
- :method:`import_paper_after_search` for each individual paper
(which in
+ :meth:`import_paper_after_search` for each individual paper (which
in
 turn calls the `callback` for each result).
Some search providers allow downloading multiple papers in one bulk
operation, such providers should overwrite this function.
@@ -269,7 +280,7 @@
#
-------------------------------------------------------------------------
def prepare_search_message(self, search_string):
'''
- If :method:`search_async` is not overwritten, this method should be
+ If :meth:`search_async` is not overwritten, this method should be
overwritten to return a :class:`Soup.Message`, representing the
query
 that should be sent to the website. In many cases, this is as simple as
::
@@ -281,9 +292,9 @@

def parse_response(self, response):
'''
- In case :method:`handle_response_received` has not been
overwritten,
+ In case :meth:`handle_response_received` has not been overwritten,
this method will be called to parse a response (typically HTML or
XML).
It is expected to return a list of search results (see
- :method:`search_async` for further details).
+ :meth:`search_async` for further details).
'''
raise NotImplementedError()
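
Note on the provider_base.py docstring above: the "simplest case" it
describes (overwrite prepare_search_message and parse_response, set the
class attributes) can be sketched roughly as follows. This is only an
illustration written for Python 3, not code from the repository:
ProviderSketch, ExampleSearch, the fetch argument and the one-record-per-line
response format are all hypothetical stand-ins, and a plain URL string is
used where the real method would return a Soup.Message, so the sketch runs
without GTK or libsoup.

```python
from urllib.parse import quote_plus


class ProviderSketch(object):
    """Hypothetical stand-in for WebSearchProvider (no GTK/libsoup needed)."""
    unique_key = 'doi'  # default named in the docstring

    def prepare_search_message(self, search_string):
        raise NotImplementedError()

    def parse_response(self, response):
        raise NotImplementedError()

    def search(self, search_string, fetch, callback):
        # `fetch` replaces the asynchronous Soup session of the real class.
        message = self.prepare_search_message(search_string)
        callback(self.parse_response(fetch(message)))


class ExampleSearch(ProviderSketch):
    name = 'Example Archive'   # human readable name for the GUI
    label = 'example'          # simple name for saved searches
    unique_key = 'example_id'  # assumed key for duplicate detection

    def prepare_search_message(self, search_string):
        # The real method would build a Soup.Message; a URL string is
        # returned here purely for illustration.
        return 'http://example.com/search?query=' + quote_plus(search_string)

    def parse_response(self, response):
        # Toy response format (an assumption): "title|author;author" lines.
        papers = []
        for line in response.splitlines():
            if not line.strip():
                continue
            title, authors = line.split('|')
            papers.append({'title': title.strip(),
                           'authors': [a.strip() for a in authors.split(';')]})
        return papers


results = []
ExampleSearch().search(
    'neural coding',
    fetch=lambda url: 'A paper title|Author A;Author B\nAnother paper|Author C',
    callback=results.extend)
print(results)
```

The returned list has the same shape as the paper info dictionaries shown
in the search_async docstring (a 'title' and a list of 'authors' per entry).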
=======================================
--- /sphinx-dox/source/conf.py Tue Jun 26 07:37:01 2012
+++ /sphinx-dox/source/conf.py Tue Jun 26 08:36:55 2012
@@ -18,6 +18,8 @@
# documentation root, use os.path.abspath to make it absolute, like shown
here.
#sys.path.insert(0, os.path.abspath('.'))

+# add root source directory
+sys.path.insert(0, os.path.abspath('../..'))
# -- General configuration
-----------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
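
Note on the sys.path change above: conf.py is executed with the current
directory set to its containing directory (see the header comment of the
generated file), so '../..' should resolve to the repository root, two
levels above sphinx-dox/source. A small sketch of that path arithmetic,
assuming a hypothetical checkout location and POSIX-style paths
(autodoc_root is an illustrative helper, not part of the project):

```python
import posixpath


def autodoc_root(conf_dir, relative='../..'):
    """Return the directory conf.py would put on sys.path.

    `conf_dir` plays the role of the current directory during the build.
    """
    return posixpath.normpath(posixpath.join(conf_dir, relative))


# Hypothetical checkout location, used only for illustration.
print(autodoc_root('/home/user/gpapers/sphinx-dox/source'))
```

With the root on sys.path, directives such as "automodule:: gpapers" can
import the package without it being installed.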
=======================================
--- /sphinx-dox/source/index.rst Tue Jun 26 07:37:01 2012
+++ /sphinx-dox/source/index.rst Tue Jun 26 08:36:55 2012
@@ -11,6 +11,7 @@
.. toctree::
:maxdepth: 2

+ reference


Indices and tables

==============================================================================
Revision: f737e13dd054
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 09:10:10 2012
Log: add some more documentation and remove some more unnecessary code
http://code.google.com/p/gpapers/source/detail?r=f737e13dd054

Modified:
/gpapers/__init__.py

=======================================
--- /gpapers/__init__.py Tue Jun 26 08:36:55 2012
+++ /gpapers/__init__.py Tue Jun 26 09:10:10 2012
@@ -80,27 +80,11 @@
GObject.threads_init()


-def humanize_count(x, s, p, places=1):
- output = []
- if places == -1:
- places = 0
- print_x = False
- else:
- print_x = True
- x = float(x) * math.pow(10, places)
- x = round(x)
- x = x / math.pow(10, places)
- if x - int(x) == 0:
- x = int(x)
- if print_x: output.append(str(x))
- if x == 1:
- output.append(s)
- else:
- output.append(p)
- return ' '.join(output)
-
-
def truncate_long_str(s, max_length=96):
+ '''
+ Truncates a given string `s` at a certain length and adds an ellipsis
if
+ necessary.
+ '''
s = unicode(s)
if len(s) < max_length:
return s
@@ -324,7 +308,9 @@


class MainGUI:
-
+ '''
+ The main application window.
+ '''
active_threads = {}

def bibtex_received(self, bibtex_data, doi):
@@ -416,7 +402,8 @@
def import_url_dialog(self, o):
'''
Opens a dialog for entering an URL. For importing this URL,
- ``import_citation`` is called in a new thread.
+ :meth:`importer.import_from_url` is called,
+ notifying :meth:`document_imported` when the data has arrived.
'''
dialog = Gtk.MessageDialog(parent=self.main_window,
type=Gtk.MessageType.QUESTION,
@@ -438,9 +425,9 @@

def import_doi_dialog(self, o):
'''
- Opens a dialog for entering a DOI. For importing this document
from the
- http://dx.doi.org/... URL, ``import_citation`` is called in a new
- thread.
+ Opens a dialog for entering a DOI. For importing the
+ ``http://dx.doi.org/...`` URL, :meth:`importer.import_from_url` is
+ called, notifying :meth:`document_imported` when the data has
arrived.
'''
dialog = Gtk.MessageDialog(parent=self.main_window,
type=Gtk.MessageType.QUESTION,
@@ -464,7 +451,7 @@
def import_file_dialog(self, o):
'''
 Opens a dialog for choosing one or several files. If any files are
- chosen, ``import_documents_via_filenames`` is called in a new
thread.
+ chosen, :func:`import_documents_via_filenames` is called.
'''
dialog = Gtk.FileChooserDialog(title='Select one or more files
import…',
parent=self.main_window,
@@ -493,7 +480,7 @@
def import_directory_dialog(self, o):
'''
 Opens a dialog for choosing a directory. If a directory is chosen,
- ``import_documents_via_filenames`` is called in a new thread.
+ :func:`import_documents_via_filenames` is called.
'''
dialog = Gtk.FileChooserDialog(title='Select a directory to
import…',
parent=self.main_window,
@@ -521,9 +508,12 @@

def import_bibtex_dialog(self, o):
'''
- Opens a dialog for entering/pasting BibTex information. For
importing
- the information, ``import_documents_via_bibtexs`` is called in a
new
- thread.
+ Opens a dialog for entering/pasting BibTeX information. The given
+ information is first converted into a ``paper_info`` dictionary via
+ :func:`bibtex.paper_info_from_bibtex`. If the returned paper info
+ contains a URL or a DOI, :meth:`importer.import_from_url` is
called
+ for this URL. In either case, :meth:`document_imported` is called
in
+ the end.
'''
dialog = Gtk.MessageDialog(parent=self.main_window,
type=Gtk.MessageType.QUESTION,
@@ -561,6 +551,9 @@
dialog.destroy()

def __init__(self):
+ '''
+ Initialize the main GUI.
+ '''
self.search_providers = {'pubmed' : pubmed.PubMedSearch(),
'google_scholar' :
google_scholar.GoogleScholarSearch(),
'jstor' : jstor.JSTORSearch(),
@@ -734,6 +727,7 @@
* `playlist_id` is the id of the playlist in the database
* `editable` is True only for Playlists (they can be renamed)
* `source` is used for saved searches and the searches themselves
+
'''
left_pane = self.ui.get_object('left_pane')
# name, icon, playlist_id, editable, source
@@ -1948,9 +1942,6 @@
toolbar_bookmarks.insert(button, -1)


- def echo_objects(self, a=None, b=None, c=None, d=None, e=None, f=None,
g=None):
- print a, b, c, d, e, f, g
-
# FIXME: only save after some time without changes
def update_paper_notes(self, text_buffer, id):
log_debug('update_paper_notes called for id %s' % str(id))
@@ -1968,10 +1959,14 @@
bookmark.save()

def delete_papers(self, paper_ids):
+ '''
+ Delete papers with the given `paper_ids`. Will first display a
dialog
+ asking for confirmation.
+ '''
papers = Paper.objects.in_bulk(paper_ids).values()
paper_list_text = '\n'.join([ ('<i>"%s"</i>' %
unicode(paper.title)) for paper in papers ])
dialog = Gtk.MessageDialog(type=Gtk.MessageType.QUESTION,
buttons=Gtk.ButtonsType.YES_NO, flags=Gtk.DialogFlags.MODAL)
- dialog.set_markup('Really delete the following %s?\n\n%s\n\n' %
(humanize_count(len(papers), 'paper', 'papers', places= -1),
paper_list_text))
+ dialog.set_markup('Really delete the following %d
paper(s)?\n\n%s\n\n' % (len(papers), paper_list_text))
dialog.set_default_response(Gtk.ResponseType.NO)
dialog.show_all()
response = dialog.run()
@@ -1983,6 +1978,10 @@
self.refresh_middle_pane_search()

def remove_papers_from_current_playlist(self, paper_ids):
+ '''
+ Delete papers with the given `paper_ids` from the currently
selected
+ playlist.
+ '''
if not self.current_playlist: return
try:
for paper in Paper.objects.in_bulk(paper_ids).values():
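
Note on the truncate_long_str docstring added in this revision: the diff
only shows the first lines of the function, so the following is a guess at
the described behaviour (cut at a maximum length, append an ellipsis when
something was cut), not the project's actual implementation. The name
truncate_with_ellipsis, the three-dot ellipsis and the choice to keep the
result within max_length characters are assumptions; the sketch uses
Python 3's str where the original uses unicode.

```python
def truncate_with_ellipsis(s, max_length=96):
    """Return `s` unchanged if it is short, else a cut copy ending in '...'."""
    s = str(s)
    if len(s) < max_length:
        return s
    # Reserve three characters for the ellipsis so the result is
    # exactly max_length characters long (an assumed design choice).
    return s[:max_length - 3] + '...'


print(truncate_with_ellipsis('A short title'))
print(truncate_with_ellipsis('x' * 200, max_length=20))
```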

==============================================================================
Revision: cadbc110d77c
Author: Marcel Stimberg <marcel...@gmail.com>
Date: Tue Jun 26 09:22:12 2012
Log: add doc for models
http://code.google.com/p/gpapers/source/detail?r=cadbc110d77c

Added:
/sphinx-dox/source/reference-model.rst
Modified:
/gpapers/gPapers/models.py
/sphinx-dox/source/reference.rst

=======================================
--- /dev/null
+++ /sphinx-dox/source/reference-model.rst Tue Jun 26 09:22:12 2012
@@ -0,0 +1,50 @@
+The Django model
+================
+
+.. currentmodule:: gpapers.gPapers.models
+
+The model objects represent the corresponding tables in the database.
+
+.. autoclass:: Publisher
+ :members:
+
+.. autoclass:: Source
+ :members:
+
+.. autoclass:: Organization
+ :members:
+
+.. autoclass:: Author
+ :members:
+
+.. autoclass:: Sponsor
+ :members:
+
+.. autoclass:: Paper
+ :members:
+
+.. autoclass:: Reference
+ :members:
+
+.. autoclass:: Bookmark
+ :members:
+
+.. autoclass:: Playlist
+ :members:
+
+"Virtual models"
+----------------
+These objects are simple dummy classes that can be used for search results
which
+do not yet have an equivalent in the database.
+
+.. autoclass:: VirtualPaper
+ :members:
+
+.. autoclass:: VirtualAuthor
+ :members:
+
+.. autoclass:: VirtualSource
+ :members:
+
+
+
=======================================
--- /gpapers/gPapers/models.py Tue Jun 26 04:09:00 2012
+++ /gpapers/gPapers/models.py Tue Jun 26 09:22:12 2012
@@ -287,7 +287,7 @@

class VirtualPaper(object):
''' An object that can be treated for table display purposes as if it
were
- an already existing `Paper` object, i.e. it provides a title attribute
etc.
+ an already existing :class:`Paper` object, i.e. it provides a title
attribute etc.
'''

class VirtualSet(object):
=======================================
--- /sphinx-dox/source/reference.rst Tue Jun 26 08:36:55 2012
+++ /sphinx-dox/source/reference.rst Tue Jun 26 09:22:12 2012
@@ -7,5 +7,6 @@
reference-gpapers
reference-logger
reference-importer
+ reference-model

