META: In-house paper-related software for sharing and collaborative reading

Bryan Bishop

Jul 11, 2009, 10:01:08 AM

to papers, kan...@gmail.com, diytrans...@googlegroups.com

Hey all,

I have been meaning to work on some software for collaborative paper
reading/sharing. Many years ago I was working on a project I called
"AutoScholar", a terrible Perl script that automatically retrieved
papers by title from Google Scholar through the ezproxy access points
at the local university. It was terrible and should be shot and put
down, never to return to this world. Zotero actually has some features
that do the job much better, but they are all hidden away in a GUI;
maybe they can be extracted and massaged into surfraw modules, or
something.

One "improvement", if you could call it an improvement, that I made
last year was called "Autozen 2008" for some reason. What the
"autozen" script did was flip through different papers from my
collection on one of my free monitors using xpdf. Each page would
display for 4 seconds, and all pages from a paper would be displayed
before going on to the next paper. It was last running for me back
when I had a setup of five 21" CRT monitors in a dorm just a crosswalk
away from the UT Austin campus. Good times.
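A minimal sketch of that display order, assuming the collection is already known as a list of (path, page count) pairs. The function name and the file names are made up for illustration, and nothing here drives xpdf itself; it only works out which page should be on screen when.

```python
from typing import Iterator, List, Tuple

SECONDS_PER_PAGE = 4  # the per-page delay described above


def playback_schedule(
    papers: List[Tuple[str, int]],
    seconds_per_page: int = SECONDS_PER_PAGE,
) -> Iterator[Tuple[int, str, int]]:
    """Yield (start_second, pdf_path, page_number) in display order.

    Every page of one paper is shown before moving on to the next paper.
    """
    start = 0
    for path, page_count in papers:
        for page in range(1, page_count + 1):
            yield start, path, page
            start += seconds_per_page


if __name__ == "__main__":
    # Hypothetical collection: (file name, number of pages).
    collection = [("first.pdf", 2), ("second.pdf", 3)]
    for start, path, page in playback_schedule(collection):
        print(f"t={start:>3}s  {path}  page {page}")
```

Keeping the schedule separate from the viewer would make it easier to bolt on a history log or back/forward later, since those only need the list of (paper, page) steps.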

Originally my thinking was this: I tend to watch a lot of TV in the
background when I feel like being distracted while working on the
computer. However, there is usually nothing really interesting on
television. So instead of television, why not just show scientific
papers from my collection? That way, if I am going to be distracted, I
might as well be distracted by something moderately interesting or
informative. Think of it as a "science channel" of sorts.

There was no back button, no forward button, no bookmark button, no
"pause when the user is actually reading the page" feature, no log of
what had been already displayed, nothing.

In the meantime, I have been manually moving files back and forth
between the servers instead of using rsync. I know, I know, I'm
terrible. At the same time, I have also been announcing the papers to
pap...@postbiota.org and #hplusroadmap on irc.freenode.net (I know,
I'm terrible). At one point I was also using Zotero for Firefox to add
to my collection.

Despite all of this software, I haven't made progress, but if I can be
sufficiently motivated by people asking for particular features, or
something, then maybe I'll actually get something done.

Features that I would like:

(1) CLI Zotero interface for grabbing particular papers, DOIs, etc.

(2) CLI Mendeley interface (maybe; they had some interesting ability
to extract DOIs from PDFs, and consequently to grab metadata over the
web for that article)

(3) Zotero sqlite database integration (easy enough with some python
sqlite3 wrappers)

(4) IRC chat bot announcer for papers added to the collection

(5) rsync for anything added via Zotero (Zotero has a sync capability,
but I haven't played with it yet)

(6) pushing to citeulike.org and other citation services (maybe also
del.icio.us and twitter, and my RSS feed on my server)

(6.5) consideration of integration with ISI Web of Knowledge (or Web
of Science) for citation counting

(7) sending via mailing list. In the case of Zotero integration, all
emails should follow a certain format and definitely include the title
of the paper, a link to the PDF, and an abstract for the paper, and if
at all possible, BibTeX citing that particular paper.

(8) a notes-distribution system for any paper that I take notes on so
that others can share in my note-taking. I usually write a
"some.file.pdf.txt" file in the papers/ dir on my server, where I put
notes for a particular paper.

(9) "autozen" integration: anybody with rsync to the server should be
able to cycle through the latest papers as a screensaver or for a
reading session. There should also be back/forward buttons, pause when
the user is active over the screen, a history of previous papers that
have been displayed, and also a processactogram displaying when the
user was actually reading what based off of mouse-over or window-focus
logs (a processactogram is a script that a friend wrote that tracks
which apps had the user's focus during which time periods)

(10) the ability to let anybody sync up files with the right
information, either by dropping a file, a DOI, or something like that
- perhaps also by uploading

(11) bibliography integration: think of a bibliography as a "playlist"
for the "science channel"/autozen software.

(12) some minor web interface to manage all of this on the localhost (maybe)

(13) git tracking of log statuses, but not of the PDFs (which would be stupid)

(14) possibly some YAML format for storing BibTeX since YAML is easier
on the eyes and more fun to play with in python than yucky TeX.

(15) automatic periodic LaTeX-generated-PDF-output to show some fancy
stats on reading history, reading log, trends, whatever. (I don't have
any LaTeX templates lying around, so someone would have to help me
with this one.)
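As a rough sketch of the email format in (7), using only the Python standard library: the "[paper]" subject prefix, the body layout, the function name, and the example.org addresses are all assumptions for illustration, not an agreed format. Actually sending the message (e.g. via smtplib) is left out.

```python
from email.message import EmailMessage


def build_announcement(
    title: str,
    pdf_url: str,
    abstract: str,
    bibtex: str = "",
    sender: str = "papers-bot@example.org",   # placeholder address
    recipient: str = "papers@example.org",    # placeholder address
) -> EmailMessage:
    """Build a plain-text announcement with title, PDF link, abstract, BibTeX."""
    msg = EmailMessage()
    msg["Subject"] = f"[paper] {title}"  # assumed subject convention
    msg["From"] = sender
    msg["To"] = recipient

    parts = [f"Title: {title}", f"PDF: {pdf_url}", "", "Abstract:", abstract]
    if bibtex:
        # BibTeX is optional ("if at all possible").
        parts += ["", "BibTeX:", bibtex]
    msg.set_content("\n".join(parts))
    return msg


if __name__ == "__main__":
    example = build_announcement(
        "An Example Paper",
        "http://example.org/papers/example.pdf",
        "A one-paragraph abstract would go here.",
        "@article{example2009,\n  title = {An Example Paper},\n}",
    )
    print(example)
```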

(also, BibTeX (and consequently endnote?) integration is a must, of course)
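For the BibTeX side, and the YAML idea in (14), here is a minimal sketch of rendering one entry from a plain Python dict, which is roughly what a YAML loader would hand back. The field names, the "type" key, and the citation key are hypothetical; no escaping of special TeX characters is attempted.

```python
from typing import Any, Dict


def to_bibtex(key: str, entry: Dict[str, Any]) -> str:
    """Render a dict as a single BibTeX entry.

    The optional "type" key selects the entry type (default: article);
    every other key becomes a field, in insertion order.
    """
    entry_type = entry.get("type", "article")
    lines = [f"@{entry_type}{{{key},"]
    for name, value in entry.items():
        if name == "type":
            continue
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record, shaped like something loaded from a YAML file.
    record = {
        "type": "article",
        "author": "Doe, Jane",
        "title": "An Example Paper",
        "year": 2009,
    }
    print(to_bibtex("doe2009", record))
```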

Is there anything else that I am missing? Please feel free to share
your ideas or feature requests for this sort of system.

One other thing that I should mention while I am at it. Back in 2006,
I was working on a Firefox extension that I called "autogoogler".
IIRC, that name is already taken by another extension or Greasemonkey
script, one of the few that automatically load the next page of
results for Google search and other sites (like AutoPager). My
extension was mostly made out of Greasemonkey scripts. It
automatically scrolled through a page of Google search results and
prefixed each search result with a number. As you scrolled through the
document, you could type numbers on your keypad; the numbers would
display in big red digits in the middle (or on the side) of the screen
to indicate which result you selected, and in the end you would have a
list of search results that you were interested in. They could then be
opened up in new tabs in the background, or (this was only planned)
saved to a file for later use.

Since then, I've been wanting to do something similar: a Firefox
extension that acts like a "search assistant" by keeping track of each
search query made over Google Scholar and of the variations on the
queries that you have used. By coupling this to WordNet, generative
(string) grammars could be used to figure out how to most effectively
generate new (bountiful) queries to search particular regions of the
scientific literature. I have since learned of the ISI Web of
Knowledge, which has citation information (like "most cited paper")
for various queries, so some integration into that system would be
absolutely delightful. This is all about search, however, not about
paper reading and distribution, so it's somewhat secondary to this
email, although worth reiterating for those who don't know about these
plans and ideas.
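A toy sketch of the query-variation idea, assuming a small hand-written synonym table stands in for WordNet. This is plain word substitution, not a real generative grammar, and the function name and example words are made up.

```python
from itertools import product
from typing import Dict, List


def query_variations(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the query plus every variant made by swapping in synonyms.

    Each word is replaced independently, so the result is the Cartesian
    product of the options for each word; the original query comes first.
    """
    options = [[word] + synonyms.get(word, []) for word in query.split()]
    return [" ".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    # Hypothetical synonym table; a real version might be filled from WordNet.
    table = {"gene": ["dna"], "synthesis": ["assembly"]}
    for variant in query_variations("gene synthesis", table):
        print(variant)
```

A search assistant could log which variants were actually run, so that the next session only proposes the ones not yet tried.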

- Bryan
http://heybryan.org/
1 512 203 0507

Mr. Gunn

Jul 11, 2009, 12:13:53 PM

to DIYh+

Have you tried the latest release of Mendeley, Bryan? There's no CLI,
but you can now embed annotated collections of papers from your
Mendeley library, which might be of some use for you. It also syncs
your desktop library with the web automatically, so no need to mess
about with rsync, unless you really want to. They also put out some
literature usage stats: http://www.mendeley.com/stats

For activity logging, I use TimeSnapper, which not only records which
window has focus but also takes a tiny PNG snapshot of the active
window. It's pretty useful for going back through what you did when.

Best of luck with your projects!

Mr. Gunn
http://synthesis.williamgunn.org
