http://booksearch.blogspot.com/2009/08/download-over-million-public-domain.html
If you've used Google's Book Search before, you'll know it looks
possible to construct a query for pre-1923 books:
curl 'http://books.google.com/books/feeds/volumes?tbs=cd_max:Jan%2001_2%201923&q=Stevenson' \
  | xmllint --format - | less
From there it should be possible to grab the epub from the data
contained in the Atom feed. But this requires a query to dip into the
data. I was wondering if any of the data mungers on get-theinfo have
found a good way of getting actual access to the ~1 million books
that Google are making available?
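For what it's worth, here is a minimal Python sketch of the two steps above: building the same query URL as the curl command, and pulling epub links out of an Atom feed. It does not touch the network. The sample feed, the example.org URLs, the helper names, and the idea that epub downloads show up as <link> elements with type "application/epub+zip" are all my assumptions, not something I've confirmed against the real feed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

ATOM = "{http://www.w3.org/2005/Atom}"
BASE_URL = "http://books.google.com/books/feeds/volumes"

# Hypothetical feed for illustration only; the real feed layout may differ.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Sample volume</title>
    <link rel="alternate" type="text/html"
          href="http://example.org/books/VOLUME_ID"/>
    <link rel="enclosure" type="application/epub+zip"
          href="http://example.org/download/VOLUME_ID.epub"/>
  </entry>
</feed>"""


def build_query_url(term, tbs="cd_max:Jan 01_2 1923"):
    """Rebuild the query URL used in the curl example."""
    # quote_via=quote encodes spaces as %20; safe=":" keeps the colon literal.
    params = urlencode({"tbs": tbs, "q": term}, quote_via=quote, safe=":")
    return BASE_URL + "?" + params


def epub_links(feed_xml):
    """Return the href of every link in the feed that is typed as an epub."""
    root = ET.fromstring(feed_xml)
    links = []
    for entry in root.iter(ATOM + "entry"):
        for link in entry.findall(ATOM + "link"):
            # Assumption: epub downloads are marked with this MIME type.
            if link.get("type") == "application/epub+zip":
                links.append(link.get("href"))
    return links


if __name__ == "__main__":
    print(build_query_url("Stevenson"))
    print(epub_links(SAMPLE_FEED))
```

This still only reaches volumes that match a search term, so it doesn't answer the bulk-access question.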
//Ed
> --
> [from the http://groups.google.com/group/get-theinfo mailing list]
>
Has Google advanced any rationale for not allowing download of works
that are clearly out of copyright?
If not -- sweat of the brow is not a legal defense, so how about using a
decentralized system to download from a large number of IPs?
People interested in donating IPs to such a project should email me off-list.
Could this be done with a simple 2 step html/json Web page for
sympathisers to use?
1. download a file or a few
2. upload it somewhere more communal...
Dan
BTW, if anyone has their eye on the little "Export as XML" feature for
Google Books bookshelves, be forewarned that it includes a minuscule
amount of information. It just has the title, author, and Google ID,
no publication info or anything else to help disambiguate or identify
the volume.
Tom
I'm not sure if you've noticed it, but the Google Books API includes
some useful DC metadata like:
<dc:creator>Richard Ambrosini</dc:creator>
<dc:creator>Richard Dury</dc:creator>
<dc:date>2006</dc:date>
<dc:description>As the editors point out in their Introduction,
Stevenson reinvented the “personal essay” and the “walking tour
essay,” in texts of ironic stylistic ...</dc:description>
<dc:format>377 pages</dc:format>
<dc:format>book</dc:format>
<dc:identifier>z2Yf1FX02EkC</dc:identifier>
<dc:identifier>ISBN:0299212246</dc:identifier>
<dc:identifier>ISBN:9780299212247</dc:identifier>
<dc:publisher>Univ of Wisconsin Pr</dc:publisher>
<dc:subject>Literary Criticism</dc:subject>
<dc:title>Robert Louis Stevenson</dc:title>
<dc:title>writer of boundaries</dc:title>
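As a rough sketch of how those fields could be collected, the Python below groups the dc: elements of one entry into lists, since creator, identifier, and title repeat. The wrapping <entry> element, the dc namespace URI, and the helper names are my assumptions; only the field values come from the output above (description omitted for brevity).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: the namespace URI bound to the dc: prefix in the feed.
DC_NS = "http://purl.org/dc/terms"

# Hypothetical wrapper around the dc: fields quoted above.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dc="http://purl.org/dc/terms">
  <dc:creator>Richard Ambrosini</dc:creator>
  <dc:creator>Richard Dury</dc:creator>
  <dc:date>2006</dc:date>
  <dc:format>377 pages</dc:format>
  <dc:format>book</dc:format>
  <dc:identifier>z2Yf1FX02EkC</dc:identifier>
  <dc:identifier>ISBN:0299212246</dc:identifier>
  <dc:identifier>ISBN:9780299212247</dc:identifier>
  <dc:publisher>Univ of Wisconsin Pr</dc:publisher>
  <dc:subject>Literary Criticism</dc:subject>
  <dc:title>Robert Louis Stevenson</dc:title>
  <dc:title>writer of boundaries</dc:title>
</entry>"""


def dc_metadata(entry_xml):
    """Group the dc: children of an entry by field name, keeping repeats."""
    entry = ET.fromstring(entry_xml)
    prefix = "{" + DC_NS + "}"
    fields = defaultdict(list)
    for child in entry:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            fields[name].append((child.text or "").strip())
    return dict(fields)


def isbns(fields):
    """Pick out identifiers carrying the ISBN: prefix, without the prefix."""
    return [i[len("ISBN:"):] for i in fields.get("identifier", [])
            if i.startswith("ISBN:")]


if __name__ == "__main__":
    meta = dc_metadata(SAMPLE_ENTRY)
    print(meta["creator"])
    print(isbns(meta))
```

The ISBNs and publication date are the sort of thing that would help disambiguate a volume, which the bookshelf XML export lacks.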
I like the idea of some coordinated effort to get this public domain
content replicated somehow. There are already 902,788 Google Book
titles on Internet Archive, which is a damn fine start:
http://www.archive.org/details/googlebooks
//Ed
The Google Book Search API Terms of Service say "The Google Book
Search APIs are limited to allowing you to display Google Book Search
Content on your site, and are not intended to provide you with the
ability to access other Google services or data." and also include the
rather strange "2.9 Your implementation of the Google Book Search APIs
must be made freely accessible to users." which I can't even parse
well enough to figure out what it means.
Perhaps the Internet Archive has been granted an exception, but my
reading is that anyone who wanted to help would probably need an
exception too.
Tom
I think it must mean that if your application is using the Google Book
Search API, you cannot charge for it. It has to be free.
The other interpretation could be that it has to be available on
the public internet, rather than behind a password and/or on a private
network only.
The core concept seems to be that if you are getting this stuff for
free, don't hoard or abuse it.
Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
- I think age is a very high price to pay for maturity (Tom Stoppard)