google books
9 messages
 
Ed Summers  
From: Ed Summers <e...@pobox.com>
Date: Mon, 30 Aug 2010 15:11:34 -0400
Local: Mon, Aug 30 2010 3:11 pm
Subject: google books
I imagine y'all have seen the various press releases about Google
releasing a million public domain books as epubs, e.g.

  http://booksearch.blogspot.com/2009/08/download-over-million-public-d...

If you've seen Google's Book Search before, it looks possible to
construct a query of pre-1922 books:

  curl 'http://books.google.com/books/feeds/volumes?tbs=cd_max:Jan%2001_2%201...
| xmllint --format - | less

From there it should be possible to grab the epub from the data
contained in the Atom feed. But this requires a query to dip into the
data. I was wondering if any of the data mungers on get-theinfo have
found a good way for getting actual access to the ~1 million books
that Google are making available?
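
A minimal Python sketch of the "grab the epub from the Atom feed" step. It assumes the feed is standard Atom (namespace http://www.w3.org/2005/Atom) and that download links appear as `<link>` elements inside each `<entry>`. The thread doesn't say how the feed marks an epub link, so `looks_like_epub` is a hypothetical heuristic that just looks for "epub" in a link's rel, type, or href. The sample feed and its example.org URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Made-up feed for illustration; a real Book Search feed carries many more elements.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example Volume</title>
    <link rel="alternate" type="text/html" href="http://example.org/books?id=abc"/>
    <link rel="http://example.org/rel/epub" type="application/epub+zip"
          href="http://example.org/download/abc.epub"/>
  </entry>
</feed>"""


def entry_links(feed_xml):
    """Return a list of (title, [link attribute dicts]) for each Atom <entry>."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.iter(ATOM_NS + "entry"):
        title = entry.findtext(ATOM_NS + "title", default="")
        links = [dict(link.attrib) for link in entry.findall(ATOM_NS + "link")]
        results.append((title, links))
    return results


def looks_like_epub(link):
    """Hypothetical heuristic: treat a link as an epub if "epub" shows up in
    its rel, type or href. Check against a real feed before relying on this."""
    return any("epub" in link.get(key, "").lower() for key in ("rel", "type", "href"))


for title, links in entry_links(SAMPLE_FEED):
    epub_urls = [link["href"] for link in links if looks_like_epub(link)]
    print(title, epub_urls)  # Example Volume ['http://example.org/download/abc.epub']
```

Keeping the whole attribute dict for each link means you can inspect what a real feed actually sends before deciding on a stricter filter than the "epub" substring match.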

//Ed


 
Aaron Swartz  
From: Aaron Swartz <m...@aaronsw.com>
Date: Mon, 30 Aug 2010 15:13:29 -0400
Local: Mon, Aug 30 2010 3:13 pm
Subject: Re: [get.theinfo] google books
Some people have been uploading these to the Internet Archive, but
there's no easy way to get them from Google in bulk -- they block IPs
that hit it too frequently and put up captchas after a couple
downloads.


 
Jeremy Dunck  
From: Jeremy Dunck <jdu...@gmail.com>
Date: Mon, 30 Aug 2010 15:12:12 -0500
Local: Mon, Aug 30 2010 4:12 pm
Subject: Re: [get.theinfo] google books

On Mon, Aug 30, 2010 at 2:13 PM, Aaron Swartz <m...@aaronsw.com> wrote:
> Some people have been uploading these to the Internet Archive, but
> there's no easy way to get them from Google in bulk -- they block IPs
> that hit it too frequently and put up captchas after a couple
> downloads.

Has Google advanced any rationale for not allowing download of works
that are clearly out of copyright?

If not -- sweat of the brow is not a legal defense, so how about using a
decentralized system to download from a large number of IPs?


 
Aaron Swartz  
From: Aaron Swartz <m...@aaronsw.com>
Date: Mon, 30 Aug 2010 16:14:19 -0400
Local: Mon, Aug 30 2010 4:14 pm
Subject: Re: [get.theinfo] google books

>> Some people have been uploading these to the Internet Archive, but
>> there's no easy way to get them from Google in bulk -- they block IPs
>> that hit it too frequently and put up captchas after a couple
>> downloads.

> Has Google advanced any rationale for not allowing download of works
> that are clearly out of copyright?

> If not -- sweat of the brow is not a legal defense, so how about using a
> decentralized system to download from a large number of IPs?

People interested in donating IPs to such a project should email me off-list.

 
Dan Brickley  
From: Dan Brickley <dan...@danbri.org>
Date: Mon, 30 Aug 2010 22:17:18 +0200
Local: Mon, Aug 30 2010 4:17 pm
Subject: Re: [get.theinfo] google books

On Mon, Aug 30, 2010 at 10:12 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
> On Mon, Aug 30, 2010 at 2:13 PM, Aaron Swartz <m...@aaronsw.com> wrote:
>> Some people have been uploading these to the Internet Archive, but
>> there's no easy way to get them from Google in bulk -- they block IPs
>> that hit it too frequently and put up captchas after a couple
>> downloads.

> Has Google advanced any rationale for not allowing download of works
> that are clearly out of copyright?

> If not -- sweat of the brow is not a legal defense, so how about using a
> decentralized system to download from a large number of IPs?

Could this be done with a simple two-step HTML/JSON web page for
sympathisers to use?

1. download a file or few
2. upload it somewhere more communal...

Dan


 
Tom Morris  
From: Tom Morris <tfmor...@gmail.com>
Date: Mon, 30 Aug 2010 16:56:21 -0400
Local: Mon, Aug 30 2010 4:56 pm
Subject: Re: [get.theinfo] google books
The threshold for "too frequently" appears to be incredibly low for
books.google.com.  I just got blocked after a half dozen searches
using variants of a single pair of terms resulting in ~65 hits for
books that I *manually* added to a Google Books bookshelf one by one.
That's without even downloading any of the books!  If they are being
that aggressive about blocking, it's going to take a massive number of
IPs to do anything useful.

BTW, if anyone has their eye on the little "Export as XML" feature for
Google Books bookshelves, be forewarned that it includes a minuscule
amount of information.  It just has the title, author, and Google ID,
no publication info or anything else to help disambiguate or identify
the volume.

Tom


 
Ed Summers  
From: Ed Summers <e...@pobox.com>
Date: Mon, 30 Aug 2010 17:02:25 -0400
Local: Mon, Aug 30 2010 5:02 pm
Subject: Re: [get.theinfo] google books

On Mon, Aug 30, 2010 at 4:56 PM, Tom Morris <tfmor...@gmail.com> wrote:
> BTW, if anyone has their eye on the little "Export as XML" feature for
> Google Books bookshelves, be forewarned that it includes a minuscule
> amount of information.  It just has the title, author, and Google ID,
> no publication info or anything else to help disambiguate or identify
> the volume.

I'm not sure if you've noticed it, but the Google Books API includes
some useful Dublin Core (DC) metadata, like:

    <dc:creator>Richard Ambrosini</dc:creator>
    <dc:creator>Richard Dury</dc:creator>
    <dc:date>2006</dc:date>
    <dc:description>As the editors point out in their Introduction,
Stevenson reinvented the “personal essay” and the “walking tour
essay,” in texts of ironic stylistic ...</dc:description>
    <dc:format>377 pages</dc:format>
    <dc:format>book</dc:format>
    <dc:identifier>z2Yf1FX02EkC</dc:identifier>
    <dc:identifier>ISBN:0299212246</dc:identifier>
    <dc:identifier>ISBN:9780299212247</dc:identifier>
    <dc:publisher>Univ of Wisconsin Pr</dc:publisher>
    <dc:subject>Literary Criticism</dc:subject>
    <dc:title>Robert Louis Stevenson</dc:title>
    <dc:title>writer of boundaries</dc:title>
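
A rough Python sketch of pulling those DC fields out of a feed entry and grouping repeated elements (two creators, two titles, several identifiers) into lists. The namespace URI bound to the dc: prefix is an assumption here; check the xmlns:dc declaration on a real feed and pass its value as `ns` if it differs. The sample entry reuses values from the snippet above, wrapped in a minimal Atom `<entry>`, and `isbns` is a hypothetical helper that relies on identifiers being written as "ISBN:...", as they are in that snippet.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: the URI the feed binds to the dc: prefix. Verify against a real feed.
DC_NS = "http://purl.org/dc/terms"

SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/terms">
  <dc:creator>Richard Ambrosini</dc:creator>
  <dc:creator>Richard Dury</dc:creator>
  <dc:date>2006</dc:date>
  <dc:identifier>z2Yf1FX02EkC</dc:identifier>
  <dc:identifier>ISBN:0299212246</dc:identifier>
  <dc:identifier>ISBN:9780299212247</dc:identifier>
  <dc:title>Robert Louis Stevenson</dc:title>
  <dc:title>writer of boundaries</dc:title>
</entry>"""


def dc_fields(entry_xml, ns=DC_NS):
    """Group every dc:* element in an entry by local name -> list of text values."""
    prefix = "{" + ns + "}"
    fields = defaultdict(list)
    for el in ET.fromstring(entry_xml).iter():
        if el.tag.startswith(prefix) and el.text:
            fields[el.tag[len(prefix):]].append(el.text.strip())
    return dict(fields)


def isbns(fields):
    """Hypothetical helper: ISBNs from dc:identifier values written as "ISBN:..."."""
    return [v.split(":", 1)[1] for v in fields.get("identifier", []) if v.startswith("ISBN:")]


fields = dc_fields(SAMPLE_ENTRY)
print(fields["creator"])           # ['Richard Ambrosini', 'Richard Dury']
print(": ".join(fields["title"]))  # Robert Louis Stevenson: writer of boundaries
print(isbns(fields))               # ['0299212246', '9780299212247']
```

Lists rather than single strings matter here: the ISBNs and the second title are exactly the kind of disambiguating detail that the bookshelf "Export as XML" output lacks.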

I like the idea of some coordinated effort to get this public domain
content replicated somehow. There are already 902,788 Google Book
titles on Internet Archive, which is a damn fine start:

  http://www.archive.org/details/googlebooks
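
A small Python sketch for listing what is already in that collection before duplicating any effort. It assumes archive.org's `advancedsearch.php` endpoint with `q`, `fl[]`, `rows`, `page` and `output=json` parameters, and a JSON reply shaped like `{"response": {"numFound": ..., "docs": [...]}}`. The collection name comes from the URL above. The sample response is invented, and the code only builds the URL and parses a body, so it runs offline.

```python
import json
from urllib.parse import urlencode

# Assumption: archive.org's advanced search endpoint and these query parameters;
# check the Internet Archive's own documentation before relying on them.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def search_url(collection="googlebooks", rows=100, page=1):
    """Build a URL asking for one page of item identifiers in a collection."""
    params = [
        ("q", "collection:" + collection),
        ("fl[]", "identifier"),
        ("rows", str(rows)),
        ("page", str(page)),
        ("output", "json"),
    ]
    return ADVANCED_SEARCH + "?" + urlencode(params)


def parse_identifiers(body):
    """Return (total hit count, identifiers on this page) from a JSON response body."""
    response = json.loads(body)["response"]
    return response["numFound"], [doc["identifier"] for doc in response["docs"]]


print(search_url(rows=50))
# Invented response body, just to show the expected shape.
sample = '{"response": {"numFound": 2, "start": 0, "docs": [{"identifier": "a"}, {"identifier": "b"}]}}'
print(parse_identifiers(sample))  # (2, ['a', 'b'])
```

To fetch real data you would pass the URL to `urllib.request.urlopen`, decode the body, hand it to `parse_identifiers`, and step `page` until you've seen `numFound` identifiers.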

//Ed


 
Tom Morris  
From: Tom Morris <tfmor...@gmail.com>
Date: Tue, 31 Aug 2010 14:07:35 -0400
Local: Tues, Aug 31 2010 2:07 pm
Subject: Re: [get.theinfo] google books

The Google Book Search API Terms of Service say "The Google Book
Search APIs are limited to allowing you to display Google Book Search
Content on your site, and are not intended to provide you with the
ability to access other Google services or data." and also include the
rather strange "2.9 Your implementation of the Google Book Search APIs
must be made freely accessible to users." which I can't even parse
well enough to figure out what it means.

Perhaps the Internet Archive has been granted an exception, but my
reading is that anyone who wanted to help would probably need an
exception too.

Tom


 
Alexandre Rafalovitch  
From: Alexandre Rafalovitch <arafa...@gmail.com>
Date: Wed, 29 Sep 2010 09:40:47 -0400
Local: Wed, Sep 29 2010 9:40 am
Subject: Re: [get.theinfo] google books

On Tue, Aug 31, 2010 at 2:07 PM, Tom Morris <tfmor...@gmail.com> wrote:
> "2.9 Your implementation of the Google Book Search APIs
> must be made freely accessible to users." which I can't even parse
> well enough to figure out what it means.

I think it must mean that if your application is using Google Book
Search API, you cannot charge for it. It has to be free.

The other interpretation could be that it has to be available on
public internet, rather than under password and/or on private network
only.

The core concept seems to be that if you are getting this stuff for
free, don't hoard or abuse it.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
- I think age is a very high price to pay for maturity (Tom Stoppard)


 