Download File from Google Docs using wget

style-sheets

Jul 27, 2011, 1:41:53 AM
to google-docum...@googlegroups.com
Hi guys,

I'm trying to download a file from Google Docs using wget. I was able to log in & get the download link, but when I try to download the file, both curl and wget fail.

I used this bash code:

#!/bin/bash

# Log in to Google Docs (ClientLogin) & get the auth token
sid=$(curl -s https://www.google.com/accounts/ClientLogin -d Email=us...@email.com -d Passwd=my_password -d accountType=GOOGLE -d service=writely -d Gdata-version=3.0)
# The reply is a set of KEY=value lines (SID, LSID, Auth); take the Auth value
# by key instead of relying on a fixed character offset
auth_token=$(printf '%s\n' "$sid" | sed -n 's/^Auth=//p')

document_id=0BxONkKiBZQq6ZmNjN2RjZjYtMjZhYi00MjRiLWJmN2YtNGM2ZjE4YWQ5Mzg0

# Get the document's entry (already have the file ID)
entry_xml=$(curl -s --header "Authorization: GoogleLogin auth=${auth_token}" --header "GData-Version: 3.0" "http://docs.google.com/feeds/default/private/full/${document_id}")

# Extract the download link from the src='...' attribute.
# The regex is stored in a variable: in bash 3.2 and later, a quoted
# regex after =~ is matched as a literal string, not as a regex.
src_re="src='([^']+)"
if [[ $entry_xml =~ $src_re ]]; then
    download_url=${BASH_REMATCH[1]}
else
    echo "No match found"
    exit 1
fi

# Download file & save it to disk
wget --output-document="/full/path/file.zip" --tries=1 --server-response --connect-timeout=20 \
    --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20061201 Firefox/2.0.0.6 (Ubuntu-feisty)" \
    --verbose --debug --no-cache \
    --header "Authorization: GoogleLogin auth=${auth_token}" --header "GData-Version: 3.0" \
    "${download_url}"
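The two parsing steps in the script (pulling the token out of the ClientLogin reply and the link out of the document entry) can be checked offline, without touching the network. This is a minimal sketch, assuming the ClientLogin reply consists of SID=, LSID= and Auth= lines and that the entry carries the link in a single-quoted src attribute, as the script's regex expects. The function names extract_auth and extract_src and both sample strings are hypothetical, made up for illustration only:

```shell
#!/bin/bash
# Offline check of the two parsing steps. The sample strings below are made up
# and only mimic the shape of the real responses.

# Print the value of the Auth= line from a ClientLogin response body.
extract_auth() {
    printf '%s\n' "$1" | sed -n 's/^Auth=//p'
}

# Print the URL from the first src='...' attribute in a document entry.
# Returns 1 if there is no such attribute.
extract_src() {
    local re="src='([^']+)'"    # used unquoted after =~ so bash treats it as a regex
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

sample_login=$'SID=sid-value\nLSID=lsid-value\nAuth=auth-value'
sample_entry="<entry><content type='application/zip' src='http://example.com/file?id=1&e=download'/></entry>"

extract_auth "$sample_login"    # prints auth-value
extract_src "$sample_entry"     # prints http://example.com/file?id=1&e=download
```

Feeding saved copies of the real login reply and entry XML to these functions shows whether the token and link are extracted as intended, independently of whether the download host is reachable.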


The last wget call returns this error message:

--2011-07-27 06:01:43--  http://doc-00-34-docs.googleusercontent.com/docs/securesc/llpeaaem51v1fp6hfrue1h5jhlhqtlt4/b71h33vnabh2ssobkjmj3h5tvbea1606/1311710400000/08274386148752881479/08274386148752881479/0BxONkKiBZQq6ZmNjN2RjZjYtMjZhYi00MjRiLWJmN2YtNGM2ZjE4YWQ5Mzg0?h=04798133320467694200&e=download&gd=true
Resolving doc-00-34-docs.googleusercontent.com... 74.125.79.132
Caching doc-00-34-docs.googleusercontent.com => 74.125.79.132
Connecting to doc-00-34-docs.googleusercontent.com|74.125.79.132|:80... Closed fd 4
failed: Connection timed out.
Releasing 0x0000000015e42430 (new refcount 1).
Giving up.


The same error happens when I use curl.

The format of the download link looks correct to me (but since the link is generated dynamically, I can't just reuse it to see exactly what's wrong):
http://doc-00-34-docs.googleusercontent.com/docs/securesc/llpeaaem51v1fp6hfrue1h5jhlhqtlt4/oo0dn5v23lq85v0jlcmgtriovobgdr04/1311717600000/08274386148752881479/08274386148752881479/0BxONkKiBZQq6ZmNjN2RjZjYtMjZhYi00MjRiLWJmN2YtNGM2ZjE4YWQ5Mzg0?h=04798133320467694200&e=download&gd=true
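Since each generated link is different, one way to see what actually changes between two of them is to compare them path segment by segment. This is a minimal sketch, applied to the two links quoted in this post; diff_url_segments is a hypothetical helper, and query strings are dropped before comparing:

```shell
#!/bin/bash
# Print the path segments that differ between two URLs (query strings ignored).
# Output format: "<index>: <segment in first URL> | <segment in second URL>",
# where index 0 is "http:" and index 2 is the host name.
diff_url_segments() {
    local a=${1%%\?*} b=${2%%\?*}    # drop everything from the first "?"
    local -a sa sb
    IFS=/ read -r -a sa <<< "$a"
    IFS=/ read -r -a sb <<< "$b"
    local i n=${#sa[@]}
    (( ${#sb[@]} > n )) && n=${#sb[@]}
    for (( i = 0; i < n; i++ )); do
        if [[ ${sa[i]-} != "${sb[i]-}" ]]; then
            printf '%d: %s | %s\n' "$i" "${sa[i]-}" "${sb[i]-}"
        fi
    done
}

url1='http://doc-00-34-docs.googleusercontent.com/docs/securesc/llpeaaem51v1fp6hfrue1h5jhlhqtlt4/b71h33vnabh2ssobkjmj3h5tvbea1606/1311710400000/08274386148752881479/08274386148752881479/0BxONkKiBZQq6ZmNjN2RjZjYtMjZhYi00MjRiLWJmN2YtNGM2ZjE4YWQ5Mzg0?h=04798133320467694200&e=download&gd=true'
url2='http://doc-00-34-docs.googleusercontent.com/docs/securesc/llpeaaem51v1fp6hfrue1h5jhlhqtlt4/oo0dn5v23lq85v0jlcmgtriovobgdr04/1311717600000/08274386148752881479/08274386148752881479/0BxONkKiBZQq6ZmNjN2RjZjYtMjZhYi00MjRiLWJmN2YtNGM2ZjE4YWQ5Mzg0?h=04798133320467694200&e=download&gd=true'

diff_url_segments "$url1" "$url2"
# 6: b71h33vnabh2ssobkjmj3h5tvbea1606 | oo0dn5v23lq85v0jlcmgtriovobgdr04
# 7: 1311710400000 | 1311717600000
```

For these two links, only two path segments change; the host, the query string and the remaining segments (including the document ID) are identical.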

I googled "wget Closed fd 4" and it seems this is related to authentication, which is strange since I'm certain I was successfully logged in.

So I tried these operations, just to make sure:

1.  Log in
2.  Get the list of folders & files
3.  Get the download link for a particular file
4.  Download the file
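The four operations above can be run as a sequence that stops at the first failing step and reports its exit status, which makes explicit which stage breaks. This is a minimal sketch: run_steps and the step_* functions are hypothetical, the step bodies are stand-ins you would replace with the corresponding curl/wget calls, and step_download only simulates a failure using wget's exit status for a network failure (4):

```shell
#!/bin/bash
# Run the given step functions in order; stop at the first one that fails
# and report its exit status.
run_steps() {
    local step status
    for step in "$@"; do
        if "$step"; then
            printf 'ok      %s\n' "$step"
        else
            status=$?
            printf 'FAILED  %s (exit %d)\n' "$step" "$status"
            return "$status"
        fi
    done
}

# Hypothetical stand-ins for the four operations. Replace each body with the
# corresponding curl/wget call; step_download only simulates a failure here.
step_login()      { true; }
step_list_files() { true; }
step_get_link()   { true; }
step_download()   { return 4; }    # wget uses exit status 4 for network failures

run_steps step_login step_list_files step_get_link step_download || echo "stopped with status $?"
```

One design note: curl exits with status 0 even when the server answers with an HTTP error, unless it is given --fail (-f), so real curl-based steps need that flag for this check to catch HTTP-level failures.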

Only the last operation (downloading the file) failed; if it were a session problem, one of the earlier steps would have failed too.
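In the wget log, the failure is reported on the line that connects to port 80 of the download host, i.e. while opening the TCP connection, before any HTTP request (and therefore before any Authorization header) is sent. One way to test that stage in isolation is a bare TCP connect with a time limit. This is a minimal sketch, assuming a bash built with /dev/tcp support and the coreutils timeout command; can_connect is a hypothetical helper name:

```shell
#!/bin/bash
# Try to open a plain TCP connection to host:port within a time limit.
# Nothing is sent over the socket, so no HTTP request or credentials are involved.
# Returns 0 if the connection opened, 124 if it timed out, and another non-zero
# status otherwise (e.g. connection refused or unknown host).
can_connect() {
    local host=$1 port=$2 secs=${3:-5}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `can_connect doc-00-34-docs.googleusercontent.com 80 20 && echo "port 80 reachable"` uses the same 20-second limit as the script's --connect-timeout=20; repeating it with port 443 shows whether the same host is reachable on the HTTPS port from the same machine.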

Also, I don't think the issue is caused by using wget instead of curl (I used curl everywhere except for downloading the file). In fact, I first used this curl syntax before trying wget (both methods failed):

ANY help would be greatly appreciated,

Thanks in advance!