Downloading large report files with Python library

Marcel Zemp

May 25, 2017, 4:14:09 PM
to Google's DoubleClick Campaign Manager API Forum
Hello forum

I've successfully implemented an automated Python 3 workflow that downloads scheduled reports to a Linux machine on AWS, using the latest API version (v2.8). Following the samples site, I use

response = service.files().get_media(reportId=report_id, fileId=file_id).execute()

with open(file_name, 'wb') as file:
    file.write(response)

So far, so good. This works fine for nearly all report files. 

But we have a few reports that can be quite large, e.g. 2.3 GB and larger. When trying to download such a file, I get memory issues:

Traceback (most recent call last):
  File "/home/ec2-user/work/bin/doubleclick-get/doubleclick-campaign-manager-get.py", line 85, in <module>
    .get_media(reportId=args.report, fileId=report_file_id).execute())
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/_helpers.py", line 133, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/googleapiclient/http.py", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/googleapiclient/http.py", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1322, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1124, in _request
    headers=headers, redirections=redirections - 1)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1322, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1072, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1054, in _conn_request
    content = response.read()
  File "/usr/lib64/python3.5/http/client.py", line 446, in read
    s = self._safe_read(self.length)
  File "/usr/lib64/python3.5/http/client.py", line 597, in _safe_read
    return b"".join(s)
MemoryError

That AWS Linux instance has 4 GB of RAM, so in principle this should not be a problem. I see a similar issue on my Mac laptop with 8 GB, so it's not quite clear what goes wrong here.

Questions:

1) What is the best way to download large report files? Are there better methods than the get_media one? (BTW: is this documented somewhere? I can only find the get method.)
2) Is there a way to download these files in gzipped format (*.csv.gz)? For example, those CSV files compress pretty well, e.g. 2.3 GB (*.csv) compress to 37 MB (*.csv.gz)! How would this be done with Python?
3) Can I stream directly into a file instead of saving the response first into RAM? How would this be done with Python?

Thanks for your help!

Marcel


Lakshmi Prathipati (DCM API Team)

May 26, 2017, 3:02:23 PM
to Google's DoubleClick Campaign Manager API Forum
Hi Marcel,

You need to use MediaIoBaseDownload, which fetches the file in chunks rather than reading the whole response into memory. You might find the report guide on downloading report files helpful. Currently we don't support downloading report files in gzip format.
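
For readers following along, a chunked download with MediaIoBaseDownload might look roughly like the sketch below. It assumes `service` is the same authorized API client used in the original post. The function name `download_report_file` and the 32 MB chunk size are illustrative choices, not something from the thread or the official samples. The client library is imported inside the function only so the snippet loads even where the library is not installed.

```python
import io


def download_report_file(service, report_id, file_id, file_name,
                         chunk_size=32 * 1024 * 1024):
    """Stream a report file to disk in chunks instead of holding it in RAM.

    Hypothetical helper; the name and default chunk size are assumptions.
    """
    # Lazy import so this module can be loaded without the client library.
    from googleapiclient.http import MediaIoBaseDownload

    # Build the request but do NOT call .execute() on it: execute() reads
    # the entire response body into memory, which caused the MemoryError.
    request = service.files().get_media(reportId=report_id, fileId=file_id)

    with io.FileIO(file_name, mode='wb') as out_file:
        downloader = MediaIoBaseDownload(out_file, request,
                                         chunksize=chunk_size)
        done = False
        while not done:
            # Each call fetches one chunk and writes it to out_file.
            status, done = downloader.next_chunk()
            if status:
                print('Downloaded %d%%' % int(status.progress() * 100))
```

Memory use should then stay around one chunk rather than the full file size, so a smaller `chunk_size` trades more requests for a lower peak.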

Thanks,
Lakshmi, DCM API Team

Marcel Zemp

Jun 1, 2017, 5:15:06 AM
to Google's DoubleClick Campaign Manager API Forum
Hi Lakshmi

Thank you very much for your input. MediaIoBaseDownload was exactly what I was looking for. I was able to implement a way to download the large files in chunks.

It might still be a good idea to offer downloads of the files in gzip format, especially since those CSV files compress really well, which would save bandwidth.
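
As a stopgap on the client side, the chunks can be compressed while they are written. This is only a sketch: it saves local disk space but not bandwidth, since the API still sends the uncompressed CSV. It relies on MediaIoBaseDownload only needing a file-like object with a `write()` method, which `gzip.open` provides. The function name `download_report_file_gzipped` and the chunk size are hypothetical, and `service` is assumed to be the authorized client from the first post.

```python
import gzip


def download_report_file_gzipped(service, report_id, file_id, gz_file_name,
                                 chunk_size=32 * 1024 * 1024):
    """Download a report in chunks and gzip it locally as it arrives.

    Hypothetical helper; compression happens on the client, not the server.
    """
    # Lazy import so this module can be loaded without the client library.
    from googleapiclient.http import MediaIoBaseDownload

    request = service.files().get_media(reportId=report_id, fileId=file_id)

    # gzip.open in 'wb' mode returns a writable file object that compresses
    # every chunk passed to write().
    with gzip.open(gz_file_name, 'wb') as gz_file:
        downloader = MediaIoBaseDownload(gz_file, request,
                                         chunksize=chunk_size)
        done = False
        while not done:
            _, done = downloader.next_chunk()
```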

Regards Marcel

Lakshmi Prathipati (DCM API Team)

Jun 1, 2017, 2:38:10 PM
to Google's DoubleClick Campaign Manager API Forum
Hi Marcel,

I will forward your request as a feature request to the rest of the team.