Downloading large report files with Python library

Marcel Zemp

May 25, 2017, 4:14:09 PM

to Google's DoubleClick Campaign Manager API Forum

Hello forum

I've successfully implemented an automated Python 3 workflow that downloads scheduled reports to a Linux machine on AWS using the latest API version (v2.8). Following the samples site, I use:

response = service.files().get_media(reportId=report_id, fileId=file_id).execute()

with open(file_name, 'wb') as file:
    file.write(response)

So far, so good. This works fine for nearly all report files.

But we have a few reports that can be quite large, e.g. 2.3 GB and larger. When trying to download such a file, I get memory issues:

Traceback (most recent call last):
  File "/home/ec2-user/work/bin/doubleclick-get/doubleclick-campaign-manager-get.py", line 85, in <module>
    .get_media(reportId=args.report, fileId=report_file_id).execute())
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/_helpers.py", line 133, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/googleapiclient/http.py", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/googleapiclient/http.py", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1322, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1124, in _request
    headers=headers, redirections=redirections - 1)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1322, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1072, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1054, in _conn_request
    content = response.read()
  File "/usr/lib64/python3.5/http/client.py", line 446, in read
    s = self._safe_read(self.length)
  File "/usr/lib64/python3.5/http/client.py", line 597, in _safe_read
    return b"".join(s)
MemoryError

That AWS Linux instance has 4 GB of RAM, so in principle this should be no problem. I see a similar issue on my Mac laptop with 8 GB, so it's not quite clear what goes wrong here.

Questions:

1) What is the best way to download large report files? Are there better methods than the get_media one? (BTW: is this documented somewhere? I can only find the get method.)

2) Is there a way to download these files in gzipped format (*.csv.gz)? Those CSV files compress pretty well, e.g. a 2.3 GB file (*.csv) compresses to 37 MB (*.csv.gz)! How would this be done with Python?

3) Can I stream directly into a file instead of first holding the whole response in RAM? How would this be done with Python?

Thanks for your help!

Marcel

Lakshmi Prathipati (DCM API Team)

May 26, 2017, 3:02:23 PM

to Google's DoubleClick Campaign Manager API Forum

Hi Marcel,

You need to use MediaIoBaseDownload, which fetches the file in chunks rather than reading the whole response into memory. You might find this report guide on downloading the reports helpful. Currently we don't support downloading report files in gzip format.
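As a rough sketch of what a chunked download could look like (not taken verbatim from the guide): the helper name download_report_file, the 32 MB chunk size, and the downloader_cls parameter are all assumptions made for illustration. Only MediaIoBaseDownload, next_chunk() and the files().get_media() call come from the client library and the code earlier in this thread.

```python
import io


def download_report_file(service, report_id, file_id, out_path,
                         chunk_size=32 * 1024 * 1024, downloader_cls=None):
    """Stream a report file to disk chunk by chunk.

    `service` is assumed to be an already authorized DCM API service object.
    `downloader_cls` is a hypothetical hook that lets you swap in a stand-in
    downloader; by default the real MediaIoBaseDownload is used.
    """
    if downloader_cls is None:
        # Imported lazily so the helper can be tried without the client library.
        from googleapiclient.http import MediaIoBaseDownload
        downloader_cls = MediaIoBaseDownload

    # Build the request but do NOT call execute() on it:
    # execute() is what reads the entire body into memory.
    request = service.files().get_media(reportId=report_id, fileId=file_id)

    with io.FileIO(out_path, mode='wb') as out_file:
        downloader = downloader_cls(out_file, request, chunksize=chunk_size)
        done = False
        while not done:
            # Each call fetches at most one chunk and writes it to out_file.
            status, done = downloader.next_chunk()
            if status:
                print('Downloaded %d%%' % int(status.progress() * 100))
    return out_path
```

With this pattern, memory use should stay roughly bounded by the chunk size rather than the file size, so a smaller chunk_size trades more requests for less RAM.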

Thanks,

Lakshmi, DCM API Team

Marcel Zemp

Jun 1, 2017, 5:15:06 AM

to Google's DoubleClick Campaign Manager API Forum

Hi Lakshmi

Thank you very much for your input. MediaIoBaseDownload was exactly what I was looking for. With it I was able to download the large files in chunks.

It might still be a good idea to offer the files for download in gzip format, especially since those CSV files compress really well, which would save bandwidth among other things.
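Until something like that exists, one possible client-side workaround is to compress while writing. This is only a sketch under assumptions: the helper name download_report_file_gzipped, the chunk size, and the downloader_cls parameter are made up for illustration. It relies on MediaIoBaseDownload writing each chunk to whatever file-like object it is given, here a gzip file. Note that this only saves disk space; the API still sends the data uncompressed, so it does nothing for bandwidth.

```python
import gzip


def download_report_file_gzipped(service, report_id, file_id, out_path,
                                 chunk_size=32 * 1024 * 1024,
                                 downloader_cls=None):
    """Download a report in chunks and gzip it on the fly while writing.

    `service` is assumed to be an already authorized DCM API service object.
    `downloader_cls` is a hypothetical hook for swapping in a stand-in
    downloader; by default the real MediaIoBaseDownload is used.
    """
    if downloader_cls is None:
        # Imported lazily so the helper can be tried without the client library.
        from googleapiclient.http import MediaIoBaseDownload
        downloader_cls = MediaIoBaseDownload

    request = service.files().get_media(reportId=report_id, fileId=file_id)

    # gzip.open in 'wb' mode returns a file-like object whose write()
    # compresses the bytes before they reach the disk.
    with gzip.open(out_path, 'wb') as gz_file:
        downloader = downloader_cls(gz_file, request, chunksize=chunk_size)
        done = False
        while not done:
            _, done = downloader.next_chunk()
    return out_path
```

Calling it with an out_path such as 'report.csv.gz' would leave a file that standard gzip tools can read.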

Regards Marcel

Lakshmi Prathipati (DCM API Team)

Jun 1, 2017, 2:38:10 PM

to Google's DoubleClick Campaign Manager API Forum

Hi Marcel,

I will forward your request as a feature request to the rest of the team.
