So far, so good. This works fine for nearly all report files.
But a few of our reports are quite large, e.g. 2.3 GB or more. When I try to download such a file, I run into memory problems:
Traceback (most recent call last):
  File "/home/ec2-user/work/bin/doubleclick-get/doubleclick-campaign-manager-get.py", line 85, in <module>
    .get_media(reportId=args.report, fileId=report_file_id).execute())
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/_helpers.py", line 133, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/googleapiclient/http.py", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/googleapiclient/http.py", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1322, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1124, in _request
    headers=headers, redirections=redirections - 1)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1322, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1072, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/ec2-user/python-env/doubleclick-get/local/lib/python3.5/site-packages/httplib2/__init__.py", line 1054, in _conn_request
    content = response.read()
  File "/usr/lib64/python3.5/http/client.py", line 446, in read
    s = self._safe_read(self.length)
  File "/usr/lib64/python3.5/http/client.py", line 597, in _safe_read
    return b"".join(s)
MemoryError
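
For context, the failing line 85 is essentially the plain synchronous download pattern, roughly like this (abbreviated; `service` is the API client object built earlier in the script, and `out_path` is just a placeholder name for the local target file):

    # Rough sketch of the current download step: the whole report body is
    # returned by execute() and therefore has to fit into memory at once.
    contents = (service.files()
                .get_media(reportId=args.report, fileId=report_file_id)
                .execute())
    with open(out_path, 'wb') as f:  # out_path: local *.csv target (placeholder)
        f.write(contents)
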
On that AWS Linux instance I have 4 GB of RAM, so in principle that should not be a problem. I see a similar issue on my Mac laptop with 8 GB, so it is not quite clear to me what goes wrong here.
2) Is there a way to download these files in gzipped format (*.csv.gz)? Those CSV files compress pretty well, e.g. a 2.3 GB *.csv shrinks to 37 MB as *.csv.gz! How would this be done with Python? (A rough sketch of what I mean follows below.)
3) Can I stream the download directly into a file instead of first holding the whole response in RAM? How would this be done with Python? (A second sketch for this follows below as well.)
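
What I have in mind for (2) is bypassing the client library for the media download itself and asking the server for a gzip-encoded body that is written to disk as-is. A rough, unverified sketch with requests; the media URL format, the API version, and the "gzip" marker in the User-Agent are assumptions on my part:

    import requests

    # Assumed media URL of a DCM report file; the exact path/version may differ.
    url = ('https://www.googleapis.com/dfareporting/v3.0/reports/{}/files/{}?alt=media'
           .format(args.report, report_file_id))

    headers = {
        # `credentials` would be the oauth2client credentials used elsewhere in the script.
        'Authorization': 'Bearer ' + credentials.get_access_token().access_token,
        'Accept-Encoding': 'gzip',
        'User-Agent': 'doubleclick-get (gzip)',  # Google APIs reportedly want 'gzip' in the UA
    }

    resp = requests.get(url, headers=headers, stream=True)
    resp.raise_for_status()
    with open('report.csv.gz', 'wb') as out:
        # decode_content=False keeps the body compressed, so it is stored as *.csv.gz.
        for chunk in resp.raw.stream(64 * 1024, decode_content=False):
            out.write(chunk)
    resp.close()
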
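For (3), my understanding is that google-api-python-client ships MediaIoBaseDownload, which pulls a get_media request in chunks into any file-like object instead of materializing the whole response; something along these lines (chunk size picked arbitrarily):

    import io
    from googleapiclient.http import MediaIoBaseDownload

    request = service.files().get_media(reportId=args.report, fileId=report_file_id)

    with io.FileIO('report.csv', 'wb') as out:
        # Each next_chunk() call fetches one ranged chunk (here 32 MB), so only
        # that much of the report is held in memory at a time.
        downloader = MediaIoBaseDownload(out, request, chunksize=32 * 1024 * 1024)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            if status:
                print('Downloaded {:.0f}%'.format(status.progress() * 100))

If that is the right tool here, it would presumably also allow writing into gzip.open('report.csv.gz', 'wb') to get a compressed file locally for (2), at the cost of still transferring the uncompressed bytes.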