“cat urls.txt | gsutil -m cp -I gs://target-bucket-name/” consistently hangs after transferring ~10,000 files


Alex Ryan

Nov 30, 2017, 7:12:29 PM
to Google Cloud SQL discuss

Karthick (Cloud Platform Support)

Dec 1, 2017, 4:30:20 PM
to Google Cloud SQL discuss

Hello Alex,


What is the combined size of the files?


Do you see the same issue when uploading from the Developers Console web interface?


Since the issue occurs particularly with large upload jobs, could you comment out “parallel_composite_upload_threshold = 0” and then check whether using parallel composite uploads improves throughput, with the following command:


$ cat urls.txt | gsutil -m -o GSUtil:parallel_composite_upload_threshold=150M cp -I gs://target-bucket-name/
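The “parallel_composite_upload_threshold = 0” line normally sits under the [GSUtil] section of the boto configuration file, and a value of 0 disables parallel composite uploads. The helper below is a minimal sketch for commenting that line out. It assumes the file is ~/.boto (gsutil may instead read the path given in the BOTO_CONFIG environment variable), and the function name comment_out_pcu_threshold is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: comment out an active
# "parallel_composite_upload_threshold = 0" line in a boto config file.
# Assumes the default path ~/.boto unless another path is passed as $1.
comment_out_pcu_threshold() {
  cfg="${1:-$HOME/.boto}"
  # -i.bak edits in place and keeps a backup copy (GNU and BSD sed).
  # Only uncommented lines whose value is exactly 0 are matched.
  sed -i.bak -E \
    's/^([[:space:]]*parallel_composite_upload_threshold[[:space:]]*=[[:space:]]*0[[:space:]]*)$/# \1/' \
    "$cfg"
}
```

Calling comment_out_pcu_threshold with no argument edits ~/.boto and leaves the original in ~/.boto.bak. Alternatively, leave the file unchanged and override the value for a single run with -o, as in the command above.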


If you still see the issue, please run gsutil perfdiag against the bucket in question to collect diagnostics on what is going on, and send me the results in a private message.


As an alternative to gsutil, you could also try the Cloud Storage Transfer Service.


Alex Ryan

Dec 13, 2017, 7:35:45 PM
to Google Cloud SQL discuss

The file (urls.txt) is ~6 MB.

It contains 78,997 records.


😈   >ls -l urls.txt
-rw-r--r-- 1 alexryan staff 6168759 Nov 20 14:48 urls.txt

😈   >wc -l urls.txt
78997 urls.txt


One thing I noticed is that ~660 of the items listed in the file did not actually exist in the source bucket and thus were not transferred.

I'm not sure whether this is relevant.
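The missing entries can be identified before copying by comparing urls.txt against a listing of the source bucket. A minimal sketch, assuming each line of urls.txt is a full gs://source-bucket/object URL in the same form that gsutil ls prints; list_missing_objects and source-bucket-name are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print the URLs in $1 that do not appear in $2.
#   $1: list of source URLs (e.g. urls.txt), one per line
#   $2: listing of objects that actually exist, one URL per line
list_missing_objects() {
  sorted_urls=$(mktemp)
  sorted_existing=$(mktemp)
  # comm needs both inputs sorted with the same collation, so force LC_ALL=C.
  LC_ALL=C sort -u "$1" > "$sorted_urls"
  LC_ALL=C sort -u "$2" > "$sorted_existing"
  # -23 drops lines unique to $2 and lines common to both,
  # leaving only the URLs that appear in $1 alone.
  LC_ALL=C comm -23 "$sorted_urls" "$sorted_existing"
  rm -f "$sorted_urls" "$sorted_existing"
}
```

For example, run gsutil ls 'gs://source-bucket-name/**' > existing.txt to list the bucket (the quotes keep the shell from expanding the wildcard), then list_missing_objects urls.txt existing.txt > missing.txt.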


The workaround of adding “-o GSUtil:parallel_composite_upload_threshold=150M” worked for me.


Specifically, this command successfully transferred ~79K files between buckets:

cat urls.txt | gsutil -m -o GSUtil:parallel_composite_upload_threshold=150M cp -n -c -I gs://bucket-name/
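In this command, -n (no-clobber) skips objects that already exist at the destination and -c continues past per-object errors such as the missing source objects, so the same command can be re-run after a stall without re-copying finished objects. Feeding the list to gsutil in smaller batches was not tried in this thread; the sketch below is offered only as a hedge against the hang at around 10,000 files. The function copy_in_batches and its default batch size of 5000 are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: copy the URLs listed in $1 to destination $2 in
# batches of $3 lines (default 5000, an arbitrary choice), running one
# gsutil process per batch so that a stall is confined to one batch.
copy_in_batches() {
  list="$1"
  dest="$2"
  size="${3:-5000}"
  workdir=$(mktemp -d)
  # split writes batch_aa, batch_ab, ... in input order.
  split -l "$size" "$list" "$workdir/batch_"
  for batch in "$workdir"/batch_*; do
    [ -e "$batch" ] || continue   # empty input: no batch files were created
    echo "$(date): starting $(basename "$batch")" >&2
    # -n skips objects already at the destination, -c continues past errors,
    # and -I reads the URL list from stdin (here, the current batch file).
    gsutil -m cp -n -c -I "$dest" < "$batch" ||
      echo "$(date): $(basename "$batch") finished with errors" >&2
  done
  rm -rf "$workdir"
}
```

For example, copy_in_batches urls.txt gs://bucket-name/ 5000. If a run is interrupted, starting it again re-submits every batch, but -n makes gsutil skip the objects that were already copied.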


I then tried it again WITHOUT the new parameter, and it also worked.


I have no explanation for the change in behavior other than an update to gcloud.

I am now using gcloud (Cloud SDK) version 182.

The bundled gsutil is still version 4.28, but its behavior appears to have changed.


Specifically, this command now appears to work:

date; cat urls.txt | gsutil -m cp -n -c -I gs://bucket-name/; date

