I'm using the simple code below (*) as the core of a worker object. Running on a dual quad-core Intel Mac on a university network, I'm getting decent throughput to S3. I use a thread pool and played around with the thread count to saturate the aggregate throughput (which happens at around 10 threads). I'm reading about 200 GB of data from a non-RAID drive; dd tests show that I can read from it at up to 50 MB/sec.

However, I'm a bit confused about the system behavior. Looking at the disk and network usage (which are 100% anti-correlated), it almost looks like the entire file is read into memory before the network transaction begins. Perhaps I've fooled myself into the wrong diagnosis, but any peek behind the curtain at how set_contents_from_filename streams data would be greatly appreciated.
-Mike
(*)
from boto.s3.key import Key

k = Key(bucket)  # bucket: an existing boto Bucket object
k.key = keyname
k.set_contents_from_filename(fname)
--
You received this message because you are subscribed to the Google Groups "boto-users" group.
Ahh, that's obvious. I wonder why I didn't see a corresponding spike in CPU usage. Perhaps it's ultimately limited by the bus speed and scheduling/data affinity on the chip. I'll play around with your suggestion to bypass it (only temporarily) to see if it behaves as predicted. Thanks, issue closed.
-M
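
For anyone wanting to try the same experiment: assuming the full read before the transfer is boto's MD5 pass over the file (boto 2 hashes the file first when no checksum is supplied), one way to separate the two phases is to compute the checksum yourself and hand it in through the md5 argument of set_contents_from_filename. The sketch below uses only the standard library; md5_pair and chunk_size are hypothetical names of mine, not part of boto.

```python
import base64
import hashlib


def md5_pair(path, chunk_size=1024 * 1024):
    """Return (hex digest, base64 digest) for a file, read in fixed-size chunks.

    This is the tuple shape boto 2 documents for the md5 argument.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        # Read chunk_size bytes at a time so the whole file is never in memory.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    b64 = base64.b64encode(digest.digest()).decode("ascii")
    return digest.hexdigest(), b64


# Hypothetical use with the snippet from the question
# (bucket, keyname and fname as defined there):
#   k.set_contents_from_filename(fname, md5=md5_pair(fname))
```

This does not remove the extra read; it only moves it to a place where it can be timed on its own or done ahead of the upload, which should be enough to check whether the disk and network phases separate as predicted.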