Getting around 6.5MB/s from s3 to ec2 instance - should it be faster?


Joe Van Dyk

Jan 12, 2014, 7:54:45 PM1/12/14
to wa...@googlegroups.com
When doing a backup-fetch, I get about 6.5-8 megabytes per second download from S3. I was expecting it to be a bit faster. Measured it through iftop and nethogs.

I'm noticing that wal-e is using close to 100% of CPU; top shows it hovering at about 97%.

I've tried different pool sizes from 2 to 8; it didn't seem to make a difference.

I'm using the latest 0.6 commit with Python 2.6.5.

Joe

Daniel Farina

Jan 12, 2014, 8:24:41 PM1/12/14
to Joe Van Dyk, wal-e


On Jan 12, 2014 4:54 PM, "Joe Van Dyk" <joev...@gmail.com> wrote:
>
> When doing a backup-fetch, I get about 6.5-8 megabytes per second download from S3. I was expecting it to be a bit faster. Measured it through iftop and nethogs.
>
> I'm noticing that wal-e is using close to 100% of cpu, top shows it hovering at about 97%.

Python 2.6, you say? That version cannot block 3DES encipherment because its ssl module lacks cipher-selection support. Blocking 3DES helped my surprisingly slow cases a lot.

To confirm, try running Linux 'perf top' and see what symbols crop up.  The 3DES symbols in openssl were perspicuously named, which is how this bug was diagnosed.
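For reference, on a modern Python the cipher restriction Daniel describes is straightforward: the ssl module exposes cipher selection per context, which is exactly the knob Python 2.6 lacked. A minimal sketch (illustrative, not wal-e's actual code):

```python
import ssl

# Build a default TLS context and exclude the slow triple-DES suites
# outright, so they can never be negotiated with S3's endpoint.
ctx = ssl.create_default_context()
ctx.set_ciphers("DEFAULT:!3DES")  # "!3DES" is OpenSSL cipher-list syntax
```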

Joe Van Dyk

Jan 12, 2014, 8:45:49 PM1/12/14
to wa...@googlegroups.com, Joe Van Dyk
I upgraded to Python 2.7.6; no change in speed. I'll check out perf tomorrow.

Joe Van Dyk

Jan 12, 2014, 8:46:21 PM1/12/14
to wa...@googlegroups.com, Joe Van Dyk
Should wal-e be using such a high cpu percentage? 

Daniel Farina

Jan 12, 2014, 8:57:16 PM1/12/14
to Joe Van Dyk, wal-e
Unfortunately. The 3DES blocking helped my cases a lot.

> Should wal-e be using such a high cpu percentage?

I haven't tuned WAL-E very much for CPU usage, but 6 megabytes per
second is suspicious. Can you get a faster rate if you send /dev/zero
to the disk?

My recollection is the bottleneck (as seen by 'perf top') is in
"hypervisor_page" or something named like that. That may be
correlated to something else that can be optimized, yet this
hypothesized indirection obscures the issue somewhat.

Searching the web didn't nail a common cause of that bottleneck, and I
haven't researched beyond that.
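Daniel's /dev/zero sanity check amounts to measuring raw sequential-write throughput, e.g. `dd if=/dev/zero of=out bs=1M count=256 conv=fdatasync`. The same measurement can be sketched in Python (the helper name here is hypothetical):

```python
import os
import time

def write_throughput(path, total_mb=256, block_mb=1):
    """Write zeros to `path` and return MB/s, fsync included; roughly
    what `dd if=/dev/zero of=path bs=1M conv=fdatasync` reports."""
    block = b"\x00" * (block_mb * 1024 * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb // block_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # count the flush-to-disk cost too
    return total_mb / max(time.time() - start, 1e-9)
```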

Joe Van Dyk

Jan 12, 2014, 9:09:37 PM1/12/14
to wa...@googlegroups.com, Joe Van Dyk
On Sunday, January 12, 2014 5:57:16 PM UTC-8, Daniel Farina wrote:
On Sun, Jan 12, 2014 at 5:46 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> On Sunday, January 12, 2014 5:45:49 PM UTC-8, Joe Van Dyk wrote:
>>
>> On Sunday, January 12, 2014 5:24:41 PM UTC-8, Daniel Farina wrote:
>>>
>>>
>>> On Jan 12, 2014 4:54 PM, "Joe Van Dyk" <joev...@gmail.com> wrote:
>>> >
>>> > When doing a backup-fetch, I get about 6.5-8 megabytes per second
>>> > download from S3. I was expecting it to be a bit faster. Measured it through
>>> > iftop and nethogs.
>>> >
>>> > I'm noticing that wal-e is using close to 100% of cpu, top shows it
>>> > hovering at about 97%.
>>>
>>> Python 2.6 you say? This version cannot block 3DES encipherment because
>>> of a missing feature of the ssl module.  This fix helped my surprisingly
>>> slow cases a lot.
>>>
>>> To confirm, try running Linux 'perf top' and see what symbols crop up.
>>> The 3DES symbols in openssl were perspicuously named, which is how this bug
>>> was diagnosed.
>>
>>
>> I upgraded to Python 2.7.6, no changes in speed. I'll check out perf
>> tomorrow.

Unfortunately.  The 3DES blocking helped my cases a lot.

> Should wal-e be using such a high cpu percentage?

I haven't tuned WAL-E very much for CPU usage, but 6 megabytes per
second is suspicious.  Can you get a faster rate if you send /dev/zero
to the disk?

I can write at about 68 MB/s.

Daniel Farina

Jan 12, 2014, 9:27:18 PM1/12/14
to Joe Van Dyk, wal-e
Not so good.

I have measured WAL-E going faster in the past, but maybe there was a
regression.

Unfortunately you may have to take point figuring out why for a little
while if you want to see progress Real Soon, because I'm pretty sure
there's a nasty bug in backup-fetch right now and that's going to be
holding my attention. Maybe one of the other list denizens/github
issue-watchers with an eye for optimization can help.

I think this level of performance is worth a GitHub bug report as well,
if only to widen the audience in hopes someone will make time to
profile. I bet there are some howlers in the code right now: to my
knowledge it has never been tuned for CPU usage, modulo that 3DES
change.

Joe Van Dyk

Jan 12, 2014, 9:40:20 PM1/12/14
to Daniel Farina, wal-e
from "perf top":

75.38%  [kernel]               [k] hypercall_page
 3.39%  python2.7              [.] 0x179be3
 3.18%  libc-2.15.so           [.] 0x140931
 2.86%  libcrypto.so.1.0.0     [.] 0x66dbf
 2.78%  python2.7              [.] PyEval_EvalFrameEx
 1.77%  liblzo2.so.2.0.0       [.] lzo1x_decompress_safe
 1.02%  [kernel]               [k] copy_user_generic_string
 0.73%  liblzo2.so.2.0.0       [.] lzo_adler32
 0.59%  python2.7              [.] _PyObject_GenericGetAttrWithDict
 0.28%  libc-2.15.so           [.] epoll_ctl

Daniel Farina

Jan 12, 2014, 9:48:26 PM1/12/14
to Joe Van Dyk, wal-e
On Sun, Jan 12, 2014 at 6:40 PM, Joe Van Dyk <joev...@gmail.com> wrote:
> from "perf top":
>
> 75.38% [kernel] [k] hypercall_page
> 3.39% python2.7 [.] 0x179be3
> 3.18% libc-2.15.so [.] 0x140931
> 2.86% libcrypto.so.1.0.0 [.] 0x66dbf
> 2.78% python2.7 [.] PyEval_EvalFrameEx
> 1.77% liblzo2.so.2.0.0 [.] lzo1x_decompress_safe
> 1.02% [kernel] [k] copy_user_generic_string
> 0.73% liblzo2.so.2.0.0 [.] lzo_adler32
> 0.59% python2.7 [.] _PyObject_GenericGetAttrWithDict
> 0.28% libc-2.15.so [.] epoll_ctl

Yeah, this is the kind of trace I don't know what to do about. I
suspect it's from making too many syscalls, but I'm not sure, and
cursory search doesn't give me a slam dunk either, roughly the same as
last time. Someone who knows Xen a bit is going to have to divine
something or I have to put aside a block of time to figure out what it
means, which is why I didn't go ahead and Just Fix It in the past.

Another profitable use of time might be to run the program through one
of the Python-level profilers. Lately I've been thinking I should
commit "hidden" (so as not to clutter the help messages)
'--debug'-family options, including one for that.
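A Python-level profiling pass of the sort suggested above can be done with the stdlib's cProfile. A minimal sketch of such a debug hook (the `profiled` helper is hypothetical, not a wal-e option that exists):

```python
import cProfile
import io
import pstats

def profiled(fn, *args, **kwargs):
    """Run fn under cProfile; return (result, report) where the report
    lists the ten most cumulative-time-expensive calls."""
    prof = cProfile.Profile()
    result = prof.runcall(fn, *args, **kwargs)
    out = io.StringIO()
    pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```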

Joe Van Dyk

Jan 12, 2014, 10:13:18 PM1/12/14
to wa...@googlegroups.com, Joe Van Dyk
More random data points...

I was originally using Ubuntu 10.04 on an m1.large instance. I switched over to Ubuntu 12.04 on an m1.xlarge. Performance improved a tad, to about 10 MB/s.

I tried s3cmd; with it I could get about 25 MB/s. Oddly, though, its CPU usage was high as well.

I then made one of the tar partition files public and used curl and wget to download it. I could get 37 MB/s (which is what I'd expect, I think). But they also showed a high CPU percentage in top. Seems weird, maybe?

I ran 'perf top' on the 12.04 system while hardly anything was running, and I see:
 99.93%  [kernel]                  [k] hypercall_page
at the top.


Joe Van Dyk

Jan 12, 2014, 10:26:37 PM1/12/14
to wa...@googlegroups.com, Joe Van Dyk
and curl https://s3-url | lzop -d -c | tar -x 
could handle about 35 MB/s.


Daniel Farina

Jan 30, 2014, 7:03:22 PM1/30/14
to Joe Van Dyk, wal-e, Greg Stark
In this general vein, one of my colleagues -- Greg Stark -- made a
suggestion: dump gevent and use threads, since WAL-E has a modest
amount of concurrency and the extra memory bloat of a handful of
threads is probably not a big deal.

He suggested this to get more parallel execution of I/O with other
stuff going on in Python, under the model that the "threading" module
spawns real OS threads that contend on the GIL, but that GIL
contention matters less than it might because most of the time the GIL
is released while Python is stuck in read()/write() or the like. In
addition, he pointed out that gevent is a weird dependency (it sure
is) and it'd be nice to do without, and I agree (although the size of
the behavioral difference probably means the change won't make the
nascent 0.7, still in alpha). But, more relevant to this case: should
the number of syscall executions indeed be the problem, as I am
guessing here, an additional optimization on the table is that
blocking I/O can cut the number of syscalls quite dramatically.
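The threaded model described above can be sketched with stdlib pieces. Everything here is hypothetical naming, not wal-e's API; the point is only that a few plain threads overlap their blocking I/O because CPython drops the GIL inside read()/write():

```python
import queue
import threading

def parallel_fetch(urls, fetch_one, pool_size=4):
    """Run `fetch_one(url)` across a small thread pool.  While a
    thread blocks on I/O the GIL is released, so the pool's downloads
    proceed in parallel despite the lock."""
    jobs = queue.Queue()
    for u in urls:
        jobs.put(u)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return
            r = fetch_one(url)  # blocking I/O; GIL released inside
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```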

Finally, this looks tractable to do. WAL-E's reliance on gevent is
somewhat involved, but gevent tends to mirror threading's constructs
wherever it can. So I suspect a port will take some labor, but perhaps
not too much, and maybe less than some of the new storage backends
that went into 0.7.

I don't have time to do this right now, but if someone is interested
in performance work, well...

Daniel Farina

Feb 13, 2014, 9:41:35 PM2/13/14
to Joe Van Dyk, wal-e
On Sun, Jan 12, 2014 at 6:48 PM, Daniel Farina <dan...@heroku.com> wrote:
> On Sun, Jan 12, 2014 at 6:40 PM, Joe Van Dyk <joev...@gmail.com> wrote:
>> from "perf top":
>>
>> 75.38% [kernel] [k] hypercall_page
>> 3.39% python2.7 [.] 0x179be3
>> 3.18% libc-2.15.so [.] 0x140931
>> 2.86% libcrypto.so.1.0.0 [.] 0x66dbf
>> 2.78% python2.7 [.] PyEval_EvalFrameEx
>> 1.77% liblzo2.so.2.0.0 [.] lzo1x_decompress_safe
>> 1.02% [kernel] [k] copy_user_generic_string
>> 0.73% liblzo2.so.2.0.0 [.] lzo_adler32
>> 0.59% python2.7 [.] _PyObject_GenericGetAttrWithDict
>> 0.28% libc-2.15.so [.] epoll_ctl
>
> Yeah, this is the kind of trace I don't know what to do about.


Okay. I think too many syscalls is a contributing factor.

Would you be willing to help test a patch to see if it helps? I'd
appreciate it. It is still early enough that it could have severe
bugs. I have done some modeling and the size of the improvement
appears enormous. Thus, I'm contemplating sneaking it into 0.7. The
alternative is waiting another major version cycle, and that seems
impractical given the gains possible without interface change.

Pull request:

https://github.com/wal-e/wal-e/pull/96

Branch:

https://github.com/fdr/wal-e/tree/lead-pipe
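The syscall-count effect this kind of change targets can be seen in miniature: consuming the same bytes in larger chunks takes proportionally fewer read(2) calls. A small sketch (a hypothetical helper for illustration, not the patch itself):

```python
def count_reads(path, chunk_size):
    """Count the read(2) calls needed to consume `path` when pulling
    `chunk_size` bytes per call with Python-level buffering disabled."""
    calls = 0
    with open(path, "rb", buffering=0) as f:  # each read() is one syscall
        while f.read(chunk_size):
            calls += 1
    return calls + 1  # the final empty read that signals EOF
```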