Why does Wal-e kill all the greenlets if only one of them fails?

Dikang Gu

Oct 2, 2014, 8:43:07 PM
to wa...@googlegroups.com
Hello there,

We are using Wal-e to back up to S3, and I found that if one of the backup segments fails because of a timeout, the whole backup fails. I dug into the code and found this line: https://github.com/wal-e/wal-e/blob/master/wal_e/worker/pg/wal_transfer.py#L147.

So I'm wondering why you kill all the greenlets instead of letting the other good ones finish. Is there any reason behind this?

Thanks
Dikang.

Daniel Farina

Oct 2, 2014, 9:10:18 PM
to dika...@gmail.com, wa...@googlegroups.com
Thinking back, it's probably a general preference to 'join' to exit
promptly when there is trouble, and this code path has not been a
material problem for anyone who has reported bugs before.

I think it'd be fine to be more lenient about waiting for other
greenlets to finish. Try patching it.
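
To make the two policies concrete, here is a minimal sketch. It is not WAL-E's code: it uses asyncio tasks from the standard library as a stand-in for gevent greenlets, and `upload`, `strict`, and `lenient` are hypothetical names made up for illustration. `strict` cancels the remaining tasks as soon as one fails; `lenient` lets every task run to completion and reports failures afterwards.

```python
import asyncio


async def upload(name, delay, fail=False):
    # Hypothetical stand-in for uploading one segment.
    await asyncio.sleep(delay)
    if fail:
        raise TimeoutError(f"{name} timed out")
    return name


async def strict(jobs):
    # Exit promptly: cancel everything still running once any task fails.
    tasks = [asyncio.ensure_future(upload(*job)) for job in jobs]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_EXCEPTION
    )
    for task in pending:
        task.cancel()
    # Wait for the cancellations to take effect before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    finished = sorted(t.result() for t in done if t.exception() is None)
    failed = [t.exception() for t in done if t.exception() is not None]
    return finished, failed


async def lenient(jobs):
    # Let every task finish; collect exceptions instead of raising them.
    results = await asyncio.gather(
        *(upload(*job) for job in jobs), return_exceptions=True
    )
    finished = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    return finished, failed


if __name__ == "__main__":
    # Assumed workload: one fast failure and two slower successes.
    jobs = [("seg-a", 0.2), ("seg-b", 0.01, True), ("seg-c", 0.2)]
    print(asyncio.run(strict(jobs)))
    print(asyncio.run(lenient(jobs)))
```

With this assumed workload, `strict` throws away the two uploads that were still in flight, while `lenient` completes them and still surfaces the one timeout. The trade-off is that the lenient version takes as long as the slowest task before it reports the error.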

Dikang Gu

Oct 3, 2014, 3:27:28 AM
to Daniel Farina, wa...@googlegroups.com
Cool, are you going to change that? I can also try to patch it myself. I have already committed some changes to the Instagram wal-e fork.

Thanks
Dikang.
--
Dikang

Dikang Gu

Oct 3, 2014, 2:02:38 PM
to Daniel Farina, wa...@googlegroups.com
Hmm, it seems greenlet.killall will call joinall first.
--
Dikang

Dikang Gu

Oct 3, 2014, 2:02:43 PM
to Daniel Farina, wa...@googlegroups.com

Dikang Gu

Oct 4, 2014, 2:14:05 AM
to Daniel Farina, wa...@googlegroups.com
--
Dikang

Daniel Farina

Oct 19, 2014, 5:27:47 PM
to Dikang Gu, wa...@googlegroups.com
On Fri, Oct 3, 2014 at 11:13 PM, Dikang Gu <dika...@gmail.com> wrote:
> This is my pull request: https://github.com/Instagram/wal-e/pull/4

For the sake of the record, I merged this about a week ago when releasing 0.8b2.