S3 eventual consistency issues


Edmund Yan

Jul 10, 2017, 11:19:58 AM
to Luigi
Hello,

We recently increased our worker concurrency and started seeing a lot more S3 errors.


Traceback (most recent call last):
  File "/app/shared/virtualenvs/venmo_platform27/local/lib/python2.7/site-packages/luigi/worker.py", line 175, in run
    raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time:  ...........

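I started investigating and I think it's related to S3's consistency model: if you GET (or HEAD) a key before it has been written, a GET issued after the PUT is only eventually consistent and can still report the key as missing for a while. See http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel. Because the scheduler calls .exists() to see whether a task is already complete, that first check happens before the output is written, so I think we run into this problem often.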

Has anyone else run into this issue? How did you solve it? The only solution I can think of is to add retry logic to `luigi/contrib/s3.py`.
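
To make the retry idea concrete, here is a minimal sketch, not a patch against `luigi/contrib/s3.py`. It wraps any zero-argument existence check (for example a target's `.exists` method) and retries with exponential backoff. The function name `exists_with_retry` and its parameters are made up for illustration.

```python
import time


def exists_with_retry(exists, attempts=5, initial_delay=1.0, sleep=time.sleep):
    """Call exists() up to `attempts` times, backing off between tries.

    Returns True as soon as exists() returns a truthy value, and False
    if every attempt reports the object as missing.
    """
    delay = initial_delay
    for attempt in range(attempts):
        if exists():
            return True
        # Do not sleep after the final attempt; just give up.
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # exponential backoff: 1s, 2s, 4s, ...
    return False
```

One caveat: a retry only hides a stale "missing" answer. An output that really is missing pays for the full backoff (15 seconds with the defaults above), so this probably belongs only on the check made right after a dependency claims to have finished, not on every scheduling pass.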

Noah Yetter

Jul 31, 2017, 5:45:54 PM
to Luigi
I've had this issue in production when running with multiple workers. I applied two fixes that appear to have solved it. (I can't remember the last time I saw it pop up; it's probably been two years.)

The first fix is to sleep for an arbitrary time following any output write to S3. This is obviously very hacky and terrible, but it helped.

The second fix is to pin to a particular S3 host. My understanding at the time was that when using a bucket in US Standard, you couldn't know whether your requests were being served by a machine on the east coast or the west coast, and that some eventual consistency issues were due to this behavior. It also used to be the case that US Standard did not offer read-after-write consistency (this has recently been remedied). It's not clear to me and my fuzzy memory which of these two issues was more relevant. In any case, the solution for me was to build my own S3 client instance pointed at s3-external-1.amazonaws.com and inject that into Luigi. Hosts for other regions can be found at http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region. If you google "s3-external-1.amazonaws.com" you'll find various blog posts about this issue.

I suspect that the second fix is doing the heavy lifting and I could remove the sleep.
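
For illustration, pinning the endpoint might look like the sketch below. It is not the exact code I run. `make_pinned_client` is a made-up helper name, and the sketch assumes the client constructor accepts an `endpoint_url` keyword, as `boto3.client` does. How the resulting client is handed to Luigi depends on the Luigi version, so that part is left out.

```python
def make_pinned_client(client_factory, host="s3-external-1.amazonaws.com"):
    """Build an S3 client whose requests all go to one explicit endpoint.

    client_factory is any callable that accepts an endpoint_url keyword,
    for example functools.partial(boto3.client, "s3") (an assumption about
    your stack; substitute whatever constructor you actually use).
    """
    return client_factory(endpoint_url="https://" + host)
```

With boto3 installed, this would be called as `make_pinned_client(functools.partial(boto3.client, "s3"))`.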

Lars Albertsson

Aug 10, 2017, 6:08:17 PM
to Edmund Yan, Luigi
The quick answer is that S3's eventual consistency does not support
ETL pipelines with Luigi or anything else. See
https://berlinbuzzwords.de/17/session/what-does-rename-do for a deeper
explanation.

In order to have a stable solution, one must use a file system with
stronger consistency, or add a layer that provides consistency. The
video above mentions one solution in progress, made by Hortonworks.
Amazon has another (EMRFS), but it is a leaky abstraction, and I have
had bad experiences with it. Netflix has made another (S3mper). There
are more similar solutions in the works.

AFAIK, no solution currently works sufficiently well, however. Your
options are basically:

1. Live with the problems, and mitigate them with pragmatic hacks,
such as the ones suggested in another mail in this thread.

2. Use HDFS. I have heard rumours of setups where pipelines are run in
EMR, using HDFS between jobs, and end results are copied back to S3.
It is not a better solution in theory, since you have the same problem
if pipelines are connected, but might work in practice.

3. Use a managed file system with stronger consistency. From the docs,
it seems EFS might be sufficient. Azure has managed HDFS, but I do not
think any other provider does.




Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: http://www.mapflat.com/calendar