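Every generation the community produces its greats, each leading the way for four or five years. The author of Trio brings more substance: cancellation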


whycrying

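Jan 14, 2018, 2:03:14 AM
to python-cn (华蟒用户组, CPyUG mailing list)
In the earlier thread "The impact of async/await and the problems with asyncio", Nathaniel J. Smith (njs below), the author of Trio, gave the Python community a lot to think about. Now he has published another piece that deserves praise: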

Timeouts and cancellation for humans
https://vorpus.org/blog/timeouts-and-cancellation-for-humans/

In the post, njs singles out another big name in the Python community, Kenneth Reitz, the author of requests, for the poor design of the `timeout=` parameter in the requests API. He adds:

> I don't mean to pick on requests here – this problem is everywhere in Python APIs. I'm using requests as the example because Kenneth Reitz is famous for his obsession with making its API as obvious and intuitive as possible, and this is one of the rare places where he's failed. I think this is the only part of the requests API that gets a big box in the documentation warning you that it's counterintuitive. So like... if even Kenneth Reitz can't get this right, I think we can conclude that "just slap a timeout= argument on it" does not lead to APIs fit for human consumption.

He first tries replacing `timeout=` with `deadline=` to solve the problem:

> Absolute deadlines are composable (but kinda annoying to use)

Although `deadline=` is a somewhat better design than `timeout=`, it still does not solve the problem at its root.

> But this approach also has a downside: it succeeds in moving the annoying bit out of the library internals, and instead puts it on the person using the API. At the outermost level where timeout policy is being set, your library's users probably want to say something like "give up after 10 seconds", and if all you take is a `deadline=` argument then they have to do the conversion by hand every time. Or you could have every function take both `timeout=` and `deadline=` arguments, but then you need some boilerplate in every function to normalize them, raise an error if both are specified, and so forth. Deadlines are an improvement over raw timeouts, but it feels like there's still some missing abstraction here.
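
To make the trade-off in the quote concrete, here is a minimal sketch (not code from the blog post) of how one absolute deadline can be shared across several blocking steps. The helpers `deadline_after`, `remaining`, `slow_step` and `run_steps` are hypothetical names invented for this illustration, and `slow_step` only simulates a blocking call with `time.sleep`.

```python
import time


def deadline_after(timeout):
    """Convert a relative timeout (seconds) into an absolute deadline."""
    return time.monotonic() + timeout


def remaining(deadline):
    """Seconds left before the deadline, never negative."""
    return max(0.0, deadline - time.monotonic())


def slow_step(duration, timeout):
    """Stand-in for a blocking call that accepts a relative timeout=."""
    if duration > timeout:
        time.sleep(timeout)
        raise TimeoutError("step timed out")
    time.sleep(duration)


def run_steps(durations, deadline):
    """Run several steps under ONE shared deadline.

    Each step only gets the time that is still left, so the total can
    never exceed the budget, however many steps there are.
    """
    for duration in durations:
        slow_step(duration, timeout=remaining(deadline))
```

The caller still has to write `run_steps(..., deadline_after(10))` to express "give up after 10 seconds", which is the manual conversion the quote complains about.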

After criticizing the API designs found across the Python community, njs argues that the root cause of the problem is:

> Cancel tokens are unreliable in practice because humans are lazy

The inspiration for cancellation, in turn, comes from the designs of C# and Go:

> As far as I know, it originally comes from Joe Duffy's cancellation tokens work in C#, and Go context objects are essentially the same idea.
> ...
> I don't mean to make fun. This stuff is hard. But C# and Go are huge projects maintained by teams of highly-skilled full-time developers and backed by Fortune 50 companies. If they can't get it right, who can? Not me. I'm one human trying to reinvent I/O in Python. I can't afford to make things that complicated.
> ...
> Remember way back at the beginning of this post, we noted that Python socket methods don't take individual timeout arguments, but instead let you set the timeout once on the socket so it's implicitly passed to every method you call? And in the section just above, we noticed that C# and Go do pretty much the same thing? I think they're on to something. Maybe we should accept that when you have some data that has to be passed through to every function you call, that's something the computer should handle, rather than making flaky humans do the work – but in a general way that supports complex abstractions, not just sockets.

Of course, njs is quite happy with Python's `with` statement:

> But since this post is about the underlying design, we'll focus on the primitive version. (Credit: the idea of using `with` blocks for timeouts is something I first saw in Dave Beazley's Curio, though I changed a bunch. I'll hide the details in a footnote: [4].)
> You should think of `with open_cancel_scope()` as creating a cancel token, but it doesn't actually expose any `CancelToken` object publically. Instead, the cancel token is pushed onto an invisible internal stack, and automatically applied to any blocking operations called inside the `with` block. So `requests` doesn't have to do anything to pass this through – when it eventually sends and receives data over the network, those primitive calls will automatically have the deadline applied.
> ...
> A way to delimit the boundaries of a cancel scope. Python's `with` blocks work great; other options would include dedicated syntax, or restricting cancel scopes to individual function calls like `with_timeout(10, some_fn, arg1, arg2)` (though this could force awkward factorings, and you'd need to figure out some way to expose the cancel scope object).
> ...
> If you're working in another language, I'd love to hear how the cancel scope idea adapts – if at all. For example, it'll definitely need some adjustment for languages that don't use exceptions, or that are missing the kind of user-extensible syntax that Python's `with` blocks provide.

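For concurrency (multithreading or other concurrency libraries, for example), njs once again promotes the nursery design from Trio, which is indeed better than leaving the problem unhandled: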

> This system has many advantages, but the relevant one here is that it preserves the key assumptions that cancel scopes rely on. Any given nursery is either inside or outside the cancel scope – we can tell by checking whether the `with open_cancel_scope` block encloses the `async with open_nursery` block. And then it's straightforward to say that if a nursery is inside a cancel scope, then that scope should apply to all children in that nursery. This means that if we apply a timeout to a function, it can't "escape" by spawning a child task – the timeout applies to the child task too. (The exception is if you pass an outside nursery into the function, then it can spawn tasks into that nursery, which can escape the timeout. But then this is obvious to the caller, because they have to provide the nursery – the point is to make it clear what's going on, not to make it impossible to spawn background tasks.)
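
The property described in the quote, that a timeout around a parent also reaches the children it spawns, can be roughly imitated with the standard library's asyncio. This is only an analogy, not Trio's nursery API: `asyncio.wait_for` plays the role of the cancel scope and `asyncio.gather` stands in for the nursery. The names `child`, `parent` and `cancelled_children` are made up for the example.

```python
import asyncio

cancelled_children = []


async def child(name):
    try:
        await asyncio.sleep(10)  # pretend to do slow I/O
    except asyncio.CancelledError:
        cancelled_children.append(name)  # record that the timeout reached us
        raise


async def parent():
    # The children are awaited by the parent, so they live "inside" it.
    await asyncio.gather(child("a"), child("b"))


async def main():
    try:
        await asyncio.wait_for(parent(), timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"
    return "finished"


result = asyncio.run(main())
```

Both children record the cancellation before `main` returns, so the timeout did not stop at the parent.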

njs says he has already started trying to get requests running on top of Trio:

> Returning to our initial example: I've been doing some initial work on porting requests to run on Trio (you can help!), and so far it looks like the Trio version will not only handle timeouts better than the traditional synchronous version, but that it will be able to do this using zero lines of code – all the places where you'd want to check for cancellation are the ones where Trio does so automatically, and all the places where you need special care to handle the resulting exceptions are places where requests is prepared to handle arbitrary exceptions for other reasons.

njs says the author of requests need not apologize for the `timeout=` parameter, haha:

> Our original motivating examples involved `requests`, an ordinary synchronous library. And pretty much everything above applies equally to synchronous or concurrent code. So I think it's interesting to explore the idea of using these in classic synchronous Python. Maybe we can fix `requests` so it doesn't have to apologize for its `timeout` argument!

Finally, njs explains one of the reasons he wrote the post, which has to do with asyncio:

> One of the original motivations for this blog post was talking to Yury about whether we could retrofit any of Trio's improvements back into asyncio.

Bringing Trio's strengths into asyncio is genuinely hard, though, since asyncio carries a heavy historical burden. Still, it is not entirely hopeless...

> Unfortunately asyncio's in a bit of a tricky position, because it's built on an architecture derived from the previous decade of experience with async I/O in Python... and then after that architecture was locked in, it added new syntax to Python that invalidated all that experience. But hopefully it's still possible to adapt some of these lessons – at least with some compromises.

Yury Selivanov, another big name in the Python community, was very excited after this exchange with njs and says he already knows how to improve asyncio's behavior on this issue.

https://twitter.com/1st1/status/951399181599559680
> A very insightful blog post by Nathaniel about timeouts and cancellations. Highly recommended to read. Btw, I think I know how to improve it in asyncio, will share more details later.

Other references:

* https://github.com/njsmith/deadline-scopes

黄其泽

Jan 15, 2018, 3:58:17 AM
to pyth...@googlegroups.com
This is much ado about nothing. These days timeouts are simply used like this:

with gevent.Timeout(5):
    resp1 = requests.get()
    resp2 = requests.post()

The js community can't even imagine this kind of simplicity.

--
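Mail from: `CPyUG` 华蟒用户组 (Chinese Python technical mailing list)
Rules: http://code.google.com/p/cpyug/wiki/PythonCn
Details: http://code.google.com/p/cpyug/wiki/CpyUg
Reminder: Understand the list! Ask smart questions! http://wiki.woodpecker.org.cn/moin/AskForHelp
---
You received this message because you are subscribed to the Google Groups "python-cn (华蟒用户组, CPyUG mailing list)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-cn+unsubscribe@googlegroups.com
To post to this group, send an email to python-cn@googlegroups.com
For more options, visit https://groups.google.com/d/optout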



--
Blog on Python and Qt: http://hgoldfish.com/

Yunfan Jiang

Jan 15, 2018, 9:00:32 PM
to pyth...@googlegroups.com
So if you want to do some handling when the timeout fires, how would you write it in this form?
Name: yunfan
Site: http://geek42.info/
Interest:
  - Lang: [forth, clojure, c, python, lua]
  - software: [nginx, redis]
  - abstract: [vm, tiny, cloud, html5]
  - history
  - science-fiction
  - music: [new-age, vangelis, yanni]

黄其泽

Jan 22, 2018, 2:38:33 AM
to pyth...@googlegroups.com
This Timeout is an exception, so:

with gevent.Timeout(10):
    try:
        requests.post()
    except gevent.Timeout:
        pass

fy

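Jan 22, 2018, 3:03:19 AM
to pyth...@googlegroups.com
What exactly is the pain point here? I don't think deadlines or semantic tokens solve the problem either.

The one positive I can think of is that `with` might make it possible to require X requests to finish together within Y seconds, though I don't know whether the internal logic actually works that way.
My GitHub: http://github.com/fy0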



依云

Jan 22, 2018, 3:33:42 AM
to pyth...@googlegroups.com
On Mon, Jan 22, 2018 at 03:38:18PM +0800, 黄其泽 wrote:
> This Timeout is an exception, so:
>
> with gevent.Timeout(10):
>     try:
>         requests.post()
>     except gevent.Timeout:
>         pass

Shouldn't it be like this instead?

try:
    with gevent.Timeout(10):
        requests.post()
except gevent.Timeout:
    pass

--
Best regards,
lilydjwg

Linux Vim Python My blog:
https://blog.lilydjwg.me/
--
A: Because it obfuscates the reading.
Q: Why is top posting so bad?

Yunfan Jiang

Jan 22, 2018, 5:03:55 AM
to pyth...@googlegroups.com
That looks pretty painful too :[

2018-01-22 15:38 GMT+08:00 黄其泽 <hgol...@gmail.com>:

黄其泽

Jan 22, 2018, 5:20:26 AM
to pyth...@googlegroups.com
That is exactly how the internal logic works. That's why I said this approach is five times better than what the js community writes.

黄其泽

Jan 22, 2018, 5:22:03 AM
to pyth...@googlegroups.com
Just write a simple wrapper yourself:

with MyTimeout(5, do_something_if_timeout):
    requests.get()
    requests.post()
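
One way such a wrapper might look, as a minimal sketch: a context manager that catches a given timeout exception and calls a handler. `on_timeout` is a hypothetical helper, not part of gevent or requests, and the built-in `TimeoutError` stands in for the real timeout exception so the example runs without extra packages.

```python
from contextlib import contextmanager


@contextmanager
def on_timeout(timeout_exc, handler):
    """Run handler() if the block raises timeout_exc, then carry on."""
    try:
        yield
    except timeout_exc:
        handler()


events = []

# TimeoutError stands in for the real timeout exception in this demo.
with on_timeout(TimeoutError, lambda: events.append("timed out")):
    raise TimeoutError("pretend the deadline passed")

events.append("after the block")
```

With gevent installed and monkey-patched, this could presumably be combined as `with on_timeout(gevent.Timeout, handler), gevent.Timeout(5):` so that the handler sits outside the timer, matching the `try` outside `with` form suggested earlier in the thread.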

Eric Yang

Jan 22, 2018, 9:53:52 PM
to pyth...@googlegroups.com
What does the js community write, then? Nothing in this whole thread mentions the js community.

yang zhou

Mar 2, 2018, 2:21:13 AM
to python-cn (华蟒用户组, CPyUG mailing list)
It's basically no longer possible to change this parameter to deadline...

On Sunday, January 14, 2018 at 3:03:14 PM UTC+8, whycrying wrote: