[ANNOUNCE] geventhttpclient


gwik

Dec 25, 2011, 5:00:16 PM
to gevent: coroutine-based Python network library
https://github.com/gwik/geventhttpclient

geventhttpclient is a fast, high-performance, concurrent HTTP client
library.
It is specifically designed for high concurrency and streaming, and
supports HTTP 1.1 persistent connections.
More generally, it is designed for efficiently pulling from REST APIs
and streaming APIs like Twitter's, and for connecting to daemons with an
HTTP interface like neo4j.

geventhttpclient uses a fast HTTP parser written in C, originally
extracted from nginx and modified by Joyent.

The library provides httplib wrappers that you can drop in if you use
httplib directly, and a monkey patch function if you use httplib2 or
urllib (see the README).

I ran some benchmarks to compare against httplib2 (gevent 1.0a3):

The benchmark makes 1000 GET requests against a local nginx server with
a concurrency of 10:
- httplib2 (benchmarks/httplib2_simple.py): ~600 req/s
- httplib2 with the geventhttpclient monkey patch
  (benchmarks/httplib2_patched.py): ~2600 req/s
- geventhttpclient.HTTPClient (benchmarks/httpclient.py): ~3800 req/s

I tested it with Python 2.7 and gevent 1.0a3, but it should work with
gevent 0.13 and Python 2.6 as well.

I look forward to your feedback.
Merry Christmas.

Antonin

Denis Bilenko

Dec 26, 2011, 4:22:00 AM
to gev...@googlegroups.com
On Mon, Dec 26, 2011 at 5:00 AM, gwik <antoni...@gmail.com> wrote:
> https://github.com/gwik/geventhttpclient
>
> geventhttpclient is a fast, high-performance, concurrent HTTP client
> library.
> It is specifically designed for high concurrency and streaming, and
> supports HTTP 1.1 persistent connections.
> More generally, it is designed for efficiently pulling from REST APIs
> and streaming APIs like Twitter's, and for connecting to daemons with an
> HTTP interface like neo4j.
>
> geventhttpclient uses a fast HTTP parser written in C, originally
> extracted from nginx and modified by Joyent.
>
> The library provides httplib wrappers that you can drop in if you use
> httplib directly, and a monkey patch function if you use httplib2 or
> urllib (see the README).


Nice, thanks for sharing! Added to
http://code.google.com/p/gevent/wiki/ProjectsUsingGevent


If I may suggest a small interface change:

rename geventhttpclient.httplibcompat to geventhttpclient.httplib

Inside geventhttpclient/ you'd have to replace

import httplib

with this:

httplib = __import__('httplib')

but it's a small price to pay for the less noisy module name.
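
The point of going through the import machinery here is that a package module can shadow a stdlib name yet still hold a reference to the real stdlib module. The same idea on modern Python, illustrated with importlib and the json module as a stand-in (Python 3 imports are absolute by default, which is what makes this work):

```python
import importlib

# A module whose own file shadows a stdlib name can still fetch the
# real stdlib module by name through the import machinery. The thread's
# Python 2 code achieved this with: httplib = __import__('httplib')
json = importlib.import_module('json')

assert json.loads('{"n": 1}') == {"n": 1}
```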


Another question is about exceptions. From reading the code it seems that
geventhttpclient.httplibcompat may raise HTTPParseException, which is
a subclass of Exception.

This means that httplibcompat is not compatible with httplib exception-wise.
Rather than catching HTTPParseException and re-raising exceptions
of the right type (which is ugly and loses stack traces), I suggest
deriving HTTPParseException from httplib.HTTPException.
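
The suggested hierarchy would look something like this (sketched against http.client, the Python 3 name for the httplib of this thread; the exception body is illustrative):

```python
import http.client  # Python 3's name for Python 2's httplib

class HTTPParseException(http.client.HTTPException):
    """Parser errors stay catchable by code written against httplib."""

# A caller that only knows httplib's exception hierarchy still catches
# the parser error, with the original traceback intact:
try:
    raise HTTPParseException("invalid status line")
except http.client.HTTPException as e:
    caught = type(e).__name__

assert caught == "HTTPParseException"
```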

Damjan

Dec 26, 2011, 2:05:11 PM
to gevent: coroutine-based Python network library
> https://github.com/gwik/geventhttpclient
>
> geventhttpclient is a fast, high performance, concurrent http client
> library

How does it compare to restkit? It also supports gevent and
concurrency, and also has a C-based HTTP parser (maybe the same one).

Antonin AMAND

Dec 27, 2011, 2:12:12 AM
to gev...@googlegroups.com
> Nice, thanks for sharing! Added to
> http://code.google.com/p/gevent/wiki/ProjectsUsingGevent

Thanks!

> If I may suggest a small interface change:
>
> rename geventhttpclient.httplibcompat to geventhttpclient.httplib
>
> Inside geventhttpclient/ you'd have to replace
>
>    import httplib
>
> with this:
>
>    httplib = __import__('httplib')
>
> but it's a small price to pay for the less noisy module name.
>
>
> Another question is about exceptions. From reading the code it seems that
> geventhttpclient.httplibcompat may raise HTTPParseException, which is
> a subclass of Exception.
>
> This means that httplibcompat is not compatible with httplib exception-wise.
> Rather than catching HTTPParseException and re-raising exceptions
> of the right type (which is ugly and loses stack traces), I suggest
> deriving HTTPParseException from httplib.HTTPException.

Makes sense, done.

Antonin

Antonin AMAND

Dec 27, 2011, 2:16:05 AM
to gev...@googlegroups.com
> How does it compare to restkit? It also supports gevent and
> concurrency, and also has a C-based HTTP parser (maybe the same one).

The HTTP parser is the same indeed.
geventhttpclient is simpler and focused on gevent.

I added restkit to my stupid-simple benchmark; for what it's worth,
geventhttpclient is faster.

Antonin

Andy

Jan 3, 2012, 4:21:59 PM
to gevent: coroutine-based Python network library
> support HTTP 1.1 persistent connections

Does that mean:

1) multiple greenlets can share the same HTTP connection? OR
2) each greenlet can send multiple pipelined HTTP requests through the
same connection, but if I have 10 greenlets I'd still need 10
connections?

Antonin AMAND

Jan 4, 2012, 7:18:15 AM
to gev...@googlegroups.com

None of the above: the HTTPClient class has a built-in connection
pool. It knows when a connection can be reused.
The number of concurrent connections can be set with the concurrency
parameter of the HTTPClient __init__ method (1 by default).
You can then share an HTTPClient instance among greenlets. When one of
them makes a request it gets a connection from the pool, and if the
connection is reusable, other greenlets will reuse it (note that you
need to consume the entire response body before the
connection can be reused).
If you have 100 greenlets and 10 connections, the 11th greenlet will
wait for any of the first ten requests to complete and will reuse
the already opened connection to the server.
This model is especially efficient when pulling from APIs
(Facebook, Twitter...) or when you connect to a daemon (neo4j),
because it lets you limit the concurrency while optimizing the reuse
of connections, a bit like a database connection pool.
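
The blocking behaviour of such a pool can be simulated with a plain stdlib queue (threads stand in for greenlets here; this is a toy model, not geventhttpclient's implementation):

```python
import queue
import threading
import time

# A toy pool with 2 "connections" serving 5 concurrent "requests".
pool = queue.Queue()
for name in ("conn-0", "conn-1"):
    pool.put(name)  # names are illustrative stand-ins for sockets

completed = []

def request(n):
    conn = pool.get()        # blocks while all connections are in use
    try:
        time.sleep(0.01)     # pretend to read the whole response body
        completed.append((n, conn))
    finally:
        pool.put(conn)       # only now can another request reuse it

threads = [threading.Thread(target=request, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All 5 requests completed over just 2 connections.
assert len(completed) == 5
assert {c for _, c in completed} <= {"conn-0", "conn-1"}
```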

geventhttpclient doesn't support pipelining, as very few servers
support it correctly. Furthermore, the above connection handling model
is much more efficient than pipelining when you run on an evented
core: pipelining is FIFO, so if the first request takes a long time to
process, all the other greenlets would have to wait for it to
complete. Pipelining can still be interesting when you don't want to
open too many connections to the server.
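
The head-of-line blocking argument can be made concrete with a little arithmetic (the service times below are made up):

```python
# Hypothetical per-request service times on the server, in seconds:
service_times = [0.5, 0.01, 0.01]

# One pipelined connection is FIFO: each reply waits behind all earlier
# ones, so the slow first request delays everything after it.
pipelined_last_reply = sum(service_times)   # ~0.52 s

# Independent pooled connections complete independently.
pooled_last_reply = max(service_times)      # ~0.50 s, and the two fast
                                            # requests finished long ago

assert pipelined_last_reply > pooled_last_reply
```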

For those interested in the subject, I recommend this great article:
http://www.igvita.com/2011/10/04/optimizing-http-keep-alive-and-pipelining/

I made some fixes as I'm putting it into production, so please update.

Regards,

Antonin

Andy

Jan 4, 2012, 8:32:23 PM
to gevent: coroutine-based Python network library
Thanks for the explanation. You made a good point about pipelining being
FIFO and not suitable for an evented core.
> For those interested in the subject, I recommend this great article: http://www.igvita.com/2011/10/04/optimizing-http-keep-alive-and-pipel...

Andy

Jan 6, 2012, 5:03:49 AM
to gevent: coroutine-based Python network library
If I have 4 Python/gevent worker processes running, and I set up a
geventhttpclient pool with 10 connections, does that mean I'll have 10
connections shared by those 4 worker processes, or will each worker
process get 10 connections, for a total of 40 connections?


On Jan 4, 7:18 am, Antonin AMAND <antonin.am...@gmail.com> wrote:
> For those interested in the subject, I recommend this great article: http://www.igvita.com/2011/10/04/optimizing-http-keep-alive-and-pipel...

Antonin AMAND

Jan 6, 2012, 1:36:05 PM
to gev...@googlegroups.com
If by processes you mean OS-level processes, you'll have 40 connections.
A socket isn't something you can share among different processes.
What kind of service do you connect to? Do you have any idea how long
the remote service keeps the connection open?
If the remote service tends to keep the connection open
as long as you use it, you might want to reduce
the number of connections. 10 seems like a lot; I tend to use 5 connections
even for intensive parallel jobs, but it depends on what you do.
You also need to know how many greenlets run concurrently inside one
worker, because if there's only one you don't need more than 1
connection, unless you spawn greenlets to run some tasks in parallel.

Andy

Jan 6, 2012, 2:52:26 PM
to gevent: coroutine-based Python network library
Yes, I meant OS-level processes. I'll be connecting to 2 types of web
services: external, public ones such as the Facebook Graph API, and
internal web services such as a Lucene search engine. The internal WS
response time will most likely be less than 1 second; for external WS
it's hard to say.

By the way, do Facebook or Twitter allow persistent connections to
their web services, or do you need to disconnect after a certain number
of requests / amount of time?