Ale

Jul 17, 2019, 4:25:54 PM
to gev...@googlegroups.com
Hello all,

TL;DR

I need to read from a CSV file and HTTP POST to a server the fastest way I can. I think reading from the file is limiting my concurrency but I don't know where to look to check that.

Would using the gevent.fileobject wrapper help? Is the fileobject wrapper used by default when monkey patching? Would using geventhttpclient help? I'm using requests with a session shared by the spawned threads.
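
One way to check whether the file read is the limit is to time the read on its own, with no POSTs at all. The sketch below is only an illustration: `rows_per_second` is a hypothetical helper name, and it assumes the input is an iterable of CSV lines (an open file works).

```python
import csv
import time


def rows_per_second(lines):
    """Parse every CSV row and do nothing else; return (row_count, rows_per_second)."""
    start = time.perf_counter()
    count = 0
    for _ in csv.reader(lines):
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small inputs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate
```

Called as `rows_per_second(open("data.csv", newline=""))` (the file name is a placeholder), it gives a read rate to compare with the throughput the server reports. If the read rate is far higher, the file is unlikely to be the limiting factor.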

---

I'm trying to fill up a KVS (key-value store) which has an HTTP interface. Saving a value means sending a POST whose body contains the key and the value.

The key-values are stored in a CSV file. 

I've tried different implementations. First I used a Pool and passed the iterator of the opened file to the imap_unordered function. Currently I use a queue: I `put` the items from the CSV file on it and have 10 workers that consume from the queue.

As you can see, I read the file line by line and pass each line to a gevent thread (greenlet) that POSTs it to the server.
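
The code referred to above is not included in this thread. As a rough sketch of the queue-and-workers variant described here, assuming gevent is installed: `post_item`, `worker`, and `load` are hypothetical names, and `post_item` is a stand-in for the real HTTP POST so that the example runs without a server.

```python
import csv

import gevent
from gevent.queue import Queue


def post_item(key, value):
    """Hypothetical stand-in for the real POST (e.g. a requests session call)."""
    gevent.sleep(0)  # yield to other greenlets, as a network call would
    return key, value


def worker(queue, results):
    # Consume rows until the producer sends the None sentinel.
    while True:
        row = queue.get()
        if row is None:
            break
        results.append(post_item(row[0], row[1]))


def load(lines, num_workers=10, queue_size=1000):
    queue = Queue(maxsize=queue_size)  # bounded, so the reader cannot run far ahead
    results = []
    workers = [gevent.spawn(worker, queue, results) for _ in range(num_workers)]
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        queue.put(row)  # blocks cooperatively while the queue is full
    for _ in workers:
        queue.put(None)  # one sentinel per worker
    gevent.joinall(workers)
    return results
```

With a bounded queue, the reader pauses whenever the workers fall behind, which makes it easier to see which side is the slow one.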

I get very similar throughputs reported by the server with the different implementations. And even though I tried increasing the concurrency (more threads), it had no effect on the throughput.

Things I think I could try:

* chunking instead of reading line by line
* mmap file
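
The chunking idea from the list above could look roughly like this. It is a sketch only: `chunked_rows` is a hypothetical helper, and the chunk size of 1000 is an arbitrary assumption.

```python
import csv
from itertools import islice


def chunked_rows(lines, chunk_size=1000):
    """Yield lists of up to chunk_size parsed CSV rows from an iterable of lines."""
    reader = csv.reader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk
```

Each chunk can then be handed to one worker, which reduces the number of queue operations per row. It does not change how fast the server answers each POST.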

Any ideas or help?

--
Ale.

Matt Billenstein

Jul 17, 2019, 7:32:14 PM
to gev...@googlegroups.com
You'll be limited to one cpu in any case - I'd try the same pool/worker
approach using multiprocessing to get over being cpu bound re the data
processing...
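
A minimal sketch of that suggestion with the standard library's multiprocessing.Pool follows. The JSON shape with "key" and "value" fields, and the names `handle_row` and `run_pool`, are assumptions for illustration; the actual POST is left as a comment.

```python
import json
from multiprocessing import Pool


def handle_row(row):
    """Runs in a worker process: build the JSON body for one CSV row."""
    key, value = row
    body = json.dumps({"key": key, "value": value})
    # The real worker would POST `body` to the server here.
    return body


def run_pool(rows, processes=4, chunksize=500):
    # Each worker process gets batches of `chunksize` rows to cut IPC overhead.
    with Pool(processes=processes) as pool:
        return list(pool.imap_unordered(handle_row, rows, chunksize=chunksize))
```

`run_pool` should be called from under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.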

m

--
Matt Billenstein
ma...@vazor.com
http://www.vazor.com/

Kevin Tewouda

Jul 17, 2019, 11:30:13 PM
to gevent: coroutine-based Python network library
Hello Ale,
The fileobject wrapper is not monkey-patched in by default, and it could help in your case by offloading the reading to a thread other than the one running the event loop.
Also, for general CPU-bound programs, you can look at this answer from Jason.

Best regards.

Ale

Jul 18, 2019, 5:52:05 PM
to gevent: coroutine-based Python network library
Hello Matt, Hello Kevin,

Thank you for your reply.

There is not much CPU processing, just the construction of the JSON document to be POSTed. After searching for how to offload the input to a different thread/process, I found gipc, which seems to be what I need.

I'll try that out.

Kevin Tewouda

Jul 18, 2019, 10:49:25 PM
to gevent: coroutine-based Python network library
Hi Ale,
gipc can be a good option for you, but did you follow the link that I put in my previous response? It is another option for offloading the input to a different thread.

Good luck

Ale

Jul 21, 2019, 8:59:11 PM
to gevent: coroutine-based Python network library


On Thursday, July 18, 2019, 23:49:25 (UTC-3), Kevin Tewouda wrote:
> Hi Ale,
> gipc can be a good option for you, but did you follow the link that I put in my previous response? It is another option for offloading the input to a different thread.

I haven't tried it yet, but I plan to tomorrow. The gipc solution gets me 20k requests per minute. I feel that I should be able to POST at a higher rate.

With gipc, the parent process sends lines from the CSV file through a pipe to a child process, which sends the POSTs with gevent and geventhttpclient/grequests (I got better results with grequests).

All of this runs inside a gunicorn+flask endpoint.

The gunicorn+flask app listens on an endpoint, say /load, which downloads the CSV and starts the POST process: reading from the CSV and POSTing to a server. It is a kind of cache-warming procedure.


The problem I currently have is that the parent process seems to get stuck in writer.put() on the pipe: https://gist.github.com/alep/0234f89cc0245b5af949be504e5e8905

Thanks for your help