Multiple threads with several hubs (several greenlets in several threads)


Ale

Jul 22, 2019, 11:36:30 AM
to gevent: coroutine-based Python network library
Hello all,

This is a follow-up to my previous message about uploading records to a server. Each record is a line in a CSV file, and the number of records varies from one million to over 100 million. I would like to upload them as fast as possible.

The typical approach:

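```
import csv

def post(data):
    ...  # POST one record to the server (body omitted)

# pool is a gevent.pool.Pool; csv_file is the open CSV file.
# Iterating waits for each result as it completes.
for _ in pool.imap_unordered(post, (create_data(line) for line in csv.reader(csv_file))):
    pass
```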

This got me about 10k requests per minute, which for 10 million records meant waiting more than 16 hours for the process to finish.

I tried gipc and got about 20k requests per minute (see my previous message for details).
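For reference, the waiting times follow from dividing the record count by the throughput. A quick check of the rounded figures above, assuming one request per record and a steady rate (the function name is just for illustration):

```python
def hours_to_upload(records: int, requests_per_minute: int) -> float:
    """Hours needed to send `records` at a steady rate, one record per request."""
    return records / requests_per_minute / 60


print(round(hours_to_upload(10_000_000, 10_000), 1))   # 16.7 (plain gevent pool)
print(round(hours_to_upload(10_000_000, 20_000), 1))   # 8.3 (gipc)
print(round(hours_to_upload(100_000_000, 20_000), 1))  # 83.3 (largest files, gipc rate)
```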

I think I can improve on this by having one thread read the file in chunks and push the records onto a queue, from which several threads, each running its own gevent hub and greenlets, post to the server. What I'm not sure about is whether a thread-safe queue.Queue can be shared directly by greenlets running in several threads. Or should I have one greenlet per thread that takes items off the queue.Queue and passes them on to the other greenlets in that thread through a gevent.queue?
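To make the layout concrete, here is a rough sketch of the reader-plus-workers handoff using only the standard library. It is not a gevent solution: `post` is a hypothetical stand-in for the real HTTP call, the chunk size, worker count and queue bound are arbitrary assumptions, the reader runs in the main thread, and the point where each worker would pass a chunk on to its greenlets is only marked with a comment.

```python
import csv
import io
import queue
import threading

CHUNK_SIZE = 1000   # records per queue item (assumed value)
NUM_WORKERS = 4     # number of worker threads (assumed value)
STOP = None         # sentinel telling a worker to exit


def post(record):
    """Hypothetical stand-in for the real HTTP POST."""
    return len(record)


def reader(csv_file, work_queue, num_workers, chunk_size):
    """Read the CSV and put fixed-size chunks of rows on the shared queue."""
    chunk = []
    for row in csv.reader(csv_file):
        chunk.append(row)
        if len(chunk) >= chunk_size:
            work_queue.put(chunk)   # blocks while the queue is full
            chunk = []
    if chunk:
        work_queue.put(chunk)       # final partial chunk
    for _ in range(num_workers):
        work_queue.put(STOP)        # one sentinel per worker


def worker(work_queue, counts, index):
    """Consume chunks until the sentinel arrives."""
    sent = 0
    while True:
        chunk = work_queue.get()
        if chunk is STOP:
            break
        # A gevent-based worker would hand this chunk to its greenlets here.
        for record in chunk:
            post(record)
            sent += 1
    counts[index] = sent


def upload(csv_file, num_workers=NUM_WORKERS, chunk_size=CHUNK_SIZE):
    """Run the reader in the calling thread and the workers in their own threads."""
    # Bounded queue so the reader cannot run far ahead of the workers.
    work_queue = queue.Queue(maxsize=2 * num_workers)
    counts = [0] * num_workers
    threads = [
        threading.Thread(target=worker, args=(work_queue, counts, i))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    reader(csv_file, work_queue, num_workers, chunk_size)
    for t in threads:
        t.join()
    return sum(counts)


if __name__ == "__main__":
    sample = io.StringIO("".join(f"{i},value{i}\n" for i in range(2500)))
    print(upload(sample))  # total number of records handed to post()
```

Passing chunks rather than single rows keeps the number of queue operations small, and the bounded queue keeps memory use flat even for files with over 100 million records.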

Has anyone done something similar?