Respecting Rate Limits on a Third Party API


Gabriel Smith

unread,
May 27, 2022, 1:45:01 PM5/27/22
to Luigi
Hello fellow Luigi users,

My team has been using Luigi for ETL and to automate a lot of data imports for a few months now, and it's been great. We are integrating with a third-party digital asset management API that requires some outgoing data at the end of a long process (which could be one or many requests), and we've successfully set that process up to run in parallel across multiple tasks within a wrapper task.

However, the API has very strict rate limits, and I'm currently unsure how to share a constantly updating singleton variable across many tasks that are multiprocessing. On the surface it seems to violate the idempotency of the tasks themselves, but sadly it is a constraint we are not able to remove.

Could we use the multiprocessing.shared_memory module from the standard Python library to share this variable in some way?

I'm sure someone else has run into this issue before us and any advice would be greatly appreciated. 

-Gabe

Lars Albertsson

unread,
May 27, 2022, 2:57:38 PM5/27/22
to Gabriel Smith, Luigi
The 'resource' concept might do the trick for you. https://luigi.readthedocs.io/en/stable/luigi_patterns.html

If you put an appropriate sleep statement at the end of each data collection job in combination with a resource, I think you could implement a rate limit.
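As a Luigi-agnostic sketch of the sleep side of this idea (the resource name in the comments is illustrative, not something Luigi defines):

```python
import time

# Enforce a minimum interval between API calls by sleeping before each one.
# In Luigi you would pair this with a scheduler resource, e.g. on the task:
#     resources = {"dam_api": 1}   # hypothetical resource name
# and in luigi.cfg:
#     [resources]
#     dam_api = 2                  # at most 2 such tasks run concurrently
MIN_INTERVAL = 0.05  # seconds between requests; derive this from the API's rate limit

def throttled_call(fn, last_call_time):
    """Call fn, sleeping first if the previous call was too recent.

    Returns (result, timestamp of this call); pass the timestamp back in
    on the next call.
    """
    wait = MIN_INTERVAL - (time.monotonic() - last_call_time)
    if wait > 0:
        time.sleep(wait)
    return fn(), time.monotonic()
```

With `resources` capping concurrency at N workers, an interval of N times the API's minimum spacing per worker keeps the aggregate request rate under the limit.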



Gabriel Smith

unread,
May 27, 2022, 3:28:55 PM5/27/22
to Luigi
Thank you for your quick reply Lars,

I haven't experimented with setting resources or with sleep statements at the end of tasks; I'll give those suggestions a shot.

Thank you very much,
-Gabe

Lars Albertsson

unread,
May 28, 2022, 10:28:20 AM5/28/22
to Gabriel Smith, Luigi
I realised that it probably comes with the drawback that jobs downstream of the contended resource might be delayed, which may or may not be OK for you.

I think a proper solution might require throttling support in luigid, which is not present.

If you don't get the sleep hack to work, you could try another hack: use the resource functionality to limit execution to a single worker, and have that worker ensure it does not exceed the limit, which is simple within one process.
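For the single-worker variant, a plain in-process limiter is enough, since a resource of 1 guarantees only one task touches the API at a time — no shared memory needed. A sliding-window sketch (class and parameter names are illustrative):

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most max_calls within a sliding window of `period` seconds.

    Call acquire() immediately before each API request; it blocks until
    the request is permitted under the limit.
    """

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        while True:
            now = time.monotonic()
            # Discard timestamps that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest call in the window expires, then retry.
            time.sleep(self.period - (now - self.calls[0]))
```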

If you want to take a shot at implementing a throttle resource inside luigid, I don't think it would be difficult. You could reuse the existing resources structure, and add a mechanism that fills up resources in scheduler.get_work based on time, instead of returning resources when a task exits.
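The time-based refill mechanism is essentially a token bucket. A standalone sketch of the idea (names are illustrative, not luigid APIs — in the scheduler this logic would sit where resources are checked in get_work):

```python
import time

class TokenBucket:
    """Capacity replenishes with elapsed time instead of when tasks finish."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens (task starts) added per second
        self.capacity = capacity  # burst ceiling
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        """Return True and consume a token if one is available now."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The scheduler would hand out work only while `try_acquire()` succeeds, so the dispatch rate converges to `rate` regardless of how long individual tasks run.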