How to shut down active jobs to retry the CoralNet API

Katje English

Sep 29, 2022, 4:58:25 AM
to CoralNet Users
Hello admins,
I am running large batches of images through CoralNet.

Sometimes the internet speeds here are insufficient to complete the job. I have shut down the kernel and shut down and restarted the computer, but CoralNet still claims that I have active jobs, so I cannot submit new requests. I know those jobs are not active on my PC now, but perhaps they are still queued on the server side.

The exact error I am getting is below. 
Is there a way to request manually that coralnet shut down my active jobs, so that I can proceed with new requests?

Currently on batch: 100 containing: 100 entries. Error: <Response [429]> b'{"errors":[{"detail":"You already have 5 jobs active (IDs: 18286, 18287, 18288, 18289, 18290). You must wait until one of them finishes before requesting another job."}]}'
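For anyone scripting against this error, here is a minimal Python sketch of how the 429 response body shown above could be parsed to recover the blocking job IDs, for example to include them in a cancellation request to the admins. The function name `active_job_ids` is hypothetical, and the parsing assumes the error text keeps the exact "(IDs: ...)" format seen in this message.

```python
import json
import re


def active_job_ids(status_code, body):
    """Return the job IDs named in a 'jobs active' 429 response, else []."""
    if status_code != 429:
        return []
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Body was not valid JSON, so there is nothing to extract.
        return []
    if not isinstance(payload, dict):
        return []
    ids = []
    for error in payload.get("errors", []):
        detail = error.get("detail", "") if isinstance(error, dict) else ""
        # Assumption: IDs appear as "(IDs: 1, 2, 3)" as in the message above.
        match = re.search(r"\(IDs: ([\d, ]+)\)", detail)
        if match:
            ids.extend(int(n) for n in re.findall(r"\d+", match.group(1)))
    return ids


# The body is copied verbatim from the error reported above.
body = (
    b'{"errors":[{"detail":"You already have 5 jobs active '
    b"(IDs: 18286, 18287, 18288, 18289, 18290). You must wait until one of "
    b'them finishes before requesting another job."}]}'
)
print(active_job_ids(429, body))  # [18286, 18287, 18288, 18289, 18290]
```

With the `requests` library, the two arguments would be `response.status_code` and `response.content`.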

Any help much appreciated!
Best,
Katje

Katje English

Sep 29, 2022, 8:12:58 AM
to CoralNet Users
I wanted to add: I can see the status of these jobs directly by plugging a job's status URL into my browser (I tried this with the first job as an example). However, I'm just looking for the API's command to 'kill' the job. The internet suggested /disable/, but that is not supported in this API.

Thank you,
Katje

Stephen Chan

Sep 29, 2022, 6:16:18 PM
to CoralNet Users
Hi Katje,

We currently don't provide a way to shut down your active jobs - it's on our to-do list (GitHub link: https://github.com/beijbom/coralnet/issues/353 ).

I've just manually canceled jobs 18286-18290 for you.

Katje English

Oct 1, 2022, 11:02:06 AM
to Stephen Chan, CoralNet Users
Dear Stephen, 
Thank you so much for doing that! 
Best,
Katje


Angel Umana

Oct 3, 2022, 4:46:10 PM
to CoralNet Users

Hello,

I don't suppose I could also get my jobs cleared? I'm wondering where the best place to request this would be. I have jobs 18291, 18292, 18298, 18299, 18300. I was trying to see if I was doing something wrong: about a month ago I was able to process some batches of images within a couple of hours, but now it feels like the jobs hang forever. I ended up filling my job queue in the debugging process...

I'm guessing my setup is fine but the queue is just long this past week. Jobs 18291 and 18292 are still in queue but I think I requested them last Thursday, 4 days ago.

Unrelated: For how long can I see the result of a completed job? Last time I got some jobs done I had some code pinging CoralNet every minute, but with the wait times right now I'm probably just going to check once a day.

Thanks,
Angel

Stephen Chan

Oct 3, 2022, 7:03:46 PM
to CoralNet Users
Hi Angel - just canceled the 5 jobs you listed. You can see the result of a completed job for up to 30 days after its status was last updated.
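Since results stay available for up to 30 days, a slow polling loop is enough. Below is a minimal Python sketch of one, under stated assumptions: `fetch_status` is a hypothetical callable you supply (it is not part of the CoralNet API) that returns a dict with a `done` flag and an optional `result`, and the one-day interval and seven-day limit are arbitrary defaults, not values recommended by CoralNet.

```python
import time


def wait_for_job(fetch_status, interval_s=24 * 3600, max_wait_s=7 * 24 * 3600,
                 sleep=time.sleep):
    """Poll fetch_status() until it reports a finished job or time runs out.

    fetch_status is a hypothetical, caller-supplied function returning a dict
    such as {"done": True, "result": ...}; it stands in for whatever request
    your script makes to check a job.
    """
    waited = 0
    while True:
        status = fetch_status()
        if status.get("done"):
            return status.get("result")
        # Stop before sleeping past the overall time limit.
        if waited + interval_s > max_wait_s:
            raise TimeoutError(f"job not finished after {waited} seconds")
        sleep(interval_s)
        waited += interval_s


# Demonstration with a stub in place of a real status request and no real sleeping.
_fake_responses = iter([{"done": False}, {"done": True, "result": "scores"}])
print(wait_for_job(lambda: next(_fake_responses), sleep=lambda s: None))  # scores
```

Passing `sleep` as a parameter keeps the loop testable without waiting; in a real script the default `time.sleep` would be used.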

The queue hasn't looked particularly long recently, but there is likely some problem if it's taking this much time to run jobs. I'll investigate and see what I can find.

Katarina English

Oct 4, 2022, 5:00:56 AM
to Stephen Chan, CoralNet Users
Hi Angel and Stephen,
Despite Stephen having cleared my active jobs (thank you so much), a batch of 100 images is presently taking over 8 hours. I stopped the job after 8 hours.

I’m wondering if this is:
A) a server issue at UCSD at the moment
B) a throttling issue with IPs like Angel’s and mine which have sent many past requests
C) an internet speed issue on my end.

Any thoughts much appreciated. 
Best,
Katje 

Stephen Chan

Oct 4, 2022, 10:18:11 PM
to CoralNet Users
I took a closer look and I think I've gotten the queue un-stuck.

There were a few job-units submitted on September 28 which had some missing job-unit metadata. I don't yet know why those units had metadata missing; it doesn't seem to have anything to do with the contents/timing of the API requests that were made. It could've been a random glitch on the AWS side or in the CoralNet infrastructure. In any case, the missing metadata would crash the job-queue-processing routine every time the routine ran, which usually meant it couldn't process any jobs at all (including the ones whose metadata had no problems).

I took those job-units entirely out of the queue (this is slightly different from canceling jobs from the API's point of view) and it seems that things are running properly again. Thanks for reporting; let me know if you see it get slow again. I'll see if I can make the queue-processing more robust when there's missing metadata.

Angel Umana

Oct 5, 2022, 4:09:41 PM
to CoralNet Users
Thank you for looking into this and resolving it so quickly, Stephen! I was able to run a couple of jobs and they ran smoothly.

Angel

Katarina English

Oct 6, 2022, 7:33:16 AM
to Stephen Chan, CoralNet Users
Thank you so much Stephen! I am so impressed by how fast you resolved that issue. 
I was able to finish batch 1 of our annual assessment yesterday, and am working on batch 2 now. I will let you know if the problem recurs. 
Your diligence is much appreciated!

Best,
Katje

Stephen Chan

Oct 7, 2022, 10:13:48 PM
to CoralNet Users
Glad to hear it!
As a follow-up, I did confirm that the root cause of the problem was an Amazon Web Services outage on September 28. That explains why we haven't encountered this specific problem since the API was implemented (around 2 years ago). Despite the rarity, I'll work on improving how this situation gets handled so that the affected jobs get auto-canceled, and the entire queue doesn't end up getting stuck.
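To illustrate the general idea of auto-canceling bad units so that one of them cannot stall the whole queue, here is a minimal Python sketch. It is not CoralNet's actual code: `JobUnit`, `run_unit`, `process_queue`, and the `image_url` metadata key are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class JobUnit:
    # Hypothetical stand-in for a queued job-unit.
    unit_id: int
    metadata: dict = field(default_factory=dict)
    status: str = "pending"


def run_unit(unit):
    # Stand-in for the real work; fails when required metadata is missing.
    if "image_url" not in unit.metadata:
        raise KeyError("image_url")
    return f"processed {unit.metadata['image_url']}"


def process_queue(queue, run=run_unit):
    """Process every pending unit; cancel units that fail instead of crashing."""
    results = {}
    for unit in queue:
        if unit.status != "pending":
            continue
        try:
            results[unit.unit_id] = run(unit)
            unit.status = "done"
        except Exception as exc:
            # Isolate the failure: mark this unit canceled and keep going.
            unit.status = "canceled"
            results[unit.unit_id] = f"canceled: {exc!r}"
    return results


queue = [
    JobUnit(1, {"image_url": "a.jpg"}),
    JobUnit(2),  # missing metadata, like the units described above
    JobUnit(3, {"image_url": "c.jpg"}),
]
process_queue(queue)
print([unit.status for unit in queue])  # ['done', 'canceled', 'done']
```

Catching the error per unit, rather than around the whole loop, is what lets units 1 and 3 finish even though unit 2 is broken.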

Katarina English

Oct 10, 2022, 3:33:40 PM
to Stephen Chan, CoralNet Users
I’m just so relieved it wasn’t my awful scripting that was causing the problem 🤣

Much appreciated Stephen!
