Every 8th submit ID seems to not respond to status requests

47 views
Skip to first unread message

Karen Collins

unread,
Sep 29, 2013, 11:46:36 PM9/29/13
to astro...@googlegroups.com
It seems that every 8th submit request never responds to status queries. This happens if my submit requests get all of the IDs in sequence, or even if one of the submit IDs is allocated to another user. For example, below is part of the log from a batch process that I ran. Note that every 8th submit ID number times out (i.e., 71954, 71962, 71970, ...), and note that in this sequence,  my requests do not get every submit ID request (obviously other users interleaved as expected), so I am wondering if there is a API process on the server end that is missing every 8th global request. Any ideas?
 
As a final note, I did notice that on a rare occasion, the 7th rather than 8th submit ID times out.
 
Thanks,
    Karen Collins
    (AstroImageJ Implementation)
 
----------------------------------------------------------------------------------------------
 

[2013-09-29T21:35:53.588] Astrometry status for slice 1: SOLVED

[2013-09-29T21:36:09.796] Astrometry status for slice 2: SOLVED

[2013-09-29T21:36:21.091] Astrometry status for slice 3: SOLVED

[2013-09-29T21:36:33.040] Astrometry status for slice 4: SOLVED

[2013-09-29T21:36:50.403] Astrometry.net submission ID 71954 timed out for slice 5. Resubmitting.

[2013-09-29T21:37:01.323] Astrometry status for slice 5: SOLVED

[2013-09-29T21:37:12.774] Astrometry status for slice 6: SOLVED

[2013-09-29T21:37:23.959] Astrometry status for slice 7: SOLVED

[2013-09-29T21:37:35.316] Astrometry status for slice 8: SOLVED

[2013-09-29T21:37:47.530] Astrometry status for slice 9: SOLVED

[2013-09-29T21:38:05.392] Astrometry.net submission ID 71962 timed out for slice 10. Resubmitting.

[2013-09-29T21:38:15.564] Astrometry status for slice 10: SOLVED

[2013-09-29T21:38:27.700] Astrometry status for slice 11: SOLVED

[2013-09-29T21:38:39.104] Astrometry status for slice 12: SOLVED

[2013-09-29T21:38:50.274] Astrometry status for slice 13: SOLVED

[2013-09-29T21:39:01.428] Astrometry status for slice 14: SOLVED

[2013-09-29T21:39:13.580] Astrometry status for slice 15: SOLVED

[2013-09-29T21:39:24.719] Astrometry status for slice 16: SOLVED

[2013-09-29T21:39:42.893] Astrometry.net submission ID 71970 timed out for slice 17. Resubmitting.

[2013-09-29T21:39:52.393] Astrometry status for slice 17: SOLVED

[2013-09-29T21:40:06.027] Astrometry status for slice 18: SOLVED

[2013-09-29T21:40:21.315] Astrometry status for slice 19: SOLVED

[2013-09-29T21:40:33.515] Astrometry status for slice 20: SOLVED

[2013-09-29T21:40:44.325] Astrometry status for slice 21: SOLVED

[2013-09-29T21:40:55.854] Astrometry status for slice 22: SOLVED

[2013-09-29T21:41:14.278] Astrometry.net submission ID 71978 timed out for slice 23. Resubmitting.

[2013-09-29T21:41:24.121] Astrometry status for slice 23: SOLVED

[2013-09-29T21:41:35.322] Astrometry status for slice 24: SOLVED

[2013-09-29T21:41:47.537] Astrometry status for slice 25: SOLVED

[2013-09-29T21:41:59.065] Astrometry status for slice 26: SOLVED

[2013-09-29T21:42:10.032] Astrometry status for slice 27: SOLVED

[2013-09-29T21:42:21.420] Astrometry status for slice 28: SOLVED

[2013-09-29T21:42:33.760] Astrometry status for slice 29: SOLVED

[2013-09-29T21:42:51.216] Astrometry.net submission ID 71986 timed out for slice 30. Resubmitting.

[2013-09-29T21:43:01.902] Astrometry status for slice 30: SOLVED

[2013-09-29T21:43:13.337] Astrometry status for slice 31: SOLVED

[2013-09-29T21:43:25.786] Astrometry status for slice 32: SOLVED

[2013-09-29T21:43:36.862] Astrometry status for slice 33: SOLVED

[2013-09-29T21:43:54.459] Astrometry.net submission ID 71994 timed out for slice 34. Resubmitting.

[2013-09-29T21:44:05.550] Astrometry status for slice 34: SOLVED

[

Dustin Lang

unread,
Sep 30, 2013, 10:44:41 AM9/30/13
to astro...@googlegroups.com
Wow, that is some nice sleuth work!

Very mysterious.  The number 8 (among many other things) is the number of threads we use to process incoming submissions, using python's multithreading module.  I believe each of these keeps its own database connection, among other things.  Perhaps one of them died in a weird way.

I have restarted the server-side "process_submissions.py" script.  Any chance you could check whether this problem persists?

Thanks very much,
--dustin


Karen Collins

unread,
Sep 30, 2013, 1:53:39 PM9/30/13
to astro...@googlegroups.com
Hi Dustin,
     The problem persists as of now. Every eighth one times out. Here is a portion of the new log:
[2013-09-30T13:40:18.812]      Astrometry status for slice 1: SOLVED
[2013-09-30T13:40:30.309]      Astrometry status for slice 2: SOLVED
[2013-09-30T13:40:41.603]      Astrometry status for slice 3: SOLVED
[2013-09-30T13:40:58.202]      Astrometry.net submission ID 72508 timed out for slice 4. Resubmitting.
[2013-09-30T13:41:09.980]      Astrometry status for slice 4: SOLVED
[2013-09-30T13:41:21.337]      Astrometry status for slice 5: SOLVED
[2013-09-30T13:41:32.725]      Astrometry status for slice 6: SOLVED
[2013-09-30T13:41:44.082]      Astrometry status for slice 7: SOLVED
[2013-09-30T13:41:56.655]      Astrometry status for slice 8: SOLVED
[2013-09-30T13:42:08.152]      Astrometry status for slice 9: SOLVED
[2013-09-30T13:42:19.462]      Astrometry status for slice 10: SOLVED
[2013-09-30T13:42:35.874]      Astrometry.net submission ID 72516 timed out for slice 11. Resubmitting.
[2013-09-30T13:42:47.636]      Astrometry status for slice 11: SOLVED
[2013-09-30T13:42:59.149]      Astrometry status for slice 12: SOLVED
[2013-09-30T13:43:10.475]      Astrometry status for slice 13: SOLVED
[2013-09-30T13:43:22.050]      Astrometry status for slice 14: SOLVED
[2013-09-30T13:43:34.233]      Astrometry status for slice 15: SOLVED
[2013-09-30T13:43:45.668]      Astrometry status for slice 16: SOLVED
[2013-09-30T13:43:56.947]      Astrometry status for slice 17: SOLVED
[2013-09-30T13:44:13.483]      Astrometry.net submission ID 72524 timed out for slice 18. Resubmitting.
[2013-09-30T13:44:25.526]      Astrometry status for slice 18: SOLVED
[2013-09-30T13:44:37.226]      Astrometry status for slice 19: SOLVED
[2013-09-30T13:44:48.396]      Astrometry status for slice 20: SOLVED
[2013-09-30T13:44:59.831]      Astrometry status for slice 21: SOLVED
[2013-09-30T13:45:12.529]      Astrometry status for slice 22: SOLVED
[2013-09-30T13:45:23.886]      Astrometry status for slice 23: SOLVED
[2013-09-30T13:45:40.266]      Astrometry.net submission ID 72532 timed out for slice 24. Resubmitting.
[2013-09-30T13:45:51.202]      Astrometry status for slice 24: SOLVED
[2013-09-30T13:46:02.839]      Astrometry status for slice 25: SOLVED

Dustin Lang

unread,
Sep 30, 2013, 2:00:18 PM9/30/13
to astro...@googlegroups.com
That is SO weird.

Could I ask you to test supernova.astrometry.net?  It uses 4 threads, rather than nova's 8.  (Your API key will be different there.)

thanks,
--dustin

Karen Collins

unread,
Sep 30, 2013, 2:22:08 PM9/30/13
to astro...@googlegroups.com
I am getting this message for all submissions to supernova:
 
"Unfortunately, we couldn't process this submission properly:
...e "process_submissions.py", line 449, in dosub
    img = get_or_create_image(df)
  File "process_submissions.py", line 504, in get_or_create_image
    raise Exception('This file\'s type is not supported.')
Exception: This file's type is not supported.
 
Does supernova support a file with a list of x,y coordinates?
 
Karen

Dustin Lang

unread,
Sep 30, 2013, 2:26:07 PM9/30/13
to astro...@googlegroups.com
Sorry, try now.  I guess I forgot to update the supernova code.

Karen Collins

unread,
Sep 30, 2013, 2:32:20 PM9/30/13
to astro...@googlegroups.com
I still get the same message for all submissions:
 

Unfortunately, we couldn't process this submission properly:
...e "process_submissions.py", line 449, in dosub
    img = get_or_create_image(df)
  File "process_submissions.py", line 504, in get_or_create_image
    raise Exception('This file\'s type is not supported.')
Exception: This file's type is not supported.
 
 
Some example submit IDs are 8994, 8995, 8996

Dustin Lang

unread,
Sep 30, 2013, 2:44:43 PM9/30/13
to astro...@googlegroups.com
sorry, try now.  (i actually tested it this time...)
--dstn

Karen Collins

unread,
Sep 30, 2013, 2:55:50 PM9/30/13
to astro...@googlegroups.com
Now I get the "processing..." response that seems to never change for a submission. See sub_IDs 8987, 8988, 8989 as examples. This field solves in ~1 sec on nova. As of now, they still say processing... when I log into the website and check the status.

Dustin Lang

unread,
Oct 2, 2013, 8:32:34 AM10/2/13
to astro...@googlegroups.com
Hmm, they seem to have been processed now.

Could you please try again.  Sorry for the trouble!

One thing to note: the system notices bit-identical submissions, so if you are resubmitting exactly the same test images, you might see patterns in what works and what doesn't...

--dstn

Dustin Lang

unread,
Oct 2, 2013, 3:28:32 PM10/2/13
to astro...@googlegroups.com

Ok, finally think I found something.  Or a set of somethings.  Hold on to your socks.

First, a problem with the code that handles submissions.  When it gets an ASCII table of positions, it checks (via database constraints) that the maximum x and maximum y coordinates listed are positive.  In at least one submission, they appear to be RA,Dec positions rather than X,Y:

270.7819007,-27.7805105
270.8004984,-27.7444513
270.7738378,-27.7954999
271.0104432,-27.9148587
271.0145363,-27.8978697
270.8142363,-27.8174037

this borks the database, and then we hit this problem:
https://code.djangoproject.com/ticket/11569
and all subsequent Submissions that the target thread tries to process die horribly.  8 threads round-robin and hey presto, every eighth submission dies.

I'll get started on a fix...

--dstn

Karen Collins

unread,
Oct 2, 2013, 5:47:40 PM10/2/13
to astro...@googlegroups.com
Are those coordinates from one of my submit_IDs?

Dustin Lang

unread,
Oct 2, 2013, 5:54:55 PM10/2/13
to astro...@googlegroups.com
Umm, sorry, I'm not sure.  Not necessarily -- the "process_submissions.py" script checks for old submissions that didn't finish processing, and I think this was one of those.

Please let me know if you see this "every 8th one fails" problem again!  I put in a hackish patch for the proximal problem.  I think the issue is supposed to be fixed in Django 1.5, but I don't have the chutzpah to tackle that upgrade today...

--dstn


Karen Collins

unread,
Oct 3, 2013, 12:39:57 AM10/3/13
to astro...@googlegroups.com
Wow, not sure what you did to fix the timeout issue, but I just ran a stack 160 images and had a 100% solve rate (on the nova server). No timeouts at all.
Thanks!
 
 

Dustin Lang

unread,
Oct 3, 2013, 8:37:35 AM10/3/13
to astro...@googlegroups.com
Hey, that's great!  Glad to hear it, and thanks very much for your sleuthing work and persistence.

In short, the problem was that there was a database integrity check (like a "NOT NULL" column), that the width and height of submitted images must be positive.  With that weird list of coords with negative values, that check failed, a exception was thrown, and everything would have been fine...  Except that the django web/database framework we use has (or had) a problem recovering from database problems (it wasn't using transactions consistently), so that one database integrity check caused that thread's database connection to become invalid, but not in a strong enough way that the thread died.  So that thread (one of eight) kept accepting new submission to process, but immediately failed whenever it touched the database.  Nasty little problem!

The hackish fix I put in was just to check for non-positive width or height before sending them to the database, and failing with a different kind of exception to avoid the fatal database exception.

thanks again for the excellent bug report,
--dustin

Reply all
Reply to author
Forward
0 new messages