LSD hanging without error or crashing


Edouard Bernard

May 21, 2015, 12:12:58 PM
to lsd-...@googlegroups.com
Hi all,

I am having a strange issue with LSD (v0.5.5). It seems to hang at the end of certain queries without returning the result, but it's not crashing either. Has it ever happened to any of you?

Example:

$ python density_map.py
 [2418 el.]:::::::::::::::::::

It takes a bit under 2 hours to get there, then nothing happens; I've waited several more hours just in case. The processes use no CPU at all (on an 8-core machine), just a small amount of RAM:

$ top
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 8290 ejb   20   0 1582m 716m  87m S  0.0  3.0 33:50.99 python
 8289 ejb   20   0     0    0    0 Z  0.0  0.0 21:50.86 python
 8288 ejb   20   0 1718m 604m  87m S  0.0  2.5 36:03.87 python
 8287 ejb   20   0 1427m 457m  87m S  0.0  1.9 32:18.84 python
 8286 ejb   20   0 1652m 524m  87m S  0.0  2.2 37:02.68 python
 8285 ejb   20   0     0    0    0 Z  0.0  0.0 22:13.25 python
 8284 ejb   20   0 1714m 825m  87m S  0.0  3.4 34:25.47 python
 8283 ejb   20   0 1693m 553m  87m S  0.0  2.3 37:21.83 python
 8231 ejb   20   0  899m 267m 1496 S  0.0  1.1  0:05.53 python

$ ps
  PID TTY          TIME CMD
 8231 pts/0    00:00:05 python
 8283 pts/0    00:37:21 python
 8284 pts/0    00:34:25 python
 8285 pts/0    00:22:13 python <defunct>
 8286 pts/0    00:37:02 python
 8287 pts/0    00:32:18 python
 8288 pts/0    00:36:03 python
 8289 pts/0    00:21:50 python <defunct>
 8290 pts/0    00:33:50 python

Interestingly, the same code works very well with slightly different spatial boundaries. The example above was with "rectangle(-80,-40,270,90)". With "rectangle(-50,-40,50,90)" it produces the expected output:

$ python density_map.py
 [717 el.]::::::::::::::::::::>  1301.97 sec

Suggestions?
Thanks!

Bertrand Goldman

May 21, 2015, 12:15:38 PM
to lsd-...@googlegroups.com
Hi Edouard,

  do you really want a negative first RA? I don't think I have ever tried that, so I can't say whether it is the problem.

  Cheers,
    Bertrand.
--
You received this message because you are subscribed to the Google Groups "lsd-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lsd-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Edouard Bernard

May 21, 2015, 12:39:25 PM
to lsd-...@googlegroups.com
Hi Bertrand,

thank you for the suggestion. The reason I use a negative RA is to cover almost the whole sky in a single query; the only bit missing is the 10-degree strip in RA containing the bulge. Using "rectangle(0,-40,360,90)" does not work because my machine chokes on the bulge region with a MemoryError (despite having 24 GB of RAM...).

Since it runs fine with "rectangle(-50,-40,50,90)", I didn't think the negative RA could be the problem, but I will run more tests.

In case it can help, here is what appeared when I killed the query:

^C ERROR: KeyboardInterrupt [multiprocessing.queues]
Traceback (most recent call last):
  File "densitymap.py", line 81, in <module>
    for (patch, imin, jmin) in q.execute([(_coverage_mapper, dx, filter)],
  File "/<PATH>/py-lib/lsd/join_ops.py", line 1522, in execute
    for result in pool.map_reduce_chain(partspecs.items(), kernels, progress_callback=progress_callback):
  File "/<PATH>/py-lib/lsd/pool2.py", line 590, in map_reduce_chain
    for r in self.imap_unordered(input, K_fun, K_args, progress_callback=progress_callback, progress_callback_stage=stage):
  File "/<PATH>/py-lib/lsd/pool2.py", line 389, in imap_unordered
    (ident, what, data) = self.qout.get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
KeyboardInterrupt


And lines 81-83 in my code are:
for (patch, imin, jmin) in q.execute([(_coverage_mapper, dx, filter)],
                                     bounds=bounds):
    sky[imin:imin+patch.shape[0], jmin:jmin+patch.shape[1]] += patch
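
The traceback above shows the parent process blocked in a queue `get()` while two workers are already defunct. The following is a minimal sketch, using only the standard library and not LSD's actual `pool2.py` code, of how a parent can poll a result queue with a timeout and stop waiting once a worker has died abnormally. The function name `get_result` and the `poll` parameter are hypothetical, and `workers` is assumed to be a list of objects with an `exitcode` attribute, as `multiprocessing.Process` provides.

```python
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7, as used in this thread


def get_result(q, workers, poll=1.0):
    """Return the next item from q, or raise if a worker died abnormally.

    exitcode is None while a worker is running, 0 after a clean exit,
    and non-zero (negative if killed by a signal) otherwise.
    """
    while True:
        try:
            # Wait at most `poll` seconds instead of blocking forever.
            return q.get(timeout=poll)
        except queue.Empty:
            dead = [w for w in workers if w.exitcode not in (None, 0)]
            if dead:
                raise RuntimeError("%d worker(s) died abnormally" % len(dead))
```

With a check like this, a killed worker would produce an error in the parent rather than an indefinite wait; it does not address why the worker died.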

Bertrand Goldman

May 21, 2015, 12:46:00 PM
to lsd-...@googlegroups.com
Hi Edouard,

  in any case, you can do that with:
rectangle(310,-40,50,90)
If ra2 < ra1, the rectangle wraps around RA = 0.
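
As a minimal sketch of this wrap-around convention, the helper below folds both RA values into [0, 360) so that a range crossing RA = 0 ends up with ra2 < ra1. The name `wrap_rectangle` is hypothetical and not part of LSD; only the `rectangle(ra1,dec1,ra2,dec2)` string form is taken from this thread, and the sketch assumes LSD treats ra2 < ra1 as described above.

```python
def wrap_rectangle(ra1, dec1, ra2, dec2):
    """Build a bounds string with both RA values folded into [0, 360)."""
    if ra2 - ra1 >= 360:
        # The range covers every RA; keep it as a plain full circle.
        return "rectangle(0,%g,360,%g)" % (dec1, dec2)
    ra1 = ra1 % 360.0   # e.g. -50 becomes 310
    ra2 = ra2 % 360.0
    return "rectangle(%g,%g,%g,%g)" % (ra1, dec1, ra2, dec2)


print(wrap_rectangle(-50, -40, 50, 90))   # rectangle(310,-40,50,90)
print(wrap_rectangle(-80, -40, 270, 90))  # rectangle(280,-40,270,90)
```

The second call corresponds to the query that was hanging: the same 350-degree range, written without a negative RA.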

  If this does not work, I'll look through old mails, as this reminds me of something...

  I'm afraid I don't have the expertise to understand the Python output!

  Cheers,
    Bertrand.

Branimir Sesar

May 21, 2015, 12:46:34 PM
to lsd-...@googlegroups.com
Hi Edouard,

Have you tried using a smaller number of workers? Try running your code with

NWORKERS=4 ./code

B
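
The same setting can be applied when launching the script from Python. This is a minimal sketch: `NWORKERS` is the variable named above, the helper `env_with_workers` is hypothetical, and how LSD reads the variable is not shown in this thread.

```python
import os


def env_with_workers(n_workers, base_env=None):
    """Return a copy of the environment with NWORKERS set to n_workers."""
    env = dict(os.environ if base_env is None else base_env)
    env["NWORKERS"] = str(int(n_workers))
    return env


# Usage (not executed here), equivalent to `NWORKERS=4 python density_map.py`:
#   import subprocess
#   subprocess.call(["python", "density_map.py"], env=env_with_workers(4))
```

Copying the environment instead of modifying `os.environ` in place keeps the setting limited to the child process.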

Eric Morganson

May 21, 2015, 2:04:24 PM
to lsd-...@googlegroups.com
FYI, I have had this happen when a lot of people are using the computer. I think that when the processors or RAM are fully used, LSD politely stops its processes and then does not start them up again.

-E


Edouard Bernard

May 22, 2015, 3:06:35 AM
to lsd-...@googlegroups.com
Hi all,

my code appears to be running fine now: Bertrand's suggestion fixed the issue with LSD hanging, and Branimir's fixed the MemoryError when querying the whole sky at once (plus, it seems it ran significantly faster with only 4 cores). Thanks a lot for your feedback!

Edouard Bernard

Jun 8, 2015, 12:54:48 PM
to lsd-...@googlegroups.com
Hi all,

just to follow up on this:
Eric was right. LSD still sometimes hangs at the end of a query if other processes are using a bit of RAM (even with NWORKERS=4).
So there may be an issue somewhere in LSD...

Eric Bellm

Jun 8, 2015, 4:14:33 PM
to lsd-...@googlegroups.com
I've had similar experiences recently with our PTF LSD installation. Processes hang without crashing or exiting, and `ps ux` shows that one of the python processes is defunct. My guess is likewise that the processes are using too much memory and getting killed by the OS, since lowering the iteration block size and keeping other users off the machine in question both seem to help somewhat.

Eric
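
A defunct (zombie) process is one that has exited but has not yet been reaped by its parent. As a minimal sketch of checking for this symptom, the hypothetical helper `find_defunct` below scans text in the plain `ps` layout shown earlier in this thread (PID in the first column) and returns the PIDs marked `<defunct>`. Output from `ps ux` has the user name in the first column, so it would need a different column index.

```python
def find_defunct(ps_output):
    """Return the PIDs of lines that ps marks as <defunct>."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header and blank lines; data lines start with a numeric PID.
        if fields and fields[0].isdigit() and "<defunct>" in fields:
            pids.append(int(fields[0]))
    return pids


sample = """\
  PID TTY          TIME CMD
 8231 pts/0    00:00:05 python
 8285 pts/0    00:22:13 python <defunct>
 8289 pts/0    00:21:50 python <defunct>
 8290 pts/0    00:33:50 python
"""
print(find_defunct(sample))  # [8285, 8289]
```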