Compiling astrometry.net on Solaris 10 (SunOS 5.10)

Karsten Schindler

Nov 4, 2015, 3:04:04 PM
to astrometry
We would like to install astrometry.net on a Solaris 10 / SunOS 5.10 machine (I know - observatory computers can be quite ancient).
Has anybody on this mailing list attempted such an endeavour before?
Right now we are still fighting with the prerequisite packages but maybe there are people reading this who could give us a helping hand (my background is in Linux and MacOS).

Thanks for reading,
Karsten

Dustin Lang

Nov 4, 2015, 3:52:52 PM
to astrometry
I know it compiles cleanly on FreeBSD (or at least did some years ago!), but that's closer to Linux/MacOS than Solaris!

Good luck out there.  If it's a big-endian architecture, magic incantations will need to be applied to the index files.....  Let me know and I can help.

--dstn

Karsten Schindler

Nov 9, 2015, 11:20:27 PM
to astro...@googlegroups.com

Hi Dustin,

Luckily, our Solaris 10 is running on a x86 architecture which is little endian, so we should be able to use the index files as is.

I got quite far. For completeness (and everybody else who is up for the same challenge):

1) I managed to install from the CSW repository:

netpbm
libnetpbm11
libnetpbm_dev
libcairo2
libcairodev
libpng12_0
libjpeg7
libz1
bzip2
python
py_numpy
py_pip (required to install PyFITS)
gcc4gfortran (required for CFITSIO compiling)

and compiled CFITSIO myself.

I am still struggling with pyfits; I cannot get it to install with easy_install or python setup.py install yet. Python complains that numpy is not installed although it is, possibly because some path variables are set wrong (I am not a Solaris person...).
At this moment I do not worry too much about pyfits, as the core program should still compile and work.

2) Trying to compile astrometry.net-0.64 with Solaris's own 'make' throws an error immediately:
"make: Fatal error in reader: Makefile, line 30: Unexpected end of line seen"
... so I decided to use GNU make (gmake) as I assume some incompatibility, and gmake is much more up-to-date ;-).

Then, gmake stops with an error that endian.h is missing. So I traced back the error to ./include/astrometry/an-endian.h and changed
...
#else
# include <endian.h>
...
to
#else
# include <sys/isa_defs.h>

which seems to be the equivalent file on Solaris 10 (the case of Solaris 10 is not defined in this file ;-).
That works!
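
One hedged way to make that change conditional, instead of editing the header in place, is to key the include on the `__sun` macro, which compilers targeting Solaris predefine. The sketch below is an assumption about how such a guard could look, not the project's actual fix; `host_is_little_endian()` is a hypothetical helper that checks byte order at run time and so depends on neither header.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: __sun is predefined when compiling for Solaris. */
#if defined(__sun)
# include <sys/isa_defs.h>   /* defines _LITTLE_ENDIAN or _BIG_ENDIAN */
#elif defined(__linux__)
# include <endian.h>
#endif

/* Hypothetical helper: run-time byte-order check, independent of headers. */
static int host_is_little_endian(void) {
    const uint32_t probe = 1;
    unsigned char first;
    /* Copy the lowest-addressed byte; it holds the 1 only on little-endian hosts. */
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On the x86 machine described here the helper would be expected to report little endian, matching the statement that the index files can be used as is.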

3) Now, gmake complains about:

In file included from os-features.c:39:0:
qsort_reentrant.c:30:23: fatal error: sys/cdefs.h: No such file or directory
#include <sys/cdefs.h>

After scouring the internet, I realized that Solaris 10 simply lacks the non-standard (non-POSIX) header file sys/cdefs.h... but somebody found that on Solaris a workaround is to replace
#include <sys/cdefs.h>
with
#include <stdio.h>

Seems to work!

4) Now gmake fails compiling dallpeaks.o...

In file included from dallpeaks.c:13:0:
.../include/astrometry/dimage.h:12:18: error: conflicting types for 'label_t'
typedef uint16_t label_t;

Here is the full error message:

My brain is fried for today...

Karsten

Dustin Lang

Nov 10, 2015, 2:22:05 PM
to astrometry
Wow, that's great progress!

Do you know if there is a preprocessor flag that is defined for Solaris that I could use to use the different include files as you have done? (Could you "make report" and post the "report.txt" file?)

I think the dallpeaks.c fix will require changing that datatype name.

If you're using 0.64 you should be able to use the 'master' version instead -- I just pushed a commit that changes the name of 'label_t' to 'dimage_label_t', which should avoid this conflict.

cheers,
--dustin

Karsten Schindler

Nov 10, 2015, 2:51:28 PM
to astro...@googlegroups.com

Hi Dustin,

I achieved quite some more progress:

I realized that there are indeed data type conflicts with system header files /usr/include/ia32/sys/machtypes.h for label_t and /usr/include/sys/types.h for index_t.
As I am in a VM sandbox, I ruthlessly commented the system header definitions out to proceed ;-), works for now. Will look into your suggestion later to make it nice.

Four large issues that I solved so far:

- Solaris does not have a NAN definition. I found a developer claiming that a compiler independent (but not processor independent) way is to add:
int NAN = 0x7F800001;
to wcs-rd2xy.c.

- HUGE_VALF is not defined in math.h on Solaris 10, so you need to add:
/* HUGE_VALF is a 'float' infinity */
#ifndef HUGE_VALF
# if defined _MSC_VER
/* Microsoft MSVC 9 compiler chokes on expression 1.0f/0.0f */
#  define HUGE_VALF (1e25f*1e25f)
# else
#  define HUGE_VALF (1.0f/0.0f)
# endif
#endif
to an-fitstopnm.c and kdtree.h
(the MSVC 9 branch could very likely be removed; it was copied and pasted from another forum).

- isfinite(x) is not available, so you have to define it yourself:
#undef  isfinite
#define isfinite(x) \
  __extension__ ({ __typeof (x) __x_f = (x); \
                   __builtin_expect(!isnan(__x_f - __x_f), 1); })
and add it to an-fitstopnm.c, kdtree_internal.c, kdtree_internal_fits.c, wcs-resample.c, resample.c

- MIN and MAX could not be linked anywhere in the code. It is unclear why, but I do not see a header definition anywhere either. So I replaced them in all C files with fmin and fmax. Certainly something that could be improved by a knowledgeable programmer.
Just look with grep "MIN(" * and grep "MAX(" * for all instances... it's a lot of files...
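
As a hedged alternative to the GCC-specific definitions above, the missing math macros could be supplied by guarded fallbacks that only take effect when math.h does not provide them. This is a minimal sketch, not the project's code; `count_finite()` is a hypothetical helper that exists only to exercise the macros.

```c
#include <math.h>

/* Fallback only if <math.h> did not already provide it. */
#ifndef HUGE_VALF
# define HUGE_VALF (1.0f / 0.0f)   /* float infinity */
#endif

/* x - x is 0 for finite x and NaN for +/-inf or NaN, and NaN never
 * compares equal to itself. Note: the argument is evaluated several times. */
#ifndef isfinite
# define isfinite(x) (((x) - (x)) == ((x) - (x)))
#endif

/* Hypothetical helper used only to exercise the macros. */
static int count_finite(const float *v, int n) {
    int i, count = 0;
    for (i = 0; i < n; i++)
        if (isfinite(v[i]))
            count++;
    return count;
}
```

One caveat: when GCC is invoked with -ffinite-math-only it is allowed to assume that values are never infinite or NaN, so checks of this kind may be optimized away under that flag.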

Now, gmake seems to compile everything from the core package... but stops with the first instruction in the /catalogs folder:


Hmm... what to do now?

I googled this error and realized that there is a "freebsd notes" document on trac.astrometry.net, but the server seems to be down, so I can not access it...

Karsten

Dustin Lang

Nov 10, 2015, 3:39:51 PM
to astrometry
Hi,

MIN/MAX are defined in sys/param.h, or at least are on linux/macosx...

I am attaching ngc2000* files -- these are just generated from their text data files into C and python code by an awk script.  Probably the awk program on Solaris doesn't understand the awk script.

Hope that helps!
--dstn


ngc2000names.c
ngc2000.py
ngc2000accurate.py
ngc2000entries.c
ngc2000entries.py
ngcic-accurate-entries.c

Karsten Schindler

Nov 10, 2015, 4:44:22 PM
to astro...@googlegroups.com

Thanks Dustin, those files made gmake skip the problematic awk script.

The macros for MIN/MAX:
#define MIN(a,b) (((a)<(b))?(a):(b))
#define MAX(a,b) (((a)>(b))?(a):(b))

are missing in my sys/param.h file. Thanks for the heads up. Why is life among these similar looking operating systems so difficult? ;-)

Now, NAN is not defined in 2mass.h. I thought I could fix this simply by adding the same line as suggested above:
int NAN = 0x7F800001;
But then, gmake complains that NAN is multiply defined: in 2masstofits.c, 2mass-fits.c and 2mass.c.
Adding the NAN definition into the c files does not do the trick.


So I checked again header files and added it instead to starutil.h which seemed to be the better place.
Then all 2MASS related code files compile and gcc stops with a similar error for the Tycho2 files.

It feels like I am almost there but gcc does not let me succeed ;-)

Karsten

Dustin Lang

Nov 10, 2015, 4:56:12 PM
to astrometry
What if you make it

static int NAN = 0x7F800001;

?


Karsten Schindler

Nov 10, 2015, 5:13:35 PM
to astro...@googlegroups.com
Makes sense! I put this line into starutil.h and deleted the line from wcs-rd2xy.c.
Wow, gmake was quite going somewhere now.
Seems I arrived in the /blind folder!
engine-main.c seems to be problematic now: GLOB_TILDE and GLOB_BRACE seem to be undeclared.

And this seems to be another issue on Solaris... Googling for a workaround... first results show that GLOB_BRACE is not available on Solaris. :(

Btw, thanks for your continuous feedback and encouragement!

Karsten

Dustin Lang

Nov 10, 2015, 5:26:48 PM
to astrometry
Ok, according to the man page, GLOB_TILDE and GLOB_BRACE are not in the POSIX standard.

Try just changing it to:

           int flags = 0;

That's just used in parsing the astrometry.cfg file -- if you say "index ~/index_file/index-4207.fits" then glob expands the "~".
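
If "~" expansion is still wanted with flags set to 0, one possible approach is to expand a leading "~/" by hand before the path reaches glob(). The sketch below assumes that approach; `expand_tilde()` is a hypothetical helper and not part of astrometry.net, and it deliberately ignores the "~user" form.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a leading "~" or "~/" using the given home
 * directory. Returns 0 on success, -1 if the result does not fit in out. */
static int expand_tilde(const char *path, const char *home,
                        char *out, size_t outlen) {
    int n;
    if (home && path[0] == '~' && (path[1] == '/' || path[1] == '\0'))
        n = snprintf(out, outlen, "%s%s", home, path + 1);
    else
        n = snprintf(out, outlen, "%s", path);   /* leave other paths alone */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

In use, `home` would typically come from getenv("HOME"), and the expanded string would then be handed to glob() with flags of 0.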

cheers,
--dustin


Karsten Schindler

Nov 10, 2015, 7:08:59 PM
to astro...@googlegroups.com
Awesome, that solved that. Meanwhile I ran into another type conflict in /sys/types.h: quad_t. To make progress, I did the very same ruthless commenting-out as for index_t.
Now I am getting all kind of undefined symbols: gethostbyname, socket, connect, hstrerror, h_errno
I try to run:
gmake -C blind CFLAGS="-lnsl -lsocket -lresolv" to link to these libraries. The undefined symbols go away, but now /blind/image2xy-files.c can't find header files in /include/astrometry/...
I tried to run
gmake -C blind --include-dir "/path/to/astrometry/source/include/astrometry" CFLAGS="-lnsl -lsocket -lresolv"
but this does not change the situation. Somehow the path to the header files is lost if I call gmake with CFLAGS? Guess I have to adapt the makefile?

Karsten

Dustin Lang

Nov 10, 2015, 8:23:38 PM
to astrometry
To add libraries (which is what you're doing), use LDLIBS rather than CFLAGS:

gmake LDLIBS="-lnsl -lsocket -lresolv"

Hope that helps!
--dstn

Karsten Schindler

Nov 10, 2015, 8:44:25 PM
to astrometry
More success, almost there.

I changed one line of the primary make file to
...
$(MAKE) -C blind CFLAGS="-lnsl -lsocket -lresolv"
so all programs in the /blind folder get linked to these libraries.

I have now added:

#undef  isfinite
#define isfinite(x) \
  __extension__ ({ __typeof (x) __x_f = (x); \
                   __builtin_expect(!isnan(__x_f - __x_f), 1); })
to an-fitstopnm.c, kdtree_internal.c, kdtree_internal_fits.c, wcs-resample.c, resample.c, simplexy.c, dmedsmooth.c, dcen3x3.c.

I also had to define:
#undef  isnormal
#define isnormal(x) \
  __extension__ ({ __typeof(x) __x_n = (x); \
                   if (__x_n < 0.0) __x_n = -__x_n; \
                   __builtin_expect(isfinite(__x_n) \
                                    && (sizeof(__x_n) == sizeof(float) \
                                          ? __x_n >= __FLT_MIN__ \
                                          : sizeof(__x_n) == sizeof(long double) \
                                            ? __x_n >= __LDBL_MIN__ \
                                            : __x_n >= __DBL_MIN__), 1); })
to dcen3x3.c.

I have now replaced MIN / MAX with fmin / fmax in:
an-fitstopnm.c, fitsioutils.c, qfits_convert.c, qfits_table.c, ioutils.c, downsample-fits.c, hpsplit.c, fitstable.c, healpix.c, fit-wcs.c, anwcs.c, sip.c, gslutils.c, sip-utils.c, starutil.c, kdtree_internal.c, dualtree_nearestneighbour.c, fits_column_merge.c, wcsinfo.c, wcs_pv2sip.c, wcs-resample.c, resample.inc, starxy.c, fit-wcs-main.c, blind.c, solver.c, matchobj.c, verify.c, engine.c, simplexy.c, dallpeaks.c, coadd.c, convolve-image.c, dfind2.c, histogram2d.c, dobjects.c, dallpeaks.inc, dsmooth.inc, resort_xylist.c
(hopefully this list is complete; check with grep)

I had to tweak one line in wcs-resample.c as the compiler complained that the array indices were no longer integers:
line 203: bib2[(int)((fmin(fmax(i+di, 0), BH-1))*BW + (fmin(fmax(j+dj, 0), BW-1)))] = TRUE;

Also, I had to rename the variables fmin and fmax in engine.c as they were colliding with the math.h functions fmin/fmax. Maybe a wise thing to do for the next release.

I am not sure, though, whether it was wise to make all these fmin/fmax replacements. Although the code compiles now, it might affect precision and run time.
I noticed that ctmf.c contains exactly the definition that is missing from my sys/param.h:

#ifndef MIN
#define MIN(a,b) ((a) > (b) ? (b) : (a))
#endif

#ifndef MAX
#define MAX(a,b) ((a) < (b) ? (b) : (a))
#endif

These guarded definitions could be added to all files above instead, leaving the rest of the code untouched. Maybe that is the better solution, in retrospect. If they were included in all affected C files by default, portability would be much easier.
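
A small sketch of why the guarded macros may be preferable to fmin/fmax for index arithmetic: the macros keep the type of their arguments, so integer indices stay integers, whereas fmin and fmax return double and force a cast. `clamped_index()` is a hypothetical helper modelled on the wcs-resample.c line quoted earlier, not the actual code.

```c
/* Guarded definitions: skipped if a system header already supplies them. */
#ifndef MIN
# define MIN(a,b) ((a) > (b) ? (b) : (a))
#endif
#ifndef MAX
# define MAX(a,b) ((a) < (b) ? (b) : (a))
#endif

/* Hypothetical helper: clamp (i, j) into a BH x BW grid and return the
 * flat, row-major index. All arithmetic stays in int. */
static int clamped_index(int i, int j, int BH, int BW) {
    int ci = MIN(MAX(i, 0), BH - 1);
    int cj = MIN(MAX(j, 0), BW - 1);
    return ci * BW + cj;
}
```

Keep in mind that such macros evaluate their arguments more than once, so arguments with side effects (for example `i++`) should be avoided.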

In engine-main.c, I replaced    
int flags = GLOB_TILDE | GLOB_BRACE
with
int flags = 0

In image2xy-files.c, I had to kick out the strings AN_GIT_URL, AN_GIT_REVISION, AN_GIT_DATE as the compiler could not find their definition.

Regarding the installation of CFITSIO:
make install fails because cp -a aborts with "illegal option -a".
The following files need to be copied to the respective folders:
libcfitsio* to /usr/lib
fitsio.h, fitsio2.h, longnam.h, drvrsmem.h to /usr/include
cfitsio.pc to /usr/lib/pkgconfig
(this is exactly what make install would do)

You can check then if everything is correct by typing
pkg-config --libs cfitsio

gmake now runs through everything! Yippieh!!!


Karsten Schindler

Nov 10, 2015, 8:54:13 PM
to astro...@googlegroups.com
Ok, now I have a compiled solve-field binary.

When I run it locally the following error came up:

# ./solve-field ld.so.1: solve-field: fatal: libcfitsio.so.2: open failed: No such file or directory

Compiling cfitsio, I only got a libcfitsio.so... so I made a copy, renamed it to libcfitsio.so.2, and solve-field is alive!

Trying to solve my first image, I got stuck with image2pnm.py. I do have python 2.6.4 and numpy, but not pyfits on the machine - is this the reason?
Maybe netpbm was not correctly linked. Is there a way to force a rebuild of the whole source code?

Karsten

Dustin Lang

Nov 11, 2015, 10:07:29 AM
to astrometry
Hi,

Yes, I believe pyfits is required.  Yuck.

If you do a "make reconfig", it will delete and re-run the os-config steps, which try to detect netpbm.  You should then be able to do a "make" and have everything rebuild.

cheers,
--dustin

Karsten Schindler

Nov 11, 2015, 10:36:59 PM
to astro...@googlegroups.com
Thanks Dustin.

I finally managed to install pyfits and to correctly link NETPBM.
After a gmake reconfig, makefile.os-features echos HAVE_NETPBM := yes.
pyfits and numpy can be imported without problems in Python.

One thing I have to do on Solaris to compile the /blind folder is to use CFLAGS="-lnsl -lsocket -lresolv".
My goal was that the makefile runs through everything in one shot...

So I tried to pass these compiler flags aka
gmake CFLAGS="-lnsl -lsocket -lresolv",
but then include files are not found anymore. I realized that passing CFLAGS to gmake overwrites all defined CFLAGS rather than appending to them. So I added at the end of makefile.common:
CFLAGS_DEF += -lnsl
CFLAGS_DEF += -lsocket
CFLAGS_DEF += -lresolv

Now the package compiles in one run!

I can now even compile the "eye candy" tools with: gmake extra.
For this I only had to repeat the usual changes:
replace MIN/MAX with fmin/fmax in: plotstuff.c, plotimage.c, plotxy.c, plotannotations.c, plotgrid.c, plothealpix.c, plotradec.c,
and add definitions for isfinite to: plotimage.c, plotgrid.c.

Still, I can not solve an image. The output is unchanged:

Reading input file 1 of 1: "wfi.fits"...
ERROR: Image type not recognized: Could not determine file type (does the file exist?): wfi.fits
augment-xylist.c:588:backtick Failed to run command: /Desktop/packages/astrometry.net-0.64/util/image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.Ouaq9S --fix-sdss --infile wfi.fits --uncompressed-outfile /tmp/tmp.uncompressed.Nuaq9S --outfile /tmp/tmp.ppm.Muaq9S --ppm
 ioutils.c:515:run_command_get_outputs error reading from child output stream
 system: No such file or directory

Searching the forum, this is typically a netpbm issue. Maybe gmake reconfig followed by another gmake does not recompile "cleanly"?

I am out of ideas for today... What is still wrong? It should work by now :(

Karsten

Dustin Lang

Nov 12, 2015, 10:11:03 AM
to astrometry
You should add the libraries to LDLIBS, not CFLAGS.  CFLAGS are "compile flags" (but also used during linking, I guess), while LDLIBS are used only during linking.

You see the error message saying that it couldn't run the image2pnm.py command?  If you run solve-field with "--no-delete-temp" (so it doesn't delete the temp files) and then run the same command it was trying to run (just copy-and-paste the image2pnm.py command line reported), does that work?

You can turn off some of the steps with "--no-fits2fits --no-remove-lines"

cheers,
--dustin


Karsten Schindler

Nov 12, 2015, 9:06:08 PM
to astro...@googlegroups.com
Ok, I changed the three CFLAGS_DEF to LDLIBS_DEF. I cannot see a difference in the compiler output; both compile equally well.

So I ran:
solve-field --no-delete-temp wfi.fits

Reading input file 1 of 1: "wfi.fits"...
ERROR: Image type not recognized: Could not determine file type (does the file exist?): wfi.fits
augment-xylist.c:588:backtick Failed to run command: /Desktop/packages/astrometry.net-0.64/util/image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.fca4nc --fix-sdss --infile wfi.fits --uncompressed-outfile /tmp/tmp.uncompressed.eca4nc --outfile /tmp/tmp.ppm.dca4nc --ppm

 ioutils.c:515:run_command_get_outputs error reading from child output stream
 system: No such file or directory

and copied the image2pnm.py command line:
image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.fca4nc --fix-sdss --infile wfi.fits --uncompressed-outfile /tmp/tmp.uncompressed.eca4nc --outfile /tmp/tmp.ppm.dca4nc --ppm

ERROR: Image type not recognized: Could not determine file type (does the file exist?): wfi.fits

No difference - I always get the same error.

Also, switching off the FITS file sanitizer and not removing source overdensities does not help...

I compile everything doing
gmake
gmake py
gmake extra

and install via gmake install to /opt/astrometry as /usr/local is not a common folder on Solaris.
To be able to run make install I had to comment out 4 lines in /libkd/makefile:
#     @for x in $(LIBKD_INSTALL); do \
#         echo cp $$x '$(INSTALL_DIR)/bin'; \
#        cp $$x '$(INSTALL_DIR)/bin'; \
#    done
The shell aborts at the semicolon; it appears to me that LIBKD_INSTALL is empty. A few lines above, LIBKD_INSTALL is indeed defined as
LIBKD_INSTALL := #fix-bb checktree
so # comments everything out, and there is no content in this variable.

I looked at image2pnm.py and tried to do a
from astrometry.util.filetype import filetype_short
on the python shell, this fails: python can not find this module.

My python setup is in /opt, not in /usr, so maybe I am missing an astrometry.net python module that is installed into the python library folder on Linux, but not on my system due to other paths?

Karsten

Dustin Lang

Nov 13, 2015, 10:36:22 AM
to astrometry
When image2pnm.py starts, it sets up the python path so that the "astrometry" package can be found.  I think that's why you get different results when you try to import filetype_short from the command line.  You need to add to PYTHONPATH to tell python where the astrometry package can be found, eg

PYTHONPATH=/usr/local/astrometry/lib/python:${PYTHONPATH} python -c "from astrometry.util.filetype import filetype_short"


And let's see, image2pnm.py calls

get_image_type()
which calls
filetype_short()
which calls
filetype()
which runs the command-line program "file".

I'm going to guess that Solaris' "file" program produces different results, or doesn't understand the command-line args we give it.

The command we run is

file -b -N -L -k -r <FILENAME>

what do you get when you run that with <FILENAME> = wfi.fits?

cheers,
--dstn


Karsten Schindler

Nov 13, 2015, 4:41:22 PM
to astro...@googlegroups.com
That was spot on Dustin.
'file' on Solaris does not understand *any* of the five command line parameters.

file -b -N -L -k -r wfi.fits
file: illegal option -- b
file: illegal option -- N
file: illegal option -- L
file: illegal option -- k
file: illegal option -- r
usage: file [-dh] [-M mfile] [-m mfile] [-f ffile] file ...
       file [-dh] [-M mfile] [-m mfile] -f ffile
       file -i [-h] [-f ffile] file ...
       file -i [-h] -f ffile
       file -c [-d] [-M mfile] [-m mfile]

I downloaded the open source implementation of 'file' from http://darwinsys.com/file/, compiled it and installed it to /opt/file. I renamed the original /usr/bin/file and made a symbolic link to the new 'file'.
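
A possible alternative to replacing the system 'file', sketched here under the assumption that only a few image formats need to be recognized, is to look at the leading bytes of the file directly. `sniff_image_type()` is a hypothetical helper, not something astrometry.net provides; it relies on the fact that a FITS file starts with the keyword "SIMPLE  =", a JPEG with the bytes FF D8 FF, and a PNG with its fixed 8-byte signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classify a buffer by its leading "magic" bytes. */
static const char *sniff_image_type(const unsigned char *buf, size_t len) {
    static const unsigned char png_sig[8] =
        { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
    if (len >= 9 && memcmp(buf, "SIMPLE  =", 9) == 0)
        return "fits";
    if (len >= 3 && buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF)
        return "jpeg";
    if (len >= 8 && memcmp(buf, png_sig, 8) == 0)
        return "png";
    return "unknown";
}
```

A caller would read the first few bytes of the input with fread() and pass them in; compressed inputs would still need separate handling.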

Now, image2pnm.py crashes with a Segmentation Fault - core dumped when reading a FITS image...

 solve-field --no-fits2fits --no-remove-lines --no-delete-temp -z 2 wfi.fits

Reading input file 1 of 1: "wfi.fits"...
Segmentation Fault - core dumped
Command failed: /opt/astrometry/bin/an-fitstopnm -i wfi.fits > /tmp/tmpJy9Daw.pnm
augment-xylist.c:588:backtick Failed to run command: /opt/astrometry/bin/image2pnm.py --no-fits2fits --fix-sdss --infile wfi.fits --uncompressed-outfile /tmp/tmp.uncompressed.JZayzW --outfile /tmp/tmp.ppm.IZayzW --ppm

 ioutils.c:515:run_command_get_outputs error reading from child output stream
 system: No such file or directory

an-fitstopnm.c was among the many files where I replaced MIN() and MAX() with fmin() and fmax() in math.h... I do not see immediately why this should be problematic.

When reading a JPEG the error is different:

solve-field --no-fits2fits --no-remove-lines --no-delete-temp -z 2 wfi.jpg
Reading input file 1 of 1: "wfi.jpg"...
jpegtopnm: WRITING PPM FILE
augment-xylist.c:588:backtick Failed to run command: /opt/astrometry/bin/image2pnm.py --no-fits2fits --fix-sdss --infile wfi.jpg --uncompressed-outfile /tmp/tmp.uncompressed.4saaYW --outfile /tmp/tmp.ppm.3saaYW --ppm

 ioutils.c:515:run_command_get_outputs error reading from child output stream
 system: No such file or directory

Interestingly, in 'ioutils.c' line 515 there is a comment to
// https://groups.google.com/d/msg/astrometry/H0bQBjaoZeo/19pe8DXGoigJ
but this is after
#if !(defined(__CYGWIN__))?

Karsten

Karsten Schindler

Nov 18, 2015, 4:39:38 PM
to astro...@googlegroups.com
Due to other project work only little progress on this subject, however I could isolate the problem a little bit better.

Trying to solve a FITS file the core dump reads:

core 'core' of 941:     /opt/astrometry/bin/an-fitstopnm -i /tmp/tmp.sanitized.xvaO0b
 080780d1 main     (80474ec, 8076e8a, feffa910) + 1221
 08058670 _start   (3, 8047640, 8047661, 8047664, 0, 804767e) + 80

So I ran an-fitstopnm -v -i /tmp/tmp.sanitized.xvaO0b and indeed, it stops with a segmentation fault:
Reading pixels...
Computing image percentiles...

Segmentation Fault - core dumped

Interestingly, I had to define HUGE_VALF in an-fitstopnm.c so I am wondering if this is the problem. I am also wondering if my change from MIN / MAX to fmin/fmax (float min / float max in math.h) has created some side effects that I am unaware of?

Trying to solve a JPEG file results in a different core dump:

core 'core' of 949:     solve-field wfi.jpg
 08045964 ???????? (0, 8c8f310, ffffe, ffffc, febb687f, fec341b0)
 080b1a2d permuted_sort (8045bc0, fefc2bf4, 8060009, 80861d9, 8046040, ae) + 4d
 080861d9 image2xy_run (fece2a00, 8045c58, feb667ad, fec31688) + 109
 080758d3 image2xy_files (814bde8, 814be28, 1, 0, 3, 1) + 965
 08073098 augment_xylist (812c1fa, 80b0a96, 8149e10, 812aa48) + 2278
 0812aa48 main     (806df50, 2, 804753c) + 1378
 0806df50 _start   (2, 804766c, 8047678, 0, 8047680, 80476ba) + 80

Here, it seems to crash the main solver, not an auxiliary program.

Not sure how to continue at this moment. My next try would be to start from a fresh source tree and add a new header file that contains all the missing definitions mentioned earlier (including MIN/MAX), including only that header instead of replacing things in the source files. That way I could test whether fmin/fmax is the problem. Also, I redefined NAN as
double NAN = 0.0/0.0
as I am not 100% sure if
int NAN = 0x7F800001
is a good solution for Solaris.
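
A short sketch of why the int definition is suspect, assuming 32-bit IEEE 754 floats: 0x7F800001 is a NaN bit pattern only when those bits are reinterpreted as a float. Stored in an int, it is simply the integer 2139095041, and converting that int to a floating-point type yields an ordinary large number. The helper names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision float (assumes
 * float is 32-bit IEEE 754). */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* NaN is the only value that compares unequal to itself. */
static int is_nan_f(float f) {
    volatile float v = f;   /* volatile discourages folding v != v away */
    return v != v;
}
```

With these helpers, `is_nan_f((float)0x7F800001)` is false because the value was converted numerically, while `is_nan_f(float_from_bits(0x7F800001u))` is true because the bits were reinterpreted.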

Unfortunately I will not be able to do this earlier than next week. I am open to all suggestions and hints!

Karsten

Dustin Lang

Nov 18, 2015, 10:10:55 PM
to astrometry
Not sure if it will help isolate anything, but you could try
make test
and then run
util/test
blind/test
libkd/test
catalogs/test

Not sure what else to suggest.  The permutedsort error could be related to qsort_r; we (try to) do some fancy tests to figure out how qsort_r works on your system, but I don't think I ever saw it on Solaris -- if you could do a "man qsort_r" or equivalent and tell me what the function signature looks like, that could be a help.  Or maybe run "make reconfig" and report the parts where it's figuring out qsort?

cheers,
--dustin


Karsten Schindler

Nov 18, 2015, 10:51:17 PM
to astrometry
Hi Dustin,

I did a gmake test that revealed three more files with MAX/MIN (tweak.c, tweak2.c, test_healpix.c) and a missing isfinite() (test_2mass.c).
After fixing that gmake test compiles.
/util/test prints tons of numbers and then crashes with a segmentation fault:

[... tons of numbers]
check: ok
appended: 42
Before sorting:
[ 34951, 34950, 34949, 35049, 35149, 29951, 29950, 29949, 34999, 34998, 5099, 5199, 39849, 39949, 4999, 35249, 29952, 34952, 5299, 35349, 29953, 34953, 5399, 35449, 29954, 34954, 5499, 35549, 29955, 34955, 5599, 35649, 29956, 34956, 5699, 35749, 29957, 34957, 5799, 35849, 29958, 34958, 5899, 35949, 29959, 34959, 5999, 36049, 29960, 34960, 6099, 36149, 29961, 34961, 6199, 36249, 29962, 34962, 6299, 36349, 29963, 34963, 6399, 36449, 29964, 34964, 6499, 36549, 29965, 34965, 6599, 36649, 29966, 34966, 6699, 36749, 29967, 34967, 6799, 36849, 29968, 34968, 6899, 36949, 29969, 34969, 6999, 37049, 29970, 34970, 7099, 37149, 29971, 34971, 7199, 37249, 29972, 34972, 7299, 37349, 29973, 34973, 7399, 37449, 29974, 34974, 7499, 37549, 29975, 34975, 7599, 37649, 29976, 34976, 7699, 37749, 29977, 34977, 7799, 37849, 29978, 34978, 7899, 37949, 29979, 34979, 7999, 38049, 29980, 34980, 8099, 38149, 29981, 34981, 8199, 38249, 29982, 34982, 8299, 38349, 29983, 34983, 8399, 38449, 29984, 34984, 8499, 38549, 29985, 34985, 8599, 38Segmentation Fault - core dumped

Doing a pstack core reveals:
core 'core' of 3467:    ./test
 080470ba ???????? (fed33988, 80f9070, a, 80f95f4, 85bc978, cb)
 080f95f4 bl_sort_with_userdata.constprop.0 () + 354

/blind/test crashes with a segmentation fault - core dumped without any other output:
pstack core shows
core 'core' of 3462:    ./test
 08047267 ???????? (fee72a00, 8047238, 6, 4, 819f080, 8191950)
 080f242d permuted_sort (8195970, 8191950, 8047458, 808d12e, 8195970, 8047400) + 4d
 08075b5c test_sorting (80746c3, 8195170, 8075619, 8191950) + 21c
 080746e3 CuTestRun (0, 1, 0, 8047498, 19, 0) + 43
 08075940 test_sorting (676e6974, 0, 11, 0, 74736574, 6577745f)
 726f735f ???????? ()

/libkd/test runs successfully and finishes with "OK (31 tests)"

/catalogs/test *seems* to run fine, but one file is missing, which is likely why one test fails:
byte 80: 0000000000
byte 84: 0536875740
File "hd.fits" does not exist; test skipped.
Got: 85
Got: 67
...F...

There was 1 failure:
1) test_read_2mass: test_2mass.c:107: dist null

!!!FAILURES!!!
Runs: 7 Passes: 6 Fails: 1

So yes, it seems it is a sorting related problem...
I am attaching the man page for Solaris' qsort...

Interestingly I do not see any output related to qsort when I do a gmake reconfig:

# gmake reconfig
rm -f util/os-features-config.h util/makefile.os-features
gmake -C util config
gmake[1]: Entering directory '/Desktop/packages/astrometry.net-0.64/util'
Makefile:61: makefile.os-features: No such file or directory

---- Error messages in the next few commands are not necessarily bugs ----
     (we're checking how things works on your computer)
rm -f os-features-makefile.log
Testing netpbm...
   NETPBM_INC_ORIG is -I/opt/csw/include/netpbm
   NETPBM_LIB_ORIG is -L/opt/csw/lib -lnetpbm
( \
 echo "# This file is generated by util/Makefile."; \
 ((gcc -o os-features-test-netpbm-make \
     -g -Wall -std=gnu89 -ffinite-math-only -fno-signaling-nans -pthread -march=native -O3 -fomit-frame-pointer -DNDEBUG -fpic -Winline -I../include -I../include/astrometry -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DAN_GIT_REVISION='"0.64"' -DAN_GIT_DATE='"Thu_Oct_22_10:00:31_2015_-0400"' -DAN_GIT_URL='"https://github.com/dstndstn/astrometry.net"' -I../util -I/opt/csw/include/cairo  -I/usr/include/libpng12            -I../include -I../include/astrometry -I../gsl-an     -I../include -I../include/astrometry -I../gsl-an  -I. -DTEST_NETPBM_MAKE -I/opt/csw/include/netpbm os-features-test.c  -g -Wall -std=gnu89 -ffinite-math-only -fno-signaling-nans -pthread -march=native -O3 -fomit-frame-pointer -DNDEBUG -fpic -Winline -L/opt/csw/lib -lnetpbm >> os-features-makefile.log && \
   ./os-features-test-netpbm-make >> os-features-makefile.log && \
   echo "HAVE_NETPBM := yes") \
|| (echo "# Astrometry.net didn't find netpbm; not setting HAVE_NETPBM."; \
        echo "# See os-features-makefile.log for details."; \
        echo "# To re-run this test, do 'make reconfig; make makefile.os-features' (in the 'util' directory)"; \
        echo "# Or to do it yourself, just uncomment this line:"; \
        echo "# HAVE_NETPBM := yes")) \
; \
echo) > makefile.os-features.tmp
--------------- End of expected error messages -----------------

mv makefile.os-features.tmp makefile.os-features

Config results:
------------------------------

cat makefile.os-features
# This file is generated by util/Makefile.
HAVE_NETPBM := yes

------------------------------

And, by the way, is WCSlib support being compiled in?

pkg-config --exists wcslib && echo yes || echo no
no

  WCSLIB_INC:
  WCSLIB_LIB:

Hopefully I did not open a can of worms (guess I already have!).

Karsten
man qsort.TXT

Dustin Lang

Nov 19, 2015, 6:21:19 AM
to astrometry
You said you're using astrometry.net 0.64, right?

Ahh, you just uncovered a bug in "make reconfig" :)

Could you do instead,

rm include/astrometry/os-features-config.h util/makefile.os-features
make -C util config

Thanks for sending the 'qsort' man page -- is there a 'qsort_r' one?  That's the one that causes trouble.

Thanks,
--dstn

Karsten Schindler

Nov 19, 2015, 11:37:31 AM
to astrometry
Yes, I am working with the latest release v0.64.

I did the rm / gmake and got:

[...]
Config results:
------------------------------

cat ../include/astrometry/os-features-config.h
#define NEED_CANONICALIZE_FILE_NAME 1
#define NEED_DECLARE_QSORT_R 1
#define NEED_QSORT_R 1
#define NEED_SWAP_QSORT_R 0
#define HAVE_NETPBM 1

------------------------------
[...]

There is no man page for qsort_r, and seems this function is missing on Solaris, see for example:
http://www.hep.by/gnu/gnulib/qsort_005fr.html
http://git.kaarsemaker.net/libgit2/commit/551f5cefb41877be03e6d7a03f16fd424fc9de37/

The file above actually confirms that with #define NEED_QSORT_R 1.

Cheers,
Karsten

Dustin Lang

Nov 19, 2015, 1:01:57 PM
to astrometry
Yes, ok, that's fine.  In util/os-features.c, we ship a version that we use if there is no system version;

#if NEED_QSORT_R
#include "qsort_reentrant.c"
#endif
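
For context on why qsort_r needs detection at all: glibc's qsort_r passes the context pointer as the last comparator argument, while the BSD variant passes it first and also takes it before the comparator in the call. The sketch below avoids both by using a tiny reentrant insertion sort; it only illustrates the idea of sorting a permutation array by the values it indexes. The function names are hypothetical and this is not the bundled qsort_reentrant.c or the project's permuted_sort.

```c
#include <stddef.h>

/* Comparator that receives caller context as its last argument. */
typedef int (*cmp_ctx_fn)(const void *a, const void *b, void *ctx);

/* Hypothetical helper: insertion sort of an int array with a context
 * pointer. O(n^2), for illustration only. */
static void sort_ints_with_ctx(int *v, size_t n, cmp_ctx_fn cmp, void *ctx) {
    size_t i, j;
    for (i = 1; i < n; i++) {
        int key = v[i];
        for (j = i; j > 0 && cmp(&v[j - 1], &key, ctx) > 0; j--)
            v[j] = v[j - 1];
        v[j] = key;
    }
}

/* Compare two indices by the double values they refer to in ctx. */
static int cmp_index_by_value(const void *a, const void *b, void *ctx) {
    const double *vals = (const double *)ctx;
    double x = vals[*(const int *)a], y = vals[*(const int *)b];
    return (x > y) - (x < y);
}

/* Fill perm with 0..n-1 ordered so that vals[perm[k]] is ascending. */
static void permutation_sort_sketch(const double *vals, int *perm, size_t n) {
    size_t i;
    for (i = 0; i < n; i++)
        perm[i] = (int)i;
    sort_ints_with_ctx(perm, n, cmp_index_by_value, (void *)vals);
}
```

A standalone test of this kind, built with the same compiler and flags as the package, might help tell whether the crash lies in the sort itself or in how the comparator and context are passed.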


Soooo.... what is going wrong.

Maybe try rebuilding with "make OPTIMIZE=no" to get debugging symbols and then see whether the stack traces are more informative?

Thanks,
--dustin

Karsten Schindler

Nov 23, 2015, 5:05:36 PM
to astro...@googlegroups.com

Hi Dustin,

I recompiled everything with OPTIMIZE=no. The core dump is unchanged; I do not see more information...

core 'core' of 14491:   /opt/astrometry/bin/an-fitstopnm -i /tmp/tmp.sanitized.xaaysC
 080780d1 main     (80474ec, 8076e8a, feffa910) + 1221
 08058670 _start   (3, 8047640, 8047661, 8047664, 0, 804767e) + 80

core 'core' of 14458:   solve-field wfi.jpg
 08045965 ???????? (0, 8c8ee50, ffffe, ffffc, febb687f, fec341b0)
 080b158d permuted_sort (8045bc0, fefc2bf4, 8060009, 80861d9, 8046040, ae) + 4d
 080861d9 image2xy_run (fece2a00, 8045c58, feb667ad, fec31688) + 109
 080758d3 image2xy_files (814b928, 814b968, 1, 0, 3, 1) + 965
 08073098 augment_xylist (812bd5a, 80b05f6, 8149970, 812a5a8) + 2278
 0812a5a8 main     (806df50, 2, 804753c) + 1378
 0806df50 _start   (2, 804766c, 8047678, 0, 8047680, 80476ba) + 80

To allow qsort_reentrant.c to compile in the first place, I had to replace

#include <sys/cdefs.h>

with

#include <stdio.h>


as sys/cdefs.h seems to be a nonstandard include file (see post number 3 in this thread). I found that trick here:

http://current-users.netbsd.narkive.com/VaTqnhEz/cannot-build-current-from-solaris-10


Karsten

Karsten Schindler

Nov 30, 2015, 6:58:29 PM11/30/15
to astro...@googlegroups.com
Hi Dustin,

The terminal output for gmake, gmake py, gmake extra, gmake install, both without and with option OPTIMIZE=no is attached...
To me it looks like it compiles cleanly.
With my limited Solaris and gcc knowledge I am stuck... I can only guess that qsort_r is causing the segmentation fault somehow.

Any advice on how to isolate the problem is greatly appreciated...

Karsten

P.S.: More files with MIN()/MAX(): plotstuff_wrap.c, lanczos.i, util_wrap.c
gmake.txt
gmake py.txt
gmake extra.txt
gmake install.txt
gmake OPTIMIZE=no.txt
gmake py OPTIMIZE=no.txt
gmake extra OPTIMIZE=no.txt
gmake install OPTIMIZE=no.txt

Karsten Schindler

Dec 1, 2015, 12:49:11 AM12/1/15
to astrometry
Ok, redirecting terminal output via gmake > log.txt did not log the errors, and gmake &> log.txt does not work on Solaris. Using script, I eventually logged the full terminal output (see attached) for executing
gmake reconfig

gmake
gmake py
gmake extra
gmake install

Beforehand, I corrected the makefile error discussed above:

reconfig:
+    -rm -f $(INCLUDE_DIR)/os-features-config.h util/makefile.os-features

I also noticed some warnings about implicit definitions for isfinite, BZERO and fmin/fmax.
isfinite, BZERO: wcs-resample.c; added definition for isfinite(), added #include <strings.h> for BZERO (only string.h was included so far).
fmin/fmax: gslutils.c, fitstable.c, fits-column-merge.c, added #include <math.h> to fix this.

I see some warnings about pointer target signedness, e.g.
tabsort.c:133:7: warning: pointer targets in assignment differ in signedness [-Wpointer-sign]
and warnings on type mismatches, e.g.
/usr/include/sys/mman.h:169:12: note: expected 'caddr_t' but argument is of type 'unsigned char *'

A clue?

Karsten

transcript.txt

Dustin Lang

Dec 1, 2015, 11:14:35 AM12/1/15
to astrometry
Hi,

I added a test for, and definition of, MIN and MAX to os-features.h.
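For readers following along, a guarded definition of this kind might look like the sketch below. It is an illustration only, not the code that was committed to os-features.h; the helper clamp_int is a hypothetical name added to show the macros in use.

```c
#include <assert.h>

/* Define MIN/MAX only when no system header (for example <sys/param.h>
   on Linux/BSD) has already provided them. */
#ifndef MIN
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#endif
#ifndef MAX
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
#endif

/* Hypothetical helper: clamp v into [lo, hi] using the macros above. */
static int clamp_int(int v, int lo, int hi) {
    assert(lo <= hi);
    return MAX(lo, MIN(v, hi));
}
```

Note that macros of this form evaluate their arguments more than once, so arguments with side effects (such as i++) should be avoided.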

I removed the call to bzero().

I checked that <math.h> is included in every file where isfinite() is called.  (That's where it is defined, according to POSIX.1)

Pointer type signedness doesn't really matter.

Could you 'git pull' master and try again?

Do you have 'gdb' on this machine?  Without a stack trace of where things are going wrong, I don't see how we can make any progress.

cheers,
--dustin


Karsten Schindler

Dec 1, 2015, 1:33:48 PM12/1/15
to astrometry
Hi Dustin,

The MIN/MAX test makes things much easier, thanks! Unfortunately math.h on Solaris does not define isfinite...
I have gdb on the machine. I am doing a git pull now but I have to go through the compiler errors again, as definitions for NAN, isfinite, HUGE_VALF, isnormal, GLOB_TILDE, GLOB_BRACE are missing on Solaris.
Plus, I have to read up on gdb...

Thanks for staying with me...

Karsten

Dustin Lang

Dec 1, 2015, 2:41:24 PM12/1/15
to astrometry
Hi,

Ok, I just pushed what should be a fix for the GLOB_ symbols.

Could you check whether math.h contains finite() or isfinite() -- are they there but just not enabled?  Could you also check in ieeefp.h?

Thanks,
--dustin


Karsten Schindler

Dec 1, 2015, 2:56:51 PM12/1/15
to astrometry
Thanks Dustin.
I recommend the following changes:

in an-endian.h

#elif __sun
# include <sys/isa_defs.h>

in qsort_reentrant.c

#if __sun
# include <stdio.h>
#else
# include <sys/cdefs.h>
#endif

I checked /usr/include/math.h and it contains neither finite nor isfinite. A number of people complain about this, e.g. here: https://code.google.com/p/redis/issues/detail?id=20
/usr/include/ieeefp.h contains no isfinite, but a
extern int    finite(double);
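
Given that finding, one way to wrap the check is sketched below. This is only a sketch under assumptions: an_isfinite is a hypothetical name, and the Solaris branch relies on nothing more than the finite(double) declaration in <ieeefp.h> quoted above. On systems whose <math.h> provides the C99 isfinite macro, that macro is used instead.

```c
#include <assert.h>
#include <math.h>
#if defined(__sun)
#include <ieeefp.h>   /* declares: extern int finite(double); */
#endif

/* Returns nonzero when x is neither infinite nor NaN. */
static int an_isfinite(double x) {
    assert(sizeof(x) == sizeof(double));
#if defined(isfinite)
    return isfinite(x);       /* C99 macro from <math.h>, when present */
#elif defined(__sun)
    return finite(x);         /* Solaris 10 fallback from <ieeefp.h> */
#else
    return (x - x) == 0.0;    /* inf - inf and nan - nan are both NaN */
#endif
}
```

The last branch uses the same x - x trick as the macro definitions discussed in this thread.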

Karsten

Dustin Lang

Dec 1, 2015, 3:14:05 PM12/1/15
to astrometry
Done; thanks!

Karsten Schindler

Dec 1, 2015, 3:44:18 PM12/1/15
to astro...@googlegroups.com
Thanks for the latest commits and added definitions in os-features.h!
I thought of creating my own solaris.h (attached) and adding this to each source code file where something is missing, but os-features.h is likely the better place!

I noticed that /qfits-an/qfits_convert.c and /qfits-an/qfits_table.c also rely on MIN/MAX definitions, but those files only have a #include <sys/param.h>. I replaced this with #include "os-features.h".

I am still struggling with a good definition of NAN and where to put it. When I append the following to os-features.h:

#if defined(__sun) && defined(__GNUC__)

#undef isnan
#define isnan(x) \
      __extension__({ __typeof (x) __x_a = (x); \
      __builtin_expect(__x_a != __x_a, 0); })


#undef isfinite
#define isfinite(x) \
      __extension__ ({ __typeof (x) __x_f = (x); \
      __builtin_expect(!isnan(__x_f - __x_f), 1); })

#undef isinf
#define isinf(x) \
      __extension__ ({ __typeof (x) __x_i = (x); \
      __builtin_expect(!isnan(__x_i) && !isfinite(__x_i), 0); })

#undef NAN
#define NAN (0.0f/0.0f)

#undef HUGE_VALF
#define HUGE_VALF (1.0f/0.0f)

#endif

and add #include "os-features.h" to wcs-rd2xy.c and an-fitstopnm.c to overcome the missing NAN and HUGE_VALF definitions, gcc stops with:

ld: fatal: symbol 'NAN' is multiply-defined:
        (file ../libkd/libkd.a(kdint_ddd.o) type=OBJT; file ../libkd/libkd.a(kdint_fff.o) type=OBJT);
ld: fatal: symbol 'NAN' is multiply-defined:
        (file ../libkd/libkd.a(kdint_ddd.o) type=OBJT; file ../libkd/libkd.a(kdint_ddu.o) type=OBJT);
ld: fatal: symbol 'NAN' is multiply-defined:
        (file ../libkd/libkd.a(kdint_ddd.o) type=OBJT; file ../libkd/libkd.a(kdint_duu.o) type=OBJT);
ld: fatal: symbol 'NAN' is multiply-defined:
        (file ../libkd/libkd.a(kdint_ddd.o) type=OBJT; file ../libkd/libkd.a(kdint_dds.o) type=OBJT);
ld: fatal: symbol 'NAN' is multiply-defined:
        (file ../libkd/libkd.a(kdint_ddd.o) type=OBJT; file ../libkd/libkd.a(kdint_dss.o) type=OBJT);
ld: fatal: file processing errors. No output written to query-starkd

Somehow I am not able to uniquely define NAN globally?

Karsten
solaris.h

Dustin Lang

Dec 1, 2015, 8:38:28 PM12/1/15
to astrometry
Hi,

You could try defining NAN as

static float NAN = 0.0 / 0.0;

I find it very strange that you are getting multiply defined symbols, because your #define isn't declaring a variable...

cheers,
--dstn


Karsten Schindler

Dec 1, 2015, 9:21:17 PM12/1/15
to astro...@googlegroups.com
I totally agree. I wanted to avoid the static variable definition as it throws me a warning every time os-features.h is included but NAN is not used. The definition of HUGE_VALF works well. Superficially looking at all the kdint_???.c files they all look the same, they use the same header files, ... possibly there is some hidden interaction among these files that I do not understand. Anyway, I can only make the static double NAN = 0.0f/0.0f work, not the #define.

Gah, "gmake py" on the new pull from github suddenly requires the command line utility swig... okay, I just installed this also.

Please add to os-features.h in the #if defined(__sun) && defined(__GNUC__) block:


#undef isnormal
#define isnormal(x) \
  __extension__ ({ __typeof(x) __x_n = (x); \
                   if (__x_n < 0.0) __x_n = -__x_n; \
                   __builtin_expect(isfinite(__x_n) \
                                    && (sizeof(__x_n) == sizeof(float) \
                                          ? __x_n >= __FLT_MIN__ \
                                          : sizeof(__x_n) == sizeof(long double) \
                                            ? __x_n >= __LDBL_MIN__ \
                                            : __x_n >= __DBL_MIN__), 1); })

This definition is required by dcen3x3.c.

Please also add


#undef HUGE_VALF
#define HUGE_VALF (1.0f/0.0f)

This definition is required by an-fitstopnm.c.

In addition to replacing
#include <sys/param.h>      // that is now included in os-features.h
with
#include "os-features.h"
in /qfits-an/qfits_convert.c and /qfits-an/qfits_table.c (MIN/MAX definitions missing), I also added a #include "os-features.h" to dcen3x3.c, an-fitstopnm.c (pending the addition of the HUGE_VALF definition), kdtree_internal.c, 2masstofits.c.

For now I am adding
static double NAN = 0.0f/0.0f;
to os-features.h as I do not see a better solution.

And, I could also get the awk scripts to work by installing GNU awk (gawk) from the CSW repository and by adding to the /catalogs/Makefile
AWK := /opt/csw/bin/gawk

so the workaround of copying the files you provided to me earlier is no longer necessary.

Now, gmake, gmake py, gmake extra work well.

gmake install fails as before when trying to install libkd. I succeed by commenting out lines 125-128 in /libkd/Makefile


#     @for x in $(LIBKD_INSTALL); do \
#         echo cp $$x '$(INSTALL_DIR)/bin'; \
#        cp $$x '$(INSTALL_DIR)/bin'; \
#    done

as LIBKD_INSTALL seems empty?

Now, gmake install works.

The previous data type conflict for label_t with /usr/include/ia32/sys/machtypes.h is gone; I could revert that system header file back to its original condition.
The conflicts for both index_t (see index.h) and quad_t with system header file definitions in /usr/include/sys/types.h remain; I still need to comment out both definitions there to be able to compile the code. Do you see a workaround like the one already done for label_t here?

In any case, the good news is that the code now compiles much more easily, as all missing symbols are taken care of. Only the not-so-good NAN definition and the two conflicts with system header files remain. Installing GNU make, find, GNU awk and swig solves the incompatibilities between the Linux command line versions and the Solaris command line versions.

Still, solve-field crashes with a segmentation fault, so I have to familiarize myself with gdb now...

Karsten

Dustin Lang

Dec 2, 2015, 11:56:30 AM12/2/15
to astrometry
Hi,

I just pushed fixes for some of these things...

Fixing the conflict with index_t and quad_t is... not really something I want to do, since those names are part of the public API, and only seem to conflict with solaris system headers.  Sorry.

I forget, is it solve-field that is crashing with a segfault, or one of the programs that it calls?

Ideally, you would be able to send a stack trace showing what function is being called with what args to cause the crash.  First step would be to compile with debugging symbols (make clean; make OPTIMIZE=no); the gcc commands should include "-g" and "-O0".

Once you have found out which program & args cause the crash, do:

gdb --args <program + args>

and at the (gdb) prompt,

(gdb) run

until you get the crash, and then

(gdb) where

to get the stack trace.  That should at least be a start.

thx,
--dstn

Karsten Schindler

Dec 2, 2015, 1:00:51 PM12/2/15
to astro...@googlegroups.com
Hi Dustin,

I just made a fresh clone from github to test, and it compiles without any further changes now, except one:
#include "os-features.h" in dcen3x3.c (otherwise symbol isnormal is not defined)

I looked at all the changes that you committed to github, thank you so much for all that work! I saw many more additions of os-features.h than I was expecting though? Some new warnings show up during compiling but no errors. The NaN issue is now solved also! This is awesome!
Don't worry about the index_t and quad_t conflicts; I can comment them out in the respective header files and revert those to the original state after compiling. We all know by now that Solaris is very weird ;-).

I will try to get the stack trace now... thanks again for all your help!

Karsten

Karsten Schindler

Dec 2, 2015, 2:15:18 PM12/2/15
to astro...@googlegroups.com
Ok. Running solve-field in gdb with a JPG file reveals this stack trace:

#0  0x08045904 in ?? ()
#1  0x080b1a2d in permuted_sort (realarray=0xfec30000, array_stride=134945776, compare=0x858b1b8,
    perm=0x8b94e08, N=135568912) at permutedsort.c:80
#2  0x08a8b1c0 in ?? ()
#3  0x080861d9 in image2xy_run (s=0x8045fe0, downsample=0, downsample_as_required=3)
    at image2xy.c:77
#4  0x080758d3 in image2xy_files ()
#5  0x08073098 in augment_xylist (axy=0x8046f64, me=0x814b398 "/opt/astrometry/bin/solve-field")
    at augment-xylist.c:946
#6  0x0812aa48 in main (argc=3, args=0x80474f0) at solve-field.c:1270

permutedsort.c:80 reads:
QSORT_R(perm, N, sizeof(int), &ps, compare_permuted);

... so it seems that your feeling about QSORT_R being the root of the problem could be right.
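
For context, the idea behind a permuted sort with a reentrant comparator can be sketched as follows. This is not the code shipped in qsort_reentrant.c or permutedsort.c; it is a minimal illustration that uses a simple insertion sort (chosen for brevity, quadratic in N), and the names sort_ctx_t, compare_by_value and permuted_sort_sketch are hypothetical. The point is that the comparator receives the data array through an explicit context argument instead of a global variable, which is what plain qsort() cannot offer.

```c
#include <assert.h>
#include <stddef.h>

/* Context handed to the comparator: the array whose values decide the order. */
typedef struct {
    const double* values;
} sort_ctx_t;

/* Compare two indices by the values they refer to. */
static int compare_by_value(const sort_ctx_t* ctx, int ia, int ib) {
    double a = ctx->values[ia];
    double b = ctx->values[ib];
    return (a > b) - (a < b);
}

/* Fill perm with 0..n-1, then insertion-sort it so that
   values[perm[0]] <= values[perm[1]] <= ...; values itself is untouched. */
static void permuted_sort_sketch(const double* values, int* perm, size_t n) {
    sort_ctx_t ctx;
    size_t i, j;
    assert(values != NULL && perm != NULL);
    ctx.values = values;
    for (i = 0; i < n; i++)
        perm[i] = (int)i;
    for (i = 1; i < n; i++) {
        int key = perm[i];
        for (j = i; j > 0 && compare_by_value(&ctx, perm[j - 1], key) > 0; j--)
            perm[j] = perm[j - 1];
        perm[j] = key;
    }
}
```

With values {3.0, 1.0, 2.0} the resulting permutation is {1, 2, 0}.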

Running solve-field in gdb with a FITS file reveals no stack trace, but following up on this error message:

Starting program: /opt/astrometry/bin/solve-field --no-plots /Desktop/data/wfi.fits
[Thread debugging using libthread_db enabled]
Reading input file 1 of 1: "/Desktop/data/wfi.fits"...
Header has 35 cards

Segmentation Fault - core dumped
Command failed: /opt/astrometry/bin/an-fitstopnm -i /tmp/tmp.sanitized.IraiBb > /tmp/tmpwIPfxg.pnm
augment-xylist.c:588:backtick Failed to run command: /opt/astrometry/bin/image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.IraiBb --fix-sdss --infile /Desktop/data/wfi.fits --uncompressed-outfile /tmp/tmp.uncompressed.HraiBb --outfile /tmp/tmp.ppm.GraiBb --ppm
 ioutils.c:567:run_command_get_outputs Command failed: return value 255
[Inferior 1 (process 729    ) exited with code 0377]

... it seems that an-fitstopnm has an issue. So I ran an-fitstopnm in gdb and got

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x080780d1 in sample_percentiles (NPIX=10000, hi=0x8046c98, lo=0x8046c9c, hip=0.949999988,
    lop=0.25, margin=0, ny=<optimized out>, nx=1024, img=0x80905c0) at an-fitstopnm.c:92
92      an-fitstopnm.c: No such file or directory.
(gdb) where
#0  0x080780d1 in sample_percentiles (NPIX=10000, hi=0x8046c98, lo=0x8046c9c, hip=0.949999988,
    lop=0.25, margin=0, ny=<optimized out>, nx=1024, img=0x80905c0) at an-fitstopnm.c:92
#1  main (argc=3, argv=0x80474f0) at an-fitstopnm.c:319

Interestingly, an-fitstopnm.c:92 reads
#define NBUF 1024
... this defines a buffer?

and an-fitstopnm.c:319
if (sixteenbit) {
        uint16_t buf[NBUF];

Karsten





Karsten Schindler

Dec 3, 2015, 2:01:12 PM12/3/15
to astrometry
(I hope it is not too annoying that I post every day here...)

A few more things that I learned today:

qsort_reentrant.c compiles without any included header file. I guess gcc only crashed before as it could not find /sys/cdefs.h on Solaris. When you remove this line, it just compiles as stdio.h is already included somewhere else. So the section
//#if __sun
//# include <stdio.h>
//#else
//# include <sys/cdefs.h>
//#endif
is in my opinion obsolete. (For reference, the BSD cdefs.h can be found e.g. here: http://fossies.org/dox/libbind-6.0/solaris_2include_2sys_2cdefs_8h_source.html - but at this point I do not think it is necessary to ship it with the code.)

According to https://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Other-Builtins.html gcc has a builtin for HUGE_VALF, so a
#define HUGE_VALF (__builtin_huge_valf())
might work instead of my earlier definition.
There also seems to be a builtin for NaN: double __builtin_nan (const char *str)
Maybe more elegant?
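
A minimal sketch of that builtin-based approach, assuming GCC (or a compiler that provides the same builtins): the macro names AN_NAN and AN_HUGE_VALF and the helper functions are hypothetical, chosen so that they cannot clash with anything <math.h> may define. Because these are macros, they create no linker symbols.

```c
#include <assert.h>
#include <float.h>

#if defined(__GNUC__)
/* GCC evaluates these builtins at compile time. */
#  define AN_NAN        (__builtin_nanf(""))
#  define AN_HUGE_VALF  (__builtin_huge_valf())
#else
/* Plain-C fallback: rely on IEEE 754 division semantics. */
#  define AN_NAN        (0.0f / 0.0f)
#  define AN_HUGE_VALF  (1.0f / 0.0f)
#endif

/* NaN is the only value that compares unequal to itself. */
static int is_nan_f(float x) {
    return x != x;
}

/* Sanity checks on the two sentinels defined above. */
static void check_sentinels(void) {
    assert(is_nan_f(AN_NAN));
    assert(AN_HUGE_VALF > FLT_MAX);
}
```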

You asked me some days ago about a 'gmake report' but I overlooked that question - please see attached.

Last but not least I realized a blunt mistake - starting with a fresh clone on Monday, my make file and therefore installation directory changed back from my own definition /opt/astrometry to /usr/local/astrometry... gah! As I did not adjust my PATH variable I was debugging a three day old compile yesterday...! Oh well...

Ok, so taking one step back now:
- I compiled everything with OPTIMIZE=no.
- I did a
PYTHONPATH=/usr/local/astrometry/lib/python
export PYTHONPATH         # variable did not exist before

python -c "from astrometry.util.filetype import filetype_short"
- PATH includes /usr/local/astrometry/bin
- adjusted makefile.netpbm and makefile.cairo as attached.

Now, I am back to errors related to image2pnm.py:

# solve-field wfi.jpg
Reading input file 1 of 1: "wfi.jpg"...
jpegtopnm: WRITING PGM FILE
augment-xylist.c:588:backtick Failed to run command: /usr/local/astrometry/bin/image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.vGaW.B --fix-sdss --infile wfi.jpg --uncompressed-outfile /tmp/tmp.uncompressed.uGaW.B --outfile /tmp/tmp.ppm.tGaW.B --ppm

 ioutils.c:515:run_command_get_outputs error reading from child output stream
 system: No such file or directory

# solve-field wfi.fits
Reading input file 1 of 1: "wfi.fits"...

Header has 35 cards
Segmentation Fault - core dumped
Command failed: /usr/local/astrometry/bin/an-fitstopnm -i /tmp/tmp.sanitized.Zda4_B > /tmp/tmpepStQd.pnm
augment-xylist.c:588:backtick Failed to run command: /usr/local/astrometry/bin/image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.Zda4_B --fix-sdss --infile wfi.fits --uncompressed-outfile /tmp/tmp.uncompressed.Yda4_B --outfile /tmp/tmp.ppm.Xda4_B --ppm

 ioutils.c:515:run_command_get_outputs error reading from child output stream
 system: No such file or directory

So I copy/pasted the related python commands:

# image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.vGaW.B --fix-sdss --infile wfi.jpg --uncompressed-outfile /tmp/tmp.uncompressed.uGaW.B --outfile /tmp/tmp.ppm.tGaW.B --ppm
jpegtopnm: WRITING PGM FILE
jpg

... works for a jpg (why does solve-field stop then?).

# image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.Zda4_B --fix-sdss --infile wfi.fits --uncompressed-outfile /tmp/tmp.uncompressed.Yda4_B --outfile /tmp/tmp.ppm.Xda4_B --ppm
SIMPLE = T: not an SDSS idR file.

Header has 35 cards
Segmentation Fault - core dumped
Command failed: /usr/local/astrometry/bin/an-fitstopnm -i /tmp/tmp.sanitized.Zda4_B > /tmp/tmpn4zyVp.pnm

... for a FITS file it results in a segmentation fault.

As both are python scripts, no gdb stack traces available...

Karsten
report.TXT
makefile.cairo
makefile.netpbm

Dustin Lang

Dec 3, 2015, 2:16:45 PM12/3/15
to astrometry
an-fitstopnm is a C program, so you could try to get a stack trace for that.

What do you get if you copy-and-paste the exact command that solve-field tries to run for image2pnm.py, including the full path,

/usr/local/astrometry/bin/image2pnm.py --sanitized-fits-outfile /tmp/tmp.sanitized.vGaW.B --fix-sdss --infile wfi.jpg --uncompressed-outfile /tmp/tmp.uncompressed.uGaW.B --outfile /tmp/tmp.ppm.tGaW.B --ppm

Oh, I have an idea.  image2pnm.py starts with:

#! /usr/bin/env python

Maybe that (/usr/bin/env) doesn't exist?

cheers,
--dustin

Karsten Schindler

Dec 3, 2015, 2:43:46 PM12/3/15
to astrometry
Success!! (more on that at the end of the post)

You are right; while you were writing I checked ioutils.c:515 again.
In ioutils.c:512 is a reference to https://groups.google.com/forum/#!msg/astrometry/H0bQBjaoZeo/19pe8DXGoigJ
Based on this hint I extended line 513 to:
#if !(defined(__CYGWIN__) || defined(__sun))
so it does not run into the syserror (same approach as Andrew Hood used for the cygwin compile; not sure if this is a good thing though).

The segmentation fault for an-fitstopnm traces back to the #define NBUF 1024 line mentioned earlier.

...but for a JPEG solve-field finally works now: (!!!)


Ok - there is still a segmentation fault after the solve but I will work on that now.
Could that be because I have not set WCSLIB_INC: and WCSLIB_LIB:, so the code dies trying to write its results as a FITS header?
And yes, /usr/bin/env does not exist on Solaris!

Seeing solve-field solving something was exactly what I needed today ;-)

Karsten

This is such a relief!
Auto Generated Inline Image 1

Dustin Lang

Dec 3, 2015, 3:01:06 PM12/3/15
to astrometry
Congratulations!  Nice sleuthing!

The astrometry-engine crash won't have anything to do with WCSLIB; we have our own WCS code to write the results.  I don't have a guess on what the problem is going to be -- maybe in trying to make plots?  It's *supposed* to be robust to that...




> The segmentation fault for an-fitstopnm traces back to the #define NBUF 1024 line mentioned earlier.

That is *very* strange.  A #define is solely a preprocessor command.  It does NOT generate any code. 

BUT, that is only a few lines away from a qsort() call... and the first hit for a web search on 'solaris qsort' is a claim that it is broken...

I just pushed a change that uses QSORT_R() instead... could you try that?

cheers,
--dustin
 

Karsten Schindler

Dec 3, 2015, 3:48:48 PM12/3/15
to astro...@googlegroups.com
Ok, I was just asking about WCSlib as I saw at the end of the terminal output of gmake reconfig:


And, by the way, is WCSlib support being compiled in?

pkg-config --exists wcslib && echo yes || echo no
no

  WCSLIB_INC:
  WCSLIB_LIB:

Yes, I found many people claiming a "broken" qsort on Solaris...
I pulled the changes, recompiled with OPTIMIZE=no.
an-fitstopnm still crashes but the stack trace has changed:

# solve-field -p -O -N none -U none -B none -R none -M none -2 -u degw -L 5 -H 7 -z 2 --crpix-center /Desktop/data/wfi.fits
Reading input file 1 of 1: "/Desktop/data/wfi.fits"...

Segmentation Fault - core dumped
Command failed: /usr/local/astrometry/bin/an-fitstopnm -i /Desktop/data/wfi.fits > /tmp/tmp7f7zUd.pnm
augment-xylist.c:588:backtick Failed to run command: /usr/local/astrometry/bin/image2pnm.py --no-fits2fits --fix-sdss --infile /Desktop/data/wfi.fits --uncompressed-outfile /tmp/tmp.uncompressed.C6aqjD --outfile /tmp/tmp.ppm.B6aqjD --ppm ioutils.c:567:run_command_get_outputs Command failed: return value 255

in gdb:

Starting program: /usr/local/astrometry/bin/an-fitstopnm -i /Desktop/data/wfi.fits > /tmp/tmp7f7zUd.pnm

[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x08058b76 in sample_percentiles (img=0xb646e60, nx=1024, ny=1024, margin=0, NPIX=10000, lop=0.25, hip=0.949999988, lo=0x80473bc, hi=0x80473c0) at an-fitstopnm.c:72
72                              pix[i] = img[y*nx + x];

So yes, it could be indeed related to qsort once again but something is still not right.

I disabled plots to see if that helps but the segmentation fault after the solve remains unchanged:

# solve-field -p -O -N none -U none -B none -R none -M none -2 -u degw -L 5 -H 7 -z 2 --crpix-center /Desktop/data/wfi.jpg
Reading input file 1 of 1: "/Desktop/data/wfi.jpg"...
jpegtopnm: WRITING PGM FILE
Read file /tmp/tmp.ppm.enayYD: 1024 x 1024 pixels x 1 color(s); maxval 255
Using 8-bit output
Extracting sources...
Downsampling by 2...
simplexy: found 681 sources.
Solving...
Reading file "/Desktop/data/wfi.axy"...
Field 1 did not solve (index index-4116.fits, field objects 1-10).
Field 1 did not solve (index index-4115.fits, field objects 1-10).
Field 1 did not solve (index index-4114.fits, field objects 1-10).
  log-odds ratio 236.741 (6.53346e+102), 40 match, 0 conflict, 61 distractors, 49 index.
  RA,Dec = (85.195,-2.06493), pixel scale 21.2643 arcsec/pix.
  Hit/miss:   Hit/miss: +++++---++++++--++++++-+++---+-++-+---+--+-+-+-++--+-----+-+-+---------+----+------+-------+---+----
Field 1: solved with index index-4113.fits.
Field 1 solved: writing to file /Desktop/data/wfi.solved to indicate this.

Segmentation Fault - core dumped
solve-field.c:518:run_engine engine failed.  Command that failed was:
  /usr/local/astrometry/bin/astrometry-engine /Desktop/data/wfi.axy
 ioutils.c:567:run_command_get_outputs Command failed: return value 139

Karsten

Dustin Lang

Dec 3, 2015, 3:58:15 PM12/3/15
to astrometry
Hi,

We just use wcslib to try to read a wide variety of existing WCS headers when we try to make a guess.  It's not required.

In the an-fitstopnm gdb, could you please try:

p i
p nx
p x
p y

(maybe the random() function doesn't work the way I think it does on solaris)


And could you try getting a stack trace for astrometry-engine also?

thanks!,
--dustin


Karsten Schindler

Dec 3, 2015, 5:30:30 PM12/3/15
to astro...@googlegroups.com
Hi Dustin,

Here is the gdb output for an-fitstopnm:

(gdb) p i
$1 = 0
(gdb) p nx
$2 = 1024
(gdb) p x
$3 = 64966157
(gdb) p y
$4 = 4478216

... and for astrometry-engine wfi.axy (derived from the jpg image):

Starting program: /usr/local/astrometry/bin/astrometry-engine /Desktop/data/wfi.axy

[Thread debugging using libthread_db enabled]
Reading file "/Desktop/data/wfi.axy"...
Field 1 did not solve (index index-4116.fits, field objects 1-10).
Field 1 did not solve (index index-4115.fits, field objects 1-10).
Field 1 did not solve (index index-4114.fits, field objects 1-10).
  log-odds ratio 236.741 (6.53346e+102), 40 match, 0 conflict, 61 distractors, 49 index.
  RA,Dec = (85.195,-2.06493), pixel scale 21.2643 arcsec/pix.
  Hit/miss:   Hit/miss: +++++---++++++--++++++-+++---+-++-+---+--+-+-+-++--+-----+-+-+---------+----+------+-------+---+----
Field 1: solved with index index-4113.fits.
Field 1 solved: writing to file /Desktop/data/wfi.solved to indicate this.
[New Thread 1 (LWP 1)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0xfed0646c in strlen () from /lib/libc.so.1

(gdb) where
#0  0xfed0646c in strlen () from /lib/libc.so.1
#1  0xfed61bce in _ndoprnt () from /lib/libc.so.1
#2  0xfed64d8f in vsnprintf () from /lib/libc.so.1
#3  0xfed6119c in vasprintf () from /lib/libc.so.1
#4  0x080dfae3 in add_long_line (hdr=0xb93e3b8, keyword=0x8199b57 "COMMENT", indent=0x8199ad2 "  ", append=0, format=0x8189d07 "Solved_in: %s", lst=0x8046de8 "") at fitsioutils.c:695
#5  0x080dfcab in fits_add_long_comment (dst=0xb93e3b8, format=0x8189d07 "Solved_in: %s") at fitsioutils.c:745
#6  0x0807c75a in add_blind_params (bp=0xb8cdcc0, hdr=0xb93e3b8) at blind.c:879
#7  0x0807e3e8 in write_wcs_file (bp=0xb8cdcc0) at blind.c:1318
#8  0x0807f437 in write_solutions (bp=0xb8cdcc0) at blind.c:1576
#9  0x0807ab3f in blind_run (bp=0xb8cdcc0) at blind.c:498
#10 0x08077162 in engine_run_job (engine=0xb74dad0, job=0xb8cdc98) at engine.c:522
#11 0x08075902 in main (argc=2, args=0x804749c) at engine-main.c:329

Karsten

Dustin Lang

Dec 3, 2015, 6:47:53 PM12/3/15
to astrometry
Ugh!  Okay, I pushed what should be a fix for the an-fitstopnm problem (just add an extra mod by the image size to the random x,y coords).
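
The pushed change is not shown in this thread; the following is only a sketch of the "extra mod" idea. The function name bounded_coord and its signature are hypothetical, and the raw draw is passed in as an argument so the mapping can be checked without depending on the platform's generator. One plausible cause of the out-of-range coordinates (an assumption, not confirmed here) is a mismatch between the range of random() and the value of RAND_MAX on Solaris; taking the remainder sidesteps that question entirely.

```c
#include <assert.h>

/* Map an arbitrary non-negative random draw onto [margin, n - margin).
   The remainder keeps the result in range whatever the generator's
   maximum value is. */
static int bounded_coord(long raw, int n, int margin) {
    int span = n - 2 * margin;
    assert(raw >= 0 && span > 0);
    return margin + (int)(raw % span);
}
```

With nx = 1024 and margin = 0, the x = 64966157 seen in gdb would map to 64966157 % 1024 = 525, which is a valid column. The remainder introduces a slight bias towards small values, which should not matter for picking sample pixels.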

As for the astrometry-engine problem... yuck, I don't know how I'd go about fixing that.  Could you try commenting out that line (and probably a lot more like it nearby)?  It's just adding COMMENT cards to the FITS headers.

cheers,
--dustin


Karsten Schindler

Dec 3, 2015, 6:49:36 PM12/3/15
to astrometry
The engine-main.c segmentation fault looks a lot like these reports:

strlen() on a null pointer?
https://community.oracle.com/thread/2021619?start=0&tstart=0
http://technopark02.blogspot.com/2006/04/solaris-null-pointer-bugs-usrlib00so1.html

Karsten

Karsten Schindler

Dec 3, 2015, 6:59:42 PM12/3/15
to astrometry
an-fitstopnm problem is solved! solve-field works with both the FITS and JPG file now to the same extent - only crashing astrometry-engine after the solve.

Apparently we just wrote at the very same time to the mailing list.

I can see that fitsioutils.c:691 reads
char* origstr = NULL;
so fitsioutils.c:695
len = vasprintf(&origstr, format, lst);
is definitely a function call with a null pointer value? I think from what I have read on this problem, this can cause segmentation faults - maybe not just on Solaris?

Karsten

Dustin Lang

Dec 3, 2015, 7:16:26 PM12/3/15
to astrometry

Well, this code has run literally millions of times on linux and never reported a segfault -- I suspect solaris weirdness :)

vasprintf allocates a new string, returning the address in &origstr, so that's legitimate; the system call isn't supposed to care about the value (NULL), only the pointer (not NULL).
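
To illustrate the out-parameter convention described above, here is a minimal sketch of a function with the same calling pattern. It is not the system vasprintf; format_alloc is a hypothetical name, and the sketch assumes a C99-conforming vsnprintf that returns the required length when given a zero-sized buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly malloc'ed buffer and store its address in *out.
   Returns the string length, or -1 on failure. Only the address `out`
   must be valid; whatever *out held before the call is overwritten. */
static int format_alloc(char** out, const char* format, ...) {
    va_list ap, ap2;
    int len;
    char* buf;
    assert(out != NULL && format != NULL);

    va_start(ap, format);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, format, ap);          /* first pass: measure */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return -1;
    }
    buf = malloc((size_t)len + 1);
    if (!buf) {
        va_end(ap2);
        return -1;
    }
    vsnprintf(buf, (size_t)len + 1, format, ap2);  /* second pass: write */
    va_end(ap2);
    *out = buf;
    return len;
}
```

A caller can therefore start with char* s = NULL; and pass &s, exactly as the origstr code does. What the format arguments point to is a separate matter.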

cheers,
--dustin

Karsten Schindler

Dec 3, 2015, 8:48:16 PM12/3/15
to astro...@googlegroups.com
Shouldn't you also initialize
    char* str = NULL
in fitsioutils.c:383 then, as you do in :691 for origstr?

I tracked the error back to blind.c. It really happens when fits_add_long_comment wants to add a comment with an empty string.

Popular candidates that are empty among comments:
Solved_in
Solvedserver

Solving the FITS file on Ubuntu leads to the header comments
Solved_in: (null)
Solvedserver: (null)  

The Solaris implementation of vasprintf does not take care of null and causes a SEGV...

But hey: Wouldn't this work?
fits_add_long_comment(hdr, "Solvedserver: %s", bp->solvedserver?bp->solvedserver:"(null)");

... this would mean we only need to check each string variable in the add_blind_params function (blind.c:851-901).
I think that is acceptable...

Apparently the GNU Linux implementation has a safety net against null strings which the Solaris implementation has not. This comment pretty much nails it:
http://stackoverflow.com/a/16395836

Karsten

Dustin Lang

Dec 3, 2015, 8:57:29 PM12/3/15
to astrometry
Ok, good catch!  Any chance you could send a git pull request with those changes?



Karsten Schindler

Dec 3, 2015, 9:02:11 PM12/3/15
to astro...@googlegroups.com
I changed blind.c:879-881 to
fits_add_long_comment(hdr, "Solved_in: %s", bp->solved_in?bp->solved_in:"(null)");
fits_add_long_comment(hdr, "Solved_out: %s", bp->solved_out?bp->solved_out:"(null)");

fits_add_long_comment(hdr, "Solvedserver: %s", bp->solvedserver?bp->solvedserver:"(null)");
as I suspected those are the "null" candidates...

... and here we are:

solve-field working with a FITS file, including all eye candy:

Yeah!

Pretty much every function in fitsioutils.c that adds a string to the FITS header calls add_long_line(). Why not check there whether the string is NULL and, if so, replace it with "(null)"?
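
A small sketch of such a guard follows. The names str_or_null and format_comment are hypothetical; format_comment is only a stand-in for a header-comment formatter, not the real add_long_line(). Passing NULL for a %s conversion is undefined behaviour in standard C, so the substitution is done before the string reaches printf-style code. Note that a guard like this only works where the argument is still available as a char*; once the arguments have been packed into a va_list, they can no longer be inspected portably, which is why the check at the call sites in blind.c is the straightforward option.

```c
#include <assert.h>
#include <stdio.h>

/* Return s, or a printable placeholder when s is NULL. */
static const char* str_or_null(const char* s) {
    return s ? s : "(null)";
}

/* Hypothetical stand-in for a header-comment formatter: writes
   "<key>: <value>" into buf and returns the formatted length. */
static int format_comment(char* buf, size_t size,
                          const char* key, const char* value) {
    assert(buf != NULL && key != NULL);
    return snprintf(buf, size, "%s: %s", key, str_or_null(value));
}
```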





success.png

Dustin Lang

Dec 3, 2015, 9:07:59 PM12/3/15
to astrometry
Congratulations!  That was quite the effort!


Karsten Schindler

Dec 11, 2015, 2:02:37 PM12/11/15
to astro...@googlegroups.com
Dustin,

looks like the target Solaris machine is slightly older than my VM, so it does not have mkdtemp() that is called in /util/ioutils.c. (seems that function was added to stdlib.h in a later release by Sun, as my VM has it)
Any objections to replacing the mkdtemp call with mkdir, at least on Solaris?
See: https://github.com/cherokee/webserver/issues/962

In line 671, instead of
if (!mkdtemp(tempdir)) {
I wrote
if (mkdir(tempdir, 0700) != 0) {
and the code compiles without problem.

The bug fix was inspired by https://trac.transmissionbt.com/changeset/13080

Karsten

Dustin Lang

Dec 12, 2015, 9:47:57 AM12/12/15
to astrometry
mkdtemp *chooses* a temporary file name based on a template -- before it is called, "tempdir" is not a unique directory name, it's a template containing "XXXXXX" that mkdtemp replaces with a random string.  In the replacement you have here, each call to create_temp_dir will return the same temp directory, so it compiles but doesn't do the right thing.

cheers,
--dustin

Karsten Schindler

Dec 12, 2015, 1:02:16 PM12/12/15
to astrometry
Gah. Solaris 10 Release 8/11 does not have a mkdtemp but Release 1/13 has.

I found out that mktemp() seems to be a "more" standard function (at least it is available in both releases). It creates a unique name:
The mktemp() function generates a unique temporary filename from template. The last six characters of template must be XXXXXX and these are replaced with a string that makes the filename unique. Since it will be modified, template must not be a string constant, but should be declared as a character array.
mktemp() is available on the machine, so I feel that calling mktemp and then mkdir does the same as mkdtemp:

    mktemp(tempdir);
    if (mkdir(tempdir, 0700) != 0) {   // mkdir returns 0 on success, -1 on failure
        SYSERROR("Failed to create temp dir");
        return NULL;
    }

What do you think?

I traced it back and the only place that makes use of create_temp_dir() that contains mkdtemp() is when you are about to write a KMZ file: write_kmz() in solve-field.c. We do not aim at writing KMZ files... so if I have not overlooked something we probably never run into this situation.
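
The mktemp()/mkdir() idea above can be sketched as a small self-contained function. This is an illustration under assumptions, not the change that went into ioutils.c: make_temp_dir and remove_temp_dir are hypothetical names, and the __sun branch assumes the Solaris behaviour that mktemp() leaves an empty string in the template when no unique name can be generated. The sketch checks mkdir() against 0, since mkdir() returns 0 on success and -1 on failure.

```c
#if !defined(__sun) && !defined(_XOPEN_SOURCE)
#define _XOPEN_SOURCE 700   /* ask for mkdtemp() where the platform has it */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Turn a template ending in "XXXXXX" into a freshly created private
   directory. Returns 0 on success, -1 on failure. */
static int make_temp_dir(char* tmpl) {
    assert(tmpl != NULL);
#if defined(__sun)
    /* Assumed fallback for Solaris 10 releases without mkdtemp(). */
    if (!mktemp(tmpl) || tmpl[0] == '\0')
        return -1;
    /* mkdir() fails with EEXIST if someone else created the name first. */
    return (mkdir(tmpl, 0700) == 0) ? 0 : -1;
#else
    return mkdtemp(tmpl) ? 0 : -1;
#endif
}

/* Remove the (empty) directory again; returns 0 on success. */
static int remove_temp_dir(const char* path) {
    return rmdir(path);
}
```

The template must be a writable character array, for example char t[] = "/tmp/an-sketch-XXXXXX";, because the XXXXXX part is overwritten in place.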

Karsten

Dustin Lang

Dec 12, 2015, 2:03:00 PM12/12/15
to astrometry
Hi,

That looks right.  I just pushed a change to use that instead of mkdtemp if __sun :)

cheers,
--dustin

