CMake and folder/library restructuring


Maik Riechert

Feb 2, 2015, 4:21:28 PM
to astro...@googlegroups.com
(Creating a new topic for that as it's not Windows-related yet; continuing from https://groups.google.com/forum/#!topic/astrometry/jwg0v0bFiM4)

Ok, I'm slowly getting the hang of the internal structure. Really, it needs some work! :)

I propose the following:

We break up libkd's dependency on qfits and anutils by moving the files fix-bb.c and kdtree_fits_io.c to anutils. Then libkd can easily be released separately. This assumes that libkd is useful in itself without the FITS handling. Would you say that's true? If the FITS handling is an integral part of libkd then of course it must stay, but it doesn't seem to be; FITS looks like just the format that astrometry happens to use.

Since anutils would then contain fix-bb.c and kdtree_fits_io.c, it would depend on libkd.

simplexy becomes its own library (and folder), as its only dependency is anbase, and because last year I regularly got lost when I wanted to look something up in the simplexy code and couldn't find its files. Note that with cmake, creating libraries is dead simple, and simplexy really lends itself to it.

anutils (without simplexy) contains only fits/sip/wcs functionality, which is why I would rename it to anfits or something less generic than utils.

I consider anbase, anutils, libkd, qfits, and simplexy to be parts which are not strictly the heart of astrometry.net and are reusable in other projects. simplexy is probably the one most closely related to it, but because it's exchangeable and also useful for other purposes I would put it on the generic-reusable-libraries side.

The other stuff, blind, catalogs, and anfiles is the core of astrometry.net, if I'm not wrong.

About folder structure:

blind
catalogs
gsl-an
libkd/ (possibly own repo and git submodule'd)
  python
qfits-an
simplexy
anbase
anfiles
anfits
python (for anfiles,anfits, blind)

So every library gets its own root-level folder.

For the anfiles, anfits, and blind python wrappers, I think ultimately people want a wrapper for astrometry as a whole. I'm not sure how useful it would be to separate that into two. That's why a single python folder.

Get rid of:

net (move to astrometry/nova)

About the massive include/astrometry/ folder: I don't have that much experience writing and building C/C++ stuff, so I also don't know much about conventions etc. But coming from a world of self-contained packages, I just don't like this folder at all. Why can't the headers be in include/ subfolders of their respective libraries? This would make managing the whole thing a lot easier, at least for newbies to the project.

Wow, so we potentially have library restructurings, massive file moves, and cmake. This sounds scary and we definitely have to split this up into smaller steps, otherwise you will never accept that big pull request :p

I played around with cmake a bit and could already create scripts to compile gsl and qfits (https://github.com/neothemachine/astrometry.net/commits/cmake). Before I continue with cmake I think the restructuring should be done, otherwise I have to do things twice, more or less.

Now, to make this a success you need to tell me if this all makes sense, from the point of library structure, C building, etc. anything that comes to mind. And of course whether you would like to see such changes.

That's all for now ;)
Maik


Dustin Lang wrote:
> Hi,
>
> Sorry, I guess I meant "this is a dependency stack ordering that works", not "this is a full dependency graph".  Thanks for fixing that.
>
> I think at one point I wanted to build libkd without qfits-an (only used for FITS I/O).  But I think we can drop that.
>
> I thought you could get gcc to print the headers it reads, in nested order.  Not a nicely formatted dependency graph... but a start.
>
> The python bindings (in util/util.i) use anfiles, anutils, qfits-an, and anbase.  In libkd/spherematch there is pyspherematch.c to provide bindings for libkd.  In blind/ there is blind.i and plotstuff.i .
>
> thanks for this,
> --dustin
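
On the point about getting gcc to print the headers it reads: gcc's -H option writes one line per header to stderr, prefixed with one dot per nesting level. Below is a minimal sketch, not a finished tool, that turns such output into includer/included pairs as a starting point for a dependency graph. The function name and the sample text are made up for illustration; the header names are borrowed from this thread but their nesting here is invented.

```python
def parse_gcc_h(output):
    """Turn the text printed by `gcc -H` into (includer, included) edges."""
    edges = []
    stack = []  # stack[d - 1] is the header currently open at depth d
    for line in output.splitlines():
        dots, _, path = line.partition(" ")
        # Header lines look like ".. some/header.h"; skip everything else,
        # e.g. the "Multiple include guards may be useful for:" trailer.
        if not dots or set(dots) != {"."} or not path:
            continue
        depth = len(dots)
        del stack[depth - 1:]
        parent = stack[-1] if stack else "<source>"
        edges.append((parent, path.strip()))
        stack.append(path.strip())
    return edges


# Invented sample in the format gcc -H uses (not real astrometry.net output).
sample = """\
. fitsbin.h
.. fitsioutils.h
... bl.h
. ioutils.h
Multiple include guards may be useful for:
ioutils.h
"""

edges = parse_gcc_h(sample)
for parent, child in edges:
    print(parent, "->", child)
```

In practice the input would come from the stderr of something like `gcc -H -fsyntax-only file.c`, collected per source file.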

Dustin Lang

Feb 2, 2015, 5:49:01 PM
to astro...@googlegroups.com

Hi Maik,

Thanks for looking at this!

In general: I never quite know whether I want to split things into more packages, or just dump everything in one big directory.  Since I have never really thought seriously about distributing or using the low-level libraries in other projects, maintaining them as separate libraries in the code sometimes feels like unnecessary work.  But I see your point that organizing things better will make it easier for new people to understand the code.

Partly that's why one big include/ directory -- from the outside, Astrometry.net could be seen as one big library.  But maybe it's better to use the modules you've defined consistently.

Would you then move all the header files back into their respective module directories?  In with the source files, or in include/ directories?  I would prefer the headers to live next to the source files.  The "gsl" project does that, for example.  However, the one messy thing is that you want to write   #include "astrometry/base/ioutils.h"   in the code, which means you need to "invent" an astrometry/ directory... gsl does that by building a giant symlink farm in the gsl/ directory when building the code.  (Actually, as it turns out, the astrometry.net code already HAS an astrometry->. symlink as a similar hack, to allow the python code to be run out of the astrometry directory.  So that's not a big deal.)

We need to be able to write/read libkd structures to/from disk, and currently FITS is the only file format defined.  But ok, we could move kdtree_fits_io.c somewhere outside libkd.  One thing: kdtree_fits_io also uses some of the "internal" header files from libkd/, like kdtree_internal.h ... so the libkd/ directory will have to be included at build time.

fix-bb.c can be removed -- it was used to fix up problems with old libkd files, years and years ago.

Moving simplexy to its own directory seems like a great idea.

While you're moving the furniture: pull "plotstuff" out of blind/.

"catalogs" could reasonably be reusable in other projects too, as could "plotstuff"

For the python side: I use the util/util.i library all the time in my other projects, basically for the tan_t / sip_t classes (eg, for resampling images from one WCS to another); no need for all the stuff in blind/ .  If you're going to make a pile of libraries and directories on the C side, may as well mirror that on the python side, IMO.  (Though I would love to flatten the namespace on the python side; currently it's "from astrometry.util.util import *", which is just plain dumb!)

I'd rather not rename "net" to "nova".  :)  You really don't like that, do you?

About cmake: I have had really mixed experiences with it.  I find it has high magicness; try figuring out how to tell it which BLAS library you want to use, if you want an afternoon of frustration.  Would it be realistic to have cmake generate plain old Makefiles that are distributed with the releases?  And are those Makefiles readable?

I think you're right that it would be best to do a cmake switchover after a bunch of furniture-moving.  Working in a branch, if you want to start moving stuff into subdirectories, I can try to get the Makefiles back up and running if you're not interested.  If we want to switch to #include <astrometry/base/ioutils.h>, eg, then many edits to source files will be necessary.  I have an all-day teleconference tomorrow, so will have some mindless-work cycles :)

cheers,
--dustin



Maik Riechert

Feb 3, 2015, 4:11:35 AM
to Dustin Lang, astro...@googlegroups.com
Hi

> Would you then move all the header files back into their respective
> module directories? In with the source files, or in include/
> directories? I would prefer the headers to live next to the source
> files. The "gsl" project does that, for example. However, the one
> messy thing is that you want to write #include
> "astrometry/base/ioutils.h" in the code, which means you need to
> "invent" an astrometry/ directory... gsl does that by building a giant
> symlink farm in the gsl/ directory when building the code. (Actually,
> as it turns out, the astrometry.net code already HAS an astrometry->.
> symlink as a similar hack, to allow the python code to be run out of
> the astrometry directory. So that's not a big deal.)
I have a feeling that symlinks will break in Windows. What about...

/include/astrometry
   simplexy
   anbase
   anfiles
   anutils
   blind
   catalogs
/include/qfits-an
/include/libkd

This prevents us from having to use symlink magic or having a folder
layout like:

/anbase/
   c files...
   include/astrometry/anbase

As for the symlink for Python, I'm sure there's a solution for that too.

>
> We need to be able to write/read libkd structures to/from disk, and
> current FITS is the only file format defined. But ok, we could move
> kdtree_fits_io.c somewhere outside libkd. One thing: kdtree_fits_io
> also uses some of the "internal" header files from libkd/, like
> kdtree_internal.h ... so the libkd/ directory will have to be included
> at build time.
When you say "we" you mean astrometry right? There was this thread
recently where someone wanted to use libkd without needing file storage
functionality, that's why I think it doesn't really belong there. About
the internal header files, I think this is just a problem that has to be
solved in libkd, because ultimately you want people using the library
writing their own file format readers/writers. So whatever is needed
should be part of the public headers.

>
> While you're moving the furniture: pull "plotstuff" out of blind/.
>
> "catalogs" could reasonably be reusable in other projects too, as
> could "plotstuff"
Ok, I'm all for reuse. So plotstuff is everything named plot*? Anything
else?

>
> For the python side: I use the util/util.i library all the time in my
> other projects, basically for the tan_t / sip_t classes (eg, for
> resampling images from one WCS to another); no need for all the stuff
> in blind/ . If you're going to make a pile of libraries and
> directories on the C side, may as well mirror that on the python side,
> IMO. (Though I would love to flatten the namespace on the python
> side; currently it's "from astrometry.util.util import *", which is
> just plain dumb!)
Ok, then Python namespace packages are the way to go. I'm sure we can
get a nice package naming which is fun to use...

>
> I'd rather not rename "net" to "nova". :) You really don't like
> that, do you?
I don't mind the name, but I just don't like it being part of the same
repository. So how about moving it to astrometry/net ? That sounds nice!

> About cmake: I have had really mixed experiences with it. I find it
> has high magicness; try figuring out how to tell it which BLAS library
> you want to use, if you want an afternoon of frustration. Would it be
> realistic to have cmake generate plain old Makefiles that are
> distributed with the releases? And are those Makefiles readable?
No :) Well, for a while we could maintain both types of build systems,
you doing the Makefiles, me the cmake. And once we're confident that
cmake works reliably then we could remove the Makefiles.

> I think you're right that it would be best to do a cmake switchover
> after a bunch of furniture-moving. Working in a branch, if you want
> to start moving stuff into subdirectories, I can try to get the
> Makefiles back up and running if you're not interested. If we want to
> switch to #include <astrometry/base/ioutils.h>, eg, then many edits to
> source files will be necessary. I have an all-day teleconference
> tomorrow, so will have some mindless-work cycles :)

Let's first converge on a folder structure we both like, then I'll start
moving things and after that you can fix the Makefiles.

Cheers
Maik

Maik Riechert

Feb 3, 2015, 5:08:41 AM
to astro...@googlegroups.com, dstn...@gmail.com
Just thought of another idea. As I also don't like the separation between headers and sources, why don't we put them in folders according to the "package name", like in python.

So:
/astrometry/blind/
/astrometry/catalogs/

etc. where the .h files are next to the .c files. Then you would just add the root as an include folder. And if we want to move a library into its own repository, we just have to copy one folder (and possibly take care of its dependencies). The only thing I don't like about that (also with a single /include folder) is that you cannot explicitly say that e.g. catalogs depends only on libraries a and b, because it can #include anything it finds starting from / or /include, respectively. This means that dependency violations will not lead to build failures and would have to be caught manually by looking at generated dependency graphs.
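
One way to catch such violations without reading graphs by hand would be a small script that scans the #include lines. The following is only a rough sketch under assumptions: it presumes the /astrometry/PKG/ layout described above and includes of the form "astrometry/PKG/FILE.h", and the `allowed` table, the function name and the example file are hypothetical.

```python
import os
import re
import tempfile

# Matches e.g.  #include "astrometry/base/bl.h"  and captures "base".
INCLUDE_RE = re.compile(r'#\s*include\s*[<"]astrometry/([^/">]+)/')


def find_violations(root, allowed):
    """Return sorted (file, package) pairs for includes outside `allowed`."""
    violations = []
    for pkg, deps in allowed.items():
        pkg_dir = os.path.join(root, "astrometry", pkg)
        for dirpath, _, filenames in os.walk(pkg_dir):
            for name in filenames:
                if not name.endswith((".c", ".h")):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    for target in INCLUDE_RE.findall(f.read()):
                        if target != pkg and target not in deps:
                            rel = os.path.relpath(path, root)
                            violations.append((rel, target))
    return sorted(violations)


# Demo on a throwaway tree; file names and contents are made up.
with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "astrometry", "catalogs")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "example.c"), "w") as f:
        f.write('#include "astrometry/base/bl.h"\n'
                '#include "astrometry/blind/example.h"\n')
    # Hypothetical rule: catalogs may only include from base.
    allowed = {"base": set(), "catalogs": {"base"}}
    violations = find_violations(root, allowed)

for path, target in violations:
    print(f"{path} includes astrometry/{target}/, which is not allowed")
```

Such a check could run as a test target, so that a stray #include fails loudly even though the compiler would accept it.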

Dustin Lang

Feb 3, 2015, 7:19:45 AM
to astro...@googlegroups.com, dstn...@gmail.com

Ok, so I think we want:

- header files to live beside C files
- include path to be #include "astrometry/PKG/FILE.h"

If we want both, this implies that we need a top-level "astrometry" directory, with almost everything living within it.

When I first started writing this reply, I thought I really didn't want everything to live in another top-level directory (it always takes me an extra second to remember to add the "astrometry" dir in "include/astrometry"), but maybe it is necessary (or at least, the cleanest solution).


As for package names/layout, you suggested:


/astrometry
   simplexy
   anbase
   anfiles
   anutils
   blind
   catalogs
/qfits-an
/libkd

simplexy could also move out of the astrometry/ directory.

Should we rename "anbase" to just "base", since it will now be included like #include "astrometry/base/ioutils.h" ?  Same with "anfiles" to "files"?

And let's rename "anutils" to "fits", as you proposed.

And add "astrometry/plotstuff", containing:

plotstuff.c plotfill.c plotxy.c plotimage.c plotannotations.c plotgrid.c plotoutline.c plotindex.c plotradec.c plothealpix.c plotmatch.c plotcoadd.c plotann.py plotstuff-main.c plotxxx.c plotxy-main.c setup.py plotstuff.i

(but not plot-constellations.c, plot-xy-and-quad.c, plotquad.c; those can stay in 'blind')

So that makes it:

/astrometry
   base
   files
   fits
   blind
   catalogs
/plotstuff
/simplexy
/qfits-an
/libkd

where I think that probably everything depends on astrometry/base, plotstuff depends on astrometry/{files,fits,catalogs?}, libkd depends on base, and the libkd-fitsio part depends on qfits-an and astrometry/fits.



About libkd and FITS i/o.  Yes, the libkd library is useful without FITS i/o.  But it is also useful *with* FITS i/o!  Yes, the parts that kdtree_fits_io.c needs to see *could* or *should* be public; like you say, it should be possible to write an i/o routine outside the library.  Maybe kdtree_fits_io.c should be built as a tiny little library of its own?  (And then it could be a tiny little python library as well.)  But that should still live in libkd/, right?


I see the argument that "net" should move.  One issue is that it thinks its python package name is "astrometry.net" (which I like), but the other python bindings would also want to live in the "astrometry" namespace, and it's not trivial to install them into separate trees and have python find them; eg, I believe this doesn't work:
  install-dir-1/astrometry/net/X.py
  install-dir-2/astrometry/util/Y.py
PYTHONPATH=install-dir-1:install-dir-2 python -c "import astrometry.util.Y"


I remain highly skeptical of cmake :)  But am willing to be convinced.

cheers,
--dustin

Maik Riechert

Feb 3, 2015, 7:55:01 AM
to Dustin Lang, astro...@googlegroups.com
Renaming anbase and anfiles sounds fine. Although "files" is a little funny.

Just to confirm, base only contains utility functions and does *not*
contain headers which have to be exposed in libraries that depend on it?
Meaning base doesn't have to become its own library but merely a bunch
of files available while compiling.

Moving simplexy to the root is fine.

In your tree you have plotstuff in root, but you describe it as being
under astrometry/. Which one? I think astrometry/ makes more sense as it
depends on the other astrometry libraries.

The separation in plotstuff, can you describe your reasoning? Why are
plot-constellations.c, plot-xy-and-quad.c, plotquad.c special? Isn't
e.g. plotmatch on the same level as plotquad, also dependency-wise?

About fits and libkd. First, I think every library, no matter how small,
should live in its own folder. I also thought of having a little
libkd-fits library which depends on libkd and qfitsan, but then I wasn't
sure how useful that is, compared to putting the fits part into one of
the existing libraries. The question again is do you think it makes
sense to release libkd-fits in this form? Especially with qfitsan as a
dependency? I think it would be better for such a libkd-fits library to
depend on a standard (unpatched) fits library like cfitsio. Then
creating a python wrapper is probably easier too, as astropy is based on
cfitsio as well. So putting the fits part of libkd (for now) into an
existing library is basically saying that it's somehow internal and not
ready for primetime.

Don't worry about the Python packages. What you describe is what
"namespace packages" are made for.
http://stackoverflow.com/questions/1675734/how-do-i-create-a-namespace-package-in-python
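
To make the two-install-tree case concrete, here is a minimal sketch. It relies on implicit namespace packages (PEP 420, Python 3.3 and later), where the shared top-level directory simply has no __init__.py; on older Pythons the same effect needs an __init__.py that calls pkgutil.extend_path or pkg_resources.declare_namespace. The package name astrometry_demo and the helper function are made up so the demo cannot clash with a real installation.

```python
import importlib
import os
import sys
import tempfile


def make_module(install_dir, subpackage, module, source):
    """Write install_dir/astrometry_demo/<subpackage>/<module>.py."""
    pkg_dir = os.path.join(install_dir, "astrometry_demo", subpackage)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(source)


# Two separate install trees, neither containing an __init__.py.
tmp = tempfile.mkdtemp()
dir1 = os.path.join(tmp, "install-dir-1")
dir2 = os.path.join(tmp, "install-dir-2")
make_module(dir1, "net", "X", "NAME = 'net.X'\n")
make_module(dir2, "util", "Y", "NAME = 'util.Y'\n")

# Same effect as PYTHONPATH=install-dir-1:install-dir-2
sys.path[:0] = [dir1, dir2]
importlib.invalidate_caches()

X = importlib.import_module("astrometry_demo.net.X")
Y = importlib.import_module("astrometry_demo.util.Y")
print(X.NAME, Y.NAME)
```

Both imports resolve because Python merges the astrometry_demo directories from both path entries into one package.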

About cmake, I don't think it's the best thing the world has seen, but
at least it's easily readable and creatable (which you might say about
Makefiles as well..) and most importantly platform-independent. More and
more big projects are switching to it, so I guess although it also has
its little problems, the upsides far outweigh them. In the end, it's
another layer of abstraction, sometimes leaky but mostly working. We
will see!

Cheers
Maik

Dustin Lang

Feb 3, 2015, 9:01:07 AM
to astro...@googlegroups.com, dstn...@gmail.com

Hi,

In util/Makefile, ANBASE_OBJ is defined as:

starutil.o mathutil.o bl-sort.o bl.o bt.o healpix-utils.o \
    healpix.o permutedsort.o ioutils.o fileutils.o md5.o \
    os-features.o an-endian.o errors.o an-opts.o tic.o log.o datalog.o \
    sparsematrix.o lsqr.o coadd.o convolve-image.o resample.o \
    intmap.o histogram.o histogram2d.o

bl (block-list) in particular is used in the public API in packages "catalog", "blind", "fits", and "plotstuff", so the bl.h and other headers must be installed.


Plotstuff: I guess it could go in astrometry/, since it depends on astrometry/files.


> Renaming anbase and anfiles sounds fine. Although "files" is a little funny.

Yeah... it defines specific file types used in Astrometry.net .  Could keep it as "anfiles".  Whatever.



> Moving simplexy to the root is fine.

Right, I guess things that could reasonably be distributed separately should go in root.

 
> The separation in plotstuff, can you describe your reasoning? Why are
> plot-constellations.c, plot-xy-and-quad.c, plotquad.c special? Isn't
> e.g. plotmatch on the same level as plotquad, also dependency-wise?

"plotstuff" is a logical library.  plot-constellations.c is an older, more specific implementation that just makes the annotation plots we use with solve-field.  The web version already uses the replacement, "plotann.py".  Same with plot-xy-and-quad.c and plotquad.c.


 
> About fits and libkd. First, I think every library, no matter how small,
> should live in its own folder.

Well, let's make 100 little directories, each one with a single .c file ;-)

I think the only reason to care about building libkd without FITS i/o is to reduce the amount of code -- only relevant for, eg, embedded applications.  I don't think it's worth going to a lot of effort.  I think it makes sense to have a "libkd" library and a "libkd-fitsio" library that both live in the "libkd/" directory.  The libkd-fitsio part depends on astrometry/fits and qfits-an, but that's not the end of the world.

 
> I think it would be better for such a libkd-fits library to depend on a standard (unpatched) fits library like cfitsio.

Have fun with that :)


> Don't worry about the Python packages. What you describe is what
> "namespace packages" are made for.
> http://stackoverflow.com/questions/1675734/how-do-i-create-a-namespace-package-in-python

Aha, thanks for pointing that out.  That's exactly what I want.  Ok, I guess "net" can move.

Thanks!,
--dustin

Maik Riechert

Feb 3, 2015, 5:30:51 PM
to Dustin Lang, astro...@googlegroups.com
I thought about the dependencies a bit but I'm not really happy yet. What annoys me a bit is that anbase has lots of stuff which e.g. libkd or qfits don't need. On the one hand it has all those i/o and system helpers, and the blocklist as a very generic data structure, also keywords.h. On the other hand it has celestial coordinate transformation, healpix, and a lot of math and algorithms (histogram, convolve, resample,..). I think this should be split up somehow, such that the former part becomes something astrometry-independent and can be put in the root, and the rest stays under astrometry/. Then simplexy, libkd, qfitsan can depend on the root part which to me makes more sense.

Also, libkd having an (optional) dependency on (an)fits feels strange. In the end it's just fitsbin.h and fitsioutils.h, where fitsbin could possibly become part of libkd as it's not used elsewhere. I'm not sure what to do about fitsioutils, though. I have to think about that.

Note that I'm always trying to think in terms of an ideal "clean" future situation, like... "ok, if libkd becomes an independent project, then it gets its own repo... how does it depend on anbase then... should we add the whole astrometry as a git submodule? probably not.. would annoy people.. should anbase then become an own project too which forms the base for all others under the astrometry github org?... maybe, but then it contains too much random stuff... so maybe cut it down, split it..."

Just so that my thinking process is a bit more transparent.

Cheers
Maik

Dustin Lang

Feb 3, 2015, 5:44:47 PM
to astro...@googlegroups.com, dstn...@gmail.com

I completely agree.

What about pushing the astro-specific stuff into "anfits" (rename that, but keep its place in the dependency hierarchy)?

"fitsbin" and "fitsioutils" could move to qfits-an.  Agree that it's weird to have libkd depend on astrometry/fits

I feel like we're making progress.  Thank you for your patience!  I know I am stubborn!

Do you want to make a branch and start moving stuff around, to see how it feels?

cheers,
--dustin

PS -- Just as an aside: there is a makefile target, "make snapshot-libkd", that packages up just the subset required to build libkd.



Maik Riechert

Feb 4, 2015, 5:21:05 AM
to Dustin Lang, astro...@googlegroups.com
On 03.02.2015 at 23:44, Dustin Lang wrote:
>
> I completely agree.
>
> What about pushing the astro-specific stuff into "anfits" (rename
> that, but keep its place in the dependency hierarchy)?
I found a problem with simplexy, it depends on base's resample.h.

Let's look at base and what it has, from a logical point of view. I
would say:

system/basic math/data structures:
mathutil.o bl-sort.o bl.o bt.o permutedsort.o ioutils.o fileutils.o
os-features.o an-endian.o errors.o an-opts.o tic.o log.o datalog.o
intmap.o sparsematrix.o

astronomy stuff:
starutil.o healpix-utils.o healpix.o

advanced math/image algorithms:
convolve-image.o resample.o histogram.o histogram2d.o

remaining:
coadd.o lsqr.o md5.o

What is coadd? Seems to do some projection on images with wcs. Should
probably go to anfits.

lsqr is not used anywhere?

md5 appears in util/ and also qfits-an/ but seems to be only used in
qfits and in util/test_ioutils.c, so it probably doesn't belong here.
Code that is needed only for testing should be kept separate.

With this separation, simplexy depends on the system stuff (various
things) and the advanced math stuff (resample). qfitsan and libkd only
depend on the system stuff.

We could put the astronomy stuff in anutil and just keep the name for
now. What's more important is that the root libraries have clear
dependencies.
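
The dependencies described above can be written down as a small table and checked mechanically. This is just a sketch: the names base-system and base-advanced are placeholders for the two groups listed above, the edge from base-advanced to base-system is my assumption, and graphlib needs Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# library -> set of libraries it depends on
DEPS = {
    "base-system": set(),
    "base-advanced": {"base-system"},  # assumption, not stated above
    "qfits-an": {"base-system"},
    "libkd": {"base-system"},
    "simplexy": {"base-system", "base-advanced"},
}


def build_order(deps):
    """Return an order in which every library comes after its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = build_order(DEPS)
print(order)

# A circular dependency is reported instead of silently accepted.
broken = dict(DEPS, **{"base-system": {"libkd"}})
try:
    build_order(broken)
    cycle_found = False
except CycleError:
    cycle_found = True
print("cycle detected:", cycle_found)
```

The same table could later drive both the Makefiles and the cmake scripts, so the intended graph lives in one place.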

> "fitsbin" and "fitsioutils" could move to qfits-an. Agree that it's
> weird to have libkd depend on astrometry/fits
Right, and as the fits backend of fitsioutils is currently hardcoded to
qfits it makes sense to put it there.

> I feel like we're making progress. Thank you for your patience! I
> know I am stubborn!
>
> Do you want to make a branch and start moving stuff around, to see how
> it feels?
Not yet, I'm drawing dependency graphs on paper and as soon as these
look right I'll start.

Cheers
Maik

Dustin Lang

Feb 4, 2015, 9:54:18 AM
to astro...@googlegroups.com, dstn...@gmail.com
Excellent!  You probably saw that I added you as a collaborator in the github repo, so please feel free to work there (in a branch).

--dustin

Dustin Lang

Feb 4, 2015, 9:59:35 AM
to astro...@googlegroups.com, dstn...@gmail.com
PS, yes, looks like util/md5.c and lsqr.c are not used.  Please feel free to delete.  coadd... I think is also obsolete, though maybe used by plotcoadd (also obsolete?).  I think I basically refactored all that stuff into util/resample.c ... at least that's what I use for doing coadds these days.  Similarly, I'm not sure histograms are used either.

--dstn



Maik Riechert

Feb 4, 2015, 4:56:03 PM
to Dustin Lang, astro...@googlegroups.com
I just finished the first round of massive file moving:

https://github.com/dstndstn/astrometry.net/tree/whomovedmycheese

I haven't adapted any (Make)files yet. I also haven't thought about how
exactly the Python packages will be handled and where exactly they
should be. At the moment I put those that were related to a library under
a python/ folder, e.g. astrometry/blind/python. Oh, and I just noticed,
I haven't split off plotstuff yet, will do tomorrow.

Can you look through the structure? It's definitely not final yet.

Cheers
Maik

Dustin Lang

Feb 4, 2015, 8:20:13 PM
to astro...@googlegroups.com, dstn...@gmail.com
Nice!

Looks like base/sparsematrix isn't used -- scrap it!

Ugh, base/datalog is not ever really used either... I think I was thinking of it as a way for one piece of code to "pipe" data (say, python literals) somewhere... but never went anywhere with it.  Feel free to remove it.

Looks like astrometry/blind/pnpoly is not used, likewise astrometry/blind/addtext.c.

astrometry/blind/an_mm_malloc should go to simplexy?

fileutils could merge with ioutils.

"base-advanced" could be renamed "image", since those are sort-of all imaging operations

I'm surprised that astrometry/catalogs depends on astrometry/files... oh, it's starkd.  I see.

There is one more category of file you may want to distinguish: the "cfitsio" library comes with some "example" programs that are actually super-useful: liststruc, tablist, listhead, fitscopy, imcopy, modhead, imarith, imstat.  These are in astrometry/blind right now but they are not at all astrometry.net-specific -- I think they're straight-up copied from cfitsio.

I should separate "core" and "non-core" code in astrometry/blind -- lots of stuff in there is old junk not really used any more.

So, one thing that hadn't occurred to me until now: if "base" is outside "astrometry", then we have to write #include "base/X.h", which isn't very specific... I moved everything into the astrometry/ directory/namespace to avoid namespace pollution.  (In the bad old days I think I had a "config.h" in the base directory!)  Do you have any thoughts on what to do about this?

Thanks for all your work here -- this is looking great!

--dustin




Maik Riechert

Feb 5, 2015, 4:27:28 AM
to Dustin Lang, astro...@googlegroups.com

> Looks like base/sparsematrix isn't used -- scrap it!
Done.

> Ugh, base/datalog is not ever really used either... I think I was
> thinking of it as a way for one piece of code to "pipe" data (say,
> python literals) somewhere... but never went anywhere with it. Feel
> free to remove it.
It's included in blind/verify.c.

> Looks like astrometry/blind/pnpoly is not used, likewise
> astrometry/blind/addtext.c.
pnpoly is included in blind/catalog_analysis.c while catalog_analysis is
not used anywhere -- remove both?

addtext is removed.

> astrometry/blind/an_mm_malloc should go to simplexy?
True.

> fileutils could merge with ioutils.
Hm, maybe we can rather split it up like Python's os and os.path modules,
so functions which only do stuff on strings without any io are fileutils
(or pathutils) and the rest ioutils. It seems currently that both have a
mix of both types.
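
For comparison, Python's pathlib draws this line explicitly: PurePosixPath only manipulates strings, while Path also touches the filesystem. A small sketch of the distinction follows; the file names are invented, and this only illustrates the idea rather than proposing an API for the C code.

```python
import tempfile
from pathlib import Path, PurePosixPath

# Pure string manipulation: nothing has to exist on disk.
p = PurePosixPath("/data/example-field.fits")
base_name = p.name
wcs_name = str(p.with_suffix(".wcs"))
print(base_name, p.suffix, wcs_name)

# Actual I/O: these calls hit the filesystem.
with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "example.xyls"
    f.write_text("demo")
    existed = f.exists()
    size = f.stat().st_size
print(existed, size)
```

A fileutils/ioutils split along the same line would put functions like the first group in one file and those like the second group in the other.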

> "base-advanced" could be renamed "image", since those are sort-of all
> imaging operations
Ok, the only thing sticking out would be the 1D histogram. This is only
needed by blind currently, maybe put it there? It's probably not
intended to be part of any public interface anyway, so we are free to
move it around later on I guess.

> I'm surprised that astrometry/catalogs depends on astrometry/files...
> oh, it's starkd. I see.
Yeah that surprised me too, I think let's leave it for now and maybe
come back to it later.

> There is one more category of file you may want to distinguish: the
> "cfitsio" library comes with some "example" programs that are actually
> super-useful: liststruc, tablist, listhead, fitscopy, imcopy, modhead,
> imarith, imstat. These are in astrometry/blind right now but they are
> not at all astrometry.net-specific -- I think they're straight-up
> copied from cfitsio.
Are these used at all anywhere? If not then I'm kind of against
including them. I mean, if they are useful for some types of work, then
just install cfitsio on your machine. You say they are example programs,
so some may not be available with a package-manager install. But really,
if it's just about having those tools conveniently installable, then
let's create a separate repository for that where you then just do...
git clone... && cmake . && make install.

> I should separate "core" and "non-core" code in astrometry/blind --
> lots of stuff in there is old junk not really used any more.
Definitely!

> So, one thing that hadn't occurred to me until now: if "base" is
> outside "astrometry", then we have to write #include "base/X.h", which
> isn't very specific... I moved everything into the astrometry/
> directory/namespace to avoid namespace pollution. (In the bad old
> days I think I had a "config.h" in the base directory!) Do you have
> any thoughts on what to do about this?
I think the config.h still gets created somehow looking at the
makefiles, but I'm not sure. Or it was os-features-config.h, right.
Well, base is generic, yes. What we could do is not to treat it as a
separate library but rather as a mixin. Which means that we would put
base into a new github repo astrometry/base with a (C)Makefile which
just creates os-features-config.h and possibly sets some variables. Then
in the astrometry repo we create a git submodule inside astrometry/ such
that base is now available via astrometry/base. The same for libkd etc.
This also has the advantage that the project can pick which files it
actually needs from base, and which public headers. I'm not totally sure
but I think this should be easily doable with Makefiles and also CMake.

> Thanks for all your work here -- this is looking great!
The least I can do! Without astrometry.net I wouldn't have had a job
last year! (Hint: google for esa aurora dataset ;)

Cheers
Maik

Maik Riechert

Feb 5, 2015, 5:18:46 PM
to astro...@googlegroups.com, dstn...@gmail.com
I just thought of something for the Python side. Obviously both C and Python want a one-to-one relationship from #include/import path to the actual file location, and currently astrometry more or less gets around it by just putting python files next to the C files, which works but is not ideal. My current approach of putting python files in a python/ folder beneath the C files is bad, because to make it work we would have to create another folder hierarchy inside the python folder to get the import working (without symlink magic). So, my proposal:

We create a separate mirror hierarchy at root under /python, so e.g.:
/python/libkd/spherematch.py
/python/astrometry/blind/image2xy.py
...

When we externalize a library into its own repo, then we would copy the relevant C folder and python folder of it, e.g. libkd's repo would look like:
/libkd/dualtree.h
/python/libkd/setup.py
/python/libkd/spherematch.py
...

With this layout you can also run stuff from the source directory without installing astrometry -- by either going into /python or just setting the PYTHONPATH env var to it. I guess you would hate it if you had to install astrometry each time you want to test a python code change.
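
A minimal sketch of the "run from source via PYTHONPATH" idea, assuming the proposed /python mirror layout. The tree is a throwaway stand-in built in a temp directory; the body of spherematch.py and the helper names (make_demo_tree, import_from_source) are placeholders for illustration, not the real code.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def make_demo_tree(root: Path) -> Path:
    """Create a tiny stand-in for the proposed /python mirror hierarchy."""
    pkg = root / "python" / "libkd"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    # Placeholder body; the real spherematch.py wraps the C library.
    (pkg / "spherematch.py").write_text("def where():\n    return __file__\n")
    return root / "python"


def import_from_source(python_dir: Path) -> str:
    """Import libkd.spherematch in a child interpreter using only PYTHONPATH."""
    env = dict(os.environ, PYTHONPATH=str(python_dir))
    code = "import libkd.spherematch as m; print(m.where())"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    python_dir = make_demo_tree(Path(tmp))
    loaded_from = import_from_source(python_dir)
    print(loaded_from)
```

Nothing is installed here: the child interpreter finds the package purely because the python/ folder is on PYTHONPATH.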

If one day you decide that a python wrapper of something should become independent and get its own release cycle etc., then you would move the /python/libkd folder into a separate repo such that it gets into the root:
/libkd/spherematch.py
/setup.py

Note that in this case you move setup.py to the root and adjust the paths in it.

Any concerns about this?

Maik

Maik Riechert

Feb 6, 2015, 4:13:39 AM
to astro...@googlegroups.com, dstn...@gmail.com

> > So, one thing that hadn't occurred to me until now: if "base" is
> > outside "astrometry", then we have to write #include "base/X.h", which
> > isn't very specific... I moved everything into the astrometry/
> > directory/namespace to avoid namespace pollution.  (In the bad old
> > days I think I had a "config.h" in the base directory!)  Do you have
> > any thoughts on what to do about this?
> I think the config.h still gets created somehow looking at the
> makefiles, but I'm not sure. Or it was os-features-config.h, right.
> Well, base is generic, yes. What we could do is not to treat it as a
> separate library but rather as a mixin.

So, I just read up a bit on how linking works for C (and C++), and basically my idea was nonsense: it would imply that we have several copies of the same functions floating around, whereas there should be exactly one place where a function with a given name is defined. So I think what we have to do is just figure out a good library name for base. I'm not sure there is such a name. If we were in a Java/C#/... world there would probably be no need for such a library, because either the functionality would already be covered by the standard library, or other people would already have written appropriate libraries that could be used as-is. Our base library is not a full library covering a single topic but rather a bit of everything. So, basically I'm saying: either we give it an arbitrary name which is still unique enough (like anbase/), or we go looking for existing libraries that could be used instead. I have a feeling the latter would pull in at least 2 if not 3 libraries, so I vote for the former. Maybe you have some other thoughts on this.

Cheers
Maik

PS: If possible, please keep the commits to master to a minimum, otherwise merging will be a real pain. Which means we should finish this first step sooner rather than later.

Dustin Lang

Feb 6, 2015, 8:42:05 PM
to astro...@googlegroups.com, dstn...@gmail.com
Hi,

Okay, so I did a bunch of reorg of the astrometry/blind directory, and others.  Renamed "base-advanced" to "resample".  Started getting the Makefiles working again.  Started modifying include paths to get things to build.

Noticed that astrometry/utils doesn't depend on libkd, but on qfits-an; adjusted dependencies.txt

The makefile dependencies are not quite right -- too many things rebuild right now.  But I don't have time to chase these all down right now.

cheers,
--dstn

Maik Riechert

Feb 7, 2015, 4:43:55 AM
to Dustin Lang, astro...@googlegroups.com
That's great!

To help with this whole effort I started building a little dependency
graph tool for C/C++ files in Python. It is based on the dependency
file format of "snakefood", which does the same thing for Python
sources and which I have used before. The nice thing is I can reuse
snakefood's subtools for clustering and graphing (outputs dot files) and
it can be neatly automated with makefiles. My plan is to generate two
graphs, one without clustering on a file basis (huge graph) and one with
the libraries being the clusters so that we immediately see if e.g.
astrometry/files has a dependency it shouldn't have.
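
To make the idea concrete, here is a rough sketch of such a scanner. It is not the actual tool: it assumes include paths are written relative to the repo root, and the file names and contents in `sources` are hypothetical. It only looks at quoted #include lines that appear literally in each file (direct dependencies) and then collapses them into cluster-to-cluster edges.

```python
import re
from collections import defaultdict

# Matches quoted includes only; <system> headers are deliberately ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def direct_includes(source_text):
    """Return the quoted #include targets that appear literally in one file."""
    return INCLUDE_RE.findall(source_text)


def cluster_of(path):
    """Map a file path to its cluster: 'astrometry/<sub>' or the top-level folder."""
    parts = path.split("/")
    if parts[0] == "astrometry" and len(parts) > 2:
        return "/".join(parts[:2])
    return parts[0]


def cluster_edges(sources):
    """Collapse file-level includes into cluster-to-cluster edges."""
    edges = defaultdict(set)
    for path, text in sources.items():
        for target in direct_includes(text):
            src, dst = cluster_of(path), cluster_of(target)
            if src != dst:  # includes inside one cluster are not interesting
                edges[src].add(dst)
    return {k: sorted(v) for k, v in edges.items()}


# Hypothetical file contents, for illustration only.
sources = {
    "astrometry/files/example.c":
        '#include "astrometry/files/example.h"\n#include "base/bl.h"\n#include <stdio.h>\n',
    "libkd/example.c":
        '#include "libkd/example.h"\n# include "base/an-bool.h"\n',
}
edges = cluster_edges(sources)
print(edges)
```

With clusters as nodes, an unexpected edge (say, from astrometry/files to something higher up) stands out immediately.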

Also, should I go ahead putting all python files into the /python root
folder with appropriate subfolders?

Cheers
Maik

PS: This is starting to get fun!

Dustin Lang

Feb 7, 2015, 6:40:30 AM
to astro...@googlegroups.com, dstn...@gmail.com
You have a weird idea of fun :)  But it does feel good to clean up a codebase, doesn't it?

For the python code, I think I would like the package structure to be more flat, like:

from libkd import match_radec
from astrometry.stages import *
from astrometry.wcs import *  (Tan, Sip, anwcs, resample.py?)
from astrometry.plots import * (plotutils)
from plotstuff import Plotstuff or maybe from astrometry.plotstuff import Plotstuff

With python's __init__.py setup, the code can be put into deeper directories if desired, and renamed around the tree without much trouble.

I *would* like it so that one can run the python code straight out of the source directory without having to modify PYTHONPATH; ie, that the python code lives next to the C code, rather than inside python/ subdirectories.  But I think you think that is ugly?  (Not saying I disagree)  You think it should be in a root python/ directory?

Another question I have about the python bindings is whether to have setuptools build the C files to its liking, or just use the .h and .a files, as though it were an outside library.  The latter seems cleaner to me, though setuptools is supposed to know what compiler flags to use to get results compatible with the python interpreter in use.

cheers,
--dustin


Dustin Lang

Feb 7, 2015, 9:41:56 AM
to astro...@googlegroups.com, dstn...@gmail.com

I disabled the python build (and haven't gotten to "plotstuff" yet), and now the TravisCI build passes again!

cheers,
--dustin

Maik Riechert

Feb 7, 2015, 10:55:37 AM
to Dustin Lang, astro...@googlegroups.com
On Feb 7, 2015 at 12:40, Dustin Lang wrote:
> You have a weird idea of fun :) But it does feel good to clean up a
> codebase, doesn't it?
>
> For the python code, I think I would like the package structure to be
> more flat, like:
>
> from libkd import match_radec
> from astrometry.stages import *
> from astrometry.wcs import * (Tan, Sip, anwcs, resample.py?)
> from astrometry.plots import * (plotutils)
> from plotstuff import Plotstuff or maybe from astrometry.plotstuff
> import Plotstuff
>
> With python's __init__.py setup, the code can be put into deeper
> directories if desired, and renamed around the tree without much trouble.

A flat structure like that is tricky, because with namespace packages
you cannot use __init__.py anymore (except for initialising the
namespace). There is unfortunately no way around it. If you always ship
all python packages in one big package, then sure, we don't need
namespace packages etc., but then you could never release something
independently within the "astrometry." namespace.

> I *would* like it so that one can run the python code straight out of
> the source directory without having to modify PYTHONPATH; ie, that the
> python code lives next to the C code, rather than inside python/
> subdirectories. But I think you think that is ugly? (Not saying I
> disagree) You think it should be in a root python/ directory?

Well... the thing is, your python libraries will be (or are?) diverging
from the C folder layout, and when this happens, then it's really a
mess, too many folders, mixed up with python and c. Let me think about
it a bit more and what potential problems we get when moving to python/
or how to solve them.

> Another question I have about the python bindings is whether to have
> setuptools build the C files to its liking, or just use the .h and .a
> files, as though it were an outside library. The latter seems cleaner
> to me, though setuptools is supposed to know what compiler flags to
> use to get results compatible with the python interpreter in use.

This really depends on the complexity of the C stuff / build system.
Python is good at just compiling simple C files, but as soon as it gets
more complicated you're in hell again. But using precompiled stuff is
really no problem since C has no interfacing issues compared to C++
where it depends on which compiler you used. I'll get back on the whole
python thing tomorrow or so.

Cheers
Maik

Dustin Lang

Feb 7, 2015, 1:06:23 PM
to astro...@googlegroups.com, dstn...@gmail.com
Ahh, I did not know that about namespace packages.

In that case, yes, let's keep the python code all in a python/ directory, and I guess take this opportunity to reorganize the package tree.

(Not a big deal to have to add the python/ directory to PYTHONPATH to work out of the source directory.)

And I think it then makes more sense to let the Makefiles build the lib*.a files that are then linked-to by the python/swig code.

cheers,
--dustin


Dustin Lang

Feb 7, 2015, 1:16:57 PM
to astro...@googlegroups.com, dstn...@gmail.com
Oh, PS, regarding dependencies: we get gcc to produce full file-level dependencies that it finds by parsing the source files and following header file includes.

When you 'make', a *.dep file is created for each source file, and then they're cat'd into "deps" in each directory.

cheers,
--dstn

Maik Riechert

Feb 8, 2015, 9:08:03 AM
to Dustin Lang, astro...@googlegroups.com
On Feb 7, 2015 at 19:16, Dustin Lang wrote:
> Oh, PS, regarding dependencies: we get gcc to produce full file-level
> dependencies that it finds by parsing the source files and following
> header file includes.
Ah thanks for mentioning that. Just had a look and it produces something
like:

intmap.o: intmap.c intmap.h ../base/an-bool.h ../base/bl.h \
../base/keywords.h ../base/bl-nl.h ../base/bl-nl.ph ../base/bl.inc \
../base/bl.ph ../base/bl-nl.inc

So this effectively gets me all the transitive dependencies for a given
.c file (=.o file). This is not exactly what I need. I need only the
direct dependencies, otherwise it's tricky to create a graph. But don't
worry, I'm nearly done! :) And if everything works out it's compiler and
OS-independent, yay!
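
For reference, a small sketch of why the gcc output is awkward for graphing. It parses the rule quoted above with a hypothetical helper (parse_dep_rule, not part of any existing tool). The result is one flat list of prerequisites, so there is no way to tell which header was included directly and which was pulled in by another header.

```python
def parse_dep_rule(text):
    """Split one Make-style rule ('target: prereq ...') into its parts."""
    joined = text.replace("\\\n", " ")  # undo backslash line continuations
    target, _, prereqs = joined.partition(":")
    return target.strip(), prereqs.split()


rule = (
    "intmap.o: intmap.c intmap.h ../base/an-bool.h ../base/bl.h \\\n"
    "../base/keywords.h ../base/bl-nl.h ../base/bl-nl.ph ../base/bl.inc \\\n"
    "../base/bl.ph ../base/bl-nl.inc\n"
)
target, prereqs = parse_dep_rule(rule)
# Everything except the .c file itself is something reached through #include.
headers = [p for p in prereqs if not p.endswith(".c")]
print(target, len(prereqs), len(headers))
```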

Maik

Maik Riechert

Feb 8, 2015, 2:06:28 PM
to Dustin Lang, astro...@googlegroups.com
Just created the first clustered dependency graph with my new tool, see
attachment.

The config files are in crowfood/, you can create the pdfs yourself with:
sudo pip install snakefood crowfood
cd astrometry_repo/crowfood
make

You have to install snakefood under python 2 currently; support for 3 is
coming soon (at least the author is open to adding it).

So, some interesting things:

astrometry/utils and resample have a cyclic dependency it seems.
base has a dependency to astrometry/plotstuff.
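
A cycle like the first one can also be found mechanically instead of by reading the picture. This is a generic depth-first search sketch over a cluster graph, not something the tool does today; the two edges below are just the utils/resample pair mentioned above.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so the path loops back.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


graph = {
    "astrometry/utils": ["resample"],
    "resample": ["astrometry/utils"],
}
cycle = find_cycle(graph)
no_cycle = find_cycle({"astrometry/utils": ["resample"], "resample": []})
print(cycle, no_cycle)
```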

I'll see if I can make the graph a bit prettier.

Cheers
Maik
astrometry_grouped.pdf

Dustin Lang

Feb 8, 2015, 2:17:24 PM
to astro...@googlegroups.com, dstn...@gmail.com
I'm pretty sure the base->plotstuff dependency must be an error.

The resample->astrometry/utils would be from "resample-main" -- I will move that to astrometry/tools (where I've been putting most of the executables)

cheers,
--dustin


Maik Riechert

Feb 8, 2015, 4:31:03 PM
to Dustin Lang, astro...@googlegroups.com

> I'm pretty sure the base->plotstuff dependency must be an error.
Yep, actually just a visual thing, the line went from analysis to
plotstuff underneath base.

Maik Riechert

Feb 8, 2015, 5:47:52 PM
to Dustin Lang, astro...@googlegroups.com
Ok, enough with the graphs for now, I created a create_graphs.sh file
which creates two graphs in different levels of detail and layouts, see
attachments. This should be useful to quickly read off the dependencies.

Maik
astrometry_grouped.pdf
astrometry_grouped_big.pdf

Dustin Lang

Feb 8, 2015, 7:08:05 PM
to astro...@googlegroups.com, dstn...@gmail.com
I would have thought a layer-by-layer layout would work well for this -- and that graphviz had a good layout engine for such graphs.  That's certainly how I think of dependency graphs...

base, gsl-an, and cfitsio (and other 3rd-party libs) on level 1

qfits-an and resample on level 2

libkd, astrometry/utils, simplexy on level 3

anfiles on level 4

cats on level 5

blind on level 6

tools, (plotstuff?) on level 7

(Though your tool has uncovered some soft dependencies, eg blind depends on plotstuff via some usually-commented-out plotting code)
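
One way to derive such levels automatically, as a sketch: a cluster with no dependencies is level 1, anything else is one more than its deepest dependency. The edges below are assumptions picked only so that the result lines up with the listing above (one representative cluster per level); they are not read from the real graph, and the function assumes the graph has no cycles.

```python
def layer_levels(graph):
    """Level 1 for clusters without dependencies, else 1 + deepest dependency.

    Assumes an acyclic graph; a cycle would recurse forever.
    """
    levels = {}

    def level(node):
        if node not in levels:
            deps = graph.get(node, ())
            levels[node] = 1 + max((level(d) for d in deps), default=0)
        return levels[node]

    for node in graph:
        level(node)
    return levels


# Hypothetical edges, for illustration only.
graph = {
    "base": [],
    "qfits-an": ["base"],
    "astrometry/utils": ["qfits-an"],
    "anfiles": ["astrometry/utils"],
    "cats": ["anfiles"],
    "blind": ["cats"],
    "tools": ["blind"],
}
levels = layer_levels(graph)
print(levels)
```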

cheers,
--dustin

Maik Riechert

Feb 9, 2015, 3:51:43 PM
to Dustin Lang, astro...@googlegroups.com
Yes, usually layers make sense, and often these are then manually
defined as a kind of architecture design. So, I improved the
_grouped_big.pdf graph (attached) such that the automatic graphviz
layering is from top to bottom (was left to right) and also is more
compact. When I do that with the full astrometry/.. subfolders it is a
big mess still. The other graph _layers.pdf has layers defined via
regexes and is also top-bottom. Please adjust the layers in
crowfood/clusters_layered if they should be different:

^base|^gsl\-an base,gsl-an
^qfits\-an|^resample qfits-an,resample
^simplexy|^libkd|^astrometry/utils simplexy,libkd,astrometry/utils
^astrometry/files astrometry/files
^astrometry/catalogs astrometry/catalogs
^astrometry/blind|^astrometry/analysis blind,analysis
^astrometry/tools|^astrometry/plotstuff tools,plotstuff

The first part is the regex and the second the new name for it. You have
to do a "pip install --upgrade crowfood" to use it.
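
As an illustration of how rules in this format can be read, here is a sketch that applies them in order and takes the first match. That matching strategy, and the helper names load_rules and layer_for, are assumptions for the example; crowfood's actual behaviour may differ.

```python
import re

# The rules from crowfood/clusters_layered as listed above: "<regex> <name>".
RULES_TEXT = r"""
^base|^gsl\-an base,gsl-an
^qfits\-an|^resample qfits-an,resample
^simplexy|^libkd|^astrometry/utils simplexy,libkd,astrometry/utils
^astrometry/files astrometry/files
^astrometry/catalogs astrometry/catalogs
^astrometry/blind|^astrometry/analysis blind,analysis
^astrometry/tools|^astrometry/plotstuff tools,plotstuff
"""


def load_rules(text):
    """Parse one rule per line: a regex, whitespace, then the cluster name."""
    rules = []
    for line in text.strip().splitlines():
        pattern, name = line.split(None, 1)
        rules.append((re.compile(pattern), name.strip()))
    return rules


def layer_for(path, rules):
    """Return (layer number, cluster name) of the first matching rule, or None."""
    for layer, (pattern, name) in enumerate(rules, start=1):
        if pattern.search(path):
            return layer, name
    return None


rules = load_rules(RULES_TEXT)
for path in ("base/bl.h", "simplexy/test/demo_dsmooth",
             "astrometry/plotstuff/cairoutils", "sdss/unmatched"):
    print(path, "->", layer_for(path, rules))
```

Paths that match no rule (like the made-up "sdss/unmatched") come back as None here, which is a convenient way to spot files that still need a layer.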

So, now really, enough with the graphs and back to getting this thing
working! :)

Cheers
Maik
astrometry_grouped_big.pdf
astrometry_layered.pdf

Dustin Lang

Feb 9, 2015, 3:59:24 PM
to astro...@googlegroups.com, dstn...@gmail.com
I don't believe the simplexy -> astrometry dependency :)  Do you have an easy way of saying where that comes from?  eg, grepping the "simplexy" directory for "astrometry" doesn't show any real hits.  Similarly astrometry/blind -> astrometry/tools....?

Not that I'm trying to distract you from getting it working :)

cheers,
--dustin

Maik Riechert

Feb 9, 2015, 4:05:14 PM
to Dustin Lang, astro...@googlegroups.com
On Feb 9, 2015 at 21:59, Dustin Lang wrote:
> I don't believe the simplexy -> astrometry dependency :) Do you have
> an easy way of saying where that comes from? eg, grepping the
> "simplexy" directory for "astrometry" doesn't show any real hits.
> Similarly astrometry/blind -> astrometry/tools....?

Sure, just do
$ cat raw.deps | grep simplexy
and we get..

( 'simplexy/test/demo_dsmooth'), 'astrometry/plotstuff/cairoutils'))

:)

Dustin Lang

Feb 9, 2015, 4:36:39 PM
to astro...@googlegroups.com, dstn...@gmail.com
Ahhh, okay.  I haven't reorganized the tests yet.  (Yeah, yeah, yeah, test-driven development fail!)

Cool that the deps are traceable.

thanks,
--dustin

Maik Riechert

Feb 11, 2015, 4:41:01 AM
to Dustin Lang, astro...@googlegroups.com
I moved all the python folders (except for /sdss, because it's probably
a separate thing?) to /python. All the files in the root of /python have
to go into some package; do you know where?

Also, to clarify about namespaces:

The only namespace we have is "astrometry". Subpackages are *not*
namespaces. This means, packages below that, like astrometry.utils, can
do whatever they want in their __init__.py's. I didn't think that
through the last time. So effectively you can have your flat hierarchy,
like "astrometry.utils.my_function" where my_function would physically
be inside the python/astrometry/utils/mymodule.py file.
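
A small sketch of that arrangement, built in a temp directory so it can be run anywhere. The pkgutil-style line in astrometry/__init__.py is one common way to declare a namespace package and is an assumption here (the setup.py's may end up using a different mechanism); the body of my_function is a placeholder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The namespace package's __init__.py does nothing except extend its path.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def build_tree(root):
    """Create python/astrometry/utils/ with a re-exporting __init__.py."""
    utils = root / "python" / "astrometry" / "utils"
    utils.mkdir(parents=True)
    (utils.parent / "__init__.py").write_text(NAMESPACE_INIT)
    # The subpackage is a normal package, so it may re-export freely.
    (utils / "__init__.py").write_text("from .mymodule import my_function\n")
    (utils / "mymodule.py").write_text(
        "def my_function():\n    return 'defined in mymodule.py'\n"
    )
    return root / "python"


with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, str(build_tree(Path(tmp))))
    try:
        utils = importlib.import_module("astrometry.utils")
        result = utils.my_function()                  # flat access path
        defined_in = utils.my_function.__module__     # where it really lives
    finally:
        sys.path.pop(0)

print(result, defined_in)
```

So callers see astrometry.utils.my_function, while the code itself stays in mymodule.py and can be moved around later by changing only the __init__.py.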

I'll modify the namespace stuff in the setup.py's later when the folder
structure is in place.

Cheers
Maik

Maik Riechert

Feb 12, 2015, 8:55:45 AM
to Dustin Lang, astro...@googlegroups.com
I was trying to compile all the libraries with the modified makefiles (I
know you're not done yet). The one for astrometry/utils I fixed by
adding the GSL dependency.
I'm now at plotstuff and noticed that plotmatch depends on blind's
matchfile and matchobj, whereas plotstuff should only depend on
astrometry/files downwards I guess. To me it would be more logical to
put all the file-type modules into astrometry/files, like matchfile,
matchobj, catalog, codefile, codetree, merge-index, ...? If that doesn't
make sense, can you maybe explain it a bit more? A one-line
description per file type would also be useful. Ultimately it would be nice
to have a direct relation between a given filetype/extension and the
relevant module, so that it's easy to find the code for it. At least
there should be a text file mapping file extension to source code file(s).

Cheers
Maik

Maik Riechert

Feb 12, 2015, 8:59:39 AM
to Dustin Lang, astro...@googlegroups.com
On Feb 12, 2015 at 14:55, Maik Riechert wrote:
> whereas plotstuff should only depend on astrometry/files downwards I
> guess.
Sorry, meant plotstuff should depend on astrometry/catalogs downwards.

Dustin Lang

Feb 12, 2015, 11:16:03 AM
to astro...@googlegroups.com, dstn...@gmail.com
Some of the file types are possibly useful outside of the Astrometry.net core routines -- eg, star kd-trees.  Others are much more specific to the blind solver.  Codefiles aren't even really used any more (not even sure if they're ever written to disk?  maybe); they get turned into code kd-trees, which are written to disk.  And code kd-trees encode the shapes of features used by Astrometry.net for matching -- very specific to our problem, and probably not much use to users of the library or the rest of the world.  Match files record matches we found, so are useful to library users, so definitely should go in astrometry/files -- good catch!  Whether all file-support modules should go there, I don't feel strongly either way.

cheers,
--dustin


Dustin Lang

Feb 12, 2015, 2:26:32 PM
to astro...@googlegroups.com, dstn...@gmail.com
Please feel free to move things around.  I am going to have less time to work on this, so go ahead and run with it!
--dstn

Maik Riechert

Feb 12, 2015, 2:38:36 PM
to Dustin Lang, astro...@googlegroups.com
Ok, I understand, although I still need your help with Makefile-fu. I'm
stuck on plotstuff, it creates libplotstuff.a but "make" throws some
errors after that, and, I have no idea!

Dustin Lang

Feb 12, 2015, 3:13:25 PM
to astro...@googlegroups.com, dstn...@gmail.com
Ok, it builds cleanly for me (and TravisCI) so please send error messages.

thanks,
--dustin

Maik Riechert

Feb 12, 2015, 3:25:31 PM
to Dustin Lang, astro...@googlegroups.com
Never mind, got it working. Maybe I forgot to clean up somewhere.

Maik Riechert

Feb 13, 2015, 5:26:54 PM
to Dustin Lang, astro...@googlegroups.com
About the python/ folder...

I created this file:
https://github.com/dstndstn/astrometry.net/blob/whomovedmycheese/python/CONTENT

In my mind this is a way bigger mess than on the C side, but with some
cleaning this should be doable.

Could you have a look at the random files section and fill in the
question marks and give some hints on what you think belongs
together/can be removed/etc.? That would help me a lot to understand
this better.

About the random files, currently I think: There's a very small number
of python modules actually used by astrometry.net blind, see the last
subsection on "companion tools". Then there's a very small number of
helper tools, useful e.g. for building index files. Those two things
should stay. And then the bulk is just random historic stuff that
accumulated somehow, probably for writing papers or doing experiments.

Cheers
Maik