AstropyCache proposal

Demitri Muna

Oct 10, 2013, 12:32:27 PM
to astro...@googlegroups.com
Hi,

Adam and I were talking about caching with respect to astroquery (though not limited to it). Right now it seems like there are many pieces of code that write things to the ".astropy" directory. I want to propose an AstropyCache object that provides the sole interface to the .astropy directory. It would handle all of the low-level details: creating the directory when it is not there, providing the full path, etc. This would allow the user to set a non-standard location for the directory, e.g. when there is no user directory or the default location is on a read-only volume.

The AstropyCache object would be written as a singleton (since there should only be one!), and used like this:

from astropy.utils import AstropyCache  # proposed class, does not exist yet

cache = AstropyCache()
cache.directory = "/some/other/location/.astropy"

if cache.writable:
    pass  # write some stuff
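
To make the proposal concrete, here is a minimal sketch of how such a singleton could behave. It is only an illustration under stated assumptions: the class body, the `path` helper, and the default location are hypothetical and are not existing astropy code.

```python
import os
import tempfile


class AstropyCache:
    """Hypothetical sketch of the proposed singleton; not an astropy class."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object so there is only one cache.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._directory = os.path.join(
                os.path.expanduser("~"), ".astropy")
        return cls._instance

    @property
    def directory(self):
        return self._directory

    @directory.setter
    def directory(self, path):
        # Let the user point the cache at a non-standard location.
        self._directory = os.path.abspath(os.path.expanduser(path))

    @property
    def writable(self):
        """Create the directory if needed and report whether it is writable."""
        try:
            os.makedirs(self._directory, exist_ok=True)
        except OSError:
            return False
        return os.access(self._directory, os.W_OK)

    def path(self, *parts):
        """Full path of an item inside the cache directory (assumed helper)."""
        return os.path.join(self._directory, *parts)


# Usage: redirect the cache to a temporary location and write one file.
cache = AstropyCache()
cache.directory = os.path.join(tempfile.mkdtemp(), ".astropy")
if cache.writable:
    with open(cache.path("example.txt"), "w") as handle:
        handle.write("cached")
```

Because `__new__` returns the stored instance, every later `AstropyCache()` call sees the directory set here, which is the behaviour the singleton is meant to provide.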

I also want a subdirectory here to be user accessible. Imagine a person downloading FITS files that result from an astroquery query. If they are standard files, there's no reason that they should be hidden from the user; they might want to open them in some other program. It should be clear that this is a user-facing directory, so I'd propose something like this:

.astropy/
    User_Data/
    cache/
        astropy.coordinates/
        astropy.io/

The uppercase is intended to suggest that it's something the user is OK to use; the nondescript "cache" is where everything else lives (proper caches, temp files, etc.). Below this, there can be subfolders for each package that uses the cache folder so we don't end up with lots of files in a single folder.

I want everything that astroquery retrieves to automatically be placed in the cache (or in this case, User_Data). For example (in pseudo code):

sloan_query = SDSSQuery()
sloan_query.redshift = [0.1,0.2]
sloan_query.u_mag = [12,13]
sloan_query.execute()

This would download n FITS files. The first time it's run, it will take a little while to retrieve the files. The second time it's run will be nearly instantaneous, since the code will first check the cache, find the files, and not attempt to retrieve them again. Given that we are writing to the user's hard drive, we will want to limit this so a runaway process doesn't fill the user's hard drive. We can decide on some default value (5GB?), but the user will be able to set this for themselves:

from astropy.utils import AstropyCache
AstropyCache().quota = Quantity("10 GB") # this doesn't work now, but should!
AstropyCache().quota = None # no limit set
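
As a rough illustration of how the quota check could work, the sketch below measures the cache directory and decides whether a new download still fits. It assumes the quota has already been converted to plain bytes (the `Quantity("10 GB")` form above is a wish, not something that works today), and the function names are hypothetical.

```python
import os
import tempfile


def directory_size(path):
    """Total size in bytes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def fits_within_quota(cache_dir, incoming_bytes, quota_bytes):
    """Return True if adding `incoming_bytes` keeps the cache within quota.

    A quota of None means no limit, mirroring `quota = None` above.
    """
    if quota_bytes is None:
        return True
    return directory_size(cache_dir) + incoming_bytes <= quota_bytes


# Usage: a cache that already holds one 600-byte file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.fits"), "wb") as handle:
    handle.write(b"\0" * 600)
```

Walking the tree on every download would be slow for a large cache, so a real implementation would probably keep a running total; the sketch only shows the decision itself.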

I'm working on this code now. I'd like to propose that any new code that writes into the .astropy directory be transitioned to this new object. No one should write to "~/.astropy/my_temp_file" directly.

Given this, what methods/functionality do people need for this? Any thoughts/suggestions?

Cheers,
Demitri

_________________________________________
Demitri Muna

Department of Astronomy
The Ohio State University

http://scicoder.org/



Michael Droettboom

Oct 10, 2013, 12:45:06 PM
to astropy-dev
We already have a clear distinction between cache and config, which is why we have `~/.astropy/config` and `~/.astropy/cache`.  And these directories are abstracted in a single place in `astropy.config.paths` -- they should not be hardcoded anywhere else.  I can't find places where they are -- can you point me to them?  If there are, I'd consider that a bug and a change from the existing design -- we should fix those.  Though I see how your design pushes things a little further and makes it more object-oriented, it shouldn't be the case that these paths are hardcoded -- they are already dynamically determined in some cases and everything should be using that.

On Linux, we try to follow the XDG spec which says these should be `~/.config/astropy` and `~/.cache/astropy` respectively.  This makes system administration (deciding what and how to backup things) much easier.  Unfortunately I was overruled on supporting it completely and correctly, because we wanted something that would be the same on Mac and Linux, so it really only does this if the user has an explicit `XDG_CONFIG_HOME` set.

Does your design address timeouts in the cache?  I know that's a much requested feature for things like time tables etc.  I ask, because I think it will affect ultimately how things are stored in the cache.  There probably needs to be a place to store the timeout metadata (and possibly other metadata about items in the cache, such as the original source url). I have an abandoned branch from a few years ago where I started something that used a gdbm database for this (SQL would be overkill), but I didn't work through all of the details.
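
One possible shape for that metadata store is sketched below. Note the substitution: it uses a single JSON file rather than the gdbm database mentioned above, purely to keep the example short, and the class name, field names, and example URL are all hypothetical.

```python
import json
import os
import tempfile
import time


class CacheIndex:
    """Hypothetical index of cached items, stored as one JSON file."""

    def __init__(self, directory):
        self.path = os.path.join(directory, "index.json")
        self.entries = {}
        if os.path.exists(self.path):
            with open(self.path) as handle:
                self.entries = json.load(handle)

    def record(self, key, source_url, timeout, now=None):
        """Remember where an item came from and how long it stays valid."""
        fetched = time.time() if now is None else now
        self.entries[key] = {"source_url": source_url,
                             "fetched": fetched,
                             "timeout": timeout}
        with open(self.path, "w") as handle:
            json.dump(self.entries, handle)

    def is_fresh(self, key, now=None):
        """True if the item is known and its timeout has not elapsed."""
        entry = self.entries.get(key)
        if entry is None:
            return False
        if entry["timeout"] is None:  # None means the item never expires
            return True
        current = time.time() if now is None else now
        return current - entry["fetched"] < entry["timeout"]


# Usage: record an item fetched at t=1000 s that is valid for one hour.
index = CacheIndex(tempfile.mkdtemp())
index.record("leap-seconds", "http://example.org/leap-seconds.dat",
             timeout=3600, now=1000.0)
```

A caller would re-download an item whenever `is_fresh` returns False, and could use the stored `source_url` to know where to fetch it from again.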

Mike





--
You received this message because you are subscribed to the Google Groups "astropy-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to astropy-dev...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
|\/|o _|_  _. _ | | \.__  __|__|_|_  _  _ ._ _  
|  ||(_| |(_|(/_| |_/|(_)(/_|_ |_|_)(_)(_)| | | 



Demitri Muna

Oct 10, 2013, 12:57:57 PM
to astro...@googlegroups.com
Hi Mike,

On Oct 10, 2013, at 12:45 PM, Michael Droettboom <mdb...@gmail.com> wrote:

> We already have a clear distinction between cache and config, which is why we have `~/.astropy/config` and `~/.astropy/cache`.

Right. Inspired by astroquery use, I wanted to add an explicit "user" directory, e.g. for FITS files. Good to know about this though so I can pillage that code.

> And these directories are abstracted in a single place in `astropy.config.paths` -- they should not be hardcoded anywhere else.  I can't find places where they are -- can you point me to them?  If there are, I'd consider that a bug and a change from the existing design -- we should fix those.

I only glanced at Adam's code over his shoulder, so I don't know exactly how he was doing it; this proposal was inspired by astroquery.

> Though I see how your design pushes things a little further and makes it more object-oriented, it shouldn't be the case that these paths are hardcoded -- they are already dynamically determined in some cases and everything should be using that.

Right, I wasn't thinking I was breaking new ground here, but wanted the more OO approach.

> Does your design address timeouts in the cache?  I know that's a much requested feature for things like time tables etc.  I ask, because I think it will affect ultimately how things are stored in the cache.  There probably needs to be a place to store the timeout metadata (and possibly other metadata about items in the cache, such as the original source url). I have an abandoned branch from a few years ago where I started something that used a gdbm database for this (SQL would be overkill), but I didn't work through all of the details.

My email was to get requests for functionality, so yes, this can be added. (No thoughts on details or how at the moment.)

Demitri

_________________________________________
Demitri Muna

Department of Astronomy

Ole Streicher

Oct 11, 2013, 3:41:35 AM
to astro...@googlegroups.com
Hi,

On 10.10.2013 18:45, Michael Droettboom wrote:
> On Linux, we try to follow the XDG spec which says these should be
> `~/.config/astropy` and `~/.cache/astropy` respectively. This makes
> system administration (deciding what and how to backup things) much
> easier. Unfortunately I was overruled on supporting it completely and
> correctly, because we wanted something that would be the same on Mac and
> Linux, so it really only does this if the user has an explicit
> `XDG_CONFIG_HOME` set.

Can we bring this topic back into discussion?

As the packager for Debian and Ubuntu, I am going to follow the
standards of my target systems, which here are the XDG rules [1]
(although they are not strictly required by policy). I will even patch
the astropy sources for packaging to fulfill these rules.
However, this may lead to a situation where a user changes from the
standard package (XDG compliant) to a self-compiled version and then
no longer picks up their old configuration settings.

What is the reason to have the same directory structure on Mac and
Linux? I guess that a common directory (via NFS or so) between these
systems is quite rare, and a user would also expect that software for
Linux follows the Linux rules (FHS and XDG), and software for Mac
follows the standards there.

Best regards

Ole

[1] http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html

P.S. This is a repost, since my original message seems to be lost on
google-groups.

Demitri Muna

Oct 11, 2013, 11:45:09 AM
to astro...@googlegroups.com
Hi,

On Oct 11, 2013, at 3:41 AM, Ole Streicher <ast...@liska.ath.cx> wrote:

> As the packager for Debian and Ubuntu, I am going to follow the
> standards of my target systems, which here are the XDG rules [1]
> (although they are not strictly required by policy). I will even patch
> the astropy sources for packaging to fulfill these rules.
> However, this may lead to a situation where a user changes from the
> standard package (XDG compliant) to a self-compiled version and then
> no longer picks up their old configuration settings.

That would definitely be a surprise, and it's generally frowned upon to "surprise" the user.

> What is the reason to have the same directory structure on Mac and
> Linux? I guess that a common directory (via NFS or so) between these
> systems is quite rare, and a user would also expect that software for
> Linux follows the Linux rules (FHS and XDG), and software for Mac
> follows the standards there.

I think part of this comes down to what the user expects. I would venture that a large percentage of our users move back and forth between Mac OS (or even I suppose a few Windows computers) and Linux continuously. We work on our laptops and run code on Linux servers (which may or may not be Ubuntu or Debian). It would be a surprise to find the .astropy directory different between systems, particularly as we are used to the Unix interface on the Mac as being essentially the same thing. (Yes, I know it's not the same OS, but to our users it effectively is.)

For better or worse, users are already well familiar with software having hidden directories in our home directories: .ssh, .vim, .git, .matplotlib, .dropbox, etc. While you can throw this data point away if you'd like, I've been a Linux user (though never on my desktop) for many years, and I'd never heard of FHS or XDG until this thread. I can assure you virtually no astronomer has (and no one on this list counts). I understand that it might be the (a) Linux way of doing things, but it's not something that exists on the Mac or Windows. To follow the Mac convention, the proper place for the Astropy cache directory components would be:

~/Library/Application Support/Astropy/
~/Library/Preferences/org.astropy.plist
~/Library/Caches/org.astropy/

and I wouldn't even think of suggesting that we adopt that, following the principle of least surprise. Why? Because Astropy is not a Mac application. That's where I expect Mac applications to put things. One could argue that the XDG rules apply specifically to Linux applications, whereas astropy is cross-platform.

My question would be: what is the benefit of following the XDG suggestion? I'd hate to have all over the documentation, "to see the files downloaded, go to ~/.astropy/User_Data, unless you are running Ubuntu or Debian, then go to …".

Now, all that said, one of the great benefits of abstracting the cache into a proper OO object is that it can handle such details in the background transparently. Whether we want to is another question.

Demitri

_________________________________________
Demitri Muna

Department of Astronomy
The Ohio State University

http://scicoder.org/



Perry Greenfield

Oct 11, 2013, 11:51:35 AM
to astro...@googlegroups.com
I'm with Demitri on this issue.

Perry

Michael Droettboom

Oct 11, 2013, 12:12:53 PM
to astropy-dev
The benefit of XDG is that it creates a separation between configuration and cache in the dot-directories of the user's home directory. Admittedly, it only really helps if all apps follow the convention, but there has been a concerted effort in recent years to do so. I agree, however, that it does make documenting things more difficult.

IPython follows the XDG conventions since at least 0.13 (possibly earlier), and matplotlib does since 1.3.0.  If we pursue the configuration proposal we discussed at the meeting, configuration will by default (by encouraging use in IPython profiles) also end up being XDG-compliant.

IMHO, if we do any half-measures on this, it's far more important to put cache in `.cache` than it is to put configuration in `.config`.  And since .cache is by design not really for human consumption, it's far less of a documentation issue (and also less likely to be repeated in many places).

Mike

Ole Streicher

Oct 11, 2013, 1:02:12 PM
to astro...@googlegroups.com
On 11.10.2013 17:45, Demitri Muna wrote:
> On Oct 11, 2013, at 3:41 AM, Ole Streicher <ast...@liska.ath.cx> wrote:
>> What is the reason to have the same directory structure on Mac and
>> Linux? I guess that a common directory (via NFS or so) between these
>> systems is quite rare, and a user would also expect that software for
>> Linux follows the Linux rules (FHS and XDG), and software for Mac
>> follows the standards there.
>
> I think part of this comes down to what the user expects. I would
> venture that a large percentage of our users move back and forth between
> Mac OS (or even I suppose a few Windows computers) and Linux
> continuously. We work on our laptops and run code on Linux servers
> (which may or may not be Ubuntu or Debian). It would be a surprise to
> find the .astropy directory different between systems, particularly as
> we are used to the Unix interface on the Mac as being essentially the
> same thing. (Yes, I know it's not the same OS, but to our users it
> effectively is.)

I think it is quite common that different systems -- even Unix-like ones
-- differ in their file locations. On some systems, Python code goes into
/usr/lib/python/, on others it goes into /usr/share/python. Manpages may
be in /usr/man or in /usr/share/man, etc. Global cache may be in
/var/lib, /var/<package>/, or /var/cache/. (What are the corresponding
paths for MacOS here, BTW?)

These different file paths are really a mess. The way to overcome this
mess is to agree on some standardization. For system-wide files, there
has long been a well-established standard, the Filesystem
Hierarchy Standard (FHS), which today is no longer questioned.

> For better or worse, users are already well familiar with software
> having hidden directories in our home directories: .ssh, .vim, .git,
> .matplotlib, .dropbox, etc. While you can throw this data point away if
> you'd like, I've been a Linux user (though never on my desktop) for many
> years, and I'd never heard of FHS or XDG until this thread. I can assure
> you virtually no astronomer has (and no one on this list counts). I
> understand that it might be the (a) Linux way of doing things, but it's
> not something that exists on the Mac or Windows.

For user files, such a standard does not have as long a history yet. This
leads to the current situation, where each piece of software has its own
preferences on where to put its data. However, these things are going to
change, and especially for a new piece of software it is wise to follow
the standard and not some old private rules.

> To follow the Mac convention, the proper place for the Astropy cache
> directory components would be:
>
> ~/Library/Application Support/Astropy/
> ~/Library/Preferences/org.astropy.plist
> ~/Library/Caches/org.astropy/
>
> and I wouldn't even think of suggesting that we adopt that, following
> the principle of least surprise. Why? Because Astropy is not a Mac
> application. That's where I expect Mac applications to put things. One
> could argue that the XDG rules apply to specifically Linux applications,
> where astropy is cross-platform.

You have the same problem for almost all larger software packages:
Where, for example, would you expect the VideoLAN Client (VLC) to put
its config? On my (Ubuntu) Linux system, it is ~/.config/vlc/, whether
VLC is cross-platform or not. The same goes for LibreOffice.

*I* would expect that the files are where the standard for the system
puts them. So, I would really use the directories you listed above for
MacOS and the XDG directories for Linux (... and whatever is the
standard on Windows).
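
The per-platform behaviour argued for here could be sketched roughly as follows. This is an illustration only: the function name is hypothetical, the Mac path is the one quoted earlier in the thread, and the fallback for other platforms is an assumption, since no Windows convention has been agreed on.

```python
import os


def native_cache_dir(platform, home, environ):
    """Pick a cache directory following each platform's convention.

    `platform` takes values like those of `sys.platform`.
    """
    if platform == "darwin":
        # The Mac location quoted earlier in this thread.
        return os.path.join(home, "Library", "Caches", "org.astropy")
    if platform.startswith("linux"):
        # XDG: use XDG_CACHE_HOME if set and non-empty, else ~/.cache.
        base = environ.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
        return os.path.join(base, "astropy")
    # Assumed fallback where no convention is settled (e.g. Windows).
    return os.path.join(home, ".astropy", "cache")
```

Passing the platform, home directory, and environment in as arguments keeps the function easy to test; a caller would normally supply `sys.platform`, `os.path.expanduser("~")`, and `os.environ`.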

> My question would be: what is the benefit of following the XDG
> suggestion? I'd hate to have all over the documentation, "to see the
> files downloaded, go to ~/.astropy/User_Data, unless you are running
> Ubuntu or Debian, then go to …".

Right. That's why I would always put them in the system's standard
location. Fedora may, BTW, go the same way: in the IRAF package prepared
for Fedora you will find the data in XDG_DATA_HOME=~/.local/share/iraf/
[1]. I will do the same on Debian/Ubuntu when I get IRAF working :-)

For the benefits: People use more and more software today, and the
software gets more complex. If you just need to learn the XDG paths for
Linux:

* configuration goes into ~/.config/
* data goes into ~/.local/share/
* cache data is in ~/.cache/
(resp. the paths defined by XDG_CONFIG_HOME, XDG_DATA_HOME, XDG_CACHE_HOME).

you can apply this knowledge to any software you want. You just don't
need to learn a specific rule for astropy, since the rule is the same as
for the other packages. At least, as long as the packages follow these
rules, and that's why it is important that they really do. For older
software, it may take some time to convert them (and their users :-) ),
but I think new software should follow these standards from the beginning.

This is basically the same as for the system-wide files, where there is
no doubt that having common rules is a Good Thing, and one would not
argue to have the same paths on MacOS, Linux, and Windows.

Best regards

Ole

[1] https://github.com/joequant/iraf/blob/976dbe4b/unix/hlib/mkiraf.csh


Ole Streicher

Oct 10, 2013, 1:08:34 PM
to astro...@googlegroups.com
Hi Michael,

Michael Droettboom <mdb...@gmail.com> writes:
> On Linux, we try to follow the XDG spec which says these should be
> `~/.config/astropy` and `~/.cache/astropy` respectively.  This makes
> system administration (deciding what and how to backup things) much
> easier.  Unfortunately I was overruled on supporting it completely and
> correctly, because we wanted something that would be the same on Mac
> and Linux, so it really only does this if the user has an explicit
> `XDG_CONFIG_HOME` set.

Could this be brought somehow into discussion again?

As the packager for Debian, I am trying to make the package conform
to the rules of my target system, which in this case is XDG [1] (although
it is not a hard Debian policy requirement to follow the XDG specs).

Therefore, if astropy does not place its files according to XDG, I would
patch astropy for its inclusion in Debian (and Ubuntu). This could,
however, lead to the unexpected behaviour that if someone replaces my
Debian-packaged version with a self-compiled one (e.g. to get to the
bleeding edge), they may find their config is no longer picked up.

I do not see a reason why Mac and Linux should have the same directory
structure: each system has its own philosophy on where the files go. It
is also probably rare today to (NFS-) mount a MacOS home directory onto
a Linux desktop or vice versa and expect that everything works. Instead,
I would expect that a program (by default) behaves natively on each
system.

Best regards

Ole

[1] XDG Base Directory Specification
http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html

Demitri Muna

Oct 12, 2013, 9:38:14 AM
to astro...@googlegroups.com
Hi,

I feel like this conversation has been hijacked some. I'm going to assume that no one objects to an AstropyCache object that provides the sole interface to the cache directory on disk (whatever it is).

To my original question, what are the functions/interfaces needed for this object? We can get back to discussing the implementation details after this is answered.

Demitri

_________________________________________
Demitri Muna

Department of Astronomy
The Ohio State University

http://scicoder.org/



Ole Streicher

Oct 12, 2013, 9:48:43 AM
to astro...@googlegroups.com
Demitri Muna <demitr...@gmail.com> writes:
> I feel like this conversation has been hijacked some.

Sorry. The mail you refer to was delayed, and I assumed that it got
lost; that's why I wrote it again. After a day, this first message
appeared :-(

I changed the subject of my answer already.

Best regards

Ole

Erik Tollerud

Oct 14, 2013, 12:49:44 AM
to astropy-dev
I think Mike was saying that it's not clear this is necessary given
that the code is already (supposed to be) only in
`astropy.config.paths`. I'm fine with it either way, personally - I
can see the advantage to a more OO approach at this point, but I'm not
sure it's worth the effort.

But I'm still not clear if you're suggesting actually letting the user
specify programmatically where the cache is, Demitri. It was
intentional for this to be not really easy, because we don't
necessarily *want* users changing the cache directory often - that's
the point of a cache after all. Or are you saying you want this to
make it easier for astroquery to specify its own cache location?

Also as Mike said, if you do go plumbing into the depths of this code,
it would be great if the re-design includes an option for a cache
timeout, which needs to be done fairly soon for some near-term planned
functionality.



--
Erik

Erik Tollerud

Oct 14, 2013, 1:06:15 AM
to astropy-dev
If my memory is correct, the system as it is now *does* respect
XDG_*_HOME if they are present. That is, that's where it will create
the astropy config and cache directories if they are defined.

I think the original justification for the current system was "easiest
to find." Right now people are more likely to find a file if it is in
~/.astropy, because they'll see it if they do "ls -a" or similar in
their home directory, while they won't if it's instead in
~/.config/astropy. That's not a strong argument, but in the absence
of any direct advantage to us for defaulting to ~/.config/astropy, it
seemed the reasonable default.

I don't have a terribly strong opinion on this, though. People are
probably not that likely to stumble upon this without seeing it in a
doc, so changing it to say ~/.config/astropy instead of ~/.astropy
seems about the same.

While I hate to bring this up, there's one other issue: Windows.
Unlike Mac, it does not have any sensible XDG option as far as I know.
So a single astropy directory is the only thing that makes sense
there.



--
Erik

Michael Droettboom

Oct 14, 2013, 9:51:34 AM
to astropy-dev
On Mon, Oct 14, 2013 at 1:06 AM, Erik Tollerud <erik.t...@gmail.com> wrote:
> If my memory is correct, the system as it is now *does* respect
> XDG_*_HOME if they are present.  That is, that's where it will create
> the astropy config and cache directories if they are defined.

That is correct, but that is not how the XDG spec is defined.  It says that if `XDG_*_HOME` are not present, they default to `~/.config`, `~/.cache` etc.  At the moment astropy defaults to `~/.astropy`.
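
The difference between the two behaviours can be shown in a short sketch. Both functions are hypothetical and written only from the descriptions in this thread; neither is the actual code in `astropy.config.paths`.

```python
import os


def xdg_dir(kind, environ, home):
    """Resolve an astropy directory the way the XDG spec describes.

    `kind` is "config" or "cache". An unset or empty variable falls
    back to the spec's default under the home directory.
    """
    defaults = {"config": ".config", "cache": ".cache"}
    variable = "XDG_%s_HOME" % kind.upper()
    base = environ.get(variable) or os.path.join(home, defaults[kind])
    return os.path.join(base, "astropy")


def current_dir(kind, environ, home):
    """Behaviour as described in this thread: XDG only when explicitly set."""
    variable = "XDG_%s_HOME" % kind.upper()
    if environ.get(variable):
        return os.path.join(environ[variable], "astropy")
    return os.path.join(home, ".astropy", kind)
```

With an empty environment, `xdg_dir("cache", {}, home)` resolves under `~/.cache` while `current_dir("cache", {}, home)` resolves under `~/.astropy`; the two agree only when the variable is set explicitly.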
 

> I think the original justification for the current system was "easiest
> to find." Right now people are more likely to find a file if it is in
> ~/.astropy, because they'll see it if they do "ls -a" or similar in
> their home directory, while they won't if it's instead in
> ~/.config/astropy.  That's not a strong argument, but in the absence
> of any direct advantage to us for defaulting to ~/.config/astropy, it
> seemed the reasonable default.
>
> I don't have a terribly strong opinion on this, though.  People are
> probably not that likely to stumble upon this without seeing it in a
> doc, so changing it to say ~/.config/astropy instead of ~/.astropy
> seems about the same.
>
> While I hate to bring this up, there's one other issue: Windows.
> Unlike Mac, it does not have any sensible XDG option as far as I know.
> So a single astropy directory is the only thing that makes sense
> there.

Each new version of Windows has made subtle changes to the directory defaults.  I'd probably seek the advice of a Windows expert about the best cross-version, long-term location for such things.  In the meantime, the simplest solution is probably fine.

Mike



--

James Turner

Oct 14, 2013, 12:45:13 PM
to astro...@googlegroups.com
> While I hate to bring this up, there's one other issue: Windows.
> Unlike Mac, it does not have any sensible XDG option as far as I know.
> So a single astropy directory is the only thing that makes sense
> there.

Isn't that the registry in Windows (for better or worse)? Not
sure about the cache though...

Erik Tollerud

Oct 22, 2013, 12:24:10 PM
to astropy-dev
Well, the XDG_*_HOME usage in astropy is for a *file* that contains
configuration. So the registry doesn't help there, because it stores
data, not actual files.

I suppose you're right that the registry is supposed to be Windows's
solution to the underlying problem, but that doesn't help if we want
it to look similar on each platform. (And it would be a lot of effort
for a very small fraction of users...)