API conventions: classmethods/functions vs. initializers

Erik Tollerud

unread,

May 14, 2012, 11:38:24 PM5/14/12

to astro...@googlegroups.com

This is an informal poll triggered by some of the discussion
surrounding the time PR (and some items in specutils). One thing to
bear in mind: I'm asking right now primarily about *user*-level APIs.
That is, this is for objects like Time or Spectrum or whatever that
would be used by astronomer users who are not necessarily as
python-savvy as most of us here are.

The question is: do we prefer a family of class methods for
initializing objects from a variety of input formats, or should the
initializers be "smart"? for example:

1. **Classmethod/function Approach**

class Foo(object):
def __init__(self, val):
self.val = val #val is the internal representation

@classmethod
def from_format1(cls, val1):
trueval = convert_to_foo_format1(val1)
return cls(trueval)

@classmethod
def from_format2(cls, val2):
trueval = convert_to_foo_format2(val2)
return cls(trueval)

If classmethods are difficult to comprehend for novices, these can
also be done as simple functions.

2. **Initializer Approach**

class Foo(object):
def __init__(self, val=None, val1=None, val2=None):
if val is not None:
self.val = val
elif val1 is not None:
self.val = convert_to_foo_format(val1)
elif val2 is not None:
self.val = convert_to_foo_format(val2)
else:
raise ValueError('no input')

OR, if the 3 different input formats are different types:

class Foo(object):
def __init__(self, val):
if isinstance(val,float):
self.val = val
elif isinstance(val,basestring):
self.val = convert_to_foo_format1(val)
elif isinstance(val,someothertype):
self.val = convert_to_foo_format2(val)

Generally speaking, I favor approach #2, as I find it leverages
python's capabilities and leads to simpler code and less looking-up in
docs. But I'm primarily thinking of cases where the input value is
unambiguous (e.g. a bunch of different string representations of
coordinates), and I agree it's a bit more awkward if e.g. there are a
bunch of identically typed formats (e.g. for Time we have jd, mjd,
unix epoch, etc., and they are all numbers or arrays of numbers).

What do others think?

--
Erik Tollerud

Brian Kloppenborg

unread,

May 15, 2012, 12:10:17 AM5/15/12

to astro...@googlegroups.com

Hi Erik,

As you mention below, there are cases where automatic conversion may be
ambiguous. Therefore I would vastly prefer option 1 because the user's
intended action is explicitly stated. This also has the added benefit
of making the code more readable:

>>> a = time(x)

is less clear than

>>> a = time.from_jd(x)

Brian

Tom Aldcroft

unread,

May 15, 2012, 9:32:59 AM5/15/12

to astro...@googlegroups.com

I favor a modified version of approach #2, and agree with Erik's
opinion that it allows for simpler code, simpler documentation (both
for users and developers), and better use of Python and better
potential for abstracting the problem so the Time class is more easily
extensible.

Here's my take on the situation.

Fundamentally there are three things needed for instantiating the
"normal" resolution time representations:

- One or more "values", which are the input time(s). This input could
be one of a number of Python types:
- Float, int, datetime object, string, list of any of the previous,
numpy array of any of the previous.

- A format (one might also call this the "representation" but stick
with "format" for now). This is required in cases where the input
values are ambiguous. For instance a float is always ambiguous but a
datetime is never ambiguous nor is a string that matches one of the
supported string formats (e.g. ISO). Examples for format include:
- JD, MJD, unix, ISO, datetime, Chandra_seconds, Chandra_FITS,
'YYYYDDD:HHMMSS.sss' etc.

- A "system", e.g. UTC, TT, TAI, TDB, TGB, TCB
(http://tycho.usno.navy.mil/systime.html)

Handling of the different input types for "values" (e.g. array or not)
should be independent of the format and system. This makes it easier
to support more formats. In the latest Time implementation this
handling occurs twice (from_tai and from_jd) and would likely get
repeated more as the class develops. Of course this can be
refactored, but I think this would drive the code toward Erik's #2
anyway.

It should be very cheap to add support for a new format. Notice I
include Chandra_seconds as a format. This is a mission-specific
format, but as an astronomical utility there is no reason we cannot
support mission or observatory specific formats if it is cheap and
unambiguous.

Given the above, why not have one and exactly one way to make a time object:

t = Time(values, format=None, system=None)
t = Time(values, format='iso', system='tt')

This provides a single uniform interface and a single doc string that
needs to be memorized (users) and maintained (developers). The Time
class will have internal methods that convert from any format and
system into the internal representation. The Time class will raise an
exception if the input is ambiguous, for instance a float value is
provided with no value of format. But for formats that are
unambiguous it won't be required to specify the format.

Each time format will have a default "system" which is well
documented, but this can be overridden by the user.

To access different representations, provide two ways:

t.jd # short form which uses the default system for that format
t.get_jd(system=None) # long getter form that allows for a different system

If you are still reading, providing a single point interface like this
has a huge advantage in developing other parts of astropy that rely on
time. For any new method that needs a time you can *always* do:

def my_method(stuff, ..., time_values, time_format=None,
time_system=None, .. kwargs):
time_values = Time(time_values, time_format, time_system)
...

This means that for any astropy routine that needs a time there is
just one line of code to validate and convert anything the user
provides into a well-mannered Time object. And this provides a
uniform interface across all of astropy for how to do this. If Time
uses a blizzard of class methods then the above code gets a lot
uglier.

My 2c,
Tom

Perry Greenfield

unread,

May 15, 2012, 9:48:35 AM5/15/12

to astro...@googlegroups.com

Offhand, this seems like a very sensible proposal to me. The only
other suggestion is that even the aspect of converting representations
could be made generic (no format in the method name). E.g.
t.get_as('JD').

Perry

On May 15, 2012, at 9:32 AM, Tom Aldcroft wrote:

> On Mon, May 14, 2012 at 11:38 PM, Erik Tollerud <erik.t...@gmail.com
> > wrote:
>>

[...]

Wolfgang Kerzendorf

unread,

May 17, 2012, 4:24:55 PM5/17/12

to astro...@googlegroups.com

Hello guys,

TL/DR I prefer option #1, because it's more explicit and easier to read. In addition, it's easier to extend for users and maintainer don't have to check to see if everything works

1. I agree with brian that this gives a more explicit option and is easier to understand when reading code.

2. A problem with keywords is that some formats can use secondary parameters and thus we need to have many keywords for all eventualities and the docstring becomes very long. One example could be that Equatorial coordinates can have an epoch whereas Galactic coordinates don't. So if a user sets the epoch keyword when using galactic keywords you need to worry about that. And that will be one of many.

3. Many people use ipython for their data analysis. I think that the option two makes great use of tab-completion.
A user types foo.from-<tab> and will know all the different input formats (I guess the same as .get_ which could be available as a function and a property). So there's no reading the documentation as it is already built-in.
It also makes it easier to maintain separate doc-strings and the user only get's the doc-string for the function that he/she wants to use.

4. In the option 2, the __init__ function is left open and thus developers can very fast instantiate many objects with the internal format without letting it run through the if-statements of the converters.

5. It is very easy for users and developers to extend this class as there's no chance that a new from_converter will break any of the other converters as they are all separate. In addition, it's very nice for the maintainers as they can just see if the function works and add it in without having to check any of the other converter functions.

6. The point that was raised by Tom A. is valid, that we need to have functions that are called to make checks (for example arrays/non-arrays) but I think this can be solved by using decorators and functions.

So in the current approach that we're working on for Time, we have a bit of a mix of 1 and 3 in some converter functions. For example, the from_tai function takes arrays/non-arrays and checks if they are basestrings and then assumes iso or datetime and then uses the datetime format. One could add a keyword format to that function that will be None (as Tom A suggested) if the user wishes an automatic check or 'iso' or 'datetime' if the user wants to specify exactly what it is. A this is another example for my point 2: The format keyword is irrelevant when parsing jd.

Thoughts?

Cheers
Wolfgang

Erik Bray

unread,

May 17, 2012, 4:30:18 PM5/17/12

to astro...@googlegroups.com

On Thu, May 17, 2012 at 4:24 PM, Wolfgang Kerzendorf
<wkerz...@gmail.com> wrote:
> Hello guys,
>
> TL/DR I prefer option #1, because it's more explicit and easier to read. In addition, it's easier to extend for users and maintainer don't have to check to see if everything works
>
>
> 1. I agree with brian that this gives a more explicit option and is easier to understand when reading code.
>
> 2. A problem with keywords is that some formats can use secondary parameters and thus we need to have many keywords for all eventualities and the docstring becomes very long. One example could be that Equatorial coordinates can have an epoch whereas Galactic coordinates don't. So if a user sets the epoch keyword when using galactic keywords you need to worry about that. And that will be one of many.
>
> 3. Many people use ipython for their data analysis. I think that the option two makes great use of tab-completion.
> A user types foo.from-<tab> and will know all the different input formats (I guess the same as .get_ which could be available as a function and a property). So there's no reading the documentation as it is already built-in.
> It also makes it easier to maintain separate doc-strings and the user only get's the doc-string for the function that he/she wants to use.
>
> 4. In the option 2, the __init__ function is left open and thus developers can very fast instantiate many objects with the internal format without letting it run through the if-statements of the converters.
>
> 5. It is very easy for users and developers to extend this class as there's no chance that a new from_converter will break any of the other converters as they are all separate. In addition, it's very nice for the maintainers as they can just see if the function works and add it in without having to check any of the other converter functions.
>
> 6. The point that was raised by Tom A. is valid, that we need to have functions that are called to make checks (for example arrays/non-arrays) but I think this can be solved by using decorators and functions.
>
>
> So in the current approach that we're working on for Time, we have a bit of a mix of 1 and 3 in some converter functions. For example, the from_tai function takes arrays/non-arrays and checks if they are basestrings and then assumes iso or datetime and then uses the datetime format. One could add a keyword format to that function that will be None (as Tom A suggested) if the user wishes an automatic check or 'iso' or 'datetime' if the user wants to specify exactly what it is. A this is another example for my point 2: The format keyword is irrelevant when parsing jd.
>
> Thoughts?
>
> Cheers
> Wolfgang

In the process of working on PR #237
(https://github.com/astropy/astropy/pull/237) I came to more or less
agree with all of the above, except for the part about tab-completion
which I don't care about :)

Erik B.

Tom Aldcroft

unread,

May 17, 2012, 7:18:35 PM5/17/12

to astro...@googlegroups.com

Hi Wolfgang,

As difficult as it is to not respond to each of your points, I just
want to ask one question about something I consider a very high
priority and which you didn't address. As a reminder, in earlier
email I wrote about an advantage of instantiating with Time(value,
format, system):

"""
For any new method that needs a time you can *always* do:

def my_method(stuff, ..., time_values, time_format=None,
time_system=None, .. kwargs):
time_values = Time(time_values, time_format, time_system)
...

This means that for any astropy routine that needs a time there is
just one line of code to validate and convert anything the user
provides into a well-mannered Time object. And this provides a
uniform interface across all of astropy for how to do this.
"""

In your implementation how can methods or functions that need a time
get this in a generic and uniform way without requiring users to
supply a Time object.

Thanks,
Tom

On Thu, May 17, 2012 at 4:24 PM, Wolfgang Kerzendorf
<wkerz...@gmail.com> wrote:

Tom Aldcroft

unread,

May 18, 2012, 8:08:05 AM5/18/12

to astro...@googlegroups.com

I violated the rule of not sending email when you're tired or annoyed
for other reasons. I should have just said that it would be very
useful to have a universal initializer in Time something like
from_any(values, format, system).

Cheers,
Tom

Reply all

Reply to author

Forward