Access to fields of abstract types


Toivo Henningsson

Apr 13, 2012, 10:03:34 PM
to julia-dev

If I write

f(x) = x.t
f(1)

I get an error (of course) on the second line:

type error: getfield: expected CompositeKind, got Int64
in f at none:1

If I create

type T
    t
end

I can invoke f(T(1)), which returns 1.
Nothing unexpected.

My question is:
Is it intentional that I don't get an error already for the function
definition f(x) = x.t? It seems like bad form to write this kind of
code; if I don't know the concrete type of x, I shouldn't be allowed
to assume anything about its members. Of course, f(x::T) = x.t should
still work, since the compiler can see that it's the member t of type T
that I'm after. Also, if I really know what I'm doing, I could still
define the original unsafe f(x) = x.t as

f(x) = getfield(x,:t)

One possibility would be:
Don't allow dotted member access x.t unless x has a concrete type
specified. The compiler could even check that the composite type has a
field named t, which would provide some helpful compile time error
checking! (Ok, I'm not really sure what "compile time" means when it
comes to Julia, but hopefully you shouldn't have to exercise f(x) to
get the error, anyway)

This would enforce the principle that only implementation methods for
a specific composite type have access to its fields, giving some
degree of encapsulation. If you don't know about the implementation
type, don't poke around in its fields (surely to be considered an
implementation detail?)

What's your position on this?
Is it feasible?
Are there any drawbacks?

/ Toivo

Jeff Bezanson

Apr 13, 2012, 10:09:51 PM
to juli...@googlegroups.com
We actually considered a rule like this at one point. But it never
seemed worth it to go to lots of effort in the implementation just to
disallow something:

http://en.wikipedia.org/wiki/Duck_typing

It also somewhat alleviates the need for structure inheritance to be
able to define multiple types with the same fields, and call the same
functions on them.
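
As a rough illustration of the duck-typing point, here is a minimal sketch in Python rather than Julia. The classes `A` and `B` are made up for the example; the only thing they share is a field named `t`, with no common supertype declaring it.

```python
class A:
    def __init__(self, t):
        self.t = t


class B:
    def __init__(self, t):
        self.t = t


def f(x):
    # No type is declared for x: this works for any object that has a field `t`.
    return x.t


print(f(A(1)))
print(f(B("x")))

try:
    f(1)  # an int has no field `t`
except AttributeError as err:
    # The mistake surfaces only when f is actually called with a bad argument.
    print("error at call time:", err)
```

Defining `f` never fails; only the call with an unsuitable argument does, which is the behaviour the original question was asking about.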

Stefan Karpinski

Apr 13, 2012, 11:03:13 PM
to juli...@googlegroups.com
Also, for expediency, we have until now deferred all run-time errors to run-time. That doesn't have to be the case though. One can imagine more aggressive checks either by default or optional.

Toivo Henningsson

Apr 14, 2012, 2:58:51 PM
to julia-dev
[About accessing x.t without knowing the concrete type of x:]

On Apr 14, 4:09 am, Jeff Bezanson <jeff.bezan...@gmail.com> wrote:
> We actually considered a rule like this at one point. But it never
> seemed worth it to go to lots of effort in the implementation just to
> disallow something:
>
> http://en.wikipedia.org/wiki/Duck_typing
>
> It also somewhat alleviates the need for structure inheritance to be
> able to define multiple types with the same fields, and call the same
> functions on them.

I see your point. Maybe one should go in the opposite direction and
implement properties in Julia (which I think would be pretty useful).
I think it's only fair that if named fields could be part of the
interface of an object, it should be possible to put in some custom
behavior, or provide a default implementation in an abstract
supertype.

I was able to get something of the kind by overloading getfield.
If I enter this at the Julia prompt:

global getfield
orig_getfield = getfield
getfield(x...) = orig_getfield(x...)

# create type T and add a property t2 == t^2
type T
    t
end
getfield(self::T, sym::Symbol) = (sym == :t2 ? self.t^2 :
                                  orig_getfield(self, sym))

Then T(3).t2 returns 9.

I wasn't able to overload assignment in the same way. (Looking at the
definition of setfield: "setfield(s, f, v) = (s.(f) = v)", I suppose
that's not so surprising :) ) No matter what, read-only properties would
be a lot better than no properties.


Incidentally, if this kind of thing could work without too much
overhead, it could be nice to use it for the newly introduced Options
type, e.g.

getfield(opts::Options, sym::Symbol) = opts[sym]

At least to me, opts.size feels a lot more comfortable than
opts[:size].
I'd prefer to leave quoting to the more magic/meta parts of the code.


I suppose it would give the compiler an easier job to optimize the
resulting code if you could define a specific property using something
like

getfield(self::T, ::Type{:t2}) = (self.t^2)

(singleton types for symbols), and it would also decouple properties
for the same type with different names.


A nice thing with this kind of setup would be that you could define
properties for abstract types, with a default implementation or an
error("abstract property!"), and subtypes could override it.
I suppose an actual field with the same name in a composite type
should override all property definitions, since it would be the most
specific, just like adding an actual property with the same name to
the concrete type.


Of course, if you want to introduce a specific property, you could get
similar results with plain getter and setter methods and avoid all the
nasty stuff (but not with code already written to use field access).
But dot notation would be a more compact way to express properties,
and also more clear I think. Properties would seemingly live in the
namespace of the object, which feels more comfortable to me than
pulling a getter method from a sea of functions (maybe I just haven't
written enough Julia code yet to get used to this :) )


I'm not sure this would be the best way to go about it, though.
I got a lot of segfaults and hangs when playing around with redefining
getfield.
For instance, the above code works if you load it after starting
julia, but running it right after boot with
julia -L props.jl
gives a segfault.

If properties are a good idea, there should definitely be some macros
to take care of the unsafe stuff and boilerplate, like

@rwproperty T.t2 self->self.t^2 (self,t2)->(self.t = sqrt(t2))

or perhaps just

@rwproperty T.t2 self.t^2 self.t = sqrt(t2)

to define the property above (with assignment too), or maybe

@rproperty T.t2 = self.t^2

for a read only property.


What do people think? Would it be a good thing or a bad thing?

Harlan Harris

Apr 14, 2012, 4:04:30 PM
to juli...@googlegroups.com
It would be _great_ to have something like this for DataFrames. I'm imagining accessing columns by df.foo instead of df["foo"]! Python/Pandas/DataFrame allows exactly this syntax, which would make me extremely happy to have in Julia, at least for columns with names that are legal identifiers when wrapped in string().

As for Options, yeah, o.field would be great, although it's less of a big deal, as options manipulation is typically done in "production" code, while DataFrame column access is often done interactively in the REPL.

 -Harlan
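
For reference, the Python-style column access mentioned above can be sketched as follows. `Frame` is a toy class invented for this example, not the pandas or Julia DataFrame API; it only shows how `df.foo` can be routed to the same storage as `df["foo"]`.

```python
class Frame:
    """Toy column store used only for illustration."""

    def __init__(self, **columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        return self._columns[name]

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        # Private names are rejected so a missing `_columns` cannot recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None


df = Frame(foo=[1, 2, 3], bar=["a", "b", "c"])
print(df.foo)       # same column as df["foo"]
print(df["bar"])
```

As noted in the post, this only works for column names that are legal identifiers; anything else still needs the bracket form.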

Jeff Bezanson

Apr 14, 2012, 4:07:42 PM
to juli...@googlegroups.com
Redefining getfield isn't supposed to work. Currently the julia
toplevel is in a module called "Base", which inherits getfield from
"Core". Writing "getfield(x,y)=..." defines a new getfield in Base,
and then future references to getfield by functions in Base use this
function, which can clearly cause lots of problems.

Tim Holy

Apr 14, 2012, 4:46:39 PM
to juli...@googlegroups.com
On Saturday, April 14, 2012 11:58:51 AM Toivo Henningsson wrote:
> Incidentally, if this kind of thing could work without too much
> overhead, it could be nice to use it for the newly introduced Options
> type, e g
>
> getfield(opts::Options, sym::Symbol) = opts[sym]
>
> At least to me, opts.size feels a lot more comfortable than
> opts[:size].
> I'd prefer to leave quoting to the more magic/meta parts of the code.

The macro interface (coming soon, subject to discussion by the community) will
avoid the need for explicit quoting.

--Tim

Toivo Henningsson

Apr 14, 2012, 4:46:56 PM
to julia-dev


On Apr 14, 10:07 pm, Jeff Bezanson <jeff.bezan...@gmail.com> wrote:
> Redefining getfield isn't supposed to work. Currently the julia
> toplevel is in a module called "Base", which inherits getfield from
> "Core". Writing "getfield(x,y)=..." defines a new getfield in Base,
> and then future references to getfield by functions in Base use this
> function, which can clearly cause lots of problems.

Oh, I'm definitely not saying this is the way to implement it.
I'm just saying I'd really love to have the functionality.
If there could be some way to define a get/set method for a given type
and field name that would get called in case an actual field doesn't
exist. Python has this for instance, with __getattr__ and __setattr__
(and single-name properties too), and it comes in really handy once in
a while. I do think that properties can be a really convenient way to
work with some things, and it allows you to add encapsulation even if
you have user code that relies on field access.

I realize it's not a small change (though I've no idea how big of
course :),
and a major decision. But I think a lot of people would be thrilled,
myself included.

I guess it's up to you guys.
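
To make the Python comparison concrete, here is a small sketch of a single-name read/write property in Python. It mirrors the `t2 == t^2` example from earlier in the thread; the class name `T` and the choice of `math.sqrt` for the setter are taken from that example, and nothing here is a proposal for Julia syntax.

```python
import math


class T:
    def __init__(self, t):
        self.t = t

    @property
    def t2(self):
        # Getter: computed from the real field on every access.
        return self.t ** 2

    @t2.setter
    def t2(self, value):
        # Setter: updates the real field so that t2 reads back as `value`.
        self.t = math.sqrt(value)


x = T(3)
print(x.t2)   # 9
x.t2 = 16
print(x.t)    # 4.0
```

Code that reads or writes `x.t2` cannot tell whether it is a stored field or a computed one, which is the encapsulation argument made above.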

Toivo Henningsson

Apr 14, 2012, 4:55:32 PM
to julia-dev


On Apr 14, 5:03 am, Stefan Karpinski <stefan.karpin...@gmail.com>
wrote:
> Also, for expediency, we have until now deferred all run-time errors to run-time. That doesn't have to be the case though. One can imagine more aggressive checks either by default or optional.

Yes! I'm all for more aggressive checks. If they're not the default,
I'd be happy to add a few lines to the top of each source file to
enable them. I suppose it would be reasonable to have a compile time
check for undefined field names be optional, so that you could enable
it in the parts of your code that don't rely on them.

Of course, if you don't add something like the properties I proposed,
there would be no harm in giving a compile time error for f(x::T) =
x.t, when T is a composite type that doesn't declare t.

Tim Holy

Apr 14, 2012, 4:56:44 PM
to juli...@googlegroups.com
On Saturday, April 14, 2012 01:46:56 PM Toivo Henningsson wrote:
> I realize it's not a small change (though I've no idea how big of
> course :),
> and a major decision. But I think a lot of people would be thrilled,
> myself included.

To elaborate further: the "alternative/macro" options framework is
implementing this via

opts = @options a=5 b=7

and

opts = @add_options opts c=2*a+5

You could presumably do something similar to also provide a "get" syntax. With
"macro options", the get is handled implicitly by the statement

@defaults opts a=3 name="julia"

I'm hoping to get this out in fully-working form by tomorrow (no promises,
though).

--Tim

Stefan Karpinski

Apr 14, 2012, 5:06:12 PM
to juli...@googlegroups.com
On the one hand, I'm really, really not keen on allowing x.f to mean anything besides accessing a field in a composite object. On the other hand, I would much prefer to have the way of accessing a column of a DataFrame be df.col rather than df["col"], not only because it's shorter and easier to write and read, but also because it doesn't add another way of indexing into things. Then it would be more reasonable to insist on two indices for DataFrame indexing and slicing: df[row,:], df[:,col], df[row,col], etc.

An alternate approach to overloading get/setfield would be to construct an anonymous struct type when creating data frames so that the columns actually *are* fields, making the df.col access more than just an abuse of syntax. Patrick did this in his strpack module.
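
The "columns actually are fields" idea can be sketched in Python with `collections.namedtuple`, which builds a new record type from a list of field names at run time. This is only an analogy, not how strpack or any DataFrame implementation works; `make_frame` and `add_column` are hypothetical helpers written for this example.

```python
from collections import namedtuple


def make_frame(**columns):
    # Build a fresh record type whose fields are exactly the column names.
    frame_type = namedtuple("Frame", list(columns))
    return frame_type(**columns)


def add_column(frame, name, values):
    # The field set is fixed per type, so "adding" a column means building
    # a new type and a new object that carries over the old columns.
    columns = frame._asdict()
    columns[name] = values
    return make_frame(**columns)


df = make_frame(a=[1, 2], b=[3, 4])
df2 = add_column(df, "c", [5, 6])

print(df.a, df2.c)
print(type(df) is type(df2))  # False: each column set gets its own type
```

The trade-off in this sketch is that every distinct set of columns gets its own type, and the original object is left unchanged when a column is added.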

Toivo Henningsson

Apr 14, 2012, 5:07:24 PM
to julia-dev
> Oh, I'm definitely not saying this is the way to implement it.
> I'm just saying I'd really love to have the functionality.
> If there could be some way to define a get/set method for a given type
> and field name that would get called in case an actual field doesn't
> exist.

I suppose another view of it would be that the namespace of a type
could contain not only actual fields, but also things like properties.
Then you would have to declare all properties of a type within the
type itself, which would actually ensure that they'd belong to that
type, and would let the compiler know which names are valid field
names as soon as the type is declared. (I suppose the general fallback
for missing field names wouldn't follow along from this, but you could
have some special fallback method inside the type). Maybe it's in
conflict with how you handle scoping for constructors...

With this setup it would of course be handy to be able to declare
properties within abstract types as well, which I guess might be an
even greater change than a fallback method for missing fields...
Anyway, a property would just be a method (or a method pair (getter,
setter)), with some special invocation syntax. The same rules for
dispatch should still apply.

Stefan Karpinski

Apr 14, 2012, 5:10:20 PM
to juli...@googlegroups.com
The "harm" is just that it takes a lot of time and effort to implement compile-time checks. "God sells us everything for the price of labor." There are too many other things to do at this point to work on optional things like aggressive compile-time checks. But in principle I'm all for it.

Patrick O'Leary

Apr 14, 2012, 5:51:54 PM
to juli...@googlegroups.com
On Saturday, April 14, 2012 4:06:12 PM UTC-5, Stefan Karpinski wrote:
> An alternate approach to overloading get/setfield would be to construct an anonymous struct type when creating data frames so that the columns actually *are* fields, making the df.col access more than just an abuse of syntax. Patrick did this in his strpack module.

You could argue that macro-constructing "anonymous" types is an abuse of gensym(). And it gets worse if you suddenly decide you want to tack a new column onto your DataFrame (you should be allowed to do that, right?) which will leave you with a lot of bookkeeping--might be worth it for the syntax, but doesn't sound easy to get right. Maybe it's easier than it sounds, though!

Stefan Karpinski

Apr 14, 2012, 5:57:42 PM
to juli...@googlegroups.com
Nope. I would agree that it's an abuse. And I don't actually think there is any way to tack a column on at that point. You'd be creating an entirely new DataFrame object. However, that may be acceptable since it's a little more functional that way.

Patrick O'Leary

Apr 14, 2012, 6:45:43 PM
to juli...@googlegroups.com
That's the bookkeeping I was referring to--if you could freely add and remove fields from a composite type, I don't think it would be that complicated at all.

Jeff Bezanson

Apr 14, 2012, 7:05:00 PM
to juli...@googlegroups.com
We definitely can't support mutating types. I'd rather make getfield()
an ordinary generic function than most of the options mentioned here.


Toivo Henningsson

Apr 15, 2012, 12:10:09 AM
to julia-dev


On Apr 14, 11:10 pm, Stefan Karpinski <ste...@karpinski.org> wrote:
> The "harm" is just that it takes a lot of time and effort to implement
> compile-time checks. "God sells us everything for the price of labor."
> There are too many other things to do at this point to work on optional
> things like aggressive compile-time checks. But in principle I'm all for it.

That's all I wanted to hear :) I'm all into this for the long run.
I'm not trying to stress you with all my feature requests; it just feels
best to bring up the discussion while the language is still evolving
as much as it is now.

I really like dynamic languages, but maybe the thing I miss most from
static ones is being able to get a load of compile time errors at the
same time :)
Static languages seem to catch a lot of errors at compile time, which
won't show up until runtime in dynamic ones, __if__ you exercise the
proper code. Julia seems better equipped than most dynamic languages
to eventually combine the best of both worlds in this respect,
which I think would be awesome :)

Now, if getting a long list of errors at the same time would mean
actually making them into warnings, I think that would be ok.

Toivo Henningsson

Apr 15, 2012, 12:14:32 AM
to julia-dev

> An alternate approach to overloading get/setfield would be to construct an
> anonymous struct type when creating data frames so that the columns
> actually *are* fields, making the df.col access more than just an abuse of
> syntax. Patrick did this in his strpack module.

Ok, that's an interesting trick that could work in some cases.
But isn't it a lot of overhead to create a new anonymous type for each
object?

Harlan Harris

Apr 15, 2012, 9:38:00 AM
to juli...@googlegroups.com
To me, it's not worth doing unless there's a way to make the syntax be a single character, à la list$elemname in R, or obj.prop in Python. Maybe there could be, in Julia 2.0 or whatever, some syntactic sugar that replaces x["a"] or ref(x,"a") with x?a, where ? is an ASCII character TBD, and a is a symbol that gets stringified.

 -Harlan

Stefan Karpinski

Apr 15, 2012, 1:11:41 PM
to juli...@googlegroups.com
The trouble is that we're completely out of syntax. I've tried thinking of syntax for this and I just can't come up with anything that doesn't already mean something. That's why field access syntax is appealing to me: it's relatively under-used in Julia ("this is not a dot-oriented language"), but feels apropos for pulling named columns of data out of data frames. The downside is that that would overload what is currently a completely unambiguous syntax, but it's going to get overloaded by namespaces real soon anyway, so maybe we should just embrace that. It would be nice if one mechanism could be used for all foo.bar sorts of things. Maybe that just needs to be another piece of syntax.

Jeff Bezanson

Apr 15, 2012, 3:51:17 PM
to juli...@googlegroups.com
It's crazy, but we could maybe use x$$y.
If a.b is overloaded we need some way to access the actual properties.
I guess it would have to be something like _getfield(a,:b).

Stefan Karpinski

Apr 15, 2012, 5:49:01 PM
to juli...@googlegroups.com
Only one underscore? That's way too unsafe. It needs to be __getfield__. No one would be crazy enough to use *four* underscores.

Vitalie Spinu

Apr 15, 2012, 6:34:21 PM
to juli...@googlegroups.com
>>>> Jeff Bezanson <jeff.b...@gmail.com>

>>>> on Sun, 15 Apr 2012 15:51:17 -0400 wrote:

> It's crazy, but we could maybe use x$$y.

Or x..y, or x~y, or x@y. The last one is only used as a prefix for
macro calls, right?

Also +1 for overloadable ".". It will prove useful for defining other
programming paradigms such as prototype OO style for example.

Toivo Henningsson

Apr 16, 2012, 7:28:54 AM
to julia-dev
On Apr 15, 9:51 pm, Jeff Bezanson <jeff.bezan...@gmail.com> wrote:
> It's crazy, but we could maybe use x$$y.
> If a.b is overloaded we need some way to access the actual properties.
> I guess it would have to be something like _getfield(a,:b).

Or if you don't want to give away all that power to the user, the
getfield that you're allowed to overload could be a fallback that gets
called only when no actual field exists with the given name. That way
a field declaration would serve as the most specific override to any
property definition. It would certainly reduce the need for _getfield,
since all actual fields would be accessible by name. I suppose it
might help the compiler too, in the common case that a named field
exists.

On the other hand, sometimes it's nice to get control of the whole
namespace obj.????, without having to try to come up with crazy enough
names for the fields that no one would ever try to use them.
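
Python happens to offer both of the designs contrasted here, so a short sketch may help compare them. `__getattr__` is the fallback that runs only when no real attribute exists, while `__getattribute__` intercepts every dotted access. The classes `Fallback` and `FullControl` and the helper `raw_field` are invented for this example; `raw_field` plays the role of the `_getfield` escape hatch discussed above.

```python
class Fallback:
    def __init__(self):
        self.size = 1  # a real field

    def __getattr__(self, name):
        # Reached only if no real attribute called `name` exists.
        return "computed:" + name


class FullControl:
    def __init__(self):
        self.size = 1  # a real field, hidden from dotted access below

    def __getattribute__(self, name):
        # Intercepts every dotted access, whether or not a real field exists.
        return "computed:" + name


def raw_field(obj, name):
    # Bypasses any override and reads the real attribute directly.
    return object.__getattribute__(obj, name)


a, b = Fallback(), FullControl()
print(a.size, a.other)               # real field wins, fallback for the rest
print(b.size, raw_field(b, "size"))  # override wins, raw access still works
```

With the fallback design the real field stays reachable by name, so the escape hatch is rarely needed; with full control it becomes the only way to reach the stored value.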

Jeff Bezanson

Apr 16, 2012, 8:38:28 PM
to juli...@googlegroups.com
Another possible available syntax is x@y, since currently @ is only prefix.

Tom Short

Apr 17, 2012, 7:31:34 AM
to julia-dev
On Apr 16, 8:38 pm, Jeff Bezanson <jeff.bezan...@gmail.com> wrote:
> Another possible available syntax is x@y, since currently @ is only prefix.

I'd prefer overloading "." to using "@" for custom field access. The
dot is easier to type, has less visual clutter (fewer pixels), and is
more standard in other languages.

"@" could be used for "low-level" object access. In R, "@" is used for
low-level access to S4 objects. I'm fine with using a function for
low-level access, though.

Vitalie Spinu

Apr 17, 2012, 8:18:10 AM
to juli...@googlegroups.com

One might need two (or even more) accessors with different meaning. In
general it might be a good idea to preserve "." to access the "real"
structure of an object and to have something else for the "user
interface". The more generic alternative accessors such as "@" and "..",
the better. The second is as readable as "." IMO.

@ in R is the same as "." in Julia, used to access fields which are
called slots in R.


Tom Short

Apr 17, 2012, 9:54:37 AM
to julia-dev
On Apr 17, 8:18 am, Vitalie Spinu <spinu...@gmail.com> wrote:
> One might need two (or even more) accessors with different meaning. In
> general it might be a good idea to preserve "." to access the "real"
> structure of an object and to have something else for the "user
> interface".  More generic alternative accessors such as "@" and "..",
> the better. The second is as readable as "." IMO.

For the regular user, "@" will work well. I think overloading "." is
better because it makes Julia more approachable for beginning students
or people coming from Javascript, Excel/VBA, or Matlab.