Postpone creation of attributes until needed

Frank Millman

Jun 11, 2007, 5:24:51 AM

Hi all

I have a small problem. I have come up with a solution, but I don't
know if it is a) safe, and b) optimal.

I have a class with a number of attributes, but for various reasons I
cannot assign values to all the attributes at __init__ time, as the
values depend on attributes of other linked classes which may not have
been created yet. I can be sure that by the time any values are
requested, all the other classes have been created, so it is then
possible to compute the missing values.

At first I initialised the values to None, and then when I needed a
value I would check if it was None, and if so, call a method which
would compute all the missing values. However, there are a number of
attributes, so it got tedious. I was looking for one trigger point
that would work in any situation. This is what I came up with.

>>> class A(object):
...     __slots__ = ('x','y','z')
...     def __init__(self,x,y):
...         self.x = x
...         self.y = y
...     def __getattr__(self,name):
...         print 'getattr',name
...         if name not in self.__class__.__slots__:
...             raise AttributeError,name
...         self.z = self.x * self.y
...         return getattr(self,name)

>>> a = A(3,4)
>>> a.x
3
>>> a.y
4
>>> a.z
getattr z
12
>>> a.z
12
>>> a.q
getattr q
AttributeError: q

In other words, I do not declare the unknown attributes at all. This
causes __getattr__ to be called when any of their values are
requested, and __getattr__ calls the method that sets up the
attributes and computes the values.

I use __slots__ to catch any invalid attributes, otherwise I would get
a 'maximum recursion depth exceeded' error.

Is this ok, or is there a better way?

Thanks

Frank Millman

Phil Thompson

Jun 11, 2007, 5:47:21 AM

to pytho...@python.org, Frank Millman

Properties...

@property
def z(self):
    return self.x * self.y

Phil
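
For reference, a minimal sketch of the property suggestion in context, written in current Python 3 syntax. The class name `A` and the attributes `x`, `y` and `z` come from the original post; the rest is illustrative only.

```python
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def z(self):
        # Recomputed on every access; nothing is stored on the instance.
        return self.x * self.y


a = A(3, 4)
print(a.z)  # 12
a.x = 5
print(a.z)  # 20, because z always reflects the current x and y
```

A plain property needs no bookkeeping, at the cost of running the function on each access.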

Steven D'Aprano

Jun 11, 2007, 6:21:51 AM

On Mon, 11 Jun 2007 02:24:51 -0700, Frank Millman wrote:

> Hi all
>
> I have a small problem. I have come up with a solution, but I don't
> know if it is a) safe, and b) optimal.
>
> I have a class with a number of attributes, but for various reasons I
> cannot assign values to all the attributes at __init__ time, as the
> values depend on attributes of other linked classes which may not have
> been created yet. I can be sure that by the time any values are
> requested, all the other classes have been created, so it is then
> possible to compute the missing values.

Unless you're doing something like creating classes in one thread while
another thread initiates your instance, I don't understand how this is
possible.

Unless... you're doing something like this?

class MyClass(object):
    def __init__(self):
        self.x = Parrot.plumage # copy attributes of classes
        self.y = Shrubbery.leaves

Maybe you should force the creation of the classes?

class MyClass(object):
    def __init__(self):
        try:
            Parrot
        except Some_Error_Or_Other: # NameError?
            # do something to create the Parrot class
            pass
        self.x = Parrot.plumage
        # etc.

> At first I initialised the values to None, and then when I needed a
> value I would check if it was None, and if so, call a method which
> would compute all the missing values. However, there are a number of
> attributes, so it got tedious. I was looking for one trigger point
> that would work in any situation. This is what I came up with.
>
> >>> class A(object):
> ...     __slots__ = ('x','y','z')

By using slots, you're telling Python not to reserve space for a __dict__,
which means that your class cannot create attributes on the fly.

> ...     def __init__(self,x,y):
> ...         self.x = x
> ...         self.y = y
> ...     def __getattr__(self,name):
> ...         print 'getattr',name
> ...         if name not in self.__class__.__slots__:
> ...             raise AttributeError,name
> ...         self.z = self.x * self.y
> ...         return getattr(self,name)

[snip]

> In other words, I do not declare the unknown attributes at all. This
> causes __getattr__ to be called when any of their values are
> requested, and __getattr__ calls the method that sets up the
> attributes and computes the values.
>
> I use __slots__ to catch any invalid attributes, otherwise I would get
> a 'maximum recursion depth exceeded' error.

That's the wrong solution to that problem. To avoid that problem,
__getattr__ should write directly to self.__dict__.
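
A minimal sketch of where the recursion comes from, in Python 3 syntax (the class names `Naive` and `Direct` are made up for illustration). Without `__slots__`, the assignment itself is harmless; it is the final `getattr(self, name)` that re-enters `__getattr__` whenever `name` is not one of the computed attributes. Going through `self.__dict__` avoids that re-entry.

```python
class Naive:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getattr__(self, name):
        self.z = self.x * self.y
        # For any name other than 'z' this calls __getattr__ again, forever.
        return getattr(self, name)


class Direct:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getattr__(self, name):
        self.__dict__['z'] = self.x * self.y
        # A plain dict lookup never re-enters __getattr__.
        return self.__dict__[name]


try:
    Naive(3, 4).q
except RecursionError:
    print('Naive: maximum recursion depth exceeded')

try:
    Direct(3, 4).q
except KeyError as exc:
    print('Direct: KeyError', exc)

print(Direct(3, 4).z)  # 12
```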

> Is this ok, or is there a better way?

At the interactive Python prompt:

help(property)

--
Steven

Frank Millman

Jun 11, 2007, 6:10:43 AM

On Jun 11, 11:47 am, Phil Thompson <p...@riverbankcomputing.co.uk>
wrote:

> On Monday 11 June 2007 10:24 am, Frank Millman wrote:
>
> > Hi all
>
> > I have a small problem. I have come up with a solution, but I don't
> > know if it is a) safe, and b) optimal.
>
> > I have a class with a number of attributes, but for various reasons I
> > cannot assign values to all the attributes at __init__ time, as the
> > values depend on attributes of other linked classes which may not have
> > been created yet. I can be sure that by the time any values are
> > requested, all the other classes have been created, so it is then
> > possible to compute the missing values.
>
>

> Properties...
>
> @property
> def z(self):
>     return self.x * self.y
>

In my simple example I showed only one missing attribute - 'z'. In
real life I have a number of them, so I would have to set up a
separate property definition for each of them.

With my approach, __getattr__ is called if *any* of the missing
attributes are referenced, which seems easier and requires less
maintenance if I add additional attributes.

Another point - the property definition is called every time the
attribute is referenced, whereas __getattr__ is only called if the
attribute does not exist in the class __dict__, and this only happens
once. Therefore I think my approach should be slightly quicker.

Frank

Frank Millman

Jun 11, 2007, 6:58:16 AM

On Jun 11, 12:21 pm, Steven D'Aprano

<s...@REMOVE.THIS.cybersource.com.au> wrote:
> On Mon, 11 Jun 2007 02:24:51 -0700, Frank Millman wrote:
> > Hi all
>
> > I have a small problem. I have come up with a solution, but I don't
> > know if it is a) safe, and b) optimal.
>
> > I have a class with a number of attributes, but for various reasons I
> > cannot assign values to all the attributes at __init__ time, as the
> > values depend on attributes of other linked classes which may not have
> > been created yet. I can be sure that by the time any values are
> > requested, all the other classes have been created, so it is then
> > possible to compute the missing values.
>
> Unless you're doing something like creating classes in one thread while
> another thread initiates your instance, I don't understand how this is
> possible.
>

I was hoping not to have to explain this, as it gets a bit complicated
(yes, I have read The Zen of Python ;-), but I will try.

I have a class that represents a database table, and another class
that represents a database column. When the application 'opens' a
table, I create an instance for the table and separate instances for
each column.

If there are foreign keys, I used to automatically open the foreign
table with its columns, and build cross-references between the foreign
key column on the first table and the primary key column on the second
table.

I found that as the database grew, I was building an increasing number
of links, most of which would never be used during that run of the
program, so I stopped doing it that way. Now I only open the foreign
table if the application requests it, but then I have to find the
original table and update it with attributes representing the link to
the new table.

It gets more complicated than that, but that is the gist of it.

> > >>> class A(object):
> > ...     __slots__ = ('x','y','z')
>
> By using slots, you're telling Python not to reserve space for a __dict__,
> which means that your class cannot create attributes on the fly.
>

I understand that. In fact I was already using slots, as I was
concerned about the number of 'column' instances that could be created
in any one program, and wanted to minimise the footprint. I have since
read some of the caveats regarding slots, but I am not doing anything out
of the ordinary so I feel comfortable with them so far.

> > I use __slots__ to catch any invalid attributes, otherwise I would get
> > a 'maximum recursion depth exceeded' error.
>
> That's the wrong solution to that problem. To avoid that problem,
> __getattr__ should write directly to self.__dict__.
>

Are you saying that instead of

self.z = self.x * self.y
return getattr(self, name)

I should have

self.__dict__['z'] = self.x * self.y
return self.__dict__[name]

I tried that, but I get AttributeError: 'A' object has no attribute
'__dict__'.

Also, how does this solve the problem that 'name' may not be one of
the attributes that my 'compute' method sets up. Or are you saying
that, if I fixed the previous problem, it would just raise
AttributeError anyway, which is what I would want to happen.

> > Is this ok, or is there a better way?
>
> At the interactive Python prompt:
>
> help(property)
>

See my reply to Phil - I would use property if there was only one
attribute, but there are several.

Thanks

Frank

Marc 'BlackJack' Rintsch

Jun 11, 2007, 7:35:08 AM

In <1181559496.9...@p77g2000hsh.googlegroups.com>, Frank Millman
wrote:

> On Jun 11, 12:21 pm, Steven D'Aprano
> <s...@REMOVE.THIS.cybersource.com.au> wrote:
>> > I use __slots__ to catch any invalid attributes, otherwise I would get
>> > a 'maximum recursion depth exceeded' error.
>>
>> That's the wrong solution to that problem. To avoid that problem,
>> __getattr__ should write directly to self.__dict__.
>>
>
> Are you saying that instead of
>
> self.z = self.x * self.y
> return getattr(self, name)
>
> I should have
>
> self.__dict__['z'] = self.x * self.y
> return self.__dict__[name]
>
> I tried that, but I get AttributeError: 'A' object has no attribute
> '__dict__'.

That's because you used `__slots__`. One of the drawbacks of `__slots__`.

Ciao,
Marc 'BlackJack' Rintsch

Steven D'Aprano

Jun 11, 2007, 7:56:58 AM

On Mon, 11 Jun 2007 03:58:16 -0700, Frank Millman wrote:

>> By using slots, you're telling Python not to reserve space for a __dict__,
>> which means that your class cannot create attributes on the fly.
>>
>
> I understand that. In fact I was already using slots, as I was
> concerned about the number of 'column' instances that could be created
> in any one program, and wanted to minimise the footprint.

Unless you have thousands and thousands of instances, __slots__ is almost
certainly not the answer. __slots__ is an optimization to minimize the
size of each instance. The fact that it prevents the creation of new
attributes is a side-effect.

> I have since
> read some of the caveats regarding slots, but I am not doing anything out
> of the ordinary so I feel comfortable with them so far.
>
>> > I use __slots__ to catch any invalid attributes, otherwise I would get
>> > a 'maximum recursion depth exceeded' error.
>>
>> That's the wrong solution to that problem. To avoid that problem,
>> __getattr__ should write directly to self.__dict__.
>>
>
> Are you saying that instead of
>
> self.z = self.x * self.y
> return getattr(self, name)
>
> I should have
>
> self.__dict__['z'] = self.x * self.y
> return self.__dict__[name]
>
> I tried that, but I get AttributeError: 'A' object has no attribute
> '__dict__'.

Of course you do, because you are using __slots__ and so there is no
__dict__ attribute.

I really think you need to lose the __slots__. I don't see that it really
gives you any advantage.
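
A short sketch of the behaviour being discussed, in Python 3 syntax (the class name `Slotted` is made up here). A class that defines `__slots__`, and whose bases do not provide a `__dict__`, has instances with no `__dict__` at all:

```python
class Slotted:
    __slots__ = ('x', 'y', 'z')

    def __init__(self, x, y):
        self.x = x
        self.y = y


s = Slotted(3, 4)
print(hasattr(s, '__dict__'))   # False
s.z = s.x * s.y                 # fine: 'z' is a declared slot
try:
    s.q = 1                     # 'q' is not a slot, and there is no __dict__
except AttributeError as exc:
    print('AttributeError:', exc)
```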

> Also, how does this solve the problem that 'name' may not be one of
> the attributes that my 'compute' method sets up. Or are you saying
> that, if I fixed the previous problem, it would just raise
> AttributeError anyway, which is what I would want to happen.

You haven't told us what the 'compute' method is.

Or if you have, I missed it.

>> > Is this ok, or is there a better way?
>>
>> At the interactive Python prompt:
>>
>> help(property)
>>
>
> See my reply to Phil - I would use property if there was only one
> attribute, but there are several.

Writing "several" properties isn't that big a chore, especially if they
have any common code that can be factored out.

Another approach might be to create a factory-function that creates the
properties for you, so you just need to call it like this:

class MyClass(object):
    x = property_maker(database1, tableX, 'x', other_args)
    y = property_maker(database2, tableY, 'y', other_args)
    # blah blah blah

def property_maker(database, table, name, args):
    def getx(self):
        return getattr(database[table], name) # or whatever...
    def setx(self, value):
        setattr(database[table], name, value)
    return property(getx, setx, None, "Some doc string")

--
Steven.
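
A self-contained variant of the same factory idea, in Python 3 syntax, assuming a plain dictionary stands in for the database (the names `make_property`, `STORE` and `Column` are hypothetical). Note that the factory has to be defined before the class body that calls it.

```python
# Stand-in for the real data source; purely illustrative.
STORE = {'x': 1, 'y': 2}


def make_property(key, doc=None):
    """Build a read/write property backed by STORE[key]."""
    def getter(self):
        return STORE[key]

    def setter(self, value):
        STORE[key] = value

    return property(getter, setter, None, doc)


class Column:
    x = make_property('x', 'Value stored under x')
    y = make_property('y', 'Value stored under y')


c = Column()
print(c.x, c.y)    # 1 2
c.x = 10
print(STORE['x'])  # 10
```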

Peter Otten

Jun 11, 2007, 7:44:10 AM

Frank Millman wrote:

> I tried that, but I get AttributeError: 'A' object has no attribute
> '__dict__'.

That's what you get for (ab)using __slots__ without understanding the
implications ;)

You can instead invoke the __getattr__() method of the superclass:

super(A, self).__getattr__(name)

Peter

Giles Brown

Jun 11, 2007, 7:46:48 AM

You could treat the property access like a __getattr__ and use it
to trigger the assignment of instance variables. This would mean that
all future access would pick up the instance variables. Following a
kind of "class variable access causes instance variable creation"
pattern (anyone know a better name for that?).

You may want to construct a little mechanism that sets up these
properties (a loop, a list of attribute names, and a setattr on the
class?).

If you've got to allow access from multiple threads and aren't happy
that the calculations being idempotent is going to be sufficient (e.g.
if the calculations are really expensive) then you need some kind of
threading lock in your (one and only?) lazy loading function.
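
One way to read this suggestion, as a rough sketch in Python 3 syntax. The names `LAZY_NAMES`, `_compute`, `_lazy` and `_lock` are invented for the example, and the lock is only needed if several threads may trigger the computation.

```python
import threading


class A:
    LAZY_NAMES = ('z', 'w')          # attributes filled in on first access

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._lock = threading.Lock()

    def _compute(self):
        # The one and only lazy loading function; sets every lazy value.
        with self._lock:
            if '_z' not in self.__dict__:   # another thread may have won
                self._z = self.x * self.y
                self._w = self.x + self.y


def _lazy(name):
    private = '_' + name

    def getter(self):
        if private not in self.__dict__:
            self._compute()
        return self.__dict__[private]

    return property(getter)


# The "little mechanism": a loop, a list of names, and setattr on the class.
for _name in A.LAZY_NAMES:
    setattr(A, _name, _lazy(_name))

a = A(3, 4)
print(a.z, a.w)  # 12 7
```

Because a property is consulted before the instance dictionary, the values are kept under private names (`_z`, `_w`) in this sketch.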

Ok. Enough lunchtime diversion (I should get some fresh air).

Giles

Frank Millman

Jun 11, 2007, 8:27:35 AM

On Jun 11, 1:56 pm, Steven D'Aprano

<s...@REMOVE.THIS.cybersource.com.au> wrote:
>
> Unless you have thousands and thousands of instances, __slots__ is almost
> certainly not the answer. __slots__ is an optimization to minimize the
> size of each instance. The fact that it prevents the creation of new
> attributes is a side-effect.
>

Understood - I am getting there slowly.

I now have the following -

>>> class A(object):
...     def __init__(self,x,y):
...         self.x = x
...         self.y = y
...     def __getattr__(self,name):
...         print 'getattr',name
...         self.compute()
...         return self.__dict__[name]
...     def compute(self): # compute all missing attributes
...         self.__dict__['z'] = self.x * self.y
...         # [there could be many of these]

>>> a = A(3,4)
>>> a.x
3
>>> a.y
4
>>> a.z
getattr z
12
>>> a.z
12
>>> a.q

KeyError: 'q'

The only problem with this is that it raises KeyError instead of the
expected AttributeError.

>
> You haven't told us what the 'compute' method is.
>
> Or if you have, I missed it.
>

Sorry - I made it more explicit above. It is the method that sets up
all the missing attributes. No matter which attribute is referenced
first, 'compute' sets up all of them, so they are all available for
any future reference.

To be honest, it feels neater than setting up a property for each
attribute.

I would prefer it if there was a way of raising AttributeError instead
of KeyError. I suppose I could do it manually -

    try:
        return self.__dict__[name]
    except KeyError:
        raise AttributeError,name
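
Put together, and written in Python 3 syntax (where `raise` takes an exception instance), the approach might look like this sketch:

```python
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has already failed.
        self.compute()
        try:
            return self.__dict__[name]
        except KeyError:
            raise AttributeError(name) from None

    def compute(self):
        # Compute all missing attributes in one go.
        self.__dict__['z'] = self.x * self.y


a = A(3, 4)
print(a.z)               # 12
print(hasattr(a, 'q'))   # False, since hasattr relies on AttributeError
```

Raising `AttributeError` matters beyond tidiness: `hasattr` and the three-argument form of `getattr` only treat that exception as "missing".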

Frank

George Sakkis

Jun 11, 2007, 9:38:08 AM

I don't see why this all-or-nothing approach is neater; what if you
have a hundred expensive computed attributes but you just need one ?
Unless you know this never happens in your specific situation because
all missing attributes are tightly coupled, properties are a better
way to go. The boilerplate code can be minimal too with an appropriate
decorator, something like:

class A(object):

    def __init__(self,x,y):
        self.x = x
        self.y = y

    @cachedproperty
    def z(self):
        return self.x * self.y

where cachedproperty is

def cachedproperty(func):
    name = '__' + func.__name__
    def wrapper(self):
        try: return getattr(self, name)
        except AttributeError: # raised only the first time
            value = func(self)
            setattr(self, name, value)
            return value
    return property(wrapper)

HTH,

George
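
In current Python (3.8 and later) the standard library offers `functools.cached_property`, which covers much the same ground. A brief sketch, with a hypothetical `calls` counter added to show that the function body runs only once per instance:

```python
from functools import cached_property


class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.calls = 0

    @cached_property
    def z(self):
        self.calls += 1
        return self.x * self.y


a = A(3, 4)
print(a.z, a.z)   # 12 12
print(a.calls)    # 1
```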

Frank Millman

Jun 11, 2007, 10:37:06 AM

On Jun 11, 3:38 pm, George Sakkis <george.sak...@gmail.com> wrote:
> On Jun 11, 8:27 am, Frank Millman <f...@chagford.com> wrote:
>
>
> > Sorry - I made it more explicit above. It is the method that sets up
> > all the missing attributes. No matter which attribute is referenced
> > first, 'compute' sets up all of them, so they are all available for
> > any future reference.
>
> > To be honest, it feels neater than setting up a property for each
> > attribute.
>
> I don't see why this all-or-nothing approach is neater; what if you
> have a hundred expensive computed attributes but you just need one ?
> Unless you know this never happens in your specific situation because
> all missing attributes are tightly coupled, properties are a better
> way to go.

It so happens that this is my specific situation. I can have a foreign
key column in one table with a reference to a primary key column in
another table. I have for some time now had the ability to set up a
pseudo-column in the first table with a reference to an alternate key
column in the second table, and this requires various attributes to be
set up. I have recently extended this concept where the first table
can have a pseudo-column pointing to a column in the second table,
which is in turn a pseudo-column pointing to a column in a third
table. This can chain indefinitely provided that the end of the chain
is a real column in the final table.

My problem is that, when I create the first pseudo-column, the target
column, also pseudo, does not exist yet. I cannot call it recursively
due to various other complications. Therefore my solution was to wait
until I need it. Then the first one makes a reference to the second
one, which in turn realises that it needs a reference to the third
one, and so on. So it is recursive, but at execution-time, not at
instantiation-time.

Hope this makes sense.
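
The shape of that execution-time recursion can be sketched as follows, in Python 3 syntax. Everything here is hypothetical (the `Column` and `PseudoColumn` classes, the `registry` dictionary, the `real` attribute, the column names); the real application is certainly more involved. The point is only that each pseudo-column looks up its target the first time it is asked, by which time the target exists.

```python
registry = {}   # maps 'table.column' names to column objects


class Column:
    def __init__(self, name):
        self.name = name
        registry[name] = self

    @property
    def real(self):
        return self          # a real column is the end of the chain


class PseudoColumn:
    def __init__(self, name, target_name):
        self.name = name
        self.target_name = target_name   # the target may not exist yet
        registry[name] = self

    def __getattr__(self, attr):
        if attr != 'real':
            raise AttributeError(attr)
        # First access: follow the chain and cache the result.
        self.real = registry[self.target_name].real
        return self.real


p1 = PseudoColumn('t1.p1', 't2.p2')   # t2.p2 is not defined yet
p2 = PseudoColumn('t2.p2', 't3.c21')
c21 = Column('t3.c21')
print(p1.real.name)   # t3.c21
```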

>The boilerplate code can be minimal too with an appropriate
> decorator, something like:
>
> class A(object):
>
>     def __init__(self,x,y):
>         self.x = x
>         self.y = y
>
>     @cachedproperty
>     def z(self):
>         return self.x * self.y
>
> where cachedproperty is
>
> def cachedproperty(func):
>     name = '__' + func.__name__
>     def wrapper(self):
>         try: return getattr(self, name)
>         except AttributeError: # raised only the first time
>             value = func(self)
>             setattr(self, name, value)
>             return value
>     return property(wrapper)
>

This is very neat, George. I will have to read it a few more times
before I understand it properly - I still have not fully grasped
decorators, as I have not yet had a need for them.

Actually I did spend a bit of time trying to understand it before
posting, and I have a question.

It seems that this is now a 'read-only' attribute, whose value is
computed by the function the first time, and after that cannot be
changed. It would probably suffice for my needs, but how easy would it
be to convert it to read/write?

Thanks

Frank

George Sakkis

Jun 11, 2007, 11:22:58 AM

On Jun 11, 10:37 am, Frank Millman <f...@chagford.com> wrote:
> On Jun 11, 3:38 pm, George Sakkis <george.sak...@gmail.com> wrote:
> >The boilerplate code can be minimal too with an appropriate
> > decorator, something like:
>
> > class A(object):
>
> >     def __init__(self,x,y):
> >         self.x = x
> >         self.y = y
>
> >     @cachedproperty
> >     def z(self):
> >         return self.x * self.y
>
> > where cachedproperty is
>
> > def cachedproperty(func):
> >     name = '__' + func.__name__
> >     def wrapper(self):
> >         try: return getattr(self, name)
> >         except AttributeError: # raised only the first time
> >             value = func(self)
> >             setattr(self, name, value)
> >             return value
> >     return property(wrapper)
>
> This is very neat, George. I will have to read it a few more times
> before I understand it properly - I still have not fully grasped
> decorators, as I have not yet had a need for them.

You never *need* decorators, in the sense it's just syntax sugar for
things you might do without them, but they're handy once you get your
head around them.

> Actually I did spend a bit of time trying to understand it before
> posting, and I have a question.
>
> It seems that this is now a 'read-only' attribute, whose value is
> computed by the function the first time, and after that cannot be
> changed. It would probably suffice for my needs, but how easy would it
> be to convert it to read/write?

It's straightforward, just define a setter wrapper and pass it in the
property along with the getter:

def cachedproperty(func):
    name = '__' + func.__name__

    def getter(self):
        try: return getattr(self, name)
        except AttributeError: # raised only the first time
            value = func(self)
            setattr(self, name, value)
            return value

    def setter(self, value):
        setattr(self, name, value)

    return property(getter,setter)

HTH,

George
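
A quick usage sketch for the read/write version, in Python 3 syntax. The decorator body follows the code in the post; the `print` inside `z` is an addition to show when the function actually runs.

```python
def cachedproperty(func):
    name = '__' + func.__name__

    def getter(self):
        try:
            return getattr(self, name)
        except AttributeError:      # raised only the first time
            value = func(self)
            setattr(self, name, value)
            return value

    def setter(self, value):
        setattr(self, name, value)

    return property(getter, setter)


class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @cachedproperty
    def z(self):
        print('calculating z')
        return self.x * self.y


a = A(3, 4)
print(a.z)    # prints 'calculating z', then 12
a.z = 99      # overrides the cached value
print(a.z)    # 99

b = A(3, 4)
b.z = 5       # assigning before the first read skips the calculation
print(b.z)    # 5
```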

Frank Millman

Jun 11, 2007, 11:33:32 AM

Wonderful - this is very educational for me :-)

Thanks very much

Frank

Steven Bethard

Jun 11, 2007, 3:55:29 PM

to George Sakkis

And, if you don't want to go through the property machinery every time,
you can use a descriptor that only calls the function the first time:

>>> class Once(object):
...     def __init__(self, func):
...         self.func = func
...     def __get__(self, obj, cls=None):
...         if obj is None:
...             return self
...         else:
...             value = self.func(obj)
...             setattr(obj, self.func.__name__, value)
...             return value
...
>>> class A(object):
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...     @Once
...     def z(self):
...         print 'calculating z'
...         return self.x * self.y
...
>>> a = A(2, 3)
>>> a.z
calculating z
6
>>> a.z
6

With this approach, the first time 'z' is accessed, there is no
instance-level 'z', so the descriptor's __get__ method is invoked. That
method creates an instance-level 'z' so that every other time, the
instance-level attribute is used (and the __get__ method is no longer
invoked).

STeVe
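
The mechanism relies on `Once` defining only `__get__`, which makes it a non-data descriptor: an entry in the instance `__dict__` takes precedence over it. A Python 3 rendering of the same sketch, with a `calls` list (hypothetical, added for illustration) recording each invocation:

```python
class Once:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Store under the same name so the instance attribute shadows us.
        setattr(obj, self.func.__name__, value)
        return value


calls = []


class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @Once
    def z(self):
        calls.append('z')
        return self.x * self.y


a = A(2, 3)
print(a.z, a.z)          # 6 6
print(calls)             # ['z']
print('z' in vars(a))    # True
del a.z                  # dropping the cached value re-arms the descriptor
print(a.z, calls)        # 6 ['z', 'z']
```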

Steven D'Aprano

Jun 11, 2007, 7:46:07 PM

Yes, because you never assign __dict__['q'].

>> You haven't told us what the 'compute' method is.
>>
>> Or if you have, I missed it.
>>
>
> Sorry - I made it more explicit above. It is the method that sets up
> all the missing attributes. No matter which attribute is referenced
> first, 'compute' sets up all of them, so they are all available for
> any future reference.

If you're going to do that, why not call compute() from your __init__ code
so that initializing an instance sets up all the attributes? That way you
can remove all the __getattr__ code. Sometimes avoiding the problem is
better than solving the problem.

--
Steven.

Frank Millman

Jun 12, 2007, 1:35:46 AM

On Jun 12, 1:46 am, Steven D'Aprano

<s...@REMOVE.THIS.cybersource.com.au> wrote:
>
> >> You haven't told us what the 'compute' method is.
>
> >> Or if you have, I missed it.
>
> > Sorry - I made it more explicit above. It is the method that sets up
> > all the missing attributes. No matter which attribute is referenced
> > first, 'compute' sets up all of them, so they are all available for
> > any future reference.
>
> If you're going to do that, why not call compute() from your __init__ code
> so that initializing an instance sets up all the attributes?

Because, as I have tried to explain elsewhere (probably not very
clearly), not all the information required to perform compute() is
available at __init__ time.

I have gained a lot of valuable advice from this thread, but I do have
a final question.

Every respondent has tried to nudge me away from __getattr__() and
towards property(), but no-one has explained why. What is the downside
of my approach? And if this is not a good case for using
__getattr__(), what is? What kind of situation is it intended to
address?

Thanks

Frank

Steven D'Aprano

Jun 12, 2007, 7:18:40 AM

On Mon, 11 Jun 2007 22:35:46 -0700, Frank Millman wrote:

> On Jun 12, 1:46 am, Steven D'Aprano
> <s...@REMOVE.THIS.cybersource.com.au> wrote:
>>
>> >> You haven't told us what the 'compute' method is.
>>
>> >> Or if you have, I missed it.
>>
>> > Sorry - I made it more explicit above. It is the method that sets up
>> > all the missing attributes. No matter which attribute is referenced
>> > first, 'compute' sets up all of them, so they are all available for
>> > any future reference.
>>
>> If you're going to do that, why not call compute() from your __init__ code
>> so that initializing an instance sets up all the attributes?
>
> Because, as I have tried to explain elsewhere (probably not very
> clearly), not all the information required to perform compute() is
> available at __init__ time.

I'm sorry, but this explanation doesn't make sense to me.

Currently, something like this happens:

(1) the caller initializes an instance
=> instance.x = some known value
=> instance.y is undefined
(2) the caller tries to retrieve instance.y
(3) which calls instance.__getattr__('y')
(4) which calls instance.compute()
=> which forces the necessary information to be available
=> instance.__dict__['y'] = some value
(5) finally returns a value for instance.y

Since, as far as I can tell, there is no minimum time between creating the
instance at (1) and trying to access instance.y at (2), there is no
minimum time between (1) and calling compute() at (4), except for the
execution time of the steps between them. So why not just make compute()
the very last thing that __init__ does?

> I have gained a lot of valuable advice from this thread, but I do have
> a final question.
>
> Every respondent has tried to nudge me away from __getattr__() and
> towards property(), but no-one has explained why.

Not me! I'm trying to nudge you away from the entire approach!

> What is the downside of my approach?

It is hard to do at all, harder to do right, more lines of code, more bugs
to fix, slower to write and slower to execute.

> And if this is not a good case for using
> __getattr__(), what is? What kind of situation is it intended to
> address?

Delegation is probably the poster-child for the use of __getattr__. Here's
a toy example: a list-like object that returns itself when you append to
it, without sub-classing.

class MyList:
    def __init__(self, *args):
        self.__dict__['data'] = list(args)
    def __getattr__(self, attr):
        return getattr(self.data, attr)
    def __setattr__(self, attr, value):
        return setattr(self.data, attr, value)
    def append(self, value):
        self.data.append(value)
        return self

--
Steven.
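
A Python 3 take on the same delegation idea, as a sketch (the class name `ChainList` is made up). It drops the `__setattr__` override for brevity. It is worth knowing that implicit special-method lookups such as `len(obj)` bypass `__getattr__` on new-style classes, so those need explicit methods.

```python
class ChainList:
    def __init__(self, *args):
        self.data = list(args)

    def __getattr__(self, attr):
        # Only called for names not found on ChainList itself,
        # so everything else is forwarded to the wrapped list.
        return getattr(self.data, attr)

    def append(self, value):
        self.data.append(value)
        return self             # allows chaining

    def __len__(self):
        # Special methods must be defined explicitly.
        return len(self.data)


items = ChainList(1, 2).append(3).append(4)
print(items.data)        # [1, 2, 3, 4]
print(items.count(3))    # 1, delegated to list.count
print(len(items))        # 4
```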

Gabriel Genellina

Jun 12, 2007, 7:59:26 AM

to pytho...@python.org

En Tue, 12 Jun 2007 08:18:40 -0300, Steven D'Aprano
<st...@REMOVE.THIS.cybersource.com.au> escribió:

> On Mon, 11 Jun 2007 22:35:46 -0700, Frank Millman wrote:
>
>> Because, as I have tried to explain elsewhere (probably not very
>> clearly), not all the information required to perform compute() is
>> available at __init__ time.
>
> I'm sorry, but this explanation doesn't make sense to me.
>
> Currently, something like this happens:
>
> (1) the caller initializes an instance
> => instance.x = some known value
> => instance.y is undefined
> (2) the caller tries to retrieve instance.y
> (3) which calls instance.__getattr__('y')
> (4) which calls instance.compute()
> => which forces the necessary information to be available
> => instance.__dict__['y'] = some value
> (5) finally returns a value for instance.y
>
> Since, as far as I can tell, there is no minimum time between creating
> the
> instance at (1) and trying to access instance.y at (2), there is no
> minimum time between (1) and calling compute() at (4), except for the
> execution time of the steps between them. So why not just make compute()
> the very last thing that __init__ does?

As far as I understand what the OP said, (2) may never happen. And since
(4) is expensive, it is avoided until it is actually required.

--
Gabriel Genellina

Frank Millman

Jun 12, 2007, 11:53:11 AM

On Jun 12, 1:18 pm, Steven D'Aprano

I wrote a long reply to this about an hour ago, but Google Groups
seems to have eaten it. I hope I can remember what I wrote.

This is more like what I am doing -

(t=table, c=column, p=pseudo column)

(1) the caller initializes table t1 and columns c1-c10
(2) the caller initializes table t2 and columns c11-c20
(3) t2.__init__() creates a link to t1, and updates t1 with a link to
t2
(4) t2.__init__() creates a link from c12 to c3, and updates c3 with a
link to c12
(5) t2.__init__() creates pseudo column p1 on table t1, creates a link
from c14 to p1, updates p1 with a link to c14

This all works well, and has been working for some time.

You can already see a difference between your scenario and mine.
Various attributes are set up *after* the original __init__() method
has completed.

I have now added a complication.

I want to create a t3 instance, with columns c21-c30, and I want to
create a pseudo column p2 on table t2, exactly as I did in steps 2 to
5 above. I also want to change step 5 so that instead of linking p1 on
table 1 to c14 on table 2, I link it to p2 on table 2. However, at
that point, p2 does not exist.

I hope that describes the problem a bit better. I'm going to leave it
at that for now, as I am getting a glimmer of an idea as to how I can
refactor this. I will tackle it again in the morning when I am feeling
fresh, and will report back then.

Frank

Steven D'Aprano

Jun 12, 2007, 7:24:58 PM

On Tue, 12 Jun 2007 08:53:11 -0700, Frank Millman wrote:

>> Since, as far as I can tell, there is no minimum time between creating the
>> instance at (1) and trying to access instance.y at (2), there is no
>> minimum time between (1) and calling compute() at (4), except for the
>> execution time of the steps between them. So why not just make compute()
>> the very last thing that __init__ does?
>>
>
> I wrote a long reply to this about an hour ago, but Google Groups
> seems to have eaten it. I hope I can remember what I wrote.
>
> This is more like what I am doing -
>
> (t=table, c=column, p=pseudo column)
>
> (1) the caller initializes table t1 and columns c1-c10
> (2) the caller initializes table t2 and columns c11-c20
> (3) t2.__init__() creates a link to t1, and updates t1 with a link to
> t2
> (4) t2.__init__() creates a link from c12 to c3, and updates c3 with a
> link to c12
> (5) t2.__init__() creates pseudo column p1 on table t1, creates a link
> from c14 to p1, updates p1 with a link to c14
>
> This all works well, and has been working for some time.

Ah, if I had ever read that there were two instances involved, I hadn't
noticed. Sorry!

> You can already see a difference between your scenario and mine.
> Various attributes are set up *after* the original __init__() method
> has completed.

By "original" I guess you mean t1.

I also assume that both t1 and t2 are instances of the same Table class.

Here are some thoughts:

- Don't ask the caller to initialise t1 and t2 (steps 1 and 2 above).
Instead, write a function which does all the initialisation (steps 1
through 5) and returns a tuple (t1, t2). That way, your setup function
("make_tables" perhaps?) can do all the setup needed before the caller
starts using either t1 or t2.

- How does t2 know about t1? As a named global variable? If so, there is
probably a better way, maybe something like this:

class Table(object):
    def __init__(self, start_column, end_column, sibling=None):
        # key the columns by number, so that columns[12] is c12
        self.columns = {}
        for i in range(start_column, end_column):
            self.columns[i] = get_a_column(i)
        self.sibling = sibling
        if sibling is not None:
            # I am the sibling of my sibling
            sibling.sibling = self
            # make links between columns
            self.columns[12].make_link(sibling.columns[3])
            sibling.columns[3].make_link(self.columns[12])
            # create pseudo-columns (whatever they are!)
            sibling.p1 = contents_of_pseudo_column()
            self.columns[14].make_link(sibling.p1)
            sibling.p1.make_link(self.columns[14])

Now you call them like this:

t1 = Table(1, 11)
t2 = Table(11, 21, t1)
# and now everything is initialised and ready to use...

- If both tables t1 and t2 need to exist, it should be an error to use t1
without creating t2, or vice versa. An easy check for that will be:

if self.sibling is None: raise TableError('no sibling')

- Most importantly... I hope you are getting some benefit from all this
extra work needed to support splitting the columns across two instances.

> I have now added a complication.
>
> I want to create a t3 instance, with columns c21-c30, and I want to
> create a pseudo column p2 on table t2, exactly as I did in steps 2 to
> 5 above. I also want to change step 5 so that instead of linking p1 on
> table 1 to c14 on table 2, I link it to p2 on table 2. However, at
> that point, p2 does not exist.

Perhaps each table needs two siblings, a left and a right. Should it be
circular? As in t1 <-> t2 <-> t3 <-> t1. Or perhaps your requirement is
that each table must have _at least_ one sibling, but not necessarily two.

Either way, changing the magic constants 3, 12 and 14 above into either
arguments or calculated values should allow you to make the code general
enough to have any number of tables.

Another suggestion: factor the "make these two tables siblings" bits
out of __init__, so you can do this:

def create_tables():
    t1 = Table(1, 11)
    t2 = Table(11, 21)
    t3 = Table(21, 31)
    make_sibling(t1, t2)
    make_sibling(t2, t3)
    return (t1, t2, t3)

t1, t2, t3 = create_tables()

As before, it is an error to access the data in a half-initialised table.
But since the caller is not expected to create table instances themselves,
but only call the create_tables() function, that is never a problem.
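
A minimal, runnable sketch of that factoring might look like the
following. It is an illustration only: the Column class, its make_link()
method, the "links" argument and the dict of columns keyed by number are
all assumptions made for the example, not part of the real framework.
The column numbers 3 and 12 are the ones used earlier in the thread.

```python
class Column(object):
    """Hypothetical stand-in for a real column object."""
    def __init__(self, name):
        self.name = name
        self.links = []

    def make_link(self, other):
        self.links.append(other)


class Table(object):
    def __init__(self, start_column, end_column):
        # Keyed by column number, so columns[12] is c12.
        self.columns = {}
        for i in range(start_column, end_column):
            self.columns[i] = Column('c%d' % i)
        self.siblings = []


def make_sibling(left, right, links=()):
    """Make two existing tables siblings of each other.

    links is a sequence of (left_column_number, right_column_number)
    pairs; each pair is linked in both directions.
    """
    left.siblings.append(right)
    right.siblings.append(left)
    for left_no, right_no in links:
        a = left.columns[left_no]
        b = right.columns[right_no]
        a.make_link(b)
        b.make_link(a)


def create_tables():
    t1 = Table(1, 11)
    t2 = Table(11, 21)
    t3 = Table(21, 31)
    make_sibling(t1, t2, links=[(3, 12)])
    make_sibling(t2, t3)
    return (t1, t2, t3)


t1, t2, t3 = create_tables()
```

Passing the column numbers in as arguments is what keeps make_sibling()
general enough for any pair of tables.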

--
Steven.

Frank Millman

Jun 13, 2007, 5:15:11 AM

On Jun 13, 1:24 am, Steven D'Aprano <s...@REMOVE.THIS.cybersource.com.au> wrote:
> On Tue, 12 Jun 2007 08:53:11 -0700, Frank Millman wrote:
>
> Ah, if I had ever read that there were two instances involved, I hadn't
> noticed. Sorry!
>

No problem - I really appreciate your input.

I snipped the rest of your post, as I have figured out a way to make
my problem go away, without changing my code radically.

It was a 'sequence of events' problem.

1. create table a
2. create table b, create pseudo column on table a, link to column on
table b
3. create table c, create pseudo column on table b, link to column on
table c

This works most times, but if, in step 2, I want to link the pseudo
column in table a to the pseudo column in table b, it fails, as the
pseudo column in table b does not get created until step 3.

Now, if I try to link to a column that does not exist, I set a flag
and carry on. When I go to the next step, after creating the pseudo
column, I check if the flag exists, and if so I go back and create the
link that it tried to create in the first place. It is not very
pretty, but it has the big advantage that, by the time all the tables
are opened, all the links are correctly set up, and I never have the
problem of trying to access non-existent attributes.
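
The "set a flag and go back later" idea could be sketched roughly as
below. This is a guess at the shape of the approach, not the actual
code: the Table and Column classes, the "pending" list (which plays the
role of the flag) and the method names are all invented for the example.

```python
class Column(object):
    """Hypothetical column; target is the column it links to."""
    def __init__(self, name):
        self.name = name
        self.target = None


class Table(object):
    def __init__(self, name):
        self.name = name
        self.columns = {}
        # (source_column, wanted_name) pairs that could not be linked yet
        self.pending = []

    def add_column(self, name):
        col = self.columns[name] = Column(name)
        # Go back and complete any link that was waiting for this column.
        still_waiting = []
        for source, wanted in self.pending:
            if wanted == name:
                source.target = col
            else:
                still_waiting.append((source, wanted))
        self.pending = still_waiting
        return col

    def link(self, source, target_name):
        """Link source to one of this table's columns, now or later."""
        if target_name in self.columns:
            source.target = self.columns[target_name]
        else:
            # The target does not exist yet: remember the request.
            self.pending.append((source, target_name))


a = Table('a')
b = Table('b')
p1 = a.add_column('p1')   # step 2: pseudo column on table a ...
b.link(p1, 'p2')          # ... wants b.p2, which does not exist yet
p2 = b.add_column('p2')   # step 3: creating p2 completes the link
```

Once the last table has been set up, an empty "pending" list on every
table confirms that no link was left dangling.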

I will try to explain what benefit all this gives me.

Assume a Customers table and an Invoices table. Customers has a
primary key which is a generated 'next number', and an alternate key
which is the Customer Code. Users will never see the primary key, only
the alternate key.

Invoices has a foreign key CustomerId, which references the primary
key of Customers. However, when capturing an invoice, the user wants
to enter the code, not the primary key.

It is not difficult to program for this, but it is tedious to do it on
every data-entry form. Therefore, in my framework, I set up a pseudo
column on Invoices called CustomerCode, which the application
programmer can use as if it is a real column. Behind the scenes, I
automatically use that to check that it is a valid code, and populate
the real column with the primary key. This has been working for some
time, and really simplifies the 'business logic' side of things by
abstracting a common idiom which would otherwise have to be coded
explicitly every time.
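
As a rough illustration of the idea, and not the framework's actual
code, a pseudo column can be modelled with a property that validates the
code and fills in the real column. The in-memory "customers" dict, the
lookup_id() helper and the sample codes are assumptions made for the
example, and the setter syntax needs Python 2.6 or later.

```python
# Hypothetical in-memory Customers table: primary key -> row
customers = {
    1: {'CustomerCode': 'ACME'},
    2: {'CustomerCode': 'BOLT'},
}


def lookup_id(table, key_column, code):
    """Return the primary key of the row whose alternate key is code."""
    for row_id, row in table.items():
        if row[key_column] == code:
            return row_id
    raise ValueError('%r is not a valid %s' % (code, key_column))


class Invoice(object):
    def __init__(self):
        self.CustomerId = None      # the real foreign-key column

    @property
    def CustomerCode(self):
        # Pseudo column: derived from the real column when read.
        if self.CustomerId is None:
            return None
        return customers[self.CustomerId]['CustomerCode']

    @CustomerCode.setter
    def CustomerCode(self, code):
        # Check that the code is valid, then populate the real column.
        self.CustomerId = lookup_id(customers, 'CustomerCode', code)


inv = Invoice()
inv.CustomerCode = 'BOLT'   # the application only ever sees the code
```

The application code assigns and reads CustomerCode as if it were a real
column, while CustomerId is kept in step behind the scenes.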

Now I have added a complication. I have decided to implement the idea
of using a single table to store details of all parties with whom we
have a relationship, such as Customers, Suppliers, Agents, etc,
instead of separate tables each with their own Code, Name, and Address
details.

Therefore I now have the following: a Parties table with a 'next
number' primary key and a PartyCode alternate key, a Customers table
with a 'next number' primary key and a PartyId foreign key reference
to the Parties table, and an Invoices table with a CustomerId foreign
key reference to the Customers table.

Now when capturing an invoice, the user enters a Code, then the
program must check that the code exists on Parties, retrieve the
primary key, then check that it exists on Customers, retrieve that
primary key, then store the result on Invoices, with appropriate error
messages if any step fails. I have successfully abstracted all of
that, so all that complication is removed from the application.
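
The chain of lookups described above can be sketched like this. It is
only an outline under stated assumptions: the dicts stand in for
database tables, and find_key(), customer_id_for_code() and the sample
data are hypothetical names invented for the example.

```python
# Hypothetical data: primary key -> row
parties = {10: {'PartyCode': 'ACME'}, 11: {'PartyCode': 'BOLT'}}
customers = {1: {'PartyId': 11}}      # only BOLT is a customer


def find_key(table, column, value, message):
    """Return the primary key of the row where column equals value."""
    for row_id, row in table.items():
        if row[column] == value:
            return row_id
    raise ValueError(message)


def customer_id_for_code(code):
    """Resolve a code entered by the user to a Customers primary key."""
    # Step 1: the code must exist on Parties.
    party_id = find_key(parties, 'PartyCode', code,
                        'unknown party code %r' % code)
    # Step 2: that party must also exist on Customers.
    return find_key(customers, 'PartyId', party_id,
                    'party %r is not a customer' % code)


# Step 3: store the result on the invoice.
invoice = {'CustomerId': customer_id_for_code('BOLT')}
```

Each step raises its own error message, so the user can be told whether
the code is unknown altogether or simply not a customer.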

Hope that makes sense.

Thanks very much for all your attempts to help me, Steven. You have
succeeded in getting me to think properly about my problem and come up
with a much cleaner solution. I really appreciate it.

Frank

Steven D'Aprano

Jun 13, 2007, 10:42:31 AM

On Wed, 13 Jun 2007 02:15:11 -0700, Frank Millman wrote:

> Thanks very much for all your attempts to help me, Steven. You have
> succeeded in getting me to think properly about my problem and come up
> with a much cleaner solution. I really appreciate it.

Glad to be of help.

--
Steven.