
Best way to ensure user calls methods in correct order?


Thomas Nyberg

Jun 22, 2017, 10:03:15 AM
Hello,

I have a situation in which I want a user to call methods in a certain
order and to force the re-calling of methods "down-stream" if upstream
methods are called again. An example of this sort of thing would be a
pipeline where calling methods again invalidates the results of methods
called afterwards and you'd like to warn the user of that.

Here is a very simplified example:

ordered_methods.py
-----------------------------------------------
class ConsistencyError(Exception):
    pass

class C:
    def __init__(self):
        self._a_dirty = self._b_dirty = self._c_dirty = True

    def a(self):
        self._a_dirty = self._b_dirty = self._c_dirty = True
        print("Calling a()...")
        self._a_dirty = False

    def b(self):
        if self._a_dirty:
            raise ConsistencyError("Re-run a() before calling b()!")
        self._b_dirty = self._c_dirty = True
        print("Calling b()...")
        self._b_dirty = False

    def c(self):
        if self._b_dirty:
            raise ConsistencyError("Re-run b() before calling c()!")
        self._c_dirty = True
        print("Calling c()...")
        self._c_dirty = False

    def d(self):
        if self._c_dirty:
            raise ConsistencyError("Re-run c() before calling d()!")
        print("Calling d()...")

c = C()
# This is fine:
c.a()
c.b()
c.c()
c.d()
# This is also fine (with same class instantiation!)
c.c()
c.d()
# This throws an error:
c.b()
c.d()
-----------------------------------------------

Here's what you get when calling it:
-----------------------------------------------
$ python3 ordered_methods.py
Calling a()...
Calling b()...
Calling c()...
Calling d()...
Calling c()...
Calling d()...
Calling b()...
Traceback (most recent call last):
  File "ordered_methods.py", line 43, in <module>
    c.d()
  File "ordered_methods.py", line 29, in d
    raise ConsistencyError("Re-run c() before calling d()!")
__main__.ConsistencyError: Re-run c() before calling d()!
-----------------------------------------------

My solution seems to work, but has a large amount of repetition and
errors will certainly be introduced the first time anything is changed.
I have the following questions:

1) Most importantly, am I being stupid? I.e. is there some obviously
better way to handle this sort of thing?
2) Is there an out-of-the-box solution somewhere that enforces this
that I've never seen before?
3) If not, can anyone here see some sort of more obvious way to do this
without all the repetition in dirty bits, error messages, etc.?

I presume a decorator could be used to do some sort of "order
registration", but I figure I might as well ask before I re-invent a
hexagonal wheel. Thanks a lot for any help!

Cheers,
Thomas

Peter Otten

Jun 22, 2017, 10:31:42 AM
If the order is linear like it seems to be the following might work:

$ cat inorder.py
import itertools


class ConsistencyError(Exception):
    pass


class InOrder:
    def __init__(self):
        self.indices = itertools.count(1)

    def __call__(self, method):
        index = next(self.indices)

        def wrapper(self, *args, **kw):
            print("index", index, "state", self.state)
            if index - self.state > 1:
                raise ConsistencyError
            result = method(self, *args, **kw)
            self.state = index
            return result

        return wrapper


class C:
    def __init__(self):
        self.state = 0

    inorder = InOrder()

    @inorder
    def a(self):
        print("Calling a()...")

    @inorder
    def b(self):
        print("Calling b()...")

    @inorder
    def c(self):
        print("Calling c()...")

    @inorder
    def d(self):
        print("Calling d()...")
$ python3 -i inorder.py
>>>
>>> c = C()
>>> c.a()
index 1 state 0
Calling a()...
>>> c.b()
index 2 state 1
Calling b()...
>>> c.a()
index 1 state 2
Calling a()...
>>> c.c()
index 3 state 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "inorder.py", line 18, in wrapper
    raise ConsistencyError
__main__.ConsistencyError
>>> c.b()
index 2 state 1
Calling b()...
>>> c.c()
index 3 state 2
Calling c()...


Steve D'Aprano

Jun 22, 2017, 10:40:29 AM
On Thu, 22 Jun 2017 11:53 pm, Thomas Nyberg wrote:

> I have a situation in which I want a user to call methods in a certain
> order and to force the re-calling of methods "down-stream" if upstream
> methods are called again.

Don't do that. It's fragile and an anti-pattern. Your methods have too much
coupling. If c() relies on b() being called first, then either b() or c()
aren't good methods. They don't do enough:

- calling b() alone doesn't do enough, so you have to call c() next to get the
job done;

- calling c() alone doesn't do enough, because it relies on b() being called
first.


There are a very few exceptions to this rule of thumb, such as opening
connections to databases or files or similar. They are mostly enshrined from
long practice, or justified by low-level APIs (that's how file systems and
databases work, and it's not practical to change that). But you should try very
hard to avoid creating new examples.

Of course, I'm talking about your *public* methods. Private methods can be a bit
more restrictive, since it's only *you* who suffers if you do it wrong.

Ideally, your methods should be written in a functional style with as little
shared state as possible. (Shared state introduces coupling between components,
and excessive coupling is *the* ultimate evil in programming.) With no shared
state, your methods are trivial:

def a(self, arg):
    return something

def b(self, arg):
    # relies on a
    x = self.a(arg)
    return process(x)

def c(self, arg):
    # relies on b
    x = self.b(arg)
    return process(x)


With shared state, it becomes more horrible, but at least your users are spared
the pain. (*You*, on the other hand, are not -- that's your punishment for
writing in a Java-like style with lots of shared state.)


def a(self):
    self.data = something
    self.step = 1

def b(self):
    # relies on a
    assert self.step >= 0
    if self.step == 0:
        self.a()
    if self.step == 1:
        self.data = process(self.data)
        self.step = 2
    else:
        assert self.step > 1
        # do nothing?
        # or raise?

def c(self):
    # relies on b
    assert self.step >= 0
    if self.step < 2:
        self.b()
    if self.step == 2:
        self.data = process(self.data)
        self.step = 3
    else:
        assert self.step > 1
        # do nothing?
        # or raise?


This is messy enough if there is only a single chain of correct calls:

# you must call a, b, c, d in that order

If you can call the methods in lots of different ways:

# you can call a, b, c, d;
# or a, b, d;
# or a, d, e, f
# or g, d, e
# ...

it becomes horrible. In that case, I'd say just document what the methods do and
let the user deal with the mess.



>         1) Most importantly, am I being stupid? I.e. is there some obviously
> better way to handle this sort of thing?

I wouldn't say "stupid" as such, but since you used the word, yes :-)


>         2) Is there an out of the box solution somewhere that enforces this
> that I've never seen before?
>         3) If not, can anyone here see some sort of more obvious way to do
> this without all the repetition in dirty bits, error messages, etc.?

If you want the user to call methods a, b, c, d in that order, provide them with
a single method "run" that calls a, b, c, d in that order, and tell them to
call "run".
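
[A minimal runnable sketch of that suggestion. The step names _load/_clean/
_summarise and their bodies are invented for illustration; the point is just
that the steps become private and run() is the only public entry point, so
the class rather than the caller owns the ordering.]

```python
class Pipeline:
    """The individual steps are private; run() is the only public API."""

    def _load(self, data):
        # Hypothetical first step: normalise the input to a list.
        return list(data)

    def _clean(self, data):
        # Hypothetical second step: drop missing values.
        return [x for x in data if x is not None]

    def _summarise(self, data):
        # Hypothetical final step: reduce to a single result.
        return sum(data)

    def run(self, data):
        # The class, not the caller, owns the ordering.
        return self._summarise(self._clean(self._load(data)))


print(Pipeline().run([1, None, 2, 3]))
```

Callers can never get the steps out of order because the steps are not
part of the public interface at all.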




--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

Thomas Nyberg

Jun 22, 2017, 11:29:09 AM
Thanks for the response! I see the reasoning, but I can't entirely
square it with what I'm thinking. Probably it's either because I was
hiding some of my intention (sorry if I wasted your time due to lack of
details) or that I'm naive and regardless of intention doing something
silly.

On 06/22/2017 04:40 PM, Steve D'Aprano wrote:
> On Thu, 22 Jun 2017 11:53 pm, Thomas Nyberg wrote:
>
> Don't do that. It's fragile and an anti-pattern. Your methods have too much
> coupling. If c() relies on b() being called first, then either b() or c()
> aren't good methods. They don't do enough:
>
> - calling b() alone doesn't do enough, so you have to call c() next to get the
> job done;
>
> - calling c() alone doesn't do enough, because it relies on b() being called
> first.
>
>
> There are a very few exceptions to this rule of thumb, such as opening
> connections to databases or files or similar. They are mostly enshrined from
> long practice, or justified by low-level APIs (that's how file systems and
> databases work, and it's not practical to change that). But you should try very
> hard to avoid creating new examples.
>
> Of course, I'm talking about your *public* methods. Private methods can be a bit
> more restrictive, since it's only *you* who suffers if you do it wrong.
>
> Ideally, your methods should be written in a functional style with as little
> shared state as possible. (Shared state introduces coupling between components,
> and excessive coupling is *the* ultimate evil in programming.)
>

This makes perfect sense in general. An important detail I left out
earlier was what exactly I meant by "users" of my code. Really all I
meant was myself calling this code as a library and avoiding making
mistakes. Currently what I'm doing is kind of a data storage, data
processing/conversion, data generation pipeline. The different
variables at the different steps in the pipeline are specified in a
script. The functions in this script have _no coupling at all_ for all
the reasons you stated. In fact, the only reason I'm introducing a class
is because I'm trying to force their ordering in the way I described.

The ultimate goal here is to instead put a basic http server wrapper
around the whole process. The reason I want to do this is to allow
others (internal to the company) to make changes in the process
interactively at different steps in the process (the steps are "natural"
for the problem at hand) without editing the python scripts explicitly.

Of course I could force them to execute everything in one go, but due to
the time required for the different steps, it's nicer to allow for some
eye-balling of results (and possible changes and re-running) before
continuing. Before this, the main way of enforcing the ordering was the
few of us editing the scripts by hand, but now I'd like to make it an
error for corrupted data to propagate down through the pipeline. Having a
run_all() method or something similar will definitely exist (usually
once you figure out the "right" parameters you don't change them much on
re-runs), but having things broken apart is still very nice.

Would you still say this is a bad way of going about it? Thanks for the
feedback. It's very helpful.

Cheers,
Thomas

Thomas Nyberg

Jun 22, 2017, 11:32:54 AM
[I accidentally sent this only to Peter, but wanted instead to send it
to the list for anyone else who happens upon it...now sending to the
list...]

On 06/22/2017 04:31 PM, Peter Otten wrote:
> If the order is linear like it seems to be the following might work:

Thanks a lot! This removes the duplication that was bothering me. To get
the error messages I wanted, I made some minor changes to the code as follows:

------------------------------------------
import itertools


class ConsistencyError(Exception):
    pass


class InOrder:
    def __init__(self):
        self.indices = itertools.count(1)
        self.prior_method_name = ""

    def __call__(self, method):
        index = next(self.indices)
        prior_method_name = self.prior_method_name
        method_name = method.__name__
        self.prior_method_name = method_name

        def wrapper(this, *args, **kw):
            if index - this.state > 1:
                msg = "call {}() before calling {}()".format(
                    prior_method_name, method_name)
                raise ConsistencyError(msg)
            result = method(this, *args, **kw)
            this.state = index
            return result

        return wrapper


class C:
    def __init__(self):
        self.state = 0

    inorder = InOrder()

    @inorder
    def a(self):
        print("Calling a()...")

    @inorder
    def b(self):
        print("Calling b()...")

    @inorder
    def c(self):
        print("Calling c()...")

    @inorder
    def d(self):
        print("Calling d()...")
------------------------------------------

Now of course I'll have to consider Steve's response and wonder _why_
exactly I'm doing this...

Cheers,
Thomas

Gregory Ewing

Jun 23, 2017, 1:44:19 AM
Steve D'Aprano wrote:
> There are a very few exceptions to this rule of thumb, such as opening
> connections to databases or files or similar.

Another way to handle that is to have the connection method
return another object that has the methods that should only
be called for active connections. That way it's not possible
to do things out of sequence.
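
[A tiny sketch of that idea; the Database/Connection names and the fake
query body are invented for illustration. The methods that need an active
connection live only on the object that connect() returns, so they don't
exist before connecting.]

```python
class Connection:
    """Only handed out by Database.connect(), so query() can't run too early."""

    def __init__(self, name):
        self.name = name

    def query(self, sql):
        # Stand-in for real query execution.
        return "{}: {}".format(self.name, sql)


class Database:
    def __init__(self, name):
        self.name = name

    def connect(self):
        # Out-of-sequence calls are impossible: Database itself has no
        # query() method, only the returned Connection does.
        return Connection(self.name)


conn = Database("demo").connect()
print(conn.query("SELECT 1"))
```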

--
Greg

Neil Cerutti

Jun 23, 2017, 2:28:46 PM
It's like a bidirectional iterator in C++, except in reverse it's
random access. An iterator that can't easily be modeled with a
generator in Python is going to feel awkward.

--
Neil Cerutti

Thomas Jollans

Jun 24, 2017, 5:53:55 AM
If a() does some processing, and then b() does something else to the
result of a(), then the natural way of calling the functions is probably
c(b(a(initial_data))), rather than a sequence of method or function
calls that hide some internal state. If the user wants to jump in and
look at what's going on, they can:

a_result = a()
# peruse the data
b_result = b(a_result)
# have some coffee
c_result = c(b_result)
# and so on.

If the data is modified in-place, maybe it makes sense to use a class
like the one you have, but then it's a bit bizarre to make your methods
create an artificial side effect (self._a_dirty) - why don't you simply
check for the actual effect of a(), whatever it is? If a() did some
calculations and added a column to your dataset with the results, check
for the existence of that column in b()! If calling b() invalidates some
calculation results generated in c(), delete them! (In the functional
setup above, b() would refuse to process a dataset that it has already
processed.)
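
[A small sketch of that approach. The column names a_result/b_result and
the row layout are invented for illustration; the point is that b() checks
for the column a() actually produces, rather than a separate dirty flag.]

```python
class ConsistencyError(Exception):
    pass


class Dataset:
    """State is inferred from the data itself, not from dirty flags."""

    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per record

    def a(self):
        # a() records its result as a real column...
        for row in self.rows:
            row["a_result"] = row["value"] * 2

    def b(self):
        # ...and b() checks for that column instead of a flag.
        if any("a_result" not in row for row in self.rows):
            raise ConsistencyError("run a() first")
        for row in self.rows:
            row["b_result"] = row["a_result"] + 1


ds = Dataset([{"value": 1}, {"value": 2}])
ds.a()
ds.b()
print(ds.rows)
```

Calling b() on a fresh Dataset raises ConsistencyError, because the
evidence of a() having run is simply absent from the data.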

-- Thomas


Thomas Nyberg

Jun 24, 2017, 9:00:18 AM
On 06/24/2017 11:53 AM, Thomas Jollans wrote:
> If the data is modified in-place, maybe it makes sense to use a class
> like the one you have, but then it's a bit bizarre to make your methods
> create an artificial side effect (self._a_dirty) - why don't you simply
> check for the actual effect of a(), whatever it is. If a() did some
> calculations and added a column to your dataset with the results, check
> for the existence of that column in b()! If calling b() invalidates some
> calculation results generated in c(), delete them! (In the functional
> setup above, b() would refuse to process a dataset that it has already
> processed)
>
> -- Thomas
>
>
Thanks, I like this a lot! This does fit in well with my setup and I
think it probably is the cleanest approach. It also makes it pretty
obvious where in the process you are just by looking at the data
(though this is more of a convenience for myself than anyone else, since
the internal steps are hidden).

Thanks a lot!

Cheers,
Thomas

Jugurtha Hadjar

Jul 1, 2017, 6:49:27 PM
A few questions: it looks to me like you have a problem that resembles
plant processes, where one process' output is another process' input.

Q1: Am I correct in describing your problem as follows?

input ---> |process a|---> |process b| ---> |process c| ---> output


Q2: Does every process change the type of its input to something not
produced by, and not accepted by, the other processes? I.e., if the first
input is a list:

- Does |process a| accept a list and output a dict.
- Does |process b| accept a dict and output a tuple.
- Does |process c| accept a tuple and output a set.


You can do the following:

def pipeline(obj):
    mapping = {
        list: lambda: pipeline(a(obj)),
        dict: lambda: pipeline(b(obj)),
        tuple: lambda: c(obj),
    }
    return mapping[type(obj)]()


This way, calling pipeline on a list will:
- Call a on it --> dict, then call pipeline on that dict, which will:
- Call b on it --> tuple, then call pipeline on that tuple, which will:
- Call c on it --> set, then return that (last step).

You can also do it this way:

def pipeline(obj):
    mapping = {
        list: lambda: pipeline(a(obj)),
        dict: lambda: pipeline(b(obj)),
        tuple: lambda: pipeline(c(obj)),
        set: lambda: obj,
    }
    return mapping[type(obj)]()

However, if you call pipeline on the result of, say, b (a tuple), then it will
call c on that object, which gives a set, then call pipeline on that set,
which returns the set unchanged.

In other words, whatever step your data is at, calling pipeline on it will
automatically resume processing from that step.


This is for the first case. If, however, you can't discriminate based on
type, you can create your own "types" with custom classes or namedtuples.

If `a` accepts a list and outputs a list, and b accepts a list and outputs a
list, and c accepts a list and outputs a list...

You can just do:

class OutputA(list):
    """Result of a list processed by `a`."""

class OutputB(list):
    """Result of OutputA processed by `b`."""

class OutputC(list):
    """Result of OutputB processed by `c`."""


def pipeline(obj):
    mapping = {
        list: lambda: pipeline(a(obj)),
        OutputA: lambda: pipeline(b(obj)),
        OutputB: lambda: pipeline(c(obj)),
        OutputC: lambda: obj,
    }
    return mapping[type(obj)]()

You'll just have to change your functions like this:

def a(list_obj):
    ...
    _a_result = the_resulting_list_you_are_about_to_return
    return OutputA(_a_result)

def b(list_from_a):
    _b_result = also_a_list_you_were_returning
    return OutputB(_b_result)

def c(list_from_b):
    _c_result = you_get_the_point
    return OutputC(_c_result)
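
[Putting the pieces together, a runnable sketch of that scheme. The bodies
of a/b/c here are placeholder transformations invented for illustration;
only the type-dispatch structure comes from the description above.]

```python
class OutputA(list):
    """Result of a list processed by a()."""

class OutputB(list):
    """Result of OutputA processed by b()."""

class OutputC(list):
    """Result of OutputB processed by c()."""


def a(list_obj):
    # Placeholder processing: double each element.
    return OutputA(x * 2 for x in list_obj)

def b(list_from_a):
    # Placeholder processing: add one to each element.
    return OutputB(x + 1 for x in list_from_a)

def c(list_from_b):
    # Placeholder processing: reverse the list.
    return OutputC(reversed(list_from_b))


def pipeline(obj):
    mapping = {
        list: lambda: pipeline(a(obj)),
        OutputA: lambda: pipeline(b(obj)),
        OutputB: lambda: pipeline(c(obj)),
        OutputC: lambda: obj,
    }
    return mapping[type(obj)]()


print(pipeline([1, 2, 3]))  # full run starting from a plain list
```

Calling pipeline on a plain list runs the whole chain; calling it on an
OutputB resumes at c(), which is the "resume from wherever the data is"
behaviour described above.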






--
~Jugurtha Hadjar,