Improving ModelStorage.write


Cédric Krier

Jun 19, 2013, 8:09:44 AM
to tryton
Hi,

In the same idea as the changeset [1], I would like to propose this
change for the write method:

    @classmethod
    def write(cls, records, values):
        …

    =>

    @classmethod
    def write(cls, records, values, *args):
        assert not len(args) % 2
        all_records = []
        actions = iter((records, values) + args)
        for records, values in zip(actions, actions):
            …
            all_records += records

        cls._validate(all_records)


This has two advantages:

- call validate on a larger set of records, which will benefit from
  prefetching
- reduce the number of RPC calls
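The pairing mechanics of the proposed signature can be sketched standalone; a minimal sketch using plain dicts as records (the `state` field and the free function are illustrative, not Tryton's API — only the `zip(actions, actions)` idiom comes from the proposal):

```python
# Standalone sketch of the proposed multi-write pairing: a flat
# (records, values, records, values, ...) argument list is grouped
# into pairs by zipping a single iterator with itself.
def write(records, values, *args):
    assert not len(args) % 2
    all_records = []
    actions = iter((records, values) + args)
    for recs, vals in zip(actions, actions):
        # zip pulls two items per step from the same iterator,
        # so each step yields one (records, values) action.
        for rec in recs:
            rec.update(vals)
        all_records += recs
    return all_records

# Usage: two different write actions in a single call.
r1, r2 = {'id': 1}, {'id': 2}
touched = write([r1], {'state': 'done'}, [r2], {'state': 'draft'})
```

Here `all_records` collects every record from every action, which is what lets the validation run once over the whole set.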


PS: of course the write action in xxx2Many will follow.

http://hg.tryton.org/trytond/rev/c7804c288ca3
--
Cédric Krier

B2CK SPRL
Rue de Rotterdam, 4
4000 Liège
Belgium
Tel: +32 472 54 46 59
Email/Jabber: cedric...@b2ck.com
Website: http://www.b2ck.com/

Guillem Barba Domingo

Jun 19, 2013, 10:00:41 AM
to Tryton Dev



2013/6/19 Cédric Krier <cedric...@b2ck.com>

Hi,

In the same idea as the changeset [1], I would like to propose this
change for the write method:

sounds good
 
    @classmethod
    def write(cls, records, values):
        …

    =>

    @classmethod
    def write(cls, records, values, *args):
        assert not len(args) % 2
        all_records = []
        actions = iter((records, values) + args)
        for records, values in zip(actions, actions):
            …
            all_records += records

        cls._validate(all_records)

I don't understand the treatment of 'actions'. Which extra arguments can be expected?

This has two advantages:

    - call validate for a larger set of records which will benefit of
      the prefetching
    - reduce the number of RPC calls

The only con I found is if you want to set the same value on a lot of records (for example, setting the state of a list of 'order' records after processing their lines).
But I understand that there are more cases where it is an improvement.

Could it be possible to get this behaviour using 'active record'?
A usage example:

    for move in todo_moves:
        move.quantity += 3
        ...
    save(todo_moves)

No more wishes for now ;-)
--
Guillem Barba
http://www.guillem.alcarrer.net

Cédric Krier

Jun 19, 2013, 10:40:45 AM
to Tryton Dev
On 19/06/13 16:00 +0200, Guillem Barba Domingo wrote:
> 2013/6/19 Cédric Krier <cedric...@b2ck.com>
>
> > Hi,
> >
> > In the same idea as the changeset [1], I would like to propose this
> > change for the write method:
> >
>
> sounds good
>
>
> > @classmethod
> > def write(cls, records, values):
> > …
> >
> > =>
> >
> > @classmethod
> > def write(cls, records, values, *args):
> > assert not len(args) % 2
> > all_records = []
> > actions = iter((records, values) + args)
> > for records, values in zip(actions, actions):
> > …
> > all_records += records
> >
> > cls._validate(all_records)
> >
>
> I don't understand the treatment of 'actions'. Which extra arguments can be
> expected?

The same as the first two:

write(cls, records, values, records, values, …)

> This has two advantages:
> >
> > - call validate for a larger set of records which will benefit of
> > the prefetching
> > - reduce the number of RPC calls
> >
>
> The only cons I found is if you want to set a value (the same) to a lot of
> records (for example, set the state of a list of 'order' records after
> processed their lines.

Just use write as it is now.

> But I understand that there are more cases where it is an improvement.
>
> It could be possible to get this behaviour using 'active record'?
> An usage example:
>
> for move in todo_moves:
> move.quantity += 3
> ...
> save(todo_moves)

Yes, from a One2Many, if we take care of it.
Otherwise the code will need to be written like this:

Move.write(*chain(*(([m], m._save_values) for m in todo_moves)))
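That one-liner can be unpacked in isolation; a sketch with plain dicts standing in for moves (`_save_values` is simulated here as a dict key):

```python
from itertools import chain

# Simulated records whose pending changes live under '_save_values'.
todo_moves = [
    {'id': 1, '_save_values': {'quantity': 4}},
    {'id': 2, '_save_values': {'quantity': 7}},
]

# chain(*(...)) flattens the ([m], values) pairs into one flat
# argument list: [m1], {...}, [m2], {...} -- exactly the shape
# the multi-write signature expects.
args = list(chain(*(([m], m['_save_values']) for m in todo_moves)))
```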

Albert Cervera i Areny

Jun 20, 2013, 4:37:48 AM
to tryto...@googlegroups.com
On Wednesday, June 19, 2013, 14:09:44 UTC+2, Cédric Krier wrote:

Hi,

In the same idea as the changeset [1], I would like to propose this
change for the write method:

    @classmethod
    def write(cls, records, values):
        …

    =>

    @classmethod
    def write(cls, records, values, *args):
        assert not len(args) % 2
        all_records = []
        actions = iter((records, values) + args)
        for records, values in zip(actions, actions):
            …
            all_records += records

        cls._validate(all_records)

I've been thinking about something along these lines since the create-with-lists implementation. The main issue I see is yet another important API change, which we wanted to minimize.

I know it has some implementation issues, but I also had another idea in mind, which somebody may have ideas to improve, and which may be less aggressive in terms of API changes. The idea is to implement deferred validation in the same way a database implements deferred constraints. Something like this:

with Transaction().set_deferred():
    for record in records:
        record.field = random.random()
        record.save()
 
The checks would be executed on all modified records when exiting the with statement.

Probably the main issue is that, in case of a validation failure, it may be harder to find where and when the value was set incorrectly; but that is why it should only be used in loops and places where the developer knows beforehand that it can take advantage of deferring the validation.

One advantage is that it can defer validation in a loop of "create" or "write" (for example, an import process which checks for the existence of a record and creates or updates it accordingly).

Jean C

Jun 20, 2013, 5:02:03 AM
to tryto...@googlegroups.com
I know it has some implementation issues but I also had another idea in mind which maybe somebody has some ideas to improve and that it may be less aggressive in terms of API changes. The idea is to implement deferred validation in the same way a database implements deferred constraints. Something like this:

with Transaction().set_deferred():
    for record in records:
        record.field = random.random()
        record.save()
 
The checks would be executed on all modified records when exiting the with statement.

Probably the main issue is that in case of a failure of the validation, it may be harder to find where and when the value was set incorrectly, but that's why that should be used in loops and places where the developer knows beforehand that it can take advantage of deferring the validation.

One advantage is that it can defer validation in a loop of "create" or "write" (for example, an import process which checks for the existance of a record and creates it or updates it accordingly).

I think this might be a problem. If a given record is not valid, you want to know it before going on with what
may be a costly operation.

Even though it would be a choice of the developer to use it or not, it seems rather dangerous to me.

Jean CAVALLO
Coopengo

Cédric Krier

Jun 20, 2013, 5:05:02 AM
to tryto...@googlegroups.com
On 20/06/13 01:37 -0700, Albert Cervera i Areny wrote:
> El dimecres 19 de juny de 2013 14:09:44 UTC+2, Cédric Krier va escriure:
>
> > Hi,
> >
> > In the same idea as the changeset [1], I would like to propose this
> > change for the write method:
> >
> > @classmethod
> > def write(cls, records, values):
> > …
> >
> > =>
> >
> > @classmethod
> > def write(cls, records, values, *args):
> > assert not len(args) % 2
> > all_records = []
> > actions = iter((records, values) + args)
> > for records, values in zip(actions, actions):
> > …
> > all_records += records
> >
> > cls._validate(all_records)
> >
>
> I've been thinking in something in this line since the create with lists
> implementation. The main issue I see is yet another important API change,
> which we wanted to minimize.

Except it is a backward compatible change, but of course code that
extends the write method will need to be updated.

> I know it has some implementation issues but I also had another idea in
> mind which maybe somebody has some ideas to improve and that it may be less
> aggressive in terms of API changes. The idea is to implement deferred
> validation in the same way a database implements deferred constraints.
> Something like this:
>
> with Transaction().set_deferred():
> for record in records:
> record.field = random.random()
> record.save()
>
> The checks would be executed on all modified records when exiting the with
> statement.

OK, but you will still have one UPDATE query per iteration.
Such cases are not very common, and most of the time we get some calls to
ModelStorage.write for each case. So I like the idea of still being able
to work easily with a classmethod on a bunch of records.
Moreover, you can not use your context manager on RPC calls.
Also, what will happen in case of nested deferred contexts?
And finally, after the save of one "invalid" record, some piece of code
could be triggered that makes decisions based on "invalid" data, which
could result in a weird exception.

> Probably the main issue is that in case of a failure of the validation, it
> may be harder to find where and when the value was set incorrectly, but
> that's why that should be used in loops and places where the developer
> knows beforehand that it can take advantage of deferring the validation.

But such an assumption doesn't work well in the modular context of Tryton.
You can not be sure of what will run inside your loop.

> One advantage is that it can defer validation in a loop of "create" or
> "write" (for example, an import process which checks for the existance of a
> record and creates it or updates it accordingly).

Yes, but it is at best an improvement by a factor of 2, no more. And for
a large bunch of records, it will be practically none.

Albert Cervera i Areny

Aug 5, 2013, 3:02:34 PM
to tryto...@googlegroups.com
On Wednesday, June 19, 2013, 14:09:44, Cédric Krier wrote:
> Hi,
>
> In the same idea as the changeset [1], I would like to propose this
> change for the write method:
>
> @classmethod
> def write(cls, records, values):
> …
>
> =>
>
> @classmethod
> def write(cls, records, values, *args):
> assert not len(args) % 2
> all_records = []
> actions = iter((records, values) + args)
> for records, values in zip(actions, actions):
> …
> all_records += records
>
> cls._validate(all_records)

For consistency and performance, copy() could also use the same syntax. At
least the stock_split module would benefit from the performance gain.

--
Albert Cervera i Areny
Consultor funcional
Tel. 93 553 18 03
@albertnan
www.NaN-tic.com

Sergi Almacellas Abellana

Nov 23, 2013, 2:45:36 PM
to tryto...@googlegroups.com
Hi All,

As Albert suggested on [1], I think that copy may also be improved along
the lines of [2].

This is difficult as the default parameter is not mandatory.

As copy is a simplification of creating new records, I suggest the
following improvement:

    def copy(cls, records, default=None, create=True):
        ....
        to_create = []
        for data in datas:
            data = convert_data(field_defs, data)
            to_create.append(data)
        if not create:
            return to_create
        new_records = cls.create(to_create)
        ....

So copy inside a loop can be used in the following way:

    to_create = []
    for i in range(1, 1000):
        to_create.extend(cls.copy([record], default, False))
    cls.create(to_create)

I've done some benchmarks on stock_split module. I split a move of
quantity 50 into 50 moves of quantity one.

The results (in seconds) are the following:

Without patch: Elapsed: 2.81604099274
With patch: Elapsed: 1.34554600716

I can share the patches if anyone is interested.

Any thoughts?


[1]https://groups.google.com/d/msg/tryton-dev/PU_P_Z8wjoE/uWcJE8mwHzEJ

[2] http://codereview.tryton.org/1641003/

--
Sergi Almacellas Abellana
www.koolpi.com
Twitter: @pokoli_srk

Cédric Krier

Nov 23, 2013, 2:49:40 PM
to tryto...@googlegroups.com
On 23/11/13 20:45 +0100, Sergi Almacellas Abellana wrote:
> Hi All,
>
> As Albert suggested on [1] i think that copy may also be improved in
> the line of [2].
>
> This is difficult as a the default parameter is not mandatory.
>
> As copy is a simplification of creating new records I will suggest
> the following improvement:
>
> def copy(cls, records, default=None, create=True):
> ....
> to_create = []
> for data in datas:
> data = convert_data(field_defs, data)
> to_create.append(data)
> if not create:
> return to_create
> new_records = cls.create(to_create)
> ....

This can not work because of the translations.
Moreover, over time we have removed all the ugly differences between input
and output types of values because they always generate bugs.

--
Cédric Krier - B2CK SPRL
Email/Jabber: cedric...@b2ck.com
Website: http://www.b2ck.com/

Cédric Krier

Nov 25, 2013, 4:37:34 PM
to tryto...@googlegroups.com
On 23/11/13 20:49 +0100, Cédric Krier wrote:
> On 23/11/13 20:45 +0100, Sergi Almacellas Abellana wrote:
> > Hi All,
> >
> > As Albert suggested on [1] i think that copy may also be improved in
> > the line of [2].
> >
> > This is difficult as a the default parameter is not mandatory.
> >
> > As copy is a simplification of creating new records I will suggest
> > the following improvement:
> >
> > def copy(cls, records, default=None, create=True):
> > ....
> > to_create = []
> > for data in datas:
> > data = convert_data(field_defs, data)
> > to_create.append(data)
> > if not create:
> > return to_create
> > new_records = cls.create(to_create)
> > ....
>
> This can not work because of the translations.
> Moreover, over time we have removed all the ugly different in and out
> type of values because they always generate bugs.

I have thought about it again and I have one more con.
Indeed, what you try to optimize is copying records with different
default values.

    for record in records:
        defaults = {…}
        new_record = Model.copy([record], defaults)

But indeed it could be done with quite good performance once the
write-multi is there, like this:

    new_records = Model.copy(records)
    to_write = []
    for record, new_record in zip(records, new_records):
        defaults = {…}
        to_write.extend(([new_record], defaults))
    Model.write(*to_write)

OK, it is a little bit longer and it generates validation twice. But I
think it is good enough.
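That copy-then-multi-write pattern can be sketched end to end with plain dicts (both `copy` and `write` here are stand-ins for the Tryton methods, and `origin` is an illustrative per-record default):

```python
# Stand-ins for Model.copy and the proposed multi-write,
# using dicts as records.
def copy(records):
    return [dict(r, id=r['id'] + 100) for r in records]

def write(*args):
    assert not len(args) % 2
    it = iter(args)
    for recs, vals in zip(it, it):
        for r in recs:
            r.update(vals)

records = [{'id': 1}, {'id': 2}]
new_records = copy(records)

# One write call applying a different defaults dict to each copy.
to_write = []
for record, new_record in zip(records, new_records):
    defaults = {'origin': record['id']}
    to_write.extend(([new_record], defaults))
write(*to_write)
```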

Sergi Almacellas Abellana

Nov 27, 2013, 2:00:56 PM
to tryto...@googlegroups.com
On 25/11/13 22:37, Cédric Krier wrote:
> On 23/11/13 20:49 +0100, Cédric Krier wrote:
>> >On 23/11/13 20:45 +0100, Sergi Almacellas Abellana wrote:
>>> > >Hi All,
>>> > >
>>> > >As Albert suggested on [1] i think that copy may also be improved in
>>> > >the line of [2].
>>> > >
>>> > >This is difficult as a the default parameter is not mandatory.
>>> > >
>>> > >As copy is a simplification of creating new records I will suggest
>>> > >the following improvement:
>>> > >
>>> > >def copy(cls, records, default=None, create=True):
>>> > >....
>>> > > to_create = []
>>> > > for data in datas:
>>> > > data = convert_data(field_defs, data)
>>> > >to_create.append(data)
>>> > > if not create:
>>> > > return to_create
>>> > > new_records = cls.create(to_create)
>>> > >....
>> >
>> >This can not work because of the translations.
>> >Moreover, over time we have removed all the ugly different in and out
>> >type of values because they always generate bugs.
> I have re-think about it and I have one more cons.
> Indeed what you try to optimize is when we copy records with different
> default values.
Indeed, what I try to optimize is copying the same record multiple times.
Something like:

new_records = Model.copy([record]*10, defaults)


>
> for record in records:
> defaults = {…}
> new_record = Model.copy([record], defaults)
>
> But indeed it would be done with quiet good preformance once the
> write-multi is there like this:
>
> new_records = Model.copy(records)
> to_write = []
> for record, new_record in zip(records, new_records):
> default = {…}
> to_write.extend(([new_record], defaults))
> Model.write(*to_write)
>
> Ok it is a little bit longer and it generate twice validation. But I
> think it is pretty good enough.
That makes sense to me.

Cédric Krier

Nov 27, 2013, 3:51:51 PM
to tryto...@googlegroups.com
In this case, I don't see what needs to be optimized. In such a case, copy
will already call create only once, and with the record cache the
computation of the values is quite fast (no DB access).