Slow queries when searching m2m fields in big databases

129 views
Skip to first unread message

Jordi Esteve

unread,
Jun 4, 2015, 1:11:36 PM6/4/15
to tryto...@googlegroups.com
I'm analysing postgres performance in a big database (aprox. 1-2 million
or records of account.invoice.line and stock.move) and I have figured
out that slow queries are executed to get data related to m2m fields (I
suppose the same happens for o2m fields). I think the problem is how the
search_domain is build in these cases. For example, the m2m field
invoice_lines in stock.move model is asked with a domain like that:

[('stock_move', 'in', [1826865, 1826864]), ('invoice_line.id', '!=',
None), ('invoice_line.company', '=', 1)]

The second condition ('invoice_line.id', '!=', None) is added by the
get() of the many2many field.
The third condition ('invoice_line.company', '=', 1) is added by a
company rule.

This domain is converted to the following SQL, and if
account_invoice_line table has 1 million records it is very slow (10-20
seconds) to execute and consumes a lot of postgres resources because the
subqueries returns thousands (nearly million) of records:

SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
"a"."invoice_line" AS "invoice_line", ...
FROM "account_invoice_line-stock_move" AS "a"
LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" =
"a"."invoice_line")
WHERE (("a"."stock_move" IN (1826865, 1826864))
AND ("a"."invoice_line" IN (SELECT "c"."id" AS "id" FROM
"account_invoice_line" AS "c" WHERE ((("c"."id" IS NOT NULL))
AND ("c"."id" IN (SELECT "d"."id" AS "id" FROM
"account_invoice_line" AS "d" WHERE (((("d"."company" = 1))) AND true)))))))
ORDER BY "b"."description" ASC;


I don't know if it will be room to improve how a domain is converted to
an SQL expression. For example, based in the previous example, one like
this that applies the related conditions directly to the JOIN table
instead of the original table with subqueries.

SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
"a"."invoice_line" AS "invoice_line", ...
FROM "account_invoice_line-stock_move" AS "a"
LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" =
"a"."invoice_line")
WHERE (("a"."stock_move" IN (1826865, 1826864))
AND ("b"."id" IS NOT NULL)
AND (("b"."company" = 1) AND true))
ORDER BY "b"."description" ASC;

--
Jordi Esteve
Consultor Zikzakmedia SL
jes...@zikzakmedia.com
Mòbil 679 170 693

Zikzakmedia SL
St. Jaume, 9, baixos, 2a
08720 Vilafranca del Penedès
Tel 93 890 2108

Albert Cervera i Areny

unread,
Jun 4, 2015, 6:20:28 PM6/4/15
to tryto...@googlegroups.com
This query is not equivalent to the generated one, you'd have to make
something like

LEFT JOIN (select .....)

and not use the "b".* in the main where clause, otherwise the LEFT
nature of the join is lost.

Anyway, I don't know how hard it would be to make tryton generate a
better query but if the generated query is very slow it may be because
postgres requires a lot of temporary memory and PostgreSQL is not
properly tuned. You could try increasing work_mem which is usually
extremely low by default.

Also we've experience very poor performance with this kind of queries
with older versions of PostgreSQL <= 9.0 I guess. Newer versions
managed that pretty well.


>
> --
> Jordi Esteve
> Consultor Zikzakmedia SL
> jes...@zikzakmedia.com
> Mòbil 679 170 693
>
> Zikzakmedia SL
> St. Jaume, 9, baixos, 2a
> 08720 Vilafranca del Penedès
> Tel 93 890 2108
>



--
Albert Cervera i Areny
Tel. 93 553 18 03
@albertnan
www.NaN-tic.com

Raimon Esteve

unread,
Jun 5, 2015, 3:44:57 AM6/5/15
to tryto...@googlegroups.com
Yes, it's reconfigured this parammeter (according with hardware):

work_mem = 82MB # min 64kB

> Also we've experience very poor performance with this kind of queries
> with older versions of PostgreSQL <= 9.0 I guess. Newer versions
> managed that pretty well.

It's 9.3

thanks

Cédric Krier

unread,
Jun 5, 2015, 6:40:03 AM6/5/15
to tryto...@googlegroups.com
Agree on point of view of purely SQL but for Tryton they will have the
same result because "id" is required and "invoice_line" is a foreign
key.

> Anyway, I don't know how hard it would be to make tryton generate a
> better query.

I think both queries have their advantage indeed it depends on the size
of each tables.
In our case, a join is faster because the database will make a join
between a small set of tuple from "account_invoice_line-stock_move" with
a large table "account_invoice_line". Instead of retrieving almost all
the tuples from "account_invoice_line" to make the IN test.
But if the table size were different then a IN clause could have better
performance.

So I think the improvement should add a parameter on the Many2One
definition to define which kind of "nested" search strategy it should
use.

--
Cédric Krier - B2CK SPRL
Email/Jabber: cedric...@b2ck.com
Tel: +32 472 54 46 59
Website: http://www.b2ck.com/

Jordi Esteve

unread,
Jun 5, 2015, 7:43:04 AM6/5/15
to tryto...@googlegroups.com
Yes, you are right, with several GB of RAM assigned to postgres and
postgres tunned correctly the time consumed decreases drastically. I was
doing tests in my local PC with lower resources and I was surprised with
this behaviour.

I don't know if it is worth improve this queries with Cedric's
suggestion of defining the kind of "nested" search strategy to use, it
will make code for search_domain() more complex. But solving the problem
by more RAM resources + tunning postgres hides the real problem.

Cédric Krier

unread,
Jun 6, 2015, 5:30:03 AM6/6/15
to tryto...@googlegroups.com
I wrote a quick POC of this patch at https://bugs.tryton.org/issue4798
It will be great to get some timing report between both options on large
database.

Cédric Krier

unread,
Jun 7, 2015, 5:00:03 PM6/7/15
to tryto...@googlegroups.com
Indeed it is no more a POC but a patch ready for review and testing.

Jordi Esteve

unread,
Jun 11, 2015, 2:52:52 PM6/11/15
to tryto...@googlegroups.com
I have tested the duplication and reception of supplier shipments in a
database with 1,200,000 invoice lines in my laptop (old machine with a
postgres tuned to use 400MB).

Before applying the patch this query was generated which takes 25864.758
ms - 28027.834 ms (between 26 and 28 seconds):

SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
"a"."invoice_line" AS "invoice_line", ...
FROM "account_invoice_line-stock_move" AS "a"
LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = "a"."invoice_line")
WHERE (("a"."stock_move" IN (1826865, 1826864))
AND ("a"."invoice_line" IN (SELECT "c"."id" AS "id" FROM
"account_invoice_line" AS "c" WHERE ((("c"."id" IS NOT NULL))
AND ("c"."id" IN (SELECT "d"."id" AS "id" FROM "account_invoice_line" AS
"d" WHERE (((("d"."company" = 1))) AND true)))))))
ORDER BY "b"."description" ASC;

After applying the patch and setting target_search='join' in the m2o
field invoice_line in account.invoice.line-stock.move table, this query
was generated without the subquery which takes only 6.708 ms:

SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
"a"."invoice_line" AS "invoice_line", ...
FROM "account_invoice_line-stock_move" AS "a"
LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = "a"."invoice_line")
WHERE (("a"."stock_move" IN (1826865, 1826864))
AND ((("b"."id" IS NOT NULL))
AND ("b"."id" IN (SELECT "c"."id" AS "id" FROM "account_invoice_line" AS
"c" WHERE (((("c"."company" = 1))) AND true)))))))
ORDER BY "b"."description" ASC;


As we can see it is nearly 4000 times faster, so the improvement is
impressive. I suppose testing in a server with more and better resources
the difference will not be so high, but IMHO really worth it.

By the way, the default option in the new parameter 'target_search' for
m2o fields is 'subquery'. Maybe is it better to set 'join' by default?
If not, then the m2o relations suspected to have a lot of records like
move lines, invoice lines, sale lines, stock moves, ... should have the
'join' parameter set in the tryton modules?

Cédric Krier

unread,
Jun 11, 2015, 3:15:02 PM6/11/15
to tryto...@googlegroups.com
Thanks for the report.

> By the way, the default option in the new parameter 'target_search' for m2o
> fields is 'subquery'. Maybe is it better to set 'join' by default? If not,
> then the m2o relations suspected to have a lot of records like move lines,
> invoice lines, sale lines, stock moves, ... should have the 'join' parameter
> set in the tryton modules?

I will explain why I kept 'subquery' as default. On small database,
there will be almost no difference with the 'join'. On larger database,
in many cases the query planner could infer the 'join' if it worths it
base on its statistics. But the 'in' join could be worse on specific
case.

So for me, we must explicitly set the 'target_search' after some
thoughts just like the 'ondelete'. We will have to make an issue after
this one to search on each module the Many2One to fix.

Jordi Esteve

unread,
Jun 11, 2015, 3:37:55 PM6/11/15
to tryto...@googlegroups.com
On 11/06/15 21:10, Cédric Krier wrote:
>> By the way, the default option in the new parameter 'target_search' for m2o
>> fields is 'subquery'. Maybe is it better to set 'join' by default? If not,
>> then the m2o relations suspected to have a lot of records like move lines,
>> invoice lines, sale lines, stock moves, ... should have the 'join' parameter
>> set in the tryton modules?
> I will explain why I kept 'subquery' as default. On small database,
> there will be almost no difference with the 'join'. On larger database,
> in many cases the query planner could infer the 'join' if it worths it
> base on its statistics. But the 'in' join could be worse on specific
> case.

If on small database there is almost no difference between the two
methods and on larger databases the 'in' joint is worse, I don't see the
advantage to keep the 'subquery' method as the default one.

I make the question in another way: Which are the advantages of using
'subquery' method instead of 'join' method?


> So for me, we must explicitly set the 'target_search' after some
> thoughts just like the 'ondelete'. We will have to make an issue after
> this one to search on each module the Many2One to fix.
Ok, but we must have a good criteria to choose one of the two methods in
each case.

Cédric Krier

unread,
Jun 11, 2015, 6:00:16 PM6/11/15
to tryto...@googlegroups.com
On 2015-06-11 21:37, Jordi Esteve wrote:
> On 11/06/15 21:10, Cédric Krier wrote:
> >>By the way, the default option in the new parameter 'target_search' for m2o
> >>fields is 'subquery'. Maybe is it better to set 'join' by default? If not,
> >>then the m2o relations suspected to have a lot of records like move lines,
> >>invoice lines, sale lines, stock moves, ... should have the 'join' parameter
> >>set in the tryton modules?
> >I will explain why I kept 'subquery' as default. On small database,
> >there will be almost no difference with the 'join'. On larger database,
> >in many cases the query planner could infer the 'join' if it worths it
> >base on its statistics. But the 'in' join could be worse on specific
> >case.
>
> If on small database there is almost no difference between the two methods
> and on larger databases the 'in' joint is worse, I don't see the advantage
> to keep the 'subquery' method as the default one.
>
> I make the question in another way: Which are the advantages of using
> 'subquery' method instead of 'join' method?

They are faster if the target table is small.
And I bet that we have more small table than big one because small table
are the referential while big table are the operational and I think we
have much more referential Model than operational (we try to reuse the
operational).

> >So for me, we must explicitly set the 'target_search' after some
> >thoughts just like the 'ondelete'. We will have to make an issue after
> >this one to search on each module the Many2One to fix.
> Ok, but we must have a good criteria to choose one of the two methods in
> each case.

Yes for me, if the target is a table that is growing overtime then
'join' must be used otherwise 'subquery'.

Cédric Krier

unread,
Jun 12, 2015, 3:45:04 AM6/12/15
to tryto...@googlegroups.com
On 2015-06-11 21:55, Cédric Krier wrote:
> On 2015-06-11 21:37, Jordi Esteve wrote:
> > On 11/06/15 21:10, Cédric Krier wrote:
> > >>By the way, the default option in the new parameter 'target_search' for m2o
> > >>fields is 'subquery'. Maybe is it better to set 'join' by default? If not,
> > >>then the m2o relations suspected to have a lot of records like move lines,
> > >>invoice lines, sale lines, stock moves, ... should have the 'join' parameter
> > >>set in the tryton modules?
> > >I will explain why I kept 'subquery' as default. On small database,
> > >there will be almost no difference with the 'join'. On larger database,
> > >in many cases the query planner could infer the 'join' if it worths it
> > >base on its statistics. But the 'in' join could be worse on specific
> > >case.
> >
> > If on small database there is almost no difference between the two methods
> > and on larger databases the 'in' joint is worse, I don't see the advantage
> > to keep the 'subquery' method as the default one.
> >
> > I make the question in another way: Which are the advantages of using
> > 'subquery' method instead of 'join' method?
>
> They are faster if the target table is small.
> And I bet that we have more small table than big one because small table
> are the referential while big table are the operational and I think we
> have much more referential Model than operational (we try to reuse the
> operational).

I made some tests and indeed joins even on large table are not so
expensive if there is a where clause that reduces a lot the rows to join
on because pg can filter before doing the join.
So I made 'join' the default target_search method.

Guillem Barba Domingo

unread,
Jun 19, 2015, 2:56:48 AM6/19/15
to Tryton Dev
good job
--
Reply all
Reply to author
Forward
0 new messages