Automatically remove unused LEFT JOIN from list count database queries

77 views
Skip to first unread message

Alexander Obuhovich

unread,
Jan 25, 2011, 5:45:00 AM1/25/11
to In-Portal Development
That's not a rare case, when data from referenced table should be displayed along with data from table being viewed.

To do that "LEFT JOIN" clause is added a SELECT query being used for data retrieval from database.

That's not a bad habit actually. But, when table contains 1 million records and has 5 LEFT JOINs on it, then it really slows COUNT database query, used to calculate pagination for that grid.

I propose that we dynamically remove "LEFT JOIN" parts of database query if JOINed table isn't used anywhere in that database query.

For example this query doesn't use JOINed table at all, so we can remove it automatically:

SELECT table1.*
FROM table1
LEFT JOIN on table2 ON table1.field = table2.field
WHERE table1.field = 'test';


Ready for testing.

remove_unused_left_joins_from_list_count_sql.patch

Dmitry A.

unread,
Jan 28, 2011, 10:44:54 AM1/28/11
to in-por...@googlegroups.com
Hi Alex,


First of all, thanks for the patch!


I suppose we are talking about Admin part at all?

Also, this accounts for Catalog Permissions if we are counting Category Items. Please correct me if I am wrong.


DA

Alexander Obuhovich

unread,
Jan 28, 2011, 12:13:25 PM1/28/11
to in-por...@googlegroups.com
This will affect both admin and front-end. Only SQLs, that are used to get data to be displayed in grids are affected.

Haven't tested, when subselect is used.

Dmitry A.

unread,
Jan 28, 2011, 3:54:44 PM1/28/11
to in-por...@googlegroups.com
Can you give me an example of SUB-SELECT in Grids.

I don't think we are having any at the moment in default installation, do we?


DA

Alexander Obuhovich

unread,
Jan 29, 2011, 3:55:44 AM1/29/11
to in-por...@googlegroups.com
We don't, but you can create a calculated field with such sql:

'GroupCount' => 'SELECT COUNT(*) FROM ' . TABLE_PREFIX . 'UserGroup ug LEFT JOIN ' . TABLE_PREFIX .'PortalGroup g ON g.GroupId = ug.GroupId WHERE ug.PortalUserId = %1$s.PortalUserId'

It should show user membership group count for user.


er...@intechnic.lv

unread,
Mar 7, 2011, 6:23:52 AM3/7/11
to in-por...@googlegroups.com
Very questionable in its usefulness patch. Has anyone measured how many milliseconds it allows you to win / save?

Test on the table with 2.7 million entries. The result - 485.529 records.

SELECT c . *
FROM `inp_Cities` c
WHERE c.Country
IN (
'RU', 'US', 'CN', 'IN', 'JP'
)

SELECT c . *
FROM inp_Cities c
LEFT JOIN inp_Regions r ON c.State = r.RegionId
WHERE c.Country
IN (
'RU', 'US', 'CN', 'IN', 'JP'
)

Both requests are equally fast - for 0.004 seconds.

Alexander Obuhovich

unread,
Mar 7, 2011, 7:08:24 AM3/7/11
to in-por...@googlegroups.com
As I've pointed out in this discussion title (maybe you haven't noticed), but what I've optimized is only COUNT SQL. When counting it does improve execution time. When doing "select *" of course it doesn't.

You really think, that I will be positing patch to a group, that does nothing?

er...@intechnic.lv

unread,
Mar 9, 2011, 5:27:40 AM3/9/11
to in-por...@googlegroups.com
You really think, that I will be positing patch to a group, that does nothing?

Nothing personal - i got special invitation from Dmitry A. to comment this patch.

My further experiments shows that count sqls with unused joins are equally fast too -


SELECT COUNT(CityId)

FROM `inp_Cities` c
WHERE c.Country
IN (
'RU', 'US', 'CN', 'IN', 'JP'
)

SELECT COUNT(CityId)

FROM inp_Cities c
LEFT JOIN inp_Regions r
ON
    c.Country = r.Country
    AND c.State = r.Code

WHERE c.Country
IN (
'RU', 'US', 'CN', 'IN', 'JP'
)

Please give me example from real In-Portal system tables where SQL execution times with/without joins significantly differs.

Alexander Obuhovich

unread,
Mar 9, 2011, 5:43:52 AM3/9/11
to in-por...@googlegroups.com
Project, where I tested it has large speed improvements. Then miracle happened, since according to your tests this shouldn't be happening at all :)

I have nothing more to add, since you don't see my point anyway.

er...@intechnic.lv

unread,
Mar 9, 2011, 6:24:27 AM3/9/11
to in-por...@googlegroups.com
Can I see table structures and SQLs which makes "miracle"? Why it is initially defined as "That's not a rare case", but now this is some secret very special project?

Dmitry A.

unread,
Mar 9, 2011, 9:27:29 AM3/9/11
to in-por...@googlegroups.com
Hi Erik,


I guess it's time to look at the actual data if you don't mind.

Please Export your 3-4 tables with data, archive and upload to the DEV server so we can take a closer look.


Cheers!


DA

er...@intechnic.lv

unread,
Mar 9, 2011, 10:37:44 AM3/9/11
to in-por...@googlegroups.com
You can launch my SQLs as is at resumark dev database.

er...@intechnic.lv

unread,
Mar 9, 2011, 10:43:24 AM3/9/11
to in-por...@googlegroups.com
I suppose that Alex must provide proof  of his patch's usefulness by uploading actual data to DEV and by posting SQL requests here.

Dmitry Andrejev

unread,
Mar 9, 2011, 11:00:18 AM3/9/11
to in-por...@googlegroups.com
Hi Erik,


As a matter of fact I have tested Alex's patch. I can tell you right away that I did NOT have a table of 1 million records with JOIN on 1 million which actually will make things pretty hard for any optimization. The patch does quite a good improvement when you have VERY large main table (ie. Users with 2-3+ mil. records, Links) and JOINs on other tables that can be removed since are NOT critical for the Counts.

Make sense?


As far as I have learned from different resources including Zend forums and webminars JOINing very big tables is not always a good idea and developer should consider adjusting his Table architecture in order to get desired results or running multiple queries.

We'll be happy to look at your particular case to see what we can offer.


DA

er...@intechnic.lv

unread,
Mar 9, 2011, 11:34:42 AM3/9/11
to in-por...@googlegroups.com
In my patricular case there no problems - main table is 2.7 mln records, and is pretty good optimized with join and without join.

If goal of this patch is fast count query, then good testing must show significant difference between "with patch" and "without patch" - in seconds.

Dmitry A.

unread,
Mar 9, 2011, 12:16:56 PM3/9/11
to in-por...@googlegroups.com
Hi Erik,


I have looked at your particular SQLs and have nothing to add there is nothing to add there since Yes the patch is not applying ANY particular changes to them due to no need - at least I believe so!

Now take a look at this query, it should familiar:

SELECT %1$s.* %2$s FROM %1$s
 LEFT JOIN '.TABLE_PREFIX.'UserGroup ug ON %1$s.PortalUserId = ug.PortalUserId AND ug.PrimaryGroup = 1
 LEFT JOIN '.TABLE_PREFIX.'PortalGroup g ON ug.GroupId = g.GroupId
 LEFT JOIN '.TABLE_PREFIX.'%3$sPortalUserCustomData cust ON %1$s.ResourceId = cust.ResourceId
 LEFT JOIN '.TABLE_PREFIX.'%3$sImages img ON img.ResourceId = %1$s.ResourceId AND img.DefaultImg = 1',


Try having 2 million Users there (which will have 2 mil. Custom Data records) and quote a few records in Images table (general table). Now comes the question do you really need that JOIN on CustomData and Images table for Counts? I don't think so - this is what Patch does - removes it from there in case if NONE of these tables are involved in current Filters. Simply try this scenario.

Let me know if it makes sense to you?


DA



er...@intechnic.lv

unread,
Mar 9, 2011, 2:33:46 PM3/9/11
to in-por...@googlegroups.com
I believe that on this structure we'll got difference in query processing time by removing joins. Root of the problem is that some joins are wrong. If you don't want solve problem cardinally - then use this particular patch, and tons of  other patches. But right solution is in right data structure.

I already recommended store PrimaryGroupId to the PortalUser table. By analogy there is required adding DefaultImageId field to the PortalUser table. And then you got right structure, that will be faster, with joins, without joins, with filters and without filters.

In current u prefix there is not problem "leak unused joins", there is other problem - wrong joins. And thay are wrong always - when used, and when unused.


Dmitry Andrejev

unread,
Mar 9, 2011, 2:47:11 PM3/9/11
to in-por...@googlegroups.com
Erik, we are happy to make changes to the system and improve currently used JOINs not only for Users but for all other Prefixes!

Take a moment and carefully read your original post where you proposed to disable some things from the Grid to make things work differently for counts. From my point of view it can be considered as a Work-around as well as Alex's patch.

I am sure you can notice that here in Groups and Issue Tracker that we are constantly working on improving the structure of Database and the Code.

Said that - we'll be happy to have you on board and help with improving the ROOT problems that we all face in In-Portal.

By all means, please start a new task in Issues Tracker and help In-Portal by improving those JOINs!


Cheers!


DA
--


Best regards,

Dmitry A.

er...@intechnic.lv

unread,
Mar 10, 2011, 12:11:33 AM3/10/11
to in-por...@googlegroups.com
This Alex's patch is effective only in case when joins are incorrect.

I already created task, which include suggestions to change data structure and to solve SQL performance related problems, where structural changes will not help - https://groups.google.com/forum/#!topic/in-portal-dev/J9pnGGWxhK0

Dmitry Andrejev

unread,
Mar 10, 2011, 10:44:21 AM3/10/11
to in-por...@googlegroups.com
Hi Erik,


Well let's say that incorrectly setup JOINS were inherited with the software.

Now let's say we have fixed this SQL by doing you have suggested and will result in something like this:


SELECT %1$s.* %2$s FROM %1$s
 LEFT JOIN '.TABLE_PREFIX.'%3$sPortalUserCustomData cust ON %1$s.ResourceId = cust.ResourceId


Now you have the case with 1 million records in Users table and 1 million in PortalUserCustomData table. Now you can keep the JOIN as is or you can remove it when it's NOT needed (columns from Custom table not used in WHERE at all). Obviously, it will run faster, correct?

Now, you say it's the only SQL like this, unfortunately NO - we have it for each CategoryItems and in fact we can have it in other places. The matter of fact is that we have quite a Complex Tables structures which definitely can be optimized and re-architectured, but we need make sure we don't optimize in a way when we loose functionality.

Erik, I hope we are agree on this? If, you have a different opinion on what I have said above - you are welcome to explain why!


Thanks for starting the new discussion ( https://groups.google.com/forum/#!topic/in-portal-dev/J9pnGGWxhK0 ) - it's very big step and we always look forward for ideas. We'll definitely continue talking about what you have posted in that discussion.


We are here to talk and not to fight - I am sure we all understand that.

-- 


Best regards,

Dmitry A.

er...@intechnic.lv

unread,
Mar 11, 2011, 7:17:44 AM3/11/11
to in-por...@googlegroups.com
PortalUserCustomData join also is wrong. To improve it is enough replace ResourceId index with unique one in PortalUserCustomData table. Then this particular patch will have no effect. No dependance of record count. In unique index case SQL server "understands" that relation is "one to one", so no matter how much records are in left joined table.

Alexander Obuhovich

unread,
Mar 11, 2011, 7:34:18 AM3/11/11
to in-por...@googlegroups.com
I see, that your comment isn't related to this discussion, but rather to another discussion about database optimizations. Please post it there instead.


On Fri, Mar 11, 2011 at 2:17 PM, er...@intechnic.lv <er...@intechnic.lv> wrote:
PortalUserCustomData join also is wrong. To improve it is enough replace ResourceId index with unique one in PortalUserCustomData table. Then this particular patch will have no effect. No dependance of record count. In unique index case SQL server "understands" that relation is "one to one", so no matter how much records are in left joined table.




er...@intechnic.lv

unread,
Mar 11, 2011, 8:07:58 AM3/11/11
to in-por...@googlegroups.com
No, this post is about this particular patch. - "Then this particular patch will have no effect." - Please read text carefully.

Dmitry A.

unread,
Mar 11, 2011, 11:10:11 AM3/11/11
to in-por...@googlegroups.com
Hi Erik,

I think we have explained where and when this Patch kicks-in. If it doesn't work for your particular SQL it doesn't mean it won't work for mine. The example that were shown here are from current default In-Portal installation. I can easily write here a SQL which is custom and will apply to it, and it's going to be correct.

We can continue argue about current SQLs in In-Portal, but again I strongly believe it has NO relevance to this patch since the patch does VERY simple and straight forward job:

"Remove UNUSED Joins from COUNT SQL when it's possible".


I have started a new discussion particularly to cover "Reviewing & Optimizing Database Queries".


DA

er...@intechnic.lv

unread,
Mar 25, 2011, 7:22:23 AM3/25/11
to in-por...@googlegroups.com

"Remove UNUSED Joins from COUNT SQL when it's possible"

This patch is not needed as the work of this patch is done fine by MySQL-server - MySQL-server itself ignores joins that not alter the result of the SQL-query. The positive effect of this patch can be observed only in the case of incorrect (BUG) data structure. In such cases, the patch is also not needed - instead, should be corrected BUGs in the data structure.

Alexander Obuhovich

unread,
Mar 25, 2011, 7:48:08 AM3/25/11
to in-por...@googlegroups.com
Every participant can have his own opinion. That's how it works. Naturally, that not always they match.

I'm won't be convincing you any further on this subject (no matter, that colleagues of mine have witnessed performance improvement after this patch was applied).


Please do apply your knowledge of PHP and In-Portal into existing/new discussions, rather then trying to deny obvious facts (that aren't so obvious as it seems to you).


 

On Fri, Mar 25, 2011 at 1:22 PM, er...@intechnic.lv <er...@intechnic.lv> wrote:

"Remove UNUSED Joins from COUNT SQL when it's possible"

This patch is not needed as the work of this patch is done fine by MySQL-server - MySQL-server itself ignores joins that not alter the result of the SQL-query. The positive effect of this patch can be observed only in the case of incorrect (BUG) data structure. In such cases, the patch is also not needed - instead, should be corrected BUGs in the data structure.



er...@intechnic.lv

unread,
Mar 25, 2011, 9:31:18 AM3/25/11
to in-por...@googlegroups.com, Alexander Obuhovich
colleagues of mine have witnessed performance improvement after this patch was applied
Of course, removal of buggy joins will improve performance. But right way is - replace buggy joins with correct ones.
 

Dmitry A.

unread,
Mar 25, 2011, 10:56:37 AM3/25/11
to in-por...@googlegroups.com
Hi Erik,

I am happy that we finally found our MySQL expert that can do things right. I really hope that we move forward with this.

I see that you already started posting in Reviewing & Optimizing Database Queries discussion - VERY good!

DA

Dmitry A.

unread,
Sep 11, 2011, 11:33:33 PM9/11/11
to in-por...@googlegroups.com
This patch has been reviewed and successfully tested.

Works well on the Front-end and Admin SQLs


DA
Reply all
Reply to author
Forward
0 new messages