Cascading innerJoins gives incorrect results

25 views
Skip to first unread message

Ferenc Collins

unread,
Apr 22, 2019, 12:54:47 PM4/22/19
to lovefield-users
I think I have answered this myself, but I just want to check with you guys that my logic is correct, and maybe this will help other users having similar problems.

Here is my original code snippet:

db.select()
 
.from(db_users)
 
.innerJoin(
 db_organisations_users
,
 db_users
.id.eq(db_organisations_users.user_link)
 
)
 
.innerJoin(
 db_organisations
,
 db_organisations
.id.eq(db_organisations_users.organisation_link)
 
)
 
.where(db_organisations_users.organisation_link.eq(this.orgId))
 
.exec()
 
.then(function(rows){
 console
.log("usersShow rows",rows);
 
});

This does not produce correct results, as I receive about 20 results instead of 10, and on 10 of them, db_organisations_users.user_link != db_users.id.

However, swapping the innerJoins around like this:

db.select()
 
.from(db_users)
 
.innerJoin(
 db_organisations
,
 db_organisations
.id.eq(db_organisations_users.organisation_link)
 
)
 
.innerJoin(
 db_organisations_users
,
 db_users
.id.eq(db_organisations_users.user_link)
 
)
 
.where(db_organisations_users.organisation_link.eq(this.orgId))

 
.exec()
 
.then(function(rows){
 console
.log("usersShow rows",rows);
 
});

...appears at first sight to produce the results I'm after, but I am wary that as I accrue more data, a crack will also appear. So at this point I'm thinking to myself I'm probably doing something wrong.

Then I note that in the docs under 4.1 SELECT Query Builder it states:

"All functions provided by select query, except orderBy(), can only be called once, otherwise an exception will be raised."

Ah, so I'm calling .innerJoin twice, and therefore getting unexpected (and unsupported) results. So if I try:

db.select()
 
.from(db_users)
 
.leftOuterJoin(
 db_organisations_users
,
 db_users
.id.eq(db_organisations_users.user_link)
 
)
 
.innerJoin(
 db_organisations
,
 db_organisations
.id.eq(db_organisations_users.organisation_link)
 
)
 
.where(db_organisations_users.organisation_link.eq(this.orgId))
 
.exec()
 
.then(function(rows){
 console
.log("usersShow rows",rows);
 
});

...bearing in mind that in my db there will always be a relationship between db_users.id and db_organisations_users.user_link, then .leftOuterJoin() will net the same result as .innerJoin() would, and so (hopefully) this is how to solve this kind of problem.

Hopefully.

However, if I want to run a more complex query involving a third or fourth join, I probably can't, can I?

dpa...@chromium.org

unread,
Apr 29, 2019, 8:13:41 PM4/29/19
to lovefield-users
Sounds that we fail to catch an invalid query during the query building process. Out of curiosity, what does explain() and toSql() produce for this query? Maybe worth filing a bug https://github.com/google/lovefield/issues so that we can tighten those checks.
 

Ah, so I'm calling .innerJoin twice, and therefore getting unexpected (and unsupported) results. So if I try:

db.select()
 
.from(db_users)
 
.leftOuterJoin(
 db_organisations_users
,
 db_users
.id.eq(db_organisations_users.user_link)
 
)
 
.innerJoin(
 db_organisations
,
 db_organisations
.id.eq(db_organisations_users.organisation_link)
 
)
 
.where(db_organisations_users.organisation_link.eq(this.orgId))
 
.exec()
 
.then(function(rows){
 console
.log("usersShow rows",rows);
 
});

...bearing in mind that in my db there will always be a relationship between db_users.id and db_organisations_users.user_link, then .leftOuterJoin() will net the same result as .innerJoin() would, and so (hopefully) this is how to solve this kind of problem.


I'll have to look a bit closer to see if this is a valid query, and whether Lovefield actually supports that, or hits undefined behavior. Could you post the equivalent (valid) SQL query? That would help investigating.
 
Hopefully.

However, if I want to run a more complex query involving a third or fourth join, I probably can't, can I?

Lovefield supports multi join queries, see for example these tests at https://github.com/google/lovefield/blob/master/tests/proc/end_to_end_select_tester.js#L664-L691. Having said that, need to investigate if the combination of leftOuterJoin and innerJoin presented earlier is handled (not sure).
 

dpa...@chromium.org

unread,
Apr 29, 2019, 8:16:25 PM4/29/19
to lovefield-users
Also looking at the example tests I linked in previous message, calling innerJoin() multiple times seems like a valid case.


On Monday, April 22, 2019 at 9:54:47 AM UTC-7, Ferenc Collins wrote:

Ferenc Collins

unread,
May 2, 2019, 11:10:02 AM5/2/19
to lovefield-users
During the course of filing an issue as you suggested, I discovered that wherever db_organisations_users.user_link != db_users.id , db_organisations_users.user_link was actually a number, not a string of a number - a result of a bad database conversion where I was previously using auto-inc numbering for IDs and now I'm using something akin to uuid, but shorter. Some of the data is incomplete and so the original numbers remain in the database as strings on the mysql, but when they're transferred to the lovefield db, I don't have type checking on import, because this wouldn't occur on a production database. So I know to delete those rows now. Anyway, to dot the i's and cross the t's, then here are my answers to your queries:

Here is my original code snippet, having substituted the "orgId" variable:

```
db.select()
 .from(db_users)
 .innerJoin(
 db_organisations_users,
 db_users.id.eq(db_organisations_users.user_link)
 )
 .innerJoin(
 db_organisations,
 db_organisations.id.eq(db_organisations_users.organisation_link)
 )
 .where(db_organisations_users.organisation_link.eq("0gfvxSqBh2ne1PUO"))
 .exec()
 .then(function(rows){
 console.log("usersShow rows",rows);
 });
```
This returns 15 rows in lovefield, many of which are duplicates and db_organisations_users.user_link != db_users.id

You can see from the code that my original reference SQL is:
```
select * from users
inner join organisations_users on users.id=organisations_users.user_link
inner join organisations on organisations.id=organisations_users.organisation_link
where organisation_link="0gfvxSqBh2ne1PUO"
```
Running that on my source MySQL instance, this only returns 7 rows

.toSql() returns:
```
SELECT * FROM users, organisations_users, organisations WHERE (organisations_users.organisation_link = '0gfvxSqBh2ne1PUO') AND ((organisations.id = organisations_users.organisation_link) AND (users.id = organisations_users.user_link));
```
executing that SQL on MySQL gives 7 rows

.explain() gives:
```
project()
-join(type: inner, impl: index_nested_loop, join_pred(organisations.id eq organisations_users.organisation_link))
--join(type: inner, impl: index_nested_loop, join_pred(users.id eq organisations_users.user_link))
---no_op_step(users)
---table_access_by_row_id(organisations_users)
----index_range_scan(organisations_users.organisation_link_idx, [0gfvxSqBh2ne1PUO, 0gfvxSqBh2ne1PUO], natural)
--no_op_step(organisations)
```

So for the time being I have not filed this as an issue, but would love to hear your thoughts.
Reply all
Reply to author
Forward
0 new messages