Query.findList() method sets distinct to true whenever a join is performed.

1,460 views
Skip to first unread message

aksha...@gmail.com

unread,
Dec 9, 2014, 8:05:37 AM12/9/14
to eb...@googlegroups.com
Hi everyone, 

I am using 'Query' (Expression List) in Play framework.

Whenever I perform a join as shown below, the findList() method sets distinct to true even if I set distinct to false for the query.

ExpressionList<Table1> query = Table1.find.where();
query = query.eq("Table2.somePropertyName", propertyName);
List<Table1> results = query.setDistinct(false).findList();

Where, 
Table1 extends Model and find is a ebean.Model.Finder defined as
public static Finder<String, Table1> find = new Finder<String, Table1>(String.class, Table1.class);
and there is a OneToMany relationship from Table1 to Table2

The resulting sql query generated is something like this:
SELECT DISTINCT id, ... 
FROM Table1 t1 JOIN Table2 t2
WHERE t1.id = t2.id AND t2.somePropertyName = "value";

The findList method seems to be setting distinct to true for all the join queries.A join shouldn't have a DISTINCT unless specified. I am unable to figure out a reason why findList is setting DISTINCT for the join query?
Please correct me if I am wrong.

My understanding of the code behind findList:
The findList method eventually makes a call to createQueryRequest which then sets distinct to true if 'manyWhereJoins' is not empty. I do not know what 'manyWhereJoins' here means. However, what I understood from the code is that it checks if the Property Name(in a join case like "Table2.somePropertyName") has a dot(.) and if it does contain a dot, it sets distinct to true.

using avaje ebeanorm 3.2.2

Thanks
Akshay

 

Rob Bygrave

unread,
Dec 9, 2014, 8:46:03 AM12/9/14
to ebean@googlegroups
The findList method seems to be setting distinct to true for all the join queries

Well not on all joins no it doesn't.  It does when it needs to based on cardinality. Table2 is a @OneToMany from Table1 right and that is the important point and why the distinct is added.  I can see also by looking at the table alias - you'll see the join alias in question starts with 'u' and not 't' (it is a join that can only be used for the predicate and not for 'fetching').

If you take the generated SQL, execute it in some query tool, then take away the distinct and execute it again it will make sense.  


Like:


    List<Order> orders = Ebean.find(Order.class)
            .select("id, status, orderDate") // just some columns to that the sql is a bit smaller
            .where().gt("details.orderQty", 0)    // predicate on a @OneToMany property
            .findList();


    // just run it raw to get a flat sql resultSet
    List<SqlRow> list0 = Ebean.createSqlQuery("select distinct t0.id , t0.status , t0.order_date "
            +" from o_order t0 join o_order_detail u1 on u1.order_id = t0.id  where u1.order_qty > 0")
            .findList();

    // now remove the distinct ...
    List<SqlRow> list1 = Ebean.createSqlQuery("select t0.id , t0.status , t0.order_date "
            +" from o_order t0 join o_order_detail u1 on u1.order_id = t0.id  where u1.order_qty > 0")
            .findList();


    // this will output for my case 3 rows
    System.out.println("list0: "+list0);


    // this will output for my case 7 rows ... and you will see duplicate rows
    System.out.println("list1: "+list1);


So then you've got to figure out why the distinct (3 rows) is actually what we want (and that the 7 rows with duplicates is actually not what we want).  So yes, Ebean is adding the distinct based on the fact it knows the predicate is on a  OneToMany (or ManyToMany) path - the important point being the cardinality of the relationship.



02:30:22.976 [main] TRACE org.avaje.ebean.SQL - txn[1004] select distinct t0.id c0, t0.status c1, t0.order_date c2 from o_order t0 join o_order_detail u1 on u1.order_id = t0.id  where u1.order_qty > ? ; --bind(0)
02:30:22.988 [main] DEBUG org.avaje.ebean.SUM - txn[1004] FindMany type[Order] origin[C6KxRF.DWeHD5.BvQ02m] exeMicros[25385] rows[3] name[] predicates[u1.order_qty > ? ] bind[0]
02:30:22.992 [main] TRACE org.avaje.ebean.TXN - txn[1004] Commit - query only
02:30:22.999 [main] TRACE org.avaje.ebean.SQL - txn[1005] select distinct t0.id , t0.status , t0.order_date  from o_order t0 join o_order_detail u1 on u1.order_id = t0.id  where u1.order_qty > 0; --bind()
02:30:23.016 [main] DEBUG org.avaje.ebean.SUM - txn[1005] SqlQuery  rows[3] time[18] bind[]
02:30:23.017 [main] TRACE org.avaje.ebean.TXN - txn[1005] Commit - query only
02:30:23.019 [main] TRACE org.avaje.ebean.SQL - txn[1006] select t0.id , t0.status , t0.order_date  from o_order t0 join o_order_detail u1 on u1.order_id = t0.id  where u1.order_qty > 0; --bind()
02:30:23.031 [main] DEBUG org.avaje.ebean.SUM - txn[1006] SqlQuery  rows[7] time[12] bind[]
02:30:23.032 [main] TRACE org.avaje.ebean.TXN - txn[1006] Commit - query only
list0: BeanList size[3] list[{id=2, status=2, order_date=2014-12-10}, {id=1, status=0, order_date=2014-12-10}, {id=3, status=3, order_date=2014-12-10}]
list1: BeanList size[7] list[{id=1, status=0, order_date=2014-12-10}, {id=1, status=0, order_date=2014-12-10}, {id=1, status=0, order_date=2014-12-10}, {id=2, status=2, order_date=2014-12-10}, {id=3, status=3, order_date=2014-12-10}, {id=3, status=3, order_date=2014-12-10}, {id=3, status=3, order_date=2014-12-10}]




Cheers, Rob.



--

---
You received this message because you are subscribed to the Google Groups "Ebean ORM" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ebean+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

aksha...@gmail.com

unread,
Dec 9, 2014, 10:40:09 AM12/9/14
to
Hi Rob, 

Thanks a lot for the early reply.

I take your point that it is a OneToMany relationship from Table1 to Table2 and the resulting join query may have duplicates.

But, why does the method findList have to set distinct automatically. Shouldn't it be the other way around? That is, the user specifying if the query requires a distinct. 

Is there a way to disable DISTINCT?

Akshay.

Rob Bygrave

unread,
Dec 9, 2014, 3:25:34 PM12/9/14
to ebean@googlegroups

In this case ... at the SQL level we need the distinct so yes Ebean is turning that on and it must do that or else we'd have a problem.

It is really the other way around in that Ebean takes note of setDistinct(true) ... and not setDistinct(false). Now setDistinct(true) didn't work as people expected until version 4.2.0 - refer to https://github.com/ebean-orm/avaje-ebeanorm/issues/158


In 4.2.0 or later you can use setDistinct(true) like:

  Customer.find.select("name").setDistinct(true).findList();

...  this is a special query in the ORM sense in that this then excludes the @Id property and that means that the beans returned can not longer be persisted or invoke lazy loading. If you are stuck on 3.x so you can't do this query and would need to drop back to a SqlQuery.



In short it's the other way ... setDistinct(true) is the developer telling ebean to drop the @Id property out of the query (and accept the consequences of no lazy loading and no persisting etc) and this works in version 4.2.0 and later.

Hopefully that clarifies it a bit.  


Cheers, Rob.


On 10 December 2014 at 04:40, <aksha...@gmail.com> wrote:
Hi Bob, 

Thanks a lot for the early reply.

I take your point that it is a OneToMany relationship from Table1 to Table2 and the resulting join query may have duplicates.

But, why does the method findList have to set distinct automatically. Shouldn't it be the other way around? That is, the user specifying if the query requires a distinct. 

Is there a way to disable DISTINCT?

Akshay.

On Tuesday, December 9, 2014 6:35:37 PM UTC+5:30, aksha...@gmail.com wrote:

--

aksha...@gmail.com

unread,
Dec 16, 2014, 7:26:22 AM12/16/14
to eb...@googlegroups.com
Hi Rob, 

Thanks for the reply. 
I have a situation where I do not need Distinct in my join query. The use of distinct is effecting my query performance.

Ex: 
Customer(cust_id int, Name varchar(15), ...)
Where 
cust_is the PRIMARY KEY.

Details(details_id int, cust_id int, addressType varchar(15), address varchar(15), pincode int, ...)
Where, 
details_id is the PRIMARY KEY
cust_id references Customer(cust_id)
addressType can have two values, namely, residential or office

Query to be execiuted:
select C.cust_id from Customer C join Details D where C.cust_id = D.cust_id and addressType = 'residential'; 

This query doesn't require a Distinct because of the condition addressType = 'residential' and each customer has only one residential address.

I know this could be solved by changing the database design. However, that is not a feasible solution for me right now.
I want to know if there are any hooks to modify the query before executing it?

Thanks
Akshay

Rob Bygrave

unread,
Dec 16, 2014, 3:18:06 PM12/16/14
to ebean@googlegroups
In this case best to just use RawSql. You might need to add explicit rawSqlBuilder.columnMapping(...) but probably not in this case.

Some something like:


RawSql rawSql = RawSqlBuilder.unparsed(yourSql);

ebeanServer.find(Customer.class).setRawSql(rawSql).findList();


... so with RawSqlBuilder.unparsed() Ebean uses the sql you given it without modifications.  If you use RawSqlBuilder.parsed() then Ebean parses the sql so that you can add additional where predicates, order by, limit etc (so augmenting the sql you give it with extra predicates, order by, limit etc).

This mechanism bypasses Ebean's sql generation but otherwise the beans are normal entity beans with no limitations so further batch lazy loading, query joins, updates etc all work as expected.


Note that as an alternative to using RawSql you can also give the query a ResultSet which is typically more handy when using a stored procedure to load the entity bean graph.



Cheers, Rob.





--

Rob Bygrave

unread,
Dec 16, 2014, 3:27:04 PM12/16/14
to ebean@googlegroups
With a bit of luck I'll look to do a working example tonight.  

I should say that ... probably you want to first try using RawSqlBuilder.parse(sql) ... as that will automatically determine the column ordering and generally be able to match the db columns to bean properties (and that means you don't need to specify any columnMapping explicitly.


Cheers, Rob.

Rob Bygrave

unread,
Dec 17, 2014, 5:18:47 AM12/17/14
to ebean@googlegroups
Ok, interesting enough I hit a bug when doing an example (due to the OneToMany). Logged the bug as https://github.com/ebean-orm/avaje-ebeanorm/issues/223
... and that is now fixed in master (so fix included in next release which will be 4.5.2).


So with that fix in place this example below reasonably matches what you are trying to do and fetches the following paths:

  order (id, status)
  customer (id, name)                         // ManyToOne from order
  details (id, orderQty)                        // This is a OneToMany from order to details  (and the bug I've just fixed)
  details.product (id, name)                // ManyToOne from details to product


... and "details" is the OneToMany ... but it is setup with predicates to ensure only 1 many is actually fetched to match your scenario.  Actually it doesn't matter if many "details" where fetched as long as the order by clause ensures the master rows come in order.



So with that fix you can go:

String rs = "select t0.id, t0.status, t1.id, t1.name, "+
        " t2.id, t2.order_qty, t3.id, t3.name " +
        "from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id " +
        "join o_order_detail t2 on t2.order_id = t0.id  " +
        "join o_product t3 on t3.id = t2.product_id  " +
        "where t0.id <= :maxOrderId  and t3.id = :productId "+
        "order by t0.id, t2.id asc";

RawSql rawSql = RawSqlBuilder.parse(rs)
        .columnMapping("t0.id", "id")
        .columnMapping("t0.status", "status")
        .columnMapping("t1.id", "customer.id")
        .columnMapping("t1.name", "customer.name")
        .columnMapping("t2.id", "details.id")
        .columnMapping("t2.order_qty", "details.orderQty")
        .columnMapping("t3.id", "details.product.id")
        .columnMapping("t3.name", "details.product.name")
        .create();

List<Order> ordersFromRaw = Ebean.find(Order.class)
        .setRawSql(rawSql)
        .setParameter("maxOrderId", 2)
        .setParameter("productId", 1)
        .findList();

printOrders(ordersFromRaw, "using RawSql");

I think being on Play you might be stuck on an old version of Ebean though.  The Play folks were supposed to have pulled out their Ebean module in Nov which would enable you to upgrade so hopefully that is all in hand on the Play side.


Cheers, Rob.

Rob Bygrave

unread,
Dec 17, 2014, 6:03:44 AM12/17/14
to ebean@googlegroups
Ok, so I've just added "tableAliasMapping" to RawSqlBuilder so instead of mapping each column we can just map each table alias (assuming that the db column names naturally map to the bean property names).

https://github.com/ebean-orm/avaje-ebeanorm/issues/224  ... will be included in the next release.
 

    String rs = "select o.id, o.status, c.id, c.name, "+
            " d.id, d.order_qty, p.id, p.name " +
            "from o_order o join o_customer c on c.id = o.kcustomer_id " +
            "join o_order_detail d on d.order_id = o.id  " +
            "join o_product p on p.id = d.product_id  " +
            "where o.id <= :maxOrderId  and p.id = :productId "+
            "order by o.id, d.id asc";


    RawSql rawSql = RawSqlBuilder.parse(rs)
            .tableAliasMapping("c", "customer")
            .tableAliasMapping("d", "details")
            .tableAliasMapping("p", "details.product")
            .create();

    List<Order> ordersFromRaw = Ebean.find(Order.class)
            .setRawSql(rawSql)
            .setParameter("maxOrderId", 2)
            .setParameter("productId", 1)
            .findList();


That takes a bit of the verbosity away when mapping more complex object graphs.


Cheers, Rob.

Reply all
Reply to author
Forward
0 new messages