Equivalent to Hibernate "Projections.rowCount()"

1,088 views
Skip to first unread message

Apoptosis

unread,
May 16, 2009, 10:13:50 PM5/16/09
to Ebean ORM
Rob,

I am still interested in using EBean for my web projects at work
instead of Hibernate. However, I still have one feature I can't match
yet...

We have a feature where we scroll data on a web page. You see these
everywhere. Basically there is too many rows in the database to show
at once so we allow the user to page through the data. In order to do
this on one spot we have a utility class PageScroller which takes a
Hibernate Criteria object (The Full Query), and reads common request
parameters for the page and then modifies the Criteria object on the
fly to only pull the results of the page.

That was kind of long winded, but I have replaced Criteria with your
Query object and have gotten almost everything converted except
getting a row count from the original Query.

In hibernate we do the following:

Criteria crit; // The passed in criteria object
crit.setProjection(Projections.rowCount());
Integer count = (Integer) crit.uniqueResult();
int totalItems = count.intValue();

// Put criteria back to original state
crit.setProjection(null);
crit.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);

I have posted this in your forums before. We discussed either a
Query.clone() method or a Query.rowCount() method. I would prefer
rowCount() which would execute the Query except with a COUNT(*) in the
select clause. I am sure there would be other uses for this besides
the one I have described.

P.S. I can't login to the http://www.avaje.org/ site anymore. Did
you decide to just move to google groups?

rob bygrave

unread,
May 17, 2009, 12:25:12 AM5/17/09
to eb...@googlegroups.com
> I would prefer Query.rowCount() method ...

I can see the usefulness of query.rowCount() so yes... I'll have to have a look to see how to implement that. Note (I'm sure you will be aware of this) that strictly speaking the rowCount will be an indication as opposed to strictly accurate due to paging queries occuring over time with different transactions (so inserts, deletes that occur can effect the paging queries).


Web applications ...  paging results in general...

There are a few issues with paging like the issue that in general relational first rows, max rows don't work as well once the query includes a join to a OneToMany because the row count is different from the object count. I have a few plans to address this issue specifically for web applications (sub-session scoping of entity lists) for paging and sorting of results for web apps. This means Ebean can/should make use of background fetching and a few other tricks... to efficently array fetch results as they are paged through (without the webapp having to worry about it).  Anyway, there is a plan to do that as a web specific addon module for Ebean (that includes a servlet filter and some other infrastructure bits).

> P.S.  I can't login to the http://www.avaje.org/ site anymore.  Did you decide to just move to google groups?

No, if you can't login there must be a problem. (There have be general SSL issue with my tomcat version that I need to pursue). 

Cheers, Rob.


Apoptosis

unread,
May 17, 2009, 5:26:04 PM5/17/09
to Ebean ORM
Agreed. Our particular web application is data heavy with 90% reads
and 10% writes. So the chance of a insert/delete between getting the
count and executing the modified query is low.

Your plans sound interesting, and something that would help me. BTW
what got me going again is I saw your in memory filtering additions
for 1.2. Unfortunately that doesn't help me, there is too much data
to load it all into memory first then page through.

Thanks for considering the rowCount() method.

rob bygrave

unread,
May 17, 2009, 5:54:15 PM5/17/09
to eb...@googlegroups.com
This is probably not suitable for your situation but (just in case) ...

... sometimes it can make sense to use the backgroundFetchAfter with a max rows limit.


Ebean.find(Customer.class)
    .where() ...
    .orderBy(...)
    .setBackgroundFetchAfter(20)
    .setMaxRows(500)
    .findList();

After the first 20 beans have been fetched... the fetch will continue in a background thread until it is finished or reached the max of 500 beans (and the foreground thread will return after loading the first 20 beans).

The downside to this approach compared with paging ... it that pretty much has to be stateful (keep a hold of the list between requests) and will use more memory as all 500 beans can be held in the list as opposed to just the ones the user has paged to.

If you do this, you may also want to know that the List, Set or Map returned by an Ebean query implements the BeanCollection interface that has 2 methods on it to tell you if there are more rows to fetch, and if the background fetching has finished. (Note to self: I should look to make this information publically accessible via a BeanCollectionState similar to the new BeanState object in v1.3)


Methods on BeanCollection:

    /**
     * Set to true if maxRows was hit and there are actually more rows
     * available.
     * <p>
     * Can be used by client code that is paging through results using
     * setFirstRow() setMaxRows(). If this returns true then the client can
     * display a 'next' button etc.
     * </p>
     */
    public boolean hasMoreRows();

    /**
     * Returns true if the fetch has finished. False if the fetch is continuing
     * in a background thread.
     */
    public boolean isFinishedFetch();

Apoptosis

unread,
Jun 27, 2009, 9:50:23 PM6/27/09
to Ebean ORM
Thanks for the suggestion, this won't work for my app. We are
filtering/paging through ~1 million records.

Not to mention if I am going to convince my team that EBean can
replace Hibernate for us I need to show a 1:1 mapping of hibernate to
ebean features we use. This is the last thing I need to make our site
using EBean 100% functional.

No pressure ;) If you point me in the right direction, perhaps I can
submit a patch.

Rob Bygrave

unread,
Jun 28, 2009, 6:28:45 PM6/28/09
to eb...@googlegroups.com
I'll have a look at this. 

We have been busy getting the next version of Ebean ready - its been a lot of work (DDL generation, ServerConfig refactoring + some refactoring of internals) but I think it is starting to come together now.

I think I'll need to at least get a query.rowCount() started for you... as I'll probably add some API to internal things (like CQueryEngine).


FYI: In the back of my mind (for v1.4)... I also have thoughts of a PagingList type query and result.

PagingList pagingList = query.findPagingList();

The idea being that the pagingList handles the paging (issuing the paging queries etc).... this enable some things like fetching the next page in background etc and generally making this simpler.


Cheers, Rob.

Apoptosis

unread,
Jun 29, 2009, 9:50:49 PM6/29/09
to Ebean ORM
Thank you. It would be much appreciated.

Jacob

On Jun 28, 6:28 pm, Rob Bygrave <robin.bygr...@gmail.com> wrote:
> I'll have a look at this.
>
> We have been busy getting the next version of Ebean ready - its been a lot
> of work (DDL generation, ServerConfig refactoring + some refactoring of
> internals) but I think it is starting to come together now.
>
> I think I'll need to at least get a query.rowCount() started for you... as
> I'll probably add some API to internal things (like CQueryEngine).
>
> FYI: In the back of my mind (for v1.4)... I also have thoughts of a
> PagingList type query and result.
>
> PagingList pagingList = query.findPagingList();
>
> The idea being that the pagingList handles the paging (issuing the paging
> queries etc).... this enable some things like fetching the next page in
> background etc and generally making this simpler.
>
> Cheers, Rob.
>

Rob Bygrave

unread,
Jul 1, 2009, 9:23:16 AM7/1/09
to eb...@googlegroups.com
Ok, I have got a working version of this...   query.findRowCount()


There are some implementation details worth noting... specifically that the query will be modified removing any joins or expressions on OneToMany or ManyToMany properties (or their children).

So the implementation basically goes:

1.     Make a copy of the query (to manipulate)
2.     remove any where expressions on many properties
         ... and determine joins required for where expressions
3a.   remove any joins to many properties
3b.   remove any joins not required for where expressions
4.     build select count(*) ... sql


Where expressions meaning... expressions in the where clause.

So in this example, the bits in read are effectively removed from the count query because they are on a OneToMany property ("details");

        Query<Order> query = server.createQuery(Order.class);
        query.join("customer");
        query.join("details");
        query.where()
            .gt("id", 5)
            .gt("customer.id", 1)
            .like("customer.name", "R%")
            .gt("details.shipQty", 4);

        int rc = query.findRowCount();

Resulting SQL:

select count(*)  from o_order o join o_customer oc on oc.id = o.customer_id   where o.id > ?  and o.customer_id > ?  and oc.name like ?


I'll tidy up and commit the code into HEAD.

Any questions/comments etc most welcome.


Cheers, Rob.

Mario Ivankovits

unread,
Jul 1, 2009, 9:31:39 AM7/1/09
to Ebean ORM
> Any questions/comments etc most welcome.

You would like to hear comments on this ... feature?

You should have comments:

THIS IS SOOOO COOOL!!!! Especially the implementation details you kept
in mind!

:-)

Apoptosis

unread,
Jul 1, 2009, 9:26:30 PM7/1/09
to Ebean ORM
Thank you so much, I can't wait to try it out.


Does the removing of joins to many properties effect subsequent calls
to "findList()" "findSet" etc?

I am trying to wrap my mind around why you removing OneToMany joins.
So you have one order to many details. And the original query is
looking for any orders with details with quantity greater than 4. So
say there are 10 total orders in the system with only 2 quantities
greater than 4. I would think you would want the findRowCount to
return 2 not 10.

I am probably missing something obvious...


Rob Bygrave

unread,
Jul 1, 2009, 9:53:11 PM7/1/09
to eb...@googlegroups.com
>> Does the removing of joins to many properties effect subsequent calls
to "findList()" "findSet" etc?

The first thing findRowCount does is create a copy of the query ... and the modifications occur on the copied query (not the original).  So no, this has no effect on subsequent calls to "findList()" "findSet" etc.



>> I am trying to wrap my mind around why you removing OneToMany joins.

Conceptually we don't want to count the "many's" ... however...



>>  I would think you would want the findRowCount to return 2 not 10.

You are right... and in fact I should change this to ... use distinct id ...

select count(*) from (
  select distinct o.id 
  from o_order o
  join o_customer oc on oc.id = o.customer_id  
  join o_order_detail od on od.order_id = o.id

  where o.id > ? 
  and o.customer_id > ? 
  and oc.name like ?
  and details.ship_qty > ?
)

So that means, I don't remove any expressions at all (on OneToMany's) ...

Hmm, that looks better.

Rob Bygrave

unread,
Jul 2, 2009, 6:06:00 AM7/2/09
to eb...@googlegroups.com
So, my first implementation was not right, it made a bad assumption and wanted to ignore the many's... but as been pointed out the many expressions can actually reduce the count (thanks) so we need to keep all expressions.


The implementation then becomes:
 
- A copy of the query is first made (so that it can be modified)

- Only joins that are required for the where expressions are included in the query.

- The expressions are checked to see if a "Many" join is included

- If a "Many" join is included then the query counts distinct id's....
 
                   select count(*) from ( select distinct {id column(s)} ... )

Else just count rows ...

                   select count(*) from ...


Testing that out with some expressions on many properties etc and looks good.


Cheers, Rob.


Some examples:


        Query<Order> query = server.createQuery(Order.class);
        query.join("customer");
        query.join("details");
        query.where()
            .gt("id", 5)
            .gt("customer.id", 1);

        int rc = query.findRowCount();     

        // we can run the query after getting the count...
        int rc2 = query.findList().size();

        if (rc != rc2) {
               throw new RuntimeException("No Rob nooooo !! ");
        }


// neither join is needed...


select count(*) 
from o_order o 
where o.id > ?  and o.kcustomer_id > ?



example 2:


        Query<Order> query = server.createQuery(Order.class);
        query.join("customer");
        query.join("details");
        query.where()
            .gt("id", 5)
            .gt("customer.id", 1)
            .like("customer.name", "R%");


// we need the customer join to support the like expression

select count(*) 
from o_order o
join o_customer oc on oc.id = o.kcustomer_id  
where o.id > ?  and o.kcustomer_id > ?  and oc.name like ?


example 3:


        Query<Order> query = server.createQuery(Order.class);
        query.join("customer");
        query.join("details");
        query.where()
            .gt("id", 5)
            .gt("customer.id", 1)
            .like("customer.name", "R%")
            .gt("details.id", 10);


// need to count distinct id's now... because we include a join to a "Many"  (the order details is the many)

select count(*) from (
  select distinct o.id c0 
  from o_order o
  left outer join o_order_detail od on od.order_id = o.id 
  join o_customer oc on oc.id = o.kcustomer_id  
  where o.id > ?  and o.kcustomer_id > ?  and oc.name like ?  and od.id > ?
)

Apoptosis

unread,
Jul 2, 2009, 9:25:46 PM7/2/09
to Ebean ORM
Rob,

With Example #3 I finally see where you were coming from. Your
concerned with outer joins on one to many relationships.

My question to you is why does the Query in example 3 result in a
outer join? Seems to me that if your asking for the orders that have
details with id greater than 10. You should still get one row back
for each order even if that order has more than one detail with an id
> 10.
I think a "normal" join is sufficient.

Is it the count(*) that that is making you do the outer join?
Perhaps it should be count(o.*). The Query is for the Order.class,
and not for the order class and associated data.

Enjoy the birth of our nation,
Jacob

P.S. Last night I attached a patch to this group for fixing some of
the instanceof generics warnings I get in eclipse 3.5

http://ebean.googlegroups.com/web/instanceof_generics_patch.txt?hl=en&gda=hicH0E8AAABPoFXMA9Qltpz6jjE0cxEkacs3RiaC-r1MGXA0Wa4WTRXwej0XTa0PPQl32xnN5IEiko-kjlaocE8D0xK6O2v9nHMhSp_qzSgvndaTPyHVdA

Rob Bygrave

unread,
Jul 3, 2009, 3:26:20 AM7/3/09
to eb...@googlegroups.com
>> My question to you is why does the Query in example 3 result in a outer join?

Well come back to this later... but it doesn't matter whether it was outer or inner in this case.

The issue here is to understand that the distinct is required... To explain why distinct is required... I'll go with an example .



>> I think a "normal" join is sufficient.

Yes, but the count would be the same in this case with inner or outer... and just making it inner is not sufficient. You need to include the distinct and then wrap that query...



        String base = "from o_order o left outer join o_order_detail od on od.order_id = o.id  where o.id > 0 and od.id > 0";

        String notWrapped = "select o.id "+base;
        String wrapped = "select distinct o.id "+base;


        // ... to show the distinct is required
       
        Integer listSize1 = Ebean.createSqlQuery(notWrapped).findList().size();
        Integer listSize2 = Ebean.createSqlQuery(wrapped).findList().size();

        System.out.println("A> "+listSize1+" != "+listSize2);

Outputs:    A> 6 != 3                               
 .... this shows the distinct in action



        // same thing... but using count(*) rather than just the length of the list
       
        String notWrappedCount = "select count(*) as ct "+base;
        String wrappedCount = "select count(*) as ct from (" + "select distinct o.id "+base+" )";
       
        Integer c1 = Ebean.createSqlQuery(notWrappedCount).findUnique().getInteger("ct");
        Integer c2 = Ebean.createSqlQuery(wrappedCount).findUnique().getInteger("ct");
       
        System.out.println("B> "+c1+" != "+c2);
       
Outputs:    B> 6 != 3
... same but using count(*)



        // doing the same thing with Orm query and findRowCount()

        Query<Order> query2 = Ebean.createQuery(Order.class).where().gt("id", 0).gt("details.id", 0).query();
        int d1 = query2.findRowCount();
        int d2 = query2.findList().size();
       
        System.out.println("Using findRowCount> "+d1+" = "+d2);
       
Outputs:     Using findRowCount> 3 = 3



The findRowCount SQL was:

select count(*) from ( select distinct o.id c0  from o_order o left outer join o_order_detail od on od.order_id = o.id   where o.id > ?  and od.id > ? )



Does that explain it... have I done enough to explain why the distinct is required?

Apoptosis

unread,
Jul 5, 2009, 8:40:12 PM7/5/09
to Ebean ORM
Alright, you convinced me. :)

I want to say thank you for sticking with me, you sounded a little
angry at the end. I am not trying to attack you I just couldn't see
the problem. In fact I had to go build a test case in SQL and run the
queries myself before I got it.

I have never run into this issue with sql, and it really surprised me
that this didn't work:

SELECT COUNT(o.*) FROM o_order o
LEFT OUTER JOIN o_order_detail od ON od.order_id = o.id
WHERE o.id > 0 AND od.id > 0
GROUP BY o.id

Well thank you again for this feature I will put it to good use,
hopefully others do too.

Jacob

Rob Bygrave

unread,
Jul 5, 2009, 10:38:11 PM7/5/09
to eb...@googlegroups.com
Great. I wasn't angry so sorry if that's how it came out.. I started an email trying to explain it... and then thought it was better to explain this stuff via example code... so hopefully that was ok to follow (it meant that people could try out the sql themselves which is usually helpful in understanding it).

Thank you for pointing out the flaw in the original approach.

Yeah, we need to be mindful of joins to OneToMany's (and ManyToMany's) in the context of ORM (and the potential duplicate data in the relational results from those queries).

Cheers, Rob.
Reply all
Reply to author
Forward
0 new messages