The 'DISTINCT' problem

88 views
Skip to first unread message

Mrugendra Bhure

unread,
Aug 1, 2015, 12:20:02 AM8/1/15
to sonata-users
Hey,

=> We have a MySQL database with over 40 million rows.

=> Did a simple "select id, field1, field2,.... from table limit 32", returns in 1 sec.

=> Built Admin over it using Sonata admin, takes forever to load practically the same thing.

=> When the query was analyzed, it turns out sonata pager adds a "DISTINCT" keyword to the select statement.

=> While I can understand its requirement in join queries, but when you have a query which includes the primary key, and no one-to-many joins, this practically kills performance. Pager needs to be smarter.  I had to comment this part of code to get things working.

=> Is there a better option, perhaps I override this method in extended class and use that as the pager ? 

Code: 
========
sonata-project\doctrine-orm-admin-bundle\Datagrid\ProxyQuery.php

/**
     * This method alters the query to return a clean set of object with a working
     * set of Object
     *
     * @param \Doctrine\ORM\QueryBuilder $queryBuilder
     *
     * @return \Doctrine\ORM\QueryBuilder
     */

   
private function getFixedQueryBuilder(QueryBuilder $queryBuilder)
   
{
        $queryBuilderId
= clone $queryBuilder;

       
// step 1 : retrieve the targeted class
        $from  
= $queryBuilderId->getDQLPart('from');
        $class
= $from[0]->getFrom();
        $metadata
= $queryBuilderId->getEntityManager()->getMetadataFactory()->getMetadataFor($class);

       
// step 2 : retrieve identifier columns
        $idNames
= $metadata->getIdentifierFieldNames();

       
// step 3 : retrieve the different subjects ids
        $selects
= array();
        $idxSelect
= '';
       
foreach ($idNames as $idName) {
            $select
= sprintf('%s.%s', $queryBuilderId->getRootAlias(), $idName);
           
// Put the ID select on this array to use it on results QB
            $selects
[$idName] = $select;
           
// Use IDENTITY if id is a relation too. See: http://doctrine-orm.readthedocs.org/en/latest/reference/dql-doctrine-query-language.html
           
// Should work only with doctrine/orm: ~2.2
            $idSelect
= $select;
           
if ($metadata->hasAssociation($idName)) {
                $idSelect
= sprintf('IDENTITY(%s) as %s', $idSelect, $idName);
           
}
            $idxSelect
.= ($idxSelect !== '' ? ', ' : '') . $idSelect;
       
}        

        $queryBuilderId
->resetDQLPart('select');
       
//$queryBuilderId->add('select', 'DISTINCT '.$idxSelect); //Commented by Mrugendra.
        $queryBuilderId
->add('select', $idxSelect);
       
       
// for SELECT DISTINCT, ORDER BY expressions must appear in idxSelect list
       
/* Consider
            SELECT DISTINCT x FROM tab ORDER BY y;
        For any particular x-value in the table there might be many different y
        values.  Which one will you use to sort that x-value in the output?
        */

       
// todo : check how doctrine behave, potential SQL injection here ...
       
if ($this->getSortBy()) {
            $sortBy
= $this->getSortBy();
           
if (strpos($sortBy, '.') === false) { // add the current alias
                $sortBy
= $queryBuilderId->getRootAlias().'.'.$sortBy;
           
}
            $sortBy
.= ' AS __order_by';
            $queryBuilderId
->addSelect($sortBy);
       
}

        $results    
= $queryBuilderId->getQuery()->execute(array(), Query::HYDRATE_ARRAY);
        $idxMatrix  
= array();
       
foreach ($results as $id) {
           
foreach ($idNames as $idName) {
                $idxMatrix
[$idName][] = $id[$idName];
           
}
       
}

       
// step 4 : alter the query to match the targeted ids
       
foreach ($idxMatrix as $idName => $idx) {
           
if (count($idx) > 0) {
                $idxParamName
= sprintf('%s_idx', $idName);
                $queryBuilder
->resetDQLPart("where"); //Added by mrugendra.
                $queryBuilder
->andWhere(sprintf('%s IN (:%s)', $selects[$idName], $idxParamName));
                $queryBuilder
->setParameter($idxParamName, $idx);
                $queryBuilder
->setMaxResults(null);
                $queryBuilder
->setFirstResult(null);
           
}
       
}

       
return $queryBuilder;
   
}


Reply all
Reply to author
Forward
0 new messages