byID returns null without doing database query

Nicolaas Thiemen Francken - Sunny Side Up

unread,

Apr 6, 2015, 8:56:29 PM4/6/15

to silverstripe-dev

Hi Everyone

To reduce database queries, I would like to propose that the byID method (in ArrayList and DataList) automatically returns null if the parameter provided is zero... without doing a database query.

e.g.

Is this a good idea? It is definitely a risky one, but it could also save a few DB queries - for example when you use:

$myParent = Page::get()->byID($this->ParentID)

Nicolaas

Patrick Nelson

unread,

Apr 6, 2015, 10:58:54 PM4/6/15

to silverst...@googlegroups.com

Not a SS pro here, but from what I can tell (and in my quick tests), ArrayList doesn't hit the database at all (which is the overridden method you've provided), so I don't think anything would need to be done there. But here's the current implementation on DataObject:

public function byID($id) {
   $baseClass = ClassInfo::baseDataClass($this->dataClass);
   return $this->where("\"$baseClass\".\"ID\" = " . (int)$id)->First();
}

Based on this, it's a guarantee that yes this does hit the database. And yes, since it's a DataObject, there's an extremely high chance that the ID column will be auto-incremented, so it's unlikely it'll ever be zero. However, there are a few issues with this:

Is this standard ORM behavior? i.e. Should you actually expect the ORM to prevent you from hitting the database just because the ID value that you provided looks like zero?
While probably unlikely, it's certainly possible for a real zero ID to exist in the database (via update).
If there's an issue with performance, is it because of logic built into the SS CMS/framework or is it due to application code on the developer's end? In either case, the zero check can happen there (where there's a slowdown).

Personally I'd recommend against it, primarily because I'd expect it to execute a query, even if I am passing it junk information. Also because there are other optimizations that either already exist (e.g. MySQL's query cache, or that this is an indexed look-up) or can easily be made (e.g. not looking up relations if the ID is already zero). Just my 2c.

--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to silverstripe-d...@googlegroups.com.
To post to this group, send email to silverst...@googlegroups.com.
Visit this group at http://groups.google.com/group/silverstripe-dev.
For more options, visit https://groups.google.com/d/optout.

Nicolaas Thiemen Francken - Sunny Side Up

unread,

Apr 7, 2015, 6:45:41 AM4/7/15

to silverstripe-dev

Hi Patrick

Thank you for your comments.

On 7 April 2015 at 14:58, Patrick Nelson <p...@catchyour.com> wrote:

Not a SS pro here, but from what I can tell (and in my quick tests), ArrayList doesn't hit the database at all (which is the overridden method you've provided), so I don't think anything would need to be done there. But here's the current implementation on DataObject:

Yes - agreed, this is mainly about DataLists.

public function byID($id) {
   $baseClass = ClassInfo::baseDataClass($this->dataClass);
   return $this->where("\"$baseClass\".\"ID\" = " . (int)$id)->First();
}
Based on this, it's a guarantee that yes this does hit the database. And yes, since it's a DataObject, there's an extremely high chance that the ID column will be auto-incremented, so it's unlikely it'll ever be zero. However, there are a few issues with this:

Is this standard ORM behavior? i.e. Should you actually expect the ORM to prevent you from hitting the database just because the ID value that you provided looks like zero?

Interesting question - but does it matter? I think the proposal just makes the ORM a bit smarter and more efficient.

While probably unlikely, it's certainly possible for a real zero ID to exist in the database (via update).

while it is theoretically possible that the ID is zero, it will not happen in practice. At least, I have never heard of an ID being zero and I have never seen it ever. Have you?

We are talking about DataObjects here and all DataObjects have an AutoIncrement ID.

If there's an issue with performance, is it because of logic built into the SS CMS/framework or is it due to application code on the developer's end?

A developer can write:

$obj = MyDataObject::get()->byID($this->ParentID)

// or ... $obj = MyDataObject::get()->byID($this->MyRelationID)

if($obj) {

// do stuff

}

Of course a dev can also write:

if($this->ParentID) {

$obj = MyDataObject::get()->byID($this->ParentID);

//but then we still need to write:

if($obj) {

//do stuff

}

Why is avoiding DB calls important?

* speed is a major issue for SS framework

* DB is one of the main bottlenecks.

In either case, the zero check can happen there (where there's a slowdown).
Personally I'd recommend against it, primarily because I'd expect it to execute a query,

what benefit does that provide? I can see your point, but why would you want that expectations to hold true? There are many ways to run a query and byID is just a short-cut for a common type of query. It is in that light that you should see this proposal.

even if I am passing it junk information. Also because there are other optimizations that either already exist (e.g. MySQL's query cache, or that this is an indexed look-up) or can easily be made (e.g. not looking up relations if the ID is already zero).

I agree that this is a strong argument against the proposal. I also think that SELECT * From MyDataObject WHERE ID = 123 LIMIT 1 (first) should run pretty fast as far as DB queries go. However, in checking if ID = 0, you avoid a lot of additional code that needs running in order to even get to a DB query, etc...

Nicolaas

Patrick Nelson

unread,

Apr 7, 2015, 11:49:33 AM4/7/15

to silverst...@googlegroups.com

No problem :P Keep in mind that I'm mainly just playing devil's advocate on this since I could go either way, honestly (since I was leaning, not necessarily strongly opposed). I'm actually more for saving developer time (which is expensive) over optimization (while both are important). If you can do both, that's preferred. It's a double edged sword however if it causes difficulties in debugging (due to unexpected behavior), but that may not be a big risk here.

To me, it's still definitely a gray area, I'd love to get feedback from the SS devs on this, since to me it's actually just interesting purely on a theoretical basis. Continuing strictly on the devil's advocate stance (to help weigh the pros vs. cons):

> Is this standard ORM behavior? i.e. Should you actually expect the ORM to prevent you from hitting the database just because the ID value that you provided looks like zero?
Interesting question - but does it matter? I think the proposal just makes the ORM a bit smarter and more efficient.

At a framework level, details like these do matter due to impact from very broad usage.

Tangential point: DataObject's also have a static "::get_by_id()" method which is also bound to the ID column and this change, if implemented, should probably also be implemented here as well for consistency. That broadens the impact as well.

> While probably unlikely, it's certainly possible for a real zero ID to exist in the database (via update).

while it is theoretically possible that the ID is zero, it will not happen in practice. At least, I have never heard of an ID being zero and I have never seen it ever. Have you?
We are talking about DataObjects here and all DataObjects have an AutoIncrement ID.

Nope, I haven't. Again, framework level, so fringe cases may include schema built by other systems (i.e. possibly not via "dev/build" and etc). Then again, bugs caused by this change in that situation are probably to be expected anyway (albeit I need to be devil's advocate here). Another edge, again, is UPDATE statements affecting the ID column.

> If there's an issue with performance, is it because of logic built into the SS CMS/framework or is it due to application code on the developer's end?
A developer can write:
$obj = MyDataObject::get()->byID($this->ParentID)
// or ... $obj = MyDataObject::get()->byID($this->MyRelationID)
if($obj) {
// do stuff
}
Of course a dev can also write:
if($this->ParentID) {
$obj = MyDataObject::get()->byID($this->ParentID);
//but then we still need to write:
if($obj) {
//do stuff
}
}

Why is avoiding DB calls important?
* speed is a major issue for SS framework
* DB is one of the main bottlenecks.

Yes, DB is a major bottleneck (given hundreds or thousands of calls), but then again, they're also typically very fast on simple WHERE calls such as these, especially on generic/routine calls which may occur frequently (aforementioned query caching and indexed look ups on primary keys). Chances are this would be a memory hit with little to no CPU, if this happened frequently on any particular table (given the uniformity of the generated SQL statement I'm seeing).

> even if I am passing it junk information. Also because there are other optimizations that either already exist (e.g. MySQL's query cache, or that this is an indexed look-up) or can easily be made (e.g. not looking up relations if the ID is already zero).

I agree that this is a strong argument against the proposal. I also think that SELECT * From MyDataObject WHERE ID = 123 LIMIT 1 (first) should run pretty fast as far as DB queries go. However, in checking if ID = 0, you avoid a lot of additional code that needs running in order to even get to a DB query, etc...

You only avoid the coding if you're also avoiding the extra DB call as well. Since the same code can be written which doesn't do the zero check but does the extra call at the framework. This is where optimization at the application level (above the framework) could come into play, if it became inefficient....

However: Automatic optimization is a big plus.

Maybe a better place to perform the optimization would be at the lower level, the query builder level, i.e. the DataQuery or SQLQuery object prior to execution. It could be checking for instances of only one WHERE clause exists, has the "ID" column and a value of 0 and, if so, immediately return an empty set upon execution (instead of hitting the DB). In that situation, you'd get more of the same benefits, but the issues I've raised above become even more critical. On top of that, you add a little more CPU overhead in having to parse the single WHERE clause (if one exists). That's another approach though that might cover more use cases :D

Reply all

Reply to author

Forward