DataQuery Filter Syntax

1,036 views
Skip to first unread message

Andrew Short

unread,
Apr 18, 2011, 12:35:07 AM4/18/11
to SilverStripe Development Mailing List
Hi All,

As part of the new ORM work we're trying to figure out the best syntax to use for the new filtering system, which is a layer that will replace the SQL fragments passed to DataObject::get() and ::get_one(), and a bit nicer to work with. I've come up with an initial alpha version of what I think is a good syntax, but it still needs a lot of work. I'm looking to get some feedback before fleshing it out.You can see a basic implementation at https://gist.github.com/923835.

Examples
-------------

First, before I launch into details, just a couple of quick examples of the syntax:

Gets all blog entries posted in the last week with the tag "example-tag":

    DataList::create('BlogEntry')->filter(array(
        'Date:gte'   => date('Y-m-d', strtotime('-1 week')),
        'Tags.Title' => 'example-tag',
    ))

Gets all pages which have either the Title or MenuTitle to anything but "Home" or "About Us":

    DataList::create('SiteTree')->filter('Title,MenuTitle:not', array('Home', 'About Us'))

Details
---------

Now for a quick summary of the syntax:
  • DataQuery::filter() can accept either a single filter, or an array of filters at once.
  • Filters are in the format field(,field)*:(:not)?(:negated)?(:filter name)?
  • The field(s) to match against is a comma separated list of fields using dot syntax. Mostly you would only pass one field at a time in, but if you pass in multiple fields like in the Title,MenuTitle example, each field will have the filter applied and then they will be joined into a single clause with ORs.
  • You can pass in multiple values in an array to run the filter against. This works for most filters, but the IN filter only accepts simple arrays. If you run a filter against multiple values then it just creates a clause for each value and joins them with an OR.
  • The filter is passed in in the format ":filtername" after the field to match against. Each filter corresponds to a callback which returns an SQL fragment. You can also add ":not" and ":negate" before or after the filter. If you're only matching against one value these will do the same thing, but if you're matching against multiple values then ":not" will wrap the whole thing in a NOT(), whereas ":negate" will invert the operators.
    • $list->filter('Foo:not:like', array('Baz', 'Buzz')); => NOT("Foo" LIKE 'Baz' OR "Foo" LIKE 'Buzz')
    • $list->filter('Foo:not:negate', array('Baz', 'Buzz') => "Foo" NOT LIKE 'Baz' OR "Foo" NOT LIKE 'Buzz'
  • You can create your own filters - for example you might make a :withinmonth filter or something.
I think this covers most basic use cases, since it allows for fairly complete AND filters support, and support for basic OR filters. Anything more advanced is difficult to implement without something like rails' arel. What do people think of this syntax? Any changes you would make? A different approach you think is better?

There are a couple of other things I think would be good to implement, and am looking for ideas here:
  • Passing in SQL values as field names. For example, if you attach a more complex join to a DataQuery, it would be nice to be able to manually filter by "My_JoinTable"."Foo". Maybe if a field name begins with a double quote we just use it directly as an SQL identifier? e.g. you could do $list->filter('"My_JoinTable"."Baz":lt', 5);
  • The other thing is the casting of values to filters against. At the moment only raw string values are implemented. It would be nice to also support SQL fragments here, so you can filter against things like NOW(). Also, another thing that needs to be considered is things like date casting - it would be nice to be able to do $list->filter('Date:gt', strtotime('-1 week')) without having to manually create the date string. Should this be automatic, or should you have to pass in that a string should be treated as a string, sql fragment, or date?
It would be great to get feedback on this from as many people as possible, since this is something that is going to impact pretty much every project. I'm very open to suggestions, and am trying to gather as many alternatives to make a good decision early in the process.

Andrew Short

Uncle Cheese

unread,
Apr 18, 2011, 9:45:31 AM4/18/11
to SilverStripe Core Development
This looks great. Similar to Doctrine, but a lot more intuitive, IMO.
Just the right amount of abstraction.

Maybe I missed something, but could we see an example of an AND
connective filter?



On Apr 18, 12:35 am, Andrew Short <andrewjsh...@gmail.com> wrote:
> Hi All,
>
> As part of the new ORM work we're trying to figure out the best syntax to
> use for the new filtering system, which is a layer that will replace the SQL
> fragments passed to DataObject::get() and ::get_one(), and a bit nicer to
> work with. I've come up with an initial alpha version of what I think is a
> good syntax, but it still needs a lot of work. I'm looking to get some
> feedback before fleshing it out.You can see a basic implementation athttps://gist.github.com/923835.
>
> Examples
> -------------
>
> First, before I launch into details, just a couple of quick examples of the
> syntax:
>
> Gets all blog entries posted in the last week with the tag "example-tag":
>
>     DataList::create('BlogEntry')->filter(array(
>         'Date:gte'   => date('Y-m-d', strtotime('-1 week')),
>         'Tags.Title' => 'example-tag',
>     ))
>
> Gets all pages which have either the Title or MenuTitle to anything but
> "Home" or "About Us":
>
>     DataList::create('SiteTree')->filter('Title,MenuTitle:not',
> array('Home', 'About Us'))
>
> Details
> ---------
>
> Now for a quick summary of the syntax:
>
>    - DataQuery::filter() can accept either a single filter, or an array of
>    filters at once.
>    - Filters are in the format field(,field)*:(:not)?(:negated)?(:filter
>    name)?
>    - The field(s) to match against is a comma separated list of fields using
>    dot syntax. Mostly you would only pass one field at a time in, but if you
>    pass in multiple fields like in the Title,MenuTitle example, each field will
>    have the filter applied and then they will be joined into a single clause
>    with ORs.
>    - You can pass in multiple values in an array to run the filter against.
>    This works for most filters, but the IN filter only accepts simple arrays.
>    If you run a filter against multiple values then it just creates a clause
>    for each value and joins them with an OR.
>    - The filter is passed in in the format ":filtername" after the field to
>    match against. Each filter corresponds to a callback which returns an SQL
>    fragment. You can also add ":not" and ":negate" before or after the filter.
>    If you're only matching against one value these will do the same thing, but
>    if you're matching against multiple values then ":not" will wrap the whole
>    thing in a NOT(), whereas ":negate" will invert the operators.
>       - $list->filter('Foo:not:like', array('Baz', 'Buzz')); => NOT("Foo"
>       LIKE 'Baz' OR "Foo" LIKE 'Buzz')
>       - $list->filter('Foo:not:negate', array('Baz', 'Buzz') => "Foo" NOT
>       LIKE 'Baz' OR "Foo" NOT LIKE 'Buzz'
>    - You can create your own filters - for example you might make a
>    :withinmonth filter or something.
>
> I think this covers most basic use cases, since it allows for fairly
> complete AND filters support, and support for basic OR filters. Anything
> more advanced is difficult to implement without something like rails' arel.
> What do people think of this syntax? Any changes you would make? A different
> approach you think is better?
>
> There are a couple of other things I think would be good to implement, and
> am looking for ideas here:
>
>    - Passing in SQL values as field names. For example, if you attach a more
>    complex join to a DataQuery, it would be nice to be able to manually filter
>    by "My_JoinTable"."Foo". Maybe if a field name begins with a double quote we
>    just use it directly as an SQL identifier? e.g. you could do
> $list->filter('"My_JoinTable"."Baz":lt',
>    5);
>    - The other thing is the casting of values to filters against. At the

Andrew Short

unread,
Apr 18, 2011, 11:57:09 AM4/18/11
to silverst...@googlegroups.com
Hey,

You just pass multiple filters in to the filter() call and they're joined with an AND. For example, both:

$list->filter(array(
    'Foo' => 'Bar',
    'Baz' => 'Buzz'
));

and

$list->filter('Foo', 'Bar');
$list->filter('Baz', 'Buzz');

will result in "Foo" = 'Bar' AND "Baz" = 'Buzz'.

Andrew Short

--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.
To post to this group, send email to silverst...@googlegroups.com.
To unsubscribe from this group, send email to silverstripe-d...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/silverstripe-dev?hl=en.


Andrew Short

unread,
Apr 18, 2011, 12:10:50 PM4/18/11
to silverst...@googlegroups.com
There's one other issue that I completely forgot to post about in my original post - the case sensitivity of comparisons. Different DB's have different rules for this - for example in MySQL both = and LIKE are case insensitive, whereas in Oracle neither are. Below is a quick table of the SQL needed to specify a case sensitive/insensitive filter (not sure if 100% correct):
+--------------------------------------+
| Engine | Sensitive | Insensitive |
+------------+-----------+-------------+
| MySQL | = BINARY | = |
| Postgres | = | ILIKE |
| SQL Server | COLLATE | = |
| Sqlite | = | LIKE |
| Oracle | = | REGEXP_LIKE |
+------------+-----------+-------------+
In Sam's original spec for the data filter syntax, he had a ":matchcase" operator which could be used to make a filter case sensitive, and text filters were case insensitive by defaylt. This would be pretty difficult to implement, as can be seen in the table above. In my opinion, we don't really need this - people have gotten by in the past just using whatever their DBMS defaults to. Making every query case insensitive by default would be reasonably difficult, and I'm also worried about slowdowns it may incur - for example in Oracle it would mean that every text filter by default would use a REGEXP_LIKE match, which I imagine would be quite expensive.

As such I propose that we just run with the default = and LIKE operators, using an :equals and :like filter type, and leave it up to people to change their table collations if they need case sensitivity or insensitivity. Maybe a good solution to cover most use cases would be to provide an :ilike filter, which is implemented by the current DB driver, and returns a case insensitive LIKE filter. Of course, this would probably need to be complemented by :istartswith, :iendswith and :icontains, which I think would cover most use cases.

Any thoughts? Has anyone done any projects, or can think of any areas in Sapphire/CMS where case sensitivity is important? The only one I can really think of is with the filesystem stuff, which already disallows multiple files with the same name but different cases.

Sam Minnée

unread,
Apr 18, 2011, 5:18:59 PM4/18/11
to silverst...@googlegroups.com

> people have gotten by in the past just using whatever their DBMS defaults to.

This is only true in an environment where you're doing all your dev on a single DBMS. SilverStripe/Sapphire aren't in that camp - this code is intended to be deployed to a number of different databases.

> Making every query case insensitive by default would be reasonably difficult, and I'm also worried about slowdowns it may incur

"Reasonably difficult" doesn't concern me too much, but "slowdowns" do.

> for example in Oracle it would mean that every text filter by default would use a REGEXP_LIKE match, which I imagine would be quite expensive.

I think that this is incorrect information, based on http://www.orafaq.com/node/91http://www.orafaq.com/node/91 - it's talking about a "new feature" but it's a blog post from 2005.

> As such I propose that we just run with the default = and LIKE operators, using an :equals and :like filter type, and leave it up to people to change their table collations if they need case sensitivity or insensitivity. Maybe a good solution to cover most use cases would be to provide an :ilike filter, which is implemented by the current DB driver, and returns a case insensitive LIKE filter. Of course, this would probably need to be complemented by :istartswith, :iendswith and :icontains, which I think would cover most use cases.

My main gripe with this is that it means that behaviour is no longer database-agnostic, and it would likely lead to really weird hard-to-debug bugs if you had a module that assumed case-insensitivity and then used it on a database that was case-sensitive. I think that we should make case-insensitive match the default, and provide a modifier for case-sensitive matches.

Sam Minnée

unread,
Apr 18, 2011, 5:40:31 PM4/18/11
to silverst...@googlegroups.com
> As part of the new ORM work we're trying to figure out the best syntax to use for the new filtering system, which is a layer that will replace the SQL fragments passed to DataObject::get() and ::get_one(), and a bit nicer to work with. I've come up with an initial alpha version of what I think is a good syntax, but it still needs a lot of work. I'm looking to get some feedback before fleshing it out.You can see a basic implementation at https://gist.github.com/923835.

Thanks for getting this done, AJShort! It's looking really good.

> You can also add ":not" and ":negate" before or after the filter. If you're only matching against one value these will do the same thing, but if you're matching against multiple values then ":not" will wrap the whole thing in a NOT(), whereas ":negate" will invert the operators.
> • $list->filter('Foo:not:like', array('Baz', 'Buzz')); => NOT("Foo" LIKE 'Baz' OR "Foo" LIKE 'Buzz')
> • $list->filter('Foo:not:negate', array('Baz', 'Buzz') => "Foo" NOT LIKE 'Baz' OR "Foo" NOT LIKE 'Buzz'

I assume that the second example is supposed to be "Foo:like:negate", but that's just a typo.

My main question here is whether we *need* two different styles of negation with subtly different behaviour. What do others think?

> I think this covers most basic use cases, since it allows for fairly complete AND filters support, and support for basic OR filters.

One simple way of introducing ORs would be to have a function or() that could be passed multiple filter arrays:

So this:
$list->or( array('Status' => 'Active'), array('LastEdited:gt' => '1 week ago') )

Would create the WHERE clause:
("Status" = 'Active') OR ("LastEdited" > NOW() - INTERVAL 1 WEEK)

(Note that I'm making some assumptions about how the date filters will work.)

> • Passing in SQL values as field names. For example, if you attach a more complex join to a DataQuery, it would be nice to be able to manually filter by "My_JoinTable"."Foo". Maybe if a field name begins with a double quote we just use it directly as an SQL identifier? e.g. you could do $list->filter('"My_JoinTable"."Baz":lt', 5);

I'd prefer it if we got things to a point where direct SQL fragments were only used in really weird edge cases, and that SQL-fragment-based manipulation was done with separate methods - where() rather than filter(). Otherwise we're more likely to get SQL-injection errors. If you're manipulating SQL-fragments directly, what's wrong with where()?

> The other thing is the casting of values to filters against. At the moment only raw string values are implemented.

I think that the filters should be aware of the field type and respond appropriately.

> It would be nice to also support SQL fragments here, so you can filter against things like NOW().

I'd prefer it if NOW() was abstracted, so that it could be CURRENT_DATE() on the databases which need that. I'd prefer it if now were handled specially by the date comparison filters.

> Also, another thing that needs to be considered is things like date casting - it would be nice to be able to do $list->filter('Date:gt', strtotime('-1 week')) without having to manually create the date string. Should this be automatic, or should you have to pass in that a string should be treated as a string, sql fragment, or date?

I think it would be good if we could support $list->filter('Date:gt', "1 week ago"), not to mention $list->filter('Date:gt', 'now'). The way to go that reliably would be for the gt filter to know the type of the field being compared. Potentially, the code that selects which filter function to run can be configured to use a different filter function depending on the field type. This would let you be tighter about comparing strings vs. comparing numbers, too:

function gt($field, $value) {
return "$field > '" . Convert::raw2sql($value) . "'";
}

function gt_numeric($field, $value) {
if(!is_numeric($value)) throw new InvalidArgumentException("'$value' isn't numeric");
return "$field > $value";
}

function gt_date($field, $value) {
$dateValue = date('Y-m-d H:i:s', strtotime($value));
return gt($field, $dateValue);
}

(Note: This code is just a sketch; the date parsing is naïve, would be in its own function, and it's not clear which object these functions sit inside)

Sam Minnée

unread,
Apr 18, 2011, 5:54:22 PM4/18/11
to SilverStripe Core Development
> One simple way of introducing ORs would be to have a function or() that could be passed multiple filter arrays:
>
> So this:
> $list->or( array('Status' => 'Active'), array('LastEdited:gt' => '1 week ago') )
>
> Would create the WHERE clause:
> ("Status" = 'Active') OR ("LastEdited" > NOW() - INTERVAL 1 WEEK)

It occurred to me after posting this that we could just as easily
support multiple array arguments to filter:

$list->filter( array('Status' => 'Active'), array('LastEdited:gt' =>
'1 week ago') )

The only thing I'd wonder is whether the rules by which each filter is
combined are too opaque:

- Items within each array are ANDed together
- Each array is then ORed together.

spronk

unread,
Apr 19, 2011, 9:54:23 PM4/19/11
to SilverStripe Core Development
Last time I considered this, we preferred the more readable or than
hidden rules surrounding arrays.

In terms of supporting more complex boolean logic without a SQL object
model, you could use nested filtering functions capable of generating
their own special data points.

$list->filter(
$list->or(
array('status' => 'Active'),
array('status' => 'Pending')
),
$list->and(
array('initialised' => true)
)
)

Conceptually, the or() and and() functions would return a special
object, say a QueryLogic object, capable of storing nested levels of
logic with knowledge of the logical operators used. You could also
take this one step further and change array() to $list-
>condition('status', 'Pending') to return a single level filter and
remove the semantically meaningless array() call entirely.

For implementation, the filter(), or(), and and() functions would all
use variable arguments that build QueryLogic nests by detecting
between QueryLogic and array arguments, i.e. if $filter sees:

QueryLogic(OR) {
QueryLogic(AND) {
array(), array()
}
}
array(x=>y)

it would create a wrapper logic object as follows:

QueryLogic(AND) {
// existing QL(OR)
// nest from existing QL(OR)
QueryLogic(AND) {
array(x=>y)
}
}

If you're not concerned about the return value of $filter (i.e. it's a
state change method), you can even alias and() and filter().

With regards to the filter text myself, I personally am not a big fan
of what I call "magic strings" and prefer to build things that can be
manipulated in object space, but if this is not feasable as you say
(ARel) I think the idea is reasonably sound.

For things like dates and "1 week ago", I would prefer to use
something like DateHelper::str("1 week ago") to avoid coupling a
single set of date logic to the ORM.

Anyhow, I like the RDBMS-agnostic idea, as well as the idea to keep
direct SQL to a minimum where possible.

Keith

spronk

unread,
Apr 19, 2011, 9:57:01 PM4/19/11
to SilverStripe Core Development
Oh yeah,

One benefit of keeping *something* around in object space (ARel) is
"smart folder"-type operations. Serialise your query object to the
database and bam, instant smart group.

Sam Minnée

unread,
Apr 19, 2011, 11:05:23 PM4/19/11
to silverst...@googlegroups.com

>> $list->filter(
>> $list->or(
>> array('status' => 'Active'),
>> array('status' => 'Pending')
>> ),
>> $list->and(
>> array('initialised' => true)
>> )
>> )
>>
>> Conceptually, the or() and and() functions would return a special
>> object, say a QueryLogic object, capable of storing nested levels of
>> logic with knowledge of the logical operators used.

or() and and() methods that return filter-segment objects seem like a pretty good way of allowing for arbitrarily nested query filters. My only concern with $list->or() and $list->and() is that you don't always have the DataList available in a variable.

We could use statics - DataList::or() and DataList::and() - and make them very simple methods:

function or() {
$args = func_get_args();
return new DataQuery_OrFilter($args);
}
function and() {
$args = func_get_args();
return new DataQuery_AndFilter($args);
}

So, we could implement this query:
SELECT * FROM Obj WHERE Status IN ('Active', 'Pending') OR (Status = 'Initialising' AND Initialised = 1)

With this request:

DataObject::get("Obj")->filter(DataList::or(
array("Status" => array("Active", "Pending")),
DataList::and(array(
"Status" => "Initialising",
"Initialised" => true
))
));

In this model, DataList::or() and DataList::and() can be passed a number of arguments, each of which is an array or a DataList_Filter. All of the array elements and the DataList_Filter objects are combined with either AND or OR as appropriate.

By default, filter() will combine the array or k/v passed with a DataList_AndFilter.

>> You could also
>> take this one step further and change array() to $list->condition('status', 'Pending') to return a single level filter and
>> remove the semantically meaningless array() call entirely.

This sounds like a path that would lead to something similar to PHPUnit's mock system, which I really dislike because it's verbose and byzantine.

>> With regards to the filter text myself, I personally am not a big fan
>> of what I call "magic strings" and prefer to build things that can be
>> manipulated in object space, but if this is not feasable as you say
>> (ARel) I think the idea is reasonably sound.

My main concern is that PHP isn't well-suited to building things in object space - you end up with really verbose code. So I see "magic strings" following a clearly defined API as being a pragmatic solution, which we already use for things like field definitions.

>> For things like dates and "1 week ago", I would prefer to use
>> something like DateHelper::str("1 week ago") to avoid coupling a
>> single set of date logic to the ORM.

This is a good example of the verbosity that I'd like to avoid.

spronk

unread,
Apr 20, 2011, 5:51:20 PM4/20/11
to SilverStripe Core Development
I think using static methods is a good idea. Would work off a shorter-
named instance as well in that case.

> My main concern is that PHP isn't well-suited to building things in object space - you end up with
> really verbose code.  So I see "magic strings" following a clearly defined API as being a pragmatic
> solution, which we already use for things like field definitions.

Ah, the eternal struggle between verbosity and friendliness to
autocompleters :). As I said, personal preference is toward the
autocompleter, but I'm not going to argue against the verbosity being
annoying. IMO, the field defs work because the are very simple and
analogous to what exists in the database layer already. I think the
same thing needs to be a cornerstone here, so I'll put a flag in the
ground saying keep it simple :]

> >> For things like dates and "1 week ago", I would prefer to use
> >> something like DateHelper::str("1 week ago") to avoid coupling a
> >> single set of date logic to the ORM.
>
> This is a good example of the verbosity that I'd like to avoid.

Fair enough. For string format though, I'll point you toward
DateTime::modify(). Perhaps something like "-1 week" could be used
instead of "1 week ago", assuming NOW as the base for modification, to
utilise an existing string format (could also be passed directly to
DT::modify() in the event that it begins with a + or -, as well).

Hamish Friedlander

unread,
Apr 20, 2011, 6:57:29 PM4/20/11
to silverst...@googlegroups.com
Generally I like where this is going, and I have to add my thanks to Andrew for his effort on this.

** On :not versus :negate

I think the language isn't clear enough for me to remember what the difference between the two is without having to look it up every time.

I'd rather see the addition of :any and :all, like:

$list->filter('Foo:not:like:any', array('Baz', 'Buzz'));
$list->filter('Foo:not:like:all', array('Baz', 'Buzz'));

This would also be useful when following a has_many, i.e.

$list->filter('Children.Title:any', array('Baz', 'Buzz')); // Matches if any child has a title of Baz or Buzz
$list->filter('Children.Title:all', array('Baz', 'Buzz')); // Matches if any child has a title of Baz, and any child has a title of Buzz

(Although Children isn't actually a has_many relationship - following custom joins will be interesting, see below)

That brings up the question of how to say "find a Page where _all_ of it's children match some filter". How about

$list->filter('Children:all.ClassName, array('BlogPage')); // Matches if all children are BlogPosts

** On matching two fields

Something I haven't seen covered is matching two fields against one another. This is exactly the situation we've run into in the template language, where traditionally <% if A = B %> meant if $A = "B", but we quickly ran into places this was a limitation.

Maybe we always need to mark fields somehow?

$topposts = $list->filter('$TopID', array('0', '$ID'));

Or the opposite - mark strings (although this is hella ugly)

$topposts = $list->filter('TopID', array('"0"', 'ID'));

Not sure what the right answer is here.

** On or and and

I dont like just or() and and() as global functions - they're DataList specific things, but "or" and "and" are general terms.

My feeling is that filter lists should be constructable without attaching to a specific class. Multiple calls to filter builds "ands", multiple arguments to filter builds "ors" (as already implicitly stated), so

A = 1 AND ((B = 2 OR C = 3) AND D = 4)

would be

DataList::create('DataObject')
  ->filter('A', 1)
  ->filter( DataList::create()->filter('B', 2, 'C', 3)->filter('D', 4) )

Potentially, DataList::filter could be a shortcut for an empty create, so

DataList::create('DataObject')
  ->filter('A', 1)
  ->filter( DataList::filter('B', 2, 'C', 3)->filter('D', 4') )

Trying to pass a DataList with sort or non-filter parts as an argument to filter would throw an error

** On custom joins

I'd like to see these created with names, so

DataList::create('DataObject')
  ->inner_join(
    DataList::join('My_Random_Table', 'Randoms')->on('$ID', '$Randoms.ID')
  )
  ->filter('$Randoms.Name', 'Pow!')

We'd need a solution to the "matching fields against fields" problem again - I've used $ marking in that example.

Agree that if you do use SQL it should be through a seperate function.

** On case sensitivity

I agree that just relying on DB natural behaviour isn't good enough in every situation. It would be nice to be able to say you don't care sometimes for speed - ideally on the field. Maybe:

class SomeModel {
  static $db = array(
    'Foo' => 'String',
    'Bar' => 'String
  );

  static $matching = array(
    'Foo' => 'insensitive',
    'Bar' => 'natural'
  );
}

Or just as an argument to the string type itself. Less flexible but nicer to read.

** On date matching

It would be nice to do date math, but this isn't something query specific and is better handled by a utility library.

** On following custom joins

Something I have to do fairly often is to write my join functions manually. One example of this is the Children() function on SiteTree. As it stands, it wouldn't be possible to do 

$withchildren = DataList::create('SiteTree')->filter('Children.ID:not', 'NULL');

Because Children is a custom function on SiteTree, not a has_many.

This is probably work-around-able by annotations and having join functions get passed the datalist, so we'd rewrite SiteTree#Children to look like

/** @join Children */
function joinToChildren($list) {
  $list
    ->left_join(
      DataList::join('SiteTree', 'Children')->on('$ID', '$Children.ParentID')
    )
}

Sapphire would be responsible for creating the Children function for this like it does for has_many items at the moment so that you can still do

$page->Children();

Hamish Friedlander
SilverStripe

Michael Gall

unread,
Apr 20, 2011, 10:13:15 PM4/20/11
to silverst...@googlegroups.com
I'm going to play the naysayer here. I'm not sure why we are creating a new dialect that is kind of like SQL, but not really

Foo:not:like
:any



On Thu, Apr 21, 2011 at 8:57 AM, Hamish Friedlander <ham...@silverstripe.com> wrote:
Generally I like where this is going, and I have to add my thanks to Andrew for his effort on this.

** On :not versus :negate

I think the language isn't clear enough for me to remember what the difference between the two is without having to look it up every time.

I'd rather see the addition of :any and :all, like:

$list->filter('', array('Baz', 'Buzz'));

--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.
To post to this group, send email to silverst...@googlegroups.com.
To unsubscribe from this group, send email to silverstripe-d...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/silverstripe-dev?hl=en.



--
Checkout my new website: http://myachinghead.net
http://wakeless.net

Michael Gall

unread,
Apr 20, 2011, 10:21:15 PM4/20/11
to silverst...@googlegroups.com
Sorry about that. Let's start again.


I'm going to play the nayser here. I'm not sure why we are creating a new dialect here that is kind of like SQL, but not quite.

I'm not sure why we would use : as a delimiter when a space is the SQL delimiter and something like

Foo:not:like:any
Foo:gt

would be

'Foo not like ?' (or choose to have an implicit ?.
Foo >

The list goes on.

There really is very little difference processing wise, except one is valid SQL and the other is something else.

I suppose the question is, what if I want to use a space in my select? Well, the same question will apply if I want to use a : in a select as well, once this starts getting more complex there is going to be cases where we have to respect ' and `. So why not work with SQL to begin with.

I think you'll find everything is going to be easier to port from the beginning if we keep this as SQL like otherwise I see nothing but unneccessary learning curves ahead.


Cheers,


Michael


On Thu, Apr 21, 2011 at 12:13 PM, Michael Gall <mic...@wakeless.net> wrote:
I'm going to play the naysayer here. I'm not sure why we are creating a new dialect that is kind of like SQL, but not really





Sam Minnée

unread,
Apr 20, 2011, 11:38:35 PM4/20/11
to silverst...@googlegroups.com
> I'm going to play the nayser here. I'm not sure why we are creating a new dialect here that is kind of like SQL, but not quite.

Hehe, it's always good to have a naysayer in a discussion, even if it is annoying to have to defend an idea. :-P I have a few comments but no real conclusion:

- SQL is a much more open ended language than what we were envisioning, and I don't really want to sign up for writing a SQL parser and parsing those queries on every pageview. I've been resistant to an open-ended query syntax for that reason instead focusing on a simpler model that caters to 95% of cases well.

- Tying ourselves to SQL would make it more difficult to add non-relational back-ends to the ORM (well, I suppose it would be OXM in that case). Although it's not on the 3.0 roadmap, long-term I would like to see access to NoSQL databases and even APIs for services such as Twitter and Facebook accessed in a similar manner. A key => value matcher will be much easier to port to these services.

- My chief issue with Hamish's not:like:any suggestion is that it is heading down the path of making an open-ended parser. Perhaps it is okay but I'd want to dig into the implementation details.

- With that possible exception, the language isn't really all that much like SQL, particularly when you consider relation filters. It's really just SearchContext in different clothing.

Michael Gall

unread,
Apr 20, 2011, 11:43:08 PM4/20/11
to silverst...@googlegroups.com
Are there tests written for the selector/conditional syntax? If so, where are they located?


Cheers,


Michael

Sam Minnée

unread,
Apr 20, 2011, 11:45:59 PM4/20/11
to silverst...@googlegroups.com
Not yet, coming soon.

Sam Minnée

unread,
Apr 21, 2011, 12:03:29 AM4/21/11
to silverst...@googlegroups.com

On 21/04/2011, at 10:57 AM, Hamish Friedlander wrote:

> Generally I like where this is going, and I have to add my thanks to Andrew for his effort on this.
>
> ** On :not versus :negate
>
> I think the language isn't clear enough for me to remember what the difference between the two is without having to look it up every time.
>
> I'd rather see the addition of :any and :all, like:
>
> $list->filter('Foo:not:like:any', array('Baz', 'Buzz'));
> $list->filter('Foo:not:like:all', array('Baz', 'Buzz'));
>
> This would also be useful when following a has_many, i.e.
>
> $list->filter('Children.Title:any', array('Baz', 'Buzz')); // Matches if any child has a title of Baz or Buzz
> $list->filter('Children.Title:all', array('Baz', 'Buzz')); // Matches if any child has a title of Baz, and any child has a title of Buzz

Perhaps, rather than having 2 positions for modifiers, we ust
- :not / :none (synonyms) - none of the RHS items match
- :any - at least one of the RHS items match
- :all - all of the RHS items match

> That brings up the question of how to say "find a Page where _all_ of it's children match some filter". How about
> $list->filter('Children:all.ClassName, array('BlogPage')); // Matches if all children are BlogPosts

I don't think we need to support this:
- It makes the filter system substantially more complex
- I don't think SQL supports this without a bunch of mucking around with nested queries.

> ** On matching two fields

> $topposts = $list->filter('$TopID', array('0', '$ID'));

That seems reasonable although I'd say that for the LHS argument the $ should be optional.

> Or the opposite - mark strings (although this is hella ugly)
> $topposts = $list->filter('TopID', array('"0"', 'ID'));

This seems like it would open us to SQL injection errors and in general I hate this with the fire of 1000 suns. OK, maybe only 500 suns.

> I dont like just or() and and() as global functions - they're DataList specific things, but "or" and "and" are general terms.

I think I proposed DataList::or() and DataList::and().

> My feeling is that filter lists should be constructable without attaching to a specific class. Multiple calls to filter builds "ands", multiple arguments to filter builds "ors" (as already implicitly stated), so

Perhaps a DataList could be constructed without a class:

$filter = DataList::create()->filter('A', 1)->sort('B');

Or a separate class, DataFilter, could provide a similar interface:

$filter = DataFilter::create()->filter('A', 1)->sort('B')

And then an apply() method on DataList could apply the sorting & filtering of one object onto another

$list = DataList::create('SomeObj');
$list->apply($filter); // same as calling ->filter('A',1)->sort('B');

The DataFilter *could* be as simple as a recording of the chain of methods called & arguments passed, rather than having any logic of its own.

> ** On custom joins
>
> I'd like to see these created with names, so
>
> DataList::create('DataObject')
> ->inner_join(
> DataList::join('My_Random_Table', 'Randoms')->on('$ID', '$Randoms.ID')
> )
> ->filter('$Randoms.Name', 'Pow!')
>
> We'd need a solution to the "matching fields against fields" problem again - I've used $ marking in that example.

I'd hope that in 99% of cases you're not doing custom joins and are instead specifying custom relations. However, internally the DataList is going to need a system for attaching joins and something like this might be appropriate. And, if that were the case, then exposing it as a public-but-infrequently-used API seems acceptable.

> ** On following custom joins
>
> Something I have to do fairly often is to write my join functions manually. One example of this is the Children() function on SiteTree. As it stands, it wouldn't be possible to do
>
> $withchildren = DataList::create('SiteTree')->filter('Children.ID:not', 'NULL');
>
> Because Children is a custom function on SiteTree, not a has_many.

Any function that returns a DataList should be usable as a first-class relation. That is my goal. (Exactly how to implement that is a little tricky but I'm getting there).

Jeremy Lawson

unread,
Apr 25, 2011, 4:11:28 PM4/25/11
to SilverStripe Core Development
Can we have some examples of what it might be like to pass SQL
directly? Would it work in conjunction with a partially built filter?

DataList::create('SiteTree')->filter('A', '1')->where('(SELECT
COUNT(*) FROM SiteTree AS Child WHERE Child.ParentID=SiteTree.ID LIMIT
1) > 0');

... Only SiteTree records with A=1, and with at least 1 child.

A natural method of solving this particular example might be useful,
but the principle of inserting SQL remains.

Ingo Schommer

unread,
Apr 26, 2011, 4:59:12 PM4/26/11
to silverst...@googlegroups.com

Thanks Andrew! I really like where this is going - as Aaron said, just the right amount of abstraction.

Not a full blown SQL abstraction/dialect like Doctrine, but powerful enough to work for most

of the day-to-day queries.


Andrew Short wrote:

Gets all pages which have either the Title or MenuTitle to anything but "Home" or "About Us":
    DataList::create('SiteTree')->filter('Title,MenuTitle:not', array('Home', 'About Us'))
filter('Title,MenuTitle:not') is a bit ambiguous to read (is the not applying to both?).

While its a handy shortcut, I assume we'd support filter('Title:not,MenuTitle:not') as well?


Sam wrote:

> The other thing is the casting of values to filters against. At the moment only raw string values are implemented.

I think that the filters should be aware of the field type and respond appropriately.

How about field aliases and computed columns?

SELECT CONCAT("YearColumn",'-',"MonthColumn",'-',"DayColumn") AS "MyDate" WHERE "MyDate" > '2010-10-10';

We could add "MyDate" to DataObject::$casting, which I've done in the past - but seems a bit messy for a one-off query.

Should we be able to cast fields in a specific list instances manually?


By the way, we should ensure that any casting works with many_many_extraFields as well. 

Example: Player many_many Teams, Player_Teams.Position = ENUM('Player','Captain')

How would I find all Teams where a Player is Captain? DataList::create('Team')->filter('Players.Position', 'Captian') ?

Do we allow explicit joins of relations, or just implicitly through the field name dot syntax?


Or maybe its time to retire many_many_extraFields in favour of an intermediary object/table and

"through" relationships?


Sam wrote:
> As such I propose that we just run with the default = and LIKE operators, using an :equals and :like filter type, and leave it up to people to change their table collations if they need case sensitivity or insensitivity. Maybe a good solution to cover most use cases would be to provide an :ilike filter, which is implemented by the current DB driver, and returns a case insensitive LIKE filter. Of course, this would probably need to be complemented by :istartswith, :iendswith and :icontains, which I think would cover most use cases.
My main gripe with this is that it means that behaviour is no longer database-agnostic, and it would likely lead to really weird hard-to-debug bugs if you had a module that assumed case-insensitivity and then used it on a database that was case-sensitive.  I think that we should make case-insensitive match the default, and provide a modifier for case-sensitive matches.

I haven't come across case sensitivity requirements very often, but from a framework perspective "just use DBMS defaults" sounds like a big assumption.

We've already got hooks for date SQL selectors in the different database drivers, so could implement the feature matrix Andrew mentioned in these classes, right?


spronk wrote:
You could also
>> take this one step further and change array() to $list->condition('status', 'Pending') to return a single level filter and
>> remove the semantically meaningless array() call entirely.

Its more typing than array(), and a very common use case. Its a shame that we don't have [] to initialize arrays,

but I still prefer this to $list->condition() or DataList::condition.


> >> For things like dates and "1 week ago", I would prefer to use 
> >> something like DateHelper::str("1 week ago") to avoid coupling a 
> >> single set of date logic to the ORM. 

> This is a good example of the verbosity that I'd like to avoid. 

Assuming we'd use strtotime() for natural language parsing,

I can see how it might fall short in some cases - but even if we allow

arguments like "1 week ago", nothing it stopping you from doing natural language conversion

with your own libraries and just passing in a timestamp, right?


Hamish wrote:
Something I haven't seen covered is matching two fields against one another. This is exactly the situation we've run into in the template language, where traditionally <% if A = B %> meant if $A = "B", but we quickly ran into places this was a limitation.
Maybe we always need to mark fields somehow?
$topposts = $list->filter('$TopID', array('0', '$ID'));
Or the opposite - mark strings (although this is hella ugly)
$topposts = $list->filter('TopID', array('"0"', 'ID'));
DataList::create('DataObject')
  ->inner_join(
    DataList::join('My_Random_Table', 'Randoms')->on('$ID', '$Randoms.ID')
  )
  ->filter('$Randoms.Name', 'Pow!')
We'd need a solution to the "matching fields against fields" problem again - I've used $ marking in that example.

I don't like dollar field names in single quotes, it looks too much like native PHP. Do we use this anywhere else in sapphire?

Agreed that we need to mark fields somehow, but it seems far more common to compare to strings rather than field names,

so this should be the syntactically easier way. Django has F() for this:

http://docs.djangoproject.com/en/1.3/topics/db/queries/#filters-can-reference-fields-on-the-model

Yes, its a global function - but not semantically overloaded already,

and IMHO easier to understand than "$MyField".


John Milmine

unread,
Apr 29, 2011, 11:46:56 PM4/29/11
to SilverStripe Core Development
Wow some great discussion going on here!

Just gonna try pull some issues together here:
1) Static vs non-static functions (DataList::and() vs $list->add())
Not sure where I stand here, using DataList:: seems verbose to be, I
would either like short version maybe DB::, or solve the problem of
not having an instance that Sam bought up.

2) Date comparison
I agree with spronk that for things like NOW() - INTERVAL 1 WEEK it
would be better to assume now and use DateTime modify syntax like -1
week or +1 day etc.

3) Field name identification
Agree with Ingo that $ looks too much like native PHP. I quite like
':ID' which is like Ruby, however that may get confusing with the
filter syntax (:gt:any etc) talked about earlier.

As Hamish mentioned this was a problem in the template language, and I
can't see why this type of syntax couldn't be used there too. I think
this would help not having to learn different syntax on different
parts of the framework.

This inconsistency here is when do you use it (I know what the answer
is), but we could easily get developers using:
$list->filter(':Title', ':MenuTitle').
Hmmmmm...

4) Filter conditions
I agree with Michael, 'Title:gt' and any derivative of seems like some
new syntax to learn, I would strongly dissuade you from going down
this road. Parsing strings seems like something that should be
reserved for the template language.

Although it does limit how you use arrays I think this seems way
easier to understand:
$list->filter('Date', '>', '-1 year')
Although it might be better as
$list->filter('Title', 'Home', '!=') and have the operator optional
and if not supplied then defaults to '='.
This means you can still do $list->filter('Title,MenuTitle',
array('Home', 'About Us'), '!=')

5) DataList::create()?
Does this look weird to anyone else? DataList::create('SiteTree')
sounds like you're creating a SiteTree object. I think there are
better verbs to use here: http://groups.google.com/group/silverstripe-dev/browse_thread/thread/a4f7c1d6d42f87b7.
I like find, init, source, even object over create.

6) Advanced SQL properties (casting, if, sum, etc)
Agree with Ingo that there should be some support for it, even if you
guys just create the api and get the community to add support for
stuff as time goes by. The toughest thing
here will be multiple database languages.

Just my 20c.

Sam Minnée

unread,
May 1, 2011, 1:42:43 AM5/1/11
to SilverStripe Core Development
I've posted a commit here that adds a new object to work alongside
DataList and DataObject: DataModel.
https://github.com/silverstripe/sapphire/commit/7fbb919ce824ba3b80e0c294c90c9b42d048bdca

Essentially, if DataList represents a table, and DataObject represents
a record, then DataModel represents a database. Specifically, it
represents a named collection of DataLists, with a suitable default if
none is provided.

The other thing that this patch does is ensure at $this->model is
available on Controllers, RequestHandlers and DataObjects. Right now,
there are a few places where code may need to be modified in order to
avoid using the global DataModel, (returned by DataModel::inst());
passing $model as an argument to a RequestHandler constructor would be
one way around this.

This means that we can make most of our data queries via $this-
>model. For example.

$this->model->SiteTree->byID(5);

In the default configuration, $this->model->SiteTree would return the
DataList::create('SiteTree'). However, we would be able to 'site wide
settings' such as whether you're on stage or draft, or the language
you're currently viewing, by manipulating the DataModel, thus keeping
them out of the global state. We'd also (in principle) be able to set
up a separate DataModel for running tests, rather than having a lot of
global state settings relating to the database you currently have
open.

What do people think?

keith / spronk

unread,
May 1, 2011, 9:49:09 PM5/1/11
to SilverStripe Core Development
Nice organisational skills there Mr. Milmine :]

A few comments:
3) Filter Name Identification
Bloody PHP....! How about wrapping field names with {} as is used in
templates (albeit with a $...)? Or alternatively take a book from
MySQL and use ``, or MSSQL and use []?

4) Filter condition syntax
The issue with three arguments here is that it makes the ability to
accept many arguments as filter conditions very difficult. However, I
prefer the use of actual SQL operators (!<>= LIKE etc). I would vote
for something like "Field >" or "Field:>" over "Field:gt". In fact, I
would probably allow it both ways (to accommodate users of other
ORMs).

5) DataList::create()
How about for()? If not, my vote would be for source() or src().

6) Supporting SiteTree operations like Children
No. :)

This couples the ORM to the SiteTree. For relationships like Children
I would prefer to see the SiteTree API use something internally, i.e.
figuring out what exactly its children are, and creating a query based
on the database information from those, i.e. internally: $filter-
>('ParentID:equals', $this-Id);


7) Sam's latest commit:

Can you give a bit more information on the purpose of DataModel? To
make my point clearer, is there a reason for using it as opposed to
coupling DataObject? It seems to me that DataObject itself could
manage this without negatively affecting ability to inject different
database services etc...

It might just be that DataModel in the commit is missing a lot of the
code indicating its usefulness? :]
> better verbs to use here:http://groups.google.com/group/silverstripe-dev/browse_thread/thread/....

Sam Minnée

unread,
May 1, 2011, 10:53:33 PM5/1/11
to silverst...@googlegroups.com
> Can you give a bit more information on the purpose of DataModel? To
> make my point clearer, is there a reason for using it as opposed to
> coupling DataObject? It seems to me that DataObject itself could
> manage this without negatively affecting ability to inject different
> database services etc...

DataModel represents the *collection* of different DataObject classes. Hamish F suggested 'DataSchema' might be a better name.

So:
- A DataSchema is a named collection of DataLists
- A DataList is an ordered collection of DataObjects

Or, to put it another way
- $schema is a DataSchema
- $schema->SiteTree is a DataList
- $schema->SiteTree->byID(5) is a DataOject

If you prefer it could be $schema->SiteTree() rather than $schema->SiteTree, if you don't like the use of magic properties here.

You *could* represent this information as static methods on DataObject (this is how it's done in 2.4) but that seems inferior to having an object representing the collection of DataObject types.

Sam Minnée

unread,
May 1, 2011, 10:56:55 PM5/1/11
to silverst...@googlegroups.com
> 6) Supporting SiteTree operations like Children
> No. :)
>
> This couples the ORM to the SiteTree. For relationships like Children
> I would prefer to see the SiteTree API use something internally, i.e.
> figuring out what exactly its children are, and creating a query based
> on the database information from those, i.e. internally: $filter-
>> ('ParentID:equals', $this-Id);

The intention wouldn't be to couple the ORM to SiteTree. Rather, it would be to allow *any* function that returns a DataList to be used as if it were a relation. Perhaps there would be some special settings that would need to be added to the DataList, but the general concept is the same.

I agree that the ORM shouldn't be coupled to SiteTree.

keith / spronk

unread,
May 2, 2011, 12:21:07 AM5/2/11
to SilverStripe Core Development
> You *could* represent this information as static methods on DataObject (this is how it's done in 2.4) but that seems inferior to having an object representing the collection of DataObject types.

Any elaboration on why it seems inferior? I personally see the
collection of DataObject types as the classes themselves, and the
current implementation of the DataModel class doesn't seem to
really... do much.

What else are you thinking would DataModel/Schema be used for? i.e.
wrapping a database connection? abstracting the ORM API?

I could see it being useful if the DataModel was responsible for
creating the DataQuery objects, that way you could inject, for
example, a non-SQL data provider before returning DataList objects. As
it stands though, I don't see what DataModel is *doing* other than
being a non-flyweight Flyweight :)

Sam Minnée

unread,
May 2, 2011, 12:36:22 AM5/2/11
to silverst...@googlegroups.com

>> You *could* represent this information as static methods on DataObject (this is how it's done in 2.4) but that seems inferior to having an object representing the collection of DataObject types.
>
> Any elaboration on why it seems inferior? I personally see the
> collection of DataObject types as the classes themselves, and the
> current implementation of the DataModel class doesn't seem to
> really... do much.


You end up with 2 sets of mashed together with no real coherence - the instance methods that control individual data objects and the static methods that control datatype-wide settings. Static methods / properties are best avoided where possible because the limit your codes modularity; the most likely place where you will run into practical problems would be with unit testing.

If you use static methods it's also impossible for your code to work with two separate configurations in parallel. It would make sense, for example, to keep track of whether you're accessing the publish or the draft site as a property of the DataSchema/DataModel. If this information is stored in a static (as it currently is) it's awkward to perform operations where you compare content from the stage & published site: you need to query one, alter the static setting, query the other, and then manually ensure that your static settings have been reset correctly. Plenty of bugs, over the years, have arisen out of problems with this approach.

Although we might not be able to do away with singletons entirely (because passing the instances down the object graph introduces too much boilerplate) I would still like to move toward singleton instances of objects, rather than having large blocks of implementation code locked within static methods.

John Milmine

unread,
May 2, 2011, 5:21:23 AM5/2/11
to SilverStripe Core Development
Does DataModel/Schema give the ability to have multiple databases? If
not why not? It would seem you're going down that path?

Spronk and I had a ConnectionPool object handled database instances
and connection strings, however I'm not sure this is your intention at
all...

On the $schema->SiteTree vs $schema->SiteTree(), both are magic, I
don't see one being better than the other.

I think I prefer Schema, I generally refer to a DataObject as a model.




Ingo Schommer

unread,
May 2, 2011, 5:23:16 AM5/2/11
to silverst...@googlegroups.com
Hope I'm not getting too off topic here, but Symfony2/Doctrine has the concept of an "entity manager"
Its a much more radical approach, doing away with ActiveRecord completely
in favour of a more flexible database schema mapping, but acts on a similar level of abstraction.
I'm not suggesting we should go down that road (otherwise we'll never get SS3 out *g*),
but at least for me it was a very interesting read on other approaches to manage ORM information.

Marcus Nyeholt

unread,
May 2, 2011, 6:26:21 AM5/2/11
to silverst...@googlegroups.com
Yeah, I prefer that kind of approach (http://www.ezpdo.net is another one that's really nice to use). 

One big reason I'd like to see SS3 pick up some of this is to properly separate concerns - DataObject at the moment performs much of the functionality (using static methods and static collections) that an entity manager would normally do. Makes it very difficult to touch any of this as everything extends from it, instead of composition. 

Marcus

--

John Milmine

unread,
May 2, 2011, 4:55:24 PM5/2/11
to SilverStripe Core Development
I second the approach of a Object manager, spronk and I had one in the
framework we developed. I think it was called IdentityMap, it was a
static singleton which knew of every object loaded, so that you didn't
load the same object twice.

I bought this idea up on: http://groups.google.com/group/silverstripe-dev/browse_thread/thread/b629c95ed154411a

Agree with Ingo however that if it means 3 won't happen, then flag.
However if 3 contained an IdentityMap, DataQuery syntax and a Database
connection pool, my shopping list would be a lot smaller.

John Milmine

unread,
May 2, 2011, 4:59:10 PM5/2/11
to SilverStripe Core Development
I'm not sure if it was the same object or not (may have been
PersistanceManager), however we also had an object which enabled you
to group create (insert with multiple rows), and I believe group save
(had transaction support).

Sorry getting way off topic here.

keith / spronk

unread,
May 2, 2011, 7:08:55 PM5/2/11
to SilverStripe Core Development
Sam: I see your train of thought now and agree. I do think though that
statically accessible stuff could be used for the 'default' DataModel.

John: The bible on this, so to speak, is Fowler's Patterns of
Enterprise Application Architecture: http://martinfowler.com/eaaCatalog/.
In our impl, there were 3 classes involved. Object,
PersistenceManager, and IdentityMap. We used IdentityMap (http://
martinfowler.com/eaaCatalog/identityMap.html) to cache loaded objects
by their UUIDs in memory, loading using the LazyLoad pattern
(specifically in this case a Virtual proxy). The benefits to this are
obvious - hitting the database less for frequently used objects, and
also being able to manage memory more easily by ensuring only one copy
of an object is loaded at any one time. The downside is a possible
increase in execution complexity and memory use for small datasets.

Our PersistenceManager was utilised by the IdentityMap to perform the
loading and saving for any object inheriting from the base DataObject.
It's a conceptual extension to the Data Mapper pattern from PoEAA,
except with the ability to map any class complying to the data
definition logic (in SS's case, this is $db, $has_one, $has_many etc).

If you take this idea one step further you can see how you can
abstract away the actual implementation of data from the model's
structure. For example, the DataModel/Schema (or PersistenceManager,
DataMapper, whatever you want to call it) could provide a mapping
between source data and model data; i.e. a SerialisedObjectDataModel
could be instantiated and used to provide access to a stub, non-sql
source for testing, for example.

Sam Minnée

unread,
May 2, 2011, 8:18:52 PM5/2/11
to silverst...@googlegroups.com
On the DataMapper thing, A few thoughts:

1) The DataQuery is currently responsible for extracting records from the database. It decides which tables to extract data from, how to join tables together to make multi-table inheritance work. For this reason, the DataQuery could be renamed to DefaultDataMapper and be given the additional responsibility of *writing* data to the database.

2) DataMapper could be an interface that DefaultDataMapper implements. Other providers (for alternative database schemas, RESTful APIs, NoSQL databases) could also implement DataMapper.

3) In this new form, the DataMapper could be configured such that the "base responsibilities" for getting data from the database - the interpretation of Sapphire's multi-table inheritance, etc - could be kept distinct from the filter, sort, and join operations that are applied to the DataList. In this model, DataList::filter() would update a property on DataList, that is passed to a method of DataMapper when DataList::getIterator() is called. Right now, DataList::filter() passes the data straight through to DataQuery::filter(). (This will probably only make sense if you've read the code of DataList and DataQuery.)

4) To create new records, newObject() should be called on DataList or DataMapper, rather than directly creating a new record:
$page = DataList::create('SiteTree')->newObject();

Or, using the DataSchema:
$page = $this->schema->SiteTree->newObject();

5) For this reason, every DataObject can have a mapper associated with it. The code for DataObject::write() can then be simplified to this:
function write() {
$this->dataMapper->writeObject($this);
}

6) Under this scheme I would still expect DataObject to be responsible for keeping track of which fields have changed, and providing that to DataMapper::writeObject() through some interface. I'm not sure whether DataObject::write() or DataMapper::writeObject() should be responsible for triggering the DataObject validation.

7) The DataMapper can also be responsible for ensuring that all records live within an IdentityMap, although I would expect that the IdentityMap is implemented as a separate helper class that DataMapper calls upon.

Simon J Welsh

unread,
May 6, 2011, 12:45:30 AM5/6/11
to silverst...@googlegroups.com
On 1/05/2011, at 5:42 PM, Sam Minnée wrote:

> This means that we can make most of our data queries via $this->model. For example.
>
> $this->model->SiteTree->byID(5);

Have you considered how this would handle namespaced classes? I don't think PHP would like $this->model->rentbox\tenancy\Inspection (for example) all that much.
---
Simon Welsh
Admin of http://simon.geek.nz/

Who said Microsoft never created a bug-free program? The blue screen never, ever crashes!

http://www.thinkgeek.com/brain/gimme.cgi?wid=81d520e5e

Sam Minnée

unread,
May 6, 2011, 12:52:38 AM5/6/11
to silverst...@googlegroups.com
Good point, we'd need to think about that. There are a few options, I guess:

- Don't support namespaces in the DataSchema. Yes, by default the "SiteTree" in the schema corresponds to the name of the class SiteTree, but this is could be treated as a convention. If you want to pull together models from different PHP namespaces then we could provide a clean way for a developer to define their schema.

- Because $this->model is an instance property, we could by default look for models within the same namespace, limiting the number of times that you need to explicitly specify the namespace.

- Have an optional method that accepts a string that you can use to pass namespaces: $this->model->get('rentbox\\tenancy\\Inspecition')->byID(5). It's more verbose but hopefully it would only need to be rarely used?

Marcus Nyeholt

unread,
May 6, 2011, 5:01:25 AM5/6/11
to silverst...@googlegroups.com
How would namespaced model objects be stored? Would the table name take into account the namespace, going back to rentbox_tenancy_Inspection or similar? 



--

Simon J Welsh

unread,
May 6, 2011, 5:11:32 AM5/6/11
to silverst...@googlegroups.com
On 6/05/2011, at 4:52 PM, Sam Minnée wrote:

> Good point, we'd need to think about that. There are a few options, I guess:
>
> - Don't support namespaces in the DataSchema. Yes, by default the "SiteTree" in the schema corresponds to the name of the class SiteTree, but this is could be treated as a convention. If you want to pull together models from different PHP namespaces then we could provide a clean way for a developer to define their schema.

That kinda defeats the point of namespaces as you'll have to do conflict resolving when two classes have the same name (I have a lot of Controller classes for instance).

> - Because $this->model is an instance property, we could by default look for models within the same namespace, limiting the number of times that you need to explicitly specify the namespace.

The majority of my lookups are cross-namespace, so that'll only help in most cases, not to mention I don't think $this->model can get the namespace of $this.

> - Have an optional method that accepts a string that you can use to pass namespaces: $this->model->get('rentbox\\tenancy\\Inspecition')->byID(5). It's more verbose but hopefully it would only need to be rarely used?

Since most people wouldn't use namespaces, you could potentially get away with making the optional method __get. Providing a method fallback would be my preferred way of doing it.

On 6/05/2011, at 9:01 PM, Marcus Nyeholt wrote:

> How would namespaced model objects be stored? Would the table name take into
> account the namespace, going back to rentbox_tenancy_Inspection or similar?


MySQL has finally (in 5.1.4-ish, I think) managed to figure out how to encode filenames, so rentbox\tenancy\Inspection is a valid table name (I've got a decent amount of such tables), so long as it is escaped properly. When I have the time, I'll be redoing my namespace patch for the 3.0 parser, and will update all the places a class name is sent to the database to actually escape the class name, now that names aren't guaranteed to be simple strings. (I've spent too many hours tracking done where another Convert::raw2sql($class) was needed to not but the effort into making sure they show up in 3.0).

The main problem is actually namespaced template files on Windows, so it's a potential caveat for those developing or deploying to a Windows-based machine.

Simon

unread,
May 6, 2011, 5:28:47 AM5/6/11
to SilverStripe Core Development
> On 6/05/2011, at 9:01 PM, Marcus Nyeholt wrote:
>
> > How would namespaced model objects be stored? Would the table name take into
> > account the namespace, going back to rentbox_tenancy_Inspection or similar?
>
> MySQL has finally (in 5.1.4-ish, I think) managed to figure out how to encode filenames, so rentbox\tenancy\Inspection is a valid table name (I've got a decent amount of such tables), so long as it is escaped properly. When I have the time, I'll be redoing my namespace patch for the 3.0 parser, and will update all the places a class name is sent to the database to actually escape the class name, now that names aren't guaranteed to be simple strings. (I've spent too many hours tracking done where another Convert::raw2sql($class) was needed to not but the effort into making sure they show up in 3.0).

Support was added in MySQL 5.1.6 (see http://dev.mysql.com/doc/refman/5.1/en/identifiers.html)
and Postgres 8.3 has support as well (no idea on the lower limit
though). Both require you to quote the table name though, which you
should be doing anyway.

Marcus Nyeholt

unread,
May 6, 2011, 5:37:06 AM5/6/11
to silverst...@googlegroups.com

MySQL has finally (in 5.1.4-ish, I think) managed to figure out how to encode filenames, so rentbox\tenancy\Inspection is a valid table name (I've got a decent amount of such tables), so long as it is escaped properly.

What about pgsql, mssql, sqlite, oracle, keyval stores? While the issue of mapping same named classes is very rare in languages that have namespaces, a typical way of resolving it is to just specify a mapping from classname -> tablename. I'd rather introduce that than have tables\named\like\this. My view is that the DB layer does not need namespacing; if you think you do, you're probably doing something horrible :). 
 

The main problem is actually namespaced template files on Windows, so it's a potential caveat for those developing or deploying to a Windows-based machine.

One thing to keep in mind is that SS is MS certified now, so anything introduced needs to run in IIS to become core (and template namespacing would be something nice to have :)). 

Simon J Welsh

unread,
May 6, 2011, 5:59:36 AM5/6/11
to silverst...@googlegroups.com
On 6/05/2011, at 9:37 PM, Marcus Nyeholt wrote:

>> MySQL has finally (in 5.1.4-ish, I think) managed to figure out how to
>> encode filenames, so rentbox\tenancy\Inspection is a valid table name (I've
>> got a decent amount of such tables), so long as it is escaped properly.
>
> What about pgsql, mssql, sqlite, oracle, keyval stores? While the issue of
> mapping same named classes is very rare in languages that have namespaces, a
> typical way of resolving it is to just specify a mapping from classname ->
> tablename. I'd rather introduce that than have tables\named\like\this. My
> view is that the DB layer does not need namespacing; if you think you do,
> you're probably doing something horrible :).

I'm of the opinion that if a DBMS doesn't allow something in its table names, then the DB wrapper should be doing some conversion rather than converting it in every case. We can't use underscores (imagine the class Page\Live), so it's not likely to be a pretty solution.

>> The main problem is actually namespaced template files on Windows, so it's
>> a potential caveat for those developing or deploying to a Windows-based
>> machine.
>
> One thing to keep in mind is that SS is MS certified now, so anything
> introduced needs to run in IIS to become core (and template namespacing
> would be something nice to have :)).

I don't have a Windows machine to test, but what happens if you either copy a file with \s in its name to a Windows file share or try saving such a file directly in Windows? If it borks, there'll need to be some workaround there as well, which I don't think replacing with underscores would work in due to potential conflicts with action-specific templates and we can't rely on case differences here either. A space or + or something else that can be used in the filesystem but not in a class name would work here.

Reply all
Reply to author
Forward
0 new messages