--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.
To post to this group, send email to silverst...@googlegroups.com.
To unsubscribe from this group, send email to silverstripe-d...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/silverstripe-dev?hl=en.
+--------------------------------------+| Engine | Sensitive | Insensitive |+------------+-----------+-------------+| MySQL | = BINARY | = || Postgres | = | ILIKE || SQL Server | COLLATE | = || Sqlite | = | LIKE || Oracle | = | REGEXP_LIKE |+------------+-----------+-------------+
This is only true in an environment where you're doing all your dev on a single DBMS. SilverStripe/Sapphire aren't in that camp - this code is intended to be deployed to a number of different databases.
> Making every query case insensitive by default would be reasonably difficult, and I'm also worried about slowdowns it may incur
"Reasonably difficult" doesn't concern me too much, but "slowdowns" do.
> for example in Oracle it would mean that every text filter by default would use a REGEXP_LIKE match, which I imagine would be quite expensive.
I think that this is incorrect information, based on http://www.orafaq.com/node/91http://www.orafaq.com/node/91 - it's talking about a "new feature" but it's a blog post from 2005.
> As such I propose that we just run with the default = and LIKE operators, using an :equals and :like filter type, and leave it up to people to change their table collations if they need case sensitivity or insensitivity. Maybe a good solution to cover most use cases would be to provide an :ilike filter, which is implemented by the current DB driver, and returns a case insensitive LIKE filter. Of course, this would probably need to be complemented by :istartswith, :iendswith and :icontains, which I think would cover most use cases.
My main gripe with this is that it means that behaviour is no longer database-agnostic, and it would likely lead to really weird hard-to-debug bugs if you had a module that assumed case-insensitivity and then used it on a database that was case-sensitive. I think that we should make case-insensitive match the default, and provide a modifier for case-sensitive matches.
Thanks for getting this done, AJShort! It's looking really good.
> You can also add ":not" and ":negate" before or after the filter. If you're only matching against one value these will do the same thing, but if you're matching against multiple values then ":not" will wrap the whole thing in a NOT(), whereas ":negate" will invert the operators.
> • $list->filter('Foo:not:like', array('Baz', 'Buzz')); => NOT("Foo" LIKE 'Baz' OR "Foo" LIKE 'Buzz')
> • $list->filter('Foo:not:negate', array('Baz', 'Buzz') => "Foo" NOT LIKE 'Baz' OR "Foo" NOT LIKE 'Buzz'
I assume that the second example is supposed to be "Foo:like:negate", but that's just a typo.
My main question here is whether we *need* two different styles of negation with subtly different behaviour. What do others think?
> I think this covers most basic use cases, since it allows for fairly complete AND filters support, and support for basic OR filters.
One simple way of introducing ORs would be to have a function or() that could be passed multiple filter arrays:
So this:
$list->or( array('Status' => 'Active'), array('LastEdited:gt' => '1 week ago') )
Would create the WHERE clause:
("Status" = 'Active') OR ("LastEdited" > NOW() - INTERVAL 1 WEEK)
(Note that I'm making some assumptions about how the date filters will work.)
> • Passing in SQL values as field names. For example, if you attach a more complex join to a DataQuery, it would be nice to be able to manually filter by "My_JoinTable"."Foo". Maybe if a field name begins with a double quote we just use it directly as an SQL identifier? e.g. you could do $list->filter('"My_JoinTable"."Baz":lt', 5);
I'd prefer it if we got things to a point where direct SQL fragments were only used in really weird edge cases, and that SQL-fragment-based manipulation was done with separate methods - where() rather than filter(). Otherwise we're more likely to get SQL-injection errors. If you're manipulating SQL-fragments directly, what's wrong with where()?
> The other thing is the casting of values to filters against. At the moment only raw string values are implemented.
I think that the filters should be aware of the field type and respond appropriately.
> It would be nice to also support SQL fragments here, so you can filter against things like NOW().
I'd prefer it if NOW() was abstracted, so that it could be CURRENT_DATE() on the databases which need that. I'd prefer it if now were handled specially by the date comparison filters.
> Also, another thing that needs to be considered is things like date casting - it would be nice to be able to do $list->filter('Date:gt', strtotime('-1 week')) without having to manually create the date string. Should this be automatic, or should you have to pass in that a string should be treated as a string, sql fragment, or date?
I think it would be good if we could support $list->filter('Date:gt', "1 week ago"), not to mention $list->filter('Date:gt', 'now'). The way to go that reliably would be for the gt filter to know the type of the field being compared. Potentially, the code that selects which filter function to run can be configured to use a different filter function depending on the field type. This would let you be tighter about comparing strings vs. comparing numbers, too:
function gt($field, $value) {
return "$field > '" . Convert::raw2sql($value) . "'";
}
function gt_numeric($field, $value) {
if(!is_numeric($value)) throw new InvalidArgumentException("'$value' isn't numeric");
return "$field > $value";
}
function gt_date($field, $value) {
$dateValue = date('Y-m-d H:i:s', strtotime($value));
return gt($field, $dateValue);
}
(Note: This code is just a sketch; the date parsing is naïve, would be in its own function, and it's not clear which object these functions sit inside)
or() and and() methods that return filter-segment objects seem like a pretty good way of allowing for arbitrarily nested query filters. My only concern with $list->or() and $list->and() is that you don't always have the DataList available in a variable.
We could use statics - DataList::or() and DataList::and() - and make them very simple methods:
function or() {
$args = func_get_args();
return new DataQuery_OrFilter($args);
}
function and() {
$args = func_get_args();
return new DataQuery_AndFilter($args);
}
So, we could implement this query:
SELECT * FROM Obj WHERE Status IN ('Active', 'Pending') OR (Status = 'Initialising' AND Initialised = 1)
With this request:
DataObject::get("Obj")->filter(DataList::or(
array("Status" => array("Active", "Pending")),
DataList::and(array(
"Status" => "Initialising",
"Initialised" => true
))
));
In this model, DataList::or() and DataList::and() can be passed a number of arguments, each of which is an array or a DataList_Filter. All of the array elements and the DataList_Filter objects are combined with either AND or OR as appropriate.
By default, filter() will combine the array or k/v passed with a DataList_AndFilter.
>> You could also
>> take this one step further and change array() to $list->condition('status', 'Pending') to return a single level filter and
>> remove the semantically meaningless array() call entirely.
This sounds like a path that would lead to something similar to PHPUnit's mock system, which I really dislike because it's verbose and byzantine.
>> With regards to the filter text myself, I personally am not a big fan
>> of what I call "magic strings" and prefer to build things that can be
>> manipulated in object space, but if this is not feasable as you say
>> (ARel) I think the idea is reasonably sound.
My main concern is that PHP isn't well-suited to building things in object space - you end up with really verbose code. So I see "magic strings" following a clearly defined API as being a pragmatic solution, which we already use for things like field definitions.
>> For things like dates and "1 week ago", I would prefer to use
>> something like DateHelper::str("1 week ago") to avoid coupling a
>> single set of date logic to the ORM.
This is a good example of the verbosity that I'd like to avoid.
Generally I like where this is going, and I have to add my thanks to Andrew for his effort on this.** On :not versus :negateI think the language isn't clear enough for me to remember what the difference between the two is without having to look it up every time.I'd rather see the addition of :any and :all, like:
$list->filter('', array('Baz', 'Buzz'));
--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.
To post to this group, send email to silverst...@googlegroups.com.
To unsubscribe from this group, send email to silverstripe-d...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/silverstripe-dev?hl=en.
I'm going to play the naysayer here. I'm not sure why we are creating a new dialect that is kind of like SQL, but not really
Hehe, it's always good to have a naysayer in a discussion, even if it is annoying to have to defend an idea. :-P I have a few comments but no real conclusion:
- SQL is a much more open ended language than what we were envisioning, and I don't really want to sign up for writing a SQL parser and parsing those queries on every pageview. I've been resistant to an open-ended query syntax for that reason instead focusing on a simpler model that caters to 95% of cases well.
- Tying ourselves to SQL would make it more difficult to add non-relational back-ends to the ORM (well, I suppose it would be OXM in that case). Although it's not on the 3.0 roadmap, long-term I would like to see access to NoSQL databases and even APIs for services such as Twitter and Facebook accessed in a similar manner. A key => value matcher will be much easier to port to these services.
- My chief issue with Hamish's not:like:any suggestion is that it is heading down the path of making an open-ended parser. Perhaps it is okay but I'd want to dig into the implementation details.
- With that possible exception, the language isn't really all that much like SQL, particularly when you consider relation filters. It's really just SearchContext in different clothing.
> Generally I like where this is going, and I have to add my thanks to Andrew for his effort on this.
>
> ** On :not versus :negate
>
> I think the language isn't clear enough for me to remember what the difference between the two is without having to look it up every time.
>
> I'd rather see the addition of :any and :all, like:
>
> $list->filter('Foo:not:like:any', array('Baz', 'Buzz'));
> $list->filter('Foo:not:like:all', array('Baz', 'Buzz'));
>
> This would also be useful when following a has_many, i.e.
>
> $list->filter('Children.Title:any', array('Baz', 'Buzz')); // Matches if any child has a title of Baz or Buzz
> $list->filter('Children.Title:all', array('Baz', 'Buzz')); // Matches if any child has a title of Baz, and any child has a title of Buzz
Perhaps, rather than having 2 positions for modifiers, we ust
- :not / :none (synonyms) - none of the RHS items match
- :any - at least one of the RHS items match
- :all - all of the RHS items match
> That brings up the question of how to say "find a Page where _all_ of it's children match some filter". How about
> $list->filter('Children:all.ClassName, array('BlogPage')); // Matches if all children are BlogPosts
I don't think we need to support this:
- It makes the filter system substantially more complex
- I don't think SQL supports this without a bunch of mucking around with nested queries.
> ** On matching two fields
> $topposts = $list->filter('$TopID', array('0', '$ID'));
That seems reasonable although I'd say that for the LHS argument the $ should be optional.
> Or the opposite - mark strings (although this is hella ugly)
> $topposts = $list->filter('TopID', array('"0"', 'ID'));
This seems like it would open us to SQL injection errors and in general I hate this with the fire of 1000 suns. OK, maybe only 500 suns.
> I dont like just or() and and() as global functions - they're DataList specific things, but "or" and "and" are general terms.
I think I proposed DataList::or() and DataList::and().
> My feeling is that filter lists should be constructable without attaching to a specific class. Multiple calls to filter builds "ands", multiple arguments to filter builds "ors" (as already implicitly stated), so
Perhaps a DataList could be constructed without a class:
$filter = DataList::create()->filter('A', 1)->sort('B');
Or a separate class, DataFilter, could provide a similar interface:
$filter = DataFilter::create()->filter('A', 1)->sort('B')
And then an apply() method on DataList could apply the sorting & filtering of one object onto another
$list = DataList::create('SomeObj');
$list->apply($filter); // same as calling ->filter('A',1)->sort('B');
The DataFilter *could* be as simple as a recording of the chain of methods called & arguments passed, rather than having any logic of its own.
> ** On custom joins
>
> I'd like to see these created with names, so
>
> DataList::create('DataObject')
> ->inner_join(
> DataList::join('My_Random_Table', 'Randoms')->on('$ID', '$Randoms.ID')
> )
> ->filter('$Randoms.Name', 'Pow!')
>
> We'd need a solution to the "matching fields against fields" problem again - I've used $ marking in that example.
I'd hope that in 99% of cases you're not doing custom joins and are instead specifying custom relations. However, internally the DataList is going to need a system for attaching joins and something like this might be appropriate. And, if that were the case, then exposing it as a public-but-infrequently-used API seems acceptable.
> ** On following custom joins
>
> Something I have to do fairly often is to write my join functions manually. One example of this is the Children() function on SiteTree. As it stands, it wouldn't be possible to do
>
> $withchildren = DataList::create('SiteTree')->filter('Children.ID:not', 'NULL');
>
> Because Children is a custom function on SiteTree, not a has_many.
Any function that returns a DataList should be usable as a first-class relation. That is my goal. (Exactly how to implement that is a little tricky but I'm getting there).
Thanks Andrew! I really like where this is going - as Aaron said, just the right amount of abstraction.
Not a full blown SQL abstraction/dialect like Doctrine, but powerful enough to work for most
of the day-to-day queries.
Andrew Short wrote:
Gets all pages which have either the Title or MenuTitle to anything but "Home" or "About Us":
DataList::create('SiteTree')->filter('Title,MenuTitle:not', array('Home', 'About Us'))
filter('Title,MenuTitle:not') is a bit ambiguous to read (is the not applying to both?).
While its a handy shortcut, I assume we'd support filter('Title:not,MenuTitle:not') as well?
Sam wrote:
> The other thing is the casting of values to filters against. At the moment only raw string values are implemented.
I think that the filters should be aware of the field type and respond appropriately.
How about field aliases and computed columns?
SELECT CONCAT("YearColumn",'-',"MonthColumn",'-',"DayColumn") AS "MyDate" WHERE "MyDate" > '2010-10-10';
We could add "MyDate" to DataObject::$casting, which I've done in the past - but seems a bit messy for a one-off query.
Should we be able to cast fields in a specific list instances manually?
By the way, we should ensure that any casting works with many_many_extraFields as well.
Example: Player many_many Teams, Player_Teams.Position = ENUM('Player','Captain')
How would I find all Teams where a Player is Captain? DataList::create('Team')->filter('Players.Position', 'Captian') ?
Do we allow explicit joins of relations, or just implicitly through the field name dot syntax?
Or maybe its time to retire many_many_extraFields in favour of an intermediary object/table and
"through" relationships?
Sam wrote:
> As such I propose that we just run with the default = and LIKE operators, using an :equals and :like filter type, and leave it up to people to change their table collations if they need case sensitivity or insensitivity. Maybe a good solution to cover most use cases would be to provide an :ilike filter, which is implemented by the current DB driver, and returns a case insensitive LIKE filter. Of course, this would probably need to be complemented by :istartswith, :iendswith and :icontains, which I think would cover most use cases.
My main gripe with this is that it means that behaviour is no longer database-agnostic, and it would likely lead to really weird hard-to-debug bugs if you had a module that assumed case-insensitivity and then used it on a database that was case-sensitive. I think that we should make case-insensitive match the default, and provide a modifier for case-sensitive matches.
I haven't come across case sensitivity requirements very often, but from a framework perspective "just use DBMS defaults" sounds like a big assumption.
We've already got hooks for date SQL selectors in the different database drivers, so could implement the feature matrix Andrew mentioned in these classes, right?
spronk wrote:
You could also
>> take this one step further and change array() to $list->condition('status', 'Pending') to return a single level filter and
>> remove the semantically meaningless array() call entirely.
Its more typing than array(), and a very common use case. Its a shame that we don't have [] to initialize arrays,
but I still prefer this to $list->condition() or DataList::condition.
> >> For things like dates and "1 week ago", I would prefer to use
> >> something like DateHelper::str("1 week ago") to avoid coupling a
> >> single set of date logic to the ORM.
>
> This is a good example of the verbosity that I'd like to avoid.
Assuming we'd use strtotime() for natural language parsing,
I can see how it might fall short in some cases - but even if we allow
arguments like "1 week ago", nothing it stopping you from doing natural language conversion
with your own libraries and just passing in a timestamp, right?
Hamish wrote:
Something I haven't seen covered is matching two fields against one another. This is exactly the situation we've run into in the template language, where traditionally <% if A = B %> meant if $A = "B", but we quickly ran into places this was a limitation.
Maybe we always need to mark fields somehow?
$topposts = $list->filter('$TopID', array('0', '$ID'));
Or the opposite - mark strings (although this is hella ugly)
$topposts = $list->filter('TopID', array('"0"', 'ID'));
DataList::create('DataObject')
->inner_join(
DataList::join('My_Random_Table', 'Randoms')->on('$ID', '$Randoms.ID')
)
->filter('$Randoms.Name', 'Pow!')
We'd need a solution to the "matching fields against fields" problem again - I've used $ marking in that example.
I don't like dollar field names in single quotes, it looks too much like native PHP. Do we use this anywhere else in sapphire?
Agreed that we need to mark fields somehow, but it seems far more common to compare to strings rather than field names,
so this should be the syntactically easier way. Django has F() for this:
http://docs.djangoproject.com/en/1.3/topics/db/queries/#filters-can-reference-fields-on-the-model
Yes, its a global function - but not semantically overloaded already,
and IMHO easier to understand than "$MyField".
DataModel represents the *collection* of different DataObject classes. Hamish F suggested 'DataSchema' might be a better name.
So:
- A DataSchema is a named collection of DataLists
- A DataList is an ordered collection of DataObjects
Or, to put it another way
- $schema is a DataSchema
- $schema->SiteTree is a DataList
- $schema->SiteTree->byID(5) is a DataOject
If you prefer it could be $schema->SiteTree() rather than $schema->SiteTree, if you don't like the use of magic properties here.
You *could* represent this information as static methods on DataObject (this is how it's done in 2.4) but that seems inferior to having an object representing the collection of DataObject types.
The intention wouldn't be to couple the ORM to SiteTree. Rather, it would be to allow *any* function that returns a DataList to be used as if it were a relation. Perhaps there would be some special settings that would need to be added to the DataList, but the general concept is the same.
I agree that the ORM shouldn't be coupled to SiteTree.
You end up with 2 sets of mashed together with no real coherence - the instance methods that control individual data objects and the static methods that control datatype-wide settings. Static methods / properties are best avoided where possible because the limit your codes modularity; the most likely place where you will run into practical problems would be with unit testing.
If you use static methods it's also impossible for your code to work with two separate configurations in parallel. It would make sense, for example, to keep track of whether you're accessing the publish or the draft site as a property of the DataSchema/DataModel. If this information is stored in a static (as it currently is) it's awkward to perform operations where you compare content from the stage & published site: you need to query one, alter the static setting, query the other, and then manually ensure that your static settings have been reset correctly. Plenty of bugs, over the years, have arisen out of problems with this approach.
Although we might not be able to do away with singletons entirely (because passing the instances down the object graph introduces too much boilerplate) I would still like to move toward singleton instances of objects, rather than having large blocks of implementation code locked within static methods.
--
1) The DataQuery is currently responsible for extracting records from the database. It decides which tables to extract data from, how to join tables together to make multi-table inheritance work. For this reason, the DataQuery could be renamed to DefaultDataMapper and be given the additional responsibility of *writing* data to the database.
2) DataMapper could be an interface that DefaultDataMapper implements. Other providers (for alternative database schemas, RESTful APIs, NoSQL databases) could also implement DataMapper.
3) In this new form, the DataMapper could be configured such that the "base responsibilities" for getting data from the database - the interpretation of Sapphire's multi-table inheritance, etc - could be kept distinct from the filter, sort, and join operations that are applied to the DataList. In this model, DataList::filter() would update a property on DataList, that is passed to a method of DataMapper when DataList::getIterator() is called. Right now, DataList::filter() passes the data straight through to DataQuery::filter(). (This will probably only make sense if you've read the code of DataList and DataQuery.)
4) To create new records, newObject() should be called on DataList or DataMapper, rather than directly creating a new record:
$page = DataList::create('SiteTree')->newObject();
Or, using the DataSchema:
$page = $this->schema->SiteTree->newObject();
5) For this reason, every DataObject can have a mapper associated with it. The code for DataObject::write() can then be simplified to this:
function write() {
$this->dataMapper->writeObject($this);
}
6) Under this scheme I would still expect DataObject to be responsible for keeping track of which fields have changed, and providing that to DataMapper::writeObject() through some interface. I'm not sure whether DataObject::write() or DataMapper::writeObject() should be responsible for triggering the DataObject validation.
7) The DataMapper can also be responsible for ensuring that all records live within an IdentityMap, although I would expect that the IdentityMap is implemented as a separate helper class that DataMapper calls upon.
> This means that we can make most of our data queries via $this->model. For example.
>
> $this->model->SiteTree->byID(5);
Have you considered how this would handle namespaced classes? I don't think PHP would like $this->model->rentbox\tenancy\Inspection (for example) all that much.
---
Simon Welsh
Admin of http://simon.geek.nz/
Who said Microsoft never created a bug-free program? The blue screen never, ever crashes!
- Don't support namespaces in the DataSchema. Yes, by default the "SiteTree" in the schema corresponds to the name of the class SiteTree, but this is could be treated as a convention. If you want to pull together models from different PHP namespaces then we could provide a clean way for a developer to define their schema.
- Because $this->model is an instance property, we could by default look for models within the same namespace, limiting the number of times that you need to explicitly specify the namespace.
- Have an optional method that accepts a string that you can use to pass namespaces: $this->model->get('rentbox\\tenancy\\Inspecition')->byID(5). It's more verbose but hopefully it would only need to be rarely used?
--
> Good point, we'd need to think about that. There are a few options, I guess:
>
> - Don't support namespaces in the DataSchema. Yes, by default the "SiteTree" in the schema corresponds to the name of the class SiteTree, but this is could be treated as a convention. If you want to pull together models from different PHP namespaces then we could provide a clean way for a developer to define their schema.
That kinda defeats the point of namespaces as you'll have to do conflict resolving when two classes have the same name (I have a lot of Controller classes for instance).
> - Because $this->model is an instance property, we could by default look for models within the same namespace, limiting the number of times that you need to explicitly specify the namespace.
The majority of my lookups are cross-namespace, so that'll only help in most cases, not to mention I don't think $this->model can get the namespace of $this.
> - Have an optional method that accepts a string that you can use to pass namespaces: $this->model->get('rentbox\\tenancy\\Inspecition')->byID(5). It's more verbose but hopefully it would only need to be rarely used?
Since most people wouldn't use namespaces, you could potentially get away with making the optional method __get. Providing a method fallback would be my preferred way of doing it.
On 6/05/2011, at 9:01 PM, Marcus Nyeholt wrote:
> How would namespaced model objects be stored? Would the table name take into
> account the namespace, going back to rentbox_tenancy_Inspection or similar?
MySQL has finally (in 5.1.4-ish, I think) managed to figure out how to encode filenames, so rentbox\tenancy\Inspection is a valid table name (I've got a decent amount of such tables), so long as it is escaped properly. When I have the time, I'll be redoing my namespace patch for the 3.0 parser, and will update all the places a class name is sent to the database to actually escape the class name, now that names aren't guaranteed to be simple strings. (I've spent too many hours tracking done where another Convert::raw2sql($class) was needed to not but the effort into making sure they show up in 3.0).
The main problem is actually namespaced template files on Windows, so it's a potential caveat for those developing or deploying to a Windows-based machine.
MySQL has finally (in 5.1.4-ish, I think) managed to figure out how to encode filenames, so rentbox\tenancy\Inspection is a valid table name (I've got a decent amount of such tables), so long as it is escaped properly.
The main problem is actually namespaced template files on Windows, so it's a potential caveat for those developing or deploying to a Windows-based machine.
>> MySQL has finally (in 5.1.4-ish, I think) managed to figure out how to
>> encode filenames, so rentbox\tenancy\Inspection is a valid table name (I've
>> got a decent amount of such tables), so long as it is escaped properly.
>
> What about pgsql, mssql, sqlite, oracle, keyval stores? While the issue of
> mapping same named classes is very rare in languages that have namespaces, a
> typical way of resolving it is to just specify a mapping from classname ->
> tablename. I'd rather introduce that than have tables\named\like\this. My
> view is that the DB layer does not need namespacing; if you think you do,
> you're probably doing something horrible :).
I'm of the opinion that if a DBMS doesn't allow something in its table names, then the DB wrapper should be doing some conversion rather than converting it in every case. We can't use underscores (imagine the class Page\Live), so it's not likely to be a pretty solution.
>> The main problem is actually namespaced template files on Windows, so it's
>> a potential caveat for those developing or deploying to a Windows-based
>> machine.
>
> One thing to keep in mind is that SS is MS certified now, so anything
> introduced needs to run in IIS to become core (and template namespacing
> would be something nice to have :)).
I don't have a Windows machine to test, but what happens if you either copy a file with \s in its name to a Windows file share or try saving such a file directly in Windows? If it borks, there'll need to be some workaround there as well, which I don't think replacing with underscores would work in due to potential conflicts with action-specific templates and we can't rely on case differences here either. A space or + or something else that can be used in the filesystem but not in a class name would work here.