Warning, this is _VERY_ LONG. I'm sorry, but I don't know how to write it all
differently. I have tried to summarise below enough to know whether you want to
read anything in more depth...
Executive Summary
-----------------
* I have created a Joomla! ODBC driver for OpenLink's Virtuoso Universal Server
* The main use cases for Virtuoso relate to publication for the Semantic Web
* Currently the implementation is "rudimentary" but essentially working.
* During this experiment I encountered some;
- ISO SQL standards compliance issues in core Joomla! queries
- problems and omissions with the JDatabase abstraction
- general issues with grouping long data types, auto-increment fields
- reserved words in the Virtuoso RDBMS that are part of the Joomla! schema
- other little bugs
* I'm seeking feedback about these issues to know how best to proceed because;
- I have made hundreds of changes to quote SQL reserved words in Joomla!
- it would help my planning to know if some issues can be addressed in core
Some background
---------------
OpenLink's Virtuoso Server has both Open Source and Commercial editions. At core
it is a high-performance RDBMS with close technological ties to ODBC and SQL
standardisation. Virtuoso is also a webserver with a PHP SAPI module, so it can
actually host Joomla! entirely within a single server process (as a point of
interest, not that it's necessarily a good idea or even useful in most cases).
In recent times Virtuoso has mostly differentiated itself from mainstream RDBMS
by way of it's built-in RDF store and SPARQL query processing abilities, making
it ideal for Semantic Web applications.
I don't work for or have any affiliation with OpenLink, I'm just a guy who loves
both Joomla! and Virtuoso, with an interest in Semantic Web technologies.
Basically, I want to play around with using Virtuoso to create RDF mappings of
Joomla! content, among other things. Traditionally, Drupal has always been the
CMS of choice for RDF and other Semantic Web stuff, but I wanted to use Joomla!
and I guess that is what Open Source is all about :P
I'm hoping that any work I do on Virtuoso's ODBC driver for Joomla! might help
someone else to at least get started with other ODBC drivers for Joomla! If
there is another ODBC implementation around already, I didn't find it. So maybe
parts of my class could be separated as a generic ODBC driver (from which the
Virtuoso one and other variants can inherit) once it's fully completed, tested
and matured.
The reason I'm writing to you is because I'm hoping that Joomla! wants to fix
some of the issues I found that could potentially help other RDBMS driver
efforts, especially while my changes are against recent source. I thought it
best to open discussion about these things before trying to file any bugs, to
find out which issues you might accept and how you might want to deal with those
(e.g. how many bugs to file, whether to provide patches, against which source
trees, at what level of granularity etc.).
And I thought you just might like to know anyway about Joomla! running on
Virtuoso as a curiosity if nothing else.
Issues related to ISO SQL Compliance
------------------------------------
Summary:
* Use <> instead of != for "not equal"
* Quote SQL reserved words LANGUAGE, MODULE, OPTION and TIME as identifiers [1]
* DATABASE operations are not abstracted by JDatabase
* Can't assume " denotes a string in prefix replacement
The ISO standard operator for "not equal to" is the more traditional <> symbol
instead of the PHP-like != symbol. Not all RDBMS will work with != but all that
I know of work with <> operator. As of 2.5.2 there were 27 occurrences of !=
used in SQL across 20 files [2].
LANGUAGE is a reserved word in ISO SQL-92, SQL-99 and also SQL-2003. I'm sure I
don't need to tell you that this is one is a biggie for Joomla! As of 2.5.2 I
found 77 occurrences across 40 files [3].
MODULE is also a reserved word in SQL-92, SQL-99 & SQL-2003. Not such a big one,
but still, as of 2.5.2 I found 13 occurrences across 6 files [4].
TIME is another [5] and also OPTION, but I only found one occurrence each of
these. The commit for OPTION accidentally ended up in my big commit for LANGUAGE
[3], basically because git hates me and I was too tired to bother fixing that. I
can't really imagine anyone would want one without the other anyway.
There is no abstraction within the driver class for database creation, and
JInstallationModelDatabase::createDatabase makes assumptions:
$query = 'CREATE DATABASE '.$db->quoteName($name).' CHARACTER SET `utf8`';
Similarly in setDatabaseCharset with ALTER DATABASE. Neither of these statements
are applicable to (or supported by) Virtuoso and `utf8` is using MySQL
identifier quotes, so I've assumed this part of the installation is already
slated for re-implementation and have not tried to patch the code here. Instead
I'm just using regex in setQuery to fix this and a couple of other issues. But
if you would welcome patches from outside the team to implement such things, I
would of course be happy to write them.
Finally, there is an issue in the base JDatabase class replacePrefix method,
where it tries to be too clever about strings. In Virtuoso (and as per SQL
standard) strings must be single-quoted and only identifiers are double-quoted.
So in the case of ISO standard SQL JDatabase will deliberately _not_ replace #__
in the double-quoted identifer names, wrongly assuming that they are string
literals.
But this is essentially a non-issue because it's easy to just overload that
method in the driver class. I note it here more for completeness and because it
might be "nice" to have JDatabase base class be fully RDBMS-agnostic about such
things, particularly where this runs contrary to standards.
Issues related to ODBC
----------------------
Summary:
* DATETIME literal formats should not assume to be quoted as string literals
* Connection parameters could be improved for DSNs instead of host names
* Ideally the database driver could pre-populate some connection parameters
The JDatabase base class defines getDateFormat to allow for differences between
RDBMS implementations for date and time literals. However, since the base
implementation does not automatically quote the usual ISO format date/time
strings, consumers of this function all do that themselves. However, this is an
unsafe assumption, for example in ODBC the following "brace escape" notations
are used:
{ts 'YYYY-MM-DD HH:MM:SS'}
{d 'YYYY-MM-DD'}
{t 'HH:MM:SS'}
Currently, if I were to override getDateTime to:
return '{ \\t\\s \'d.m.Y H:i:s\' }';
...the correct syntax is produced by PHP in DateTime::format, but this all gets
escaped and quoted as a string literal, and so it ends up in queries as
something like:
AND (foo OR a.publish_up <= '{ts \'24-03-2012 05:43:47\'}')
...instead of the required...
AND (foo OR a.publish_up <= {ts '24-03-2012 05:43:47'})
The problem would be the same trying to use ISO standard syntax here (similar to
ODBC with a type prefix to the string literal, only without the curly brackets),
or even a CAST. I might be wrong, but I think all RDBMS that will happily accept
solely a string literal for DATETIME values (i.e. all cases where the current
method is suitable), would accept the typical ISO-style format anyway.
So really, I think quoting as a string literal (where needed) should be included
as part of the format returned from getDateFormat, which would be more
convenient for the client code anyway. Without such a change, all I can do is
regex rewrite anything that looks like an unescaped string literal
date/time/timestamp in setQuery. While the date/time pattern can be identified
quite reliably, it will take some effort to properly harden that expression to
exclude strings, functions, DDL queries and probably other exceptions. It seems
to me that getDateFormat was intended to avoid all that.
On the user interface side, with ODBC the typical host/user/password/database
credentials are not entirely appropriate because of the Data Source Name (DSN)
method of setting up connection details in ODBC INI files. These named data
sources can be enumerated using PHP's odbc_data_source function [6]. For now I
am just taking the hostname option to be the DSN. It would obviously be nicer to
present the user with a list of DSNs to select from, or at least allow the
driver class to name it's own parameters.
Furthermore, Virtuoso has a nice little trick when it is also acting as the
webserver, whereby an ODBC DSN including username and password of the user
owning the hosting virtual directory can be retrieved by a function exported by
the PHP SAPI module: __virt_internal_dsn. Where this function exists it would be
nice to at least be able to pre-populate the connection fields during installation.
Issues that may also affect other RDBMS
---------------------------------------
Summary:
* PASSWORD is a reserved word in Virtuoso (+ ?)
* Long data types not supported in GROUP BY or ORDER BY clauses
* Auto-increment fields should not be set to zero
* Limits and offsets may need SQL rewrites
PASSWORD is a reserved word in Virtuoso. I'm not aware that this is a reserved
word in ISO SQL but it does seem like one that might be reserved in other RDBMS
also. I only found two occurrences of password used unquoted [7]. More
generally, I wonder if it might not be too difficult to parse identifiers and
automatically quote them.
In converting the schema for Virtuoso (still very much a work in progress, the
driver class itself doesn't need much more) it seemed unavoidable to make a
number of fields, particularly in #__content long data types (e.g. intro text).
Unless things have changed in recent years, it is not uncommon to find that
RDBMS will not group, sort, join or aggregate such columns. However, Joomla! has
a questionable requirement to do just that, which happens at the bottom of
getListQuery methods in many of the model classes. Example:
[from components/com_content/models/articles.php]
// Add the list ordering clause.
$query->order($this->getState('list.ordering', 'a.ordering').' '
.$this->getState('list.direction', 'ASC'));
$query->group('a.id, a.title, a.alias, a.title_alias, a.introtext,
a.checked_out, a.checked_out_time, a.catid, a.created, a.created_by,
a.created_by_alias, a.created, a.modified, a.modified_by, uam.name,
a.publish_up, a.attribs, a.metadata, a.metakey, a.metadesc, a.access,
a.hits, a.xreference, a.featured, a.fulltext, a.state, a.publish_down,
badcats.id, c.title, c.path, c.access, c.alias, uam.id, ua.name, ua.email,
contact.id, parent.title, parent.id, parent.path, parent.alias,
v.rating_sum, v.rating_count, c.published, c.lft, a.ordering, parent.lft,
fp.ordering, c.id, a.images, a.urls');
return $query;
It seems excessive, unless there is sometimes an aggregate in the select list
that I didn't spot with a cursory glance and has not yet shown up in my limited
testing.
I could not understand this behaviour and was surprised that MySQL handles it
with long data types. I know for sure than many RDBMS won't. For some it might
be avoidable if they have more generous limits on maximum row size than
Virtuoso, but in my case I would need to seriously limit the amount of text
across fields like introtext, params, all the metadata fields etc. to no more
than ~5000 characters in total to fit all of it within the row.
So in the end I just added some regex to strip out any GROUP BY clause that
contained some known long field names, which appears to have dealt with the
offending queries without any obvious ill-effects. In any case, I can see this
being an issue not only for Virtuoso.
MySQL interprets setting zero to an auto-increment field to mean that you want
to take the next serial in the sequence. Virtuoso interprets zero literally, as
do some other RDBMS which expect NULL if you want an automatic default from the
sequence. Since NULL works just fine for MySQL anyway, perhaps this could be
changed for JDatabase::insertObject in the base class to exclude the primary key
from the values set for INSERTs. Fortunately this method can just be overridden,
but it might make other driver implementations less painful. I'm afraid I don't
know whether the SQL standards have any position on that matter, I suspect not.
Similar to MS SQL Server, Virtuoso uses the TOP keyword before the SELECT list
for limits and offsets (although last time I used SQL Server, either v7 or 2000
I think, it did not support offsets in TOP) which means "rewriting" within the
query rather than just appending to the end as you would with the MySQL syntax.
So to do that reliably requires regex replacement, which is not ideal and could
be avoided if passing only the column list to the "select" method.
Issues specific to Virtuoso
---------------------------
Summary:
* Implicit casting string literals to timestamp values causes logic errors [8]
* Implicit casting to non-string types (even numeric) is noisy (QW004 errors)
* Null characters in NVARCHAR data types not handled properly
(primarily affects session data field)
* No unicode collations provided by OpenLink, must be user defined
* UTF-8 collations are not currently supported
[I will document this stuff in more detail elsewhere, I doubt it is of interest
to Joomla! except to note that Internationalisation is problematic for this
driver right now, although UTF-8 works fine besides sort order]
Simple bugs in Joomla!
----------------------
There were originally three of these on my list, but I just noticed that two
have since been fixed in 2.5.3 which leaves me with just:
* Default null date hard-coded as 000-00-00 in user notes form [9]
In conclusion
-------------
I was very pleased to see the progress in making Joomla! truly RDBMS-agnostic. I
once started to implement a PostgreSQL driver some years back but there just
wasn't nearly enough abstraction then, nor did there seem to be enough
motivation within the core team to really push the amount of rewriting and
refactoring needed to make it happen.
No doubt I have missed some instances of some of the issues identified above,
have many issues yet to discover, and much work to do yet on the Virtuoso
schema, but overall it was easier than I expected to get Joomla! up and running
with Virtuoso thanks to the work done in this area. I feel much more confident
now about investing my time into such projects!
I hope you regard the information I have collected here useful for improving
Joomla! further. For my part I welcome all feedback about whether I have gone
about this the right way, if there are some issues I have misunderstood, ways I
could present information or code changes that would fit better with your
development processes and so on...
Cheers!
-Will
[0] Virtuoso Open Source Edition <http://virtuoso.openlinksw.com/>
[1] SQL Reserved Words Reference
<http://developer.mimer.se/validator/sql-reserved-words.tml>
[2] ISO Not Equal Symbol
<https://github.com/wdaniels/joomla-cms/commit/4bc6a12a091d7781df7ff4f2735eeabc51f623c4>
[3] ISO Reserved Words LANGUAGE and OPTION
<https://github.com/wdaniels/joomla-cms/commit/155781a77b9ce8aa5f597f94db714c4279ff6346>
[4] ISO Reserved Word MODULE
<https://github.com/wdaniels/joomla-cms/commit/48715e01ba614cce8aaa64086bdc63569779f35e>
[5] ISO Reserved Word TIME
<https://github.com/wdaniels/joomla-cms/commit/d1268815033b2d2a3f3f7d39a99bc64f5f7705e6>
[6] PHP Function Reference [odbc_data_source]
<http://www.php.net/manual/en/function.odbc-data-source.php>
[7] Virtuoso Reserved Word PASSWORD
<https://github.com/wdaniels/joomla-cms/commit/72b3ea88c2c7ba852c92b02229edd392fbadeccd>
[8] Reported SQL Logic Error [virtuoso-devel]
<https://sourceforge.net/mailarchive/forum.php?thread_name=4F695C0A.4030007%40willdaniels.co.uk&forum_name=virtuoso-devel>
[9] NULL Date Issue In User Notes Form
<https://github.com/wdaniels/joomla-cms/commit/fb62cca49f5cf268d1898caaee55f56c3a49fbc5>
Best regards
Rouven
However, Joomla! has a questionable requirement to do just that, which happens at the bottom of getListQuery methods in many of the model classes. Example:
[from components/com_content/models/articles.php]
// Add the list ordering clause.
$query->order($this->getState('list.ordering', 'a.ordering').' '
.$this->getState('list.direction', 'ASC'));
$query->group('a.id, a.title, a.alias, a.title_alias, a.introtext,
a.checked_out, a.checked_out_time, a.catid, a.created, a.created_by,
a.created_by_alias, a.created, a.modified, a.modified_by, uam.name,
a.publish_up, a.attribs, a.metadata, a.metakey, a.metadesc, a.access,
a.hits, a.xreference, a.featured, a.fulltext, a.state, a.publish_down,
badcats.id, c.title, c.path, c.access, c.alias, uam.id, ua.name, ua.email,
contact.id, parent.title, parent.id, parent.path, parent.alias,
v.rating_sum, v.rating_count, c.published, c.lft, a.ordering, parent.lft,
fp.ordering, c.id, a.images, a.urls');
return $query;
It seems excessive, unless there is sometimes an aggregate in the select list that I didn't spot with a cursory glance and has not yet shown up in my limited testing.
I could not understand this behaviour and was surprised that MySQL handles it with long data types. I know for sure than many RDBMS won't. For some it might be avoidable if they have more generous limits on maximum row size than Virtuoso, but in my case I would need to seriously limit the amount of text across fields like introtext, params, all the metadata fields etc. to no more than ~5000 characters in total to fit all of it within the row.
So in the end I just added some regex to strip out any GROUP BY clause that contained some known long field names, which appears to have dealt with the offending queries without any obvious ill-effects. In any case, I can see this being an issue not only for Virtuoso.
You are correct Amy, Mark has misunderstood the SQL spec. This is ringing some
bells, I came across the same misunderstanding once before, though I'm
struggling to remember how it came about now.
Anyway, I'm sure we can get that all clarified. I have a copy of the standard
(somewhere) and I'm not afraid to use it :D
I will post in that thread when I get approved for the group.
> Also, might I recommend that you post this on the Platform list? That's a
> better fit for this type of interaction.
>
> https://groups.google.com/forum/#!forum/joomla-dev-platform
Will do.
> Appreciate your work.
Thanks :)
OK, so I will divide my changes by Platform and CMS. Then in the CMS by
Front/Admin, then by component also? I guess this should be OK for unit testing.
I have a lot of other work the next few days but should be done some time next
week and I'll register a pull request then.
So if I make patches also to solve the date literal quoting problem, these
should make it into core eventually? Or does that need further discussion on the
platform list?
Thanks for the encouragement :)
-Will
On 24/03/12 19:41, elin wrote:
Whoa, that's great. We are also looking at some query changes in the CMS
that are needed for Postgres, and if you wanted to propose what you have as
patches that would be great. What is good is to keep the patches to a size
that is manageable for people doing manual tests. If you could make a pull
request and then a linked tracker item for each of the commits that would
be great.
I think you might want to repost some this on the platform list
(specifically about standards compliance and I think most importantly the
date issue) because that is where the people working on database issues
tend to be most active. (This mailing list is more about extension, while
the CMS and Platform lists are about the core code).
Elin
On Saturday, March 24, 2012 2:32:39 PM UTC-4, Rouven Weßling wrote:
Wow, impressive list of things. I noticed that some of the files affected
are platform files (basically everything in libraries/joomla). Could you
make a pull request for those changes to the platform repro? Seems like it
aren't a lot of changes so we should be able to get that sorted out quickly.
Best regards
Rouven
OK, so I will divide my changes by Platform and CMS. Then in the CMS by Front/Admin, then by component also? I guess this should be OK for unit testing.
I have a lot of other work the next few days but should be done some time next week and I'll register a pull request then.
So if I make patches also to solve the date literal quoting problem, these should make it into core eventually? Or does that need further discussion on the platform list?
Thanks for the encouragement :)
-Will
--
You received this message because you are subscribed to the Google Groups "Joomla! General Development" group.
To post to this group, send an email to joomla-dev-general@googlegroups.com.
To unsubscribe from this group, send email to joomla-dev-general+unsub...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/joomla-dev-general?hl=en-GB.
The operative word here is _when_. The rule does not apply if you are not
grouping _anything_. I didn't notice any aggregates used in the places where I
have seen this. And without any aggregates, ORDER BY should be used to "group"
items in a list.
This reminds me now of where I have encountered the same misunderstanding
before. Some RDBMS automatically add any missing columns to GROUP BY internally,
in selection order, at the end of the list. It is not a good idea to rely on
that behaviour for portability of course, because other RDBMS are more strict
and I think some subtlety of that behaviour (perhaps related to the SELECT *
case) changed once with MS SQL Server between versions. I'm pretty sure that is
where I've seen this come up before (but I might be wrong, it would have been
years ago).
Cheers,
Sam Moffatt
http://pasamio.id.au
>> joomla-de...@googlegroups.com.
>> To unsubscribe from this group, send email to
>> joomla-dev-gene...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/joomla-dev-general?hl=en-GB.
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Joomla! General Development" group.
> To post to this group, send an email to joomla-de...@googlegroups.com.
> To unsubscribe from this group, send email to
> joomla-dev-gene...@googlegroups.com.
Hi Sam,
Yes, Oracle I remember is one of the picky ones.
I'm 99% sure it is also one of the ones that will not GROUP BY long data types.
Quite honestly, I thought all RDBMS refused to do that until now. Normally if
you want to select a long column in conjunction with aggregates it is necessary
to either do something like CONVERT and truncate to a shorter VARCHAR, or fetch
the long column data separately afterwards.
But the point is there are no aggregates in those queries. That I noticed. I
wish I'd looked more thoroughly now, perhaps there is :@
-Will
But the point is there are no aggregates in those queries. That I noticed. I
wish I'd looked more thoroughly now, perhaps there is :@
On 25/03/12 06:11, Mark Dexter wrote:
"When grouping,
The operative word here is _when_.
Il giorno 25/mar/2012, alle ore 07:11, Mark Dexter <dexter...@gmail.com> ha scritto:
> This is a quote from one of many SQL online tutorials:
>
> "When grouping, keep in mind that all columns that appear in your SELECT column list, that are not aggregated (used along with one of the SQL aggregate functions), have to appear in the GROUP BY clause too."
>
True for Postgresql too.
Eng. Gabriele Pongelli
Not certain which query you are looking at since I don't see that in the content query. But, there are subqueries that have been present since 1.6. Those are not SELECT level inclusions.
It's a pretty intense query - and adding aggregation isn't going to help performance any.
I still can't see why it's needed.
On Sun, Mar 25, 2012 at 8:45 PM, ma...@willdaniels.co.uk
<ma...@willdaniels.co.uk> wrote:
> On Sunday, 25 March 2012 09:24:52 UTC+1, Amy Stephen wrote:
>>
>> Not certain which query you are looking at since I don't see that in the
>> content query. But, there are subqueries that have been present since 1.6.
>> Those are not SELECT level inclusions.
>
>
> That one I found in JCategories::_load
> (libraries/joomla/application/categories.php). Logically there shouldn't be
> any need to do anything similar with content items (counting what?) I'm just
> trying to give everyone the benefit of the doubt and assuming there must be
> aggregates sometimes, somewhere in those queries using GROUP BY.
Do you mean this excerpt from "libraries/joomla/application/categories.php"?
<quote>
$query->select('COUNT(i.' .
$db->quoteName($this->_key) . ') AS numitems');
}
// Group by
$query->group('c.id, c.asset_id, c.access, c.alias,
c.checked_out, c.checked_out_time,
c.created_time, c.created_user_id,
c.description, c.extension, c.hits, c.language, c.level,
c.lft, c.metadata, c.metadesc, c.metakey,
c.modified_time, c.note, c.params, c.parent_id,
c.path, c.published, c.rgt, c.title,
c.modified_user_id');
</quote>
Ref:
https://github.com/joomla/joomla-cms/blob/master/libraries/joomla/application/categories.php#L344
Looks pretty clear to me there is a COUNT there. It is indeed
conditionally included a mere three or four lines above the group by
statement. It could potentially be rolled in to the if statement so
it's only included when the COUNT is also added.
Cheers,
Sam
Looks pretty clear to me there is a COUNT there. It is indeed
conditionally included a mere three or four lines above the group by
statement.
--
You received this message because you are subscribed to the Google Groups "Joomla! General Development" group.
To view this discussion on the web, visit https://groups.google.com/d/msg/joomla-dev-general/-/Vt8wH-tdRrkJ.
Sam Moffatt
http://pasamio.id.au
Do you mean this excerpt from "libraries/joomla/application/categories.php"?
<quote>
$query->select('COUNT(i.' .
$db->quoteName($this->_key) . ') AS numitems');
}
// Group by
$query->group('c.id, c.asset_id, c.access, c.alias,
c.checked_out, c.checked_out_time,
c.created_time, c.created_user_id,
c.description, c.extension, c.hits, c.language, c.level,
c.lft, c.metadata, c.metadesc, c.metakey,
c.modified_time, c.note, c.params, c.parent_id,
c.path, c.published, c.rgt, c.title,
c.modified_user_id');
</quote>
Ref:
https://github.com/joomla/joomla-cms/blob/master/libraries/joomla/application/categories.php#L344
1) GROUP BY is used in many places where it shouldn't be there
2) Where GROUP BY _is_ needed for an aggregate, the query form must be altered
to avoid aggregating on all columns for portability.
There has been much confusion in the thread about (2) because everybody has
assumed we are talking about not explicitly listing all the columns, which was
_correctly_ added for portability. The problem is that this is still not a
viable query pattern for portable SQL.
On 25/03/12 16:02, Mark Dexter wrote:
> Another way to work around this issue might be to put the mediumtext column
> in an aggregate expression in the SELECT clause (for example, MAX(blah))
> and then remove it from the GROUP BY. Don't know if that would have the
> same issues or if you would want to truncate it in the aggregate expression.
Sure, but it's an awful lot of work (for code maintenance as much as the RDBMS)
to do things like that in places where you don't need to. In most of the places
where we have these GROUP BY clauses there are no aggregates at all, the GROUP
BY can just be removed entirely.
> Also, I agree that any query with a GROUP BY should probably not use a.* in
> the select clause, since this would require manually editing the GROUP BY
> any time a column is added to the table.
Yes, this is the only reason a problem genuinely still exists in a few places
(like the categories.php I cited before). In those cases the best solution is to
not try to do the whole query in one. So instead of the form:
SELECT c.*, COUNT(foo)
FROM categories c <join content>
WHERE <access controls>
GROUP BY <everything>
...you would do something more like...
SELECT c.*, sub.* FROM categories
INNER JOIN
( SELECT c.id _id, COUNT(foo) numitems
FROM categories c <join content>
WHERE <access controls>
GROUP BY c.id
) sub ON (c.id = sub._id)
ORDER BY <sort orders>
Thus avoiding expensive aggregation on non-indexed or large columns, and also
avoiding the problem of RDBMS that won't automagically truncate your long
columns in the background (Oracle, Virtuoso, Access, no doubt many others).
> Mark
>
> On Sun, Mar 25, 2012 at 7:54 AM, elin<elin....@gmail.com> wrote:
>
>> the CMS queries on Access ( though it sure would be fascinating to see ) we
>> should strive to be as agnostic as possible if only because of the
>> possibility that different systems are default truncating to different
>> lengths (of course this could already be happening on different servers
I don't think the issue is that some RDBMS will truncate at different lengths,
because we're not genuinely interested in accurately grouping on those columns.
RDBMS that have a problem here simply won't truncate _at all_ and will throw and
exception when compiling the SQL.
It is not so much a feature deficiency in those RDBMS (well, it might be for
Access), it is more a case of not wanting to make it easy to do things that you
probably didn't want to be doing in the first place. If the query author
actually _does_ want to group rows in the result by a long column, he/she is
expected to write a suitable expression to reduce the problem. Grouping by
almost every column just to make the query work as one huge statement muddies
the logic, makes the code harder to maintain and runs the risk of performing
very badly.
Cheers!
-Will