Guus, I suggest you spend a bit more time reading up on SS_List, DataList, and DataQuery. Much of the abstraction you speak of already exists. DataList is intended to be specific to relational databases; SS_List is more broadly applicable to datastores. Facilities such as GridField are built to rely on the SS_List interface.
To clarify one thing right up front: I speak of data store as opposed to data base: the data store can be anything, whereas a data base is generally some form of SQL. The distinction is in the fact that I see my idea as a general store-agnostic way of accessing data in SilverStripe… That is to say, one could come up with: SalesforceDatabase, LdapDatabase, MySQLDatabase, PostgressDatabase, RESTDatabase, SOAPDatabase, SharePointDatabase but also (see later) MySQL_LdapDatabase, or MySQL_RESTDatabase. (We might need to abstract SS_Database into SS_Datastore, but we’ll cross that bridge when we get there.)
Looking at DataList, the dbConn that Sam proposed should be more of a function, so that it can pull $this->dataClass->$db_connection(). DataObject::db_connection() then provides a two element array: Type and Config, which by default is array(‘Type’ => ‘MySQL’, ‘Config’ => $databaseConfig). DataList then understands that ‘Type’Database is the class / object to do the querying and *it* in turn understands that ‘Config’ tells it how to connect to the data store.
The only area that is indeed untouched by this scheme is direct querying through SQLQuery. Since SQLQuery is such a specialised class and case, I would not worry so much about its mechanism to connect to the right data store. However, the core needs to move away from direct SQLQuery usage *or* we need a factory design pattern which one can pool connections to data stores and make them accessible much like DataModel makes a collection of DataLists accessible. (Connecting to a data store is always probably the most expensive (in terms of time) operation, so using the same data store multiple times must happen without incurring that connection cost more than once.) If we pool connections like that, the two-element array in DataObject::db_connection() becomes a three-element array: array(‘Name’ => ‘ProjectDB’, ‘Type’ => ‘MySQL’, ‘Config’ => array(‘Host’ => ‘mysql.example.com:3307’, ‘User’ => ‘projects’, ‘Pass’ => ‘projects’, ‘Schema’ => ‘project’)), so that the DataConnections can __get and __set this connection information, create the appropriate class on first __set, make sure it is connected to the data store, and set that object within the connection array passed through DataList.
So we now solved the part one problems of Marcus’ initial reply. (other than it still is a big chunk of work J)
On to the second problem: Intra-data-store joins. Agree with Sam that we can probably throw an Exception saying: Intra-Data-Store Join is not possible. However, it would be much more elegant if we could somehow entice the driver of the join (i.e. the outer most class) to join the inner-next class’s data with itself / or the inner most class to join the outer-next class’s data with itself. Need to think about this though, as it will incur some sort of big-time performance penalty (think a huge number of rows)… Databases themselves are obviously better at joining than PHP. But, aha, the SS_Database abstraction we just clarified above, could construe a MySQL_LdapDatabase class that takes care of joins between the two (and is a specialised class handling specialised data store requirements based on whatever the users’ needs are, so SS does not have to care about it at all, since all SS data always comes from one data store). The way to go about that would be to have a DataObject with a DualName, i.e. StaffCommission where Staff comes from LDAP, and Commission is a derivative of Salesforce.com and its db_connection is the LDAP_SalesforceDatabase that knows how to calculate Commission (sales volume) for each Sales person (staff) on a monthly basis by meshing the two distinct data stores… Solutions are endless with this approach… one can then even dream of specialised external data modules that provide this sort of glue…
Solved that too, then by leaving special cases to people who need them, but provide the glue in SS to enable people to create these special cases.
Last item that would be of interest, again, is /dev/build, whereby that process only takes care of DataObject descendants that actually use the ‘Default’ data store. (‘Default’ is now the name of the three-way array used in DataConnections from the first part of the solution), and created (in mysite/_config.php) by the installer, much like it is now.
At the moment most of the core of SS is very much single DB connection minded; most core classes uses the DB:: static method calls for database interaction, which all use a 'default' database connection. While it might be possible to manage the swapping around of what the 'defaut' connection is externally from the core, I'd imagine that it would only be successful for a limited number of scenarios. So the first hurdle to overcome would be to change core classes to use either instance methods, or make the DB:: static methods accept a different db connection identifier, either of which is non-trivial.
DataList could potentially have a dbConn property; this could be set to DB::getConn(), or to something else, with DI.
DataList::toArray() would need to $this->dbConn->query($this->dataQuery->sql().
The more significant change is to ensure that direct SQLQuery construction isn't used in the rest of core, but that should be achievable and is a worthwhile bit of housekeeping apart from this change.
A second big problem with using different DB connections per object type is that the ORM can no longer manage related data properly (through $has_one, $many_many etc) as it'd be trying to join across databases, and if the type was inherited, all types in the hierarchy would need to be stored in the same DB as joins occur when loading inherited types. While there are ways around all of this (doing joins in code, etc), it's not ideal.
The most straightforward solution would be to throw exception when you joined across different databases. This would let you make use of multiple databases as long as you managed the partitioning yourself. There are many situations where you might have a coherent but loosely coupled set of models running on another database, and being forced not to run joins across multiple databases would be an acceptable limitation.
Joining in PHP would be helpful in other contexts (e.g. linking a relational database to some kind of web API), but definitely a lot more work. I would be inclined to approach its implementation from the perspective of "how can we execute these joins using only the SS_List API?"
The idea is that you write an adaptor for the external content source, and a wrapper (or wrappers if needbe) for the external content items. The ExternalContentItem is a DataObject subclass, so can be used within SS much as you would use a data object. There's also mechanisms for importing the objects into a native SS data object representation - for example bringing through Wordpress content into SilverStripe. You could do something similar to wrap your external data sources
Some example connectors to get an idea of what's involved