left outer joins in datamapper?

534 views
Skip to first unread message

terminal_breaker

unread,
Dec 7, 2011, 4:07:48 PM12/7/11
to DataMapper
Is there a way to do left outer joins in Datamapper without having to
resort to talking to the data-store directly. I've been searching the
net and the Datamapper docu with not success on details how to do so.

For example:

SELECT p.user_id, p.description, u.user_name
FROM privileges p
LEFT OUTER JOIN users u ON (p.user_id = u.id)
WHERE p.description = "delete rights"

And let's say that I have a User class and a Privilege class too.


Chris Corbyn

unread,
Dec 7, 2011, 4:24:07 PM12/7/11
to datam...@googlegroups.com
The words 'left' and 'right' appear nowhere in the source code for the dataobjects adapter.


I've always seen this as a grave omission where the abstraction makes too many assumptions.  We really need to ability to do such things too.

You can do it, but you'll have to drop to using SQL, either directly on the adapter to get structs back, or you can try my dm-sql-finders https://github.com/d11wtq/dm-sql-finders which will give you a Collection as you'd expect.


--
You received this message because you are subscribed to the Google Groups "DataMapper" group.
To post to this group, send email to datam...@googlegroups.com.
To unsubscribe from this group, send email to datamapper+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/datamapper?hl=en.


Jordan Ritter

unread,
Dec 7, 2011, 4:40:06 PM12/7/11
to datam...@googlegroups.com

What type of concrete objects would you expect the ORM layer to return, if such a method call existed?  Outer joins tend to be used for getting mixed tuples of data that don't correspond to concrete ORM types.  I can't immediately fathom what any ORM's behaviour should be..

What do you mean by "without talking to the data-store directly"?  The only way I see to do this is to pass the DM adapter the SQL directly, and process the responses as abstract tuples of data (you'll get back an array).  You could hydrate objects yourself manually from there, but YMMV: with any kind of outer join you can end up with nulls, and extra check-fields-before-hydration logic would be crappy to deal with (and why bother - this is DM's job to begin with).

FWIW, you might consider using direct SQL to conduct your outer join(s) and only retrieve primary keys, then use the ORM to hydrate the objects akin to User.all(:id => [..]).

cheers,
--jordan


Chris Corbyn

unread,
Dec 7, 2011, 4:49:54 PM12/7/11
to datam...@googlegroups.com
I would expect a Collection of the Model from which the query was executed, with any additional fields discarded.

What about all users who have never posted?  I know this works right on the surface now, but it's hugely inefficient to the point it could never be used in production.

SQL:

SELECT users.* FROM users LEFT JOIN posts ON posts.user_id = users.id WHERE posts.id IS NULL

(DataMapper builds a gigantic IN clause, which obviously you'd have to optimise out in production where you're looking at millions of rows)

There are certainly other valid cases for left join being used as part of a more complex constraint.  My issue is that the ORM wants to believe it should know better than the person trying to extract the data, which is basically a broken abstraction.  I love DM, but it does run us into a corner quite often so we're holding out for v2 and hopefully we'll have a clearer idea of what the future is looking like for queries where you actually need to manipulate the join behaviour yourself.

terminal_breaker

unread,
Dec 7, 2011, 5:13:57 PM12/7/11
to DataMapper
> What do you mean by "without talking to the data-store directly"?

The phrase 'talking to the data-store directly' is from the online
Datamapper documentation under the section titled 'Talking directly to
your data-store': http://datamapper.org/docs/find.html

>  I can't immediately fathom what any ORM's behaviour should be

In ActiveRecord I can create outer joins using sql fragments:

Privilege.select("privileges.user_id, privileges.description,
users.user_name ").joins('LEFT OUTER JOIN users ON privileges.user_id
= users.id').where("privileges.description = 'delete rights' ")

This would return to me an ActiveRecord object.

Jordan Ritter

unread,
Dec 7, 2011, 5:22:08 PM12/7/11
to datam...@googlegroups.com
Agreed, there's several bad problems with JOINs in DM1, which are all Very Hard to solve due to DM's legacy complexity.  

FWIW, fail cases + a wrie-up of one solution: https://gist.github.com/277406, and an (incomplete) attempt to implement it (in DM1): https://github.com/jpr5/dm-core/tree/fix_joins . We really should put up a doc about this somewhere on datamapper.org, so it's not a surprise to users..

Best way to solve these issues is using a relational algebra + AST, so yeah the time+place to do this is DM2.

cheers,
--jordan

Jordan Ritter

unread,
Dec 7, 2011, 5:24:46 PM12/7/11
to datam...@googlegroups.com
Well then I think Chris' adapter might be the most concise approach to try, assuming you match the side/type of outer join with the model you're hydrating.

cheers,
--jordan

Ted Han

unread,
Dec 7, 2011, 5:26:42 PM12/7/11
to datam...@googlegroups.com
I've got a patch somewhere that allows one to patch in different types of joins, but i never merged it in because it breaks the abstraction :\

i'll try to dig it up when i get home (which won't be for a while yet).

-- 
Ted Han
DocumentCloud
Investigative Reporters & Editors

terminal_breaker

unread,
Dec 7, 2011, 6:07:16 PM12/7/11
to DataMapper
Thanks all for the responses. It's been very enlightening.


On Dec 7, 4:26 pm, Ted Han <t...@knowtheory.net> wrote:
> I've got a patch somewhere that allows one to patch in different types of joins, but i never merged it in because it breaks the abstraction :\
>
> i'll try to dig it up when i get home (which won't be for a while yet).
>
> --
> Ted Han

> @knowtheory (http://twitter.com/knowtheory)


> DocumentCloud
> Investigative Reporters & Editors
>
>
>
>
>
>
>
> On Wednesday, December 7, 2011 at 4:24 PM, Jordan Ritter wrote:
> > Well then I think Chris' adapter might be the most concise approach to try, assuming you match the side/type of outer join with the model you're hydrating.
>
> > cheers,
> > --jordan
>
> > On Dec 7, 2011, at 2:13 PM, terminal_breaker wrote:
>
> > > > What do you mean by "without talking to the data-store directly"?
>
> > > The phrase 'talking to the data-store directly' is from the online
> > > Datamapper documentation under the section titled 'Talking directly to
> > > your data-store':http://datamapper.org/docs/find.html
>
> > > > I can't immediately fathom what any ORM's behaviour should be
>
> > > In ActiveRecord I can create outer joins using sql fragments:
>
> > > Privilege.select("privileges.user_id, privileges.description,
> > > users.user_name ").joins('LEFT OUTER JOIN users ON privileges.user_id
> > > = users.id').where("privileges.description = 'delete rights' ")
>
> > > This would return to me an ActiveRecord object.
>

> > > On Dec 7, 3:40 pm, Jordan Ritter <j...@darkridge.com (http://darkridge.com)> wrote:
> > > > What type of concrete objects would you expect the ORM layer to return, if such a method call existed? Outer joins tend to be used for getting mixed tuples of data that don't correspond to concrete ORM types. I can't immediately fathom what any ORM's behaviour should be..
>
> > > > What do you mean by "without talking to the data-store directly"? The only way I see to do this is to pass the DM adapter the SQL directly, and process the responses as abstract tuples of data (you'll get back an array). You could hydrate objects yourself manually from there, but YMMV: with any kind of outer join you can end up with nulls, and extra check-fields-before-hydration logic would be crappy to deal with (and why bother - this is DM's job to begin with).
>
> > > > FWIW, you might consider using direct SQL to conduct your outer join(s) and only retrieve primary keys, then use the ORM to hydrate the objects akin to User.all(:id => [..]).
>
> > > > cheers,
> > > > --jordan
>
> > > > On Dec 7, 2011, at 1:07 PM, terminal_breaker wrote:
>
> > > > > Is there a way to do left outer joins in Datamapper without having to
> > > > > resort to talking to the data-store directly. I've been searching the
> > > > > net and the Datamapper docu with not success on details how to do so.
>
> > > > > For example:
>
> > > > > SELECT p.user_id, p.description, u.user_name
> > > > > FROM privileges p
> > > > > LEFT OUTER JOIN users u ON (p.user_id = u.id)
> > > > > WHERE p.description = "delete rights"
>
> > > > > And let's say that I have a User class and a Privilege class too.
>
> > > > > --
> > > > > You received this message because you are subscribed to the Google Groups "DataMapper" group.

> > > > > To post to this group, send email to datam...@googlegroups.com (mailto:datam...@googlegroups.com).
> > > > > To unsubscribe from this group, send email to datamapper+...@googlegroups.com (mailto:datamapper+...@googlegroups.com).


> > > > > For more options, visit this group athttp://groups.google.com/group/datamapper?hl=en.
>
> > > --
> > > You received this message because you are subscribed to the Google Groups "DataMapper" group.

> > > To post to this group, send email to datam...@googlegroups.com (mailto:datam...@googlegroups.com).
> > > To unsubscribe from this group, send email to datamapper+...@googlegroups.com (mailto:datamapper+...@googlegroups.com).


> > > For more options, visit this group athttp://groups.google.com/group/datamapper?hl=en.
>
> > --
> > You received this message because you are subscribed to the Google Groups "DataMapper" group.

> > To post to this group, send email to datam...@googlegroups.com (mailto:datam...@googlegroups.com).
> > To unsubscribe from this group, send email to datamapper+...@googlegroups.com (mailto:datamapper+...@googlegroups.com).

Chris Corbyn

unread,
Dec 7, 2011, 6:36:22 PM12/7/11
to datam...@googlegroups.com

On 08/12/2011, at 9:26 AM, Ted Han wrote:

> I've got a patch somewhere that allows one to patch in different types of joins, but i never merged it in because it breaks the abstraction :\


I'd love to see what this looks like :)

I don't think it should be a concern if it "breaks the abstraction", provided that it's documented that it is geared towards SQL data stores. I don't believe it's possible to have a perfect abstraction that prevents you from using SQL while still supporting all the features of SQL. I'm willing to wait and see how DM2 pans out, but there are just too many edge cases that I have this awkward feeling it will still be a closed box capable to leaving you in a place you don't want to be when it comes to complex queries, if it doesn't provide the means to work directly with SQL. We use DM's support for non-SQL data stores with a few different adapters (REST, memory) and I do think that is a big selling point to DM, but supporting other data stores should not mean not supporting SQL purely on the basis that it doesn't work with those other data stores. 95% of use cases will be with SQL databases, so focus on that 95% and provide a (large) subset to the rest. Allowing SQL injection goes a long way to solving this.

Other examples where the lack of ability to use SQL becomes an issue are ordering by the result of a native RDBMS function, doing joins to derived tables (inline temporary views) and using native RDBMS functions in a join condition. These are all things anybody who's worked on a large website that requires a heuristic type of analysis on the data has probably done at some point or other.

Just my $0.02.

Clifford Heath

unread,
Dec 7, 2011, 8:44:15 PM12/7/11
to datam...@googlegroups.com
On 08/12/2011, at 10:36 AM, Chris Corbyn wrote:
> I don't believe it's possible to have a perfect abstraction that prevents you from using SQL while still supporting all the features of SQL.

This is false - as a counter-example, CQL already does this.
It's not yet fully implemented, but the language is built on the
fact-oriented approach which encompasses and extends
everything available in SQL.

Clifford Heath.

Dan Kubb (dkubb)

unread,
Dec 7, 2011, 10:42:52 PM12/7/11
to DataMapper
Hi Clifford,

> > I don't believe it's possible to have a perfect abstraction that prevents you from using SQL while still supporting all the features of SQL.
>
> This is false - as a counter-example, CQL already does this.
> It's not yet fully implemented, but the language is built on the
> fact-oriented approach which encompasses and extends
> everything available in SQL.

Really? That's amazing. When you say "SQL" though, do you mean one of
the ANSI spec revisions, a subset, a vendor specific flavour, all of
those, or something else?

I must admit I haven't seen any query abstraction that doesn't begin
to leak at some point, especially as you start to get near the edge. I
do think we're seeing better and better approaches, and the line is
being pushed out further as we experiment and discover better
interfaces and internal representations for queries; but I have yet to
see a case where the abstraction is perfect.

--

Dan

Clifford Heath

unread,
Dec 8, 2011, 6:27:28 PM12/8/11
to datam...@googlegroups.com
On 08/12/2011, at 2:42 PM, Dan Kubb (dkubb) wrote:
>>> I don't believe it's possible to have a perfect abstraction that prevents you from using SQL while still supporting all the features of SQL.
>> This is false - as a counter-example, CQL already does this.

> Really? That's amazing.


Well, yes. It's just a humanised language wrapped around first-order logic
with bag comprehensions. There's no limit to the complexity of the resolved
expression, so anything that you can define in FOL can be said in CQL.
FOL is equivalent to the relational model in expressive power, and SQL
is less powerful than both.

In some cases it's necessary to break an expression into several parts by
defining derived fact types, but that's because the humanised language
doesn't include the use of parentheses to allow arbitrary and/or
(conjunctive/disjunctive) nesting. But you want to break up such queries
anyhow, it makes them much more comprehensible.

> When you say "SQL" though, do you mean one of
> the ANSI spec revisions, a subset, a vendor specific flavour, all of
> those, or something else?

ANSI, at least, including all kinds of joins (inner, outer, anti, semi),
grouping, ordering and recursive queries (using CTEs in SQL).
IOW it's relationally complete.

CQL doesn't support (or need) NULLs, which makes things simpler,
as all the garbage that SQL introduces to deal with the anomalies
they introduce is avoided. I'm sure there are individual functions of
vendor versions that aren't built-in, but that's unavoidable. Functions
and fact type derivations in CQL can be externally defined, and expose
into the language without breaking the model or structure.

CQL doesn't have a built-in row-numbering feature like ROW_NUMBER(),
but that's only technically needed when you have a result set containing
exact duplicates. (If you exclude exact duplicates, you can always
define a total ordering and derive a row-number from that ordering.
A naive implementation would be terribly inefficient, but the question
was about expressive power, not implementation efficiency).

> I must admit I haven't seen any query abstraction that doesn't begin
> to leak at some point, especially as you start to get near the edge.

Do you consider the need to sometimes define external functions (for
example, atan2(x, y)) externally to be leaky? I mean, you could possibly
define even that in FOL, but it wouldn't make sense to. So I don't consider
the need for that type of extension to be leaky.

Do you consider Tutorial D to be leaky?

The presentation of CQL on my website is almost three years out of date,
sorry about that. I'm trying to finish incorporating arithmetic into the
query implementation (big job!) before updating it.

> I have yet to see a case where the abstraction is perfect.

Regarding internal representations, I've been considering whether to translate
all FOL expressions to use a single operator, effectively the Sheffer Stroke
(actually the "mark of distinction" of George Spencer-Brown from "Laws of
Form"), because that guarantees the ability to deterministically reach the
single simplest form when optimising the query. It's a big change from my
current internal representation however, and I have yet to finish that.

The only other existing implementation I know of is in NORMA Pro (the
basic version is at http://sf.net/projects/orm), which uses the role calculus
for its internal representation. That work is much more complete than mine,
and is in use in some incredible ways with clients of LogicBlox Inc, but they
have no query language. Instead, you have to build queries through a set
of complex Windows dialogs, editing the join paths using a difficult hierarchical
editor.

Clifford Heath, Data Constellation, http://dataconstellation.com
Agile Information Management and Design
Skype: cjheath, Ph: (+61/0)401-533-540

Reply all
Reply to author
Forward
0 new messages