New Query Improvements

12 views
Skip to first unread message

Dan Kubb (dkubb)

unread,
Oct 30, 2009, 1:28:51 PM10/30/09
to DataMapper
Hi Everyone,

I've been speccing the underlying objects that make up Query
generation a bit better and have found some interesting improvements,
and one potential problem that I wanted to bring up to the list.

DataMapper has a special Collection object that gives it the
interesting ability to chain collections without lazy loading from the
datastore. So you can do things like this:

class User
# ...

def self.active
all(:active => true)
end

def self.confirmed
all(:confirmed => true)
end
end

User.active.comfirmed
# => SELECT * FROM users WHERE active = true AND confirmed = true

(I'm using SQL because it's a nice compact way to represent a query
that everyone understands. Everything described in this post should
work for all datastores, albeit sometimes not as efficiently as with
an RDBMS that supports subselects)

This allows you to do things like ActiveRecord's named scope, but
without any special DSL or complex code. It's all plain Ruby which is
fantastic.

I've been thinking about the Query and Collection object while
speccing them, and I realized that a Query is just a representation of
a *potential* Set of resources and that we could apply Set operations
like union, intersection and difference on them. This would allow us
to do things like:

User.all(:active => true) | User.all(:confirmed => false)
User.all(:active => true) + User.all(:confirmed => false)
# => SELECT * FROM users WHERE active = true OR confirmed = true

User.all(:active => true) & User.all(:confirmed => false)
# => SELECT * FROM users WHERE active = true AND confirmed = true

User.all(:active => true) - User.all(:confirmed => false)
# => SELECT * FROM users WHERE active = true AND NOT(confirmed =
true)

The first one makes me most excited because while the SQL generation
code has supported OR conditions for a long time, we've not had a way
to do OR in the DataMapper API without resorting to dropping to raw
SQL. This is a bit wordy still, but providing primitives to do these
things will allow us to build even better APIs on top later on.

One cool thing about this is that it could be combined with query
chaining to provide some amazing possibilities, eg:

(User.active & User.confirmed) + (User.customers - User.overdue)

This would return a Collection that when lazy-loaded would run a
*single query* on the datastore. That's pretty *huge* IMHO.

The best thing about this is that the implementation is almost
trivial. Here's a quick code spike I did the other day showing how
easy it was to add to the DataMapper API:

http://gist.github.com/221039

This doesn't handle every example, but it does demonstrate most of the
behavior with just a few lines of code.

In thinking about this though, I did discover one flaw in our current
approach to query chaining, consider the following contrived example
where we want to find the top 10 users out of the 50 most recent
signups:

class User
# ...

def self.recent
all(:order => [ :created_at.desc ], :limit => 50)
end

def self.top10
all(:order => [ :points.desc ], :limit => 10)
end
end

User.recent.top10
# => SELECT * FROM users ORDER BY points DESC LIMIT 10

Now notice how the query for the top10 method clobbers the recent
query? What we'd want is something the equivalent of:

# => SELECT * FROM users WHERE id IN(SELECT id FROM users ORDER BY
created_at DESC LIMIT 50) ORDER BY points LIMIT 10

So in the above case we select the last 50 users, and then from that
group we get the top 10 users sorted from highest to lowest point
score.

The approach we use to get the intersection of the queries is a bit
flawed when you include :limit, :offset or :links in that it clobbers
the values from the other query when it really doing be doing a
subselect type query. It should work perfectly when you want to merge
conditions, which is what I suspect most people use it for, but I
wonder how many people have been hit by this?

My plan to fix this is to keep the query chaining as-is for now, and
add union, intersection and difference to the underlying Query and
Collection objects. I'll spec all the behavior so that it handles the
above cases perfectly, and then I'll swap out the query chaining with
the intersection code and make sure all the specs pass.

I'm not sure if this will be in the next release or not (which is
coming next week), but it should definitely be available in the
release after that, probably in 2-3 weeks from now at the latest.

--

Dan
(dkubb)

Dan Kubb (dkubb)

unread,
Oct 30, 2009, 1:34:49 PM10/30/09
to DataMapper
Whoops! Tiny correction on the SQL query examples, they should be:

Union:

User.all(:active => true) | User.all(:confirmed => false)
User.all(:active => true) + User.all(:confirmed => false)
# => SELECT * FROM users WHERE active = true OR confirmed = false

Intersection:

User.all(:active => true) & User.all(:confirmed => false)
# => SELECT * FROM users WHERE active = true AND confirmed = false

Difference:

User.all(:active => true) - User.all(:confirmed => false)
# => SELECT * FROM users WHERE active = true AND NOT(confirmed =
false)

That's what I get for cutting and pasting ;)

--

Dan
(dkubb)

Thanatos

unread,
Oct 30, 2009, 1:46:27 PM10/30/09
to DataMapper
Could we add a some form of Union/Intersect to a conditions hash? so
it could be User.all(Condition({:active => true}) | Condition
({:confirmed => false}))? This seems very unlikely and potentially
more verbose, but thought I would throw it out there. The default
would be to AND together, you could use |/& for union/intersect, and
could group Condition within another Condition.

Either way, I like it and I think it's a solid improvement!

cheapRoc

unread,
Oct 30, 2009, 1:58:25 PM10/30/09
to DataMapper
Is this the trick or treat part? ;)

Dan Kubb (dkubb)

unread,
Oct 30, 2009, 2:03:46 PM10/30/09
to DataMapper
Thanatos,

> Could we add a some form of Union/Intersect to a conditions hash? so
> it could be User.all(Condition({:active => true}) | Condition
> ({:confirmed => false}))?

Yeah, that seems more verbose that what I'm planning, so I'd probably
say no for now.

I like the idea of encouraging people to build up queries using class
methods too. In AR when you used named scopes one of the better
approaches I've seen is writing a named scope for each distinct state,
and then chaining them together when building the queries. That's the
sort of thing I'd like to encourage when using DM, except with the
simpler class methods that return a Collection for each state.

> Either way, I like it and I think it's a solid improvement!

Glad you like it!

--

Dan
(dkubb)

Jacques Crocker

unread,
Oct 30, 2009, 2:08:18 PM10/30/09
to datam...@googlegroups.com
I'm a big fan of this change, however I would say this isn't going to
replace Datamapper's desperate need for a simple, and straightforward
OR syntax with finders.

Dan, have you looked at the syntax proposed by dm-sugar-glider? Maybe
something similar would be a good approach for easy OR queries
http://github.com/knowtheory/dm-sugar-glider

Thanatos

unread,
Oct 30, 2009, 2:12:00 PM10/30/09
to DataMapper
Yeah, I agree. Here's an example hacked together that fakes union/
intersection. But because it is using hashes, doesn't maintain the
correct order. Interesting none-the-less:

http://gist.github.com/222596

Tony (Thanatos)

Thanatos

unread,
Oct 30, 2009, 2:13:45 PM10/30/09
to DataMapper
I've looked at sugar-glider, and I like it about as much as squirrel
from the guys at thoughtbot. Pretty cool. I also like the idea of
being able to use the &/| operators, but without syntax parsing.

Tony

On Oct 30, 2:08 pm, Jacques Crocker <merbj...@gmail.com> wrote:
> I'm a big fan of this change, however I would say this isn't going to  
> replace Datamapper's desperate need for a simple, and straightforward  
> OR syntax with finders.
>
> Dan, have you looked at the syntax proposed by dm-sugar-glider? Maybe  
> something similar would be a good approach for easy OR querieshttp://github.com/knowtheory/dm-sugar-glider

Jacques Crocker

unread,
Oct 30, 2009, 2:13:55 PM10/30/09
to datam...@googlegroups.com
To expand on my earlier message (I should have read the entire thread
first), I agree with Thanatos that we need something within the actual
conditions hash without having to duplicate the finder syntax. It's
maybe a small thing, but having to duplicate the entire query call
just to add a simple OR query seems like a bit of problem.

Thanatos

unread,
Oct 30, 2009, 2:22:04 PM10/30/09
to DataMapper
Yes, and a querying DSL seems to solve this, at least from a syntax
perspective.

Maybe:

User.all do |users|
users.active | users.confirmed
end

How does that sit?

Tony

Dan Kubb (dkubb)

unread,
Oct 30, 2009, 2:23:13 PM10/30/09
to DataMapper
Jacques,

> I'm a big fan of this change, however I would say this isn't going to  
> replace Datamapper's desperate need for a simple, and straightforward  
> OR syntax with finders.

I agree. The main reason for introducing this is that it fits in with
the Collection API naturally, and provides primitive logic that we can
build even better APIs on top of easily.

> Dan, have you looked at the syntax proposed by dm-sugar-glider? Maybe  
> something similar would be a good approach for easy OR querieshttp://github.com/knowtheory/dm-sugar-glider

This actually is inspired somewhat by the Sugar Glider project that
Ted (knowtheory) is working on. To make things easier for that
project, I was speccing the Query conditions API, and realized that it
might make it easier if there was a simple way to combine new
conditions with existing conditions. If we think of a Query as a Set,
and provide the standard union/intersection/difference methods it
makes this alot easier. Then I thought: what if I expose this through
the Collection? Set operations are *already* available on Collection,
they just lazy load and are totally inefficient -- what if I make it
so they don't lazy load and simply combine the conditions together?
That's how this idea was born.

The Sugar Glider project is more of an experimental query interface
that when complete, I would like everyone to try, and if it proves
useful we can talk about merging it into the core. It provides
similar capabilities to my dm-ambition gem (which is awesome if you're
running on 1.8), but it will have no dependency on ParseTree, so
should work equally well with 1.8, 1.9 and JRuby.

--

Dan
(dkubb)

Thanatos

unread,
Oct 30, 2009, 2:25:37 PM10/30/09
to DataMapper
Oh, then yeah, this seems like a perfect way to help that out! Ignore
my DSL suggestions.

Tony

Dan Kubb (dkubb)

unread,
Oct 30, 2009, 2:38:28 PM10/30/09
to DataMapper
Thanatos,

> Oh, then yeah, this seems like a perfect way to help that out! Ignore
> my DSL suggestions.

In the past there's been alot of discussion about alternate API
interfaces for queries. The problem with most of the suggestions is
that they typically address only one part of what the API would need.
For something to be a true improvement, it would need to support AND,
OR and NOT operations, with and without nesting.

I think monkey patching Hash and other ruby-core primitives is
something we should remove from consideration. It's messy, not
"polite" ruby, and probably not the optimal API anyway.

I would hate to see stuff like this associated with DataMapper:

({ :name => 'Dan Kubb' } | { :name => 'Thanatos' }) & { :active =>
true }

Our focus has always been on providing a nice, clean consistent API,
and I think code like the above is not representative of what we are
trying to do.

Plus Imagine how it'd look with deeper nesting, and negation isn't
handled elegantly (how would you negate a whole condition branch for
example?). Ted's Sugar Glider API should address all of these
concerns and more, without having to resort to writing "line noise"
for our queries :)

--

Dan
(dkubb)

myobie

unread,
Oct 30, 2009, 3:22:03 PM10/30/09
to DataMapper
Thanx for working so hard on this dkubb, this should solve 80-90% of
the times I need fancy queries.
Message has been deleted
Message has been deleted

Clifford Heath

unread,
Oct 30, 2009, 7:14:55 PM10/30/09
to datam...@googlegroups.com
On 31/10/2009, at 4:28 AM, Dan Kubb (dkubb) wrote:
> This would return a Collection that when lazy-loaded would run a
> *single query* on the datastore. That's pretty *huge* IMHO.

Yeah, it's big... but isn't it what Sequel has been doing for, like...
ever?

Cool stuff anyhow. I have the vestigial structures in place to build
it for
ActiveFacts also, with a few interesting twists of my own.

Clifford Heath.

Dan Kubb (dkubb)

unread,
Oct 30, 2009, 9:36:23 PM10/30/09
to DataMapper
Clifford,

> > This would return a Collection that when lazy-loaded would run a
> > *single query* on the datastore.  That's pretty *huge* IMHO.
>
> Yeah, it's big... but isn't it what Sequel has been doing for, like...  
> ever?

Hehe. You are probably correct, although I can't find any similar
examples for Sequel in it's docs or when I google it.

Either way, the hard part wasn't necessarily constructing the SQL
queries, the hard part was making the API such that subqueries would
be possible with RDBMS' that support it *while* also not tightly
coupling DM to how RDBMS' work.

If DataMapper only had to support RDBMS', everything would be much
simpler and we would be doing this forever too :P

> Cool stuff anyhow. I have the vestigial structures in place to build  
> it for ActiveFacts also, with a few interesting twists of my own.

Cool! I look forward to seeing it.

--

Dan
(dkubb)
Reply all
Reply to author
Forward
0 new messages