New Finder Syntax (before 1.0)

Sam Smoot

unread,

Jul 2, 2008, 9:51:22 PM7/2/08

to DataMapper

I just wanted to put this out there as warning... probably have an
implementation in the next couple weeks.

DM finder syntax is going to change to Arrays. This has 4 major
advantages:

* Conditions order is preserved, which as any DBA will tell you can
make or break the performance of your queries
* Conditions OR support in queries becomes trivial to implement, which
is something we've needed for awhile, and is vital for
CompositePrimaryKeys+StrategicEagerLoading
* Conditions Nesting becomes natural and easy, replacing parenthesis
in SQL with Arrays in Ruby
* Conditions and Options are separated, meaning query options are no
longer reserved from being used as property names

This:
Zoo.all(:name => 'bob')

Becomes:
Zoo.all(:name.eq('bob'))

This:
Person.all(:name.like => 'b%', :age.gt => 30, :order => [:name.desc])

Becomes:
Person.all(:name.like('b%'), :age.gt(30), :order => [:name.desc])

This:
Person.all(:conditions => ["(name like ? AND retired = ?) OR age > ?",
'bob', true, 30] :order [:age])

Becomes:
Person.all([ :name.like('bob'), :retired.is_true ].or([ :age.gt(30) ]), :order
=> [:age])

And because you can refine Collections, a slightly prettier but
insignificantly slower version (outside of micro-benchmarks anyways)
might look like:

Person.all(:name.like('bob'), :retired.is_true).or(:age.gt(30)).order(:age)

So, feel free to make suggestions. Because the 4 points are such a big
issue though, and no syntax suggested so far actually satisfies them
all except this, I don't see a lot of alternatives...

BTW, thanks to Yehuda Katz and Avi Bryant for the idea!

Carl Lerche

unread,

Jul 13, 2008, 8:55:32 PM7/13/08

to DataMapper

I'm really not a fan of the syntax. It looks WAY cluttered. I probably
wouldn't use it.

I think that advanced queries should be modular so that we can pick
the way we want to do it.

Method 1: ParseTree based (doesn't work on jruby, etc...)

ex: Person.all(:order => :age) { (name =~ 'bob' && retired) || age >
30) } # or something

Method 2: Custom query language (easy to build using ParseTree)

ex: Person.all("(name =~ 'bob' && retired) || age > :age", :age =>
30, :order => [:age]) # or something

I would suggest that the syntax be as close to the ParseTree based
solution so that it's easy to convert once ParseTree works on all ruby
VMs.

Method 3: Pure ruby as shown above.

Each method would return some kind of condition tree that the adapters
can convert to a query. There should also be a method to pass adapter
specific conditions.

Anyway, what are your thoughts?

Regards,
-carl

Daniel N

unread,

Jul 13, 2008, 9:16:25 PM7/13/08

to DataMapper

I really like the thoughts on being able to run advanced queries. I
really like the block method for syntax, but I'm concerned that the
block method, if done using parse tree would not be available for all
VM's. Isn't this similar to Ambition?

I'm easy with either syntax, the thing in my mind for either of these
is which would be more robust to build on top of?

Carl Lerche

unread,

Jul 13, 2008, 9:38:48 PM7/13/08

to datam...@googlegroups.com

Daniel,

Yes, the block syntax is similar to Ambition (which is built on ParseTree) and would not be available in VMs. This is why I think that the query syntax stuff should be modular. Ambition style stuff is the pretty way, but there are other options available if that doesn't work for you.

Why should we make block style queries unavailable just because they don't work on some VMs?

My point is that, every method has pros and cons. Let's make them all available and let the developer pick the method (s)he wants.

-carl

--
EPA Rating: 3000 Lines of Code / Gallon (of coffee)

afrench

unread,

Jul 13, 2008, 10:32:00 PM7/13/08

to DataMapper

Just to put things side-by-side to improve comparison, here's an
average and slightly-complex SQL statement:

SELECT posts.id, posts.title, posts.body, posts.slug
FROM posts
INNER JOIN posts_categories on posts.id = posts_categories.post_id
INNER JOIN categories on posts_categories.category_id = categories.id
WHERE
(categories.public IS TRUE and posts.created_at < NOW())
OR
(categories.private IS TRUE and posts.public IS TRUE)
LIMIT 10
OFFSET 0

Would represented in the proposed syntax as:

Posts.all(:limit => 10, :links => [:categories],
[Post.categories.eql(true), Post.created_at.lt(Time.now)],
[Category.private.eql(true), Post.public.eql(true)]

And would be represented in the block syntax as:

Posts.all(:limit => 10, :links => [:categories]) { |post|
(post.categories.select{|c| c.public == true} && post.created_at <
Time.now) || (post.categories.select{|c| c.private = true} &&
post.public == true}

(( I've never used Ambition's query syntax before, so please excuse
any misrepresentations....how do you query contents of a JOIN in
Ambition properly? ))

I'm personally still partial to the original proposal simply because
seeing too many &&, ||, and ! coupled with ( and ) in one line tends
to totally mess me up and I have to go back and re-read the line to
ensure I really do understand what's going on.

The original proposed syntax is nice in that it's pretty friendly to
eyeballs as well as alternative ruby VMs.

===
~Adam
(afrench on irc)

Sindre Aarsaether (somebee)

unread,

Jul 14, 2008, 4:59:45 AM7/14/08

to DataMapper

Hmm, I'm not into this either. Looks very cluttered and forced :-/
I like Carl Lerche's suggestion with a custom query-language better.

1. Ruby 1.9 preserves order in hashes, and other vms will aswell?
If the new hash-syntax from 1.9 is implemepted in rubinius etc to,
we will have even nicer syntax.

Male.all([ :name.like('bob'), :retired.is_true ].or([ :age.gt(30) ]), :order
=> [:age])

would then be:

Male.all({ name.like: 'bob', retired: true } | { age.gt: 30 }, order:
[:age])

There would be no disadvatages to this compared to array-approach in
1.9,
and I think the other vms will make this possible as well
( atleast preservation of order in hash )

2. OR-conditions would be just as trivial with the hash-approach
(http://groups.google.com/group/datamapper/browse_thread/thread/
a948a2ac07393b6d/)

3. Is this for anything else than AND/OR nesting? If not, see (2)

4. Would be quite easy to do now to, as you can just wrap your
conditions in a hash like:

Country.all({ :law => true, :order => false}, :order => [:name.asc])

Or one could make use of collections and do:

Country.all(:law => true, :order => false).order(:name.asc)

which just returns a lazy ordered collection

Person.all(:name.eq('david'), :id.in([1,2,3,4]), :age.gt(80))
Person.all(:name => 'david', :id => [1,2,3,4], :age.gt => 80)

Even though length is the same, I really prefer the last. Much more
readable
How would you have ranges here? .eq feels worse than =>,
and .between(*range*)
is a little unintutive. I use ranges _alot_, and really like the way
it
looks and works in our current approach.

Posting.all(:date => range, :id => [1,2,3,4], :sum.gt => 0)
Posting.all(:date.between(range), :id.in([1,2,3,4]), :sum.gt(0))

We need not_between to, so I'm guessing just passing a range to .eq
would
be the best way.

-----------------------------

I agree that there should be a more advanced alternative, but
_replacing_ the
old one with this is imo not the best solution..

Ana Nelson

unread,

Jul 14, 2008, 6:20:30 AM7/14/08

to DataMapper

Sam,

The array syntax looks very simple and straightforward to me. I don't
think Ambition-style syntax should be a priority at this point, I
would rather project development time goes elsewhere.

Ruby 1.9 is a long way away from being everyone's daily language, and
even when it is there will be legacy 1.8 apps around for years. As a
JRuby + MRI user I would really like to see compatibility preserved
across all VMs.

One of DataMapper's main attractions is the simplicity of its code. I
would hate to see that compromised. If some people want a ParseTree
based syntax then that can go into its own dm-parsetree gem.

-Ana

On Jul 14, 9:59 am, "Sindre Aarsaether (somebee)" <sin...@identu.no>
wrote:

Carl Lerche

unread,

Jul 14, 2008, 12:17:27 PM7/14/08

to datam...@googlegroups.com

The problem with the current finder syntax is that it only supports
AND queries. If anything more advanced is needed, one must drop down
to Sql; however, not all adatrs support SQL (like, the Yaml one). For
these adapters, some kind of advanced finder syntax is needed.

As for the hash syntax with operator overloading, I am really against
overloading the bitwise operation operators.

In the end, I think the best solution right now is to build the back
end for the advaced queries and then implement all solutions in
plugins and see what sticks.

Carl

Sam Smoot

unread,

Jul 14, 2008, 2:30:13 PM7/14/08

to DataMapper

On Jul 14, 11:17 am, "Carl Lerche" <carl.ler...@gmail.com> wrote:
> The problem with the current finder syntax is that it only supports
> AND queries. If anything more advanced is needed, one must drop down
> to Sql; however, not all adatrs support SQL (like, the Yaml one). For
> these adapters, some kind of advanced finder syntax is needed.
>
> As for the hash syntax with operator overloading, I am really against
> overloading the bitwise operation operators.
>
> In the end, I think the best solution right now is to build the back
> end for the advaced queries and then implement all solutions in
> plugins and see what sticks.
>
> Carl

Right. That back-end needs a syntax however. Otherwise we're expecting
everyone's syntax plugins to manipulate the Query#conditions AST
directly. Which is fragile.

Also, I've gotta say, I'm probably speaking from a position of CS
ignorance, but I have a sneaking suspicion that Ruby 1.9's ordered-
hashes is one of the less good design decisions the MRI team has made,
forcing a memory/computation hit, when for most uses, it doesn't
matter. So MRI will be paying a penalty there for a very long time to
come.

Either way, condition order is *very* important. So unless someone can
suggest a syntax that works across VMs, and preserves order, I'm un-
moved from Yehuda's syntax. Plugins encouraged sure.

-Sam

Jon Hancock

unread,

Jul 15, 2008, 4:18:40 AM7/15/08

to DataMapper

I'll throw in my 2 cents:

1 - if the new syntax is the only way you can get everything to fit
together, then I guess you have to do it.
2 - however, I am of the opinion that we should not have a syntax that
supports complex queries. Although Adam's example shows a terse query
written in ruby over the expanded SQL, you _must know_ what the
resultant SQL will look like or else your due for a lot of pain at
some point. The point is, for most apps, you cannot get away from
knowing SQL. Its nice to have shortcuts for the simple stuff. But
trying to get a free ride on complex queries always ends up a mess.

Since I'm not developing this code, I have to live with what you guys
decide. But my personal choice is to have a simple finder syntax for
the easy stuff and write my own SQL for anything complex.

Jon
shellshadow.com

Sam Smoot

unread,

Jul 15, 2008, 4:26:59 AM7/15/08

to DataMapper

On Jul 15, 3:18 am, Jon Hancock <shellsha...@gmail.com> wrote:
> I'll throw in my 2 cents:
>
> 1 - if the new syntax is the only way you can get everything to fit
> together, then I guess you have to do it.
> 2 - however, I am of the opinion that we should not have a syntax that
> supports complex queries. Although Adam's example shows a terse query
> written in ruby over the expanded SQL, you _must know_ what the
> resultant SQL will look like or else your due for a lot of pain at
> some point. The point is, for most apps, you cannot get away from
> knowing SQL. Its nice to have shortcuts for the simple stuff. But
> trying to get a free ride on complex queries always ends up a mess.

Jon, you'll get no argument from me.

This isn't really intended (at least for now): For people to try to
write complex queries.

I mean, I don't see why you couldn't exactly. It's just not the point.

We need preserved order of conditions. If you're using a sub-select
for example. An easy example. You'll want your other cheaper
operations to short-circuit the sub-select when possible.

More importantly, Strategic-Eager-Loading won't work with Composite
Primary Keys until we have proper OR support. This isn't a complex use-
case. It's straight forward. We need to be able to say: This key, or
that key, or that key. And that won't work without beefing up the
queries a bit.

I'm the first guy to bust out a quick DataTransferObject as a stand-in
for a model on a /index page, dropping down to using the raw
DataObjects driver. So I'm on the side of the fence that says you can
probably do better with a few choice usages of SQL.

But there are use-cases beyond simple AND/IN/NOT clauses that our
query/ruby syntax needs to cover. And in those cases, it's important
to me that we ignore your carefully crafted condition precedence
simply because we thought Hashes were prettier than Arrays. (They are,
I agree, I just think it's too high a price to pay for
"free" (keystrokes-wise) aesthetics.)

-Sam

Science

unread,

Jul 15, 2008, 5:23:28 PM7/15/08

to DataMapper

Sam Smoot <ssm...@gmail.com> wrote:
> > > Right. That back-end needs a syntax however. Otherwise we're expecting
> > > everyone's syntax plugins to manipulate the Query#conditions AST
> > > directly. Which is fragile.
>
> > > Also, I've gotta say, I'm probably speaking from a position of CS
> > > ignorance, but I have a sneaking suspicion that Ruby 1.9's ordered-
> > > hashes is one of the less good design decisions the MRI team has made,
> > > forcing a memory/computation hit, when for most uses, it doesn't
> > > matter. So MRI will be paying a penalty there for a very long time to
> > > come.
>
> > > Either way, condition order is *very* important. So unless someone can
> > > suggest a syntax that works across VMs, and preserves order, I'm un-
> > > moved from Yehuda's syntax. Plugins encouraged sure.
>
> > > -Sam
>

Sam - I brought up some related stuff on another thread but thanks for
pointing me here.

My opinion is that data mapper needs to support complex queries to the
extent that it permits "distributed" query building. One of the issues
I have is that building raw SQL is not very object oriented (duh) and
so in my OO architecture where different query segments are
constructed by various arms and legs of the objects, holding together
all the SQL is fairly difficult. I solved this problem by creating a
custom SQL object but it would be great to see Data Mapper support
this kind of stuff natively. The syntax sugar, ParseTree and all that
are fine but the real issue is that I need to do stuff like:

* create a new clause called "foo" ("or'ed" to previous clause)
* add several conditions to it
* create a second new clause "bar"
* add some conditions to it
* go back to "foo" and add some more conditions in there
* add two order by statements
* set "distinct on (x,y,z)" to the front of the select list
* add two more order by statements
* specify several table join statements

Being able to add SQL to the query in some order other than what SQL
expects is really the big deal for me. It lets me design my
application the way it "wants" to be built, rather than the way SQL
would like the output.

I hope this input is useful! Definitely drop me a line if I can
clarify anything.

Steve

Daniel Parker

unread,

Aug 28, 2008, 1:19:42 PM8/28/08

to DataMapper

I would like to strongly suggest that we AT LEAST devise a way to get
at manually instantiating objects from sql results easier. I have one
app, for example, where I have to construct a pretty complex sql query
just to get the right objects, but at that point I'm grabbing whole
records anyway, so it'd just make sense to me to be able to easily
instantiate them into objects. I had to jump through all sorts of
hoops to make it (my find_by_sql method) work, and then the next
version jump of datamapper broke it. My example:

Message.find_by_sql("SELECT messages.* FROM messages INNER JOIN
(SELECT MAX(id) AS id FROM messages WHERE created_by =
'#{current_openid}' GROUP BY feed) AS ids ON messages.id = ids.id")

This basically selects the last-created message from each feed, only
referencing one table. It worked great, and I think is a good example
of something I have no other way of doing. BUT I'd rather instantiate
them into objects from the sql, and not have to do it entirely
manually.

Is there a possibility we can at least provide a "not-recommended-but-
available" way of doing it, and let developers shoot themselves in the
foot until they do it right, if they want to?

~Daniel

mayo

unread,

Sep 2, 2008, 3:57:03 AM9/2/08

to DataMapper

Would repository(:default).adapter.query(%Q{SELECT ... GROUP BY feed})
do the same job for you? This way you get an array of Structs back, or
am I missing something here? (It's pretty late, so I might be...)

mayo

Mike Summers

unread,

Sep 2, 2008, 8:20:00 AM9/2/08

to DataMapper

If we get to vote on this I'd go for directly exposed SQL (why two
syntaxes when one will do?) and then meta matching the result set
attributes to see if they don't fit some existing model object and
return as the model object rather than the array of structs.

If we go with the array sytax please add AND back in. Implicit
'features' are a maintenance problem.

--Mike

Daniel Parker

unread,

Sep 2, 2008, 10:36:21 AM9/2/08

to DataMapper

By the way, I'm all for the new array syntax, as suggested by Yehuda.
If the new syntax can cover just about any complex SQL query, I
wouldn't mind so much being "forced" to never use SQL, but at some
point it translates to SQL anyway, so why not allow the programmer
access to that level if he wants/needs it?

In order to get this to work for me, I had to copy the method from
datamapper that generates and executes the query, and I monkey-patch
it into a find_by_sql -- only because the sql query I needed to do was
too complex for the current query syntax. I don't think this use case
is so far out there -- plenty of people will need to execute complex
queries now and then -- we just have to find the right way to do it.
The monkey-patch I made worked great. In most cases it won't be a
problem (because if datamapper's migration creates the table, the
columns are all in the right order already), and if it is a problem
for someone, they can manually write the column order into the SQL and
it will behave as it should.

I'd love if the new syntax was in place, but also a find_by_sql method
that comes with a warning: "You'd better know what you're doing if you
use this because you could mix up the attribute order by
accident." (Unless we read the column names in the query and read the
incoming data appropriately...)

~Daniel

Adam French

unread,

Sep 2, 2008, 12:45:21 PM9/2/08

to datam...@googlegroups.com

You can only please some of the people all of the time or all of the
people some of the time. Having a way to write manual queries would
go a long way towards supporting the use-case Daniel describes, but I
think we need to keep in mind that DM can talk to any number of
persistent storage schemes, not just SQL-literate ones.

If we do add in find_by_sql functionality, I would suggest we call it
find_manually or something more query language generic. It may be
better left to a DM-More gem as well, since repository.adapter.query
and repository.adapter.execute already exist in DM-core and all it
would do is provide a cleaner API and fully instantiate the object
instead of returning Structs.

Anywho, my $0.02 worth.
===
Adam French
Project Manager
Wieck Media

Daniel Parker

unread,

Sep 2, 2008, 12:47:41 PM9/2/08

to datam...@googlegroups.com

I poked around in DataObjects, and I've seen a version where it injects another method or two when you're using sql-specific repositories. Not hard to do it that way. Of course nobody would want to compose raw SQL when they're not using a sql engine.

--
"You have granted me life and steadfast love, and your care has preserved my spirit." Job 10:12
"The LORD is my chosen portion and my cup . . . indeed, I have a beautiful inheritance." Psalm 16:5-6
"Give what you can ... take nothing back!"

Sam Smoot

unread,

Sep 2, 2008, 3:12:35 PM9/2/08

to DataMapper

On Sep 2, 11:47 am, "Daniel Parker" <dcpar...@gmail.com> wrote:
> I poked around in DataObjects, and I've seen a version where it injects
> another method or two when you're using sql-specific repositories. Not hard
> to do it that way. Of course nobody would want to compose raw SQL when
> they're not using a sql engine.

Daniel,

I think you mean DataMapper. Anyways. Yeah, so what's your use-case
for this?

No offense, just going to lay out my reasoning for why DM is the way
it is.

First, lemme clarify something: The order of the properties is not the
order that an object should be materialized in. That depends on
Query#fields. If you try to materialize just by "lining up the fields"
you'll quickly run into issues with composite fields or lazy-loaded
attributes and eventually prefetching.

I ask because AR::find_by_sql is wrong. It's a bleeding of
responsibilities. DM is for CRUD. It's an O/R Mapper, not a reporting
tool. The objects returned from a find_by_sql are often not
persistable objects and it completely subverts the majority of the
code in DM.

As Adam suggested, if all you want is a convenience to translate query
results into object instances a simple helper in your application
could manage that with very little work. Execute a DO query, map the
results to objects. Something like:

: # This is probably close, but untested, caveat emptor.
: module ApplicationHelper
: def query(model, sql, *args)
: connection = DataObjects::Connection.new(Merb.config[:databases]
[:default])
: reader = connection.create_command(sql).execute_reader(*args)
: results = []
: while(reader.next!) do
: results <<
model.new(Hash[*reader.fields.zip(reader.values).flatten])
: end
: return results
: ensure
: connection.close
: end
: end

You could even pass the fields explicitly to ensure everything's type-
cast properly with Command#set_types.

Reporting tools are valuable for sure. But when you don't have one
handy, the solution is not a half-baked extension to your O/RM. Just
use the database-drivers directly. It only adds a line or two of code
to use the DO connections directly, and the performance pay-off is
well worth it.

So. Why do the objects that come back need to be model instances?

I'll share an example of how we've done it wrong on an app or two...

We have a Photo model with a #preview(max_width, max_height) method.
It's awesome. It takes a look at itself to determine wether it's
published or not, and if so, generates a preview image or thumbnail
at the appropriate size (retaining aspect-ratio) in the public folder
so Apache can serve it. If it's unpublished, it generates a file in a
private folder and returns the path so the app can X-SENDFILE it.

So it depends on knowing the source-path of the original image, and
wether it's published.

Problem is, when we want to do a custom, fast query on our /photos/
index page to bring back the photos for a specific user, we get these
raw Arrays, not Photos.

Ok, easy enough, we move it to a class-method that takes the id,
created_at, and published_at so we know where on the file-system to
find the source file, whether to watermark it, etc.

The problem is, we're making it the responsibility of the Photo model
to know how to generate preview images, and that just doesn't make
sense.

What we need here is a separate class that lives outside of our domain-
model. A Preview class that can take these attributes. It's easier to
test. It's cleaner with a clear separation of responsibilities. It's
easier to re-use between projects as long as our Photo implements a
fairly simple interface.

Most importantly, it just works with a few pieces of data, and it
doesn't matter where it comes from. It's cake to use it with models,
Arrays, Structs, whatever. And it lives in the right Layer of our
application. Not the Business Logic Layer. It lives much higher in the
stack, in the Application Layer.

And so... it's a problem you can work around, and I'd wager that in
many if not most cases, your application code will be the better for
it. So, with that said, it's difficult for me to anticipate a "I can't
write this page I need because DM doesn't have this feature" sort of
use-case, but if you can come up with one I'll do my best to give it
some genuine consideration.

Thanks, -Sam

Daniel Parker

unread,

Sep 2, 2008, 3:40:12 PM9/2/08

to datam...@googlegroups.com

Thanks for the thoughtful reply.

First of all, the mock-method you wrote as an example here is pretty close to what I did to get where I needed. So I think we're on the same page. But my question is, why not make that method and expose it for the sql-flavored adapters? Maybe that should go in DO, or somewhere closer to the sql adapters, true. But in the somewhat rare case that someone needs more power at their fingertips, why make them come up with that helper method themselves?

Here's my use case: I have site where small messages can be posted to different feeds. Basically I need to pull the last message from each feed, but I need to do it in one query. A feed is not a database object, so I don't have a previous list of feeds. Here's my sql query:

"SELECT messages.* FROM messages INNER JOIN (SELECT MAX(id) AS id FROM messages WHERE created_by = ? GROUP BY feed) AS ids ON messages.id = ids.id"
I'm sure you can figure out what it does, but for helpers, it basically grabs the highest (incrementing) id from each feed made by the current user, then grabs the entire record for each of those ids. If there's a better way to do this, go ahead and let me know, but I believe the use case is still a valid one, assuming there may be other complex queries that fill a similar need.

I guess you'd call that a "report" method, and it's being used in the controller, not the model. (I confess this might be a structural/layering inconsistency, and yes, I could move it into a method on the model itself, but it is a single-use need so I didn't.) But of course to be objects of the Message model, I need to be able to pull that sql query into objects the same way DM normally does.

One question in my head is, Why run DM.query and get back Structs when there's really no logical roadblock to getting real model objects? I'm sure there will be cases in the future where I will want to do a custom sql query just because it will take less queries than DM would have taken to do the same thing. You have to admit, there are some limitations to the query syntax, and sometimes there are better ways to write a query that you can't accomplish with the query syntax provided. Will I have to write my own method to short-circuit some DM intestines? Or can I write my own sql when I really need to?

~Daniel

Reply all

Reply to author

Forward