ActiveRecord/Arel delete with joins?

Christopher J. Bottaro

Oct 28, 2011, 5:46:04 PM

to rubyonra...@googlegroups.com

Hello,

What's the best/easiest way to write this delete statement using ActiveRecord 3.1?

DELETE e1
FROM events e1
JOIN events e2
WHERE e1.subject_type = e2.subject_type
AND e1.subject_id = e2.subject_id
AND e1.origin_type = e2.origin_type
AND e1.origin_id = e2.origin_id
AND e1.id > e2.id

Thanks for the help.

Colin Law

Oct 29, 2011, 4:31:02 AM

to rubyonra...@googlegroups.com

Not answering the question I am afraid, but I think it is unwise to
assume anything about the id sequence. Presumably here you are
assuming that id values are assigned in an increasing sequence, but I
don't think this is necessarily guaranteed in the general case. I
think it might be better to use created_at, if that is what you really
mean. On the other hand if in reality you do not care which one you
delete and have the id test only to make sure that you delete only one
of them then please ignore my comment.

Since you are interested in the best way to code it (rather than just
hacking in the sql) then presumably it is something that happens
routinely rather than some tidying up operation that you have to do
once. Would it not be possible using validations or similar to ensure
that the duplicate record situation does not happen in the first
place?

Colin

Christopher J. Bottaro

Oct 29, 2011, 1:23:19 PM

to rubyonra...@googlegroups.com

Hmm. Interesting. On the one hand, I'm glad people are looking out for each other and advice is given on best practices. On the other hand, I forgot what it's like to ask for help on the internet and have everything you do under heavy scrutiny… :)

Presumption incorrect. It is a one off and not routine code, but that doesn't stop me from wanting to learn how to better use AR/Arel. Also, I created a unique index as soon as I realized there were dupes and I cleaned them out. I also added validation (which isn't guaranteed to work, hence the unique index in the db), and test cases/specs for the situation.

About the id vs created_at… I disagree and consciously chose the former. I think either will work fine and it's ok to make assumptions about the uniqueness (for sure) and order (comfortably sure) of primary keys for a given adapter. I'm familiar with the Postgres and MySQL adapters and I know they create unique, auto incrementing primary keys for each table.

That said, I am open to the notion that I'm wrong or do not fully understand something though, so…

Thanks for the help,

-- C

Colin Law

Oct 29, 2011, 4:05:06 PM

to rubyonra...@googlegroups.com

On 29 October 2011 18:23, Christopher J. Bottaro

I did start by apologising for not answering the question :)
I suspect it may be that for maybe 50% of questions asked here the
best result for the OP is not to have his question directly answered
but to suggest alternative ways of approaching the problem. No, I have
not done the research to prove that, it is just my feeling. To answer
a question by suggesting alternatives is therefore a perfectly valid,
and often helpful response.

> Presumption incorrect. It is a one off and not routine code, but that
> doesn't stop me from wanting to learn how to better use AR/Arel. Also, I
> created a unique index as soon as I realized there were dupes and I cleaned
> them out. I also added validation (which isn't guaranteed to work, hence
> the unique index in the db), and test cases/specs for the situation.

OK, I did not realise that this was an academic question. In that
case my suggestion is of no use to you. You never know, it may be of
use to someone else finding this thread in the future, in which case I
have not entirely wasted my time.

> About the id vs created_at… I disagree and consciously chose the former. I
> think either will work fine and it's ok to make assumptions about the
> uniqueness (for sure) and order (comfortably sure) of primary keys for a
> given adapter. I'm familiar with the Postgres and MySQL adapters and I know
> they create unique, auto incrementing primary keys for each table.

Certainly the id values will be unique, there is no question about
that. I seem to remember reading about the situation with multiple
servers where each server will get given a batch of id values it could
use, so that the id values would not necessarily be in the same order
as created_at. I may be mistaken however. Also consider the
possibility in a few years' time of someone migrating the code onto a
different db adapter. The code might then break.
I think the point is that Rails does not guarantee that id values will
increase monotonically and therefore it is not a good idea to rely on
this.
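
A small pure-Ruby sketch of the kind of situation described above, under loudly stated assumptions: the rows, the timestamps and the idea that two servers were each handed a block of ids in advance are all made up for illustration, and nothing here claims that any particular database or adapter actually behaves this way. It only shows that "lowest id" and "earliest created_at" can pick different rows once ids are not handed out in insertion order.

```ruby
# Hypothetical rows: server A was handed ids 1..100 and server B ids 101..200
# up front, and the two servers then inserted rows in alternation.
Row = Struct.new(:id, :created_at)

ROWS = [
  Row.new(101, Time.utc(2011, 10, 29, 12, 0, 0)), # server B writes first
  Row.new(1,   Time.utc(2011, 10, 29, 12, 0, 5)), # server A writes second
  Row.new(102, Time.utc(2011, 10, 29, 12, 0, 9))  # server B writes again
]

# The row a test on id would treat as the "original" ...
oldest_by_id = ROWS.min_by(&:id)

# ... versus the row that was actually inserted first.
oldest_by_created_at = ROWS.min_by(&:created_at)

puts oldest_by_id.id          # prints 1
puts oldest_by_created_at.id  # prints 101
```

If it does not matter which of the duplicates survives, the difference is harmless; it only matters when the earliest record is the one that must be kept.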

> That said, I am open to the notion that I'm wrong or do not fully understand
> something though, so…
> Thanks for the help,

I don't think I have been much help. I can answer part of the
question though. You ask for the "best/easiest" way to write the
statement. The *easiest* way is just to code in the sql as you
already have the sql available, so it is easy. I still don't know
whether there is any *better* way though. It is certainly not obvious
to me, sorry again.

Colin

Kurt Werle

Oct 29, 2011, 11:26:08 AM

to Ruby on Rails: Talk

On Oct 29, 1:31 am, Colin Law <clan...@googlemail.com> wrote:

All of this is true, and you should try these things before the
following.

You don't need a join. You're defining all your conditions on one
table just fine:

Event.where(:subject_type => e1.subject_type, ...).where('id > ?', e1.id).destroy_all

Frederick Cheung

Oct 30, 2011, 6:24:40 AM

to Ruby on Rails: Talk

This doesn't do the same thing as the original query. This deletes all
the events which have the same subject_type etc as a specific event
e1, but created after e1, i.e. remove what you might consider
duplicates of the specific event e1. The original query on the other
hand deletes all duplicates, not only those that are duplicates of a
specific event.
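
To make the difference concrete, here is a rough pure-Ruby simulation of the two queries on an in-memory list, with no database or ActiveRecord involved. The EventRow struct, the sample rows and both helper method names are invented for the illustration; only the column names come from the original SQL.

```ruby
# In-memory stand-in for the events table (hypothetical sample data).
EventRow = Struct.new(:id, :subject_type, :subject_id, :origin_type, :origin_id)

# The four columns that make two events duplicates in the original SQL.
DUP_COLUMNS = [:subject_type, :subject_id, :origin_type, :origin_id]

def dup_key(event)
  DUP_COLUMNS.map { |col| event[col] }
end

# Original self-join DELETE: a row e1 goes if ANY row e2 has the same key
# and a smaller id. Returns the ids that would be deleted.
def ids_deleted_by_self_join(events)
  doomed = events.select do |e1|
    events.any? { |e2| dup_key(e1) == dup_key(e2) && e1.id > e2.id }
  end
  doomed.map(&:id)
end

# Single-table version: only rows matching ONE given event, with a larger id.
def ids_deleted_for_one_event(events, e1)
  events.select { |e| dup_key(e) == dup_key(e1) && e.id > e1.id }.map(&:id)
end

EVENTS = [
  EventRow.new(1, "User", 10, "Post", 100),
  EventRow.new(2, "User", 10, "Post", 100), # duplicate of 1
  EventRow.new(3, "User", 11, "Post", 200),
  EventRow.new(4, "User", 11, "Post", 200), # duplicate of 3
  EventRow.new(5, "User", 10, "Post", 100)  # duplicate of 1
]

p ids_deleted_by_self_join(EVENTS)             # prints [2, 4, 5]
p ids_deleted_for_one_event(EVENTS, EVENTS[0]) # prints [2, 5]
```

Row 4 is the one that separates the two: it is a duplicate, but not of the event that was passed in, so only the self-join version removes it.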

Personally I would just use the raw sql in a call to delete_all. A
complicated use of arel isn't necessarily "better" or easier to use
than sql (portability concerns aside, but then the whole idea that you
can move any significantly sized app just by changing a line in
database.yml is a bit of a myth anyway).
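
A minimal sketch of the raw SQL route, assuming MySQL's multi-table DELETE syntax as in the original question. dedupe_sql is a made-up helper name, not something ActiveRecord provides, and the table and column names are interpolated unquoted, so they must be trusted constants rather than user input. The block only builds and prints the statement; the commented-out last line shows one way it might be sent from inside a Rails app, assuming the connection's execute method is acceptable for a one-off.

```ruby
# Builds the MySQL-style self-join DELETE from the question as a string.
# dedupe_sql is a hypothetical helper, not part of ActiveRecord or Arel.
def dedupe_sql(table, key_columns)
  # One equality per column that defines a duplicate.
  matches = key_columns.map { |col| "e1.#{col} = e2.#{col}" }
  # Keep the row with the lowest id in each group of duplicates.
  conditions = (matches + ["e1.id > e2.id"]).join("\n  AND ")
  "DELETE e1\nFROM #{table} e1\nJOIN #{table} e2\nWHERE #{conditions}"
end

sql = dedupe_sql("events", %w[subject_type subject_id origin_type origin_id])
puts sql

# Inside a Rails app the statement could then be run as-is, for example:
#   Event.connection.execute(sql)
```

Since the statement is MySQL-specific anyway, keeping it as a plain string at least makes that dependency obvious to whoever reads the code later.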

you could do something like

class Event < ..
  def self.duplicate_events
    event_alias = Event.arel_table.alias
    scoped.joins(event_alias).
      where(:subject_id => event_alias[:subject_id], ...).
      where(Event.arel_table[:id].gt(event_alias[:id]))
  end
end

which gives you an Event.duplicate_events scope

Unfortunately (at least with the version of ActiveRecord/Arel in Rails
3.0.x) you can't do Event.duplicate_events.delete_all, because the bit
of Arel that deals with deletes always does "delete from blah" and
doesn't allow you to say "delete t1 from t1 join t2, ...".

Fred

Christopher J. Bottaro

Oct 30, 2011, 3:00:42 PM

to rubyonra...@googlegroups.com

On Sun, Oct 30, 2011 at 6:24 AM, Frederick Cheung <frederic...@gmail.com> wrote:

> Personally I would just use the raw sql in a call to delete_all. A
> complicated use of arel isn't necessarily "better" or easier to use
> than sql (portability concerns aside, but then the whole idea that you
> can move any significantly sized app just by changing a line in
> database.yml is a bit of a myth anyway).
>
> you could do something like
>
> class Event < ..
>   def self.duplicate_events
>     event_alias = Event.arel_table.alias
>     scoped.joins(event_alias).
>       where(:subject_id => event_alias[:subject_id], ...).
>       where(Event.arel_table[:id].gt(event_alias[:id]))
>   end
> end

That results in the following error:

RuntimeError: unknown class: Arel::Nodes::TableAlias

Distilled down:

Event.joins(Event.arel_table.alias)

Any ideas?

And yeah… it seems like just putting SQL into a delete_all call is the easiest way to go. Still interested in this Arel stuff though.

-- C
