Understanding insert and insert_all behaviour

1,255 views
Skip to first unread message

Marlus Saraiva

unread,
Feb 15, 2016, 8:36:47 AM2/15/16
to elixi...@googlegroups.com

Hi all,

Why Repo.insert and Repo.insert_all act differently regarding autogenerated fields like id, inserted_at or updated_at?

When I saw insert_all, my first guess was: it does the same thing that insert does, but for multiples items. It turned out that it actually doesn’t do the same thing. I DO agree that we must have functions that are meant to be closer to the datastore, but maybe they should be named differently. e.g., insert_entries, insert_raw or whatever name we can find that explicitly and undoubtedly states their behaviour. Does it make sense?

Thanks!
-marlus


José Valim

unread,
Feb 15, 2016, 8:58:22 AM2/15/16
to elixi...@googlegroups.com
Hi Marius,

They are named differently, they all finish with "all". :) To quote the docs, the behaviour of insert_all is consistent with update_all which also does not handle autogenerate:




José Valim
Skype: jv.ptec
Founder and Director of R&D

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/CAJvhf-%3DUj8X1ZQec3x9Lv8ZLyv35kWonj750W0x%2BTw3KjgQHAA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Marlus Saraiva

unread,
Feb 15, 2016, 10:01:29 AM2/15/16
to elixi...@googlegroups.com
Hi José,

I guess I didn't express myself properly. I know it's consistent with update_all and delete_all and that those functions are well documented. Actually, I think they all have the same "problem". For me, an "_all" suffix intuitively means "do this for all items" and not "do this for all items but don't do that other thing". If `insert` does some extra work for one item, I expect that an insert_all do that same extra work for each item. If it doesn't, it should preferably have that explicit in the name rather than in the docs.

So the point is, although the *_all functions are consistent between them, they're not consistent with their 'single item' counterparts and this is confusing, IMHO. But maybe I'm just getting to paranoid about naming things more explicit ;)

Thanks.

José Valim

unread,
Feb 15, 2016, 10:06:00 AM2/15/16
to elixi...@googlegroups.com
I am OK with changing names as long as we have good options. I particularly don't think think insert_entries is any better or worse (it leads to the same kind of ambiguity) and insert_raw / update_raw / ... just doesn't sound right but could be argued for.

TL;DR: we are definitely open for good name proposals.



José Valim
Skype: jv.ptec
Founder and Director of R&D

Chris McGrath

unread,
Feb 15, 2016, 10:18:13 AM2/15/16
to elixi...@googlegroups.com
How about bulk_insert, bulk_update, bulk_delete etc.

Chris

signature.asc

Marlus Saraiva

unread,
Feb 15, 2016, 6:46:18 PM2/15/16
to elixi...@googlegroups.com

Hi Chris and José.

I don’t know if bulk_* will solve it. The problem is not the _all per se. The problem is that insert implicitly does some other things. One example is setting inserted_at and updated_at. That means I can do:

Repo.insert(%Post{title: "foo"})

but I can’t do:

Repo.insert_all(Post, [%{title: "foo1"}, %{title: "foo2"}]

With insert, the Repo will magically set inserted_at and updated_at. I think that would be OK as long the other similar functions, like insert_all did the same. But it doesn’t. If we try the second code, It will raise: ERROR (not_null_violation): null value in column “inserted_at” violates not-null constraint.

I’m pretty sure there must be some technical reason for update_all and insert_all to act like this. My only concern is that, in the near future, we may end up with an Repo API that is not intuitive anymore, and for each function that manipulates data we’ll need to ask: does this function set those fields or not, does it call that callback or not?

As far as I can see we may end up with two sets of functions: one set closer to the model and another set closer to the datastore.That might be a sign that maybe the problem is not even the functions names, but the responsibilities of the Repo itself. Should the Repo really have those two sets of functions together? If so, we probably want to, at least, “separate” them using some kind of pre/postfix. I know that _entries might not be the best name for it but at least it does NOT lead to the same kind of ambiguity. In this case there’s only one rule: if you see _entry or _entries, that means you’re close to the datastore, consequently, don't expect any implicit behaviour.

Anyway, I’m really excited about about v2.0.

Thanks again.


Gleb Arshinov

unread,
Feb 15, 2016, 7:12:09 PM2/15/16
to elixi...@googlegroups.com
Bulk insert name sounds good, but means something else specific in
terms of database, not 'insert values', so it would be confusing.

I wonder if current limitations of insert_all are inherent, or just a
temporary implementation details. Updating inserted_at or updated_at
seems doable by Ecto automatically rewriting the query. The calbacks
are being deprecated, and in any case some form can be re-implemented
using RETURNING. And if we do get these features in the future it
might make sense to just have a single insert() function which takes
options to control these features.

I like that with Ecto it's usually easier to do things efficiently,
than not. E.g. no automatic association loading. I think it would
consistent with that theme for the default to be efficient, and
non-efficient version to be opt-in. So in the ideal world we'd have:

* one method insert()/update() - transforming into one corresponding
DB statement in the common case
* have this support application-level hooks that can be accommodated
for free (inserted_at, updated_at)
* maybe rethink application-level hooks so they work in a more
batch-oriented way
* have options to enable more hooks/functionality that make things
inefficent be opt-in and controlled by options passed to
insert()/update()

Oh, and if we must change the name another possible word would be "_batch"

Gleb

On Mon, Feb 15, 2016 at 6:46 PM, Marlus Saraiva
> https://groups.google.com/d/msgid/elixir-ecto/CAJvhf-%3DT4HCLKyvfhX3%2B3EcMG5H0h4_4NFgHvjqf%2BLS%3D8XgHbA%40mail.gmail.com.

Matt Widmann

unread,
Feb 21, 2016, 10:56:11 AM2/21/16
to elixir-ecto
I'm curious to learn why `insert_all` and `update_all` cannot perform the same tasks as their singular counterparts. Shouldn't it just be a matter of generating one SQL statement vs multiple similar SQL statements to run in the same transaction?

Perhaps if this is done for efficiency reasons, let the user decide which to use by moving the current `*_all` to `*_batch` as suggested and making new functions `*_all` perform these expected tasks (with the penalty of course). When users find the performance issue, they'll dig into the `*_batch` version and make the necessary changes.

Seems that way we are only asking developers to optimize when it becomes necessary.

Marlus Saraiva

unread,
Feb 24, 2016, 5:49:40 AM2/24/16
to elixi...@googlegroups.com
Matt's suggestion sounds quite reasonable to me.

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages