Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

[Proposal] Set relations to nil or empty array for new inserts

39 views
Skip to first unread message

Jason Chen

unread,
Nov 26, 2024, 12:44:11 PM11/26/24
to elixir-ecto
Right now when you Repo.insert, you'll get the newly inserted object back with all relations set to `#Ecto.Association.NotLoaded`. However, we could often know there are none because of foreign key constraints. The row just got inserted, so how could another table/row have already existed with the value of the new row's primary key as a foreign key? Ecto could safely set has_one to nil, has_many/many_to_many to empty array, instead of #Ecto.Association.NotLoaded. 

For example, `comments` could safely be `[]`.

```
> Repo.insert(cs)

{:ok,
 %Article{
   __meta__: #Ecto.Schema.Metadata<:loaded, "article">,
   id: "numdd4i5",
   name: "Test",
   comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
   inserted_at: ~U[2024-11-26 16:37:49.666956Z],
   updated_at: ~U[2024-11-26 16:37:49.666956Z]
 }}
```

One common case is with GraphQl/Absinthe, when a mutation asks for relations in the response. But really anything down the pipe that was interested in one of these relations could save a database query per relation.

```gql
mutation {
    createArticle {
      id
      comments {       // Unnecessary database query for nonexistent comments
        id
        text
      }
    }
}
```

The proposal specifically is to automatically set applicable fields to nil/[] instead of #Ecto.Association.NotLoaded upon insertion. Database triggers are the only things I can think of that can invalidate the fundamental assumption that the relations can't exist. Perhaps there are others I'm missing? Perhaps there should be a configuration to enable or disable this behavior instead of being automatic?

José Valim

unread,
Nov 26, 2024, 2:55:50 PM11/26/24
to elixi...@googlegroups.com
Hi Jason,

The reason why we haven't done this change historically is because "Ecto.Association.NotLoaded" explicitly means that we don't know. For example, I could write this code:

Repo.transaction!(fn ->
  post = Repo.insert!(%Post{})
  comment = Repo.insert!(%Comment{post_id: post.id})
  {post, comment}
end)

Which means that "post.comments == []" but that's not actually true! And then calling "Repo.preload(post, :comments)" won't work either, unless you pass the "force: true" option.

That said, people have asked for this for quite some time, so we should probably do something to improve this.

There is one option today, which is for you to explicitly set the fields today. You can do that when you build the struct or right after insert. There are two other options:

1. Have a flag in the association that forces it to be set to empty on insert - but it may have pitfalls on things like upsert

2. Have a function called `Ecto.empty_assocs(struct)`, that you could invoke, to retain explicit control, and it will set every not loaded assoc as empty

I would love to hear yours and everyone's feedback.

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/elixir-ecto/c6e8e2a9-5b2a-41cd-b6e9-e061fd46c50dn%40googlegroups.com.

Jason Chen

unread,
Nov 26, 2024, 3:45:42 PM11/26/24
to elixir-ecto
I'm not sure I understand why `post.comments == []` is not true in that example? Extending the example to illustrate:

```
Repo.transaction!(fn ->
  post =
    %Post{}
    |> Ecto.Changeset.change()
    |> Ecto.Changeset.put_assoc(:comments, [%Comment{text: "one"}])
    |> Repo.insert!()

  comment = Repo.insert!(%Comment{post_id: post.id, text: "two"})
  {post, comment}
end)
```

What should `post.comments` be here? Ecto's current behavior is that post has one comment which I think is correct. At the time of the `post` variable assignment, this was the case.

José Valim

unread,
Nov 26, 2024, 3:47:12 PM11/26/24
to elixi...@googlegroups.com
Right, my point was that, if we automatically initialized it to [], then the result coming out of the transaction would be wrong. I was just explaining why it works the way it does today. :)


Greg Rychlewski

unread,
Nov 26, 2024, 7:53:41 PM11/26/24
to elixir-ecto
Another situation is where the associations come from two different sources and may come out of order. One strategy to deal with that is to remove the foreign key constraint, if you trust your data sources are not feeding you noise. 

Out of laziness I will use the same schemas as above. You may receive 3 comments for a given post from one feed source. Then an hour later receive the post from a different feed source. When you insert the post you don't have any information about the comments.

Jason Chen

unread,
Nov 27, 2024, 11:19:55 AM11/27/24
to elixir-ecto
I see - I was not aware you could have associations without foreign key constraints but makes sense from the abstraction layer. This should definitely should be opt-in then.

I like José's first suggestion of having a flag in the association. It seems this is desirable for an association (ex you've set a foreign key constraint on it), it always would be desirable for that association. The second idea of `Ecto.empty_assocs(struct)` would be repetitive in those cases. Also if you wanted it on some associations and not others for that struct, there would have to be some way to specify that ex a second parameter and it might be open to errors if that parameter is not used consistently for the same struct.

A few naming ideas:

1. `has_many(:comments, Comment, empty_on_insert: true)` - would also be allowed on has_one and many_to_many. has_one's "empty" would be nil whereas the others would be Ecto.Association.NotLoaded. It could be possible to allow on belongs_to as well but it could do better in that it could check the corresponding id column to see if it's nil or not.
2. `has_many(:comments, Comment, default_on_insert: [])` - the nice thing about this is if the user does not provide it we could say it defaults to Ecto.Association.NotLoaded. I don't think this is better than 1) personally.
3. `has_many(:comments, Comment, default: [])` - this would shares a option with `field` but this could arguably be either a downside or an upside. has_many also already has a `defaults` option so might be best to not have another so close.
4. `has_many(:comments, Comment, defaults_to_empty: true)` - similar to 1) but closer naming to embed's defaults_to_struct. It may be ambiguous/confusing what happens in select/update cases though

This might be moot but I still don't understand the point with transactions. Here's what should happen today:

```
{post1, post2} = Repo.transaction!(fn ->
  post1 = Repo.insert!(%Post{})
  _comment1 = Repo.insert!(%Comment{post_id: post1.id, text: "post1 comment"})
 
  post2 =
    %Post{}
    |> Ecto.Changeset.change()
    |> Ecto.Changeset.put_assoc(:comments, [%Comment{text: "post2 comment"}])
    |> Repo.insert!()

  _comment2 = Repo.insert!(%Comment{post_id: post2.id, text: "post2 another comment"})
 
  {post1, post2}
end)

# Current behavior passes
assert %{} = post1.comments
assert [_] = post2.comments
```

Why is post2.comments allowed to be unaware of the second comment but post1.comments is not allowed to be unaware of its comment?

José Valim

unread,
Nov 27, 2024, 11:26:55 AM11/27/24
to elixi...@googlegroups.com
 
Also if you wanted it on some associations and not others for that struct, there would have to be some way to specify that ex a second parameter and it might be open to errors if that parameter is not used consistently for the same struct.

If we want it in some associations but not others, could it also be possible that we want it in some cases, but not all? In this case, the function level one gives the most flexibility (at the cost of verbosity)?

Jason Chen

unread,
Nov 27, 2024, 11:50:14 AM11/27/24
to elixir-ecto
> If we want it in some associations but not others, could it also be possible that we want it in some cases, but not all?

The main reason mentioned so far is if the association didn't have a foreign key constraint. So it would seem to be an always or never desire per association

Greg Rychlewski

unread,
Nov 27, 2024, 3:27:04 PM11/27/24
to elixir-ecto
One thing about it always happening is that if you have a long-lived code base at a big company where people filter in and out over the years, one day someone might decide to do this after posting a question on Ecto forum and not even know to check the association options

Repo.transaction!(fn ->
  post = Repo.insert!(%Post{})
  comment = Repo.insert!(%Comment{post_id: post.id})
  {post, comment}
end)

I was thinking maybe it should be an option on `repo.insert`. But then there is the choice of whether to pass it to the nested preload inserts or not.

Jason Chen

unread,
Dec 4, 2024, 1:53:13 PM12/4/24
to elixir-ecto
José's second option of `Ecto.empty_assocs(struct)` is easy enough to try without Ecto changes if that's where people are leaning. We can just implement this `empty_assocs` helper in our codebase and see how well it works and its usage patterns. If it pretty much always follows Repo.insert, that would be some support to adding an option to `Repo.insert` or José's first suggestion of an association flag.

José Valim

unread,
Dec 7, 2024, 3:50:46 AM12/7/24
to elixi...@googlegroups.com
I believe an option on Repo.insert/2 is the best way to go.

Compared to Repo.insert/2, Ecto.empty_assocs/1 has a few downsides: it needs to traverse and modify the data structures again. A Repo.insert option, on the other hand, can normalize the associations, including nested ones, as it traverses. You could also use the Repo.prepare_options callback to set this option to true by default. Or you could use "changeset.repo_options" to enable this feature in the changeset, in case you don't want to push this concern all the way up to the repo.

Overall, the repository option gives us flexibility and performance. So that gets my vote.

A pull request would be very welcome or feel free to open up an issue. :) I can submit a pull request later on, if no one beats me to it, but unfortunately I can't focus on it right now as I work on Elixir v1.18!


--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.

Greg Rychlewski

unread,
Dec 8, 2024, 9:02:20 AM12/8/24
to elixi...@googlegroups.com
I had a thought about normalizing the nested associations. Let's say you have:

- Post has many Comment
- Comment belongs to Post

Then you do

post = %Post{title: "title", comments: [%Comment{text: "text"}]}
TestRepo.insert!(post, normalize_assocs: true)

You would get an infinite loop where you have to load Post into the Comment and then comments into the nested Post, etc..

You received this message because you are subscribed to a topic in the Google Groups "elixir-ecto" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elixir-ecto/4O4hu6XNjWw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elixir-ecto...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/elixir-ecto/CAGnRm4LphNMxpGiX22FTbWKX917%3DyQD1K6otYYdoLaFRNd%2Bspw%40mail.gmail.com.

José Valim

unread,
Dec 8, 2024, 11:59:22 AM12/8/24
to elixi...@googlegroups.com
Gah, you are right. And that leads to the opposite problem, which is probably why we default to NotLoaded today. Imagine that you insert:

    %Post{comments: [%Comment{}]}

If the Comment is initialized to "post: nil", it would obviously be wrong. Perhaps Ecto.empty_assocs(structs) is the way to go, and it is not recursive. Could that be a good starting point for you Jason? Which you can try out and report your feedback on?


Reply all
Reply to author
Forward
0 new messages