Additional callbacks for `Ecto.Repo` on fetched data

319 views
Skip to first unread message

Mathieu

unread,
Nov 29, 2020, 4:09:49 AM11/29/20
to elixir-ecto
Hello 👋 even though I know this has been requested numerous times already, I'd like to share my arguments as to why I think that  `Ecto.Repo` needs additional callbacks, and how the lack of it hurts the quality of my code and unnecessarily performs more computations than it should.

I hope some decision makers can truly take a little time to consider these points as they have been a source of frustrations, even though I am fond of Ecto.

1. callback to fill virtual fields on fetched schemas

Say a schema defines the following fields: a duration stored in DB, and two virtual fields to easily access the amount of hours and minutes separately:
```
field :duration, :integer
field :duration_hours, :integer, virtual: true
field :duration_minutes, :integer, virtual: true
```

Another example would be a virtual field `:full_name` that concatenates the `:first_name` and `:last_name`.

Without a callback, I have to explicitly fill the virtual fields after each `Ecto.Repo` call that loads data:
```
# some query
|> Repo.all()
|> # fill the virtual fields
```

There are two problems with that:

First, it's error-prone: you have to systematically call the function to fill the virtual field after *each* `Ecto.Repo` call that loads data. Why after every single call? Because of preloads (associations/nested entities should have their virtual fields filled)! (especially when preloads are passed as options to the context function) It's error-prone because one can forget to add the call. It is also obviously very repetitive code.

Secondly, as we might preload associations, after the `Ecto.Repo` call we have to recursively find all the entities and check if any of them need to have their virtual fields filled. These seem unnecessary computations, as a hook on the field would avoid us from looking recursively in the results.

I had to develop the library `https://github.com/mathieuprog/virtual_fields_filler`, that will recursively fill any virtual field that needs computation when fetching the schema and its preloads.
```
# some query
|> Repo.all()
|> fill_virtual_fields()
```

Another drawback is that the context must be aware of those virtual fields, while a callback could abstract this away and details kept in the schema.

2. callback to apply on fetched data

I am designing the permission system for my app, and one idea was to automatically filter out all the unauthorized data returned from a `Ecto.Repo` call.

Say a user wants to retrieve a list of items. Those items may be accessible or not to this user according to user-defined access rules. I think it'd be really nice if I could filter out items for which the current user doesn't have access to in a callback that I'd hook after the `Ecto.Repo` call.

I've read briefly on the motivation of not having such a hook (because it was actually there a long time ago); one main reason is that an `Ecto.Repo` call can be part of a transaction ; it would then be a very bad place to e.g. send out some confirmation email. But this can be documented as a caution to use such a hook.

So after `Ecto.Repo` call that loads data, I now have:
```
# some query
|> Repo.all()
|> fill_virtual_fields()
|> filter_out_unauthorized()
```

Again, as such protected entities can be nested, because of preloads for example, I have to add the call after *each* `Ecto.Repo` call.  

Left without such hook, I have to trust the developer that he or she won't forget to call the function that filters out the fetched entities based on current user's permissions. When we talk about permissions and data leakage, this is just not an acceptable solution, and I have to resort to something else.

One could maybe say that I shouldn't even be fetching entities that are unauthorized in the first place; however, I can't modify the query and add joins and such, as the rules are stored in tables that are hard to join with, and it'd add a lot of complexity and lots of DB changes. A hook can easily *guarantee* me that all fetched data and nested data are authorized.

The idea was to recursively filter and call e.g. a `permit?/2` function, using pattern matching:
```
def  permit?(current_user, %Location{} = location)  
def  permit?(current_user, %Resource{} = resource)    
def  permit?(current_user, %Event{} = event)   
# etc.
```

It seems the two problems are the same, but I have intentionally separated them into 2 points as those should be solved differently. The issue regarding virtual fields should be solved through a hook on the field (to avoid unnecessary recursivity). The second issue should be solved through a callback on the results of the `Ecto.Repo` call.

Yevhenii Kurtov

unread,
Nov 29, 2020, 5:24:31 AM11/29/20
to elixi...@googlegroups.com
Also we can just have before/after_load - a logical development of these more specific options. 
Everyone who will abuse them after attending 3-6 months bootcamp can be just told to not do so even if they have to implement a feature under a time pressure.


Of course if we tell people that they have to maintain a discipline that’s how it will happen.
Let’s take Contexts for example, which are Aggrregates in disguise IMO. Are they responsible only for fetching data in your app or they combine “being” and “doing”? 
If it’s former - then encapsulating whatever you want it there to be shouldn’t be a big fuzz.
If it’s later - build a proper data access layer.

And what if someone will need to call a 3rd party service to fill in those virtual fields? Or perform a costly computation?

TLDR: don’t put your domain logic in data-fetching callbacks.


--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/d770b3fa-abeb-463e-a34a-305bcf8107bcn%40googlegroups.com.

Mathieu

unread,
Nov 29, 2020, 6:34:04 AM11/29/20
to elixir-ecto
I do not understand the idea that we should not deliver a developer convenience because it could be used in a wrong way. As soon as you document about the possible misuses, the framework is not to blame.

The code, how I want to design it, is brittle and has bloat because of the lack of such functionality; because other irresponsible coders that don't read the docs and follow best practices could possibly misuse it? Those who are interested in such functionality in Ecto are penalized because of those?

A callback on `Ecto.Repo` results gives the guarantee that actions are performed on returned DB data. Just as a custom `prepare_query/3` gives you the guarantee that the queries are altered before any `Ecto.Repo` call, as it's used for example to scope any query with a tenant id (see Multi tenancy with foreign keys). This sounds to me a very appealing tool that I'd use if it were in Ecto.

> And what if someone will need to call a 3rd party service to fill in those virtual fields? Or perform a costly computation?

Ecto could provide the option to hook a callback on fetched results, but that doesn't mean the developer is required to use it if it is not suited for his or her case.

Your other comments about context usage were a little too abstract for me, but the interesting part of a hook instead of adding this logic in the context functions or other related modules, is that you have the *guarantee* that your code will run; this is especially important when you work on permissions or tenant scoping (and for same reason - i.e. having this guarantee - the latter relies on a custom `prepare_query/3` in the application's `Repo`).

Mathieu

unread,
Nov 30, 2020, 4:11:26 AM11/30/20
to elixir-ecto
By the way it should also work for preloads. Just like `prepare_query/3` allows you to alter the query for every query, preloads included.

As boundaries for the use of `Ecto.Repo` are not enforced without relying on third party libraries, it's pretty worrying that one can load any resource bypassing the context and context rules without developer discipline. I work now for a company that uses Elixir, they use `Ecto.Repo` all/get/one/etc. rightly in the context, but we see the use of `Ecto.Repo.preload/3` leaking everywhere (they coded fast without paying too much attention to good practices...). Say one day they want to enforce access permissions and filter unauthorized resources, only a callback can save that codebase. With preloads you can fetch resources deeply nested in the database, bypassing any check that we might have thought to be enforced through the context.

Ivan Yurov

unread,
Dec 1, 2020, 5:28:24 AM12/1/20
to elixir-ecto
I think the main purpose of virtual fields was to be filled with aggregates in `select` and `select_merge`, which are parts of the query domain. However any kind of post processing and transformations can be done explicitly without introducing callbacks. For the specific example of authorization control for collections, it doesn't sound like a good idea to do it after the query is executed, cause there's no way to ensure the page size consistency in this case.

Mathieu

unread,
Dec 1, 2020, 9:07:51 AM12/1/20
to elixir-ecto
>  However any kind of post processing and transformations can be done explicitly without introducing callbacks

It's not that easy if you have virtual fields that also need to be filled in nested structs/associations in the result of the query.

> or the specific example of authorization control for collections, it doesn't sound like a good idea to do it after the query is executed, cause there's no way to ensure the page size consistency in this case.

I don't know about page size consistency. Could you share a link?

Filtering the structs after fetching allows me to have a much lighter/cleaner context module, and the module callback would take care of all the permissions and filtering needs. I'd consider this module part of the context. Most importantly I'd have systematically the guarantee that the unauthorized structs are filtered out, as I'd hook into Repo.

More specific to my design, my main issue is that I have no control over which resource is being fetched in the context function, as preloads can be given as options. For instance
`Blog.get_article!(id, preload: [comments: :user, :categories], order_by: [title: :asc])`

I haven't been thinking about permissions when designing such API. I should then force preload requests through specific context function to have full control over what's loaded.

Some permission rules are hard to write into the query. An example: members can share each other's resources (access to notes, events, etc.) ; say an organization has 200+ members and the current user has access to 100 members' resources. When the user fetches the list of events of the organization, I'd have to write a large IN clause in SQL to include only the authorized members from the user-defined permissions. The queries will be hard to write and maintain the more I protect resources, and I can imagine them being hard to write if I have such protected resources in joins and more complex queries. Before venturing into patching my queries, I thought it'd be nice if I can just handle all of the permissions in a callback on the result set, with the added guarantee that no one can bypass the callback module, which I'd consider part of the context domain and rules.

Ivan Yurov

unread,
Dec 1, 2020, 9:31:52 AM12/1/20
to elixi...@googlegroups.com
You can create a recursive transformer if you need to post process the nested structures:
```
def transform(%Post{} = post), do: %{post |
  author: transform(post.author)
}
def transform(%Author{} = author), do: %{author |
  virtual_field: "yay"
}
```
etc... generalized callbacks wouldn't be cleaner anyways.

There's no link to the page size consistency problem, because it's informal. Suppose you load a feed, where authorization is performed after the query execution. Now, you ask for 10 posts and filtering leaves only 3 of them, then you ask for the next 10 posts, and this time all of them are available for the user. Obviously, you want your pages to always be some N size.

I'm not sure I fully understand your issue, but I think you can supply queries into preload, and those queries can narrow the scope according to your permissions: https://hexdocs.pm/ecto/Ecto.Repo.html#c:preload/3
Not sure it'll work like this, but you can try to supply a query that shows all of the comments visible to current user: `post = Repo.preload post, [comments: Comment.visible_for(current_user)]`

You received this message because you are subscribed to a topic in the Google Groups "elixir-ecto" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elixir-ecto/EFY2dJPXF1E/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elixir-ecto...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/a1ac43ba-5bd4-4594-8c83-348202dfd37bn%40googlegroups.com.


--
Kind regards,
Ivan Yurov

Mathieu

unread,
Dec 16, 2020, 3:19:04 AM12/16/20
to elixir-ecto
> You can create a recursive transformer if you need to post process the nested structures

That's exactly what I'm doing. Each repo call to load resources is followed by a utility function that does recursive transformation to fill virtual fields, e.g.
```
query
|> Repo.all()
|> fill_virtual_fields()
```

The problem is that I have to add this call after *every* repo call that loads data throughout my codebase, because I don't know what data will be loaded from a particular context function due to preloads, as my context offers a flexible API where you can pass preloads as option, e.g.
`Blog.get_article!(id, preload: [comments: :user, :categories], order_by: [title: :asc])`

Also, a callback for filling virtual fields would allow to avoid these recursive computations after each db fetch.

Overall, code will be better if the virtual fields (that can be computed in a pure way) are computed from the schema through a simple callback, than trying to solve it into the context through multiple recursive functions, or making helper functions, etc.

Regarding the issue of making sure that fetched data are authorized (e.g. for current user), you made a very good point that filtering resources after fetching will prevent me to work with paginated data properly (I didn't initially get that by page size you referred to pagination). Thank you for that! Modifying the query is indeed the way to go.

However a query might be wrongly coded, and I still need a way to ensure that all the data fetched from the DB is authorized for the current user, based on some custom permission rules. Therefore it would still be nice to check the fetched data from a callback that would be then systematically called, and I'd like to raise an error if an unauthorized resource has been detected. This mean that if there is a bug in the code – a query is wrong (didn't filter properly, missing where clause, ...) – an error is raised and results are not returned. I have now a trace from which I can debug. I can test my application, but I still don't have the guarantee that I'd miss something for some specific case. This prevents data leakages, where in some sectors you close your business if that were to happen.

woj...@wojtekmach.pl

unread,
Dec 16, 2020, 3:11:47 PM12/16/20
to elixir-ecto
I think you should be able to achieve this with an indirection, something along the lines of:

    defmodule MyApp.Repo do
      alias MyApp.RepoBase

      defdelegate insert_all(schema_or_source, entries, opts \\ []), to: RepoBase
      defdelegate update_all(queryable, entries, opts \\ []), to: RepoBase
      # ...

      def get(queryable, id, opts \\ []) do
        queryable
        |> RepoBase.get(id, opts)
        |> fill_virtual_fields()
      end

      # ...
    end

    defmodule MyApp.RepoBase do
      use Ecto.Repo
    end
Reply all
Reply to author
Forward
0 new messages