[Proposal] Add Data Dictionary Features To Ecto

47 views
Skip to first unread message

Onorio Catenacci

unread,
Aug 22, 2023, 1:20:20 PM8/22/23
to elixir-ecto
I'd like to propose adding data dictionary features to Ecto.

Data Dictionary

For those who may be unfamiliar with the term a data dictionary is meta data about the data in the database.  For instance, given a quantity, what's a reasonable minimum?  Is there an upper bound on the value?  What's the source of the number?  What does the number signify?  Now some may say, all of this information should be in the database and they'd be correct. But there are details which business analysts (and indeed developers) may need to know which cannot be easily captured in the database.  For instance, is there some government regulation regarding the privacy of this data?  If so does this regulation (GDPR for example) necessitate all deletes to be hard deletes as opposed to soft deletes?  As with comments in code we sometimes need to capture not just the implementation but also the intention and the design decisions. 

How Would You Add The Data Dictionary Features?

While this is still to be determined, I could envision some MVP types of things that I'd like to add.  For example, adding a :minimum and a :maximum option on numeric and date fields feels like relatively low hanging fruit. If you don't constrain a numeric field, then it's on the developer to catch that someone is trying to enter a negative dollar amount (which may be fine but the BA and/or the business should confirm that that's permissible).  Of course that can be checked via a changeset but if it's truly a database constraint it feels like the database is where it should live. Likewise on a date/time a minimum would be the earliest date/time which would be acceptable.  Again, for example, I may be able to enter 1492-01-01T00:00:00 (I'm not sure that's a valid datetime--it's solely meant as an example) as a date for an invoice but that's unlikely to be correct.  While that seems unlikely to happen it could  if a user fat-fingers a value.  Of course, the software doesn't have common sense to question such an absurd date.

I also understand that both Postgresql and SQLServer have the ability to add comments on fields and this seems to be the right place for this sort of metadata about the field to live so that others can extract/view it later.  I would need to research whether or not MySQL supports field level comments and the situation with other DB's which Ecto supports as well.

Why Are You Proposing This?

Given the frequency of the answers here being "Great idea!  Make a PR!" some may be wondering why I'm proposing this (as opposed to simply starting to code on it).  I mean I've already started on a fork of the code so I've got a notion of how to proceed.  There's a strong (and good) bias toward create code and submit a PR rather than talking an idea to death. 

So I'm asking because I can't find any conversation pertaining to the idea of a data dictionary on the mailing list.  I don't believe for a second that I'm such an original thinker that no one else has come up with this idea before.  So I wanted to propose this to see if there were some sort of "Yeah we thought about this before and we decided against it because <x> and <y>" so I don't submit a PR that ends up wasting folks' time.  

Also I would love to hear some feedback on my idea and potential approaches to making it happen. I'm very inclined to put the data dictionary information which cannot be encoded as check constraint into comments associated with the fields in the table but perhaps others have heard of other ways to do this.

Any feedback anyone cares to share would be welcomed! 

Allen Madsen

unread,
Aug 22, 2023, 5:40:37 PM8/22/23
to elixi...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/754958ac-7ecd-435f-a64c-493b71a472c5n%40googlegroups.com.

Onorio Catenacci

unread,
Aug 23, 2023, 11:02:40 AM8/23/23
to elixir-ecto
Check constraints would be a partial answer, yes, but I'm thinking of something a bit more comprehensive. 

Check constraints are to me a lot like looking at someone else's code.  I mean I can read someone else's code and I can easily see what it's actually doing.  Whether or not what it's actually doing is what it _should_ be doing is a far tougher question to answer in many cases.   A data dictionary would have that same purpose of capturing intent (as opposed to implementation) so others could determine if the database (and code) are doing what they should. 

I think one approach might be to add @fielddoc attributes on fields to allow a developer and/or DBA to comment things about the field which cannot be inferred by inspecting the DB or the code.  Also add @schemadoc to allow the developer DBA to document other points of interest which (again) cannot be captured in code.  If a data table needs to always do a hard delete due to privacy regulations, then that should be something captured in a @schemadoc.  That way if the regulation is ever changed other developers will be able to understand why a soft delete approach wasn't used. 

benwil...@gmail.com

unread,
Aug 28, 2023, 6:42:58 AM8/28/23
to elixir-ecto
The traditional answer with some of this stuff is that it should either be in changesets (like min / max) or check constraints. The rationale here is that if you make it on the field itself, then it's relatively easy to have a record in the database that fails to meet those constraints, that then becomes unprocessable by ecto. You can load the record from the DB, but you can't save the same record back without changing fields. So ultimately the data validation stuff you're talking about (date times ranges, etc) should either get enforced at the DB level via check constraints or via changesets in Elixir (or both).

As for your more "business" oriented criterion, this feels like a library to me?

Stefan Lapers

unread,
Aug 28, 2023, 7:42:46 AM8/28/23
to elixi...@googlegroups.com
I would really be awesome to have the possibility to attach meta data on Ecto schemas and the fields...
This would allow any kind of library to make use of this information. E.g. introspection of data sensitivity in applications, code generation, documentation...
Eg having a `meta` property on each ecto field + the schema itself would be enough to allow this I think..


-- 
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.

José Valim

unread,
Aug 28, 2023, 8:30:01 AM8/28/23
to elixi...@googlegroups.com
> This would allow any kind of library to make use of this information. E.g. introspection of data sensitivity in applications, code generation, documentation...

It would also lead developers to couple specific behaviours to their schema, instead of the underlying data, making their code harder to evolve. For example, imagine if all Ecto validations and constraints were part of the schema. Doing conditional validation, validation per role, or validation per action, would become extremely complex, while today it is mostly function composition.

Furthermore, Elixir does provide protocols to allow you to augment existing schemas (actually, the underlying struct). For example, if you want JSON encoding, you do this:

@derive Jason.Encoder, only: ...
schema "..." do
  ...
end

With the large benefit that it is not tied or coupled to Ecto at all. IMO adding metadata to Ecto fields should be a last resort type of feature. :)


Felipe Stival

unread,
Aug 28, 2023, 8:59:59 AM8/28/23
to elixir-ecto
Can't you do all you want with ecto types, for example, a numeric with min and max could be a parameterized type. Am I missing something?

José Valim

unread,
Aug 28, 2023, 9:08:01 AM8/28/23
to elixi...@googlegroups.com
You can but that's the reason why we don't. :) The parameters for types are usually encoding/decoding metadata, not validation metadata.

Stefan Lapers

unread,
Aug 28, 2023, 9:15:51 AM8/28/23
to elixi...@googlegroups.com
Hi Felipe, my use case has been code generation…
Its such a huge productivity boost to generate graphql schemas, open api specs, controllers, tests based on the Ecto schema and then start adapting that code.
Adding validation or constraints in the schema would not really be helpful, just like Jose mentioned.

Onorio Catenacci

unread,
Aug 28, 2023, 10:12:55 AM8/28/23
to elixir-ecto
I fear I've really led the conversation off on a tangent. 

I used min/max as an example of some of the data I'm concerned about capturing.  And, yes, it could be (and probably should be) added as part of the Changeset validation utility functions.  

What I'm discussing is _meta_ data about the fields in the database.  I guess I'm the only developer here to have run across a situation where I needed to know more about a field than I could easily infer from its definition.  For example, say I had a "transaction date" field in a "sales" table.  I can see the minimum date it will accept and I can see the dates that have already been posted by users.  What I cannot see is if the date signifies the date of the sale being finalized or if the date signifies the start of the sale transaction.  For some large sales amounts the two dates may not coincide. Now I may get lucky and that information may be documented somewhere.  What I want is to help with the documentation of details of this sort.  Elixir treats documentation as a first class citizen; I just mainly want to add a bit more documentation which I consider would help.  Whether or not that documentation lives in the database (as field comments) or on an internal website I don't really care.  I just want a standardized way to capture the information and add it to documentation. 

Of course all this is theoretical debate and discussion so unless anyone has a very strong reason to say "don't do it" I'm going to work on this and send along a PR when I have something worth sharing.  We can dissect and discuss a PR and actual code much more easily and with greater utility I think.

Austin Ziegler

unread,
Aug 28, 2023, 11:02:35 AM8/28/23
to elixi...@googlegroups.com
On Mon, Aug 28, 2023 at 10:13 AM Onorio Catenacci <onorio.ca...@gmail.com> wrote:
What I'm discussing is _meta_ data about the fields in the database.  I guess I'm the only developer here to have run across a situation where I needed to know more about a field than I could easily infer from its definition.  For example, say I had a "transaction date" field in a "sales" table.  I can see the minimum date it will accept and I can see the dates that have already been posted by users.  What I cannot see is if the date signifies the date of the sale being finalized or if the date signifies the start of the sale transaction.  For some large sales amounts the two dates may not coincide. Now I may get lucky and that information may be documented somewhere.  What I want is to help with the documentation of details of this sort.  Elixir treats documentation as a first class citizen; I just mainly want to add a bit more documentation which I consider would help.  Whether or not that documentation lives in the database (as field comments) or on an internal website I don't really care.  I just want a standardized way to capture the information and add it to documentation. 

I think that there’s multiple concerns here, and I don’t know that there’s a good answer for anything that you’re doing here:
  1. Remember that Ecto supports embedded schema as well as database-backed schema.
  2. If you’re interested in having this documented *in the database*, many databases support field and relation comments, accessible via Ecto migrations with the `comment:` option on `add` or `table` or other operations. This does not expose it to the Elixir code for inclusion in any documentation there, which is what I think you’re wanting.
  3. GraphQL exposes field-level documentation, so Absinthe supports that, but as far as I know, none of the `description` fields will show up in any generated Elixir documentation related to the GraphQL schema.
  4. Ecto schema are "enhanced" structs, and such documentation typically goes in either `@typedoc t "…"` or `@moduledoc`, depending on what is appropriate.
Unless we’re going to define a "metadata description language" that is unlike anything else in the Elixir ecosystem by which Stefan’s code generation issues could at least partially be solved, and which I think is beyond the scope of Ecto…I don’t think that any sort of standardized schema metadata is going to be useful. After all, it’s essentially a compilable comment that may get out of sync with any check constraints defined on the database, any validation details for the changeset, or something else.

Otherwise, something like Ash might be the way to go, since it is based on resource definition and generates Ecto, Absinthe, and other implementations based on said resource definition.

-a
--

benwil...@gmail.com

unread,
Aug 29, 2023, 2:59:57 PM8/29/23
to elixir-ecto
> so unless anyone has a very strong reason to say "don't do it" I'm going to work on this and send along a PR when I have something worth sharing.

I don't think sending a PR to implement something that has not yet received real support here sends the message you want. You are far more likely to end up with something useful if you build what you are looking for as a library.

Greg Rychlewski

unread,
Aug 29, 2023, 4:09:57 PM8/29/23
to elixir-ecto
We have done this at our work using module docs and then publishing the hexdocs to an internal web site.

Onorio Catenacci

unread,
Aug 29, 2023, 5:28:41 PM8/29/23
to elixir-ecto
I'm sorry; I can see that what I wrote did sound a bit "Damn the torpedoes".  It's just given the number of times I've seen Jose say "PR's are welcome" I didn't think it was inappropriate to submit a PR.  Perhaps I misunderstood something?

That point aside,  I was looking through some older messages just recently and I spotted this:

https://groups.google.com/g/elixir-ecto/c/f9XcNvTiSkk/m/H-WnHZaeAAAJ ([Proposal] support doc option in fields definition) 

I think this idea would serve my purposes just fine.  If people can add docs to fields--well that would serve the purpose I want too.  Obviously no one can force anyone to document details and constraints but we can certainly make the job easier for those who care to do so.

On Tuesday, August 29, 2023 at 2:59:57 PM UTC-4 benwil...@gmail.com wrote:
Reply all
Reply to author
Forward
0 new messages