Periwinkle data model

16 views
Skip to first unread message

Sean Colsen

unread,
May 22, 2024, 1:35:28 PMMay 22
to Mathesar Developers

Team:

Since the group seemed supportive of heading in the direction of the “Purple” proposal during today’s call, I wanted to follow up with some more detailed thoughts on the data model for “Periwinkle” (as I think we’re calling it now). In parallel with the work that Pavish and Ghislaine are doing to flesh out the UX, I’d like to work with Brent (and anyone else who is interested) on fleshing out the data model. I’m eager to get to a point where we agree on the tables/models and columns/fields (and their relationships). Towards that end, I’m hoping there might not be a ton of discussion or work remaining. And I want to get started on it so we can reach an agreement as soon as possible. So I’m hoping an email thread might suffice to reach an agreement. Clarity on the specifics of the data model will really help our internal discussions, I think. So…

Here’s my proposal:

img

(Diagram source)

Additional thoughts:

  • The stars represent multi-column unique constraints.
  • In the past, Brent has taken issue with putting roles "underneath" databases, whereas Pavish has wanted to model roles underneath databases (in Mathesar). I continue to remain neutral about this behavior, but I’ve chosen to follow Pavish’s proposed behavior both when writing the “Purple” specs and when creating this ER diagram, simply because Pavish has seemed to feel more strongly than Brent. But I’m hoping we can resolve this debate soon. What I really really want to avoid is a situation where we pursue something akin to Pavish’s preferred product functionality while implementing it via something akin to Brent’s preferred data model. I raised this concern back on March 29th and I’m still not convinced that anyone fully understood my point at that time. Kriti even called it “bikeshedding”. After that thread, I ended up drafting a substantial critique document detailing specific problems stemming from incompatibilities between Brent’s models and Pavish’s product functionality. But I never presented that critique to anyone because I wanted us to get to a point as a team where we agreed on the product functionality first. I made an attempt at that on April 10th, by suggesting that Pavish incorporate an “Encapsulation” section into his product spec. I don’t think we ever reached an agreement in that PR on the matter, and then the whole product design veered off on a different course. So in the “Purple” proposal, I clearly specified that “encapsulation” behavior by writing:

    Role metadata (e.g. name, password) would be stored per-database. Two separate databases on the same server with the same DB role would warrant separate role records with separate (duplicate) passwords stored in Mathesar metadata.

    Brent: are you okay with that ☝️ behavior?

    • If so, I contend that the data model ought to reflect it.
    • If you’re not okay with that behavior, then I think we ought to have a separate email thread (or realtime discussion) about the behavior before proceeding any further on the models discussion. I want to base the models on our desired functionality.

    To reiterate: I really don't have a preference about the actual “encapsulation” behavior — I’m simply trying to move things towards a solid plan here.

  • The trigger on UserRole would ensure the uniqueness of user and role.database together, thereby ensuring that each Mathesar user can have a maximum of one DB role per database.

  • This data model also allows DB roles to be associated with a database without those roles being associated with any Mathesar user. I see this as a feature, but perhaps others might not.

  • The position of the metadata_role field within this schema is something else we’ve debated in the past. Briefly we considered putting this field on what is the Role table here. I made a case for putting it on UserRole. I could also see a case for omitting this field entirely and leaving all metadata (including explorations) to be writable by all users. We could do that if we really wanted to simplify things.


Brent Moran

unread,
May 30, 2024, 3:43:01 AMMay 30
to Mathesar Developers
Updated diagram, modified to work with the current "periwinkle 2" UI:

image.png

Features:
  • Matches the underlying database situation, at least for relevant database objects. I.e., Roles and Databases both exist on the same level in a database server.
  • enforces integrity:
    • one role of a given name per DB server
    • One database of a given name per DB server
    • One Exploration of a given name per database
    • One DB role for a given (user, database) pair. Could be (user, db_server) instead, but I'm trying to avoid unnecessary fights
    • DB Server of database will match DB server of db_role when looking up a db_role for a given user.
Anti-features:
  • Multicolumn foreign keys are used, but not well-supported by Django.
  • Weird "db_server" column is needed in UserDBRoleMap for the sole purpose of enforcing integrity.
I'm open to feedback w.r.t. those anti-features. (and the rest)

Brent

Sean Colsen

unread,
May 30, 2024, 9:55:50 AMMay 30
to Brent Moran, Mathesar Developers

The overall structure here looks great to me! Here are some additional thoughts:

Encapsulation

Enforcing “one role of a given name per DB server” is exactly what I was after back in March when I said:

I’m not seeing a compelling use case for having two DBServerCredential records with the same username and db_server

With the relationship graph in this proposed schema, I’m very relieved to see the unique constraint on DBRole (db_server, username) — exactly the unique constraint here that I was making such a fuss about!

Pavish was adamantly against this approach of sharing credentials across databases, so I want to double check to make sure that he’s actually okay with this now.

UX implications

Given this data model of sharing server/role data across databases, here are some things I think are worth considering from the user’s perspective. In my opinion these are all solvable problems, but they require thought.

  • When creating a new Database, do we want to allow the user to select an already-entered DBServer? I imagine “no”. I imagine we’d require users to manually enter the host and port. Then on the back end we’d either create a new DBServer or use one that already exists.

  • In the real world there is a difference between: (a) moving a database from one server to another server; and (b) changing the host or port for a server. So in Mathesar it’s possible we would need to give users: (a) a way to associate a Database with a different DBServer (without affecting other databases); and (b) a way to edit the host/port of a DBServer (affecting all of its associated Database records). If all actions are “performed from within a database” (as Pavish wants) our UX design might need to clearly provide different mechanisms for (a) and (b). At the very least the design will need to make the effect of the user’s action very clear.

  • If we actually allow the user to move a database from one server to another, then there are some other more complex considerations too… what happens to the roles? If there are existing UserDBRoleMap records, then it might not be possible/easy maintain their association with the Database record after the Database is associated with a new DBServer. Do we duplicated all the DBRole records for the new server? Do we delete all the UserDBRoleMap records? This is tricky. It makes me think that we might need to prevent users from modifying Database.db_server, at least for now.

  • When deleting Database records, we’ll cascade to delete their linked UserDBRoleMap records too. This opens up the possibility of leaving some DBRole records orphaned. We might consider that to be a feature because a user could add a new database without re-configuring all their roles. But if all actions are “performed from within a database”, then this gets tricky. I think its very important that we avoid building a product that gives users the impression they’ve deleted their roles (and saved passwords) when in fact that data remains inside their database with no way to delete it from the UI. I think we either need to build a way to manage DBRole records outside the context of a database (breaking Pavish’s goal) — or we need to automatically delete orphaned DBRole records. If we automatically delete those records, then we need to make that behavior really clear to users. We’d need to think through exactly how we’d provide that clarity to users, both in terms of UX design and in terms of technical implementation. There are some mild edge cases I’d like to present for consideration if/when we get further into the weeds here.

  • Deleting Database records also opens up the possibility of leaving orphaned DBServer records. Personally I’m not so concerned about this because I have a hard time imagining anyone considering that data to be sensitive or burdensome. I’d be inclined to leave them in the database.

Integrity

I agree that UserDBRoleMap.db_server is weird. I see that it’s there to enforce that the associated Database and DBRole have the same DBServer. That makes sense but it doesn’t seem like idiomatic usage of multi-column foreign key constraints. At least, I’ve never seen multi-column FKs used in this manner. If I were inspecting this schema, I’d be confused.

I would consider using triggers for this by:

  • Adding a trigger to UserDBRoleMap that enforces database.db_server being equal to db_role.db_server.
  • Adding a trigger to Database to make db_server immutable
  • Adding a trigger to DBRole to make db_server immutable

Kriti Godey

unread,
Jun 3, 2024, 7:04:20 PMJun 3
to Sean Colsen, Brent Moran, Mathesar Developers
Was this all resolved at the technical beta meeting last week?

Sean Colsen

unread,
Jun 3, 2024, 10:07:56 PMJun 3
to Kriti Godey, Brent Moran, Mathesar Developers

We’re agreed on all the major points, as discussed on 2024-05-30, and we’re moving forward (in large part) with the schema shown in Brent’s ER diagram.

Loose ends we wrapped up

Clarifying points, as directly related to my last message in this thread:

  • Pavish is on board with sharing credentials across databases.
  • When creating a new Database record, we will not allow the user to select from already entered DBServer records. We’ll require them to manually enter the host and port.
  • We won’t allow users to edit the host/port of a database already created.
  • When deleting a Database record, we’ll check to see if it’s the last Database used for its related DBServer. If so, then we’ll delete the DBServer record and all its related DBRole records. We’ll make this behavior clear to users in the UI.
  • We’ll stick with Brent’s proposed integrity plan that relies on multi-column foreign keys.

One minor change we agreed on

One small change to models show in Brent’s ER diagram:

  • Exploration records will no longer have a unique constraint on (database, name). Somewhat unconventionally, the uniqueness of explorations will be enforced on the front end. This opens the rare possibility that two explorations in the same schema could potentially have the same name. We expect this scenario to be rare enough that it’s corner we’re cutting right now for the sake of simplicity and implementation time. The discussion on this point was a bit nuanced and somewhat intertwined with a similar discussion on exploration association which lasted from 48:22 - 1:09:40. I can give a more detailed summary if needed.

Naming adjustments to discuss

The one remaining thing is…

At 43:45 in the meeting I requested some minor changes to terminology. I meant to follow up with an email suggesting more specific changes, so I’ll use this thread to continue that discussion. (Thanks to Kriti for the bump.)

First of all I just want to begin by saying that I do think it’s worth spending a small amount of time thinking critically about these names. They’re likely to be used in code all across our whole stack for years to come. Changing them later won’t be easy.

Here’s what I’d like to change:

  • Rename DBServer to Server.
  • Rename DBRole to Role, and:
    • change username to name
    • change db_server to server
  • In Database
    • change db_name to name
    • change db_server to server
  • Rename UserDBRoleMap to… something… (discussed below) and:
    • change db_role to role
    • change db_server to server

During the meeting, most people seemed amenable to the changes above.

As for renaming UserDBRoleMap

UserDatabaseRoleMap was a name we discussed during the meeting. I see that as an improvement. And if that’s as far as I can get with my request, then okay, I’ll accept it. I don't want to harp on this too long.

But I’ll say that I would be quite a bit happier with a single noun like Authorization, Access, License, Mandate, Warrant, Approval, or Clearance. And here’s why…

There might come a time when I want to store many records from that table in a JavaScript Map object, perhaps mapping their database id values to their records. Our front end code features data structures like this all over the place. With a name as primitive as “Authorization”, I can straightforwardly extend that name to produce more complex names like “DatabaseAuthorizationMap” (a map with database ids as keys and Authorizations as values). This helps keep my code clean when I want to do a map-like thing with that data. When the entity itself has a name as complex as “UserDatabaseRoleMap”, then extending the name into “UserDatabaseRoleMapMap” starts to get ugly and confusing really fast. This is why I always try to keep model object names as primitive as possible. A name like “Authorization” might seem like a weird or foreign concept at first. But to me that’s okay because it’s the kind of thing that we’re going to be using all over the place and eventually a name like that would feel natural.

In general I tend to embrace long names when their scope is small to medium. But the scope this this thing is gigantic. This name is likely to appear in so many places all over our codebase — Django models, python modules, RPC methods, TypeScript types, JavaScript variables, Svelte components, CSS classes, on and on. When the scope of a name is big enough to be pervasive across the whole codebase, thin I think simplicity is really valuable, specifically because it makes it easier for devs to combine that name into other constructs. Think: AuthorizationListModal.svelte, $currentUserAuthorization, getAuthorizationsForDatabase(), purge_orphaned_authorizations(), .new-authorization-row. As it see it, our ability to form descriptive names for things relies on the primitive objects (the models) having primitive names.

Kriti Godey

unread,
Jun 10, 2024, 11:18:23 PMJun 10
to Sean Colsen, Brent Moran, Mathesar Developers
Thanks Sean, this all sounds good.

I see that no one has responded to the naming adjustments discussion in the last week, who needs to be involved in this discussion, and what's the process for getting it resolved?

Brent Moran

unread,
Jun 11, 2024, 6:17:01 AMJun 11
to Mathesar Developers
Since I don't think I've written down my reasoning for the UserDatabaseRoleMap name, I'll do so here: The noun is "Map", because this object is a map. Everything else in the name is a description of what's being mapped to what. In particular, this object is a function (map) from the space of (user, database) pairs to the space of roles. Precisely the way that a javascript map is a function from its set of keys to the set of values. Perhaps it would be nice to specify what's in the domain vs. range of the function, but that would lead to a clunkier name, e.g., UserDatabaseToRoleMap. I find all other options listed above to be less descriptive, I.e., they give no clues w.r.t. what's being mapped to what. Specifically, none of those options helps one understand the fact that you're getting a database role (DB server concept) by submitting a user (Mathesar concept) and database (bridge between Mathesar and DB server concepts) pair. This is a pretty specific and fussy concept which contrasts, for example, with authorization to manipulate some resource on the web server. In the map case, you're not actually being "authorized" to do anything. You're looking up a database role based on the submitted info. Someone somewhere else performed the authorization step when they added that row to the UserDatabaseRoleMap table.

With all that said, I don't much care about names of things in the end. I consider the actual signature or attributes of an object to be the only reliable information about it, and tend to think of the name as something for someone else to be happy (or not happy) with. If Sean's worried that this term is going to be a problem in the front end, I suppose we can just choose one of his options.

Brent


Sean Colsen

unread,
Jun 11, 2024, 11:11:51 AMJun 11
to Brent Moran, Mathesar Developers

We discussed this in a call today and decided to go with UserDatabaseRoleMap

Brent Moran

unread,
Jun 12, 2024, 3:11:24 AMJun 12
to Mathesar Developers
Updated diagram with the relevant changes shown:

image.png

As I got into migrating our current models, I realized we still need some concept of "display name" for some objects. We've called it variously:
  • display_name
  • nickname
  • name
Of these options, my favorite is display_name since it's the clearest and most obvious what's going on. The only deficiency is that this doesn't make it immediately obvious that this is typically required to be unique.

Any opinions?

Brent Moran

unread,
Jun 12, 2024, 5:02:07 AMJun 12
to Mathesar Developers
I forgot to make one name change. Now fixed:

image.png

Sean Colsen

unread,
Jun 12, 2024, 5:38:32 AMJun 12
to Brent Moran, Mathesar Developers
Thanks Brent. This is looking great.

We also decided to remove the unique constraint on Exploration. It might be nice to update the model to reflect that.

Regarding the question of "display_name" vs "nickname" vs "name"... I'd like to understand which objects we're talking about before making a decision. In general I'd lean towards using "name" for things like Exploration, Role, and Database because those things only have one name-like field. Further, I think that using "display_name" for something like Database or Role could become confusing because those name values aren't just for display — people also use them in their own code sometimes too. We had briefly considered persisting an additional nickname on Role, and in that case (now moot) I think it would make sense to have both "name" and "nickname". For DB objects like Schema, Table, Column, Constraint, Type, I'd also still lean towards using "name" throughout our APIs and functions. Brent I'd like to hear an example of where you'd prefer to use "display_name".

Brent Moran

unread,
Jun 12, 2024, 6:41:07 AMJun 12
to Mathesar Developers
I was thinking specifically of the Database model, where we currently have both 'name' and 'db_name'. The former would go to 'display_name', and the latter to 'name'. Maybe I'm wrong, but I thought we were planning to keep keep both, since the underlying 'name' doesn't actually identify the database that well (e.g., many hosts would likely have a 'postgres' database), and so aren't that useful as the human-friendly (display) name.

Sean Colsen

unread,
Jun 12, 2024, 9:49:40 AMJun 12
to Brent Moran, Mathesar Developers

we currently have both ‘name’ and ‘db_name’

Ok I see the code you’re talking about now. Yeah, that is horribly confusing! I would be fine with either:

  • (A) We use “name” only.

    In the UI we would uniquely identify databases by displaying their host and port.

    With slightly more work on the front end we could also hide the host and port in some UI locations whenever Mathesar is configured with only one server. The brevity would improve aesthetics in the common case where users don’t have multiple servers. And we could even defer the implementation of this "auto-hide" logic until later.
  • (B) We use “name” and “display_name” as you suggest.

I can see advantages and disadvantages to both approaches — for users and for implementation effort.

I think I have a slight preference for (A) because it gives users a simpler metal model of the way things work.

Kriti Godey

unread,
Jun 25, 2024, 9:11:45 AMJun 25
to Sean Colsen, Brent Moran, Mathesar Developers
Just following up here, did we resolve this?

Sean Colsen

unread,
Jun 25, 2024, 9:29:11 AMJun 25
to Kriti Godey, Brent Moran, Mathesar Developers
As far as I understand, this is still an open question. I think it would be helpful if others could respond to this thread with their opinions so that we can see if reaching a decision asynchronously is easy.

Kriti Godey

unread,
Jul 2, 2024, 5:14:45 PMJul 2
to Sean Colsen, Brent Moran, Mathesar Developers
Should we talk about this at tomorrow's meeting?

Sean Colsen

unread,
Jul 2, 2024, 6:53:53 PMJul 2
to Kriti Godey, Brent Moran, Mathesar Developers
Yeah, I think that would be a good idea.

Kriti Godey

unread,
Jul 2, 2024, 7:10:00 PMJul 2
to Sean Colsen, Brent Moran, Mathesar Developers
Okay thanks, added to the agenda.

Brent Moran

unread,
Jul 3, 2024, 8:48:22 PM (14 days ago) Jul 3
to Mathesar Developers
We decided not to have both. It's just `name`, referring to both what the user will see, and the actual database name on the server.
Reply all
Reply to author
Forward
0 new messages