Team:
Since the group seemed supportive of heading in the direction of the “Purple” proposal during today’s call, I wanted to follow up with some more detailed thoughts on the data model for “Periwinkle” (as I think we’re calling it now). In parallel with the work that Pavish and Ghislaine are doing to flesh out the UX, I’d like to work with Brent (and anyone else who is interested) on fleshing out the data model. I’m eager to get to a point where we agree on the tables/models and columns/fields (and their relationships). Towards that end, I’m hoping there might not be a ton of discussion or work remaining. And I want to get started on it so we can reach an agreement as soon as possible. So I’m hoping an email thread might suffice to reach an agreement. Clarity on the specifics of the data model will really help our internal discussions, I think. So…
Here’s my proposal:
Additional thoughts:
In the past, Brent has taken issue with putting roles "underneath" databases, whereas Pavish has wanted to model roles underneath databases (in Mathesar). I continue to remain neutral about this behavior, but I’ve chosen to follow Pavish’s proposed behavior both when writing the “Purple” specs and when creating this ER diagram, simply because Pavish has seemed to feel more strongly than Brent. But I’m hoping we can resolve this debate soon. What I really really want to avoid is a situation where we pursue something akin to Pavish’s preferred product functionality while implementing it via something akin to Brent’s preferred data model. I raised this concern back on March 29th and I’m still not convinced that anyone fully understood my point at that time. Kriti even called it “bikeshedding”. After that thread, I ended up drafting a substantial critique document detailing specific problems stemming from incompatibilities between Brent’s models and Pavish’s product functionality. But I never presented that critique to anyone because I wanted us to get to a point as a team where we agreed on the product functionality first. I made an attempt at that on April 10th, by suggesting that Pavish incorporate an “Encapsulation” section into his product spec. I don’t think we ever reached an agreement in that PR on the matter, and then the whole product design veered off on a different course. So in the “Purple” proposal, I clearly specified that “encapsulation” behavior by writing:
Role metadata (e.g. name, password) would be stored per-database. Two separate databases on the same server with the same DB role would warrant separate role records with separate (duplicate) passwords stored in Mathesar metadata.
Brent: are you okay with that ☝️ behavior?
To reiterate: I really don't have a preference about the actual “encapsulation” behavior — I’m simply trying to move things towards a solid plan here.
The
trigger on UserRole would ensure the uniqueness of user
and role.database
together, thereby ensuring that each Mathesar user can have
a maximum of one DB role per database.
This data model also allows DB roles to be associated with a database without those roles being associated with any Mathesar user. I see this as a feature, but perhaps others might not.
The
position of the metadata_role
field within this schema is something else we’ve debated in
the past. Briefly we considered putting this field on what
is the Role
table here. I made
a case for putting it on UserRole
.
I could also see a case for omitting this field entirely and
leaving all metadata (including explorations) to be writable
by all users. We could do that if we really wanted to
simplify things.
The overall structure here looks great to me! Here are some additional thoughts:
Enforcing “one role of a given name per DB server” is exactly what I was after back in March when I said:
I’m not seeing a compelling use case for having two DBServerCredential records with the same username and db_server
With the relationship
graph in this proposed schema, I’m very relieved to see the
unique constraint on DBRole
(db_server, username)
— exactly the unique constraint here that I was making such a
fuss about!
Pavish was adamantly against this approach of sharing credentials across databases, so I want to double check to make sure that he’s actually okay with this now.
Given this data model of sharing server/role data across databases, here are some things I think are worth considering from the user’s perspective. In my opinion these are all solvable problems, but they require thought.
When
creating a new Database
,
do we want to allow the user to select an already-entered DBServer
?
I imagine “no”. I imagine we’d require users to manually
enter the host and port. Then on the back end we’d either
create a new DBServer
or use one that already exists.
In
the real world there is a difference between: (a) moving a
database from one server to another server; and (b) changing
the host or port for a server. So in Mathesar it’s possible
we would need to give users: (a) a way to associate a Database
with a different DBServer
(without affecting other databases); and (b) a way
to edit the host/port of a DBServer
(affecting all of its associated Database
records). If all actions are “performed from within a
database” (as Pavish wants) our UX design might need to
clearly provide different mechanisms for (a) and (b). At the
very least the design will need to make the effect of the
user’s action very clear.
If
we actually allow the user to move a database from one
server to another, then there are some other more complex
considerations too… what happens to the roles? If there are
existing UserDBRoleMap
records, then it might not be possible/easy maintain their
association with the Database
record after the Database
is associated with a new DBServer
.
Do we duplicated all the DBRole
records for the new server? Do we delete all the UserDBRoleMap
records? This is tricky. It makes me think that we might
need to prevent users from modifying Database.db_server
,
at least for now.
When
deleting Database
records, we’ll cascade to delete their linked UserDBRoleMap
records too. This opens up the possibility of leaving some DBRole
records orphaned. We might consider that to be a feature
because a user could add a new database without
re-configuring all their roles. But if all actions are
“performed from within a database”, then this gets tricky. I
think its very important that we avoid building a product
that gives users the impression they’ve deleted their roles
(and saved passwords) when in fact that data remains inside
their database with no way to delete it from the UI. I think
we either need to build a way to manage DBRole
records outside the context of a database (breaking Pavish’s
goal) — or we need to automatically delete orphaned DBRole
records. If we automatically delete those records, then we
need to make that behavior really clear to users. We’d need
to think through exactly how we’d provide that
clarity to users, both in terms of UX design and in terms of
technical implementation. There are some mild edge cases I’d
like to present for consideration if/when we get further
into the weeds here.
Deleting
Database
records also opens up the possibility of leaving orphaned DBServer
records. Personally I’m not so concerned about this because
I have a hard time imagining anyone considering that data to
be sensitive or burdensome. I’d be inclined to leave them in
the database.
I agree that UserDBRoleMap.db_server
is weird. I see that it’s there to enforce that the associated Database
and DBRole
have the same DBServer
.
That makes sense but it doesn’t seem like idiomatic usage of
multi-column foreign key constraints. At least, I’ve
never seen multi-column FKs used in this manner. If I were
inspecting this schema, I’d be confused.
I would consider using triggers for this by:
UserDBRoleMap
that enforces database.db_server
being equal to db_role.db_server
.Database
to make db_server
immutableDBRole
to make db_server
immutableWe’re agreed on all the major points, as discussed on 2024-05-30, and we’re moving forward (in large part) with the schema shown in Brent’s ER diagram.
Clarifying points, as directly related to my last message in this thread:
Database
record, we will not allow the user to select from
already entered DBServer
records. We’ll require them to manually enter the host and
port.Database
record, we’ll check to see if it’s the last Database
used for its related DBServer
.
If so, then we’ll delete the DBServer
record and all its related DBRole
records. We’ll make this behavior clear to users in the UI.One small change to models show in Brent’s ER diagram:
(database, name)
.
Somewhat unconventionally, the uniqueness of explorations will
be enforced on the front end. This opens the rare
possibility that two explorations in the same schema could
potentially have the same name. We expect this scenario to be
rare enough that it’s corner we’re cutting right now for the
sake of simplicity and implementation time. The discussion on
this point was a bit nuanced and somewhat intertwined with a
similar discussion on exploration association which lasted
from 48:22 - 1:09:40. I can give a more detailed summary if
needed.The one remaining thing is…
At 43:45 in the meeting I requested some minor changes to terminology. I meant to follow up with an email suggesting more specific changes, so I’ll use this thread to continue that discussion. (Thanks to Kriti for the bump.)
First of all I just want to begin by saying that I do think it’s worth spending a small amount of time thinking critically about these names. They’re likely to be used in code all across our whole stack for years to come. Changing them later won’t be easy.
Here’s what I’d like to change:
DBServer
to Server
.DBRole
to Role
,
and:username
to name
db_server
to server
Database
db_name
to name
db_server
to server
UserDBRoleMap
to… something… (discussed below) and:db_role
to role
db_server
to server
During the meeting,
most people seemed amenable to the changes above.
As for renaming UserDBRoleMap
…
UserDatabaseRoleMap
was a name we discussed during the meeting. I see that as an
improvement. And if that’s as far as I can get with my request,
then okay, I’ll accept it. I don't want to harp on this too
long.
But I’ll say that I
would be quite a bit happier with a single noun like Authorization
,
Access
,
License
,
Mandate
,
Warrant
,
Approval
,
or Clearance
.
And here’s why…
There might come a time when I want to store many records from that table in a JavaScript Map object, perhaps mapping their database id values to their records. Our front end code features data structures like this all over the place. With a name as primitive as “Authorization”, I can straightforwardly extend that name to produce more complex names like “DatabaseAuthorizationMap” (a map with database ids as keys and Authorizations as values). This helps keep my code clean when I want to do a map-like thing with that data. When the entity itself has a name as complex as “UserDatabaseRoleMap”, then extending the name into “UserDatabaseRoleMapMap” starts to get ugly and confusing really fast. This is why I always try to keep model object names as primitive as possible. A name like “Authorization” might seem like a weird or foreign concept at first. But to me that’s okay because it’s the kind of thing that we’re going to be using all over the place and eventually a name like that would feel natural.
In general I tend to
embrace long names when their scope is small to medium. But the
scope this this thing is gigantic. This name is likely
to appear in so many places all over our codebase — Django
models, python modules, RPC methods, TypeScript types,
JavaScript variables, Svelte components, CSS classes, on and on.
When the scope of a name is big enough to be pervasive across
the whole codebase, thin I think simplicity is really valuable,
specifically because it makes it easier for devs to combine that
name into other constructs. Think: AuthorizationListModal.svelte
,
$currentUserAuthorization
,
getAuthorizationsForDatabase()
,
purge_orphaned_authorizations()
,
.new-authorization-row
.
As it see it, our ability to form descriptive names for things
relies on the primitive objects (the models) having primitive
names.
UserDatabaseRoleMap
name, I'll do so here: The noun is "Map", because this object is a map. Everything else in the name is a description of what's being mapped to what. In particular, this object is a function (map) from the space of (user, database)
pairs to the space of role
s. Precisely the way that a javascript map is a function from its set of keys to the set of values. Perhaps it would be nice to specify what's in the domain vs. range of the function, but that would lead to a clunkier name, e.g., UserDatabaseToRoleMap
. I find all other options listed above to be less descriptive, I.e., they give no clues w.r.t. what's being mapped to what. Specifically, none of those options helps one understand the fact that you're getting a database role (DB server concept) by submitting a user (Mathesar concept) and database (bridge between Mathesar and DB server concepts) pair. This is a pretty specific and fussy concept which contrasts, for example, with authorization to manipulate some resource on the web server. In the map case, you're not actually being "authorized" to do anything. You're looking up a database role based on the submitted info. Someone somewhere else performed the authorization step when they added that row to the UserDatabaseRoleMap
table. We discussed this in
a call today and decided to go with UserDatabaseRoleMap
display_name
nickname
name
display_name
since it's the clearest and most obvious what's going on. The only deficiency is that this doesn't make it immediately obvious that this is typically required to be unique.we currently have both ‘name’ and ‘db_name’
Ok I see the code you’re talking about now. Yeah, that is horribly confusing! I would be fine with either:
I can see advantages and disadvantages to both approaches — for users and for implementation effort.
I think I have a
slight preference for (A) because it gives users a simpler metal
model of the way things work.