Brent,
That view is a clever idea. How would the view definition get updated when, say, a new column is added to the underlying table (especially if done outside of Mathesar)?
You said:
All metadata lookups need to operate on OIDs anyway
I can see how it’s possible to run a query like “describe this column’s type” by directly using OIDs instead of names. But when you say “need”, are you suggesting that it would be problematic to use names instead of OIDs? I’d be curious to see an example of a specific query which you think would work better when performed with OIDs vs names.
Is there agreement that we need to clarify what the architectural requirements are for those features?
Thanks for writing all this up, Brent
I wanted to respond to a few points:
If one copies the PostgreSQL data directory, OIDs are stable.
Sure. But to safely make a filesystem-level backup you’d need to take the DB offline, right? I wouldn’t want us to recommend that sort of backup process to any of our users. Or maybe I’m missing something?
we do a lot more renaming than importing and exporting via SQL
I think this is yet to-be determined. We do a lot of renaming because we’re currently hammering on the product and all its features. But I expect that many of our users will go years without a single DDL operation. But almost everyone needs regular backups of some kind (even if they don’t know it). For my CiviCRM clients, I customarily set up hourly backups. Granted, what matters most for the purpose of this discussion is a full backup-restore cycle which happens much less frequently than a mere backup. But I’ve had numerous occasions in my experience when I’ve needed to restore a backup – most often targeted at a separate staging DB for manual inspection, but also once to an actual live DB, sacrificing some live data made shortly after the backup. For comparison CiviCRM upgrades frequently include schema migrations, but those migrations rarely include rename operations. “Stand-alone” Mathesar use-cases would certainly have more frequent renaming though.
Everyone’s use-case will be different, but the above two points are just to say: I equate pg_dump with “backup stability”, and I place its importance on par with “rename stability”.
I’m ready to accept the OID approach, especially since it’s a currently lighter lift than using names.
But I do think it’s worth briefly entertaining the thought experiment of “what would we do if we wanted to use names?” Here’s what I might do:
When the service layer receives a request to rename a schema, table, or column, it propagates that rename operation into to all metadata as needed.
Other than that, we don’t do anything special.
This approach has plenty of drawbacks vs OIDs (and I’ve expounded on some possible mitigation strategies below)
If renaming is done outside Mathesar, metadata will be left dangling.
We could expose an API and/or command line utility to propagate renames into Mathesar’s metadata. This might be sufficient for most cases.
If metadata is stored in a separate database, the renaming cannot be atomic.
We could consider moving metadata into a schema within the user database.
Externally bookmarked URLs containing schema names or table names would break after renaming.
We could also maintain a table of redirects for schemas and tables. This would never be perfect because situations like two table swapping names would not work, but it would still cover most cases. I’m not sure how big of a problem this would actually be though, even if we did nothing.
When user A performs a rename operation, some client-side operations might break for user B without a full page refresh.
We already have this problem with other DDL operations, but using names would make the problem more common. If we eventually implement a realtime API, then this would be much less of a concern.
As Brent points out, some introspection queries might be a tiny bit slower when run using names due to joining on name columns which are not indexed.
I’m doubtful the performance concerns here would be significant, but we could do some testing to see.
Overall:
To me, the names approach seems simpler than the OIDs approach.
My biggest concern OIDs with OIDs is that I would really like to someday be able to use Mathesar with SQLite. This is a larger conversation, but for the stand-alone use case, I think SQLite has some major advantages over Postgres. Closing the door so tightly on that future would feel sad to me.
But I totally see the value in OIDs too! Personally I see advantages and disadvantages in both directions and I’m a bit torn.
The fact that Brent has such a strong opinion in favor of OIDs is very compelling, and so I’m on board with OIDs.
I just wanted to lay out the above thoughts about names to help us consider all the angles with such a far-reaching decision.
Is there agreement that we need to clarify what the architectural requirements are for those features?No, this seems like premature optimization to me. We can't predict all the features we need to build, and if we make detailed plans for all sorts of features now, it will make us less flexible later.
I do agree that we should have a clear understanding of the impact of any refactors, but this shouldn't be related to a whole list of specific features since those are not set in stone.
I'm nervous about not having a synchronised vision for what the groundwork is for.