Compatibility with Preexisting DBs project feedback

Brent Moran

unread,

Aug 15, 2023, 11:14:00 AM8/15/23

to Mathesar Developers

Hey all,

I've written up a first draft of the project to investigate
compatibility with preexisting databases. Here's the link:

https://wiki.mathesar.org/en/projects/preexisting-postgres

I'd appreciate feedback on it. In particular, it would be good for
Ghislaine, Pavish, and Sean to look at it. Nothing should be too
surprising, but I want to make sure we're all on the same page.

Please use this thread to discuss and give feedback. I think we need
to be pretty quick on this, since we'll want to finalize it by the end
of the week.

Brent

Ghislaine Guerin

unread,

Aug 15, 2023, 12:46:09 PM8/15/23

to Brent Moran, Mathesar Developers

Thanks, Brent. Here's my feedback and some points to consider:

Views
- This feature might overlap with Data Explorer.
- We either need a distinct representation from saved explorations or an integrated approach.

Constraints
- How do we determine which database configurations to show in the UI?
- We should find a balance to avoid overwhelming users if we opt to display all database configurations in the UI.

Prioritization
- I agree with prioritizing fixes for features that could lead to errors or data loss, like moving columns.

All else looks good to me.

Kriti Godey

unread,

Aug 16, 2023, 6:25:34 PM8/16/23

to Ghislaine Guerin, Brent Moran, Mathesar Developers

Looks mostly good to me. The main feedback I have is that we should be prioritizing issues that break Mathesar's ability to work with existing databases over issues that add new functionality to Mathesar. I remember we agreed on that during the meeting, but it is not clear from the write-up that we'll be taking that approach.

Pavish Kumar Ramani Gopal

unread,

Aug 17, 2023, 9:23:25 AM8/17/23

to Brent Moran, Mathesar Developers

The outcome of the project specifies that we would be solving the problems, while the timeline specifies:

> we should be able to accomplish everything up to writing issues within a single cycle

This part requires more clarity.

1. Do we only intend to identify issues and write them up?

2. Do we intend to brainstorm UX problems and come up with solutions for them?

3. Would the issues cover implementation details and UX?

Mainly, I'd like to understand the depth at which the issues would be specced.

Everything else looks good to me.

Sean Colsen

unread,

Aug 17, 2023, 1:51:49 PM8/17/23

to Pavish Kumar Ramani Gopal, Brent Moran, Mathesar Developers

Looks good, Brent!

I’ll add some thoughts…

You said:

Find realistic sample PostgreSQL databases

I would imagine it being important to do this work soon in the project, maybe first. But it’s a challenging task!

I spent some time pondering and researching this today. Here are some take-away thoughts:

The Sample Databases on the PostgreSQL wiki seems to have some useful leads that would be worth chasing down in more detail.
Finding open-source self-hosted applications that uses Postgres would be helpful because it could provide a sample database with some interesting structure.
Finding real data sets distributed openly would be helpful because it could provide a sample database with some interesting content.
Ideally we’d find examples with both interesting structure and interesting content, but this seems a bit tricky.
Applications would be especially useful if their installation steps included an option for loading sample data.

We could comb through the awesome-selfhosted list, but it’s enormous and hard to filter for projects that use Postgres. I wonder if there might be similar resources that would get us further toward the sort of filtering we’d need to do to avoid tons of manual research.
Data sets would be especially useful if they were distributed as a pg_dump SQL file.

I browsed some open data sites to see if I could find any open data distributed via pg_dump SQL. No luck. This is not really surprising though. If I were using Postgres as the source-of-truth for my data, I’d still want to distribute it in a format with broader compatibility. I did some digging in awesome-public-datasets, kaggle, fivethirtyeight, NYC, data.gov. I don’t imagine that stress-testing Mathesar with such data would prove to be particularly useful because what we need is data that is taking advantage of Postgres-specific features.

This seems like it might be a dead end. I’m just forwarding this thought though in case it saves anyone else some time during research.

Perhaps it’s worth asking the team:

Does anyone have any products in-mind that use Postgres?

Kriti Godey

unread,

Aug 17, 2023, 3:00:48 PM8/17/23

to Sean Colsen, Pavish Kumar Ramani Gopal, Brent Moran, Mathesar Developers

Some other places to look for sample DB structures might be awesome-postgres and also looking through popular ORMs and seeing what kinds of DB fields they create in Postgres (e.g. what does Django offer in terms of options for field types and how to handle things like primary keys, look at similar stuff Rails and SQLAlchemy and whatever the top 2-3 ORMs are for other popular programming languages).

Sean Colsen

unread,

Aug 17, 2023, 3:23:39 PM8/17/23

to Kriti Godey, Pavish Kumar Ramani Gopal, Brent Moran, Mathesar Developers

I just thought of another idea. If we could identify two or three recommended “database anonymizer” tools, we could invite Mathesar users to send us redacted copies of their production DBs. For example this person on Matrix has a production DB with 20 tables and is having trouble using Mathesar. It would be neat if we could refer them to a Mathesar docs page which gives instructions on how to redact and upload their production DB to share it with the Mathesar devs.

Years ago, I helped build a tool like this for CiviCRM. It’s an extension called anonymize (not part of the core product) but it was useful for some clients who needed to send redacted prod data to vendors building customizations for them. In this case, some clients were willing to do so. Granted, that’s a product-specific tool, so it’s quite clear what needs to be redacted.

A general purpose anonymization tool would carry more risk, and we’d need to clearly communicate that risk. I wouldn’t imagine many Mathesar users being comfortable sending us their prod data, even after redaction. But I think some probably would! I know there are a lot of general purpose tools like this, so we'd need to vet them a bit before recommending any.

Dominykas Mostauskis

unread,

Aug 18, 2023, 7:40:13 AM8/18/23

to Sean Colsen, Kriti Godey, Pavish Kumar Ramani Gopal, Brent Moran, Mathesar Developers

Quick note, multi-column pks or multi-column unique constraints shouldn't break the backend. I tested that (including adding tests) last cycle.

Brent Moran

unread,

Aug 18, 2023, 9:20:28 AM8/18/23

to Mathesar Developers

Okay, I've updated the project description attempting to capture the feedback here.

Regarding multicolumn keys, I'd still like to double-check that the tests are comprehensive under all cases. One issue will be moving columns between tables where one moved column is involved in a multicolumn foreign key. I'm pretty sure that will just delink the associated tables silently, and without warning, with the current logic. Regardless, if we check everything we can think of, and nothing breaks, we won't make any issues for that section.

Reply all

Reply to author

Forward