Performance testing and improvements post-beta

Brent Moran

Jan 20, 2025, 9:53:14 PM
to Mathesar Developers
Hello everyone, I have a quick thought that I want to record before I forget.

Related to Anish's review of https://github.com/mathesar-foundation/mathesar/pull/4142, I was doing a bit of ad-hoc E2E performance testing in Mathesar. I realized that under high latency, our bottleneck is no longer the user-database functionality, but Django's own queries. Basically, Django assumes relatively tight coupling and low latency between the webservice and the Django database, which isn't guaranteed for us. This creates serious problems for the functions that need to query multiple models from the Django DB.

Possible solutions:
  1. Try to look into preloading related models. This will improve performance in limited situations.
    • This won't work for the most egregious performance problem areas, since they involve setting up multiple model instances.
    • This is a bit of a pain implementation-wise, and will add a bunch of dependencies and complexity.
  2. Redesign problem functions to minimize Django queries
    • This is already our stance, but we could look at particularly slow ones to see if we can improve.
    • It's unlikely this will solve all (or even most) problems, since the slowest functions need multiple Django queries for logical coherence.
  3. Rework our documentation and installation setups to disallow or at least strongly discourage using a remote database for the Django internal storage.
    • This is the easiest and simplest solution.
    • This adds constraints on the user that we may not be willing to accept.
  4. Go to SQLite3 only for the Django storage.
    • This is impossible while we still have the concept of an "internal DB server" where we're allowed to create databases, etc.
    • Would actually make Mathesar a way better product for any user whose main goal is to just connect to preexisting DBs (closer to an executable binary feel).
All of these solutions have serious weaknesses, as noted above. My favorite so far is (3), but framed as a strong recommendation rather than a hard requirement.
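To make the tradeoff in option (1) concrete, here's a hedged, plain-Python sketch (no Django; all names are invented) of why preloading related models helps reads but not creates: each lazy access to a related object costs one round trip (the classic N+1 pattern), while a preloading join fetches everything in one query.

```python
# Hedged sketch (plain Python, not Django): lazy related-object access
# vs. preloading, measured in simulated DB round trips.

class FakeORM:
    def __init__(self):
        self.query_count = 0  # one increment per simulated round trip
        # parent -> related row, standing in for e.g. Database -> Server
        self.related = {f"db{i}": f"server{i % 2}" for i in range(10)}

    def fetch_parents(self):
        self.query_count += 1            # one query for the parent rows
        return list(self.related)

    def fetch_related_lazily(self, parent):
        self.query_count += 1            # one extra query per parent
        return self.related[parent]

    def fetch_with_preload(self):
        self.query_count += 1            # single joined query
        return dict(self.related)

lazy = FakeORM()
for parent in lazy.fetch_parents():
    lazy.fetch_related_lazily(parent)

preloaded = FakeORM()
preloaded.fetch_with_preload()

print(lazy.query_count, preloaded.query_count)  # → 11 1
```

Creates still need their own INSERT statements no matter what is preloaded, which is why this approach doesn't rescue the setup flow mentioned above.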

Next steps:
  1. We already have an API testing framework set up. I think we should use it to add API tests that introduce service-DB latency and call each RPC function and REST endpoint with "high-stress" parameters to see what happens.
    • This will be slow (by design) and it's debatable whether it should be run as part of our pipeline.
    • We could set some limits on function call response times and fail if they get slow.
    • We should focus on the effects of latency, not on data size. We currently do pretty well with larger data sets.
  2. Try to come up with other solutions beyond (1) through (4) above.
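The latency-injection idea in step 1 can be sketched in plain Python (this is not the real test framework; the endpoint, query counts, and budget below are all invented for illustration): every simulated DB query pays a fixed service-DB latency, and the check fails if an endpoint's response time exceeds a budget.

```python
# Hedged sketch of a latency-injection performance check. Each simulated
# DB round trip pays the full service-DB latency; endpoints that issue
# many queries blow the response-time budget.
import time

LATENCY = 0.01          # simulated 10 ms service-DB round trip
BUDGET = 0.05           # fail any endpoint slower than 50 ms

def run_endpoint(n_queries):
    """Stand-in for calling one RPC function with high-stress parameters."""
    start = time.monotonic()
    for _ in range(n_queries):
        time.sleep(LATENCY)   # each DB round trip pays the full latency
    return time.monotonic() - start

fast = run_endpoint(2)   # a well-behaved endpoint: ~20 ms
slow = run_endpoint(8)   # e.g. heavy per-request query overhead: ~80 ms

print(fast <= BUDGET, slow <= BUDGET)  # → True False
```

The same structure would apply with real endpoints behind a latency-adding proxy: fix a per-endpoint budget, inject latency, and fail the run when a budget is exceeded.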
Brent

Sean Colsen

Jan 21, 2025, 2:47:45 PM
to Brent Moran, Mathesar Developers

Brent,

I’m curious to understand this situation more deeply. Which RPC methods have serious problems? And what are the “multiple models” that we’re querying? Have you run a query logger?

My intuition is basically your first option, though I wouldn’t necessarily call it “preloading”, so I could be misunderstanding what you’re describing. I’ve seen perf problems when using ORMs in lots of cases, and I would imagine that we’d be able to improve our querying by not leaning on Django’s ORM magic as heavily. I’ve done this sort of optimization work before, with good success. But it’s hard for me to tell if that’s the case here without seeing the details up close.

I’d be keen to hop on a call and poke around in a pairing session sometime to brainstorm further.

Brent Moran

Jan 21, 2025, 7:33:44 PM
to Mathesar Developers
The biggest problems I specifically noticed are the DB setup functions. By their nature, they sometimes need to get or create a sequence of dependent model instances, resulting in multiple queries. Django's get_or_create method produces two round trips to the DB:
  • SELECT a model matching the passed params (returns no rows, meaning we must create)
  • INSERT to the model
Thus, a call to databases.setup.connect_existing involves the following sequence:
  • get or create a Server instance (2 db round trips if creating)
  • get or create a Database instance referencing the returned Server (2 more round trips if creating)
  • get or create a ConfiguredRole instance referencing the returned Server (2 more round trips if creating)
  • get or create a UserDatabaseRoleMap instance referencing the above Server, Database, and ConfiguredRole (2 more round trips if creating)
That's just the Django DB calls that we explicitly initiate. Django seems to do more on its own before it even gets to our code (preparing info about the user's auth status, etc.). More calls are involved in installing the SQL and sample data sets on the User DB, but we already know that.
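The round-trip accounting above can be sketched in plain Python (this is not the actual Mathesar code; the keys and values are invented, only the model names match the message): a get-or-create against a dict "database", with a counter incrementing once per simulated round trip.

```python
# Hedged simulation of the connect_existing sequence: four dependent
# get-or-create calls, each costing a SELECT plus (when creating) an INSERT.

round_trips = 0
store = {}  # (model, key) -> instance

def get_or_create(model, key):
    global round_trips
    round_trips += 1                    # round trip 1: SELECT
    if (model, key) in store:
        return store[(model, key)]
    round_trips += 1                    # round trip 2: INSERT
    store[(model, key)] = {"model": model, "key": key}
    return store[(model, key)]

server = get_or_create("Server", "db.example.com")
database = get_or_create("Database", ("mydb", server["key"]))
role = get_or_create("ConfiguredRole", ("alice", server["key"]))
get_or_create("UserDatabaseRoleMap",
              (server["key"], database["key"], role["key"]))

print(round_trips)  # → 8: two per model when all four must be created
```

With 50 ms of service-DB latency, those eight sequential round trips alone put a 400 ms floor on the call, before any other per-request work.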

As for optimization, prefetching would help in cases where we don't need to create each instance, but wouldn't help when we do need to create each one. If we weren't using the ORM, we could use data-modifying CTEs with "INSERT ... ON CONFLICT DO UPDATE ... RETURNING" to eliminate the calls back and forth, but that could get pretty messy to maintain.
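For illustration, here's a hedged sketch of what that non-ORM approach could look like in Postgres. The table and column names are entirely hypothetical (this is not Mathesar's actual schema); the point is that chained data-modifying CTEs collapse the whole sequence into one round trip, with ON CONFLICT ... DO UPDATE ... RETURNING guaranteeing each CTE yields an id whether the row was inserted or already existed.

```sql
-- Hypothetical schema; one round trip for the whole setup sequence.
WITH srv AS (
  INSERT INTO server (host) VALUES ('db.example.com')
  ON CONFLICT (host) DO UPDATE SET host = EXCLUDED.host
  RETURNING id
), db AS (
  INSERT INTO connected_database (name, server_id)
  SELECT 'mydb', srv.id FROM srv
  ON CONFLICT (name, server_id) DO UPDATE SET name = EXCLUDED.name
  RETURNING id
), role AS (
  INSERT INTO configured_role (name, server_id)
  SELECT 'alice', srv.id FROM srv
  ON CONFLICT (name, server_id) DO UPDATE SET name = EXCLUDED.name
  RETURNING id
)
INSERT INTO user_database_role_map (server_id, database_id, role_id)
SELECT srv.id, db.id, role.id FROM srv, db, role
ON CONFLICT DO NOTHING;
```

As noted, the maintenance cost is real: every new dependent model means threading another CTE through this statement by hand.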

I haven't found any particularly egregious endpoints or functions that don't involve this sort of multiple-dependent-model creation sequence. We should do a comprehensive audit for that.

Regarding pairing on this, let's wait until we get the beta out, and until I have time to get the performance auditing script written.

Sean Colsen

Jan 21, 2025, 7:48:14 PM
to Brent Moran, Mathesar Developers

Hmmm, interesting. If these performance issues are in fact limited to the “connect database” flow, then I’d be inclined not to worry about it unless a user complains directly to us. Let’s say it takes 10 seconds to connect a new database. I’m not sure that would even be a problem worth solving at this point, because (as I see it) it’s just so far from the “hot path” of the typical usage I’d expect.

Brent Moran

Jan 21, 2025, 8:06:58 PM
to Mathesar Developers
I'm also not terribly worried about it. As you say, it's not a common operation. Then again, it is an operation that the installing user experiences at least once, and early in their experience. First impressions and all that.

With that said, I don't have any interest in solving (or even really fully diagnosing) this before beta, I just wanted to make sure I wrote it down somewhere while I was thinking about it.

Brent Moran

Jan 23, 2025, 4:35:54 AM
to Mathesar Developers
Okay, I took some of my QA time for performance poking.

The good: The new RPC setup is way better than the old one, to the point that we're now held back by Django itself more than anything. Our functions, and the differences between them, are usually a minuscule portion of the response time compared to the general Django overhead (see below).

The bad: Django seems to add an 8-DB-query overhead to each and every authenticated request, RPC or REST. This is before it hits our code at all, so I don't think even caching queryset results will help. Thus, if there are 50ms of latency between the service and the DB, we cannot possibly return a response faster than 400ms. That's pretty bad, given the end-user architectures we're trying to support.
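The arithmetic behind that floor, as a tiny sketch (assuming the round trips are sequential, which is how Django issues them):

```python
# Lower bound on response time imposed by sequential DB round trips,
# before any application code runs or any rows are transferred.

def response_floor_ms(queries_per_request, latency_ms):
    """Per-request floor: (queries per request) x (service-DB latency)."""
    return queries_per_request * latency_ms

print(response_floor_ms(8, 50))  # → 400: the 400ms floor at 50ms latency
print(response_floor_ms(8, 1))   # → 8: near-negligible when colocated
```

The floor scales linearly with latency, which is why colocating the webservice with the Django DB (or switching to SQLite3) eliminates the problem rather than merely shrinking it.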

To me, that problem means we should prioritize getting users to colocate the webservice with their mathesar_django DB, or moving to SQLite3 as the only supported Django DB. Or move away from Django entirely ;)

Brent