constructing `Mapper` objects is slow (≈4 seconds) when running a local test

Bill Finn

unread,

Nov 30, 2020, 5:17:39 PM11/30/20

to sqlalchemy

Hello!

We're running a unit test that includes constructing ≈330 tables & ≈2,000 columns total in a local PostgreSQL 10.13 database (on disk, not in memory). After profiling the test, we found that it takes ≈4 seconds to create all of the SQLAlchemy `Mapper` objects [0], which are currently a prerequisite to running the test. This time doesn't include emitting any SQL -- just constructing the in-memory python mapper objects.

So, my questions are:

1. Is this performance about expected for the number of tables & columns? Or does it point to something we may be doing wrong?

2. Are there any tips or tricks for structuring tests so that a single test that requires the database schema runs quickly? I read through the "Testing" section of Essential SQLAlchemy, and we'd like to avoid mocking, if possible.

As far as due diligence goes, I read through the Performance FAQ, but I didn't see a mention of debugging slowness initializing the mapper objects, and I found this StackOverflow post re: mapping slowness in the application server, but that's not my primary concern.

Versions:

PostgreSQL 10.13

SQLAlchemy 1.3.20

Happy to provide more details if you think they'd be helpful.

Thanks,

Bill

[0] https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_20/lib/sqlalchemy/ext/declarative/api.py#L74

https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_20/lib/sqlalchemy/ext/declarative/base.py#L695-L697 (we don't set "__mapper_cls__" on any classes, so we're using the default `mapper`)

Mike Bayer

unread,

Nov 30, 2020, 6:27:19 PM11/30/20

to noreply-spamdigest via sqlalchemy

On Mon, Nov 30, 2020, at 5:17 PM, Bill Finn wrote:

Hello!

We're running a unit test that includes constructing ≈330 tables & ≈2,000 columns total in a local PostgreSQL 10.13 database (on disk, not in memory). After profiling the test, we found that it takes ≈4 seconds to create all of the SQLAlchemy `Mapper` objects [0], which are currently a prerequisite to running the test. This time doesn't include emitting any SQL -- just constructing the in-memory python mapper objects.

So, my questions are:

1. Is this performance about expected for the number of tables & columns? Or does it point to something we may be doing wrong?

it seems a bit steep but mapper construction is optimized to cache a lot of work up front so subsequent operations, namely loading objects, can be faster. to support all that it has to apply Python instrumentation to every attribute on every class and also construct structures that are used for value tracking, load and persist operations of every attribute on every class.

It depends on how complex the relationships between these mappers are and if you are using inheritance, things like that. 330 mappers is a very large model however, it's not going to feel snappy if you have to build the entire series of them.

2. Are there any tips or tricks for structuring tests so that a single test that requires the database schema runs quickly? I read through the "Testing" section of Essential SQLAlchemy, and we'd like to avoid mocking, if possible.

it seems unusual you have just a single unit test that requires a 300-mapper model to be built up and then i presume torn down? if you have many unit tests that use this model you'd want to build the model when the test suite starts, then use it for all of the tests without tearing it down.

As far as due diligence goes, I read through the Performance FAQ, but I didn't see a mention of debugging slowness initializing the mapper objects, and I found this StackOverflow post re: mapping slowness in the application server, but that's not my primary concern.

It's not a typical issue as an application with such a huge model is typically a long running server of some kind so a few seconds for startup is not very critical. test suites would build up the model at startup just once and run all tests against the same model.

Versions:
PostgreSQL 10.13
SQLAlchemy 1.3.20

Happy to provide more details if you think they'd be helpful.

Thanks,
Bill

[0] https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_20/lib/sqlalchemy/ext/declarative/api.py#L74
https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_20/lib/sqlalchemy/ext/declarative/base.py#L695-L697 (we don't set "__mapper_cls__" on any classes, so we're using the default `mapper`)

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/1958022d-e679-4d9f-b2de-91ef7926dde9n%40googlegroups.com.

Bill Finn

unread,

Nov 30, 2020, 7:37:32 PM11/30/20

to sqlal...@googlegroups.com

Hi Mike,

Thanks very much for the helpful response. I'll include a few follow-up notes below, but I suspect the solution to our problem is either to:

1. partition our tests such that any given test only imports the models that it depends on (rather than blanketly importing all of the model classes to construct the entire set of metadata); and/or

2. separate more of our tests from the SQLAlchemy models, so most of our tests don't depend on the ORM at all

(1) may be tricky given the highly relational nature of our data.

On Mon, Nov 30, 2020 at 11:27 PM Mike Bayer <mik...@zzzcomputing.com> wrote:

On Mon, Nov 30, 2020, at 5:17 PM, Bill Finn wrote:
Hello!

We're running a unit test that includes constructing ≈330 tables & ≈2,000 columns total in a local PostgreSQL 10.13 database (on disk, not in memory). After profiling the test, we found that it takes ≈4 seconds to create all of the SQLAlchemy `Mapper` objects [0], which are currently a prerequisite to running the test. This time doesn't include emitting any SQL -- just constructing the in-memory python mapper objects.

So, my questions are:

1. Is this performance about expected for the number of tables & columns? Or does it point to something we may be doing wrong?

it seems a bit steep but mapper construction is optimized to cache a lot of work up front so subsequent operations, namely loading objects, can be faster. to support all that it has to apply Python instrumentation to every attribute on every class and also construct structures that are used for value tracking, load and persist operations of every attribute on every class.

It depends on how complex the relationships between these mappers are and if you are using inheritance, things like that. 330 mappers is a very large model however, it's not going to feel snappy if you have to build the entire series of them.

I see -- ok, thanks. We have roughly 600 foreign key columns, and we are using multi-table inheritance in a number of places.

2. Are there any tips or tricks for structuring tests so that a single test that requires the database schema runs quickly? I read through the "Testing" section of Essential SQLAlchemy, and we'd like to avoid mocking, if possible.

it seems unusual you have just a single unit test that requires a 300-mapper model to be built up and then i presume torn down? if you have many unit tests that use this model you'd want to build the model when the test suite starts, then use it for all of the tests without tearing it down.

Yep, that makes sense. We do build the model once per test suite invocation, so the marginal time cost per test is quite small. However, we really would like to reduce the time it takes to run a single test to make TDD faster.

As far as due diligence goes, I read through the Performance FAQ, but I didn't see a mention of debugging slowness initializing the mapper objects, and I found this StackOverflow post re: mapping slowness in the application server, but that's not my primary concern.

It's not a typical issue as an application with such a huge model is typically a long running server of some kind so a few seconds for startup is not very critical. test suites would build up the model at startup just once and run all tests against the same model.

Makes sense -- thank you!

Bill

Mike Bayer

unread,

Dec 1, 2020, 8:58:25 AM12/1/20

to noreply-spamdigest via sqlalchemy

On Mon, Nov 30, 2020, at 7:37 PM, Bill Finn wrote:

Hi Mike,

Thanks very much for the helpful response. I'll include a few follow-up notes below, but I suspect the solution to our problem is either to:

1. partition our tests such that any given test only imports the models that it depends on (rather than blanketly importing all of the model classes to construct the entire set of metadata); and/or
2. separate more of our tests from the SQLAlchemy models, so most of our tests don't depend on the ORM at all

(1) may be tricky given the highly relational nature of our data.

do your "tests" each run in separate processes? or is the use case when you are working on something and just want to run one test?

There is one area that SQLAlchemy could improve here and it is something I've considered, but not yet proposed, which is that if you are using different ORM "registries", which in 1.3 is the declarative_base(), however in 1.4 there is now a "registry" used for all ORM mappings, that the "configure_mappers()" step, which is where a large portion of your time is being spent, would only take place for the subset of mappers local to a single registry at a time. More simply, you could segment your classes among multiple "declarative base" objects and only have the "configure" step occur for the ones that you need.

I expect that if I add this feature it will produce new kinds of regressions I'm not thinking of which is one of the reasons I'm hesitant.

On Mon, Nov 30, 2020 at 11:27 PM Mike Bayer <mik...@zzzcomputing.com> wrote:

On Mon, Nov 30, 2020, at 5:17 PM, Bill Finn wrote:
Hello!

We're running a unit test that includes constructing ≈330 tables & ≈2,000 columns total in a local PostgreSQL 10.13 database (on disk, not in memory). After profiling the test, we found that it takes ≈4 seconds to create all of the SQLAlchemy `Mapper` objects [0], which are currently a prerequisite to running the test. This time doesn't include emitting any SQL -- just constructing the in-memory python mapper objects.

So, my questions are:

1. Is this performance about expected for the number of tables & columns? Or does it point to something we may be doing wrong?

it seems a bit steep but mapper construction is optimized to cache a lot of work up front so subsequent operations, namely loading objects, can be faster.    to support all that it has to apply Python instrumentation to every attribute on every class and also construct structures that are used for value tracking, load and persist operations of every attribute on every class.

It depends on how complex the relationships between these mappers are and if you are using inheritance, things like that.   330 mappers is a very large model however, it's not going to feel snappy if you have to build the entire series of them.

I see -- ok, thanks. We have roughly 600 foreign key columns, and we are using multi-table inheritance in a number of places.
2. Are there any tips or tricks for structuring tests so that a single test that requires the database schema runs quickly? I read through the "Testing" section of Essential SQLAlchemy, and we'd like to avoid mocking, if possible.

it seems unusual you have just a single unit test that requires a 300-mapper model to be built up and then i presume torn down?    if you have many unit tests that use this model you'd want to build the model when the test suite starts, then use it for all of the tests without tearing it down.

Yep, that makes sense. We do build the model once per test suite invocation, so the marginal time cost per test is quite small. However, we really would like to reduce the time it takes to run a single test to make TDD faster.
As far as due diligence goes, I read through the Performance FAQ, but I didn't see a mention of debugging slowness initializing the mapper objects, and I found this StackOverflow post re: mapping slowness in the application server, but that's not my primary concern.

It's not a typical issue as an application with such a huge model is typically a long running server of some kind so a few seconds for startup is not very critical.   test suites would build up the model at startup just once and run all tests against the same model.

Makes sense -- thank you!

Bill

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CANTn%2BY09s96J7KYf03cjtRvVtYUud4_dSLQLfFCb9JuyfzhY6g%40mail.gmail.com.

Bill Finn

unread,

Dec 1, 2020, 11:04:22 AM12/1/20

to sqlal...@googlegroups.com

Hi Mike,

Thanks again for the response.

do your "tests" each run in separate processes?

Heh, is "tests" surrounding in quotes because we're constructing our entire data model for each test? FWIW, we call them integration tests, and we aren't under the illusion that they are unit tests. To answer your question, when we run the entire test suite, we parallelize ≈13k tests across tens of processes, and it takes 10-25 minutes.

or is the use case when you are working on something and just want to run one test?

Yes, this is the use case I'm trying to optimize for.

Bill

Mike Bayer

unread,

Dec 1, 2020, 2:42:33 PM12/1/20

to noreply-spamdigest via sqlalchemy

On Tue, Dec 1, 2020, at 11:03 AM, Bill Finn wrote:

Hi Mike,

Thanks again for the response.

do your "tests" each run in separate processes?

Heh, is "tests" surrounding in quotes because we're constructing our entire data model for each test? FWIW, we call them integration tests, and we aren't under the illusion that they are unit tests. To answer your question, when we run the entire test suite, we parallelize ≈13k tests across tens of processes, and it takes 10-25 minutes.

right I asked earlier if that's what you were doing. this is not unlike starting and stopping the database for each test, which would also make tests take many seconds each so it's better to create the datamodel just once per process and have it used among many tests (unless a single test is just one process?)

or is the use case when you are working on something and just want to run one test?

Yes, this is the use case I'm trying to optimize for.

for now you'd have to try to limit the import scope or otherwise perhaps identify if some element of the model is taking a particularly long time to initialize in which case we can take a look.

Bill

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CANTn%2BY2pA_6%3Df4mQU21j7oTehHTnCvGe2aYa8%2BynmkPggXKomQ%40mail.gmail.com.

Bill Finn

unread,

Dec 1, 2020, 3:16:45 PM12/1/20

to sqlal...@googlegroups.com

On Tue, Dec 1, 2020 at 7:42 PM Mike Bayer <mik...@zzzcomputing.com> wrote:

do your "tests" each run in separate processes?

Heh, is "tests" surrounding in quotes because we're constructing our entire data model for each test? FWIW, we call them integration tests, and we aren't under the illusion that they are unit tests. To answer your question, when we run the entire test suite, we parallelize ≈13k tests across tens of processes, and it takes 10-25 minutes.
right I asked earlier if that's what you were doing. this is not unlike starting and stopping the database for each test, which would also make tests take many seconds each so it's better to create the datamodel just once per process and have it used among many tests (unless a single test is just one process?)

Yes, this makes sense. When we run the entire test suite, we create the data model just once (per parallelized process), and this part is clear to me. Thank you.

or is the use case when you are working on something and just want to run one test?

Yes, this is the use case I'm trying to optimize for.
for now you'd have to try to limit the import scope or otherwise perhaps identify if some element of the model is taking a particularly long time to initialize in which case we can take a look.

Makes sense, thanks. FWIW, I'll link this older issue where you helped our slow test performance problems by updating SQLAlchemy's accessing of an enum's `__members__`:

https://github.com/sqlalchemy/sqlalchemy/issues/4758

Obviously, that problem has been addressed, but I wanted to say thank you!

Bill

Bill

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CANTn%2BY2pA_6%3Df4mQU21j7oTehHTnCvGe2aYa8%2BynmkPggXKomQ%40mail.gmail.com.

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/cb910e44-2c47-4d58-ae08-eb95152ba8a2%40www.fastmail.com.

Mike Bayer

unread,

Dec 1, 2020, 4:37:41 PM12/1/20

to noreply-spamdigest via sqlalchemy

On Tue, Dec 1, 2020, at 3:16 PM, Bill Finn wrote:

On Tue, Dec 1, 2020 at 7:42 PM Mike Bayer <mik...@zzzcomputing.com> wrote:
do your "tests" each run in separate processes?

Heh, is "tests" surrounding in quotes because we're constructing our entire data model for each test? FWIW, we call them integration tests, and we aren't under the illusion that they are unit tests. To answer your question, when we run the entire test suite, we parallelize ≈13k tests across tens of processes, and it takes 10-25 minutes.
right I asked earlier if that's what you were doing. this is not unlike starting and stopping the database for each test, which would also make tests take many seconds each so it's better to create the datamodel just once per process and have it used among many tests (unless a single test is just one process?)

Yes, this makes sense. When we run the entire test suite, we create the data model just once (per parallelized process), and this part is clear to me. Thank you.

or is the use case when you are working on something and just want to run one test?

Yes, this is the use case I'm trying to optimize for.
for now you'd have to try to limit the import scope or otherwise perhaps identify if some element of the model is taking a particularly long time to initialize in which case we can take a look.

Makes sense, thanks. FWIW, I'll link this older issue where you helped our slow test performance problems by updating SQLAlchemy's accessing of an enum's `__members__`:

https://github.com/sqlalchemy/sqlalchemy/issues/4758

oh that was a goofy one too, yeah. that one was major. could be another one lurking!

To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CANTn%2BY3LByMbJxiohHUXkwSL%3DpAH_hKePZ_7GvkRUew08LF_cQ%40mail.gmail.com.

Reply all

Reply to author

Forward