Support for UNLOGGED tables in PostgreSQL

888 views
Skip to first unread message

Federico Capoano

unread,
Jul 12, 2015, 12:07:12 PM7/12/15
to django-d...@googlegroups.com
I just ran into this ticket:

I found myself in a situation where it would be really useful.

Two questions:
  • has anybody verified that UNLOGGED tables really perform faster when running tests with Postgres?
  • is this feature on the TODO list for the next release? If yest what needs to be done in ordere to help out?
Thanks
Federico

Curtis Maloney

unread,
Jul 12, 2015, 8:15:57 PM7/12/15
to django-d...@googlegroups.com
From http://www.postgresql.org/docs/9.4/static/sql-createtable.html

UNLOGGED

If specified, the table is created as an unlogged table. Data
written to unlogged tables is not written to the write-ahead log (see
Chapter 29), which makes them considerably faster than ordinary
tables. However, they are not crash-safe: an unlogged table is
automatically truncated after a crash or unclean shutdown. The
contents of an unlogged table are also not replicated to standby
servers. Any indexes created on an unlogged table are automatically
unlogged as well.

So, when PG says it's "considerably faster"... well, they're not known
for overstatement :)

Personally, I'd love to see it as an option [I'm sure I opened a
ticket about it somewhere] but last I tried to implement it the ORM
methods that looked promising were, in fact, misnamed.

I'm sure now that Andrew has significantly cleaned up the table
generation code, it should be feasible to add this option.

As far as "todo list"... there isn't one. Either someone stands up
and submits a PR, or they don't... if you don't, don't expect someone
else to. :)

--
Curtis
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/35f7d80b-7c79-4caf-be0f-7b9c1796fcb2%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Federico Capoano

unread,
Jul 13, 2015, 8:52:42 AM7/13/15
to django-d...@googlegroups.com
Hi Curtis,

thank you very much for the explaination and the feedback.

Regarding the TODO list, let me put it this way: touching the database related functionality in django is not very easy, so before attempting an approach that could be bad, it would have been great to have some hints, if there were a few suggestions on how to proceed, someone like me could try.

Federico

Federico Capoano

unread,
Jul 13, 2015, 10:26:26 AM7/13/15
to django-d...@googlegroups.com
Florian gave me an useful suggestion on IRC, so I was able to run a few quick and dirty tests.

I edited these two files:

db/backends/base/schema.py line 37

    sql_create_table = "CREATE UNLOGGED TABLE %(table)s (%(definition)s)"

contrib/gis/db/backends/postgis/schema.py line 80

    def create_model(self, model):
        super(PostGISSchemaEditor, self).create_model(model)
        # Create geometry columns
        for sql in self.geometry_sql:
            if not 'USING GIST' in sql:
                self.execute(sql)
        self.geometry_sql = []

I had to do this one because I was getting "django.db.utils.NotSupportedError: unlogged GiST indexes are not supported"

I measured the tests with both settings, first without the UNLOGGED trick, then with the trick.

without UNLOGGED: Ran 260 tests in 165.620s
with UNLOGGED: Ran 260 tests in 164.655s

It seems that there is no difference for me.
I already have fine tuned my /etc/postgresql/9.1/main/postgresql.conf

fsync = off
synchronous_commit = off
full_page_writes = off

Now I turned off these optimizations and I tried running again the tests with and without the trick.

without UNLOGGED: Ran 260 tests in 445.451s
with UNLOGGED: Ran 260 tests in 478.913s

For some reason in this case UNLOGGED performed even worse.

So for the moment I'll stay with the postgresql.conf optimizations and ditch the idea of UNLOGGED tables.
Unless some PG expert would find something wrong in my settings and gave some advice on how to properly repeat the tests.

To recap, according to my tests:

UNLOGGED tables does not give me a performance boost

While these are the winners:
  • fsync = off
  • synchronous_commit = off
  • full_page_writes = off
  • django 1.8 --keepdb option
Federico

Aymeric Augustin

unread,
Jul 13, 2015, 11:54:08 AM7/13/15
to django-d...@googlegroups.com
Hello Federico,

2015-07-13 16:26 GMT+02:00 Federico Capoano <federico...@gmail.com>:
While these are the winners:
  • fsync = off
  • synchronous_commit = off
  • full_page_writes = off
Would you like to submit a patch to docs/ref/databases.txt, in the "PostgreSQL
notes" section, explaining this configuration?

I'm sure it would help many people run their tests faster on PostgreSQL. It would
also counter the trend to run tests on SQLite "because it's faster" — it isn't.

--
Aymeric.

Federico Capoano

unread,
Jul 13, 2015, 12:46:30 PM7/13/15
to django-d...@googlegroups.com
Sure, i've just done something similar for this project:

The keepdb option allows to avoid recreating the test database at each run, hence saving precious time.

If you want to speed up tests even more, tweak your local postgresql configuration by setting these values:

# /etc/postgresql/9.1/main/postgresql.conf
# only for development!
fsync = off
synchronous_commit = off
full_page_writes = off

We just have to make sure people actually find this information via search engines.

The best thing would be to have a subsection called "How to speed up tests with Postgres" or something similar.

What do you think?

Federico

Schmitt, Christian

unread,
Jul 13, 2015, 1:29:12 PM7/13/15
to django-d...@googlegroups.com
Wouldn't it be enough to just have link to the correct PostgreSQL site: http://www.postgresql.org/docs/current/static/non-durability.html
And then some text with "if you want to run your tests on postgresql please see the docs around non durable postgres"

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Federico Capoano

unread,
Jul 14, 2015, 4:20:45 AM7/14/15
to django-d...@googlegroups.com

That's also a viable alternative, although it also mention unlogged tables, which are not supported by django.

We could list the quick hint with config sample, a link to pg non durable options page and a warning that UNLOGGED tables are not supported yet. What do you think about this?

BTW has anyone had the time to try unlogged tables with their test suite to see if it gave them any improvements? I am quite surprised i didn't even get a 5-10% improvement out of the promised 10-17%.

Federico

You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/IkRgMxTTzPQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.

To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Tim Graham

unread,
Jul 14, 2015, 1:03:03 PM7/14/15
to django-d...@googlegroups.com
I see a thirty second increase in the test suite (from 7.5 minutes to 8 minutes) on my local machine with:


sql_create_table = "CREATE UNLOGGED TABLE %(table)s (%(definition)s)"

Federico Capoano

unread,
Jul 15, 2015, 9:19:47 AM7/15/15
to django-d...@googlegroups.com
That's quite baffling.

Shai Berger

unread,
Jul 15, 2015, 3:01:59 PM7/15/15
to django-d...@googlegroups.com
This is a shot in the dark: Could it be that rolling back transactions
involving unlogged tables is harder? The idea does make sense, and running the
test suite does an extremely untypical amount of rollbacks.

Curtis Maloney

unread,
Jul 15, 2015, 11:35:22 PM7/15/15
to django-d...@googlegroups.com
On 16 July 2015 at 05:01, Shai Berger <sh...@platonix.com> wrote:
> This is a shot in the dark: Could it be that rolling back transactions
> involving unlogged tables is harder? The idea does make sense, and running the
> test suite does an extremely untypical amount of rollbacks.

I thought at some point I read that unlogged tables didn't support
transactions... however, the docs don't agree.

I was also considering going to the PG people and asking if the
UNLOGGED docs needed review... there could be all manner of reasons
why, on modern hardware, going via the WAL is as fast now...

--
C

Christophe Pettus

unread,
Jul 16, 2015, 2:34:56 AM7/16/15
to django-d...@googlegroups.com

On Jul 15, 2015, at 8:35 PM, Curtis Maloney <cur...@acommoncreative.com> wrote:

> On 16 July 2015 at 05:01, Shai Berger <sh...@platonix.com> wrote:
>> This is a shot in the dark: Could it be that rolling back transactions
>> involving unlogged tables is harder? The idea does make sense, and running the
>> test suite does an extremely untypical amount of rollbacks.
>
> I thought at some point I read that unlogged tables didn't support
> transactions... however, the docs don't agree.

Transactions behave the same in PostgreSQL for both logged and unlogged tables (except for, of course, the lack of a commit / rollback entry in the WAL), and there's no appreciable performance benefit on COMMIT and ROLLBACK time for logged vs unlogged.

My guess is that the Django tests are not generating enough data to make the WAL activity be a significant time sink.

By the way, I would strongly advise *against* *ever* even mentioning fsync = off anywhere in the Django documentation; that is such a horribly bad idea in 99.95% of real-life situations that steering people towards it as a "go faster" button is very unwise.

--
-- Christophe Pettus
x...@thebuild.com

Federico Capoano

unread,
Jul 16, 2015, 4:16:29 AM7/16/15
to django-d...@googlegroups.com
Hey Christophe,

On Thu, Jul 16, 2015 at 8:34 AM, Christophe Pettus <x...@thebuild.com> wrote:
[CUT]
>
> By the way, I would strongly advise *against* *ever* even mentioning fsync = off anywhere in the Django documentation; that is such a horribly bad idea in 99.95% of real-life situations that steering people towards it as a "go faster" button is very unwise.

We were not mentioning to advise to set non durable options for
production use, but for development, which I think it's more than
accetable, considering that I got a 70% boost, which together with the
django 1.8 --keepdb option and a bit of mocking, got my entire huge
test suite run in less than one minute, while running tests for
specific modules now (thank God) take seconds.
This is immensely important if you are doing TDD or refactoring often.

I also don't like the idea of believing django users are too stupid to
understand that this advice si valid for development only. Generally
python and django users are intelligent enough to properly read the
docs and understand what's written on it.

Federico

Christophe Pettus

unread,
Jul 16, 2015, 11:49:25 AM7/16/15
to django-d...@googlegroups.com

On Jul 16, 2015, at 1:16 AM, Federico Capoano <federico...@gmail.com> wrote:

> I also don't like the idea of believing django users are too stupid to
> understand that this advice si valid for development only. Generally
> python and django users are intelligent enough to properly read the
> docs and understand what's written on it.

It's not a matter of being "intelligent" or not. Developers are busy and can simply google things, see a particular line, and drop it in without fully understanding exactly what is going on. (Simply read this group for a while if you don't believe this to be the case!) People already turn off fsync, in production, after having read the PostgreSQL documentation, without actually realizing that they've put their database in danger.

Among other things, developers often have local data in their PostgreSQL instance that is valuable, and advising them to do a setting that runs the risk of them losing that data seems like a bad idea.

The Django documentation is not the place to go into the ramifications of fsync (or even synchronous_commit, although that's significantly less risky).

Luke Plant

unread,
Jul 19, 2015, 1:32:31 PM7/19/15
to django-d...@googlegroups.com
I agree with Federico on this - as long as we slap a big warning on it — "This is dangerous - it could make your database more likely to lose data or become corrupted, only use on a development machine where you can restore the entire contents of all databases in the cluster easily" — I don't see a problem in this being in our docs.

If people refuse to read a clear warning, they shouldn't be doing web development. They are just as likely to find similar instructions on the internet, but without warnings, and having it in our docs with the warning will be helpful.

Having a fast test suite is such an important part of development that it shouldn't be held back by  attempting to protect the world from people who cannot be helped.

Luke
-- 
"I was sad because I had no shoes, until I met a man who had no 
feet. So I said, "Got any shoes you're not using?"  (Steven Wright)

Luke Plant || http://lukeplant.me.uk/

Christophe Pettus

unread,
Jul 19, 2015, 1:43:42 PM7/19/15
to django-d...@googlegroups.com
This can be achieved by pointing to the relevant section in the PostgreSQL documentation with a general "Test execution may be sped up by adjusting the data integrity parameters in PostgreSQL; be sure to read the appropriate warnings before making any changes" warning.

Putting actual recommended settings in the Django documentation seems, at a minimum, pointlessly duplicative, and ties the Django documentation to the current state of the world in PostgreSQL gratuitously.
> --
> You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/55ABDF21.9060106%40cantab.net.
> For more options, visit https://groups.google.com/d/optout.

Aymeric Augustin

unread,
Jul 19, 2015, 4:50:16 PM7/19/15
to django-d...@googlegroups.com
I agree with pointing to the relevant section of the PostgreSQL documentation. It will always be more complete, accurate and up-to-date that what we could write.

--
Aymeric.
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/648BE2F3-E869-4E9D-BCB9-248E425D5A1C%40thebuild.com.

Curtis Maloney

unread,
Jul 19, 2015, 8:47:47 PM7/19/15
to django-d...@googlegroups.com
I second what Aymeric say.... rather than take on the burden of
maintaining correct warnings, let's point at the people whose
responsibility it really is :)

--
Curtis

On 20 July 2015 at 06:44, Aymeric Augustin
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/DF490E5E-DC25-44B6-88D5-36D9A2DBC1AA%40polytechnique.org.

Federico Capoano

unread,
Jul 20, 2015, 5:43:05 AM7/20/15
to django-d...@googlegroups.com
Thank you for all the feedback.

I opened this PR with the proposed addition to the docs:
https://github.com/django/django/compare/master...nemesisdesign:patch-1

Let me know if there's anything I can improve.

Just a final note: I think this solution is not the optimal one, let
me explain you why. The main reason I came on this list asking for
information about UNLOGGED tables in postgresql was exactly because of
that page which suggests a few solutions. I spent quite many hours
trying all of them (including running the entire DB in RAM), and
measuring test execution time with each one of those settings. In the
end I figured out that the only settings worth touching in my setup
are those 3 I mentioned before.

In order to share what I learnt and hoping to avoid some pain to
people that will come on the same path, I quickly wrote a blog post
with these suggestions:
http://nemesisdesign.net/blog/coding/how-to-speed-up-tests-django-postgresql/

Nothing new really, I found a few other blog posts with similar
suggestions, but the suggestions were scattered on different pages and
they didn't mention which changes were the most effective ones. That's
why I felt the need of writing this.

And by the way, I'm really happy of the outcome!

Federico
> You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/IkRgMxTTzPQ/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAG_XiSDo3c3O4Na%3DjtMHYftHc0PBYfsrvU7feDso%2B8Y889BOeA%40mail.gmail.com.

Schmitt, Christian

unread,
Jul 21, 2015, 2:46:29 AM7/21/15
to django-d...@googlegroups.com
I mean I made the proposal, but one drawback will be when linking to Postgresql Docs, what if they change the links? I mean that's barely happend in the past, however the "http://www.postgresql.org/docs/current/static/non-durability.html" is fairly new. (9.x). 
So we need to track every major upgrade on Postgresql if the docs changed.

Florian Apolloner

unread,
Jul 21, 2015, 3:02:04 AM7/21/15
to django-d...@googlegroups.com
The doc building process includes a linkchecker, if the link ceases to exist we will get notified (so the worst thing could be that the content changes to something completely different).

Federico Capoano

unread,
Jul 21, 2015, 4:24:13 AM7/21/15
to django-d...@googlegroups.com
I just changed the URL in the to
http://www.postgresql.org/docs/current/static/non-durability.html as
suggested.

Federico

Shai Berger

unread,
Jul 27, 2015, 7:02:26 AM7/27/15
to django-d...@googlegroups.com
On Monday 20 July 2015 12:42:47 Federico Capoano wrote:
>
> Just a final note: I think this solution is not the optimal one

I agree. The optimal solution would be for us to be able to define a test
database with unlogged tables and all the data-integrity-sync options off,
while the "production" database (production in quotes because it applies also
to the main development database) keeps sane settings.

Since PostgreSQL defines most of these at the cluster level, this would require
setting up a separate cluster for testing (Debian makes this easy with its
pg_createcluster command, but on other systems it would be more involved). But
on the Django side, this requires the ability to specify accessing a different
cluster (that is, usually, separate ports on the machine) for testing. We
currently do not have this ability.

The only danger I see in this is a "slippery slope" -- next thing, people will
want a separate engine for testing, and before we know it, all testing will be
run only against Sqlite. I don't think this fear is justified.

Are there good reasons to avoid giving users more control over their test
databases?

Shai.
Reply all
Reply to author
Forward
0 new messages