Setting CTYPE and COLLATE value

1,107 views
Skip to first unread message

Matthias Goerens

unread,
Jun 11, 2021, 4:37:05 PM6/11/21
to Postgres Operator
Hi all,

I'm trying to connect Synapse (Matrix Homeserver) to a Postgresql instance provided by the Postgres Operator. Synapse complains that:

2021-06-11 14:30:00,628 - synapse.storage.engines.postgres - 66 - WARNING - main - Database has incorrect collation of 'en_US.utf-8'. Should be 'C'
See docs/postgres.md for more information.
2021-06-11 14:30:00,628 - synapse.storage.engines.postgres - 73 - WARNING - main - Database has incorrect ctype of 'en_US.utf-8'. Should be 'C'
See docs/postgres.md for more information.
2021-06-11 14:30:00,654 - twisted - 258 - ERROR - main - ********************************************************
2021-06-11 14:30:00,654 - twisted - 258 - ERROR - main - Database is incorrectly configured:
2021-06-11 14:30:00,654 - twisted - 258 - ERROR - main -
2021-06-11 14:30:00,654 - twisted - 258 - ERROR - main - - 'COLLATE' is set to 'en_US.utf-8'. Should be 'C'
2021-06-11 14:30:00,654 - twisted - 258 - ERROR - main - - 'CTYPE' is set to 'en_US.utf-8'. Should be 'C'
2021-06-11 14:30:00,654 - twisted - 258 - ERROR - main -
2021-06-11 14:30:00,654 - twisted - 258 - ERROR - main - See docs/postgres.md for more information.
2021-06-11 14:30:00,654 - twisted - 258 - ERROR - main - ********************************************************



I have tried to solve this issue by providing a custom configuration as follow:

[mgoerens@mgoerens test-pgsql-k8s-op]$ cat set-encoding-c.yaml
---
bootstrap:
  initdb:
  - encoding: C
  - locale: C
[mgoerens@mgoerens test-pgsql-k8s-op]$ kubectl create configmap matrix-custom-config --from-file=set-encoding-c.yaml

[mgoerens@mgoerens test-pgsql-k8s-op]$ cat matrix-pgcluster.yaml
---
apiVersion: crunchydata.com/v1
kind: Pgcluster
metadata:
  annotations:
    current-primary: matrix
  labels:
    crunchy-pgha-scope: matrix
    deployment-name: matrix
    name: matrix
    pg-cluster: matrix
    pgo-version: 4.7.0
    pgouser: admin
  name: matrix
  namespace: default
spec:
  BackrestStorage:
    accessmode: ReadWriteOnce
    matchLabels: ""
    name: ""
    size: 1G
    storageclass: ""
    storagetype: create
    supplementalgroups: ""
  PrimaryStorage:
    accessmode: ReadWriteOnce
    matchLabels: ""
    name: matrix
    size: 1G
    storageclass: ""
    storagetype: create
    supplementalgroups: ""
  ReplicaStorage:
    accessmode: ReadWriteOnce
    matchLabels: ""
    name: ""
    size: 1G
    storageclass: ""
    storagetype: create
    supplementalgroups: ""
  annotations: {}
  ccpimage: crunchy-postgres-ha
  ccpimageprefix: registry.developers.crunchydata.com/crunchydata
  ccpimagetag: centos8-13.3-4.7.0
  clustername: matrix
  customconfig: matrix-custom-config
  database: matrix
  exporterport: "9187"
  limits: {}
  name: matrix
  pgDataSource:
    restoreFrom: ""
    restoreOpts: ""
  pgbadgerport: "10000"
  pgoimageprefix: registry.developers.crunchydata.com/crunchydata
  podAntiAffinity:
    default: preferred
    pgBackRest: preferred
    pgBouncer: preferred
  port: "5432"
  tolerations: []
  user: matrix
  userlabels:
    pgo-version: 4.7.0[mgoerens@mgoerens test-pgsql-k8s-op]$ kubectl apply -f matrix-pgcluster.yaml


However, in the logs of the database pod, I still get:

******************************
Patroni bootstrap method: initdb
******************************
Patroni configuration file:
******************************
bootstrap:
  method: initdb
  pgbackrest_init:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      primary'
    keep_existing_recovery_conf: true
  existing_init:
    command: '/opt/crunchy/bin/postgres-ha/bootstrap/create-from-existing.sh'
    keep_existing_recovery_conf: true
  dcs:
    postgresql:
      parameters:
        jit: off
        unix_socket_directories: /tmp
        wal_level: logical
        archive_mode: on
        archive_command: 'source /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-set-env.sh
          && pgbackrest archive-push "%p"'
        archive_timeout: 60
        log_directory: pg_log
        shared_buffers: 128MB
        temp_buffers: 8MB
        log_min_duration_statement: 60000
        log_statement: none
        work_mem: 4MB
        max_wal_senders: 6
        shared_preload_libraries: pgaudit.so,pg_stat_statements.so,pgnodemx.so
      use_slots: false
      recovery_conf:
        restore_command: 'source /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-set-env.sh
          && pgbackrest archive-get %f "%p"'
      use_pg_rewind: true
  post_bootstrap: /opt/crunchy/bin/postgres-ha/bootstrap/post-bootstrap.sh
  initdb:
  - encoding: UTF8
  - data-checksums

[...]
The database cluster will be initialized with locale "en_US.utf-8".
The default text search configuration will be set to "english".


I'm aware there is an issue already open on GitHub discussing the limitation of setting a specific LC_COLLATE value (https://github.com/CrunchyData/postgres-operator/issues/1869#issuecomment-690678945). But being rather new to this topic, I'm not able to determine if those limitations apply to me or if I have made a mistake in my configuration or procedure.

Help would be appreciated ! Thanks already !
Matthias

Jonathan S. Katz

unread,
Jun 11, 2021, 4:43:54 PM6/11/21
to Matthias Goerens, Postgres Operator
Hi Matthias,

To me this reads as an issue with how Synapse is configured when creating the database. Looking through some of their source, they include examples to initialize databases with both UTF8 and C collations. I would strongly recommend creating with a UTF8 collation.

It appears they include some documentation for how to do this:


Though I noticed that there are some inconsistencies in some of their files (e.g. https://github.com/matrix-org/synapse/blob/master/docker/run_pg_tests.sh#L14 which sets the UTF8 collation)

I would recommend running this command:

   createdb --encoding=UTF8 --locale=en_US.UTF-8 --template=template0 --owner=synapse_user synapse

Or running the "createdb" command without setting encoding/locale, which will then use the defaults that are in the container (which happen to be UTF8).

Thanks,

Jonathan

Jonathan S. Katz
VP Platform Engineering

Crunchy Data
Enterprise PostgreSQL 


Matthias Goerens

unread,
Jun 14, 2021, 7:26:33 AM6/14/21
to Jonathan S. Katz, Postgres Operator
Hi Jonathan, and thanks for your answer !

I went with your suggestion and had a closer look at the Synapse code
and documentation.

When connecting to an existing database, Synapse expects following
values to be set:
ENCONDNG='UTF8'
LC_COLLATE='C'
LC_CTYPE='C'

See: https://github.com/matrix-org/synapse/blob/master/synapse/storage/engines/postgres.py#L41

They also state that "Synapse will refuse to set up a new database if
it has the wrong values of COLLATE and CTYPE set, and will log
warnings on existing databases."
(https://github.com/matrix-org/synapse/blob/master/docs/postgres.md#fixing-incorrect-collate-or-ctype)

And provide an example how to create a DB for Synapse:

createdb --encoding=UTF8 --locale=C --template=template0
--owner=synapse_user synapse

See: https://github.com/matrix-org/synapse/blob/master/docs/postgres.md#set-up-database

It looks to me as I can't customize the expected locate (collate /
ctype) value in Synapse. It won't start if the provided DB isn't
configured with locale = C.

Therefore I'm back at trying to set a custom LC_COLLATE and LC_CTYPE
value in the postgres Operator. In my previous email, I described the
procedure I followed in order to create a "matrix" database with
locale = C and encoding = C using a custom configuration. I adapted it
to set only locale = C and leave encoding to its default value (UTF8)
as follows:
[mgoerens@mgoerens test-pgsql-k8s-op]$ cat set-encoding-c.yaml
---
bootstrap:
initdb:
- locale: 'C'
[mgoerens@mgoerens test-pgsql-k8s-op]$ kubectl create configmap
matrix-custom-config --from-file=set-encoding-c.yaml

The rest of the procedure is untouched.

However the matrix database is created with following values:

matrix=> SHOW SERVER_ENCODING;
server_encoding
-----------------
UTF8
(1 row)

matrix=> SELECT datcollate, datctype FROM pg_database WHERE datname =
current_database();
datcollate | datctype
-------------+-------------
en_US.utf-8 | en_US.utf-8
(1 row)


I'm still not sure if my procedure is incorrect or if there is a
limitation with setting a custom locale value.

Thanks,
Matthias

Matthias Goerens

unread,
Jun 17, 2021, 9:59:24 AM6/17/21
to Jonathan S. Katz, Postgres Operator
Hi Jonathan,

I had another look at my problem today and solved my issue.

It came from this command:

> kubectl create configmap matrix-custom-config --from-file=set-encoding-c.yaml

I didn't realize that the name of the file actually mattered. It
should have been:

> kubectl create configmap matrix-custom-config --from-file=postgres-ha.yaml

This way I'm able to set the desired "encoding" and "locale" values.

Thanks for your support.
Matthias
Reply all
Reply to author
Forward
0 new messages