Hi,
I started to look into the regression failure with the multibyte
regression tests in src/test/mb, reported as
https://github.com/greenplum-db/gpdb/issues/5241. The root cause of the
failure is confusion over client_encoding between the QD and the QEs.
Currently, we don't do anything special with client_encoding in QE
processes, so its value is set like any other GUC. In most cases, it
means that the QE processes have the same client_encoding setting as the
QD process. That's problematic.
If you look at the whole pipeline from the user to the QE process, we
have a situation like this:
                       QD                             QE
client <==> client_enc <-> server_enc <==> client_enc <-> server_enc
When a query enters the system via libpq, from the left, it is first
converted from client_encoding to server encoding, in the QD process. QD
plans the query, and dispatches it to the QE processes.
You might think that things go wrong right there, because the QE will
try to perform the client->server encoding conversion again. But it
works, because the QE->QE dispatching is performed using the special 'M'
message type, and the code in tcop/postgres.c that handles it doesn't
perform encoding conversion. So the query is expected to already be in
server encoding, which is correct. Transferring the query result back to
the client also works, because the results from QEs to QD are not sent
via the libpq connection. They are sent via the interconnect, which also
doesn't do any encoding conversions. So all that communication between
the QD and the QEs happens in the server encoding.
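To illustrate, the message loop in tcop/postgres.c distinguishes the
two paths roughly like this. This is a simplified sketch, not the
actual gpdb code: pq_getmsgstring() and pq_getmsgrawstring() are the
real upstream functions (with and without client-to-server
conversion), but the real 'M' message carries more than just the
query string:

switch (firstchar)
{
    case 'Q':   /* simple query, straight from a client */
        /* pq_getmsgstring() converts client_encoding -> server encoding */
        query_string = pq_getmsgstring(&input_message);
        exec_simple_query(query_string);
        break;

    case 'M':   /* MPP dispatch, from the QD */
        /*
         * pq_getmsgrawstring() performs no conversion: the query is
         * expected to already be in the server encoding.
         */
        query_string = pq_getmsgrawstring(&input_message);
        exec_mpp_query(query_string);
        break;
}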
Except for a few things:
1. COPY TO is one exception: the data is converted to the client
encoding already in the QE (see the sketch after this list for where
that conversion happens). That's mostly handled correctly, except in
the case exposed in github issue #5241, where the QE's client_encoding
setting is different from the QD's.
2. ERROR processing. If an error happens in a QE, the QE converts it
from the server to the client encoding, and sends it back to the QD.
The QD incorrectly assumes that it's in the server encoding, and will
try to convert it again (the byte-level details are spelled out after
this list). As a result, you get e.g. this:
postgres=# create function raise_error(t text) returns void as $$
begin
  -- Unicode code point 196 is "Latin Capital Letter A with Diaeresis".
  raise 'raise_error called on %. Here''s a funny character: %', t, chr(196);
end;
$$ language plpgsql;
CREATE FUNCTION
postgres=# set client_encoding=latin1;
SET
postgres=# select raise_error(t) from enctest;
FATAL:  Unexpected internal error (elog.c:259)
DETAIL:  FailedAssertion("!(pg_verifymbstr(*str, len, ((bool) 1)))", File: "elog.c", Line: 259)
HINT:  Process 32757 will wait for gp_debug_linger=120 seconds before termination.
Note that its locks and other resources will not be released until then.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
3. Internal queries dispatched directly with the CdbDispatchCommand()
function. They don't use the interconnect, and send the results from QEs
back to the QD via libpq. Most such queries return ASCII-only data, like
tuple counts, so the encoding doesn't matter. But there's one exception:
ANALYZE collects the sample from the segments using
CdbDispatchCommand(). So you can get invalidly encoded data in pg_stats.
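To spell out what goes wrong in the error case above (assuming the
database encoding is UTF-8): chr(196) is 'Ä', which is the two bytes
0xC3 0x84 in UTF-8. The QE sees client_encoding=latin1 and converts
the message to latin1, turning 'Ä' into the single byte 0xC4. The QD
then treats the message as UTF-8, in which 0xC4 must be followed by a
continuation byte, so the string fails pg_verifymbstr() and you get
the assertion failure shown above.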
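And for the COPY TO case in item 1, the QE side does, in effect,
something like the sketch below. pg_server_to_client() is the real
conversion function (src/backend/utils/mb/mbutils.c); the surrounding
function and the send_to_qd() helper are made up for illustration:

static void
copy_send_row(StringInfo row)
{
    char   *converted;

    /*
     * Converts from the server encoding to whatever *this* backend's
     * client_encoding is set to. If the QE's setting has drifted from
     * the QD's, as in issue #5241, the client ends up with data in
     * the wrong encoding.
     */
    converted = pg_server_to_client(row->data, row->len);

    send_to_qd(converted, strlen(converted));

    /* pg_server_to_client() returns its input if no conversion was needed */
    if (converted != row->data)
        pfree(converted);
}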
How to fix this? I think there are two possible solutions:
1. Teach the QD that things coming from the QEs have already been
converted to the client encoding. This isn't as simple as it might
seem at first glance. You could forward any errors to the client as
is, since they're already in the client's encoding, but then we'd need
to keep track of which errors originated from a QE and don't need
encoding conversion, and which do. Also, if an error is caught e.g.
with a PL/pgSQL EXCEPTION block, rather than sent to the client, then
we'd need to convert it back from client_encoding to the server
encoding. We'd also need to fix the case exposed in github issue
#5241, where the client_encodings in QD and QE currently go out of
sync.
2. Always set client_encoding to match server encoding in QEs. This
seems simpler, but there are a couple of little complications with this
approach, too. First, in COPY TO, we should still perform the encoding
conversion in the QE, so that will need a special case. That's just a
simple matter of programming, though, because you can already specify a
non-default encoding in the COPY options, so we can make use of that. A
thornier case is in the NOTICE processing. Currently, we install a libpq
NOTICE processor in the QD->QE libpq connections, which just forwards
the NOTICE to the client, without encoding conversion. It runs as a
libpq callback function, so it cannot do anything that ereport()s, so it
is not safe to do encoding conversion there. We'll need to rewrite
that so that the NOTICE processor just stashes the NOTICE message
away, and we convert and send it to the client later, when we're out
of the callback function (sketched below).
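For reference, the stash-and-forward idea could look roughly like
this. PQsetNoticeReceiver() and PQresultErrorMessage() are real libpq
APIs; the struct, the fixed-size buffer and the surrounding names are
made up for illustration:

#include <stdbool.h>
#include <stdio.h>
#include "libpq-fe.h"

typedef struct NoticeStash
{
    char    msg[8192];      /* fixed buffer: no palloc() in the callback */
    bool    have_notice;
} NoticeStash;

static NoticeStash stash;

/*
 * Runs as a libpq callback, so it must not ereport(), palloc() or do
 * encoding conversion; it only copies the message aside.
 */
static void
stash_notice(void *arg, const PGresult *res)
{
    NoticeStash *st = (NoticeStash *) arg;

    snprintf(st->msg, sizeof(st->msg), "%s",
             PQresultErrorMessage(res));
    st->have_notice = true;
}

/* at connection setup: */
static void
setup_notice_stash(PGconn *conn)
{
    PQsetNoticeReceiver(conn, stash_notice, &stash);
}

/* later, in normal backend code, where ereport() is safe again: */
static void
forward_stashed_notice(void)
{
    if (stash.have_notice)
    {
        /* convert server encoding -> client_encoding and forward it */
        stash.have_notice = false;
    }
}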
So, I'm leaning towards solution 2: always set client_encoding to match
server encoding (or to SQL_ASCII) in the QE processes. Anyone see a
problem with that?
- Heikki