Hi GPDB Developers,
Users have reported that when they were inserting data into GPDB from PXF and the PXF service encountered an error, the query would complete successfully with a partial load of data from the external data source. See
PXF-898 for more details. The PXF team has been looking into this issue and it looks like this behavior was the result of two bugs:
1. When the PXF service is reading from an external data source and sending the data back to GPDB, it uses HTTP chunked transfer encoding to stream the data. In older versions of the embedded web server the PXF service uses (Apache Tomcat), when an error
was reported the streaming response would stop writing data, write an error response, and then close the connection in such a way that from an HTTP protocol point of view, the transfer was completed successfully. The means that PXF's custom GPDB protocol does
not see any errors and therefore doesn't report an error.
In later versions of Apache Tomcat, the behavior was changed to close the connection without properly terminating the chunked transfer encoding. PXF has since upgraded to using this newer version of Tomcat and this results in newer versions of PXF detecting
the error in the custom GPDB protocol and raising an error. The end result is that the transaction is ended.
2. There is a second bug in GPDB's fileam.c with regards to `FORMATTER_GET_SAW_EOF` that prevented PXF's custom formatter from detecting the error. This issue was also recently reported via the Greenplum Users mailing list (see message
EOF in Custom Formatter for External Table) and the reporter comes to more or less the same conclusion:
> Using FORMATTER_GET_SAW_EOF never allows to see a true flag, thus I think there is a misunderstanding how it works
For some context, PXF C code contains 2 parts. The first is the C side protocol that connects to the PXF Java service and streams data into GPDB. The second part is the custom formatter. This takes the data streamed in and parses it into tuples for GPDB
to read. The two work in tandem to continuously load and parse a data buffer. Most other custom protocols also work in this manner.
GPDB calls into the formatter code and the PXF extension protocol using `externalgettup_custom`. This function contains a double `while`-loop to handle the incoming data stream and the formatter. There are also two important variables to track in this
function:
* `scan->raw_buf_done` - true if either there is no more data to read or the formatter has handled all the data currently in the buffer, i.e: the formatter can do no more processing on the data in the buffer
* `scan->fs_state->fe_eof` - true if EOF has been seen
We go into the outer-`while`-loop if `!(raw_buf_done == true && fe_eof == true)`. This means that if either `raw_buf_done == false` or `fe_eof == false` or both are false. Once we go into the outer-`while`-loop, we call into `external_getdata` if `raw_buf_done
== true`. This function calls the protocol to get more data if needed and sets `fe_eof` accordingly. If the protocol returned 0 bytes, `fe_eof` is set to true. If data is loaded into the buffer, `raw_buf_done` is set to false.
We go into the inner-`while`-loop when `raw_buf_done == false` at which point we call the custom formatter. If the formatter returns `FORMATTER_NEEDS_MORE_DATA`, we set `raw_buf_done = true` as we have read all the current data in the buffer and exit the
inner-`while`-loop.
Let's say that the incoming data stream ended unexpectedly and the call into the protocol to get more data returned 0 bytes. Then we would have `raw_buf_done = true` and `fe_eof = true`. In this scenario, the formatter has called for more data but the
conditions for both `while` loops are such that custom formatter will not be called again and instead we report `WARNING` with the message `unexpected end of file`.
The key takeaway is that the custom formatter function was never called _after_ the EOF was detected. This prevents any error handling in the custom formatter from detecting the case where there is still unformatted data but EOF has been seen (e.g,
PXF's
custom formatter). If the transport (in this case, HTTP chunked transfer encoding) completes without a _transport_-level error, there is no way for the _application_-level to detect partial data transfers.
> The formatter needs more data, but we have reached EOF. This is an error.
but reports at the `WARNING` level.
PR 5391 refactored the code inside `externalgettup_custom` to change the reporting from
`ERROR` to `WARNING`. This PR also modified the `while`-loops so that the custom formatter function is no longer called _after_ EOF is seen. Is this change in behavior expected/intentional?
We could change the report from `WARNING` back to `ERROR` since partial data loading seems especially problematic, but this might be considered a breaking change in behavior. Alternatively, ensuring the custom formatter function is called at least once
_after_ the EOF is seen would allow custom formatters to detect and handle the case where there is still some unprocessed data but EOF has been seen.
An in-depth step through of a local reproduction of the issue can be found below:
* `scan->raw_buf_done = false`
* `scan->fs_pstate->fe_eof = false`
2. The condition of the outer-`while`-loop (`!(scan->raw_buf_done && pstate->fe_eof)`) is `true`, so execution proceeds into the loop body.
3. The condition of the `if`-statement (`scan->raw_buf_done`) is `false`, the execution skips over the body of the statement.
4. The condition of the inner-`while`-loop (`!scan->raw_buf_done`) is `true`, so execution proceeds into the loop body, where the custom formatter function is invoked (line 1001).
5. The formatter function returns without an error, having set `formatter->fmt_notification = FMT_NEED_MORE_DATA` to indicate that it needs more data to form a tuple from the data buffer. This causes `scan->raw_buf_done` to be set to `true` and execution
to continue with the next iteration of the inner-`while`-loop.
6. The condition of the inner-`while`-loop is now `false`, so execution continues with next statement after the loop body which is the end of the outer-`while`-loop.
7. The condition of the outer-loop is still `true` (since `pstate->fe_eof = false`), so execution proceeds into the loop body.
8. The condition of the `if`-statement is `true`, so execution proceeds into the statement body.
9. The function `external_getdata` returns `0` and the inner-`if`-statement (lines 967--971) are skipped, meaning `scan-raw_buf_done` is still `true`. This also has caused `pstate->fe_eof` to be set to `true`.
10. The condition of the inner-`while`-loop is `false`, so execution continues with the next iteration of the outer-`while`-loop.
11. The condition of the outer-`while`-loop is now `false` (`scan->raw_buf_done = true` and `pstate->fe_eof = true`), so execution continues with the next statement after the outer-loop.
12. The `if`-statement at the end of `externalgettup_custom` detects that there is still unformatted data that has not been processed by the formatter but that we have reached EOF. A warning is reported and the scan is ended.