EOF in Custom Formatter for External Table

Пятро Вольны

Jan 20, 2023, 5:19:17 AM
to Greenplum Users
Hello,
I'm writing a custom formatter for an external table, and I need it to detect EOF reliably: the formatter needs the first X bytes and the last Y bytes of the file. FORMATTER_GET_SAW_EOF never returns true for me, so I think I'm misunderstanding how it works. My code is below. How should I change it so that I can tell when the formatter manager has sent the last block of data? As written, it just terminates on the last block and leaves an "unexpected end of file" warning. I want to detect the last block and then return a formatted row.

Datum foggy_formatter_in(PG_FUNCTION_ARGS)
{
  if(!CALLED_AS_FORMATTER(fcinfo))
  {
    elog(ERROR, "foggy_formatter can only be called as a formatter");
  }

  int len = FORMATTER_GET_DATALEN(fcinfo);
  bool eof = FORMATTER_GET_SAW_EOF(fcinfo);

  if(eof)
  {
    elog(NOTICE, "Reached EOF, total file size = %d", len);
    // TODO: implement formatting, return the tuple
    FORMATTER_RETURN_NOTIFICATION(fcinfo, FMT_NONE);
  }
  else
  {
    elog(NOTICE, "Still reading file, current size = %d", len);
    FORMATTER_RETURN_NOTIFICATION(fcinfo, FMT_NEED_MORE_DATA);
  }
}

Rui Zhao

Feb 1, 2023, 2:48:58 AM
to Greenplum Users, andre...@ya.ru
Hello,

I have some background with custom formatters. Here is what I know; I hope it helps:

A custom `formatter_in` function normally returns a tuple. Since you describe your data as containing only one tuple, once you detect EOF inside the `if(eof)` branch you need to format the data in `formatter->fmt_databuf` into a tuple struct, which is stored in `formatter->fmt_tuple`.

So the pseudocode may look like this:
  if(eof)
  {
    elog(NOTICE, "Reached EOF, total file size = %d", len);
    // TODO: implement formatting, return the tuple
    FORMATTER_SET_DATACURSOR(fcinfo, len); /* mark the data as consumed */
    /* values and nulls must be filled with the column values, which also need to be converted by the conv_functions */
    tuple = heap_form_tuple(tupdesc, values, nulls);
    FORMATTER_SET_TUPLE(fcinfo, tuple); /* or you can return a null value for testing */
    /* FORMATTER_RETURN_NOTIFICATION itself returns from the function, so it is not called here; return the tuple instead */
    FORMATTER_RETURN_TUPLE(tuple);
  }
  else
  {
    elog(NOTICE, "Still reading file, current size = %d", len);
    FORMATTER_RETURN_NOTIFICATION(fcinfo, FMT_NEED_MORE_DATA);
  }


If you are writing your own custom formatter, it helps to read some of the existing ones. `fixedwidth_in()` is a good example, since it contains most of the function calls you will need.
The function `externalgettup_custom()` is what actually calls the `formatter_in` functions. Reading how it calls them also helps with writing single-row error handling and with handling the data cursor.
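
The pattern discussed above (keep asking for more data until EOF is flagged, then consume the whole buffer and emit one row built from the first X and last Y bytes) can be sketched outside of Greenplum as a small self-contained C model. Everything in it is a hypothetical stand-in: `ModelBuffer`, `ModelRow`, `ModelNotification` and `model_formatter_in` are illustrative names, not GPDB structs or macros, and the 16-byte field sizes are arbitrary. It only mirrors the roles that the data buffer, data length, data cursor and saw-EOF flag play in the posts above; it is not how GPDB implements them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the formatter state; NOT the GPDB definitions. */
typedef enum { MODEL_NONE, MODEL_NEED_MORE_DATA } ModelNotification;

typedef struct {
    const char *data;    /* bytes accumulated so far */
    size_t      len;     /* number of valid bytes in data */
    size_t      cursor;  /* bytes the formatter has marked as consumed */
    bool        saw_eof; /* set by the caller once the source is exhausted */
} ModelBuffer;

typedef struct {
    char   head[16];     /* first X bytes of the file (arbitrary capacity) */
    size_t head_len;
    char   tail[16];     /* last Y bytes of the file (arbitrary capacity) */
    size_t tail_len;
} ModelRow;

/*
 * Until saw_eof is set, ask for more data and leave the cursor untouched so
 * that nothing already buffered is discarded. On EOF, copy up to x leading
 * and y trailing bytes into row, mark the whole buffer as consumed, and
 * report that a row is ready.
 */
ModelNotification
model_formatter_in(ModelBuffer *buf, size_t x, size_t y, ModelRow *row)
{
    if (!buf->saw_eof)
        return MODEL_NEED_MORE_DATA;

    /* Clamp the requested sizes to the field capacity and to the file size. */
    if (x > sizeof(row->head)) x = sizeof(row->head);
    if (x > buf->len)          x = buf->len;
    if (y > sizeof(row->tail)) y = sizeof(row->tail);
    if (y > buf->len)          y = buf->len;

    memcpy(row->head, buf->data, x);
    row->head_len = x;
    memcpy(row->tail, buf->data + buf->len - y, y);
    row->tail_len = y;

    buf->cursor = buf->len; /* everything has been consumed */
    return MODEL_NONE;
}
```

In this model the caller would keep appending blocks to `data`, growing `len`, and calling `model_formatter_in` again for as long as it gets `MODEL_NEED_MORE_DATA`; it sets `saw_eof` only together with the final block. A formatter written this way depends entirely on the caller passing that flag through, which is the behaviour the question is about.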

Ashuka Xue

Mar 15, 2023, 4:16:14 PM
to gpdb-...@greenplum.org
Hi, 

Commits were pushed to GPDB main and 6X_STABLE with a fix for the custom formatter logic. 

It allows the formatter to handle cases where an EOF has been seen. Hopefully, this helps to resolve the issue that you have noticed where your formatter was not able to detect EOF properly. 

Thanks, 
Ashuka

Пятро Вольны

Mar 24, 2023, 6:17:03 PM
to Greenplum Users, Rui Zhao, andre...@ya.ru
Thank you, Rui.
However, my problem is getting into the `if(eof)` branch at all: `eof` is never true. I suppose https://github.com/greenplum-db/gpdb/pull/15051 fixes this; I will test on the latest release soon.

Пятро Вольны

Mar 24, 2023, 6:19:34 PM
to Greenplum Users, Ashuka Xue
Thanks Ashuka, this looks like what I need.
I will test my guess on the latest release that includes it. Will 6.23.4 include this fix? (I don't compile Greenplum myself; I only use the binaries.)

Ashuka Xue

Apr 10, 2023, 2:07:01 PM
to Пятро Вольны, Greenplum Users
Hi, 
This fix is not included in 6.23.4. However, it is included in GPDB 6.24.0, which is now available.
Best, 
Ashuka