Identifying absent optional fields - request for enhancement...


paul Bandler

Apr 30, 2015, 8:02:49 AM
to flatb...@googlegroups.com
First, congratulations on creating a very interesting and timely technology.
 
I have successfully used FlatBuffers to flatten and recover a relatively complex aggregate object structure.  However, I don't think it is possible to identify absent optional fields from the generated code API, as a default value is always returned from the public API.  While 0 is a reasonable approximation to null/absent for most types, for integers it is not.
 
As I believe the absence of optional fields is 'known' by tables (0 entries in the vtable?), would it be possible for flatc to generate an additional accessor method for each table attribute to test the presence of each of its attributes?

Wouter van Oortmerssen

May 4, 2015, 1:27:10 PM
to paul Bandler, flatb...@googlegroups.com
Thanks!

You're right that this would be easy functionality to add. The reason such accessors don't exist is:
  • It is not a common use case to need this information. Code tends to be simpler if it can work with a reasonable default, instead of needing if-checks for each field.
  • Adding such accessors would double the amount of methods / generated code.
  • By default, you can't really use the absence of a field as a further 1 bit of information, since a value that happens to equal the default doesn't get stored anyway. This behavior can be overridden with the force_defaults flag on FlatBufferBuilder, though.
That said, I could add such accessors under a flatc command line flag (e.g. --gen-field-test or so).

Anyone else feel this functionality is desirable?
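
To make the third point concrete, here is a minimal sketch in plain C. It is a toy model, not the FlatBuffers builder or its wire format: ToySlot, toy_add_i32, toy_get_i32 and toy_is_present are hypothetical names invented for illustration, and the force parameter only imitates what force_defaults is described as doing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one scalar field slot (assumption: not the real layout). */
typedef struct {
    bool    stored;  /* true if the writer emitted a value for this field */
    int32_t value;   /* only meaningful when stored is true */
} ToySlot;

/* Writer: a value equal to the default is skipped unless force is set. */
void toy_add_i32(ToySlot *slot, int32_t v, int32_t dflt, bool force)
{
    assert(slot != NULL);
    if (v == dflt && !force) {
        return;  /* nothing written, the slot stays absent */
    }
    slot->stored = true;
    slot->value = v;
}

/* Reader: an absent slot reads back as the default. */
int32_t toy_get_i32(const ToySlot *slot, int32_t dflt)
{
    assert(slot != NULL);
    return slot->stored ? slot->value : dflt;
}

/* Presence test: only tells you whether something was written. */
bool toy_is_present(const ToySlot *slot)
{
    assert(slot != NULL);
    return slot->stored;
}
```

Without force, a slot that was never touched and a slot that was given the default value look identical to the reader, which is why absence alone carries no extra information in this model.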

--
You received this message because you are subscribed to the Google Groups "FlatBuffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to flatbuffers...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

paul Bandler

May 5, 2015, 2:38:36 AM
to flatb...@googlegroups.com, pban...@cseuk.co.uk
Wouter,

Thanks for the reply.

I agree that in general most applications can use the reasonable defaults in place of optional field absence.  However, the use case where I need it is that I'm using FlatBuffers to encode and recover an existing data model which is normally persisted in a SQL database that includes many nullable columns.  To be 100% confident that the data, once recovered from its FlatBuffers encoding and fed into existing application code, will not have any unanticipated side effects, I need the absent optional fields to be represented as they are when read from the database (i.e. as absent rather than as a 0 / default value).  I could introduce other 'special' values on a case-by-case basis to mean absent (e.g. an 'absent' enumeration, or a more unlikely integer value than 0 for an interest rate), but that isn't an elegant solution.

As a slight variation on generating a test method for each field, you could generate just the field identifier (a symbolic constant / final static int) corresponding to each field and have a generic Table.isNull(field-identifier) public method?
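
A rough sketch of what I have in mind, written in plain C rather than as real flatc output. Everything here is hypothetical: the Account table, the ACCOUNT_FIELD_* constants, ToyTable and toy_table_is_null are made up for illustration, and the sketch simply assumes that an offset of 0 means the field was not stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generated field identifiers, one per table attribute. */
enum {
    ACCOUNT_FIELD_ID = 0,
    ACCOUNT_FIELD_BALANCE = 1,
    ACCOUNT_FIELD_INTEREST_RATE = 2,
    ACCOUNT_FIELD_COUNT
};

/* Toy table: one offset per field, 0 is assumed to mean "not stored". */
typedef struct {
    uint16_t field_offset[ACCOUNT_FIELD_COUNT];
} ToyTable;

/* One generic test instead of one generated accessor per field. */
bool toy_table_is_null(const ToyTable *t, int field)
{
    assert(t != NULL);
    assert(field >= 0 && field < ACCOUNT_FIELD_COUNT);
    return t->field_offset[field] == 0;
}
```

The generated code then only grows by one constant per field, and the test itself lives once in the shared table code.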

regards,

Paul

Wouter van Oortmerssen

May 8, 2015, 1:36:46 PM
to paul Bandler, flatb...@googlegroups.com
I like the idea of using field constants to reduce API surface. Maybe additionally behind a --gen-field-ids flag.

Are you able to make a PR for this?

paul Bandler

May 9, 2015, 3:28:24 AM
to flatb...@googlegroups.com
PR == Pull Request, i.e. make the change to flatc myself? That may be possible if the current proof of concept is taken forward. Although it may be a simple change in principle, I would have to overcome a few hurdles, such as becoming familiar with GitHub (yes, some of us old-timers are still using tool suites from the 1990s) and then getting a C++ environment set up, so I can't justify that time until I'm sure we're going ahead.

Wouter van Oortmerssen

May 11, 2015, 12:16:01 PM
to paul Bandler, flatb...@googlegroups.com
Yes, pull request. No worries, I will get to it, just may take some time.


Wouter van Oortmerssen

Nov 30, 2015, 5:34:22 PM
to FlatBuffers
This functionality is now in:

This allows you to call e.g.

mymonster.CheckField(Monster::VT_HEALTH)

to see if a field was stored at all. Note the caveat I mentioned still applies: you won't be able to tell the difference between a field that wasn't written vs a field that happens to have the default value this way, unless you use force_defaults. This is intrinsic to how FlatBuffers works.

Jason Aten

Dec 24, 2015, 3:38:14 PM
to FlatBuffers
I added similar null-checking functionality for the Go bindings in this pull request.

https://github.com/google/flatbuffers/pull/3332

I really like how clean and simple the code base is.

Wouter van Oortmerssen

Dec 29, 2015, 2:10:09 PM
to Jason Aten, FlatBuffers
Thanks! I'll respond there.


david....@viavisolutions.com

Jan 28, 2016, 11:49:51 AM
to FlatBuffers
Hi there-

I really need the ability to distinguish between fields that were set and fields that are actually "empty".

For my application, I am not concerned with using default values - we either set a value, or it is considered "not set" or "empty".
Apparently, even with the IsFieldPresent function, this isn't really possible.  With force_defaults off, if we send a value that happens to be the default, it appears "empty".  If we turn force_defaults on, then fields we never set appear "present".  Neither is good.

I'm considering enhancing FB in my own fork to add this functionality, which I understand would break the current FB assumption that all fields are basically "set" - either directly or with a default.

Do you have any suggestions in this area?

mikkelfj

Jan 28, 2016, 2:08:55 PM
to FlatBuffers



Hi David,
I cannot speak for how it works in C++, but I am surprised it does not work as you describe.

The flatcc C code generator has an option to force values as you describe, for exactly the reasons you mention, if I understand you correctly:

e.g.
    ns(Monster_hp_force_add(B, 100));

Any value that is not set at all will appear absent.

Force only applies to scalars.

Vector, string, and table fields can be absent, present with zero length, or present with actual content, and it is possible to detect whether they are present. Struct member fields can also only be absent or present, but, unlike in C++ I think, flatcc struct fields can have members that are not set, which then become 0.

Float and double are a bit tricky because testing for the default value is numerically unstable when converting from ASCII text in the schema to the internal representation, so it may be hit or miss depending on the value. These are better used as a way of getting a default if none was set, which I guess is also the real motivation behind defaults, rather than as a compression concept.
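
As a small illustration of the float issue, here is a sketch in plain C. It is not flatcc code: equals_default_naive and equals_default_narrowed are hypothetical helpers, and the assumption is simply that the schema default arrives as decimal text and is parsed with strtod. It shows one way a "same" value can fail to match its default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Naive check: the default is parsed to double, so the float value is
 * promoted to double before the comparison. */
bool equals_default_naive(float v, const char *default_text)
{
    assert(default_text != NULL);
    return v == strtod(default_text, NULL);
}

/* Narrow the parsed default to float first, then compare like with like. */
bool equals_default_narrowed(float v, const char *default_text)
{
    assert(default_text != NULL);
    return v == (float)strtod(default_text, NULL);
}
```

With a default written as "0.1", the naive check does not match the float value 0.1f, while the narrowed check does. A default such as "0.5" is exactly representable, so both checks match, which is the hit-or-miss behavior depending on the value.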


Here is a snippet from the monster test case dealing with defaults and force add:

// ns is a macro that wraps the long namespace prefix
int test_add_set_defaults(flatcc_builder_t *B)
{
    void *buffer;
    size_t size;
    ns(Monster_table_t) mon;
    flatcc_builder_reset(B);

    ns(Monster_start_as_root(B));
    ns(Monster_name_create_str(B, "MyMonster"));
    ns(Monster_hp_add(B, 100));
    ns(Monster_mana_add(B, 100));
    ns(Monster_color_add(B, ns(Color_Blue)));
    ns(Monster_end_as_root(B));

    buffer = flatcc_builder_get_direct_buffer(B, &size);
    mon = ns(Monster_as_root(buffer));
    if (ns(Monster_hp_is_present(mon))) {
        printf("default should not be present for hp field\n");
        return -1;
    }
    if (!ns(Monster_mana_is_present(mon))) {
        printf("non-default should be present for mana field\n");
        return -1;
    }
    if (ns(Monster_color_is_present(mon))) {
        printf("default should not be present for color field\n");
        return -1;
    }

    flatcc_builder_reset(B);
    ns(Monster_start_as_root(B));
    ns(Monster_name_create_str(B, "MyMonster"));
    ns(Monster_hp_force_add(B, 100));
    ns(Monster_mana_force_add(B, 100));
    ns(Monster_color_force_add(B, ns(Color_Blue)));
    ns(Monster_end_as_root(B));

    buffer = flatcc_builder_get_direct_buffer(B, &size);
    mon = ns(Monster_as_root(buffer));
    if (!ns(Monster_hp_is_present(mon))) {
        printf("default should be present for hp field when forced\n");
        return -1;
    }
    if (!ns(Monster_mana_is_present(mon))) {
        printf("non-default should be present for mana field, also when forced\n");
        return -1;
    }
    if (!ns(Monster_color_is_present(mon))) {
        printf("default should be present for color field when forced\n");
        return -1;
    }

    return 0;
}


david....@viavisolutions.com

Jan 28, 2016, 2:33:16 PM
to FlatBuffers
So, my case is that I want to detect when a field was never set.  For instance, with Monster, suppose I never set the HP field.  I want to test, after reading the buffer, if the field was set by the sender.  However, since it has a default value (as all fields do), I have no reliable way to do that: if the default value is 0, and I set it with a 0, then the buffer looks exactly the same - there's no way to distinguish between "not-set" and "set to default".

Now, forcing defaults just seems to generate all the defaults in the buffer, which to the receiving end looks like every value was set.  So there's no way to differentiate between "not-set" and "set to default" again.

So we currently have two states: "set" and "default".  I need a third state: "never set".  However, this is orthogonal to the design of flatbuffers where it is assumed that "not set" == "default".

I'm thinking about ways to implement this and stay within that constraint.  Maybe introduce a default value of "null" which would indicate "no value", i.e. "not set"... hmmm.

mikkelfj

Jan 28, 2016, 2:44:47 PM
to FlatBuffers


On Thursday, January 28, 2016 at 8:33:16 PM UTC+1, david....@viavisolutions.com wrote:
So, my case is that I want to detect when a field was never set.  For instance, with Monster, suppose I never set the HP field.  I want to test, after reading the buffer, if the field was set by the sender.  However, since it has a default value (as all fields do), I have no reliable way to do that: if the default value is 0, and I set it with a 0, then the buffer looks exactly the same - there's no way to distinguish between "not-set" and "set to default".

If you don't set the HP field, the _is_present method will reliably tell you that it wasn't set, assuming you can trust the sender to use force_add. If you cannot trust the sender to do that, you cannot know why the field was never set.
 
Now, forcing defaults just seems to generate all the defaults in the buffer, which to the receiving end looks like every value was set.
This is what surprises me. It is not the case in flatcc, which doesn't have a global force setting, only force_add methods.
 
So there's no way to differentiate between "not-set" and "set to default" again.
 
There is if you use force_add in the flatcc interface. There is currently no way to detect if a value is actually set and is also a default value. I have been considering this, for example by providing a hp_get_default method which you can then compare with. But it seems a bit over-engineered perhaps.
 
So we currently have two states: "set" and "default".  I need a third state: "never set".  However, this is orthogonal to the design of flatbuffers where it is assumed that "not set" == "default".

There isn't actually a default state. A default value is either just not stored, and if so, it is automagically restored by the reader interface which knows the missing value, or the default value is actually stored via the force_add method, and then the reader interface has no idea whether it matches the default value or not.

 
I'm thinking about ways to implement this and stay within that constraint.  Maybe introduce a default value of "null" which would indicate "no value", i.e. "not set"... hmmm.

There is no really good null concept in FlatBuffers. The is_present and force_add methods are still good options, but they do not fully distinguish between null, missing, and an actual value. But close enough.

I have been considering a bit flag extension to the FlatBuffer format to more precisely handle the null concept for SQL data representation, but since it would be non-portable, it isn't something I would add easily.

Wouter van Oortmerssen

Jan 29, 2016, 3:40:30 PM
to mikkelfj, FlatBuffers
David:

force_defaults should work, i.e. if it is on, and you don't call add_field, then IsFieldPresent will return false for that field.

The alternative to force_defaults is to wrap your scalar field in a struct; that way it will be null if not present.

You could also store your own booleans or bit field to indicate this information.

I don't see why you would need to fork the format; force_defaults already does what you would get if you "fixed" this behavior. An alternative would be storing a bit mask in the encoding.
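
For the bit mask idea, a minimal sketch in plain C of what the application side could look like. This is not generated FlatBuffers code: ToyRow, the COL_* constants, toy_row_set_hp and toy_row_has are hypothetical, and the assumption is that the mask would travel as one extra unsigned integer field next to the values it describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical column indices for a record with nullable columns. */
enum { COL_HP = 0, COL_MANA = 1, COL_COUNT };

typedef struct {
    uint32_t present;  /* bit i set means column i holds a real value */
    int32_t  hp;
    int32_t  mana;
} ToyRow;

/* Setting a value also records that it was set, whatever the value is. */
void toy_row_set_hp(ToyRow *r, int32_t v)
{
    assert(r != NULL);
    r->hp = v;
    r->present |= (uint32_t)1 << COL_HP;
}

/* Presence is read from the mask, never inferred from the value. */
bool toy_row_has(const ToyRow *r, int col)
{
    assert(r != NULL);
    assert(col >= 0 && col < COL_COUNT);
    return ((r->present >> col) & 1u) != 0;
}
```

Because presence comes from the mask, an hp explicitly set to 0 still reads as present, while a mana that was never set reads as absent, independent of any default elision.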

David Hooker

Jan 29, 2016, 3:44:22 PM
to FlatBuffers
Wouter:

Yes, thanks, I was confused as to what force_defaults really does until I more carefully read the code.  It is working as I need now.  I did enhance the IDL compiler to generate the "has_xxx" methods because this is much easier for me to work with than using IsFieldPresent.

Thanks again!
-David-
