Tagged Unions


Dana Jansens

Dec 3, 2021, 11:25:46 AM
to chromium-dev, cxx
Hello C++ friends,

A tagged union, in C++ parlance, is a combination of an enum with (different) data attached to each enumerant. Conceptually it's like this pseudo-C++ code:

enum class MyEnumOfOptions {
  kOption1 (int),
  kOption2 (string),
  kOption3 (MyStruct),
};

When using a tagged union, the compiler will ensure that the appropriate data is attached when setting the enum value. For instance when setting it to kOption2, you have to also include the string.

This is one of my favourite features of modern programming languages, as it allows you to move all kinds of runtime branches up to compile time, which means you avoid writing bugs. An important part of tagged unions is the matching against them. In C++ we have a switch statement for enums, which allows the language to perform a kind of "matching". But since enums cannot hold data, this matching is limited.

In the past the effect of a tagged union could be emulated by passing around an enum and a union as a pair of parameters, but unions are difficult to use correctly - especially if they can contain classes. And the compiler provides no safety rails.

C++ also has introduced something closer to this ideal which is std::variant. It's like a tagged union, but without the relationship to a union. So matching is normally done on indices, which prevents the compiler from verifying your code at all. It can also be done on types, if every type in your variant is unique, but I find this to be an awkward tool to use in the places where you'd want a tagged union - it means converting an enum into a set of independent structs, losing their relationship to each other in the type system.
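For illustration, the two matching styles described above might look like this in C++17. This is only a sketch: `Options`, `DescribeByIndex`, `DescribeByType` and the `MyStruct` stand-in are names made up here, not anything from Chromium.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-in for the payload struct in the pseudo-code above.
struct MyStruct {
  int x = 0;
};

// There is no enum: alternatives are identified only by position or by type.
using Options = std::variant<int, std::string, MyStruct>;

// Matching on indices. The case labels are bare numbers, so nothing ties
// `case 1` to "the string option" except the order of the template arguments.
std::string DescribeByIndex(const Options& o) {
  switch (o.index()) {
    case 0:
      return "int " + std::to_string(std::get<0>(o));
    case 1:
      return "string " + std::get<1>(o);
    case 2:
      return "struct " + std::to_string(std::get<2>(o).x);
  }
  return "valueless";
}

// Matching on types. This only works because every alternative type is unique.
std::string DescribeByType(const Options& o) {
  if (const int* i = std::get_if<int>(&o))
    return "int " + std::to_string(*i);
  if (const std::string* s = std::get_if<std::string>(&o))
    return "string " + *s;
  return "struct " + std::to_string(std::get<MyStruct>(o).x);
}
```

Reordering the alternatives in `Options` would silently break the index-based version at compile time (type errors aside), and adding a second `int` alternative would make `std::get_if<int>` ill-formed in the type-based one.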

Mojo does provide a tag on its unions for IPC, though I'm not sure about pulling mojo out for use in non-IPC contexts.

Recently this came up in discussion about parameters in WebContentsObserver, where we have avoided passing data that would be useful for handling some enum values because then we present "null data" which can be misused and lead to bugs.

So anyhow I thought it might be nice to introduce a TaggedUnion type into C++, built on variant, but marrying it to an actual enum.
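To make the idea concrete, here is a rough sketch of what a variant-backed type tied to a real enum could look like. It is not the code from the proposal: it assumes C++17, borrows only the `TaggedUnion`, `EnumToData` and `With` names mentioned in this thread, and everything else (`which()`, `get<>()`, `IndexOf`, `kTags`, the `Option` enum) is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Pairs one enumerator with the type of the data attached to it.
template <auto Tag, typename T>
struct EnumToData {
  static constexpr auto kTag = Tag;
  using Data = T;
};

template <typename... Entries>
class TaggedUnion {
 public:
  // All tags must share one enum type, otherwise this fails to compile.
  using Enum = std::common_type_t<decltype(Entries::kTag)...>;

  // Factory: the tag picks the payload type at compile time.
  template <Enum Tag, typename... Args>
  static TaggedUnion With(Args&&... args) {
    return TaggedUnion(std::in_place_index<IndexOf<Tag>()>,
                       std::forward<Args>(args)...);
  }

  // Runtime tag, suitable for a switch statement.
  Enum which() const { return kTags[variant_.index()]; }

  // Payload for `Tag`; throws std::bad_variant_access on a tag mismatch.
  template <Enum Tag>
  const auto& get() const {
    return std::get<IndexOf<Tag>()>(variant_);
  }

 private:
  template <std::size_t I, typename... Args>
  explicit TaggedUnion(std::in_place_index_t<I> index, Args&&... args)
      : variant_(index, std::forward<Args>(args)...) {}

  // Position of `Tag` among the entries.
  template <Enum Tag>
  static constexpr std::size_t IndexOf() {
    for (std::size_t i = 0; i < sizeof...(Entries); ++i) {
      if (kTags[i] == Tag) return i;
    }
    return sizeof...(Entries);  // Unknown tag: std::get<> rejects this index.
  }

  static constexpr Enum kTags[] = {Entries::kTag...};
  std::variant<typename Entries::Data...> variant_;
};

// Example use, mirroring the pseudo-code above (hypothetical names).
enum class Option { kOption1, kOption2 };
using MyOptions = TaggedUnion<EnumToData<Option::kOption1, int>,
                              EnumToData<Option::kOption2, std::string>>;

std::string Describe(const MyOptions& o) {
  switch (o.which()) {
    case Option::kOption1:
      return "int " + std::to_string(o.get<Option::kOption1>());
    case Option::kOption2:
      return "string " + o.get<Option::kOption2>();
  }
  return "unreachable";
}
```

Because `which()` returns a real enum, a `switch` over it can be covered by the compiler's usual missing-case warnings, while `With<Option::kOption2>(...)` only compiles if the arguments can construct a `std::string`.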


Do you think this would be useful? I'd love to hear feedback on the idea, the code, and especially if and how you would use this tool if it exists.

If there's interest I can work on getting it landed.

Thanks,
Dana

dan...@chromium.org

Dec 3, 2021, 11:26:58 AM
to chromium-dev, cxx
+chromium-dev from my correct account

dan...@chromium.org

Dec 3, 2021, 11:29:20 AM
to chromium-dev, cxx
On Fri, Dec 3, 2021 at 11:25 AM Dana Jansens <dan...@google.com> wrote:
C++ also has introduced something closer to this ideal which is std::variant. It's like a tagged union, but without the relationship to a union.

Obligatory typo fix: C++ also has introduced something closer to this ideal which is std::variant. It's like a tagged union, but without the relationship to an enum.

Jeremy Roman

Dec 3, 2021, 3:10:19 PM
to dan...@chromium.org, chromium-dev, cxx
I have nits (e.g. && overloads of some things, operator=, should probably call it something closer to "variant", etc). I don't know that we use variants enough for this to be a huge win and there's a cost to doing something slightly different from other C++ users. But it's less unusual than I'd expected and I have no strong reservations.

I think the usual C++ pattern for this isn't...great...but isn't the worst either.

std::visit(overloaded{
  [](int i) { ... },
  [](const std::string& s) { ... },
  [](const MyStruct& ms) { ... }
}, my_variant);

I guess the question is whether we can foresee ourselves using this pattern frequently enough in Chromium C++ to justify this over std::variant. I'll certainly grant that it is nicer (and would be nicer yet if we had proper pattern matching syntax).
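For reference, a complete version of the `std::visit` pattern above could look like the following. The `overloaded` helper is the usual C++17 idiom rather than a standard library type, and `MyStruct`, `MyVariant` and `Describe` are placeholder names chosen for this sketch.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct MyStruct {
  int x = 0;
};
using MyVariant = std::variant<int, std::string, MyStruct>;

// Inherits every lambda's operator() into a single callable type.
template <class... Ts>
struct overloaded : Ts... {
  using Ts::operator()...;
};
// Deduction guide so that overloaded{...} works without template arguments.
template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

std::string Describe(const MyVariant& my_variant) {
  return std::visit(
      overloaded{
          [](int i) { return "int " + std::to_string(i); },
          [](const std::string& s) { return "string " + s; },
          [](const MyStruct& ms) { return "struct " + std::to_string(ms.x); }},
      my_variant);
}
```

In this particular example, deleting any one of the three lambdas makes the `std::visit` call fail to compile, since none of the alternatives converts implicitly to another. With alternatives that do convert (say `bool` and `int`), a missing handler can go unnoticed.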

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAHtyhaRc6fHEv7ogRoSYmR9sUx%2BYjs-WoXfZtxMPwuqq%3D8ezRQ%40mail.gmail.com.

K. Moon

Dec 3, 2021, 3:54:38 PM
to Jeremy Roman, dan...@chromium.org, chromium-dev, cxx
Would this be landed directly into //base? I assume it would have an incubation period somewhere else first.

Piotr Bialecki

Dec 3, 2021, 3:55:58 PM
to Jeremy Roman, dan...@chromium.org, chromium-dev, cxx
FWIW, my only 2 cents are that I share concern around the ergonomics raised by ddy...@vewd.com on Slack:
"i.e. switch() alone only gets you into the correct path, but you still need to get the payload by re-typing the enum's value somewhere (because different runtime enum values will have different types of payload). So not sure if this is really any better than the original visit() idea which gives you both path and payload at the same time without a need of repetition."

OTOH, I have not had a need to use `std::visit` yet, but it does look a bit strange to see that the visited value is spelled out at the end, which breaks the symmetry with a `switch` statement (I assume that is a requirement for it to be a variadic function using parameter packs?).

Naive question: would it be possible to have TaggedUnions somehow work with `std::visit`?


Roland Bock

Dec 6, 2021, 11:26:28 AM
to dan...@chromium.org, Jeremy Roman, chromium-dev, cxx, km...@chromium.org
Nice work. Not sure about actual use cases, but nice to read :-)

I played with the code for a while and noticed a couple of things:
  • Bikeshedding:
    • I would call it TaggedVariant rather than TaggedUnion.
    • I found the terms value and data rather confusing and would use tag and type instead.
    • It took some time to get used to the factory method being called "With". I would prefer something that indicates the factory aspect, like "Create".
    • Not sure if enumeral is an established term? cppreference.com calls it enumerator, see https://en.cppreference.com/w/cpp/language/enum
  • Implementation
    • Using fold expressions can shorten the internal code a lot.
    • We should check for uniqueness of tags.
    • We could relax the restriction to tags being enum values a bit. Any integral or enum type would work.
I uploaded a version that addresses the items above (except allowing non-enum values).

Cheers,
Roland


--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CACwGi-6x7QR913GGcEz%2BmD6rupzMgDz%3DFpPFRmg2yfe2HJtNNw%40mail.gmail.com.

dan...@chromium.org

Dec 6, 2021, 11:36:44 AM
to rb...@google.com, Jeremy Roman, chromium-dev, cxx, km...@chromium.org
Thanks for the feedback, there are a lot of good ideas for the implementation. :)

I am really interested in where we could use this to improve the correctness of our code. I don't want to pursue landing something just on coolness factors, but I know I use equivalent things when writing C++ code a lot. I can come up with some toy examples of how migrating from enum + data to this makes code more correct, and is a more obvious transition than to variant. But where do you see it helping you avoid writing bugs?



Joe Mason

Dec 6, 2021, 12:18:47 PM
to dan...@chromium.org, rb...@google.com, Jeremy Roman, chromium-dev, cxx, km...@chromium.org
If this were available I'd definitely use it to implement state machines such as the one in PageLoadTrackerDecorator (https://source.chromium.org/chromium/chromium/src/+/main:components/performance_manager/decorators/page_load_tracker_decorator.h;l=118;drc=e90de75a0097f098146b850276492c1fcc067e83) that has several timestamps that each only apply to a subset of the states. The code would be clearer if they were attached to the state directly.

Each individual example I can think of would only be a minor improvement - it's not that hard in that class to keep track of which timestamp maps to which state from the comment - but the correctness benefits would add up.
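As a rough illustration of attaching timestamps to the states they belong to, here is a sketch using plain `std::variant` (C++17). The states, field names and functions are hypothetical and are not the real PageLoadTrackerDecorator ones; times are plain millisecond counts to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Each state carries only the timestamps that are meaningful in that state.
struct Idle {};
struct Loading {
  std::int64_t load_start_ms;
};
struct Settling {
  std::int64_t load_start_ms;
  std::int64_t settle_start_ms;
};

using LoadState = std::variant<Idle, Loading, Settling>;

// Transition that is only valid from Loading; it carries the load start
// time forward and records when settling began.
LoadState OnLoadStopped(const LoadState& state, std::int64_t now_ms) {
  if (const Loading* loading = std::get_if<Loading>(&state))
    return Settling{loading->load_start_ms, now_ms};
  return state;  // Ignored in any other state.
}

// Time spent in the current state. Each overload can only see the
// timestamps that exist for its own state.
std::int64_t MsInCurrentState(const LoadState& state, std::int64_t now_ms) {
  struct Visitor {
    std::int64_t now_ms;
    std::int64_t operator()(const Idle&) const { return 0; }
    std::int64_t operator()(const Loading& l) const {
      return now_ms - l.load_start_ms;
    }
    std::int64_t operator()(const Settling& s) const {
      return now_ms - s.settle_start_ms;
    }
  };
  return std::visit(Visitor{now_ms}, state);
}
```

There is no way to read `settle_start_ms` while in `Loading`, which is the kind of mistake a comment next to a shared member variable can only warn about.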

Will Cassella

Dec 6, 2021, 3:44:21 PM
to Chromium-dev, Joe Mason, rb...@google.com, Jeremy Roman, chromium-dev, cxx, K. Moon, danakj
My biggest complaint with std::variant is that it's impossible to ensure you're exhaustively checking every alternative of the variant, hence I always add a 'static_assert(absl::variant_size<MyVariant>::value == x, "")' at every location I try to do so. Having this checked by the compiler instead (I guess using our existing exhaustive switch checking) would be awesome.
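A small sketch of that static_assert guard, written against `std::variant` and `std::variant_size_v` (C++17) rather than the absl equivalents. `Rect`, `Square`, `Shape` and `Area` are made-up names for the example.

```cpp
#include <cassert>
#include <variant>

struct Rect {
  int width;
  int height;
};
struct Square {
  int side;
};
using Shape = std::variant<Rect, Square>;

int Area(const Shape& shape) {
  // Stops compiling as soon as someone adds a third alternative to Shape,
  // forcing a revisit of the if-chain below.
  static_assert(std::variant_size_v<Shape> == 2,
                "Shape changed: update Area()");
  if (const Rect* r = std::get_if<Rect>(&shape))
    return r->width * r->height;
  const Square& s = std::get<Square>(shape);
  return s.side * s.side;
}
```

The guard has to be repeated by hand at every such site and only counts alternatives; it cannot tell which one was added or replaced.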


Justin Novosad

Dec 7, 2021, 11:31:19 AM
to joenot...@google.com, dan...@chromium.org, rb...@google.com, Jeremy Roman, chromium-dev, cxx, km...@chromium.org
I think it would be interesting to add built-in serialization/deserialization helpers.  For example, a deserialize method that reads the enum value, and then calls the deserialize method on the appropriate data class.  This would imply that all data classes implement the same deserialization interface. If that is too restrictive, then perhaps only support POD types.

Also, I wonder whether there would be value in using this pattern in most places in the code where we do some form of ad hoc RTTI (e.g. methods that return some kind of type id that is used to inform downcasting).  If I understand this pattern correctly, I think it might make it easier to write code that allocates polymorphic objects on the stack rather than on the heap, which could be beneficial for performance in several places.

Another addition to suggest: it would be nice to have a way for a TaggedUnion declaration to specify an interface that all the data classes must implement (or maybe this could be implicit thanks to template magic).  Then, TaggedUnion can be a performance primitive that offers a way to do polymorphism without virtual dispatch. This is possible because the TaggedUnion can inline all the override method addresses at compile time since all the data types are part of the declaration.

   -Justin

Jeremy Roman

Dec 7, 2021, 11:58:46 AM
to Justin Novosad, joenot...@google.com, dan...@chromium.org, rb...@google.com, chromium-dev, cxx, km...@chromium.org
On Tue, Dec 7, 2021 at 11:31 AM Justin Novosad <ju...@chromium.org> wrote:
I think it would be interesting to add built-in serialization/deserialization helpers.  For example, a deserialize method that reads the enum value, and then calls the deserialize method on the appropriate data class.  This would imply that all data classes implement the same deserialization interface. If that is too restrictive, then perhaps only support POD types.

This is actually the case that's easy with std::variant.

std::visit([](const auto& v) { return v.Deserialize(); }, variant);
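Spelled out as a compilable sketch, the generic-lambda form above could look like this. The example uses a `Serialize()` method instead of `Deserialize()` because it is easier to show on an already-constructed value; `Ping`, `Text`, `Message` and `SerializeAny` are names invented here.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Two unrelated types that happen to share a method name and signature.
// There is no common base class and no virtual function.
struct Ping {
  std::string Serialize() const { return "ping"; }
};
struct Text {
  std::string body;
  std::string Serialize() const { return "text:" + body; }
};

using Message = std::variant<Ping, Text>;

// The generic lambda is instantiated once per alternative, so the call to
// Serialize() is resolved statically for each type.
std::string SerializeAny(const Message& m) {
  return std::visit([](const auto& v) { return v.Serialize(); }, m);
}
```

If an alternative without a matching `Serialize()` is added to `Message`, the lambda fails to instantiate for it and the build breaks, which is the "implicit interface" check mentioned earlier in the thread.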

Justin Novosad

Dec 7, 2021, 12:19:19 PM
to Jeremy Roman, joenot...@google.com, dan...@chromium.org, rb...@google.com, chromium-dev, cxx, km...@chromium.org
On Tue, Dec 7, 2021 at 11:58 AM Jeremy Roman <jbr...@chromium.org> wrote:
This is actually the case that's easy with std::variant.

Oh yeah... And std::visit also provides a way to do the v-table-less polymorphism thing I was suggesting, except that the syntax is unwieldy. 


TIL

James Cayo

Dec 7, 2021, 2:03:43 PM
to ju...@chromium.org, Joe Mason, dan...@chromium.org, rb...@google.com, Jeremy Roman, chromium-dev, cxx, km...@chromium.org
Yes, and the tagged union can be used to implement the visitor pattern with polymorphism, so you can call an overridden method from a lambda without needing inheritance or a generated vtable. Dynamic memory allocation is definitely avoided when applying this.

Daniel Cheng

Dec 7, 2021, 2:12:12 PM
to rb...@google.com, dan...@chromium.org, Jeremy Roman, chromium-dev, cxx, km...@chromium.org
On Mon, 6 Dec 2021 at 08:19, 'Roland Bock' via Chromium-dev <chromi...@chromium.org> wrote:
    • Using fold expressions can shorten the internal code a lot.

We don't have C++17 quite yet, though it's close :)

Daniel
 

Joe Mason

Dec 7, 2021, 3:54:31 PM
to cayo....@gmail.com, ju...@chromium.org, dan...@chromium.org, rb...@google.com, Jeremy Roman, chromium-dev, cxx, km...@chromium.org
The TaggedUnion class in the example CL doesn't quite work well with std::visit - it would be easy to expose the `variant_` member to use with std::visit, but the visitor methods are only distinguishable by what type is passed to them. Members of the TaggedUnion enum could have the same attached data type, and std::visit couldn't distinguish them. It would be good to call the visitor methods with (Enum, DataType) instead of just DataType.

Eg. for the example:

// For an enum:
//
//   enum Values { kEnumValue1, kEnumValue2, kEnumValue3 };
//
// We can define a TaggedUnion with data for each enumeral:
//
//   using ValuesUnion = TaggedUnion<EnumToData<kEnumValue1, int>,
//                                   EnumToData<kEnumValue2, bool>,
//                                   EnumToData<kEnumValue3, MyStruct>>;

Instead of exposing `std::variant<int, bool, MyStruct>` for visit, expose `std::variant<std::pair<Values, int>, std::pair<Values, bool>, std::pair<Values, MyStruct>>`, and a visitor getting a `std::pair<Values, int>` could DCHECK that the first element is `kEnumValue1`. That would let someone later add `EnumToData<kEnumValue4, int>` to the type, which can now be distinguished at runtime by the first element of the visitor.

It would be even better to figure out a way to distinguish those at compile time instead of at runtime, but I can't think of a way without generating a type for each enum value, which seems like too much magic to squish into the interface...
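A sketch of the pair-based visiting described above, assuming C++17. `ValuesVariant`, `Describer` and `Describe` are hypothetical names, `assert` stands in for DCHECK, and `kEnumValue4` is the later addition that shares `int` with `kEnumValue1`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

enum Values { kEnumValue1, kEnumValue2, kEnumValue3, kEnumValue4 };
struct MyStruct {
  int x = 0;
};

// Every alternative carries its tag next to its data.
using ValuesVariant = std::variant<std::pair<Values, int>,
                                   std::pair<Values, bool>,
                                   std::pair<Values, MyStruct>>;

struct Describer {
  std::string operator()(const std::pair<Values, int>& p) const {
    // Two enumerators map to int; only the runtime tag tells them apart.
    assert(p.first == kEnumValue1 || p.first == kEnumValue4);
    const std::string name = p.first == kEnumValue1 ? "value1" : "value4";
    return name + " int " + std::to_string(p.second);
  }
  std::string operator()(const std::pair<Values, bool>& p) const {
    assert(p.first == kEnumValue2);
    return std::string("value2 bool ") + (p.second ? "true" : "false");
  }
  std::string operator()(const std::pair<Values, MyStruct>& p) const {
    assert(p.first == kEnumValue3);
    return "value3 struct " + std::to_string(p.second.x);
  }
};

std::string Describe(const ValuesVariant& v) {
  return std::visit(Describer{}, v);
}
```

The price is that the tag check happens at runtime: nothing stops code from building a `std::pair<Values, int>` tagged with `kEnumValue2`, which is why a wrapper would want to own construction.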

Roland McGrath

Dec 7, 2021, 5:09:05 PM
to Joe Mason, cayo....@gmail.com, ju...@chromium.org, dan...@chromium.org, rb...@google.com, Jeremy Roman, chromium-dev, cxx, km...@chromium.org
It's not clear to me that having the enum is actually useful on its own.  AFAICT the only reason to have it is to address the case of std::variant with multiple variants using the same type.  It's easy enough just to declare a rule (and some helper magic) that std::variant can only be used with a set of unique types.  There's no real cost to defining `struct NamedThing { Type value; };` types with the same Type to make them distinct.
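The wrapper-struct approach might look like this in C++17. `Width`, `Height`, `Dimension`, `Describer` and `Describe` are illustrative names only; both wrappers hold an `int`, yet the variant and its visitor tell them apart at compile time.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Same underlying type, two distinct named wrappers.
struct Width {
  int value;
};
struct Height {
  int value;
};

using Dimension = std::variant<Width, Height>;

struct Describer {
  std::string operator()(const Width& w) const {
    return "width " + std::to_string(w.value);
  }
  std::string operator()(const Height& h) const {
    return "height " + std::to_string(h.value);
  }
};

std::string Describe(const Dimension& d) {
  return std::visit(Describer{}, d);
}
```

Here the type name plays the role of the enumerator, so there is no separate enum to keep in sync, but also no enum value to `switch` on.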