Does "constant: true" ever make sense for scalar function definition

Weston Pace

Feb 16, 2023, 11:21:31 AM

to Substrait

Two examples popped up recently (though did not get merged), and I suspect they were both motivated by the fact that Arrow's compute only has kernels for the case where the argument is constant.

The examples were round (where the number of digits to round to was a constant argument) and the various temporal functions (where the timestamp is a constant argument).

However, just because something is typically constant doesn't mean that it has to be constant. In both of those cases it should be possible to vary the constant argument and the resulting operation would still make sense. In fact, since scalar functions are stateless, it seems to me that it would never be required that constant: true be applied.
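To make the statelessness point concrete, here is a minimal Python sketch (not taken from Arrow or Substrait; `round_values` and its parameter names are hypothetical). It treats the constant case as a broadcast of the per-row case, which is why varying the argument should still be well defined:

```python
from typing import List, Sequence, Union


def round_values(values: Sequence[float],
                 digits: Union[int, Sequence[int]]) -> List[float]:
    """Round each value; `digits` may be one constant or one entry per row."""
    if isinstance(digits, int):
        # Constant case: broadcast the single value to every row.
        digits = [digits] * len(values)
    if len(digits) != len(values):
        raise ValueError("digits must be a constant or match values in length")
    # Each output row depends only on its own inputs, so no state is carried.
    return [round(v, d) for v, d in zip(values, digits)]
```

Calling it with `digits=1` or with `digits=[2, 0]` goes through the same per-row logic; only the shape of the second argument differs.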

In other words, I think the only purpose for constant: true would be in aggregate and window functions, where we need to distinguish between the thing being aggregated and "configuration" arguments which parameterize the operation itself (e.g. the separator in string_agg).
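As a rough illustration of a "configuration" argument, here is a Python sketch of a `string_agg`-style aggregate (the signature is an assumption for illustration, not the Substrait definition). The values are the thing being aggregated, while the separator parameterizes the whole operation:

```python
from typing import Sequence


def string_agg(values: Sequence[str], separator: str) -> str:
    """Concatenate all input rows into one string, joined by `separator`."""
    # The separator applies to the aggregation as a whole. With N rows there
    # are only N - 1 gaps, so a per-row separator has no obvious meaning.
    return separator.join(values)
```

Here the separator cannot sensibly vary per input row, which is a property of the function's meaning rather than of any particular engine.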

Sorry if this is noise but I am curious and hoping to check my understanding of this property.

Jacques Nadeau

Mar 8, 2023, 12:55:32 AM

to subs...@googlegroups.com

I'm not sure I understand the question. There are a number of situations, like those you outlined, where there is some kind of substantial setup involved in working with an input, such that implementations that vary a value are far more expensive than those that don't (regex is a great example).
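The regex case can be sketched in Python roughly as follows (the function names are hypothetical, and this says nothing about how any particular engine is implemented). With a constant pattern the setup is done once; with a per-row pattern it has to be repeated for every row:

```python
import re
from typing import List, Sequence


def match_constant(values: Sequence[str], pattern: str) -> List[bool]:
    """Constant pattern: the compile step (the setup) is paid once."""
    compiled = re.compile(pattern)
    return [compiled.search(v) is not None for v in values]


def match_per_row(values: Sequence[str], patterns: Sequence[str]) -> List[bool]:
    """Varying pattern: a pattern has to be prepared for every row."""
    return [re.compile(p).search(v) is not None
            for v, p in zip(values, patterns)]
```

Both functions produce the same answer when every row uses the same pattern; the difference is only in how much setup work is done.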

--
You received this message because you are subscribed to the Google Groups "substrait" group.
To unsubscribe from this group and stop receiving emails from it, send an email to substrait+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/substrait/CAE4AYb0baOnOhnkV6pxEAOR5Ksxs%3DRLWNuFXMpEQognZn6YS5g%40mail.gmail.com.

Weston Pace

Mar 8, 2023, 11:35:25 AM

to subs...@googlegroups.com

I agree there are situations where a constant kernel is more efficient than a non-constant kernel. In Arrow we have such implementations even for the basic arithmetic functions.

However, this puts YAML writers in a situation where they have to make decisions based on implementation efficiency and performance. For example, should the second (`y`) argument of `foo(x, y)` be a constant? My engine has decided it's too expensive but another engine came to a different decision. So what should I do?

For some practical examples:

- The round kernel does not define the "number of digits to round to" argument as constant. This was originally at odds with the Arrow engine which did require that to be constant.
- You gave the example of regular expressions. There are engines out there (e.g. both Spark and Postgres) that actually do support non-constant regular expressions.
- Another case that is almost always constant is time zone strings. However, we have had asks in Arrow to support a non-constant case. So it is certainly something that could happen.

My point is that I think extension files should be focused on the semantics of the function. When it comes to scalar functions, I don't think "constant" is a semantic part of the function. It's only an optimization / implementation decision at that point.

Andrew Lamb

Mar 8, 2023, 11:58:02 AM

to subs...@googlegroups.com

> When it comes to scalar functions, I don't think "constant" is a semantic part of the function. It's only an optimization / implementation decision at that point.

I agree with this opinion

(by the way, thank you for all the work you are doing to push Substrait forward)

Jacques Nadeau

Mar 16, 2023, 8:58:58 PM

to subs...@googlegroups.com

> When it comes to scalar functions, I don't think "constant" is a semantic part of the function

There are systems that don't support variables in those positions. There are systems that do. From my point of view, that is a definition of differing semantics. It is important to communicate that as part of what a system can do. If a system supports both, it can ignore this semantic (or choose to use it for optimization). If the system doesn't support both, that's a very important thing to be aware of.

Weston Pace

Mar 16, 2023, 9:54:01 PM

to subs...@googlegroups.com

> There are systems that don't support variables in those positions. There are systems that do. From my point of view, that is a definition of differing semantics. It is important to communicate that as part of what a system can do.

I agree with what you are saying. However, I think solving this should be a feature of capability discovery (for each argument tell me if it can be non-constant) instead of function definition. This way we can avoid having 2^N variations of kernels and it is much easier to author function definitions.
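To sketch what I mean (purely hypothetical Python, not an existing Substrait feature; the table, function names, and the `digits` argument name are invented for illustration): an engine could advertise, per function, which arguments it can only accept as constants, and a producer could check a call against that. The second helper just enumerates the constant/non-constant combinations that separate definitions would otherwise have to cover:

```python
from itertools import product
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical capability table (not part of Substrait): for each function,
# the argument names that this engine can only accept as constants.
CONSTANT_ONLY_ARGS: Dict[str, FrozenSet[str]] = {
    "round": frozenset({"digits"}),
    "foo": frozenset(),
}


def unsupported_args(function: str, varying_args: Sequence[str]) -> List[str]:
    """Return the varying arguments that this engine requires to be constant."""
    constant_only = CONSTANT_ONLY_ARGS.get(function, frozenset())
    return [arg for arg in varying_args if arg in constant_only]


def constness_variants(arg_names: Sequence[str]) -> List[Dict[str, bool]]:
    """Enumerate every constant/non-constant combination for the arguments."""
    return [dict(zip(arg_names, flags))
            for flags in product((False, True), repeat=len(arg_names))]
```

With this shape the function definition stays a single entry, and `constness_variants` shows how quickly the alternative grows: three arguments already give eight combinations.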

> If a system supports both, it can ignore this semantic (or choose to use it for optimization).

The problem with this approach is that "constant: true" is very important for aggregate functions. There it has a real semantic meaning that is unrelated to engine capabilities or optimization. I mentioned the "separator" in the `string_agg` function. Another example is the "number of quantiles" in the `quantile` function. So if we treat "constant: true" as "the author of this YAML file thought of a clever trick that you could do to implement an optimized kernel if this is constant but you can ignore this trick if you haven't implemented it" then we end up with two very different meanings for "constant: true": one that is safe to ignore and one that is not.

Weston Pace

Mar 16, 2023, 9:58:11 PM

to subs...@googlegroups.com

Unrelated to the argument, but, rereading this email, I think the sentence "the author of...if you haven't implemented it" is maybe a bit more sarcastic / negative than I intended so let me apologize for that. My point was just that we have one "optional optimization" definition and one "semantic interpretation" definition.
