UNLIST default data-type

31 views
Skip to first unread message

Mark Rotteveel

unread,
Feb 13, 2026, 4:07:22 AM (10 days ago) Feb 13
to firebir...@googlegroups.com
The UNLIST documentation (README.unlist) states the following:

"""
2) <data-type>: target data type to convert the output values into.
Alternatively, a domain can be specified as the returned type. If
omitted, VARCHAR(32) is implied. Feel free to suggest any better
alternative default.
"""

In practice, it's actually CHAR(32), which I'll report as a bug, because
I really, really don't like CHAR and think that VARCHAR would indeed be
better here :).

I think that the choice for a fixed default is wrong.

For example, this results in an error (as the first value is 33 characters):

select U.C0 from unlist('text56789012345678901234567890123,text2,text3')
as U(C0);

Statement failed, SQLSTATE = 22001
arithmetic exception, numeric overflow, or string truncation
-string right truncation
-expected length 32, actual 33

I think the default datatype selection should be a little more intelligent:

For input of CHAR(n)/VARCHAR(n) -> output VARCHAR(n) of same character
set (or BLOB SUB_TYPE TEXT for n > 32765, or 8191 for UTF8, and the
limits of other multi-byte character sets).
For input of type BLOB -> output BLOB of same type.

I know you can work around this by explicitly specifying `RETURNING
VARCHAR(x)`, but I think the current behaviour is not obvious, and even
if explicitly documented, will probably bite people in the ass when they
least expect it (i.e. in production use).

Any thoughts on this?

P.S. I'll also remove the "Feel free to suggest any better alternative
default.", because that doesn't belong in the docs

Mark
--
Mark Rotteveel

Dmitry Yemanov

unread,
Feb 13, 2026, 5:00:25 AM (10 days ago) Feb 13
to firebir...@googlegroups.com
13.02.2026 12:07, 'Mark Rotteveel' via firebird-devel wrote:
>
> I think the default datatype selection should be a little more intelligent:
>
> For input of CHAR(n)/VARCHAR(n) -> output VARCHAR(n) of same character
> set (or BLOB SUB_TYPE TEXT for n > 32765, or 8191 for UTF8, and the
> limits of other multi-byte character sets).
> For input of type BLOB -> output BLOB of same type.
>
> I know you can work around this by explicitly specifying `RETURNING
> VARCHAR(x)`, but I think the current behaviour is not obvious, and even
> if explicitly documented, will probably bite people in the ass when they
> least expect it (i.e. in production use).
>
> Any thoughts on this?

Decoding [longish] strings from the delimited list is not a common usage
pattern, so I think it's OK to optimize the default behaviour for more
expected cases and require the RETURNING clause otherwise.

Also, a longer output has also noticeable performance implications, see
discussion in PR #8878. And returning a blob is gonna be even slower.

That said, the output should really be variable-length. It will be fixed
together with #8878.


Dmitry

Mark Rotteveel

unread,
Feb 13, 2026, 5:15:07 AM (10 days ago) Feb 13
to firebir...@googlegroups.com
On 13/02/2026 11:00, Dmitry Yemanov wrote:
> 13.02.2026 12:07, 'Mark Rotteveel' via firebird-devel wrote:
>> Any thoughts on this?
>
> Decoding [longish] strings from the delimited list is not a common usage
> pattern, so I think it's OK to optimize the default behaviour for more
> expected cases and require the RETURNING clause otherwise.

It might not be (very) common, but how this function will be used will
likely be on data that might not be clean (otherwise people would
probably have hardcoded things).

> Also, a longer output has also noticeable performance implications, see
> discussion in PR #8878. And returning a blob is gonna be even slower.

In my opinion, robustness (i.e. not resulting in a string truncation
error) in the face of unclean data, or otherwise from a
not-so-trustworthy source is more important as the default than performance.

Things like this are more likely to fail in production situations, as
testing is usually done with less realistic and simpler data, and things
that the programmers or testers actually expect instead of the reality
of dirty or more complex inputs.

If people want optimal performance, then they can explicitly specify the
type using RETURNING.

That said, this something that we could always change based on
user-feedback.

> That said, the output should really be variable-length. It will be fixed
> together with #8878.
Thanks.

A related question, about the separator argument. The documentation says
it can accept a BLOB, as long as it's limited to 32KB. Is that actually
true, or does it accept a VARCHAR, and the blob is automatically
converted before calling UNLIST? Wouldn't that also mean that the actual
upper limit (in characters) depends on the character set?

Mark
--
Mark Rotteveel

livius...@poczta.onet.pl

unread,
Feb 14, 2026, 5:28:37 AM (9 days ago) Feb 14
to firebir...@googlegroups.com
Hi

This is a CAST-type function, so it should always require specifying the target type for the data conversion.
Therefore, the RETURNING clause should always be mandatory.
That solves all the problems being discussed here.
Treating something as “typical” and fixating on that only creates chaos and inconsistencies,
and later leads to bug reports and forum debates.
Just make the type mandatory for unlist and that’s it—case closed.

regards,
Karol BIeniaszewski

Mark Rotteveel

unread,
Feb 14, 2026, 6:06:16 AM (9 days ago) Feb 14
to firebir...@googlegroups.com
No, primarily it is a "split on a delimiter" function, not a cast
function. The input is string (or binary) and the default output is a
result set with a column of type string (or binary). The RETURNING
clause is if you want to transform (cast) that column to some other type.

BTW, the RETURNING clause is similar to usage in the SQL standard
JSON_VALUE function (6.28 <JSON value function>), which also defaults to
a character string type.

Mark
--
Mark Rotteveel

livius...@poczta.onet.pl

unread,
Feb 14, 2026, 6:48:52 AM (9 days ago) Feb 14
to firebir...@googlegroups.com

Yes, it’s a CAST function applied to each value separated by the delimiter.

So once you do the split, there’s a CAST involved.

If there were to be any kind of automatism (a Default Type Value), it would then have to look at the context in which the unlist function is used.

Because if I write:

SELECT * FROM TABLEX X WHERE X.ID IN unlist ..

then the cast should use the type of the X.ID field—if it’s Integer then Integer, if it’s Int64 then Int64, and so on.

But I don’t know whether analyzing cases like that makes sense or is easy to implement, because the situations can be more complex.
Therefore, removing the default type inference and requiring that the type always be specified is less misleading.

Regards,
Karol Bieniaszewski

Mark Rotteveel

unread,
Feb 14, 2026, 7:50:25 AM (8 days ago) Feb 14
to firebir...@googlegroups.com
On 14/02/2026 12:48, liviuslivius via firebird-devel wrote:
> Yes, it’s a CAST function applied to each value separated by the delimiter.
>
> So once you do the split, there’s a CAST involved.

There isn't. You have a string input, you split, you have a string
output => no cast involved. The RETURNING clause is only if you want to
coerce/cast it to a different type.

> If there were to be any kind of automatism (a Default Type Value), it
> would then have to look at the context in which the |unlist| function is
> used.
>
> Because if I write:
>
> |SELECT * FROM TABLEX X WHERE X.ID <http://X.ID> IN unlist ..|
>
> then the cast should use the type of the |X.ID <http://X.ID>| field—if
> it’s Integer then Integer, if it’s Int64 then Int64, and so on.

That form of UNLIST is not yet implemented (see pending PR
https://github.com/FirebirdSQL/firebird/pull/8878), and automatic type
inference for that is still under discussion (see open questions in that
PR).

Mark
--
Mark Rotteveel
Reply all
Reply to author
Forward
0 new messages