Proposal for SQL support for JavaScript Object Notation (JSON). Part 1


Artyom Abakumov

Jul 15, 2024, 3:32:06 AM
to firebird-devel

SQL support for JavaScript Object Notation (JSON) 

Hello. As you might know, Red Soft has implemented SQL JSON functions, and I hope we can start working on pull requests for Firebird 6 soon. To reduce the time spent on review and discussion, the proposal has been split into several parts, and here is the first one. This part provides a text explanation of the implementation details. Once all the most controversial points have been discussed, I will send a PR.

All functionality is implemented according to the SQL standard. In the first part, I cover the simplest function of the standard — JSON_ARRAY, along with all its related features (there are plenty of them).

MAIN POINTS

To make things clear, let's set up the main points on the entire implementation:

  1. This is the implementation of the JSON SQL standard 2017 (ISO/IEC TR 19075-6:2017).

  2. There is no explicit JSON data type (which appears only in SQL 2023) or a binary JSON type – JSON is stored as text via text types.

  3. The implementation adds JSON functions, JSON_TABLE record source, and all its associated components.

  4. All functions are named according to the SQL standard.

  5. Almost all functionality is already implemented and awaiting its time to be proposed for merging. However, to avoid a single 15k+ line pull request, the whole proposal is divided into 8 parts.

  6. Parts 1-3 are related to JSON Generation functions, parts 4-6 to JSON Extract (query) functions, and parts 7-8 are related to JSON_TABLE:

    1. The core description of the IO interface implementation and JSON_ARRAY as an example.

    2. The PR with JSON_ARRAY implementation.

    3. The PR with remaining JSON generation functions' implementation.

    4. The description of JSON Get functions.

    5. The implementation of JSON_QUERY and IS JSON functions (PR).

    6. The implementation of JSON_VALUE and JSON_EXISTS functions (PR).

    7. The description of JSON_TABLE implementation.

    8. The JSON_TABLE PR.

  7. I wanted the code to be tested via unit tests, but that is quite hard because the engine relies heavily on tdbb, which is not defined while testing. It is possible to partially define it, but serious limitations remain. So the JSON code is separated from the engine as much as possible, which means using dsc only for input and output values.

  8. All the basic JSON functions (JSON_VALUE, JSON_QUERY, JSON_ARRAY, and JSON_OBJECT) are implemented as ExprNodes, JSON_TABLE is a RecordSourceNode, and JSON_ARRAYAGG and JSON_OBJECTAGG are AggNodes.

  9. All the functions produce compact JSON. Therefore, all excess spaces, tabs, and new lines will be omitted.

  10. Some parts of the standard are not very handy. Therefore, there will be a couple of additions to the functionality that are outside of the standard.

GLOBAL GOALS
  1. Implement most of SQL JSON.

  2. Adhere to the SQL standard as closely as possible.

  3. Keep JSON code as independent from Jrd as possible to enable writing unit tests.

  4. Keep it user-friendly.

SYNTAX

The current document is focused on the description of the IO interface, so there will be only one function (with 2 constructors):

JSON_ARRAY

<JSON array constructor> ::=

<JSON array constructor by enumeration>

| <JSON array constructor by query>

<JSON array constructor by enumeration> ::=


JSON_ARRAY <left paren>

[ <JSON value expression> [ { <comma> <JSON value expression> }... ]

[ <JSON constructor null clause> ] ]

[ <JSON output clause> ]

<right paren>

<JSON array constructor by query> ::=


JSON_ARRAY <left paren>

<query expression>

[ <JSON input clause> ]

[ <JSON constructor null clause> ]

[ <JSON output clause> ]

<right paren>


JSON_ARRAY Clauses

<JSON constructor null clause> ::=

ABSENT ON NULL

| NULL ON NULL


<JSON output clause> ::=

RETURNING <data type>

[ FORMAT <JSON representation> ]

<JSON representation> ::=

JSON

| SQL

| AUTO


<JSON value expression> ::=

<value expression> [ <JSON input clause> ]

<JSON input clause> ::=

FORMAT <JSON representation>

<JSON representation> ::=

JSON

| SQL


The JSON representation is declared in the standard as:

<JSON representation> ::=

JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]

| <implementation-defined JSON representation option>


ENCODING is, I believe, the equivalent of the CHARSET of our strings. I also add two representation options: SQL and AUTO. Their purpose is described later in this document.

Examples

SQL> select JSON_ARRAY(false, 'false', 'false' format json) from rdb$database;

========

[false,"false",false]



SQL> SELECT JSON_ARRAY(SELECT RDB$RELATION_ID FROM RDB$RELATIONS WHERE RDB$RELATION_ID < 5) FROM RDB$DATABASE;

========

[0,1,2,3,4]


NODE IMPLEMENTATION

BLR

There aren't many 1-byte BLR codes left, and, as of now, the full PR will require 11 verbs. So I've decided to create a simple factory: only one BLR code is used as the primary verb, and based on a secondary BLR code the factory determines the node type and creates it. Currently it is implemented as a switch statement. I tried a factory with class registration, but it doesn't look good for such a small number of nodes. But if you say no, the factory with registered classes will be used. A sketch of the idea is below.
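To make the idea concrete, here is a minimal sketch of the factory (the sub-code values and most names here are illustrative, not the actual patch):

// One primary 1-byte verb (blr_json) is followed by a secondary byte
// that selects the concrete node type.
enum JsonBlrSubCode : UCHAR
{
	json_blr_array = 1,
	json_blr_object = 2,
	json_blr_format = 3
	// ... remaining verbs
};

static DmlNode* parseJsonNode(thread_db* tdbb, MemoryPool& pool, CompilerScratch* csb)
{
	const UCHAR subCode = csb->csb_blr_reader.getByte();

	switch (subCode)
	{
		case json_blr_array:
			return JsonArrayNode::parse(tdbb, pool, csb, subCode);
		case json_blr_object:
			return JsonObjectNode::parse(tdbb, pool, csb, subCode);
		case json_blr_format:
			return JsonFormatNode::parse(tdbb, pool, csb, subCode);
		default:
			PAR_syntax_error(csb, "JSON BLR sub-code");
			return nullptr;	// not reached
	}
}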

Output

Let's start with the simplest JSON function: JSON_ARRAY with the enumeration constructor. The function takes a set of arguments and combines them into a JSON array. The function also has an alternative syntax with a subquery (the query expression must return exactly one column according to the SQL/JSON standard).

The first problem is the default output type. The SQL/JSON 2017 standard says that JSON is an "implementation-dependent string of Unicode characters". But it is hard to predict the output length, and JSON can be longer than the VARCHAR limit (which shrinks by a factor of 4 in characters if you use utf8 encoding). So the safest default type is a text blob. Yes, blobs are slow and not very convenient, but at least it is clear to the user that the function works, instead of failing with an obscure "string right truncation" error. The good news is that the standard provides a RETURNING clause, which can be used to specify the output type.

SELECT JSON_ARRAY(1, 'hello', false RETURNING VARCHAR(30))


So if a user cares about optimization, there will not be any problems.

But there is one more issue: the FORMAT clause. This is basically a workaround for the missing JSON type. The format can be specified for the output and for the input. Its meaning is simple: it tells the JSON function how to interpret a value, as a string or as JSON. For example:

SQL> select JSON_ARRAY('{"a":42}', '{"b":16}' format json) from rdb$database;

["{\"a\":42}",{"b":16}]

The first value is interpreted as a string, and all special symbols are escaped. An input value can also be the result of another JSON function. By default, the format of the RETURNING clause is set to JSON, even when the RETURNING clause itself is absent.

Here are some examples:

SQL>SELECT JSON_ARRAY(JSON_ARRAY(1), JSON_ARRAY(1) FORMAT JSON, JSON_ARRAY(3 RETURNING VARCHAR(30) FORMAT JSON)) FROM RDB$DATABASE;

[[1],[1],[3]]


SQL>SELECT JSON_ARRAY(JSON_ARRAY(1), JSON_ARRAY(1) FORMAT SQL, JSON_ARRAY(3 RETURNING VARCHAR(30) FORMAT SQL)) FROM RDB$DATABASE;

[[1],"[1]","[3]"]


To specify the format, a DSC flag named DSC_json is used. The problem is that flags are really easy to lose; this can happen in a cast, for example. So how is the RETURNING clause implemented? First, the JSON_ARRAY node sets the flag. If there is a RETURNING clause, the JSON node is wrapped in an artificial CastNode. The cast node is then put inside a JsonFormatNode, which re-applies the flag after the cast.

JSON_ARRAY -> JsonArrayNode

JSON_ARRAY Returning -> JsonFormatNode(CastNode(JsonArrayNode))

JSON_ARRAY Returning format -> JsonFormatNode(CastNode(JsonArrayNode))


For JSON format - add the DSC_json flag. For SQL format - remove the DSC_json flag.
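As a sketch (JsonFormatNode and DSC_json are from this proposal; the signature and the surrounding details are my assumption of how it could look inside the engine):

// JsonFormatNode wraps CastNode(JsonArrayNode) and re-applies the flag
// after the cast, which would otherwise lose it.
dsc* JsonFormatNode::execute(thread_db* tdbb, Request* request) const
{
	dsc* desc = EVL_expr(tdbb, request, arg);	// evaluate the wrapped cast
	if (!desc)
		return nullptr;	// NULL propagates

	if (format == FORMAT_JSON)
		desc->dsc_flags |= DSC_json;	// the value is JSON text
	else	// FORMAT_SQL
		desc->dsc_flags &= ~DSC_json;	// the value is a plain SQL string

	return desc;
}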

Input

As mentioned earlier, input arguments can also be specified with the FORMAT clause. So let's use our JsonFormatNode once again:

VALUE FORMAT JSON -> JsonFormatNode(ValueExprNode)

VALUE FORMAT SQL -> JsonFormatNode(ValueExprNode)

But what format should we use by default if we are not sure about the source of the value? This is where the AUTO format comes in. It simply keeps the value as is.

VALUE FORMAT AUTO -> ValueExprNode

VALUE -> ValueExprNode

FUNCTION IMPLEMENTATION

Input

So, we get an input descriptor, and now we need to read it. For the SQL format, we also need to escape all the special symbols.

The main problem is blobs (again). We usually can't just read a blob as a whole, so we need some kind of chunk reader. But we need to deal with plain strings, too. Therefore, we need a storage class. I named it InputJsonText.

InputJsonText

How it works: we get a descriptor. If it has a simple type, just convert it to a string and add it to the Read Buffer. If it is a blob, read it chunk by chunk into the Read Buffer. For a VARCHAR, put it into the Read Buffer, but process it chunk by chunk as well. This may be overcomplicated, since keeping 64KB in memory is not so scary, but what if the VARCHAR limit is extended in the future? Therefore, I decided to handle VARCHARs chunk by chunk, but I might be wrong. All the details about limits, buffer sizes, etc. are really controversial, so please correct me. The Read Buffer size is 16KB. Ok, we read the chunk. What is next? Next, we need a class to store the output JSON.
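As an illustration of the reader, here is the chunking idea in isolation, with std::string standing in for dsc and the engine string/blob classes (all names are mine):

#include <algorithm>
#include <string>

class InputJsonText
{
public:
	static const size_t CHUNK_SIZE = 16384;	// the proposed Read Buffer size

	explicit InputJsonText(std::string value)
		: source(std::move(value))
	{}

	// Delivers the value piecewise: one chunk for short strings,
	// many chunks for blobs and long VARCHARs.
	bool getChunk(std::string& out)
	{
		if (pos >= source.size())
			return false;

		const size_t len = std::min(CHUNK_SIZE, source.size() - pos);
		out.assign(source, pos, len);
		pos += len;
		return true;
	}

private:
	std::string source;
	size_t pos = 0;
};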

OutputJsonText

This class may seem overcomplicated, but all this machinery was initially created for the JSON extraction functions like JSON_QUERY. So please do not ask to burn it with fire right now; wait for proposals #4 and #5. Back to the output JSON. We have some input, which is then added to the output. The OutputJsonText produces a descriptor (dsc). But what should we put into the dsc? For that, we have an append method: get the data via InputJsonText and put it into the main buffer. If the data is marked as SQL format, escape it. If the buffer grows beyond the limit (also 16KB), flush the buffer to storage. There are two types of storage: blob and temp space. For the Generate functions, we only need the first one, so the input is appended to the blob.

When the output descriptor is requested, if there is no data in storage and the buffer fits into the VARCHAR limit, produce a text descriptor. For any data in storage, produce a blob.
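Here is a condensed illustration of the append/flush behaviour (again, std::string stands in for the blob storage; escape() is simplified and all names are mine):

#include <string>

class OutputJsonText
{
public:
	static const size_t FLUSH_LIMIT = 16384;	// the 16KB buffer limit

	void append(const std::string& chunk, bool sqlFormat)
	{
		buffer += sqlFormat ? escape(chunk) : chunk;

		if (buffer.size() > FLUSH_LIMIT)
		{
			storage += buffer;	// in the engine: append to the blob
			buffer.clear();
			usedStorage = true;
		}
	}

	// Mirrors the descriptor rule above: VARCHAR if everything still
	// fits in the buffer, blob otherwise.
	bool fitsInVarchar(size_t varcharLimit) const
	{
		return !usedStorage && buffer.size() <= varcharLimit;
	}

private:
	static std::string escape(const std::string& in)
	{
		std::string out;
		for (const char c : in)
		{
			if (c == '"' || c == '\\')
				out += '\\';
			out += c;	// real code must also escape control characters
		}
		return out;
	}

	std::string buffer;	// small outputs stay here
	std::string storage;	// stands in for the blob storage
	bool usedStorage = false;
};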

JSON_ARRAY

The resulting pipeline is as follows (a code sketch follows the list):

  1. Create an OutputJsonText field;

  2. Append each argument, separating them with commas;

  3. Finish the generation and request a DSC;

  4. Mark the DSC as JSON.
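A rough sketch of this pipeline (illustrative names, error handling omitted; only OutputJsonText, DSC_json, and the ON NULL semantics come from this proposal):

dsc* JsonArrayNode::execute(thread_db* tdbb, Request* request) const
{
	OutputJsonText out;
	out.rawAppend("[");

	bool first = true;
	for (const auto& arg : args)
	{
		dsc* desc = EVL_expr(tdbb, request, arg);

		if (!desc && absentOnNull)
			continue;	// ABSENT ON NULL: skip SQL NULLs

		if (!first)
			out.rawAppend(",");
		first = false;

		if (!desc)
			out.rawAppend("null");	// NULL ON NULL
		else
			out.append(tdbb, desc);	// via InputJsonText; escapes unless DSC_json is set
	}

	out.rawAppend("]");

	dsc* result = out.makeDesc(tdbb);	// VARCHAR or text blob, see above
	result->dsc_flags |= DSC_json;	// step 4: mark the DSC as JSON
	return result;
}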

Formatting

The function creates JSON without any spaces or new lines. First, this saves space, which matters for VARCHAR output. Second, I believe JSON is used mostly for machine processing, so few people would care about pretty output. If it is really necessary, a formatter clause can be implemented later.

Validation

Back to the implementation. There is a problem: the output may be invalid. For example, the user passed something like '9&*H[3' FORMAT JSON. So it would be nice to validate the output JSON array, or each argument. But that slows the function down, and I am not sure it is really necessary.
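For illustration, without validation a session could look like this (hypothetical output):

SQL> select JSON_ARRAY('9&*H[3' format json) from rdb$database;

========

[9&*H[3]

The FORMAT JSON clause suppresses escaping, so the garbage is embedded verbatim and the result is not valid JSON.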

Summarize (A.K.A. TLDR)
  1. JSON nodes are implemented as factory products;

  2. The default return type is text BLOB for Generate (JSON_ARRAY) functions;

  3. Use a special class as an input reader;

  4. Keep small outputs in the buffer, and large ones in BLOB;

  5. Do not include spaces/newlines/tabs in the output;

  6. Mark the output descriptor as JSON via the DSC_json flag;

  7. Validate the output JSON (try parsing).

Controversial points
  1. Naming is my weak point, so I expect many corrections in this part.

  2. Reading big VARCHARs in chunks.

  3. The JSON factory and its implementation: switch vs. registered classes.

  4. Validation of JSON_ARRAY output.

Implementation questions:
  1. Where should we store the new nodes? ExprNodes.h/cpp?

  2. Where should we store error messages? impl/msg/jrd.h?

  3. Switch vs. registered classes.

  4. Limits.


Alex Peshkoff

Jul 15, 2024, 5:57:47 AM
to firebir...@googlegroups.com
On 7/15/24 10:32, Artyom Abakumov wrote:
>
> To specify the format, a DSC flag named DSC_json is used. The problem
> is that flags are really easy to lose; this can happen in a cast, for
> example. So how is the RETURNING clause implemented? First, the
> JSON_ARRAY node sets the flag. If there is a RETURNING clause, the
> JSON node is wrapped in an artificial CastNode. The cast node is then put
> inside a JsonFormatNode, which re-applies the flag after the cast.
>
> JSON_ARRAY -> JsonArrayNode
>
> JSON_ARRAY Returning -> JsonFormatNode(CastNode(JsonArrayNode))
>
> JSON_ARRAY Returning format -> JsonFormatNode(CastNode(JsonArrayNode))
>
>
> For JSON format - add the DSC_json flag. For SQL format - remove the
> DSC_json flag.
>

Use of a flag for setting the format seems strange to me. The current set of
descriptor flags may be applied to any datatype, but what would
DSC_json mean when set on, for example, a float value? Next, what if in addition to
the json/sql formats we need one more (csv or xml)?

I suggest adding a separate datatype dtype_json instead and using the subtype to
encode formats. This would be universal and consistent with the rest of the
engine.


Dimitry Sibiryakov

Jul 15, 2024, 6:29:55 AM
to firebir...@googlegroups.com
Artyom Abakumov wrote 15.07.2024 9:32:
> There is no explicit JSON data type (which appears only in SQL 2023) or a binary
> JSON type – JSON is stored as text via text types.

As Alex already said, this is not good. You'd better create a pre-defined
system BLOB subtype for JSON and a JSON type as an alias for it. BLOB slowness can
(and should) be fixed.

> ENCODING is, I believe, the equivalent of the CHARSET of our strings.

RFC 8259 is clear: JSON must be in UTF-8 only. If ANSI has another opinion,
they are wrong.

> I also add two representation options: SQL and AUTO. Their purpose is described later in this document.

I don't see where their purpose is documented. If JSON is fixed to BLOB, they
are not needed IMHO.

> There aren't many 1-byte BLR codes left, and, as of now, the full PR will require 11 verbs.

Why is any verb required at all? AFAICS JSON is defined as a set of functions,
and functions don't need their own BLR verbs.

TLDR: If you give up versatility and fix JSON text to be BLOB sub_type JSON
in UTF-8 only, most of your coding problems disappear, and introducing this
feature piece by piece will be easier.

--
WBR, SD.

Artyom Abakumov

Jul 15, 2024, 6:30:53 AM
to firebird-devel

Use of a flag for setting the format seems strange to me. The current set of
descriptor flags may be applied to any datatype, but what would
DSC_json mean when set on, for example, a float value? Next, what if in addition to
the json/sql formats we need one more (csv or xml)?

 
The flag is set only for the product of the following functions: JSON_ARRAY, JSON_ARRAYAGG, JSON_OBJECT, JSON_OBJECTAGG, and JSON_QUERY. They all result in JSON text, i.e., only CHAR, VARCHAR, or TEXT BLOB types.
The flag is handy because it allows the use of current text types with the JSON functions at zero cost.

I suggest adding a separate datatype dtype_json instead and using the subtype to
encode formats. This would be universal and consistent with the rest of the
engine.
 
It would be nice to add a new type, but it is not an easy task due to the complexity of the engine: BLR, type conversions, etc. Second, it would require all the drivers to support the new type. Third, there are a lot of unsolved questions. What about indexes? Will the type be a fork of BLOB or a new entity? To summarize, I prefer to first implement the SQL:2017 functionality and then move on to a separate JSON type. But if you decide to implement a new type, I will start working on it.


Dimitry Sibiryakov

Jul 15, 2024, 6:34:50 AM
to firebir...@googlegroups.com
Artyom Abakumov wrote 15.07.2024 12:30:
> The flag is set only for the product of the following functions: JSON_ARRAY, JSON_ARRAYAGG, JSON_OBJECT, JSON_OBJECTAGG, and JSON_QUERY. They all result in JSON text, i.e., only CHAR, VARCHAR, or TEXT BLOB types.
> The flag is handy because it allows the use of current text types with the JSON functions at zero cost.

DSC flags are not carried to the client API, so it doesn't solve any problem for
the client-side representation.

> It would be nice to add a new type, but it is not an easy task due to the
> complexity of the engine: BLR, type conversions, etc.

Adding a BLOB subtype is much easier. Because of this I would go for it.

--
WBR, SD.

Artyom Abakumov

Jul 15, 2024, 6:41:57 AM
to firebird-devel
> ENCODING is, I believe, the equivalent of the CHARSET of our strings.

RFC 8259 is clear: JSON must be in UTF-8 only. If ANSI has another opinion,
they are wrong.

I am only glad to omit the ENCODING clause.
 
> I also add two representation options: SQL and AUTO. Their purpose is described later in this document.

I don't see where their purpose is documented. If JSON is fixed to BLOB, they
are not needed IMHO.

Blobs are slow, and it is complicated to work with them. This is why I prefer to keep JSON in simple strings when possible. And the standard allows you to write something like RETURNING VARCHAR(30) FORMAT JSON. So I prefer to keep the flag.
 
> There aren't many 1-byte BLR codes left, and, as of now, the full PR will require 11 verbs.

Why is any verb required at all? AFAICS JSON is defined as a set of functions,
and functions don't need their own BLR verbs.
 
Please correct me if I misunderstood you, but I use expression nodes, and each node requires its own BLR verb. I tried to use system functions at first, but it resulted in a nightmare due to the multiple optional clauses.

Artyom Abakumov

Jul 15, 2024, 6:46:01 AM
to firebird-devel
DSC flags are not carried to the client API, so it doesn't solve any problem for
the client-side representation.

And we do not need the flag on the client side. The flag is only used inside the JSON nodes for complex queries like JSON_ARRAY(JSON_QUERY(...), JSON_VALUE(...), JSON_OBJECT()).
Adding a BLOB subtype is much easier. Because of this I would go for it.

SQL:2023 introduces a JSON type. If we add a new subtype now, there will be a mess of types in the future.
 

Alex Peshkoff

Jul 15, 2024, 6:49:57 AM
to firebir...@googlegroups.com
On 7/15/24 13:41, Artyom Abakumov wrote:
> Please correct me if I misunderstood you, but I use expression nodes,
> and each node requires its own BLR verb. I tried to use system functions at
> first, but it resulted in a nightmare due to the multiple optional clauses.

Take a look at the encrypt/decrypt implementation. It also has a lot of
optional clauses, and IMO there is nothing too awful in its implementation.


Artyom Abakumov

Jul 15, 2024, 6:51:58 AM
to firebird-devel
One more remark about the JSON type.
Postgres had significant issues with their implementation: they had two types of their own, the textual json and the binary jsonb. Now, the standard requires yet another type. As Oleg Bartunov himself mentioned, ideally there should be one type with different storage options (binary and textual), but for now they have settled on a more makeshift solution. They created an option to associate SQL JSON either with their textual JSON or with the binary one. They don't like the textual one at all, because it is slow and suitable only for two cases: when the original formatting is crucial, and when there are duplicate keys in the objects (the binary format retains only the last occurrence). Therefore, for new users the default will be binary, while for existing users it will be textual.

So, as I mentioned earlier, there are a lot of unsolved questions.

Dimitry Sibiryakov

Jul 15, 2024, 6:56:16 AM
to firebir...@googlegroups.com
Artyom Abakumov wrote 15.07.2024 12:41:
> Blobs are slow, and it is complicated to work with them. This is why I prefer to
> keep JSON in simple strings when possible.

Don't do premature optimization, please. If BLOBs are slow, that is a
separate problem which must be fixed separately.

> And the standard allows you to write something like RETURNING VARCHAR(30) FORMAT JSON.

In this case you perform the whole internal work on a BLOB and then convert it to
the given datatype on output.

> Please correct me if I misunderstood you, but I use expression nodes, and each node requires its own BLR verb. I tried to use system functions at first, but it resulted in a nightmare due to the multiple optional clauses.

Tell us a little more about this nightmare, please. Every optional
clause has a default value, so in the parser you just construct a function call with a
lot of parameters, some of which get a predefined literal value. Where is the problem?

> SQL:2023 introduces a JSON type. If we add a new subtype now, there will be a mess of types in the future

What mess? An SQL type doesn't have to have a separate dtype or btype. It can be a
mere alias for them.

--
WBR, SD.

Alex Peshkoff

Jul 15, 2024, 6:58:22 AM
to firebir...@googlegroups.com
On 7/15/24 13:34, 'Dimitry Sibiryakov' via firebird-devel wrote:
>> It would be nice to add a new type, but it is not an easy task due to
>> the complexity of the engine: BLR, type conversions, etc.
>
>   Adding a BLOB subtype is much easier. Because of this I would go for it.

Definitely easier. At the same time, we lose the ability to have indices on
json fields - not sure they are really needed, but we need to
take this into account when deciding too. Next, how will the json
format be implemented in this case? I suppose that's a really useful part of the suggestion.

Let's decide based not upon 'the simplest approach' but 'the most
correct approach'. Taking into account that
> SQL:2023 introduces a JSON type.
I doubt we need other arguments.

What about complexity... Artyom, it's not as big as it seems at first
look. The only place where non-trivial code changes were needed when
adding the decfloat datatype was sort & indices - it was necessary to convert
decfloat values into a sortable/indexable key form, and IBM 'forgot' to add
that code (definitely present in DB2) to the decfloat support library. All
the rest is very simple modification of code here and there. Also, I'm
sure json does not have that problem - it's sorted according to
character string rules, which have worked for a long time. If you have
some particular questions, I'll be glad to help as much as I can.


Dmitry Yemanov

Jul 15, 2024, 6:59:18 AM
to firebir...@googlegroups.com
15.07.2024 13:30, Artyom Abakumov wrote:
>
> I suggest adding a separate datatype dtype_json instead and using the subtype to
> encode formats. This would be universal and consistent with the rest of the
> engine.
>
> It would be nice to add a new type, but it is not an easy task due to
> the complexity of the engine: BLR, type conversions, etc. Second, it
> would require all the drivers to support the new type. Third, there are
> a lot of unsolved questions. What about indexes? Will the type be a fork
> of BLOB or a new entity?

Not necessarily. Look at dtype_dbkey, for example. It's a purely internal
data type which is exposed to the external world as BINARY(8).


Dmitry

Alex Peshkoff

Jul 15, 2024, 6:59:55 AM
to firebir...@googlegroups.com
On 7/15/24 13:56, 'Dimitry Sibiryakov' via firebird-devel wrote:
>> SQL:2023 introduces a JSON type. If we add a new subtype now, there
>> will be a mess of types in the future
>
>   What mess? An SQL type doesn't have to have a separate dtype or btype. It
> can be a mere alias for them.

BTW, if we decide that json need not be indexable - that's also a cool
solution.


Dimitry Sibiryakov

Jul 15, 2024, 7:06:41 AM
to firebir...@googlegroups.com
Alex Peshkoff wrote 15.07.2024 12:58:
> Definitely easier. At the same time, we lose the ability to have indices on json fields

Why? I don't know the indexing code well, but for me the only problem for indexing
BLOBs is the value->key conversion and the subsequent comparison.
I have no idea how JSON indexing could work, but I am almost sure it would hardly
be anything bigger than an expression index on the JSON_VALUE function.
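For example, something like this would probably be enough (hypothetical syntax, assuming JSON_VALUE is implemented as proposed, on top of Firebird's existing computed index support):

CREATE INDEX IDX_DOC_NAME ON DOCS
COMPUTED BY (JSON_VALUE(DOC, '$.name' RETURNING VARCHAR(100)));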

> Next, how will the json format be implemented in this case?

The same way as, for example, the BLR format: it is just kept as is and
interpreted/validated on demand. What exactly is the problem you see?

> Let's decide based not upon 'the simplest approach' but 'the most correct approach'. Taking into account that
>> The SQL 2023 introduces a JSON type.
> I doubt we need other arguments.

As I already said: an SQL type doesn't have to have an underlying dtype and btype.
Look at (VAR)BINARY.

--
WBR, SD.

Денис Симонов

Jul 15, 2024, 7:18:04 AM
to firebird-devel
Personally, I like the idea of a separate internal type for JSON. All functions producing JSON could create this data type (it could even be a reference type), which would later be converted to VARCHAR or BLOB SUB_TYPE TEXT when returned, or to BLOB SUB_TYPE JSON (for optimal storage in binary form).

As for indexing, I'm not sure the entire JSON needs to be indexed directly. Nothing stops you from creating a computed index of the returning type, which can be indexed. Or, in the future, some new kind of index, and there may be more than one: for example, full-text indexes or indexes for geodata. And they may well be created on top of BLOBs.

On Monday, July 15, 2024 at 14:06:41 UTC+3, Dimitry Sibiryakov wrote:

Dimitry Sibiryakov

Jul 15, 2024, 7:43:16 AM
to firebir...@googlegroups.com
Artyom Abakumov wrote 15.07.2024 12:41:
> Blobs are slow, and it is complicated to work with them.

BTW, AFAIK BLOBs are known to be slow only in delivery to the client side. Inside
the engine they are widely used to speed up operations; for example, record formats
and ACLs are kept in BLOBs.
For operating with BLOBs there are several wrappers and simple routines in
common/classes as well as in jrd/blb itself.

--
WBR, SD.

Artyom Abakumov

Jul 15, 2024, 7:50:44 AM
to firebird-devel
Alright. What about the JSON representation (binary, text)? Should it be the same type with subtypes (like blob) or 2 separate types? Which of these types should the functions return?
On Monday, July 15, 2024 at 14:18:04 UTC+3, Денис Симонов wrote:

Dimitry Sibiryakov

Jul 15, 2024, 8:00:16 AM
to firebir...@googlegroups.com
Artyom Abakumov wrote 15.07.2024 13:50:
> What about the JSON representation (binary, text)? Should it be the same type with
> subtypes (like blob) or 2 separate types? Which of these types should the
> functions return?

JSON has an RFC, BSON doesn't. IMHO, they must be different types, and when the
standard says "JSON" I would say that it means text JSON, not BSON.
Also, Google tells me that there are different kinds of binary JSON, so it is
not clear which one Firebird should support.
At the initial stage I would suggest limiting the implementation to text JSON.
(And if it is implemented as a BLOB subtype, you can always use BLOB filters to
transform one into another.)

--
WBR, SD.

Artyom Abakumov

Jul 15, 2024, 8:12:11 AM
to firebird-devel

Also, Google tells me that there are different kinds of binary JSON, so it is not clear which one Firebird should support.
 
Actually, there is no such thing as "binary JSON" as a single standard. There is BSON, but it is not what I mean when talking about a binary JSON representation. Each RDBMS supports its own implementation of binary JSON; basically, it exists only to make working with the data inside the JSON faster.

Денис Симонов

Jul 15, 2024, 8:12:55 AM
to firebird-devel
Ideally, they should return BLOB SUB_TYPE JSON (in binary form), unless of course the RETURNING clause is specified (then return the type that is written there). And there should be a built-in BLOB filter that allows on-the-fly conversion BLOB SUB_TYPE JSON <-> BLOB SUB_TYPE TEXT. But it's worth considering that the functions should return not a typical BLOB, but something like what BLOB_APPEND returns, or even something more cunning, so as not to produce temporary BLOBs. Doing just that, and doing it efficiently, is unlikely to work right away. Therefore, it seems easier to me to organize some kind of internal type (a flag or a pointer) to be used while the JSON is being built (that is, while one JSON function is called inside another), and only when the result begins to be used outside other JSON functions to do the conversion to BLOB SUB_TYPE JSON / BLOB SUB_TYPE TEXT.

On Monday, July 15, 2024 at 14:50:44 UTC+3, Artyom Abakumov wrote:

Piergiorgio Valli

Jul 15, 2024, 9:13:35 AM
to firebir...@googlegroups.com
Hi all,

I would add a little contribution: SQLite and PostgreSQL have added JSONB, which stores an internal binary representation of JSON directly in the database, so applications can bypass the overhead of parsing and rendering JSON when reading and updating JSON values.


Pier




Mark Rotteveel

Jul 15, 2024, 1:46:52 PM
to firebir...@googlegroups.com
On 15/07/2024 09:32, Artyom Abakumov wrote:
> To make things clear, let's set up the main points on the entire
> implementation:
>
> 1. This is the implementation of the JSON SQL standard 2017 (ISO/IEC TR
> 19075-6:2017).

I don't have time to read through it all right now, but a thing that
stands out to me:

The current version of ISO/IEC 19075-6 is ISO/IEC 19075-6:2021, so why
base it on an outdated version (ISO/IEC TR 19075-6:2017)?

Specifically, the introduction of ISO/IEC 19075-6:2021 says:

"""
This first edition of ISO/IEC 19075-6 cancels and replaces ISO/IEC TR
19075-6:2017. Several small technical errors have been corrected,
"""

Its phrasing ("first edition of") is suggests the 2017 was not a
standard. Its code was ISO/IEC TR 19075-6:2017 and the 2021 version is
ISO/IEC 19075-6:2021 (without the TR). As far as I understand, that
means that the 2017 was a Technical Report, and not a standard. The fact
that (small) technical errors have been corrected is also of importance.

Also, as far as I'm aware, part of the specification for JSON support is
in ISO 9075-2(:2023), and not in 19075-6, so make sure it's covered. I
only have access to the table of contents and the foreword (I'd need to
pay for the rest), but based on that, for example, the coverage of
JSON_OBJECTAGG + JSON_ARRAYAGG in 19075-6 is 2 pages, while in 9075-2
it's (at least) 6 pages of SQL syntax and rules for those two
functions.

Mark
--
Mark Rotteveel

Artyom Abakumov

Jul 15, 2024, 2:23:48 PM
to firebird-devel

The current version of ISO/IEC 19075-6 is ISO/IEC 19075-6:2021, so why
base it on an outdated version (ISO/IEC TR 19075-6:2017)?

The implementation began before the 2021 edition was published, and, honestly, I only found out about the newer version when I was preparing the proposal. It would be nice to take a look, but unfortunately I have no access to it.

Also, as far as I'm aware, part of the specification for JSON support is
in ISO 9075-2(:2023), and not in 19075-6, so make sure it's covered.

Yes, there are a lot of blind spots. In some cases, I reference the behavior of PostgreSQL JSON functions to address them.

Dmitry Yemanov

Jul 27, 2024, 2:08 AM
to firebir...@googlegroups.com
All,

Let me try to summarize the feedback. Feel free to correct me if I got
something wrong.

Internally (inside the engine) it makes sense to add a dedicated
dtype_json type and make all JSON functions receive/return it instead
of using DSC_json with blobs. dtype_json would still be backed by the
blob mechanics, as now.

dtype_json may have two sub-types - text (implemented now) and binary
(reserved for the future).

Externally (API level) dtype_json is converted from/to text BLOB.

Now the new JSON data type (as per SQL:2023) comes in. If we add it now,
it will be backed by the same dtype_json/text (but a new BLR verb is
needed). So far so good, and AFAIU it corresponds to the SQL standard
which bases this datatype on the JSON specification (RFC 8259).

When we decide to implement a (more efficient) binary JSON format, it
will be backed by dtype_json/binary. I am not sure whether it should be
externally represented as BLOB BINARY, or some new blob sub-type should
be invented, or we should just leave the text BLOB as the only public interface
and convert between JSON sub-types under the hood.

But if/when dtype_json/binary is implemented and we'd want it to be
stored "as is", how should we handle that? Add a non-standard data type
like PG did with JSONB? Add a non-standard syntax extension to the
standard JSON data type e.g. JSON [SUB_TYPE {TEXT | BINARY}]? Something
else?


Dmitry

Денис Симонов

Jul 27, 2024, 3:58 AM
to firebird-devel
It seems to me that the answer to the last question depends on the properties of the JSON type as it is described in the standard. Are there any features of the SQL-2023 JSON type that would distinguish it from a simple text blob? If not, then we can easily make it an alias for another type, as is currently done with [VAR]BINARY.

On Saturday, July 27, 2024 at 09:08:31 UTC+3, Dmitry Yemanov wrote:

Dimitry Sibiryakov

Jul 27, 2024, 4:47 AM
to firebir...@googlegroups.com
Dmitry Yemanov wrote 27.07.2024 8:08:
> Let me try to summarize the feedback. Feel free to correct me if I got something
> wrong.
>
> Internally (inside the engine) it makes sense to add a dedicated dtype_json type
> and make all JSON functions receive/return it instead of using DSC_json with
> blobs. dtype_json would still be backed by the blob mechanics, as now.

I see no sense in dtype_json. Its support would be required from every
existing application, including plugins that handle data by dtype.
Support for an isc_blob_json subtype would be optional: if an application/plugin has
no support for it, it can still handle it as a blob.

--
WBR, SD.

Adriano dos Santos Fernandes

Jul 27, 2024, 6:45 AM
to firebir...@googlegroups.com
On Sat, Jul 27, 2024 at 03:08, Dmitry Yemanov <fire...@yandex.ru> wrote:
All,

Let me try to summarize the feedback. Feel free to correct me if I got
something wrong.

Internally (inside the engine) it makes sense to add a dedicated
dtype_json type and make all JSON functions receive/return it instead
of using DSC_json with blobs. dtype_json would still be backed by the
blob mechanics, as now.

I don't think this makes sense internally.

Internally it makes sense that a JSON is backed by a map, so it's not going to be re-parsed at each operation.


Adriano

Dimitry Sibiryakov

Jul 27, 2024, 6:53 AM
to firebir...@googlegroups.com
Adriano dos Santos Fernandes wrote 27.07.2024 12:44:
> Internally it makes sense that a JSON is backed by a map, so it's not going to
> be re-parsed at each operation.

JSON is a tree.
A map is enough to represent an "object value" (in terms of the RFC), but it will have
a problem with an "array value":

> A JSON value MUST be an object, array, number, or string, or one of
> the following three literal names

If JSON needs any in-memory internal representation, it should be something
like Firebird's node tree.

--
WBR, SD.

Dmitry Yemanov

Jul 27, 2024, 7:08 AM
to firebir...@googlegroups.com
27.07.2024 11:47, 'Dimitry Sibiryakov' via firebird-devel wrote:
>
> I see no sense in dtype_json. Its support would be required from
> every existing application,

Nope, as it does not exist outside the engine (it is described as CLOB in the API).

> including plugins that handle data by dtype.

Is this really a problem? Do they support dtype_dbkey, for example?

>   Support for an isc_blob_json subtype would be optional: if an
> application/plugin has no support for it, it can still handle it as a blob.

I'm not sure I like having two text blob sub-types that differ only in
the formatting. But I'm open to discussing it further if needed.


Dmitry

Dmitry Yemanov

Jul 27, 2024, 7:11 AM
to firebir...@googlegroups.com
27.07.2024 13:44, Adriano dos Santos Fernandes wrote:
>
> I don't think this makes sense internally.
>
> Internally it makes sense that a JSON is backed by a map, so it's not
> going to be re-parsed at each operation.

I'd say this is more about optimization and can be implemented later.
That map (or tree) would still have to be serialized when stored, so
something blob-like would exist under the hood anyway.


Dmitry

Dmitry Yemanov

Jul 27, 2024, 7:15 AM
to firebir...@googlegroups.com
27.07.2024 10:58, Денис Симонов wrote:

> It seems to me that the answer to the last question depends on the
> properties of the JSON type as it is described in the standard. Are
> there any features of the SQL-2023 JSON type that would distinguish it
> from a simple text blob? If not, then we can easily make it an alias for
> another type, as is currently done with [VAR]BINARY.

By SQL:2023 it's text formatted according to RFC 8259. But it has
some embedded semantics and thus behaves differently (in ordering /
comparison / etc.) than just a string.


Dmitry

Mark Rotteveel

Jul 27, 2024, 7:18 AM
to firebir...@googlegroups.com
On 27/07/2024 09:58, Денис Симонов wrote:
> It seems to me that the answer to the last question depends on the
> properties of the JSON type as it is described in the standard. Are
> there any features of the SQL-2023 JSON type that would distinguish it
> from a simple text blob? If not, then we can easily make it an alias for
> another type, as is currently done with [VAR]BINARY.

That trick was to ensure SQL standard compliance while keeping backwards
compatibility with [VAR]CHAR CHARACTER SET OCTETS, which has been in
widespread use for 24+ years.

In this case we're talking about a new feature, and I don't see the
point of trying to introduce such workarounds when the standard
specifies that JSON is a separate type.

I personally think it should not be represented as a plain text blob,
but as a separate blob type (e.g. BLOB SUB_TYPE JSON), with the type
alias JSON, so that clients can actually identify the columns as a json
type.

Trying to use a text blob as the described type may be fine for mere
humans using the API, but for introspection of data types and making
decisions in libraries or clients, having an explicit type is a lot
better, even if the actual behaviour is identical.

For clients that wouldn't support this new sub-type, it can be worked
around using isc_dpb_set_bind/SET BIND/DataTypeCompatibility setting
(though I'm not sure if that would need more work for sub-type support).
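For example, something along these lines (purely hypothetical syntax, assuming the new sub-type gets a bindable name):

SET BIND OF BLOB SUB_TYPE JSON TO BLOB SUB_TYPE TEXT;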

Mark
--
Mark Rotteveel

Dimitry Sibiryakov

Jul 27, 2024, 7:18 AM
to firebir...@googlegroups.com
Dmitry Yemanov wrote 27.07.2024 13:08:
> Is this really a problem? Do they support dtype_dbkey, for example?

I can't say for everyone, but I did my best to handle (somehow) every existing
dtype, because I have no idea when and where it can appear.
I cannot foresee whether dtype_json could appear in RDB$FORMAT or
ITracePlugin::trace_dsql_execute; in short, everywhere I receive a DSC from Firebird.

> I'm not sure I like having two text blob sub-types that differ only in the formatting.

But you are fine with having several binary blob sub-types, right?
AFAIU, blob subtypes are not about formatting but about the interpretation of the
stream of bytes you get from them.

--
WBR, SD.

Mark Rotteveel

Jul 27, 2024, 7:26 AM
to firebir...@googlegroups.com
On 15/07/2024 20:23, Artyom Abakumov wrote:
>
> The current version of ISO/IEC 19075-6 is ISO/IEC 19075-6:2021, so why
> base it on an outdated version (ISO/IEC TR 19075-6:2017)?
>
>
> The implementation began before the second edition, and, honestly, I
> only found out about the newer version when I was preparing the
> proposal. It would be nice to take a look, but unfortunately, I have no
> access to it.

By its phrasing, ISO/IEC 19075-6:2021 is actually the *first* edition
(which suggests that ISO/IEC TR 19075-6:2017 was a pre-release or
something).

If you don't have access, then you should buy a copy of the standard if
you're implementing it, IMHO (with the Dutch ISO representative, NEN, a
single-user copy costs € 270, including tax). While paying for it may be
excessive for an individual, I don't think it is for a company.

> Also, as far as I'm aware part of the specification for JSON support is
> in ISO 9075-2(:2023), and not in 19075-6, so make sure it's covered.
>
>
> Yes, there are a lot of blind spots. In some cases, I reference the
> behavior of PostgreSQL JSON functions to address them.

Do you mean to say you also haven't got access to ISO/IEC 9075-2:2023?
That sounds like a knowledge gap to me.

As an aside, referring to PostgreSQL is potentially risky, as their JSON
implementation predates the SQL standard support, and while I believe
they have retrofitted standard support, it may have features or
behaviours not defined by the standard.

Mark
--
Mark Rotteveel

Dmitry Yemanov

Jul 27, 2024, 7:35 AM
to firebir...@googlegroups.com
27.07.2024 14:17, 'Mark Rotteveel' via firebird-devel wrote:
>
> I personally think it should not be represented as a plain text blob,
> but as a separate blob type (e.g. BLOB SUB_TYPE JSON), with the type
> alias JSON, so that clients can actually identify the columns as a json
> type.
>
> Trying to use a text blob as the described type may be fine for mere
> humans using the API, but for introspection of data types and making
> decisions in libraries or clients, having an explicit type is a lot
> better, even if the actual behaviour is identical.

Well, if you say it would be handy from the application/driver developer
perspective, I count it as a good argument.

And I believe it would be easier to adjust the existing patch to the new
blob sub-type than to the new internal dtype.

IIRC, Alex voted for the new dtype, but he's on vacation now. In the
meantime, does anyone see any issues with JSON represented as a
dedicated blob sub-type (and still left compatible with CLOB)?

> For clients that wouldn't support this new sub-type, it can be worked
> around using isc_dpb_set_bind/SET BIND/DataTypeCompatibility setting
> (though I'm not sure if that would need more work for sub-type support).

Acceptable, as for me.

If we later add a binary JSON format, is this going to be yet another
blob sub-type? Would we want it to be publicly available (in the API)
with its format being officially documented? Or should it be some
"internal" blob sub-type (like ACL/BLR/etc) which should be converted
to/from CLOB/JSON sub-type using a built-in blob filter?


Dmitry
