Error when using Table as SelectField together with "UNION ALL" and "ORDER BY" – am I building a non-correct SQL?

44 views
Skip to first unread message

Yafl Wabei

unread,
Sep 28, 2023, 7:40:34 AM9/28/23
to jOOQ User Group
Hi,

during experiments with Tables as SelectField (see https://www.jooq.org/doc/latest/manual/sql-building/sql-statements/select-statement/select-clause/select-clause-table/ for the manual reference and https://groups.google.com/g/jooq-user/c/XMD2Rc0kEqQ for my experiments) I ran into a problem when using it together with the unionAll and orderBy clauses. I basically tried something like this:

ctx.select(AUTHOR, BOOK.TITLE)
  .from(AUTHOR)
  .join(BOOK).on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
  .where(...)
  .unionAll(
    ctx.select(AUTHOR, BOOK.TITLE)
      .from(AUTHOR)
      .join(BOOK).on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
      .join(PUBLISHER).on(PUBLISHER.ID.eq(BOOK.PUBLISHER_ID))
      .where(...)
  )
  .orderBy(AUTHOR.FIRST_NAME);

This doesn't work. In Postgres, it produces an error like this:

  org.postgresql.util.PSQLException: ERROR: column "first_name" does not exist

I came up with a guess why it doesn't work, but I'd like to know if that's correct and if someone has a more detailed explanation why it's not working. So here's my analysis and the guess based on that:

The SQL command created by the above jooq query looked sth like this:

select
  row (author.id, author.first_name, author.last_name, ...) as author,
  book.title
from author join book on author.id = book.author_id
where ...
union all
  select
    row (author.id, author.first_name, author.last_name, ...) as author,
    book.title
  from author join book on author.id = book.author_id join publisher on publisher.id = book.publisher_id
  where ...
order by first_name

If I remove the unionAll, but keep the orderBy clause, like so:

ctx.select(AUTHOR, BOOK.TITLE)
  .from(AUTHOR)
  .join(BOOK).on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
  .where(...)
  .orderBy(AUTHOR.FIRST_NAME);

then it works! In this case the SQL looks like this:

select
  row (author.id, author.first_name, author.last_name, ...) as author,
  book.title
from author join book on author.id = book.author_id
where ...
order by author.first_name

In both cases (with or without unionAll), the author row gets the alias "author" (same as the table name) by jooq, and here the "order by" also references "author". When I use an explicit alias myself (like Table<?> a = AUTHOR.as("a")), the results are similar, so that doesn't solve the problem. I guess the reason is that with the unionAll Postgres produces something like two temporary tables (in the above case they both have the alias "author", but in general they might have different aliases), but then to unite these tables, it needs to put them together in a new temporary table, where the aliases of the single tables will be lost (because Postgres couldn't use different aliases for different rows in the same result table if the two tables had different aliases). I guess that's also why there's no alias in the "order by", because with unionAll there's nothing to reference, so the only possibility is probably to just put the column name and hope that it's a column in the result table. In this case it isn't, though.

When using a select like

ctx.select(AUTHOR.fields())
  .select(BOOK.TITLE)

instead, it works, because then the "first_name" column is just there as a plain column in the result table with the correct name.


So I guess to avoid this, I shouldn't use "Table as SelectField", in case I'm also using unionAll and orderBy (or just not use it at all, if it's not fitting the use case). Instead I could just put the fields I want to query in the select.

Are there other (maybe easier) variants how to make this query work? Or ways to build it differently that make more sense?


Meta: Why am I posting this?

As I came up with an explanation, what's the point in posting it here? I have a few reasons:

  - My explanation is just a guess. If someone has a better or more in-depth explanation of what's happening or why it doesn't work, I'll be happy to hear about that!
  - If my guesses are wrong, I'll be happy to receive corrections.
  - I mentioned this in https://groups.google.com/g/jooq-user/c/XMD2Rc0kEqQ and Lukas said I should file a bug report on GitHub. But according to my guesses, it's not a jooq bug – I'm just building a non correct SQL statement (with jooq). If my guess is correct, jooq cannot fix this, so I'm not gonna file a bug report. But I'd still like to receive feedback on my guess.
  - Other people with the same problem can find it and save time.



All comments appreciated!
Thanks!
Yafl

Lukas Eder

unread,
Sep 28, 2023, 8:03:32 AM9/28/23
to jooq...@googlegroups.com
Hi Yafl,

The UNION ALL operator looks as though it belongs to the SELECT statement, and jOOQ certainly makes it look so as well. But it doesn't really. It creates a new kind of statement that wraps 2 subqueries. Think of it this way:

  UnionAll(Select1, Select2)

Now that query continues to project things, and by convention (there are exceptions, e.g. this bug in Oracle: https://stackoverflow.com/q/25387951/521799), the names of the projected columns corresponds to the (unqualified!) column names from the first subquery. Now, some RDBMS allow for referencing those columns by name in the UNION's ORDER BY clause, others don't.

jOOQ has a few open issues in this area. Some things may be expected to work (e.g. ORDER BY <column index>). In your case, I don't think jOOQ should do any magic such as unnesting the nested column again, just because it happens to be there. You could have any other column called "first_name", which is something entirely different (e.g. any expression) and then why would jOOQ prefer to unnest the nested column.

In SQL, the column that you've nested ceases to exist outside of the UNION operator because you've nested it. This is different without a UNION (and without DISTINCT), because the column names that are not projected with SELECT may still be accessed by ORDER BY in most "reasonable" dialects. Though again, there might be exceptions, e.g.:

I mentioned DISTINCT, because that creates a similar situation like UNION. I've described in in this blog post:

I hope this helps
Lukas

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jooq-user/9961bae5-6a9f-419a-b1da-1716f336bdc5n%40googlegroups.com.

Yafl Wabei

unread,
Sep 28, 2023, 10:19:19 AM9/28/23
to jOOQ User Group
On Thursday, 28 September 2023 at 14:03:32 UTC+2 Lukas Eder wrote:
Hi Yafl,

The UNION ALL operator looks as though it belongs to the SELECT statement, and jOOQ certainly makes it look so as well. But it doesn't really. It creates a new kind of statement that wraps 2 subqueries. Think of it this way:

  UnionAll(Select1, Select2)

Ah, that makes sense! If I think about it that way, I also see more clearly why the aliases of the selects cannot be referenced anymore – because they are defined in an inner scope, whereas ORDER BY sits in the outer scope. I don't know if that's what actually happens (see below too), because I learned now that SQL can be very involved when it comes to (logical) execution order and what's accessible when, but it does help to think about it this way. :)
 
Now that query continues to project things, and by convention (there are exceptions, e.g. this bug in Oracle: https://stackoverflow.com/q/25387951/521799), the names of the projected columns corresponds to the (unqualified!) column names from the first subquery. Now, some RDBMS allow for referencing those columns by name in the UNION's ORDER BY clause, others don't.

jOOQ has a few open issues in this area. Some things may be expected to work (e.g. ORDER BY <column index>). In your case, I don't think jOOQ should do any magic such as unnesting the nested column again, just because it happens to be there. You could have any other column called "first_name", which is something entirely different (e.g. any expression) and then why would jOOQ prefer to unnest the nested column.

I agree, I didn't expect jOOQ to implement magic here. I merely posted it to gain a better understanding of what's happening.
 
In SQL, the column that you've nested ceases to exist outside of the UNION operator because you've nested it. This is different without a UNION (and without DISTINCT), because the column names that are not projected with SELECT may still be accessed by ORDER BY in most "reasonable" dialects. Though again, there might be exceptions, e.g.:

I mentioned DISTINCT, because that creates a similar situation like UNION. I've described in in this blog post:

Ah, so is this the reason why the version without UNION works? Because it is accessing the table name and not the alias? Here my thinking was wrong.

Good to know! In fact, separating the names of different things, I can validate this – the following doesn't work:

select
  row (p.id, p.first_name, p.last_name, ...) as a,
  book.title
from author as p
join book on p.id = book.author_id
where ...
order by a.first_name

whereas this does:

select
  row (p.id, p.first_name, p.last_name, ...) as a,
  book.title
from author as p
join book on p.id = book.author_id
where ...
order by p.first_name

The first one gives a «ERROR: Missing FROM-clause entry for table "a"». Is that because table aliases need to be defined in the from clause always? Because in the logical execution order, ORDER BY comes after SELECT, so I could imagine that it could theoretically access an alias defined in the SELECT statement. On the other hand, the entries in the column "a" are just strings of the form "(value1,value2,...)", not actual tables, so there's no way to access a column in there. If it was a table, it might be possible. You showed the difference it very well in this blog post with the DBeaver screenshots: https://blog.jooq.org/projecting-type-safe-nested-tablerecords-with-jooq-3-17/

So would the query work with a nested table in the select? Or would it still not work, because ORDER BY cannot access aliases defined in SELECT?
I wanted to test this, but I cannot reproduce the nested result anymore, even with postgres in DBeaver (as described in the mentioned blog post). It always gives me rows, also with just plain simple "SELECT Table FROM Table". Don't know why. I'm pretty sure it worked at least once before.

Although I read multiple times on your blog about the logical execution order of the SQL clauses and the subtleties of SQL, the order in which SQL is written still confuses me and I somehow expect references from the select to be accessible elsewhere. I'm thinking about starting to rewrite my SQL statement in the logical execution order to spot errors more easily when an error occurs. But then still, if aliases from the SELECT clause are not accessible in the ORDER BY clause, the execution order cannot be the reason for that. Sigh.

Lukas Eder

unread,
Sep 29, 2023, 2:38:30 AM9/29/23
to jooq...@googlegroups.com
On Thu, Sep 28, 2023 at 4:19 PM Yafl Wabei <yafl....@gmail.com> wrote:
On Thursday, 28 September 2023 at 14:03:32 UTC+2 Lukas Eder wrote:
Hi Yafl,

The UNION ALL operator looks as though it belongs to the SELECT statement, and jOOQ certainly makes it look so as well. But it doesn't really. It creates a new kind of statement that wraps 2 subqueries. Think of it this way:

  UnionAll(Select1, Select2)

Ah, that makes sense! If I think about it that way, I also see more clearly why the aliases of the selects cannot be referenced anymore – because they are defined in an inner scope, whereas ORDER BY sits in the outer scope. I don't know if that's what actually happens (see below too), because I learned now that SQL can be very involved when it comes to (logical) execution order and what's accessible when, but it does help to think about it this way. :)
 
Now that query continues to project things, and by convention (there are exceptions, e.g. this bug in Oracle: https://stackoverflow.com/q/25387951/521799), the names of the projected columns corresponds to the (unqualified!) column names from the first subquery. Now, some RDBMS allow for referencing those columns by name in the UNION's ORDER BY clause, others don't.

jOOQ has a few open issues in this area. Some things may be expected to work (e.g. ORDER BY <column index>). In your case, I don't think jOOQ should do any magic such as unnesting the nested column again, just because it happens to be there. You could have any other column called "first_name", which is something entirely different (e.g. any expression) and then why would jOOQ prefer to unnest the nested column.

I agree, I didn't expect jOOQ to implement magic here. I merely posted it to gain a better understanding of what's happening.
 
In SQL, the column that you've nested ceases to exist outside of the UNION operator because you've nested it. This is different without a UNION (and without DISTINCT), because the column names that are not projected with SELECT may still be accessed by ORDER BY in most "reasonable" dialects. Though again, there might be exceptions, e.g.:

I mentioned DISTINCT, because that creates a similar situation like UNION. I've described in in this blog post:

Ah, so is this the reason why the version without UNION works? Because it is accessing the table name and not the alias? Here my thinking was wrong.

Good to know! In fact, separating the names of different things, I can validate this – the following doesn't work:

select
  row (p.id, p.first_name, p.last_name, ...) as a,
  book.title
from author as p
join book on p.id = book.author_id
where ...
order by a.first_name

Use (a).first_name to dereference nested row attributes in PostgreSQL
 

whereas this does:

select
  row (p.id, p.first_name, p.last_name, ...) as a,
  book.title
from author as p
join book on p.id = book.author_id
where ...
order by p.first_name

The first one gives a «ERROR: Missing FROM-clause entry for table "a"». Is that because table aliases need to be defined in the from clause always? Because in the logical execution order, ORDER BY comes after SELECT, so I could imagine that it could theoretically access an alias defined in the SELECT statement. On the other hand, the entries in the column "a" are just strings of the form "(value1,value2,...)", not actual tables, so there's no way to access a column in there. If it was a table, it might be possible. You showed the difference it very well in this blog post with the DBeaver screenshots: https://blog.jooq.org/projecting-type-safe-nested-tablerecords-with-jooq-3-17/

So would the query work with a nested table in the select? Or would it still not work, because ORDER BY cannot access aliases defined in SELECT?
I wanted to test this, but I cannot reproduce the nested result anymore, even with postgres in DBeaver (as described in the mentioned blog post). It always gives me rows, also with just plain simple "SELECT Table FROM Table". Don't know why. I'm pretty sure it worked at least once before.

Although I read multiple times on your blog about the logical execution order of the SQL clauses and the subtleties of SQL, the order in which SQL is written still confuses me and I somehow expect references from the select to be accessible elsewhere. I'm thinking about starting to rewrite my SQL statement in the logical execution order to spot errors more easily when an error occurs. But then still, if aliases from the SELECT clause are not accessible in the ORDER BY clause, the execution order cannot be the reason for that. Sigh.

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.

Yafl Wabei

unread,
Oct 11, 2023, 2:39:00 AM10/11/23
to jOOQ User Group
Being late with replying, but I still want to conclude this.

On Friday, 29 September 2023 at 08:38:30 UTC+2 Lukas Eder wrote:

In SQL, the column that you've nested ceases to exist outside of the UNION operator because you've nested it. This is different without a UNION (and without DISTINCT), because the column names that are not projected with SELECT may still be accessed by ORDER BY in most "reasonable" dialects. Though again, there might be exceptions, e.g.:

I mentioned DISTINCT, because that creates a similar situation like UNION. I've described in in this blog post:

Ah, so is this the reason why the version without UNION works? Because it is accessing the table name and not the alias? Here my thinking was wrong.

Good to know! In fact, separating the names of different things, I can validate this – the following doesn't work:

select
  row (p.id, p.first_name, p.last_name, ...) as a,
  book.title
from author as p
join book on p.id = book.author_id
where ...
order by a.first_name

Use (a).first_name to dereference nested row attributes in PostgreSQL

Thanks for the hint! Though, when I try this, I get an error

«ERROR: column "a" does not exist»

Did I miss something? Do I need a different syntax?

Lukas Eder

unread,
Oct 11, 2023, 2:58:32 AM10/11/23
to jooq...@googlegroups.com
Hah, indeed, that doesn't work in PostgreSQL, neither does any other expression on a (in ORDER BY). So, I'd also like to conclude this, with my pinned tweet:

(The following would work, but I forgot what the point of all this is):

select a
from (
  select row (x, y) as a
  from (values (1, 2)) t (x, y)
) as t
order by (a).f1;

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.

Yafl Wabei

unread,
Oct 11, 2023, 8:12:08 AM10/11/23
to jOOQ User Group
On Wednesday, 11 October 2023 at 08:58:32 UTC+2 Lukas Eder wrote:
Hah, indeed, that doesn't work in PostgreSQL, neither does any other expression on a (in ORDER BY). So, I'd also like to conclude this, with my pinned tweet:

(The following would work, but I forgot what the point of all this is):

select a
from (
  select row (x, y) as a
  from (values (1, 2)) t (x, y)
) as t
order by (a).f1;

Well ok, that does work, though it's a bit annoying that I cannot reference "x" or "y" in the ORDER BY clause, but have to use "f1".

Anyway, I think there's no real point to this anymore, because I'm not gonna use this specific SQL. We ended up here, because I wanted to understand which kind of references to selected rows work (or don't) in an ORDER BY clause. That's settled for me now. :)

Thanks for your help, again!
Reply all
Reply to author
Forward
0 new messages