Pretty printing SQL

403 views
Skip to first unread message

Lukas Eder

unread,
Mar 2, 2012, 9:04:13 AM3/2/12
to jooq...@googlegroups.com, Christopher Deckers
Hello Christopher,

I have now implemented the pretty printing functionality in jOOQ:
https://sourceforge.net/apps/trac/jooq/ticket/161

There is a flag that can be set to true in the jOOQ runtime settings. By default it is set to false, as today. When set to true, every QueryPart can decide for themselves, how they want to format their rendered SQL. I chose not to re-use the logic used in the jOOQ console, as it is non-trivial to correctly parse SQL text for formatting without a sophisticated lexer. It is also hard to extend such a parser manually without breaking it. For the console, this is less critical, as the SQL statement doesn't have to be executed.

Here are some sample statements, rendered with jOOQ's new pretty printing feature:


-- Nested selects in predicates
--------------------------------------
from "TEST"."T_BOOK"
where "TEST"."T_BOOK"."ID" = any (select "array_table"."COLUMN_VALUE"
                                  from (select 1 "COLUMN_VALUE"
                                        from dual) "array_table")
order by "TEST"."T_BOOK"."ID" asc

-- Nested selects in LIMIT .. OFFSET clauses
--------------------------------------
select * from (
  select limit_104581173.*, rownum as rownum_104581173
  from (
    select
      "TEST"."T_AUTHOR"."ID",
      "TEST"."T_AUTHOR"."FIRST_NAME",
      "TEST"."T_AUTHOR"."LAST_NAME",
      "TEST"."T_AUTHOR"."DATE_OF_BIRTH",
      "TEST"."T_AUTHOR"."YEAR_OF_BIRTH",
      "TEST"."T_AUTHOR"."ADDRESS"
    from "TEST"."T_AUTHOR"
    order by "TEST"."T_AUTHOR"."ID" asc
  ) limit_104581173
)
where rownum_104581173 > 0
and rownum_104581173 <= (0 + 1)

-- Nested JOIN clauses
--------------------------------------

select
  "TEST"."T_BOOK"."TITLE",
  "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME",
  sum("MULTI_SCHEMA"."T_BOOK_SALE"."SOLD_FOR")
from "TEST"."T_BOOK"
join ("TEST"."T_BOOK_TO_BOOK_STORE"
join "MULTI_SCHEMA"."T_BOOK_SALE" on ("MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME" = "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_STORE_NAME" and
                                      "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_ID" = "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_ID")) on "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_ID" = "TEST"."T_BOOK"."ID"
group by
  "TEST"."T_BOOK"."ID",
  "TEST"."T_BOOK"."TITLE",
  "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME"
order by
  "TEST"."T_BOOK"."ID" asc,
  sum("MULTI_SCHEMA"."T_BOOK_SALE"."SOLD_FOR") desc
--------------------------------------


This formatter could be re-used in jOOQ console, as the console has a reference to org.jooq.Query. For formatting, use

String sql = new Factory(
    connection,
    dialect,
    new Settings().withRenderFormatted(true)).render(query);

Cheers
Lukas

Christopher Deckers

unread,
Mar 4, 2012, 4:38:04 PM3/4/12
to Lukas Eder, jooq...@googlegroups.com
Hi Lukas,

Disclaimer: I am just a pain when voicing opinions. This is only an artefact to bring more heat to the debates in hope of finding the best (or better) solutions :)

I chose not to re-use the logic used in the jOOQ console,

I have nothing against that statement.
 

as it is non-trivial to correctly parse SQL text for formatting without a sophisticated lexer.

I totally disagree. You don't need a sophisticated lexer to format Java, XML, SQL and these kind of languages. I am not talking about PL-SQL or T-SQL but normal SQL. In your case, you would also have the same issue with free-style SQL anyway.


It is also hard to extend such a parser manually without breaking it.

I don't agree with that, because it is a simple manually coded state machine with a few simplistic look ahead.
It's logic is very simple:
- Keep track of the indentation level.
- When there is an opening parenthesis, start a new line and increase the indentation, with an exception: parentheses with a single word.
- When there is an SQL keyword, and the list is clearly defined and extensible, go to next line but no indentation increase.
- When there is a quote, do not apply the above rules until a closing quote is found.

Maybe I should add double quotes because you make extensive use of them, but that is it.
 

For the console, this is less critical, as the SQL statement doesn't have to be executed.

This is not true: we format the code to work on it, debug it, run it. If it were to break, it would not be much of interest.
I don't deny there could be bugs (though it works fine for us), but it is deterministic, the rules are very simple to understand and it is easy to add the SQL keywords generating a new line if we miss one.
 

Here are some sample statements, rendered with jOOQ's new pretty printing feature:

"pretty" is very subjective :)

 
-- Nested selects in predicates
--------------------------------------
from "TEST"."T_BOOK"
where "TEST"."T_BOOK"."ID" = any (select "array_table"."COLUMN_VALUE"
                                  from (select 1 "COLUMN_VALUE"
                                        from dual) "array_table")

As if I were to format Java code this way:

synchronized(myWonderfulLock) {if(someCondition) {
                                 dosomething();
                               }
}

I prefer to avoid horizontal scrolling, except for long lists of columns, values. What matters to me are the keyword of SQL, not to have a new line for arbitrary items and certain keywords on the same line. Also, the above logic could make several block that are at the same indentation level to not be aligned.

But as I said, this is very subjective. If you want to really support that feature, you need to please anyone. That probably means a formatting strategy that the user can implement with predefined strategies. Mine could be plugged, or yours could be plugged :)


-- Nested selects in LIMIT .. OFFSET clauses
--------------------------------------
select * from (
  select limit_104581173.*, rownum as rownum_104581173
  from (
    select
      "TEST"."T_AUTHOR"."ID",
      "TEST"."T_AUTHOR"."FIRST_NAME",
      "TEST"."T_AUTHOR"."LAST_NAME",
      "TEST"."T_AUTHOR"."DATE_OF_BIRTH",
      "TEST"."T_AUTHOR"."YEAR_OF_BIRTH",
      "TEST"."T_AUTHOR"."ADDRESS"
    from "TEST"."T_AUTHOR"
    order by "TEST"."T_AUTHOR"."ID" asc
  ) limit_104581173
)
where rownum_104581173 > 0
and rownum_104581173 <= (0 + 1)

This one is fine, except for the columns which I would prefer to see on the same line. The fact that they are on multiple lines is probably interesting here only because you prefix everything.
Side note: if pretty print is about easier reading, then using the schema and removing the prefixes would be the first thing to do. In such case, certain considerations could change (like all columns on same line).

 
-- Nested JOIN clauses
--------------------------------------

select
  "TEST"."T_BOOK"."TITLE",
  "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME",
  sum("MULTI_SCHEMA"."T_BOOK_SALE"."SOLD_FOR")
from "TEST"."T_BOOK"
join ("TEST"."T_BOOK_TO_BOOK_STORE"
join "MULTI_SCHEMA"."T_BOOK_SALE" on ("MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME" = "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_STORE_NAME" and
                                      "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_ID" = "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_ID")) on "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_ID" = "TEST"."T_BOOK"."ID"
group by
  "TEST"."T_BOOK"."ID",
  "TEST"."T_BOOK"."TITLE",
  "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME"
order by
  "TEST"."T_BOOK"."ID" asc,
  sum("MULTI_SCHEMA"."T_BOOK_SALE"."SOLD_FOR") desc
--------------------------------------


I find this one unreadable. Some new lines are keywords and some are columns, there are opening parentheses without indentation, the multiple and statements of the on clause are not where I would expect them, etc.
As a user, there is no logic that tells me how this is supposed to help me understand the query.

 
This formatter could be re-used in jOOQ console, as the console has a reference to org.jooq.Query. For formatting, use

String sql = new Factory(
    connection,
    dialect,
    new Settings().withRenderFormatted(true)).render(query);

I only process raw SQL, not query parts. How does your formatter work with plain SQL?

Cheers,
-Christopher

Lukas Eder

unread,
Mar 4, 2012, 5:13:28 PM3/4/12
to Christopher Deckers, jooq...@googlegroups.com
Hi Christopher,

> I totally disagree. You don't need a sophisticated lexer to format Java,
> XML, SQL and these kind of languages. I am not talking about PL-SQL or T-SQL
> but normal SQL.

I'm just saying it's non-trivial to distinguish UPDATE .. SET from
MERGE .. WHEN MATCHED THEN UPDATE SET and from INSERT .. ON DUPLICATE
KEY UPDATE (without SET, but with ON as in JOIN .. ON) etc. Unlike
other languages (e.g. Java), SQL keywords are only keywords in their
contexts. That makes it a bit more difficult to format. With jOOQ
supporting arbitrary vendor-specific extension, we're somewhere
between SQL and PL/SQL, T-SQL.

> In your case, you would also have the same issue with
> free-style SQL anyway.

It is safe to consider free-style SQL as "already formatted", at least
for logging.

> - When there is an opening parenthesis, start a new line and increase the
> indentation, with an exception: parentheses with a single word.

I've seen that ;-) Example:
COUNT (
*
)

> "pretty" is very subjective :)

Of course. Well... We both know that "pretty printing" an accepted
term for "formatting". Besides, I suspect that the formatting got
lost, partially, in E-Mail re-formatting. But I agree with some
points. A subquery's "select" should go to an indented new line

> That probably means a formatting
> strategy that the user can implement with predefined strategies.

Yes. I've thought about that. I'll have to look into Eclipse's
formatting rules eventually. But it's certainly not a priority.

> This one is fine, except for the columns which I would prefer to see on the
> same line. The fact that they are on multiple lines is probably interesting
> here only because you prefix everything.

Even without prefixing, imagine complex functions (e.g. window
functions) and columns in the same projection list... Or projections
with more than 20 fields...

> I only process raw SQL, not query parts. How does your formatter work with
> plain SQL?

It is kept as such

Cheers
Lukas

Reply all
Reply to author
Forward
0 new messages