Dynamic SQL Error; SQL error code = -104; Token unknown -- ISC error code:335544634

209 views
Skip to first unread message

Jonathan Boarman

unread,
Apr 18, 2023, 10:47:19 PM4/18/23
to firebird-java
We're trying to pull data into Databricks from a Firebird data source, which we think is running version Firebird 2.5 (we don't have access to the server to verify).  The python caller is leveraging the Jaybird JDBC driver and the error is a java.sql.SQLSyntaxErrorException.

Does anyone have ideas for how we should approach investigating or resolving the issue below?

The error that comes back for any query attempted via JDBC is as follows, with the only variation being the column number, which does not seem to correspond to the length of the input query.

java.sql.SQLSyntaxErrorException: Dynamic SQL Error; SQL error code = -104; Token unknown - line 1, column 15; SELECT [SQLState:42000, ISC error code:335544634]

We have tried both the 4.0.9 and 5.0.1 versions of the Jaybird JDBC drivers.  The error seems to indicate the SQL is invalid, but the queries are the most basic of queries:

SELECT a.ID FROM table_name AS a

or something as simple as:

SELECT 1

We've tried coding various patterns, but here's a basic example of code the produces this error:

jdbc_url = 'jdbc:firebirdsql://xxx.xxx.xxx.xxx:xxxx/TEST.FDB'
jdbc_properties = {
  'user': 'user',
  'password': 'pass',
  'driver': 'org.firebirdsql.jdbc.FBDriver'
}

query = "SELECT 1;"

# Execute the query and create a DataFrame
result = spark.read.jdbc(url=jdbc_url, table=query, properties=jdbc_properties)
result.show()




What should we try to better understand what is causing this error?

Thanks,
Jonathan Boarman


Arioch The

unread,
Apr 19, 2023, 4:26:03 AM4/19/23
to firebi...@googlegroups.com
> query = "SELECT 1;"

this surely is invalid! you have to add " from RDB$DATABASE" or
something similar - check your queries at https://dbfiddle.uk/HeRBuc3I

> Token unknown - line 1, column 15; SELECT [

i don't remember if the line counter starts at one or zero, but column
counter certainly does not start with 15

you have some wrapper probably, which decorates the queries and does
so invalidly

run and traceapi client ( personally i prefer http://FBProfiler.sf.net
) and monitor what is the REAL query which the server receives, it
most probably is different from what you write/think

Mark Rotteveel

unread,
Apr 19, 2023, 7:59:06 AM4/19/23
to firebi...@googlegroups.com
On 19-04-2023 04:05, 'Jonathan Boarman' via firebird-java wrote:
> Does anyone have ideas for how we should approach investigating or
> resolving the issue below?
>
> The error that comes back for any query attempted via JDBC is as
> follows, with the only variation being the column number, which does not
> seem to correspond to the length of the input query.
>
> *java.sql.SQLSyntaxErrorException: Dynamic SQL Error; SQL error code =
> -104; Token unknown - line 1, column 15; SELECT [SQLState:42000, ISC
> error code:335544634]
> *
> We have tried both the */4.0.9/* and */5.0.1/* versions of the Jaybird
> JDBC drivers.  The error seems to indicate the SQL is invalid, but the
> queries are the most basic of queries:
>
> /SELECT a.ID FROM table_name AS a/
>
> or something as simple as:
>
> /SELECT 1
> /
> We've tried coding various patterns, but here's a basic example of code
> the produces this error:
>
> /jdbc_url = 'jdbc:firebirdsql://xxx.xxx.xxx.xxx:xxxx/TEST.FDB'
> jdbc_properties = {
>   'user': 'user',
>   'password': 'pass',
>   'driver': 'org.firebirdsql.jdbc.FBDriver'
> }
>
> query = "SELECT 1;"
>
> # Execute the query and create a DataFrame
> result = spark.read.jdbc(url=jdbc_url, table=query,
> properties=jdbc_properties)
> result.show()/
>
> What should we try to better understand what is causing this error?

The problem is that method expects a table name, not a query. What
happens is that your query is transformed to

SELECT * FROM SELECT 1
or
SELECT * FROM SELECT a.ID FROM table_name AS a

Neither of which is valid SQL. This also accounts for the position in
the error message already pointed out by Arioch'The, the second SELECT
starts at position 15.

You should put in the table name only, or try to parenthesize the query
(e.g. "(SELECT a.ID FROM table_name AS a)"), though I'm not sure if this
last thing works though I guess it will work. It is also possible there
is an alternative parameter that allows a query instead of a table name,
but I'm not really familiar with Spark.

Mark
--
Mark Rotteveel

Jonathan Boarman

unread,
Apr 19, 2023, 11:19:45 AM4/19/23
to firebi...@googlegroups.com
Thanks Mark!  I shared a broken query and the subquery method is the workaround generally used.

But, even with the subquery method and a few other patterns commonly used, we get the same error.

Is there a good way to intercept and log the exact query being provided to the Jaybird driver?
 
Jonathan Boarman


--
You received this message because you are subscribed to the Google Groups "firebird-java" group.
To unsubscribe from this group and stop receiving emails from it, send an email to firebird-jav...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/firebird-java/0138537f-1ea1-f1f9-cc0a-32711b1f6550%40lawinegevaar.nl.

Jonathan Boarman

unread,
Apr 19, 2023, 11:30:36 AM4/19/23
to firebird-java
Thanks Arioch for sharing that query example to try.

I'll have to find a way to intercept the REAL query that is available in this environment.  The options will be limited to what can be run using Python or Java without external binary dependencies other than maven or pip packages.

Mark Rotteveel

unread,
Apr 19, 2023, 12:49:43 PM4/19/23
to firebi...@googlegroups.com
On 19-04-2023 17:19, 'Jonathan Boarman' via firebird-java wrote:
> Thanks Mark!  I shared a broken query and the subquery method is the
> workaround generally used.
>
> But, even with the subquery method and a few other patterns commonly
> used, we get the same error.

What about just using the tablename?

> Is there a good way to intercept and log the exact query being provided
> to the Jaybird driver?

You can use the Firebird trace facility. You can use the barebones
fbtracemgr included with Firebird, or the Jaybird class FBTraceManager
(though that is not much better than using fbtracemgr), or the tool
Arioch'The suggested. There are also other tools that can use the trace
facility.

Mark
--
Mark Rotteveel

Jonathan Boarman

unread,
Apr 19, 2023, 1:51:52 PM4/19/23
to firebird-java
Yeah, even just a simple table name results in the same sort of error:

Dynamic SQL Error; SQL error code = -104; Token unknown - line 1, column 64; 21 [SQLState:42000, ISC error code:335544634]

I'm looking to see if I can get the JDBC logs to show the raw query, but I think the driver would have to post to the logs for that to happen.  The MySQL JDBC driver, for example, has a parameter to inform the driver to log details via Log4J.  I'm guessing that is not an option here?

We have another JDBC driver using SQLite, for example, and the exact same queries work there.  So there's some difference happening here, but we're still trying to find a way to intercept the real queries being posted in this environment since we don't have access to the server or anywhere to install tools, etc.

Jonathan Boarman

unread,
Apr 19, 2023, 3:11:46 PM4/19/23
to firebird-java
Logging to the console is now enabled by setting org.firebirdsql.jdbc.forceConsoleLogger  to true (per https://firebirdsql.github.io/jaybird-manual/jaybird_manual.html#ref-logging-console).

I am now seeing this warning in the stderr, though I can't say for sure whether this was showing up previously since these logs are ephemeral:

  [FBManagedConnection]WARNING: No connection character set specified (property lc_ctype, encoding, charSet or localEncoding), defaulting to character set NONE

Is there a secret option that would allow us to enable the same logging to stderr but at a trace/debug level that is not covered in the docs?

Arioch The

unread,
Apr 19, 2023, 3:25:39 PM4/19/23
to firebi...@googlegroups.com
> I'll have to find a way to intercept the REAL query that is available

Find??? I thought i told you below which exact program can do it...

>> you have some wrapper probably, which decorates the queries and does
>> so invalidly
>>
>> run any traceapi client ( personally i prefer http://FBProfiler.sf.net

Jonathan Boarman

unread,
Apr 19, 2023, 3:36:12 PM4/19/23
to firebird-java
I hope I'm not missing what you mean, but I do apologize if I am because I'm very grateful for everyone's help here.

That tracing tool looks awesome, but I don't think I have a way to run FBProfiler, or any other external tools, on the calling side within Databricks except those that are native to Java or Python since this is not a full VM.  It's limited to console output, JUL or Log4J.  We don't have access to the server where Firebird is running, and we can't install tools within a Databricks environment.  So this is why we're looking for an integrated logging option.

Most of the info I'm finding online just points back to enabling logging within a given JDBC driver, but the implementation of logging varies significantly from one JDBC driver to the next.


Arioch The

unread,
Apr 20, 2023, 4:33:07 AM4/20/23
to firebi...@googlegroups.com
The only thing FB Profiler or IBExpert or any TraceAPI client is needing is the direct connection to the firebird server.

Can you negotiate some TCP tunnel (openssl/putty or VPN or anything) to the firebird server, specific port on the specific machine? 

Also, can you run the same program on any other, less restricted, server, like on your own developer machine? Install Java and firebird on your dev-box and try there? 

ср, 19 апр. 2023 г., 22:36 'Jonathan Boarman' via firebird-java <firebi...@googlegroups.com>:
--
You received this message because you are subscribed to the Google Groups "firebird-java" group.
To unsubscribe from this group and stop receiving emails from it, send an email to firebird-jav...@googlegroups.com.

Mark Rotteveel

unread,
Apr 20, 2023, 6:57:09 AM4/20/23
to firebi...@googlegroups.com
On 19-04-2023 21:11, 'Jonathan Boarman' via firebird-java wrote:
> Logging to the console is now enabled by setting
> /org.firebirdsql.jdbc.*forceConsoleLogger***/ to */true/* (per
> https://firebirdsql.github.io/jaybird-manual/jaybird_manual.html#ref-logging-console).
>
> I am now seeing this warning in the /*stderr,*/ though I can't say for
> sure whether this was showing up previously since these logs are ephemeral:
>
> /  [FBManagedConnection]WARNING: No connection character set specified
> (property lc_ctype, encoding, charSet or localEncoding), defaulting to
> character set NONE

That just means that you didn't specify an explicit connection character
set and is unlikely to be relevant to your problem.

> Is there a secret option that would allow us to enable the same logging
> to /stderr/ but at a *trace/debug level *that is not covered in the docs?

No, because the driver doesn't log the statements anywhere (and if it
did, it wouldn't log it with the console logger, because it would like
be logged on debug or trace level, which the console logger doesn't
log). Your alternative might be to try a wrapping driver like P6Spy.

Mark
--
Mark Rotteveel

Mark Rotteveel

unread,
Apr 20, 2023, 6:58:06 AM4/20/23
to firebi...@googlegroups.com
On 19-04-2023 19:51, 'Jonathan Boarman' via firebird-java wrote:
> Yeah, even just a simple table name results in the same sort of error:
>
> Dynamic SQL Error; SQL error code = -104; Token unknown - line 1, column
> 64; 21 [SQLState:42000, ISC error code:335544634]

What is the method call you used to receive this error? Maybe we can
infer something from the actual call?

Mark
--
Mark Rotteveel

Jonathan Boarman

unread,
Apr 20, 2023, 3:33:13 PM4/20/23
to firebird-java
You guys have been super helpful, and it has paid off!  We traced the issue and have a fix, detailed below...

So, as suggested by the group there, we went ahead and set up a local instance of FB Server and used fbtracemgr to log what was going on.  And it turns out that Spark automatically limits the rows that come back and uses a paging mechanism to optimize and parallelize query execution, by appending {LIMIT <row> [OFFSET <offset>]} to all queries.

Since Spark works with other dialects like SQL Server, I looked into their JDBC driver to see how it handled "LIMIT" queries.  It turns out that it translates those into "TOP" statements within the JDBC driver as apparently Spark is not the only JDBC consumer that sends LIMIT queries.

For reference, here's their JDBC implementation with the regex expressions etc that they use for translating LIMIT to TOP: 

---------------------------------
query = """
(select rdb$get_context('SYSTEM', 'ENGINE_VERSION') as version
     , rdb$character_set_name
from rdb$database) AS a;    --       <=== ADD COMMENT TO BLOCK APPENDED "LIMIT" STATEMENT
"""

# Execute the query and create a DataFrame
result = spark.read.jdbc(url=jdbc_url, table=query, properties=jdbc_properties)
result.show()
---------------------------------
+-------+----------------------+ |VERSION|RDB$CHARACTER_SET_NAME| +-------+----------------------+ | 2.5.6| NONE ...| +-------+----------------------+

Thanks for all the help sorting this out!   :)

Mark Rotteveel

unread,
Apr 21, 2023, 5:06:02 AM4/21/23
to firebi...@googlegroups.com
On 20-04-2023 21:33, 'Jonathan Boarman' via firebird-java wrote:
> So, as suggested by the group there, we went ahead and set up a local
> instance of FB Server and used *fbtracemgr* to log what was going on.
> And it turns out that Spark automatically limits the rows that come back
> and uses a paging mechanism to optimize and parallelize query execution,
> by appending {LIMIT <row> [OFFSET <offset>]} to all queries.
>
> Since Spark works with other dialects like SQL Server, I looked into
> their JDBC driver to see how it handled "*LIMIT*" queries.  It turns out
> that it translates those into "*TOP*" statements within the JDBC driver
> as apparently Spark is not the only JDBC consumer that sends LIMIT queries.

It is a JDBC escape, defined by the JDBC specification, and Jaybird has
support for it by translating it to the ROWS clause[1]. If it doesn't
work for you, then either it wasn't in the syntactically right place for
Firebird (Jaybird does a pretty 'dumb' in-place replacement right now),
though then I would expect an error about ROWS, and not about the
number, so maybe there is some kind of bug in how the parser replaces it.

> For reference, here's their JDBC implementation with the regex
> expressions etc that they use for translating LIMIT to TOP:
> https://github.com/microsoft/mssql-jdbc/blob/be252dc53689538798b79eac00cf66be5bdfdeab/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerStatement.java#L2659

Jaybird has support for this:
-
https://github.com/FirebirdSQL/jaybird/blob/master/src/main/org/firebirdsql/jdbc/escape/FBEscapedParser.java#L259
-
https://github.com/FirebirdSQL/jaybird/blob/master/src/main/org/firebirdsql/jdbc/escape/FBEscapedParser.java#L366

So, if it doesn't work, what was the actual query sent to Firebird as
you saw in the trace? Maybe the replacement Jaybird does has a bug in it.

[..]

> Thanks for all the help sorting this out!   :)

You're welcome.

Mark

[1]:
https://firebirdsql.org/file/documentation/html/en/refdocs/fblangref25/firebird-25-language-reference.html#fblangref25-dml-select-rows

--
Mark Rotteveel

Jonathan Boarman

unread,
Apr 21, 2023, 9:53:45 AM4/21/23
to firebi...@googlegroups.com
Mark,

Below is the query exposed from the trace log.  If you need me to run any specific additional queries, I’m happy to share more result samples as needed.


2023-04-20T16:42:10.3320 (13:0x7f4cc24a69c0) START_TRANSACTION
        employee.fdb (ATT_60, SYSDBA:NONE, NONE, TCPv4:172.17.0.1/45224)
                (TRA_191, READ_COMMITTED | READ_CONSISTENCY | WAIT | READ_WRITE)

2023-04-20T16:42:10.4400 (13:0x7f4cc24a69c0) FAILED PREPARE_STATEMENT
        employee.fdb (ATT_60, SYSDBA:NONE, NONE, TCPv4:172.17.0.1/45224)
                (TRA_191, READ_COMMITTED | READ_CONSISTENCY | WAIT | READ_WRITE)

-------------------------------------------------------------------------------
SELECT  "EMP_NO","FIRST_NAME","LAST_NAME","PHONE_EXT","HIRE_DATE","DEPT_NO","JOB_CODE","JOB_GRADE","JOB_COUNTRY","SALARY","FULL_NAME" FROM employee     LIMIT 10001
      0 ms

2023-04-20T16:42:10.4400 (13:0x7f4cc24a69c0) ERROR AT JStatement::prepare
        employee.fdb (ATT_60, SYSDBA:NONE, NONE, TCPv4:172.17.0.1/45224)
335544569 : Dynamic SQL Error
335544436 : SQL error code = -104
335544634 : Token unknown - line 1, column 159
335544382 : 10001




On Apr 21, 2023, at 4:06 AM, Mark Rotteveel <ma...@lawinegevaar.nl> wrote:

Mark Rotteveel

unread,
Apr 21, 2023, 1:14:01 PM4/21/23
to firebi...@googlegroups.com
On 21-04-2023 15:53, 'Jonathan Boarman' via firebird-java wrote:
> Below is the query exposed from the trace log.  If you need me to run
> any specific additional queries, I’m happy to share more result samples
> as needed.
>
>
> 2023-04-20T16:42:10.3320 (13:0x7f4cc24a69c0) START_TRANSACTION
>         employee.fdb (ATT_60, SYSDBA:NONE, NONE, TCPv4:172.17.0.1/45224)
>                 (TRA_191, READ_COMMITTED | READ_CONSISTENCY | WAIT |
> READ_WRITE)
>
> 2023-04-20T16:42:10.4400 (13:0x7f4cc24a69c0) FAILED PREPARE_STATEMENT
>         employee.fdb (ATT_60, SYSDBA:NONE, NONE, TCPv4:172.17.0.1/45224)
>                 (TRA_191, READ_COMMITTED | READ_CONSISTENCY | WAIT |
> READ_WRITE)
>
> -------------------------------------------------------------------------------
> SELECT
>  "EMP_NO","FIRST_NAME","LAST_NAME","PHONE_EXT","HIRE_DATE","DEPT_NO","JOB_CODE","JOB_GRADE","JOB_COUNTRY","SALARY","FULL_NAME" FROM employee     LIMIT 10001
>       0 ms
>
> 2023-04-20T16:42:10.4400 (13:0x7f4cc24a69c0) ERROR AT JStatement::prepare
>         employee.fdb (ATT_60, SYSDBA:NONE, NONE, TCPv4:172.17.0.1/45224)
> 335544569 : Dynamic SQL Error
> 335544436 : SQL error code = -104
> 335544634 : Token unknown - line 1, column 159
> 335544382 : 10001

Thanks. That suggests it didn't use the JDBC limit escape (i.e. {LIMIT
10001}, which would have been translated to ROWS 10001, but instead put
in a bare LIMIT 10001 (no `{` or `}` around it). Or there is a bug in
how Jaybird parses this, but right now I can't see how (and it is
covered by a test).

I'll see if I can reproduce this myself with Spark, maybe it does some
kind of metadata check or something.

Mark
--
Mark Rotteveel

Arioch The

unread,
Apr 21, 2023, 5:10:25 PM4/21/23
to firebi...@googlegroups.com
Glad you could make some progress. 



  [FBManagedConnection]WARNING: No connection character set specified (property lc_ctype, encoding, charSet or localEncoding), defaulting to character set NONE

While indeed unrelated, I think it would be wise to fix it. 

Frankly, I consider this bad enough and easy to fix enough that I think Jaybird better absolutely prohibit it. I even managed to convince Mark once, but he later reverted it. 

(( That said, maybe Jaybird could default to UTF-8 or something, anything is better than NONE...)) 


Jonathan Boarman

unread,
Apr 21, 2023, 5:16:46 PM4/21/23
to firebi...@googlegroups.com
I'd vote +1 on UTF-8 by default.



--
You received this message because you are subscribed to the Google Groups "firebird-java" group.
To unsubscribe from this group and stop receiving emails from it, send an email to firebird-jav...@googlegroups.com.

Arioch The

unread,
Apr 21, 2023, 5:30:46 PM4/21/23
to firebi...@googlegroups.com
This would "kill" legacy app, which being uninformed used NONE and stored text implicitly (undocumented) encoded in UTF-16 instead.

Mark seems to be hesitant to bring a temporary axe to them...

...but not doing so ensures the vicious circle rotating infinitely, sadly. 

No one reads warnings unless facing some show stopper. 

сб, 22 апр. 2023 г., 00:16 'Jonathan Boarman' via firebird-java <firebi...@googlegroups.com>:

Jonathan Boarman

unread,
Apr 21, 2023, 6:14:26 PM4/21/23
to firebi...@googlegroups.com
Perhaps the UTF-8 could be one of those breaking changes in v6.

Mark Rotteveel

unread,
Apr 22, 2023, 5:31:50 AM4/22/23
to firebi...@googlegroups.com
On 22-04-2023 00:13, 'Jonathan Boarman' via firebird-java wrote:
> Perhaps the UTF-8 could be one of those breaking changes in v6.

Given using UTF8 reduces the maximum size of CHAR and VARCHAR by 3/4,
and the fact changing the default is a great way to break or logically
corrupt existing database using columns with character set NONE, that is
not going to happen. And as long as you use explicit character sets for
your columns, NONE generally works fine, and will allow you to mix
character sets. With connection character set NONE, Jaybird will only
use the JVM default character set when the column character set is also
NONE.

In Jaybird 3, not specifying an explicit character set was not allowed
in 3.0.0 and 3.0.1. My original idea was to then switch to UTF8 as the
default in Jaybird 4 or 5 (so people relying on the default of NONE had
time to switch to explicitly specifying it as the connection character).

However, given my experience (e.g. the questions/complaints) when using
no explicit character set was not allowed with version 3.0.0 and 3.0.1,
a lot of users don't read release notes (nor do they read error
messages; or I'm bad at writing them). Hence the change in Jaybird 3.0.2
to only register a SQLWarning on the connection and log a warning. I
might be a fan of breaking changes, but introducing such an extreme
breaking change like switching to UTF8 (or any other character set) as a
default connection character set is simply not going to happen,
especially given the risks of logical corruption of databases of people
who upgrade to major versions without reading release notes.

I even removed the warning in Jaybird 5, because indeed people either
ignore warnings, or they even complain about warnings instead of fixing
the cause of the warning.

TL;DR: NONE will remain the default.

Mark
--
Mark Rotteveel

Arioch The

unread,
Apr 22, 2023, 6:54:06 AM4/22/23
to firebi...@googlegroups.com


> And as long as you use explicit character sets for
your columns, NONE generally works fine, and will allow you to mix
character sets.

Next to never a case. 

It would take a very charset-aware developer (1) and a very specific task needing different charset for different columns (2). 

And (1) PPL would not have troubles explicitly specifying one. 

In Jaybird 3, not specifying an explicit character set was not allowed
in 3.0.0 and 3.0.1.

And that was breaking the vicious circle, while allowing an easy opt-in to old behavior. 

My original idea was to then switch to UTF8 as the
default in Jaybird 4 or 5 (so people relying on the default of NONE had
time to switch to explicitly specifying it as the connection character).

Sorry but how adding one single parameter (namely, charset=NONE) to connection string would require "having time"??? 

 (nor do they read error
messages;

Then they won't read warning as well. 

I even removed the warning in Jaybird 5, because indeed people either
ignore warnings, or they even complain about warnings instead of fixing
the cause of the warning.

TL;DR: NONE will remain the default.

Which means you promote spreading this disease. 
Databases contract this from bad applications, then new applications having to work with those DBs have to follow this viral suite, and the circle goes on. 

Or, from another angle, for the ease of  ignorant and even arrogant developers you left the danger in they way of developers would could and would learn.

This also makes the amount of bad code and bad advices in internet snowballing. And then new devs just Google for connection string and copy-paste it from random forums without giving a thought, and this is another vicious loop. 

Frankly, I think "for greater justice" it would better be the opposite. Make the thoughtful ones learn about the trap before it hits them, and "haters gonna hate" anyway. 

And would house haters whine publicly all those years - the Google to this year would probably be full of "Jaybird writes gibberish about charset, what should I do?" articles already. 

But then, I already told you all those reasoning before 3.0.0 and it was not well enough... 

Mark Rotteveel

unread,
Apr 22, 2023, 6:59:29 AM4/22/23
to firebi...@googlegroups.com
On 22-04-2023 12:53, Arioch The wrote:
> My original idea was to then switch to UTF8 as the
> default in Jaybird 4 or 5 (so people relying on the default of NONE had
> time to switch to explicitly specifying it as the connection character).
>
>
> Sorry but how adding one single parameter (namely, charset=NONE) to
> connection string would require "having time"???

Having time to switch over so they wouldn't be bitten by the next change
to change the default.

>  (nor do they read error
> messages;
>
>
> Then they won't read warning as well.

Which is why I removed it in Jaybird 5.

[..]
> But then, I already told you all those reasoning before 3.0.0 and it was
> not well enough...

I'm not going to rehash this again. My decisions stands.

Mark
--
Mark Rotteveel

Mark Rotteveel

unread,
Apr 22, 2023, 8:03:41 AM4/22/23
to firebi...@googlegroups.com
On 21-04-2023 19:13, Mark Rotteveel wrote:
> I'll see if I can reproduce this myself with Spark, maybe it does some
> kind of metadata check or something.

Using pyspark-3.4.0, the following program works fine for me:

```
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf() # create the configuration
conf.set("spark.driver.extraClassPath",
"D:\\Development\\libs\\jaybird-5.0.1.java11\\jaybird-5.0.1.java11.jar")

spark = SparkSession.builder \
.config(conf=conf) \
.master("local") \
.appName("Python Spark SQL basic example") \
.getOrCreate()

jdbc_url = 'jdbc:firebird://localhost/employee'
jdbc_properties = {
'user': 'sysdba',
'password': 'masterkey',
'driver': 'org.firebirdsql.jdbc.FBDriver'
}

result = spark.read.jdbc(url=jdbc_url, table="employee",
properties=jdbc_properties)
result.show()
```

And the trace doesn't include a LIMIT:

```
2023-04-22T13:30:20.4630 (2560:0000000007B104C0) PREPARE_STATEMENT
employee (ATT_193, SYSDBA:NONE, NONE, TCPv4:127.0.0.1/50943)
(TRA_338, READ_COMMITTED | READ_CONSISTENCY | WAIT | READ_WRITE)

Statement 30:
-------------------------------------------------------------------------------
SELECT
"EMP_NO","FIRST_NAME","LAST_NAME","PHONE_EXT","HIRE_DATE","DEPT_NO","JOB_CODE","JOB_GRADE","JOB_COUNTRY","SALARY","FULL_NAME"
FROM employee
1 ms
```

The documentation of Spark suggest there is a config option called
`pushDownLimit` to enable or disable this (default is false), but
enabling that didn't change anything. I also tried adding `.limit(2)` on
`result`, but that didn't add a limit either.

So, I'm a bit at a loss how to reproduce this...

Mark
--
Mark Rotteveel

Mark Rotteveel

unread,
Apr 22, 2023, 8:53:16 AM4/22/23
to firebi...@googlegroups.com
On 22-04-2023 14:03, Mark Rotteveel wrote:
> On 21-04-2023 19:13, Mark Rotteveel wrote:
>> I'll see if I can reproduce this myself with Spark, maybe it does some
>> kind of metadata check or something.
> And the trace doesn't include a LIMIT:
>
> ```
> 2023-04-22T13:30:20.4630 (2560:0000000007B104C0) PREPARE_STATEMENT
>     employee (ATT_193, SYSDBA:NONE, NONE, TCPv4:127.0.0.1/50943)
>         (TRA_338, READ_COMMITTED | READ_CONSISTENCY | WAIT | READ_WRITE)
>
> Statement 30:
> -------------------------------------------------------------------------------
> SELECT
> "EMP_NO","FIRST_NAME","LAST_NAME","PHONE_EXT","HIRE_DATE","DEPT_NO","JOB_CODE","JOB_GRADE","JOB_COUNTRY","SALARY","FULL_NAME" FROM employee
>       1 ms
> ```
>
> The documentation of Spark suggest there is a config option called
> `pushDownLimit` to enable or disable this (default is false), but
> enabling that didn't change anything. I also tried adding `.limit(2)` on
> `result`, but that didn't add a limit either.

I have looked at the Spark code, and the current code in master doesn't
actually use JDBC escapes (i.e. {LIMIT <n> [OFFSET <m>]}, it simply uses
LIMIT <n> [OFFSET <m>] (so not surrounded by braces). The reason it
works for SQL Server is that Spark has a dialect for SQL Server. In
other words, Jaybird doesn't get the chance to apply any transformation
for this.

However, it seems they recently changed part of this behaviour, by
disabling it for unknown JDBC drivers, which explains why I can't
reproduce it.

So, using pySpark 3.4.0 should solve it for you. It is possible that
there is a DataframeReader option to disable it (pushDownLimit), but
this was only introduced in 3.3.0. An alternative solution would be to
create a custom dialect for Firebird which either returns false for
supportsLimit or returns the right code fragment (e.g. ROWS <n> or ROWS
<m> TO <m> + <n>, or the SQL Standard OFFSET/FETCH clause introduced in
Firebird 3.0).

Mark
--
Mark Rotteveel

Reply all
Reply to author
Forward
0 new messages