Scott - I tried to post a SOLVED followup to the JDBC list but it was rejected :-!
I now have the opposite problem of getting rid of the cursor :-) ResultSet.close() does not work. I am trying to do a DROP TABLE from the other Connection, to whack the table I just finished the ETL on, but it just hangs indefintiely, and pg_locks shows the shared read lock still sitting there.
I am trying a Statement.close() and Connection.close() now, but I fear I may have to do something slightly ugly, as I have Apache DBCP sitting in between me and the actual PG JDBC driver.
I am hoping the slightly ugly thing is only closing the underlying connection, and does not have to be /etc/init.d/postgresql8.3 restart :-) Is there a backdoor way to forcibly get rid of a lock you don't need any more?
Cheers
DaveOn Mon, Apr 19, 2010 at 1:05 PM, Scott Carey <sc...@richrelevance.com> wrote:
On Apr 15, 2010, at 1:01 PM, Dave Crooke wrote:For scrolling large result sets you have to do the following to prevent it from loading the whole thing into memory:
> On Thu, Apr 15, 2010 at 2:42 PM, Dave Crooke <dcr...@gmail.com> wrote:
> Hey folks
>
> I am trying to do a full table scan on a large table from Java, using a straightforward "select * from foo". I've run into these problems:
>
> 1. By default, the PG JDBC driver attempts to suck the entire result set into RAM, resulting in java.lang.OutOfMemoryError ... this is not cool, in fact I consider it a serious bug (even MySQL gets this right ;-) I am only testing with a 9GB result set, but production needs to scale to 200GB or more, so throwing hardware at is is not feasible.
>
Use forward-only, read-only result scrolling and set the fetch size. Some of these may be the default depending on what the connection pool is doing, but if set otherwise it may cause the whole result set to load into memory. I regularly read several GB result sets with ~10K fetch size batches.
Something like:
Statement st = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY)
st.setFetchSize(FETCH_SIZE);
That's what I''m using, albeit without any args to createStatement, and it now works.
I have no idea what that is.
> 2. I tried using the official taming method, namely java.sql.Statement.setFetchSize(1000) and this makes it blow up entirely with an error I have no context for, as follows (the number C_10 varies, e.g. C_12 last time) ...
>
> org.postgresql.util.PSQLException: ERROR: portal "C_10" does not exist
> at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
> at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
> at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:1527)
> at org.postgresql.jdbc2.AbstractJdbc2ResultSet.next(AbstractJdbc2ResultSet.java:1843)
>
> This is definitely a bug :-)
>
>
It was because I was also writing to the same Connection ... when you call Connection.commit() with the PG JDBC driver, it also kills all your open cursors.
I think this is a side effect of the PG internal design where it does MVCC within a table (rows have multiple versions with min and max transaction ids) ... even a query in PG has a notional virtual transaction ID, whereas in e.g. Oracle, a query has a start time and visibility horizon, and as long as you have enough undo tablespace, it has an existence which is totally independent of any transactions going on around it even on the same JDBC connection.
--
Sent via pgsql-jdbc mailing list (pgsql...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc
On Mon, 19 Apr 2010, Dave Crooke wrote:
> Statement.close() appears to get the job done (in my envrionment, PG's
> driver never sees a Connection.close() because of DBCP).
>
> I'd consider the fact that ResultSet.close() does not release the implicit
> cursor to be something of a bug, but it may well have been fixed already.
PG doesn't release the locks acquired by the query until transaction end.
So closing a cursor will release some backend memory, but it won't release
the locks. The way the driver implements ResultSet.close() is to put
the close message into a queue so that the next time a message is sent to
the backend we'll also send the cursor close message. This avoids an
extra network roundtrip for the close action.
In any case Statement.close isn't helping you here either. It's really
Connection.commit/rollback that's releasing the locks.
Kris Jurka
I wouldn't hold MySQL up to be a particularly good implmentation of anything, other than speed (MyISAM) and usability (the CLI) .... I find Oracle's JDBC implmentation to be both user friendly and (largely) standards compliant.
> AFAICT from the Java end, ResultSet.close() is supposed to beFor that ResultSet. That doesn't mean a ResultSet defines a cursor.
> final.
Such methods as setCursorName, setFetchSize, and setFetchDirection
are associated with a Statement. Think of the ResultSet as the
result of a cursor *scan* generated by opening the cursor defined by
the Statement.
http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#close%28%29
Notice that the ResultSet is automatically closed if the Statement
that generated it is re-executed. That is very much consistent with
Statement as the equivalent of a cursor, and not very consistent
with a ResultSet as the equivalent of a cursor.
Again, you're talking about the *results* from *opening* the cursor.
Are you talking about treating the Statement object as representing
> At a pragmatic level, the PGSQL JDBC driver has a lot of odd
> behaviours which, while they may or may not be in strict
> compliance with the letter of the standard, are very different
> from any other mainstream database that I have accessed from Java
> .... what I'd consider as normative behaviour, using regular JDBC
> calls without the need to jump through all these weird hoops, is
> exhibited by all of the following: Oracle, SQL Server, DB2, MySQL,
> Apache Derby and JET (MS-Access file-based back end, the .mdb
> format)
a cursor and the ResultSet representing the results from opening
the cursor, or are you thinking of something else here?
Huh. I dropped PostgreSQL into an environment with hundreds of
databases, and the applications pretty much "just worked" for us.
Of course, we were careful to write to the SQL standard and the JDBC
API, not to some other product's implementation of them.
(1) Needing to setFetchSize to avoid materializing the entire
result set in RAM on the client.
(2) Fixing a race condition in our software which was benign in
other products, but clearly my own bug.
Just out of curiosity, did you discuss that on the PostgreSQL lists?
Can you reference the thread(s)?
Sure, but I would consider it a step away from that to follow
MySQL's interpretation of cursors rather than the standard's.
YMMV, of course.
On Tue, 20 Apr 2010, Dave Crooke wrote:
> a. Make setFetchSize(10000) the default
The reason this is not done is that the mechanism used for fetching a
piece of the results at a time can change the query plan used if using a
PreparedStatement. There are three ways to plan a PreparedStatement:
a) Using the exact parameter values by substituting them directly into the
query. This isn't really "planned" as you can't re-use it at all. This
is only available using the V2 protocol.
b) Using the parameter values for statistics, but not making any stronger
guarantees about them. So the parameters will be used for evaluating the
selectivity, but not to perform other optimizations like
contraint_exclusion or transforming a LIKE operation to a range query.
This is the default plan type the JDBC driver uses.
c) Planning the query with no regard for the parameters passed to it.
This is the plan type the JDBC driver uses when it sees the same
PreparedStatement being re-used multiple times or when it is respecting
setFetchSize and allowing for partial results.
We must use (c) for partial results instead of (b) because of some
limitations of the server. Currently you cannot have two statements of
type (b) open on the same connection. So since the driver can't know if
the user will issue another query before fetching the remainder of the
first query's results, it must setup the first query to be of type (c) so
that multiple statements can exist simultaneously.
Switching the default plan type to (c) will cause a significant number of
complaints as performance on some queries will go into the tank. Perhaps
we could have a default fetchSize for plain Statements as it won't affect
the plan. I could also see making this a URL parameter though so it could
be set as the default with only a configuration, not a code change.
> b. If someone does call rs.close() before the end of the ResultSet, and has
> not created an explicit cursor at the JDBC level, flag the query / lock /
> virtual transaction in some way in the JDBC driver that tells it that it can
> just dump the cursor on a subsequent stmt.close(), conn.commit() or
> conn.close() call without sucking down the rest of the data.
This is already true. The JDBC driver only asks the server for more of
the ResultSet when a next() call requires it. So the server isn't
constantly spewing out rows that the driver must deal with, the driver
only gets the rows it asks for. Once the ResultSet is closed, it won't
ask for any more.
Kris Jurka
Hmm. I didn't think this was possible. How are you doing this?
> c) Planning the query with no regard for the parameters passed to it. This
> is the plan type the JDBC driver uses when it sees the same
> PreparedStatement being re-used multiple times or when it is respecting
> setFetchSize and allowing for partial results.
>
> We must use (c) for partial results instead of (b) because of some
> limitations of the server. Currently you cannot have two statements of type
> (b) open on the same connection. So since the driver can't know if the user
> will issue another query before fetching the remainder of the first query's
> results, it must setup the first query to be of type (c) so that multiple
> statements can exist simultaneously.
>
> Switching the default plan type to (c) will cause a significant number of
> complaints as performance on some queries will go into the tank. Perhaps we
> could have a default fetchSize for plain Statements as it won't affect the
> plan. I could also see making this a URL parameter though so it could be
> set as the default with only a configuration, not a code change.
...Robert
On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <bo...@ejurka.com> wrote:Hmm. I didn't think this was possible. How are you doing this?
> The reason this is not done is that the mechanism used for fetching a piece
> of the results at a time can change the query plan used if using a
> PreparedStatement. There are three ways to plan a PreparedStatement:
>
> a) Using the exact parameter values by substituting them directly into the
> query. This isn't really "planned" as you can't re-use it at all. This is
> only available using the V2 protocol.
>
> b) Using the parameter values for statistics, but not making any stronger
> guarantees about them. So the parameters will be used for evaluating the
> selectivity, but not to perform other optimizations like contraint_exclusion
> or transforming a LIKE operation to a range query. This is the default plan
> type the JDBC driver uses.
FWIW, I think there is some consensus to experiment (in the 9.1 cycle)
with making the server automatically try replanning of parameterized
queries with the actual parameter values substituted. It'll keep doing
so if it finds that that produces a significantly better plan than the
generic parameterized plan; which is what you'd expect if there's a
chance to optimize a LIKE search, eliminate partitions, etc.
regards, tom lane
>> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <bo...@ejurka.com> wrote:>>> ... There are three ways to plan a PreparedStatement:
FWIW, I think there is some consensus to experiment (in the 9.1 cycle)
with making the server automatically try replanning of parameterized
queries with the actual parameter values substituted. It'll keep doing
so if it finds that that produces a significantly better plan than the
generic parameterized plan; which is what you'd expect if there's a
chance to optimize a LIKE search, eliminate partitions, etc.
regards, tom lane
On Wed, 21 Apr 2010, Robert Haas wrote:
> On Tue, Apr 20, 2010 at 5:05 PM, Kris Jurka <bo...@ejurka.com> wrote:
>>
>> b) Using the parameter values for statistics, but not making any stronger
>> guarantees about them. So the parameters will be used for evaluating the
>> selectivity, but not to perform other optimizations like contraint_exclusion
>> or transforming a LIKE operation to a range query. This is the default plan
>> type the JDBC driver uses.
>
> Hmm. I didn't think this was possible. How are you doing this?
This is only possible at the protocol level, it's not available using SQL
commands only. You do this by creating an unnamed instead of a named
statement:
http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY
Query planning for named prepared-statement objects occurs when
the Parse message is processed. If a query will be repeatedly
executed with different parameters, it might be beneficial to send
a single Parse message containing a parameterized query, followed
by multiple Bind and Execute messages. This will avoid replanning
the query on each execution.
The unnamed prepared statement is likewise planned during Parse
processing if the Parse message defines no parameters. But if
there are parameters, query planning occurs during Bind processing
instead. This allows the planner to make use of the actual values
of the parameters provided in the Bind message when planning the
query.
Kris Jurka
On Wed, 21 Apr 2010, Nikolas Everett wrote:
> More to the point is there some option that can shift you into method a?
> I'm thinking of warehousing type applications where you want to re-plan
> a good portion of your queries.
>
This can be done by connecting to the database using the V2 protocol (use
URL option protocolVersion=2). This does remove some functionality of
the driver that is only available for V3 protocol, but will work just
fine for query execution.
Kris Jurka