Recommended use of @FetchSize for streaming rows

2,062 views
Skip to first unread message

Matthias

unread,
May 26, 2017, 4:08:43 AM5/26/17
to jDBI
Hi,

while streaming data from a very large table (SELECT * FROM myTable), I found that the ResultSet accumulates all rows desite @FetchSize annotation and finally causes an OutOfMemoryException.

JDBI method signature:

  @SqlQuery("SELECT * FROM sound_samples ORDER BY campaign_id, start")
  @Mapper(SoundRecordMapper.class)
  @FetchSize(6)
  Iterator<SoundRecord> getAllRows();


Is there anything else besides setting @FetchSize to a low value to prevent the JDBC driver from fetching all rows?

- Matthias


JDBC version: 2.78
Database: Postgresql 9.6
JDBC Driver: 42.0.0




Matthew Hall

unread,
Jun 7, 2017, 6:15:29 PM6/7/17
to jd...@googlegroups.com
Hi Matthias, my apologies for the late response.

Are you still experiencing this issue?

--
You received this message because you are subscribed to the Google Groups "jDBI" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jdbi+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthias

unread,
Jun 9, 2017, 4:19:21 PM6/9/17
to jDBI
Yes, I still don't have a solution for this.

Matthew Hall

unread,
Jun 9, 2017, 8:19:11 PM6/9/17
to jd...@googlegroups.com
I think this might be the Postgres driver:


I'm not sure what's the reasoning for that behavior, but try disabling auto-commit on your connection prior to running your query:

handle.getConnection().setAutoCommit(false);

-Matt

On Fri, Jun 9, 2017 at 2:19 PM, Matthias <fronten...@gmail.com> wrote:
Yes, I still don't have a solution for this.

--

Matthew Hall

unread,
Jun 9, 2017, 8:24:10 PM6/9/17
to jd...@googlegroups.com

Matthias

unread,
Jun 10, 2017, 3:52:42 AM6/10/17
to jDBI
Thanks for the input, Matthew.
I've been experimenting with several approaches during the last days. Obtaining a handle from jDBI and setting autoCommit=false resolves the original problem but introduces new issues: Other queries in the application no longer execute as expected and I suspect that autoCommit=false affects larger parts of the underlying connection pool. Hence I am currently considering multiple connection pools (one with autoCommit=true, the other with autoCommit=false). - But I am not yet sure about the outcome.

Christopher Currie

unread,
Jun 10, 2017, 4:11:43 PM6/10/17
to jDBI
The reason that other queries are affected is that they are probably written implicitly assuming "autoCommit = true"; when you set autoCommit to false, you really do need to call commit on the underlying JDBC connection after each statement, if you want the same behavior. If you're using a connection pool, then changing the autoCommit setting persists after the connection is returned to the pool, which affects other threads that attempt to reuse the connection. So, when changing the autoCommit setting, you either need to not use connection pooling (not recommended), or you need to save the current autoCommit value, and restore it when you're done.

A technique that I've found useful is to put the code that needs to use @FetchSize in a transaction. Under the hood, JDBI will set autoCommit to false and then restore the original setting for you. This does complicate your code path, and if you're using JDBI v2 and the SqlObject API, you have to be aware that you cannot put @Transaction on the same method as @SqlQuery, as the former will be ignored. So the final form of your SqlObject would probably need to look something like:

// JDBI v2 *ONLY*
public abstract class SoundRecordDao {

  @SqlQuery("SELECT * FROM sound_samples ORDER BY campaign_id, start") 
  @Mapper(SoundRecordMapper.class) 
  @FetchSize(6) 
  protected abstract ResultIterator<SoundRecord> getAllRowsInternal();

  @Transaction
  public ResultIterator<SoundRecord> getAllRows() {
    return this.getAllRowsInternal();
  }

}

One other *very important* caveat to this technique: the returned iterator is a ResultIterator, and MUST BE CLOSED, in order to commit the transaction and release the connection to the pool. try-with-resources is your friend here. You should probably not do any other querying on SqlObjects in this thread, it could possibly block due to the open transaction (this is true regardless of how you set the autoCommit flag), and you *really* should not do any other queries on the current handle, as the database is still using that handle to return results to you.

I haven't used JDBI 3 yet, but I do know it doesn't support abstract class SqlObjects anymore, and I think that it has fixed the @Transaction + @SqlQuery issue, so your solution there may be to simply add @Transaction to your getAllRows() method. The remaining caveats about the closing the Iterator and avoiding potentially blocking situations probably still apply.

HTH,
Christopher



On Sat, Jun 10, 2017 at 12:52 AM Matthias <fronten...@gmail.com> wrote:
Thanks for the input, Matthew.
I've been experimenting with several approaches during the last days. Obtaining a handle from jDBI and setting autoCommit=false resolves the original problem but introduces new issues: Other queries in the application no longer execute as expected and I suspect that autoCommit=false affects larger parts of the underlying connection pool. Hence I am currently considering multiple connection pools (one with autoCommit=true, the other with autoCommit=false). - But I am not yet sure about the outcome.

--

Matthias

unread,
Jun 12, 2017, 9:10:21 AM6/12/17
to jDBI
Thanks Christopher, this is probably what I was looking for. Regarding your comment about CLOSING the ResultIterator - is it really required in this setup? The JavaDoc of ResultIterator says:

"The default implementation of
<code>ResultIterator</code> will automatically close
the result set after the last element has been retrieved via
<code>next()</code> and
<code>hasNext()</code> is called (which will return false). This allows for iteration
over the results with automagic resource cleanup."

My reading is that I have to care about closing if the returned Iterator is _not_ the default implementation. Further down the execution path I need an Iterator (without close semantics) anyway. If there's some magic in jDBI that does the job, I am happy to rely on it.

- Matthias


Matthew Hall

unread,
Jun 12, 2017, 9:46:58 AM6/12/17
to jd...@googlegroups.com
The try-with-resources is a safeguard, in case your iteration code throws an exception for any reason.

Steven Schlansker

unread,
Jun 12, 2017, 2:11:08 PM6/12/17
to jd...@googlegroups.com

> On Jun 9, 2017, at 5:19 PM, Matthew Hall <quali...@gmail.com> wrote:
>
> I think this might be the Postgres driver:
>
> https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgStatement.java#L372-L375
>
> I'm not sure what's the reasoning for that behavior

In autocommit mode, the transaction is committed before returning to the caller.
streaming rows from a MVCC store like Postgres requires the cursor and therefore transaction to remain open
(otherwise some concurrent actor could modify or delete the row-version out from under you).
So you either buffer the result set and close the cursor + txn, or keep
the txn open and stream the data out, but there is not a middle ground option.


This is covered in a bit more detail in the upstream docs:

https://jdbc.postgresql.org/documentation/94/query.html#query-with-cursor

> , but try disabling auto-commit on your connection prior to running your query:
>
> handle.getConnection().setAutoCommit(false);
>
> -Matt
>
> On Fri, Jun 9, 2017 at 2:19 PM, Matthias <fronten...@gmail.com> wrote:
> Yes, I still don't have a solution for this.
>
> --
> You received this message because you are subscribed to the Google Groups "jDBI" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to jdbi+uns...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google Groups "jDBI" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to jdbi+uns...@googlegroups.com.
signature.asc

Matthias

unread,
Aug 9, 2017, 2:39:50 PM8/9/17
to jDBI
A very late reply: With further testing I found that the ResultIterator delivered by tgetAllRows() will eventually starve. It looks like the transaction is ended when the ResultIterator is returned by the outer method so it cannot retrieve more than @FetchSize elements. I suppose all processing (and forwarding) needs to be done within getAllRows():


  @Transaction
  public void <SoundRecord> getAllRows() {
    doSomething(this.getAllRowsInternal());
  }

Matthias

Christopher Currie

unread,
Aug 9, 2017, 2:48:10 PM8/9/17
to jDBI
Curious, is this with v2, v3, or both?
--
You received this message because you are subscribed to the Google Groups "jDBI" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jdbi+uns...@googlegroups.com.

Matthias

unread,
Aug 11, 2017, 7:28:57 AM8/11/17
to jDBI
I am still working with v2.

Matthias

unread,
Aug 16, 2017, 5:38:17 AM8/16/17
to jDBI
I finally found a solution that works reliably so far. It's a little bit more boilerplate code since I had to manage handles explicitly. The general idea
is to obtain a ResultIterator from the DAO and turn it into a stream. For resource closing (ResultIterator and Handle), the appropriate Runnables
are attached to the Stream.

Thanks for your comments. - They definitely helped me to figure out what's going on under the hood.

- Matthias

interface SoundRecordDAO {


  @SqlQuery("SELECT * FROM sound_samples ORDER BY campaign_id, start")
  @Mapper(SoundRecordMapper.class)
  @FetchSize(6)
  Iterator<SoundRecord> getAllRows();
}

class Streamer {

  private final DBI dbi;

  Streamer(DBI dbi){
    this.dbi = dbi;
  }

  Stream<SoundRecord> (DBI dbi){
    Handle handle = dbi.open();

    // disable auto commit
    try {
      handle.getConnection().setAutoCommit(false);
    } catch (SQLException e) {
      e.printStackTrace();
    }
   
    SoundRecordDAO streamingDAO = handle.attach(SoundRecordDAO.class);
    ResultIterator<SoundRecord> rows = streamingDAO.getAllRows();
    Stream<SoundRecord> stream = toStream(rows);
    stream = stream.onClose(asUncheckedAutoCloseable(handle);
    return stream;
  }

  static Stream<T> toStream (ResultIterator<T> ri){
    return StreamSupport.stream(
           Spliterators.spliteratorUnknownSize(data, ORDERED | DISTINCT | NONNULL | IMMUTABLE), false)
           .onClose(asUncheckedAutoCloseable(closeableResource));
  }

  static Runnable asUncheckedAutoCloseable(AutoCloseable var0) {
    return () -> {
      try {
        var0.close();
      } catch (Exception var2) {
        throw new RuntimeException(var2);
      }
    };
  }

}

Steven Schlansker

unread,
Aug 16, 2017, 12:29:52 PM8/16/17
to jd...@googlegroups.com
>
> On Aug 16, 2017, at 2:38 AM, Matthias <fronten...@gmail.com> wrote:
>
> I finally found a solution that works reliably so far. It's a little bit more boilerplate code since I had to manage handles explicitly. The general idea
> is to obtain a ResultIterator from the DAO and turn it into a stream. For resource closing (ResultIterator and Handle), the appropriate Runnables
> are attached to the Stream.
>
> Thanks for your comments. - They definitely helped me to figure out what's going on under the hood.
>

The to-be-released-someday-soon-maybe jdbi v3 has this built in :)
signature.asc

Matthias

unread,
Aug 16, 2017, 4:31:06 PM8/16/17
to jDBI
Very convenient; I should consider switching.

How does it work in conjunction with @FetchSize and @Transaction for Postgresql backends? From Christopher's answer I undestand that I have to use ResultIterator as the "primary" return type to get cursor-based access and automated disabling / enabling of auto commits.

Steven Schlansker

unread,
Aug 16, 2017, 4:33:36 PM8/16/17
to jd...@googlegroups.com

> On Aug 16, 2017, at 1:31 PM, Matthias <fronten...@gmail.com> wrote:
>
> Very convenient; I should consider switching.
>
> How does it work in conjunction with @FetchSize and @Transaction for Postgresql backends? From Christopher's answer I undestand that I have to use ResultIterator as the "primary" return type to get cursor-based access and automated disabling / enabling of auto commits.

Right. In v3 similar caveats apply -- to truly stream, you still need a @Transaction and @FetchSize, but if you return a "higher level object" like Stream we do the dirty work of converting the result iterator into the higher level type. Critically though, the stream operations still need to execute within the scope of the @Transaction, otherwise the database resources are released before the stream pipeline is run.


signature.asc

Matthias

unread,
Aug 17, 2017, 5:27:17 AM8/17/17
to jd...@googlegroups.com
So if I need to forward the stream to higher appication levels, which
are unaware of the database access (and JDBI transactions), but merely
see a stream of values, I cannot use @Transaction at all?



Steven Schlansker

unread,
Aug 17, 2017, 12:40:41 PM8/17/17
to jd...@googlegroups.com
You can, they just must understand that they now "own" that open transaction.
Closing the Stream should then close the transaction and release the handle.

We recommend try-with-resources wherever possible to make sure this happens.

signature.asc
Reply all
Reply to author
Forward
0 new messages