JDBC Tap out of memory error


Clive Cox

Aug 3, 2011, 11:36:10 AM
to cascading-user
Hi,

I'm getting an OutOfMemoryError from a JDBC input Tap using the
Cascading.JDBC module. Any suggestions for how to solve this?

2011-08-03 15:23:16,889 INFO org.apache.hadoop.mapred.TaskInProgress (IPC Server handler 13 on 9001): Error from attempt_201108031501_0005_m_000000_2: java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.Buffer.getBytes(Buffer.java:198)
    at com.mysql.jdbc.Buffer.readLenByteArray(Buffer.java:318)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1310)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2262)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:439)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1970)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1387)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1727)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:3170)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:3099)
    at com.mysql.jdbc.Statement.executeQuery(Statement.java:1169)
    at cascading.jdbc.db.DBInputFormat$DBRecordReader.<init>(DBInputFormat.java:97)
    at cascading.jdbc.db.DBInputFormat.getRecordReader(DBInputFormat.java:376)
    at cascading.tap.hadoop.MultiInputFormat$1.operate(MultiInputFormat.java:282)
    at cascading.tap.hadoop.MultiInputFormat$1.operate(MultiInputFormat.java:277)
    at cascading.util.Util.retry(Util.java:624)
    at cascading.tap.hadoop.MultiInputFormat.getRecordReader(MultiInputFormat.java:276)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:222)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)

Thanks,

Clive

Clive Cox

Aug 3, 2011, 11:57:05 AM
to cascading-user
After some investigation, I assume this is because my Hadoop cluster
is too small: the number of input splits is small, so each split covers
a very large number of rows, and the MySQL driver runs out of memory
reading the query result. I'll try a larger cluster with more map tasks.

Are there any optimizations to the JDBC query in Cascading.JDBC that
would help with this? I see it's just doing:
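
To make the split arithmetic concrete, here is a minimal sketch. It assumes the input format cuts the table into equal LIMIT/OFFSET windows, one per split; that is an assumption about how Cascading.JDBC divides the work, not something confirmed in this thread. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cascading.JDBC: shows how many rows
// each split has to read if the table is divided into equal windows.
public class SplitSketch {

    /** One LIMIT/OFFSET window over the table. */
    public static final class Window {
        public final long offset;
        public final long limit;

        Window(long offset, long limit) {
            this.offset = offset;
            this.limit = limit;
        }
    }

    /** Divides totalRows into numSplits windows; the last one takes the remainder. */
    public static List<Window> windows(long totalRows, int numSplits) {
        if (totalRows < 0 || numSplits < 1) {
            throw new IllegalArgumentException("totalRows must be >= 0 and numSplits >= 1");
        }
        List<Window> result = new ArrayList<>();
        long chunk = totalRows / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long offset = i * chunk;
            // The final window absorbs whatever integer division left over.
            long limit = (i == numSplits - 1) ? totalRows - offset : chunk;
            result.add(new Window(offset, limit));
        }
        return result;
    }
}
```

Under that assumption, 10,000,000 rows over 2 splits means each map task's query returns 5,000,000 rows, and a driver that buffers the whole result set has to hold all of them in the task's heap at once.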

    //statement.setFetchSize(Integer.MIN_VALUE);
    String query = getSelectQuery();
    try
    {
        results = statement.executeQuery( query );
    }


Note the commented-out setFetchSize call. Is there a reason it is disabled?
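
For context, MySQL Connector/J buffers the entire result set in memory by default, which matches the stack trace above. The driver switches to row-by-row streaming when the statement is forward-only, read-only, and has its fetch size set to Integer.MIN_VALUE. A minimal sketch of that setup follows; the class and method names are hypothetical, and this is not the Cascading.JDBC code itself.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Cascading.JDBC.
public class StreamingQuerySketch {

    /** Creates a statement configured for MySQL Connector/J row streaming. */
    public static Statement createStreamingStatement(Connection connection) throws SQLException {
        // Streaming requires a forward-only, read-only statement.
        Statement statement = connection.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Integer.MIN_VALUE is the driver's signal to stream rows one at a
        // time instead of buffering the whole result set in the heap.
        statement.setFetchSize(Integer.MIN_VALUE);
        return statement;
    }

    /** Reads every row of the query without holding the full result in memory. */
    public static long countRows(Connection connection, String query) throws SQLException {
        try (Statement statement = createStreamingStatement(connection);
             ResultSet results = statement.executeQuery(query)) {
            long rows = 0;
            while (results.next()) {
                rows++;
            }
            return rows;
        }
    }
}
```

One trade-off to be aware of: while a streaming result set is open, the same connection cannot issue other statements until all rows have been read or the result set is closed.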

Chris K Wensel

Aug 3, 2011, 12:04:33 PM
to cascadi...@googlegroups.com
You might look on GitHub for forks, and note that the db-migrate tap might be a bit more robust.

Otherwise, feel free to fork and make it work for your environment. It's very difficult to offer one-size-fits-all integration extensions, as every use case is slightly different and those differences amplify at scale, so forking and improving for your case is the working model for now. Note there are already a few HBase taps.

Also search conjars.org; there might be some already pushed/updated jars.

chris


--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support for Cascading
