Import Performance scaling


Purush

Jan 22, 2015, 1:30:55 PM
to lingua...@googlegroups.com
Hi

Is there a way to scale performance during import? For instance, the generated SELECT query splits the data on row numbers (ROWNUM) even though we have specified a primary key column:

SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( SELECT log_id, client_id, user_id, event_sub_type, event_date FROM audit_log ) a 
WHERE rownum <= 171279990 + 3495527 ) WHERE dbif_rno >= 171279991

Thank You,

Regards,
Purush

John Lavoie

Jan 22, 2015, 2:04:07 PM
to lingua...@googlegroups.com
I'm on Purush's team, and we are exploring how to connect our Cascading-based app to legacy systems so we can pull data from the RDBMS. We're considering Lingual as well as using the JDBC taps directly. We need to push and pull hundreds of millions to billions of records on a regular basis. We are finding that the basic SELECT statements Lingual executes in the RDBMS do not use indexed columns, which forces a full table scan for each task that tries to split the data in parallel. This is killing performance (literally: the tasks are getting killed by timeouts), and our data volumes still need to grow by at least another order of magnitude.

In the SQL example Purush posted, the log_id column is an indexed primary key (an integer surrogate key). But the query uses Oracle's built-in ROWNUM pseudocolumn, which does not take advantage of the index and forces a full table scan for each execution.
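To make that cost concrete, here is a minimal Python sketch that rebuilds the window query from Purush's post out of an offset and a chunk size, and counts how many rows the database has to number across all splits. The function names are hypothetical, and the count assumes equal-size splits where each split stops after offset + limit rows; this is an illustration inferred from the query's shape, not cascading-jdbc's actual code.

```python
def rownum_window_query(base_query: str, offset: int, limit: int) -> str:
    """Build an Oracle ROWNUM window returning rows offset+1 .. offset+limit."""
    # The inner ROWNUM filter stops after offset + limit rows, but every
    # preceding row must still be produced and numbered first.
    # The outer filter then throws away the first `offset` rows.
    return (
        "SELECT * FROM (SELECT a.*, ROWNUM dbif_rno FROM ( "
        f"{base_query} ) a WHERE rownum <= {offset} + {limit} ) "
        f"WHERE dbif_rno >= {offset + 1}"
    )


def rows_enumerated(num_splits: int, chunk: int) -> int:
    """Total rows numbered across all splits (hypothetical equal-size model)."""
    # Split i (0-based) has offset i * chunk, so it numbers (i + 1) * chunk rows.
    return sum((i + 1) * chunk for i in range(num_splits))


base = "SELECT log_id, client_id, user_id, event_sub_type, event_date FROM audit_log"
print(rownum_window_query(base, 171279990, 3495527))
print(rows_enumerated(4, 10))
```

With n equal splits of c rows, the total is c * n(n + 1) / 2, so the database work grows quadratically with the number of splits even though each split returns only c rows.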

Is there anything we are missing that would let the JDBC tap use the PK index? We've tested with Sqoop, and it performs significantly better because it is able to use the index.
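For comparison, Sqoop splits an import on a numeric column (by default the primary key): it looks up the column's minimum and maximum and divides that range into contiguous intervals, so each task's query is a range predicate on the indexed key. The sketch below models that idea in Python; the helper names, and the assumption that MIN/MAX come from something like `SELECT MIN(log_id), MAX(log_id) FROM audit_log`, are hypothetical and not Sqoop's or cascading-jdbc's API.

```python
from typing import List


def key_range_splits(min_key: int, max_key: int, num_splits: int) -> List[str]:
    """Divide [min_key, max_key] into contiguous, non-overlapping log_id ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    if max_key < min_key:
        raise ValueError("max_key must be >= min_key")
    span = max_key - min_key + 1
    num_splits = min(num_splits, span)  # avoid creating empty ranges
    # bounds[0] == min_key and bounds[-1] == max_key + 1 (exclusive upper end).
    bounds = [min_key + (span * i) // num_splits for i in range(num_splits + 1)]
    return [f"log_id >= {lo} AND log_id < {hi}" for lo, hi in zip(bounds, bounds[1:])]


def split_query(predicate: str) -> str:
    """Each task selects only its own key range instead of numbering earlier rows."""
    return (
        "SELECT log_id, client_id, user_id, event_sub_type, event_date "
        f"FROM audit_log WHERE {predicate}"
    )


for pred in key_range_splits(1, 10, 3):
    print(split_query(pred))
```

Because each predicate bounds the primary key directly, the optimizer can use an index range scan and no task has to read the rows that belong to earlier splits. The trade-off is that gaps in the key sequence make some ranges hold fewer rows than others.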

John Lavoie

Andre Kelpe

Jan 22, 2015, 2:34:50 PM
to lingua...@googlegroups.com
Hi,

The performance problem you are seeing is not caused by Lingual but by cascading-jdbc. Can you create a new issue so we can discuss this further? https://github.com/Cascading/cascading-jdbc/issues/new 

Thanks!

- Andre

--
You received this message because you are subscribed to the Google Groups "Lingual User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lingual-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


