How to import data from Oracle to Hive?


eagl...@gmail.com

Sep 12, 2017, 6:07:48 AM
to Kylo Community
Hi, All,

I want to try importing some tables from my Oracle DB to Hive.

I can do this with Sqoop, like so:

import
--connect jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=my-host-ip-address)(port=1521)(load_balance=yes)(failover=yes))(connect_data=(server=dedicated)(server_name=tora11g2)))
--username
   test
--password
   abc123
--table
  foo
--split-by
BUSINESS_DATE
--hive-import

I wonder how I can do this in Kylo.

The document at http://kylo.readthedocs.io/en/v0.8.3/how-to-guides/KyloDatasources.html gives me some help,
but I have several questions about it:
1. Do I just need to set up the data source from the 'Data Sources' page in Kylo?

2. What's the URL for my case?
    Is it jdbc:oracle:thin:@my-host-ip-address:1521?

3. Where is the driver?

Thanks

Boying

Ruslans Uralovs

Sep 12, 2017, 6:15:24 AM
to Kylo Community
Hey Boying,

1. Yes, you can just add a data source in Kylo. This will in fact set up the controller service in NiFi with the details that you have provided to Kylo.
2. I'd imagine you can use the full URL, just like the one you are providing to Sqoop: "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=my-host-ip-address)(port=1521)(load_balance=yes)(failover=yes))(connect_data=(server=dedicated)(server_name=tora11g2)))"
3. Kylo doesn't ship with Oracle drivers. You will need to provide the Oracle driver yourself and put it in a location readable by the NiFi user, e.g. the /opt/nifi/oracle directory.
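The driver staging step above can be sketched as follows. The paths, jar name, and the `nifi` user/group are assumptions for a typical install, not values confirmed in this thread:

```shell
# Sketch: stage the Oracle JDBC driver where the NiFi user can read it
# (commands shown as comments since they need root and a real install):
#   sudo mkdir -p /opt/nifi/oracle
#   sudo cp ojdbc8.jar /opt/nifi/oracle/
#   sudo chown -R nifi:nifi /opt/nifi/oracle

# A small helper to confirm the jar is present and readable before
# wiring it into the Kylo data source:
check_driver() {
  local jar="$1"
  if [ -r "$jar" ]; then
    echo "found: $jar"
  else
    echo "missing or unreadable: $jar"
  fi
}

check_driver /opt/nifi/oracle/ojdbc8.jar
```

Running the check as the same OS user NiFi runs under is the quickest way to catch a permissions problem before it surfaces as a cryptic controller-service error.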

eagl...@gmail.com

Sep 13, 2017, 4:35:37 AM
to Kylo Community
Thanks for your help.
I used the URL jdbc:oracle:thin:@//my-ipaddress/tora11g2 to connect to the Oracle DB, and it works.
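For reference, the two common shapes of an Oracle thin-driver JDBC URL are sketched below; the host, port, and service names are placeholders, not values from this thread:

```shell
# Oracle thin-driver JDBC URL forms (placeholders; adjust to your DB):
#   jdbc:oracle:thin:@//<host>:<port>/<service_name>   # EZConnect / service-name form
#   jdbc:oracle:thin:@<host>:<port>:<SID>              # older SID-based form
```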
Another thing is that the Oracle driver is also required by kylo-services, or a 'ClassNotFound' exception will be thrown, so I put a copy of ojdbc8.jar into /opt/kylo/kylo-services/lib.
If that's the correct approach, maybe we should highlight it at http://kylo.readthedocs.io/en/v0.8.3/how-to-guides/KyloDatasources.html?highlight=oracle
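That kylo-services step can be sketched as below, assuming a default Kylo layout; the paths and the service name are assumptions to adjust for your install:

```shell
# Make the same Oracle driver visible to kylo-services, otherwise data
# source operations fail with a ClassNotFound exception (commands shown
# as comments since they need root and a real install):
#   sudo cp ojdbc8.jar /opt/kylo/kylo-services/lib/
#   sudo service kylo-services restart   # restart so the new jar is picked up
```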

I created a Feed to import a table from the Oracle DB into Hive. But after clicking the Feed on the 'Feeds' page, the browser goes to the 'Feed Details' page and hangs at the 'Loading Feed' status. I can't even disable the Feed, which I need to do in order to delete it and import it again.

I want to delete the Feed manually; does anyone know how to do that, e.g. which tables I should drop in MySQL and Hive?

Could we move the 'disable' and 'delete' functions to the 'Feeds' page, so users can disable and delete a Feed even when they hit an error like this?

Ruslans Uralovs

Sep 13, 2017, 8:16:42 AM
to Kylo Community
When the Feed Details page hangs, 
  • what do you see in the kylo-services.log?
  • are there any errors reported in the browser console?

And yes, we should highlight adding the driver jar to kylo-services in the documentation too.

eagl...@gmail.com

Sep 13, 2017, 9:44:51 PM
to Kylo Community
I didn't see any errors in kylo-services.log or the browser console.
I've attached kylo-services.log here (please ignore the lines marked 'lby', which I added myself to debug the issue).
kylo-services.log

eagl...@gmail.com

Sep 14, 2017, 5:50:00 AM
to Kylo Community
Further debugging shows that the error is related to NiFi: NiFi reports an OOM in its log file.
After I increased the max heap from 512M to 1G (defined in bootstrap.conf), the 'Feed Details' page shows correctly.
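The heap change described above lives in NiFi's conf/bootstrap.conf. A sketch of the relevant lines, assuming default settings (the `java.arg` indices can differ between installs):

```shell
# nifi/conf/bootstrap.conf -- default JVM heap settings:
#   java.arg.2=-Xms512m
#   java.arg.3=-Xmx512m
# Raising the maximum heap to 1G (NiFi must be restarted afterwards):
#   java.arg.3=-Xmx1g
```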

But I still can't delete the Feed after disabling it.

Is there any way to manually delete the related data from the databases (e.g. MySQL and Hive)?

Thanks a lot.

eagl...@gmail.com

Sep 14, 2017, 5:51:22 AM
to Kylo Community
The OOM shows the following stack trace in the log file:
2017-09-14 17:44:02,040 ERROR [Timer-Driven Process Thread-6] c.t.n.v.core.watermark.LoadHighWaterMark LoadHighWaterMark[id=01a84157-0b38-14f2-9256-7c3a808a2886] LoadHighWaterMark[id=01a84157-0b38-14f2-9256-7c3a808a2886] failed to process due to java.lang.OutOfMemoryError: Java heap space; rolling back session: {}
java.lang.OutOfMemoryError: Java heap space
2017-09-14 17:44:02,049 ERROR [Timer-Driven Process Thread-5] c.t.nifi.v2.ingest.GetTableData GetTableData[id=a3eb6450-1f53-3e1e-713a-e74962477733] GetTableData[id=a3eb6450-1f53-3e1e-713a-e74962477733] failed to process session due to java.lang.OutOfMemoryError: Java heap space: {}
java.lang.OutOfMemoryError: Java heap space
        at java.util.HashSet.<init>(HashSet.java:161)
        at java.util.LinkedHashSet.<init>(LinkedHashSet.java:154)
        at org.apache.nifi.controller.repository.StandardProcessSession$Checkpoint.<init>(StandardProcessSession.java:3211)
        at org.apache.nifi.controller.repository.StandardProcessSession$Checkpoint.<init>(StandardProcessSession.java:3204)
        at org.apache.nifi.controller.repository.StandardProcessSession.<init>(StandardProcessSession.java:150)
        at org.apache.nifi.controller.repository.StandardProcessSessionFactory.createSession(StandardProcessSessionFactory.java:31)
        at org.apache.nifi.controller.repository.StandardProcessSessionFactory.createSession(StandardProcessSessionFactory.java:21)
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:25)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
        at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
        at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Ruslans Uralovs

Sep 14, 2017, 9:51:05 AM
to Kylo Community
What happens when you try to delete the feed? Any errors in the kylo-services and nifi logs?