Need to automate a Sqoop import job


Muthu Pandi

Apr 10, 2014, 6:59:29 AM
to chenn...@googlegroups.com
Hi all,

This is my first post in this group.

I am working on a POC that pulls data from an Oracle table using Sqoop and imports it into a Hive table. There I need to run a few queries and save the output as a CSV file.

I have already done all of this by hand on the command line (sqoop import, Hive queries, etc.).

Now I need to automate it. Kindly guide me on how to achieve this. I see two options:

1. Pack all the commands into a shell script and schedule it with a cron job.
2. Take what I have done so far and implement it in Java.

If I use a shell script, will it be efficient enough?

These are the commands I have used:

1. sqoop import -m 1 --connect jdbc:oracle:thin:@<ipaddress>:1521/<DB> --username username --password password --table <tablename> --columns column1,column2,column3,column4,column5,column6,column7 --hive-import --hive-overwrite --hive-table default.oracreport --lines-terminated-by '\n' --fields-terminated-by ',' --target-dir /user/hdfs/
2. hive -S -e "<Hive query>"
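
For option 1, the two commands above could be wrapped roughly as follows. This is a minimal sketch, assuming a POSIX shell with `sqoop` and `hive` on the PATH. Every variable name and default value (`ORACLE_URL`, `SRC_COLUMNS`, `CSV_OUT`, the sample query, and so on) is a placeholder introduced for illustration, and the `DRY_RUN` switch is an addition that makes the script print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the two manual steps (Sqoop import, then Hive query to CSV).
# Every default below is a placeholder assumption, not a value from a real cluster.
set -eu

ORACLE_URL="${ORACLE_URL:-jdbc:oracle:thin:@dbhost:1521/ORCL}"
ORACLE_USER="${ORACLE_USER:-username}"
ORACLE_PASS="${ORACLE_PASS:-password}"
SRC_TABLE="${SRC_TABLE:-MYTABLE}"
SRC_COLUMNS="${SRC_COLUMNS:-column1,column2,column3}"
HIVE_TABLE="${HIVE_TABLE:-default.oracreport}"
HIVE_QUERY="${HIVE_QUERY:-SELECT * FROM default.oracreport}"
CSV_OUT="${CSV_OUT:-/tmp/oracreport.csv}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: same flags as the manual import; the column list must not contain spaces.
import_table() {
  run sqoop import -m 1 \
    --connect "$ORACLE_URL" \
    --username "$ORACLE_USER" --password "$ORACLE_PASS" \
    --table "$SRC_TABLE" --columns "$SRC_COLUMNS" \
    --hive-import --hive-overwrite --hive-table "$HIVE_TABLE" \
    --lines-terminated-by '\n' --fields-terminated-by ',' \
    --target-dir /user/hdfs/
}

# Step 2: the Hive CLI prints tab-separated rows; convert tabs to commas.
# Naive conversion: values that themselves contain commas or tabs are not quoted.
export_csv() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'hive -S -e "%s" > %s\n' "$HIVE_QUERY" "$CSV_OUT"
    return 0
  fi
  tmp="$(mktemp)"
  hive -S -e "$HIVE_QUERY" > "$tmp"
  tr '\t' ',' < "$tmp" > "$CSV_OUT"
  rm -f "$tmp"
}

# With set -e, a failed import stops the script before the Hive step runs.
import_table
export_csv
```

A crontab entry such as `0 2 * * * DRY_RUN=0 /home/hduser/oracle_to_csv.sh >> /tmp/oracle_to_csv.log 2>&1` (script path and log path are hypothetical) would run it daily at 02:00. Cron starts jobs with a minimal environment, so PATH, JAVA_HOME and HADOOP_HOME usually have to be set inside the script. On efficiency: the script only launches the jobs, and the heavy work is still done by Sqoop and Hive on the cluster, so the shell layer itself should add very little overhead.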

Amudhan K

Apr 10, 2014, 7:16:51 AM
to chenn...@googlegroups.com
Hello Muthu Pandi,

To automate this, you can configure the whole pipeline in Apache Oozie, the workflow scheduler for Hadoop. Write a workflow.xml that first runs the Sqoop job to import the data from Oracle and then runs the Hive script with your queries. Then schedule that workflow with a Coordinator so that it runs automatically as per your requirements.
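
A workflow of the kind described above might look like the sketch below, written as a shell script that generates the files so it matches the commands used earlier in the thread. It is illustrative only: the application name, directory, schema versions (`workflow:0.4`, `sqoop-action:0.2`, `hive-action:0.2`), connection values and the `report.q` query are all assumptions and need to be matched to the installed Oozie version and the real job.

```shell
#!/usr/bin/env bash
# Sketch only: generates a two-action Oozie workflow (Sqoop import, then Hive script).
# Names, paths, schema versions and connection values are placeholder assumptions.
set -eu

APP_DIR="${APP_DIR:-./oracle-report-wf}"   # hypothetical local staging directory
mkdir -p "$APP_DIR"

# Quoted 'EOF' keeps ${jobTracker} etc. literal so Oozie, not the shell, resolves them.
cat > "$APP_DIR/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="oracle-report-wf">
  <start to="sqoop-import"/>

  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import -m 1 --connect jdbc:oracle:thin:@dbhost:1521/ORCL --username username --password password --table MYTABLE --columns column1,column2,column3 --hive-import --hive-overwrite --hive-table default.oracreport --fields-terminated-by , --target-dir /user/hdfs/</command>
    </sqoop>
    <ok to="hive-report"/>
    <error to="fail"/>
  </action>

  <action name="hive-report">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>report.q</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF

# The Hive action runs this script file; replace the query with the real one.
cat > "$APP_DIR/report.q" <<'EOF'
SELECT * FROM default.oracreport;
EOF

echo "Wrote $APP_DIR/workflow.xml and $APP_DIR/report.q"
# Next steps (not run here): copy the directory to HDFS and submit it, for example
#   hadoop fs -put "$APP_DIR" /user/$USER/
#   oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```

Oozie splits the `<command>` text on spaces and does not interpret shell quoting, which is why the quoted delimiters from a command-line import are simplified here. The `<ok>` and `<error>` transitions make the Hive step run only after a successful import.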

Check out the link below for reference.





--
Thank You,
Amudhan K
Data Engineer at DataDotz

Muthu Pandi

Apr 10, 2014, 7:41:48 AM
to chenn...@googlegroups.com
Thank you, amudhan.karuna.

I am now installing Oozie and will try to create and run the job. I will let you know once it is completed.

Muthu Pandi

Apr 19, 2014, 2:00:14 AM
to chenn...@googlegroups.com
I was able to write the workflow and coordinator jobs to schedule my job, but I faced many issues while setting up Oozie and writing jobs in it. I still have some problems and am working on them.



Senthil Kumar

Apr 19, 2014, 7:13:21 PM
to chenn...@googlegroups.com
Hi Muthu Pandi,
Can you post the issues you faced or are still facing? It will be helpful for others.

Thanks
Senthil

Muthu Pandi

Apr 21, 2014, 3:52:52 AM
to chenn...@googlegroups.com
Yes, definitely.

Issue #1

    I had trouble building Oozie for our stack (Hadoop 1.2.1, Hive 0.12, Sqoop 1.4.3). The pom.xml needs to be edited with the versions to build against.

Issue #2

    Configuring workflow.xml for a job took a lot of research: how to frame the job, which files need to be included for that particular job, and so on.

Issue #3

    Sqoop and Hive need to be installed on all datanodes, because jobs submitted to a node where Sqoop and Hive were not present failed.

Issue #4

    Building the coordinator job requires precise time adjustment, and the timezone needs to be set properly in oozie-site.xml.
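
To illustrate the timing point in Issue #4, here is a rough sketch of a daily coordinator and the properties it reads, generated from a shell script. It is not the configuration actually used in this thread: host names, ports, HDFS paths, dates, the coordinator schema version and the application names are all placeholder assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: a daily coordinator plus the properties it reads.
# Host names, ports, paths and dates are placeholder assumptions.
set -eu

COORD_DIR="${COORD_DIR:-./oracle-report-coord}"   # hypothetical local directory
mkdir -p "$COORD_DIR"

# Quoted 'EOF' so ${...} is left for Oozie to resolve at submission time.
cat > "$COORD_DIR/coordinator.xml" <<'EOF'
<coordinator-app name="oracle-report-daily" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
EOF

# Times are written in UTC (trailing Z), assuming Oozie runs with its default
# UTC processing timezone.
# Example: a 02:00 IST (UTC+5:30) run on 2014-04-22 is 20:30Z on 2014-04-21.
cat > "$COORD_DIR/job.properties" <<'EOF'
nameNode=hdfs://namenode:8020
jobTracker=jobtracker:8021
workflowAppUri=${nameNode}/user/${user.name}/oracle-report-wf
oozie.coord.application.path=${nameNode}/user/${user.name}/oracle-report-coord
start=2014-04-21T20:30Z
end=2014-12-31T20:30Z
EOF

echo "Wrote $COORD_DIR/coordinator.xml and $COORD_DIR/job.properties"
```

The start and end values are the part that needs care: with UTC processing, a local schedule has to be converted by hand as in the comment above. The server-side setting in oozie-site.xml is, as far as I know, the `oozie.processing.timezone` property; if it is changed from UTC, the date-time format in job.properties has to change with it, so check the documentation for the installed Oozie version.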


It took me about 5 days to complete the job (including the workflow and coordinator)!
