Splout on CDH 4.5: no way to run it from Oozie jobs


Vadim Kisselmann

May 12, 2014, 9:38:52 AM
to sploutd...@googlegroups.com
Hi :)

We have a running CDH 4.5 cluster with Splout.

Run directly from the console on one Hadoop machine, from the right folder (the one containing the Splout JAR), everything works fine:
su hdfs -c 'hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar generate -tf /user/splout/database-schemas/requests-vs-responses.json -o /user/splout/database-files' &&
su hdfs -c "hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar deploy --root /user/splout/database-files --tablespaces requests_responses --replication 1 --qnode http://localhost:4412"

So I created a cron job and it works.
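The cron entry basically chains the two commands above; roughly like this (the "/opt/splout-hadoop" path is only a placeholder for wherever the Splout distribution is unpacked):

# Sketch of the crontab entry (root's crontab, placeholder paths): run
# generate + deploy from the Splout installation folder every night at 03:00.
0 3 * * * cd /opt/splout-hadoop && su hdfs -c 'hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar generate -tf /user/splout/database-schemas/requests-vs-responses.json -o /user/splout/database-files' && su hdfs -c 'hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar deploy --root /user/splout/database-files --tablespaces requests_responses --replication 1 --qnode http://localhost:4412'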

But running it through Oozie is a pain.
I tried it both as a shell action and as a Java action.
All the files reported as "no such file or directory" have been copied multiple times to all the likely places, and they are there. So in the worst case it is only a permissions issue.

Java action exceptions (we had a lot; at first some "No such file or directory", later something like this):

Error starting action [GenerateSploutTestWithVacancyCheck]. ErrorType [ERROR], ErrorCode [URISyntaxException], Message [URISyntaxException: Illegal character in fragment at index 117: /user/hue/oozie/workspaces/_hdfs_-oozie-457-1399556943.46/libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so]
org.apache.oozie.action.ActionExecutorException: URISyntaxException: Illegal character in fragment at index 117: /user/hue/oozie/workspaces/_hdfs_-oozie-457-1399556943.46/libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:401)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.addToCache(JavaActionExecutor.java:354)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.setLibFilesArchives(JavaActionExecutor.java:494)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:682)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:927)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59)
    at org.apache.oozie.command.XCommand.call(XCommand.java:277)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.URISyntaxException: Illegal character in fragment at index 117: /user/hue/oozie/workspaces/_hdfs_-oozie-457-1399556943.46/libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so
    at java.net.URI$Parser.fail(URI.java:2829)
    at java.net.URI$Parser.checkChars(URI.java:3002)
    at java.net.URI$Parser.parse(URI.java:3048)
    at java.net.URI.<init>(URI.java:595)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.addToCache(JavaActionExecutor.java:325)
    ... 12 more
2014-05-12 13:09:00,395 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[hdfs] GROUP[-] TOKEN[] APP[GenerateSploutTestWithVacancyCheck] JOB[0000285-140226152528875-oozie-oozi-W] ACTION[0000285-140226152528875-oozie-oozi-W@GenerateSploutTestWithVacancyCheck] Setting Action Status to [DONE]

Java action workflow:

<workflow-app name="GenerateSploutTestWithVacancyCheck" xmlns="uri:oozie:workflow:0.4">
    <start to="GenerateSploutTestWithVacancyCheck"/>
    <action name="GenerateSploutTestWithVacancyCheck">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.splout.db.hadoop.Driver</main-class>
            <arg>generate</arg>
            <arg>-tf</arg>
            <arg>/user/splout/database-schemas/vacancy-check.json</arg>
            <arg>-o</arg>
            <arg>/user/splout/database-files</arg>
            <file>libsqlite4java-linux-amd64.so#libsqlite4java-linux-amd64.so</file>
        </java>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>


Shell action, with the following errors:
Exception in thread "main" java.io.IOException: Error opening job jar: /user/buckley/jobs/splout-hadoop-0.2.5-hadoop-mr2.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:135)
Caused by: java.io.FileNotFoundException: /user/buckley/jobs/splout-hadoop-0.2.5-hadoop-mr2.jar (No such file or directory)
at java.util.zip.ZipFile.open(Native Method)


untitled.sh:
hadoop jar /user/jobs/splout-hadoop-0.2.5-hadoop-mr2.jar generate -tf /user/splout/database-schemas/requests-vs-responses.json -o /user/splout/database-files
hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar deploy --root /user/splout/database-files --tablespaces requests_responses --replication 1 --qnode http://localhost:4412

Shell workflow:

<workflow-app name="copyToSplout" xmlns="uri:oozie:workflow:0.4">
    <start to="copytoSplout"/>
    <action name="copytoSplout">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>untitled.sh</exec>
            <file>untitled.sh#untitled.sh</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>


My questions are:
Is it possible to run a Splout job from Oozie at all, and if so, how?
What should the workflow look like (Java or shell, either one)?
Which user should be used, and where should the JARs and files be located?

I have tried multiple configurations, but nothing helps.

Cheers and Thanks
Vadim

Pere Ferrera

May 16, 2014, 6:46:04 AM
to sploutd...@googlegroups.com
Hi Vadim,

I'm not familiar with Oozie, so I'll try to help as best I can.

To start with, I would forget about the Java action and use a shell action directly. Your shell script should execute correctly regardless of where you call it from. Try calling it yourself in a terminal from different paths and check that it creates and deploys the tablespace.

I noticed that in your shell script the second line uses a relative path for the Splout JAR, whereas the first line uses an absolute path. Did you write that on purpose? To me it seems this is what is causing your shell script to fail, so just fixing it might help.

You might also want to modify the script so that the executing user "cd"s into the right folder beforehand, in case you are having trouble with the native libraries.
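Something like the following, as a rough sketch ("/opt/splout-hadoop" is just a placeholder for wherever the Splout distribution lives on the node that runs the action):

# untitled.sh -- sketch only; cd into the Splout installation folder first
# so that the JAR and the libraries under "native" can be found.
# "/opt/splout-hadoop" is a placeholder path.
set -e
cd /opt/splout-hadoop
hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar generate -tf /user/splout/database-schemas/requests-vs-responses.json -o /user/splout/database-files
hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar deploy --root /user/splout/database-files --tablespaces requests_responses --replication 1 --qnode http://localhost:4412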

Tell us if that helps,

Vadim Kisselmann

May 23, 2014, 8:38:54 AM
to sploutd...@googlegroups.com
Hi Pere,
thanks for your response :D

The shell script I use runs fine outside of CDH and Oozie, directly from the terminal/console.

The relative path in the shell script was a mistake in my first email... all my commands look like this (here only generate):

hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar generate -tf /user/splout/database-schemas/requests-vs-responses.json -o /user/splout/database-files

As I mentioned, it works fine from the terminal. But I have already tested all the variants in Oozie: with an absolute path, without, etc.
I have also tried "cd to the right path", "copy the splout-mr2.jar to the workflow folder, or to many other folders", etc.
But nothing helps.

That is why I asked what a working workflow looks like, or for a template from someone who has managed to run Splout with Oozie :)
But maybe it just won't work, because of all the permissions stuff in Hadoop, or Oozie's own default configuration, or the way Oozie runs the jobs and what it does with the JARs. From time to time it's really rocket science :D

Cheers

Vadim

 

Iván de Prado

May 27, 2014, 6:02:22 AM
to Vadim Kisselmann, sploutd...@googlegroups.com
Just to give my two cents.

- The Splout command "hadoop jar splout-hadoop-0.2.5-hadoop-mr2.jar generate ..." must be executed from the Splout installation folder. Otherwise the files under the "native" folder won't be found and the process will fail.
- Oozie executes individual commands from whichever node. You should make sure that the "native" folder is present and local to the "hadoop jar ..." command (see the sketch below).
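For example, an untitled.sh used in the shell action could start by pulling the Splout distribution out of HDFS into the action's local working directory (just a sketch; "/user/splout/splout-dist" is a placeholder HDFS path where the JAR and the "native" folder would have been uploaded beforehand):

# Sketch: stage the Splout JAR and the "native" folder locally on whichever
# node Oozie picks to run the shell action, then run against the local JAR.
set -e
hadoop fs -get /user/splout/splout-dist/splout-hadoop-0.2.5-hadoop-mr2.jar .
hadoop fs -get /user/splout/splout-dist/native .
# ... then run the generate/deploy commands against ./splout-hadoop-0.2.5-hadoop-mr2.jar as before.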

Tell me if that helps. 

Regards, 
Iván



--
Iván de Prado
CEO & Co-founder