Kylo - sample template - Data Ingest - Feed stuck on step 2 "Initialize cleanup Parameters"


vijayarajan marimuthu

Apr 12, 2017, 1:44:46 PM
to Kylo Community
Hi Kylo community,

I've installed Kylo on my HDP cluster.
After logging on successfully, I imported the sample template "Data Ingest" and created a new feed for "user data" as instructed in the demo video.

When I try to run the feed with the sample data (userdata2.csv), the feed keeps running for more than 2 hours and gets stuck on step 2, without showing any error.

I've attached a screenshot of the feed job steps. Please let me know where to start troubleshooting.

Thanks
Vijay





kylo_feed_job_steps.PNG

Matt Hutton

Apr 12, 2017, 2:05:45 PM
to Kylo Community

First check your installation. When you imported the Data Ingest template, did you check 'import re-usable template' when prompted? If not, please re-import. To verify everything is correctly set up, go to NiFi: you should see a process group called 'reusable_templates' with an embedded process group called 'standard-ingest'. Then verify your feed shows up in the top-level 'webapps' process group, with an embedded process group called 'userdata'. You should see its output ports wired to the input port of the standard-ingest process group in reusable_templates.

Also, it looks like the job you attached was a delete operation rather than an ingest. This would be triggered by selecting 'Delete feed' from the feed's menu. Once you verify your install, please try again. You can abandon the long-running job in the Operations Manager.
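If it helps, here is a quick command-line sanity check for the steps above (a sketch, assuming an unsecured NiFi REST API on localhost:8080; adjust the host and port for your cluster):

```shell
# Look for the reusable_templates process group in the NiFi flow.
# curl -f makes an unreachable or secured NiFi fall through to the message below.
curl -sf http://localhost:8080/nifi-api/flow/process-groups/root \
  | grep -o '"name":"reusable_templates"' \
  || echo "reusable_templates not found (or NiFi unreachable)"
```

If nothing is found, re-importing the template with the 'import re-usable template' option checked is the first thing to try.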

vijayarajan marimuthu

Apr 14, 2017, 11:49:26 AM
to Kylo Community
Thanks Matt.

After re-importing the template, that issue was resolved. But the job then failed in the "index schema service" feed at step 3, Query Hive Table Schema, with the following error:

ExecuteSQL[id=2bd64d86-2b1f-1ef0-f632-284eb777a3f7] Unable to execute SQL select query SELECT d.NAME DATABASE_NAME, d.OWNER_NAME OWNER, t.CREATE_TIME, t.TBL_NAME, t.TBL_TYPE, c.COLUMN_NAME, c.TYPE_NAME FROM hive.COLUMNS_V2 c JOIN hive.SDS s on s.CD_ID = c.CD_ID JOIN hive.TBLS t ON s.SD_ID=t.SD_ID JOIN hive.DBS d on d.DB_ID = t.DB_ID where d.name = 'webapps'and t.tbl_name = 'user_data_ingest'; for StandardFlowFileRecord[uuid=a982109a-1fc3-4a0c-9324-ad04c8733a71,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1492184461121-1, container=default, section=1], offset=281015, length=48],offset=0,name=349054303986753,size=48] due to org.apache.nifi.processor.exception.ProcessException: org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Could not connect to address=(host=localhost)(port=3306)(type=master) : Connection refused); routing to failure: org.apache.nifi.processor.exception.ProcessException: org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Could not connect to address=(host=localhost)(port=3306)(type=master) : Connection refused) 


I've configured the /opt/kylo/kylo-services/conf/application.properties file with the following properties:

hive.metastore.datasource.driverClassName=org.mariadb.jdbc.Driver
hive.metastore.datasource.url=jdbc:mysql://xx.xxx.x.xxx:3306/hive

I'm not sure how host=localhost is appearing in the error instead of the configured IP address.

Please let me know how to change the database URL.




Greg Hart

Apr 14, 2017, 5:38:57 PM
to Kylo Community
Hi Vijayarajan,

The Query Hive Table Schema processor uses the MySQL controller service in NiFi. You can either modify the properties of this controller service, or create a new controller service and modify the processor to use it.
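For reference, here is a hypothetical example of the relevant settings on that NiFi controller service (the host, database name, and user below are illustrative; use your own values):

```
Database Connection URL     jdbc:mysql://10.0.0.5:3306/hive
Database Driver Class Name  org.mariadb.jdbc.Driver
Database User               hive
```

The key point is that this controller service is configured separately from Kylo's application.properties, so a URL fixed in the properties file still has to be corrected in NiFi as well.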

vijayarajan marimuthu

Apr 17, 2017, 1:28:19 PM
to Kylo Community
Thanks Greg,

I've modified the properties of the MySQL controller service, and now I'm getting the following error at step 16, Validate And Split Records:

2017-04-17 12:53:10,526 ERROR [Timer-Driven Process Thread-9] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=2b1f1ef0-4e06-1bd6-cbce-5e6977fd5e56] ExecuteSparkJob for Validate And Split Records and flowfile: StandardFlowFileRecord[uuid=9b2ade6c-838f-4bbb-997b-efaceca2f93f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1492447947624-1,container=default, section=1], offset=140538, length=140429],offset=0,name=userdata2.csv,size=140429] completed with failed status 1

Greg Hart

Apr 17, 2017, 2:15:08 PM
to Kylo Community
Hi Vijayarajan,

Could you attach your /var/log/nifi/nifi-app.log file?

Thanks!

vijayarajan marimuthu

Apr 17, 2017, 3:50:08 PM
to Kylo Community
Greg,

I've attached the nifi-app.log file:
nifi-app.log

Greg Hart

Apr 17, 2017, 3:56:53 PM
to Kylo Community
It looks like you're running into this issue:

To fix the issue, you can take these steps:
  1. On the edge node, edit the file: /usr/hdp/current/spark-client/conf/spark-defaults.conf
  2. Add these configuration entries to the file:
spark.sql.hive.convertMetastoreOrc false
spark.sql.hive.convertMetastoreParquet false


vijayarajan marimuthu

Apr 18, 2017, 3:56:22 PM
to Kylo Community
Thanks Greg,

The data ingest job completed successfully, but the "index_schema_service" job got stuck on step 2:






job_status.PNG
job_steps.PNG

Greg Hart

Apr 18, 2017, 5:55:05 PM
to Kylo Community
Hi Vijayarajan,

Could you check in the NiFi UI and the NiFi logs to see if you're getting an error message?

Thanks!

vijayarajan marimuthu

Apr 20, 2017, 11:21:47 AM
to Kylo Community
Greg

Yes! I checked the NiFi UI and found that the MySQL controller service was disabled after I modified the IP address. Once I enabled it, the job completed successfully.

Thanks for your support!

Thanks
Vijay

dinesh.h...@gmail.com

Nov 14, 2018, 8:38:21 AM
to Kylo Community
Hi,

I'm having the same issue. Please help me solve it.

Regards,
Dinesh D


application.properties
nifi-app.log

dinesh.h...@gmail.com

Nov 14, 2018, 8:41:38 AM
to Kylo Community
Hi Vijay,

Could you please share your Kylo configuration file?

Regards,
Dinesh  D

dinesh.h...@gmail.com

Nov 14, 2018, 8:44:11 AM
to Kylo Community
Hi Vijay,

Please share the Data Ingest zip file.

Regards,
Dinesh D

ruslans.uralovs

Nov 15, 2018, 6:40:06 AM
to Kylo Community
From the NiFi log we can see that you are running Spark 1 code on Spark 2. This is most likely because your default Spark version is 2 but NiFi is configured with the Spark 1 Kylo jars.

2018-11-14 17:01:52,869 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=9a51322c-dbf8-3edf-c126-a8ff931c5ac6] Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.table(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
2018-11-14 17:01:52,869 INFO [stream error] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=9a51322c-dbf8-3edf-c126-a8ff931c5ac6] 	at com.thinkbiganalytics.spark.SparkContextService16.toDataSet(SparkContextService16.java:43)

You have two options:

1) Reconfigure the processors to run on Spark 1, if you have Spark 1 installed
2) Reconfigure NiFi for Spark 2, so your existing feeds run on your default Spark version 2


For option 1)
Go to NiFi, find all ExecuteSparkJob processors, and change their SparkHome property to point at Spark 1.
To avoid going into NiFi and updating properties manually, you can instead update the "nifi.executesparkjob.sparkhome" property in "/opt/kylo/kylo-services/conf/application.properties", restart kylo-services, and re-import your templates.
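Concretely, option 1) amounts to a one-line change in that file. The path below is the usual Spark 1 client home on HDP, but verify it on your own cluster:

```
# /opt/kylo/kylo-services/conf/application.properties
nifi.executesparkjob.sparkhome=/usr/hdp/current/spark-client
```

Then restart kylo-services and re-import the templates so the processors pick up the new value.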


For option 2)
To reconfigure the Kylo jars in NiFi for a different Spark version, run the following from the command line (note: no spaces around the `=` in the export):
export SPARK_PROFILE=<spark-profile, e.g. spark-v1 or spark-v2>
/opt/kylo/setup/nifi/update-nars-jars.sh <nifi-home> <kylo-setup-dir> <nifi-user> <nifi-user-group>

For example:
[root@sandbox ~]# export SPARK_PROFILE=spark-v2
[root@sandbox ~]# /opt/kylo/setup/nifi/update-nars-jars.sh -f /opt/nifi /opt/kylo/setup nifi users
The NIFI home folder is /opt/nifi using permissions  nifi:users
Updating the kylo nifi nar and jar files
Creating symlinks for NiFi version 1.6.0.jar compatible nars
Nar files and Jar files have been updated
[root@sandbox ~]# ll /opt/nifi/current/lib/app | grep kylo
lrwxrwxrwx 1 nifi users  96 Nov 15 10:44 kylo-spark-interpreter-jar-with-dependencies.jar -> /opt/nifi/data/lib/app/kylo-spark-interpreter-spark-v2-0.10.0-SNAPSHOT-jar-with-dependencies.jar
lrwxrwxrwx 1 nifi users  97 Nov 15 10:44 kylo-spark-job-profiler-jar-with-dependencies.jar -> /opt/nifi/data/lib/app/kylo-spark-job-profiler-spark-v2-0.10.0-SNAPSHOT-jar-with-dependencies.jar
lrwxrwxrwx 1 nifi users  96 Nov 15 10:44 kylo-spark-merge-table-jar-with-dependencies.jar -> /opt/nifi/data/lib/app/kylo-spark-merge-table-spark-v2-0.10.0-SNAPSHOT-jar-with-dependencies.jar
lrwxrwxrwx 1 nifi users  95 Nov 15 10:44 kylo-spark-multi-exec-jar-with-dependencies.jar -> /opt/nifi/data/lib/app/kylo-spark-multi-exec-spark-v2-0.10.0-SNAPSHOT-jar-with-dependencies.jar
lrwxrwxrwx 1 nifi users 101 Nov 15 10:44 kylo-spark-validate-cleanse-jar-with-dependencies.jar -> /opt/nifi/data/lib/app/kylo-spark-validate-cleanse-spark-v2-0.10.0-SNAPSHOT-jar-with-dependencies.jar
