Data Ingest feed stuck at "Step 8 - Initialize Feed"?


Kanika Batra

May 22, 2018, 4:04:32 AM
to Kylo Community
Hi,

I followed the manual steps to install Kylo, and for Hadoop, Spark, and Hive I installed the Apache offerings. I am able to create a category and a feed; however, when I import data using the standard ingest template available at https://github.com/Teradata/kylo/commit/e532330fb52504b95f51af6fba620704292a6557, my feed job is stuck in the Running state at "Step 8 - Initialize Feed".

I have checked the services: NiFi is healthy, and I do not see any errors in NiFi when I open Reporting Tasks from the hamburger menu. There are no errors in the kylo-ui and kylo-services logs either.

I suspect that the Data Ingest template (built for NiFi 1.0, per the link) is not compatible with NiFi v1.3, which I have installed. Please let me know whether that is the issue or whether I have messed something else up.

Thanks,
Kanika Batra


Scott Reisdorf

May 22, 2018, 8:22:46 AM
to Kylo Community
That template is compatible with NiFi 1.3.
Can you post your nifi-app.log file?  (/var/log/nifi/nifi-app.log)

Kanika Batra

May 22, 2018, 9:20:27 AM
to Kylo Community
Please find the file attached.
nifi-app.log

GnResende

May 22, 2018, 10:17:52 AM
to Kylo Community
Hi Kanika,

Your nifi-app.log file containing the Step 8 error information has probably already rotated. Can you check the other versions of this log under /var/log/nifi/nifi-app*.log and try to find the specific timestamp of the last time the error occurred?

Thank you.

Kanika Batra

May 22, 2018, 10:23:07 AM
to Kylo Community
Hi,

Attached is the latest file I could find in the logs. If required, I can remove all the existing ones and try to upload data again.
nifi-app_2018-05-22_13.0.log

GnResende

May 22, 2018, 10:30:45 AM
to Kylo Community
Yes, if possible, retry and post the logs from just after the error. You don't need to remove the logs.

If you can, also post information from the Kylo bulletin (in the Kylo UI) and the Kylo logs.

Kanika Batra

May 22, 2018, 10:44:32 AM
to Kylo Community
I removed all the nifi-app log files; since then, no new file has been created. The feed status has been Running for the last 9 minutes.

I have attached screenshots of the kylo-services log and kylo-ui. Please let me know if you need further details.
kylo-services.png
kylo-ui-screenshot.png

GnResende

May 22, 2018, 10:57:08 AM
to Kylo Community
Strange; we would expect to find some ERROR in the logs. Sorry to insist on this, but please don't remove your logs, and try to find an ERROR in the NiFi and Kylo logs:

grep ERROR *.log
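[Editorial sketch, not part of the original reply.] A slightly broader sweep also covers the rotated copies GnResende mentioned earlier. The snippet below is a minimal, self-contained sketch: /tmp/demo-logs stands in for /var/log/nifi, and the log lines it creates are made-up fixtures; on a real install you would point the final grep at your actual log directory.

```shell
# Sketch: grep for ERROR across all nifi-app logs, including rotated copies.
# /tmp/demo-logs is a stand-in for /var/log/nifi; the two log files below
# are fabricated fixtures so the example is runnable anywhere.
mkdir -p /tmp/demo-logs
printf '2018-05-22 10:00:00,001 INFO  startup ok\n' > /tmp/demo-logs/nifi-app.log
printf '2018-05-22 13:00:00,002 ERROR step 8 failed\n' > /tmp/demo-logs/nifi-app_2018-05-22_13.0.log
# -H prints the file name and -n the line number, so a hit inside a
# rotated file is easy to locate and match back to a timestamp.
grep -Hn ERROR /tmp/demo-logs/nifi-app*.log
```

The glob `nifi-app*.log` is what picks up the rotated files such as nifi-app_2018-05-22_13.0.log alongside the current log.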

Kanika Batra

May 22, 2018, 11:05:27 AM
to Kylo Community
Sure, I will not delete the files again. Please find attached the nifi-app.log file that was created after my last run. I tried to find an error but could not find one.
nifi-app.log

GnResende

May 22, 2018, 1:14:38 PM
to Kylo Community
Can you check in the NiFi UI whether there is any bulletin for this error? If you don't find any error or clue, I think restarting all the services and retrying the feed could let us catch the error.

Greg Hart

May 22, 2018, 2:00:40 PM
to Kylo Community
Hi Kanika,

Please try running the feed again and then attach your nifi-app.log file.

Greg Hart

May 22, 2018, 2:02:24 PM
to Kylo Community
Also, please go to the root process group, click the gear icon, then Controller Services, and verify that the properties for the Kylo Metadata Service are correct.

Kanika Batra

May 23, 2018, 8:43:28 AM
to Kylo Community
I checked the NiFi bulletin board; it is blank. I restarted my machine and then all the services one by one, and it is still not working.

Kanika Batra

May 23, 2018, 8:46:48 AM
to Kylo Community
Please find attached the nifi-app.log and nifi-user.log files and the NiFi template diagram. I am not sure whether what I see in the user log file is expected or not.
nifi-app.log
nifi-user.log
Nifi_Template_UI.png

GnResende

May 23, 2018, 9:07:14 AM
to Kylo Community
Kanika,

Did you re-execute the feed?

Kanika Batra

May 23, 2018, 10:38:54 AM
to Kylo Community
Yes

Kanika Batra

May 23, 2018, 12:44:56 PM
to Kylo Community
Hi, 

Please let me know if any further information is required to debug the issue. I am unable to find any errors either.

Thanks,
Kanika Batra

GnResende

May 23, 2018, 2:30:34 PM
to Kylo Community
Kanika,

I stopped everything in my Kylo/NiFi lab and left just one feed running (data_ingest), watching for files in a specific directory. Before I put a new file in to be processed, my nifi-app.log looked a lot like yours (the last version you attached):

2018-05-23 14:28:31,983 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:28:32,031 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 47 milliseconds (Stop-the-world time = 27 milliseconds, Clear Edit Logs time = 15 millis), max Transaction ID 124
2018-05-23 14:28:32,031 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 47 milliseconds
2018-05-23 14:30:05,132 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5c064f4b checkpointed with 733 Records and 0 Swap Files in 4 milliseconds (Stop-the-world time = 0 milliseconds, Clear Edit Logs time = 0 millis), max Transaction ID 2289
2018-05-23 14:30:32,032 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:30:32,100 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 67 milliseconds (Stop-the-world time = 46 milliseconds, Clear Edit Logs time = 16 millis), max Transaction ID 124
2018-05-23 14:30:32,100 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 67 milliseconds
2018-05-23 14:32:05,147 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5c064f4b checkpointed with 733 Records and 0 Swap Files in 14 milliseconds (Stop-the-world time = 4 milliseconds, Clear Edit Logs time = 4 millis), max Transaction ID 2289
2018-05-23 14:32:32,100 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:32:32,150 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 49 milliseconds (Stop-the-world time = 29 milliseconds, Clear Edit Logs time = 16 millis), max Transaction ID 124
2018-05-23 14:32:32,150 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 49 milliseconds
2018-05-23 14:34:05,153 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5c064f4b checkpointed with 733 Records and 0 Swap Files in 5 milliseconds (Stop-the-world time = 0 milliseconds, Clear Edit Logs time = 0 millis), max Transaction ID 2289
2018-05-23 14:34:32,150 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:34:32,231 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 80 milliseconds (Stop-the-world time = 57 milliseconds, Clear Edit Logs time = 19 millis), max Transaction ID 124
2018-05-23 14:34:32,231 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 80 milliseconds
2018-05-23 14:36:05,162 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5c064f4b checkpointed with 733 Records and 0 Swap Files in 8 milliseconds (Stop-the-world time = 2 milliseconds, Clear Edit Logs time = 2 millis), max Transaction ID 2289
2018-05-23 14:36:32,231 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:36:32,278 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 47 milliseconds (Stop-the-world time = 28 milliseconds, Clear Edit Logs time = 15 millis), max Transaction ID 124
2018-05-23 14:36:32,279 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 47 milliseconds
2018-05-23 14:38:05,174 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5c064f4b checkpointed with 733 Records and 0 Swap Files in 11 milliseconds (Stop-the-world time = 3 milliseconds, Clear Edit Logs time = 3 millis), max Transaction ID 2289
2018-05-23 14:38:32,279 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:38:32,376 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 96 milliseconds (Stop-the-world time = 69 milliseconds, Clear Edit Logs time = 21 millis), max Transaction ID 124
2018-05-23 14:38:32,376 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 96 milliseconds
2018-05-23 14:40:05,186 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5c064f4b checkpointed with 733 Records and 0 Swap Files in 11 milliseconds (Stop-the-world time = 3 milliseconds, Clear Edit Logs time = 3 millis), max Transaction ID 2289
2018-05-23 14:40:32,376 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:40:32,423 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 46 milliseconds (Stop-the-world time = 27 milliseconds, Clear Edit Logs time = 15 millis), max Transaction ID 124
2018-05-23 14:40:32,423 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 46 milliseconds
2018-05-23 14:42:05,198 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5c064f4b checkpointed with 733 Records and 0 Swap Files in 11 milliseconds (Stop-the-world time = 3 milliseconds, Clear Edit Logs time = 3 millis), max Transaction ID 2289
2018-05-23 14:42:32,423 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:42:32,538 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 114 milliseconds (Stop-the-world time = 88 milliseconds, Clear Edit Logs time = 22 millis), max Transaction ID 124



But just after I put a new file in the target directory, the flow started to run:

2018-05-23 14:42:05,198 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@5c064f4b checkpointed with 733 Records and 0 Swap Files in 11 milliseconds (Stop-the-world time = 3 milliseconds, Clear Edit Logs time = 3 millis), max Transaction ID 2289
2018-05-23 14:42:32,423 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-05-23 14:42:32,538 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e61bd3f checkpointed with 0 Records and 0 Swap Files in 114 milliseconds (Stop-the-world time = 88 milliseconds, Clear Edit Logs time = 22 millis), max Transaction ID 124
2018-05-23 14:42:32,538 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 114 milliseconds
2018-05-23 14:43:14,395 INFO [FeedStatisticsManager-SendStats-0] c.t.n.p.jms.ProvenanceEventJmsWriter SENDING Batch Events to JMS ProvenanceEventRecordDTOHolder{events=3}
2018-05-23 14:43:14,699 INFO [Timer-Driven Process Thread-5] c.t.nifi.v2.common.FeedProcessor Resolving ID for feed users/teste
2018-05-23 14:43:15,302 INFO [Timer-Driven Process Thread-1] org.apache.hadoop.io.compress.CodecPool Got brand-new compressor [.bz2]
2018-05-23 14:43:16,475 INFO [Timer-Driven Process Thread-5] c.t.nifi.v2.common.FeedProcessor Resolving id efd745a8-480d-406d-a4e3-b6072e641b7e for feed users/teste
2018-05-23 14:43:17,360 INFO [FeedStatisticsManager-SendStats-1] c.t.n.p.jms.ProvenanceEventJmsWriter SENDING Batch Events to JMS ProvenanceEventRecordDTOHolder{events=12}
2018-05-23 14:43:17,451 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.Utils Supplied authorities: amb2.service.consul:10000
2018-05-23 14:43:17,451 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.Utils Resolved authority: amb2.service.consul:10000
2018-05-23 14:43:17,501 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.HiveConnection Will try to open client transport with JDBC Uri: jdbc:hive2://amb2.service.consul:10000/default
2018-05-23 14:43:17,666 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.Utils Supplied authorities: amb2.service.consul:10000
2018-05-23 14:43:17,666 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.Utils Resolved authority: amb2.service.consul:10000
2018-05-23 14:43:17,669 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.HiveConnection Will try to open client transport with JDBC Uri: jdbc:hive2://amb2.service.consul:10000/default
2018-05-23 14:43:17,724 INFO [Timer-Driven Process Thread-3] c.t.nifi.v2.thrift.RefreshableDataSource connection obtained by RefreshableDatasource
2018-05-23 14:43:17,736 WARN [Timer-Driven Process Thread-3] c.t.nifi.v2.thrift.RefreshableDataSource The Statement.setQueryTimeout() method is not supported for the JDBC URL: jdbc:hive2://amb2.service.consul:10000/default
2018-05-23 14:43:17,737 INFO [Timer-Driven Process Thread-3] c.t.nifi.v2.thrift.RefreshableDataSource perform validation query in RefreshableDatasource.executeWithTimeout()
2018-05-23 14:43:18,115 INFO [Timer-Driven Process Thread-3] c.t.nifi.v2.thrift.RefreshableDataSource validation query returned from RefreshableDatasource.executeWithTimeout() in 376.7 ms
2018-05-23 14:43:18,121 INFO [Timer-Driven Process Thread-3] c.t.nifi.v2.thrift.RefreshableDataSource Cleaning up the current connection using a background thread.
2018-05-23 14:43:18,122 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.Utils Supplied authorities: amb2.service.consul:10000
2018-05-23 14:43:18,122 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.Utils Resolved authority: amb2.service.consul:10000
2018-05-23 14:43:18,124 INFO [Timer-Driven Process Thread-3] org.apache.hive.jdbc.HiveConnection Will try to open client transport with JDBC Uri: jdbc:hive2://amb2.service.consul:10000/default
2018-05-23 14:43:19,839 INFO [Timer-Driven Process Thread-7] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=13174dde-d271-3296-f998-728ed33d2b07] Adding to class path '/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar'


So, in your case we are not seeing any change in nifi-app.log at all, neither ERROR nor INFO, which makes me guess that you don't have any feed running. I'm very confused about your issue. Do you see the feed Running on the Kylo dashboard?

Kanika Batra

May 23, 2018, 2:40:56 PM
to Kylo Community
Hi Resende,

I am able to see the feed running in my Kylo dashboard; attached is a screenshot of the same. I have also attached a screenshot of the ActiveMQ queue, in case it helps.

Thanks,
Kanika Batra
kylo_feed_running.png
ActiveMQ.png

Kanika Batra

May 23, 2018, 3:42:33 PM
to Kylo Community
As a stopgap solution, I created an AWS machine on which I can do my development work. There I saw that the next step after Initialize Feed is Storing DB to HDFS. With that in mind, when I ran hdfs dfs -ls /etl/ in my setup, I saw no data there. Ideally, a folder with the category name should have been created in it. I think that is the reason my feed is not processing completely.

Can you please help me fix this?

Thanks,
Kanika Batra

Greg Hart

May 24, 2018, 11:46:17 AM
to Kylo Community
Hi Kanika,

In NiFi could you go to the reusable_templates -> standard-ingest process group, zoom in until you can see the details on the Initialize Feed processor, and then take a screenshot and attach it here?

Kanika Batra

May 26, 2018, 6:19:33 PM
to Kylo Community
Hi,

I was able to fix the issue at step 8. When I looked at the NiFi reusable template in detail, the problem was with the HDFS path, as expected. My installation had Hadoop at /usr/local/hadoop; however, Kylo was expecting it at /etc/hadoop. When I moved my files from one location to the other, the problem was resolved. However, now I am stuck with the following problem:

NiFi exceptions: ExecuteSparkJob[id=9a51322c-dbf8-3edf-3d39-0dadb4d14d8a] ExecuteSparkJob for Validate And Split Records and flowfile: StandardFlowFileRecord[uuid=204cd14c-fcde-41ef-becd-d30598f155af,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1527370781590-2, container=default, section=2], offset=424173, length=140429],offset=0,name=userdata2.csv,size=140429] completed with failed status 1

My Spark installation is in /usr/lib/spark, and I have also commented out config.spark.validateAndSplitRecords.extraJars=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar. Even after that, I was getting a file-not-found error at the /usr/hdp location, which I have since created manually. But now I get the above error.

I saw in one of the answers that it could be because of the Spark version, so I downgraded from v2.3 to v1.6.3; I am still getting the same error.

Kanika Batra

May 27, 2018, 6:50:11 AM
to Kylo Community
Please find attached nifi-app.log. It says cannot access '/usr/hdp/current/spark-client/assembly/target/scala-2.10': No such file or directory, but I have not installed Hortonworks and have commented out the line containing hdp in application.properties.
I checked spark.properties as well; I have not mentioned hdp anywhere. Please guide.

Thanks.
Kanika Batra
nifi-app.log

Ruslans Uralovs

May 28, 2018, 4:53:39 AM
to Kylo Community
Kanika, can you open the properties of both the "Validate and Split Records" and "Profile Data" processors and ensure that the "SparkHome" property is set to where your Spark installation is?


Kanika Batra

May 28, 2018, 5:25:32 AM
to Kylo Community
Hi Ruslans,

By default, Spark Home was pointing to the /usr/hdp/... directory in the "Validate and Split Records" and "Profile Data" processors. I have changed it, and I no longer get the hdp-related error. However, I am still getting the error below in the nifi-app.log file:

ERROR [Timer-Driven Process Thread-9] c.t.nifi.v2.spark.ExecuteSparkJob ExecuteSparkJob[id=9a51322c-dbf8-3edf-3d39-0dadb4d14d8a] ExecuteSparkJob for Validate And Split Records and flowfile: StandardFlowFileRecord[uuid=24e1f7be-afc5-4cbd-bdeb-44afcdf9bd38,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1527499084080-1, container=default, section=1], offset=2087, length=623],offset=0,name=userdata3.csv,size=623] completed with failed status 1 

Ruslans Uralovs

May 28, 2018, 12:32:53 PM
to Kylo Community
What do you see in nifi-app.log when you run the feed? Can you attach the log?

Kanika Batra

May 28, 2018, 12:51:42 PM
to Kylo Community
Hi Ruslans,

I see there is a stream error that has been logged as INFO, and then there is the final error. I also checked Hive, and I can see all the tables (w, w_feed, w_valid, w_invalid); however, data is present only in w_feed. Please find attached the nifi-app.log.

Thanks,
Kanika Batra
nifi-app.log

Ruslans Uralovs

May 29, 2018, 4:59:54 AM
to Kylo Community
This error is because NiFi cannot find the Kylo classes. To set them up for NiFi:
1. Stop NiFi
2. Run the following command:
/opt/kylo/setup/nifi/update-nars-jars.sh /opt/nifi /opt/kylo/setup nifi users
3. Start NiFi
4. Re-run the feed
5. Share the NiFi log

Kanika Batra

May 29, 2018, 7:13:52 AM
to Kylo Community
Hi Ruslans,

I tried the above steps and am still getting the same error. Attached are the NiFi logs. Please also have a look at the screenshot, which shows that my spark-shell is able to communicate with Hive (sqlContext.sql("show databases").show()).



nifi-app.log
spark-shell_hive.png

Kanika Batra

May 29, 2018, 7:24:53 AM
to Kylo Community
Please also find attached the /usr/lib/spark/conf/spark-env.sh file for the AMI Spark versus the one I have installed. Please let me know if I should add more configuration to my installation (blue).

Thanks,
Kanika Batra
spark_ami.png
spark_myInstallation.png

Ruslans Uralovs

May 29, 2018, 12:06:39 PM
to Kylo Community
It looks like it is still the same problem where NiFi cannot find the Kylo classes (strictly speaking it is Spark that cannot find them, but it's NiFi that sets this up).
Can you show the configuration properties for the "Validate and Split Records" processor in NiFi, please?

Kanika Batra

May 29, 2018, 12:12:42 PM
to Kylo Community
Please find attached the same.
nifi_property_1.png
nifi_property_2.png

Ruslans Uralovs

May 29, 2018, 12:24:22 PM
to Kylo Community
Can you check whether the jar referenced by the "ApplicationJar" property exists at the path ${nifi.home}/current/lib/kylo.....jar?
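[Editorial sketch, not part of the original reply.] One way to check is a quick glob from the shell. In this runnable sketch, /tmp/demo-nifi stands in for the real ${nifi.home} (e.g. /opt/nifi), and kylo-example.jar is a made-up file name created only so the example works anywhere; on a real install you would skip the setup lines and just run the final ls against your NiFi lib directory.

```shell
# Sketch: confirm that a kylo*.jar is present under ${nifi.home}/current/lib.
# /tmp/demo-nifi stands in for /opt/nifi, and kylo-example.jar is a
# placeholder name for illustration only.
NIFI_HOME=/tmp/demo-nifi
mkdir -p "$NIFI_HOME/current/lib"
touch "$NIFI_HOME/current/lib/kylo-example.jar"
# If the glob matches nothing, ls exits non-zero: the jar is missing.
ls "$NIFI_HOME"/current/lib/kylo*.jar
```

A non-zero exit from the ls would mean the ApplicationJar path is wrong or the update-nars-jars.sh step did not copy the jars.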

Kanika Batra

May 29, 2018, 12:41:49 PM
to Kylo Community
The jar is there in /opt/nifi/current/..  However, I just noticed that when I run service nifi status, it points to a different nifi_home location. I am starting NiFi using service nifi start. Is that the issue? Attached is a screenshot of the NiFi status.
nifi_status.png

Greg Hart

May 29, 2018, 2:52:15 PM
to Kylo Community
Hi Kanika,

Please find the job in the Kylo UI and check the value for the nifi.home property. What value do you see?

The NiFi home in systemd looks correct. You should be able to use /opt/nifi/current and /opt/nifi/nifi-1.3.0 interchangeably.

Greg Hart

May 29, 2018, 2:57:35 PM
to Kylo Community
Also, some of the properties of the ExecuteSparkJob processor seem incorrect. It looks like you were trying to follow step 4 of 'Tuning the ExecuteSparkJob Processor' but made the changes to the 'Validate and Split Records' processor instead of the 'Execute Script' processor.

I recommend re-importing the data_ingest.zip template and creating a new feed to verify that everything is working. Then make the suggested changes and test again to verify that it is still working.

Kanika Batra

May 29, 2018, 6:16:09 PM
to Kylo Community
Hi Greg,

Thanks for pointing out the mistake. I uploaded the data ingest template again and was able to resolve the issue with some tweaks. I can now see my data being split into the valid and invalid tables. However, the feed as a whole is still failing with the error below:

ERROR [Timer-Driven Process Thread-9] c.t.nifi.v2.ingest.MergeTable MergeTable[id=a838883e-7462-339b-798d-b66a3ba4379c] Unable to execute merge doMerge for StandardFlowFileRecord[uuid=64cbd24f-7d80-4b41-8b47-9d16a32f8dfc,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1527631507772-2, container=default, section=2], offset=163, length=79],offset=0,name=profile_details0.csv,size=79] due to java.lang.RuntimeException: Failed to execute query; routing to failure: java.lang.RuntimeException: Failed to execute query


Attached are the nifi-app.log file and a screenshot of the MergeTable processor configuration. Please note that I have disabled Hive schema validation, as I initially installed Hive 2.3.0 and then downgraded to 1.2.0.
nifi-app.log
hive_config.png

Greg Hart

May 29, 2018, 6:53:04 PM
to Kylo Community
Hi Kanika,

Please try running this query using 'hive' on the command-line:
insert into table `work`.`w2`
select `registration_dttm`,`id`,`first_name`,`last_name`,`email`,`gender`,`ip_address`,`cc`,`country`,`birthdate`,`salary`,`title`,`comments`, min(processing_dttm) processing_dttm
from (
  select distinct `registration_dttm`,`id`,`first_name`,`last_name`,`email`,`gender`,`ip_address`,`cc`,`country`,`birthdate`,`salary`,`title`,`comments`,`processing_dttm`
  from `work`.`w2_valid`
  where processing_dttm = "1527631282612"
  union all
  select `registration_dttm`,`id`,`first_name`,`last_name`,`email`,`gender`,`ip_address`,`cc`,`country`,`birthdate`,`salary`,`title`,`comments`,`processing_dttm`
  from `work`.`w2`
) x
group by `registration_dttm`,`id`,`first_name`,`last_name`,`email`,`gender`,`ip_address`,`cc`,`country`,`birthdate`,`salary`,`title`,`comments`
having count(processing_dttm) = 1 and min(processing_dttm) = "1527631282612"

If it still fails please find the YARN job and examine the logs.

Kanika Batra

May 30, 2018, 8:41:03 AM
to Kylo Community
Hi Greg,

As you expected, the query fails when I try to run it directly in Hive. Attached is what I see in the logs. I have made the changes suggested (attached in the screenshot; I also tried /usr/local/hadoop instead of $HADOOP_HOME) and am still getting the same issue. I feel I am missing something obvious; please help. I believe this is the last issue in the process, and I am extremely excited to resolve it.

Looking forward to your suggestions.

Thanks,
Kanika Batra 
config_mapred-site.png
Screenshot_cluster_error.png

Kanika Batra

May 31, 2018, 10:41:12 AM
to Kylo Community
Hi,

Please find attached my core-site, mapred-site, and yarn-site XML files and the application error log. The error is happening in the Merge Table NiFi processor. Please help!

Thanks,
Kanika Batra
core-site.xml
mapred-site.xml
yarn-site.xml
Application_error_log.png

Greg Hart

May 31, 2018, 3:33:31 PM
to Kylo Community
Hi Kanika,

It looks like the issue is with Hive and not related to Kylo. Please try asking on the Apache Hive mailing list or contacting Think Big Analytics for paid support.

Kanika Batra

Jun 1, 2018, 6:56:10 PM
to Kylo Community
Thank you so much GnResende, Greg Hart, Ruslans Uralovs, and Scott Reisdorf for being patient and helping me throughout.
The setup is completely done; I ran a successful data import today, and attached is a screenshot that makes me feel happy :)

Thanks,
Kanika Batra


Completed.png

ha bach Duong

May 21, 2021, 2:20:20 PM
to Kylo Community
Dear Kanika Batra,
Could you tell me how to fix the merge problem?
I have the same problem as you.

On Saturday, June 2, 2018 at 05:56:10 UTC+7, Kanika Batra wrote: