Hi
I am trying to use Kylo 0.9.1 with a PostgreSQL database (9.5.5) for the Kylo metastore. However, running the Sample Feed example provided in the documentation it fails at the InitializeFeed processor (standard-ingest group), with the error:
2018-06-22 18:11:27,178 INFO [Timer-Driven Process Thread-3] c.t.ingest.GetTableDataSupport Executing full GetTableData query SELECT tbl."registration_dttm",tbl."id",tbl."first_name",tbl."last_name",tbl."email",tbl."gender",tbl."ip_address",tbl."cc",tbl."country",tbl."birthdate",tbl."salary",tbl."title",tbl."comments" FROM NA tbl
2018-06-22 18:11:27,181 ERROR [Timer-Driven Process Thread-3] c.t.nifi.v2.ingest.GetTableData GetTableData[id=d46dde68-e119-3ac1-005c-abcdab87b3df] Unable to execute SQL select from table due to StandardFlowFileRecord[uuid=6b8609b9-add1-4a9b-896b-fa3cd2d68021,claim=,offset=0,name=7357899322044403,size=0]; routing to failure: org.apache.nifi.processor.exception.ProcessException: IOException thrown from GetTableData[id=d46dde68-e119-3ac1-005c-abcdab87b3df]: java.io.IOException: SQL execution failure
.
.
Caused by: org.postgresql.util.PSQLException: ERROR: relation "na" does not exist
The PostgreSQL server log shows:
2018-06-22 18:11:27.169 BST,"kylo901","kylo901",2002,"10.60.47.228:47562",5b2d274b.7d2,58,"PARSE",2018-06-22 17:43:55 BST,23/69774,0,ERROR,42P01,"relation ""na"" does not exist",,,,,,"SELECT tbl.""registration_dttm"",tbl.""id"",tbl.""first_name"",tbl.""last_name"",tbl.""email"",tbl.""gender"",tbl.""ip_address"",tbl.""cc"",tbl.""country"",tbl.""birthdate"",tbl.""salary"",tbl.""title"",tbl.""comments"" FROM NA tbl",200,,"PostgreSQL JDBC Driver"
2018-06-22 18:11:54.063 BST,,,4636,"10.60.47.228:35232",5b2d2dda.121c,1,"",2018-06-22 18:11:54 BST,,0,LOG,00000,"connection received: host=xx.xx.xx.xxx port=35232",,,,,,,,,""
This behaviour seems to match the issue reported in KYLO-1219, but that issue is marked as resolved. Could you comment on whether the above symptoms do indeed suggest that this is due to case sensitivity in the PostgreSQL database, or suggest another possible root cause? Thank you.
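For what it's worth, the symptoms are consistent with PostgreSQL's identifier case folding: unquoted identifiers are folded to lower case, while double-quoted ones keep their case. That would explain why a query written as `FROM NA` makes the server look up a relation named "na". A minimal Python sketch of that folding rule (purely illustrative, not Kylo code):

```python
# Illustrative model of PostgreSQL's identifier case-folding rule:
# unquoted identifiers fold to lower case, double-quoted ones keep their case.
def fold_identifier(ident: str) -> str:
    """Return the name PostgreSQL would actually look up for this identifier."""
    if ident.startswith('"') and ident.endswith('"') and len(ident) >= 2:
        return ident[1:-1]      # quoted: case preserved
    return ident.lower()        # unquoted: folded to lower case

# The failing query uses the unquoted table name NA ...
print(fold_identifier('NA'))    # -> na  (so the server searches for relation "na")
# ... whereas a quoted "NA" would preserve the upper case:
print(fold_identifier('"NA"'))  # -> NA
```

Note that in the log, the column names are double-quoted (and so keep their case) while the table name `NA` is not, which matches the `relation "na" does not exist` error exactly.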
Regards,
Marek
Hi Greg,
Thanks for your response; I'm beginning to fear there may be a more fundamental issue at play here. Today I attempted to replace my PostgreSQL Kylo metastore with a MariaDB (10.3.7) metastore. Unfortunately I got a similar error, albeit without the na-to-NA name mismatch shown in my earlier Postgres example (nifi-app log attached as failed_standard_ingest_mariadb_nifi-app.log).
Checking the list of tables present in the kylo schema, I see that there is actually no table called NA, so I am left wondering whether this is some kind of temporary table, or whether there was an issue when creating the kylo schema during the initial build. On the latter point, I did encounter the reported issue with the handling of the 1970-01-01 00:00:01 timestamp in the MySQL build, and accordingly needed to edit and run kylo-db-update-script.sql to get the FEED table created, as the original build via Liquibase was incomplete. However, I have checked the kylo schema in both my Postgres and MySQL builds and see no table called NA in either. Below is a list of the tables present in the MariaDB example.
MariaDB [kylo]> show tables;
+--------------------------------+
| Tables_in_kylo |
+--------------------------------+
| AUDIT_LOG |
| BATCH_EXECUTION_CONTEXT_VALUES |
| BATCH_FEED_SUMMARY_COUNTS_VW |
| BATCH_JOB_EXECUTION |
| BATCH_JOB_EXECUTION_CTX_VALS |
| BATCH_JOB_EXECUTION_PARAMS |
| BATCH_JOB_EXECUTION_SEQ |
| BATCH_JOB_INSTANCE |
| BATCH_JOB_SEQ |
| BATCH_NIFI_JOB |
| BATCH_NIFI_STEP |
| BATCH_RELATED_FLOW_FILES |
| BATCH_STEP_EXECUTION |
| BATCH_STEP_EXECUTION_CTX_VALS |
| BATCH_STEP_EXECUTION_SEQ |
| CHECK_DATA_TO_FEED_VW |
| DATABASECHANGELOG |
| DATABASECHANGELOGLOCK |
| FEED |
| FEED_ACL_INDEX |
| FEED_CHECK_DATA_FEEDS |
| FEED_HEALTH_VW |
| FEED_SUMMARY_VIEW |
| GENERATED_KEYS |
| ITEM_LAST_MODIFIED |
| KYLO_ALERT |
| KYLO_ALERT_CHANGE |
| KYLO_VERSION |
| LATEST_FEED_JOB_END_TIME_VW |
| LATEST_FEED_JOB_VW |
| LATEST_FINISHED_FEED_JOB_VW |
| MODESHAPE_REPOSITORY |
| NIFI_EVENT |
| NIFI_FEED_PROCESSOR_STATS |
| NIFI_FEED_STATS |
| NIFI_FLOW_CACHE_CLUSTER_SYNC |
| NIFI_FLOW_CACHE_UPDATE_ITEM |
| NIFI_RELATED_ROOT_FLOW_FILES |
| SLA_ACTION_TEMPLATE |
| SLA_ASSESSMENT |
| SLA_DESCRIPTION |
| SLA_FEED |
| SLA_METRIC_ASSESSMENT |
| SLA_OBLIGATION_ASSESSMENT |
| TMP_BATCH_NIFI_JOB_DELETED |
| VELOCITY_TEMPLATE |
+--------------------------------+
46 rows in set (0.000 sec)
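One quick way to confirm the table really is absent regardless of case (since both servers treat identifier case differently) is a case-insensitive lookup against information_schema, which works on both MariaDB and PostgreSQL. This is just a diagnostic sketch; adjust the schema filter to wherever your kylo schema lives:

```sql
-- Look for any table named 'na' in any case, in any schema
SELECT table_schema, table_name
FROM information_schema.tables
WHERE LOWER(table_name) = 'na';
```

An empty result on both metastores would confirm the table genuinely does not exist, rather than existing under a differently-cased name.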
This also tallies with the tables present in the Kylo VirtualBox Sandbox provided, so this leaves me rather bemused as to where I may have gone wrong.
Regards,
Marek
https://kylo.readthedocs.io/en/v0.9.1/installation/RunSampleFeed.html
Naturally I've had to amend the default data_ingest.zip template to change a couple of parameters (such as the installation path from /opt to our standard /app, and also the database connection details), so I've attached the modified templates for the Postgres and MySQL Kylo metastore test cases that I've been attempting.
Regards,
Marek
# Set the nifi home folder
config.nifi.home=/opt/nifi
# Set the spark application jars location
config.nifi.kylo.applicationJarDirectory=/opt/nifi/current/lib/app
Hi Greg,
I believe the "NA" identifier is coming from the "Fetch RDBMS Data/GetTableData" processor (its 'Source Table' parameter). I've verified this by changing NA to some other arbitrary identifier and observing that the error in the nifi log changed accordingly.
Since the Source Database connection only allows me to set it to "Kylo MySQL", isn't this referring to the relational database (whether that be on Postgres or MariaDB) that, along with JBoss ModeShape, comprises the Kylo Metadata Server? Or have I fundamentally misunderstood what is going on here?
Strangely enough, if I compare this with the configuration in my Cloudera Sandbox environment, that also has 'Source Table = NA' in the "Fetch RDBMS Data/GetTableData" processor, yet I don't see any corresponding schema in the MySQL database that comes pre-canned with that deployment. Looking at the "Source Fields", it seems to be referencing a table whose columns correspond to the userdata1.csv file used in the Sample Feed example, so it would make sense that this processor is trying to read the metadata that defines the feed high-water mark. But where that information actually resides, and what the correct value for the 'Source Table' parameter should be, is the bit that is still eluding me.
Regards,
Marek