Unknown dataset URI pattern: file

Nitin Motgi

Oct 18, 2015, 1:05:21 PM
to CDK Development

Hi,


I am trying to create a Kite Dataset Writer plugin for Hydrator. When I run it in CDAP Standalone mode, it complains that the dataset URI is unknown, but the same code passes the unit test.


org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:file:/tmp/xyz/tweets

Check that JARs for file datasets are on the classpath


I followed the documentation and built the URI as <path>/<namespace>/<dataset-name>, but I am still unsure why it complains. I turned on DEBUG logging and this is what I see. Is there something I am missing in the path?


2015-10-17 23:07:37,479 - INFO  [worker-ETLWorker-0:c.c.h.s.KiteDatasetWriter@68] - Kite Dataset URI dataset:file:/tmp/xyz/tweets 

2015-10-17 23:07:37,482 - DEBUG [worker-ETLWorker-0:o.k.d.s.Registration@147] - Registered repository URIs:

2015-10-17 23:07:37,482 - DEBUG [worker-ETLWorker-0:o.k.d.s.Registration@149] - Registered dataset URIs:


Any help would be appreciated. 


Thanks,

Nitin


Ryan Blue

Oct 19, 2015, 12:10:28 PM
to Nitin Motgi, CDK Development
Nitin,

The problem is that the URIs weren't registered. That happens using a
ServiceLoader that relies on configuration files [1] that point to the
right loader class. The most common problem with this happens when you
build a shaded jar and these aren't merged properly into a single file
or are excluded entirely. Are you using the shade plugin or the assembly
plugin?
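
One way to check whether the service file survived packaging is to ask the class loader for every copy of it. The following is only a minimal sketch: the class name ServiceFileCheck is hypothetical, the service name is taken from the file linked in [1], and it assumes the lookup goes through the thread context class loader, which may not be the loader that applies inside a CDAP plugin.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Kite or CDAP.
public class ServiceFileCheck {
    // Service name taken from the META-INF/services file linked in [1].
    static final String KITE_LOADABLE = "org.kitesdk.data.spi.Loadable";

    // Returns every META-INF/services/<serviceName> resource visible to the
    // context class loader (assumption: this is the loader that matters here).
    static List<URL> serviceFiles(String serviceName) throws IOException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(cl.getResources("META-INF/services/" + serviceName));
    }

    public static void main(String[] args) throws IOException {
        List<URL> files = serviceFiles(KITE_LOADABLE);
        if (files.isEmpty()) {
            // Nothing for ServiceLoader to read, so no URI patterns get registered.
            System.out.println("No service file found for " + KITE_LOADABLE);
        }
        for (URL url : files) {
            System.out.println(url);
        }
    }
}
```

If this prints no URLs when run against the shaded jar, the service file was dropped during shading. If it prints a single URL, it is still worth opening that file to confirm it lists the loader classes from every Kite module you bundled, since an unmerged copy can overwrite the others.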

rb


[1]:
https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/resources/META-INF/services/org.kitesdk.data.spi.Loadable
--
Ryan Blue
Software Engineer
Cloudera, Inc.

Nitin Motgi

Oct 19, 2015, 12:19:06 PM
to Ryan Blue, CDK Development
Hi Ryan, 

Yes, I am using a shaded JAR. Are there any pointers on how to fix it?

Thanks,
Nitin

--
"Humility isn't thinking less of yourself, it's thinking of yourself less"

Ryan Blue

Oct 19, 2015, 12:23:38 PM
to Nitin Motgi, CDK Development
You just need to add a config property to turn on the services resource
transformer in the shade plugin:


https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
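
For reference, a minimal sketch of what that typically looks like in a pom.xml, following the linked Maven documentation. The plugin version is left out on the assumption that it is managed elsewhere in the build, and the surrounding execution settings are assumptions that should be merged with whatever shade configuration the project already has.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenates META-INF/services files that share a name
               instead of letting one jar's copy replace the others. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With the transformer enabled, the shaded jar should contain a single META-INF/services/org.kitesdk.data.spi.Loadable file listing the loader classes from all of the bundled Kite modules.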

rb

Nitin Motgi

Oct 19, 2015, 12:26:23 PM
to Ryan Blue, CDK Development
Thanks Ryan. Will give it a shot. 

Nitin
