Creating dataset backed by Hive changing database


Buntu Dev

Mar 11, 2015, 2:05:16 AM
to cdk...@cloudera.org
Currently, when I create the dataset using the Kite CLI, it's visible under the 'default' database. Is there a way to specify or change the database the dataset is created under? If so, how should I update the Kite dataset sink configs 'kite.repo.uri' and 'dataset.name' in the Flume config to specify the new '<database>/dataset'?

Thanks!

Joey Echeverria

Mar 11, 2015, 11:50:06 AM
to Buntu Dev, cdk...@cloudera.org
Yes, you can change the Hive database by setting a namespace in the
URI. Refer to the full Hive URI documentation[1] if you need to, but
the gist is you need a URI like the following:

dataset:hive:db/table

where db is the name of the database/namespace and table is the name
of the table.

For your Flume config, the easiest way is to switch to the dataset.uri
config parameter and pass in the same URI you use for create.

Either way, you have to update Flume to the version in CDH 5.2 or
later as support for namespaces wasn't added until 5.2.
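
As a rough sketch of the Flume side: the fragment below assumes the sink
property is spelled kite.dataset.uri (the name used in later Flume user
guides; check the CDH user guide for your release). The agent, sink, and
channel names (agent1, kite-sink, mem-channel) are made up for illustration,
and db/table are placeholders.

```
# Hypothetical names: agent1, kite-sink, mem-channel. db/table are placeholders.
agent1.sinks = kite-sink
agent1.sinks.kite-sink.type = org.apache.flume.sink.kite.DatasetSink
agent1.sinks.kite-sink.channel = mem-channel
# Assumed property name; a single URI stands in for the older
# kite.repo.uri + kite.dataset.name pair.
agent1.sinks.kite-sink.kite.dataset.uri = dataset:hive:db/table
```

The URI here is meant to be the same one passed when creating the dataset,
so the sink and the CLI point at the same Hive database and table.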

-Joey



--
Joey Echeverria
Senior Infrastructure Engineer

Buntu Dev

Mar 11, 2015, 1:32:26 PM
to Joey Echeverria, cdk...@cloudera.org
Thanks Joey, I was able to create the dataset this way 'dataset:hive:/user/hive/warehouse/db/<table>'.

I do see dataset.uri in the FlumeConfigCommand, but I don't see that config parameter for the Kite dataset sink in the Flume doc:
https://flume.apache.org/FlumeUserGuide.html#kite-dataset-sink-experimental

Thanks!

Ryan Blue

Mar 11, 2015, 1:35:00 PM
to Buntu Dev, Joey Echeverria, cdk...@cloudera.org
On 03/11/2015 10:32 AM, Buntu Dev wrote:
> Thanks Joey, I was able to create the dataset this way
> 'dataset:hive:/user/hive/warehouse/db/<table>'.
>
> I do see the dataset.uri in the FlumeConfigCommand
> <https://github.com/kite-sdk/kite/blob/master/kite-tools-parent/kite-tools/src/main/java/org/kitesdk/cli/commands/FlumeConfigCommand.java#L174> but
> do not see the config parameter for the Kite dataset sink in the Flume doc:
> https://flume.apache.org/FlumeUserGuide.html#kite-dataset-sink-experimental

The doc is based on the Flume 1.5.0 release, which had a much older
version of the sink. The new version is included in CDH and possibly
other distros, but hasn't made it into an upstream release yet.

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.

Buntu Dev

Mar 11, 2015, 1:39:03 PM
to Ryan Blue, Joey Echeverria, cdk...@cloudera.org
We are using CDH 5.3.0, so we should be good for now. Thanks again!

Joey Echeverria

Mar 11, 2015, 1:54:06 PM
to Buntu Dev, Ryan Blue, cdk...@cloudera.org
You can always check the CDH version of the user guide for the latest
docs for a given CDH release:

http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.5.0-cdh5.3.0/FlumeUserGuide.html#kite-dataset-sink

-Joey

Buntu Dev

Mar 11, 2015, 1:58:01 PM
to Joey Echeverria, Ryan Blue, cdk...@cloudera.org
Nice, thanks!