Migrating data from a relational db to Druid


Konojowara

Jul 21, 2015, 8:54:03 AM
to druid-de...@googlegroups.com
I've been reading up on Druid for a few days and am familiar with it to some extent. What I need is to migrate some tables from a relational db to it. Could anyone give me a few pointers? Specifically:

1) How can I create the "tables" in Druid that will receive the data from the corresponding tables in my current db?
2) How can I actually migrate those tables? Will I have to export them to CSV first and then "upload" them to Druid? If so, how exactly?

Gian Merlino

Jul 21, 2015, 10:14:40 AM
to druid-de...@googlegroups.com
Druid datasources (you can think of them as tables) are created automatically when you insert data into them. There are a couple of ways to get data out of a relational db and into Druid. If it's a smallish amount of data (~1GB or less), you can just dump it to CSV/TSV/JSON and index it using Druid's built-in "index task". If it's a larger amount, you could load it into Hadoop as CSV/TSV/JSON and then use Druid's Hadoop indexer to load it into Druid. Something like Sqoop (http://sqoop.apache.org/) might help there.
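For the "dump it to CSV" step on a small table, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the real relational db; the table `page_views`, its columns, and the helper `export_table_to_csv` are all hypothetical names made up for illustration. It writes no header row, on the assumption that the column names will be listed in the Druid ingestion spec instead.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV and return the column names."""
    # Table names cannot be bound as SQL parameters, so pass only trusted names.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    # No header row: the column order is returned to the caller instead.
    csv.writer(out).writerows(cursor)
    return columns


# Hypothetical stand-in for the source database and table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views (ts TEXT, page TEXT, country TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?, ?)",
    [
        ("2015-07-21T08:00:00Z", "/home", "US", 3),
        ("2015-07-21T09:00:00Z", "/docs", "PL", 5),
    ],
)

# An in-memory buffer keeps the sketch self-contained; use open(path, "w", newline="")
# to write a real file.
buffer = io.StringIO()
columns = export_table_to_csv(conn, "page_views", buffer)
csv_text = buffer.getvalue()
print(columns)
print(csv_text)
```

Each row needs a timestamp column (here `ts`), since Druid indexes data by time. With a real database you would swap `sqlite3` for your own driver, or use the database's native export command.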


Konojowara

Jul 21, 2015, 12:58:03 PM
to druid-de...@googlegroups.com
Thanks.
1) So it's similar to how "tables" in MongoDB get created, when you first save something to them?
2) And how can I create what relational dbs call a "database" in Druid?
3) What are
Druid::Client.new('zk1:2181,zk2:2181/druid').query('service/source')

"service" and "source" here?

Gian Merlino

Jul 21, 2015, 1:01:38 PM
to druid-de...@googlegroups.com
There isn't really a concept in Druid of a grouping of tables (what other systems might call a "database"). Each datasource is independent of every other datasource.

Which client binding is that? I *think* "service" would be a service discovery key for a Druid broker, and "source" would be your datasource.

Konojowara

Jul 22, 2015, 5:26:30 AM
to Druid Development
Looking at the Druid documentation, I don't see a simple way to insert data from a CSV file. I found http://druid.io/docs/0.6.52/Tasks.html, but the tasks described there seem to do a lot of extra work beyond just importing data. http://druid.io/docs/latest/ingestion/data-formats.html only describes the formats and doesn't explain how to load them at all.

So how can I insert data from a CSV or JSON file in a simple way?

Fangjin Yang

Jul 23, 2015, 12:08:08 AM
to Druid Development, alex.masla...@gmail.com
Hi Konojowara, just curious: why are you looking at Druid docs that old?

You can look at using the index task for a small batch of CSV data.
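As a rough illustration, an index task is a JSON document submitted to the indexing service. The Python sketch below builds one for a CSV file. The field layout follows the index task format as I recall it from Druid releases of this period (a `dataSchema` with a CSV `parseSpec`, plus a `local` firehose), so treat it as an assumption and check it against the docs for your version. The datasource name, paths, columns, and the helper `build_csv_index_task` are all hypothetical.

```python
import json


def build_csv_index_task(datasource, base_dir, file_name, columns,
                         timestamp_column, dimensions, interval):
    """Build an index task spec (assumed legacy layout) for a local CSV file."""
    return {
        "type": "index",
        "spec": {
            "dataSchema": {
                # The datasource is created on first ingestion; no separate step.
                "dataSource": datasource,
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "csv",
                        # Column order must match the CSV file (no header row).
                        "columns": columns,
                        "timestampSpec": {
                            "column": timestamp_column,
                            "format": "auto",
                        },
                        "dimensionsSpec": {"dimensions": dimensions},
                    },
                },
                "metricsSpec": [
                    {"type": "count", "name": "count"},
                    {"type": "longSum", "name": "views", "fieldName": "views"},
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "NONE",
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index",
                # The "local" firehose reads files from the task's own disk.
                "firehose": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_name,
                },
            },
        },
    }


# All values below are hypothetical placeholders.
task = build_csv_index_task(
    datasource="page_views",
    base_dir="/tmp/druid-import",
    file_name="page_views.csv",
    columns=["ts", "page", "country", "views"],
    timestamp_column="ts",
    dimensions=["page", "country"],
    interval="2015-07-21/2015-07-22",
)
task_json = json.dumps(task, indent=2)
print(task_json)
```

The resulting JSON would then be POSTed to the overlord's task endpoint (`/druid/indexer/v1/task`, assuming a standard indexing service setup) with a `Content-Type: application/json` header. Tuning options are left out here on the assumption that the defaults are acceptable for a small batch.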
