Migration of Druid data between environments


Maurizio Gallelli

Oct 6, 2015, 12:13:21 PM
to Druid User
Hi,
I've had a Druid platform running for the last few months, and I'm now moving to a new platform (a completely different environment) that is currently ingesting in parallel with the old one.
I'd like to have the previous data in the new environment; what is the best way to migrate the data from one platform to the other?

Thanks
Maurizio

charles.allen

Oct 6, 2015, 12:33:47 PM
to Druid User
I took a brief look at 

to see if it would fit your use case, and I don't think it will, because the mover assumes you're staying within one loadSpec type (S3, HDFS, Cassandra, Azure, etc.), not moving from one type to another.

As such, it would require development effort to build a task that does proper locking, copies segments locally and then remotely, updates segment metadata, and verifies the result.

I'm curious whether there's a way to get a Hadoop task to do this as part of distcp or similar. I'll try to ping one of the other devs about it.

Cheers,
Charles Allen

charles.allen

Oct 6, 2015, 12:36:32 PM
to Druid User
I should also note that moving the metadata store without downtime is non-trivial and will require the assistance of a DBA or other expert.

Eric Tschetter

Oct 6, 2015, 1:03:24 PM
to Druid User
Moving the old data requires:

1) Moving the actual segments
2) Copying the metadata in the segments table on the metadata store
3) Updating the metadata to point to the new location of the files. If you are changing the type of deep storage, that can require adjusting not just the path but also other parts of the "payload" portion of the segments table. If you analyze the things that are currently loading in parallel and line them up with the data you've copied over, it should work.
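Step 3 above can be sketched roughly as follows. This is a hedged Python example, assuming an HDFS-to-S3 move: the exact loadSpec fields depend on your Druid version and deep-storage extension, and the bucket and path names here are made up, not taken from the thread.

```python
import json

def migrate_payload(payload_bytes, bucket="new-druid-deep-storage"):
    """Rewrite one segment payload (the JSON blob stored in the
    segments table) so its loadSpec points at S3 instead of HDFS.

    Assumption: the S3 loadSpec uses type "s3_zip" with bucket/key
    fields; verify against your Druid version before relying on this.
    """
    payload = json.loads(payload_bytes)
    # e.g. "hdfs://host:8020/druid/storage/wikipedia/.../index.zip"
    old_path = payload["loadSpec"]["path"]
    # Keep the path relative to the storage root as the S3 key.
    key = old_path.split("/druid/storage/", 1)[-1]
    payload["loadSpec"] = {
        "type": "s3_zip",
        "bucket": bucket,
        "key": "druid/storage/" + key,
    }
    return json.dumps(payload).encode("utf-8")
```

Every other field of the payload (dataSource, interval, version, dimensions, and so on) is left untouched; only the loadSpec is swapped out, which matches Eric's note that the "payload" portion needs more than a path edit when the deep-storage type changes.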

--Eric

Jason Cheow

Mar 17, 2016, 7:33:55 AM
to Druid User
Hi Eric,

Found this thread as I'm currently trying to migrate data. I'm at point 3 of your steps listed below, where I need to update the "payload" column of the segments table. I understand it's stored as a JSON blob, so my thought is to write a script that converts it to JSON, edits it, converts it back to a blob, and updates the "payload" column. Apart from this, is there an easier way to achieve this?

Regards,
Jason

Maurizio Gallelli

Mar 17, 2016, 7:47:01 AM
to Druid User
Hi Jason,
what I personally did was to:
- dump the data from MySQL
- replace the path in the payload column using sed/awk
- restore the modified dump into the new MySQL
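The sed/awk step above can equally be done with a small script. This is only a rough sketch, assuming the dump was taken as plain text (so the payload blobs appear as readable escaped strings rather than hex, i.e. not with --hex-blob), and the old/new paths are made-up examples:

```python
# Rewrite old deep-storage paths to new ones inside a mysqldump file
# before restoring it into the new metadata store.
OLD_PATH = "hdfs://old-host:8020/druid/storage"  # example only
NEW_PATH = "hdfs://new-host:8020/druid/storage"  # example only

def rewrite_dump(src, dst):
    """Stream the dump line by line, replacing every occurrence of
    the old deep-storage prefix with the new one."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(line.replace(OLD_PATH, NEW_PATH))
```

A plain string replacement like this only works when the storage type stays the same (path change only); moving between loadSpec types needs the fuller payload edit described earlier in the thread.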

Thanks
Maurizio

Nishant Bangarwa

Mar 17, 2016, 12:42:56 PM
to Druid User
Hi Jason, 
Just to mention, there is a tool added by bingkun in 0.9.0 that allows you to update the payload of MySQL segments.
Docs on how to use it can be found here - 



Jason Cheow

Mar 17, 2016, 10:17:29 PM
to Druid User
Hi Maurizio and Nishant,

Thanks for the suggestions. I'll try the insert-segment-to-db tool since it's an upcoming tool that will be released soon.

Regards,
Jason

Jakub Liska

Apr 25, 2016, 10:23:49 AM
to Druid User
Would anybody please add an S3 deep storage example to http://druid.io/docs/0.9.0/operations/insert-segment-to-db.html ?

If this is for HDFS:

--workingDir hdfs://host:port/druid/storage/wikipedia

would it work with S3 deep storage like this?

--workingDir s3://bucket/druid/storage/wikipedia

I'm planning to have two clusters sharing a single S3 deep storage location. Only one cluster will be indexing, so I should just create the new cluster and use the migration tool
to hook into the existing segments on S3...

Nishant Bangarwa

Apr 25, 2016, 10:43:22 AM
to Druid User
Hi Jakub, 
Right now the insert-segment-to-db tool does not work with S3. 

Feel free to create an issue or submit a PR for it. 



Jakub Liska

Sep 9, 2016, 7:04:39 AM
to Druid User
Hi fellas,

I implemented it in https://github.com/druid-io/druid/pull/3446, in case anybody needs it.