How to move a segment from one data source to another?


Lukáš Havrlant

unread,
Mar 18, 2016, 1:59:40 PM
to Druid User
Hi all,
what is the easiest way to move a segment from one data source to another? E.g. I re-index yesterday's data into a temporary data source, check that all the data is correct, and then I want to move these segments to the production data source. I suppose I could use Hadoop batch re-indexing and re-index the data with a different output data source, but that sounds too complicated. Is there any easier way?

Lukáš Havrlant

Lukáš Havrlant

unread,
Apr 13, 2016, 4:46:38 PM
to Druid User
Hi again,
so I've tried the procedure with the Hadoop Index Task and it worked, but still -- isn't there any easier approach to moving segments from data source A to data source B?

Lukáš

Fangjin Yang

unread,
Apr 15, 2016, 6:43:00 PM
to Druid User
Hi,

Can you try http://druid.io/docs/0.9.0/ingestion/update-existing-data.html and use delta ingestion with the datasource name changed?
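For reference, the piece that does the renaming in that approach is the Hadoop task's `dataSource` inputSpec: it reads the existing segments of one datasource, while the `dataSchema` names the datasource to write to. A sketch along these lines (datasource names and the interval are illustrative; the parser, granularity, metrics, and tuning settings a real spec needs are omitted):

```json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "production_datasource"
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "tmp_datasource",
          "intervals": ["2016-03-17/2016-03-18"]
        }
      }
    }
  }
}
```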

Lukáš Havrlant

unread,
Apr 16, 2016, 6:48:31 AM
to druid...@googlegroups.com
Thank you Fangjin, I'll take a look at it. So there is no way to move segments to another data source without running one of Druid's MapReduce jobs?

--
You received this message because you are subscribed to a topic in the Google Groups "Druid User" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/druid-user/yxFckLZ-GEw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/951dcd57-6793-4ef2-987b-c84b3cc6bcd8%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Fangjin Yang

unread,
Apr 26, 2016, 8:42:09 PM
to Druid User, lu...@havrlant.cz
Actually with 0.9.0, you can look into using or extending http://druid.io/docs/0.9.0/operations/insert-segment-to-db.html
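The tool scans segments in deep storage and inserts the corresponding rows into the metadata store. An invocation would look roughly like this (a sketch for 0.9.x; the classpath, paths, and metadata-store settings below are placeholders that must match your cluster):

```shell
# Illustrative invocation of the insert-segment-to-db tool (0.9.x).
# All -D values and paths below are placeholders.
java \
  -Ddruid.metadata.storage.type=mysql \
  -Ddruid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid \
  -Ddruid.metadata.storage.connector.user=druid \
  -Ddruid.metadata.storage.connector.password=diurd \
  -Ddruid.extensions.loadList='["mysql-metadata-storage"]' \
  -Ddruid.storage.type=hdfs \
  -cp "lib/*:config/_common" \
  io.druid.cli.Main tools insert-segment-to-db \
  --workingDir hdfs://namenode:8020/druid/storage/my_datasource \
  --updateDescriptor true
```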


sascha...@smaato.com

unread,
May 30, 2016, 8:22:40 AM
to Druid User, lu...@havrlant.cz
We are currently trying to use the insert-segment-to-db CLI tool to import segments, but we are running into an issue.
Our deep storage is S3. We manually copied segments into the target location where segments of the same kind are located, and pointed the workingPath property at the segments' location in S3.
When we started the CLI tool to insert these segments, we got the following error back:

1) Unknown provider[s3] of Key[type=io.druid.segment.loading.DataSegmentFinder, annotation=[none]], known options[[hdfs, local]]
  at io.druid.guice.PolyBind.createChoiceWithDefault(PolyBind.java:86)
  while locating io.druid.segment.loading.DataSegmentFinder

1 error
    at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1014)
    at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1040)
    at io.druid.cli.InsertSegment.run(InsertSegment.java:94)
    at io.druid.cli.Main.main(Main.java:105)

Searching GitHub for implementation classes of DataSegmentFinder only turns up LocalDataSegmentFinder.java and HdfsDataSegmentFinder.java, which register the local and hdfs schemes; there doesn't seem to be a class in the Druid repo that registers the s3 scheme.

Is it currently not possible to insert segments with this tool if the deep storage is S3 or am I doing something wrong?


Nishant Bangarwa

unread,
May 31, 2016, 3:54:15 AM
to Druid User, lu...@havrlant.cz
Hi, 
Right now DataSegmentFinder is not implemented for S3.
To make the insert-segment-to-db tool work with S3, an S3DataSegmentFinder needs to be implemented.
It would be great if you could submit a PR or a GitHub issue for this.


Jakub Liska

unread,
Sep 9, 2016, 7:06:15 AM
to Druid User, lu...@havrlant.cz
Hi fellas,

I implemented it, in case anybody needs it: https://github.com/druid-io/druid/pull/3446

Sidharth Singla

unread,
Jul 11, 2017, 2:19:39 AM
to Druid User, lu...@havrlant.cz
Hi,
I am getting the following error:

Exception in thread "main" com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Unknown provider[mysql] of Key[type=io.druid.metadata.SQLMetadataConnector, annotation=[none]], known options[[derby]]
  at io.druid.guice.PolyBind.createChoiceWithDefault(PolyBind.java:86) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.metadata.storage.derby.DerbyMetadataStorageDruidModule)
  while locating io.druid.metadata.SQLMetadataConnector
    for the 3rd parameter of io.druid.metadata.IndexerSQLMetadataStorageCoordinator.<init>(IndexerSQLMetadataStorageCoordinator.java:92)
  while locating io.druid.metadata.IndexerSQLMetadataStorageCoordinator
  at io.druid.guice.PolyBind.createChoiceWithDefault(PolyBind.java:86) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.metadata.storage.derby.DerbyMetadataStorageDruidModule)
  while locating io.druid.indexing.overlord.IndexerMetadataStorageCoordinator

1 error
at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1028)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1054)
at io.druid.cli.InsertSegment.run(InsertSegment.java:92)
at io.druid.cli.Main.main(Main.java:108)

Any idea about it?

Regards 
Sidharth


Jihoon Son

unread,
Jul 11, 2017, 2:54:39 AM
to druid...@googlegroups.com, lu...@havrlant.cz
Hey Sidharth,

To use the MySQL metadata store, you need to include the mysql-metadata-storage extension. Please check out http://druid.io/docs/latest/development/extensions-core/mysql.html.
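The relevant entries in common.runtime.properties look roughly like this (connection details are placeholders; when running the CLI tool, the same values can be passed as -D system properties):

```properties
druid.extensions.loadList=["mysql-metadata-storage"]
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
```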

Thanks,
Jihoon


Sidharth Singla

unread,
Jul 11, 2017, 3:22:15 AM
to Druid User, lu...@havrlant.cz
Hi
I have already included that.

Sidharth Singla

unread,
Jul 11, 2017, 4:26:17 AM
to Druid User, lu...@havrlant.cz
It works now. I had to change the owner of mysql-metadata-storage.

Regards.

Jakub Liska

unread,
Oct 31, 2017, 8:58:24 PM
to Druid User
I'm using a Scala script for that, in case anyone needs it: https://gist.github.com/l15k4/cfe3109fb9b65c3cafe0433efc8e9de2
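Whatever tool is used, a metadata-level move boils down to rewriting each segment row consistently: the datasource column, the segment identifier, and the dataSource field inside the JSON payload all have to change together. A rough Python illustration of that rewrite (this is not the gist's actual code; the descriptor fields shown follow the usual `{dataSource}_{start}_{end}_{version}` identifier convention and are assumptions for the sketch):

```python
import json

def rename_segment(payload_json: str, new_datasource: str) -> str:
    """Rewrite a segment descriptor so it belongs to new_datasource.

    Updates the top-level dataSource field and the identifier, which by
    convention starts with the datasource name followed by an underscore.
    """
    payload = json.loads(payload_json)
    old = payload["dataSource"]
    payload["dataSource"] = new_datasource
    # identifier looks like "<dataSource>_<start>_<end>_<version>[_<partition>]"
    ident = payload.get("identifier", "")
    if ident.startswith(old + "_"):
        payload["identifier"] = new_datasource + ident[len(old):]
    return json.dumps(payload)

# Example descriptor (hypothetical values)
descriptor = json.dumps({
    "dataSource": "tmp_ds",
    "identifier": "tmp_ds_2016-03-17T00:00:00.000Z_2016-03-18T00:00:00.000Z_v1",
    "interval": "2016-03-17T00:00:00.000Z/2016-03-18T00:00:00.000Z",
})
print(rename_segment(descriptor, "prod_ds"))
```

The same rename would also need to be reflected in the row's id/datasource columns and the segment's path in deep storage, depending on how your setup resolves segments.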