Migration to double columns in 0.11


Gian Merlino

Jul 24, 2017, 7:42:15 PM
to druid-de...@googlegroups.com
https://github.com/druid-io/druid/pull/4491 added support for 64-bit double columns in such a way that you get one when you specify a "doubleSum" aggregator at ingestion time. This is a behavior change from 0.10, where "doubleSum" would get you a float column. It's possible to get the old behavior back by switching the ingestion-time aggregator to "floatSum", which generates a 32-bit float column.
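To make the difference concrete, here is a minimal Python sketch of the two ingestion-time metric definitions. The metric name `revenue_sum` and input field `revenue` are made up for illustration, and the `type`/`name`/`fieldName` keys are assumed to follow the usual aggregator JSON shape.

```python
import json


def sum_aggregator(agg_type, name, field_name):
    """Build one entry of an ingestion spec's metricsSpec list."""
    if agg_type not in ("doubleSum", "floatSum"):
        raise ValueError(f"unexpected aggregator type: {agg_type}")
    return {"type": agg_type, "name": name, "fieldName": field_name}


# As described above: in 0.11 this yields a 64-bit double column
# (in 0.10 the same spec yielded a float column).
double_metric = sum_aggregator("doubleSum", "revenue_sum", "revenue")

# New in 0.11: yields a 32-bit float column, i.e. the old behavior.
float_metric = sum_aggregator("floatSum", "revenue_sum", "revenue")

print(json.dumps([double_metric], indent=2))
print(json.dumps([float_metric], indent=2))
```

The only difference between the two specs is the `type` string, which is why the behavior change is easy to miss during an upgrade.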

I think this is going to cause some problems when migrating existing clusters to 0.11. Consider rolling out the code to new middleManagers. The new "floatSum" aggregator cannot be used until all middleManagers are updated, since before then, not all of them will recognize it. So "doubleSum" must be used until all middleManagers are updated and stable.
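As a rough sketch of that constraint (not anything that exists in Druid), the check an operator would effectively be doing by hand looks like this. The version strings are invented, and the function name is hypothetical.

```python
def float_sum_is_safe(middle_manager_versions):
    """Return True only when every middleManager runs 0.11 or later.

    Hypothetical helper: "floatSum" is new in 0.11, so a task that uses it
    may land on an older middleManager that does not recognize it.
    """
    def major_minor(version):
        # "0.10.1" -> (0, 10); only the first two components matter here.
        return tuple(int(part) for part in version.split(".")[:2])

    versions = list(middle_manager_versions)
    return bool(versions) and all(major_minor(v) >= (0, 11) for v in versions)


mid_rollout = ["0.10.1", "0.11.0", "0.10.1"]
rollout_done = ["0.11.0", "0.11.0"]

print(float_sum_is_safe(mid_rollout))   # False
print(float_sum_is_safe(rollout_done))  # True
```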

During this rollout period, middleManagers will start creating double columns instead of float columns. There's no good way around that right now, which has a couple of downsides.

1) Sites that want to opt out of the new behavior will not be able to, since there is some period of time where they are forced to generate 64-bit double columns. There are legitimate reasons for wanting to opt out, including controlling segment size (the new 64-bit columns will be larger than the previously generated 32-bit ones), caution regarding new code paths (32-bit is more tried and tested), and the desire to be able to roll back historicals (which will be difficult if segments with double columns exist).

2) During the rollout period, delta indexing or reindexing can fail for no good reason, if the task is trying to read a segment with double columns and is scheduled on a middleManager that doesn't support double columns yet.
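Both downsides can be illustrated with a small Python sketch. The size figure is for raw, uncompressed values only (real segments are compressed, so actual sizes will differ), and the row count and function names are assumptions for illustration.

```python
FLOAT_BYTES = 4   # 32-bit float
DOUBLE_BYTES = 8  # 64-bit double


def raw_column_bytes(num_rows, bytes_per_value):
    """Uncompressed size of one numeric column; only a rough guide."""
    return num_rows * bytes_per_value


def can_read_segment(segment_has_double_columns, middle_manager_supports_doubles):
    """A task reading double columns needs a middleManager that understands them."""
    return (not segment_has_double_columns) or middle_manager_supports_doubles


rows = 10_000_000
print(raw_column_bytes(rows, FLOAT_BYTES))   # 40000000
print(raw_column_bytes(rows, DOUBLE_BYTES))  # 80000000

# A reindexing task scheduled on a not-yet-upgraded middleManager:
print(can_read_segment(True, False))  # False
# The same task scheduled on an upgraded middleManager:
print(can_read_segment(True, True))   # True
```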

I suggest we address this by adding a runtime property that makes doubleSum revert to the old behavior of generating 32-bit columns. I'm not sure if it should be on or off by default. But either way, I think it should exist.
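A minimal sketch of how such a property might be read, in Python for brevity. The property name `example.doubleSum.storage` is hypothetical (no name has been chosen here), and the default is left as an explicit argument because that question is still open.

```python
# HYPOTHETICAL property name, used only for illustration.
STORAGE_PROPERTY = "example.doubleSum.storage"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def double_sum_column_type(props, default):
    """Column type doubleSum would produce: 'float' (32-bit) or 'double' (64-bit)."""
    value = props.get(STORAGE_PROPERTY, default)
    if value not in ("float", "double"):
        raise ValueError(f"{STORAGE_PROPERTY} must be 'float' or 'double', got {value!r}")
    return value


opt_out = parse_properties("# keep 32-bit columns\nexample.doubleSum.storage=float\n")
print(double_sum_column_type(opt_out, default="double"))  # float
print(double_sum_column_type({}, default="double"))       # double
```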

Gian

Gian Merlino

Jul 25, 2017, 4:56:32 PM
to druid-de...@googlegroups.com
I meant to bring this up on the dev sync this morning but forgot. I do think something should change here though before 0.11.

Gian

Slim Bouguerra

Jul 25, 2017, 6:04:17 PM
to Druid Development
@Gian Valid concern. I would argue in favor of the 64-bit representation by default, to make sure that the community gets the correct representation by default.

Gian Merlino

Jul 25, 2017, 6:35:07 PM
to druid-de...@googlegroups.com
What do you think about defaulting to 32 bit in 0.11, and 64 bit in 0.12?

I feel that adding the functionality and defaulting it on in the same release means that, unless people remember to set the property while upgrading, they will see issues during the rollout with things like delta indexing and reindexing. If we enable the feature gradually (waiting until 0.12 to default to 64 bit), nobody will run into issues, and people who want it in 0.11 can set the property. We would probably even include it in our distribution's default configs.
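Roughly, the staged default would resolve like this. This is only a sketch of the proposal, written in Python; the table and function name are illustrative and not a shipped configuration.

```python
# Defaults as proposed in this message; not a shipped configuration.
DEFAULT_BY_RELEASE = {"0.11": "float", "0.12": "double"}


def effective_storage(release, override=None):
    """An explicit setting wins; otherwise fall back to the release default."""
    if override is not None:
        return override
    return DEFAULT_BY_RELEASE[release]


print(effective_storage("0.11"))            # float
print(effective_storage("0.11", "double"))  # double
print(effective_storage("0.12"))            # double
```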

Gian

--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-development+unsubscribe@googlegroups.com.
To post to this group, send email to druid-development@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/d9a9ce30-98c9-4798-b6e6-e34d1bb4433c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Slim Bouguerra

Jul 26, 2017, 11:43:48 AM
to druid-de...@googlegroups.com
sounds reasonable 
-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______


Gian Merlino

Jul 26, 2017, 2:43:17 PM
to druid-de...@googlegroups.com

Gian


Slim Bouguerra

Jul 26, 2017, 3:01:15 PM
to druid-de...@googlegroups.com
Thanks Gian for doing this!

-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______
