incompatible type changes philosophy
MIME-Version: 1.0
In-Reply-To: <81bebe0a-1b2b-4b34-8c78-8f80d78b1...@t23g2000yqd.googlegroups.com>
References: <81bebe0a-1b2b-4b34-8c78-8f80d78b1...@t23g2000yqd.googlegroups.com>
From: Daniel Wright <dwri...@google.com>
Date: Tue, 8 May 2012 18:04:14 -0700
Message-ID: <CAMhiWzXhEK=yeJvW9FE_4WwGz5YK9HHkE-Tv4Tj4s1pdi4J...@mail.gmail.com>
Subject: Re: [protobuf] incompatible type changes philosophy
To: Jeremy Stribling <st...@nicira.com>
Cc: Protocol Buffers <protobuf@googlegroups.com>
Content-Type: multipart/alternative; boundary=f46d044472c12a3dea04bf901660
--f46d044472c12a3dea04bf901660
Content-Type: text/plain; charset=ISO-8859-1
On Tue, May 8, 2012 at 4:42 PM, Jeremy Stribling <st...@nicira.com> wrote:
> I'm working on a project to upgrade- and downgrade-proof a distributed
> system that uses protobufs to communicate data between instances of a C++
> program. I'm trying to cover all possible cases for data schema
> changes between versions of my programs, and I was hoping to get some
> insight from the community on what the best practice is for the
> following tricky scenario.
>
> To reduce serialization time and protobuf message size, the format of
> a field in a message is changed between incompatible types. For
> example, a string field gets changed to an int, or perhaps a field
> gets changed from one message type to another. Because this is being
> done as an optimization, it makes no sense to keep both versions of
> the data around, so I think whether we change the field ID is not
> relevant -- we only ever want to have one version of the field in any
> particular protobuf.
>
Even though you don't keep both versions of the data around, you should
keep both fields around, and have the code be able to read from whichever
is set during the transition. You can rename the old one (say put
"deprecated" in the name) so that people know that it's old, but don't
actually remove it from the .proto file until no old instances of the proto
remain. To put it more concretely, say you have

  optional string my_data = 1;

Now you come up with a way to encode it as an int64 instead. You'd change
the .proto to:

  optional string deprecated_my_data = 1;
  optional int64 my_data = 2;

- At this point, you still write the data to "deprecated_my_data" and not
"my_data", but when you read, you check has_my_data() and
has_deprecated_my_data() and read from whichever one is present. It might
help to write wrapper functions for reading and writing during the
transition if the field is accessed in many places.
- Once all instances of the program have been recompiled so they all know
about the new int64 field, you can start writing to my_data and not
deprecated_my_data.
- Once all of the instances of the program have been recompiled again, you
can remove the code that reads deprecated_my_data, and delete the field.
This is kind of painful, but it's much cleaner than adding a version
number. It also only ever writes the data to one field, so there's no
bloat during the transition.
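
To make the wrapper idea concrete, here is a minimal C++ sketch. It is not
generated protobuf code: Record is a hypothetical hand-written stand-in that
exposes only the accessors protoc would generate for the two fields,
ReadMyData and WriteMyData are illustrative names, and the decimal-text
conversion between the old string and the new int64 is purely an assumption
about how the data is encoded. The peers_know_new_field flag stands for
whatever rollout signal you use to decide that every instance has been
recompiled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the class protoc would generate from:
//   optional string deprecated_my_data = 1;
//   optional int64 my_data = 2;
// Only the accessors used by the wrappers are modelled here.
class Record {
 public:
  bool has_my_data() const { return has_new_; }
  std::int64_t my_data() const { return new_; }
  void set_my_data(std::int64_t v) { new_ = v; has_new_ = true; }
  void clear_my_data() { new_ = 0; has_new_ = false; }

  bool has_deprecated_my_data() const { return has_old_; }
  const std::string& deprecated_my_data() const { return old_; }
  void set_deprecated_my_data(const std::string& v) { old_ = v; has_old_ = true; }
  void clear_deprecated_my_data() { old_.clear(); has_old_ = false; }

 private:
  bool has_new_ = false;
  std::int64_t new_ = 0;
  bool has_old_ = false;
  std::string old_;
};

// Read wrapper: prefer the new field, fall back to the deprecated one.
// Returns false if neither field is set.
// Assumption: the old string held the number as decimal text; std::stoll
// throws if it did not.
bool ReadMyData(const Record& r, std::int64_t* out) {
  if (r.has_my_data()) {
    *out = r.my_data();
    return true;
  }
  if (r.has_deprecated_my_data()) {
    *out = std::stoll(r.deprecated_my_data());
    return true;
  }
  return false;
}

// Write wrapper: populate exactly one of the two fields, chosen by phase.
// Pass peers_know_new_field = false during the first step of the rollout.
void WriteMyData(Record* r, std::int64_t value, bool peers_know_new_field) {
  if (peers_know_new_field) {
    r->set_my_data(value);
    r->clear_deprecated_my_data();
  } else {
    r->set_deprecated_my_data(std::to_string(value));
    r->clear_my_data();
  }
  // The data is only ever written to one field, so there is no bloat.
  assert(r->has_my_data() != r->has_deprecated_my_data());
}
```

If callers only go through ReadMyData and WriteMyData, the second step is a
change to the flag and the third step is deleting the fallback branch, both
in one place.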
Daniel
> Of course, this makes communicating between versions of the program
> very difficult, and I think it requires there to be some kind of
> translator code to transform the field from one format to the other.
> Ideally, this transformation would be invisible to the rest of the
> program. One ugly thought I had was to have a version field in every
> message, and then in the autogenerated C++ serialize code, maybe in
> MergePartialFromCodedStream, I could insert a call to an external
> translator program that would transform the input bytes into something
> that could be decoded by the version of the message expected by this
> instance of the program. I don't think there's an insertion point
> defined for this part of the code, so I'd have to write my own script
> to do it. The external translator program could be upgraded
> independently of the main program, so older versions would know how to
> interpret the fields of the newer versions.
>
> I'm wondering if anyone has experience with a scenario like this, and
> if there's a more elegant way to solve it. If not, what do folks
> think of this business of an external translator program? Foolish
> nonsense? Worthy of a proper insertion point?
>
> Thanks,
>
> Jeremy
--f46d044472c12a3dea04bf901660--