Handling rr forward compatibility issues

Robert O'Callahan

unread,

May 15, 2021, 7:49:19 AM5/15/21

to rr-d...@googlegroups.com

For some years now we have maintained backward compatibility: new rr versions can replay recordings made by any version of rr since 5.0. This has worked well.

We haven't promised forward compatibility, and once in a while we make changes that break forward compatibility. For example I just landed https://github.com/rr-debugger/rr/commit/5c6e992b2ad174e980dd3b08987218ceaf5a5db1 to handle mmaps of large sparse files. This requires adding some metadata to MemWrites in the trace, indicating where "holes" occur in the written data (regions of all-zeroes). Some recordings made by rr after that commit will not replay correctly in rr built before that commit. I think changes of this kind are inevitable, though relatively rare in practice.

However, it's unfortunate that if you try to replay a new recording in an old rr, you'll just get a mysterious crash at some point during the replay. Ideally rr would immediately exit cleanly with an error indicating what the problem is. So I propose that we bake a "forward compatibility version number" into rr, set it to 1, and add it to the header with old traces defaulting to 0. Every time we break forward compatibility in the future (i.e, whenever we make rr changes so that it may produce traces that don't replay in older rr) we will increment that version number. rr replay will check that version number and if it's higher than the version number that rr build knows about, rr will abort the replay with a suitable message.

Obviously this wouldn't help with existing rr builds but it might ease pain in the future.

Any thoughts?

Rob
--

"He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all." [Isaiah 53:5-6]

Kyle Huey

unread,

May 15, 2021, 9:09:57 AM5/15/21

to Robert O'Callahan, rr-d...@googlegroups.com

On Sat, May 15, 2021 at 4:49 AM Robert O'Callahan <rocal...@gmail.com> wrote:
>
> For some years now we have maintained backward compatibility: new rr versions can replay recordings made by any version of rr since 5.0. This has worked well.
>
> We haven't promised forward compatibility, and once in a while we make changes that break forward compatibility. For example I just landed https://github.com/rr-debugger/rr/commit/5c6e992b2ad174e980dd3b08987218ceaf5a5db1 to handle mmaps of large sparse files. This requires adding some metadata to MemWrites in the trace, indicating where "holes" occur in the written data (regions of all-zeroes). Some recordings made by rr after that commit will not replay correctly in rr built before that commit. I think changes of this kind are inevitable, though relatively rare in practice.
>
> However, it's unfortunate that if you try to replay a new recording in an old rr, you'll just get a mysterious crash at some point during the replay. Ideally rr would immediately exit cleanly with an error indicating what the problem is. So I propose that we bake a "forward compatibility version number" into rr, set it to 1, and add it to the header with old traces defaulting to 0. Every time we break forward compatibility in the future (i.e, whenever we make rr changes so that it may produce traces that don't replay in older rr) we will increment that version number. rr replay will check that version number and if it's higher than the version number that rr build knows about, rr will abort the replay with a suitable message.
>
> Obviously this wouldn't help with existing rr builds but it might ease pain in the future.
>
> Any thoughts?

I'm not particularly familiar with capnproto but are we going to get
these mysterious crashes all the time or only if a new recording uses
the new feature? If it's the latter, then making this a hard error all
the time doesn't seem like a great idea.

- Kyle

Robert O'Callahan

unread,

May 15, 2021, 9:25:44 AM5/15/21

to Kyle Huey, rr-d...@googlegroups.com

On Sun, May 16, 2021 at 1:09 AM Kyle Huey <m...@kylehuey.com> wrote:

I'm not particularly familiar with capnproto but are we going to get
these mysterious crashes all the time or only if a new recording uses
the new feature? If it's the latter, then making this a hard error all
the time doesn't seem like a great idea.

It's the latter, in this case. We'll get a crash in every recording where someone mmapped a file with a hole (including stuff like memfd_create files) and we put the initial data for the mapping into the trace ... because the old rr will read data from the compressed raw data stream that doesn't belong to that event. I think that'll be pretty frequent.

I don't know a better alternative approach. Trying to detect which features have been used and setting the forward-compatibility version appropriately would become an increasing maintenance burden over time.

Rob

--

Su ot deraeppa sah dna Rehtaf eht htiw saw hcihw, efil lanrete eht uoy ot mialcorp ew dna, ti ot yfitset dna ti nees evah ew; deraeppa efil eht. Efil fo Drow eht gninrecnoc mialcorp ew siht - dehcuot evah sdnah ruo dna ta dekool evah ew hcihw, seye ruo htiw nees evah ew hcihw, draeh evah ew hcihw, gninnigeb eht morf saw hcihw taht.

Kyle Huey

unread,

May 18, 2021, 5:39:16 PM5/18/21

to Robert O'Callahan, rr-d...@googlegroups.com

On Sat, May 15, 2021 at 6:25 AM Robert O'Callahan <rob...@ocallahan.org> wrote:
>
> On Sun, May 16, 2021 at 1:09 AM Kyle Huey <m...@kylehuey.com> wrote:
>>
>> I'm not particularly familiar with capnproto but are we going to get
>> these mysterious crashes all the time or only if a new recording uses
>> the new feature? If it's the latter, then making this a hard error all
>> the time doesn't seem like a great idea.
>
>
> It's the latter, in this case. We'll get a crash in every recording where someone mmapped a file with a hole (including stuff like memfd_create files) and we put the initial data for the mapping into the trace ... because the old rr will read data from the compressed raw data stream that doesn't belong to that event. I think that'll be pretty frequent.

Right, ok.

> I don't know a better alternative approach. Trying to detect which features have been used and setting the forward-compatibility version appropriately would become an increasing maintenance burden over time.

I agree that trying to be granular here is probably not worth the effort.

- Kyle

Robert O'Callahan

unread,

May 18, 2021, 8:54:09 PM5/18/21

to Kyle Huey, rr-devel

I've implemented this on master.

From now on when we change something that means new recordings may fail to replay in old rr builds so we want old rr builds to fail cleanly, bump FORWARD_COMPATIBILITY_VERSION in TraceStream.h.

Of course, rr builds from before this change lack the check, so they won't fail cleanly if FORWARD_COMPATIBILITY_VERSION is bumped. We could bump TRACE_VERSION, but that adds a bit of complexity and would give the wrong error message --- old rrs would say "Did you record `%s' with an older version of rr? If so, you'll need to replay `%s' with that older version. Otherwise, your trace is likely corrupted." which is pretty misleading. So I don't think this is worth worrying about; this is future proofing.

Reply all

Reply to author

Forward