
Removing duplicated protobuf messages


Victor Cherviakov

Nov 11, 2024, 11:47:00 AM
to Protocol Buffers
Hi guys, I'm pretty new to protobuf, and maybe this question has been answered multiple times, but
is there a way to check for duplicate messages and drop them before sending them over a socket/the wire?

The thing is, my code sometimes generates messages of the same type with the same field values (identical messages), and I want to skip sending those duplicate messages.

I know that protobuf messages are unhashable. However, if all the messages are generated within the same process, is it possible for two identical messages that:
msg1.SerializeToString() != msg2.SerializeToString() ?

My idea was to use the serialized bytes as a dict key and the message as the value, so I won't have problems with identical messages.

P.S. I am also trying to eliminate the generation of duplicate messages in the first place, but I am not there yet.

Thanks!


Samuel Benzaquen

Nov 11, 2024, 11:52:44 AM
to Victor Cherviakov, Protocol Buffers
On Mon, Nov 11, 2024 at 11:46 AM Victor Cherviakov <vicher...@gmail.com> wrote:
> Hi guys, I'm pretty new to protobuf, and maybe this question has been answered multiple times, but
> is there a way to check for duplicate messages and drop them before sending them over a socket/the wire?
>
> The thing is, my code sometimes generates messages of the same type with the same field values (identical messages), and I want to skip sending those duplicate messages.
>
> I know that protobuf messages are unhashable. However, if all the messages are generated within the same process, is it possible for two identical messages that:
> msg1.SerializeToString() != msg2.SerializeToString() ?

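Yes, you can have two equivalent messages that serialize to different bytes. There are many reasons for this; for example, the order in which map entries are written is not guaranteed.
Of course, identical bytes (for the same message type) represent the same content.
Serialization is not guaranteed to be deterministic, and even if you force deterministic output it is not canonical.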
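
One way to see why equal content need not mean equal bytes is field order: the wire format lets fields appear in any order, and parsers must accept that. The sketch below is a hand-rolled, standard-library-only illustration of the varint wire format, not the protobuf library itself; the helper names and the field numbers 1 and 2 are made up for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag (field number, wire type 0), then the value."""
    return encode_varint(field_number << 3) + encode_varint(value)


def read_varint(data: bytes, pos: int):
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_varint_fields(data: bytes) -> dict:
    """Decode a buffer holding only varint fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        if tag & 0x7 != 0:
            raise ValueError("this sketch only understands wire type 0")
        value, pos = read_varint(data, pos)
        fields[tag >> 3] = value
    return fields


# Same two fields, written in a different order.
bytes_a = encode_varint_field(1, 150) + encode_varint_field(2, 7)
bytes_b = encode_varint_field(2, 7) + encode_varint_field(1, 150)

print(bytes_a != bytes_b)                                              # different bytes
print(decode_varint_fields(bytes_a) == decode_varint_fields(bytes_b))  # same content
```

Both comparisons print True: the two buffers differ byte for byte, yet they decode to the same field values, which is why comparing serialized bytes can miss duplicates.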

> My idea was to use the serialized bytes as a dict key and the message as the value, so I won't have problems with identical messages.

If you use the serialized bytes as a cache key, you will get false negatives (equivalent messages that fail to match) and might end up with extra entries representing the same message.
The best way might be to build a key yourself from all the fields that matter to you.
For a more generic approach, you could use reflection to do this.
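
A rough sketch of the reflection idea in Python follows. It assumes the messages expose the standard `ListFields()` method and `DESCRIPTOR.full_name` attribute of the Python protobuf API; the function names `freeze`, `message_key`, and `drop_duplicates` are hypothetical, and the type checks are duck-typed guesses about how nested messages, map fields, and repeated fields show up. Unknown fields are ignored, and float edge cases such as NaN are not handled.

```python
def freeze(value):
    """Recursively turn a field value into a hashable form."""
    if hasattr(value, "ListFields"):
        # Nested message: (field number, frozen value) pairs, sorted by number.
        return tuple(sorted(
            ((field.number, freeze(v)) for field, v in value.ListFields()),
            key=lambda item: item[0],
        ))
    if hasattr(value, "items"):
        # Map field: entry order carries no meaning, so use a frozenset.
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (str, bytes, bool, int, float)):
        return value
    # Assumed to be a repeated field: element order is significant.
    return tuple(freeze(v) for v in value)


def message_key(msg):
    """Build a hashable key from the message type name and its set fields."""
    return (msg.DESCRIPTOR.full_name, freeze(msg))


def drop_duplicates(messages):
    """Return the messages with later equivalent copies removed, order kept."""
    seen = set()
    unique = []
    for msg in messages:
        key = message_key(msg)
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```

Calling `drop_duplicates(pending)` on a list of outgoing messages (here `pending` is a placeholder name) would keep the first copy of each. Including the type name in the key keeps messages of different types from colliding even when their field numbers and values happen to match. If only some fields matter for your notion of "duplicate", a hand-written key over just those fields is simpler and likely safer.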
 

> P.S. I am also trying to eliminate the generation of duplicate messages in the first place, but I am not there yet.
>
> Thanks!



Victor Cherviakov

Nov 11, 2024, 12:08:39 PM
to Protocol Buffers
Thanks, Samuel.
Is it also possible for two different messages to serialize to the same byte string? I guess not, but I want to double-check.

Samuel Benzaquen

Nov 11, 2024, 12:26:36 PM
to Victor Cherviakov, Protocol Buffers
On Mon, Nov 11, 2024 at 12:08 PM Victor Cherviakov <vicher...@gmail.com> wrote:
> Thanks, Samuel.
> Is it also possible for two different messages to serialize to the same byte string? I guess not, but I want to double-check.

Two different instances of the same type?
If you have two instances of the same type and they serialize to the same bytes, then they are equivalent.

Or two different message types?
Two different message types can have the same serialized bytes without having the same content.
There is not enough information in the wire format to tell them apart if their message definitions are similar enough.
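
As a small illustration of the second case, consider two hypothetical schemas, `message UserId { int32 id = 1; }` and `message Flag { bool enabled = 1; }`. Both fields use field number 1 and the varint wire type, so the wire bytes carry no hint of which type was meant. The sketch below hand-encodes the bytes with the standard library only; the helper name is invented for the example and it deliberately handles only single-byte tags and values.

```python
def encode_small_varint_field(field_number: int, value: int) -> bytes:
    """Encode a varint field whose tag and value each fit in one byte."""
    if not (0 < field_number < 16 and 0 <= value < 128):
        raise ValueError("sketch only supports field numbers 1-15 and values 0-127")
    tag = field_number << 3  # wire type 0 (varint) in the low three bits
    return bytes([tag, value])


# Hypothetical UserId(id=1): int32 field number 1.
user_id_bytes = encode_small_varint_field(1, 1)

# Hypothetical Flag(enabled=True): bool field number 1, encoded as varint 1.
flag_bytes = encode_small_varint_field(1, int(True))

print(user_id_bytes == flag_bytes)  # the two encodings are byte-for-byte identical
```

A deduplication key built from raw bytes alone would treat these two as the same message, so if messages of several types share one cache, the type should be part of the key.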
 