inconsistency in specification

Marcus Held

May 21, 2014, 4:43:56 AM

to smile-forma...@googlegroups.com

Hi,

I'm currently working with your specification of Smile as part of my bachelor thesis, and I found a possible inconsistency in it.

In chapter "Low-Level-Format" you say: "0xFE is reserved for future use, and not used for anything currently." but some lines later you write: "End-of-String marker byte (0xFE) for variable length Strings.". Is this a mistake?

Regards,
Marcus

Tatu Saloranta

May 21, 2014, 3:35:37 PM

to smile-forma...@googlegroups.com

I will have to double-check this, but yes, that is an inconsistency in the explanation. I think the latter is true, but let me take a closer look to be sure, and then I will update the specification.

-+ Tatu +-

--
You received this message because you are subscribed to the Google Groups "smile-format-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to smile-format-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tatu Saloranta

May 21, 2014, 4:11:30 PM

to smile-forma...@googlegroups.com

OK: the answer is that 0xFE _is_ reserved and is NOT used as an end marker. The latter passage should read "end-of-string marker byte (0xFC)", so I fixed that in the specification.
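To make the corrected rule concrete, here is a minimal sketch in Python of reading a variable-length string that ends with the 0xFC marker. The function name `read_variable_length_string` and the constant names are made up for illustration, and the sketch assumes the type token that introduces the string has already been consumed; the specification remains the authority on the details.

```python
END_OF_STRING = 0xFC  # end-of-string marker for variable-length strings (per the fix above)
RESERVED = 0xFE       # reserved for future use; must not be treated as an end marker


def read_variable_length_string(data, start=0):
    """Return (text, next_offset) for string bytes ending at the first 0xFC.

    Hypothetical helper: assumes `start` points just past the string's type token.
    """
    end = data.find(bytes([END_OF_STRING]), start)
    if end < 0:
        raise ValueError("missing end-of-string marker 0xFC")
    # Valid UTF-8 never contains bytes 0xF8-0xFF, so 0xFC cannot occur inside the text.
    return data[start:end].decode("utf-8"), end + 1


text, next_offset = read_variable_length_string(b"smile\xfc\x21")
print(text, next_offset)
```

A buffer that ends in 0xFE instead of 0xFC is rejected by this sketch, which matches the clarification that 0xFE is not an end marker.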

Thank you for reporting this inconsistency. It is crucial to have an unambiguous specification, and not just a solid reference implementation, so that Smile codecs can be implemented with a high level of interoperability.

Also, I would be interested in learning more about your thesis, if you feel you can share some information. And if not yet at this point, perhaps you can share more info at a later point (either on list, or via direct email).

-+ Tatu +-

Marcus Held

May 22, 2014, 5:47:06 AM

to smile-forma...@googlegroups.com

Thank you for the quick responses. As I worked further through the specification, I also noticed that the description of the structure markers in value mode gives the end-of-string marker as 0xFC.

I can tell you some details about my thesis. I currently work at a German company that produces online games, where the server-client communication is based on Base64-encoded JSON, which produces a lot of unnecessary overhead. As part of my thesis I am evaluating new binary serialization formats for wrapping the existing JSON-based communication. Smile is one of the formats I will evaluate in our test case. If you (or someone on your team) can read German, I could send you the related parts of my thesis; maybe you will learn something new from it (and of course I could get perfect feedback on my work ;-) ).
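The Base64 overhead mentioned above can be estimated with a short Python sketch. The message contents are invented for illustration and are not taken from the game in question; the point is only that Base64 emits 4 output bytes for every 3 input bytes, so wrapping a whole JSON document adds roughly a third to its size.

```python
import base64
import json

# Made-up payload, standing in for a real client-server message.
message = {"player": 42, "action": "move", "x": 10, "y": 20}

raw = json.dumps(message, separators=(",", ":")).encode("utf-8")
wrapped = base64.b64encode(raw)

# Base64 output length: 4 bytes per 3 input bytes, rounded up to a multiple of 4.
expected = 4 * ((len(raw) + 2) // 3)
overhead = len(wrapped) / len(raw) - 1.0

print(len(raw), len(wrapped), expected, f"{overhead:.0%}")
```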

Regards,
Marcus

Tatu Saloranta

May 22, 2014, 1:48:13 PM

to smile-forma...@googlegroups.com

On Thu, May 22, 2014 at 2:47 AM, Marcus Held <marcu...@gmail.com> wrote:

> Thank you for the quick responses. As I worked further through the specification, I also noticed that the description of the structure markers in value mode gives the end-of-string marker as 0xFC.

Ok.

> I can tell you some details about my thesis. I currently work at a German company that produces online games, where the server-client communication is based on Base64-encoded JSON, which produces a lot of unnecessary overhead. As part of my thesis I am evaluating new binary serialization formats for wrapping the existing JSON-based communication.

Yes, that would add overhead, especially if it is done for the whole JSON document and not just for binary data (there is overhead for binary data too, of course, but usually in fewer blocks).

> Smile is one of the formats I will evaluate in our test case. If you (or someone on your team) can read German, I could send you the related parts of my thesis; maybe you will learn something new from it (and of course I could get perfect feedback on my work ;-) ).

My German is pretty limited (unfortunately), but I have Teutonic friends who should be able to translate.

-+ Tatu +-

Marcus Held

Jun 2, 2014, 4:38:31 AM

to smile-forma...@googlegroups.com

Hi,

Unfortunately I had to work on some other issues last week, but I resumed working through the Smile specification today :-) When I have finished that part, I'll send you the draft.

I may also have found another typo. In the section "Tokens: key mode" you write "Byte ranges are divides in 4 main sections (64 byte values each)"; shouldn't it be bit and not byte? Also, the wiki is unreachable at the moment (but I saved the relevant part, so no problem for me ;-) ).

Regards,
Marcus

Tatu Saloranta

Jun 2, 2014, 12:36:50 PM

to smile-forma...@googlegroups.com

On Mon, Jun 2, 2014 at 1:38 AM, Marcus Held <marcu...@gmail.com> wrote:

> Hi,
>
> Unfortunately I had to work on some other issues last week, but I resumed working through the Smile specification today :-) When I have finished that part, I'll send you the draft.
>
> I may also have found another typo. In the section "Tokens: key mode" you write "Byte ranges are divides in 4 main sections (64 byte values each)"; shouldn't it be bit and not byte?

I'll have to re-read it, but I think that refers to 4 groups of 64 distinct byte values each (out of the 256 distinct values an individual byte can have).

> Also, the wiki is unreachable at the moment (but I saved the relevant part, so no problem for me ;-) ).

I got another report about this; I will try to find out why the hosting is unavailable. Thanks!
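On the byte-range question, the reading given above (4 groups of 64 byte values out of 256) can be illustrated with a small Python sketch. The helper name `byte_section` is hypothetical, and the sketch says nothing about what each section means in key mode; it only shows that the top two bits of a byte split the 256 possible values into four equal ranges.

```python
def byte_section(byte_value):
    """Return 0..3: which quarter of the 0x00-0xFF range the byte falls in."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("not a single byte value")
    return byte_value >> 6  # the top two bits select one of four 64-value sections


sections = {}
for b in range(256):
    sections.setdefault(byte_section(b), []).append(b)

# Print the first value, last value, and size of each section.
for index, values in sorted(sections.items()):
    print(index, hex(values[0]), hex(values[-1]), len(values))
```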

-+ Tatu +-

Marcus Held

Jun 3, 2014, 5:19:11 AM

to smile-forma...@googlegroups.com

Hi,

I've got another question at this point: you note in the appendix that the last byte in the ZigZag encoding uses only 6 data bits, because the second-highest bit must be zero to avoid 0xFF. But with the VInt encoding from protobuf (link), the first bit of the last byte is zero anyway. So why is it necessary to set the second-highest bit to zero?

Regards,
Marcus

Tatu Saloranta

Jun 6, 2014, 9:29:23 PM

to smile-forma...@googlegroups.com

Sorry, looks like I forgot to respond to this one.

Good question!

The reason for keeping the second-most-significant bit zero is to ensure that the highest byte values (particularly 0xFC - 0xFF) are not used for encoding, except where raw binary values are directly embedded. For the other bytes, where the MSB is zero, things work as is; but for the last byte, which has the MSB set, we just keep the second bit zero.

So it's all about preserving uniqueness of marker bytes.
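The bit arithmetic behind this can be checked with a few lines of Python. This is only a sketch: `last_byte` is a hypothetical helper, and `MARKER_BYTES` lists just the four values named explicitly above (0xFC - 0xFF). It compares a final byte carrying 6 data bits with a counterfactual final byte carrying 7.

```python
MARKER_BYTES = {0xFC, 0xFD, 0xFE, 0xFF}  # the byte values named in the explanation above


def last_byte(payload, data_bits):
    """Build a final byte: MSB set, followed by the low `data_bits` bits of payload."""
    mask = (1 << data_bits) - 1
    return 0x80 | (payload & mask)


six_bit = {last_byte(p, 6) for p in range(64)}     # second-highest bit is always zero
seven_bit = {last_byte(p, 7) for p in range(128)}  # counterfactual: all 7 low bits used

print(hex(min(six_bit)), hex(max(six_bit)))      # range reachable with 6 data bits
print(sorted(hex(b) for b in seven_bit & MARKER_BYTES))  # collisions with 7 data bits
```

With 6 data bits the final byte stays within 0x80 - 0xBF, so it can never be mistaken for one of the marker bytes; with 7 data bits all four marker values become reachable.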

-+ Tatu +-

Marcus Held

Jun 10, 2014, 5:03:16 AM

to smile-forma...@googlegroups.com

So you've negated the specification and use 0 instead of 1 as the indicator?

~ Marcus

Tatu Saloranta

Jun 10, 2014, 11:13:01 AM

to smile-forma...@googlegroups.com

Not sure what you mean by negating the specification here?

-+ Tatu +-

Marcus Held

Jun 10, 2014, 5:37:31 PM

to smile-forma...@googlegroups.com

Well, in the protobuf specification (https://developers.google.com/protocol-buffers/docs/encoding#varints) they use 1 in the MSB to indicate that another byte follows. But from what you said, Smile must use 0 instead of 1 for that indicator. Otherwise there would be no problem with the last byte, because its MSB would be 0 anyway. (But you would have the problem you mentioned in the preceding bytes.)
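For comparison, the protobuf convention referred to here can be sketched in Python as follows. The function name is illustrative only, and the sketch handles non-negative values; it shows that a set MSB means "more bytes follow", so continuation bytes can take any value up to 0xFF.

```python
def protobuf_varint_encode(value):
    """Encode a non-negative int as a protobuf-style varint (7-bit groups, low group first)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(0x80 | low7)  # MSB = 1: more bytes follow
        else:
            out.append(low7)         # MSB = 0: this is the last byte
            return bytes(out)


print(protobuf_varint_encode(300).hex())     # ac02
print(protobuf_varint_encode(0x3FFF).hex())  # ff7f: a continuation byte equal to 0xFF
```

The second call produces a leading 0xFF byte, which is exactly the kind of value that would collide with a marker byte if this layout were used unchanged.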

~Marcus

Tatu Saloranta

Jun 10, 2014, 9:48:35 PM

to smile-forma...@googlegroups.com

That is just the protobuf spec, so the link is provided more for convenience. But yes, Smile uses a set sign bit (MSB) as the indicator of the last byte, and not as an indicator that more bytes follow.
This has to be done for efficiency: the other way around would allow only 6 bits in each of the first bytes and 7 in the last, whereas leaving the sign bit clear gives 7 bits in each of the first bytes and 6 in the last.
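The layout described here can be sketched in Python as shown below. This is an illustration based on the description in this thread, not the reference implementation: the function names are hypothetical, the type token that precedes the number is omitted, the byte order (most significant group first) is an assumption, and the ZigZag step assumes 32-bit values.

```python
def zigzag_encode(n):
    """Map a signed 32-bit int to an unsigned one (small magnitudes stay small)."""
    return (n << 1) ^ (n >> 31)


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def smile_vint_encode(value):
    """Encode a non-negative int: 7-bit bytes with MSB clear, then a final 6-bit byte with MSB set."""
    if value < 0:
        raise ValueError("apply zigzag_encode first for signed values")
    out = [0x80 | (value & 0x3F)]  # last byte: MSB set, second-highest bit zero
    value >>= 6
    while value > 0:
        out.append(value & 0x7F)   # earlier bytes: MSB clear, 7 data bits
        value >>= 7
    return bytes(reversed(out))    # assumed order: most significant group first


def smile_vint_decode(data, offset=0):
    """Return (value, next_offset); stops at the first byte with the MSB set."""
    value = 0
    while True:
        b = data[offset]
        offset += 1
        if b & 0x80:
            return (value << 6) | (b & 0x3F), offset
        value = (value << 7) | b


encoded = smile_vint_encode(zigzag_encode(-1234))
decoded, _ = smile_vint_decode(encoded)
print(encoded.hex(), zigzag_decode(decoded))
```

In this layout every byte except the last is at most 0x7F and the last is at most 0xBF, so no encoded number can contain a marker byte, while each leading byte still carries a full 7 bits.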

-+ Tatu +-
