binary and compression


sean d'epagnier

Oct 5, 2015, 12:03:09 PM10/5/15
to Signal K
I am interested in Signal K, but every time I look at the messages I can't believe how many bytes are needed for something trivial.

My initial observations:
 
  • There are too many timestamps
  • Need to support a binary scheme to minimize data throughput
  • Should also support compression to reduce it more

Teppo Kurki

Oct 5, 2015, 1:58:51 PM10/5/15
to signalk
For streaming I admit the timestamp may be overkill, and we could add an option to suppress sending it.

If you think about use cases like periodically updating your information on a cloud service and making that available to others I think timestamps are warranted. Would you prefer a single timestamp? The data model needs to accommodate timestamps for individual data items as well.

The delta format would probably be pretty efficiently coded with msgpack or some similar technique. Would you feel like trying something like that? 

To me a binary scheme and compression are mostly the same thing.

IMHO the real beef in Signal K is the shared data model and keys, not the JSON encoding. A delta message reads logically as "here are some updates about context X, here's stuff from source Y, here's the timestamp and here are the data values as key-value pairs". Nothing earth-shattering there; the value is in the shared keys.
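The delta structure described here can be sketched concretely. This is an illustrative example only; the paths, source label, and values are made up for demonstration, not normative Signal K keys:

```python
import json

# A hypothetical delta message following the structure described above:
# a context, then updates each carrying a source, a timestamp and values.
delta = {
    "context": "vessels.self",
    "updates": [
        {
            "source": {"label": "gps0"},
            "timestamp": "2015-10-05T12:03:09Z",
            "values": [
                {"path": "navigation.speedOverGround", "value": 3.85},
                {"path": "navigation.courseOverGroundTrue", "value": 1.52},
            ],
        }
    ],
}

# Compact encoding, no whitespace between tokens.
encoded = json.dumps(delta, separators=(",", ":"))
print(len(encoded), "bytes for two data points")
```

Even two data points cost a couple of hundred bytes here, which is the overhead being debated in this thread.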


Jeffrey Siegel

Oct 5, 2015, 5:27:59 PM10/5/15
to Signal K
Just to balance that, I don't think there should be any binary representation.  It's 2015, and time, space, and background I/O make the need to shorten transmissions non-existent.  gzip encoding can be automatic - nothing else should be done.

It took me 3 hours to interface 4 apps to Signal-K for lat, lon, cog, sog, and heading.  2 hours of it was user-interface work for selecting the server, 45 minutes was reading the documentation and finding a demo server, and 15 minutes was the actual interface coding.  That's how it should be.  That type of experience makes me look forward to adding more support - exactly the impression every developer should walk away with.

A delta format with the ability to remove unnecessary items (timestamps) is a fine idea.  Keep this all object-oriented and self-documenting.  Save binary encoding for Wikipedia articles showing how things used to be done before 2012.
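As a quick illustration of how far automatic gzip encoding goes on a repetitive delta stream (the message shape and paths are made up; exact ratios will vary with content):

```python
import gzip
import json

# Simulate a stream of similar delta messages. The keys repeat on every
# message, so a general-purpose compressor should do well on them.
messages = [
    json.dumps({
        "context": "vessels.self",
        "updates": [{
            "timestamp": f"2015-10-05T12:03:{i % 60:02d}Z",
            "values": [{"path": "navigation.speedOverGround",
                        "value": 3.8 + i * 0.01}],
        }],
    })
    for i in range(100)
]
raw = "\n".join(messages).encode()
packed = gzip.compress(raw)
print(f"{len(raw)} bytes raw -> {len(packed)} bytes gzipped")
```

On a stream like this the repeated key names compress away almost entirely, which is the basis of the "gzip can be automatic" argument.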

Keith Young

Oct 6, 2015, 1:48:44 PM10/6/15
to Signal K


I am interested in Signal K, but every time I look at the messages I can't believe how many bytes are needed for something trivial.

I may have got the wrong end of the stick but I've been thinking a bit about this...

Lack of economy of resources, both in terms of Signal K's communications format and its server implementations, initially turned me off Signal K, but having thought about it I realise the issue was mainly my expectations.  If you want Signal K to be a compact, distributed, bus-based data communications protocol for implementation on networks of low-power, resource-limited devices (a drop-in NMEA replacement), you may be disappointed. The Signal K developers will doubtless disagree, but I'd say this is not the protocol droid you're looking for.

It's a different animal entirely. For the use case of making boat-related data available to applications developed in high-level languages on devices running modern operating systems with ample resources, over relatively high-capacity networks (802.11, Ethernet), I (now) believe it is superior to what I was originally thinking of.  That's the use case most people seem to want Signal K for.  As Jeff is saying, resource use is not an issue in the common case.  Ease of development (which will facilitate adoption) is.

Whilst implementing a JSON parser and an HTTP stack with tightly limited resources is definitely not my idea of fun, working with data over that type of presentation/session layer is trivial for app developers working in high-level languages like Java and JavaScript who aren't afraid of the additional resources involved in importing the appropriate packages.

Part of the verbosity (including timestamps) is a consequence of what is being communicated: it's not discrete data points so much as parts of a centrally held data model (as Teppo points out).  Whereas we assume that environmental data communicated by more traditional means has a timestamp of "now", this may not be the case for the parts of the data model Signal K is communicating, which will probably have been received some (short?) time previously from another source.  Thus clients may be interested in how fresh the datum of interest is.  Moreover, the de facto standard Signal K transports are "reliable": if the network drops out for a couple of minutes and then comes back, old data will be pushed to the client, so it would be good for the client to be aware of the data's age.

Not sure that I agree that a binary representation and compression are the same thing. time_t is 32 bits if we conveniently ignore the 2038 problem, 64 if we're being good.  Would compressing the Signal K ASCII time representation match that? Not that I'm suggesting Unix time is necessarily The Right Way (although the NMEA seem to like it), given the leap second thing.
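The timestamp-size question above is easy to check concretely. A sketch comparing a typical Signal K timestamp string, fixed-size binary epoch seconds, and the compressed string (the specific instant is just an example):

```python
import struct
import zlib

iso = "2015-10-06T13:48:44Z"            # typical Signal K timestamp string
unix32 = struct.pack("<I", 1444139324)  # same instant as a 32-bit time_t
unix64 = struct.pack("<Q", 1444139324)  # 64-bit variant, 2038-safe

print(len(iso), len(unix32), len(unix64))  # 20 vs 4 vs 8 bytes

# Compressing a single short string adds header overhead, so it ends up
# larger than the string itself, let alone the fixed-size binary form.
print(len(zlib.compress(iso.encode())))
```

So compression of one timestamp in isolation cannot match a binary encoding; compression only pays off across a stream of repetitive messages.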

I'm not sure compression is worthwhile.  It requires computing resources, so it may be counter-productive on low-powered devices; it complicates the protocol; and those low-bandwidth links where compression is necessary may have their own compression. If you accept that it isn't a compact protocol and there are no message size constraints, why bother?

Maybe communication of sensor updates in low-power, resource-limited networks should be a different, compact, non-boat-specific binary protocol which could be a plugin to the back end of a Signal K server.  That would probably also be good for the Signal K/NMEA relationship, separating that which complements the NMEA's products (i.e. Signal K) from that which competes with them (an open IoT protocol).

Fabian Tollenaar | Starting Point

Oct 7, 2015, 2:36:16 AM10/7/15
to sig...@googlegroups.com
Keith, Jeffrey; well said. SK is meant to be a modern, easy to use way of communicating for modern and fast networks and devices. Servers have the concept of "Producers", which essentially are plug-ins that transform any type of incoming data to SK - such as NMEA, Protocol buffers, etc. 

IMO if you really need a binary protocol somewhere in the stack or between nodes, use something existing (for instance Thrift) to transport parts of the data. On the other end you can simply feed it back into an SK server, which makes it readable again and makes the data available to (local) clients with ample resources (e.g. smartphones).



Keith Young

Oct 7, 2015, 3:13:36 AM10/7/15
to Signal K
If anyone has any meat on Thread beyond the public blurb (or pointers thereto), I'd be very interested to hear it. Of course what the world doesn't need is another closed protocol controlled by an industry association, but what I know of Thread so far makes me interested in seeing what marine applications it might be put to as a complement to Signal K.

Ric Morris

Oct 7, 2015, 6:52:21 AM10/7/15
to Signal K
I came to the same conclusion, Keith. As it stands, Signal K is not an end-to-end protocol for marine electronics, but it is what it is and worth supporting where appropriate (from the router outward), I think.

Wooba Gooba

Oct 7, 2015, 5:15:06 PM10/7/15
to Signal K
Thread is an IPv6 stack optimized for 802.15.4 mesh networks.  100 kbit links (at best), per hop latency, etc.  Based on the vibe / commentary above (Signal K is not tailored for resource constrained IoT devices and links), Signal K will be ill suited to run atop Thread. Or LoRa, etc.

As you are JSON encoded, CBOR would be a logical step for a more efficient payload encoding (if needed).

Re: "Save binary encoding for Wikipedia articles showing how things used to be done before 2012."  Careful there.  IoT is bringing on mass quantities of resource constrained sensors and actuators.  Open Interconnect Consortium, Thread, etc are all adopting binary encoded app protocols and payload encodings (CoAP, CBOR, etc).
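To make the CBOR suggestion concrete, here is a hand-rolled sketch of the CBOR wire format (RFC 7049) for a single key-value pair. This is not a real encoder: it supports only short keys and small unsigned integers, and the key name is illustrative:

```python
import json
import struct

def cbor_map1(key: str, value: int) -> bytes:
    """Hand-rolled CBOR (RFC 7049) for a one-pair map {key: uint}.
    Supports only keys under 24 bytes and values under 65536 -- a sketch,
    not a library."""
    k = key.encode()
    assert len(k) < 24 and 0 <= value < 65536
    out = bytes([0xA1,              # map with 1 pair (major type 5)
                 0x60 | len(k)]) + k  # short text string (major type 3)
    if value < 24:
        out += bytes([value])         # tiny uint, encoded inline
    elif value < 256:
        out += bytes([0x18, value])   # uint8
    else:
        out += b"\x19" + struct.pack(">H", value)  # uint16, big-endian
    return out

encoded = cbor_map1("rpm", 1800)
print(len(encoded), "bytes CBOR vs",
      len(json.dumps({"rpm": 1800})), "bytes JSON")
```

The saving here comes from the compact integer and string headers; for floating-point values CBOR's fixed-width float64 can actually be larger than a short decimal string, so the benefit depends on the data.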



kees

Oct 8, 2015, 8:46:35 AM10/8/15
to Signal K


On Wednesday, October 7, 2015 at 11:15:06 PM UTC+2, Wooba Gooba wrote:
Thread is an IPv6 stack optimized for 802.15.4 mesh networks.  100 kbit links (at best), per hop latency, etc.  Based on the vibe / commentary above (Signal K is not tailored for resource constrained IoT devices and links), Signal K will be ill suited to run atop Thread. Or LoRa, etc.

As you are JSON encoded, CBOR would be a logical step for a more efficient payload encoding (if needed).

For now Signal K does not target (very) resource-constrained devices, as there are already standards for these (NMEA 0183 and 2000) in the marine environment. No need to re-invent those (yet). This is open source though -- if you think there is a market, go ahead and plug a binary representation of Signal K into a server.
Note that a simple transducer will not need to parse Signal K, just generate it, and this should not be more difficult than generating NMEA 0183 or 2000.
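The generate-only point can be illustrated in a few lines: a transducer only needs string assembly, no JSON parser. The path and message shape below are illustrative, not a normative minimal delta:

```python
# A transducer only needs to *generate* a delta, which is plain string
# assembly -- feasible even on a device too small to hold a JSON parser.
def emit_update(path: str, value: float) -> str:
    return ('{"updates":[{"values":[{"path":"%s","value":%.3f}]}]}'
            % (path, value))

msg = emit_update("environment.depth.belowTransducer", 4.2)
print(msg)
```

This is roughly the same amount of work as formatting an NMEA 0183 sentence, which is the comparison being made above.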
 

Wooba Gooba

Oct 8, 2015, 10:59:51 AM10/8/15
to Signal K


On Thursday, October 8, 2015 at 8:46:35 AM UTC-4, kees wrote:


On Wednesday, October 7, 2015 at 11:15:06 PM UTC+2, Wooba Gooba wrote:
Thread is an IPv6 stack optimized for 802.15.4 mesh networks.  100 kbit links (at best), per hop latency, etc.  Based on the vibe / commentary above (Signal K is not tailored for resource constrained IoT devices and links), Signal K will be ill suited to run atop Thread. Or LoRa, etc.

As you are JSON encoded, CBOR would be a logical step for a more efficient payload encoding (if needed).

For now Signal K does not target (very) resource constrained devices, as there are already standards for these (NMEA 0183 and 2000) in the marine environment. No need to re-invent those (yet). This is open source though -- if you think there is a market,


I have no opinion re: market.  I was answering a technical question.  If anyone is considering a more efficient encoding, take a look at CBOR.

 
go ahead and plug a particular representation of Signal K in a binary format.
Note that a simple transducer will not need to parse Signal K, just generate it, and this should not be more difficult than generating NMEA 0183 or 2000.

Many, if not most, of the sensors on my boat require calibration, configuration, firmware updates, etc.  They are often not output-only devices.
 
 

sean d'epagnier

Oct 9, 2015, 3:34:51 AM10/9/15
to Signal K
Thank you all for the replies.  I realize you may now consider bandwidth and CPU usage less important than making this protocol flexible and easy to use, which may be true, but I still think we can achieve smaller size and less CPU to parse without sacrificing anything.  After all, if it is smaller and faster, it will actually work better in more scenarios.

For example, if I want to push gyroscope data for 27 inertial sensors as well as multiple wind direction and pressure sensors at 50 Hz, I can begin to use a significant amount of CPU just for the parsing on a slow ARM processor.  The current scheme will saturate a 2-3 Mbit link for all of this.  So I am already limited to wifi and Ethernet, and even then it will become a problem if there are several boats.
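The bandwidth figure is easy to sanity-check. The ~200-byte per-update message size below is an assumption (a one-value delta with full context, source and timestamp is typically in that range), as is rounding the sensor count to 30:

```python
# Back-of-envelope check of the saturation claim above.
sensors = 30       # ~27 inertial plus a few wind and pressure sensors
rate_hz = 50       # update rate per sensor
msg_bytes = 200    # assumed per-update JSON message size

bits_per_s = sensors * rate_hz * msg_bytes * 8
print(f"{bits_per_s / 1e6:.1f} Mbit/s")
```

That lands at 2.4 Mbit/s, consistent with the stated 2-3 Mbit figure.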

I realize your typical scenarios contain a single GPS with a fast multicore laptop communicating over the local network interface.  The above may seem far-fetched, but it is actually realistic, and much worse cases do exist.

Initially, I would like to consider a very basic restructuring of the message.

Essentially, you pull the timestamp, source and any other fields up a level, so that they need not be retransmitted multiple times when their values are not changing.

For timestamps it would be nice to be able to also send just a delta from the previous one.
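The hoisting idea can be demonstrated directly. The schema and paths below are illustrative, not a proposed standard; the point is only the byte count of repeating shared fields versus stating them once:

```python
import json

# Ten values, with timestamp and source repeated per update (current style)
repeated = {"updates": [
    {"timestamp": "2015-10-09T03:34:51Z", "source": "imu0",
     "values": [{"path": f"sensors.imu{i}.yaw", "value": 0.1 * i}]}
    for i in range(10)
]}

# Same ten values, with the shared fields pulled up one level
hoisted = {"timestamp": "2015-10-09T03:34:51Z", "source": "imu0",
           "values": [{"path": f"sensors.imu{i}.yaw", "value": 0.1 * i}
                      for i in range(10)]}

a = len(json.dumps(repeated, separators=(",", ":")))
b = len(json.dumps(hoisted, separators=(",", ":")))
print(a, "->", b, "bytes")
```

The saving grows with the number of values sharing the same timestamp and source.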

Compression with gzip would be easy to implement, but perhaps xz with a custom dictionary would be a bit more efficient.  In any case, most of these compressors are not efficient for streaming data; a custom algorithm would work best (essentially converging Signal K into a binary format).  This may emerge if it proves useful, but it should remain non-standard, so it could be a completely separate library and not be affiliated directly with Signal K.

Keith Young

Oct 9, 2015, 6:11:54 AM10/9/15
to Signal K
Apologies for the earlier mention of Thread: it was off-topic for this group, being something I wanted to look at for marine sensors and boat control systems as layers 1-4 under a binary application-layer protocol (not Signal K, but an NMEA alternative). Obviously I'm aware of the blurb available in the white papers; I just wondered if anyone doing "IoT" had more info.

Back on topic...


Essentially, you pull the timestamp, source and any other fields up a level, so that they need not be retransmitted multiple times when their values are not changing.

For timestamps it would be nice to be able to also send just a delta from the previous one.

To probably just re-state what you just said in a different way: can economy be achieved through assumption of defaults in the server? Apologies in advance if I've missed discussions which have already raised these issues, but I can't find reference to them in the documentation.

Initially I typed up this post about how the server could take updates from sensors on the network in a more compact form by assuming defaults which then didn't need to be stated in the update message.  Then I realised that to avoid "double processing" of updates by clients (once from the sensor, once when the server transmitted its update to the data model) you'd need clients to ignore the "compact" form.  Moreover, it's probably desirable to treat updates not coming from the SignalK server differently: they are actually a different case entirely.

Then pretty much I realised that the easiest and least disruptive way to address this is just with a backend converter (as per the nmea-0183 one).  Rather than "updates" we have an array of "sensor" (or similar).  The converter can then assume defaults which don't need to be stated explicitly in the message:

  • Timestamp: not needed in this context. If no timestamp is given, "now" is assumed, as it would be for an NMEA update.
  • Source: the converter derives a source for entry into the data model from the sensor's network address, e.g. "net.fe80::1234:5678:9abc:deff". This avoids the need for sensors to give themselves a "source" (beyond their assigned network address). This scheme is not without issues, but it's a thought...
  • Context: it's all going to be under vessels.self, so we can omit that. That also partially overcomes (what I see as) the limitation that a sensor needs to know it's on a boat to speak the format natively. Beyond the default assumption of "vessels.self" there can be specific defaults for known unqualified paths, e.g. an update with a "path" of "pitch", "yaw" etc. gets assigned a context of "vessels.self.navigation", and one of "wind.*" gets assigned a context of "vessels.self.environment". A little more processing on the server (in the converter). That could be configurable (is there a case for a "config" node as implemented by many LDAP servers?).
  • ...and of course "values" is implicit, so no need to make it explicit in the message.

So we have a sensor update which is pretty much just an array of (possibly unqualified) "path" and "value" pairs with very little else, directly mapping onto the data model, going via the server like every other provider to avoid confusion, and requiring no changes to existing software other than the addition of a new converter.
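A back-end converter of this kind could be sketched as follows. The prefix-to-context table, function names, and the fallback rules are all illustrative assumptions, not part of any Signal K specification:

```python
from datetime import datetime, timezone

# Illustrative default rules: unqualified "wind.*" paths get an
# environment context, everything else falls back to vessels.self.
CONTEXT_BY_PREFIX = {"wind.": "vessels.self.environment"}

def expand(update: dict, sender_addr: str) -> dict:
    """Expand a compact sensor update into a full delta by filling in
    assumed defaults (timestamp = now, source = sender's address)."""
    path = update["path"]
    context = next((ctx for prefix, ctx in CONTEXT_BY_PREFIX.items()
                    if path.startswith(prefix)), "vessels.self")
    return {
        "context": context,
        "source": update.get("source", f"net.{sender_addr}"),
        "timestamp": update.get("timestamp",
                                datetime.now(timezone.utc).isoformat()),
        "values": [{"path": path, "value": update["value"]}],
    }

full = expand({"path": "wind.speedApparent", "value": 7.2},
              "fe80::1234:5678:9abc:deff")
print(full["context"], full["source"])
```

The sensor's message carries only path and value; everything else is reconstructed at the server, which is the economy being proposed.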

Final note on timestamps: I note that in the documentation on the web site the format used varies slightly (some have a "Z" timezone stated, others don't) and I can't find (probably just not looking hard enough) a statement of the "correct" one.  Is there a case for making the timestamp the number of seconds or milliseconds since an epoch, as the NMEA have done with TAG timestamps (from publicly available presentations, I stress, before the NMEA's legal hounds are set on me)?  Fewer characters, marginally easier to process.  Personally I don't like the NMEA's use of Unix time as it doesn't cope with leap seconds, but you could use their Bernstein-favoured TAI-10 scheme.  Just a thought in case the format is heavily revised at any point.

kees

Oct 9, 2015, 6:27:13 AM10/9/15
to Signal K


On Friday, October 9, 2015 at 9:34:51 AM UTC+2, sean d'epagnier wrote:
Initially, I would like to consider a very basic but simple restructure in the message.

Essentially, you pull the timestamp, source and any other fields up a level, so that they need not be retransmitted multiple times when their values are not changing.

That is a good suggestion, and I think a very sane one. In fact, the standard should probably state something like "servers SHOULD send attributes at the highest point in the tree where they apply for all items in that tree" and "clients MUST accept attributes at any level and apply them to any items underneath that item". That way a stupid simple implementation can just send them for all data points and a smarter sender can coalesce.

Also, timestamp is fully optional. When data is flowing directly from a sensor, it makes little sense to send it out, certainly over slow links.
 

For timestamps it would be nice to be able to also send just a delta from the previous one.

Not so sure this would be a good idea. Let's keep the format simple for now.
 

Compression with gz would be easy to implement.. but perhaps xz with a custom dictionary would be a bit more efficient.  In any case, most of these compressions are not efficient for streaming data, but a custom algorithm would work best (essentially converging signalk into binary format)  This may emerge if it proves useful, but should remain non standard, so could be a completely separate library and not be affiliated directly with signalk.

Something like this could be done in a sub-profile -- but let's focus on the big one of HTTP(S)/WS(S) first.
 

kees

Oct 9, 2015, 7:35:46 AM10/9/15
to Signal K


On Friday, October 9, 2015 at 12:11:54 PM UTC+2, Keith Young wrote:
Final note on timestamps: I note that in the documentation on the web site the format used varies slightly (some have a "Z" timezone stated, others don't) and I can't find (probably just not looking hard enough) the statement of the "correct" one.  Is there a case for making the timestamp the number of seconds or milliseconds since an epoch as the NMEA have done with TAG timestamps (from publicly available presentations I stress before the NMEA's legal hounds are set on me)?  Fewer characters, marginally easier to process.  Personally I don't like the NMEA's use of UNIX time as it doesn't cope with leap seconds but you could use there Bernstein-favoured TAI-10 scheme.  Just an thought in case the format is heavily revised at any point.

We should probably define all timestamps to be according to RFC 3339: https://tools.ietf.org/html/rfc3339.
Note that even that allows multiple timezones, and it would be a good idea if we state that the preferred format is Z (UTC).
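Producing the preferred Z-form timestamp is a one-liner in most languages; a sketch (the millisecond precision shown is a choice, not a requirement of RFC 3339):

```python
from datetime import datetime, timezone

def rfc3339_utc(dt: datetime) -> str:
    """Format an aware datetime as RFC 3339 in UTC with the Z suffix,
    truncated to millisecond precision."""
    return (dt.astimezone(timezone.utc)
              .strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z")

ts = rfc3339_utc(datetime(2015, 10, 9, 7, 35, 46, tzinfo=timezone.utc))
print(ts)  # 2015-10-09T07:35:46.000Z
```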

Theoretically I like TAI, it's just too hard in practice ;-)

Keith Young

Oct 9, 2015, 8:03:36 AM10/9/15
to Signal K
Note: the stuff in my previous post about duplicate updates was spurious. Obviously Signal K involves unicast subscriptions for updates rather than multicast/broadcast, so the multiple-updaters issue doesn't arise for clients. The suggestion of a trimmed format for sensor updates to a back-end converter in the Signal K server stands, though.

Teppo Kurki

Oct 9, 2015, 9:38:24 AM10/9/15
to signalk
On Fri, Oct 9, 2015 at 1:11 PM, Keith Young <strip...@gmail.com> wrote:

To probably just re-state what you just said a different way...can economy be achieved through assumption of defaults in the server? 

rob...@42.co.nz

Oct 9, 2015, 7:58:53 PM10/9/15
to Signal K
Hi Sean,

I did some experimentation with Signal K from an Arduino (16 MHz, 8-bit CPU, 8K RAM), and you have to use some tricks. Obviously it's easy to get a Signal K message that's bigger than 8K, so reading into RAM doesn't work. I solved this by reading the stream, parsing key by key, so only one key was in RAM at a time. But since I didn't want to hold long key names in RAM, I hashed the full Signal K path/key name to 32 bits. It turns out they are all unique (currently), so any Signal K path/key can be represented by a 32-bit hash. You could send hash=value as a smaller representation.
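Rob doesn't say which hash function he used; CRC32 stands in here as an illustrative 32-bit hash, and the paths below are a small sample rather than the full Signal K key set he checked:

```python
import zlib

# Sample paths only -- the claim above is about the full key set.
paths = [
    "navigation.speedOverGround",
    "navigation.courseOverGroundTrue",
    "environment.wind.speedApparent",
    "environment.depth.belowTransducer",
]

# 32-bit hash of each full path; a set detects collisions in the sample.
hashes = {zlib.crc32(p.encode()) & 0xFFFFFFFF for p in paths}
assert len(hashes) == len(paths)  # no collisions among these paths

for p in paths:
    print(f"{zlib.crc32(p.encode()) & 0xFFFFFFFF:08x}  {p}")
```

Each hash=value pair would then cost 4 bytes for the key instead of a 30-odd-character path string, at the price of needing the hash table on both ends.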

I only use the hash internally in my Arduino code for now. We probably won't investigate a compact binary format until after we reach Signal K v1 and the major use cases are established and in wide use. But we can look at it again then; meanwhile, if you have a proposal, please feel free to contribute.

Rob