Yahoo Groups archive

AVR-Chat

Message

Data compression for embedded controllers

2016-12-02 by Chuck Hackett

I have an application that sends large quantities of endpoint status
messages over a CAN data bus.
 
By CAN bus definition, max data bytes per message is 8.
 
In my application all messages must be self-contained: I cannot create a
"large message" layer above the CAN bus layer in the protocol, and no state
can be carried over to a future message.
 
The endpoint IDs are 16 bits but tend to be "bunched".  There are 8 inputs
on each controller and the inputs are usually (but not always) numbered in
order.
 
Currently the controllers send the status of each input as a separate
message consisting of Input ID and Status <on/off/unknown>.  This is
currently resulting in about 50 status messages per second on the bus.
Inputs are sent upon input change and at a configurable "retransmit interval"
to account for the fact that the CAN bus is not "guaranteed delivery" (for
other reasons I cannot add a higher level "Ack" that would address this).
 
99% of the status messages carry a status of "Off".
 
The compression algorithm will be used when the controller is sending the
status of its inputs to the CAN bus.
 
The same algorithm will be used by "Message Bridges" to aggregate the status
of all inputs received (currently about 100) into as few 8-byte messages as
possible that are forwarded onto a second CAN bus.  The Message Bridges
employ a second algorithm that determines when the input status information
must be forwarded.  This also creates "Bursts" of input status items needing
to be sent.
 
The simplest method I can think of is to use a separate CAN Message ID for
each status (on/off/unknown) and then transmit 1 to 4 16-bit input IDs in
each message.  This would give me a maximum compression ratio of 4.
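
A minimal sketch of this simple method, assuming the status is implied by the CAN Message ID the frame is sent with, so the payload holds only IDs. The type `can_payload_t`, the function names, and the big-endian byte order are hypothetical choices for illustration, not part of any existing driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAN_MAX_DATA 8u  /* classic CAN payload limit, in bytes */

/* Hypothetical payload container; the status (on/off/unknown) is assumed
 * to be carried by the CAN Message ID, not by the payload. */
typedef struct {
    uint8_t dlc;                 /* payload bytes used: 2, 4, 6 or 8 */
    uint8_t data[CAN_MAX_DATA];
} can_payload_t;

/* Pack up to four 16-bit input IDs, big-endian, into one payload.
 * Returns how many IDs were consumed from ids[]. */
size_t pack_ids_simple(const uint16_t *ids, size_t count, can_payload_t *out)
{
    assert(ids != NULL && out != NULL);
    size_t n = count < 4 ? count : 4;
    for (size_t i = 0; i < n; i++) {
        out->data[2 * i]     = (uint8_t)(ids[i] >> 8);
        out->data[2 * i + 1] = (uint8_t)(ids[i] & 0xFF);
    }
    out->dlc = (uint8_t)(2 * n);
    return n;
}

/* Receiver side: the payload length alone says how many IDs are present. */
size_t unpack_ids_simple(const can_payload_t *in, uint16_t *ids)
{
    size_t n = in->dlc / 2;
    for (size_t i = 0; i < n; i++)
        ids[i] = (uint16_t)((in->data[2 * i] << 8) | in->data[2 * i + 1]);
    return n;
}
```

Because the frame length tells the receiver how many IDs follow, a message with 1, 2 or 3 IDs needs no padding or count field.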
 
Note: It is impractical to create a "dictionary" of the (relatively) sparsely
allocated Input IDs, partially because the IDs are assigned by the end user
and partially because new Input IDs can be created at any time and new
'consumer' endpoints can attach to the network at any time.
 
The next approach I was considering is to build the outgoing message from
the items below.  The "status" would still be indicated by the CAN Message
ID, and the Input IDs would be sorted in ascending order before starting to
build the transmit messages:
 
<Item type>
- Fixed length, 2 bits
- Values:
    0 - Item is a 16-bit input ID
    1 - Item is an increment of "1"
    2 - Item is an 8-bit 'increment' from the last ID
 
Encoding sequence (pseudo code, repeated until all inputs with this status
have been sent):
 
Encode 16-bit Input ID
While <message bits remain and the next item fits>
{
    If the next ID is "1" greater than the last ID
        Encode an <Item Type> code of "1"
    Else If the next ID is within 257 of the last ID
        Encode an <Item Type> code of "2" followed by the 8-bit offset
        (with 2 subtracted from it)
    Else
        Encode an <Item Type> code of "0" followed by the 16-bit ID value
}
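
The pseudo code above could be sketched in C roughly as follows. Several details are assumptions of this sketch rather than part of the scheme as posted: bits are packed most significant bit first, the unused item type 3 is treated as an end marker, unused trailing bits are padded with 1s so they read as that marker, and the names `encode_ids` and `decode_ids` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Item type codes from the post; ITEM_END (3) is an assumption of this sketch */
enum { ITEM_ID16 = 0, ITEM_INC1 = 1, ITEM_DELTA8 = 2, ITEM_END = 3 };
#define MSG_BITS 64u  /* 8 data bytes per CAN message */

/* Write the low nbits of value at bit position *pos, most significant bit first */
static void put_bits(uint8_t *data, unsigned *pos, uint32_t value, unsigned nbits)
{
    while (nbits--) {
        uint8_t mask = (uint8_t)(0x80u >> (*pos % 8));
        if ((value >> nbits) & 1u)
            data[*pos / 8] |= mask;
        else
            data[*pos / 8] &= (uint8_t)~mask;
        (*pos)++;
    }
}

/* Read nbits starting at bit position *pos, most significant bit first */
static uint32_t get_bits(const uint8_t *data, unsigned *pos, unsigned nbits)
{
    uint32_t v = 0;
    while (nbits--) {
        v = (v << 1) | ((uint32_t)(data[*pos / 8] >> (7 - *pos % 8)) & 1u);
        (*pos)++;
    }
    return v;
}

/* ids[] must be sorted ascending without duplicates.
 * Returns the number of IDs that fit; *dlc receives the bytes used. */
size_t encode_ids(const uint16_t *ids, size_t count, uint8_t data[8], uint8_t *dlc)
{
    assert(count > 0);
    unsigned pos = 0;
    size_t n = 1;
    memset(data, 0xFF, 8);               /* unused bits read as ITEM_END */
    put_bits(data, &pos, ids[0], 16);
    while (n < count) {
        assert(ids[n] > ids[n - 1]);
        unsigned delta = (unsigned)(ids[n] - ids[n - 1]);
        unsigned cost = delta == 1 ? 2u : delta <= 257 ? 10u : 18u;
        if (pos + cost > MSG_BITS)
            break;                       /* next item does not fit */
        if (delta == 1) {
            put_bits(data, &pos, ITEM_INC1, 2);
        } else if (delta <= 257) {
            put_bits(data, &pos, ITEM_DELTA8, 2);
            put_bits(data, &pos, delta - 2, 8);   /* 2..257 stored as 0..255 */
        } else {
            put_bits(data, &pos, ITEM_ID16, 2);
            put_bits(data, &pos, ids[n], 16);
        }
        n++;
    }
    *dlc = (uint8_t)((pos + 7) / 8);
    return n;
}

/* Returns the number of IDs decoded into ids[] (at most max_ids). */
size_t decode_ids(const uint8_t *data, uint8_t dlc, uint16_t *ids, size_t max_ids)
{
    unsigned pos = 0, total = dlc * 8u;
    size_t n = 0;
    if (total < 16 || max_ids == 0)
        return 0;
    uint16_t last = (uint16_t)get_bits(data, &pos, 16);
    ids[n++] = last;
    while (n < max_ids && pos + 2 <= total) {
        uint32_t type = get_bits(data, &pos, 2);
        if (type == ITEM_INC1) {
            last = (uint16_t)(last + 1);
        } else if (type == ITEM_DELTA8 && pos + 8 <= total) {
            last = (uint16_t)(last + get_bits(data, &pos, 8) + 2);
        } else if (type == ITEM_ID16 && pos + 16 <= total) {
            last = (uint16_t)get_bits(data, &pos, 16);
        } else {
            break;                       /* ITEM_END padding or a truncated item */
        }
        ids[n++] = last;
    }
    return n;
}
```

Every item is an even number of bits long, so the 1-bit padding always falls on an item boundary and decodes as type 3. A caller would invoke `encode_ids` repeatedly, advancing through the sorted list by the returned count, until all IDs with a given status have been sent.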
 
If I calculate correctly this could give me a theoretical best case
compression ratio of 25 if there were no gaps in the input ID assignment (16
bits for first ID followed by 24 2-bit <Item Type> codes of "1") and a worst
case ratio of 3 (16 bits for first ID followed by two sets of <Item type>
code 0 followed by 16-bit ID).
 
I suspect that, in real life, the Input IDs will be in 4 to 7 ID 'clusters'
of sequential IDs.  If we assume each cluster is an ID followed by 5
sequential IDs (6 IDs per cluster), this results in a compression ratio of
12 (16-bit ID followed by 5 "increment by 1" codes, followed by "16 bit"
code, 16-bit ID, followed by 5 "increment by 1" codes, which uses 54 of the
64 bits).
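
The three estimates above (best case, worst case, clustered) can be checked with a small bit-counting helper. This is only a sketch: `item_bits` and `ids_per_message` are hypothetical names, and the helper assumes the fixed 2-bit item type codes and a sorted ID list without duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits needed to encode an ID that is `delta` above the previous one,
 * using the fixed 2-bit item type codes */
static unsigned item_bits(unsigned delta)
{
    if (delta == 1)
        return 2;         /* type code only */
    if (delta <= 257)
        return 2 + 8;     /* type code + 8-bit offset */
    return 2 + 16;        /* type code + full 16-bit ID */
}

/* How many of the sorted IDs fit into one 64-bit CAN payload */
size_t ids_per_message(const uint16_t *ids, size_t count)
{
    assert(count > 0);
    unsigned used = 16;   /* the first ID is always sent in full */
    size_t n = 1;
    while (n < count) {
        assert(ids[n] > ids[n - 1]);
        unsigned cost = item_bits((unsigned)(ids[n] - ids[n - 1]));
        if (used + cost > 64)
            break;
        used += cost;
        n++;
    }
    return n;
}
```

Fed with fully sequential IDs this returns 25, with IDs spaced 1000 apart it returns 3, and with runs of six sequential IDs spaced 1000 apart it returns 12. Runs of exactly five IDs with the same spacing give 10, so the estimate is sensitive to how the cluster size is counted.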
 
(Note: It just occurred to me that the <Item Type> codes could be (binary
bits) '0' (one bit) - increment by 1, '10' (2 bits) - 16-bit ID follows,
'11' (2 bits) - 8-bit increment follows.  This would result in "runs" of
sequential IDs being compressed twice as efficiently.  This may not provide
enough benefit to offset the complication.  I would need statistical
information to see whether it makes a significant difference.)
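
One way to gather that statistical information is to run recorded ID lists through a bit counter for both code layouts and compare how many IDs fit per message. The sketch below assumes sorted IDs without duplicates; `item_cost`, `ids_that_fit`, and the `prefix_codes` flag are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits for one item. prefix_codes == 0: fixed 2-bit type codes.
 * prefix_codes != 0: '0' = increment by 1, '10' = 16-bit ID follows,
 * '11' = 8-bit increment follows. */
static unsigned item_cost(unsigned delta, int prefix_codes)
{
    if (delta == 1)
        return prefix_codes ? 1u : 2u;
    if (delta <= 257)
        return 2 + 8;     /* both layouts use a 2-bit code here */
    return 2 + 16;
}

/* How many of the sorted IDs fit into one 64-bit CAN payload */
size_t ids_that_fit(const uint16_t *ids, size_t count, int prefix_codes)
{
    assert(count > 0);
    unsigned used = 16;   /* the first ID is always sent in full */
    size_t n = 1;
    while (n < count) {
        assert(ids[n] > ids[n - 1]);
        unsigned cost = item_cost((unsigned)(ids[n] - ids[n - 1]), prefix_codes);
        if (used + cost > 64)
            break;
        used += cost;
        n++;
    }
    return n;
}
```

For fully sequential IDs the one-bit code raises the count per message from 25 to 49; for runs of six sequential IDs spaced 1000 apart it raises it from 12 to 15; for IDs that are all far apart both layouts give 3. One thing to watch with the one-bit code is padding: a 0 bit means "increment by 1", so unused trailing bits would have to be filled with 1s (or the item count conveyed some other way) for the receiver to know where the items end.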
 
Better ideas?
 
Thanks in advance for your time and thoughts.
 
Regards,
 
Chuck Hackett
