Printing negative hex


cousteau

Feb 25, 2020, 8:46:15 PM
to Developers
The result of printing negative numbers in a base other than 10 is counter-intuitive and confusing.

Currently, `int x = -16; Serial.print(x, HEX);` prints FFFFFFF0, even though ints are 16 bits, so one would expect FFF0. This is because Print::print(int n, int base) converts the 16-bit int n to 32-bit long before printing, and then prints that long in two's complement. As a result, a 16-bit number has a 32-bit representation. But I think this behavior should be fixed to produce less unexpected results.
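
The widening described above can be reproduced off-board. This is only a sketch in standard C++: `toHex`, `hexViaLong`, and `hexViaOwnWidth` are hypothetical names, not Arduino core functions, and `int16_t`/`int32_t` stand in for the 16-bit int and 32-bit long of an AVR board.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an unsigned value as uppercase hex
// without leading zeros.
std::string toHex(uint32_t v) {
  char buf[9];
  std::snprintf(buf, sizeof buf, "%lX", static_cast<unsigned long>(v));
  return buf;
}

// Current behavior: the 16-bit value is first widened to a signed 32-bit
// value (sign extension) and only then reinterpreted as unsigned.
std::string hexViaLong(int16_t n) {
  int32_t widened = n;
  return toHex(static_cast<uint32_t>(widened));
}

// Expected behavior: reinterpret the value at its own 16-bit width.
std::string hexViaOwnWidth(int16_t n) {
  return toHex(static_cast<uint16_t>(n));
}
```

With these definitions, `hexViaLong(-16)` gives FFFFFFF0 and `hexViaOwnWidth(-16)` gives FFF0.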

I can think of three possible solutions:
1. Convert signed char, short, int, and long to unsigned char, unsigned short, unsigned int, and unsigned long respectively for printing them in a base other than 10, and use 2's complement for negative numbers. print(-16,HEX) will print FFF0. This makes sense to those familiar with 2's complement, and C's printf("%x") (which actually prints unsigned ints).
2. Extend the behavior of base 10 to every other base: print negative numbers with a '-' followed by the absolute value, instead of making this a special case for base 10. print(-16,HEX) will print -10. This is how it works in Python and JavaScript, for example. If someone wants FFF0, explicit conversion to unsigned int is needed.
3. Leave it as it is now, and explain in detail in the documentation why this happens and how to properly handle it. print(-16,HEX) will print FFFFFFF0. If someone wants FFF0, explicit conversion to unsigned int is needed.

Personally, I think the solution that makes the most sense is number 2. It is the mathematically logical one, doesn't require users to be familiar with 2's complement, and makes it easy to identify negative numbers (otherwise, if someone accidentally prints a negative number, they might not understand the output they get, and it will be harder for them to identify the issue). And if they really want to print it in 2's complement, they will probably figure out naturally that they need to convert to unsigned int.
Furthermore, implementing this is as simple as *removing* a check for base == 10, so the resulting code is even simpler. I made a PR implementing this in https://github.com/arduino/Arduino/pull/4535 (which is no longer valid because Chainsaw).
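
As a rough illustration of option 2 (not the code from the PR), a sign-and-magnitude conversion for any base could look like the following. `printSigned` is a hypothetical stand-in that returns a string instead of writing to a stream.

```cpp
#include <string>

// Hypothetical stand-in for Print::print(long, int): writes a '-' followed
// by the absolute value in any base from 2 to 36, the way base 10 behaves.
std::string printSigned(long n, int base) {
  if (base < 2 || base > 36) return "";
  // Negate in unsigned arithmetic so the most negative long does not overflow.
  unsigned long magnitude = (n < 0) ? 0UL - static_cast<unsigned long>(n)
                                    : static_cast<unsigned long>(n);
  const unsigned long radix = static_cast<unsigned long>(base);
  std::string digits;
  do {
    const char c = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[magnitude % radix];
    digits.insert(digits.begin(), c);
    magnitude /= radix;
  } while (magnitude != 0);
  return (n < 0) ? "-" + digits : digits;
}
```

Here `printSigned(-16, 16)` yields -10, matching what Python and JavaScript print for the same value.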

What do you think is the most desirable behavior? 1, 2, or 3? I wanted to get some feedback from the mailing list before I redo the commit.

Rob Tillaart

Feb 29, 2020, 6:38:22 AM
to Arduino Developers
Interesting point.

From my perspective, the reason most people (including me) print HEX or BIN values is to see the bit pattern of numbers.
In experiments I have also used several prime numbers, as well as 24 and 36, as bases, and I cannot recall ever needing a sign.
So I don't know how big the need for the sign really is.

That said, a generalization of the problem you mention is that the number of digits shown is not correct:
for negative numbers, too many F's;
for smaller numbers, an incorrect number of leading zeros.

I prefer printf(), but it is not always available, so I definitely want (1) to get the number of digits right:
uint8_t x = 15;    //  should print 0F
uint16_t y = 15;   //  should print 000F
uint32_t c = 15;   //  should print 0000000F

No more local fixes for leading zeros in a zillion sketches.
For negative values, I prefer no minus sign, as I am most often interested in the bit patterns as such
when I am using HEX.
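
The fixed-width output asked for above (two hex digits per byte of the argument's type) might be sketched like this in standard C++. `hexPadded` is a hypothetical name, and the template approach is just one possible way to recover the width from the type.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical helper: shows the bit pattern of an integer as hex,
// zero-padded to two digits per byte of its type.
template <typename T>
std::string hexPadded(T value) {
  static_assert(std::is_integral<T>::value, "integer types only");
  using U = typename std::make_unsigned<T>::type;
  U bits = static_cast<U>(value);  // keep the type's own width, no widening
  const char* kDigits = "0123456789ABCDEF";
  std::string out(sizeof(T) * 2, '0');
  for (std::size_t i = out.size(); i-- > 0;) {
    out[i] = kDigits[bits & 0x0F];
    bits = static_cast<U>(bits >> 4);
  }
  return out;
}
```

For example, `hexPadded<uint16_t>(15)` gives 000F, and `hexPadded<int16_t>(-16)` gives FFF0 with no minus sign.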

IDEA: negative base
Instead of converting the number, we can use the base to indicate how the sign
should be formatted, by passing a negative base. As it would extend the interface, it will not break existing
code and is easy to understand.

Serial.print(-16, HEX)  ==> FFF0
Serial.print(-16, -HEX) ==> -0010
Serial.print(16, -HEX)  ==> 0010
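
One way the negative-base convention above could be interpreted, as a sketch for 16-bit values only. `printWithSignedBase` and its `width` parameter are hypothetical; the padding to four digits is an assumption taken from the three examples.

```cpp
#include <cstdint>
#include <string>

// Hypothetical: base > 0 prints the raw 16-bit two's complement pattern,
// base < 0 prints a '-' (for negative values) followed by the magnitude.
// Both forms are zero-padded to `width` digits.
std::string printWithSignedBase(int16_t n, int base, unsigned width = 4) {
  const bool signedMode = base < 0;
  const unsigned radix = static_cast<unsigned>(signedMode ? -base : base);
  if (radix < 2 || radix > 36) return "";
  const bool negative = signedMode && n < 0;
  uint32_t v = negative ? static_cast<uint32_t>(-static_cast<int32_t>(n))
                        : static_cast<uint32_t>(static_cast<uint16_t>(n));
  std::string digits;
  do {
    const char c = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[v % radix];
    digits.insert(digits.begin(), c);
    v /= radix;
  } while (v != 0);
  if (digits.size() < width) {
    digits = std::string(width - digits.size(), '0') + digits;
  }
  return negative ? "-" + digits : digits;
}
```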

Opinions?

Rob


--
You received this message because you are subscribed to the Google Groups "Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to developers+...@arduino.cc.
To view this discussion on the web visit https://groups.google.com/a/arduino.cc/d/msgid/developers/b743ba13-bc2e-48d4-a12d-3d35b205b365%40arduino.cc.

Jon Perryman

Feb 29, 2020, 8:03:59 PM
to devel...@arduino.cc
In C, int can be either 2 or 4 bytes depending on the hardware architecture. Printing 4 bytes is exactly what you should expect. While this may not be intuitive with regard to overflow/underflow, it does provide print consistency across various hardware architectures, which is far more important. If 2 bytes are needed, then define the field as short or typecast it to a known length.

As for hex, it does not have a sign. At conversion time to another field type, the high order bit is used to determine sign.

Jon.

cousteau

Mar 1, 2020, 9:26:41 AM
to Developers
Thank you very much for the insight!

> As for hex, it does not have a sign. At conversion time to another field type, the high order bit is used to determine sign.

To be precise, hex is not a type, just a representation. More exactly, a numeric base. What you described is signed vs unsigned types. The thing is, C doesn't have a printf formatter for hex for signed numbers; %x/%X expects an unsigned number, same as %o and %u. In fact, you would use %d or %u depending on whether you're printing a signed or unsigned int, but that doesn't mean base 10 is signed or unsigned. But here we're playing with types, and we can make a function detect the type of the argument, and work differently if it is signed. In C it was decided not to implement signed hex probably because it would involve adding an extra formatter which would rarely be used, but in our case it would actually be easier to always print it signed regardless of the base, and rely on conversions for unsigned. Also, C didn't really think of arbitrary numeric bases.

As for int being printed as 4 bytes because it is 4 bytes in some architectures, I don't think that makes sense. If you want consistency you would use int32_t (or just long), and probably you wouldn't want what print prints to be inconsistent with the actual size and behavior of the type.

Now, about this:

> I prefer printf(), but it is not always available, so I definitely want (1) to *get the number of digits right*:
> uint8_t x = 15;    //  should print 0F
> uint16_t y = 15;   //  should print 000F
> uint32_t c = 15;   //  should print 0000000F

This makes sense actually, since you're trying to see a bit pattern. It's an interesting idea. I've been thinking about how it could be implemented. The problems I see are:
(1) It's harder to implement. Currently Print::print just calls the print(long, ...) or print(unsigned long, ...) versions, which in turn call a printNumber(unsigned long n, uint8_t base). Adding this functionality would either require three different versions of that function or an extra size/length parameter.
(2) It is not easily extendable to bases that are not powers of 2. How many digits would you use for print(-1, 36)? Use log(UINT_MAX)/log(36)? Maybe a solution is to only zero-pad if the base is a power of 2 (or maybe 2 to a power of 2).
(3) We would need to implement zero-padding, of course. (Not too hard to do; it just adds a bit of extra logic.)
(4) About that -HEX idea: if there are +/-HEX I'd think there should also be +/-DEC, but then the default should be -DEC (unless you flip the meaning of the sign). I think it would be easier to just tell users to convert the number to unsigned explicitly; I don't think it makes sense to modify the code for something that can be achieved with something as simple as an explicit conversion.
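
On point (2), the digit count for an arbitrary base can be computed with integer division instead of logarithms, which avoids floating-point rounding. This is a sketch; `digitsNeeded` is a hypothetical helper name.

```cpp
#include <cstdint>

// Hypothetical helper: number of digits needed to write maxValue in the
// given base, i.e. floor(log_base(maxValue)) + 1 for maxValue > 0.
unsigned digitsNeeded(uint32_t maxValue, unsigned base) {
  if (base < 2) return 0;
  unsigned digits = 1;
  while (maxValue >= base) {
    maxValue /= base;
    ++digits;
  }
  return digits;
}
```

With a 16-bit int, `digitsNeeded(0xFFFF, 36)` is 4, so print(-1, 36) would be padded to four digits; for 32 bits, `digitsNeeded(0xFFFFFFFF, 36)` is 7.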

I think the issue here is that what you want is *extra formatting specifiers,* so that you can specify zero-padding and the number of digits. Implementing zero-padding would already require adding extra features to the code base anyway.
Extra formatting has been discussed before; basically have a wider range of format specifiers for base, padding, alignment, etc., and maybe delegate all that to a separate class.
I myself had some ideas for cramming extra format specifiers into a single 15-bit positive integer (6 for base, including whether you want uppercase or lowercase, 1 for explicit +/- sign, 1 for space- or zero-padding, and 7 for minimum length), but implementing it is another story.
Someone also suggested adding Print::printf (and even provided an efficient implementation that didn't rely on vsnprintf), but for some reason the idea wasn't well accepted; apparently its syntax was considered too confusing for the spirit of Arduino. (Personally I liked it though.)
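
To make the 15-bit packing idea mentioned above concrete, here is one possible layout. Only the field widths (6 + 1 + 1 + 7) come from the post; the bit order, the `FormatSpec` struct, and the function names are assumptions, and how letter case is folded into the 6-bit base field is left open.

```cpp
#include <cstdint>

// Assumed layout (the field order is a guess, not from the post):
//   bits 0-5   base code (6 bits)
//   bit  6     always show the +/- sign
//   bit  7     zero-pad instead of space-pad
//   bits 8-14  minimum length (0 to 127)
struct FormatSpec {
  uint8_t baseCode;
  bool showSign;
  bool zeroPad;
  uint8_t minLength;
};

uint16_t packFormat(const FormatSpec& f) {
  return static_cast<uint16_t>((f.baseCode & 0x3Fu) |
                               (f.showSign ? (1u << 6) : 0u) |
                               (f.zeroPad ? (1u << 7) : 0u) |
                               ((f.minLength & 0x7Fu) << 8));
}

FormatSpec unpackFormat(uint16_t packed) {
  FormatSpec f;
  f.baseCode = static_cast<uint8_t>(packed & 0x3Fu);
  f.showSign = ((packed >> 6) & 1u) != 0;
  f.zeroPad = ((packed >> 7) & 1u) != 0;
  f.minLength = static_cast<uint8_t>((packed >> 8) & 0x7Fu);
  return f;
}
```

The largest packed value is 0x7FFF, so it still fits in a positive 16-bit int.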

Rob Tillaart

Mar 1, 2020, 10:52:32 AM
to Arduino Developers
Made a formatting function hex() with leading zeros and a configurable number of digits.
All integer types map onto uint32_t, and the parameter digits determines the number of digits in the returned string (default 8).
(A variation is included in mathhelper: https://github.com/RobTillaart/Arduino/tree/master/libraries )
 
char * hex(uint32_t value, uint8_t digits = 8)
{
  static char buffer[17];
  if ( digits > 16) digits = 16;
  buffer[ digits ] = '\0';
  while ( digits > 0)
  {
    uint8_t v = value & 0x0F;
    value >>= 4;
    buffer[--digits ] = (v < 10) ? '0' + v : 'A' - 10 + v;
  }
  return buffer;
}


Extended to do all bases (up to 36), and a sign as separate flag (not tested, not optimized)

char * format(uint32_t value, uint8_t digits = 8, uint8_t base = 10, bool showsign = false)
{
  static char buffer[18];          // note one extra place for the sign and one for the \0
  bool neg = (value & 0x80000000);

  if ( digits > 16) digits = 16;
  if ( showsign == true ) digits++;

  buffer[ digits ] = '\0';
  while ( digits > 0 )
  {
    uint8_t v = value % base;
    value -= (v * base);
    buffer[--digits ] = (v < 10) ? '0' + v : 'A' - 10 + v;
  }

  if ( showsign == true ) buffer[0] = neg ? '-' : '+';
  return buffer;
}

Yes, it will strip the higher-order part of the number if digits is set too low.
For integers beyond 32 bits (64 bit or long long) the internal buffer should be extended.
For small bases (e.g. 2 = BIN) the internal buffer might be too small.

Calls could look like: [ not tested yet ]

Serial.print(format( b[n], 8, BIN));      // print an array element as bits
Serial.print(format( x, 4, HEX, true ));  // print as hex with sign
Serial.print(format( y, 2, HEX ));        // typical byte as hex with leading zero :)
Serial.print(format( color, 6, HEX ));    // eg an RGB value stored in a long
Serial.print(format( pincode, 6, 13 ));   // just because it is possible

Drawback: the code uses a static buffer to hold the output, so it is not thread safe.

Rob


Rob Tillaart

Mar 1, 2020, 10:57:34 AM
to Arduino Developers
OOPS :)
value -= (v * base);    should be     value /= base;   of course 

Jon Perryman

Mar 3, 2020, 7:19:19 PM
to devel...@arduino.cc
> To be precise, hex is not a type, just a representation.  
> More exactly, a numeric base.  What you described is signed vs unsigned types. 

I never meant to imply hex is a type. In the computer industry, it is strictly used as a visual representation of data in the computer for diagnostics, or as a machine representation in programs (e.g. \xFC). You will never see a sign associated with hex data.

When I mentioned conversion, I meant between compiler-supported types. I was describing data encoding (not signed vs unsigned). You're not considering the various data types allowed by various compilers and hardware architectures (e.g. int, float, packed, comp). Today, the 8-bit byte is the standard, but that was not always the case. I've heard of 6- and 7-bit bytes but have never seen one. The location of the sign depends upon the hardware. Sometimes the sign can be counterintuitive, such as in packed decimal, where bit 4 of the last byte is 1 for positive and 0 for negative.

Using INT has worked well because subtraction is actually a 2's complement conversion followed by an add. Today, most processors have add and subtract instructions but that was not always the case. Signed or unsigned doesn't matter because overflow will handle both situations correctly.
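
The "negate in 2's complement, then add" view of subtraction can be checked with a small sketch. `subtractViaAdd` is a hypothetical name, and 16-bit unsigned values are used so that wraparound is well defined.

```cpp
#include <cstdint>

// a - b computed as a + (~b + 1): negate b in two's complement, then add.
// Unsigned 16-bit arithmetic wraps modulo 2^16, so the same bit pattern
// results whether the operands are read as signed or unsigned.
uint16_t subtractViaAdd(uint16_t a, uint16_t b) {
  const uint16_t negB = static_cast<uint16_t>(~b + 1u);
  return static_cast<uint16_t>(a + negB);
}
```

For instance, `subtractViaAdd(0, 16)` produces the pattern 0xFFF0, which reads as -16 when interpreted as a signed 16-bit value.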

Jon.

cousteau

May 4, 2020, 7:43:17 AM
to Developers
> Made a formatting function hex() with leading zeros and #digits

Honestly, I think having separate formatting functions/classes makes more sense than delegating the formatting to the print() method; it would simplify things a lot and allow creating custom and more complex formatters. "Print the hex representation of x" (print(HEX(x))) looks better than "print x but in hex" (print(x, HEX)).
(Also, it would fit better with the idea of "print with variadic arguments" I proposed back in the day.)
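
A separate formatter of the kind described above might look roughly like this. Everything here is hypothetical: the struct is called `Hex` rather than `HEX` because `HEX` is already a macro in Arduino, and `FakePrint` is a minimal stand-in for a Print-like class, not the real one.

```cpp
#include <cstdint>
#include <string>

// Hypothetical formatter object: carries the value and the digit count,
// and knows how to render itself.
struct Hex {
  uint32_t value;
  uint8_t digits;

  std::string str() const {
    const unsigned n = digits > 8 ? 8u : digits;
    std::string s(n, '0');
    uint32_t v = value;
    for (unsigned i = n; i-- > 0;) {
      s[i] = "0123456789ABCDEF"[v & 0x0F];
      v >>= 4;
    }
    return s;
  }
};

// Minimal stand-in for an output class such as Serial. It only needs one
// extra overload per formatter type, not a base argument.
struct FakePrint {
  std::string out;
  void print(const std::string& s) { out += s; }
  void print(const Hex& h) { out += h.str(); }
};
```

A call then reads as `p.print(Hex{x, 4})`, and new formatters can be added without touching the existing print overloads.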
