Proposal: Regarding 3.12 (Object Dimensions as 64-bit Integers)


Christoph Conrads

Aug 28, 2017, 9:16:22 AM
to slate...@icl.utk.edu
Hi,

please use `std::size_t` from `<cstddef>` for object dimensions. `std::size_t`
is an unsigned integer type; on 32-bit CPUs it is typically 32 bits wide and on
64-bit CPUs 64 bits wide. Moreover, `size_t` in C and `std::size_t` in C++ are
used throughout the standard libraries, e.g., `strlen`, `memcpy`, `malloc`, and
`operator new` all take or return values of type `size_t`, and the STL
containers report their sizes as `std::size_t` (with the default allocator).
`std::size_t` is guaranteed by the C++ standard to be large enough to represent
the size in bytes of any object. While there is no guarantee that the size of
`std::size_t` matches the word size of the underlying CPU architecture, I am
not aware of a (desktop) CPU where this is not the case.
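
The claims above can be checked at compile time. The following is only a sketch, assuming a C++11 compiler and a mainstream standard library; the helper `size_t_bits` is a hypothetical name introduced for illustration.

```cpp
#include <climits>      // CHAR_BIT
#include <cstddef>      // std::size_t
#include <cstring>      // std::strlen
#include <type_traits>  // std::is_unsigned, std::is_same
#include <vector>

// std::size_t is an unsigned integer type.
static_assert(std::is_unsigned<std::size_t>::value, "size_t is unsigned");

// sizeof yields std::size_t, so every object size is representable in it.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields size_t");

// std::strlen returns std::size_t.
static_assert(std::is_same<decltype(std::strlen("")), std::size_t>::value,
              "strlen returns size_t");

// Assumption: holds on mainstream implementations with the default allocator.
static_assert(std::is_same<std::vector<double>::size_type, std::size_t>::value,
              "vector<double>::size_type is size_t");

// Hypothetical helper: width of std::size_t in bits on the current target.
constexpr std::size_t size_t_bits() { return sizeof(std::size_t) * CHAR_BIT; }
```

On common desktop targets `size_t_bits()` matches the pointer width, but as noted above the standard does not promise this.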

I do not understand why 64-bit integers are forced on users on 32-bit
architectures. I really do not understand why a signed integer type is used to
denote quantities that cannot be negative. Signed integer overflow is undefined
behavior in C and C++ and a security risk [1]; moreover, by using a signed
integer type, the draft authors are forced to check whether object dimensions
are negative in the prototype code.

[1] http://cwe.mitre.org/data/definitions/190.html

Jakub Kurzak

Aug 28, 2017, 9:29:52 AM
to Christoph Conrads, SLATE User
Just to clarify.
You're advocating the use of size_t for object dimensions, which will be 64 bits wide on 64-bit architectures and 32 bits wide on 32-bit architectures.
You are against using a signed type for object dimensions.
Did I get it right?
Jakub



--
You received this message because you are subscribed to the Google Groups "SLATE User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slate-user+unsubscribe@icl.utk.edu.
To post to this group, send email to slate...@icl.utk.edu.
To view this discussion on the web visit https://groups.google.com/a/icl.utk.edu/d/msgid/slate-user/1718319166.75581.1503926181067%40email.1und1.de.

Christoph Conrads

Aug 28, 2017, 9:41:39 AM
to SLATE User
Dear Jakub,

- I am advocating the use of std::size_t.
- I am against using a signed type for object dimensions.
- On desktop CPUs you can assume that std::size_t will be 64 bits wide on 64-bit
architectures and 32 bits wide on 32-bit architectures.


Vincent Picaud

Aug 28, 2017, 4:53:29 PM
to SLATE User, sl...@christoph-conrads.name
Unfortunately, I do not share this point of view concerning std::size_t.

The reason is that the BLAS standard uses _signed_ increments, so this would lead to mixing signed and unsigned arithmetic.

This is known to be error-prone:
- http://www.drdobbs.com/cpp/unsigned-arithmetic-useful-but-tricky/240001198
- Bjarne Stroustrup, talk 12:50

IMHO, using std::ptrdiff_t for sizes and increments seems a more conservative choice.

Vincent P.

Vincent Picaud

Aug 28, 2017, 4:55:08 PM
to SLATE User, sl...@christoph-conrads.name

Wolfgang Bangerth

Aug 28, 2017, 5:36:37 PM
to slate...@icl.utk.edu
On 08/28/2017 02:53 PM, Vincent Picaud wrote:
>
> The reason is that like Blas standard uses _signed_ increments this
> would lead to mix signed and unsigned arithmetic.

I do agree with Christoph Conrads on this point. For things that are
*logically* unsigned (such as the size of objects, or index offsets
within a zero- or one-based vector or matrix), unsigned data types are
semantically the correct choice. They are also self-documenting in some
sense: a user reading the signature of a function need not wonder
whether negative values are allowed and what they might mean. We have
found this self-documenting aspect really useful in the deal.II project.

The fact that this collides with the current BLAS standard points to
what's essentially a *policy choice*: Is your goal to just have thin
wrappers around BLAS that faithfully represent what BLAS is already
doing, just in C++? Or do you desire to design an interface to dense
algebra that covers what users need?

If it is the latter, i.e., if you are *designing an interface* from
scratch, then mixed arithmetic does not arise: Everything that logically
is unsigned should be expressed in an unsigned data type. If you happen
to map *internally* to an existing BLAS implementation, then that is an
implementation choice that might require converting to signed integers,
but it should not dictate how you design your interface.
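
A minimal sketch of that separation, assuming a backend that takes signed 64-bit integers and a `std::size_t` of at most 64 bits. All names here (`to_backend_int`, `backend_asum`, `asum`) are hypothetical; `backend_asum` is a plain C++ stand-in, not a real BLAS call.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert an unsigned dimension to the signed 64-bit
// integer the backend expects, failing loudly if the value does not fit.
inline std::int64_t to_backend_int(std::size_t n) {
    const std::uint64_t max_signed =
        static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
    if (static_cast<std::uint64_t>(n) > max_signed) {
        throw std::overflow_error("dimension does not fit into int64_t");
    }
    return static_cast<std::int64_t>(n);
}

// Stand-in for an external routine with signed arguments (sum of |x[i]|).
inline double backend_asum(std::int64_t n, const double* x, std::int64_t incx) {
    assert(n >= 0 && incx > 0);
    double sum = 0.0;
    for (std::int64_t i = 0; i < n; ++i) {
        sum += std::fabs(x[i * incx]);
    }
    return sum;
}

// Hypothetical public interface: sizes and strides are unsigned; the
// conversion to signed happens once, inside the wrapper.
inline double asum(std::size_t n, const double* x, std::size_t incx) {
    return backend_asum(to_backend_int(n), x, to_backend_int(incx));
}
```

The user-facing signature never mentions a signed type; whether the range check is cheaper or clearer than a check for negative values is exactly the point under debate in this thread.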

Best
W.

Mark Gates

Aug 28, 2017, 7:09:05 PM
to Wolfgang Bangerth, slate...@icl.utk.edu
On Aug 28, 2017, at 5:36 PM, Wolfgang Bangerth <bang...@gmail.com> wrote:
>
> On 08/28/2017 02:53 PM, Vincent Picaud wrote:
>> The reason is that like Blas standard uses _signed_ increments this would lead to mix signed and unsigned arithmetic.
>
> I do agree with Christoph Conrads on this point. For things that are *logically* unsigned (such as the size of objects, or index offsets within a zero- or one-based vector or matrix), unsigned data types are semantically the correct choice.

I think the point is that incx is signed: a negative value actually has meaning in some BLAS routines like dot and gemv. This then leads to mixing signed and unsigned values, which is generally a bad idea, as pointed out by Vincent. His link to the panel discussion with Stroustrup is helpful.

Pragmatically, BLAS is defined in Fortran, and Fortran doesn't really support unsigned int. So the underlying high-performance BLAS will always be signed. If the C++ API uses unsigned, it still has to check that data fits into signed in order to call Fortran, which is effectively the same as checking if (n < 0).

Regarding overflow and wrap-around, there is risk for both signed and unsigned integers; neither is a panacea. Quite often we compute matrix sizes, like

gemm( ..., (n - i), ... );

If (n - i) < 0, I would rather have that become a negative number which gemm catches and warns about, than to wrap-around as a very large unsigned number which then causes segfaults accessing memory outside the array.
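
To make the difference concrete, here is a small sketch of what `(n - i)` evaluates to in each case. The function names are hypothetical and the code assumes C++11.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Signed: the difference is an ordinary negative number that a callee can
// detect with a check such as `rows < 0`.
constexpr std::int64_t signed_rows(std::int64_t n, std::int64_t i) {
    return n - i;
}

// Unsigned: the subtraction is performed modulo 2^N, so i > n produces a
// very large value instead of a negative one.
constexpr std::size_t unsigned_rows(std::size_t n, std::size_t i) {
    return n - i;
}
```

With `n = 3` and `i = 5`, `signed_rows` yields -2 while `unsigned_rows` yields the largest `std::size_t` value minus one.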

A simple loop like:

for (i = n-1; i >= 0; --i) { ... }

works fine in signed arithmetic, but becomes an infinite loop in unsigned arithmetic. This kind of loop is very common in linear algebra algorithms. (It can of course be rewritten in various ways.)
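
One of the possible rewrites, sketched under the assumption that the loop walks a `std::vector`. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Signed index: the loop from the text terminates once i reaches -1.
inline std::vector<int> reversed_signed(const std::vector<int>& v) {
    // Precondition: the size must fit into the signed index type.
    assert(static_cast<std::uint64_t>(v.size()) <=
           static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max()));
    std::vector<int> out;
    const std::int64_t n = static_cast<std::int64_t>(v.size());
    for (std::int64_t i = n - 1; i >= 0; --i) {
        out.push_back(v[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Unsigned index: `i >= 0` would always be true, so the loop tests the
// counter before decrementing it.
inline std::vector<int> reversed_unsigned(const std::vector<int>& v) {
    std::vector<int> out;
    for (std::size_t i = v.size(); i-- > 0; ) {
        out.push_back(v[i]);
    }
    return out;
}
```

Both functions visit the same elements in the same order; the unsigned variant also handles an empty vector without special cases.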

-mark

Wolfgang Bangerth

Aug 28, 2017, 7:38:56 PM
to Mark Gates, slate...@icl.utk.edu
On 08/28/2017 05:09 PM, Mark Gates wrote:
> Regarding overflow and wrap-around, there is risk for both signed or unsigned integers; neither is a panacea. Quite often we compute matrix sizes, like
>
> gemm( ..., (n - i), ... );
>
> If (n - i) < 0, I would rather have that become a negative number which gemm catches and warns about, than to wrap-around as a very large unsigned number which then causes segfaults accessing memory outside the array.

Assuming the matrix object stores its size, and that
function_argument=(n-i) is an index or size of a sub-object, then any
reasonable implementation would check with unsigned arguments
    assert (function_argument <= size)
which triggers if i>n in all typical cases (such as off-by-one). Note
that this is even simpler than the signed argument check:
    assert (function_argument >= 0
            &&
            function_argument <= size);
Wrap-around semantics of unsigned data types make these checks simpler,
not more complicated.
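
The two checks can be written out as follows. This is a sketch with hypothetical function names, assuming C++11 and that `size` is far below the maximum of `std::size_t`.

```cpp
#include <cstddef>
#include <cstdint>

// Unsigned: a wrapped-around argument is huge, so one comparison suffices.
constexpr bool in_range_unsigned(std::size_t function_argument, std::size_t size) {
    return function_argument <= size;
}

// Signed: negative values must be rejected explicitly.
constexpr bool in_range_signed(std::int64_t function_argument, std::int64_t size) {
    return function_argument >= 0 && function_argument <= size;
}
```

For an off-by-one such as `n = 3`, `i = 4` and `size = 10`, both predicates reject the argument: the unsigned difference wraps to the maximum `std::size_t` value, which exceeds `size`.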

Best
W.

Jakub Kurzak

Aug 28, 2017, 8:37:52 PM
to Wolfgang Bangerth, Mark Gates, SLATE User

You should not use the unsigned integer types such as uint32_t, unless there is a valid reason such as representing a bit pattern rather than a number, or you need defined overflow modulo 2^N. In particular, do not use unsigned types to say a number will never be negative.

Jakub



Vincent Picaud

Aug 29, 2017, 12:21:08 AM
to SLATE User, bang...@gmail.com
I completely agree with Mark. Those are the kinds of problems we have with unsigned arithmetic.

Vincent 

sl...@christoph-conrads.name

Aug 29, 2017, 8:55:35 PM
to SLATE User
> The reason is that like Blas standard uses _signed_ increments this
> would lead to mix signed and unsigned arithmetic.

Since C and C++ use unsigned integers for all memory management (think of
`malloc`, `operator new`) and the STL containers, there will be a mixture of
signed and unsigned arithmetic as soon as you use BLAS somewhere in the
program. The question is if you want to have the user deal with it or if you
let the BLAS C++ interface deal with it.
What the article says is that:

> As so often happens, the code simplification that comes from making n
> unsigned can complicate other kinds of programs. Suppose, for example, that
> we want to visit the elements of x in reverse order.

In the video, Bjarne Stroustrup shares the point of view of the authors
of the Google C++ style guide (in the video, Google developer Chandler Carruth
responds right before BS at 12:15min):

1. Use signed integers unless you need two's complement arithmetic or a
   bit pattern.
2. Use the smallest integer type possible when you must store many integers.
3. When in doubt, use 64-bit integers.
4. "Stop worrying and use tools to catch it when you get it wrong."

Additionally, BS recommends never mixing signed and unsigned arithmetic. The
problem that I have with these guidelines is that they are given without
reasoning. I understand that BS designed C++ but that does not mean that he
automatically has the best advice. Similarly, if you accept everything in the
Google C++ style guide at face value, then you cannot use exceptions, enums
must be prefixed with "k", you have to indent with two spaces, and use the
Stroustrup indent style
(https://en.wikipedia.org/wiki/Indent_style#Variant:_Stroustrup).

The Google C++ style guide gives two arguments for using signed integers:
* The reverse loop example does not work.
* "Equally bad bugs can occur when comparing signed and unsigned variables.
  Basically, C's type-promotion scheme causes unsigned types to behave
  differently than one might expect."

I cannot even remember when I wrote the last reverse loop but let me give you
the equivalent example for a problem intrinsic to signed integers:
```c++
signed_integer_t n = container.size();
```
If the container holds more elements than `signed_integer_t` can represent
(more than `LLONG_MAX` for a 64-bit type), the result of the conversion is
implementation-defined (prior to C++20) and in any case not the size of the
container. The second statement is not an argument (the type-promotion
scheme behaves exactly how you would expect it if you read the C++ standard)
but if you must compare signed and unsigned values, you have to
* first check if the signed integer is less than zero,
* if the signed integer is non-negative, compare signed and unsigned integer.
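
The two steps can be sketched as a small helper, assuming 64-bit operands and C++11. The names `less_signed_unsigned` and `naive_less` are hypothetical.

```cpp
#include <cstdint>

// Two-step comparison: true if s < u in the mathematical sense.
constexpr bool less_signed_unsigned(std::int64_t s, std::uint64_t u) {
    // Step 1: any negative value is smaller than any unsigned value.
    // Step 2: otherwise the cast is value-preserving and the compare is safe.
    return s < 0 || static_cast<std::uint64_t>(s) < u;
}

// What a plain `s < u` does: the usual arithmetic conversions turn s into
// an unsigned value first (written here as an explicit cast).
constexpr bool naive_less(std::int64_t s, std::uint64_t u) {
    return static_cast<std::uint64_t>(s) < u;
}
```

For `s = -1` and `u = 1` the helper reports that `s` is smaller, while the naive comparison does not. C++20 later added `std::cmp_less` in `<utility>` for the same purpose.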

Mixing signed and unsigned arithmetic is indeed a bad idea but it is also
well-defined if both variables have the same rank. Thus, if object dimensions
are of type `std::size_t` and increments of type `ssize_t` (a POSIX type, not
part of namespace `std`), then an operation `dimension + increment` will
return the correct result as long as the true result is non-negative and fits
into `std::size_t`.
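
A sketch of this arithmetic, using `std::ptrdiff_t` as the signed type because it is available in standard C++ (an assumption made for portability of the example); `offset` is a hypothetical name.

```cpp
#include <cstddef>
#include <limits>

// dimension + increment with an unsigned dimension and a signed increment.
// The increment is converted to std::size_t modulo 2^N (written explicitly
// here); the modular sum equals the true sum whenever the true sum is
// non-negative and representable in std::size_t.
constexpr std::size_t offset(std::size_t dimension, std::ptrdiff_t increment) {
    return dimension + static_cast<std::size_t>(increment);
}
```

`offset(10, -3)` is 7 as expected. `offset(2, -3)` has no representable true result and wraps around to the largest `std::size_t` value, which is the case the range condition above excludes.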

In conclusion, if you use BLAS and C or C++ memory allocation, then you must
mix signed and unsigned arithmetic at some point. I prefer to have the
conversion done by the BLAS C++ interface.

Jakub Kurzak

Aug 29, 2017, 9:31:01 PM
to sl...@christoph-conrads.name, SLATE User
We don't take everything in the Google guide literally :)
We've been trying to build our own style guide:
Although, I have to admit, I probably agree with 90% of the Google guide.
About integers, I am just having a hard time seeing how using int64_t is a bad idea.
Jakub



sl...@christoph-conrads.name

Aug 29, 2017, 9:41:50 PM
to SLATE User, sl...@christoph-conrads.name
int64_t is not a bad idea because it is a large integer type and reflects the
BLAS and LAPACK API.

I favor size_t because
- size_t is used throughout the C and C++ standard libraries,
- size_t is guaranteed to be able to store the size of any object, and
- the signed/unsigned conversion has to be dealt with only once (within the C++ interface).

Note that there is also the option to use the POSIX type ssize_t for object dimensions; this type is only 32 bits wide on 32-bit desktop architectures.

Meiyue Shao

Aug 29, 2017, 9:45:19 PM
to slate...@icl.utk.edu
I agree with Vincent, Mark, and Jakub.

Using unsigned integers can easily cause bugs that are difficult to identify, while the benefit is limited.  For the purpose of developing a linear algebra library, signed integers make more sense.

Meiyue