
on an analogy for verifying whether another digit fits (into an unsigned type)


Meredith Montgomery

Jan 15, 2022, 9:28:22 PM
I've been trying to think of an analogy for the verification

if( ((UINT64_MAX - c) / 10) >= r)
  r = r * 10 + c;
else return -1; /* doesn't fit */

in the procedure below.

I thought of the following, which doesn't quite work. Consider filling up
a bucket of water where we must make sure not to overflow. We're given
one last glass of water to throw in the bucket and we need a strategy to
know whether that last glass would or would not overflow the bucket. We
have all the quantities involved --- the maximum volume in the bucket,
the volume of the glass of water and the volume currently in the bucket.
Say the maximum is M, r is the current volume and c is the volume of
water in the glass. The subtraction M - c represents the amount of
water that would be the exact amount to fill up the bucket completely
without a drop overflowing if we add c to the bucket. So, if the
current volume is greater than M - c, then we can't put c in the bucket.

That gives the general strategy, but it's not too good because there's
nothing in this analogy that matches the division by 10. I could of
course make up a world in which every glass of water you throw in a
bucket makes the volume in the bucket multiply by a factor of 10
before the glass of water goes in. The purpose of an analogy is to make
something odd look natural, so this is a poor analogy.
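
The bucket picture maps directly onto a plain addition check, and the "times 10" twist only changes which quantity r is compared against. A minimal sketch, assuming two hypothetical helper names (fits_after_add and fits_after_append, neither of which is part of the procedure) and assuming c <= M:

```c
#include <stdint.h>

/* Bucket analogy: the glass c fits if the current volume r is at most M - c.
   Assumes c <= M so the subtraction cannot wrap. */
int fits_after_add(uint64_t r, uint64_t c, uint64_t M)
{
    return r <= M - c;
}

/* Digit version: r is multiplied by 10 before c goes in, so r must be
   at most (M - c) / 10, rounded down. */
int fits_after_append(uint64_t r, uint64_t c, uint64_t M)
{
    return r <= (M - c) / 10;
}
```

With M = 100, fits_after_append(9, 10, 100) holds because 9 * 10 + 10 is exactly 100, while fits_after_append(10, 1, 100) does not.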

Any cool ideas? Thank you.

(*) The procedure

uint64_t array_to_uint64(char *s, uint64_t *u)
{
  uint64_t pos;
  uint64_t r;
  uint64_t c;

  pos = 0; r = 0;

  for ( ;; ) {
    c = (uint64_t) (unsigned char) (s[pos] - '0');
    if (c < 10) {
      if( ((UINT64_MAX - c) / 10) >= r)
        r = r * 10 + c;
      else return -1; /* doesn't fit */
      ++pos; continue;
    }
    break;
  }

  *u = r;
  return pos;
}

Bart

Jan 16, 2022, 1:44:18 PM
On 16/01/2022 02:27, Meredith Montgomery wrote:
> uint64_t array_to_uint64(char *s, uint64_t *u)
> {
> uint64_t pos;
> uint64_t r;
> uint64_t c;
>
> pos = 0; r = 0;
>
> for ( ;; ) {
> c = (uint64_t) (unsigned char) (s[pos] - '0');
> if (c < 10) {
> if( ((UINT64_MAX - c) / 10) >= r)

Dividing by 10 each time is unnecessary (even if usually optimised to
shifts and multiplies).

You only need to make this check after you've already processed 19
characters, as it could overflow on the 20th, but not before.

I think when pos >= 18.
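
A rough sketch of that idea, under the assumption that pos counts the characters already consumed: nineteen decimal digits are at most 9999999999999999999, which is below UINT64_MAX, so the 20th digit (seen at pos == 19) is the first that can overflow. Testing from pos >= 18 would also be safe, just one test earlier than needed. The function name array_to_uint64_late_check is made up for this illustration.

```c
#include <stdint.h>

/* Variant of array_to_uint64 that only runs the division test from the
   20th character on (pos >= 19). Leading zeros merely cause a few extra
   tests; they do not make the result wrong. */
uint64_t array_to_uint64_late_check(const char *s, uint64_t *u)
{
    uint64_t pos = 0, r = 0;

    for (;;) {
        uint64_t c = (uint64_t)(unsigned char)(s[pos] - '0');
        if (c >= 10)
            break;                      /* not a digit: stop */
        if (pos >= 19 && r > (UINT64_MAX - c) / 10)
            return (uint64_t)-1;        /* doesn't fit */
        r = r * 10 + c;
        ++pos;
    }
    *u = r;
    return pos;
}
```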



Öö Tiib

Jan 16, 2022, 3:15:55 PM
On Sunday, 16 January 2022 at 04:28:22 UTC+2, Meredith Montgomery wrote:
> Any cool ideas? Thank you.

It cannot be very cool, as the decimal system is arbitrary and
nothing in the real world (besides the count of human fingers) suggests
using it. I can try ...

If we want to take 10 times as much water as we currently have, plus
some, then it makes sense to first check whether the current water (together
with 1/10th of the "plus some") fits into a bucket 10 times smaller.

But it is unclear if that analogy helps to clarify anything.

Meredith Montgomery

Jan 17, 2022, 7:48:18 AM
I suppose you're right, but imagine putting such a check there. It would
take even more paragraphs to explain it to myself some time later when
I'm trying to figure out what I wrote a while back.

Meredith Montgomery

Jan 17, 2022, 7:53:20 AM
Hey, I think that helps: we're at least reading out (more clearly) what
we're doing in the arithmetic expression. I'll go with that for now.
Thank you so much.

Meredith Montgomery

Jan 17, 2022, 8:12:28 AM
Hm. A good analogy here is anything with a certain exponential growth
because base-10 numbers grow exponentially. So I guess a colony of
bacteria would do. I can say whenever I add, say, c amount of food to the
colony, they multiply themselves by 10 and (remarkably) c new members
are born too. (Researchers are investigating why.) Also astounding is
the fact that once they get very near UINT64_MAX members, they just stop
growing no matter what --- baffling the biologists. The exact condition
is that if the new population size would exceed UINT64_MAX, the c amount
of food does nothing at all. (They stop eating at that point and
everything stays as it is.)

That's probably the best I can do.

Ben Bacarisse

Jan 17, 2022, 12:27:54 PM
You could just check that the array contains a string of digits
lexicographically less than or equal to 18446744073709551615. If there
are fewer digits than this, or, at every position, the digit you have is
no greater than the corresponding digit of that number, you are ok.
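
A minimal sketch of the string comparison, assuming the digits carry no sign and no leading zeros; the name digits_fit_u64 is hypothetical. It compares lexicographically rather than digit by digit, since a number such as 10000000000000000009 has a last digit above 5 and still fits.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the n decimal digits at s (no sign, no leading zeros)
   represent a value <= 18446744073709551615, else 0. */
int digits_fit_u64(const char *s, size_t n)
{
    static const char max_digits[] = "18446744073709551615";
    const size_t max_len = sizeof max_digits - 1;   /* 20 */

    if (n < max_len)
        return 1;               /* fewer digits always fit */
    if (n > max_len)
        return 0;               /* more digits never fit */
    /* Same length: byte-wise order equals numeric order for digit strings. */
    return memcmp(s, max_digits, max_len) <= 0;
}
```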

--
Ben.

Scott Lurndal

Jan 17, 2022, 1:06:52 PM
That reminds me of an algorithm used on BCD mainframes to do addition
and subtraction on large (up to 100 digit) variable length operands.

The most significant digit was stored as the lowest addressed digit (nibble).

The algorithm handled addition and subtraction of two variable length
operands (with overflow detection) with a single pass starting from the
most significant digit of each operand.

"The processor uses an adder that accumulates two
fields from the most significant to the least significant
digit positions. Reverse addition ... has the advantage of
detecting an overflow condition prior to altering the
receiving field for the result"

"If the data fields are signed, sign manipulation takes place
prior to the addition since they are the most significant digits."

Fundamentally, the algorithm sign/zero extends the smaller operand
and adds a digit from each operand; if carry occurs and if all prior additions
(if any) had summed to nine, overflow is signaled and the operation completes.

To avoid writing to the receiving field before overflow is detected,
the processor had a nines-counter which counted the number of leading
digits each having the value 9. Once a digit no longer sums to 9 or has
a carry, the BCD digit '9' is flushed to the receiving field until the nines
counter is zero. The most recent digit sum is stored in a register
to accommodate carry (add one to the register, flush it, and move current
result to register).
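
The overflow rule alone (not the nines counter or the write-back) can be sketched in C. This is a simplification under stated assumptions, not the B2500/B3500 logic itself: both operands are unsigned, already extended to the same length n, and stored one decimal digit per byte, most significant first. The function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* a and b hold n decimal digits each (values 0..9), most significant first.
   Returns 1 if a + b needs more than n digits, 0 otherwise. */
int msd_first_add_overflows(const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + b[i];
        if (sum > 9)
            return 1;   /* carry, and every earlier digit pair summed to 9 */
        if (sum < 9)
            return 0;   /* a carry from below stops here; no overflow */
        /* sum == 9: undecided, a carry from below would pass through */
    }
    return 0;           /* all sums were 9: no carry ever arises */
}
```

For example, 995 + 005 is flagged at the third digit, whereas 199 + 001 is cleared at the first digit because 1 + 0 is below 9.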

1025475_B2500_B3500_RefMan_Oct69.pdf (flowchart p. 5-11)

Ben Bacarisse

Jan 17, 2022, 4:00:10 PM
Interesting. Thanks.

Unrelated, but I noticed this gem in the list of the system's
advantages:

d. Programming so simple it can be started by one programmer and
finished by another

--
Ben.

Scott Lurndal

Jan 17, 2022, 8:01:55 PM
Indeed, it was quite simple to program. Reading a memory
dump was a breeze:

RECD NO FILE: TRKTAP EOF = 394 7/27/2021 (TUESDAY) 17:05 PAGE 0001

1 40F87AF2F8 4000100064 0790000000 6430017400 4001000000 0010010426 94C2000064 0000060768 5600100007 7972000000
8 : 2 8 m B `
(00100) 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0003000064
...00100/001
2 E3D9D2E3C1 D7F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0
T R K T A P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(00100) F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0 F0F0F0F0F0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...00100/001
3 E3D76DD6E4 E340404040 4040D7D9D6 C6C9D30400 0040404040 4040E00149 6001600000 00000000C4 C9E2D24040 0040404040
T P _ O U T P R O F I L \ - - D I S K
(00100) 4040404040 4040404040 4040400000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000
...00100/001
4 D7D9C9D5E3 4040404040 4040D7D9E3 E3D9D20200 0040404040 4040E00600 8001600000 00000000C4 C9E2D24040 0040404040
P R I N T P R T T R K \ - D I S K
(00100) 4040404040 4040404040 4040400000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000
...00100/001
5 E3D9D2E3C1 D740404040 4040E3D9D2 E3C1D70400 0040404040 4040E00668 0001600000 00000000C4 C9E2D24040 0040404040
T R K T A P T R K T A P \ - D I S K
(00100) 4040404040 4040404040 4040400000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000 0000000000
...00100/001
Decoding the instruction stream was trivial by eye:

0547000E LIX IX1 IX4 Save table address 003104 670740 100008
0548000E Validate(IX2,"+ERR-6:") /Validate Address
0548000M MVW IX2 -VRqst Set request ptr 003116 120002 000016 C00340
0548000M MVN +ERR-6 LL -VLab Set error label 003134 11A606 004446 C00348
0548000M VEN -Vparm Valid Do the validation 003152 350007 E00340 0D008356
0549000E
0550000E MVW RQ-CSD X2 STRCSD X7 Set descriptor 003172 120005 800004 4E000002
0551000E MPY 05 BADR SZ TBL-RN X4 IX1 Branch table offset 003192 05A502 000060 4D000004 100008
0552000E INC DOVRD1 IX1 Make code relative 003218 010707 100108 100008

note the MPY instruction:
Opcode = 05
AFBF = A502 (A Field and B field lengths, bcd; AF=A5 (5 digit unsigned literal), BF=02
A = 000060 (Literal 6 encoded in address field)
B = 4D000004 (Four digits from the base of Index Register 4)
C = 100008 (Signed 7 (AF + BF) digit result at address 8 in memory, which is also IX1)

Bonita Montero

Jan 19, 2022, 2:52:58 AM
How about that. I wrote it with g++ and it should fit with clang++:

#include <iostream>
#include <stdexcept>
#include <string>

using namespace std;

unsigned long long parseUll( char const *str )
{
    using ull_t = unsigned long long;
    ull_t value = 0;
    for( unsigned char const *scn = (unsigned char const *)str; *scn; ++scn )
    {
        if( __builtin_umulll_overflow( value, 10, &value ) )
            throw overflow_error( "parseUll overflow" );
        if( __builtin_uaddll_overflow( value, *scn - '0', &value ) )
            throw overflow_error( "parseUll overflow" );
    }
    return value;
}

int main()
{
    for( ; ; )
        try
        {
            string strValue;
            cin >> strValue;
            cout << parseUll( strValue.c_str() ) << endl;
        }
        catch( overflow_error & )
        {
            cout << "overflow" << endl;
        }
}

Öö Tiib

Jan 19, 2022, 9:08:20 AM
On Wednesday, 19 January 2022 at 09:52:58 UTC+2, Bonita Montero wrote:
> How about that. I wrote it with g++ and it should fit with clang++:
>

The analogy that the OP asked for was likely meant to come from the real
world, not from another programming language.

Bonita Montero

Jan 19, 2022, 10:31:16 AM
It's just about the principle; the code can be easily ported to C.
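
One possible shape of such a port, as a sketch only: it keeps the same GCC/Clang builtins, replaces the exception with a boolean return value, and (an assumption that departs from the C++ version) stops at the first non-digit. The name parse_ull_builtin is hypothetical.

```c
#include <stdbool.h>

/* Sketch of a C port using the GCC/Clang overflow builtins.
   Unlike the C++ version it stops at the first non-digit (an assumption
   made here) and reports overflow through the return value. */
bool parse_ull_builtin(const char *str, unsigned long long *out)
{
    unsigned long long value = 0;

    for (; *str >= '0' && *str <= '9'; ++str) {
        if (__builtin_umulll_overflow(value, 10ULL, &value))
            return false;       /* value * 10 overflowed */
        if (__builtin_uaddll_overflow(value,
                (unsigned long long)(*str - '0'), &value))
            return false;       /* adding the digit overflowed */
    }
    *out = value;
    return true;
}
```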

Bonita Montero

Jan 19, 2022, 12:03:05 PM
Here's a slightly better implementation with improvements for MSVC
with a benchmark.

#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
#include <random>
#include <sstream>
#include <chrono>
#include <immintrin.h>

using namespace std;
using namespace chrono;

#if defined(_MSC_VER)
__declspec(noinline)
#elif defined(__GNUC__)
__attribute__((noinline))
#endif
unsigned long long parseUll( char const *str )
{
unsigned long long value = 0;
for( ; *str; ++str )
{
#if (!defined(__llvm__) && defined(__GNUC__) && !defined(_MSC_VER)) || defined(PARSE_ULL_SIMPLE)
if( value * 10 / 10 != value )
goto overflow;
value *= 10;
unsigned char digit = *str - '0';
if( value + digit < value )
goto overflow;
value += digit;
#elif defined(__llvm__) || defined(__GNUC__)
if( __builtin_umulll_overflow( value, 10, &value ) )
goto overflow;
if( __builtin_uaddll_overflow( value, (unsigned char)*str - '0', &value) )
goto overflow;
#elif defined(_MSC_VER)
unsigned long long hi;
value = _mulx_u64( value, 10, &hi );
if( hi )
goto overflow;
// _addcarry_u64 specified but missing (MSVC 2022)
if( value + ((unsigned char)*str - '0') < value )
goto overflow;
value += (unsigned char)*str - '0';
#endif
}
return value;
overflow:
throw overflow_error( "parseUll() overflow" );
}

unsigned long long volatile vSum;

int main()
{
constexpr size_t N = 1000;
vector<string> rNums;
rNums.reserve( N );
mt19937_64 mt;
uniform_int_distribution<unsigned long long> uidValues( 0, -1 );
ostringstream oss;
for( size_t i = 0; i != N; ++i )
{
oss.str( "" );
oss << uidValues( mt );
rNums.emplace_back( oss.str() );
}
unsigned long long sum = 0;
auto start = high_resolution_clock::now();
for( size_t i = 0; i != 1000; ++i )
for( string &str : rNums )
sum += parseUll( str.c_str() );
::vSum = sum;
double ns = (int64_t)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / (1000.0 * N);
cout << ns << endl;
}

Bart

Jan 19, 2022, 1:27:38 PM
Complicated. I used the simpler **C** code below. Its runtime was 10%
slower than the C++ (that is, elapsed time of the 10,000 outer loop for
both).

(Building the C++ took 3.3 seconds; building the C even with a slow gcc
took 0.32 seconds. Faster C compilers do it instantly.)

My version calls strlen on the string (which is also assumed here to
contain the number without signs, leading zeros, and to end on the last
digit).

In practice this information will often already be known. If I modify
parseull() to take a length (here emulated with a table of precalculated
values), then the C version is 25% faster than your C++.

Maybe your C++ version can also benefit from knowing the length, but I
can't see how from the way it's written.


------------------------------------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long long u64;

/* numbers[] (the table of 1000 digit strings) is assumed to be defined elsewhere */

u64 parseull(char* s) {
    int length=strlen(s);

    if (length>20 || (length==20 && strcmp(s,
            "18446744073709551615")>0)) {
        puts("Overflow"); exit(1);
    }

    u64 a=*s -'0';
    while (--length) {a=a*10+*++s-'0';};
    return a;
}

int main(void) {
    u64 a;
    volatile u64 sum=0;

    for (int j=0; j<10000; ++j) {
        for (int i=0; i<1000; ++i) {
            sum+=parseull(numbers[i]);
        }
    }

    printf("%llu\n",sum);
}


Bonita Montero

Jan 19, 2022, 1:45:06 PM
That's slower than the variants with intrinsics.

Scott Lurndal

Jan 19, 2022, 1:47:12 PM
Bart <b...@freeuk.com> writes:
>On 19/01/2022 17:02, Bonita Montero wrote:

>
>Complicated. I used the simpler **C** code below. It's runtime was 10%
>slower than the C++ (that is, elapsed time of the 10,000 outer loop for
>both).

I just use strtoll. Why reinvent the wheel?

Bart

Jan 19, 2022, 3:59:37 PM
It needs to be strtoull() for this purpose, which is more elusive (gcc
has it on Windows, but the two other compilers I have don't).

As to why it's worth reinventing, if I use strtoull instead, then my
benchmark runs in 2.8 seconds instead of 0.28 seconds; it's much slower!

Besides which, when I need to do this stuff, the requirements are more
diverse (eg dealing with numeric separators, or determining whether the
literal fits in i64, u64, i128, u128, or represents a big number).
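
For reference, a hedged sketch of how the standard strtoull call is typically wrapped when overflow and "no digits" need to be told apart. The wrapper name and its return codes are invented for illustration. Note that strtoull itself skips leading white space and accepts a sign, so "-1" converts to ULLONG_MAX without an error; a stricter parser has to check for that separately.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 0 on success, -1 if no digits were found, -2 on overflow.
   *end (if non-NULL) receives the first unparsed character. */
int parse_u64_strtoull(const char *s, unsigned long long *out, char **end)
{
    char *stop;
    errno = 0;
    unsigned long long v = strtoull(s, &stop, 10);

    if (end)
        *end = stop;
    if (stop == s)
        return -1;              /* no conversion performed */
    if (errno == ERANGE)
        return -2;              /* value does not fit; v is ULLONG_MAX */
    *out = v;
    return 0;
}
```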

Scott Lurndal

Jan 19, 2022, 4:12:35 PM
Bart <b...@freeuk.com> writes:
>On 19/01/2022 18:46, Scott Lurndal wrote:
>> Bart <b...@freeuk.com> writes:
>>> On 19/01/2022 17:02, Bonita Montero wrote:
>>
>>>
>>> Complicated. I used the simpler **C** code below. It's runtime was 10%
>>> slower than the C++ (that is, elapsed time of the 10,000 outer loop for
>>> both).
>>
>> I just use strtoll. Why reinvent the wheel?
>
>It needs to be stroull() for this purpose, which is more elusive (gcc
>has it on Windows, but the two other compilers I have don't),
>
>As to why it's worth reinventing, if I use strtoull instead, then my
>benchmarks runs in 2.8 seconds instead of 0.28 seconds; it's much slower!

Does your benchmark measure anything useful?

Very, very, very few applications will have strtoull as a dominant
performance driver.

Bart

Jan 19, 2022, 5:25:20 PM
On 19/01/2022 21:12, Scott Lurndal wrote:
> Bart <b...@freeuk.com> writes:
>> On 19/01/2022 18:46, Scott Lurndal wrote:
>>> Bart <b...@freeuk.com> writes:
>>>> On 19/01/2022 17:02, Bonita Montero wrote:
>>>
>>>>
>>>> Complicated. I used the simpler **C** code below. It's runtime was 10%
>>>> slower than the C++ (that is, elapsed time of the 10,000 outer loop for
>>>> both).
>>>
>>> I just use strtoll. Why reinvent the wheel?
>>
>> It needs to be stroull() for this purpose, which is more elusive (gcc
>> has it on Windows, but the two other compilers I have don't),
>>
>> As to why it's worth reinventing, if I use strtoull instead, then my
>> benchmarks runs in 2.8 seconds instead of 0.28 seconds; it's much slower!
>
> Does your benchmark measure anything useful?

Only text to binary conversion of numbers, but then nobody ever does
that, do they? Apart from processing textual formats such as XML, HTML,
source code of virtually every language, configuration files, database
files, CDF/CSV and other data files.

>
> Very, very, very few application will have strtoull as a dominant
> performance driver.

Plenty of slow software about. Ignoring things like this adds up.

But in the case of strtoull, there are other issues with it:

* Not all compilers may recognise it

* The performance depends on the library used

Using your own routine for this simple task eliminates those variables.

The timing I quoted was the worst:

* gcc/strtoull on Windows: 2.8 seconds (for 1M conversions)

* gcc/strtoull on WSL: 0.6 seconds

* gcc/_strtoui64 on Windows (in msvcrt): 0.7 seconds

* My parseull routine on Windows: 0.28/0.35 seconds (with/without length)

The intention was anyway to match that complex C++ code, which was also
the wrong language, and also took 10 times as long to compile.

Ben Bacarisse

Jan 19, 2022, 10:49:44 PM
Bart <b...@freeuk.com> writes:

> But in the case of strtoull, there are other issues with it:
>
> * Not all compilers may recognise it

It's almost always a library issue, not a compiler issue. Any
implementation (compiler + library) that does not have it is a poor one
since it's been standard for a long time now.

> * The performance depends on the library used
>
> Using your own routine for this simple task eliminates those variables.
>
> The timing I quoted was the worst:
>
> * gcc/strtoull on Windows: 2.8 seconds (for 1M conversions)
>
> * gcc/strtoull on WSL: 0.6 seconds
>
> * gcc/_strtoui64 on Windows (in msvcrt): 0.7 seconds

You keep citing figures with little useful information. What length
numbers? What are these various (apparently slow) implementations of
strtoull that you have managed to find? Calling them gcc/strtoull is
not helpful.

For 10^6 numbers with, on average, 18.3 digits, the strtoull on a recent
Ubuntu install (glibc 2.34) does the conversions in 0.042 seconds
(i5-8265U CPU). Is your machine really more than 10 times slower than
my laptop?

> * My parseull routine on Windows: 0.28/0.35 seconds (with/without
> length)

This one probably doesn't do the same job, so it's not a direct
comparison.

--
Ben.

Bonita Montero

Jan 20, 2022, 2:08:39 AM
Am 19.01.2022 um 19:44 schrieb Bonita Montero:

> That's slower than the variants with intrinsics.

Here's a little benchmark:

#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
#include <random>
#include <sstream>
#include <chrono>
#include <cstring>
#include <immintrin.h>

using namespace std;
using namespace chrono;

unsigned long long parseUllIntrinsic( char const *str );
unsigned long long parseUllStd( char const *str );
unsigned long long parseUllBart( char const *str );

unsigned long long volatile vSum;

int main()
{
constexpr size_t N = 1000;
vector<string> rNums;
rNums.reserve( N );
mt19937_64 mt;
uniform_int_distribution<unsigned long long> uidValues( 0, -1 );
ostringstream oss;
for( size_t i = 0; i != N; ++i )
{
oss.str( "" );
oss << uidValues( mt );
rNums.emplace_back( oss.str() );
}
auto bench = [&]( unsigned long long (*parseUllFn)( char const *) ) ->
double
{
unsigned long long sum = 0;
auto start = high_resolution_clock::now();
for( size_t i = 0; i != 10'000; ++i )
for( string &str : rNums )
sum += parseUllFn( str.c_str() );
::vSum = sum;
return (int64_t)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / (10'000.0 * N);
};
cout << "intrinsic: " << bench( parseUllIntrinsic ) << endl;
cout << "std: " << bench( parseUllStd ) << endl;
cout << "Bart: " << bench( parseUllBart ) << endl;
}

#if defined(_MSC_VER)
__declspec(noinline)
#elif defined(__GNUC__)
__attribute__((noinline))
#endif
unsigned long long parseUllIntrinsic( char const *str )
{
if( !*str )
return 0;
unsigned long long value = (unsigned char)*str++ - '0';
for( ; *str; ++str )
{
#if defined(__llvm__) || defined(__GNUC__)
if( __builtin_umulll_overflow( value, 10, &value ) )
goto overflow;
if( __builtin_uaddll_overflow( value, (unsigned char)*str - '0', &value) )
goto overflow;
#elif defined(_MSC_VER)
unsigned long long hi;
value = _mulx_u64( value, 10, &hi );
if( hi )
goto overflow;
// _addcarry_u64 specified but missing (MSVC 2022)
if( value + ((unsigned char)*str - '0') < value )
goto overflow;
value += (unsigned char)*str - '0';
#else
#error no intrinsic version
#endif
}
return value;
overflow:
throw overflow_error( "parseUll() overflow" );
}


#if defined(_MSC_VER)
__declspec(noinline)
#elif defined(__GNUC__)
__attribute__((noinline))
#endif
unsigned long long parseUllStd( char const *str )
{
unsigned long long value;
if( !*str )
return 0;
value = (unsigned char)*str++ - '0';
for( ; *str; ++str )
{
if( value * 10 / 10 != value )
goto overflow;
value *= 10;
unsigned char digit = *str - '0';
if( value + digit < value )
goto overflow;
value += digit;
}
return value;
overflow:
throw overflow_error( "parseUll() overflow" );
}

#if defined(_MSC_VER)
__declspec(noinline)
#elif defined(__GNUC__)
__attribute__((noinline))
#endif
unsigned long long parseUllBart( char const *str )
{
size_t len = strlen( str );
unsigned long long value;
if( len > 20 || len == 20 && strcmp( str, "18446744073709551615" ) > 0 )
goto overflow;
if( !*str )
return 0;
value = (unsigned char)*str++ - '0';
while( *str )
value *= 10,
value += (unsigned char)*str++ - '0';
return value;
overflow:
throw overflow_error( "parseUll() overflow" );
}

Here are the MSVC-results:

intrinsic: 20.4304
std: 20.3481
Bart: 22.2739

The gcc/O2-results:

intrinsic: 20.4639
std: 20.7395
Bart: 23.3007

The clang++/O2-results:

intrinsic: 21.1037
std: 21.9567
Bart: 22.7341

Bart

Jan 20, 2022, 5:05:19 AM
On 20/01/2022 03:49, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:
>
>> But in the case of strtoull, there are other issues with it:
>>
>> * Not all compilers may recognise it
>
> It's almost always library issue, not a compiler issue. Any
> implementation (compiler + library) that does not have it is a poor one
> since it's been standard for a long time now.
>
>> * The performance depends on the library used
>>
>> Using your own routine for this simple task eliminates those variables.
>>
>> The timing I quoted was the worst:
>>
>> * gcc/strtoull on Windows: 2.8 seconds (for 1M conversions)
>>
>> * gcc/strtoull on WSL: 0.6 seconds
>>
>> * gcc/_strtoui64 on Windows (in msvcrt): 0.7 seconds
>
> You keep citing figures with little useful information. What length
> numbers? What are these various (apparently slow) implementation of
> strtoull that you have managed to find? Calling them gcc/strtoull is
> not helpful.
>

The full program is here:

https://github.com/sal55/langs/blob/master/parseull.c

You need to uncomment the routine you want in the inner loop. The
numbers are the ones generated with BM's program. Timings are elapsed
time from running the program, measured externally.

> For 10^6 numbers with, on average, 18.3 digits, the strtoull on a recent
> Ubuntu install (glibc 2.34) does the conversions in 0.042 seconds
> (i5-8265U CPU). Is your machine really more than 10 times slower than
> my laptop?

Sorry, that was my mistake: it's 10^7 conversions (originally 10^6 in
the C++ code but that was too quick to easily measure).

>> * My parseull routine on Windows: 0.28/0.35 seconds (with/without
>> length)
>
> This one probably doesn't do the same job, so it's not a direct
> comparison.

It was supposed to do the same job as BM's program: take a string
containing only digits and convert them while checking for overflows.

Although mine won't allow leading zeros if the overflow check is to
work; that would slow it down slightly to eliminate them.


Ben Bacarisse

Jan 20, 2022, 12:02:55 PM
Right. But it does not do the same job as the other functions you gave
timings for /in the post I replied to/.

--
Ben.

Öö Tiib

Jan 20, 2022, 12:47:55 PM
On Thursday, 20 January 2022 at 09:08:39 UTC+2, Bonita Montero wrote:
> Am 19.01.2022 um 19:44 schrieb Bonita Montero:
>
> > That's slower than the variants with intrinsics.
> Here's a little benchmark:

All those test functions do something strange compared to what the OP's posted
code did. The OP posted a relatively reasonable function like this:

uint64_t array_to_uint64(char *s, uint64_t *u) { /*...*/ }

For example, given the string "15A", that function returned 2 and put 15 into
the value pointed to by u. That made sense.

Your code, however, returns 167. That seems like nonsense; where was it specified?
It does not matter at what speed code gives such nonsense answers. At
least among my customers.

Bart

Jan 20, 2022, 2:20:50 PM
On 20/01/2022 17:02, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:
>
>> On 20/01/2022 03:49, Ben Bacarisse wrote:
>>> Bart <b...@freeuk.com> writes:
>
>>>> * My parseull routine on Windows: 0.28/0.35 seconds (with/without
>>>> length)
>>>
>>> This one probably doesn't do the same job, so it's not a direct
>>> comparison.
>>
>> It was supposed to do the same job as BM's program: take a string
>> containing only digits and convert them while checking for overflows.
>
> Right. But it does not do the same job as the other functions you gave
> timings for /in the post I replied to/.

OK. Maybe you should ask Scott Lurndal why /he/ brought up strtoll().
The implication was that they did the same kind of thing, namely convert
a string of digits to an integer.

I suggested that among several reasons why a custom routine might be
used, was performance, even if they differ in features.

However, I've created a version of my routine which is more in line with
how strtoull etc work (but I don't know the full spec), shown below.

This version is still 50% faster than the fastest strtoull routine on my
machine.

(One reason may be that it is hard-coded for base-10. If so, then that
is an argument for creating your own dedicated base-10 version.)

-----------------------------------------------
typedef unsigned long long u64;

u64 parseull(char* s, char** send) {
    int length,neg=0;

    while (*s==' ' || *s=='\t' ) ++s;        // leading spaces
    if (*s=='-') {neg=1; ++s;}               // optional signs
    else if (*s=='+') ++s;
    while (*s=='0' && *(s+1)=='0') ++s;      // leading zeros

    length=0;
    *send=s;

    while (**send>='0' && **send<='9') ++(*send);
    length=*send-s;
    if (length==0) return -1;                // u64.max on error
    if (length>20) return -1;

    if (length==20 && strncmp(s, "18446744073709551615",20)>0)
        return -1;

    u64 a=*s -'0';
    while (--length) {a=a*10+*++s-'0';};
    if (neg) return -a;
    return a;
}




Bart

Jan 20, 2022, 2:31:56 PM
On 20/01/2022 19:20, Bart wrote:

> -----------------------------------------------
>     typedef unsigned long long u64;
>
>     u64 parseull(char* s, char** send) {
>         int length,neg=0;
>
>         while (*s==' ' || *s=='\t' ) ++s;        // leading spaces
>         if (*s=='-') {neg=1; ++s;}               // optional signs
>         else if (*s=='+') ++s;
>         while (*s=='0' && *(s+1)=='0') ++s;      // leading zeros

That's not quite right, as it will leave 1 leading zero; better:

while (*s=='0' && (*(s+1)>='1' && *(s+1)<='9')) ++s;
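
Tracing both loops on "007" suggests neither strips every leading zero: the first stops at "07", and the second stops immediately because the character after the first zero is not in the range 1 to 9. A small sketch of one alternative, with a hypothetical helper name, is to advance while the next character is any digit, which keeps a lone "0" intact:

```c
/* Skips leading zeros but keeps the last zero when it is the only digit,
   so "000" becomes "0" and "007" becomes "7". */
const char *skip_leading_zeros(const char *s)
{
    while (s[0] == '0' && s[1] >= '0' && s[1] <= '9')
        ++s;
    return s;
}
```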

Tim Rentsch

Jan 20, 2022, 10:24:51 PM
Meredith Montgomery <mmont...@levado.to> writes:

> I've been trying to think of an analogy for the verification
>
> if( ((UINT64_MAX - c) / 10) >= r)
> r = r * 10 + c;
> else return -1; /* doesn't fit */
>
> in the procedure below. [...]

No analogy needed. The condition that needs to be
satisfied is

r * 10 + c <= UINT64_MAX

which is the same as

r * 10 <= UINT64_MAX - c

which is the same as

r <= (UINT64_MAX - c) / 10

which is the same as

(UINT64_MAX - c) / 10 >= r

giving the expression in the if() test. Done.
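
The step from "r * 10 <= X" to "r <= X / 10" relies on integer division rounding down, which is safe because r is an integer. As a sanity check (an illustration only, with a made-up function name), the same derivation can be verified exhaustively on an 8-bit analogue where the maximum is 255 and the exact condition can be evaluated in a wider type:

```c
#include <stdint.h>

/* Counts the (r, c) pairs, with 8-bit r and digit c, for which the
   division test disagrees with the exact condition r*10 + c <= 255.
   The exact condition is evaluated in unsigned int, so it cannot wrap. */
unsigned count_mismatches_u8(void)
{
    unsigned mismatches = 0;

    for (unsigned r = 0; r <= UINT8_MAX; ++r) {
        for (unsigned c = 0; c < 10; ++c) {
            int exact = (r * 10u + c) <= UINT8_MAX;
            int test  = (UINT8_MAX - c) / 10 >= r;   /* the if() test, scaled down */
            if (exact != test)
                ++mismatches;
        }
    }
    return mismatches;
}
```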


> (*) The procedure
>
> uint64_t array_to_uint64(char *s, uint64_t *u)
> {
> uint64_t pos;
> uint64_t r;
> uint64_t c;
>
> pos = 0; r = 0;
>
> for ( ;; ) {
> c = (uint64_t) (unsigned char) (s[pos] - '0');
> if (c < 10) {
> if( ((UINT64_MAX - c) / 10) >= r)
> r = r * 10 + c;
> else return -1; /* doesn't fit */
> ++pos; continue;
> }
> break;
> }
>
> *u = r;
> return pos;
> }

It's better to write the code so the funny division test
isn't needed:

uint64_t
array_to_uint64( char *s0, uint64_t *u ){
    char *s = s0;
    uint64_t r, d, nr;

    for( r = 0; d = *s-'0', nr = r*10+d, d < 10; r = nr, s++ ){
        if( r > nr ) return -1;
    }

    return *u = r, s-s0;
}
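
One way to probe the shorter loop is to compare its "r > nr" test with the division test on a single value. The sketch below uses hypothetical helper names. For r = 3000000000000000000 and d = 0 the product wraps to 11553255926290448384, which is larger than r, so a comparison of this kind does not flag it. A wrapped addition on its own is always smaller than its operand, but a wrapped multiplication by 10 need not be.

```c
#include <stdint.h>

/* Division test: 1 if r*10 + d does not fit in 64 bits. */
int append_overflows(uint64_t r, uint64_t d)
{
    return r > (UINT64_MAX - d) / 10;
}

/* Comparison test: 1 if the wrapped result is smaller than r. */
int wrapped_result_is_smaller(uint64_t r, uint64_t d)
{
    uint64_t nr = r * 10 + d;   /* unsigned arithmetic wraps modulo 2^64 */
    return r > nr;
}
```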

Bonita Montero

Jan 21, 2022, 2:08:11 AM
Am 21.01.2022 um 04:24 schrieb Tim Rentsch:
> Meredith Montgomery <mmont...@levado.to> writes:
>
>> I've been trying to think of an analogy for the verification
>>
>> if( ((UINT64_MAX - c) / 10) >= r)
>> r = r * 10 + c;
>> else return -1; /* doesn't fit */
>>
>> in the procedure below. [...]
>
> No analogy needed. The condition that needs to be
> satisfied is
>
> r * 10 + c <= UINT64_MAX
>
> which is the same as
>
> r * 10 <= UINT64_MAX - c
>
> which is the same as
>
> r <= (UINT64_MAX - c) / 10
>
> which is the same as
>
> (UINT64_MAX - c) / 10 >= r
>
> giving the expression in the if() test. Done.
>

Doesn't help because c isn't a constant.

Tim Rentsch

Jan 21, 2022, 9:01:23 AM
It's hard to know what to say to such an obviously
inapplicable comment.

Bonita Montero

Jan 21, 2022, 11:59:22 AM
If your exchanges would result in a constant instead of a calculation
they would be favourable. But the overhead for your swapped calculation
is exactly the same and the code isn't more readable than before.

Scott Lurndal

Jan 21, 2022, 12:41:51 PM
Although the ordering can be significant from a security perspective,
consider this:

if (len > PAGE_SIZE - 2 - size)

vs. this:

if (size + len + 2 > PAGE_SIZE)

Which is better, and why?
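
One reading of the question, sketched under assumptions the post does not state: both variables are unsigned (size_t here), size never exceeds the page size minus 2, and the page size is 4096 (named DEMO_PAGE_SIZE to mark it as illustrative). Under those assumptions the subtraction form cannot wrap, while the addition form can wrap for a very large len and then accept it.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE ((size_t)4096)   /* assumed value, for illustration */

/* Subtraction form: cannot wrap as long as size <= DEMO_PAGE_SIZE - 2. */
int too_big_subtract_form(size_t size, size_t len)
{
    return len > DEMO_PAGE_SIZE - 2 - size;
}

/* Addition form: size + len + 2 can wrap around for a huge len. */
int too_big_add_form(size_t size, size_t len)
{
    return size + len + 2 > DEMO_PAGE_SIZE;
}
```

With size = 100 and len equal to the largest size_t value minus 50, the sum wraps to 51, so the addition form lets the request through while the subtraction form rejects it.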

Bonita Montero

Jan 21, 2022, 1:27:13 PM
Totally different discussion.

Tim Rentsch

Jan 21, 2022, 8:12:26 PM
Apparently you don't understand that your comment has no bearing
on what I was talking about.

Keith Thompson

Jan 23, 2022, 5:26:39 PM
Bart <b...@freeuk.com> writes:
> On 19/01/2022 18:46, Scott Lurndal wrote:
>> Bart <b...@freeuk.com> writes:
>>> On 19/01/2022 17:02, Bonita Montero wrote:
>>> Complicated. I used the simpler **C** code below. It's runtime was 10%
>>> slower than the C++ (that is, elapsed time of the 10,000 outer loop for
>>> both).
>> I just use strtoll. Why reinvent the wheel?
>
> It needs to be stroull() for this purpose, which is more elusive (gcc
> has it on Windows, but the two other compilers I have don't),

gcc does not provide strtoll() or strtoull(). Both are provided by the
library, not by the compiler. (And both were introduced in C99, so I'd
be at least mildly surprised by an implementation that provides one
but not the other.)

I know you're tired of people pointing out that the compiler (gcc
in this case) does not provide library functions. The solution is
for you to stop making that mistake. Or should I assume you enjoy
these arguments?

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Bart

Jan 23, 2022, 7:13:43 PM
On 23/01/2022 22:26, Keith Thompson wrote:
> Bart <b...@freeuk.com> writes:
>> On 19/01/2022 18:46, Scott Lurndal wrote:
>>> Bart <b...@freeuk.com> writes:
>>>> On 19/01/2022 17:02, Bonita Montero wrote:
>>>> Complicated. I used the simpler **C** code below. It's runtime was 10%
>>>> slower than the C++ (that is, elapsed time of the 10,000 outer loop for
>>>> both).
>>> I just use strtoll. Why reinvent the wheel?
>>
>> It needs to be strtoull() for this purpose, which is more elusive (gcc
>> has it on Windows, but the two other compilers I have don't),
>
> gcc does not provide strtoll() or strtoull(). Both are provided by the
> library, not by the compiler. (And both were introduced in C99, so I'd
> be at least mildly surprised by an implementation that provides one
> but not the other.)
>
> I know you're tired of people pointing out that the compiler (gcc
> in this case) does not provide library functions. The solution is
> for you to stop making that mistake. Or should I assume you enjoy
> these arguments?

gcc/tdm compiles programs using strtoull.

bcc/tcc fail with a link error, unless I include this line:

#define strtoull _strtoui64

but then gcc will complain about it.

Actually why that is the case, I don't know, don't care, and probably no
one else cares who just installs a 'bundle' without wanting to trace and
check the provenance of each library function that it comes with.

The above is enough for me to think twice about using strtoull in a project.

Lew Pitcher

unread,
Jan 23, 2022, 7:39:07 PM1/23/22
to
On Mon, 24 Jan 2022 00:13:32 +0000, Bart wrote:

> On 23/01/2022 22:26, Keith Thompson wrote:
>> Bart <b...@freeuk.com> writes:
>>> On 19/01/2022 18:46, Scott Lurndal wrote:
>>>> Bart <b...@freeuk.com> writes:
>>>>> On 19/01/2022 17:02, Bonita Montero wrote:
>>>>> Complicated. I used the simpler **C** code below. Its runtime was 10%
>>>>> slower than the C++ (that is, elapsed time of the 10,000 outer loop for
>>>>> both).
>>>> I just use strtoll. Why reinvent the wheel?
>>>
>>> It needs to be strtoull() for this purpose, which is more elusive (gcc
>>> has it on Windows, but the two other compilers I have don't),
>>
>> gcc does not provide strtoll() or strtoull(). Both are provided by the
>> library, not by the compiler. (And both were introduced in C99, so I'd
>> be at least mildly surprised by an implementation that provides one
>> but not the other.)
>>
>> I know you're tired of people pointing out that the compiler (gcc
>> in this case) does not provide library functions. The solution is
>> for you to stop making that mistake. Or should I assume you enjoy
>> these arguments?
>
> gcc/tdm compiles programs using strtoull.
>
> bcc/tcc fail with a link error, unless I include this line:
>
> #define strtoull _strtoui64
>
> but then gcc will complain about it.

Countless generations of programmers have found a sensible way to handle such
conflicts by using conditional compilation.

strtoull() has been defined as being part of the C99 standard library, and is
also part of the POSIX.1-2001 and POSIX.1-2008 Unix standard libraries.
A simple macro test of the value of __STDC_VERSION__, and setting of the appropriate
pre-C99 macros should be all that is necessary to properly access strtoull().

Something like
#if (__STDC_VERSION__ < 199901L)
//do whatever it takes to properly define strtoull() for this platform
#endif

Note: the above "do whatever" might include such things as testing macros
to determine the platform and take appropriate action, as in
#ifdef WIN32
#define strtoull _strtoui64
#endif
or defining macros to force platform-specific libraries, like
#define _SVID_SOURCE
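
Put together, the conditional-compilation idea could look like the sketch below. The wrapper name parse_u64 and the opt-in macro USE_STRTOUI64 are hypothetical, introduced only for illustration; _strtoui64 is the Microsoft name mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: callers use parse_u64() everywhere, and the
   platform-specific choice is confined to this one function. */
static unsigned long long parse_u64(const char *s, char **end, int base)
{
    assert(s != NULL);
#if defined(_WIN32) && defined(USE_STRTOUI64)  /* USE_STRTOUI64: hypothetical opt-in */
    return _strtoui64(s, end, base);           /* Microsoft CRT spelling */
#else
    return strtoull(s, end, base);             /* ISO C99 spelling */
#endif
}
```

Keeping the #if inside a single wrapper means the rest of the code never sees the platform test.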

[snip]
> The above is enough for me to think twice about using strtoull in a project.

Your loss, I guess.

--
Lew Pitcher
"In Skills, We Trust"

Bart

unread,
Jan 23, 2022, 8:09:56 PM1/23/22
to
If you're an implementer creating headers for a C compiler, then you
should just declare them properly.

I just have in mine; it took a minute or so (after previously finding
out that MS uses that different naming). But I can't do it for tcc or
any other compiler that might have an issue.

And if you're not an implementer, then that's a lot of clutter to go
into your application's headers just to pander to one or two compilers.

It's the sort of thing I absolutely hate when poking around application
headers, trying to figure out why it doesn't work, because it's
special-casing some specific compilers, which will then exclude mine.

You might as well just implement a suitable function, then it'll run
anywhere, might be faster as I found, and could be made with a sweeter
interface.

Ben Bacarisse

unread,
Jan 23, 2022, 8:16:06 PM1/23/22
to
Bart <b...@freeuk.com> writes:

> On 23/01/2022 22:26, Keith Thompson wrote:
>> Bart <b...@freeuk.com> writes:
>>> On 19/01/2022 18:46, Scott Lurndal wrote:
>>>> Bart <b...@freeuk.com> writes:
>>>>> On 19/01/2022 17:02, Bonita Montero wrote:
>>>>> Complicated. I used the simpler **C** code below. Its runtime was 10%
>>>>> slower than the C++ (that is, elapsed time of the 10,000 outer loop for
>>>>> both).
>>>> I just use strtoll. Why reinvent the wheel?
>>>
>>> It needs to be strtoull() for this purpose, which is more elusive (gcc
>>> has it on Windows, but the two other compilers I have don't),
>> gcc does not provide strtoll() or strtoull(). Both are provided by the
>> library, not by the compiler. (And both were introduced in C99, so I'd
>> be at least mildly surprised by an implementation that provides one
>> but not the other.)
>> I know you're tired of people pointing out that the compiler (gcc
>> in this case) does not provide library functions. The solution is
>> for you to stop making that mistake. Or should I assume you enjoy
>> these arguments?
>
> gcc/tdm compiles programs using strtoull.
>
> bcc/tcc fail with a link error,

My tcc-based C installation handles it fine. That's because, as you
must know, it's not a compiler issue.

> unless I include this line:
>
> #define strtoull _strtoui64

Which, of course, means "unless I don't use strtoull".

> but then gcc will complain about it.
>
> Actually why that is the case, I don't know, don't care, and probably
> no one else cares who just installs a 'bundle' without wanting to trace
> and check the provenance of each library function that it comes with.

I'd hope it's rare to not want to know what's going on.

> The above is enough for me to think twice about using strtoull in a
> project.

If your code must work with non-standard C implementations, then there
might well be a whole raft of things you should avoid. But since you
seem to use C a lot, wouldn't it be better either to try to fix your tcc
installation or to simply uninstall it?

--
Ben.

Keith Thompson

unread,
Jan 23, 2022, 8:38:38 PM1/23/22
to
It's obvious that you just enjoy the arguments that result when you
pretend to misunderstand the difference between a compiler and an
implementation and are not interested in helping anyone else understand
it. I'll try to adjust my future responses accordingly.

Bart

unread,
Jan 24, 2022, 6:21:32 AM1/24/22
to
c:\c>type c.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char* p;
printf("%llu\n", strtoull("1234", &p, 10));
}

c:\c>tcc c.c
tcc: error: undefined symbol 'strtoull'

c:\c>tcc -c c.c

c:\c>tcc -v
tcc version 0.9.27 (x86_64 Windows)

It looks like a link issue with tcc to me. As I said this was on
Windows. tcc appears to use msvcrt.dll, because when I add that define,
and look at the resulting exe, it says:

Name: msvcrt.dll
Import Addr RVA: 2038
Import: 20e3 0 _strtoui64
Import: 20f0 0 printf
Import: 20f9 0 __set_app_type
....

Maybe some people spend a lot of time telling their compilers exactly
which libraries to use for standard functions of C; I just use the
supplied compiler or driver program.

All C compilers that I've used will do the whole job of source ->
binary just by invoking that front end.

So, what version of tcc do you have, and what special thing do you have
to do to make it work?

If you're on Linux, that doesn't count as the standard library is
unlikely to use MS's _strtoui64 name for that function.

This is what I originally said:

"...(gcc has it on Windows, but the two other compilers I have don't)"

>> unless I include this line:
>>
>> #define strtoull _strtoui64
>
> Which, of course, means "unless I don't use strtoull".

It's better not to. If I were to post code anywhere that used it, then
at least some people trying it wouldn't be able to build it. And I'm not
having /my/ code full of those implementation-specific conditional
blocks that I despise.

For the same reason, I tend not to write shared code that uses '$' in
identifiers, because tcc doesn't support it.

> If your code must work with non-standard C implementations, then there
> might well be a whole raft of things you should avoid. But since you
> seem to use C a lot, wouldn't it be better either to try to fix your tcc
> installation or to simply uninstall it?

What do you mean by fix? Fixing my tcc doesn't help people compiling my
code unless everybody fixes theirs too.

Bart

unread,
Jan 24, 2022, 6:22:26 AM1/24/22
to
On 24/01/2022 01:38, Keith Thompson wrote:
> Bart <b...@freeuk.com> writes:

> It's obvious that you just enjoy the arguments that result when you
> pretend to misunderstand the difference between a compiler and an
> implementation and are not interested in helping anyone else understand
> it. I'll try to adjust my future responses accordingly.

It's not me bringing this up each time. It's you.

Bart

unread,
Jan 24, 2022, 6:51:42 AM1/24/22
to
I know what's going on here; but I can't do anything about it.

Inside my tcc's stdlib.h is this:

unsigned long long __cdecl strtoull(const char* __restrict__, char**
__restrict__, int);

That's identical to that from gcc/tdm's stdlib.h, but minus a
__MINGW_EXTENSION macro at the start.

However my gcc/tdm manages to associate that with an actual function
somewhere called 'strtoull'; tcc won't be able to do that if it only
makes use of msvcrt.dll.

Clearly no one ever tested strtoull on Windows tcc; neither did I on my
bcc; I copied the strtoull declaration from lccwin's stdlib.h; that one
uses its own implementation of the function.

(If I change tcc's stdlib to use _strtoui64 and add that define, then it
will work. But so what? I'm only changing my personal copy.)

Scott Lurndal

unread,
Jan 24, 2022, 10:52:04 AM1/24/22
to
Bart <b...@freeuk.com> writes:
>On 23/01/2022 22:26, Keith Thompson wrote:
>> Bart <b...@freeuk.com> writes:
>>> On 19/01/2022 18:46, Scott Lurndal wrote:
>>>> Bart <b...@freeuk.com> writes:
>>>>> On 19/01/2022 17:02, Bonita Montero wrote:
>>>>> Complicated. I used the simpler **C** code below. Its runtime was 10%
>>>>> slower than the C++ (that is, elapsed time of the 10,000 outer loop for
>>>>> both).
>>>> I just use strtoll. Why reinvent the wheel?
>>>
>>> It needs to be strtoull() for this purpose, which is more elusive (gcc
>>> has it on Windows, but the two other compilers I have don't),
>>
>> gcc does not provide strtoll() or strtoull(). Both are provided by the
>> library, not by the compiler. (And both were introduced in C99, so I'd
>> be at least mildly surprised by an implementation that provides one
>> but not the other.)
>>
>> I know you're tired of people pointing out that the compiler (gcc
>> in this case) does not provide library functions. The solution is
>> for you to stop making that mistake. Or should I assume you enjoy
>> these arguments?
>
>gcc/tdm compiles programs using strtoull.
>
>bcc/tcc fail with a link error, unless I include this line:

Then they are POS compilers, or you're using them incorrectly,
such as forgetting to include <stdlib.h>.

Scott Lurndal

unread,
Jan 24, 2022, 10:54:13 AM1/24/22
to
Bart <b...@freeuk.com> writes:

> c:\c>tcc c.c
> tcc: error: undefined symbol 'strtoull'

strtoull is a POSIX symbol. If the tcc implementation you
are using supports POSIX, then you've likely misconfigured tcc,
otherwise you're tilting at windmills.

Ben Bacarisse

unread,
Jan 24, 2022, 10:57:34 AM1/24/22
to
Bart <b...@freeuk.com> writes:

> On 24/01/2022 01:15, Ben Bacarisse wrote:
>> Bart <b...@freeuk.com> writes:

>>> #define strtoull _strtoui64
>> Which, of course, means "unless I don't use strtoull".
>
> It's better not to. If I were to post code anywhere that used it, then
> at least some people trying it wouldn't be able to build it. And I'm
> not having /my/ code full of those implementation-specific conditional
> blocks that I despise.

What level of broken are you prepared to work around? If I produce a C
implementation, based, say, on tcc, which fails to link strlen, would
you feel you have to work round that? I don't think so. You'd tell me
that my implementation is broken.

Conditional blocks are ugly, but writing potentially buggy code just to
avoid a standard library function is also hardly ideal.

> For the same reason, I tend not to write shared code that uses '$' in
> identifiers, because tcc doesn't support it.

Not even in the same ball-park. No C implementation is required to
support $ in identifiers, but every conforming C implementation is
required to support strtoull.

>> If your code must work with non-standard C implementations, then there
>> might well be a whole raft of things you should avoid. But since you
>> seem to use C a lot, wouldn't it be better either to try to fix your tcc
>> installation or to simply uninstall it?
>
> What do you mean by fix? Fixing my tcc doesn't help people compiling
> my code unless everybody fixes theirs too.

I didn't realise anyone else compiled your code. I thought your C
projects were personal ones. You could just say that a conforming C
compiler is required. Would that really disenfranchise a large
user-base?

--
Ben.

Bart

unread,
Jan 24, 2022, 11:52:57 AM1/24/22
to
On 24/01/2022 15:57, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:
>
>> On 24/01/2022 01:15, Ben Bacarisse wrote:
>>> Bart <b...@freeuk.com> writes:
>
>>>> #define strtoull _strtoui64
>>> Which, of course, means "unless I don't use strtoull".
>>
>> It's better not to. If I were to post code anywhere that used it, then
>> at least some people trying it wouldn't be able to build it. And I'm
>> not having /my/ code full of those implementation-specific conditional
>> blocks that I despise.
>
> What level of broken are you prepared to work around? If I produce a C
> implementation, based, say, on tcc, which fails to link strlen, would
> you feel you have to work round that? I don't think so. You'd tell me
> that my implementation is broken.


/That's/ not in the same ballpark! Without strlen, very few C programs
would build. Any problems would quickly be found.

With strtoull, people can spend decades coding C and not need to use it
or encounter it. Apparently no one has with tcc on Windows, or they
couldn't find where to file a bug report.


> Conditional blocks are ugly, but writing potentially buggy code just to
> avoid a standard library function is also hardly ideal.
>
>> For the same reason, I tend not to write shared code that uses '$' in
>> identifiers, because tcc doesn't support it.
>
> Not even in the same ball-park. No C implementation is required to
> support $ in identifiers, but every conforming C implementation is
> required to support strtoull.

Yet most do support '$'; why bother to allow it, unless they expect
people to use it?

The exceptions I've come across are tcc, and lccwin which has limited
support (I think only as a starter).

I used '$' extensively in generated C (when it occurs in the source
language, and is used as the 'dot' in qualified names). But because tcc
is such an important target for me, I had to work around it.

>
>>> If your code must work with non-standard C implementations, then there
>>> might well be a whole raft of things you should avoid. But since you
>>> seem to use C a lot, wouldn't it be better either to try to fix your tcc
>>> installation or to simply uninstall it?
>>
>> What do you mean by fix? Fixing my tcc doesn't help people compiling
>> my code unless everybody fixes theirs too.
>
> I didn't realise anyone else compiled your code.

A lot of my C code is posted on forums or linked to. If it's not to be
compiled as C, then there's no point in using C, as my own language is
much sweeter.

So if I go to that trouble, it has to compile. But I only test with tcc,
gcc and bcc. Usually if it passes bcc, it will work with anything, other
than tcc + $ symbols.

Öö Tiib

unread,
Jan 24, 2022, 1:17:20 PM1/24/22
to
On Monday, 24 January 2022 at 18:52:57 UTC+2, Bart wrote:
> On 24/01/2022 15:57, Ben Bacarisse wrote:
> > Bart <b...@freeuk.com> writes:
> >
> >> For the same reason, I tend not to write shared code that uses '$' in
> >> identifiers, because tcc doesn't support it.
> >
> > Not even in the same ball-park. No C implementation is required to
> > support $ in identifiers, but every conforming C implementation is
> > required to support strtoull.
>
> Yet most do support '$'; why bother to allow it, unless they expect
> people to use it?

They expect that people may need to use it. The ABI on several platforms
allows '$' in symbol names, and a need might arise to link C code to a
library or shared object that contains such a name. The implementer tries
to be helpful by allowing what the ABI allows and what does not otherwise
conflict with C syntax. It does not mean that one should start to use '$'
instead of 'S' to look cool or something.

Scott Lurndal

unread,
Jan 24, 2022, 1:24:09 PM1/24/22
to
C on VMS, for example, required '$' in identifiers for the system
services (library functions, system calls) such as $QIOW or $GETJPI et alia.

Keith Thompson

unread,
Jan 24, 2022, 4:08:47 PM1/24/22
to
I'm not sure what you mean by POSIX symbol. In C99 and later, strtoll
and strtoull are both defined by ISO C, and will be supported by any
conforming (hosted) implementation. Neither was defined by C90. I
suppose earlier editions of POSIX may have defined strtoll and strtoull
for pre-C99 compilers, but they depend on the long long and unsigned
long long types.

Did pre-C99 versions of POSIX (optionally?) specify long long and
unsigned long long before ISO C did?

Keith Thompson

unread,
Jan 24, 2022, 4:15:16 PM1/24/22
to
Bart <b...@freeuk.com> writes:
> On 24/01/2022 01:15, Ben Bacarisse wrote:
[...]
>> Which, of course, means "unless I don't use strtoull".
>
> It's better not to. If I were to post code anywhere that used it, then
> at least some people trying it wouldn't be able to build it. And I'm
> not having /my/ code full of those implementation-specific conditional
> blocks that I despise.
>
> For the same reason, I tend not to write shared code that uses '$' in
> identifiers, because tcc doesn't support it.

Again, both strtoll and strtoull have been standard since ISO C 1999.
I remember when it wasn't safe to assume C99 support for portable
code, but I don't get the impression that that's the case anymore.
Workarounds might still be needed, but I suggest it's not worth
worrying about too much.

(Bart knows that it's the library, not the tcc compiler, that supports
or doesn't support strtoull. I will not argue with him about that. And
as I recall we recently had a discussion indicating that there are
multiple versions of msvcrt.dll.)
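
For comparison with the hand-written loop at the start of the thread, here is a minimal sketch of range-checked parsing on top of strtoull, assuming a C99 library that provides it. The name parse_uint64 is hypothetical. Unlike the original procedure, which stops at the first non-digit, this version rejects trailing characters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a non-empty string of decimal digits into *out.
   Returns false on empty input, a sign, trailing junk, or overflow. */
static bool parse_uint64(const char *s, uint64_t *out)
{
    assert(s != NULL && out != NULL);

    /* strtoull itself skips leading spaces and accepts '+' or '-',
       so insist on a digit up front. */
    if (*s < '0' || *s > '9')
        return false;

    char *end;
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || v > UINT64_MAX)
        return false;

    *out = (uint64_t)v;
    return true;
}
```

The comparison against UINT64_MAX only matters on an implementation where unsigned long long is wider than 64 bits.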

Bart

unread,
Jan 24, 2022, 5:03:15 PM1/24/22
to
I said it's a link error. They both depend on the msvcrt.dll library for
standard C functions, and strtoull is not defined in that library. (It
is defined inside ucrtbase.dll, but then that's missing stuff like printf.)

As certain people are so fond of reminding me, the library is a
completely different entity from the compiler, so it is apparently not a
compiler problem as both provide a proper API entry for that function.


Scott Lurndal

unread,
Jan 24, 2022, 5:51:44 PM1/24/22
to
Keith Thompson <Keith.S.T...@gmail.com> writes:
>sc...@slp53.sl.home (Scott Lurndal) writes:
>> Bart <b...@freeuk.com> writes:
>>
>>> c:\c>tcc c.c
>>> tcc: error: undefined symbol 'strtoull'
>>
>> strtoull is a POSIX symbol. If the tcc implementation you
>> are using supports POSIX, then you've likely misconfigured tcc,
>> otherwise you're tilting at windmills.
>
>I'm not sure what you mean by POSIX symbol. In C99 and later, strtoll
>and strtoull are both defined by ISO C, and will be supported by any
>conforming (hosted) implementation. Neither was defined by C90. I
>suppose earlier editions of POSIX may have defined strtoll and strtoull
>for pre-C99 compilers, but they depend on the long long and unsigned
>long long types.

POSIX generally incorporates a C standard by reference, and we
had implemented (circa 1989) the strto[u]ll functions in our versions
of Unix for the Motorola 88100. My early specs are in storage,
so I can't refer back to them to see when 1003.4 or X/Open adopted
them.

Keith Thompson

unread,
Jan 24, 2022, 6:33:33 PM1/24/22
to
Bart <b...@freeuk.com> writes:
> On 24/01/2022 15:51, Scott Lurndal wrote:
>> Bart <b...@freeuk.com> writes:
[...]
>>> bcc/tcc fail with a link error, unless I include this line:
>> Then they are POS compilers, or you're using them incorrectly,
>> such as forgetting to include <stdlib.h>.
>
> I said it's a link error. They both depend on the msvcrt.dll library
> for standard C functions, and strtoull is not defined in that
> library. (It is defined inside ucrtbase.dll, but then that's missing
> stuff like printf.)

An aside: I have three different versions of msvcrt.dll on my Windows
system. None of them appear to support strtoll or strtoull.

> As certain people are so fond of reminding me, the library is a
> completely different entity from the compiler, so it is apparently not
> a compiler problem as both provide a proper API entry for that
> function.

I think that may be the first time you've actually acknowledged that.

However, it's not necessary for the compiler itself to have a
"proper API entry" for a library function, or to know anything
about it. Using strtol as an example (since it's supported all the
way back to C89/C90), the compiler knows how to call it because it
sees the declaration in <stdlib.h>. (For a C90 compiler, if you
don't include the header, the compiler will assume an incorrect
declaration for strtol when it sees a call.) <stdlib.h>, which provides
the declaration, and whatever file(?) provides the code that actually
implements the function, are typically provided by the same package (or
at least they must be kept closely in synch).

A call to a standard library function is just a function call,
unless the implementation chooses to do something fancy.

Some compilers incorporate some information about some standard
library functions for the purpose of producing better diagnostics
and/or optimizations. But if, for example, a future standard
added an strtofoo() function, no *compiler* changes would be needed
to support it, as long as the headers and library implementation
supported it correctly. The same thing happens if you add your own
footobar() function and provide a header that declares it (using
a different name because names starting with "str" are reserved).
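
A small sketch of that last point, using the footobar() name from the paragraph above; the body of footobar() and the caller twice_footobar() are made up for illustration. In a real project the declaration would sit in a header and the definition in a separately built library.

```c
#include <assert.h>
#include <stddef.h>

/* What the header would contain: a declaration only. */
size_t footobar(const char *s);

/* The compiler can translate this call knowing nothing about
   footobar() beyond the declaration above. */
static size_t twice_footobar(const char *s)
{
    return 2 * footobar(s);
}

/* What the library would contain: the definition, found at link time.
   (Illustrative body: it just counts characters.) */
size_t footobar(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

If the definition were missing, the call would still compile and the failure would show up as an undefined symbol at link time, which is the kind of error reported for strtoull earlier in the thread.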

Of course if you're using an implementation that doesn't conform to
C99, you're not guaranteed to be able to use strtoll() or strtoull(),
and you'll have to find some workaround. An implementation that
depends entirely on msvcrt.dll for the C standard library cannot
conform to C99 (unless there's a later version of msvcrt.dll that I'm
not aware of). I doubt that very many people have to deal with that.
(I presume you'll agree that you are not a typical user.)

Bart

unread,
Jan 24, 2022, 7:10:10 PM1/24/22
to
On 24/01/2022 23:33, Keith Thompson wrote:
> Bart <b...@freeuk.com> writes:
>> On 24/01/2022 15:51, Scott Lurndal wrote:
>>> Bart <b...@freeuk.com> writes:
> [...]
>>>> bcc/tcc fail with a link error, unless I include this line:
>>> Then they are POS compilers, or you're using them incorrectly,
>>> such as forgetting to include <stdlib.h>.
>>
>> I said it's a link error. They both depend on the msvcrt.dll library
>> for standard C functions, and strtoull is not defined in that
>> library. (It is defined inside ucrtbase.dll, but then that's missing
>> stuff like printf.)
>
> An aside: I have three different versions of msvcrt.dll on my Windows
> system. None of them appear to support strtoll or strtoull.
>
>> As certain people are so fond of reminding me, the library is a
>> completely different entity from the compiler, so it is apparently not
>> a compiler problem as both provide a proper API entry for that
>> function.
>
> I think that may be the first time you've actually acknowledged that.
>
> However, it's not necessary for the compiler itself to have a
> "proper API entry" for a library function, or to know anything
> about it. Using strtol as an example (since it's supported all the
> way back to C89/C90), the compiler knows how to call it because it
> sees the declaration in <stdlib.h>.

The declaration is what I mean by 'API' entry. I'm using 'API' to mean
all the information needed by the programmer to write calls to a
function, and for the compiler to check those calls and generate the
proper code, usually inside some header file.


> (I presume you'll agree that you are not a typical user.)

I guess not. I make considerably more use of the C standard library from
outside C than inside it. However msvcrt.dll exports just over 1300
functions, but I only ever use a few dozen. strtoull is one of the 1250
or so that I haven't yet needed.


Keith Thompson

unread,
Jan 25, 2022, 12:14:58 AM1/25/22
to
You said that "both provide a proper API entry for that function", where
the context indicated that "both" referred to the compiler and the
runtime library. The compiler needs to *see* a C function declaration
(a clearer description IMHO than "proper API entry"), presumably in a
header file, but that declaration is not provided by the compiler
itself. (Some headers might be distributed as part of the same package
as the compiler rather than as part of the runtime library; for example
gcc provides <stddef.h>.)

I think we both understand all this. Please don't obfuscate it further.

[...]

Manfred

unread,
Jan 26, 2022, 3:01:14 PM1/26/22
to
Most importantly, msvcrt.dll is *not* a C standard library. It is a
Microsoft library that exports C runtime functions for their own
products [*].
It is not documented anywhere as a C *standard* library.
To be explicit, this means that you can't complain if it does not export
a conforming implementation of C, and if you find inconsistencies with
the standard it is certainly the fault of neither the language nor the
standard.

It is your choice if you want to use it, but it should be no surprise at
all if some C standard function is not available, or shows a different
behaviour, or even has a different signature from the ISO standard. It
is *not* a C standard library. (I guess I already wrote that)

In case the above were not sufficiently clear, not even Microsoft lists
'msvcrt.dll' among the redistributables for software developed with
their development environment for C, which is Visual Studio, and is the
only product for which they document /some/ conformance with ISO C.

cfr. some information from Microsoft:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/crt-library-features?view=msvc-170


As for some background, in the '90s 'msvcrt.dll' used to be Microsoft's
"C runtime library" and it was part of the redistributables for early
versions of their development product for C, which was called Visual
C++. In this respect, you might say that /at that time/ it was part of
Microsoft's implementation of C.
That said, the necessary remark is that, especially at that time,
Microsoft's implementation of C was well known for diverging wildly
from ISO C.


([*] More specifically, the page I linked above lists 'msvcrt.lib' among
the "libraries that implement CRT initialization and termination" for C
programs written with Visual Studio.)

Bart

unread,
Jan 26, 2022, 3:54:11 PM1/26/22
to
I think I first came across this library when I started to work with
Windows sometime in the 90s.

I didn't really associate it with C (I didn't use it from that
language); it seemed just another set of WinAPI functions from the docs,
all of which used C-style declarations.

It seems to be still present on every Windows OS, so I don't think it's
going anywhere.

msvcrt.dll is also used by gcc/tdm on Windows as well as tcc, for
building programs. If it suddenly disappeared, then quite a lot of
programs would stop working...

... including gcc.exe and tcc.exe which themselves both import msvcrt.dll.

Manfred

unread,
Jan 26, 2022, 9:42:42 PM1/26/22
to
Yes, msvcrt.dll has been part of Windows for a very long time, and most
probably it will stay this way, although one has to notice that with
Microsoft this kind of prediction has gotten harder and harder of late - I
believe they have thrown more stuff out of the window in the last five
years than in the previous 35, to the point that today it is even
impossible to get redistributables that were first released in 2010.

But that was not my point.

Meredith Montgomery

unread,
Jan 28, 2022, 8:15:57 PM1/28/22
to
Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> Meredith Montgomery <mmont...@levado.to> writes:
>
>> Bart <b...@freeuk.com> writes:
>>
>>> On 16/01/2022 02:27, Meredith Montgomery wrote:
>>>> uint64_t array_to_uint64(char *s, uint64_t *u)
>>>> {
>>>> uint64_t pos;
>>>> uint64_t r;
>>>> uint64_t c;
>>>> pos = 0; r = 0;
>>>> for ( ;; ) {
>>>> c = (uint64_t) (unsigned char) (s[pos] - '0');
>>>> if (c < 10) {
>>>> if( ((UINT64_MAX - c) / 10) >= r)
>>>
>>> Dividing by 10 each time is unnecessary (even if usually optimised to
>>> shifts and multiplies).
>>>
>>> You only need to make this check after you've already processed 19
>>> characters, as it could overflow on the 20th, but not before.
>>>
>>> I think when pos >= 18.
>>
>> I suppose you're right, but imagine putting such check there. It would
>> take even more paragraphs to explain it to myself some time later when
>> I'm trying to figure out what I wrote a while back.
>
> You could just check that the array contains a string of digits
> lexicographically less than or equal to 18446744073709551615. If there
> are fewer digits than this, or, at every position, the digit you have is
> no greater than the corresponding digit of that number, you are ok.

That seems correct, but if I change the size of my register, then I must
replace the number too. :-)

Meredith Montgomery

unread,
Jan 28, 2022, 8:25:21 PM1/28/22
to
Tim Rentsch <tr.1...@z991.linuxsc.com> writes:

> Meredith Montgomery <mmont...@levado.to> writes:
>
>> I've been trying to think of an analogy for the verification
>>
>> if( ((UINT64_MAX - c) / 10) >= r)
>> r = r * 10 + c;
>> else return -1; /* doesn't fit */
>>
>> in the procedure below. [...]
>
> No analogy needed. The condition that needs to be
> satisfied is
>
> r * 10 + c <= UINT64_MAX
>
> which is the same as
>
> r * 10 <= UINT64_MAX - c
>
> which is the same as
>
> r <= (UINT64_MAX - c) / 10
>
> which is the same as
>
> (UINT64_MAX - c) / 10 >= r
>
> giving the expression in the if() test. Done.

That's a brilliant explanation!

Ben Bacarisse

unread,
Jan 28, 2022, 9:36:49 PM1/28/22
to
Sure. Just as you'd have to replace UINT64_MAX and so on. Remember you
don't have to write 18446744073709551615. You could use snprintf to put
UINT64_MAX into a static buffer.
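
A rough sketch of that idea follows. The function name digits_fit_uint64 is hypothetical, and for simplicity it formats UINT64_MAX into a local buffer on every call rather than a static one. Leading zeros are skipped so that the length comparison is meaningful.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if s is a non-empty string of decimal digits whose value
   is no greater than UINT64_MAX. */
static bool digits_fit_uint64(const char *s)
{
    assert(s != NULL);

    char max[32];                        /* ample for 20 digits plus '\0' */
    int maxlen = snprintf(max, sizeof max, "%" PRIu64, UINT64_MAX);
    assert(maxlen > 0 && (size_t)maxlen < sizeof max);

    size_t n = strspn(s, "0123456789");
    if (n == 0 || s[n] != '\0')
        return false;                    /* empty, or contains a non-digit */

    while (n > 1 && *s == '0') {         /* leading zeros don't change the value */
        s++;
        n--;
    }

    if (n != (size_t)maxlen)
        return n < (size_t)maxlen;       /* fewer digits always fit, more never do */
    return strcmp(s, max) <= 0;          /* same length: compare digit by digit */
}
```

Because the limit is produced from UINT64_MAX at run time, switching to another unsigned type only means changing the macro and the format specifier.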

--
Ben.

Meredith Montgomery

unread,
Jan 30, 2022, 7:13:02 AM1/30/22
to
Lol. You're totally right. It's totally trivial to solve this problem.
Good to know. I will take notice of this. I value trivial solutions
quite a lot: I had to think hard about that verification and there was
this easy solution right in front of me all along.

Ben Bacarisse

unread,
Jan 30, 2022, 10:14:04 AM1/30/22
to
There may be pitfalls I've not spotted because I've never seen this done
in anyone else's code.

--
Ben.