
Memory allocators and reporting the allocation size


James Harris

Nov 25, 2022, 8:02:29 AM
I will get back to you guys on other topics but I need to improve the
memory allocator I use in my compiler and that has led to the following
query.

Imagine an allocator which carries out plain malloc-type allocations
(i.e. specified as a number of 8-bit bytes suitably aligned for any
type) without having to have compatibility with malloc, and in a
language which is not C.

For the sake of discussion the set of calls could be

m_alloc
m_calloc
m_realloc
m_resize
m_free

Would there be any value in giving the programmer a way to find out the
size of a given allocation? I don't mean the size requested but the size
allocated (which would be greater than or equal to the size requested).

I ask because it's simple enough to precede an aligned chunk of memory
with the allocation size. The allocator may well do that anyway in order
to help it manage memory. So there seems to be no good reason to keep
that info from the programmer. The question is over whether there's
value in allowing the programmer to get such info. I am thinking that he
could get the value with a call such as

m_msize(p)

It seems simple enough, potentially useful, and should be harmless. But
it's noticeable that the malloc-type calls don't have anything similar.
So maybe there's a good reason why not.

I thought it might cause implementation issues such as requiring a
larger header if the allocator wants to search only free memory. But
whatever implementation is used I think the allocator will still need to
either store or calculate the size of the memory allocated so I am not
sure I see a problem with the idea.
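To make the idea concrete, here is a minimal sketch of such a size-prefix scheme in C. The names m_alloc, m_msize and m_free follow the proposal above; the 16-byte header and the rounding rule are illustrative assumptions, not a real allocator design.

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical sketch: each allocation is preceded by a 16-byte header
   recording the size actually granted. A real allocator would add
   error handling and thread safety. */
typedef struct { size_t size; size_t pad; } m_header;

void *m_alloc(size_t n) {
    size_t rounded = (n + 15) & ~(size_t)15;   /* round up to 16 bytes */
    m_header *h = malloc(sizeof(m_header) + rounded);
    if (!h) return NULL;
    h->size = rounded;          /* store the size granted, not requested */
    return h + 1;               /* user memory starts after the header */
}

size_t m_msize(void *p) {       /* the proposed query call */
    return ((m_header *)p - 1)->size;
}

void m_free(void *p) {
    if (p) free((m_header *)p - 1);
}
```

Here m_msize simply reads back what the header already stores, which is the point: the information exists anyway.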

What do you guys think? Opinions welcome!


--
James Harris

Bart

Nov 25, 2022, 9:01:12 AM
The implied assumption here is that the allocator itself needs to
remember the allocated size, presumably because malloc does so.

That is different from how any of my own allocators have worked, where
user programs keep track of the sizes.

The consequence is that when you call a Free function, you need to
provide the original size.

The way malloc works is convenient, especially for larger blocks of
sizes determined at runtime.

But for allocating millions of instances of the same small type T, you
will always know when freeing an instance, that it will have sizeof(T).
No need for the allocator to waste 8 or more bytes (and messing with
alignment) storing that information for an allocation that might not be
much bigger.

David Brown

Nov 25, 2022, 11:15:08 AM
On 25/11/2022 14:02, James Harris wrote:
> I will get back to you guys on other topics but I need to improve the
> memory allocator I use in my compiler and that has led to the following
> query.
>
> Imagine an allocator which carries out plain malloc-type allocations
> (i.e. specified as a number of 8-bit bytes suitably aligned for any
> type) without having to have compatibility with malloc, and in a
> language which is not C.
>
> For the sake of discussion the set of calls could be
>
>   m_alloc
>   m_calloc
>   m_realloc
>   m_resize
>   m_free

In a new language, I would not follow the names from C as closely. For
one thing, it could confuse people - they may think they do the same
thing as in C. For another, "calloc" in particular is a silly name.
And you don't need "realloc" and "resize".

It can make sense to distinguish between "alloc_zeroed" and
"alloc_uninitialised", for performance reasons.

>
> Would there be any value in giving the programmer a way to find out the
> size of a given allocation? I don't mean the size requested but the size
> allocated (which would be greater than or equal to the size requested).
>

No. It is pointless.

Why would anyone want to know the result of such a function? The only
conceivable thought would be if they had first allocated space for x
units, and then they now want to store y units - calling "get_real_size"
could let them know if they need to call "resize" or not.

The answer is that they should simply call "resize". If "resize" does
not need to allocate new memory because the real size is big enough, it
does nothing.
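That contract can be sketched as follows. The names m_alloc2/m_resize2 and the capacity header are assumptions for illustration, not any particular allocator's layout:

```c
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

/* Sketch: resize is a no-op while the capacity granted at allocation
   time suffices; only a genuine growth triggers a reallocation. */
typedef struct { size_t capacity; unsigned char data[]; } m_block;

void *m_alloc2(size_t n) {
    size_t cap = (n + 15) & ~(size_t)15;
    m_block *b = malloc(sizeof(m_block) + cap);
    if (!b) return NULL;
    b->capacity = cap;
    return b->data;
}

void *m_resize2(void *p, size_t n) {
    m_block *b = (m_block *)((unsigned char *)p - offsetof(m_block, data));
    if (n <= b->capacity)
        return p;                                 /* big enough: do nothing */
    size_t cap = (n + n / 2 + 15) & ~(size_t)15;  /* grow with headroom */
    m_block *nb = realloc(b, sizeof(m_block) + cap);
    if (!nb) return NULL;
    nb->capacity = cap;
    return nb->data;
}
```

The internal capacity check is exactly what a user-visible "get_real_size" would do, done once, in one place.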

> I ask because it's simple enough to precede an aligned chunk of memory
> with the allocation size. The allocator may well do that anyway in order
> to help it manage memory. So there seems to be no good reason to keep
> that info from the programmer. The question is over whether there's
> value in allowing the programmer to get such info. I am thinking that he
> could get the value with a call such as
>
>   m_msize(p)
>
> It seems simple enough, potentially useful, and should be harmless. But
> it's noticeable that the malloc-type calls don't have anything similar.
> So maybe there's a good reason why not.
>
> I thought it might cause implementation issues such as requiring a
> larger header if the allocator wants to search only free memory. But
> whatever implementation is used I think the allocator will still need to
> either store or calculate the size of the memory allocated so I am not
> sure I see a problem with the idea.
>
> What do you guys think? Opinions welcome!
>

I think you should not try to copy C's malloc/free mechanism. In
particular, I do not think the memory allocator should track the sizes
of allocations. "free" should always include the size, matching the
value used in "alloc". (And "resize" should pass the old size as well
as the new size.)

Storing the size of an allocation was not too bad in the earliest days of C,
with simpler processors, no caches, no multi-threading, and greater
concern for simple allocation implementations than for performance or
reliability. Modern allocators no longer work with a simple linked list
storing sizes and link pointers at an address below the address returned
to the user, so why follow a similar interface?

Programs know the size of memory they requested when they allocated it.
They know, almost invariably, the size when they are freeing the
memory. They know the size of the types, and the size of the arrays.
So having the memory allocator store this too is a waste.

Once you've dropped size tracking, you can easily separate allocation
tracking from the memory in use - this gives neater sizes for pools and
better cache and virtual page usage. You can have pools of different
sizes - perhaps 16 byte blocks for the smallest, then 256 byte blocks,
then 4 KB blocks, etc. (You might prefer different sizes, perhaps with
more intermediary sizes.) A pool element would have 64 blocks.
Allocation within a pool is done by quick searches within a single
64-bit map - no need to search through lists.

(Of course real implementations involve a fair bit of complexity and
care, especially for thread safety. But the emphasis must be on cache
friendliness.)
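A minimal sketch of such a bitmap pool, assuming the GCC/Clang __builtin_ctzll intrinsic for the bit search; the block size, pool size, and names are illustrative:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Sketch: 64 fixed-size blocks per pool, occupancy tracked in a single
   64-bit map; find-first-zero locates a free block with no list walk. */
#define BLOCK_SIZE  16
#define POOL_BLOCKS 64

typedef struct {
    uint64_t used;                      /* bit i set = block i occupied */
    unsigned char mem[POOL_BLOCKS * BLOCK_SIZE];
} pool;

void *pool_alloc(pool *p) {
    if (p->used == UINT64_MAX) return NULL;            /* pool is full */
    unsigned i = (unsigned)__builtin_ctzll(~p->used);  /* first free bit */
    p->used |= (uint64_t)1 << i;
    return p->mem + (size_t)i * BLOCK_SIZE;
}

void pool_free(pool *p, void *q) {
    size_t i = (size_t)((unsigned char *)q - p->mem) / BLOCK_SIZE;
    p->used &= ~((uint64_t)1 << i);
}
```

Note that nothing here records per-allocation sizes: the pool's block size is implicit in which pool the pointer belongs to.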

Dmitry A. Kazakov

Nov 25, 2022, 12:21:11 PM
On 2022-11-25 17:15, David Brown wrote:
> On 25/11/2022 14:02, James Harris wrote:

> It can make sense to distinguish between "alloc_zeroed" and
> "alloc_uninitialised", for performance reasons.

alloc_unmapped/unpaged
alloc_shared (shared memory pool)
alloc_local (thread local pool)
...

> I think you should not try to copy C's malloc/free mechanism.

Right. There should be an abstract memory pool interface instead. C
interface is way too low-level.

A related important issue is interaction between pointers from different
pools and universal pointers applicable to all pools including locally
allocated objects.

For example, can a universal pointer be used to deallocate an object? If
yes, how can it determine the specific pool?

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Bart

Nov 25, 2022, 2:24:00 PM
On 25/11/2022 16:15, David Brown wrote:
> On 25/11/2022 14:02, James Harris wrote:

>> Would there be any value in giving the programmer a way to find out
>> the size of a given allocation? I don't mean the size requested but
>> the size allocated (which would be greater than or equal to the size
>> requested).
>>
>
> No.  It is pointless.
>
> Why would anyone want to know the result of such a function?  The only
> conceivable thought would be if they had first allocated space for x
> units, and then they now want to store y units - calling "get_real_size"
> could let them know if they need to call "resize" or not.
>
> The answer is that they should simply call "resize".  If "resize" does
> not need to allocate new memory because the real size is big enough, it
> does nothing.

But who knows how efficient 'resize' will be?

With `get_real_size`, you do this once when the block is allocated. Then
you have a simple integer to compare current usage against.

But there is another reason; take a block which will grow a byte at a
time. Apart from needing to call 'resize' for every byte, who knows how
much spare capacity 'resize' will actually provide; it may need to do
real reallocations more often.

This C program:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char* p;
    int n=1;

    p=malloc(n);

    for(int i=0; i<20000000; ++i) {
        ++n;
        p = realloc(p,n);
        *p = 'A';
    }
}

takes 15 seconds to grow a block a byte at a time to 20M bytes. 10M took
a quarter of that, so it's O(n-squared).

Contrast that with this /dynamic/ and /interpreted/ loop which grows a
20M character string a character at a time, but using an internal scheme
where it controls and can check capacity more efficiently:

s ::= ""

to 20 million do
s +:= 'A'
od

This takes 0.3 seconds. Growing to 100M takes 1.4 seconds, so a linear
increase; the dumb approach above would take an estimated 6 minutes even
executing optimised C code not bytecode.



Bart

Nov 25, 2022, 2:46:09 PM
On 25/11/2022 19:23, Bart wrote:
> On 25/11/2022 16:15, David Brown wrote:

>> The answer is that they should simply call "resize".  If "resize" does
>> not need to allocate new memory because the real size is big enough,
>> it does nothing.
>
> But who knows how efficient 'resize' will be?

> This takes 0.3 seconds. Growing to 100M takes 1.4 seconds, so a linear
> increase; the dumb approach above would take an estimated 6 minutes even
> executing optimised C code not bytecode.

A C mock-up of the code used within my interpreter manages to grow a
100MB string in 0.3 seconds, about 1000 times faster than the dumb
realloc approach.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {char* ptr; int length, capacity;} string;

void initstring(string* s) {
    s->ptr = malloc(16);
    s->length = 0;
    s->capacity = 16;
}

void appendchar(string* s, int c) {
    int newcap;
    if (s->length >= s->capacity) {
        newcap = s->capacity*2;
        s->ptr = realloc(s->ptr, newcap);
        s->capacity = newcap;
    }

    *(s->ptr + s->length) = c;
    ++(s->length);
}

int main(void) {
    string s;
    initstring(&s);

    for(int i=0; i<100000000; ++i) {
        appendchar(&s, 'A');
    }

    appendchar(&s, 0);
    printf("%zu\n", strlen(s.ptr));
}


James Harris

Nov 26, 2022, 6:31:43 AM
It's an interesting topic. I can think of three really fast memory allocation approaches:

1. sbrk-like
2. stack
3. array of slots, with occupied/vacant essentially indicated by a bitmap.

The sbrk-like approach (taking memory as needed and not returning it) can work well if all allocations can remain until the program terminates.

The stack approach can work well if allocations can be deallocated in reverse order.

The array of slots can work well if the things to allocate are of the same size.
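The stack approach in point 2 can be sketched as a simple arena with marks. The names, the alignment rule, and the fixed backing buffer are illustrative assumptions:

```c
#include <stddef.h>
#include <assert.h>

/* Sketch: allocation bumps a top-of-stack offset; freeing is restoring
   a previously saved mark, so deallocations must come in reverse order. */
typedef struct { unsigned char *base; size_t cap; size_t top; } arena;

void *arena_alloc(arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;        /* keep every allocation aligned */
    if (a->top + n > a->cap) return NULL;
    void *p = a->base + a->top;
    a->top += n;
    return p;
}

size_t arena_mark(const arena *a) { return a->top; }

void arena_release(arena *a, size_t mark) {
    a->top = mark;                     /* frees everything since the mark */
}
```

Note there is no per-allocation bookkeeping at all, which is why it is fast and why it only works for stack-ordered lifetimes.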

Unfortunately, none of those work well for the dynamic creation and destruction of strings - which is what I need just now. The malloc-type approach is the most flexible. While it's not the fastest it is the most generally useful. The idea of giving the allocator the ability to return the allocation size would be akin to it returning the 'capacity'.

Your suggestion of requiring the caller to remember how large the allocation is is intriguing and it sounds rather like what mmap expects. But for strings how would it work, especially if a string's length is not known in advance? Further, what if the caller requested 100 bytes then told the allocator to free 99 bytes? What if it told the allocator to free 101 bytes?

--
James

Dmitry A. Kazakov

Nov 26, 2022, 6:57:13 AM
On 2022-11-26 12:31, James Harris wrote:

> It's an interesting topic. I can think of three really fast memory allocation approaches:
>
> 1. sbrk-like
> 2. stack
> 3. array of slots, with occupied/vacant essentially indicated by a bitmap.

You should really look at existing algorithms and implementations of
memory pools. There is a huge amount of literature and research done on
the subject with different goals set as there is no best memory pool
management algorithm for all purposes.

> Unfortunately, none of those work well for the dynamic creation and destruction of strings - which is what I need just now.

Then memory fragmentation must be your main concern. For strings, some
prediction heuristics are used, both at the pool level and in the dynamic
string implementation. Usually more memory is allocated than needed, in
order to anticipate further growth.

BTW, since any reasonable pool implementation provides some pool-walking
API, as well as allocation/deallocation hooks for debugging purposes,
clearly the block size and other pool management information must be
available anyway.

E.g. see:

https://learn.microsoft.com/en-us/windows/win32/toolhelp/heap-lists-and-heap-walking

James Harris

Nov 26, 2022, 8:05:47 AM
On Friday, 25 November 2022 at 16:15:08 UTC, David Brown wrote:
> On 25/11/2022 14:02, James Harris wrote:
> > I will get back to you guys on other topics but I need to improve the
> > memory allocator I use in my compiler and that has led to the following
> > query.
> >
> > Imagine an allocator which carries out plain malloc-type allocations
> > (i.e. specified as a number of 8-bit bytes suitably aligned for any
> > type) without having to have compatibility with malloc, and in a
> > language which is not C.
> >
> > For the sake of discussion the set of calls could be
> >
> > m_alloc
> > m_calloc
> > m_realloc
> > m_resize
> > m_free
> In a new language, I would not follow the names from C as closely. For
> one thing, it could confuse people - they may think they do the same
> thing as in C. For another, "calloc" in particular is a silly name.
> And you don't need "realloc" and "resize".

The idea is that realloc could move the allocation whereas resize could only resize it in place. The latter is not present in C's malloc family.

>
> It can make sense to distinguish between "alloc_zeroed" and
> "alloc_uninitialised", for performance reasons.

Agreed, but what exactly would you want for the calls? Maybe

m_alloc(size)
m_alloc_a(size, flags)

?


> >
> > Would there be any value in giving the programmer a way to find out the
> > size of a given allocation? I don't mean the size requested but the size
> > allocated (which would be greater than or equal to the size requested).
> >
> No. It is pointless.
>
> Why would anyone want to know the result of such a function? The only
> conceivable thought would be if they had first allocated space for x
> units, and then they now want to store y units - calling "get_real_size"
> could let them know if they need to call "resize" or not.
>
> The answer is that they should simply call "resize". If "resize" does
> not need to allocate new memory because the real size is big enough, it
> does nothing.

It's partly for performance, as Bart's comments have backed up. If code can find out the capacity then it can fill that allocation up to the stated capacity without having to make any other calls and without risking moving what has been stored so far.

> > I ask because it's simple enough to precede an aligned chunk of memory
> > with the allocation size. The allocator may well do that anyway in order
> > to help it manage memory. So there seems to be no good reason to keep
> > that info from the programmer. The question is over whether there's
> > value in allowing the programmer to get such info. I am thinking that he
> > could get the value with a call such as
> >
> > m_msize(p)
> >
> > It seems simple enough, potentially useful, and should be harmless. But
> > it's noticeable that the malloc-type calls don't have anything similar.
> > So maybe there's a good reason why not.
> >
> > I thought it might cause implementation issues such as requiring a
> > larger header if the allocator wants to search only free memory. But
> > whatever implementation is used I think the allocator will still need to
> > either store or calculate the size of the memory allocated so I am not
> > sure I see a problem with the idea.
> >
> > What do you guys think? Opinions welcome!
> >
> I think you should not try to copy C's malloc/free mechanism. In
> particular, I do not think the memory allocator should track the sizes
> of allocations. "free" should always include the size, matching the
> value used in "alloc". (And "resize" should pass the old size as well
> as the new size.)

Bart said the same and the suggestion surprises me. It seems prone to error.

>
> Storing the size of allocation was not too bad in earliest days of C,
> with simpler processors, no caches, no multi-threading, and greater
> concern for simple allocation implementations than for performance or
> reliability. Modern allocators no longer work with a simple linked list
> storing sizes and link pointers at an address below the address returned
> to the user, so why follow a similar interface?
>
> Programs know the size of memory they requested when they allocated it.
> They know, almost invariably, the size when they are freeing the
> memory. They know the size of the types, and the size of the arrays.
> So having the memory allocator store this too is a waste.

That's sometimes the case but not always. Say I wanted to read a line from a file a byte at a time without knowing in advance how long the line would be (a common-enough requirement but one which C programs all too often fudge by defining a maximum line length). The required allocation could not be known in advance.

There are various specialised allocation techniques but the malloc style is the most general I know of.

>
> Once you've dropped size tracking, you can easily separate allocation
> tracking from the memory in use - this gives neater sizes for pools and
> better cache and virtual page usage. You can have pools of different
> sizes - perhaps 16 byte blocks for the smallest, then 256 byte blocks,
> then 4 KB blocks, etc. (You might prefer different sizes, perhaps with
> more intermediary sizes.) A pool element would have 64 blocks.
> Allocation within a pool is done by quick searches within a single
> 64-bit map - no need to search through lists.

I presume by pools you mean slots of a certain size. I may, longer term, let a program specify slot sizes and pool geometry. But that's for the future and it wouldn't work well with variable-length strings, especially where the length is not known in advance. So I need something simple and general for now.

>
> (Of course real implementations involve a fair bit of complexity and
> care, especially for thread safety. But the emphasis must be on cache
> friendliness.)

Agreed.

--
James

James Harris

Nov 26, 2022, 8:15:04 AM
On Friday, 25 November 2022 at 17:21:11 UTC, Dmitry A. Kazakov wrote:
> On 2022-11-25 17:15, David Brown wrote:
> > On 25/11/2022 14:02, James Harris wrote:
>
> > It can make sense to distinguish between "alloc_zeroed" and
> > "alloc_uninitialised", for performance reasons.
> alloc_unmapped/unpaged
> alloc_shared (shared memory pool)
> alloc_local (thread local pool)
> ...
> > I think you should not try to copy C's malloc/free mechanism.
> Right. There should be an abstract memory pool interface instead. C
> interface is way too low-level.

As I said to David, what about strings, especially if their length is not known in advance? What's your preferred model for manipulating them? The malloc-style calls are not perfect but they do support any such memory handling.

>
> A related important issue is interaction between pointers from different
> pools and universal pointers applicable to all pools including locally
> allocated objects.

David was talking about, AIUI, pools of slots where each slot had the same size. You seem to be talking about pools being 'NUMA pools' where each pool is of bytes but has certain access characteristics such as bandwidth, latency, and contention. Is that right?

>
> For example, can a universal pointer be used to deallocate an object? If
> yes, how can it determine the specific pool?

I am not sure what that means. Presumably addresses can have two parts: NUMA pool and offset within the pool.

--
James



Bart

Nov 26, 2022, 8:18:54 AM
How would strings work now with malloc? That still requires the length!

If you have counted strings, then you will know their length to free
them. Otherwise use strlen. But /something/ has to record the length.

You could use special routines just for strings, that say store a length
in the space just before the returned pointer. Then you can also have a
routine to return that length, saving you some hassle. But all you've
done now really is create a mini string library.

>Further, what if the caller requested 100 bytes then told the allocator to free 99 bytes? What if it told the allocator to free 101 bytes?

Then it'll go wrong. You could also pass the wrong pointer to the
deallocator. Or allocate the space in the first place with 99 instead of
100; what stops you doing that?

You can always create a couple of routines on top which will allocate 8
extra bytes to store the requested length, similar to the idea for
strings. This is purely for convenience.

It's also quite easy to create custom allocators for special purposes.
For example if you are dealing with millions of identical small objects
which are being constantly created and destroyed, allocate a large pool
of memory to hold them.

Then all freed objects can be linked together; allocating a new object
is very fast, and deallocating instant: add this object to the head of
the free-list.

You can also deallocate the lot in one go.
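That free-list scheme can be sketched like this; the 16-byte object size and the names are illustrative assumptions:

```c
#include <stddef.h>
#include <assert.h>

/* Sketch: a pool of identical small objects whose free slots are
   chained through the objects themselves. Alloc pops the free-list
   head and free pushes onto it, both O(1). */
typedef union node { union node *next; unsigned char obj[16]; } node;

typedef struct { node *free_head; } objpool;

void objpool_init(objpool *p, node *storage, size_t count) {
    p->free_head = NULL;
    for (size_t i = 0; i < count; i++) {   /* chain every slot */
        storage[i].next = p->free_head;
        p->free_head = &storage[i];
    }
}

void *obj_alloc(objpool *p) {
    node *n = p->free_head;
    if (n) p->free_head = n->next;         /* pop the head */
    return n;
}

void obj_free(objpool *p, void *q) {
    node *n = q;                           /* push back onto the head */
    n->next = p->free_head;
    p->free_head = n;
}
```

The union is the trick: a slot stores either a live object or, while free, the link to the next free slot, so the list costs no extra memory.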



Dmitry A. Kazakov

Nov 26, 2022, 8:46:12 AM
On 2022-11-26 14:15, James Harris wrote:
> On Friday, 25 November 2022 at 17:21:11 UTC, Dmitry A. Kazakov wrote:
>> On 2022-11-25 17:15, David Brown wrote:
>>> On 25/11/2022 14:02, James Harris wrote:
>>
>>> It can make sense to distinguish between "alloc_zeroed" and
>>> "alloc_uninitialised", for performance reasons.
>> alloc_unmapped/unpaged
>> alloc_shared (shared memory pool)
>> alloc_local (thread local pool)
>> ...
>>> I think you should not try to copy C's malloc/free mechanism.
>> Right. There should be an abstract memory pool interface instead. C
>> interface is way too low-level.
>
> As I said to David, what about strings, especially if their length is not known in advance? What's your preferred model for manipulating them? The malloc-style calls are not perfect but they do support any such memory handling.

The usual implementation of a *contiguous* string is that when it is
truncated, the memory is not reclaimed. When it is expanded, that is done
in chunks: for example by 100 bytes, or by some linear function of the
current size, etc. On top of that sits reference counting, which prevents
cloning the string body when there is only one reference to it.

Note that memory pools apply similar techniques as well. You can check
clib memory pool to see that it allocates more than you actually request
even excluding block management stuff.

I do not know why you care about it. Normally the pools of the language
run-time are backed by the system pool. E.g. Ada's standard memory pool
allocates in the clib pool under Linux. Your concern must be allowing the
programmer to allocate objects in custom pools in some unified way. E.g.
my components library provides pools shared by multiple processes, pools
persistent between application runs etc. This is possible in Ada because
pointer types there are associated with a pool. So when you write

P := new Employee_Record;

it dispatches according to the type of the pool associated with the type
of the pointer P. That calls the Allocate method, which the pool must provide.

>> A related important issue is interaction between pointers from different
>> pools and universal pointers applicable to all pools including locally
>> allocated objects.
>
> David was talking about, AIUI, pools of slots where each slot had the same size.

This one of many techniques to implement a pool, e.g. as a set of lists
of blocks of same size, usually power of two.

> You seem to be talking about pools being 'NUMA pools' where each pool is of bytes but has certain access characteristics such as bandwidth, latency, and contention. Is that right?

Just a pool = heap. E.g. under Windows an application may have a number
of different heaps.

>> For example, can a universal pointer be used to deallocate an object? If
>> yes, how can it determine the specific pool?
>
> I am not sure what that means. Presumably addresses can have two parts: NUMA pool and offset within the pool.

You cannot do that because a universal pointer can point anywhere, to
the thread local storage for example, and because pools are rarely
contiguous in the process address space. Therefore in this model the
universal pointer must be fatter than the machine address:

- address of the pool head
- address of the object

All conversions between pointers must carry, remove, or add the pool head address.
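Such a fat pointer might be sketched like this in C. The names are illustrative, and the demo pool merely counts deallocations to show the dispatch:

```c
#include <assert.h>

/* Sketch: a universal pointer pairs the machine address with the head
   of the pool it came from, so deallocation can dispatch to that pool. */
typedef struct pool_iface {
    void (*deallocate)(struct pool_iface *self, void *obj);
} pool_iface;

typedef struct {
    pool_iface *pool;    /* NULL could mean "not heap-allocated" */
    void *addr;
} univ_ptr;

void univ_free(univ_ptr p) {
    if (p.pool)
        p.pool->deallocate(p.pool, p.addr);  /* dispatch to owning pool */
}

/* Demo pool that just counts deallocations. */
int demo_freed = 0;
void demo_dealloc(pool_iface *self, void *obj) {
    (void)self; (void)obj;
    demo_freed++;
}
pool_iface demo_pool = { demo_dealloc };
```

The cost is visible in the struct: the universal pointer is twice the size of a machine address, which is exactly the trade-off described above.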

BGB

Nov 26, 2022, 5:22:23 PM
In my allocator designs, I don't spend 8 bytes on remembering the object
size, but rather 8 bits, with the size typically stored as an E5.F3
microfloat style format (sort of like A-law).

This size value is also used as a key to the corresponding free-list
(with all objects in a given free-list being the same size in memory).

Say, object sizes are N*16, where N is encoded in a microfloat:
00..07: 0000..0007
08..0F: 0008..000F
10..17: 0010..001E
18..1F: 0020..003C
20..27: 0040..0078
28..2F: 0080..00F0
30..37: 0100..01E0
38..3F: 0200..03C0
40..47: 0400..0780
48..4F: 0800..0F00
50..57: 1000..1E00
58..5F: 2000..3C00
60..67: 4000..7800
68..6F: 8000..F000
...

So, say, '68' here maps to an object holding 512K, ...


This scheme could theoretically express object sizes up to around 256GB.

Except that by this point, we have switched to spans of pages, which
store a page count rather than expressing their size using an 8-bit
microfloat.
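A sketch of that encoding: the decode rule below is reconstructed from the table above (high 5 bits exponent, low 3 bits mantissa, codes 0x00..0x0F decoding linearly, sizes in 16-byte units), and the linear-scan encode is purely illustrative.

```c
#include <stdint.h>
#include <assert.h>

/* E5.F3 microfloat size code: for exponents >= 2 the value is an
   implicit-leading-bit mantissa (8+f) shifted by the exponent. */
uint64_t msize_decode(uint8_t c) {
    unsigned e = c >> 3, f = c & 7;
    return (e < 2) ? c : (uint64_t)(8 + f) << (e - 1);
}

/* Smallest code whose decoded size covers n. A linear scan is used for
   clarity; real code would compute this with shifts directly. */
int msize_encode(uint64_t n) {
    for (unsigned c = 0; c < 256; c++)
        if (msize_decode((uint8_t)c) >= n) return (int)c;
    return -1;                           /* too large for one byte */
}
```

Rounding a requested size up to the nearest representable code is also what makes the code double as a free-list index: every object in a list shares one decoded size.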


Bart

Nov 26, 2022, 7:02:53 PM
I have used a scheme where an 8-bit code represented the allocated size
of an object, as there was no room to store the full 32-bit value.

The actual byte-size was obtained with a lookup table; allocated sizes
started off doubling in size up to a point (32MB?), then increased
linearly, up to a maximum of just under 2GB.

But, within an actual allocator, storing a single byte is going to cause
problems, because it will throw off the alignment; you might need to
round up to 64 bits anyway (on 64-bit systems). Unless you stored these
bytes in a separate location, which then makes it all more complex.

As I said, I currently store zero bytes with each allocation, which has
no effect on alignment!





BGB

Nov 27, 2022, 2:16:56 AM
In my case, there is also type-tag and zone-ID information in the object
headers.

Say:
Object size byte;
Object mode/flag byte;
Type-Tag as a 16 bit word;
Zone-ID as a 16-bit word;
...

Heap objects generally have a 16 byte alignment in my case.


For larger objects (heap chunks and page-allocated data), typically the
metadata about the object is stored externally.

I had considered switching to external metadata for medium objects, but
have not done so. For small objects (which are allocated using a cell
bitmap; 16 bytes at a time), there is no good alternative to using
inline headers (typically, the first cell of the object is dedicated as
a header cell).


Some special object types (such as "cons cells") use a separate
allocator mechanism that does not use per-object headers. However, the
size and type of each such object is fixed (for example, all cons cells
are 16 bytes, and currently cons cells may not be assigned to zones).

The "zone" is basically a mechanism where a zone ID can be assigned to
an object, and then all objects with zone ID's matching a given pattern
can be bulk freed.


David Brown

Nov 27, 2022, 10:18:15 AM
On 25/11/2022 20:23, Bart wrote:
> On 25/11/2022 16:15, David Brown wrote:
>> On 25/11/2022 14:02, James Harris wrote:
>
>>> Would there be any value in giving the programmer a way to find out
>>> the size of a given allocation? I don't mean the size requested but
>>> the size allocated (which would be greater than or equal to the size
>>> requested).
>>>
>>
>> No.  It is pointless.
>>
>> Why would anyone want to know the result of such a function?  The only
>> conceivable thought would be if they had first allocated space for x
>> units, and then they now want to store y units - calling
>> "get_real_size" could let them know if they need to call "resize" or not.
>>
>> The answer is that they should simply call "resize".  If "resize" does
>> not need to allocate new memory because the real size is big enough,
>> it does nothing.
>
> But who knows how efficient 'resize' will be?
>
> With `get_real_size`, you do this once when the block is allocated. Then
> you have a simple integer to compare current usage against.
>

A "resize" function would always have code equivalent to first calling
"get_real_size" and then seeing if an actual re-allocation and copy is
needed. It would be pointless for user code to duplicate this
functionality. Therefore, it is pointless to have a "get_real_size"
function available to the user code - using it would be less efficient
than simply calling "resize" directly.


> But there is another reason; take a block which will grow a byte at a
> time. Apart from needing to call 'resize' for every byte, who knows how
> much spare capacity 'resize' will actually provide; it may need to do
> real reallocations more often.
>
> This C program:
>
>     #include <stdio.h>
>     #include <stdlib.h>
>
>     int main(void) {
>         char* p;
>         int n=1;
>
>         p=malloc(n);
>
>         for(int i=0; i<20000000; ++i) {
>             ++n;
>             p = realloc(p,n);
>             *p = 'A';
>         }
>     }
>

Trying to optimise your interface around pathological examples is
madness. Design the interface to be useful for real, practical code.


Bart

Nov 27, 2022, 10:47:57 AM
So why do you think realloc is so slow in this context, compared with
how realloc was called in my other post? (Repeated below for convenience.)

That is, the above dumb use of realloc shows O(n^2) behaviour compared
with the version below which is O(n). Because you are saying above that
it should be just as efficient.

People do have memory blocks that can grow by small amounts, or a byte
at a time; how do you think such growth should be implemented, by
millions of calls to realloc for a byte-at-a-time reallocation, or
something a bit cleverer?

It's that 'a bit cleverer' part that was the purpose of 'get_real_size',
except I would go further.

Andy Walker

Nov 27, 2022, 12:06:03 PM
On 25/11/2022 16:15, David Brown wrote:
> I think you should not try to copy C's malloc/free mechanism.

Perhaps worth noting that there are two different sorts of "free"
mechanism:

(a) When the /programmer/ decides that some storage is no longer needed
and therefore decides it can be re-used. Obviously, there are some
applications where this is reliable. But in general it is unsafe.
The problems arise when there are graph structures where nodes of
the graph may be pointed at directly or indirectly by virtue of
some pointer accessing a component of the node. This is difficult
programming, and programmers often get it wrong, leading to perhaps
subtle bugs, or else don't bother, leading to poor performance as
storage is retained that could be freed.

(b) When the storage manager for the compiler/library decides that some
storage is no longer needed. IOW, when garbage collection, or near
equivalent, re-organises storage and perhaps returns some to the
OS. This is also difficult, but is a solved problem so should be
tried and tested for your language and thus reliable.

Summary: You will deduce that my opinion is that systems/languages
that rely on people to write programs that get "free" right are a
disaster waiting to happen. You could perhaps regard "free" as a hint
that you expect storage to be garbage-collectible, but the final
decision should always be taken by a proper garbage collector.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Hummel

David Brown

Nov 27, 2022, 12:12:01 PM
to
On 27/11/2022 16:47, Bart wrote:
> On 27/11/2022 15:18, David Brown wrote:

>> Trying to optimise your interface around pathological examples is
>> madness.  Design the interface to be useful for real, practical code.
>
> So why do you think realloc is so slow in this context, compared with
> how realloc was called in my other post? (Repeated below for convenience.)
>
> That is, the above dumb use of realloc shows O(n^2) behaviour compared
> with the version below which is O(n). Because you are saying above that
> it should be just as efficient.
>

No - I am saying that using some kind of "get_real_size" gives you nothing.

Only an idiot would call realloc() or anything remotely like that a byte
at a time. (We are talking about low-level languages here - in a high
level language, you might append to a string or list one element at a
time, expecting the language to implement things efficiently.)

> People do have memory blocks that can grow by small amounts, or a byte
> at a time; how do you think such growth should be implemented, by
> millions of calls to realloc for a byte-at-a-time reallocation, or
> something a bit cleverer?
>
> It's that 'a bit cleverer' part that was the purpose of 'get_real_size',
> except I would go further.
>
The code that tracks capacity is vastly cleverer than code that
allocates a byte at a time. "get_real_size" would add /nothing/ to it,
because you already track the capacity - and it is more efficient to do
so directly in your code than by calling an external function. Your
code is a clear demonstration of why "get_real_size" is
/not/ a function that is needed at all.

And in low-level library interfaces like this, an unneeded function is a
liability - it becomes an interface that must be maintained even if it
is no longer relevant, and can hinder implementation changes later. You
don't put functions or functionality in an interface unless they add
real, useful value. This one does not. (Nor does James' "resize"
suggestion, distinct from his "realloc" function.)


Bart

Nov 27, 2022, 1:19:01 PM
to
I use my own memory allocator functions, notably from within my
interpreters.

The interpreter creates object descriptors for flexible data that
includes 'capacity'. So you're right, it's done within the app.

But where does it get that capacity figure from? It's from my memory
allocation functions. (There, the actual allocated size is returned in a
global that the caller of `pcm_alloc` can pick up; not a special function.)

That way the application doesn't itself need to get involved in
allocation strategies or free lists and so on.

That value that is provided is equivalent to that hypothetical
'get_real_size` function.

One difference however is that, being my library, I know the spare
capacities for allocated blocks are generous. So with a third party
library, I would need to know or control that aspect of it.

If not generous enough, I would need to build-in extra capacities within
my application, but even then, I would like to make use of any extra
allocated space there happens to be, and that needs 'get_real_size' or
equivalent.

And, yes, the language implemented with my interpreter can very
efficiently, for interpreted code, grow very large strings a character
at time, or grow any list or array the same way. In my programs that's a
common idiom.
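The scheme described above - the actual allocated size handed back
alongside the pointer - might be sketched as follows. The global's name
and the exact rounding rule are assumptions for illustration; the post
only says sizes are rounded to powers of two from 16 bytes up.

```c
#include <assert.h>
#include <stdlib.h>

/* Actual size of the most recent allocation, picked up by the caller
   of pcm_alloc - mirrors the global described above.  The name
   pcm_allocsize is invented for this sketch. */
size_t pcm_allocsize;

/* Round a request up to the next power of two (minimum 16), allocate
   that, and report the real capacity through the global. */
void *pcm_alloc(size_t n) {
    size_t size = 16;
    while (size < n)
        size *= 2;
    pcm_allocsize = size;
    return malloc(size);
}
```

The caller can then treat `pcm_allocsize` as the block's capacity, with
no separate "get_real_size" call needed.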

David Brown

Nov 28, 2022, 2:11:46 AM
to
On 26/11/2022 14:05, James Harris wrote:
> On Friday, 25 November 2022 at 16:15:08 UTC, David Brown wrote:
>> On 25/11/2022 14:02, James Harris wrote:
>>> I will get back to you guys on other topics but I need to improve
>>> the memory allocator I use in my compiler and that has led to the
>>> following query.
>>>
>>> Imagine an allocator which carries out plain malloc-type
>>> allocations (i.e. specified as a number of 8-bit bytes suitably
>>> aligned for any type) without having to have compatibility with
>>> malloc, and in a language which is not C.
>>>
>>> For the sake of discussion the set of calls could be
>>>
>>> m_alloc m_calloc m_realloc m_resize m_free
>> In a new language, I would not follow the names from C as closely.
>> For one thing, it could confuse people - they may think they do the
>> same thing as in C. For another, "calloc" in particular is a silly
>> name. And you don't need "realloc" and "resize".
>
> The idea is that realloc could move the allocation whereas resize
> could only resize it in place. The latter is not present in C's
> malloc family.
>

Where would such a function be useful? If the application needs to
expand an object, it needs to expand it - if that means moving it, so be
it. I can't see where you would have use for a function that /might/ be
able to expand an object. Do you have /real/ use-cases in mind? (I
don't mean examples like Bart's attempts at making the silliest code he
can think of.)


>>
>> It can make sense to distinguish between "alloc_zeroed" and
>> "alloc_unintialised", for performance reasons.
>
> Agreed, but what exactly would you want for the calls? Maybe
>
> m_alloc(size) m_alloc_a(size, flags)
>
> ?
>

There could be many different interfaces. Dmitry also pointed out other
possible distinctions for allocation of shared memory, unpageable
memory, and so on. Not all memory is created equal!

Maybe you want a more object-oriented interface, for supporting several
different pools or memory allocation choices.

I don't know if you plan to support overloaded functions, so that you
can use the same name with different parameter numbers and types. That
works well with tag types.

The one thing I would avoid at all costs, in any interfaces, is the
"integer flag" nonsense found in many languages (such as C). If you
really want some kind of flags, then at least be sure your language has
strong type-checked enumerations.

>
>>>
>>> Would there be any value in giving the programmer a way to find
>>> out the size of a given allocation? I don't mean the size
>>> requested but the size allocated (which would be greater than or
>>> equal to the size requested).
>>>
>> No. It is pointless.
>>
>> Why would anyone want to know the result of such a function? The
>> only conceivable thought would be if they had first allocated space
>> for x units, and then they now want to store y units - calling
>> "get_real_size" could let them know if they need to call "resize"
>> or not.
>>
>> The answer is that they should simply call "resize". If "resize"
>> does not need to allocate new memory because the real size is big
>> enough, it does nothing.
>
> It's partly for performance, as Bart's comments have backed up. If
> code can find out the capacity then it can fill that allocation up to
> the stated capacity without having to make any other calls and
> without risking moving what has been stored so far.
>

There is no performance benefit in real code - and you should not care
about silly cases.

>>> I ask because it's simple enough to precede an aligned chunk of
>>> memory with the allocation size. The allocator may well do that
>>> anyway in order to help it manage memory. So there seems to be no
>>> good reason to keep that info from the programmer. The question
>>> is over whether there's value in allowing the programmer to get
>>> such info. I am thinking that he could get the value with a call
>>> such as
>>>
>>> m_msize(p)
>>>
>>> It seems simple enough, potentially useful, and should be
>>> harmless. But it's noticeable that the malloc-type calls don't
>>> have anything similar. So maybe there's a good reason why not.
>>>
>>> I thought it might cause implementation issues such as requiring
>>> a larger header if the allocator wants to search only free
>>> memory. But whatever implementation is used I think the allocator
>>> will still need to either store or calculate the size of the
>>> memory allocated so I am not sure I see a problem with the idea.
>>>
>>> What do you guys think? Opinions welcome!
>>>
>> I think you should not try to copy C's malloc/free mechanism. In
>> particular, I do not think the memory allocator should track the
>> sizes of allocations. "free" should always include the size,
>> matching the value used in "alloc". (And "resize" should pass the
>> old size as well as the new size.)
>
> Bart said the same and the suggestion surprises me. It seems prone to
> error.
>

Your choices with memory management are basically to either trust the
programmer, or do everything automatically with garbage collection.
Even if you use something like C++'s RAII mechanisms for allocating and
deallocating, you still have to trust the programmer to some extent. If
you are relying on the programmer to get their pointers right for calls
to "free", why are you worried that they'll get the size wrong?
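The sized interface being argued for here could look like the sketch
below, trivially backed by malloc. The names are illustrative; the
point is only the shape of the interface, where free and realloc take
the sizes the caller already knows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a sized allocation interface: the caller passes the size
   to free, and both old and new sizes to realloc, matching the values
   used at allocation.  This trivial backend ignores the sizes; a real
   allocator could use them to avoid storing per-block headers. */
void *m_alloc(size_t size) {
    return malloc(size);
}

void m_free(void *p, size_t size) {
    (void)size;            /* must equal the size passed to m_alloc */
    free(p);
}

void *m_realloc(void *p, size_t old_size, size_t new_size) {
    (void)old_size;        /* trivial backend has no use for it */
    return realloc(p, new_size);
}
```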

>>
>> Storing the size of allocation was not too bad in earliest days of
>> C, with simpler processors, no caches, no multi-threading, and
>> greater concern for simple allocation implementations than for
>> performance or reliability. Modern allocators no longer work with a
>> simple linked list storing sizes and link pointers at an address
>> below the address returned to the user, so why follow a similar
>> interface?
>>
>> Programs know the size of memory they requested when they allocated
>> it. They know, almost invariably, the size when they are freeing
>> the memory. They know the size of the types, and the size of the
>> arrays. So having the memory allocator store this too is a waste.
>
> That's sometimes the case but not always. Say I wanted to read a line
> from a file a byte at a time without knowing in advance how long the
> line would be (a common-enough requirement but one which C programs
> all too often fudge by defining a maximum line length). The required
> allocation could not be known in advance.
>

That is irrelevant. (On a side note, it makes sense to have library
calls that make this kind of common function more convenient.) The
programmer knows the size of any "malloc" calls or "realloc" calls made
- that's the size given back to "free".

> There are various specialised allocation techniques but the malloc
> style is the most general I know of.
>

No, it is not particularly general - it's just relatively simple. And
it is a piss-poor interface for a modern language where you should be
thinking about things like thread-safety, cache friendliness and user
convenience, not picking your interface based on a throw-away example
from an old C book (which is where malloc/free comes from - it was a
simple example, not intended as a real memory allocation implementation).

As Dmitry has pointed out, there are /many/ ways to make allocators, and
none of them are optimal in all cases. Trying to squeeze things into
one very limited interface simply guarantees that you never get the best
choice in your code.

>>
>> Once you've dropped size tracking, you can easily separate
>> allocation tracking from the memory in use - this gives neater
>> sizes for pools and better cache and virtual page usage. You can
>> have pools of different sizes - perhaps 16 byte blocks for the
>> smallest, then 256 byte blocks, then 4 KB blocks, etc. (You might
>> prefer different sizes, perhaps with more intermediary sizes.) A
>> pool element would have 64 blocks. Allocation within a pool is done
>> by quick searches within a single 64-bit map - no need to search
>> through lists.
>
> I presume by pools you mean slots of a certain size. I may, longer
> term, let a program specify slot sizes and pool geometry. But that's
> for the future and it wouldn't work well with variable-length
> strings, especially where the length is not known in advance. So I
> need something simple and general for now.
>

Make a bad interface now and you'll be stuck with it.

I still don't understand your targets - either in terms of computer
systems, or expected uses. If the target is modern PC-style systems (so
that you can, like in Bart's languages, assume that you have 64-bit
integers and plenty of memory), then you can make a string format that
uses, say, 256-byte lumps. Some of that is for pointers and length
counts to handle chains for big strings, and maybe reference counting,
with over 200 bytes free for the string itself. The solid majority of
all strings in practical use will fit in one lump, and you have no
challenges with memory allocators, resizing, or the rest of it. That
simplification will completely outweigh the inefficiency of the wasted
memory. And then for long strings, doing everything in a single C-style
block is inefficient anyway - you'll want some kind of linked chain.
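A lump of the kind described might be laid out like this; the field
names and header layout are assumptions, the only stated constraints
being a 256-byte lump with over 200 bytes free for string data:

```c
#include <assert.h>
#include <stddef.h>

/* One 256-byte lump of the suggested string format: bookkeeping plus
   over 200 bytes of character data.  Short strings fit entirely in one
   lump; long strings chain lumps together. */
#define LUMP_SIZE   256
#define LUMP_HEADER (sizeof(struct lump *) + 2 * sizeof(size_t))
#define LUMP_DATA   (LUMP_SIZE - LUMP_HEADER)

struct lump {
    struct lump *next;       /* next lump in the chain, or NULL */
    size_t total_len;        /* length of the whole string      */
    size_t refcount;         /* for cheap sharing               */
    char data[LUMP_DATA];    /* > 200 bytes of string text      */
};
```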

David Brown

Nov 28, 2022, 5:17:29 AM
to
The normal way is for the code to ask the allocator "can I have 1000
bytes?", and when the allocator says "yes, here's a pointer to it" you
say "great - I have a capacity of 1000 bytes".

It seems completely wrong to me for the code then to ask the allocator
"You know I asked for 1000 bytes? What did you /really/ give me?". If
I needed more than 1000 bytes, I'd have asked for more than 1000 bytes.
If I /later/ need more than 1000 bytes, I'll ask for that /later/.

So later on, when I need 1020 bytes instead of 1000, I'll ask "You know
that 1000 byte block you gave me? Can you fiddle things and make it
1020?". I don't mind if the allocator does this by noting that there is
already 1020 space and returning quickly, or by doing a new allocation -
I need these 1020 bytes, or I would not have asked for them.

I cannot see any point in first asking how much space I /really/ have,
before asking for the 1020 bytes. What is the point?

The one thing I would /never/ do is ask originally for 1000 bytes, then
ask how much I got, and then use those extra bytes. That would be lying
to the allocator. When people start doing that kind of thing, it means
they are making undocumented assumptions about implementations - and
that means implementations get stuck with these limitations, or
user code breaks.


> That way the application doesn't itself need to get involved in
> allocation strategies or free lists and so on.
>
> That value that is provided is equivalent to that hypothetical
> 'get_real_size` function.
>

No, that is not needed in any way. Ask for what you need from the
allocator - that's all.

> One difference however is that, being my library, I know the spare
> capacities for allocated blocks are generous. So with a third party
> library, I would need to know or control that aspect of it.
>
> If not generous enough, I would need to build-in extra capacities within
> my application, but even then, I would like to make use of any extra
> allocated space there happens to be, and that needs 'get_real_size' or
> equivalent.
>
> And, yes, the language implemented with my interpreter can very
> efficiently, for interpreted code, grow very large strings a character
> at time, or grow any list or array the same way. In my programs that's a
> common idiom.
>

That is fine in a high-level language. And it is the /language/ and its
run-time that must handle this efficiently. Typically that will be done
by something similar to the code you wrote, allocating in chunks rather
than byte by byte. This run-time sits between the application code
(where the usage should be as convenient as possible) and the low-level
allocation code (which should be efficient).

Dmitry A. Kazakov

Nov 28, 2022, 6:34:23 AM
to
On 2022-11-28 11:17, David Brown wrote:
> I cannot see any point in first asking how much space I /really/ have,
> before asking for the 1020 bytes.  What is the point?

No point, you are right.

However pool hooking facilities usually provide that information. E.g.
you could have an allocation callback with information like actual
address (rather than aligned pointer), actual size, next used block etc.
For example, Linux has an incredibly poorly designed malloc_hook:

https://www.man7.org/linux/man-pages/man3/malloc_hook.3.html

In any case, programmers with special needs must have an ability to
implement their own pools like arenas, mark and release pools etc.
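A mark-and-release pool of the kind mentioned here can be sketched in a
few lines - a fixed backing buffer, a bump pointer, and mark/release to
rewind it. Sizes and names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal mark-and-release arena: allocation bumps an offset, "mark"
   records the current offset, and "release" rewinds to a mark, freeing
   everything allocated since in one step. */
#define ARENA_SIZE 4096

static unsigned char arena[ARENA_SIZE];
static size_t arena_top;

void *arena_alloc(size_t n) {
    size_t aligned = (arena_top + 7) & ~(size_t)7;  /* 8-byte align */
    if (aligned + n > ARENA_SIZE)
        return NULL;                                /* arena exhausted */
    arena_top = aligned + n;
    return &arena[aligned];
}

size_t arena_mark(void) {
    return arena_top;
}

void arena_release(size_t mark) {
    arena_top = mark;       /* everything above the mark is now free */
}
```

There is no per-block free at all, which is exactly what makes such
pools cheap when lifetimes are nested.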

Bart

Nov 28, 2022, 9:04:28 AM
to
On 28/11/2022 10:17, David Brown wrote:
> On 27/11/2022 19:18, Bart wrote:

>> But where does it get that capacity figure from? It's from my memory
>> allocation functions. (There, the actual allocated size is returned in
>> a global that the caller of `pcm_alloc` can pick up; not a special
>> function.)
>>
>
> The normal way is for the code to ask the allocator "can I have 1000
> bytes?", and when the allocator says "yes, here's a pointer to it" you
> say "great - I have a capacity of 1000 bytes".
>
> It seems completely wrong to me for the code then to ask the allocator
> "You know I asked for 1000 bytes?  What did you /really/ give me?".  If
> I needed more than 1000 bytes, I'd have asked for more than 1000 bytes.
>  If I /later/ need more than 1000 bytes, I'll ask for that /later/.
>
> So later on, when I need 1020 bytes instead of 1000, I'll ask "You know
> that 1000 byte block you gave me?  Can you fiddle things and make it
> 1020?".  I don't mind if the allocator does this by noting that there is
> already space for 1020 and returning quickly, or by doing a new allocation -
> I need these 1020 bytes, or I would not have asked for them.
>
> I cannot see any point in first asking how much space I /really/ have,
> before asking for the 1020 bytes.  What is the point?

You're never curious about how much spare capacity was actually set
aside for your request? You don't think there is any way to profitably
exploit that?

You don't want to know how much memory is being wasted on your behalf?

It looks like you want to assume the allocator will set aside zero spare
capacity, and you want to offload ALL the work of efficiently allocating
gradually expanding buffers onto the application.

Remember the subject is about /creating/ the allocation library. That
means being able to specify how it will work. I would rather some of
that work was offloaded to the allocator, since the author of the
library may have a better idea of allocation strategies than I do.

> The one thing I would /never/ do is ask originally for 1000 bytes, then
> ask how much I got, and then use those extra bytes.

You use those extra bytes when it transpires you need more.

> That would be lying
> to the allocator.  When people start doing that kind of thing, it means
> they are making undocumented assumptions about implementations - and
> that means implementations get stuck with these limitations, or
> user code breaks.

That sounds backwards. In what way is it to making assumptions to
actually ASK what an implementation has reserved?

If you work on the basis that there are zero extra bytes, isn't THAT
making an assumption?

>> And, yes, the language implemented with my interpreter can very
>> efficiently, for interpreted code, grow very large strings a character
>> at time, or grow any list or array the same way. In my programs that's
>> a common idiom.
>>
>
> That is fine in a high-level language.  And it is the /language/ and its
> run-time that must handle this efficiently.  Typically that will be done
> by something similar to the code you wrote, allocating in chunks rather
> than byte by byte.

No, byte-by-byte or int-by-int or record-by-record is what the
user-program requests. Chunk-by-chunk is what the implementation has to
do to get it efficient.


Dmitry A. Kazakov

Nov 28, 2022, 10:03:51 AM
to
On 2022-11-28 15:04, Bart wrote:

> You don't want to know how much memory is being wasted on your behalf?

https://man7.org/linux/man-pages/man3/mallinfo.3.html

> It looks like you want to assume the allocator will set aside zero spare
> capacity,

That is a reasonable assumption. Note that frequently dynamically
allocated library data types are designed to work with all possible
implementations of pools, e.g. a pool is taken as a parameter of a
generic or OO instance. Therefore the weakest assumption is made.

Bart

Nov 28, 2022, 11:49:02 AM
to
On 28/11/2022 15:03, Dmitry A. Kazakov wrote:
> On 2022-11-28 15:04, Bart wrote:
>
>> You don't want to know how much memory is being wasted on your behalf?
>
> https://man7.org/linux/man-pages/man3/mallinfo.3.html
>
>> It looks like you want to assume the allocator will set aside zero
>> spare capacity,

That doesn't look too useful. It's far more complex than someone would
want, and is not specific to an individual allocation.

>
> That is a reasonable assumption. Note that frequently dynamically
> allocated library data types are designed to work with all possible
> implementations of pools, e.g. a pool is taken as a parameter of a
> generic or OO instance. Therefore the weakest assumption is made.


But also unlikely. Someone requests 37 bytes; will the allocator
return a pointer to exactly 37 bytes from the pool, so that the next
request is 37 bytes offset from the last?

This would make that next allocation misaligned.

Calls to MS' VirtualAlloc are rounded up to a multiple of 4KB for fresh
allocations.

Calls to my malloc() just appear to do basic rounding, although this
also has to accommodate size info, but two successive calls to malloc(1)
produce addresses that differ by +32 bytes.

Calls to my own routines, implemented on top of malloc(), round to powers
of two for sizes of 16 bytes to 33MB (this was created with my
interpreter in mind).

Three very different behaviours. So it makes sense not to assume, but to
ask.
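For what it's worth, glibc does expose exactly this kind of query:
malloc_usable_size() reports how many bytes a malloc'd block can really
hold. A thin wrapper (glibc-specific, so portability is an assumption
here):

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: malloc_usable_size */
#include <stdlib.h>

/* Ask, don't assume: report the real capacity of a malloc'd block.
   The value is at least the requested size, often more. */
size_t real_size(void *p) {
    return malloc_usable_size(p);
}
```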

David Brown

Nov 28, 2022, 11:50:38 AM
to
On 28/11/2022 16:03, Dmitry A. Kazakov wrote:
> On 2022-11-28 15:04, Bart wrote:
>
>> You don't want to know how much memory is being wasted on your behalf?
>
> https://man7.org/linux/man-pages/man3/mallinfo.3.html
>
>> It looks like you want to assume the allocator will set aside zero
>> spare capacity,
>
> That is a reasonable assumption. Note that frequently dynamically
> allocated library data types are designed to work with all possible
> implementations of pools, e.g. a pool is taken as a parameter of a
> generic or OO instance. Therefore the weakest assumption is made.
>

No, the weakest assumption (which is the one made by most programmers)
is that you do not know anything about how much wasted or spare capacity
there might be - you assume only that the allocation function does what
it says it will do. So when you ask for 100 bytes, you assume nothing
more than that you have 100 bytes. You don't assume you might have
spare memory, or wasted memory - but you also do not assume that there
is zero extra or wasted memory. You assume nothing at all, and rely
solely on the specification of the allocation functions.


David Brown

Nov 28, 2022, 11:50:40 AM
to
I haven't been particularly curious about that, no. And no, I really do
not think there is any way to make use of the information - the benefits
would, at most, be absolutely miniscule.

There are, of course, times when you really want to count each byte and
track things in detail - I do that regularly in my line of work. But
the answer there is to avoid dynamic memory as much as you possibly can,
and often to use specific memory allocation routines designed for the
particular task in hand. In cases where I might use malloc and free
from the standard library, I really don't care how many extra bytes are
allocated.


> You don't want to know how much memory is being wasted on your behalf?
>

When I need to know, I know - and not through a pointless
"get_real_size" function, but by making sure I have an appropriate
memory allocation system for the job. When I have a big system - say,
an embedded Linux board with 2 GB of memory - why would I care if an
allocation is wasting 8 bytes or 16 bytes? How could it possibly make a
difference in real life? You need to be doing /millions/ of allocations
to have a waste that is possibly worth measuring. (And if you are doing
millions of allocations, you won't be doing it with malloc - you'll be
using a specialised allocator optimised for the use-case.)

> It looks like you want to assume the allocator will set aside zero spare
> capacity, and you want to offload ALL the work of efficiently allocating
> gradually expanding buffers onto the application.
>

No, I did not say that.

If you are writing low-level code with a low-level language, then you
will usually want to do that - only the application programmer knows the
patterns he/she will need, and can pick the ideal sizes for buffers and
increases in allocation size. (Do you want to double each realloc? Add
100 bytes each time? The allocator doesn't know.) As you have shown in
sample code, tracking the capacity in application code is not hard.

If you are writing high-level code in a high-level language, the
language run-time libraries will handle these details using heuristics
to give an okay result in a wide range of cases.

As for how much extra space gets allocated by a standard library
allocator, I do not care (other than assuming it is not an unreasonable
waste - I assume that people writing standard libraries for a given
platform write appropriate code for that platform).

Seriously - if you want 1000 bytes, you ask malloc for 1000 bytes. You
don't ask for 990 bytes and assume it will give you at least 10 bytes
extra. You ask for what you need.


> Remember the subject is about /creating/ the allocation library. That
> means being able to specify how it will work. I would rather some of
> that work was offloaded to the allocator, since the author of the
> library may have a better idea of allocation strategies than I do.
>

Did you miss the bit where I explained, several times, about how the
implementation of "realloc" could use the extra space if it happens to
be enough to satisfy the new request? In such cases, realloc will be
extremely fast.

>> The one thing I would /never/ do is ask originally for 1000 bytes,
>> then ask how much I got, and then use those extra bytes.
>
> You use those extra bytes when it transpires you need more.
>

No, when you need more, you ask for more. If there are enough extra
bytes, "realloc" will return immediately. You don't need user code to
faff around with "get_real_size" - it's just more work for the
programmer, more opportunities for errors, probably slower, and more
limiting for implementation of the allocation system.

>> That would be lying
>> to the allocator.  When people start doing that kind of thing, it
>> means they are making undocumented assumptions about implementations -
>> and that means implementations get stuck with these limitations,
>> or user code breaks.
>
> That sounds backwards. In what way is it to making assumptions to
> actually ASK what an implementation has reserved?
>

You are assuming particular characteristics of the allocator that are
inappropriate for a general-purpose memory allocator. All a low-level
general allocator needs to do is have a function to allocate memory, a
function to release memory, and optionally (but usefully) a function to
increase an allocation efficiently. Every function you include beyond
that, every detail in specifications, is a limitation on the flexibility
of the allocator.

Here's an example.

Suppose I make an allocator that allocates everything in 128-byte
blocks, tracked with bitmaps. You ask for 800 bytes - I give you that
by allocating 7 blocks, for a total of 896 bytes. You now want to
increase the size of the allocation, ideally to 1000 bytes. You call
the "get_real_size" function and find there are 96 bytes extra. What is
your code going to do now? Maybe it thinks getting more than 896 bytes
is going to be slow, and makes do with just 896 bytes. But that could
be wrong - perhaps my allocator happens to have the next block free, so
increasing to 1000 bytes would be almost instant - just a check and an
OR on a bitmap. Maybe it thinks it really needs the 1000 bytes, in
which case the call to "get_real_size" is completely wasted. The
information from "get_real_size" is /useless/. It gives you nothing.

But what is the cost to /me/, the implementer of the allocation system?
If I have made a "get_real_size" function, and let you use it to get
extra space, I have to stand by that - if I've told you that you really
got 896 bytes, then I can't re-use the extra 96 bytes in any way.
However, if I /don't/ give you such a function, then those 96 bytes are
still mine. I can make version 2 of the library, with a sub-block
allocator so that when you ask for 10 bytes, you get it from there. The
very act of providing a "get_real_size" function guarantees that those
96 bytes are wasted and cannot be used.

> If you work on the basis that there are zero extra bytes, isn't THAT
> making an assumption?
>

No one is assuming that there are zero extra bytes.

>>> And, yes, the language implemented with my interpreter can very
>>> efficiently, for interpreted code, grow very large strings a
>>> character at time, or grow any list or array the same way. In my
>>> programs that's a common idiom.
>>>
>>
>> That is fine in a high-level language.  And it is the /language/ and
>> its run-time that must handle this efficiently.  Typically that will
>> be done by something similar to the code you wrote, allocating in
>> chunks rather than byte by byte.
>
> No, byte-by-byte or int-by-int or record-by-record is what the
> user-program requests. Chunk-by-chunk is what the implementation has to
> do to get it efficient.
>

No, that is not what user programs request for explicit low-level memory
allocation. Programs allocate in lumps if they can, at least for memory
buffers that might be realloc'ed (which is what we are talking about here).
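The 128-byte-block, bitmap-tracked allocator used in the example above
can be sketched as a single 64-block pool; this is a reconstruction of
the idea described, not any real library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pool of 64 blocks of 128 bytes, tracked by a single 64-bit
   bitmap: bit i set means block i is in use.  Allocating 800 bytes
   takes 7 contiguous blocks (896 bytes), as in the example above. */
#define BLOCK_SIZE  128
#define POOL_BLOCKS 64

static unsigned char pool[POOL_BLOCKS * BLOCK_SIZE];
static uint64_t used;                     /* one bit per block */

void *pool_alloc(size_t n) {
    size_t nblocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    if (nblocks == 0 || nblocks > POOL_BLOCKS)
        return NULL;
    uint64_t mask = (nblocks == 64) ? ~(uint64_t)0
                                    : (((uint64_t)1 << nblocks) - 1);
    for (size_t i = 0; i + nblocks <= POOL_BLOCKS; ++i) {
        if ((used & (mask << i)) == 0) {  /* run of free blocks? */
            used |= mask << i;
            return &pool[i * BLOCK_SIZE];
        }
    }
    return NULL;                          /* pool full */
}

void pool_free(void *p, size_t n) {
    size_t nblocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    size_t i = (size_t)((unsigned char *)p - pool) / BLOCK_SIZE;
    uint64_t mask = (nblocks == 64) ? ~(uint64_t)0
                                    : (((uint64_t)1 << nblocks) - 1);
    used &= ~(mask << i);
}
```

Note that growing an allocation in place is just a check-and-OR on the
bitmap when the adjacent blocks happen to be free - which is the point
made above about "resize" being cheap without any "get_real_size".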


Dmitry A. Kazakov

Nov 28, 2022, 11:56:04 AM
to
On 2022-11-28 17:50, David Brown wrote:
> On 28/11/2022 16:03, Dmitry A. Kazakov wrote:
>> On 2022-11-28 15:04, Bart wrote:
>>
>>> You don't want to know how much memory is being wasted on your behalf?
>>
>> https://man7.org/linux/man-pages/man3/mallinfo.3.html
>>
>>> It looks like you want to assume the allocator will set aside zero
>>> spare capacity,
>>
>> That is a reasonable assumption. Note that frequently dynamically
>> allocated library data types are designed to work with all possible
>> implementations of pools, e.g. a pool is taken as a parameter of a
>> generic or OO instance. Therefore the weakest assumption is made.
>
> No, the weakest assumption (which is the one made by most programmers)
> is that you do not know anything about how much wasted or spare capacity
> there might be - you assume only that the allocation function does what
> it says it will do.

Yes, this is what I meant. One cannot rely on any internal heuristics.

> So when you ask for 100 bytes, you assume nothing
> more than that you have 100 bytes.  You don't assume you might have
> spare memory, or wasted memory - but you also do not assume that there
> is zero extra or wasted memory.  You assume nothing at all, and rely
> solely on the specification of the allocation functions.

Yes, that is the weakest precondition: only the requested space was allocated.

Dmitry A. Kazakov

unread,
Nov 28, 2022, 12:25:06 PM11/28/22
to
On 2022-11-28 17:48, Bart wrote:
> On 28/11/2022 15:03, Dmitry A. Kazakov wrote:
>> On 2022-11-28 15:04, Bart wrote:
>>
>>> You don't want to know how much memory is being wasted on your behalf?
>>
>> https://man7.org/linux/man-pages/man3/mallinfo.3.html
>>
>>> It looks like you want to assume the allocator will set aside zero
>>> spare capacity,
>
> That doesn't look too useful. It's far more complex than someone would
> want, and is not specific to an individual allocation.
>
>>
>> That is a reasonable assumption. Note that frequently dynamically
>> allocated library data types are designed to work with all possible
>> implementations of pools, e.g. a pool is taken as a parameter of a
>> generic or OO instance. Therefore the weakest assumption is made.
>
>
> But also unlikely. Someone one requests 37 bytes; will the allocator
> return a pointer to exactly 37 bytes from the pool, so that the next
> request is 37 bytes offset from the last?

Could be.

> This would make that next allocation misaligned.

If 1 was the alignment requested. The minimal interface of Allocate is

- Size in storage elements
- Alignment

Note that pools are implemented in a huge variety of ways.

> Calls to MS' VirtualAlloc are rounded up to a multiple of 4KB for fresh
> allocations.

VirtualAlloc is not a memory pool allocator. It is a part of paged
memory API.
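
The minimal Allocate interface described above — size in storage elements plus alignment — might be declared like this. This is a toy built on malloc; a real pool could carve out storage quite differently:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes aligned to `align` (a power of two), by
   over-allocating and stashing the raw pointer just below the result. */
void *pool_allocate(size_t size, size_t align) {
    void *raw = malloc(size + align + sizeof(void *));
    if (!raw) return NULL;
    uintptr_t base = (uintptr_t)raw + sizeof(void *);
    uintptr_t p = (base + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)p)[-1] = raw;      /* remember where malloc's block starts */
    return (void *)p;
}

void pool_free(void *p) {
    if (p) free(((void **)p)[-1]);
}
```

With an interface like this, alignment 1 is a legitimate request, which is the point being made: the pool need not impose more alignment than asked for.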

Bart

unread,
Nov 28, 2022, 12:54:15 PM11/28/22
to
On 28/11/2022 16:50, David Brown wrote:
> On 28/11/2022 15:04, Bart wrote:

>> You don't want to know how much memory is being wasted on your behalf?
>>
>
> When I need to know, I know - and not through a pointless
> "get_real_size" function, but by making sure I have an appropriate
> memory allocation system for the job.  When I have a big system - say,
> an embedded Linux board with 2 GB of memory - why would I care if an
> allocation is wasting 8 bytes or 16 bytes?  How could it possibly make a
> difference in real life?  You need to be doing /millions/ of allocations
> to have a waste that is possibly worth measuring.  (And if you are doing
> millions of allocations, you won't be doing it with malloc - you'll be
> using a specialised allocator optimised for the use-case.)

My malloc() seems to use a minimum of 32 bytes per allocation. And yes
people /do/ use malloc() without considering possible inefficiencies.

(One of my compilers uses my own allocator library. If I switch to using
malloc() directly, compilation speed is 80% slower.

And, on the same test, memory usage seems to be 20% higher (0.6GB vs
0.5GB), but I can't measure accurately because, well, there is no way to
find out exactly how much malloc is using.)


>> Remember the subject is about /creating/ the allocation library. That
>> means being able to specify how it will work. I would rather some of
>> that work was offloaded to the allocator, since the author of the
>> library may have a better idea of allocation strategies than I do.
>>
>
> Did you miss the bit where I explained, several times, about how the
> implementation of "realloc" could use the extra space if it happens to
> be enough to satisfy the new request?  In such cases, realloc will be
> extremely fast.

I'm sorry but I don't trust it to be fast. See how much longer malloc()
took on my test above; nearly 50% of all runtime was spent inside
malloc(); why is it so much slower?


>> That sounds backwards. In what way is it to making assumptions to
>> actually ASK what an implementation has reserved?

> Suppose I make an allocator that allocates everything in 128-byte
> blocks, tracked with bitmaps.  You ask for 800 bytes - I give you that
> by allocating 7 blocks, for a total of 896 bytes.  You now want to
> increase the size of the allocation, ideally to 1000 bytes.  You call
> the "get_real_size" function and find there are 96 bytes extra.  What is
> your code going to do now?  Maybe it thinks getting more than 896 bytes
> is going to be slow, and makes do with just 896 bytes.  But that could
> be wrong - perhaps my allocator happens to have the next block free, so
> increasing to 1000 bytes would be almost instant - just a check and an
> OR on a bitmap.  Maybe it thinks it really needs the 1000 bytes, in
> which case the call to "get_real_size" is completely wasted.  The
> information from "get_real_size" is /useless/.  It gives you nothing.
>
> But what is the cost to /me/, the implementer of the allocation system?
> If I have made a "get_real_size" function, and let you use it to get
> extra space, I have to stand by that - if I've told you that you really
> got 896 bytes, then I can't re-use the extra 96 bytes in any way.
> However, if I /don't/ give you such a function, then those 96 bytes are
> still mine.  I can make version 2 of the library, with a sub-block
> allocator so that when you ask for 10 bytes, you get it from there.  The
> very act of providing a "get_real_size" function guarantees that those
> 96 bytes are wasted and cannot be used.
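
For concreteness, the hypothetical 128-byte-block, bitmap-tracked allocator quoted above could look roughly like this — a single-bitmap-word sketch, fixed-size pool, not thread-safe:

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 128
#define NBLOCKS    64             /* one 64-bit bitmap word for the sketch */

static uint8_t  pool[NBLOCKS * BLOCK_SIZE];
static uint64_t used;             /* bit i set => block i occupied */

/* Allocate n bytes as a run of contiguous 128-byte blocks. */
void *blk_alloc(size_t n) {
    size_t need = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    if (need == 0) need = 1;
    for (size_t i = 0; i + need <= NBLOCKS; i++) {
        uint64_t mask = (need == 64 ? ~0ULL : (1ULL << need) - 1) << i;
        if ((used & mask) == 0) {
            used |= mask;
            return &pool[i * BLOCK_SIZE];
        }
    }
    return NULL;
}

/* The caller supplies the size, so no per-block header is needed. */
void blk_free(void *p, size_t n) {
    size_t i = (size_t)((uint8_t *)p - pool) / BLOCK_SIZE;
    size_t need = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    if (need == 0) need = 1;
    used &= ~((need == 64 ? ~0ULL : (1ULL << need) - 1) << i);
}
```

Asking for 800 bytes marks 7 blocks (896 bytes), matching the figures in the quoted example; growing into an adjacent free block is indeed just a check and an OR on the bitmap.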

It seems simple to me. If the allocator plans on using any reserved
space, then get_real_size returns the exact size requested, no extra.

If it includes any extra on version 1, then on version 2 the behaviour
of get_real_size can change to exclude it.

The fact is that MY allocator DOES have the equivalent of
`get_real_size`. (However, because it does not record block sizes, the
info is put into a global - it could have been a function - that the
caller must make use of before the next allocation. Or I can change the
API so that the allocator returns a second piece of information.)

This works extremely well.
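
The second-output variant mentioned here — the allocator reporting the capacity it actually granted alongside the pointer — might look like this. The names and the 16-byte granule are illustrative, not the poster's actual library:

```c
#include <stdlib.h>

/* Allocate at least `want` bytes, rounding up to the allocator's
   granule; report the capacity actually granted via *got. */
void *m_alloc2(size_t want, size_t *got) {
    const size_t granule = 16;
    size_t cap = (want + granule - 1) & ~(granule - 1);
    if (cap == 0) cap = granule;
    void *p = malloc(cap);
    *got = p ? cap : 0;
    return p;
}
```

The caller can then treat `*got` as the usable capacity and skip resize calls until it is exhausted.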

So why is it that it can't work on any of the allocators you use? Or any
that someone else might create? Maybe I should rewrite mine because
according to you, it cannot work and cannot give any benefits!

I think perhaps you're getting confused about what goes into the
allocation library, what goes into the language implementation or
runtime, and what goes into the user's application.


>> No, byte-by-byte or int-by-int or record-by-record is what the
>> user-program requests. Chunk-by-chunk is what the implementation has
>> to do to get it efficient.
>>
>
> No, that is not what user programs request for explicit low-level memory
> allocation.  Programs allocate in lumps if they can, at least for memory
> buffers that might be realloc'ed (which is what we are talking about here).


It depends on the language. One that performs the allocations for you
(which is most of them now), especially allocations that can grow, will
have user-programs write to an array element at a time, and the
implementation, if it allows that, will need to do so efficiently.

And that means not leaving it 100% to the allocation library.




Bart

unread,
Nov 28, 2022, 12:57:08 PM11/28/22
to
On 28/11/2022 17:25, Dmitry A. Kazakov wrote:
> On 2022-11-28 17:48, Bart wrote:

>
>> This would make that next allocation misaligned.
>
> If 1 was the alignment requested. The minimal interface of Allocate is
>
> - Size in storage elements
> - Alignment
>
> Note that pools are implemented in a huge variety of ways.

Returning pointers which are not aligned to at least 64 bits would be
very poor, if there is no way to control that.

>> Calls to MS' VirtualAlloc are rounded up to a multiple of 4KB for
>> fresh allocations.
>
> VirtualAlloc is not a memory pool allocator. It is a part of paged
> memory API.
>

I don't care how it works. I want N bytes, and it will return a pointer
to N bytes, just like any other allocator. (I have to use this one when I
need executable memory.)

David Brown

unread,
Nov 28, 2022, 2:01:28 PM11/28/22
to
On 28/11/2022 18:54, Bart wrote:
> On 28/11/2022 16:50, David Brown wrote:
>> On 28/11/2022 15:04, Bart wrote:
>
>>> You don't want to know how much memory is being wasted on your behalf?
>>>
>>
>> When I need to know, I know - and not through a pointless
>> "get_real_size" function, but by making sure I have an appropriate
>> memory allocation system for the job.  When I have a big system - say,
>> an embedded Linux board with 2 GB of memory - why would I care if an
>> allocation is wasting 8 bytes or 16 bytes?  How could it possibly make
>> a difference in real life?  You need to be doing /millions/ of
>> allocations to have a waste that is possibly worth measuring.  (And if
>> you are doing millions of allocations, you won't be doing it with
>> malloc - you'll be using a specialised allocator optimised for the
>> use-case.)
>
> My malloc() seems to use a minimum of 32 bytes per allocation. And yes
> people /do/ use malloc() without considering possible inefficiencies.

Of course they do. People use all sorts of things in their programming
without considering possible inefficiencies - this is known as "using
the right level of abstraction". You always have a balance between
maximum control to get the most efficient results (assuming - usually
incorrectly - that the programmer knows enough to get optimal results),
or relinquishing control for easier and faster development. Do you
think people should always worry about efficiency before every call to
"printf"? Or concern themselves about how their loop structures will
interact with the cpu's branch prediction? Of course not. /Some/
people, for certain code, need to worry about that kind of thing. But
for most programmers and most situations, the correct choice is to use
the simplest source code and not worry.

The same applies to memory allocation. If you are worrying about a
possible inefficiency of 32 bytes in a memory allocation, you are doing
things /wrong/.

(Note that implementing a language interpreter or run-time library is
one of these few times when you might want to concern yourself a little
about such efficiencies - but still not /too/ much. And we are
discussing the use of these facilities in user code, rather than the
actual implementations.)

>
> (One of my compilers uses my own allocator library. If I switch to using
> malloc() directly, compilation speed is 80% slower.
>
> And, on the same test, memory usage seems to be 20% higher (0.6GB vs
> 0.5GB, but I can't measure accurately, because, well, there is no way to
> find out exactly how much malloc is using.)
>

There is almost no conceivable situation where a program will be in any
measurable way better by using 0.5 GB instead of 0.6 GB.

>
>>> Remember the subject is about /creating/ the allocation library. That
>>> means being able to specify how it will work. I would rather some of
>>> that work was offloaded to the allocator, since the author of the
>>> library may have a better idea of allocation strategies than I do.
>>>
>>
>> Did you miss the bit where I explained, several times, about how the
>> implementation of "realloc" could use the extra space if it happens to
>> be enough to satisfy the new request?  In such cases, realloc will be
>> extremely fast.
>
> I'm sorry but I don't trust it to be fast. See how much longer malloc()
> took on my test above; nearly 50% of all runtime was spent inside
> malloc(); why is it so much slower?
>

You are missing the point - deliberately, perhaps? A good "realloc"
function will be extremely fast for cases where the "real" allocated
size is enough for the new request. If it is not enough, then it has to
do a new malloc() and then copy over the data - this will take time.
(If you are doing this on Windows and using MS's C runtime DLL's, then
/everything/ is slow because calling DLL functions has significant
overhead.)
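
The fast path described above — a resize that succeeds in place whenever the stored capacity already covers the request — can be sketched as follows. Names are hypothetical, and real allocators store such headers in various ways:

```c
#include <stdlib.h>
#include <string.h>

/* Each block is preceded by its usable capacity, so resize can grow
   in place whenever the spare room already covers the request. */
typedef struct { size_t cap; } Hdr;

void *h_alloc(size_t n) {
    size_t cap = (n + 15) & ~(size_t)15;   /* round to a 16-byte granule */
    Hdr *h = malloc(sizeof(Hdr) + cap);
    if (!h) return NULL;
    h->cap = cap;
    return h + 1;
}

void *h_resize(void *p, size_t n) {
    Hdr *h = (Hdr *)p - 1;
    if (n <= h->cap) return p;             /* fast path: nothing to do */
    void *q = h_alloc(n * 2);              /* slow path: move and copy */
    if (q) {
        memcpy(q, p, h->cap);
        free(h);
    }
    return q;
}
```

The point is that the resize function performs the "is there spare room?" check itself, so user code gains nothing by asking first.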


It is also very common to have "local" allocators for small memory
needs. Such allocators will be faster than OS-based ones, because they
don't have to get memory from the OS, track ownership, handle multiple
threads safely, etc.
Making it useless.

> The fact is that MY allocator DOES have the equivalent of
> `get_real_size`. (However, because it does not record block sizes, the
> info is put into a global - it could have been a function - that the
> caller must make use of before the next allocation. Or I can change the
> API so that the allocator returns a second piece of information.)
>
> This works extremely well.
>
> So why is it that it can't work on any of the allocators you use? Or any
> that someone else might create? Maybe I should rewrite mine because
> according to use, it cannot work and cannot give any benefits!
>
> I think perhaps you're getting confused about what goes into the
> allocation library, what goes into the language implementation or
> runtime, and what goes into the user's application.
>

No, I am not confused. And I fully agree that you should handle
different aspects of allocations at different levels of a language
system - I have said so several times.

But I believe I am running out of new ways to explain to you the
pointlessness of a "get_real_size" function in user code. It is worse
than useless.

Bart

unread,
Nov 28, 2022, 2:41:18 PM11/28/22
to
If you are not worried about using 32 bytes instead of 16 or 8, then
/you/ are getting it wrong. Why even bother using a lower level
language; just use Python.


> (Note that implementing a language interpreter or run-time library

And a compiler. And an image processing library. And ... pretty much
everything which is computationally intensive.

>> (One of my compilers uses my own allocator library. If I switch to
>> using malloc() directly, compilation speed is 80% slower.
>>
>> And, on the same test, memory usage seems to be 20% higher (0.6GB vs
>> 0.5GB, but I can't measure accurately, because, well, there is no way
>> to find out exactly how much malloc is using.)
>>
>
> There is almost no conceivable situation where a program will be in any
> measurable way better by using 0.5 GB instead of 0.6 GB.

Pointlessly using 20% more memory is fine? Except perhaps where the
memory is already full.

My own non-optimising compilers are rubbish because they generate code
that runs at half the speed of gcc's. But using a memory allocator that
slows down programs by that same amount is no problem!


> you are doing this on Windows and using MS's C runtime DLL's, then
> /everything/ is slow because calling DLL functions has significant
> overhead.)

No, it doesn't. On x64 under Windows, a DLL call has to execute one
extra instruction compared with a direct call, and even that could be
fixed up properly, except the overhead is too insignificant to measure.

(Really insignificant this time, not 80% more runtime and 20% more memory!)

>
> It is also very common to have "local" allocators for small memory
> needs.  Such allocators will be faster than OS-based ones, because they
> don't have to get memory from the OS, track ownership, handle multiple
> threads safely, etc.

I'm sure malloc doesn't need to ask Windows for every tiny allocation.


> But I believe I am running out of new ways to explain to you the
> pointlessness of a "get_real_size" function in user code.  It is worse
> than useless.

And I'm running out of ways to convince you that in my library, it works.

Remember, yet again, that this is about building a new library where
spare allocation capacities can be a new feature.


Dmitry A. Kazakov

unread,
Nov 28, 2022, 2:59:51 PM11/28/22
to
On 2022-11-28 18:57, Bart wrote:
> On 28/11/2022 17:25, Dmitry A. Kazakov wrote:
>> On 2022-11-28 17:48, Bart wrote:
>
>>
>>> This would make that next allocation misaligned.
>>
>> If 1 was the alignment requested. The minimal interface of Allocate is
>>
>> - Size in storage elements
>> - Alignment
>>
>> Note that pools are implemented in a huge variety of ways.
>
> Returning pointers which are not alligned to at least 64 bits would be
> very poor, if there is no way to control that.

Rubbish, if I allocate an array of bits, there is no reason to align it.

>>> Calls to MS' VirtualAlloc are rounded up to a multiple of 4KB for
>>> fresh allocations.
>>
>> VirtualAlloc is not a memory pool allocator. It is a part of paged
>> memory API.
>
> I don't care how it works.

Again, it is NOT a memory pool allocator. Do not use the toilet cleaner
to brush your teeth!

Bart

unread,
Nov 28, 2022, 4:02:35 PM11/28/22
to
On 28/11/2022 19:59, Dmitry A. Kazakov wrote:
> On 2022-11-28 18:57, Bart wrote:
>> On 28/11/2022 17:25, Dmitry A. Kazakov wrote:
>>> On 2022-11-28 17:48, Bart wrote:
>>
>>>
>>>> This would make that next allocation misaligned.
>>>
>>> If 1 was the alignment requested. The minimal interface of Allocate is
>>>
>>> - Size in storage elements
>>> - Alignment
>>>
>>> Note that pools are implemented in a huge variety of ways.
>>
>> Returning pointers which are not alligned to at least 64 bits would be
>> very poor, if there is no way to control that.
>
> Rubbish, if I allocate an array of bits, there is no reason to align it.

Suppose it isn't an array of bits? The allocators we're talking about
don't know anything about what types are to be stored in them.

And if it was an array of bits, they would still benefit greatly from
being 8-byte aligned, since operations such as clear, copy, compare etc
can be done more efficiently.

(I implement bit arrays and bit-sets, as sets of so many 64-bit words.
It is only views of bit-slices that may start off in the middle of a word.

I also implement arrays of 2 and 4 bits, and those additionally need
elements aligned within a byte, even for slices.)

>>>> Calls to MS' VirtualAlloc are rounded up to a multiple of 4KB for
>>>> fresh allocations.
>>>
>>> VirtualAlloc is not a memory pool allocator. It is a part of paged
>>> memory API.
>>
>> I don't care how it works.
>
> Again, it is NOT a memory pool allocator. Do not use the toilet cleaner
> to brush your teeth!

I don't even know what that is. My applications just want a memory block
allocated for them.

Can I write to data within the block? Can I read from it? (Can I execute
in some cases?) Can I free it?

Ergo it is the same as any other allocator as far as an application is
concerned.



BGB

unread,
Nov 28, 2022, 4:09:41 PM11/28/22
to
A great problem with GC is that there isn't really any good way to do it
that does not have adverse effects on performance and/or latency.

These issues tend to make it a deal-breaker for things like hard real
time applications (such as motor-controls or reading data from optical
or magnetic sensors, ...). In these applications, typically latency
requirements are measured in terms of microseconds.

These sorts of requirements mostly exclude GC, and also most languages
with requirements much higher than those needed to run C (one is often
dealing with CPUs with clock-speeds measured in MHz and RAM sizes
measured in kB).


For user facing applications (on a PC class CPU), GC may be more
acceptable, though often the latency still needs to be kept within the
low millisecond territory. Say, for example, one is trying to update the
screen at 60fps or similar, and any unexpected delay over 5ms may risk
disrupting the frame-rate.

The challenge then is having a GC that is both effective and can
reliably keep delays under 1-5 ms.


But, say, if one wants a language that can, say:
Be used to implement an OS kernel;
Be used for real-time motor controls or similar;
...

There isn't a whole lot of choice here.

One is mostly limited to things like malloc/free and a zone allocator
(a major example of the latter being the original Doom engine, *1).

*1: One can allocate memory with a "zone tag" giving a category as to
what part of the engine it applies to. When doing a task such as a map
change, any objects allocated in a zone below that of the current map
can be freed in bulk.
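
A minimal zone-tag scheme in the spirit of the Doom example might look like this. Names are invented, and the real engine's tag comparison and purge rules differ in detail:

```c
#include <stdlib.h>

/* Zone tags: every block remembers its tag, and a whole category
   can be freed in one sweep (e.g. on a map change). */
typedef struct ZBlock {
    struct ZBlock *next;
    int            tag;
} ZBlock;

static ZBlock *zhead;

void *z_alloc(size_t n, int tag) {
    ZBlock *b = malloc(sizeof(ZBlock) + n);
    if (!b) return NULL;
    b->tag = tag;
    b->next = zhead;
    zhead = b;
    return b + 1;
}

/* Free every block whose tag is at or above the threshold. */
void z_free_tags(int tag) {
    ZBlock **pp = &zhead;
    while (*pp) {
        if ((*pp)->tag >= tag) {
            ZBlock *dead = *pp;
            *pp = dead->next;
            free(dead);
        } else {
            pp = &(*pp)->next;
        }
    }
}
```

Per-map allocations get a high tag; on a map change one call sweeps them all without touching long-lived data.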


One other option being "automatic" allocate and free, where the memory
allocation can be tied to the stack frame where it was allocated, so
when the parent function returns the associated memory is freed implicitly.

In my C compiler, I had used a similar sort of mechanism for "alloca"
and also VLAs, where they are heap allocated, but automatically freed
when the caller returns. In this case, the compiler+runtime maintains a
linked list (for any functions which call "alloca()" or similar), and
then frees any objects held in this list when this function returns.
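
The frame-tied scheme described here — heap blocks threaded onto a per-call list and released at every return point — might be sketched as follows. This is illustrative; in the compiler described, the release calls are emitted automatically:

```c
#include <stdlib.h>

/* Per-call-frame list of heap blocks, freed in one go when the
   function returns -- roughly how a heap-based alloca can work. */
typedef struct FBlock {
    struct FBlock *next;
} FBlock;

void *frame_alloc(FBlock **frame, size_t n) {
    FBlock *b = malloc(sizeof(FBlock) + n);
    if (!b) return NULL;
    b->next = *frame;
    *frame = b;
    return b + 1;
}

/* Called at every return point of a function that used frame_alloc. */
void frame_release(FBlock **frame) {
    while (*frame) {
        FBlock *dead = *frame;
        *frame = dead->next;
        free(dead);
    }
}
```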


My BGBScript2 language uses a similar scheme, just with "new":
new Type; //normal new, heap allocated.
new! Type; //automatic, destroyed when parent function returns
new(ZTag) Type; //heap allocated, assigned to a zone.

The memory objects also maintain a type-tag, so (given a pointer) it is
possible to determine what type of object it is (this type-tag is
generally also present in the high bits of the pointer).

Though, for arrays, the situation gets a little more complicated.



Ironically, despite in my current project it having initially seemed as
if C had "won the war", in some of my more recent stuff BS2 is starting
to seemingly "claw its way back into relevance" (mostly in corners that
C does not address all that effectively). BS2 had been mostly "on the
back burner" for most of my current project (whereas BS and BS2 had
played a bigger role in my 3D engine projects, for mostly "bare metal"
programming on a custom CPU ISA, it faces more direct competition mostly
from C and ASM).

Though, the situation in this case is a little different than for
traditional HLLs, in that both C and BS2 share the same compiler and ABI
(so actually most of the difference is more in terms of syntax and
language rules).

In past incarnations (on x86-64 and similar), both BS and BS2 had
typically run in VMs (which, granted, do not share the C ABI).

In the BS2 case, the 'native' keyword was used for a mechanism along
similar lines to P/Invoke in C# (which was at least slightly less
terrible than something like the JVM's JNI or similar). There was also a
partial wrapper-glue tool to be able to export functions from BS2 to C.

No glue generation is needed for BGBCC, where in this case 'native' is
mostly understood as "use the bare C ABI here" (so in this case
functions more like 'extern "C"' in C++).


Though, at present, BGBCC is only targeting my own ISA (my BJX2 ISA).

Had considered an attempt to back-port my compiler to be able to target
x86-64, but this has gone nowhere thus far (not a huge incentive to do
so at the moment).

I did at one point attempt limited code generation on ARMv7 / Thumb2
(though, in this case, by using a partial fork of my BJX2 emulator as a
sort of AOT), but results were "not promising" (performance was
"extremely bad" if compared with native GCC output).

Other more specialized code-generation attempts were not promising, as
it seems almost like some sort of "magic ingredient" is needed to get
acceptable performance out of ARMv7.

Though, ARMv7's relatively cramped register space does result in codegen
output that consists almost entirely of a wall of load and store
operations from all the register spill and fill (and GCC seems to do a
somewhat better job here).

...

Dmitry A. Kazakov

unread,
Nov 28, 2022, 4:17:07 PM11/28/22
to
On 2022-11-28 22:02, Bart wrote:
> On 28/11/2022 19:59, Dmitry A. Kazakov wrote:
>> On 2022-11-28 18:57, Bart wrote:
>>> On 28/11/2022 17:25, Dmitry A. Kazakov wrote:
>>>> On 2022-11-28 17:48, Bart wrote:
>>>
>>>>
>>>>> This would make that next allocation misaligned.
>>>>
>>>> If 1 was the alignment requested. The minimal interface of Allocate is
>>>>
>>>> - Size in storage elements
>>>> - Alignment
>>>>
>>>> Note that pools are implemented in a huge variety of ways.
>>>
>>> Returning pointers which are not alligned to at least 64 bits would
>>> be very poor, if there is no way to control that.
>>
>> Rubbish, if I allocate an array of bits, there is no reason to align it.
>
> Suppose it isn't an array of bits? The allocators we're talking about
> don't know anything about what types are to be stored in them.

Not we. I am talking about a reasonable pool interface.

> And if it was an array of bits, they would still benefit greatly from
> being 8-byte aligned, since operations such as clear, copy, compare etc
> can be done more efficiently.

No, because if packed bit-array supports slicing, all operations must
expect bit-aligned data anyway.

Bart

unread,
Nov 28, 2022, 4:33:59 PM11/28/22
to
On 28/11/2022 21:17, Dmitry A. Kazakov wrote:
> On 2022-11-28 22:02, Bart wrote:

>> And if it was an array of bits, they would still benefit greatly from
>> being 8-byte aligned, since operations such as clear, copy, compare
>> etc can be done more efficiently.
>
> No, because if packed bit-array supports slicing, all operations must
> expect bit-aligned data anyway.

The original bit-array that is heap-allocated will be u64-aligned,
always. An allocator is never going to start such a thing 2/3 through a
word; that would be totally ludicrous.

When you take a slice, it will be of a bit-array that already exists, so
that allocation doesn't come into it.

But the slice takes the form of a BYTE-pointer and a bit offset;
pointers to individual bits don't exist on current hardware.

So first bit of the bit-slice may not start at bit 0 of the
byte-pointer; it is not aligned at the bit-level.

(My bit-arrays have elements of 1/2/4 bits, which are aligned within a
byte such that a 2/4-bit element will not straddle a byte.)
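
A bit-slice of the kind described — a byte pointer plus a bit offset, since current hardware has no bit pointers — could be represented like this (a sketch with invented names):

```c
#include <stddef.h>
#include <stdint.h>

/* A slice of a bit-array: a byte pointer plus the offset of the
   first bit within that byte. */
typedef struct {
    uint8_t *base;   /* byte containing the first bit */
    unsigned bit;    /* 0..7: offset of the first bit in *base */
    size_t   len;    /* number of bits in the slice */
} BitSlice;

int bs_get(const BitSlice *s, size_t i) {
    size_t b = s->bit + i;
    return (s->base[b >> 3] >> (b & 7)) & 1;
}

void bs_set(BitSlice *s, size_t i, int v) {
    size_t b = s->bit + i;
    uint8_t mask = (uint8_t)(1u << (b & 7));
    if (v) s->base[b >> 3] |= mask;
    else   s->base[b >> 3] &= (uint8_t)~mask;
}
```

The underlying array can still be word-aligned; only the slice's starting bit is unaligned.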

Dmitry A. Kazakov

unread,
Nov 29, 2022, 2:45:35 AM11/29/22
to
On 2022-11-28 22:34, Bart wrote:
> On 28/11/2022 21:17, Dmitry A. Kazakov wrote:
>> On 2022-11-28 22:02, Bart wrote:
>
>>> And if it was an array of bits, they would still benefit greatly from
>>> being 8-byte aligned, since operations such as clear, copy, compare
>>> etc can be done more efficiently.
>>
>> No, because if packed bit-array supports slicing, all operations must
>> expect bit-aligned data anyway.
>
> The original bit-array that is heap-allocated will be u64-aligned,
> always.

I have several specialized heap implementations that do not impose
additional alignment.

It is a pointless discussion, really.

David Brown

unread,
Nov 29, 2022, 3:10:52 AM11/29/22
to
Let me say it again. If you are worried about possible 32 byte
inefficiencies in a memory allocation, you are doing things /wrong/.

Understand the logic of that sentence.

Perhaps you have plenty of resources - PC programming. A mere 32 bytes
waste is irrelevant, and you should be spending your brain cells on
things that are actually important. (And yes, you might well be better
off in a higher level language.)

If you /don't/ have plenty of resources, such as small-systems embedded
programming, or you are make so many allocations that a waste of up to
32 bytes per allocation is relevant, then you should not be using that
general-purpose allocator. You should be using a system where you
/know/ how much, or how little, is wasted, and where you know the waste
is small enough that it is not a problem.

Ergo, if you are worried about possible 32 byte inefficiencies in a
memory allocation, you are doing things /wrong/.


>>
>> There is almost no conceivable situation where a program will be in
>> any measurable way better by using 0.5 GB instead of 0.6 GB.
>
> Pointlessly using 20% more memory is fine? Except perhaps where the
> memory is already full.
>

You are bike-shedding. (I hope you know the term - it applies to a lot
of what you write.) Concern yourself with things that are important,
and make a real difference.

PC RAM costs - what, $10 per GB? That wasted 0.1 GB is a dollar's worth
of waste. If a customer has to pay the programmer to improve the
program to save 20% memory in allocations - how much will /that/ cost?
How much market share will be lost while waiting for the programmer to
finish, turning the "good enough for the purpose" program into a
something that is technically better but /exactly/ the same from the
users' viewpoint? What is the cost of the subtle bug in the
programmer's smarter memory allocator that causes users' data to be
corrupted a couple of times a month?

You worry about utterly irrelevant details of efficiency or speed, yet
say you consider bugs and incorrect code just a fact of life in
programming and don't worry about that. You have the whole thing
ass-backwards. Worry far more about code /correctness/ - avoiding bugs,
or at least making it easier to find bugs - and far less about minor
inefficiencies that do not affect code usability.

James Harris

unread,
Nov 29, 2022, 4:50:33 AM11/29/22
to
On 26/11/2022 13:18, Bart wrote:
> On 26/11/2022 11:31, James Harris wrote:

...

>> It's an interesting topic. I can think of three really fast memory
>> allocation approaches:
>>
>> 1. sbrk-like
>> 2. stack
>> 3. array of slots, with occupied/vacant essentially indicated by a
>> bitmap.
>>
>> The sbrk-like approach (taking memory as needed and not returning it)
>> can work well if all allocations can remain until the program terminates.
>>
>> The stack approach can work well if allocations can be deallocated in
>> reverse order.
>>
>> The array of slots can work well if the things to allocate are of the
>> same size.

...

>> Your suggestion of requiring the caller to remember how large the
>> allocation is is intriguing and it sounds rather like what mmap
>> expects. But for strings how would it work, especially if a string's
>> length is not known in advance?
>
> How would strings work now with malloc? That still requires the length!

You know how strings work with malloc; you gave a good example thereof!
What I was saying is that none of the three fast schemes mentioned above
are generally suitable for strings.

>
> If you have counted strings, then you will know their length to free
> them. Otherwise use strlen. But /something/ has to record the length.

For efficiency, one may want to distinguish between length and capacity.
Memory management would manage capacity changes. The programmer (or a
string library) would concern himself/itself with the length and the
capacity.
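
The length/capacity split described here can be sketched as follows. The `Str` type is illustrative: the allocator is involved only when capacity must change, while the string library owns the length:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;
    size_t len;   /* characters in use -- the library's business */
    size_t cap;   /* characters the allocation can hold */
} Str;

int str_append(Str *s, const char *src, size_t n) {
    if (s->len + n > s->cap) {            /* capacity change needed */
        size_t ncap = (s->len + n) * 2;
        char *p = realloc(s->data, ncap);
        if (!p) return -1;
        s->data = p;
        s->cap = ncap;
    }
    memcpy(s->data + s->len, src, n);     /* length change only */
    s->len += n;
    return 0;
}
```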

>
> You could use special routines just for strings, that say store a length
> in the space just before the returned pointer. Then you can also have a
> routine to return that length, saving you some hassle. But all you've
> done now really is create a mini string library.

I'd have the /length/ in a control block but memory allocation would
only be interested in the /capacity/.

>
>> Further, what if the caller requested 100 bytes then told the
>> allocator to free 99 bytes? What if it told the allocator to free 101
>> bytes?
>
> Then it'll go wrong. You could also pass the wrong pointer to the
> deallocator. Or allocate the space in the first place with 99 instead of
> 100; what stops you doing that?

A deallocator can check that it's been passed a valid pointer.

...

> It's also quite easy to create custom allocators for special purposes.
> For example if you are dealing with millions of identical small objects
> which are being constantly created and destroyed, allocate a large pool
> of memory to hold them.

Agreed.

>
> Then all freed objects can be linked together; allocating a new object
> is very fast, and deallocating instant: add this object to the head of
> the free-list.
>
> You can also deallocate the lot in one go.

I've never been comfortable with freeing loads of allocations in one go
except when the program terminates but I take your point.
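
The free-list pool for millions of identical small objects, as quoted above, can be sketched as follows (fixed 32-byte objects, single-threaded, invented names):

```c
#include <stddef.h>

/* Fixed-size object pool: freed objects are threaded onto a free
   list, so both allocate and free are a couple of pointer moves. */
typedef union Slot {
    union Slot *next;        /* valid while the slot is free */
    char payload[32];        /* object size for this pool */
} Slot;

#define POOL_N 1024
static Slot pool[POOL_N];
static Slot *freelist;
static size_t high;          /* slots never yet handed out */

void *obj_alloc(void) {
    if (freelist) {
        Slot *s = freelist;
        freelist = s->next;
        return s;
    }
    if (high < POOL_N) return &pool[high++];
    return NULL;
}

void obj_free(void *p) {
    Slot *s = p;
    s->next = freelist;
    freelist = s;
}
```

Deallocating the lot in one go is just resetting `freelist` and `high`, which is the bulk-free behaviour mentioned above.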


--
James Harris


James Harris

unread,
Nov 29, 2022, 5:15:23 AM11/29/22
to
On 26/11/2022 13:46, Dmitry A. Kazakov wrote:
> On 2022-11-26 14:15, James Harris wrote:
>> On Friday, 25 November 2022 at 17:21:11 UTC, Dmitry A. Kazakov wrote:

...

>>> A related important issue is interaction between pointers from different
>>> pools and universal pointers applicable to all pools including locally
>>> allocated objects.

...

>> You seem to be talking about pools being 'NUMA pools' where each pool
>> is of bytes but has certain access characteristics such as bandwidth,
>> latency, and contention. Is that right?
>
> Just a pool = heap. E.g. under Windows an application may have a number
> of different heaps.

In that case it might be better to call them heaps. Pool is already
overloaded.


--
James Harris


James Harris

unread,
Nov 29, 2022, 5:27:44 AM11/29/22
to
On 27/11/2022 15:18, David Brown wrote:
> On 25/11/2022 20:23, Bart wrote:

...

> A "resize" function would always have code equivalent to first calling
> "get_real_size" and then seeing if an actual re-allocation and copy is
> needed.  It would be pointless for user code to duplicate this
> functionality.  Therefore, it is pointless to have a "get_real_size"
> function available to the user code - using it would be less efficient
> than simply calling "resize" directly.

Two points of disagreement on that.

1. A program which knows the capacity of an allocation can iterate up to
the capacity without incurring the overhead of a function call.

2. A program which knows the capacity can also decide whether to append
to the existing section of the string, move the existing section, or to
start a new section.


>> But there is another reason; take a block which will grow a byte at a
>> time. Apart from needing to call 'resize' for every byte, who knows
>> how much spare capacity 'resize' will actually provide; it may need to
>> do real reallocations more often.

...

> Trying to optimise your interface around pathological examples is
> madness.  Design the interface to be useful for real, practical code.

Sometimes the only way to design a new system is to work out the ways in
which it might be used.

Further, appending to a string is a very common use case.


--
James Harris


James Harris

unread,
Nov 29, 2022, 5:43:05 AM11/29/22
to
On 28/11/2022 14:04, Bart wrote:
> On 28/11/2022 10:17, David Brown wrote:

...

>> I cannot see any point in first asking how much space I /really/ have,
>> before asking for the 1020 bytes.  What is the point?
>
> You're never curious about how much spare capacity was actually set
> aside for your request? You don't think there is any way to profitably
> exploit that?

It's not about curiosity but about transparency. And greater
transparency enables greater efficiency.

...

>>   That would be lying
>> to the allocator.  When people start doing that kind of thing, it
>> means they are making undocumented assumptions about implementations -
>> and that means implementations get stuck with with these limitations,
>> or user code breaks.

I wanted to pick up on David talking about 'lying' to the allocator.
That's a misleadingly emotive term probably based on the idea that the
allocation call requests exactly N bytes. Instead, the initial request,
in this case, would be for "at least N". Asking "how many" is not in any
way lying.


--
James Harris


James Harris

unread,
Nov 29, 2022, 5:48:47 AM11/29/22
to
On 28/11/2022 19:59, Dmitry A. Kazakov wrote:

...

> Again, it is NOT a memory pool allocator. Do not use the toilet cleaner
> to brush your teeth!

You seem to be thinking a lot about toilets lately, Dmitry. Recent
problems with the plumbing?

;-)


--
James Harris


James Harris

unread,
Nov 29, 2022, 5:50:11 AM11/29/22
to
On 28/11/2022 16:50, David Brown wrote:
Don't ask for 100 bytes. Ask for at least 100.


--
James Harris


Dmitry A. Kazakov

unread,
Nov 29, 2022, 6:17:22 AM11/29/22
to
Rather a reference to "Dennis the Menace" 1993.

Bart

unread,
Nov 29, 2022, 6:49:58 AM11/29/22
to
Trying to argue with David Brown has really worn me down on this.

- Everything I say is wrong. Every example I give is a pathological one
that is not worth looking at (it's 'madness'). Who grows a string, list,
etc. an element at a time?

- Using twice or four times the memory for a small allocation is
irrelevant: "If you are worrying about a possible inefficiency of 32
bytes in a memory allocation, you are doing things /wrong/."

- Having an allocator being slow enough to take up nearly half the
runtime of the entire application, not just the memory allocation bits,
is irrelevant.

- "Using a 'get_real_size` function is pointless" (even in my own
allocator which has spare capacity, designed to be interrogated, as a
feature!)

- A good 'realloc' function will be extremely fast

- Windows/MS/DLLs are slow anyway, what do you expect? [Incorrect]

- Avoid dynamic memory completely if it's that critical

This is on top of his opinion that super-fast compilers that are 10-100
times faster than typical (partly thanks to people sweating the small
stuff) are pointless and irrelevant. Or if it's something I've done,
that nobody cares.

One wonders what the hell he's doing in a language design group where
all these things are up for grabs.



James Harris

unread,
Nov 29, 2022, 7:49:35 AM11/29/22
to
On 28/11/2022 16:50, David Brown wrote:
> On 28/11/2022 15:04, Bart wrote:
>> On 28/11/2022 10:17, David Brown wrote:
>>> On 27/11/2022 19:18, Bart wrote:

...

>> Remember the subject is about /creating/ the allocation library. That
>> means being able to specify how it will work. I would rather some of
>> that work was offloaded to the allocator, since the author of the
>> library may have a better idea of allocation strategies than I do.
>>
>
> Did you miss the bit where I explained, several times, about how the
> implementation of "realloc" could use the extra space if it happens to
> be enough to satisfy the new request?  In such cases, realloc will be
> extremely fast.

Could you have holed your own argument below the waterline? You mention
realloc being "extremely fast" when it has room to expand so you appear
to acknowledge that it could be relatively slow when there is not enough
room. The problem, I would suggest, is that under your scheme an
application would not know whether a particular call to realloc was
going to get the fast or the slow response. Correct?

So the very call which needed a fast response /could not use/ your
realloc because it would have no way to know whether the allocator's
response would be fast or slow.

In other words, there may be a fast realloc sitting there available to
the program, but the very kind of call which needs the speed could not
take advantage of it. By contrast, under my proposal a program /could/
take advantage of the extra speed - and more. See below.

...

>> That sounds backwards. In what way is it to making assumptions to
>> actually ASK what an implementation has reserved?
>>
>
> You are assuming particular characteristics of the allocator that are
> inappropriate for a general-purpose memory allocator.  All a low-level
> general allocator needs to do is have a function to allocate memory, a
> function to release memory, and optionally (but usefully) a function to
> increase an allocation efficiently.  Every function you include beyond
> that, every detail in specifications, is a limitation on the flexibility
> of the allocator.
>
> Here's an example.
>
> Suppose I make an allocator that allocates everything in 128-byte
> blocks, tracked with bitmaps.

It seems odd to use fixed-size slots for variably sized objects. And 128
bytes is rather coarse but I will, below, apply my proposed scheme to
your scenario.

BTW, I can tell you are a professional as you expect us to have the
128-times table in our heads!


> You ask for 800 bytes - I give you that
> by allocating 7 blocks, for a total of 896 bytes.  You now want to
> increase the size of the allocation, ideally to 1000 bytes.  You call
> the "get_real_size" function and find there are 96 bytes extra.  What is
> your code going to do now?  Maybe it thinks getting more than 896 bytes
> is going to be slow, and makes do with just 896 bytes.  But that could
> be wrong - perhaps my allocator happens to have the next block free, so
> increasing to 1000 bytes would be almost instant - just a check and an
> OR on a bitmap.  Maybe it thinks it really needs the 1000 bytes, in
> which case the call to "get_real_size" is completely wasted.  The
> information from "get_real_size" is /useless/.  It gives you nothing.

Under the proposal, if the code had to have the 1000 bytes as a single
run it would call m_realloc(). That would extend the 896 to 1024 if
there was space. If not, it would create a new allocation and move the
original there. This is the same as C's realloc().

By contrast, if the code did not require the 1000 bytes to be in a
single run it would see that the current run was 896 and would call
m_resize() to ask whether the run could be extended in place. As before,
if there was space available the allocator would increase the allocation
to 1024. If there was not enough space, however, (i.e. if the next block
was occupied) it would leave the allocation as 896. The caller would
then create a new allocation for the rest of the space it needed. (The
caller would likely also link the two allocations together.)

Note that under the proposal a program could still go with the simple
realloc if that's what it wanted to do but it would also have other options.
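
A minimal sketch of that caller-side choice (a toy model: the m_resize name follows the proposal above, but the single-block bookkeeping here is invented purely for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one allocation under the proposed scheme.  A real
   allocator would keep this in a block header and a free map. */
typedef struct {
    size_t capacity;        /* bytes currently reserved      */
    int    next_block_free; /* can the block grow in place?  */
} block_t;

/* m_resize: grow in place if possible; never move anything.
   Returns the capacity actually obtained (>= old capacity). */
static size_t m_resize(block_t *b, size_t wanted) {
    if (wanted > b->capacity && b->next_block_free)
        b->capacity = wanted;   /* in-place growth succeeded  */
    return b->capacity;         /* may still be < wanted      */
}

/* Caller: if m_resize could not extend the run, start a new
   segment instead of copying the existing content. */
static int needs_new_segment(block_t *b, size_t wanted) {
    return m_resize(b, wanted) < wanted;
}
```

The point of the sketch is that the caller never triggers a copy: either the run grows in place, or the caller links on a fresh segment.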


--
James Harris


James Harris

unread,
Nov 29, 2022, 7:55:02 AM11/29/22
to
On 29/11/2022 08:10, David Brown wrote:
> On 28/11/2022 20:41, Bart wrote:
>> On 28/11/2022 19:01, David Brown wrote:

...

>>> The same applies to memory allocation.  If you are worrying about a
>>> possible inefficiency of 32 bytes in a memory allocation, you are
>>> doing things /wrong/.
>>
>> If you are not worried about using 32 bytes instead of 16 or 8, then
>> /you/ are getting it wrong. Why even bother using a lower level
>> language; just use Python.
>>
>
> Let me say it again.  If you are worried about possible 32 byte
> inefficiencies in a memory allocation, you are doing things /wrong/.

I cannot speak for Bart but AISI the problem isn't space efficiency but
time efficiency. If you want to extend a string but there's no room to
expand it where it is you may prefer to request space for a new section
of the string rather than to copy all the existing content.


--
James Harris


David Brown

unread,
Nov 29, 2022, 9:55:53 AM11/29/22
to
No, if you need 100 bytes, ask for 100 bytes. If you need more, ask for
more.

The implementation can satisfy the request by giving more than 100
bytes, but that is irrelevant to the application - it's just an
implementation detail.



David Brown

unread,
Nov 29, 2022, 10:00:56 AM11/29/22
to
Pick one or the other - either you want to keep your strings as single
contiguous buffers, or you use a chain of buffers. In the first
instance, you use realloc and let it copy the data if needed. In the
second case, you use sections of a fixed size and malloc them - you
don't expand them.

In either case, messing around with trying to expand only when
"get_real_size" indicates it is cheap will lead to more complicated code
that is regularly slower, and is only occasionally fast (and the realloc
will be fast too in those cases).



Bart

unread,
Nov 29, 2022, 11:30:38 AM11/29/22
to
On 29/11/2022 10:43, James Harris wrote:
> On 28/11/2022 14:04, Bart wrote:
>> On 28/11/2022 10:17, David Brown wrote:
>
> ...
>
>>> I cannot see any point in first asking how much space I /really/
>>> have, before asking for the 1020 bytes.  What is the point?
>>
>> You're never curious about how much spare capacity was actually set
>> aside for your request? You don't think there is any way to profitably
>> exploit that?
>
> It's not about curiosity but about transparency. And greater
> transparency enables greater efficiency.

It depends on what you intend doing with a function like 'get_real_size'.

If you want to rely on it for helping implement expanding buffers
efficiently, then you need to know that the allocator will give you a
worthwhile amount of spare capacity.

With a single, general-purpose allocator, there is no reason for it to
do that; it will only round up enough to meet alignment needs.

So it might be better to have two kinds of allocator: fixed size, and
expanding.

The latter is a bit like your 'at least N' suggestion, but that's not
strong enough I think. It needs to be specifically suitable for growing.

My own allocator was a special-purpose one for an interpreter:
allocations up to 33MB were rounded up to the next power of two size,
with a minimum buffer size of 16 bytes.

I use it from ordinary programs as well, but there it could do with a
fixed-size allocator too (which would only round to the next multiple of
8). There, however, I probably wouldn't support Free, or not effectively.
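
That rounding rule is simple to state in code (a sketch; the function name is invented, and overflow handling is omitted):

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power of two, with a minimum
   buffer size of 16 bytes, as in the interpreter allocator
   described above. */
static size_t round_up_pow2(size_t n) {
    size_t cap = 16;
    while (cap < n)
        cap <<= 1;
    return cap;
}
```

The worst case wastes almost half the block, but the guaranteed spare capacity is what makes a 'get_real_size'-style query profitable.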



David Brown

unread,
Nov 29, 2022, 11:54:17 AM11/29/22
to
Allocation functions like "malloc" do not ask for "at least N bytes".
They ask for "N bytes". And the allocator gives you "N bytes" (if it
can). What you get is a pointer and the guarantee that you can use N
bytes from that pointer. You don't get to use any more than N bytes,
even if the allocation results in M extra bytes being left unusable at
the end of the N bytes. Using those M extra bytes without telling the
allocator (by "realloc") is, if not "lying", at least making unwarranted
assumptions and misusing the space behind the back of the allocator that
owns the space.

Programming is all about specifications. "malloc" is specified to
return a pointer to N bytes. That is /all/. If you start messing
around by assuming more than that, or trying to sneak out extra
information, your code is now in the category of "works by luck" instead
of "works by design". Lots of people program by luck, but it is hardly
something to aspire to or encourage.


David Brown

unread,
Nov 29, 2022, 11:54:20 AM11/29/22
to
On 29/11/2022 13:49, James Harris wrote:
> On 28/11/2022 16:50, David Brown wrote:
>> On 28/11/2022 15:04, Bart wrote:
>>> On 28/11/2022 10:17, David Brown wrote:
>>>> On 27/11/2022 19:18, Bart wrote:
>
> ...
>
>>> Remember the subject is about /creating/ the allocation library. That
>>> means being able to specify how it will work. I would rather some of
>>> that work was offloaded to the allocator, since the author of the
>>> library may have a better idea of allocation strategies than I do.
>>>
>>
>> Did you miss the bit where I explained, several times, about how the
>> implementation of "realloc" could use the extra space if it happens to
>> be enough to satisfy the new request?  In such cases, realloc will be
>> extremely fast.
>
> Could you have holed your own argument below the waterline? You mention
> realloc being "extremely fast" when it has room to expand so you appear
> to acknowledge that it could be relatively slow when there is not enough
> room. The problem, I would suggest, is that under your scheme an
> application would not know whether a particular call to realloc was
> going to get the fast or the slow response. Correct?

Your description is correct, but I don't see the problem. If you need
the memory, you need the memory - correctness trumps speed. And if you
need high speed (or as is often the case, predictable time limits), then
you don't use a general-purpose allocator. You use something like a
pool of fixed size blocks that matches the application, or some other
specialised allocator.

>
> So the very call which needed a fast response /could not use/ your
> realloc because it would have no way to know whether the allocator's
> response would be fast or slow.
>

If the application needs memory, it needs memory. The situations where
you can use a tiny bit of extra memory if that is available quickly but
you'd rather do without than spend time, are almost non-existent, and
you would have been better off requesting a larger amount in the first
place.

It is /wrong/ to design a general-purpose allocator around such a
hypothetical situation, and in so doing limit the implementation or
encourage more complex code constructs in user code.

A far better solution here is to make a simple general-purpose allocator
that does its job without unnecessary extra interfaces. And make
efficient string processing part of the language or part of the standard
library, using a specialised allocator for the purpose. Even better,
you should also make pool allocators and/or some kind of abstract
allocator interface so that allocators can be made available that are
better optimised for particular use-cases.

> In other words, there may be a fast realloc sitting there available to
> the program but it could not take advantage of it the very kind of call
> which needs the speed. By contrast, under my proposal a program /could/
> take advantage of the extra speed - and more. See below.
>
> ...
>
>>> That sounds backwards. In what way is it to making assumptions to
>>> actually ASK what an implementation has reserved?
>>>
>>
>> You are assuming particular characteristics of the allocator that are
>> inappropriate for a general-purpose memory allocator.  All a low-level
>> general allocator needs to do is have a function to allocate memory, a
>> function to release memory, and optionally (but usefully) a function
>> to increase an allocation efficiently.  Every function you include
>> beyond that, every detail in specifications, is a limitation on the
>> flexibility of the allocator.
>>
>> Here's an example.
>>
>> Suppose I make an allocator that allocates everything in 128-byte
>> blocks, tracked with bitmaps.
>
> It seems odd to use fixed-size slots for variably sized objects. And 128
> bytes is rather coarse but I will, below, apply my proposed scheme to
> your scenario.

Good modern allocation systems will use different block sizes for
different purposes. You don't pack everything into the smallest
possible size - that just leads to heap fragmentation and far more
wasted space.

>
> BTW, I can tell you are a professional as you expect us to have the
> 128-times table in our heads!
>

I should have thought that was natural for everyone here :-) Feel free
to use a calculator.

>
>> You ask for 800 bytes - I give you that by allocating 7 blocks, for a
>> total of 896 bytes.  You now want to increase the size of the
>> allocation, ideally to 1000 bytes.  You call the "get_real_size"
>> function and find there are 96 bytes extra.  What is your code going
>> to do now?  Maybe it thinks getting more than 896 bytes is going to be
>> slow, and makes do with just 896 bytes.  But that could be wrong -
>> perhaps my allocator happens to have the next block free, so
>> increasing to 1000 bytes would be almost instant - just a check and an
>> OR on a bitmap.  Maybe it thinks it really needs the 1000 bytes, in
>> which case the call to "get_real_size" is completely wasted.  The
>> information from "get_real_size" is /useless/.  It gives you nothing.
>
> Under the proposal, if the code had to have the 1000 bytes as a single
> run it would call m_realloc(). That would extend the 896 to 1024 if
> there was space. If not, it would create a new allocation and move the
> original there. This is the same as C's realloc().

Yes.

>
> By contrast, if the code did not require the 1000 bytes to be in a
> single run it would see that the current run was 896 and would call
> m_resize() to ask whether the run could be extended in place. As before,
> if there was space available the allocator would increase the allocation
> to 1024. If there was not enough space, however, (i.e. if the next block
> was occupied) it would leave the allocation as 896. The caller would
> then create a new allocation for the rest of the space it needed. (The
> caller would likely also link the two allocations together.)
>

That is all just imaginary - real code does not do that. Tracking it
all at the application side would take a good deal of extra code with
associated bugs (and it is questionable whether it is even testable), and
will probably waste as much time faffing around as it could hope to gain
in the few cases where an "m_resize" was fast.

> Note that under the proposal a program could still go with the simple
> realloc if that's what it wanted to do but it would also have other
> options.
>

Don't design a language or library based on squeezing in as many
features as you can imagine possibly being useful on a few occasions to
a few people. Design it simply and cleanly, in a way that can be
expanded by users as needed, and where the implementation can be changed.

I'd start with an object-based interface for an "Allocator", with methods :

Allocator.alloc(size) -> pointer
Allocator.alloc_zero(size) -> pointer
Allocator.realloc(pointer, oldsize, newsize) -> pointer
Allocator.free(pointer, size)

(You might dispense with the non-zeroing alloc too.)
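
One way to sketch such an object-based Allocator in C is a struct of function pointers, so that "heap", pool allocators and the rest can share one shape (the method names follow the post; the C binding and trailing underscores are illustrative assumptions):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Abstract allocator interface: each concrete allocator fills in
   the function pointers.  Trailing underscores avoid clashing with
   the C library names. */
typedef struct Allocator Allocator;
struct Allocator {
    void *(*alloc)(Allocator *a, size_t size);
    void *(*realloc_)(Allocator *a, void *p, size_t oldsize, size_t newsize);
    void  (*free_)(Allocator *a, void *p, size_t size);
};

/* A trivial general-purpose "heap" built on the C library. */
static void *heap_alloc(Allocator *a, size_t n) {
    (void)a;
    return malloc(n);
}
static void *heap_realloc(Allocator *a, void *p, size_t o, size_t n) {
    (void)a; (void)o;
    return realloc(p, n);
}
static void heap_free(Allocator *a, void *p, size_t n) {
    (void)a; (void)n;
    free(p);
}

static Allocator heap = { heap_alloc, heap_realloc, heap_free };
```

Passing the old size to realloc_ and free_ lets simpler allocators (pools, bump allocators) avoid storing per-block headers at all.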

The standard library should provide a standard general-purpose
Allocator, called "heap". You might also provide other Allocators, such
as "thread_unsafe_heap" (being thread unsafe, it could be faster - but
the default heap must be thread safe), "nofree_stack", "big_heap",
"small_heap", "binary_pools", and whatever else you feel fits.

Don't bother making the allocators return null pointers on failure.
Programmers don't check their malloc returns well enough as it is, and
allocations never fail in PC programming unless the programmer has
seriously screwed up somewhere. Throw an exception, call an error
handler, crash the code, or whatever you choose to do normally on a
critical error.



James Harris

unread,
Nov 29, 2022, 12:10:47 PM11/29/22
to
On 29/11/2022 16:54, David Brown wrote:
> On 29/11/2022 11:43, James Harris wrote:
>> On 28/11/2022 14:04, Bart wrote:
>>> On 28/11/2022 10:17, David Brown wrote:

...

Not sure why you are talking about malloc. I said at the outset that I
was talking about "an allocator which carries out plain malloc-type
allocations (i.e. specified as a number of 8-bit bytes suitably aligned
for any type) without having to have compatibility with malloc".

The scheme I put forward was akin to but not the same as the malloc
approach. Maybe I wasn't clear enough but:

m_alloc would request AT LEAST N
m_calloc would request AT LEAST N units of size M and zero them
m_resize would resize IN PLACE as far as it was able
m_realloc would resize in place or move existing data to larger area
m_free would be akin to malloc's free
m_msize would report the capacity (greater than or equal to N)

IOW, the request would be "at least N" and m_msize() would report the
actual allocation.
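
As an illustrative sketch only (built on top of malloc for brevity, which a real implementation would not do), the capacity-in-a-header idea behind m_alloc and m_msize might look like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of two of the proposed calls.  The granted capacity is
   stored in a header just before the returned pointer; a 16-byte
   header preserves the underlying allocator's alignment. */
#define M_ALIGN 16u

static size_t m_round_up(size_t n) {
    return (n + M_ALIGN - 1) & ~(size_t)(M_ALIGN - 1);
}

/* Request AT LEAST n bytes; record the capacity actually granted. */
static void *m_alloc(size_t n) {
    size_t cap = m_round_up(n ? n : 1);
    unsigned char *p = malloc(M_ALIGN + cap);  /* header + data */
    if (!p)
        return NULL;
    memcpy(p, &cap, sizeof cap);               /* record capacity */
    return p + M_ALIGN;
}

/* Report the capacity (>= the size originally requested). */
static size_t m_msize(void *p) {
    size_t cap;
    memcpy(&cap, (unsigned char *)p - M_ALIGN, sizeof cap);
    return cap;
}

static void m_free(void *p) {
    free((unsigned char *)p - M_ALIGN);
}
```

Since the header must exist anyway for m_free to work, m_msize costs the allocator nothing extra.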


--
James Harris


James Harris

unread,
Nov 29, 2022, 12:27:49 PM11/29/22
to
On 29/11/2022 16:30, Bart wrote:
> On 29/11/2022 10:43, James Harris wrote:
>> On 28/11/2022 14:04, Bart wrote:
>>> On 28/11/2022 10:17, David Brown wrote:
>>
>> ...
>>
>>>> I cannot see any point in first asking how much space I /really/
>>>> have, before asking for the 1020 bytes.  What is the point?
>>>
>>> You're never curious about how much spare capacity was actually set
>>> aside for your request? You don't think there is any way to
>>> profitably exploit that?
>>
>> It's not about curiosity but about transparency. And greater
>> transparency enables greater efficiency.
>
> It depends on what you intend doing with a function like 'get_real_size'.
>
> If you want to rely on it for helping implement expanding buffers
> efficiently, then you need to know that the allocator will give you a
> worthwhile amount of spare capacity.

On the contrary, I don't want the allocator to give me spare capacity
unnecessarily. Not all allocations will need to expand.

>
> With a single, general-purpose allocator, there is no reason for it to
> do that; it will only round up enough to meet alignment needs.

Yes.

>
> So it might be better to two have kinds of allocator: fixed size, and
> expanding.

Absolutely! But if you can only have one type it /has/ to be the latter
because only it can cope with both fixed- and variable-sized requests.

>
> The latter is a bit like your 'at least N' suggestion, but that's not
> strong enough I think. It needs to be specifically suitable for growing.

Yes, while I was away at the weekend I was thinking about how to do this
'properly'. By that I mean fast, even faster than your
expanding-allocation example. I came up with something but it's too
involved to go into here.

>
> My own allocator was a special-purpose one for an interpreter:
> allocations up to 33MB were rounded up to the next power of two size,
> with a minimum buffer size of 16 bytes.
>
> I use it from ordinary programs as well, but there it could do with an
> fixed-size allocator too (which would only round to the next multiple of
> 8). There however I probably wouldn't support Free, or not effectively.

I know power-of-two allocators are popular but I'm not a fan. And
something still has to cope with a string which begins small but
incrementally and unpredictably gets larger and larger.


--
James Harris


Bart

unread,
Dec 3, 2022, 8:33:52 AM12/3/22
to
On 29/11/2022 08:10, David Brown wrote:
> On 28/11/2022 20:41, Bart wrote:

>> Pointlessly using 20% more memory is fine? Except perhaps where the
>> memory is already full.
>>
>
> You are bike-shedding.  (I hope you know the term - it applies to a lot
> of what you write.)  Concern yourself with things that are important,
> and make a real difference.
>
> PC ram costs - what, $10 per GB?  That wasted 0.1 GB is a dollar's worth
> of waste.  If a customer has to pay the programmer to improve the
> program to save 20% memory in allocations - how much will /that/ cost?

You can't install 0.1GB of memory; it has to come in units such as 4GB
or 8GB, and it may mean replacing all the memory if the maximum number
of slots has been reached and you need bigger modules.

Or the memory might not be user-upgradable.

Plus you're thinking in terms of one user of the software: there might
be 1000 - or 1000000.

Consider also that the machine might already be near maximum usage
because all 100 running processes take the same attitude to memory as
you do.


> How much market share will be lost while waiting for the programmer to
> finish, turning the "good enough for the purpose" program into a
> something that is technically better but /exactly/ the same from the
> users' viewpoint?  What is the cost of the subtle bug in the
> programmer's smarter memory allocator that causes users' data to be
> corrupted a couple of times a month?

Add 20% more memory use on top of an 80% slower runtime and each user
now needs to buy a faster machine.



> You worry about utterly irrelevant details of efficiency or speed, yet
> say you consider bugs and incorrect code just a fact of life in
> programming and don't worry about that.  You have the whole thing
> ass-backwards.

I do worry about it, but since I'm talking mostly about other people's
software, what can /I/ do about that? I see bugs all the time in the
apps in my gadgets, on websites, in my car's tech, and even on banking
applications.

So building in some sort of fault tolerance, and thinking about what
customers might do when things go wrong, is common sense.

It might not even be a software bug: someone might yank the power cord.

Doesn't avionics software have redundant systems so that a fault or
error in one system is not catastrophic?

If all software was perfect, every program would be on version 1.000.



> ... more about code /correctness/ - avoiding bugs,
> or at least making it easier to find bugs

You work on bugs of course, but they may need reporting first. But they
will always be there. They need not be the kind you can detect by
analysis of a program, but flaws in program logic or in spec, or
feedback from customers.

In my commercial apps, commands were implemented with scripts. If a
script were to go wrong, it would be stopped, but the application
continued working and they can try something else. If there was an
actual crash of the main app, they would be able to recover their data
from auto-save files. This is quite common these days.

You think apps that have such features (including Thunderbird I'm using
now) are admitting their failures?

> - and far less about minor
> inefficiencies that do not affect code usability.


I remember something in Firefox, maybe a bug, where the cap on how many
temporary files could be created was ignored. The hard drive always
seemed very busy when accessing the internet, but it worked, albeit slowly.

I eventually discovered it had created some 3.5 million temporary files,
occupying some 60GB IIRC. Deleting those files from the machine took /14
hours/.

Some, perhaps including you, would have said 'So what; do you know how
cheap disk storage is? Buy a bigger drive, buy an SSD'.

You need to acknowledge when there is a real problem, not just blindly
throw more resources at it.

Dmitry A. Kazakov

unread,
Dec 3, 2022, 9:23:02 AM12/3/22
to
On 2022-12-03 14:33, Bart wrote:
> On 29/11/2022 08:10, David Brown wrote:

>> You worry about utterly irrelevant details of efficiency or speed, yet
>> say you consider bugs and incorrect code just a fact of life in
>> programming and don't worry about that.  You have the whole thing
>> ass-backwards.
>
> I do worry about it, but since I'm talking mostly about other people's
> software, what can /I/ do about that? I see bugs all the time in the
> apps in my gadgets, on websites, in my car's tech, and even on banking
> applications.
>
> So building in some sort of fault-tolerance or thinking about what
> customers might do when things when things go, is common sense.
>
> It might not even be a software bug: someone might yank the power cord.
>
> Doesn't avionics software have redundant systems so that a fault or
> error in one system is not catastrophic?

They usually have dynamically allocated memory verboten, so you need not
worry about that. (:-))

BTW, having redundant software is quite a stupid idea since, unlike
hardware, software bugs are deterministic in nature. No matter how many
times you repeat an error it is still an error. The other side of the
coin: once the bug is out, it is out. Hardware can always fail.

In older days software redundancy was achieved through independent
development by two separate teams. These days, I believe, that is
considered too expensive.

> If all software was perfect, every program would be on version 1.000.

Requirements change, hardware change...

> In my commercial apps, commands were implemented with scripts. If a
> script were to go wrong, it would be stopped, but the application
> continued working and they can try something else.

Yes, like wiping the log files so that the authorities would never find
out how you killed the client... (:-))

James Harris

unread,
Dec 3, 2022, 2:17:21 PM12/3/22
to
On 29/11/2022 15:00, David Brown wrote:
> On 29/11/2022 13:55, James Harris wrote:
>> On 29/11/2022 08:10, David Brown wrote:
>>> On 28/11/2022 20:41, Bart wrote:
>>>> On 28/11/2022 19:01, David Brown wrote:
>>
>> ...
>>
>>>>> The same applies to memory allocation.  If you are worrying about a
>>>>> possible inefficiency of 32 bytes in a memory allocation, you are
>>>>> doing things /wrong/.
>>>>
>>>> If you are not worried about using 32 bytes instead of 16 or 8, then
>>>> /you/ are getting it wrong. Why even bother using a lower level
>>>> language; just use Python.
>>>>
>>>
>>> Let me say it again.  If you are worried about possible 32 byte
>>> inefficiencies in a memory allocation, you are doing things /wrong/.
>>
>> I cannot speak for Bart but AISI the problem isn't space efficiency
>> but time efficiency. If you want to extend a string but there's no
>> room to expand it where it is you may prefer to request space for a
>> new section of the string rather than to copy all the existing content.
>>
>
> Pick one or the other - either you want to keep your strings as single
> contiguous buffers, or you use a chain of buffers.  In the first
> instance, you use realloc and let it copy the data if needed.  In the
> second case, you use sections of a fixed size and malloc them - you
> don't expand them.

Ugh! I take it you don't care about performance and scalability!!

Such designs are just exasperating. Where did you get your idea from? Is
there some scientific basis for it or did you make it up - perhaps based
on small cases that are so slow they can be run in batch mode? :-(

For example, who decided that buffers in a chain couldn't be enlarged?
Who invented that prohibition and why is it there?

>
> In either case, messing around with trying to expand only when
> "get_real_size" indicates it is cheap will lead to more complicated code
> that is regularly slower, and is only occasionally fast (and the realloc
> will be fast too in those cases).

If you need best performance you cannot use realloc; it might move
what's already there and there's no way before calling it to tell
whether it will expand in place (fast) or move memory (slow). Hence I
proposed resize (i.e. 'resize in place') which will never move existing
content but would expand the allocation where possible (the new size would
be visible via the msize call). That's good for not just performance but
also guarantees that pointers into the buffer won't be invalidated.
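For concreteness, here is a toy C model of what that contract could look like. Only the call names (m_alloc, m_resize, m_msize) come from the proposal; the fixed arena, the tracking of just the most recent block, and every other detail below are illustrative assumptions, not a real implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the proposed interface: a bump allocator over a fixed
   arena. m_resize() grows an allocation in place only when it is the
   most recent block and the arena still has room; it never moves data,
   so pointers into the buffer stay valid. Illustrative only. */

static unsigned char arena[1024];
static size_t arena_used = 0;
static unsigned char *last_p = NULL;
static size_t last_size = 0;

void *m_alloc(size_t n) {
    if (arena_used + n > sizeof arena) return NULL;
    last_p = arena + arena_used;
    last_size = n;
    arena_used += n;
    return last_p;
}

/* Report the size actually allocated (toy: only the last block). */
size_t m_msize(const void *p) {
    return p == last_p ? last_size : 0;
}

/* Grow in place if possible; report success, never move contents. */
int m_resize(void *p, size_t new_size) {
    if (p != last_p || new_size < last_size) return 0;
    if (arena_used - last_size + new_size > sizeof arena) return 0;
    arena_used = arena_used - last_size + new_size;
    last_size = new_size;
    return 1;
}
```

The caller checks the return value (or calls m_msize afterwards) and only falls back to a new allocation when the in-place grow fails.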


--
James Harris


James Harris

unread,
Dec 3, 2022, 2:34:15 PM12/3/22
to
On 03/12/2022 13:33, Bart wrote:
> On 29/11/2022 08:10, David Brown wrote:

...

>> ore about code /correctness/ - avoiding bugs, or at least making it
>> easier to find bugs

...

>> - and far less about minor
>> inefficiencies that do not affect code usability.
>
>
> I remember something in Firefox, maybe a bug, where the cap on how many
> temporary files could be created was ignored. The hard drive always
> seemed very busy when accessing the internet, but it worked, albeit slowly.
>
> I eventually discovered it had created some 3.5 million temporary files,
> occupying some 60GB IIRC. Deleting those files from the machine took /14
> hours/.

Great example. I was heartened recently to hear people making
performance a key design goal. Performance can be just as much a part of
engineering a solution as correctness.

https://en.wikipedia.org/wiki/Data-oriented_design
https://youtu.be/rX0ItVEVjHc

A language ought to be able to help a lot with that - e.g. by letting
the optimiser decide memory layout.

...

> You need to acknowledge when there is a real problem, not just blindly
> throw more resources at it.

Indeed! For example, throwing threads and cores at a problem can make it
run slower rather than faster.


--
James Harris


Bart

unread,
Dec 3, 2022, 3:01:27 PM12/3/22
to
On 03/12/2022 19:34, James Harris wrote:
> On 03/12/2022 13:33, Bart wrote:
>> On 29/11/2022 08:10, David Brown wrote:
>
> ...
>
>>> ore about code /correctness/ - avoiding bugs, or at least making it
>>> easier to find bugs
>
> ...
>
>>> - and far less about minor
>>> inefficiencies that do not affect code usability.
>>
>>
>> I remember something in Firefox, maybe a bug, where the cap on how
>> many temporary files could be created was ignored. The hard drive
>> always seemed very busy when accessing the internet, but it worked,
>> albeit slowly.
>>
>> I eventually discovered it had created some 3.5 million temporary
>> files, occupying some 60GB IIRC. Deleting those files from the machine
>> took /14 hours/.
>
> Great example. I was heartened recently to hear people making
> performance a key design goal. Performance can be just as much a part of
> engineering a solution as correctness.

Here's an interesting discussion I found a link to:

https://news.ycombinator.com/item?id=24735366

(I wasn't involved.) It's about fast compilation, and it's refreshing to
hear people who are enthusiastic about it and can appreciate the benefits.

(Someone compares 3 seconds build time to 3 minutes. I guess they
consider 3 seconds as an example of a fast build; I would think that my
compiler had hung!)


James Harris

unread,
Dec 3, 2022, 3:23:17 PM12/3/22
to
On 29/11/2022 16:54, David Brown wrote:
Yes, predictable time limits can matter more than overall run time. If a
particular function has to execute quickly then unless it is referring
to a very small existing buffer it may be unable to call realloc. It
could, however, call resize to see if the buffer can be resized in place.

>
>>
>> So the very call which needed a fast response /could not use/ your
>> realloc because it would have no way to know whether the allocator's
>> response would be fast or slow.
>>
>
> If the application needs memory, it needs memory.  The situations where
> you can use a tiny bit of extra memory if that is available quickly but
> you'd rather do without than spend time, are almost non-existent, and
> you would have been better off requesting a larger amount in the first
> place.

Too many existing programs are stymied by fixed buffer sizes. And it's
not always possible to decide a buffer size in advance. Sometimes one
has to read an object into a chain of buffers and then, if desired,
consolidate afterwards.
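The read-into-a-chain-then-consolidate pattern just described can be sketched in C; the chunk size, the linked-list layout and all names below are illustrative assumptions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Read data of unknown total length into a chain of fixed-size
   buffers, then consolidate into one contiguous block afterwards.
   CHUNK is deliberately tiny so the chaining is exercised. */

#define CHUNK 8

struct chunk {
    struct chunk *next;
    size_t used;
    char data[CHUNK];
};

/* Append n bytes to the chain, extending it with new chunks as needed. */
static void chain_append(struct chunk **head, const char *src, size_t n) {
    while (n > 0) {
        struct chunk *c = *head;
        while (c && c->next) c = c->next;          /* find the tail */
        if (!c || c->used == CHUNK) {              /* tail full: add one */
            struct chunk *nc = calloc(1, sizeof *nc);
            if (c) c->next = nc; else *head = nc;
            c = nc;
        }
        size_t take = CHUNK - c->used;
        if (take > n) take = n;
        memcpy(c->data + c->used, src, take);
        c->used += take;
        src += take;
        n -= take;
    }
}

/* Consolidate the chain into one contiguous NUL-terminated buffer. */
static char *chain_flatten(struct chunk *head) {
    size_t total = 0;
    for (struct chunk *c = head; c; c = c->next) total += c->used;
    char *out = malloc(total + 1);
    size_t off = 0;
    for (struct chunk *c = head; c; c = c->next) {
        memcpy(out + off, c->data, c->used);
        off += c->used;
    }
    out[total] = '\0';
    return out;
}
```

(A real version would keep a tail pointer instead of re-walking the chain, and free the chunks after flattening.)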

>
> It is /wrong/ to design a general-purpose allocator around such a
> hypothetical situation, and in so doing limit the implementation or
> encourage more complex code constructs in user code.
>
> A far better solution here is to make a simple general-purpose allocator
> that does its job without unnecessary extra interfaces.  And make
> efficient string processing part of the language or part of the standard
> library, using a specialised allocator for the purpose.  Even better,
> you should also make pool allocators and/or some kind of abstract
> allocator interface so that allocators can be made available that are
> better optimised for particular use-cases.

Well, how would your specialised allocator work for a string of unknown
length?

...

>> Under the proposal, if the code had to have the 1000 bytes as a single
>> run it would call m_realloc(). That would extend the 896 to 1024 if
>> there was space. If not, it would create a new allocation and move the
>> original there. This is the same as C's realloc().
>
> Yes.
>
>>
>> By contrast, if the code did not require the 1000 bytes to be in a
>> single run it would see that the current run was 896 and would call
>> m_resize() to ask whether the run could be extended in place. As
>> before, if there was space available the allocator would increase the
>> allocation to 1024. If there was not enough space, however, (i.e. if
>> the next block was occupied) it would leave the allocation as 896. The
>> caller would then create a new allocation for the rest of the space it
>> needed. (The caller would likely also link the two allocations together.)
>>
>
> That is all just imaginary - real code does not do that.

Who says? Just because /you/ wouldn't write code that way does not mean
that others would not.


> Tracking it
> all at the application side would take a good deal extra code with
> associated bugs (and it is questionable if it is even testable), and
> will probably waste as much time faffing around as it could hope to gain
> in the few cases where a "m_resize" was fast.

Let's see how much extra code would be required. Compare these two. Both
start with an initial buffer pointed to by b.

Simple:

  if BUF_SIZE < reqd                /* If buffer is smaller than we need */
  {
    new_b = alloc(BUF_SIZE)         /* Create a new buffer */
    .... link into the chain ....
    b = new_b
  }

Responsive:

  if msize(b) < reqd                /* If buffer is smaller than we need */
  {
    resize(b, reqd)                 /* Try to resize in place */
    if msize(b) < reqd              /* If buffer is still too small */
    {
      new_b = alloc(EXTENSION_SIZE) /* Create a new buffer */
      .... link into the chain ....
      b = new_b
    }
  }

What is that - only 2 extra lines? How can that be overly complicated?

>
>> Note that under the proposal a program could still go with the simple
>> realloc if that's what it wanted to do but it would also have other
>> options.
>>
>
> Don't design a language or library based on squeezing in as many
> features as you can imagine possibly being useful on a few occasions to
> a few people.  Design it simply and cleanly, in a way that can be
> expanded by users as needed, and where the implementation can be changed.
>
> I'd start with an object-based interface for an "Allocator", with methods :
>
>     Allocator.alloc(size) -> pointer
>     Allocator.alloc_zero(size) -> pointer
>     Allocator.realloc(pointer, oldsize, newsize) -> pointer
>     Allocator.free(pointer, size)

Thanks for coming up with a specific suggestion. What would you want to
happen if Allocator.free was asked to release a section of memory which
was wholly within an allocated chunk?

>
> (You might dispense with the non-zeroing alloc too.)
>
> The standard library should provide a standard general-purpose
> Allocator, called "heap".  You might also provide other Allocators, such
> as "thread_unsafe_heap" (being thread unsafe, it could be faster - but
> the default heap must be thread safe), "nofree_stack", "big_heap",
> "small_heap", "binary_pools", and whatever else you feel fits.

OK. How would they be called

heap.alloc(size)
nofree_stack.alloc(size)
etc

?

>
> Don't bother making the allocators return null pointers on failure.
> Programmers don't check their malloc returns well enough as it is, and
> allocations never fail in PC programming unless the programmer has
> seriously screwed up somewhere.  Throw an exception, call an error
> handler, crash the code, or whatever you choose to do normally on a
> critical error.

Agreed. I throw an exception.


--
James Harris


David Brown

unread,
Dec 4, 2022, 7:31:03 AM12/4/22
to
What /are/ you talking about?

Look, at the lowest level, buffers are /never/ enlarged. You've
allocated a bit of memory, and you use that. If that's not enough, you
have to allocate a /new/ bit of memory. All this piddle about "real"
allocation sizes is just peanuts - a byte here, a byte there, almost
irrelevant in the average case and completely irrelevant in the worst case.

Your choices are whether to allocate your new bit of memory for the new
data only and chain it together with the first lump, or to allocate
something big enough for all existing data as well as the new bit, copy
over the old data and thus have a single contiguous buffer. That second
solution gives faster and more convenient access to the data, but costs
more for the allocation and copying. Trade-offs, as always.
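The second, contiguous option is normally paired with geometric growth, so the occasional copy is amortized over many appends; a rough C sketch (the struct and names are mine, not from the thread):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Contiguous string buffer that grows by doubling via realloc - the
   second option above. Individual realloc calls may copy the data,
   but doubling keeps the amortized cost per appended byte constant. */

struct sbuf {
    char *data;
    size_t len, cap;
};

static void sbuf_append(struct sbuf *b, const char *s, size_t n) {
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < b->len + n + 1) cap *= 2;
        b->data = realloc(b->data, cap);   /* may move the buffer */
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}
```

Because realloc may move the buffer, any pointers into it must be treated as invalidated after an append, which is exactly the trade-off against the chained approach.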



>>
>> In either case, messing around with trying to expand only when
>> "get_real_size" indicates it is cheap will lead to more complicated
>> code that is regularly slower, and is only occasionally fast (and the
>> realloc will be fast too in those cases).
>
> If you need best performance you cannot use realloc; it might move
> what's already there and there's no way before calling it to tell
> whether it will expand in place (fast) or move memory (slow). Hence I
> proposed resize (i.e. 'resize in place') which will never move existing
> content but would expand the allocation as possible (the new size would
> be visible via the msize call). That's good for not just performance but
> also guarantees that pointers into the buffer won't be invalidated.
>
>

You are trying to solve the wrong problem in the wrong place.

Any information about buffer sizes compared to buffer usage belongs in
the application code or run-time library layers, not the allocation
system at the low level. For common functions, such as string handling,
the run-time library is appropriate for the convenience of users. (The
run-time library has the advantage of knowing about the low-level
implementation, and thus every allocation it makes can be of a "perfect"
size without wasted space.)




David Brown

unread,
Dec 4, 2022, 8:27:22 AM12/4/22
to
On 03/12/2022 21:23, James Harris wrote:
> On 29/11/2022 16:54, David Brown wrote:

I'm skipping the rest here - either you understand, appreciate and agree
with what I wrote, or you don't. (That means you either don't
understand it, don't appreciate it, or just don't agree with it - not
necessarily that I don't think you understand it.) That's fair enough -
it's your language and libraries, not mine.


>> Don't design a language or library based on squeezing in as many
>> features as you can imagine possibly being useful on a few occasions
>> to a few people.  Design it simply and cleanly, in a way that can be
>> expanded by users as needed, and where the implementation can be changed.
>>
>> I'd start with an object-based interface for an "Allocator", with
>> methods :
>>
>>      Allocator.alloc(size) -> pointer
>>      Allocator.alloc_zero(size) -> pointer
>>      Allocator.realloc(pointer, oldsize, newsize) -> pointer
>>      Allocator.free(pointer, size)
>
> Thanks for coming up with a specific suggestion. What would you want to
> happen if Allocator.free was asked to release a section of memory which
> was wholly within an allocated chunk?
>

You are asking what the allocator should do if the programmer using it
writes nonsense? Who cares? Launch the nasal daemons. (You are
welcome, even strongly recommended, to have debug modes or optional
run-time checks where efficiency takes a back seat to ease of debugging.
In such situations the application code is supposed to give the proper
values, but the allocator has also tracked the sizes despite the cost -
then you have a run-time check and fire off error messages.)

>>
>> (You might dispense with the non-zeroing alloc too.)
>>
>> The standard library should provide a standard general-purpose
>> Allocator, called "heap".  You might also provide other Allocators,
>> such as "thread_unsafe_heap" (being thread unsafe, it could be faster
>> - but the default heap must be thread safe), "nofree_stack",
>> "big_heap", "small_heap", "binary_pools", and whatever else you feel
>> fits.
>
> OK. How would they be called
>
>   heap.alloc(size)
>   nofree_stack.alloc(size)
>   etc
>
> ?

Yes - although I'd recommend keeping track of the returned pointers!

You may also provide some kind of wrapper for objects with constructors
and destructors (like "new" and "delete" mechanisms found in many
languages), depending on the rest of the language.
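In C terms, an object-based Allocator of the kind sketched above could be modelled as a struct of function pointers. The method set (alloc, alloc_zero, realloc, free with explicit sizes) follows the post; the concrete "heap" instance below, which simply forwards to the C library and ignores the size hints, is an illustrative assumption:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocator interface as a C vtable; a real library would make the
   size arguments meaningful (e.g. for pool allocators that need no
   per-block headers). */

typedef struct Allocator {
    void *(*alloc)(size_t size);
    void *(*alloc_zero)(size_t size);
    void *(*realloc)(void *p, size_t oldsize, size_t newsize);
    void  (*free)(void *p, size_t size);
} Allocator;

static void *heap_alloc(size_t n) { return malloc(n); }
static void *heap_zero(size_t n)  { return calloc(1, n); }
static void *heap_realloc(void *p, size_t o, size_t n) {
    (void)o;                  /* the C library tracks sizes itself */
    return realloc(p, n);
}
static void heap_free(void *p, size_t n) {
    (void)n;
    free(p);
}

/* The standard general-purpose allocator instance. */
static const Allocator heap = {
    heap_alloc, heap_zero, heap_realloc, heap_free
};
```

Calls then read exactly as in the discussion: heap.alloc(size), heap.free(p, size), with other instances (a pool, a no-free stack) supplying their own function pointers behind the same interface.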

James Harris

unread,
Dec 4, 2022, 8:55:01 AM12/4/22
to
I'm talking about engineering, not the oversimplistic theory, idealism
and naivety that you keep telling me is all that anyone should ever
need. :-(

>
> Look, at the lowest level, buffers are /never/ enlarged.

Rubbish! Just because /you/ never want such a thing does not mean that
others do not want it, nor does your dislike for it or your lack of
experience of it mean that it's a bad idea.

> You've
> allocated a bit of memory, and you use that.  If that's not enough, you
> have to allocate a /new/ bit of memory.  All this piddle about "real"
> allocation sizes is just peanuts - a byte here, a byte there, almost
> irrelevant in the average case and completely irrelevant in the worst case.

That may be what /you/ would always do but your view shouldn't restrict
others. In fact, you talk about saving only 'a byte here, a byte there'.
That can make a difference to some applications.

For example, say a string library implements two formats: a fast format
for short strings and a general format for others. (I know of at least
two libraries which do.) Having a single extra byte in the short format
allows more strings to use the fast form, for a net gain.

So even a single byte can make a difference. But because of alignment in
what we have been talking about a programmer could gain 3, 7 or 15 bytes
in every allocation which he would not otherwise have had! These extra
bytes would be completely free and would otherwise have been wasted.
There is no good reason to prevent a client from using them.
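A tagged union is one common way to implement the two-format scheme just described; the sizes and field names below are assumptions, but they show the point being made: every byte of header comes straight out of the short form's inline capacity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two-format string: a "short" form storing characters inline, and a
   "long" form holding a heap pointer. One byte of header (len, which
   doubles as the format tag) leaves 15 inline bytes in a 16-byte
   object; a second header byte would cut that to 14. Layout is an
   illustrative assumption, not a specific library's. */

#define SHORT_CAP 15

struct str {
    unsigned char len;        /* 0..SHORT_CAP: short form; 255: long */
    union {
        char inline_buf[SHORT_CAP];
        struct { char *ptr; size_t n; } heap;
    } u;
};

static struct str str_make(const char *s) {
    struct str r;
    size_t n = strlen(s);
    if (n <= SHORT_CAP) {
        r.len = (unsigned char)n;
        memcpy(r.u.inline_buf, s, n);   /* no NUL needed: len is stored */
    } else {
        r.len = 255;
        r.u.heap.ptr = malloc(n);
        memcpy(r.u.heap.ptr, s, n);
        r.u.heap.n = n;
    }
    return r;
}

static size_t str_len(const struct str *s) {
    return s->len == 255 ? s->u.heap.n : s->len;
}
```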

>
> Your choices are whether to allocate your new bit of memory for the new
> data only and chain it together with the first lump, or to allocate
> something big enough for all existing data as well as the new bit, copy
> over the old data and thus have a single contiguous buffer.  That second
> solution gives faster and more convenient access to the data, but costs
> more for the allocation and copying.  Trade-offs, as always.

Whether to chain or to reallocate is an interesting consideration but it
is orthogonal to the basic allocation. If chaining, finding out how much
space is available can avoid forming a new link. If reallocating,
finding out how much space is available can avoid carrying out a costly
reallocation.

On the chain/reallocate decision ... in simple terms, if I knew the
length of the string I was creating I would use a single allocation
whereas if I did not then I would use chaining.

>
>
>
>>>
>>> In either case, messing around with trying to expand only when
>>> "get_real_size" indicates it is cheap will lead to more complicated
>>> code that is regularly slower, and is only occasionally fast (and the
>>> realloc will be fast too in those cases).
>>
>> If you need best performance you cannot use realloc; it might move
>> what's already there and there's no way before calling it to tell
>> whether it will expand in place (fast) or move memory (slow). Hence I
>> proposed resize (i.e. 'resize in place') which will never move
>> existing content but would expand the allocation where possible (the new
>> size would be visible via the msize call). That's good for not just
>> performance but also guarantees that pointers into the buffer won't be
>> invalidated.
>>
>>
>
> You are trying to solve the wrong problem in the wrong place.
>
> Any information about buffer sizes compared to buffer usage belongs in
> the application code or run-time library layers, not the allocation
> system at the low level.  For common functions, such as string handling,
> the run-time library is appropriate for the convenience of users.  (The
> run-time library has the advantage of knowing about the low-level
> implementation, and thus every allocation it makes can be of a "perfect"
> size without wasted space.)

Well, the software would be in layers:

  app
   |
   V
  string library
   |
   V
  allocator

What I've proposed in this thread is a simple and small change to the
allocator which would allow the string library to make best use of the
spaces which the allocator returns to the string library.

I am not sure why you keep objecting to that but I am pretty sure you
and I aren't going to agree on this stuff. Still, no problem. Agreement
is not mandatory. :-)


--
James Harris


James Harris

unread,
Dec 4, 2022, 9:07:19 AM12/4/22
to
On 04/12/2022 13:27, David Brown wrote:
> On 03/12/2022 21:23, James Harris wrote:
>> On 29/11/2022 16:54, David Brown wrote:
>
> I'm skipping the rest here - either you understand, appreciate and agree
> with what I wrote, or you don't.  (That means you either don't
> understand it, don't appreciate it, or just don't agree with it - not
> necessarily that I don't think you understand it.)  That's fair enough -
> it's your language and libraries, not mine.

:-) I just wrote something similar to you in another reply.

>
>
>>> Don't design a language or library based on squeezing in as many
>>> features as you can imagine possibly being useful on a few occasions
>>> to a few people.  Design it simply and cleanly, in a way that can be
>>> expanded by users as needed, and where the implementation can be
>>> changed.
>>>
>>> I'd start with an object-based interface for an "Allocator", with
>>> methods :
>>>
>>>      Allocator.alloc(size) -> pointer
>>>      Allocator.alloc_zero(size) -> pointer
>>>      Allocator.realloc(pointer, oldsize, newsize) -> pointer
>>>      Allocator.free(pointer, size)
>>
>> Thanks for coming up with a specific suggestion. What would you want
>> to happen if Allocator.free was asked to release a section of memory
>> which was wholly within an allocated chunk?
>>
>
> You are asking what the allocator should do if the programmer using it
> writes nonsense?

No, I was asking whether you intended that a programmer should be able
to release memory in units other than those he used for the allocation
requests. For example, consider an allocation from A to D when the
programmer wants to release B to C within it. A < B < C < D:

   ----------------------------------------------------
   A        B   C                                      D

I was asking whether you thought the programmer should be able to
release B to C before releasing the other parts.

I thought that was what you and Bart (who IIRC also mentioned something
similar) had in mind and I spent some time designing an allocator to
support splitting memory in that way down to the byte level. It was an
interesting exercise and I may use it one day but I thought you had some
use cases in mind.

> Who cares?  Launch the nasal daemons.

Never, never, and never!

...


--
James Harris


Bart

unread,
Dec 4, 2022, 9:47:10 AM12/4/22
to
I don't think so. Your suggestion sounds a little crazy actually: someone
allocates a buffer to store a string, which spans A to D exclusive, and
which will have ensured that A was aligned, and D was aligned (for the
next alloc)

Later, they want to independently free a section of that from B to C
exclusive, to turn that one block into three:

A -> B allocated
B -> C free
C -> D allocated

So what happens to the alignment needs of B and C? What happens to the
string that was stored in it, of which 2 remnants remain, and one chunk
is missing?

Presumably the process can be repeated, so that any block will be full
of freed holes, which will eventually join up, but will now be fragmented.


> had in mind and I spend some time designing an allocator to
> support splitting memory in that way down to the byte level. It was an
> interesting exercise and I may use it one day but I thought you had some
> use cases in mind.

The only use case I can think of is within the allocator, splitting up
an available free block into smaller ones, but noting alignment needs
and minimum sizes (like 8 bytes).

Is this supposed to be some generalisation of Realloc? Currently such a
routine working on A->D would turn it into this when reducing:

A->B still allocated
B->D now available

or into this when expanding:

A->E now allocated including D->E

Here E>D, but the whole block may need relocating in either case.


James Harris

unread,
Dec 4, 2022, 10:56:53 AM12/4/22
to
On 04/12/2022 14:47, Bart wrote:
> On 04/12/2022 14:07, James Harris wrote:
>> On 04/12/2022 13:27, David Brown wrote:

...

>>> You are asking what the allocator should do if the programmer using
>>> it writes nonsense?
>>
>> No, I was asking whether you intended that a programmer should be able
>> to release memory in units other than those he used for the allocation
>> requests. For example, consider an allocation from A to D when the
>> programmer wants to release B to C within it. A < B < C < D:
>>
>>    ----------------------------------------------------
>>    A        B   C                                      D
>>
>> I was asking whether you thought the programmer should be able to
>> release B to C before releasing the other parts.
>>
>> I thought that was what you and Bart (who IIRC also mentioned
>> something similar)
>
> I don't think so. You suggestion sounds a little crazy actually: someone
> allocates a buffer to store a string, which spans A to D exclusive, and
> which will have ensured that A was aligned, and D was aligned (for the
> next alloc)

In the ABCD example I wasn't talking about strings.

>
> Later, they want to independently free a section of that from B to C
> exclusive, to turn that one block into three:
>
>    A -> B  allocated
>    B -> C  free
>    C -> D  allocated
>
> So what happens to the alignment needs of B and C? What happens to the
> string that was stored in it, of which 2 remnants remain, and one chunk
> is missing?

Again, I wasn't talking about strings but a general case. I thought it
was you who, along with David, suggested a 'free' which included the
length as in

free(address, length)

It got me thinking that the system call munmap does similar: calls to it
supply the length as well as the starting address. The call unmaps pages
and, AIUI, a run of pages being freed doesn't have to match the run
allocated. IOW it supports freeing B to C from the illustration, above,
(and other permutations) as long as we understand that the amount freed
will be whole pages. If that's usefully done for mmap then I was
thinking that perhaps it could usefully be done for other allocators,
and that maybe that was what you had in mind.

>
> Presumably the process can be repeated, so that any block will be full
> of freed holes, which will eventually join up, but will now be fragmented.

I would merge adjacent holes into one.

>
>
>> had in mind and I spent some time designing an allocator to support
>> splitting memory in that way down to the byte level. It was an
>> interesting exercise and I may use it one day but I thought you had
>> some use cases in mind.
>
> The only use case I can think of is within the allocator, splitting up
> an available free block into smaller ones, but noting alignment needs
> and minimum sizes (like 8 bytes).
>
> Is this supposed to be some generalisation of Realloc? Currently such a
> routine working on A->D would turn it into this when reducing:
>
>    A->B still allocated
>    B->D now available

No, it's nothing to do with realloc.

If A-B-C-D was allocated (as a single range A-D) and then B-C was freed
the result would be what you pointed out earlier

A-B still allocated
B-C freed
C-D still allocated

>
> or into this when expanding:
>
>    A->E now allocated including D->E
>
> Here E>D, but the whole block may need relocating in either case.

I'm not sure what that means.

I decided to make each split (the start/end of each piece of allocated
or unallocated memory) a byte address so that user code could call
allocate and free with arbitrary values and place a split at literally
any address. In the ABCD example, the actual addresses could be, say,

A = 2 (3 bytes allocated at addresses 2, 3 and 4)
B = 5 (1 byte unallocated)
C = 6 (2 bytes allocated)
D = 8 (the potential start of a new allocation)

For each range a bit would be needed to indicate free/allocated. Because
allocations would have to be byte aligned I chose the top bit, and to
separately manage memory where the top address bit was 0 and the other
where the top bit was 1.

Then with all allocation markers being a single machine word I would
store them as 'keys without values' in an in-memory B tree or B+ tree.
(There would be one B tree or B+ tree for each half of the address space.)
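The single-word marker described above can be encoded and decoded in a few lines; this sketch assumes, as the post does, that the two halves of the address space are kept in separate trees, so the top bit of the word is spare for the allocated/free flag:

```c
#include <stdint.h>

/* Each split marker is one machine word: a byte address with the top
   bit recording whether the range starting there is allocated. */

#define ALLOC_FLAG ((uintptr_t)1 << (sizeof(uintptr_t) * 8 - 1))

static uintptr_t mark(uintptr_t addr, int allocated) {
    return allocated ? (addr | ALLOC_FLAG) : addr;
}

static uintptr_t mark_addr(uintptr_t m)  { return m & ~ALLOC_FLAG; }
static int       mark_alloc(uintptr_t m) { return (m & ALLOC_FLAG) != 0; }
```

Stored in sorted order, such markers are exactly the "keys without values" mentioned above: a lookup finds the greatest marker not above a given address, and its flag says whether that address is currently allocated.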

As I say, I thought that you and/or David had a use case for arbitrary
freeing, but it was an interesting thought experiment nonetheless.


--
James Harris


David Brown

unread,
Dec 4, 2022, 11:30:12 AM12/4/22
to
On 03/12/2022 20:34, James Harris wrote:
> On 03/12/2022 13:33, Bart wrote:

>> I eventually discovered it had created some 3.5 million temporary
>> files, occupying some 60GB IIRC. Deleting those files from the machine
>> took /14 hours/.

Somebody is using an OS with a crappy filesystem - combined with a
crappy treatment of temporary files.

I'm not sure, however, that the existence of bugs or poor design in
/massive/ programs like Firefox mean it's okay to be happy with bugs in
your own code.

>
> Great example. I was heartened recently to hear people making
> performance a key design goal. Performance can be just as much a part of
> engineering a solution as correctness.
If your code is not correct, it doesn't matter how quickly you get the
wrong answer.

Performance might be /part/ of correctness - the specification for the
program or function might say "do this calculation within time X" or
include limits for memory usage. In that case, a function that takes
too long or too much memory is not a correct solution. But other than
such cases, correctness /always/ trumps performance.

(Of course, lots of code is so badly specified that it's hard to say
what a "correct" implementation should actually do.)

And remember, 90% of time is spent in 10% of the code, meaning that
performance is of little relevance for almost all programming (but it is
very important in a small proportion of code). Correctness, however, is
important in 100% of code.


It is easier to make correct code fast than to make fast code correct.
It is easier to make clear code correct than to make correct code clear.
 Write your code /clearly/ with an aim for /correctness/ - don't focus
on performance.


Bart

unread,
Dec 4, 2022, 12:46:50 PM12/4/22
to
On 04/12/2022 16:30, David Brown wrote:
> On 03/12/2022 20:34, James Harris wrote:
>> On 03/12/2022 13:33, Bart wrote:
>
>>> I eventually discovered it had created some 3.5 million temporary
>>> files, occupying some 60GB IIRC. Deleting those files from the
>>> machine took /14 hours/.
>
> Somebody is using an OS with a crappy filesystem - combined with a
> crappy treatment of temporary files.

That's what I thought. Until I worked out that deleting 3.5M files in 14
hours was still at a rate of 70 file deletions per second. That is not
that terrible, given that each file is within a single folder of
millions of files (I don't know the complexities of NTFS or the
mechanics involved).

No doubt a Linux system would have handled that in - I don't know, how
long /would/ it have taken on a spinning hard drive?



James Harris

unread,
Dec 4, 2022, 1:21:06 PM12/4/22
to
On 04/12/2022 16:30, David Brown wrote:
> On 03/12/2022 20:34, James Harris wrote:

...

>> Performance can be just as much a part
>> of engineering a solution as correctness.

... (snipped comments which basically expanded on the above)


> It is easier to make correct code fast than to make fast code correct.
> It is easier to make clear code correct than to make correct code clear.
>  Write your code /clearly/ with an aim for /correctness/ - don't focus
> on performance.

I understand where you are coming from but I have to say I think you are
wrong in keeping downplaying performance. As I said, it /can be/ just as
important as correctness in engineering a solution. And it might not be.
There are different scenarios:

If you are writing an application and you know that performance of it is
not critical (and it contains no time-critical parts) then by all means
focus laser-like on correctness.

By contrast, if you are writing an application which has time-critical
parts then you have to ensure time compliance. This case, like the first
one, is simple.

But there are areas of middle ground in which performance requirements
are not so straightforward. Say you are writing an OS, specifying a
language, or designing a library. You don't know in advance what will be
the timing requirements of the apps that will use them. Nor do you know
what crazy ways apps may use your OS, language or library. You have to
anticipate users going off piste and stressing the design with larger
elements and more of them than you would have thought possible.

The case in point is two libraries: a memory allocator and a string
library which uses the memory allocator. What I am saying to you is that
such libraries need to be correct but they also need to scale to large
strings, many strings, many string operations, many other non-string
allocation activity, etc, because it's impossible to say just now how
they will be used. It's no good to just "focus on correctness".


--
James Harris


James Harris

unread,
Dec 4, 2022, 4:20:14 PM12/4/22
to
I didn't try the full 3.5 million but took these timings on a Linux ext4:

creation of 350,000 empty files: 527/s
deletion of 350,000 empty files: 178/s
creation of 35,000 empty files: 172/s
deletion of 35,000 empty files: 179/s

I suspect that the initially greater speed may have been something to do
with the directory starting off as empty which was not the case later.
Even after deleting all files in the folder the directory still occupies
12 MB.

For ref, commands used for 350,000:

time for (( i = 0; i < 350000; i++ )) ; do touch file$i.trial; done
time for (( i = 0; i < 350000; i++ )) ; do rm file$i.trial; done


--
James Harris


David Brown

unread,
Dec 4, 2022, 5:23:09 PM12/4/22
to
I am an engineer, not an academic. (I just happen to have a more
academic background, and more academic interest, than most software
engineers.) But like any good engineer, I want the basis of my work to
be solidly founded on good theory. A mechanical engineer doesn't have
to understand why springs work linearly or how moment-of-inertia
formulas are derived - but they need to know that the people who worked
out these things, did so correctly. A typical software engineer does
not need to know how to design programming languages, or how to make a
mutex or thread-safe queue. But they /do/ have to rely on the people
who made these lower-level parts and that /they/ understand the theory
and can be sure it is correct.

If people cannot rely on your language and libraries to be correct -
/always/ correct, in theory and in practice - then the whole exercise is
just a fun game to play on your own.

Some people complain about languages like C and C++, saying it should be
easy to add some new feature, or they don't see why it has particular
quirks. The answer often comes down to the immense amount of work that
goes into the theory - the analysis of the implications of changes, and
how it might affect other aspects of the language or existing code no
matter how hypothetical or unusual.

If you haven't understood the simplistic theories, you can't begin to
work on the more advanced theories - and certainly not on anything
practical in a language. Remember, /you/ are designing the fundaments
that others will build on. If those are not solid and correct, things
will break.

>>
>> Look, at the lowest level, buffers are /never/ enlarged.
>
> Rubbish! Just because /you/ never want such a thing does not mean that
> others do not want it, nor does your dislike for it or your lack of
> experience of it mean that it's a bad idea.
>

They are /not/ enlarged.

You allocate some memory for a buffer. You allocate more memory later
for something else - it gets put after the first buffer. How are you
going to enlarge the first buffer? Stomp all over the second allocation?

What happens is that /new/ memory is allocated. At a higher level, this
can look like a buffer enlargement when someone calls "realloc" with a
larger value. But not at the low level, the implementation - the buffer
is not enlarged.

>> You've allocated a bit of memory, and you use that.  If that's not
>> enough, you have to allocate a /new/ bit of memory.  All this piddle
>> about "real" allocation sizes is just peanuts - a byte here, a byte
>> there, almost irrelevant in the average case and completely irrelevant
>> in the worst case.
>
> That may be what /you/ would always do but your view shouldn't restrict
> others. In fact, you talk about saving only 'a byte here, a byte there'.
> That can make a difference to some applications.
>

As I have said before, if it makes a difference, you are doing things
wrong. You are using the wrong allocator, or the wrong implementation,
or the wrong algorithm, or putting the size tracking in the wrong place,
or using the wrong type of processor.

> For example, say a string library implements two formats: a fast format
> for short strings and a general format for others. (I know of at least
> two libraries which do.) Having a single extra byte in the short format
> allows more strings to use the fast form, for a net gain.
>
> So even a single byte can make a difference. But because of alignment in
> what we have been talking about a programmer could gain 3, 7 or 15 bytes
> in every allocation which he would not otherwise have had! These extra
> bytes would be completely free and would otherwise have been wasted.
> There is no good reason to prevent a client from using them.
>

It is quite common to have string formats that support arbitrarily long
strings with dynamically allocated memory (either as a contiguous
buffer, or as a chain), and have a "short string optimisation" for
holding data locally. You do not allocate data for the short string,
and you do not increase its buffer size, or faff around trying to get a
byte or two extra. You make a fixed format with a fixed size
short-string buffer - if you need more, you are now a "long" string.
The size of the short-string buffer is fixed in the type (possibly
always the same, possibly as a template parameter).

If the string type is part of the language's basic standard library (and
I'd recommend that), then the buffer size is going to be picked to match
the allocation size from the standard library allocator used. It would
be downright idiotic to have an allocator that, say, rounds sizes up to
32-byte blocks and then have a string structure that was 20 bytes long,
supporting short strings of up to 18 bytes. It would be even dafter to
have a string structure that is of varying size and have users
re-allocate it to grow it to use the extra size that the library (not
the user) knows is always there.

>>
>> Your choices are whether to allocate your new bit of memory for the
>> new data only and chain it together with the first lump, or to
>> allocate something big enough for all existing data as well as the new
>> bit, copy over the old data and thus have a single contiguous buffer.
>> That second solution gives faster and more convenient access to the
>> data, but costs more for the allocation and copying.  Trade-offs, as
>> always.
>
> Whether to chain or to reallocate is an interesting consideration but it
> is orthogonal to the basic allocation. If chaining, finding out how much
> space is available can avoid forming a new link. If reallocating,
> finding out how much space is available can avoid carrying out a costly
> reallocation.
>
> On the chain/reallocate decision ... in simple terms, if I knew the
> length of the string I was creating I would use a single allocation
> whereas if I did not then I would use chaining.
>

/You/ are making the string type and library. You will never know what
the user wants (unless you are making a one-man language where you
change the language and the compiler as often as you write programs in
the language). You will /never/ make decisions that are ideal for
everyone. You pick a balance between making general choices that are
good enough for a wide range of uses, or providing the support to let
users design these types for themselves.
Yes.

> What I've proposed in this thread is a simple and small change to the
> allocator which would allow the string library to make best use of the
> spaces which the allocator returns to the string library.
>

No.

The simple method is that the string library knows the size of chunks
provided by the allocator, and always uses that information.

> I am not sure why you keep objecting to that but I am pretty sure you
> and I aren't going to agree on this stuff. Still, no problem. Agreement
> is not mandatory. :-)
>

It would be boring indeed if we all agreed! /Disagreement/ is mandatory
for an interesting thread :-)

But we do agree on many things.



David Brown

unread,
Dec 4, 2022, 5:44:42 PM12/4/22
to
The key to doing those kinds of operations efficiently is to buffer them.
Windows is too enthusiastic about committing everything to disk at every
point. So for each file, it has to find it in the directory (which is a
big effort at the best of times - but made /vastly/ worse because in
Windows and NTFS filenames are stored case-preserving but must be
searched case-insensitively in UCS-2 encoding). Then the deletion change
is logged to the NTFS metadata log and committed to disk, the change
itself is made and committed to disk, then the completion record is
placed in the log. And then it goes on to the next file.

Taking that into account, 70 deletions on spinning rust in an overloaded
directory is not a bad speed!

In *nix systems, directories are essentially files in a particular
format. Deletions are primarily changes to those files. You also need
to remove the inode from the inode table (assuming - as is almost
certainly the case here - that there are no other directory entries
pointing to the same file), and again the inode tables are treated much
like files. You still need to find the files in the directories, and
sifting through 3.5 million files takes time on most filesystems (some
have directory formats that are less bad for huge directories, but none
are great for it). But the changes to the directories and the inode
tables, with corresponding metadata log commits, is buffered up and
effectively done in batches.

Still, I'd not be holding my breath for clearing up such a mess - even
on a Linux system with a PCIe SSD it would take uncomfortably long.



David Brown

unread,
Dec 4, 2022, 5:59:20 PM12/4/22
to
On 04/12/2022 19:21, James Harris wrote:
> On 04/12/2022 16:30, David Brown wrote:
>> On 03/12/2022 20:34, James Harris wrote:
>
> ...
>
>>> Performance can be just as much a part of engineering a solution as
>>> correctness.
>
> ... (snipped comments which basically expanded on the above)
>
>
>> It is easier to make correct code fast than to make fast code correct.
>> It is easier to make clear code correct than to make correct code
>> clear.   Write you code /clearly/ with an aim for /correctness/ -
>> don't focus on performance.
>
> I understand where you are coming from but I have to say I think you are
> wrong in keeping downplaying performance. As I said, it /can be/ just as
> important as correctness in engineering a solution. And it might not be.
> There are different scenarios:
>
> If you are writing an application and you know that performance of it is
> not critical (and it contains no time-critical parts) then by all means
> focus laser-like on correctness.
>

This means correctness first, performance a bonus.

> By contrast, if you are writing an application which has time-critical
> parts then you have to ensure time compliance. This case, like the first
> one, is simple.

Here, you are making particular timing aspects a requirement - the
program is only correctly implementing the required specification if it
works within the time limits. Any other performance aspects (such as
memory efficiency, or performance beyond the required time constraints)
are secondary. Again - correctness first, performance a bonus.

>
> But there are areas of middle ground in which performance requirements
> are not so straightforward. Say you are writing an OS, specifying a
> language, or designing a library. You don't know in advance what will be
> the timing requirements of the apps that will use them. Nor do you know
> what crazy ways apps may use your OS, language or library. You have to
> anticipate users going off piste and stressing the design with larger
> elements and more of them than you would have thought possible.
>

If you write an OS and you don't focus on correctness, it is /useless/.
No one wants an OS where a file write will be fast but might make
mistakes! You certainly put in lots of effort to making the OS
efficient, but /never/ at the cost of making it do the /wrong/ thing.

> The case in point is two libraries: a memory allocator and a string
> library which uses the memory allocator. What I am saying to you is that
> such libraries need to be correct but they also need to scale to large
> strings, many strings, many string operations, many other non-string
> allocation activity, etc, because it's impossible to say just now how
> they will be used. It's no good to just "focus on correctness".
>

I did not say that you should ignore performance or efficiency. I said
/correctness/ trumps performance every time. /Every/ time. No exceptions.

If particular specific performance expectations are part of the
requirements, then they are part of /correctness/.

If you are just looking for "as fast as practically possible given the
budget constraints for the development process", then that's fine too -
but it /always/ comes secondary to correctness.

You can also sometimes define your requirements in a way that allows
some kinds of "imperfections" - such as defining a lossy compression
format for audio or video files. But that's just changing the
specifications - "this function should be able to show 60 frames per
second with no more than 10 dB increase in noise difference from the
original and using no more than 10 MB memory". A /correct/ function
implements those requirements even if it takes 100% cpu time.
Additional effort into performance to reduce that cpu time, or to
improve the quality to better than 10 dB, or to reduce the memory
overhead, is useful. But you do not get to use 12 MB memory for a 10
times speed increase - that function would be incorrect, and not satisfy
the requirements. (You could go back to the customer and recommend a
change to the requirements, of course.)


Bart

unread,
Dec 4, 2022, 6:02:21 PM12/4/22
to
I think I know where you're coming from: because my allocator doesn't
maintain any information about the allocation size, you assume its Free
can be used on any block of arbitrary memory, including a chunk of a
previously allocated block.

Well, this depends. It wouldn't work on mine; there are two parts to it:

(I) For blocks 2KB and below, a requested size N is rounded if needed to
the next power of 2 (minimum 16, maximum 2048).

The same rounding is performed when calling Free(P, N). So if N was 240:

* N is rounded to 256 and 256 bytes are allocated (either from a
freelist of 256-byte blocks, or from the pool)

* If you later try and free the top 35 bytes of the 240, the 35 is
rounded to 64, and that block is put into a free list of 64-byte blocks.
When later allocated to another call, it will use 13 bytes
((205+64)-256) that belong to a different block, as it extends past the
original 256 bytes.

* Further, the base address of the 35-byte block will be 205 bytes into the
original block, which is misaligned by 5 from a multiple of 8, and 13
bytes from a multiple of 16 which is what my allocator assumes.

* When you later free the remaining 205 bytes, that still rounds to 256,
but now includes 51 bytes (256-205) that belong to that freed 64-byte
chunk (35->64), which itself encroaches on the next.

(II) Allocations over 2KB are also rounded to a power of 2 (up to 33MB
where the rules change). So if N is 6000, this becomes 8192, which is
requested directly from malloc.

This now runs into trouble if you try and free the last 1000 of the
6000, even though that 1KB block lies wholly within the 8192, because
you are messing with the workings of malloc which /does/ record the sizes.

Freeing a hole in the middle is not going to be much better in either
case, because of this behind-scenes size rounding, assumptions about
alignment, and the machinations of malloc.


David Brown

unread,
Dec 4, 2022, 6:12:32 PM12/4/22
to
You mean the programmer does a single "malloc" for (D - A) bytes,
getting pointer A as a result, and then later wants to call "free" to
release bytes from B to C while keeping A to B and C to D? No, I don't
think that should be considered for a general allocator. You only get
to free the memory blocks you received, fully and completely. I suppose
it is possible to think of a more specialised allocator that would allow
freeing parts of a block like this, but it would be unusual.

>
> I thought that was what you and Bart (who IIRC also mentioned something
> similar) had in mind and I spend some time designing an allocator to
> support splitting memory in that way down to the byte level. It was an
> interesting exercise and I may use it one day but I thought you had some
> use cases in mind.
>
>> Who cares?  Launch the nasal daemons.
>
> Never, never, and never!
>

Garbage in, garbage out.

You cannot give the program sensible answers to stupid questions. And
no matter how much you try to check the program's requests for sanity
and valid values, there will be a programmer who will outwit you and
find new ways to be wrong that are beyond your imagination. If you have
a low-level language that allows things like pointers and conversions
between pointers and integers, this will happen /very/ easily, and there
is nothing you can do to stop it.

So as the language designer, and as the tools implementer, you get to
say "These are the rules. Follow them, and your programs can do
wonderful things. Break them, and I take no responsibility for the
chaos that may follow". This is how it works for /every/ programming
language. Some languages try to make ever more complicated schemes to
detect rule-breaking and follow up with some kind of "controlled" error
handling, but really that's just more rules. And the controlled error
handling never really works in practice, because the programmer was not
expecting an error - if he/she were, they would have fixed the bug in
their code.

(Of course you do your best to help programmers find these kinds of
problems and fix them, at least in debug modes, and you don't go out of
your way simply to make the results of undefined behaviour worse than
necessary.)

The concept of undefined behaviour - of "garbage in, garbage out" - is
as old as computing:

"""
On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into
the machine wrong figures, will the right answers come out?' I am not
able rightly to apprehend the kind of confusion of ideas that could
provoke such a question.

Charles Babbage
"""


Bart

unread,
Dec 4, 2022, 6:41:34 PM12/4/22
to
On 04/12/2022 22:23, David Brown wrote:
> On 04/12/2022 14:54, James Harris wrote:

>> So even a single byte can make a difference. But because of alignment
>> in what we have been talking about a programmer could gain 3, 7 or 15
>> bytes in every allocation which he would not otherwise have had! These
>> extra bytes would be completely free and would otherwise have been
>> wasted. There is no good reason to prevent a client from using them.
>>
>
> It is quite common to have string formats that support arbitrarily long
> strings with dynamically allocated memory (either as a contiguous
> buffer, or as a chain), and have a "short string optimisation" for
> holding data locally.  You do not allocate data for the short string,
> and you do not increase its buffer size, or faff around trying to get a
> byte or two extra.

Why not? I've done that sometimes, for example:

I have a counted-string data type, where string data is allocated from
the heap.

This string is not zero-terminated, so when I want to pass it to an API
expecting a C-style zero-terminated string, it would mean creating a new
heap copy with an extra byte for that zero.

However, when an allocator rounds up, even if only to a multiple of 8,
the chances are there will be a spare byte already there: I can just set
it to zero (or the string library can do that when it creates the string).

You might say, why not just allocate an extra byte anyway, but:

* Sometimes, that can mean allocating more memory (up to double with my
allocators) just for that byte, because there is no reserve.

* You don't want to start making assumptions that there will always be
that terminator. It won't exist when making a slice or view into the
middle of another string for example.

It's better to keep this zero-terminator business strictly within FFI
requirements only and not build it into your data type.

Dmitry A. Kazakov

unread,
Dec 5, 2022, 3:05:18 AM12/5/22
to
On 2022-12-04 23:44, David Brown wrote:

> The key to doing that kind of operation efficiently is to buffer them.
> Windows is too enthusiastic about committing everything to disk at every
> point.  So for each file, it has to find it in the directory (which is a
> big effort at the best of times - but made /vastly/ worse because in
> Windows and NTFS filenames are stored case-preserving but must be
> searched case-insensitively in UCS-2 encoding).  Then the deletion change
> is logged to the NTFS metadata log and committed to disk, the change
> itself is made and committed to disk, then the completion record is
> placed in the log.  And then it goes on to the next file.

Moreover, Windows does lots of garbage calls on the directory file,
opening, closing, opening, closing (looking for desktop.ini in each
directory! (:-)) etc. I once implemented a virtual file system for
Windows and was amazed by the sheer amount of nonsense that is going on
there.

> Still, I'd not be holding my breath for clearing up such a mess - even
> on a Linux system with a PCIe SSD it would take uncomfortably long.

On bulk operations SSDs have performance problems of their own.

Dmitry A. Kazakov

unread,
Dec 5, 2022, 3:15:25 AM12/5/22
to
On 2022-12-05 00:12, David Brown wrote:

> The concept of undefined behaviour - of "garbage in, garbage out" - is
> as old as computing:

> """
> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into
> the machine wrong figures, will the right answers come out?' I am not
> able rightly to apprehend the kind of confusion of ideas that could
> provoke such a question.
>
> Charles Babbage
> """

That is a cool quote!

Another maxim:

relax preconditions, strengthen postconditions

David Brown

unread,
Dec 5, 2022, 5:40:32 AM12/5/22
to
On 05/12/2022 09:15, Dmitry A. Kazakov wrote:
> On 2022-12-05 00:12, David Brown wrote:
>
>> The concept of undefined behaviour - of "garbage in, garbage out" - is
>> as old as computing:
>
>> """
>> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put
>> into the machine wrong figures, will the right answers come out?' I am
>> not able rightly to apprehend the kind of confusion of ideas that
>> could provoke such a question.
>>
>> Charles Babbage
>> """
>
> That is a cool quote!
>

Apparently it was members of parliament that asked him this!


It never ceases to amaze me how many people don't understand it,
however. People try to write libraries or functions that will somehow
"work" even when given bad data, or cling to the belief that it is
possible and desirable for a language to define the behaviour of
everything. The result, inevitably, is extra inefficiencies for correct
code, error paths that are completely untestable, and at best you simply
push the bugs around to somewhere else in the code.

When someone hits their thumb with a hammer, it is not the fault of the
hammer, or the fault of the nail, and nor should the hammer designer be
responsible for making sure the nail goes in even if you hit your thumb.
If you are not confident in using the hammer safely and effectively,
then it is /your/ responsibility, as the hammer user, to learn how to do
it correctly, or to find alternative solutions to the problem if
hammering is too risky.


There's a thread on comp.lang.c at the moment from some twit who thinks
allocation functions in C should have type "long" for their size
argument rather than "size_t". His justification is that the allocation
function can then check for negative values of the size and return an
error. This is, of course, completely silly - it does not ensure the
value is "safe" in any way, it wastes effort for correct code, and any
programmer who writes code that tries to allocate negative space is
highly unlikely to write suitable error handling.


It's fine to have extra checks or controls for debugging and testing
purposes. But you have to assume programmers will do their job.


> Another maxim:
>
>    relax preconditions, strengthen postconditions
>

That's not a great maxim IMHO. When you have good specifications for a
function (or other piece of code), you /may/ relax the preconditions and
you /may/ strengthen the postconditions, but that won't give you a
better solution. Your task is to ensure that the code you write does
not strengthen the preconditions and does not weaken the postconditions.

(To be fair, code is rarely well enough specified!)

Dmitry A. Kazakov

unread,
Dec 5, 2022, 6:06:29 AM12/5/22
to
On 2022-12-05 11:40, David Brown wrote:

> Your task is to ensure that the code you write does
> not strengthen the preconditions and does not weaken the postconditions.

That is the task of somebody who implements the specifications. An
implementation that strengthens the precondition or weakens the
postcondition is simply incorrect.

The task of a specification designer is to mandate the weakest precondition
possible. Any precondition is a heavy burden on the code user and a
source of bugs.

> (To be fair, code is rarely well enough specified!)

People frequently confuse preconditions with erroneous inputs. The
precondition of real sqrt is plain True (assuming types are enforced).
Negative argument is as valid as positive:

Pre: True
Post: (x >= 0 and sqrt(x)**2 = x) or (x < 0 and exception raised)

Now a bad design would be:

Pre: x >= 0
Post: sqrt(x)**2 = x

Here you have undefined behavior for x < 0.

Bart

unread,
Dec 5, 2022, 6:17:26 AM12/5/22
to
Not at all. For x<0, my sqrt() gives an invalid number (displayed as
-1.#IND00, depending on the stringify library).

Any attempt to use it for further calculation will propagate that error
value. User programs can test for such values if they wish.


Dmitry A. Kazakov

unread,
Dec 5, 2022, 6:35:46 AM12/5/22
to
On 2022-12-05 12:17, Bart wrote:
> On 05/12/2022 11:06, Dmitry A. Kazakov wrote:
>> On 2022-12-05 11:40, David Brown wrote:
>>
>>> Your task is to ensure that the code you write does not strengthen
>>> the preconditions and does not weaken the postconditions.
>>
>> That is the task of somebody who implements the specifications. An
>> implementation that strengthens the precondition or weakens the
>> postcondition is simply incorrect.
>>
>> The task of a specification designer is to mandate weakest
>> precondition possible. Any precondition is a heavy burden on the code
>> user and a source of bugs.
>>
>>> (To be fair, code is rarely well enough specified!)
>>
>> People frequently confuse preconditions with erroneous inputs. The
>> precondition of real sqrt is plain True (assuming types are enforced).
>> Negative argument is as valid as positive:
>>
>> Pre: True
>> Post: (x >= 0 and sqrt(x)**2 = x) or (x < 0 and exception raised)
>>
>> Now a bad design would be:
>>
>> Pre: x >= 0
>> Post: sqrt(x)**2 = x
>>
>> Here you have undefined behavior for x < 0.
>
> Not at all. For x<0, my sqrt() gives an invalid number (displayed as
> -1.#IND00, depending on the stringify library).

You did not understand the point.

Your sqrt is not real-value sqrt at all, because the result is not a
real number. However, if we instead consider sqrt defined on some
subset of R U NaN, where NaN is a not-a-number ideal, then the point can
be easily made. Good design:

Pre: True
Post: (x /= NaN and x >= 0 and sqrt(x)**2 = x) or
(x /= NaN and x < 0 and sqrt(x) = NaN) or
(x = NaN and sqrt(x) = NaN)

Bad design:

Pre: x /= NaN and x >= 0
Post: sqrt(x)**2 = x

(Of course IEEE 754 ideals were a very stupid idea from a SW POV, but that
is unrelated to the issue of undefined behavior.)

David Brown

unread,
Dec 5, 2022, 9:38:48 AM12/5/22
to
On 05/12/2022 12:06, Dmitry A. Kazakov wrote:
> On 2022-12-05 11:40, David Brown wrote:
>
>> Your task is to ensure that the code you write does not strengthen the
>> preconditions and does not weaken the postconditions.
>
> That is the task of somebody who implements the specifications. An
> implementation that strengthens the precondition or weakens the
> postcondition is simply incorrect.
>

Yes.

> The task of a specification designer is to mandate weakest precondition
> possible. Any precondition is a heavy burden on the code user and a
> source of bugs.
>

No.

A good strong precondition makes the use of the function clear, the
implementation simpler and more efficient, and the whole thing easier to
test. See below in your example of square root.


A weak precondition makes the function more flexible to call, but harder
to implement. The opposite is true for a strong precondition. A
designer striving to make the precondition as weak as possible, or as
strong as possible, is failing in his/her job - the precondition should
be an appropriate choice for the task in hand.

(The weakest precondition is "nothing", the strongest postcondition is
"everything". This is known in computer science as the function "magic"
- it takes any inputs whatsoever, and always gives the user exactly what
they wanted. It may be ideal from a client's viewpoint, but it cannot
be implemented - and is thus useless as a specification.)


>> (To be fair, code is rarely well enough specified!)
>
> People frequently confuse preconditions with erroneous inputs.

If the inputs satisfy the preconditions, they are not erroneous.

(What satisfies the preconditions may vary at run-time - for example, a
"draw rectangle" function may have a precondition that the coordinates
must be within the current window frame. That kind of thing is usually
a bad idea for specifications, but people don't always write good
specifications!)

> The
> precondition of real sqrt is plain True (assuming types are enforced).
> Negative argument is as valid as positive:
>
> Pre: True
> Post: (x >= 0 and sqrt(x)**2 = x) or (x < 0 and exception raised)
>
> Now a bad design would be:
>
> Pre: x >= 0
> Post: sqrt(x)**2 = x
>
> Here you have undefined behavior for x < 0.
>


The second version is /far/ better than the first.

The first version has a confusing specification. It leaves the user
wondering what it really does when reading the precondition - will this
function give me the square root of negative numbers? Is it giving
complex results, or saturating, or expecting me to multiply the result
by "i" myself?

Then the user reads the postcondition and sees that for negative x it
raises an exception. Now my code needs to handle the possibility of an
exception - that's not what I want! I know I never give the sqrt
function a negative number, so the exception handling path will never be
taken and cannot be tested without writing extra test code to simulate
an impossible situation. That's more extra code, more testing, more
risk of errors, more inefficiencies, for exactly /zero/ benefit.

Or I could leave the exception unhandled, relying on code higher up the
food chain to do it - possibly written by a different programmer who
knows nothing about square roots or their errors. That means code that
calls the code that calls sqrt could get unexpected exceptions, usually
not clearly specified or declared in the function type - basically, it's
a hidden "goto" to an almost random part of the program with unexpected,
untested and unplanned consequences.

Exceptions are a valid mechanism for handling unusual and rare run-time
circumstances, such as a database transaction failure, loss of contact
in a network connection, and so on. They are /not/ a crutch for poor
programming and coding errors!

And of course the implementer of the first version can't make a nice
efficient solution that boils down to a single "fsqrtd" assembly
instruction (or whatever your cpu calls it), but has to add extra checks
that will never trigger in normal usage, and some kind of exception
throwing mechanism.



The second version of the specification is clear and simple. The user
knows they need to call the function with non-negative values, and they
are quite happy with that - they wouldn't be doing anything else anyway.
The implementer knows the function is only ever called with
non-negative values and can efficiently calculate the square root. The
responsibility for using appropriate inputs is where it should be - with
the caller. The responsibility for handling exceptions is no longer an
issue for anyone.


The first version requires so much extra work for both the user and the
implementer that it is higher risk - despite apparently being "safer"
because it has no undefined behaviour in the sqrt function itself. The
second version has no undefined behaviour in practice because the user
does not call the function with inappropriate inputs. The first version
/does/ have undefined behaviour in practice, because exceptions like
these are often not handled appropriately or fully tested.


Dmitry A. Kazakov

unread,
Dec 5, 2022, 10:14:40 AM12/5/22
to
On 2022-12-05 15:38, David Brown wrote:
> On 05/12/2022 12:06, Dmitry A. Kazakov wrote:
>> On 2022-12-05 11:40, David Brown wrote:

>> The task of a specification designer is to mandate weakest
>> precondition possible. Any precondition is a heavy burden on the code
>> user and a source of bugs.
>>
>
> No.
>
> A good strong precondition makes the use of the function clear, the
> implementation simpler and more efficient, and the whole thing easier to
> test.

Strong precondition is a booby trap.

> A weak precondition makes the function more flexible to call, but harder
> to implement.

A weak precondition guarantees defined behavior.

>> People frequently confuse preconditions with erroneous inputs.
>
> If the inputs satisfy the preconditions, they are not erroneous.

They are. A violated precondition is a bug. Erroneous input is just an
anticipated event, e.g. a user typing 68 minutes.

> (What satisfies the preconditions may vary at run-time - for example, a
> "draw rectangle" function may have a precondition that the coordinates
> must be within the current window frame.  That kind of thing is usually
> a bad idea for specifications, but people don't always write good
> specifications!)
>
>> The precondition of real sqrt is plain True (assuming types are
>> enforced). Negative argument is as valid as positive:
>>
>> Pre: True
>> Post: (x >= 0 and sqrt(x)**2 = x) or (x < 0 and exception raised)
>>
>> Now a bad design would be:
>>
>> Pre: x >= 0
>> Post: sqrt(x)**2 = x
>>
>> Here you have undefined behavior for x < 0.
>
> The second version is /far/ better than the first.
>
> The first version has a confusing specification.  It leaves the user
> wondering what it really does when reading the precondition - will this
> function give me the square root of negative numbers?  Is it giving
> complex results, or saturating, or expecting me to multiply the result
> by "i" myself?

It unambiguously states that sqrt has a real argument.

> Then the user reads the postcondition and sees that for negative x it
> raises an exception.  Now my code needs to handle the possibility of an
> exception - that's not what I want!

If you do not want it, don't pass negative number as an argument.

> I know I never give the sqrt
> function a negative number,

And how do you know that?

> so the exception handling path will never be
> taken and cannot be tested without writing extra test code to simulate
> an impossible situation.  That's more extra code, more testing, more
> risk of errors, more inefficiencies, for exactly /zero/ benefit.

Pre- and postconditions are for correctness proofs, not for testing. It
is stupid to test a program proven correct.

> Or I could leave the exception unhandled,

You cannot do that. Again, it is about proofs. If your subprogram has no
exception in its postcondition you cannot prove it correct unless x
is non-negative.

If you do not prove correctness, then exception propagation is just
defined behavior as opposed to undefined one.

> Exceptions are a valid mechanism for handling unusual and rare run-time
> circumstances, such as a database transaction failure, loss of contact
> in a network connection, and so on.  They are /not/ a crutch for poor
> programming and coding errors!

Sure undefined behavior is much better than that! (:-))

> The second version of the specification is clear and simple.

It leaves behavior undefined without any good reason. A strong
precondition, if not an utter necessity, is a grave design fault.

>  The implementer knows the function is only ever called with
> non-negative values and can efficiently calculate the square root.

He does not know that without correctness proofs, which are rarely
possible and rarely used.

> The
> responsibility for using appropriate inputs is where it should be - with
> the caller.

Exactly: shoveling your problems onto the caller, causing distributed
overhead and ensuring untraceable bugs in the production code, under
the motto:

"no qualified programmer would ever do X"


> The responsibility for handling exceptions is no longer an
> issue for anyone.

Replaced by the obligatory reading of core dumps...

David Brown

unread,
Dec 5, 2022, 11:56:08 AM12/5/22
to
On 05/12/2022 16:14, Dmitry A. Kazakov wrote:
> On 2022-12-05 15:38, David Brown wrote:
>> On 05/12/2022 12:06, Dmitry A. Kazakov wrote:
>>> On 2022-12-05 11:40, David Brown wrote:
>
>>> The task of a specification designer is to mandate weakest
>>> precondition possible. Any precondition is a heavy burden on the code
>>> user and a source of bugs.
>>>
>>
>> No.
>>
>> A good strong precondition makes the use of the function clear, the
>> implementation simpler and more efficient, and the whole thing easier
>> to test.
>
> Strong precondition is a booby trap.

Only if it is not clear and documented.

>
>> A weak precondition makes the function more flexible to call, but
>> harder to implement.
>
> A weak precondition guarantees defined behavior.

Which is pointless if the defined behaviour is useless or wrong.

Remember the old joke about politicians in a crisis? "We must do
something! This is something - therefore we must do it!".

Put your effort into defining useful handling of real requirements - not
desperate attempts to assign meanings to meaningless situations.

>
>>> People frequently confuse preconditions with erroneous inputs.
>>
>> If the inputs satisfy the preconditions, they are not erroneous.
>
> They are. Violated precondition is a bug. Erroneous input is just
> anticipated event, e.g. a user typed 68 minutes.
>

That all depends on your definitions of "error" and "bug". Input from
outside - such as user input - must be sanitized and checked, with
appropriate feedback to the user or other data source when there is
invalid data. If bad data is passed around inside the code, that is a bug.

If you specify your square root function with a precondition of "true",
then no input is erroneous. Some values might be senseless in
mathematical terms, and be the result of bugs in the code, but you have
specified your function to be happy with senseless input - it is
therefore not an error.
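The distinction drawn here, between external input that is checked with feedback and internal contract violations that are bugs, can be sketched in C. This is an illustrative sketch only; the function names are invented, and the 68-minutes example is borrowed from the thread:

```c
#include <assert.h>
#include <stdio.h>

/* Internal function: precondition 0 <= minutes < 60 is a contract.
   A violation here is a bug in the caller, so it is asserted,
   not "handled". */
int to_seconds(int minutes) {
    assert(minutes >= 0 && minutes < 60);
    return minutes * 60;
}

/* Boundary function: raw user input is checked, with feedback to
   the user. Bad input here is an anticipated event, not a bug.
   Returns 1 on success, 0 on rejection. */
int read_minutes(int raw, int *out) {
    if (raw < 0 || raw >= 60) {
        fprintf(stderr, "invalid minutes: %d\n", raw);
        return 0;
    }
    *out = raw;
    return 1;
}
```

Values that pass the boundary check always satisfy `to_seconds`'s precondition, so the internal code needs no second layer of checking.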

>> (What satisfies the preconditions may vary at run-time - for example,
>> a "draw rectangle" function may have a precondition that the
>> coordinates must be within the current window frame.  That kind of
>> thing is usually a bad idea for specifications, but people don't
>> always write good specifications!)
>>
>> The precondition of real sqrt is plain True (assuming types are
>>> enforced). Negative argument is as valid as positive:
>>>
>>> Pre: True
>>> Post: (x >= 0 and sqrt(x)**2 = x) or (x < 0 and exception raised)
>>>
>>> Now a bad design would be:
>>>
>>> Pre: x >= 0
>>> Post: sqrt(x)**2 = x
>>>
>>> Here you have undefined behavior for x < 0.
>>
>> The second version is /far/ better than the first.
>>
>> The first version has a confusing specification.  It leaves the user
>> wondering what it really does when reading the precondition - will
>> this function give me the square root of negative numbers?  Is it
>> giving complex results, or saturating, or expecting me to multiply the
>> result by "i" myself?
>
> It unambiguously states that sqrt has a real argument.
>

But not that the result of the function is real - that is not specified.

>> Then the user reads the postcondition and sees that for negative x it
>> raises an exception.  Now my code needs to handle the possibility of
>> an exception - that's not what I want!
>
> If you do not want it, don't pass negative number as an argument.
>

I don't want the possibility to exist in my code. I don't want to use a
function that might throw an exception - that would just mean more
effort on my part to deal with the possibility and test that handler code.

>> I know I never give the sqrt function a negative number,
>
> And how do you know that?
>

That's my side of the bargain.

Software development is all about contracts. A function has a
precondition and a postcondition - it promises that if the precondition
is fulfilled before the call, the postcondition will be fulfilled after
the call. It says /nothing/ about what will happen if the precondition
is not fulfilled. As a caller of the function, my job is to make sure
the precondition is fulfilled, and then I will rely on the postcondition
after the call. Just as the caller can rely on the output of sqrt(9) to
be 3, since that's the promise made by the function specification, the
function can rely on x >= 0, since that's the promise made by the caller.

If you can't rely on the calling code to do its job, what makes you
think you can rely on the called function to do its job? What makes you
think you can rely on anything here? The whole exercise becomes
pointless. The whole concept of a "precondition" is nonsensical if the
function implementation cannot rely on it being true!


>> so the exception handling path will never be taken and cannot be
>> tested without writing extra test code to simulate an impossible
>> situation.  That's more extra code, more testing, more risk of errors,
>> more inefficiencies, for exactly /zero/ benefit.
>
> Pre- and postconditions are for correctness proofs, not for testing. It
> is stupid to test program proven correct.
>

Well, that's /very/ questionable - you still have to test that the
proofs have been made correctly. Correctness proofs and tests work
together, not as alternatives.

And preconditions and postconditions are primarily contracts for the
implementation of functions and calling code, with correctness proofs an
optional (but useful) extra.


However, I was not talking about testing the pre- or postconditions - I
was talking about testing the exception handling code I, as the caller,
have to write since the function may throw an exception.

>> Or I could leave the exception unhandled,
>
> You cannot do that. Again, it is about proofs. If your subprogram has no
> exception in its postcondition you could not prove it correct unless x
> is non-negative.
>
> If you do not prove correctness, then exception propagation is just
> defined behavior as opposed to undefined one.
>

This kind of misuse of exceptions to hide bugs in code is not about
proofs - it is about arse-covering. It has nothing to do with making
code safer or more reliable, or being able to quantify risks or
reliability. It's all about being able to stand up in court and say
that plane crash was not /my/ fault - I put checks in the code and threw
an exception. It doesn't matter if the exception was part of the cause
of the problem by forcing the function caller to add so much extra code
that it increased the risks. You don't care that instead of a clearly
bad value being shown on a display, the attempted negative square root
now lead to an unhandled exception that brought down the whole system.
It is not /your/ fault.

Oh wait, you claimed that the person writing the calling code couldn't
possibly have forgotten to handle the exception, because then they
wouldn't have been able to prove their code is correct. So you trust
them to have the skills, training, and resources to do formal
correctness proofs on their code - but you don't trust them to be able
to see if a value is negative? Do you not see anything wrong with that
"logic" ?

>> Exceptions are a valid mechanism for handling unusual and rare
>> run-time circumstances, such as a database transaction failure, loss
>> of contact in a network connection, and so on.  They are /not/ a
>> crutch for poor programming and coding errors!
>
> Sure undefined behavior is much better than that! (:-))
>

The point about undefined behaviour is that it does not happen. You
define what /does/ happen. You say the function is defined for x >= 0.
The caller ensures that it is called with x >= 0. The undefined
behaviour does not happen - end of story.

It is /better/ to leave situations that must not be allowed to occur, as
undefined. It is clearer and simpler for everyone. One set of rules -
everyone has a job to do, and knows what it is.

With your specification, roles are mixed - should the caller check for
negative x? Is that the function's job? If the caller has checked
this, does it really have to handle an exception? Logically no, but
maybe the language requires it somewhere. It's all a bloody mess, and
much more likely to end in tears.

>> The second version of the specification is clear and simple.
>
> It leaves behavior undefined without any good reason. Strong
> precondition if not an utter necessity is a grave design fault.
>

/Please/ read a book about how to write specifications for functions,
and how to program using them. There's a good one called "Programming
from Specifications" that is freely available - though it is quite
theoretical and mathematical.

In short - you define behaviour for what you want to happen for valid
inputs. It doesn't matter what happens for invalid inputs - there is no
sensible value (in the real numbers) to assign to the square root of
negative numbers. Any attempt to specify that in a square root function
is bound to fail - so don't try. Complicating things by saying your
function called "sqrt" isn't really a square root function, but a sort
of mishmash that sometimes does square roots, and sometimes jumps wildly
about in your code is not helpful.

You are concerned about proving code correctness. With my
specifications, I know that after "y = sqrt(x)", y * y equals x. That's
nice, and very helpful in a correctness proof - as well as simply making
the code clear to the reader. With your specification, you don't know
that. It might be true - or it might not. Perhaps "y" doesn't exist
because it has been skipped by the hidden long jump. Again, a mess.

How about debugging? After all, real programmers make mistakes. With
undefined behaviour, debugging tools (such as "sanitizer" options in
modern compilers) can spot many kinds of undefined behaviour and give
the developer a nice report about the problem. It can be combined with
run-time precondition checkers for even more helpful information.

But your version doesn't have undefined behaviour - throwing an
exception is part of the specification. That means the debugger or
sanitizer can't help you here - the caller gave a negative number, but
the precondition checked out okay. The problem propagates until you
have lost all chance of finding where it originated.

>>   The implementer knows the function is only ever called with
>> non-negative values and can efficiently calculate the square root.
>
> He does not know that without correctness proofs, which are rarely
> possible and rarely used.
>

He knows because he wrote correct code that calls the function
appropriately.

>> The responsibility for using appropriate inputs is where it should be
>> - with the caller.
>
> Exactly, shoveling your problems to the caller, causing distributed
> overhead and making sure of having untraceable bugs in the production
> code under the motto:
>
>    "no qualified programmer would ever do X"
>

No, it is not passing on /your/ problems to anyone. It is letting the
caller take responsibility for /their/ code. That is their job - not
mine, as the function implementer. My job is to make sure the
postcondition is fulfilled, assuming the precondition is satisfied. The
caller programmer's job is to make sure the precondition is fulfilled,
and then they can rely on the postcondition.

Jumbling this up and making a precondition of "true" helps no one - it
just messes up the roles so that no one knows what is going on or who is
doing what.

Dmitry A. Kazakov

unread,
Dec 5, 2022, 3:39:17 PM12/5/22
to
On 2022-12-05 17:56, David Brown wrote:
> On 05/12/2022 16:14, Dmitry A. Kazakov wrote:
>> On 2022-12-05 15:38, David Brown wrote:
>>> On 05/12/2022 12:06, Dmitry A. Kazakov wrote:
>>>> On 2022-12-05 11:40, David Brown wrote:
>>
>>>> The task of a specification designer is to mandate weakest
>>>> precondition possible. Any precondition is a heavy burden on the
>>>> code user and a source of bugs.
>>>>
>>>
>>> No.
>>>
>>> A good strong precondition makes the use of the function clear, the
>>> implementation simpler and more efficient, and the whole thing easier
>>> to test.
>>
>> Strong precondition is a booby trap.
>
> Only if it is not clear and documented.

I see it in the boldest font available: "garbage ahead, proceed at your
own risk."

>>> A weak precondition makes the function more flexible to call, but
>>> harder to implement.
>>
>> A weak precondition guarantees defined behavior.
>
> Which is pointless if the defined behaviour is useless or wrong.

Any defined behavior is better than any undefined one.

>>>> People frequently confuse preconditions with erroneous inputs.
>>>
>>> If the inputs satisfy the preconditions, they are not erroneous.
>>
>> They are. Violated precondition is a bug. Erroneous input is just
>> anticipated event, e.g. a user typed 68 minutes.
>
> That all depends on your definitions of "error" and "bug".

Error is an undesired, exceptional state. Bug is an illegal state.

> If you specify your square root function with a precondition of "true",
> then no input is erroneous.

No. It is rather that such input is a caller's bug, i.e. a contract
violation. Negative input is erroneous, exceptional input - no different
from inputs causing + to overflow. Care to write a strong precondition
for +, *, /, gamma, etc.?

> Some values might be senseless in
> mathematical terms, and be the result of bugs in the code, but you have
> specified your function to be happy with senseless input - it is
> therefore not an error.

Yes, it is not a bug, because a computational model is only a model.
Most modeled things are incomputable. Being an inadequate model is not
yet a bug so long as you can detect and handle the inadequacy.

>>> (What satisfies the preconditions may vary at run-time - for example,
>>> a "draw rectangle" function may have a precondition that the
>>> coordinates must be within the current window frame.  That kind of
>>> thing is usually a bad idea for specifications, but people don't
>>> always write good specifications!)
>>>
>>>> The precondition of real sqrt is plain True (assuming types are
>>>> enforced). Negative argument is as valid as positive:
>>>>
>>>> Pre: True
>>>> Post: (x >= 0 and sqrt(x)**2 = x) or (x < 0 and exception raised)
>>>>
>>>> Now a bad design would be:
>>>>
>>>> Pre: x >= 0
>>>> Post: sqrt(x)**2 = x
>>>>
>>>> Here you have undefined behavior for x < 0.
>>>
>>> The second version is /far/ better than the first.
>>>
>>> The first version has a confusing specification.  It leaves the user
>>> wondering what it really does when reading the precondition - will
>>> this function give me the square root of negative numbers?  Is it
>>> giving complex results, or saturating, or expecting me to multiply
>>> the result by "i" myself?
>>
>> It unambiguously states that sqrt has a real argument.
>
> But not that the result of the function is real - that is not specified.

Because it cannot specify what is mathematically incorrect.

>>> Then the user reads the postcondition and sees that for negative x it
>>> raises an exception.  Now my code needs to handle the possibility of
>>> an exception - that's not what I want!
>>
>> If you do not want it, don't pass negative number as an argument.
>
> I don't want the possibility to exist in my code.

Use a different type. E.g. define sqrt on a subtype:

subtype Non_Negative_Float is Float range 0.0..Float'Last;
function Sqrt (X : Non_Negative_Float) return Float;

Note that this does not save you, because it simply moves the
postcondition to the constraint check.

sqrt(-1.0)

will still propagate exception: Constraint_Error instead of, maybe,
Numeric_Error.

> I don't want to use a
> function that might throw an exception - that would just mean more
> effort on my part to deal with the possibility and test that handler code.

You don't use arithmetic +,-,*,/? That is very courageous...

>>> I know I never give the sqrt function a negative number,
>>
>> And how do you know that?
>
> That's my side of the bargain.

Not if your code is used by somebody else in turn.

> Software development is all about contracts.  A function has a
> precondition and a postcondition - it promises that if the precondition
> is fulfilled before the call, the postcondition will be fulfilled after
> the call.  It says /nothing/ about what will happen if the precondition
> is not fulfilled.

Yep, and we don't want such contracts unless absolutely necessary.

> If you can't rely on the calling code to do its job, what makes you
> think you can rely on the called function to do its job?

Each party must fulfill its side of contract.

The advice about strength addresses the way contracts are designed: to
be useful, safer and easier to fulfill.

> The whole concept of a "precondition" is nonsensical if the
> function implementation cannot rely on it being true!

Which is why the precondition must be weakest possible. Thanks.

>>> so the exception handling path will never be taken and cannot be
>>> tested without writing extra test code to simulate an impossible
>>> situation.  That's more extra code, more testing, more risk of
>>> errors, more inefficiencies, for exactly /zero/ benefit.
>>
>> Pre- and postconditions are for correctness proofs, not for testing.
>> It is stupid to test program proven correct.
>
> Well, that's /very/ questionable - you still have to test that the
> proofs have been made correctly.

That is a test of the prover, not a test of the application program. It
is as stupid as writing compiler, linker, or OS tests for the
application. That the prover, compiler, linker, CPU, and Maxwell's
equations work is a *precondition*!

> Correctness proofs and tests work
> together, not as alternatives.

Nope. Applied to the *same* object, they are complementary.

> However, I was not talking about testing the pre- or postconditions - I
> was talking about testing the exception handling code I, as the caller,
> have to write since the function may throw an exception.

No, you have to write a test since the caller is not proven not to cause
the function to propagate an exception. Again, what is the object here?
What do you want to test? The caller, the callee, the compiler, the CPU?

>>> Or I could leave the exception unhandled,
>>
>> You cannot do that. Again, it is about proofs. If your subprogram has
>> no exception in its postcondition you could not prove it correct
>> unless x is non-negative.
>>
>> If you do not prove correctness, then exception propagation is just
>> defined behavior as opposed to undefined one.
>
> This kind of misuse of exceptions to hide bugs in code is not about
> proofs - it is about arse-covering.

No, it is about keeping contracts clean. It is impossible to define
model types that would produce mathematically correct results. Start
with + and produce types of X and Y such that the result is always
mathematically correct X + Y. Exceptions are a method to fill the gaps
between the model and the problem space without breaking the abstraction
altogether.

> Oh wait, you claimed that the person writing the calling code couldn't
> possibly have forgotten to handle the exception, because then they
> wouldn't have been able to prove their code is correct.  So you trust
> them to have the skills, training, and resources to do formal
> correctness proofs on their code - but you don't trust them to be able
> to see if a value is negative?

Yep, and this is trivially a halting problem:

   X := 2.0;
   if HALT (p) then
      X := -1.0;
   end if;
   X := sqrt (X);

And, BTW, the prover does proofs, not the developer.

In the above the prover would say: "Sorry, I cannot prove the
postcondition saying no exception on negative argument. Try better."

There are basically two cases #1 with prover, #2 without.

#1 The precondition plays no role. It is strictly the same to prove that
X>=0 and that sqrt does not raise an exception. One directly implies the
other.

#2 The exception is a massive advantage. Since it cannot be proven that
X>=0, X<0 must be expected, and exception propagation allows the most
eager detection. The precondition, by contrast, hides the problem:
sqrt(-1.0) might return 13.2 and the application will continue normally,
generating garbage.

>>> Exceptions are a valid mechanism for handling unusual and rare
>>> run-time circumstances, such as a database transaction failure, loss
>>> of contact in a network connection, and so on.  They are /not/ a
>>> crutch for poor programming and coding errors!
>>
>> Sure undefined behavior is much better than that! (:-))
>>
>
> The point about undefined behaviour is that it does not happen.

sqrt(-1.0)

> It is /better/ to leave situations that must not be allowed to occur, as
> undefined.

sqrt(-1.0)

> With your specification, roles are mixed - should the caller check for
> negative x?

Nothing in the contract of sqrt requires it. You are confusing contract
parties. Without seeing the caller's contract nothing can be said.

> Is that the function's job?

The function's job is to implement the contract. How it does this is not
the caller's business.

> If the caller has checked
> this, does it really have to handle an exception?

It is not sqrt's business what the caller does with the result. It is
between the caller and the caller's caller.

>>> The second version of the specification is clear and simple.
>>
>> It leaves behavior undefined without any good reason. Strong
>> precondition if not an utter necessity is a grave design fault.
>
> /Please/ read a book about how to write specifications for functions,
> and how to program using them.  There's a good one called "Programming
> from Specifications" that is freely available - though it is quite
> theoretical and mathematical.
>
> In short - you define behaviour for what you want to happen for valid
> inputs.

-1.0 is a valid input; it is a value of the type. -1.0 is a real value.

[...]

> How about debugging?  After all, real programmers make mistakes.  With
> undefined behaviour, debugging tools (such as "sanitizer" options in
> modern compilers) can spot many kinds of undefined behaviour and give
> the developer a nice report about the problem.

You fundamentally cannot detect undefined behavior, because any behavior
at all may occur when it is undefined. It is even allowed to fire a
gamma jet that would roast the silly programmer...

> But your version doesn't have undefined behaviour - throwing an
> exception is part of the specification.

Yep, and it can be trivially verified by tests or in any debugger.

>>>   The implementer knows the function is only ever called with
>>> non-negative values and can efficiently calculate the square root.
>>
>> He does not know that without correctness proofs, which are rarely
>> possible and rarely used.
>
> He knows because he wrote correct code that calls the function
> appropriately.

He thinks he knows...

>>> The responsibility for using appropriate inputs is where it should be
>>> - with the caller.
>>
>> Exactly, shoveling your problems to the caller, causing distributed
>> overhead and making sure of having untraceable bugs in the production
>> code under the motto:
>>
>>     "no qualified programmer would ever do X"
>
> No, it is not passing on /your/ problems to anyone.  It is letting the
> caller take responsibility for /their/ code.

"no qualified programmer would forget to check his code"

James Harris

unread,
Dec 5, 2022, 5:08:53 PM12/5/22
to
On 04/12/2022 22:23, David Brown wrote:
> On 04/12/2022 14:54, James Harris wrote:
>> On 04/12/2022 12:31, David Brown wrote:
>>> On 03/12/2022 20:17, James Harris wrote:

...

>>>> For example, who decided that buffers in a chain couldn't be
>>>> enlarged? Who invented that prohibition and why is it there?

...

>>> Look, at the lowest level, buffers are /never/ enlarged.
>>
>> Rubbish! Just because /you/ never want such a thing does not mean that
>> others do not want it, nor does your dislike for it or your lack of
>> experience of it mean that it's a bad idea.
>>
>
> They are /not/ enlarged.

On the contrary, in what I proposed buffers /would be/ enlarged. You are
evidently arguing against my proposal when you don't understand it.
Please put aside whatever model you have in mind and consider the
proposal if you want to raise objections to it!! We've been arguing
about this for over a week, now, and it's not a good use of time for
objections to be unrelated to the proposal they are supposed to be
against. :-(

>
> You allocate some memory for a buffer.  You allocate more memory later
> for something else - it gets put after the first buffer.  How are you
> going to enlarge the first buffer?  Stomp all over the second allocation?

I cannot work out what model you have in mind but it doesn't seem to
match what I proposed. To illustrate, say there is a chain of two buffers:

buffer A is 1000 bytes
buffer B is 100 bytes

A string already occupies all of A and B. The application tells the
string library to append 80 bytes to the string. There is no room in B
so the string library asks the memory allocator to resize B in place to
make it large enough. Let's say the allocator finds 20 bytes of free
space after B but not the 80 asked for. It makes B 20 bytes longer. As
that is not long enough, the string library then requests space for
another buffer, C, and adds it to the chain; it would then put 20 bytes
in B and 60 in C.

Please note especially that last sentence.

What allows the string library to know? Two calls:

resize(B, ....) - requests resizing of B in place
msize(B) - requests the actual size of B

You may not like the proposal but if you are going to continue to
object to it, please say /why/ it wouldn't work rather than just saying
"that's not how it's done". How things are done ATM in your experience
is irrelevant.
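The append logic described above can be sketched in C. Note the hedging: `m_resize` is James's proposed call, stubbed here (this toy stub never finds free space after the block), and the `Buf` structure is invented for illustration; the chain-growing logic is the point.

```c
#include <stdlib.h>
#include <string.h>

/* One link in a chain of string buffers. */
typedef struct Buf {
    struct Buf *next;
    size_t cap;    /* usable bytes */
    size_t len;    /* bytes in use */
    char  *data;
} Buf;

/* Stub for the proposed m_resize: try to grow the block in place and
   return the new actual size. A real allocator would claim whatever
   free space follows p; this toy version never finds any. */
size_t m_resize(char *p, size_t cur, size_t want) {
    (void)p; (void)want;
    return cur;
}

Buf *new_buf(size_t cap) {
    Buf *b = malloc(sizeof *b);       /* error checks omitted */
    b->next = NULL; b->cap = cap; b->len = 0;
    b->data = malloc(cap);
    return b;
}

/* Append n bytes to the last buffer in the chain: first ask the
   allocator to grow in place, fill whatever room that yields, then
   chain a new buffer only for the remainder. */
void append(Buf *last, const char *src, size_t n) {
    last->cap = m_resize(last->data, last->cap, last->len + n);
    size_t fit = last->cap - last->len;
    if (fit > n) fit = n;
    memcpy(last->data + last->len, src, fit);
    last->len += fit;
    if (fit < n) {
        Buf *c = new_buf(n - fit);
        memcpy(c->data, src + fit, n - fit);
        c->len = n - fit;
        last->next = c;
    }
}
```

With the numbers from the example, a real `m_resize` would report B grown from 100 to 120 bytes, and `append` would put 20 bytes in B and chain a 60-byte C.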

...

> If the string type is part of the language's basic standard library (and
> I'd recommend that),

OK.

> then the buffer size is going to be picked to match
> the allocation size from the standard library allocator used.  It would
> be downright idiotic to have an allocator that, say, rounds sizes up to
> 32-byte blocks and then have a string structure that was 20 bytes long,
> supporting short strings of up to 18 bytes.  It would be even dafter to
> have a string structure that is of varying size and have users
> re-allocate it to grow it to use the extra size that the library (not
> the user) knows is always there.

There is no need to make the string library dependent on a particular
allocator.

...

>> Well, the software would be in layers.
>>
>>       app
>>        |
>>        V
>>   string library
>>        |
>>        V
>>    allocator
>>
>
> Yes.
>
>> What I've proposed in this thread is a simple and small change to the
>> allocator which would allow the string library to make best use of the
>> spaces which the allocator returns to the string library.
>>
>
> No.

What do you mean by No? Are you trying to tell me I didn't propose what
I proposed?

>
> The simple method is that the string library knows the size of chunks
> provided by the allocator, and always uses that information.

That is a poor design. The dependency you propose is unnecessary. There
is no need for a library to be bound to a particular allocator.

>
>> I am not sure why you keep objecting to that but I am pretty sure you
>> and I aren't going to agree on this stuff. Still, no problem.
>> Agreement is not mandatory. :-)
>>
>
> It would be boring indeed if we all agreed!  /Disagreement/ is mandatory
> for an interesting thread :-)
>
> But we do agree on many things.

Indeed, as programmers I get the impression that we agree on most
things. These discussions, by their nature, focus on the areas of
disagreement.


--
James Harris


James Harris

unread,
Dec 5, 2022, 5:25:17 PM12/5/22
to
On 04/12/2022 22:59, David Brown wrote:
> On 04/12/2022 19:21, James Harris wrote:

...

>> The case in point is two libraries: a memory allocator and a string
>> library which uses the memory allocator. What I am saying to you is
>> that such libraries need to be correct but they also need to scale to
>> large strings, many strings, many string operations, many other
>> non-string allocation activity, etc, because it's impossible to say
>> just now how they will be used. It's no good to just "focus on
>> correctness".
>>
>
> I did not say that you should ignore performance or efficiency.  I said
> /correctness/ trumps performance every time.  /Every/ time.  No exceptions.

You can say it yet again but that doesn't mean it's true. Correctness is
essential, of course, but for some programs which need hard real-time
guarantees performance is equally important.

>
> If particular specific performance expectations are part of the
> requirements, then they are part of /correctness/.

Indeed.

>
> If you are just looking for "as fast as practically possible given the
> budget constraints for the development process", then that's fine too -
> but it /always/ comes secondary to correctness.

That is at least a development on what you said before but it is still
impractical. If you implemented a simple n-squared sort "due to budget
constraints" it may work fine for weeks but then hang a system which got
hit with a much larger data set. It does not scale. I dread to think how
much 'professional' and paid-for software has followed similar
principles but I've been unfortunate enough to use some of it.


--
James Harris


David Brown

Dec 5, 2022, 5:31:25 PM12/5/22
to
On 05/12/2022 21:39, Dmitry A. Kazakov wrote:
> On 2022-12-05 17:56, David Brown wrote:
>> On 05/12/2022 16:14, Dmitry A. Kazakov wrote:
>>> On 2022-12-05 15:38, David Brown wrote:
>>>> On 05/12/2022 12:06, Dmitry A. Kazakov wrote:
>>>>> On 2022-12-05 11:40, David Brown wrote:
>>>
>>>>> The task of a specification designer is to mandate weakest
>>>>> precondition possible. Any precondition is a heavy burden on the
>>>>> code user and a source of bugs.
>>>>>
>>>>
>>>> No.
>>>>
>>>> A good strong precondition makes the use of the function clear, the
>>>> implementation simpler and more efficient, and the whole thing
>>>> easier to test.
>>>
>>> Strong precondition is a booby trap.
>>
>> Only if it is not clear and documented.
>
> I see it in the boldest font available: "garbage ahead, proceed at your
> own risk."
>
>>>> A weak precondition makes the function more flexible to call, but
>>>> harder to implement.
>>>
>>> Weak precondition guaranties defined behavior.
>>
>> Which is pointless if the defined behaviour is useless or wrong.
>
> Any defined behavior is better than any undefined one.
>

Wrong. I've given clear arguments why that is the case - if they don't
convince you, there's little point in repeating them.

>>>>> People frequently confuse preconditions with erroneous inputs.
>>>>
>>>> If the inputs satisfy the preconditions, they are not erroneous.
>>>
>>> They are. Violated precondition is a bug. Erroneous input is just
>>> anticipated event, e.g. a user typed 68 minutes.
>>
>> That all depends on your definitions of "error" and "bug".
>
> Error is an undesired, exceptional state. Bug is an illegal state.
>

That's one choice of definitions - it is very far from a universal
choice, but we'll go with them for now.

>> If you specify your square root function with a precondition of
>> "true", then no input is erroneous.
>
> No. It is so that input is a caller's bug = contract violation. Negative
> input is an erroneous, exceptional input. No difference to inputs
> causing + to overflow. Care to write a strong precondition for +, *, /,
> gamma etc?

int add_int(int a, int b);
precondition: (a + b >= INT_MIN) && (a + b <= INT_MAX)

It's not rocket science.

And yet you want to specify what a square root function should do when
given this mathematically unworkable input.

>>>> Then the user reads the postcondition and sees that for negative x
>>>> it raises an exception.  Now my code needs to handle the possibility
>>>> of an exception - that's not what I want!
>>>
>>> If you do not want it, don't pass negative number as an argument.
>>
>> I don't want the possibility to exist in my code.
>
> Use a different type. E.g. define sqrt on a subtype:
>
>    subtype Non_Negative_Float is Float range 0.0..Float'Last;
>    function Sqrt (X : Non_Negative_Float) return Float;
>

You could do that, yes.

> Note that this does not save you. Because it simply moves the
> postcondition to the constraint check.
>
>    sqrt(-1.0)
>
> will still propagate exception: Constraint_Error instead of, maybe,
> Numeric_Error.

That's all up to the typing system you use. Presumably that would be a
compile-time error, not an exception.

>
>> I don't want to use a function that might throw an exception - that
>> would just mean more effort on my part to deal with the possibility
>> and test that handler code.
>
> You don't use arithmetic +,-,*,/? That is very courageous...
>

These don't throw exceptions in the languages I use. In some languages,
they are specified with constraints on their inputs. In other
languages, arbitrary precision arithmetic is used to get correct answers.

(In high level programming, such as Python, I do make use of exceptions
for the appropriate purposes - exceptional run-time events, rather than
program errors.)


>>>> I know I never give the sqrt function a negative number,
>>>
>>> And how do know that?
>>
>> That's my side of the bargain.
>
> Not if your code is used by somebody else in turn.

They do their job too.

I don't comprehend how you assume the person calling "sqrt" is so
incompetent, and has such an incompetent development process and team,
that they might call it with a negative number and not find the problem
unless the implementation throws an exception. And yet you think that
same programmer is competent to write essentially untestable exception
handling code without flaw, dealing with the vastly more complex
situation of code that runs partially and stops unexpectedly.

>
>> Software development is all about contracts.  A function has a
>> precondition and a postcondition - it promises that if the
>> precondition is fulfilled before the call, the postcondition will be
>> fulfilled after the call.  It says /nothing/ about what will happen if
>> the precondition is not fulfilled.
>
> Yep, and we don't want such contracts unless absolutely necessary.
>

They are necessary. We want them. They make life simple and clear, and
everyone knows what they are doing and gets it right.

>> If you can't rely on the calling code to do its job, what makes you
>> think you can rely on the called function to do its job?
>
> Each party must fulfill its side of contract.
>

Yes.

> The advise about strengths addresses the way contracts are designed to
> be useful, safer and easier to fulfill.
>

But it is wrong advice - it makes everything /harder/.

>> The whole concept of a "precondition" is nonsensical if the function
>> implementation cannot rely on it being true!
>
> Which is why the precondition must be weakest possible. Thanks.
>

From the function implementer's viewpoint, it should be as /strong/ as
possible. And regardless of how weak or strong it is, the function
implementer can rely upon it - assuming you are actually doing software
engineering and not just faffing around getting paid by the number of
lines of code you throw out.

>>>> so the exception handling path will never be taken and cannot be
>>>> tested without writing extra test code to simulate an impossible
>>>> situation.  That's more extra code, more testing, more risk of
>>>> errors, more inefficiencies, for exactly /zero/ benefit.
>>>
>>> Pre- and postconditions are for correctness proofs, not for testing.
>>> It is stupid to test program proven correct.
>>
>> Well, that's /very/ questionable - you still have to test that the
>> proofs have been made correctly.
>
> That is a test of the prover, not a test of the application program. It
> is as stupid as write compiler, linker, OS tests for the application.
> That prover, compiler, linker, CPU, Maxwell's equations work is a
> *precondition*!
>
>> Correctness proofs and tests work together, not as alternatives.
>
> Nope. Applied to the *same* object, they are complementary.
>

No, they work together. Some things are best verified by proofs, some
things by testing, some things by both. A /vastly/ better programmer
than you or I said "Be careful of this code. I have only proven it
correct, not tested it."

>> However, I was not talking about testing the pre- or postconditions -
>> I was talking about testing the exception handling code I, as the
>> caller, have to write since the function may throw an exception.
>
> No, you have to write a test since the caller is not proven not to cause
> the function to propagate an exception. Again, what is the object here?
> What do you want to test? The caller, the callee, the compiler, the CPU?
>
>>>> Or I could leave the exception unhandled,
>>>
>>> You cannot do that. Again, it is about proofs. If your subprogram has
>>> no exception in its postcondition you could not prove it correct
>>> unless x is non-negative.
>>>
>>> If you do not prove correctness, then exception propagation is just
>>> defined behavior as opposed to undefined one.
>>
>> This kind of misuse of exceptions to hide bugs in code is not about
>> proofs - it is about arse-covering.
>
> No, it is about keeping contracts clean.

What could be cleaner than the contract for a square root function that
returns the square root of valid inputs? Certainly not some drivel that
pulls exception handling and all its types, hierarchies, handlers and
mechanisms into the picture.

> It is impossible to define
> model types that would produce mathematically correct results. Start
> with + and produce types of X and Y such that the result is always
> mathematically correct X + Y. Exceptions is a method to fill the gaps
> between the model and the problem space without breaking the abstraction
> altogether.
>
>> Oh wait, you claimed that the person writing the calling code couldn't
>> possibly have forgotten to handle the exception, because then they
>> wouldn't have been able to prove their code is correct.  So you trust
>> them to have the skills, training, and resources to do formal
>> correctness proofs on their code - but you don't trust them to be able
>> to see if a value is negative?
>
> Yep, and this is trivially a halting problem:
>
>    X := 2.0;
>    if HALT(p) then
>       X := -1.0;
>    end if;
>    X := sqrt (X);
>
> And, BTW, prover does proofs, not the developer.
>

So you are not sure if the programmer might try to write a halting
machine inside their function that calls square root? Okay, I suppose,
if you are /that/ paranoid - but why should the person specifying or
implementing the square root function have to deal with that?

> In the above the prover would say: "Sorry, I cannot prove the
> postcondition saying no exception on negative argument. Try better."
>
> There are basically two cases #1 with prover, #2 without.
>
> #1 Precondition plays no role. It is strictly same to prove that X>=0
> and that sqrt does raise exception. One directly implies another.
>
> #2 Exception was massive advantage. Since it cannot be proven that X>=0,
> then it is expected to be X<0 and exception propagation allows most
> eager detection. While precondition hides the problem. sqrt(-1.0) might
> return 13.2 and the application will continue normally generating garbage.
>
>>>> Exceptions are a valid mechanism for handling unusual and rare
>>>> run-time circumstances, such as a database transaction failure, loss
>>>> of contact in a network connection, and so on.  They are /not/ a
>>>> crutch for poor programming and coding errors!
>>>
>>> Sure undefined behavior is much better than that! (:-))
>>>
>>
>> The point about undefined behaviour is that it does not happen.
>
> sqrt(-1.0)
>
>> It is /better/ to leave situations that must not be allowed to occur,
>> as undefined.
>
> sqrt(-1.0)

The problem lies with the programmer that uses the function, or their
development team or process that failed to catch the bug. The function
specifier is not responsible for that, and cannot fix the problem
anyway. So why make everything more complicated and higher risk?

Do you really think it is realistic that in a real program, a mistake
leading to an attempt at finding a negative square root is because
someone wrote "sqrt(-1.0)", or perhaps "sqrt(-x)" instead of "sqrt(x)" ?
No, the realistic source of the problem would be far back in the code
- throwing an exception at this point will help very little. Do you
really think the code would be able to recover sensibly and handle the
situation gracefully when you throw an exception? No, that has no
rooting in real life - at the point where the negative square root is
called, the code's calculations and data are so screwed up that you
won't be able to help.

>
>> With your specification, roles are mixed - should the caller check for
>> negative x?
>
> Nothing in the contract of sqrt requires it. You are confusing contract
> parties. Without seeing the caller's contract nothing can be said.
>

The specifications for the function are the contract.

>> Is that the function's job?
>
> The function's job is to implement the contract. How it does this is not
> the caller's business.
>

Equally, the caller's job is to ensure that it fulfils /its/ side of the
contract (the precondition). How it does this is not the callee's business.

>> If the caller has checked this, does it really have to handle an
>> exception?
>
> It is not sqrt business what the caller does with the result. It is
> between the caller and the caller's caller.
>

Indeed that is true. And equally, it is not sqrt's business how the
caller ensures that it has satisfied the precondition before calling it.
The specifier of sqrt has no interest in making a pointlessly weak
precondition that forces a pointlessly complex postcondition. Just let
the caller ensure that it only passes non-negative numbers.

>>>> The second version of the specification is clear and simple.
>>>
>>> It leaves behavior undefined without any good reason. Strong
>>> precondition if not an utter necessity is a grave design fault.
>>
>> /Please/ read a book about how to write specifications for functions,
>> and how to program using them.  There's a good one called "Programming
>> from Specifications" that is freely available - though it is quite
>> theoretical and mathematical.
>>
>> In short - you define behaviour for what you want to happen for valid
>> inputs.
>
> -1.0 is a valid input, it is a value of the type. -1.0 is a real value.
>

Not to a square root function that is worthy of the name.

> [...]
>
>> How about debugging?  After all, real programmers make mistakes.  With
>> undefined behaviour, debugging tools (such as "sanitizer" options in
>> modern compilers) can spot many kinds of undefined behaviour and give
>> the developer a nice report about the problem.
>
> You fundamentally cannot detect undefined behavior, because any behavior
> can realize when undefined. It is allowed to fire a gamma jet that would
> roast the silly programmer...
>

Yes. But you can spot many types of undefined behaviour, both through
static analysis and run-time checking in debug modes.

>> But your version doesn't have undefined behaviour - throwing an
>> exception is part of the specification.
>
> Yep, and it can be trivially verified by tests or in any debugger.
>

It is not a bug. The debugger and tools don't know this defined
behaviour is not intentional - because it is fully defined behaviour
that the function caller can rely on.

>>>>   The implementer knows the function is only ever called with
>>>> non-negative values and can efficiently calculate the square root.
>>>
>>> He does not know that without correctness proofs, which are rarely
>>> possible and rarely used.
>>
>> He knows because he wrote correct code that calls the function
>> appropriately.
>
> He thinks he knows...
>
>>>> The responsibility for using appropriate inputs is where it should
>>>> be - with the caller.
>>>
>>> Exactly, shoveling your problems to the caller, causing distributed
>>> overhead and making sure of having untraceable bugs in the production
>>> code under to motto:
>>>
>>>     "no qualified programmer would ever do X"
>>
>> No, it is not passing on /your/ problems to anyone.  It is letting the
>> caller take responsibility for /their/ code.
>
> "no qualified programmer would forget to check his code"
>

Programmers can make mistakes - including the people calling sqrt, the
people implementing it, and the people specifying it. It makes sense to
make life as clear and simple as practically possible for them.

When you give someone a recipe for meringues, you say "Beat the egg
whites then slowly add the sugar while beating all the time in order to
make the meringue mixture". You don't add "alternatively, add the sugar
then try to beat it in order to make a mess". In pretty much every walk
of life, we specify the expected input to get the desired output, and do
not concern ourselves about what happens when someone puts in garbage to
the set of instructions. Why would you want to do something different
for programming?



James Harris

Dec 5, 2022, 5:32:49 PM12/5/22
to
On 04/12/2022 23:02, Bart wrote:
> On 04/12/2022 15:56, James Harris wrote:

...

>> Again, I wasn't talking about strings but a general case. I thought it
>> was you who, along with David, suggested a 'free' which included the
>> length as in
>>
>>    free(address, length)

...

> Freeing a hole in the middle is not going to be much better in either
> case, because of this behind-scenes size rounding, assumptions about
> alignment, and the machinations of malloc.

Sure. I don't have a use case, either. I was just interested in the fact
that unmap appears to allow frees from within the middle of allocations
and your proposal allowed similar.

You mention malloc and alignment but if arbitrary frees were to be
allowed then they'd probably have to be at the granularity of individual
memory cells; the structures which managed the allocations would have to
be stored elsewhere, not as malloc-style headers between each block.


--
James Harris



James Harris

Dec 5, 2022, 5:38:23 PM12/5/22
to
On 04/12/2022 23:12, David Brown wrote:
> On 04/12/2022 15:07, James Harris wrote:
>> On 04/12/2022 13:27, David Brown wrote:

...

>> I thought that was what you and Bart (who IIRC also mentioned
>> something similar) had in mind and I spend some time designing an
>> allocator to support splitting memory in that way down to the byte
>> level. It was an interesting exercise and I may use it one day but I
>> thought you had some use cases in mind.
>>
>>> Who cares?  Launch the nasal daemons.
>>
>> Never, never, and never!
>>
>
> Garbage in, garbage out.
>
> You cannot give the program sensible answers to stupid questions.  And
> no matter how much you try to check the program's requests for sanity
> and valid values, there will be a programmer who will outwit you and
> find new ways to be wrong that are beyond your imagination.  If you have
> a low-level language that allows things like pointers and conversions
> between points and integers, this will happen /very/ easily, and there
> is nothing you can do to stop it.

...

> The concept of undefined behaviour - of "garbage in, garbage out" - is
> as old as computing:

Garbage out is not the same as undefined behaviour. Here's an example of
garbage in leading to garbage out

Enter your age: 1000
You owe us $-1,234,567

Here's an example of UB

Enter your age: 1000
Deleting files in your home directory


Nasal demons should never be permitted.


--
James Harris


James Harris

Dec 5, 2022, 6:15:41 PM12/5/22
to
On 05/12/2022 11:35, Dmitry A. Kazakov wrote:
> On 2022-12-05 12:17, Bart wrote:
>> On 05/12/2022 11:06, Dmitry A. Kazakov wrote:

...

>>> People frequently confuse preconditions with erroneous inputs. The
>>> precondition or real sqrt is plain True (assuming types are
>>> enforced). Negative argument is as valid as positive:
>>>
>>> Pre: True
>>> Post: (x >= 0 and sqrt(x)**2 = x) or (x < 0 and exception raised)
>>>
>>> Now a bad design would be:
>>>
>>> Pre: x >= 0
>>> Post: sqrt(x)**2 = x
>>>
>>> Here you have undefined behavior for x < 0.

The behaviour of a call which does not meet a precondition may not be
evident in the code but may well be defined by the rules of the language.

>>
>> Not at all. For x<0, my sqrt() gives an invalid number (displayed as
>> -1.#IND00, depending on the stringify library).
>
> You did not understand the point.
>
> Your sqrt is not real-value sqrt at all, because the result is not a
> real number. However, if we considered rather sqrt defined on some
> subset of R U NaN, here NaN is a not-a-number ideal. Then the point can
> be easily made. Good design:
>
> Pre: True
> Post: (x /= NaN and x >= 0 and sqrt(x)**2 = x) or
>       (x /= NaN and x <  0 and sqrt(x) = NaN)  or
>       (x  = NaN and sqrt(x) = NaN)

Beware comparing floats for equality. sqrt(x)**2 may not exactly equal x
- which rather upsets the whole idea of such precise and tautologous
postconditions, doesn't it!


--
James Harris

James Harris

Dec 5, 2022, 6:36:42 PM12/5/22
to
On 05/12/2022 14:38, David Brown wrote:

...

> Exceptions are a valid mechanism for handling unusual and rare run-time
> circumstances, such as a database transaction failure, loss of contact
> in a network connection, and so on.  They are /not/ a crutch for poor
> programming and coding errors!

That's one school of thought. Another is that exceptions should also be
used for errors which a programmer would not normally want to test for
at each point where they can occur, so that his code can focus on the
logic of the task at hand, e.g. OOM in

p = memory_allocate(SIZE);

or IO error from

nbytes = read(fd, &buf, BUFSIZE);


--
James Harris


James Harris

Dec 5, 2022, 6:40:52 PM12/5/22
to
On 05/12/2022 16:56, David Brown wrote:

...

> Remember the old joke about politicians in a crisis?  "We must do
> something!  This is something - therefore we must do it!".

That's known as the Politician's Syllogism: "We must do something. This
is something. Therefore, we must do this."

https://en.wikipedia.org/wiki/Politician%27s_syllogism


--
James Harris


James Harris

Dec 5, 2022, 6:47:54 PM12/5/22
to
On 05/12/2022 20:39, Dmitry A. Kazakov wrote:
> On 2022-12-05 17:56, David Brown wrote:

...

> Any defined behavior is better than any undefined one.

Indeed!

...

>> If you specify your square root function with a precondition of
>> "true", then no input is erroneous.
>
> No. It is so that input is a caller's bug = contract violation. Negative
> input is an erroneous, exceptional input. No difference to inputs
> causing + to overflow. Care to write a strong precondition for +, *, /,
> gamma etc?

Rather than a contract (which can lead to an error if the precondition
is not met) it may be better to /define/ the range of possible inputs to
a function: a call whose arguments do not meet the definition will not
even match the function which cannot handle them. For example,

sqrt(-4.0)

would not match with a definition

function sqrt(real value :: value >= 0) -> real

but would match with

function sqrt(real value) -> complex


--
James Harris


David Brown

Dec 6, 2022, 2:46:14 AM12/6/22
to
The first extension of B is an example of pissing around wasting effort
to squeeze in a few extra bytes. The rest is done by a /new/
allocation, not an enlargement. That means all the time spent trying to
figure out if you have an extra byte or two is wasted.

> Please note especially that last sentence.
>
> What allows the string library to know? Two calls:
>
>   resize(B, ....) - requests resizing of B in place
>   msize(B) - requests the actual size of B
>
> You may not like the proposal but if you are going to continue to
> object to it please say /why/ it wouldn't work rather than just saying
> "that's not how it's done". How things are done ATM in your experience
> is irrelevant.
>

Oh, I am sure it could be made to work.

The big question is why anyone would bother with the large amount of
extra effort and complication of code and data structures, and very
significant extra run-time overhead on the off-chance that they might
occasionally save a small number of bytes of heap. The answer, of
course, is that they would not do so.

There are basically two situations for strings. One case is that you
know its size - it's a literal, or immutable object. Allocate one
single buffer big enough to hold it, and you're done. No enlargement.

The other case is that it is unknown in size. The simple handling is to
use "realloc" as needed, growing as necessary. That's going to be very
efficient for a lot of cases, but poor for big strings. Or allocate
buffers appropriately as it grows, keeping a chain.

Why should you not faff around with "resize", "get_real_size" and all
your other unnecessary extras? Because it is is /slow/. It is extra
effort and library calls that slow down everything for so little
potential benefit.

Memory allocation is slow. Freeing memory is slow. You do it as rarely
as you can, and you do it as locally as you can. When you need
something that might grow or shrink by small sizes, you track it locally
in the data structure - your string, or dynamic array, or whatever, has
a "size" and a "capacity" concept. If "size" is already at "capacity"
and you need to grow, you take a /big/ lump of memory from the next
level, so that you won't have to ask for more any time soon. At the
library level you have an allocator that handles mid-sized lumps, from
pools that are kept per thread. When these run low, it asks for more
from the OS. And so on.

Any time the size of something is dynamic, you ask for memory in
generous lumps - usually in a power-of-two size for best alignment and
easy fitting from the allocator, or at least with a power-of-two factor
(such as a multiple of 32).

There are always trade-offs between wasting some memory, and avoiding
extra calls to allocators or having extra code to track the last few
bytes. For real code, you can afford to waste a /lot/ of memory and get
more efficient results.

> ...
>
>> If the string type is part of the language's basic standard library
>> (and I'd recommend that),
>
> OK.
>
>> then the buffer size is going to be picked to match the allocation
>> size from the standard library allocator used.  It would be downright
>> idiotic to have an allocator that, say, rounds sizes up to 32-byte
>> blocks and then have a string structure that was 20 bytes long,
>> supporting short strings of up to 18 bytes.  It would be even dafter
>> to have a string structure that is of varying size and have users
>> re-allocate it to grow it to use the extra size that the library (not
>> the user) knows is always there.
>
> There is no need to make the string library dependent on a particular
> allocator.
>

No, but there are benefits in optimising it based on knowledge about the
allocator.

If your string library already knows the size of allocation lumps, there
is never any point in calling "resize" - it will never give you
anything. You want to make the basic "string" structure a neat size and
with enough space for short strings (perhaps 12 - 20 characters), but
you make the lumps it allocates sized to be multiples of your basic
allocation size.


