Find data somewhere on the stack

Frederick Virchanza Gotham

Apr 23, 2023, 4:51:55 PM

The problem with pushing arbitrary data onto the stack, so that some other function further down the chain of function calls can retrieve it later, is that each function in the chain may increment and decrement the stack pointer here and there, so we won't know what offset to apply to the stack pointer in order to retrieve our data.

They say that if you generate a 128-bit random number, then it's effectively one of a kind: the chance of it ever being duplicated by accident is negligible. So let's generate our own UUID:

#define UUID "\x24\x31\x07\x26\x35\x2c\x4f\x9a\x99\x65\xe1\x10\x65\x62\x92\xcc"

So if we push this UUID onto the stack, and then place our data right beside it on the stack, then later on we can search the stack for this UUID and we'll find our data right beside it.
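The layout and the search can be tried out without touching the real stack. The following is a minimal sketch under stated assumptions: a zero-filled byte array stands in for the stack, the names write_record and find_record are hypothetical, and std::search replaces a hand-written loop. Unlike a scan of the real stack, this search is bounded, so a missing marker yields nullptr instead of a crash.

```cpp
#include <algorithm>  // std::search
#include <cassert>
#include <cstddef>    // std::size_t
#include <cstring>    // std::memcpy

// The 16 bytes of the UUID from the post, stored as unsigned char so that
// bytes above 0x7F compare equal to the bytes held in the buffer.
constexpr unsigned char kMarker[16] = {
    0x24, 0x31, 0x07, 0x26, 0x35, 0x2c, 0x4f, 0x9a,
    0x99, 0x65, 0xe1, 0x10, 0x65, 0x62, 0x92, 0xcc};

// Writes [marker][count][payload] into 'mem' starting at 'offset'.
// 'mem' is a hypothetical stand-in for the stack: an ordinary byte array.
void write_record(unsigned char *mem, std::size_t mem_size, std::size_t offset,
                  const void *payload, std::size_t count) {
    assert(offset + sizeof kMarker + sizeof count + count <= mem_size);
    std::memcpy(mem + offset, kMarker, sizeof kMarker);
    std::memcpy(mem + offset + sizeof kMarker, &count, sizeof count);
    std::memcpy(mem + offset + sizeof kMarker + sizeof count, payload, count);
}

// Scans [begin, end) from low to high addresses for the marker (the same
// direction retrieve_from_stack() walks). On success, stores the byte count
// in *count and returns a pointer to the payload; returns nullptr if no
// complete record lies inside the range.
const unsigned char *find_record(const unsigned char *begin,
                                 const unsigned char *end,
                                 std::size_t *count) {
    const unsigned char *hit =
        std::search(begin, end, kMarker, kMarker + sizeof kMarker);
    if (hit == end) return nullptr;
    const unsigned char *p = hit + sizeof kMarker;
    if (static_cast<std::size_t>(end - p) < sizeof *count) return nullptr;
    std::memcpy(count, p, sizeof *count);  // no alignment assumption needed
    p += sizeof *count;
    if (static_cast<std::size_t>(end - p) < *count) return nullptr;
    return p;
}
```

For instance, writing "Monkeys eat bananas" (20 bytes including its terminator) at offset 100 of a zero-filled 256-byte array and then calling find_record over the whole array should return a pointer just past the marker and the count (offset 124 on a typical 64-bit target) and set the count to 20.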

Now, at first I was going to write x86_64 assembly to push an arbitrary number of bytes onto the stack, but that would malfunction unless the compiler used the frame pointer exactly as I assumed, and I don't think that's guaranteed even if you supply '-fno-omit-frame-pointer'. Luckily, though, some compilers (GCC and Clang among them) provide 'alloca' as a built-in, '__builtin_alloca', and you can read about 'alloca' here:

https://man7.org/linux/man-pages/man3/alloca.3.html

So I'll use "__builtin_alloca" to decrement the stack pointer, copy the UUID onto the stack along with my data, and then later on search the stack for the UUID in order to retrieve the data located right beside it.
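The lifetime rule this plan relies on can be checked in isolation. The sketch below assumes GCC or Clang (for __builtin_alloca), and its function names are hypothetical. It allocates inside a do/while block, the same way the push_onto_stack macro further down does, and shows that the memory is still valid after the block ends, because alloca memory is released only when the calling function returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical callee standing in for a function further down the chain.
std::size_t callee_length(const char *s) { return std::strlen(s); }

// Demonstrates the lifetime rule (GCC/Clang only): memory from
// __builtin_alloca is released when the *function* returns, not when the
// enclosing block ends.
std::size_t alloca_outlives_block(const char *text) {
    char *p = nullptr;
    do {
        std::size_t const n = std::strlen(text) + 1;
        p = static_cast<char *>(__builtin_alloca(n));  // carve n bytes off this frame
        std::memcpy(p, text, n);
    } while (false);
    // The do/while block has ended, yet 'p' still points at live stack
    // memory, so its contents are intact and it can be handed to callees.
    // (An array declared inside the block would already be dead here.)
    assert(std::strcmp(p, text) == 0);
    return callee_length(p);
}
```

This is also why the allocation has to happen in the frame that should own the data: once alloca_outlives_block returns, 'p' dangles.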

I got this working; here it is up on Godbolt:

https://godbolt.org/z/ofYPc74Gc

A week ago I shared some code here on comp.lang.c++ showing how to write a thunk in machine code onto the stack and then execute it there. Using the technique described in this post, we could instead look up the addresses of lambda objects on the stack, without having to execute the stack at all. I'll write a new thunk generator tomorrow that uses this technique.

Here are the contents of the Godbolt link, copy-pasted:

#include <cstddef> // std::size_t
#include <cstring> // std::memcpy

// The unique 128-bit value we'll use to find our data on the stack
#define UUID "\x24\x31\x07\x26\x35\x2c\x4f\x9a\x99\x65\xe1\x10\x65\x62\x92\xcc"

extern "C" char *stack_pointer(void); // get the current value of the stack pointer

// The following is an x86_64 (GNU assembler syntax) implementation
// of a function to retrieve the current stack pointer. The
// .pushsection/.popsection pair makes sure the code lands in .text,
// whichever section the compiler happens to be emitting at this point.
__asm(".pushsection .text\n"
      "stack_pointer:\n"
      "    mov %rsp, %rax\n"
      "    ret\n"
      ".popsection");

// The following macro pushes any number of bytes of data onto the
// stack, laid out from lower to higher addresses as:
//
//     [ UUID (16 bytes) ][ count (size_t) ][ data (count bytes) ]
//
// It's implemented as a macro instead of a real function because
// memory obtained from '__builtin_alloca' is released as soon as the
// function that called it returns. The surrounding do/while block does
// not shorten that lifetime, and it has no trailing semicolon so that
// 'push_onto_stack(...);' behaves like a single statement.
// I have appended '_x8975w' to the name of every local variable
// so that we don't get a name clash with the caller.
#define push_onto_stack(arg_src,arg_count) \
    do \
    { \
        char const *const src_x8975w = (arg_src); \
        std::size_t const count_x8975w = (arg_count); \
        char *dst_x8975w = static_cast<char*>( \
            __builtin_alloca(16u + sizeof(std::size_t) + count_x8975w)); \
        std::memcpy(dst_x8975w, UUID, 16u); \
        dst_x8975w += 16u; \
        std::memcpy(dst_x8975w, &count_x8975w, sizeof count_x8975w); \
        dst_x8975w += sizeof count_x8975w; \
        std::memcpy(dst_x8975w, src_x8975w, count_x8975w); \
    } while (false)

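// Searches upwards (towards higher addresses, i.e. older stack frames
// on x86_64) from the current stack pointer for the UUID, and returns a
// pointer to the data stored beside it. The amount of bytes retrieved
// goes in *p. Warning: if no UUID was pushed, the search runs off the
// top of the stack and the program crashes.
char *retrieve_from_stack(std::size_t *const p = nullptr) __attribute__((no_sanitize_address));
char *retrieve_from_stack(std::size_t *const p)
{
    char *sp = stack_pointer();

    // Instead of simply using 'memcmp', which would be intercepted by
    // '-fsanitize=address' and flagged as a stack-buffer-underflow,
    // I have written a loop. Restarting at the mismatched byte is safe
    // only because UUID[0] occurs nowhere else in the UUID.
Loop:
    while ( UUID[0] != *sp++ ) /* Do Nothing */; // ++sp because the stack grows down on x86_64

    for ( unsigned i = 1u; i < 16u; ++i, ++sp )
    {
        if ( UUID[i] != *sp ) goto Loop;
    }

    // If control reaches here, we found the UUID on the stack, and 'sp'
    // now points at the stored byte count. Read it with memcpy so that
    // no alignment or aliasing assumption is needed.
    if ( nullptr != p ) std::memcpy(p, sp, sizeof *p);
    sp += sizeof(std::size_t);
    return sp;
}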

#include <iostream>
using std::cout, std::endl;

void Func3(void)
{
    cout << "Hello from Func3\n";
    std::size_t n;
    char const *const data = retrieve_from_stack(&n); // sets n before it's printed
    cout << "Retrieved data = '" << data << "', count bytes = " << n << endl;
}

void Func2(void)
{
    cout << "Hello from Func2\n";
    Func3();
}

void Func(void)
{
    cout << "Hello from Func\n";
    Func2();
}

int main(void)
{
    push_onto_stack("Monkeys eat bananas", sizeof "Monkeys eat bananas");
    Func();
}

Öö Tiib

Apr 24, 2023, 3:00:41 AM

On Sunday, 23 April 2023 at 23:51:55 UTC+3, Frederick Virchanza Gotham wrote:
> The problem with arbitrarily pushing data onto the stack for it to be retrieved by some other function later on down the chain of function calls, is that each function in the chain of function calls may increment and decrement the stack pointer here and there, and so we won't know what offset to apply to the stack pointer in order to retrieve our data.
>

Yes, but why use the stack? Most operating systems provide APIs to access
per-thread storage (TLS), and C++ has had the keyword thread_local since
C++11. It hides whatever tricks are involved behind ordinary variable syntax.
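For comparison, here is a minimal sketch of the thread_local alternative, assuming C++11 or later. The Context type and current_context function are hypothetical names, not an existing API: each Context links itself in front of the previous one for the current thread and unlinks itself in its destructor, so a function deeper in the call chain finds the innermost data in constant time, with no pattern search.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical per-thread "context stack": each Context links itself in
// front of the previous one and unlinks itself when it goes out of scope,
// so the chain mirrors the call stack.
struct Context {
    const char *data;
    std::size_t size;
    Context *previous;

    Context(const char *d, std::size_t n) : data(d), size(n), previous(top) { top = this; }
    ~Context() { assert(top == this); top = previous; }  // must unwind in LIFO order
    Context(const Context &) = delete;
    Context &operator=(const Context &) = delete;

    static thread_local Context *top;  // innermost Context on this thread
};

thread_local Context *Context::top = nullptr;

// Plays the role of retrieve_from_stack(): O(1), no searching.
const Context *current_context() { return Context::top; }

std::string Func3() {
    const Context *c = current_context();
    return c ? std::string(c->data, c->size) : std::string("(none)");
}
std::string Func2() { return Func3(); }
std::string Func() { return Func2(); }
```

Declaring `Context ctx("Monkeys eat bananas", 19);` in a caller makes `Func()` return that string; once `ctx` goes out of scope, the previous context (or none) is visible again.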

> They say that if you generate a 128-Bit random number, then it's a one of a kind and you don't have to worry about it ever being duplicated. So let's generate our own UUID:
>
> #define UUID "\x24\x31\x07\x26\x35\x2c\x4f\x9a\x99\x65\xe1\x10\x65\x62\x92\xcc"
>

Two points:

Whatever tricks the OS TLS services or the C++ thread_local keyword use,
they are probably faster than a linear search for a pattern on the
stack.

Also, someone might figure out how to craft input data so that a copy of
your UUID ends up on the stack, and then we have a potent security
vulnerability.
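The spoofing concern can be illustrated with a simulation. This is a hypothetical scenario, not an exploit: a byte array stands in for the stack, the genuine marker sits in an older frame at a higher address, and a newer frame at a lower address holds a local buffer copied from input that happens to contain the same 16 bytes. The names first_match and spoof_wins are made up for the sketch. Because the scan starts at the stack pointer and walks upwards, the first match it meets is the attacker's copy, and whatever bytes follow it would be read as the count and data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr unsigned char kMarker[16] = {
    0x24, 0x31, 0x07, 0x26, 0x35, 0x2c, 0x4f, 0x9a,
    0x99, 0x65, 0xe1, 0x10, 0x65, 0x62, 0x92, 0xcc};

// Returns the address just past the first marker found when scanning
// [begin, end) upwards, which is what retrieve_from_stack() effectively does.
const unsigned char *first_match(const unsigned char *begin, const unsigned char *end) {
    const unsigned char *hit = std::search(begin, end, kMarker, kMarker + sizeof kMarker);
    return hit == end ? nullptr : hit + sizeof kMarker;
}

// Returns true if the scan is fooled by the attacker-controlled copy.
bool spoof_wins() {
    unsigned char fake_stack[256] = {};
    std::memcpy(fake_stack + 200, kMarker, sizeof kMarker);  // genuine marker (older frame)

    unsigned char attacker_input[32] = {};
    std::memcpy(attacker_input, kMarker, sizeof kMarker);    // untrusted input containing the marker
    std::memcpy(fake_stack + 40, attacker_input, sizeof attacker_input);  // copied into a newer frame

    const unsigned char *found = first_match(fake_stack, fake_stack + sizeof fake_stack);
    assert(found != nullptr);
    return found == fake_stack + 40 + sizeof kMarker;  // matched the attacker's copy first
}
```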

Frederick Virchanza Gotham

Apr 24, 2023, 7:20:51 PM

On Monday, April 24, 2023 at 8:00:41 AM UTC+1, Öö Tiib wrote:
>
> Yes but why to use stack?

Maybe it could be used for an exception handling system?

For example, when an exception is thrown, it looks for the UUID on the stack, and beside the UUID it finds a code address to jump back to (with the exception itself stored in the 'rax' register). The code it jumps back to checks the typeid of what was thrown, and if it hasn't got a 'catch' for that specific typeid, it looks further back up the stack for the same UUID.

So everywhere you have a 'try' block in your code, the UUID plus typeids and function pointers would be pushed onto the stack, and everywhere you have a 'catch' in your code, the address of that 'catch' would go beside the UUID on the stack. Something along those lines.

Calling the destructors in the right order at the right time would be tricky though.
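A heavily simplified sketch of the try/throw idea follows. It deliberately swaps the UUID search for a thread_local linked list of handler records, which is the shape of setjmp/longjmp ("SJLJ") exception handling; the names Handler, sketch_throw, worker and run are hypothetical, and the "exception" is just a type tag plus an int kept in thread_local variables rather than in 'rax'. No destructors are run during the jump, which is exactly the tricky part noted above, so only trivially destructible objects may live between a handler and a throw.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstdlib>
#include <typeinfo>

// One record per "try": which type it catches and where to jump back to.
struct Handler {
    const std::type_info *type;
    std::jmp_buf where;
    Handler *previous;
};

thread_local Handler *g_handlers = nullptr;            // innermost handler first
thread_local int g_payload = 0;                         // the "thrown" value
thread_local const std::type_info *g_thrown = nullptr;  // its type tag

// Walks outwards through the handlers until one accepts 'type', pops it
// (and every handler inside it), then jumps back to its setjmp point.
[[noreturn]] void sketch_throw(const std::type_info &type, int payload) {
    for (Handler *h = g_handlers; h != nullptr; h = h->previous) {
        if (*h->type == type) {
            g_handlers = h->previous;
            g_payload = payload;
            g_thrown = &type;
            std::longjmp(h->where, 1);
        }
    }
    std::abort();  // no matching handler: behave like std::terminate
}

struct NotFound {};
struct Timeout {};

void worker(int n) {
    if (n == 1) sketch_throw(typeid(NotFound), 404);
    if (n == 2) sketch_throw(typeid(Timeout), 408);
}

// Roughly: try { try { worker(n); } catch (NotFound) {...} } catch (Timeout) {...}
int run(int n) {
    Handler outer{&typeid(Timeout), {}, g_handlers};
    g_handlers = &outer;
    if (setjmp(outer.where) != 0) {  // "catch (Timeout)"
        assert(*g_thrown == typeid(Timeout));
        return g_payload;
    }
    Handler inner{&typeid(NotFound), {}, g_handlers};
    g_handlers = &inner;
    if (setjmp(inner.where) != 0) {  // "catch (NotFound)"
        assert(*g_thrown == typeid(NotFound));
        g_handlers = outer.previous;  // leaving the outer "try" too
        return g_payload;
    }
    worker(n);
    g_handlers = outer.previous;  // normal exit: pop both handlers
    return 0;
}
```

Here run(1) is caught by the inner NotFound handler and returns 404, run(2) skips that handler and is caught by the outer Timeout handler (returning 408), and run(0) returns 0 without any jump.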

Öö Tiib

Apr 25, 2023, 4:03:56 AM

On Tuesday, 25 April 2023 at 02:20:51 UTC+3, Frederick Virchanza Gotham wrote:
> On Monday, April 24, 2023 at 8:00:41 AM UTC+1, Öö Tiib wrote:
> >
> > Yes but why to use stack?
> Maybe it could be used for an exception handling system?
>
> For example when an exception is thrown, it looks for the UUID on the stack, and beside the UUID it finds a code address to jump back to (with the exception itself stored in the 'rax' register). The code it jumps back to will check the typeid of what was thrown, and if it hasn't got a 'catch' for that specific typeid then it looks further back on the stack for the same UUID.
>

I do not know what you are implementing here. A catch is missing, but you
want to handle the exception anyway?

This again smells like a wheelchair for fixing programming errors at run
time. The majority of the worst programming errors I've met over the
decades were in such wheelchairs. Programming errors should be cured by
programming, not by adding complex and error-prone wheelchairs and
crutches. IOW: crash, let a backup process take over, and add a catch
where needed.

> So everywhere you have a 'try' block in your code, that's where the UUID + typeid's + function pointers would be pushed onto the stack. And everywhere you have 'catch' in your code, the address of the 'catch' in code would go beside the UUID on the stack. Something along those lines.
>
> Calling the destructors in the right order at the right time would be tricky though.
>

How do you ensure that something so utterly odd and pointless is done
"everywhere"? What does it even look like? What does it do? How does it
differ from a catch(...) block added at the bottom of the catch chain?

Scott Lurndal

Apr 25, 2023, 11:10:32 AM

Frederick Virchanza Gotham <cauldwel...@gmail.com> writes:
>On Monday, April 24, 2023 at 8:00:41 AM UTC+1, Öö Tiib wrote:
>>
>> Yes but why to use stack?
>
>Maybe it could be used for an exception handling system?
>
>For example when an exception is thrown, it looks for the UUID on the stack, and beside the UUID it finds a code address to jump back to (with the exception itself stored in the 'rax' register). The code it jumps back to will check the typeid of what was thrown, and if it hasn't got a 'catch' for that specific typeid then it looks further back on the stack for the same UUID.

Shades of the VAX, which had an optional vector at the start of each function
that the unwind ($UNWIND) code would call if nonzero. Not particularly thread-safe,
but then that was then, not now.

Frederick Virchanza Gotham

Jul 18, 2023, 3:34:22 PM

On Sunday 23 April 2023, Frederick Virchanza Gotham wrote:
>
> So if we push this UUID onto the stack,
> and then place our data right beside it on
> the stack, then later on we can search the
> stack for this UUID and we'll find our data
> right beside it.

It's taken me three months, but today I've come up with a legitimate use for this.

As you'll know if you've been following the C++ Standard Proposals mailing list these past few days, I'm trying to come up with a universal header file that ensures elision of move/copy operations with Named Return Value Optimisation.

In order to implement this on 64-bit ARM with the AArch64 instruction set, I need to write a function that will:
(1) Move address of return value into 1st parameter
(2) Jump to a location stored in a thread_local variable

It turns out that accessing thread_local variables on AArch64 isn't so simple, as there's more than one scheme for storing them; see how complicated it can get here:

http://bambowu.github.io/linux/RelocationAndTheadLocalStorage/

Furthermore, I want the code to work fine on a microcontroller that has no multithreading support.

And so instead of putting the jump address in a thread_local variable, I'll just push it onto the stack beside a known UUID.
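The data flow of steps (1) and (2) can be sketched portably. The real version described in the post would be AArch64 assembly that branches to the stored address; this sketch instead calls through a function pointer, and the names Target, push_target, find_target, forward_to_target and construct_answer are hypothetical. A byte array again stands in for the stack, and the function pointer is copied in and out with memcpy, which is valid because function pointers are trivially copyable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr unsigned char kMarker[16] = {
    0x24, 0x31, 0x07, 0x26, 0x35, 0x2c, 0x4f, 0x9a,
    0x99, 0x65, 0xe1, 0x10, 0x65, 0x62, 0x92, 0xcc};

using Target = void (*)(void *return_slot);  // hypothetical jump-target signature

// Places [marker][function pointer] at 'offset' in a byte array that
// stands in for the stack.
void push_target(unsigned char *mem, std::size_t offset, Target t) {
    std::memcpy(mem + offset, kMarker, sizeof kMarker);
    std::memcpy(mem + offset + sizeof kMarker, &t, sizeof t);
}

// Finds the marker in [begin, end) and returns the function pointer stored
// beside it, or nullptr if there is no complete record.
Target find_target(const unsigned char *begin, const unsigned char *end) {
    const unsigned char *hit = std::search(begin, end, kMarker, kMarker + sizeof kMarker);
    if (hit == end ||
        static_cast<std::size_t>(end - hit) < sizeof kMarker + sizeof(Target))
        return nullptr;
    Target t;
    std::memcpy(&t, hit + sizeof kMarker, sizeof t);
    return t;
}

// Step (1): pass the address of the return value as the first argument.
// Step (2): transfer control to the location stored beside the marker.
void forward_to_target(const unsigned char *begin, const unsigned char *end,
                       void *return_slot) {
    Target t = find_target(begin, end);
    assert(t != nullptr);
    t(return_slot);
}

// Hypothetical target that constructs the return value in place.
void construct_answer(void *slot) { *static_cast<int *>(slot) = 42; }
```

Unlike a real branch, the C++ call here adds a stack frame; the sketch only shows where the address comes from and how the return-value pointer is handed over.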

I knew this would come in handy somewhere.