Make sure shared library doesn't execute any code at 'dlopen'

Frederick Virchanza Gotham

Jan 26, 2023, 11:50:38 AM

On Linux x86_64, I've taken the code for an opensource executable program, and I've changed the final linker command to add "-fPIC -shared" so that I get a dynamic shared library instead of an executable file.

So I take this ".so" file and I load it into an executable program:

void *h = dlopen("libSomeProgram.so", RTLD_NOW);

int (*prog_main)(int,char**) = dlsym(h,"main");

And so now I can start this other program by executing its 'main' function as follows:

char *args[] = { "prog", "-v", "-k", "-c", nullptr, nullptr, nullptr };
prog_main(4, args);

I've already built this and tested it and it works fine.

At the beginning of all this, I had the choice of building the original program as either a static shared library or a dynamic shared library. I went with dynamic because I didn't want to be burdened with the library's start-up code (i.e. whatever happens inside '_init') until I actually needed the functionality of the library (i.e. until the point that 'dlopen' is called).

But then I thought to myself, I can just check what the dynamic shared library does when it's loaded. First of all, I used 'readelf' on the ".so" file, and here's what I saw:

(INIT) 0x3000

So when the library gets loaded with "dlopen", it starts executing code at the address 0x3000. So next I used 'objdump' to see what's located at address 0x3000:

endbr64
sub rsp, 8
mov rax, cs:__gmon_start__ptr
test rax, rax
jz short locationA
call rax ; __gmon_start__
locationA:
add rsp, 8
retn

So in C++ code, this basically just does:

int init(void)
{
    if ( gmon_start_ptr ) return gmon_start_ptr();
    return 0;
}

So next I checked what the '__gmon_start__' function does, however there is no such function to be found inside the library. It's actually a 'weak' symbol. I used "nm -D" at the command line, and here's what I see:

w __gmon_start__

By the way I linked the shared library with "-Wl,--no-undefined" so there should be no unresolved symbols, but undefined weak symbols are allowed.

So I think what's happening here is that my shared library is seeking a function called '__gmon_start__' inside the executable file that it's loaded into. I've seen this technique before. For example let's say I were to write a library that optionally prints debug info, well I could have a weak symbol called "void debug_print(char const*)" inside my library, and then the executable file can provide that function to my library if it wants to.
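For illustration, the optional-hook idea described above might be sketched like this with the GCC/Clang `weak` attribute on an ELF target. `debug_print` is the name used in the paragraph above; `log_if_hooked` is a hypothetical helper invented for this sketch, and none of this is the actual start-up code that the toolchain links in.

```cpp
// Weak declaration: if no loaded object defines debug_print, the symbol
// resolves to a null address instead of producing a link error
// (GCC/Clang extension, ELF targets).
extern "C" void debug_print(char const*) __attribute__((weak));

// Hypothetical helper: calls the hook only when somebody provided it.
// Returns true if the hook was called.
bool log_if_hooked(char const* msg)
{
    if (debug_print) {      // same kind of null test that _init applies to __gmon_start__
        debug_print(msg);
        return true;
    }
    return false;
}
```

If the executable (or any other loaded object) defines `debug_print`, the call goes through; otherwise `log_if_hooked` returns false and nothing else happens.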

Given that the dynamic shared library I built was entirely a C program, I don't have to worry about the constructors of global objects being invoked when 'dlopen' is called. However there are still other ways that code could get executed, for example there might be a global variable that gets initialised with the return value of a function:

int some_global_variable = SomeFunction();
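A note on the line above: in C, a file-scope initialiser has to be a constant expression, so that form only compiles as C++. In C code built with GCC or Clang, the usual way to get code executed at load time is the `constructor` attribute. A minimal sketch of both mechanisms follows; `g_load_time_calls`, `on_load` and `load_time_calls` are hypothetical names made up for the example.

```cpp
// Counts how many pieces of code ran before main (or during dlopen, if this
// translation unit were built into a shared library).
static int g_load_time_calls = 0;

static int SomeFunction()
{
    ++g_load_time_calls;
    return 42;
}

// C++ dynamic initialisation: SomeFunction runs when the object is loaded.
int some_global_variable = SomeFunction();

// GCC/Clang extension, also usable in C: runs at load time
// (typically through the .init_array section on Linux).
__attribute__((constructor))
static void on_load()
{
    ++g_load_time_calls;
}

// Hypothetical accessor so that a caller can observe the effect.
int load_time_calls() { return g_load_time_calls; }
```

Entries of this kind show up in the dynamic section as INIT_ARRAY rather than INIT, so checking only the INIT address with 'readelf' would likely miss them.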

To get to the bottom line on this though, it seems that my dynamic shared library doesn't do anything when it's loaded except call the function "__gmon_start__". So if I were to build it as a static shared library and link it at compile time, then my executable program shouldn't be burdened with any extra start-up code. Am I right here? Am I missing something?

I need to look into why this executable file looks for "__gmon_start__" in the first place, because the shared library wasn't built with "-pg". It seems that GNU g++ places this weak symbol inside its ELF files even if you don't specify "-pg" at the command line. I don't know why it does that.

Scott Lurndal

Jan 26, 2023, 12:21:17 PM

Frederick Virchanza Gotham <cauldwel...@gmail.com> writes:
>
>On Linux x86_64, I've taken the code for an opensource executable program,
>and I've changed the final linker command to add "-fPIC -shared" so that I
>get a dynamic shared library instead of an executable file.
>
>So I take this ".so" file and I load it into an executable program:
>
> void *h = dlopen("libSomeProgram.so", RTLD_NOW);
>
> int (*prog_main)(int,char**) = dlsym(h,"main");
>
>And so now I can start this other program by executing its 'main' function
>as follows:
>
> char *args[] = { "prog", "-v", "-k", "-c", nullptr, nullptr, nullptr };
> prog_main(4, args);
>
>I've already built this and tested it and it works fine.

It is generally not considered to be a good idea
to include a function called 'main' in a dynamic shared object
as it must invariably conflict with the same symbol in the
executable loading the dynamic shared object using dlopen()
which has its own "main" function.

>So when the library gets loaded with "dlopen", it starts executing code at
>the address 0x3000. So next I used 'objdump' to see what's located at
>address 0x3000:
>
> endbr64
> sub rsp, 8
> mov rax, cs:__gmon_start__ptr
> test rax, rax
> jz short locationA
> call rax ; __gmon_start__
>locationA:
> add rsp, 8
> retn
>
>So in C++ code, this basically just does:
>
> int init(void)
> {
> if ( gmon_start_ptr ) return gmon_start_ptr();
> return 0;
> }
>

<snip>
>I need to look into why this executable file looks for "__gmon_start__" in
>the first place, because the shared library wasn't built with "-pg". It
>seems that GNU g++ places this weak symbol inside its ELF files even if you
>don't specify "-pg" at the command line. I don't know why it does that.

__gmon_start_ptr__ (and a few other static global symbols) are typically provided
by the crt (C-runtime) that gets linked in with the executable. Those symbols
may not be "linked" to the shared object when it is loaded, and thus dlopen
may fail. In any case, if you call any of the crt startup functions when
the dynamic object is loaded, you'll conflict with the crt startup functions
that were executed when your program that dlopens the shared object started
(you'll note from objdump that the executable entry point is _start, not main, and _start
calls a bunch of library initialization functions before invoking main).

Which is another reason to not use 'main' as the dynamic symbol, or
include any run-time initialization (other than static class initializers) in
the shared object.

Frederick Virchanza Gotham

Jan 26, 2023, 12:37:47 PM

On Thursday, January 26, 2023 at 5:21:17 PM UTC, Scott Lurndal wrote:
>
> It is generally not considered to be a good idea
> to include a function called 'main' in a dynamic shared object
> as it must invariably conflict with the same symbol in the
> executable loading the dynamic shared object using dlopen()
> which has its own "main" function.

This of course will matter when I make a static library because the linker will tell me I have a multiple definition error. But when it comes to dynamic libraries on Linux, you can have more than one function with the same name, and you can even get the other function's address with dlsym(RTLD_NEXT, "main").

> __gmon_start_ptr__ (and a few other static global symbols) are typically provided
> by the crt (C-runtime) that gets linked in with the executable. Those symbols
> may not be "linked" to the shared object when it is loaded, and thus dlopen
> may fail. In any case, if you call any of the crt startup functions when
> the dynamic object is loaded, you'll conflict with the crt startup functions
> that were executed when your program that dlopens the shared object started
> (you'll note from objdump that the executable entry point is _start, not main, and _start
> calls a bunch of library initialization functions before invoking main).

I'll build the original program as an executable, check what its entry point is, and see what it does before main.

Paavo Helde

Jan 26, 2023, 12:43:51 PM

On 26.01.2023 at 18:50, Frederick Virchanza Gotham wrote:
>
> On Linux x86_64, I've taken the code for an opensource executable program, and I've changed the final linker command to add "-fPIC -shared" so that I get a dynamic shared library instead of an executable file.

-fPIC is a compiler option, so in general it's not enough to add it only
to the linker command.

[...]

> At the beginning of all this, I had the choice of building the original program as either a static shared library

What does "static shared library" mean? I guess you mean "static library".

> or a dynamic shared library. I went with dynamic because I didn't want
> to be burdened with the library's start-up code (i.e. whatever happens
> inside '_init') until I actually needed the functionality of the library
> (i.e. until the point that 'dlopen' is called).

I think you said this is an open-source program, so why don't you just look
in the source to see if there is any expensive static initialization
taking place. If there is, change the code to perform this initialization
only on demand, and add an initialization function to be called from
your program to trigger this.

And why do you care? Does the program startup take 10 seconds more with
the extra library linked in? If not, why do you care?

Frederick Virchanza Gotham

Jan 26, 2023, 6:07:19 PM

On Thursday, January 26, 2023 at 5:43:51 PM UTC, Paavo Helde wrote:

> What does "static shared library" mean? I guess you mean "static library".
> or a dynamic shared library. I went with dynamic because I didn't want
> to be burdened with the library's start-up code (i.e. whatever happens
> inside '_init') until I actually needed the functionality of the library
> (i.e. until the point that 'dlopen' is called).
> I think you said this is open-source program, so why don't you just look
> in the source to see if there is any expensive static initialization
> taking place. If it does, change the code to perform this initialization
> only on demand, and add an initialization function to be called from
> your program to trigger this.
>
> An why do you care? Does the program startup take 10 seconds more with
> the extra library linked in? If not, why do you care?

I have forked two opensource projects on Github and I'm amalgamating them together.

With regard to the program which I want to turn into a static library, well I've gathered all of the ".a" files it needs and I've unzipped them and then combined them with the object files of the main program, and made one big ".a" file out of them, and so now I have one big file called SomeProgram.a.

Now if I write a new program and get it to link with SomeProgram.a, I might get a few 'multiple definition' errors, such as 'main' and 'options'.

So first I got a list of all the exported symbols in all the object files: find -iname "*.o" | xargs -i -r -n1 nm "{}" | grep -Ev "( U )|( W )|( w )" | cut -d ' ' -f3- | sort | uniq > all_symbols.txt

Next I made a command line argument list from them: cat all_symbols.txt | awk '{print "--redefine-sym " $1 "=SomeProgram_" $1}' | tr '\n' ' ' > cmd_line_args.txt

Next I renamed all of the symbols in all of the object files: find -iname "*.o" | xargs -i -r -n1 objcopy `cat cmd_line_args.txt` "{}"

After doing all that, I was assured that there wouldn't be a name collision, so I linked it all together and I didn't get a multiple definition error.
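The argument-building step above (the awk one-liner) can also be sketched in C++, which may be handy if the renaming is driven from a build tool rather than from the shell. The function name `make_redefine_args` is hypothetical; the only thing taken from objcopy itself is the `--redefine-sym old=new` option already used in this post.

```cpp
#include <string>
#include <vector>

// Builds the objcopy argument string "--redefine-sym sym=<prefix>sym ..."
// with one option per symbol, separated by single spaces.
std::string make_redefine_args(const std::vector<std::string>& symbols,
                               const std::string& prefix)
{
    std::string args;
    for (const std::string& sym : symbols) {
        if (sym.empty()) continue;          // skip blank lines from the symbol list
        if (!args.empty()) args += ' ';
        args += "--redefine-sym " + sym + "=" + prefix + sym;
    }
    return args;
}
```

For example, the symbols `main` and `options` with the prefix `SomeProgram_` give `--redefine-sym main=SomeProgram_main --redefine-sym options=SomeProgram_options`.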

Michael S

Jan 27, 2023, 5:37:57 AM

You still haven't explained what exactly you are trying to achieve. And you
haven't explained what you don't like about the normal method, where you keep
your executable programs as they are and call them with spawn().

Paavo Helde

Jan 27, 2023, 6:40:19 AM

You sure get some points for originality in the software development.
Brings the no-code movement to new heights. Maybe we should call it
blind-source development?

My only question still remains: why???

Frederick Virchanza Gotham

Jan 27, 2023, 6:44:09 AM

On Friday, January 27, 2023 at 11:40:19 AM UTC, Paavo Helde wrote:

> You sure get some points for originality in the software development.
> Brings the no-code movement to new heights. Maybe we should call it
> blind-source development?
>
> My only question still remains: why???

Program A writes into a TCP socket.

Program B reads in from a TCP socket.

I'm making one program, i.e. Program C, out of them.

Program A is about 10 times the size of Program B, so it makes sense to put B into A rather than put A into B.

Program C will have two threads, one thread running the 'main' of Program A, and one thread running the 'main' of Program B.

Since Program A and Program B are now in the one process and have access to each other's memory, I can do away with the TCP socket between them, and replace it with a lockfree container, e.g. boost::lockfree::spsc_queue.
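To make the idea concrete, here is a minimal single-producer/single-consumer ring buffer built on std::atomic. It is only a sketch of the general technique and not Boost's implementation; the class name `SpscRing` and its interface are invented for this example. It assumes exactly one thread calls push (Program A's thread) and exactly one calls pop (Program B's thread).

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical fixed-size SPSC ring buffer. One slot is kept empty to tell
// "full" apart from "empty", so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "need at least two slots");
public:
    // Called by the producer thread only. Returns false if the ring is full.
    bool push(const T& value)
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[head] = value;
        head_.store(next, std::memory_order_release);   // publish the item
        return true;
    }

    // Called by the consumer thread only. Returns false if the ring is empty.
    bool pop(T& out)
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};
```

Neither side ever blocks, so the thread standing in for the socket reader still needs some way to wait when the ring is empty.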

Paavo Helde

Jan 27, 2023, 8:14:03 AM

On 27.01.2023 at 13:44, Frederick Virchanza Gotham wrote:
> On Friday, January 27, 2023 at 11:40:19 AM UTC, Paavo Helde wrote:
>
>> You sure get some points for originality in the software development.
>> Brings the no-code movement to new heights. Maybe we should call it
>> blind-source development?
>>
>> My only question still remains: why???
>
>
> Program A writes into a TCP socket.
>
> Program B reads in from a TCP socket.
>
> I'm making one program, i.e. Program C, out of them.
>
> Program A is about 10 times the size of Program B, so it makes sense to put B into A rather than put A into B.

This is a non sequitur. Also, in the previous line you said you put both
of them in C.

>
> Program C will have two threads, one thread running the 'main' of Program A, and one thread running the 'main' of Program B.
>
> Since Program A and Program B are now in the one process and have access to each other's memory, I can do away with the TCP socket between them, and replace it with a lockfree container, e.g. boost::lockfree::spsc_queue.

So what are the timings and which operations are too slow, and by how
much? I.e. is this project going to solve a real or an imagined problem?

You are planning extensive modifications in the source code of both A
and B. What prevents you from changing the name of main() and other
conflicting symbols in the source code of A and B?

Frederick Virchanza Gotham

Jan 27, 2023, 9:20:44 AM

On Friday, January 27, 2023 at 1:14:03 PM UTC, Paavo Helde wrote:
>>
> > Program A is about 10 times the size of Program B, so it makes sense to put B into A rather than put A into B.
> This is non-sequitur. Also, in the previous line you said you put both of them in C.

C is the combination of A and B. In order to create C, I had three options:
(1) Start with nothing, then add in A, then add in B
(2) Start with A, then add in B
(3) Start with B, then add in A

I opted for option 2. So I forked A on Github, and started copying files from B into A.

> So what are the timings and which operations are too slow, and by how
> much? I.e. is this project going to solve a real or an imagined problem?

The main reason I'm doing this is to greatly simplify the running of these two programs. As things stand now, you have to run program A with a load of options:

progA --opt1 --opt2 --opt3=monkey.txt -k2 -f6 -m8 -c7 --save-n2

and then you have to create a virtual network device, then you've to alter the routing table, then you've to wait for progA to take effect, and then you've to analyse the effect that progA has had, and then you take what you analysed about progA and feed it as the command line to progB along with another bunch of command line arguments:

progB --opt1 --opt2=something_from_progA --opt3 -k -m -n3 --open-n2

I will be able to reduce this all to a simple one-liner at the command line:

progC --my-simple-option

Literally you will only need to give one simple command line argument when starting Program C.

Program C will start Program A (as a new thread), it will wait until A's ready, then it will create a virtual network device, then it will analyse the routing table and make changes, then it will run program B to communicate with program A.

> You are planning extensive modifications in the source code of both A
> and B. What prevents you to change the name of main() and other
> conflicting symbols in the source code of A and B?

I *do* change the symbol names, but not in the C++ source and header files. I wait until the object files are produced and then I use 'objcopy --redefine-sym' on the object files. It works and it means I can automate the process without having to write a C/C++ parser.

People have written scripts to do what I'm doing, i.e. to combine Program A and Program B, but my program (i.e. Program C) will be much more capable of dealing with adversity, for example it will analyse the routing table and try to find a free network even if there's already 8 entries in there. It will use ephemeral port numbers where possible.

The lockfree container between the two threads is just the icing on the cake although I'll smile when it's working and I get the CPU usage down below 1%.

Scott Lurndal

Jan 27, 2023, 10:20:27 AM

Frederick Virchanza Gotham <cauldwel...@gmail.com> writes:

So use mmap(2) or shmat(2) to share memory between the processes.
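As a rough illustration of the mmap(2) route, the sketch below shares one int between a parent and a forked child through an anonymous shared mapping. The function name `share_int_with_child` is hypothetical, and two unrelated programs would more likely map a named object (for example one created with shm_open) than rely on fork.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Maps one int as shared memory, lets a child process write to it, and
// returns the value the parent reads back (or -1 on failure).
int share_int_with_child()
{
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    int* shared = static_cast<int*>(mem);
    *shared = 0;

    const pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(int));
        return -1;
    }
    if (pid == 0) {             // child: write into the shared page and exit
        *shared = 42;
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);   // parent: wait until the child has finished
    const int result = *shared; // the child's write is visible here
    munmap(mem, sizeof(int));
    return result;
}
```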

Kenny McCormack

Jan 27, 2023, 10:32:26 AM

In article <tr0d6f$1m1sc$1...@dont-email.me>,
Paavo Helde <ees...@osa.pri.ee> wrote:
...

>You sure get some points for originality in the software development.
>Brings the no-code movement to new heights. Maybe we should call it
>blind-source development?

Blah, blah, blah.

>My only question still remains: why???

This isn't a question. It's just a slam.

(OP's situation is perfectly clear to me)


Paavo Helde

Jan 27, 2023, 10:57:27 AM

On 27.01.2023 at 16:20, Frederick Virchanza Gotham wrote:
> On Friday, January 27, 2023 at 1:14:03 PM UTC, Paavo Helde wrote:
>>>
>>> Program A is about 10 times the size of Program B, so it makes sense to put B into A rather than put A into B.
>> This is non-sequitur. Also, in the previous line you said you put both of them in C.
>
> C is the combination of A and B. In order to create C, I had three options:
> (1) Start with nothing, then add in A, then add in B
> (2) Start with A, then add in B
> (3) Start with B, then add in A
>
> I opted for option 2. So I forked A on Github, and started copying files from B into A.
>
>> So what are the timings and which operations are too slow, and by how
>> much? I.e. is this project going to solve a real or an imagined problem?
>
> The main reason I'm doing this is to greatly simplify the running of these two programs. As things stand now, you have to run program A with a load of options:
>
> progA --opt1 --opt2 --opt3=monkey.txt -k2 -f6 -m8 -c7 --save-n2
>
> and then you have to create a virtual network device, then you've to alter the routing table, then you've to wait for progA to take effect, and then you've to analyse the effect that progA has had, and then you take what you analysed about progA and feed it as the command line to progB along with another bunch of command line arguments:
>
> progB --opt1 --opt2=something_from_progA --opt3 -k -m -n3 --open-n2
>
> I will be able to reduce this all to a simple one-liner at the command line:
>
> progC --my-simple-option
>
> Literally you will only need to give one simple command line argument when starting Program C.

Sounds like a perfect job for a shell script.

>
> Program C will start Program A (as a new thread), it will wait until A's ready, then it will create a virtual network device, then it will analyse the routing table and make changes, then it will run program B to communicate with program A.
>
>> You are planning extensive modifications in the source code of both A
>> and B. What prevents you to change the name of main() and other
>> conflicting symbols in the source code of A and B?
>
> I *do* change the symbol names, but not in the C++ source and header files. I wait until the object files are produced and then I use 'objcopy --redefine-sym' on the object files. It works and it means I can automate the process without having to write a C/C++ parser.

This is insane. Why would you need a C++ parser? How many name conflicts
exactly do you have, something like 3? Why do you need to automate
replacing them?

Compiling and linking libraries (static or dynamic) is very common
practice. I have some programs with tens of third-party libraries linked
in, mostly as static libraries. Never ever have I needed to use objcopy
or a C++ parser with that.

It's true that when some C code is not written with the mindset to be
used in a library, it might contain some too generic names which may
easily get into conflict with other code. In C they have their own hacks
to cope with that. In C++ we luckily have a standard way to solve this,
just put all code in a library-specific namespace.
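A minimal sketch of the namespace approach, using made-up stand-ins for the two code bases (`progA`, `progB`, `run` and `combined` are hypothetical names). It only applies to code that is compiled as C++ and whose sources can be edited.

```cpp
// Two code bases that both define 'options' and an entry function.
namespace progA {
    int options = 1;
    int run() { return options; }   // refers to progA::options
}

namespace progB {
    int options = 2;
    int run() { return options; }   // refers to progB::options
}

// Both sets of names coexist in one program without a multiple-definition
// error, because the linker sees progA::options and progB::options as
// different symbols.
int combined() { return progA::run() * 10 + progB::run(); }
```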

Christian Gollwitzer

Jan 27, 2023, 1:53:23 PM

On 27.01.23 at 15:20, Frederick Virchanza Gotham wrote:
> The main reason I'm doing this is to greatly simplify the running of these two
> programs. As things stand now, you have to run program A with a load of
> options:
>
> progA --opt1 --opt2 --opt3=monkey.txt -k2 -f6 -m8 -c7 --save-n2
>
> and then you have to create a virtual network device, then you've to alter the routing table, then you've to wait for progA to take effect, and then you've to analyse the effect that progA has had, and then you take what you analysed about progA and feed it as the command line to progB along with another bunch of command line arguments:
>
> progB --opt1 --opt2=something_from_progA --opt3 -k -m -n3 --open-n2
>
> I will be able to reduce this all to a simple one-liner at the command line:
>
> progC --my-simple-option
>
> Literally you will only need to give one simple command line argument when starting Program C.

Sounds like Program C could be a smallish shell script. Bash is an
excellent language for this kind of thing.

Christian

Christian Gollwitzer

Jan 27, 2023, 2:03:25 PM

On 27.01.23 at 16:32, Kenny McCormack wrote:

> In article <tr0d6f$1m1sc$1...@dont-email.me>,
> Paavo Helde <ees...@osa.pri.ee> wrote:
> ...
>> You sure get some points for originality in the software development.
>> Brings the no-code movement to new heights. Maybe we should call it
>> blind-source development?
>
> Blah, blah, blah.
>
>> My only question still remains: why???
>
> This isn't a question. It's just a slam.
>
> (OP's situation is perfectly clear to me)
>

The situation may be clear, but I can't understand how anyone in their
right mind could think that editing object files is a good idea *when
you have the source code*

Christian

Chris M. Thomasson

Jan 27, 2023, 3:31:42 PM

On 1/27/2023 5:13 AM, Paavo Helde wrote:
> On 27.01.2023 at 13:44, Frederick Virchanza Gotham wrote:
>> On Friday, January 27, 2023 at 11:40:19 AM UTC, Paavo Helde wrote:
>>
>>> You sure get some points for originality in the software development.
>>> Brings the no-code movement to new heights. Maybe we should call it
>>> blind-source development?
>>>
>>> My only question still remains: why???
>>
>>
>> Program A writes into a TCP socket.
>>
>> Program B reads in from a TCP socket.
>>
>> I'm making one program, i.e. Program C, out of them.
>>
>> Program A is about 10 times the size of Program B, so it makes sense
>> to put B into A rather than put A into B.
>
> This is non-sequitur. Also, in the previous line you said you put both
> of them in C.
>
>>
>> Program C will have two threads, one thread running the 'main' of
>> Program A, and one thread running the 'main' of Program B.
>>
>> Since Program A and Program B are now in the one process and have
>> access to each other's memory, I can do away with the TCP socket
>> between them, and replace it with a lockfree container, e.g.
>> boost::lockfree::spsc_queue.

You have to be careful with them.

> So what are the timings and which operations are too slow, and by how
> much? I.e. is this project going to solve a real or an imagined problem?
>
> You are planning extensive modifications in the source code of both A
> and B. What prevents you to change the name of main() and other
> conflicting symbols in the source code of A and B?

Fwiw, are you familiar with the two lock queue?

https://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html

Chris M. Thomasson

Jan 27, 2023, 3:34:08 PM

You can share memory between processes.

Chris M. Thomasson

Jan 27, 2023, 3:35:43 PM

Oops, I meant to respond to:

Frederick Virchanza Gotham

Sorry Paavo.

Frederick Virchanza Gotham

Jan 28, 2023, 8:34:25 AM

On Thursday, January 26, 2023 at 4:50:38 PM UTC, Frederick Virchanza Gotham wrote:
> On Linux x86_64, I've taken the code for an opensource executable program,
> and I've changed the final linker command to add "-fPIC -shared" so that I get
> a dynamic shared library instead of an executable file.

It's actually three programs instead of two now.

I've taken the ssh client from 'openssh' and I've added two other programs to it:
- The 'tun2socks' program from badvpn
- The 'route' program from busybox

So my new program will analyse the routing table, find an available private network (e.g. 10.10.10.0/24), create a TUN device and set its IP address, then get the ssh client to connect and establish a SOCKS server, then get tun2socks to forward traffic from the TUN to the SOCKS. So then you will be able to use any remote SSH server as a transparent proxy simply by doing:

ssh user@server --vpn

The selling point here though is that you don't need admin rights on the remote server.
I hope to have this in good working order by the end of February. Plus I'll build it as a static executable that doesn't need any shared libraries, and then I'll make a fat binary for a few different architectures (x86, x86_64, arm32, aarch64), I might even build it for macOS too. Afterward I might make a GUI in wxWidgets.

Frederick Virchanza Gotham

Jan 28, 2023, 8:44:18 AM

On Friday, January 27, 2023 at 7:03:25 PM UTC, Christian Gollwitzer wrote:
>
> The situation maybe clear, but I can't understand how anyone in their
> right mind could think that editing object files is a good idea *when
> you have the source code*

Because editing object files can be automated. Use 'nm file.o' to get all the symbols, then use 'objcopy file.o --redefine-sym main=pro_main' to edit the symbol names. You can put a prefix on *every* symbol name and then forget your worries about name collisions.

If I want to edit the name of a function or a variable in C++ source and header files, I need to do it manually myself. I can't just do a 'Find & Replace' in files because there might be a global variable named 'monkey' and also a stack variable within a function named 'monkey'.

In my last job writing firmware for embedded Linux cameras, I made good use of 'objcopy' and also 'patchelf' to automate these processes. Once you have automated these processes, you can upgrade the 3rd party libraries to the latest version without having to go fixing name collisions all over again.

The formats of object files and ELF files are well documented, so you don't need to be squeamish about editing these files.

You'd wanna see some of the sorcery I've been able to pull off with 'patchelf', it is a very beautiful little program.

Paavo Helde

Jan 28, 2023, 9:00:51 AM

Just a side note: BusyBox is distributed under GPL v2, so if you want to
distribute your program, you must also make your source code available.

Mut...@dastardlyhq.com

Jan 28, 2023, 10:54:00 AM

On Sat, 28 Jan 2023 05:44:10 -0800 (PST)
Frederick Virchanza Gotham <cauldwel...@gmail.com> wrote:
>On Friday, January 27, 2023 at 7:03:25 PM UTC, Christian Gollwitzer wrote:
>>
>> The situation maybe clear, but I can't understand how anyone in their
>> right mind could think that editing object files is a good idea *when
>> you have the source code*
>
>Because editing object files can be automated. Use 'nm file.o' to get all
>the symbols, then use 'objcopy file.o --redefine-sym main=pro_main' to edit
>the symbol names. You can put a prefix on *every* symbol name and then
>forget your worries about name collisions.

Brilliant idea! Until a bug occurs in the program and some poor maintenance
programmer comes along who isn't aware that the binary doesn't match the source
code.

If anyone in my team edited binaries directly on a production system they'd be
out the door.

Frederick Virchanza Gotham

Jan 28, 2023, 12:22:44 PM

On Saturday, January 28, 2023 at 3:54:00 PM UTC, Mut...@dastardlyhq.com wrote:
>
> >Because editing object files can be automated. Use 'nm file.o' to get all
> >the symbols, then use 'objcopy file.o --redefine-sym main=pro_main' to edit
> >the symbol names. You can put a prefix on *every* symbol name and then
> >forget your worries about name collisions.
>
> Brilliant idea! Until a bug occurs in the program and some poor maintenance
> programmer comes along who isn't aware that the binary doesn't match the source
> code.

I don't know what you mean here when you say 'The binary doesn't match the source code'.

The object files will become part of an executable, which I will later strip, so it doesn't matter what the symbols were.

> If anyone in my team edited binaries directly on a production system they'd be
> out the door.

Object files and ELF files have a more rigid structure and format than C++ source and header files. It doesn't make sense that you're more eager to alter the latter.

When making any kind of alteration to a program, there's always the risk of introducing a bug, however I minimise this risk by making the most risk-free alteration.

Renaming a variable or function in a C++ source file has a lot more implications than renaming it in an object file or in a dynamic shared library file.

Mut...@dastardlyhq.com

Jan 28, 2023, 12:31:14 PM

On Sat, 28 Jan 2023 09:22:32 -0800 (PST)

Frederick Virchanza Gotham <cauldwel...@gmail.com> wrote:

>On Saturday, January 28, 2023 at 3:54:00 PM UTC, Mut...@dastardlyhq.com wrote:
>>
>> >Because editing object files can be automated. Use 'nm file.o' to get all
>> >the symbols, then use 'objcopy file.o --redefine-sym main=pro_main' to edit
>> >the symbol names. You can put a prefix on *every* symbol name and then
>> >forget your worries about name collisions.
>>
>> Brilliant idea! Until a bug occurs in the program and some poor maintenance
>> programmer comes along who isn't aware that the binary doesn't match the
>> source code.
>
>I don't know what you mean here when you say 'The binary doesn't match the
>source code'.

You know the symbol names map on to functions and variables in the source
right? How do you think debuggers work?

>The object files will become part of an executable, which I will later strip,
>so it doesn't matter what the symbols were.

Well if you strip them then it doesn't matter. Most people don't.

>> If anyone in my team edited binaries directly on a production system they'd
>> be out the door.
>
>
>Object files and ELF files have a more rigid structure and format than C++
>source and header files. It doesn't make sense that you're more eager to alter
>the latter.

??????!!!!

>Renaming a variable or function in a C++ source file has a lot more
>implications than renaming it in an object file or in a dynamic shared library
>file.

I'm lost for words.

Paavo Helde

Jan 28, 2023, 4:54:25 PM

Muttley, it's rare that I agree with you, but this time I do.

Frederick Virchanza Gotham

Jan 28, 2023, 6:28:28 PM

On Saturday, January 28, 2023 at 5:31:14 PM UTC, Mut...@dastardlyhq.com wrote:
>
> You know the symbol names map on to functions and variables in the source
> right? How do you think debuggers work?

In the debugger, the symbol will be 'busybox_read_back_twice' instead of 'read_back_twice'.

> Well if you strip them then it doesn't matter. Most people don't.

I can't recall if I've ever been given a release executable with the symbols left inside. I don't think I ever have.

> ??????!!!!

> I'm lost for words.

Let's say you combine two programs together and you try to link and you get a multiple definition error for a symbol called 'options' -- just like I did when I tried to combine 'busybox' with 'openssh'. Let's say that this symbol is used 17 times in busybox, and 63 times in openssh. Are you going to do a 'Find & Replace in Files' to change all of them? And what about if you see the following in the code:

int options;

namespace UDP {
int options;
}

namespace TCP {
int options;

void Func(void)
{
using namespace UDP;

int j = options;
}
}

Will you meticulously check all 81 uses of 'options' in both programs to make sure you're replacing the correct one? Or will you just replace all of them, so that even local stack variables are affected?
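To make the lookup question concrete, here is a minimal sketch of the same layout. The initial values 1, 2 and 3 are made up for illustration and come from neither busybox nor openssh. Inside TCP::Func, the using-directive makes the names of UDP visible as if they were declared in the nearest namespace enclosing both TCP and UDP, which is the global namespace, so the unqualified name 'options' still finds TCP::options first.

```cpp
int options = 1;            // global; value chosen only for illustration

namespace UDP {
int options = 2;
}

namespace TCP {
int options = 3;

int Func(void)
{
    using namespace UDP;    // UDP::options becomes visible as if declared at global scope

    // Unqualified lookup searches the block, then namespace TCP, and stops
    // there, so this reads TCP::options rather than UDP::options.
    int j = options;
    return j;
}
}
```

Calling TCP::Func() therefore yields the value of TCP::options. A plain text search cannot tell these three variables apart, but neither can it tell which one a given use refers to without applying the lookup rules.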

And what about when the preprocessor is used to make a variable name? For example:

#define OPT(name) name ## _option

int OPT(global) = 7;

int main(void)
{
global_option = 7;
}

After you've gone over all the 81 uses, a few months go by and a new version of the library comes out, so now you have 76 instead of 81 uses, and they're in different places, so you have to go over them all over again.

Are you seriously telling me that editing the C++ source files will be less susceptible to introducing bugs than if you were to simply compile the source files to object files and then follow three simple steps:
Step 1) Get a list of all the exported symbols in all the object files:
find -iname "*.o" | xargs -i -r -n1 nm "{}" | grep -Ev "( U )|( W )|( w )" | cut -d ' ' -f3 | sort | uniq > a.txt
Step 2) Make a list of command line options to give to 'objcopy' :
cat a.txt | awk '{print "--redefine-sym "$$s"=busybox_"$$s }' > b.txt
Step 3) Run 'objcopy' on all the object files:
find -iname "*.o" | while read line; do cat b.txt | xargs -r -n2 objcopy $$line; done
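For readers who find the shell pipeline dense, steps 1 and 2 can be sketched in C++ as follows. This is only an illustration under stated assumptions: the function name redefine_args is hypothetical, and the input is assumed to be the default text output of 'nm' (address, type letter, name, with the address missing for undefined symbols).

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Turns the text printed by 'nm' into objcopy arguments, skipping undefined (U)
// and weak (W, w) symbols just as the grep in step 1 does.
std::vector<std::string> redefine_args(const std::string &nm_output,
                                       const std::string &prefix)
{
    std::set<std::string> names;                 // sorted and unique, like sort | uniq
    std::istringstream lines(nm_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::vector<std::string> tok;
        for (std::string f; fields >> f; ) tok.push_back(f);
        if (tok.size() < 2) continue;            // blank line or file-name header
        const std::string &type = tok[tok.size() - 2];
        const std::string &name = tok.back();
        if (type == "U" || type == "W" || type == "w") continue;
        names.insert(name);
    }
    std::vector<std::string> args;
    for (const std::string &n : names) {
        args.push_back("--redefine-sym");
        args.push_back(n + "=" + prefix + n);    // e.g. main=busybox_main
    }
    return args;
}
```

Each pair of returned strings corresponds to one line of b.txt. objcopy also accepts a file of "old new" pairs through its --redefine-syms option, which would avoid running objcopy once per symbol.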

I've put these 3 steps into a Makefile, and so now in the future if I upgrade either 'busybox' or 'badvpn', I don't need to go sorting out name collisions all over again.

I think your main boggle with what I'm doing owes to a decades-old belief that object files shouldn't be meddled with. The two programs, 'objcopy' and 'patchelf', are well written, and they do their job properly. There's nothing proprietary or elusive about the format of object files. In my last job I used 'patchelf' in a Makefile to compensate for a bug in a 3rd-party proprietary driver that I only had machine code for.

Editing object files is safer and quicker than editing C++ source files.

Paavo Helde

Jan 29, 2023, 3:20:07 AM

29.01.2023 01:28 Frederick Virchanza Gotham kirjutas:
> On Saturday, January 28, 2023 at 5:31:14 PM UTC, Mut...@dastardlyhq.com wrote:
>>
>> You know the symbol names map on to functions and variables in the source
>> right? How do you think debuggers work?
>
> In the debugger, the symbol will be 'busybox_read_back_twice' instead of 'read_back_twice'.
>
>> Well if you strip them then it doesn't matter. Most people don't.
>
> I can't recall if I've ever been given a release executable with the symbols left inside. I don't think I ever have.
>
>> ??????!!!!
>> I'm lost for words.
>
> Let's say you combine two programs together and you try to link and you get a multiple definition error for a symbol called 'options' -- just like I did when I tried to combine 'busybox' with 'openssh'. Let's say that this symbol is used 17 times in busybox, and 63 times in openssh. Are you going to do a 'Find & Replace in Files' to change all of them? And what about if you see the following in the code:
>
> int options;
>
> namespace UDP {
> int options;
> }
>
> namespace TCP {
> int options;
>
> void Func(void)
> {
> using namespace UDP;
>
> int j = options;
> }
> }
>
> Will you meticulously check all 81 uses of 'options' in both programs to make sure you're replacing the correct one? Or will you just replace all of them, so that even local stack variables are affected?

Why on earth should you search for 'options'? If there is a conflicting
function name 'options', you search for 'options(' in the "match whole
word" mode. It is enough to change the declaration and the definition (2
places in normal source code). Then you recompile the project and fix
the error lines (in case of C, you might need to turn the corresponding
warning into an error first).

This is a one-time activity and you get clean code as a result. For
updating git forks one is supposed to use git merge anyway, which will
cope with such changes, so there is no need to automate such things.
Without git merge, how else are you planning to keep your added no-lock
queue changes intact in the source code?

An alternative would be to compile the library as a shared .so with
hidden symbols, except for the one which you will call. I do not like
that approach very much because it requires a platform-specific compiler
option, but it's still better than a whole build step consisting of
platform-specific hacks.
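As a rough sketch of that alternative with GCC or Clang: build with -fvisibility=hidden and mark only the entry point as exported. The names prog_entry and parse_args are hypothetical and not taken from busybox, and the attribute syntax is compiler-specific, which is the portability cost mentioned above.

```cpp
// Build as a shared library with, for example:
//   g++ -fPIC -shared -fvisibility=hidden -o libprog.so prog.cpp

// Hidden by the -fvisibility=hidden default: not exported from the .so,
// so it does not clash with an 'options' symbol defined elsewhere.
int options = 0;

static int parse_args(int argc)   // internal linkage, never exported
{
    return argc > 1 ? 1 : 0;
}

// The single exported entry point (hypothetical name).
extern "C" __attribute__((visibility("default")))
int prog_entry(int argc, char **argv)
{
    (void)argv;
    options = parse_args(argc);
    return options;
}
```

Running "nm -D libprog.so" on the result should then list prog_entry but not options.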

When developing software, the aim is to make things simpler after each
alteration, not more complicated. Each time when you add a kludgy hack,
you make the code twice worse. Add 4 such hacks, and you have a program
which is 16 times more difficult to deal with, meaning that you are not
able to maintain it any more.

David Brown

Jan 29, 2023, 5:47:05 AM

On 29/01/2023 00:28, Frederick Virchanza Gotham wrote:
> On Saturday, January 28, 2023 at 5:31:14 PM UTC, Mut...@dastardlyhq.com wrote:
>>
>> You know the symbol names map on to functions and variables in the source
>> right? How do you think debuggers work?
>
> In the debugger, the symbol will be 'busybox_read_back_twice' instead of 'read_back_twice'.
>
>> Well if you strip them then it doesn't matter. Most people don't.
>
> I can't recall if I've ever been given a release executable with the symbols left inside. I don't think I ever have.
>
>> ??????!!!!
>> I'm lost for words.
>

It is certainly one of the most bizarre hacks I have heard of for a while.

> Let's say you combine two programs together and you try to link and
> you get a multiple definition error for a symbol called 'options' --
> just like I did when I tried to combine 'busybox' with 'openssh'.
> Let's say that this symbol is used 17 times in busybox, and 63 times
> in openssh. Are you going to do a 'Find & Replace in Files' to change
> all of them? And what about if you see the following in the code:

Yes, going through the source code and making the changes in the right
places is /absolutely/ the thing you have to do. Hacking the generated
object code is insanity and a maintainer's worst nightmare.

But a simple search-and-replace is a clumsy way to handle it - you can
use better tools and get better results. A good IDE can figure out
every point in a project that references a particular symbol,
differentiating between local variables, functions, internal and
external linkage. Often it is just a matter of choosing the "refactor -
rename identifier" tool and the job is done.

Another good method is to rename the original variable at definition and
declaration. If the code is well-written, with a single declaration in
a single header, then a re-compile will show errors for all the
references. Fix the errors one by one, and you have re-named the
variable (or function, or whatever).

For a different kind of hack, that is IMHO significantly less bad than
hacking the object code, you could add gcc flags like :

-Doptions=BUSY_BOX_options

and

-Doptions=OPENSSH_options

to the different makefile parts. That will lead to a renaming of the
symbols.
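The effect of such a flag can be imitated in a single file with a #define, as in this sketch (the struct and the function names are made up for illustration). Note that the preprocessor renames every token spelled 'options', including struct members and local variables, which is harmless as long as every file that shares the name is compiled with the same flag.

```cpp
// Stand-in for passing -Doptions=BUSY_BOX_options on the command line.
#define options BUSY_BOX_options

int options = 7;                 // the linker sees the symbol BUSY_BOX_options

struct Config {
    int options;                 // also renamed, consistently within this file
};

int sum_options(void)
{
    Config c = { 35 };
    return options + c.options;  // BUSY_BOX_options + c.BUSY_BOX_options
}

#undef options                   // from here on the macro no longer applies

int renamed_value(void)
{
    return BUSY_BOX_options;     // the variable really has the prefixed name
}
```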

Frederick Virchanza Gotham

Jan 29, 2023, 10:57:43 AM

Paavo Helde wrote:
> When developing software, the aim is to make things
> simpler after each alteration, not more complicated.
> Each time when you add a kludgy hack, you make the
> code twice worse. Add 4 such hacks, and you have a
> program which is 16 times more difficult to deal with,
> meaning that you are not able to maintain it any more.

David Brown wrote:
> It is certainly one of the most bizarre hacks I have
> heard of for a while.

Both of you are speaking from a viewpoint that's been
engendered and indoctrinated in you, rather than just
looking at my solution for what it is. If you consider the
editing of compiled files to be an abomination, then my
solution is an abomination. If you don't have any qualms
about editing object files, then I've given a few reasons
why my solution is superior to editing source files.

You have suggested just changing the declaration and
definition and then cleaning up the resultant compiler errors,
but that's work that might introduce bugs. And you've to
re-do it every time the library is upgraded.

This all boils down to one simple issue: Can we depend on
'objcopy' and 'patchelf' to do their job properly without
creating unforeseen problems? I believe that we can, and so I
depend on them.

Let's not make this out to be a simple case of "I think my solution
is better than your solution". This is more of a cultural matter
-- with the binary editors on one side, and the binary intacters
on the other. Cultural clash. Within one lifetime it's unlikely
either of us will defect.

Frederick Virchanza Gotham

Jan 29, 2023, 5:44:21 PM

On Sunday, January 29, 2023 at 3:57:43 PM UTC, Frederick Virchanza Gotham wrote:
<snip>

Just now I got the 'ssh client' from 'openssh' to compile and link as a static executable, with both 'busybox_route' and 'badvpn_tun2socks' built in. I'm really looking forward to seeing how this turns out.

Here's what my final linker command looks like:

g++ -o ssh -static -std=c++23 ssh.o readconf.o clientloop.o sshtty.o sshconnect.o sshconnect2.o mux.o ssh-sk-client.o vpn/vpn-core.cpp.o vpn/vpn-linux-tun.o vpn/from_busybox_route/xconnect.c.o vpn/from_busybox_route/inet_common.c.o vpn/from_busybox_route/route.c.o vpn/from_busybox_route/xfuncs_printf.c.o vpn/from_busybox_route/perror_msg.c.o vpn/from_busybox_route/xfunc_die.c.o vpn/from_busybox_route/ptr_to_globals.c.o vpn/from_busybox_route/signals.c.o vpn/from_busybox_route/verror_msg.c.o vpn/from_busybox_route/read.c.o vpn/from_busybox_route/fflush_stdout_and_exit.c.o vpn/from_busybox_route/time.c.o vpn/from_busybox_route/messages.c.o vpn/from_busybox_route/wfopen.c.o vpn/from_busybox_route/xfuncs.c.o vpn/from_busybox_route/full_write.c.o vpn/from_busybox_route/default_error_retval.c.o vpn/from_busybox_route/xatonum.c.o vpn/from_busybox_route/sysconf.c.o vpn/from_busybox_route/copyfd.c.o vpn/from_busybox_route/bb_strtonum.c.o vpn/from_busybox_route/getopt32.c.o vpn/from_busybox_route/safe_write.c.o vpn/from_busybox_route/safe_strncpy.c.o vpn/from_busybox_route/llist.c.o vpn/from_busybox_route/appletlib_STRIPPED_DOWN.c.o vpn/from_badvpn_tun2socks/SingleStreamReceiver.c.o vpn/from_badvpn_tun2socks/BReactor_badvpn.c.o vpn/from_badvpn_tun2socks/ip6_addr.c.o vpn/from_badvpn_tun2socks/ip4.c.o vpn/from_badvpn_tun2socks/BLog.c.o vpn/from_badvpn_tun2socks/BufferWriter.c.o vpn/from_badvpn_tun2socks/StreamPacketSender.c.o vpn/from_badvpn_tun2socks/PacketPassConnector.c.o vpn/from_badvpn_tun2socks/BTap.c.o vpn/from_badvpn_tun2socks/ip6.c.o vpn/from_badvpn_tun2socks/BNetwork.c.o vpn/from_badvpn_tun2socks/SocksUdpClient.c.o vpn/from_badvpn_tun2socks/tcp_in.c.o vpn/from_badvpn_tun2socks/BTime.c.o vpn/from_badvpn_tun2socks/icmp.c.o vpn/from_badvpn_tun2socks/KeepaliveIO.c.o vpn/from_badvpn_tun2socks/SinglePacketSender.c.o vpn/from_badvpn_tun2socks/UdpGwClient.c.o vpn/from_badvpn_tun2socks/nd6.c.o vpn/from_badvpn_tun2socks/BProcess.c.o vpn/from_badvpn_tun2socks/mem.c.o 
vpn/from_badvpn_tun2socks/timeouts.c.o vpn/from_badvpn_tun2socks/pbuf.c.o vpn/from_badvpn_tun2socks/udp.c.o vpn/from_badvpn_tun2socks/def.c.o vpn/from_badvpn_tun2socks/ip4_addr.c.o vpn/from_badvpn_tun2socks/BInputProcess.c.o vpn/from_badvpn_tun2socks/icmp6.c.o vpn/from_badvpn_tun2socks/BDatagram_common.c.o vpn/from_badvpn_tun2socks/init.c.o vpn/from_badvpn_tun2socks/inet_chksum.c.o vpn/from_badvpn_tun2socks/PacketRecvBlocker.c.o vpn/from_badvpn_tun2socks/tcp_out.c.o vpn/from_badvpn_tun2socks/sys.c.o vpn/from_badvpn_tun2socks/RouteBuffer.c.o vpn/from_badvpn_tun2socks/PacketPassNotifier.c.o vpn/from_badvpn_tun2socks/BDatagram_unix.c.o vpn/from_badvpn_tun2socks/StreamPassConnector.c.o vpn/from_badvpn_tun2socks/netif.c.o vpn/from_badvpn_tun2socks/PacketPassPriorityQueue.c.o vpn/from_badvpn_tun2socks/BUnixSignal.c.o vpn/from_badvpn_tun2socks/BPending.c.o vpn/from_badvpn_tun2socks/memp.c.o vpn/from_badvpn_tun2socks/StreamRecvInterface.c.o vpn/from_badvpn_tun2socks/ip4_frag.c.o vpn/from_badvpn_tun2socks/PacketPassFairQueue.c.o vpn/from_badvpn_tun2socks/stats.c.o vpn/from_badvpn_tun2socks/BSignal.c.o vpn/from_badvpn_tun2socks/DebugObject.c.o vpn/from_badvpn_tun2socks/PacketProtoDecoder.c.o vpn/from_badvpn_tun2socks/BThreadSignal.c.o vpn/from_badvpn_tun2socks/BConnection_unix.c.o vpn/from_badvpn_tun2socks/PacketRecvConnector.c.o vpn/from_badvpn_tun2socks/PacketRecvInterface.c.o vpn/from_badvpn_tun2socks/ip6_frag.c.o vpn/from_badvpn_tun2socks/SinglePacketBuffer.c.o vpn/from_badvpn_tun2socks/BConnection_common.c.o vpn/from_badvpn_tun2socks/PacketPassFifoQueue.c.o vpn/from_badvpn_tun2socks/PacketPassInactivityMonitor.c.o vpn/from_badvpn_tun2socks/PacketProtoEncoder.c.o vpn/from_badvpn_tun2socks/SingleStreamSender.c.o vpn/from_badvpn_tun2socks/PacketStreamSender.c.o vpn/from_badvpn_tun2socks/PacketProtoFlow.c.o vpn/from_badvpn_tun2socks/ip.c.o vpn/from_badvpn_tun2socks/tcp.c.o vpn/from_badvpn_tun2socks/BSocksClient.c.o vpn/from_badvpn_tun2socks/LineBuffer.c.o 
vpn/from_badvpn_tun2socks/PacketPassInterface.c.o vpn/from_badvpn_tun2socks/StreamRecvConnector.c.o vpn/from_badvpn_tun2socks/BLog_syslog.c.o vpn/from_badvpn_tun2socks/PacketBuffer.c.o vpn/from_badvpn_tun2socks/PacketCopier.c.o vpn/from_badvpn_tun2socks/PacketRouter.c.o vpn/from_badvpn_tun2socks/StreamPassInterface.c.o vpn/from_badvpn_tun2socks/BLockReactor.c.o -L. -Lopenbsd-compat/ -Wl,-z,relro -Wl,-z,now -Wl,-z,noexecstack -fstack-protector-strong -pie -lssh -lopenbsd-compat -lcrypto -lz

David Brown

Jan 30, 2023, 2:19:14 AM

On 29/01/2023 16:57, Frederick Virchanza Gotham wrote:
> Paavo Helde wrote:
>> When developing software, the aim is to make things
>> simpler after each alteration, not more complicated.
>> Each time when you add a kludgy hack, you make the
>> code twice worse. Add 4 such hacks, and you have a
>> program which is 16 times more difficult to deal with,
>> meaning that you are not able to maintain it any more.
>
> David Brown wrote:
>> It is certainly one of the most bizarre hacks I have
>> heard of for a while.
>
> Both of you are speaking from a viewpoint that's been
> engendered and indoctrinated in you, rather than just
> looking at my solution for what it is. If you consider the
> editing of compiled files to be an abomination, then my
> solution is an abomination. If you don't have any qualms
> about editing object files, then I've given a few reasons
> why my solution is superior to editing source files.
>

That is a completely meaningless thing to write. Yes, if I think that
editing compiled files is a terrible idea, then I will think your idea
of editing compiled files is terrible - and if I don't have anything
against editing compiled files, then I won't object to it.

I /do/ have something against it - and it is not indoctrination.
Frankly, I haven't heard anyone suggest it before now, much less argue
either for or against it.

The norm, however, is that programmers write or edit source code, and it
is compiled and then linked. If you do something that is wildly
breaking that norm, you are going to cause chaos to anyone maintaining
the code or working with it later - so it is not something to consider
without a /huge/ benefit. And you don't have a huge benefit - you've
got nothing more than the laziness of not wanting to edit a few files.
(I don't think combining these programs like this in the first place is
a great idea, but that's a different matter.)

> You have suggested just changing the declaration and
> definition and then cleaning up the resultant compiler errors,
> but that's work that might introduce bugs. And you've to
> re-do it every time the library is upgraded.

You've got to re-do your Frankenstein program for every code change
anyway. This is one reason why it is a bad idea to mix the different
programs (especially when they are security-related programs). You are
far better off using the programs as separate programs, or using libraries.

Once you have done the renaming once, you have a patchset and a git
branch. For many small updates to the original programs, you merely
need to re-apply the patches.

>
> This all boils down to one simple issue: Can we depend on
> 'objcopy' and 'patchelf' to do their job properly without
> creating unforeseen problems? I believe that we can, and so I
> depend on them.
>
> Let's not make this out to be a simple case of "I think my solution
> is better than your solution". This is more of a cultural matter
> -- with the binary editors on one side, and the binary intacters
> on the other. Cultural clash. Within one lifetime it's unlikely
> either of us will defect.

It is not about reliability of tools - it's about traceability and
making a clear, maintainable build that takes source and results in a
binary.

"Binary editing" can have its place. I use it all the time in my own
work in embedded systems - it is standard practice that after my
programs are built, debugged and tested as elf files, I have objcopy to
generate binary images to be flashed into an embedded system. It is
quite common that the binary file is augmented after the objcopy, with
checksums, programming information, and the like.

But that is done in a /controlled/ manner, an /expected/ manner - it is
a natural part of the build process. It is not some weird hack done in
code you don't really understand, in a way you don't fully comprehend,
as a lazy alternative to better methods.

Frederick Virchanza Gotham

Jan 30, 2023, 3:25:48 AM

On Monday, January 30, 2023 at 7:19:14 AM UTC, David Brown wrote:
>
> The norm, however, is that programmers write or edit source code, and it
> is compiled and then linked.

Generally speaking, when a person encounters something that they've never encountered before, they might react with suspicion, or they might react with intrigue. This is very much down to the person's mind frame and inner stillness. In some communities it is pervasive to react with suspicion, and in some communities it is pervasive to react with intrigue.

There are words we use to describe things we haven't seen before; we call them 'new', or 'strange', or 'abnormal', or 'odd', or 'out of the ordinary', or 'out of the norm'.

This dichotomy between "what is old" and "what is new" is an important part of some people's decision-making processes. Some people, when they come up with an idea, ask themselves, "Is this what everyone else is doing?", and if the answer is no then they might discard their idea. Personally I look at my own ideas and try to judge them by their own merits, rather than considering how frequently other people are doing it.

The only argument you've been able to pose here is that what I'm doing is not frequently done. You literally have no other argument. And the reason why you have no other argument is that my solution is superior for multiple reasons. In fact, you are arguing to me that instead of adding three lines to a Makefile (in fact it's only two lines because the 2nd can be piped into the 3rd), you think it's preferable to individually seek out every use of the symbol in all of the source and header files, and to alter them individually. And then you will do this every time the library is upgraded. The only way that a sane and rational person could side with you here is if they were to agree that editing object files is taboo.

> If you do something that is wildly
> breaking that norm, you are going to cause chaos to anyone maintaining
> the code or working with it later - so it is not something to consider
> without a /huge/ benefit.

Breaking the norm is fine when the consequences are either minimal or non-existent.

> And you don't have a huge benefit - you've
> got nothing more than the laziness of not wanting to edit a few files.
> (I don't think combining these programs like this in the first place is
> a great idea, but that's a different matter.)

Seriously. There is a possibility for introducing bugs any time you edit any part of a program. Editing source and header files by yourself (i.e. by a human) will always be more error-prone and susceptible to bug introduction than using an automated tool. The 'objcopy' program doesn't make typos, and it doesn't forget what it was supposed to be doing because the phone rang.
If the possibility of introducing new bugs can be eradicated (or at least greatly diminished) by using an automated tool, then a sane and rational person would use such a tool.

> You've got to re-do your Frankenstein program for every code change
> anyway.

I just copy the new source and header files in. If the library has a new source file that it didn't have before, I don't even have to change the Makefile because it uses a wildcard: find -iname "*.c"

> This is one reason why it is a bad idea to mix the different
> programs (especially when they are security-related programs). You are
> far better off using the programs as separate programs, or using libraries.

Before I embarked on this endeavour to combine the programs, I considered what the complications might be. For example one of the programs might make a process-wide change that could adversely affect the other programs (for example if they used 'setrlimit' with 'RLIMIT_NOFILE' to limit the number of open files), or if one of them overloaded 'operator new'. Of course the most simple problem would be a name collision at compile time, for example if they both had a function called "save_settings", but I've found an automated solution to that problem using 'objcopy'.
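One of the process-wide effects mentioned above can be shown with a small sketch. The function names are hypothetical; getrlimit and setrlimit are the POSIX calls. Once any part of the combined executable lowers the soft limit, every other component in the same process sees the lower value.

```cpp
#include <sys/resource.h>

// Returns the current soft limit on open file descriptors, or 0 on failure.
rlim_t nofile_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
    return rl.rlim_cur;
}

// What one of the merged programs might do early in its own main().
bool lower_nofile_soft_limit(rlim_t wanted)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return false;
    if (wanted > rl.rlim_max) wanted = rl.rlim_max;  // cannot exceed the hard limit
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &rl) == 0;       // applies to the whole process
}
```

When the programs ran as separate processes, a call like lower_nofile_soft_limit(64) in one of them could not affect the others. Linked into one executable, it can.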

> Once you have done the renaming once, you have a patchset and a git
> branch. For many small updates to the original programs, you merely
> need to re-apply the patches.

In my last job I wrote firmware for embedded Linux cameras. When building the firmware, we built a few dozen opensource libraries and we had a directory containing patch files for the libraries. The Makefile applied the patches before building the libraries. When we upgraded any library, we were almost never able to build it straight away; 9 times out of 10 there would be a patch failure. So I would have to get the old library, examine the effect of the patch on the old source file, and then try to create a new patch file compatible with the newer version of the file. All the while I might be introducing new bugs.

> It is not about reliability of tools - it's about traceability and
> making a clear, maintainable build that takes source and results in a
> binary.

Two lines in a Makefile.

> "Binary editing" can have its place. I use it all the time in my own
> work in embedded systems - it is standard practice that after my
> programs are built, debugged and tested as elf files, I have objcopy to
> generate binary images to be flashed into an embedded system. It is
> quite common that the binary file is augmented after the objcopy, with
> checksums, programming information, and the like.
>
> But that is done in a /controlled/ manner, an /expected/ manner - it is
> a natural part of the build process. It is not some weird hack done in
> code you don't really understand, in a way you don't fully comprehend,
> as a lazy alternative to better methods.

In the list of words I gave above to describe something not frequently seen, I forgot 'weird'.

My idea of editing object files isn't just a little bit superior -- it's vastly superior. Also I account for 'weak symbols' which you really need to be careful with.

Paavo Helde

Jan 30, 2023, 3:52:46 AM

30.01.2023 10:25 Frederick Virchanza Gotham kirjutas:

> In my last job I wrote firmware for embedded Linux cameras. When building the firmware, we built a few dozen opensource libraries and we had a directory containing patch files for the libraries. The Makefile applied the patches before building the libraries. When we upgraded any library, we almost never were able to build it straight away, 9 times out of 10 there would be a patch failure. So I would have to get the old library, examine the effect of the patch on the old source file, and then try to create a new patch file compatible with the newer version of the file. All the while I might be introducing new bugs.

Let me guess - you attempted to apply diff patches in blind, not the
3-way merges provided by a proper version control system.

BTW, a proper way to use an open-source program as a library is to split
it up into a library and a small executable part, and convince the
project managers to accept those changes in the original repo. No
merging problems anymore.

Frederick Virchanza Gotham

Jan 30, 2023, 3:56:34 AM

On Monday, January 30, 2023 at 8:52:46 AM UTC, Paavo Helde wrote:

> Let me guess - you attempted to apply diff patches in blind, not the
> 3-way merges provided by a proper version control system.

Sometimes the new version of the library had deleted a function whose body was patched, and replaced it with another function by another name.

> BTW, a proper way to use an open-source program as a library is to split
> it up into a library and a small executable part, and convince the
> project managers to accept those changes in the original repo. No
> merging problems anymore.

Yeah I could have built busybox and badvpn both as dynamic shared libraries. That's still an option.
I want to be able to provide 'ssh' as a statically linked executable though, and also as a fat binary to run on nearly any system.

David Brown

Jan 30, 2023, 4:39:46 AM

On 30/01/2023 09:25, Frederick Virchanza Gotham wrote:
> On Monday, January 30, 2023 at 7:19:14 AM UTC, David Brown wrote:
>>
>> The norm, however, is that programmers write or edit source code,
>> and it is compiled and then linked.
>
>
> Generally speaking, when a person encounters something that they've
> never encountered before, they might react with suspicion, or they
> might react with intrigue. This is very much down to the person's
> mind frame and inner stillness. In some communities it is pervasive
> to react with suspicion, and in some communities it is pervasive to
> react with intrigue.
>
> There are words we use to describe things we haven't seen before; we
> call them 'new', or 'strange', or 'abnormal', or 'odd', or 'out of
> the ordinary', or 'out of the norm'.
>

Of course people should be sceptical of something new and unusual in a
software build. I don't know how many C and C++ programmers there have
been for the last few decades, but it is a /lot/. When someone comes up
with a radical divergence from established practices, the /correct/
reaction is scepticism. That does not mean it should immediately be
dismissed as a bad idea - it means you should have good justifications
for going outside the norm.

You don't have good reasons, other than a quick-fix hack. Sometimes a
hack is all you need. Sometimes an unusual solution is useful in rare
and niche circumstances. But if you want your program to be taken
seriously and used by other people, you need to have something that
other people are comfortable with. A hack solution is not that. An
unmaintainable cyborg program is not that. Amateur messing around with
security-critical code is not that.

Your solution here might work as a quick hack. But you are missing the
big picture. If you have any ambition for the program to be used
outside of your own testing on your own computer, you have to play with
others - you have to have a solution that others are comfortable working
with and using.

You've been given a variety of good advice and suggestions for
alternative ways to handle the situation. You can take your pick, or
combine them, or find other ways. Or you can try to persuade the rest
of the world to change.

Just to be clear here, the idea of being able to conveniently set up a
proxy via an ssh link is quite nice - I can see it appealing to some
people. But no one will use it when it is made the way you are
suggesting. If you put together a bash or Python script that started
the relevant programs with the right parameters, it could easily be a
useful little tool that could make its way into common Linux
distributions. But building it with glued bits of different projects,
with post-compilation muddling of object files, it will /never/ be
acceptable to anyone who takes security, reliability or maintainability
seriously.

Kenny McCormack

Jan 30, 2023, 5:58:59 AM

In article <tr7r0v$36k7e$2...@dont-email.me>,
David Brown <david...@hesbynett.no> wrote:
...

>I /do/ have something against it - and it is not indoctrination.

I'm sure the Jonestown folks would state in all sincerity that they were
not "indoctrinated". No one ever thinks it could happen to them.

--
The motto of the GOP "base": You can't *be* a billionaire, but at least you
can vote like one.

Frederick Virchanza Gotham

Jan 31, 2023, 5:29:37 PM

On Monday, January 30, 2023 at 10:58:59 AM UTC, Kenny McCormack wrote:
>
> I'm sure the Jonestown folks would state in all sincerity that they were
> not "indoctrinated". No one ever thinks it could happen to them.

Some things just feel wrong to us because of where we grew up and the people we had around us. I consider keeping a dog as a pet, and I'm happy to have people farm cows for us to eat beef. I wouldn't want dogs farmed for food though -- it just feels wrong to me, however in other parts of the world they eat dogs.

Some people won't eat dog, and some people won't edit a binary. You probably won't change that streak in them. It just feels wrong to some people.

Kenny McCormack

Jan 31, 2023, 6:55:58 PM

In article <1511fb1e-314d-42b5...@googlegroups.com>,

Frederick Virchanza Gotham <cauldwel...@gmail.com> wrote:

Exactly. And well put.

It really is, as you say, cultural.

--
Nov 4, 2008 - the day when everything went
from being Clinton's fault to being Obama's fault.

Paavo Helde

Feb 1, 2023, 2:01:10 AM

30.01.2023 11:39 David Brown kirjutas:
>
> Just to be clear here, the idea of being able to conveniently set up a
> proxy via an ssh link is quite nice - I can see it appealing to some
> people. But no one will use it when it is made the way you are
> suggesting. If you put together a bash or Python script that started
> the relevant programs with the right parameters, it could easily be a
> useful little tool that could make its way into common Linux
> distributions. But building it with glued bits of different projects,
> with post-compilation muddling of object files, it will /never/ be
> acceptable to anyone who takes security, reliability or maintainability
> seriously.

Why do you think he is targeting anyone who takes security seriously? So
far what I have gathered from his posts the aim is to get a precompiled
binary on users' computers, named 'ssh' and probably needing root
privileges to run (to "alter the routing table").

David Brown

Feb 1, 2023, 3:07:45 AM

I am assuming he is writing a program that he hopes will be useful to
people, rather than a rootkit or malware! And while end users rarely
know much about security, the people who put together Linux (or other
*nix) distributions do - if that is his ultimate aim for the program.

Of course, maybe he's just doing this for fun and the challenge.

Frederick Virchanza Gotham

Feb 1, 2023, 3:15:00 AM

On Wednesday, February 1, 2023 at 8:07:45 AM UTC, David Brown wrote:

> I am assuming he is writing a program that he hopes will be useful to
> people, rather than a rootkit or malware! And while end users rarely
> know much about security, the people who put together Linux (or other
> *nix) distributions do - if that is his ultimate aim for the program.
>
> Of course, maybe he's just doing this for fun and the challenge.

Again, I can't see your arguments amounting to more than "I don't eat dog because I come from a place where dogs are kept as pets".

You're averse to editing object files, that's the summation of all of this.

The consequences of editing an object file can be determined from this 106-page document:
https://refspecs.linuxfoundation.org/elf/elf.pdf

The consequences of editing a C++ source file can be determined from this 1995-page document:
https://open-std.org/JTC1/SC22/WG21/docs/papers/2022/n4910.pdf

I'll go with the 106-pager.

Paavo wrote:
> Why do you think he is targeting anyone who takes security seriously?
> From what I have gathered from his posts so far, the aim is to get a
> precompiled binary on users' computers, named 'ssh' and probably needing
> root privileges to run (to "alter the routing table").

It will need privileges to create a TUN device and to edit the routing table, so most people will run it using 'sudo'. However, I think there might be a way to grant just these two privileges to the process without going all out and running it as root. I remember selectively giving particular permissions to a process years ago, but it's been a while since I've done it.
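The mechanism half-remembered here is most likely Linux capabilities: on Linux, both creating a TUN device and editing the routing table are gated by CAP_NET_ADMIN rather than by full root. One possible way to grant it to a binary is file capabilities, for example `sudo setcap cap_net_admin+ep ./prog`, where `./prog` is a placeholder path. The following is a minimal sketch, not code from this project, of how a program might check at start-up whether it already holds that capability by reading the `CapEff` line of `/proc/self/status`. The function names are hypothetical, and the capability number is hard-coded from `<linux/capability.h>` so that the sketch does not depend on libcap.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Bit number of CAP_NET_ADMIN as defined in <linux/capability.h>.
// Hard-coded here (an assumption of this sketch) to avoid needing libcap.
constexpr unsigned kCapNetAdmin = 12;

// Extract the effective capability bitmask from text in the format of
// /proc/<pid>/status. Returns std::nullopt if no parsable "CapEff:" line exists.
std::optional<std::uint64_t> parse_cap_eff(const std::string& status_text)
{
    const std::string key = "CapEff:";
    std::istringstream in(status_text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) != 0) continue;
        try {
            // The value is a hexadecimal bitmask preceded by whitespace.
            return std::stoull(line.substr(key.size()), nullptr, 16);
        } catch (const std::exception&) {
            return std::nullopt;
        }
    }
    return std::nullopt;
}

// True if bit number 'cap' is set in the capability bitmask.
bool has_capability(std::uint64_t mask, unsigned cap)
{
    return cap < 64 && ((mask >> cap) & 1u) != 0;
}

// Read a whole file into a string; yields an empty string if it cannot be opened.
std::string read_file(const char* path)
{
    std::ifstream in(path);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// True if the current process holds CAP_NET_ADMIN in its effective set.
bool can_configure_network()
{
    const auto mask = parse_cap_eff(read_file("/proc/self/status"));
    return mask.has_value() && has_capability(*mask, kCapNetAdmin);
}
```

A program could call `can_configure_network()` before trying to open the TUN device and print a hint about 'sudo' or 'setcap' if it returns false. Whether a narrower grant like this is acceptable is a separate question from the ones argued in this thread.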

David Brown

Feb 1, 2023, 3:22:36 AM

Kenny is a dedicated troll - he just posts to cause annoyance. There is
rarely any point in what he says, and even more rarely any point in
replying to him.

"Indoctrination" means that someone has repeatedly heard "do this" or
"don't do that" from a source they consider authoritative, and then they
follow those rules because of the authoritative source rather than
because they have considered the matter carefully themselves.

It is extremely rare that the topic of editing binaries as part of a
build process comes up - I have never read of it in any books, learned
of it in any course, or otherwise come across it in any serious context.
Ergo, it is impossible for my opinion on it to be the result of
indoctrination. And I think that applies to pretty much everyone.

As I have explained, manipulation of binaries can be appropriate as part
of a build process - I do it regularly. Your case is not appropriate
use. I have tried to explain why - based on /considered/ opinion and
not some "cultural" thing, indoctrination, or that it "feels wrong".

If you want people to thoughtlessly praise your cool tricks, take up
juggling or do magic shows for children. If you want people to give
honest opinions based on their knowledge and experience, come to a
newsgroup like this one.

But if you think it is better to insult or demean those who disagree
with you, you are on your own.

Frederick Virchanza Gotham

Feb 1, 2023, 8:53:28 AM

On Wednesday, February 1, 2023 at 8:22:36 AM UTC, David Brown wrote:
>
> "Indoctrination" means that someone has repeatedly heard "do this" or
> "don't do that" from a source they consider authoritative, and then they
> follow those rules because of the authoritative source rather than
> because they have considered the matter carefully themselves.
>
> It is extremely rare that the topic of editing binaries as part of a
> build process comes up - I have never read of it in any books, learned
> of it in any course, or otherwise come across it in any serious context.
> Ergo, it is impossible for my opinion on it to be the result of
> indoctrination. And I think that applies to pretty much everyone.

Indoctrination can come from silence just as much as it can come from discussion. And it can come from a grey area between silence and discussion, like when people talk in parables or talk in a round-about way about something. There are some Christian churches near where I live that don't ever mention certain topics -- so children from the age of 4 up through their teenage years learn that it's "something you don't talk about".

I don't recall in my childhood people ever talking about farming dogs for food, but nonetheless I found the idea repugnant when I first heard it.

> As I have explained, manipulation of binaries can be appropriate as part
> of a build process - I do it regularly. Your case is not appropriate
> use. I have tried to explain why - based on /considered/ opinion and
> not some "cultural" thing, indoctrination, or that it "feels wrong".

Okay so then you think it's okay for some reasons, but not okay for others. Like if someone doesn't mind farming dogs for their hair but refuses to kill them and eat them. There's always a spectrum.

> If you want people to thoughtlessly praise your cool tricks, take up
> juggling or do magic shows for children. If you want people to give
> honest opinions based on their knowledge and experience, come to a
> newsgroup like this one.
>
> But if you think it is better to insult or demean those who disagree
> with you, you are on your own.

I'm making arguments here David, that's all. I've never sat at your desk and looked over your shoulder for a few hours, but you're probably good at your job. Two people can both be good at their job and still disagree on everything. There's more than one way to be a successful computer programmer.

Ben Bacarisse

Feb 1, 2023, 8:55:18 AM

Frederick Virchanza Gotham <cauldwel...@gmail.com> writes:

> On Wednesday, February 1, 2023 at 8:07:45 AM UTC, David Brown wrote:
>
>> I am assuming he is writing a program that he hopes will be useful to
>> people, rather than a rootkit or malware! And while end users rarely
>> know much about security, the people who put together Linux (or other
>> *nix) distributions do - if that is his ultimate aim for the program.
>>
>> Of course, maybe he's just doing this for fun and the challenge.
>
> Again, I can't see your arguments amounting to more than "I don't eat
> dog because I come from a place where dogs are kept as pets".

The "culture" analogy is OK, but picking that example is biased.
Editing object code rather than source code is something that is
relatively common in hacker culture. (One of the few times I've ever
edited a binary was to gain access to a system.) Editing source code
allows for reviews and auditing so is preferred in the security
conscious programming culture.

I'm not saying there is anything nefarious going on here, but it's not
simply a matter of taste as your analogy suggests.

> You're averse to editing object files, that's the summation of all of
> this.

I'd say my objection -- aversion if you will -- is not just a matter
of taste. Editing source code has all sorts of very practical advantages,
though there are times -- lost source for example -- when it's
impossible. That's the only other time I can recall doing this for any
practical purpose, and all the while I wished I had an alternative.

> The consequences of editing an object file can be determined from this
> 106-page document: https://refspecs.linuxfoundation.org/elf/elf.pdf
>
> The consequences of editing a C++ source file can be determined from
> this 1995-page document:
> https://open-std.org/JTC1/SC22/WG21/docs/papers/2022/n4910.pdf
>
> I'll go with the 106-pager.

That's a weird comparison. Neither tells you anything like what you need
to know about editing source or object files (in general).

--
Ben.

Paavo Helde

Feb 2, 2023, 3:58:25 AM

ssh is a program which is legitimately *expected* to ask for network
passwords, to see all network traffic going through it, and to initiate
network connections on another machine.

Malice or not, no sane person would use a variant of it which does not
come from a trusted source, not to speak about a variant which comes as
a precompiled binary, and not to speak about a variant which requires
sudo to run.

The best case scenario here is that the process just exposes a larger
than needed attack surface to hackers before dropping the superuser
rights. And this is the best case scenario.