weak symbols and dlopen()

Christophe Lohr

Mar 6, 2008, 3:54:25 AM

Hi,

Here is a simple case of my problem:
I use dlopen() to load libraries dynamically. However, the first library
loaded uses symbols that do not exist at the time of loading, but which
are defined in a second library that will be loaded later.
(Yes, I know, the problem is trivial: I just have to load them in reverse
order, but this is for the demonstration ;-))

In my example, the first library is called "libA", the second is "libB",
and my program is "loader".
Function get_n() is used by libA, but is defined in libB.

For this to be accepted by dlopen(), I declare this symbol with "#pragma weak".
Apparently, the weak pragma lets the load succeed because the loader does
not try to resolve this symbol while loading the library.
My problem is that the loader never tries to bind it afterwards either,
which results in a segmentation fault at run time.
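
To illustrate the failure mode described here: a weak function reference that nothing defines has a null address, so an unguarded call jumps to address 0. The following is a minimal sketch, assuming GCC or Clang on an ELF platform. It is not the content of the attached files; get_n() is the function from this thread, while get_n_or() is a hypothetical helper added for illustration.

```c
#include <stddef.h>

/* Weak reference: if no loaded object defines get_n(), the link still
 * succeeds and the address of the symbol is NULL. */
extern int get_n(void) __attribute__((weak));

/* Hypothetical helper: call get_n() only if it was actually bound,
 * otherwise return the caller's fallback instead of jumping to address 0. */
int get_n_or(int fallback)
{
    if (get_n != NULL)
        return get_n();
    return fallback;
}
```

The guard only avoids the crash. It does not make the reference rebind when a later dlopen() brings in a definition.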

My question is: is there a "simple" way to try "binding in a second
step" all pending symbols (the weak ones I introduced)?

Many thanks

loader.c

libA.c

libB.c

Paul Pluzhnikov

Mar 7, 2008, 2:56:55 AM

Christophe Lohr <christo...@enst-bretagne.fr> writes:

> In my example, the first library is called "libA", the second is
> "libB", and my program is "loader".
> Function get_n() is used by libA, but is defined in libB.
>
> For this to be accepted by dlopen(), I declare this symbol with "#pragma weak".

You don't need to do that, *provided* you build libA.so correctly
(which you didn't: you *must* use -fPIC when building shared
libraries, unless you know exactly what you are doing).

You also have to load your libraries with RTLD_GLOBAL, if you want
their symbols to be accessible to other libraries.

Proof:

$ diff loader.c.orig loader.c
12c12
< handleA = dlopen("./libA.so", RTLD_LAZY);
---
> handleA = dlopen("./libA.so", RTLD_LAZY|RTLD_GLOBAL);
17c17
< handleB = dlopen("./libB.so", RTLD_LAZY);
---
> handleB = dlopen("./libB.so", RTLD_LAZY|RTLD_GLOBAL);

$ gcc -shared -fPIC -o libA.so libA.c &&
gcc -shared -fPIC -o libB.so libB.c &&
gcc -o loader loader.c -ldl &&
./loader
run
Hello, 10!

$ sed '/#pragma/d' libA.c > libA1.c &&
gcc -shared -fPIC -o libA.so libA1.c &&
./loader
run
Hello, 10!
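
The RTLD_GLOBAL change from the diff above can be wrapped in a small helper. This is a minimal sketch, not the attached loader.c: open_global() is a hypothetical name, and the error reporting through dlerror() is an addition for illustration.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a library so that its symbols join the global
 * scope and can satisfy references coming from other loaded libraries.
 * Returns the handle, or NULL after printing the dlerror() message. */
void *open_global(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL)
        fprintf(stderr, "dlopen: %s\n", dlerror());
    return handle;
}
```

In the loader this would stand in for the two dlopen() calls, e.g. open_global("./libA.so") followed by open_global("./libB.so"), still linked with -ldl as in the build commands above.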

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

Christophe Lohr

Mar 10, 2008, 8:23:01 AM

Paul Pluzhnikov wrote:

>
> You don't need to do that, *provided* you build libA.so correctly
> (which you didn't: you *must* use -fPIC when building shared
> libraries, unless you know exactly what you are doing).
>
> You also have to load your libraries with RTLD_GLOBAL, if you want
> their symbols to be accessible to other libraries.

Many thanks!

Note that the example provided in the dlopen() manual page uses the
"-rdynamic" flag.
Is this flag meant only for compiling the program, or should it also be
used when compiling libraries?

I have a subsequent problem: dlclose() is unable to unload my library
"libB.so". (Moreover, it does not warn me about that; it simply returns 0.)

How can I force dlclose() to unload libB.so (e.g. to replace it with
other code) without unloading libA?

Many thanks
Christophe.

Paul Pluzhnikov

Mar 11, 2008, 1:39:01 AM

Christophe Lohr <christo...@enst-bretagne.fr> writes:

> Note that the example provided in the dlopen() manual page uses the
> "-rdynamic" flag.

This flag is only required for the main executable, and only when
you want to be able to do dlsym() on its symbols.

The flag is a no-op when building shared libraries.
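
For reference, the dlsym()-on-the-executable case looks roughly like this. It is a minimal sketch; lookup_from_program() is a hypothetical helper, not something from the attached files.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look a name up starting from the main program.
 * dlopen(NULL, ...) returns a handle for the main executable; dlsym() on
 * that handle searches the program and the libraries loaded with it. */
void *lookup_from_program(const char *name)
{
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return NULL;
    void *addr = dlsym(self, name);
    dlclose(self);
    return addr;
}
```

A libc function such as getpid is found this way regardless of link flags. A function defined in the executable itself is typically found only if the executable was linked with -rdynamic, because otherwise it is not exported in the dynamic symbol table.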

> I have a subsequent problem: dlclose() is unable to unload my library
> "libB.so"

How do you know that?

[That happens to be true, but I am just wondering how you came to
this realization.]

Running your original test with 'LD_DEBUG=files' reveals:

13405: calling init: ./libA.so
13405: opening file=./libA.so; opencount == 1
13405: file=./libB.so; generating link map
...
13405: calling init: ./libB.so
13405: opening file=./libB.so; opencount == 1
run
13405: file=./libB.so; needed by ./libA.so (relocation dependency)
Hello, 10!
13405: closing file=./libB.so; opencount == 2

Note the open count on libB.so -- it was incremented when libA.so's
reference to get_n() was resolved to libB.so's definition.

> How can I force dlclose() to unload libB.so (e.g. to replace it with
> other code) without unloading libA?

You can't.

Actually, you can call dlclose(handleB) twice in a row, and it will
get unloaded. But the resolution from run() in libA.so will *not*
happen again -- it only happens once, so even if you later load a
different implementation of get_n(), run() will remember the old
pointer, and will keep jumping to that location (which can land it
either nowhere, or in the middle of new code; with a crash likely
either way).

A fairly detailed explanation of how ELF lazy symbol resolution
works, and why it works only once (it's an optimization) can be
found here: http://www.iecc.com/linker/linker10.html (look for
Lazy procedure linkage with the PLT).
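
The bind-once behavior can be pictured with a toy model in plain C. This is only an analogy under simplifying assumptions, not how the dynamic linker is implemented: slot stands in for the GOT entry of get_n(), resolver_stub() for the lazy resolver reached through the PLT, and every name except get_n is hypothetical.

```c
static int resolver_stub(void);

/* Stand-in for the GOT entry of get_n(): starts out pointing at the resolver. */
static int (*slot)(void) = resolver_stub;
static int resolutions = 0;

/* Stand-in for the definition of get_n() that a library would provide. */
static int real_get_n(void)
{
    return 10;
}

/* The first call lands here: resolve once, patch the slot, forward the call. */
static int resolver_stub(void)
{
    ++resolutions;
    slot = real_get_n;
    return slot();
}

/* What a caller such as run() does: an indirect call through the slot. */
int call_get_n(void)
{
    return slot();
}

/* Number of times the resolver ran. */
int resolution_count(void)
{
    return resolutions;
}
```

Calling call_get_n() twice returns 10 both times, but resolution_count() stays at 1: the second call never consults the resolver. If the code behind real_get_n were unmapped, slot would still hold the old address.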

If you want to be able to load different implementations of get_n(),
then you have to explicitly manage the binding between run()
and get_n() -- by using dlopen()/dlsym()/dlclose().
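
A minimal sketch of such explicit binding, assuming the caller keeps the function pointer in a small record that it refreshes whenever the library is swapped. The struct and the binding_load()/binding_unload() helpers are hypothetical names, not part of any existing API.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical binding record: the loaded library plus the one entry
 * point the caller needs from it. */
struct get_n_binding {
    void *handle;
    int (*get_n)(void);
};

/* Load `path` and bind `symbol`; returns 0 on success, -1 on failure. */
int binding_load(struct get_n_binding *b, const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL)
        return -1;
    void *addr = dlsym(handle, symbol);
    if (addr == NULL) {
        dlclose(handle);
        return -1;
    }
    b->handle = handle;
    b->get_n = (int (*)(void))addr;  /* conversion that POSIX expects to work for dlsym() results */
    return 0;
}

/* Clear the pointer before closing so that no stale address survives. */
void binding_unload(struct get_n_binding *b)
{
    b->get_n = NULL;
    if (b->handle != NULL) {
        dlclose(b->handle);
        b->handle = NULL;
    }
}
```

With this layout, run() would call through b.get_n after checking it for NULL, and replacing the implementation becomes binding_unload() followed by binding_load() on the new library. The window in between, where no implementation is bound, is the caller's responsibility.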

If you can't control libA.so, then you can still control the binding
by manipulating the GOT in libA.so, but it will get quite tricky --
you'll need to understand how it works, you'll need to find get_n()'s
entry in the GOT, and reset it to "unresolved" state after unloading
libB.so.

Christophe Lohr

Mar 11, 2008, 5:48:08 AM

Paul Pluzhnikov wrote:

>
>> I have a subsequent problem: dlclose() is unable to unload my library
>> "libB.so"
>
> How do you know that?

On my original test:
$ diff loader.c loader_v2.c
37c37,41
< dlclose(handleB);
---
> printf("dlcolse: %d\n", dlclose(handleB));
> (*p_run) ();
> printf("dlcolse: %d\n", dlclose(handleB));
> (*p_run) ();
>
After attempts to remove libB.so, calls to "get_n()" still work...

> Actually, you can call dlclose(handleB) twice in a row, and it will
> get unloaded. But the resolution from run() in libA.so will *not*
> happen again -- it only happens once, so even if you later load a
> different implementation of get_n(), run() will remember the old
> pointer, and will keep jumping to that location (which can land it
> either nowhere, or in the middle of new code; with a crash likely
> either way).

So, in my example, the first dlclose() actually succeeds (it returns
0), and the following "run" works only by chance: the code of get_n() is
still in memory, but there is no guarantee of that.
And the second dlclose() really fails (it returns -1) because libB.so is
already unloaded.
Is that right?

> If you can't control libA.so, then you can still control the binding
> by manipulating the GOT in libA.so, but it will get quite tricky --
> you'll need to understand how it works, you'll need to find get_n()'s
> entry in the GOT, and reset it to "unresolved" state after unloading
> libB.so.

One could imagine a variant of dlclose() that remembers which symbols of
the program are still bound to objects of the library being removed, so
that a variant of dlopen() could use this list of "pending symbols" to
bind them to objects of the library being loaded...
Do you know of any work in this direction?

Many thanks,
Christophe

Paul Pluzhnikov

Mar 12, 2008, 1:31:59 AM

Christophe Lohr <christo...@enst-bretagne.fr> writes:

> On my original test:
> $ diff loader.c loader_v2.c
> 37c37,41
> < dlclose(handleB);
> ---
> > printf("dlcolse: %d\n", dlclose(handleB));
> > (*p_run) ();
> > printf("dlcolse: %d\n", dlclose(handleB));
> > (*p_run) ();
> >
> After attempts to remove libB.so, calls to "get_n()" still work...

At least for me, only the second call to run() works,
and the third one crashes.

> So, in my example, the first dlclose() actually succeeds (it
> returns 0), and the following "run" works only by chance: the code of
> get_n() is still in memory, but there is no guarantee of that.

Correct.

> And the second dlclose() really fails (it returns -1) because libB.so
> is already unloaded. Is that right?

Huh? I get what I expect:

Hello, 10!
dlclose: 0
run
Hello, 10!
dlclose: 0
run
Segmentation fault

IOW, the second call to dlclose() also succeeds, and actually is
the one that removes libB.so from the process.

> One could imagine a variant of dlclose() that remembers which symbols
> of the program are still bound to objects of the library being removed,
> so that a variant of dlopen() could use this list of "pending symbols"
> to bind them to objects of the library being loaded...

One can imagine a lot of things.

> Do you know of any work in this direction?

I don't believe such an extension to dlopen()/dlclose() is generally
useful, nor that it will ever be implemented as part of glibc.

Christophe Lohr

Mar 12, 2008, 5:42:45 AM

Paul Pluzhnikov wrote:

>
> Huh? I get what I expect:
>
> Hello, 10!
> dlclose: 0
> run
> Hello, 10!
> dlclose: 0
> run
> Segmentation fault
>
> IOW, the second call to dlclose() also succeeds, and actually is
> the one that removes libB.so from the process.

This is what I get on a PC (Linux 2.6.22, gcc 4.2.3, libc6 2.7):

run
Hello, 10!
dlcolse: 0
run
Hello, 10!
dlcolse: -1
run
Hello, 10!

And this is what I get on a Sun (SunOS 5.9, gcc 3.3.1):

run
Hello, 10!
dlcolse: 0
run
Hello, 10!
dlcolse: 1
run
Hello, 10!

> I don't believe such an extension to dlopen()/dlclose() is generally
> useful, nor that it will ever be implemented as part of glibc.

dlopen/dlclose/dlsym imply working with pointers to functions and data.
Today, this is the only way to load/unload/reload libraries.
It works well, and maybe nothing else is needed. However, it is a little
heavy. To avoid that, IMHO, one needs some extension for handling the
symbol table (e.g. retrying the binding of pending symbols), so that one
could use functions and data from one library in another as usual, without
coding a pointer to them.

A classic use case is "replacing" a library. Replacing a library is
not equivalent to a dlclose+dlopen sequence, since that sequence has to go
through an intermediate inconsistent state. When working with pointers to
functions and data (retrieved by dlsym), this inconsistent state is managed
by the programmer, not by the library loader, which is a risk of failure.
This is why I believe such an extension to dlopen()/dlclose() may be
useful. However, I have a lot of imagination :-)

Many thanks
Christophe

Paul Pluzhnikov

Mar 12, 2008, 12:37:46 PM

Christophe Lohr <christo...@enst-bretagne.fr> writes:

> This is what I get on a PC (Linux 2.6.22, gcc 4.2.3, libc6 2.7):
>
> run
> Hello, 10!
> dlcolse: 0
> run
> Hello, 10!
> dlcolse: -1
> run
> Hello, 10!

I see. My results were on glibc 2.3.3.
They must have fixed symbol resolution to mark the DSO "not
unloadable" if its symbols are used.

> A classic use case is "replacing" a library.

But when exactly is such "replacing" useful?
I can only imagine very limited circumstances where this might be
useful during development.

> This is why I believe such an extension to dlopen()/dlclose() may be
> useful.

Suppose your extension is implemented. Suppose loader loads libA,
libB, and you call run() which in turn calls libB`get_n().

Suppose now loader loads libC, which also defines get_n().

If you call run() now, which get_n() should be called?
If libB is unloaded, and you call run(), which get_n() should be called?
If libB is "replaced" with libD, which get_n() should be called?

> However, I have a lot of imagination :-)

I don't believe you've worked through all the implications of your
"design".

Christophe Lohr

Mar 13, 2008, 8:57:54 AM

Paul Pluzhnikov wrote:

>
> Running your original test with 'LD_DEBUG=files' reveals:

(..)
> 13405: file=./libA.so [0]; needed by ./loader [0]
> 13405: file=./libA.so [0]; generating link map
> 13405: dynamic: 0xb7f6b518 base: 0xb7f6a000 size: 0x00001618
> 13405: entry: 0xb7f6a3c0 phdr: 0xb7f6a034 phnum: 4
(..)

By the way, what is the relationship between the addresses displayed here,
those visited by dl_iterate_phdr(), and the "handle" returned by
dlopen()?

Many thanks
Christophe.

Christophe Lohr

Mar 13, 2008, 8:48:40 AM

Paul Pluzhnikov wrote:

> But when exactly is such "replacing" useful?

There is some (academic) work on "dynamic reconfiguration",
i.e. the ability to change the behavior of a live application
without stopping it, while it continues to offer its services (maybe with
lower quality for a short time).
This might be useful to fix or enhance systems that are in use.

Fractal (Think, Cecilia, Julia, etc.) could be an answer for that.
http://fractal.objectweb.org/
http://think.objectweb.org/
http://fractal.objectweb.org/c.html
http://fractal.objectweb.org/java.html

However, it is very conceptual (it is about meta-modeling...).
But the underlying mechanisms are still about loading/unloading/replacing
code during execution, and they are mostly ad hoc.
So what kind of services could libc (or similar) offer for that?

> Suppose your extension is implemented. Suppose loader loads libA,
> libB, and you call run() which in turn calls libB`get_n().
>
> Suppose now loader loads libC, which also defines get_n().
>
> If you call run() now, which get_n() should be called?

Maybe the extended dlopen() could implement a specific behavior for
the "RTLD_DEEPBIND" flag (or similar) to let the user decide whether
libC'get_n() should be called "deeper" than libB'get_n()...
(It's still imagination :-) )

> If libB is unloaded, and you call run(), which get_n() should be called?

The extended dlclose() could introduce flags to let the user decide
that...

> If libB is "replaced" with libD, which get_n() should be called?

One could imagine a dlreplace("libB", "libD", flag).
Depending on whether the flag includes RTLD_DEEPBIND, libD'get_n()
could be deeper than the existing libC'get_n() or not...

There could be many semantics for "replacing" code. I don't believe one
is better than the others. The user should be able to choose.

>> However, I have a lot of imagination :-)
>
> I don't believe you've worked through all the implications of your
> "design".

Yes, that's why I am looking for existing work by serious people!
;-)

Regards,
Christophe

Paul Pluzhnikov

Mar 14, 2008, 12:50:27 AM

Christophe Lohr <christo...@enst-bretagne.fr> writes:

> (..)
>> 13405: file=./libA.so [0]; needed by ./loader [0]
>> 13405: file=./libA.so [0]; generating link map
>> 13405: dynamic: 0xb7f6b518 base: 0xb7f6a000 size: 0x00001618
>> 13405: entry: 0xb7f6a3c0 phdr: 0xb7f6a034 phnum: 4
> (..)
>
> By the way, what is the relationship between the addresses displayed here,
> those visited by dl_iterate_phdr(), and the "handle" returned by
> dlopen()?

dl_iterate_phdr() will call you with a pointer to dl_phdr_info
(allocated on stack) with the following member values:

dlpi_addr = 0xb7f6a000
dlpi_name = "./libA.so"
dlpi_phdr = 0xb7f6a034
dlpi_phnum = 4

The handle returned by dlopen is really a 'struct link_map *',
and has no relationship with any of the addresses above.
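
The values listed above can be printed from a running process. This is a minimal sketch, assuming glibc on Linux, where dl_iterate_phdr() is declared in <link.h> under _GNU_SOURCE; print_object() and list_loaded_objects() are hypothetical names.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded object; `data` points to an int counter. */
int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    printf("name=\"%s\" base=%p phdr=%p phnum=%u\n",
           info->dlpi_name,
           (void *)info->dlpi_addr,
           (const void *)info->dlpi_phdr,
           (unsigned)info->dlpi_phnum);
    ++*(int *)data;
    return 0;   /* a non-zero return would stop the iteration */
}

/* Hypothetical helper: list every loaded object and return how many there are. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

The addresses differ from run to run. To get from a handle to the base address of its library on glibc, dlinfo() with RTLD_DI_LINKMAP returns the link_map behind the handle, and its l_addr field holds the load base that dl_iterate_phdr() reports as dlpi_addr.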