Shared libraries support for mmu-less Embox builds

Felix Sulima

Sep 10, 2013, 7:30:20 AM

to an...@koroneynikov.info, embox...@googlegroups.com

Hello, Anton.

I think it is fair to say that the Embox team would greatly appreciate your help in estimating the effort needed to implement shared library support for MMU-less builds.

I have found some information on how it can be done on uClinux (although other methods exist):

http://www.securecomputing.com/index.cfm?sKey=1828

http://gcc.gnu.org/ml/gcc/2004-06/msg00648.html

The corresponding mode in modern GCC is enabled by the -mid-shared-library flag, which unfortunately is available only for M68K and Blackfin processors, and I do not know what support is needed from the other tools.

The external, static allocation and assignment of library numbers, as you already said, looks rather like a kludge for uClinux; however, in my opinion it looks like a good solution for the all-statically-linked Embox.

The general question is: how difficult would it be to extend GCC's support for this mode to other architectures? ARM appears to be of high importance, but the ubiquitous x86 could also make use of this mode, at least for testing purposes. Of course, any constructive criticism and discussion are welcome; in fact, I hope to start one here.

Regards,

Felix.

Felix Sulima

Sep 10, 2013, 7:31:31 AM

to an...@korobeynikov.info, embox...@googlegroups.com

Anton Korobeynikov

Sep 12, 2013, 1:49:07 PM

to Felix Sulima, embox...@googlegroups.com

Felix,

I looked over the issues for MMU-less PIC stuff and the situation
looks more or less clear.

In fact, I do not understand why you need something special here at
all! Regardless of MMU after you loaded the shared library you have
the relocation process. And it's the job of the dynamic OS linker to
fill in necessary offsets here.

So, maybe you can clarify, why do you think one needs something special here?

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Eldar Abusalimov

Sep 12, 2013, 5:32:39 PM

to embox...@googlegroups.com, Anton Korobeynikov, Felix Sulima

Anton,

If I'm not mistaken, the problem is that two applications sharing a library cannot run simultaneously when the library has data/bss of its own, i.e. each running application needs its own copy of these sections. Because there is no MMU, the copies will have different memory addresses, and the library code must somehow be able to address the right one, depending on the currently running application.
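
A minimal sketch of this problem in C, under stated assumptions: `struct lib_data`, `current_lib_data`, `switch_to` and `lib_count_call` are hypothetical names used only for illustration, and `switch_to` stands in for whatever the scheduler would do on a context switch. It shows one copy of the library code serving several applications by reaching its data only through a switchable base pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-application copy of a library's .data/.bss. */
struct lib_data {
    int call_count;   /* stands in for a library-global variable */
};

/* Without an MMU every copy is visible at its own address, so the shared
 * code cannot hard-code one address. It needs a base that is switched
 * together with the running application. */
struct lib_data *current_lib_data;

/* Assumed to be called on a context switch (hypothetical hook). */
void switch_to(struct lib_data *data)
{
    assert(data != NULL);
    current_lib_data = data;
}

/* Shared library code: a single copy serves every application. */
int lib_count_call(void)
{
    return ++current_lib_data->call_count;
}
```

Each application then sees its own counter, even though `lib_count_call` exists only once in memory.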

--
Best regards,
Eldar Sh. Abusalimov

Anton Korobeynikov

Sep 12, 2013, 5:43:47 PM

to Eldar Abusalimov, embox...@googlegroups.com, Felix Sulima

Eldar,

Right. The only question is whether the code segment is shared or not.

If the code segment is not shared, then I do not see the problem - in
such a case we'll have two full copies of the DSO in the memory and
dynamic linker should provide separate GOT / PLTs for them.

If the code segment is shared, then we surely cannot use pcrel
relocation to calculate the address of GOT inside the DSO. Instead, we
will need to reserve the register for GOT address, initialize it
during the binary / DSO loading and make sure it is saved somewhere (e.g.
in the zero GOT entry).

Then, the proper save / restore of this register should be performed
in each PLT entry (initialized to proper GOT offset via calling
through PLT trampoline and restored during the return) and this way
every "part-copy" of DSO will receive its own GOT.

In any case, most of the job is for dynamic linker...
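
The shared-code case can be simulated in plain C. This is only a sketch under assumptions: the global `got_reg` stands in for the reserved register, `struct got` is reduced to a single slot, and `plt_call_lib_increment` plays the role of a PLT entry. None of these names come from a real toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated GOT, reduced to one slot that holds the address of the
 * instance's own data. */
struct got {
    int *counter;
};

/* Stands in for the register reserved for the current GOT address. */
struct got *got_reg;

/* Shared library code: reaches its data only through got_reg. */
int lib_increment(void)
{
    return ++*got_reg->counter;
}

/* Simulated PLT entry: install the callee's GOT, call, then restore
 * the caller's GOT on return. */
int plt_call_lib_increment(struct got *callee_got)
{
    struct got *saved = got_reg;
    int result;

    assert(callee_got != NULL);
    got_reg = callee_got;
    result = lib_increment();
    got_reg = saved;
    return result;
}
```

Two "part-copies" of the DSO are then just two `struct got` instances passed to the same code.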

Felix Sulima

Sep 12, 2013, 6:04:56 PM

to Anton Korobeynikov, Eldar Abusalimov, embox...@googlegroups.com

Hello, Anton.

Thanks for your responses and effort on this matter.

Are you now talking about FDPIC, which you mentioned?

http://gcc.gnu.org/ml/gcc/2008-02/msg00619.html

If not, please explain what you mean.

Regarding the dynamic linker: we do not intend to have one, because we plan to link everything statically together, including the OS core, shared libraries and applications. For any particular application, the list of shared libraries to initialize at its startup will be determined statically, so there is no need for an expensive dynamic symbol resolver.

So we are not talking about dynamic libraries, only about shared libraries. Therefore, it seems undesirable to have PLTs at all.

In this light, the id-shared-library method with all library IDs statically preallocated looks preferable to me compared to FDPIC in terms of implementation cost.
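
A rough sketch of the library-ID idea in C, not of the code GCC actually generates: every library gets a fixed ID at build time, and each task owns a small table that maps an ID to the base of its private copy of that library's data. `MAX_LIBS`, the `LIB_ID_*` values, `struct task_lib_table`, `current_table` and `lib_data_base` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIBS 4   /* assumed upper bound on library IDs */

/* Library IDs assigned statically at build time (hypothetical values). */
enum { LIB_ID_LIBC = 0, LIB_ID_LIBM = 1 };

/* Per-task table: slot N holds the base of this task's copy of the
 * data of library N. */
struct task_lib_table {
    void *data_base[MAX_LIBS];
};

/* Stands in for a register that always points at the current task's table. */
struct task_lib_table *current_table;

/* What library code would do before touching its own globals. */
void *lib_data_base(unsigned lib_id)
{
    assert(lib_id < MAX_LIBS);
    return current_table->data_base[lib_id];
}
```

Since the IDs are known when everything is linked, the lookup is a fixed-offset load and needs no symbol resolution at run time.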

So, what do you think? Does this somehow change the situation in your opinion?

I also apologize for being a little slow on this matter: I have not dealt with PIC before, I mostly come from the MS Windows DLL world, where things are different.

Regards,

Felix.

On 13.09.2013, at 1:43, Anton Korobeynikov <an...@korobeynikov.info> wrote:

Eldar Abusalimov

Sep 12, 2013, 6:15:23 PM

to embox...@googlegroups.com, Anton Korobeynikov, Felix Sulima

Anton,

Thanks for the quick answer.

On Fri, Sep 13, 2013 at 1:43 AM, Anton Korobeynikov <an...@korobeynikov.info> wrote:

> If the code segment is shared, then we surely cannot use pcrel
> relocation to calculate the address of GOT inside the DSO. Instead, we
> will need to reserve the register for GOT address, initialize it
> during the binary / DSO loading and make sure is saved somewhere (e.g.
> in the zero GOT entry).

I thought GOT always receives a static address/offset. Is this GOT indirection-through-a-register just another compilation mode?

> Then, the proper save / restore of this register should be performed
> in each PLT entry (initialized to proper GOT offset via calling
> through PLT trampoline and restored during the return) and this way
> every "part-copy" of DSO will receive its own GOT.

Aha, I guessed, there have to be some kind of magic like that... =)

patacongo

Sep 13, 2013, 6:37:23 PM

to embox...@googlegroups.com, an...@koroneynikov.info

There is also XFLAT: http://xflat.sourceforge.net/

That is something that I did for uClinux years ago. It does not depend on any features of the compiler and so is portable to any architecture that supports position-independent execution. It includes some tools and the dynamic loader and should be more or less a drop-in. There are some limitations to this approach, as documented in the above reference.

Greg

Anton Korobeynikov

Sep 14, 2013, 12:40:05 PM

to Felix Sulima, Eldar Abusalimov, embox...@googlegroups.com

Felix,

Sorry for the delay in answering, I caught a cold...

> Are you now talking about FDPIC which you mentioned?
> http://gcc.gnu.org/ml/gcc/2008-02/msg00619.html
> If no, please explain what do you mean.

Forget about precise implementation for a moment. Basically, I'd like
to understand all the problems.

> Regarding dynamic linker: we do not intend to have one because we plan to
> have everything linked statically together, including OS core, shared
> libraries and applications.

Right. However, this need not be a fully dynamic linker. You can
postprocess the binaries, etc.

> So, we are not talking about dynamic libraries, but only about shared
> libraries.

Please clarify the difference between "shared" and "dynamic"
libraries as you see it. To me they look like pretty much the same beasts.

> In this light id-shared-library method with all libraries IDs statically
> preallocated looks preferable to me compared to FDPIC in terms of
> implementation cost.
> So, what do you think? Does this somehow change the situation in your
> opinion?

Not yet. I'd prefer to keep the toolchain changes minimal for the sake
of the maintenance costs.

Anton Korobeynikov

Sep 14, 2013, 12:42:40 PM

to Eldar Abusalimov, embox...@googlegroups.com, Felix Sulima

Eldar,

> I thought GOT always receives a static address/offset. Is this GOT
> indirection-through-a-register just another compilation mode?

Sorry, I do not follow... Usually the address of the GOT is resolved via
a pc-relative relocation. The relocation is filled in by the dynamic linker,
and the value of pc can be obtained at run time (e.g. via the "call .+5,
popl reg" trick on x86 and so on).

Anton Korobeynikov

Sep 14, 2013, 12:47:20 PM

to Felix Sulima, Eldar Abusalimov, embox...@googlegroups.com

Actually, it may be better to discuss all the details on, say,
Tuesday, so we may end up with a proper design.

Felix Sulima

Sep 14, 2013, 6:15:27 PM

to Anton Korobeynikov, Eldar Abusalimov, embox...@googlegroups.com

On 14.09.2013, at 20:42, Anton Korobeynikov <an...@korobeynikov.info> wrote:

> Eldar,
>
>> I thought GOT always receives a static address/offset. Is this GOT
>> indirection-through-a-register just another compilation mode?
> Sorry, I do not follow... Usually the address of GOT is resolved via
> pc-relative relocation. It will be fulfilled by a dynamic linker and
> the address of pc we can know in the runtime (e.g. via "call .+5, popl
> reg" trick on x86 and so on).
>

I will permit myself to intervene.
I think Eldar was asking if what you are describing is some sort of already 'standardized' compilation mode for some compiler.
And in my opinion the answer should be 'no': this ABI is new, toolchain modifications are required.

Anton Korobeynikov

Sep 14, 2013, 6:28:12 PM

to Felix Sulima, Eldar Abusalimov, embox...@googlegroups.com

> I will permit myself to intervene.
> I think Eldar was asking if what you are describing is some sort of already 'standardized' compilation mode for some compiler.

I am describing how *usual* PIC is done. Nothing fancy. The only magic
occurs later, i.e. how the loading and address resolution is being
done.

> And in my opinion the answer should be 'no': this ABI is new, toolchain modifications are required.

If you want to change the toolchain think twice. If you still think
that this is a good idea, think once again. Think about the
maintenance - who will support the patches in a year? In two? Will you
require a specific version of, say, gcc, to be used? What about
compiler bugs which will need to be fixed? Are you going to support
the whole zoo of potential platforms? Are you going to introduce &
support Embox as full OS in, say, gcc / binutils? Note that these
folks are pretty fast on removing obsolete / unmaintained stuff :)

As for now, I still do not understand why conventional PIC won't work
for Embox (possibly with some post-processing at link time, e.g.
turning ELF objects into some "embox blob").
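
One possible reading of the post-processing idea, sketched in C under loud assumptions: the "embox blob" format is not defined anywhere in this thread, so `struct blob_reloc` and `fill_got` are invented for illustration. The sketch assumes the build-time tool reduces the ELF relocations to a flat list of "GOT slot = data base + offset" records, so that creating a per-instance GOT becomes a simple loop rather than symbol resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record kept in the post-processed blob:
 * GOT slot got_index must hold data_base + offset. */
struct blob_reloc {
    size_t got_index;
    uintptr_t offset;
};

/* Fill one instance's GOT from the blob's relocation list. */
void fill_got(uintptr_t *got, size_t got_len,
              const struct blob_reloc *relocs, size_t nrelocs,
              uintptr_t data_base)
{
    size_t i;

    for (i = 0; i < nrelocs; i++) {
        assert(relocs[i].got_index < got_len);
        got[relocs[i].got_index] = data_base + relocs[i].offset;
    }
}
```

Calling `fill_got` once per application instance, each time with a different `data_base`, would give every instance its own GOT while the code stays shared.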

Felix Sulima

Sep 14, 2013, 6:50:57 PM

to embox...@googlegroups.com, an...@korobeynikov.info, Eldar Abusalimov

Anton,

please see inline

On 14.09.2013, at 20:40, Anton Korobeynikov <an...@korobeynikov.info> wrote:

> Felix,
>
> Sorry for delay with answer, I caught a cold...
>
>> Are you now talking about FDPIC which you mentioned?
>> http://gcc.gnu.org/ml/gcc/2008-02/msg00619.html
>> If no, please explain what do you mean.
> Forget about precise implementation for a moment. Basically, I'd like
> to understand all the problems.

OK. We have discussed what you proposed, and we understood it as follows: you suggest generating (on the fly, by the dynamic linker) a 'trampoline' for each PLT entry for each library-application run-time instance, so that for the same PLT function the GOT will point to different trampolines in different tasks.

After some thought, I see a problem here with pointers to functions: every time the compiler encounters a pointer to a function, a trampoline has to be generated, because the pointer can be called from another library's code. I am not sure I understand how it will all work together.
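
The function-pointer concern can be illustrated with a small C simulation. It is a sketch under assumptions: `got_reg` again stands in for the reserved register, and `struct func_desc` and `call_desc` are hypothetical. Pairing a code address with the data base it must run with is similar in spirit to the function descriptors used by FDPIC, but this is not that ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated reserved register: points at the current instance's data. */
int *got_reg;

/* Shared library function: reads its instance data through got_reg. */
int lib_get_value(void)
{
    return *got_reg;
}

/* A descriptor pairs the code address with the data base it needs. */
struct func_desc {
    int (*code)(void);
    int *got;
};

/* Calling through the descriptor installs the right base first and
 * restores the caller's base afterwards. */
int call_desc(const struct func_desc *fd)
{
    int *saved = got_reg;
    int result;

    assert(fd != NULL && fd->code != NULL);
    got_reg = fd->got;
    result = fd->code();
    got_reg = saved;
    return result;
}
```

A bare `lib_get_value` pointer called from foreign code would run with the caller's base and read the wrong data, which is why something has to carry the base along with the pointer.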

>> Regarding dynamic linker: we do not intend to have one because we plan to
>> have everything linked statically together, including OS core, shared
>> libraries and applications.
> Right. However, this need not to be a fully dynamic linker. You can
> postprocess the binaries, etc.

Yes, this is what we want to have in the end.

However, with such an approach we can never get rid of the dynamic linker completely, since something has to generate the trampolines.

>> So, we are not talking about dynamic libraries, but only about shared
>> libraries.
> Please clarify the differences between "shared" and "dynamic"
> libraries you're thinking. For me these look like pretty same beasts.

Well, shared differs from dynamic in the sense that dynamic libraries are supposed to be loaded and linked at run time (and they do not necessarily have to be shared, e.g. Windows DLLs), whereas the main purpose of shared libraries is to conserve memory by sharing the code and rodata segments.

Therefore, symbol resolution at run time is more a matter of dynamic libraries, whereas our main goal is to do as much as possible statically while keeping the sharing property.

I believe these terms are often confused because the sharing property is usually obtained by means of dynamic linking:

http://en.wikipedia.org/wiki/Library_(computing)#Shared_libraries

>> In this light id-shared-library method with all libraries IDs statically
>> preallocated looks preferable to me compared to FDPIC in terms of
>> implementation cost.
>> So, what do you think? Does this somehow change the situation in your
>> opinion?
> Not yet. I'd prefer to keep the toolchain changes minimal for the sake
> of the maintenance costs.

This priority is well understood. The problem is that it is not exactly clear in which case the changes would be minimal. For example, I do not understand how it is possible to get what we want without changing the toolchain at all, so I suppose at least some modifications will be necessary, and it appears to me that this method could be the cheaper one. This is of course debatable, but let's at least consider it alongside the others.

> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University

Felix Sulima

Sep 14, 2013, 6:57:27 PM

to Anton Korobeynikov, Eldar Abusalimov, embox...@googlegroups.com

On 15.09.2013, at 2:28, Anton Korobeynikov <an...@korobeynikov.info> wrote:

>> I will permit myself to intervene.
>> I think Eldar was asking if what you are describing is some sort of already 'standardized' compilation mode for some compiler.
> I am describing how *usual* PIC is done. Nothing fancy. The only magic
> occurs later, i.e. how the loading and address resolution is being
> done.

Hm, you have said the following:

On Fri, Sep 13, 2013 at 1:43 AM, Anton Korobeynikov <an...@korobeynikov.info> wrote:

> If the code segment is shared, then we surely cannot use pcrel
> relocation to calculate the address of GOT inside the DSO. Instead, we
> will need to reserve the register for GOT address, initialize it
> during the binary / DSO loading and make sure is saved somewhere (e.g.
> in the zero GOT entry).

So, how is this 'register reservation for GOT address' achieved with an unmodified toolchain? By some compiler flags, maybe?

>> And in my opinion the answer should be 'no': this ABI is new, toolchain modifications are required.
> If you want to change the toolchain think twice. If you still think
> that this is a good idea, think once again. Think about the
> maintenance - who will support the patches in a year? In two? Will you
> require specific version of say, gcc, to be used? What's about
> compiler bugs which will need to be fixed? Are you going to support
> the whole zoo of potential platforms? Are you going to introduce &
> support Embox as full OS in, say, gcc / binutils? Note that these
> folks are pretty fast on removing obsolete / unmaintained stuff :)

Please be assured that these issues are well understood. Modifying the toolchain is the last thing we would want to do.

> As for now, I still do not understand, why conventional PIC won't work
> for Embox (with possible some post-processing at the link time, like,
> e.g. turning ELF objects into some "embox blob").

Well, if you have an understanding of how it could work, perhaps you could share your vision with the rest of us.

Felix Sulima

Sep 14, 2013, 7:01:09 PM

to Anton Korobeynikov, Eldar Abusalimov, embox...@googlegroups.com

Anton,

as you prefer and whenever you prefer.
Please come to see us whenever it suits you.

Regards,
Felix.

On 14.09.2013, at 20:47, Anton Korobeynikov <an...@korobeynikov.info> wrote:

Felix Sulima

Sep 14, 2013, 7:06:18 PM

to embox...@googlegroups.com, patacongo, an...@koroneynikov.info

Hello, Greg.

Thank you for this input.

It will take some time to evaluate the XFLAT solution.

I hope to come back with an opinion and feedback in a few days.

Regards,

Felix.

On 14.09.2013, at 2:37, patacongo <spud...@gmail.com> wrote:

> There is also XFLAT: http://xflat.sourceforge.net/
>
> That is something that I did for uClinux years ago. It does not depend on any features of the compiler and so is portable to any architecture that supports position independent execution. It includes some tools and the dynamic loader and should be more or less a drop in. There are some limitations to this approach as documented in the above reference.
>
> Greg

Felix Sulima

Oct 3, 2013, 6:40:52 AM

to embox...@googlegroups.com, patacongo, an...@koroneynikov.info

Hello, Greg.

I have briefly looked at XFLAT and the related material. Your overall work looks impressive.

However, there are some concerns.

I did not find detailed documentation, only a general description of the solution, so gaining a deeper understanding requires looking closely at the source code. As far as I understand, XFLAT relies on the GCC flag -msingle-pic-base, and as far as I can see this flag exists only on ARM and PowerPC. It is therefore not clear how to extend this solution to other platforms, especially x86. This is the main issue, and I would be happy to be wrong about it.

Also, it appears more efficient to perform all the linking at build time and keep run-time overhead minimal, whereas XFLAT still has to use dynamic linking and keep images on a filesystem. This also imposes an additional requirement of debugger support for loaded executables.

Another problem is one you have described here: http://nuttx.org/doku.php?id=wiki:vfs:nxflat . I am not sure, but it probably also applies to XFLAT.

In general, in my opinion, it is a pity that toolchains are not targeted at embedded development. It seems that a special embedded ABI needs to be developed, free of all the unnecessary features that exist for full-scale platforms. However, this concern does not appear to be addressed in the industry these days :(

As for us, we are still facing the same dilemma: choosing a solution that would satisfy all the requirements without burdening us too much with support effort.

Regards,

Felix.

patacongo

Oct 6, 2013, 10:14:57 PM

to embox...@googlegroups.com

NXFLAT is a completely different thing. NXFLAT is stripped down for deeply embedded systems. NXFLAT does not support shared libraries and nothing that you read about NXFLAT has any relevance to XFLAT.

Greg
