Depending on the situation, it may be *IMPOSSIBLE* to load all the
shared libraries at standard but globally unique addresses, as there
probably isn't a master registry of every version of every shared
library ever made for the platform, and they probably wouldn't all
fit simultaneously in the limited address space. What probably
happens is that the library gets loaded at a different address (with
different relocation patches), and that copy can't be shared with
programs using the library at the "standard" address. The choice of
standard addresses can be tuned to favor the current versions of
heavily used system libraries.
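A toy model of that situation, offered only as a sketch: each library is a hypothetical `struct region` with a preferred base address and a size, and `choose_base` (a made-up helper, not any real loader's API) keeps the preferred base if it is free, otherwise steps upward until it finds a gap. Any result other than the preferred base is exactly the case where the loader must produce a separately relocated, unshareable copy.

```c
#include <stddef.h>

/* Hypothetical description of an address range occupied by a library. */
struct region {
    unsigned long base; /* first address */
    unsigned long size; /* length in bytes */
};

/* True if the two half-open ranges [base, base+size) intersect. */
static int overlaps(struct region a, struct region b) {
    return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Returns want.base if nothing already loaded overlaps it; otherwise tries
   want.base + step, want.base + 2*step, ... until a free spot is found.
   A non-preferred result means a privately relocated copy is needed. */
unsigned long choose_base(const struct region *loaded, size_t n,
                          struct region want, unsigned long step) {
    struct region r = want;
    for (;;) {
        int clash = 0;
        for (size_t i = 0; i < n; i++) {
            if (overlaps(r, loaded[i])) {
                clash = 1;
                break;
            }
        }
        if (!clash)
            return r.base;
        r.base += step; /* preferred address taken: fall back */
    }
}
```

For example, with one library already occupying 0x1000-0x2FFF, a second library that wants 0x2000 ends up at 0x3000 (and can't share pages with processes that got it at 0x2000), while one that wants 0x5000 gets its standard address.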
If you use -fpic, you can use a single copy of the code in memory
for 87 programs using it simultaneously at 29 different load
addresses. If you don't use -fpic, you need at least 29 different
copies, because the linker/loader has to patch different absolute
addresses into each copy.
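The 87-programs/29-addresses arithmetic can be written down as a small function. This is a sketch under a simplifying assumption: it counts only copies of the library's code, and the name `copies_needed` is hypothetical. With -fpic one copy serves everyone; without it, one copy is needed per distinct load address.

```c
#include <stddef.h>

/* Number of distinct values in addrs[0..n). Quadratic, fine for a sketch. */
static size_t distinct(const unsigned long *addrs, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < i; j++)
            if (addrs[j] == addrs[i])
                break;
        if (j == i) /* first time we've seen this address */
            count++;
    }
    return count;
}

/* Copies of the code that must be resident while nprocs programs use it.
   pic != 0: a single shared copy works at every load address.
   pic == 0: each distinct load address needs its own patched copy. */
size_t copies_needed(const unsigned long *load_addrs, size_t nprocs, int pic) {
    if (nprocs == 0)
        return 0;
    return pic ? 1 : distinct(load_addrs, nprocs);
}
```

Feeding it 87 load addresses drawn from 29 different values gives 29 without -fpic and 1 with it.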
It is possible that if you don't use -fpic, the system won't even
TRY to share it. The recipe to build a shared library may *require*
-fpic. Otherwise you have a non-shared library which is linked
(copied) into the executable.
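As a concrete build recipe, here is a hypothetical one-function library (`greet.c` is an invented name) with the usual GCC commands in a comment. `-fpic`, `-shared`, and `ar rcs` are real options; treat the exact behavior as platform-dependent. On some targets (x86-64 Linux, for instance) the linker typically refuses to build a shared library from non-PIC objects, which is one way the "recipe requires -fpic" shows up in practice.

```c
/* greet.c: hypothetical library source.
 *
 * Shared library (one copy of the code can be shared by many processes):
 *   gcc -fpic -c greet.c -o greet.o
 *   gcc -shared -o libgreet.so greet.o
 *
 * Static archive (the code is copied into each executable that links it):
 *   gcc -c greet.c -o greet.o
 *   ar rcs libgreet.a greet.o
 */

/* Library-global data: each process gets its own copy of this variable
   even when the code pages are shared. */
int greet_count = 0;

/* Returns how many times it has been called in this process. */
int greet(void) {
    return ++greet_count;
}
```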
> If you compile with -fpic, the code generated can be loaded at any address
> but will be slightly slower.
Here are some examples in a hypothetical assembly language that probably
matches no real machine but is close to a lot of them.
If the program can use the Program Counter (it might have a different
name, like Instruction Counter, for various CPUs) as an index
register, the code will likely contain instructions like:
    jp <some_constant>(PC)
rather than
    jp <relocatable_address>
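To see why the PC-relative form needs no relocation patch, here is a tiny simulation in C. The `struct jump` instruction model and both functions are hypothetical stand-ins for the imaginary assembly above, not any real object-file format: a loader moving the code by `delta` bytes must patch every absolute operand, while PC-relative operands keep working unchanged.

```c
#include <stddef.h>

/* Hypothetical model of one jump instruction. */
struct jump {
    int pc_relative; /* 1: "jp <some_constant>(PC)", 0: "jp <relocatable_address>" */
    long operand;    /* offset from this instruction, or an absolute address */
};

/* Where the jump at address pc actually goes. */
long jump_target(struct jump j, long pc) {
    return j.pc_relative ? pc + j.operand : j.operand;
}

/* What the loader must do when this copy is placed delta bytes away from
   where it was linked: patch only the absolute operands.
   Returns how many instructions had to be patched. */
size_t relocate(struct jump *code, size_t n, long delta) {
    size_t patched = 0;
    for (size_t i = 0; i < n; i++) {
        if (!code[i].pc_relative) {
            code[i].operand += delta;
            patched++;
        }
    }
    return patched;
}
```

Moving a two-jump image from 0x1000 to 0x5000 patches only the absolute jump, and both then land at 0x5040. The patched copy now differs from the original, which is why it can't be shared with a process that loaded it at 0x1000.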
If it can't, -fpic will probably start off the code with something like:
    load PC, R8172    ; the destination for "load" is on the right
                      ; in this assembly language.
where Register 8172 will be loaded with an address at a known offset
from the base address of the code segment. (It may also need to save
and restore Register 8172 for the caller.) Branches within the code
look like:
    jp <some_constant>(R8172)
Data references may be made relative to the program counter or the
base address register also.
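The base-register scheme can be modeled the same way. In this sketch, `struct cpu_state`, `library_entry`, and `branch_target` are hypothetical names; it assumes the `load PC, R8172` instruction sits at offset 0 of the code segment, so the captured PC equals the load address, and it leaves out saving and restoring the caller's register. The point is that the constants stay identical in the shared code; only the per-process register value differs.

```c
/* Per-process register state; r8172 plays the role of Register 8172. */
struct cpu_state {
    unsigned long r8172;
};

/* Stands in for "load PC, R8172" at the top of the code. Assumes that
   instruction is at offset 0, so the PC value equals the load address. */
void library_entry(struct cpu_state *cpu, unsigned long load_addr) {
    cpu->r8172 = load_addr;
}

/* "jp <some_constant>(R8172)": some_constant is baked into the shared code
   and is the same for every process; only the register contents differ. */
unsigned long branch_target(const struct cpu_state *cpu,
                            unsigned long some_constant) {
    return cpu->r8172 + some_constant;
}
```

Two processes that load the same code at 0x10000 and 0x20000 both execute `jp 0x40(R8172)` and reach 0x10040 and 0x20040 respectively, with no patching.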
The use of an index register typically slows down the instruction
by one or a few clock cycles compared to not using an index register.
So do instructions used to set up registers needed only for
position-independence. An added complication: if the machine has an
instruction form with an index register and a short offset (say, 16
bits instead of 64 bits), not fetching the other 6 bytes of the
instruction may cancel out the cost of the index register, but only
if the library is small enough for all of its branches to "reach"
with a 16-bit offset.
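The "reach" question and the 6-byte figure are simple arithmetic, sketched below. The helper names are hypothetical, and `all_branches_reach` makes a simplifying assumption: displacements are measured from the branch instruction's own address, so the worst case in a library of `code_size` bytes is plus or minus `code_size - 1`. Real instruction sets often measure from the next instruction instead.

```c
#include <stdint.h>

/* Does a signed displacement fit in a two's-complement field of
   `bits` bits? Valid for 1 <= bits <= 63. */
int displacement_fits(int64_t disp, int bits) {
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    return disp >= min && disp <= max;
}

/* Can every branch within code_size bytes of code use a short offset?
   Assumes displacement = target - branch address, so the extremes are
   +/-(code_size - 1). code_size is assumed to be at least 1. */
int all_branches_reach(int64_t code_size, int bits) {
    int64_t worst = code_size - 1;
    return displacement_fits(worst, bits) && displacement_fits(-worst, bits);
}

/* Instruction bytes not fetched when a short offset replaces a long one. */
int bytes_saved(int long_bits, int short_bits) {
    return (long_bits - short_bits) / 8;
}
```

A 16-bit signed offset spans -32768 to +32767, so under these assumptions a 32768-byte library is the largest where every branch reaches, and `bytes_saved(64, 16)` gives the 6 bytes mentioned above.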
There's a trade-off here:
Using -fpic:
- Allows sharing all the copies in use simultaneously.
- The code is slightly slower.
Not using -fpic:
- Does not allow sharing the copies, which requires more
memory, which may involve more paging/swapping, which
slows things down.
- Loading more copies also requires more CPU.
- The code needed to apply relocation patches to the copies
requires time to execute, and possibly more disk I/O to
read this information from the executable. This may end
up slower than -fpic.
Which is better may depend on the library (the C library, used by
almost every program, vs. the Egyptian Hieroglyphics to Klingon
translation library, which is hardly ever used at all, vs. a graphics
library which is heavily used by the X server on the one and only
set of display hardware on this system, so more than one copy is
never in use).