QUESTION: Program crashes when its dependency .so is replaced. Why?

Izo

May 5, 2005, 7:41:15 AM

Kernel 2.4.x, 2.6.x

First of all, from the point of view of correct binary distribution, the
situation I want to describe is not quite valid. I am aware of that, but
I still want to know the reason for the system behaviour it causes.

Let's suppose the running program P uses the shared object S.so.C.A.R.
During development, of course, the shared object changes. Often not only
the implementation changes, but function prototypes are also added or
changed while the version numbers (C.A.R) stay the same. During tests,
the shared object often gets replaced with a new one while the program
that uses it is running. The program then most probably crashes; at
least it always does on its first attempt to call a function from the
replaced shared object.

The behaviour is slightly confusing to me. Why? The file system / kernel
allows the shared object to be rewritten or replaced without complaint,
even though that harms the program's run-time behaviour, yet it blocks
replacement of the program's own binary, even though that (unlike
replacing the shared object) does not harm the program's operation.

What is the reason for this behaviour (program crash etc.) anyway?

Izo

May 6, 2005, 3:01:10 AM

Well, I have been googling for a while and have found this thread
(https://www.redhat.com/archives/fedora-list/2004-October/msg04466.html)
in the Red Hat archives dealing with the matter. Could somebody kindly
confirm that the thread's content is correct, or is there maybe some
other explanation/solution to this problem?

Izo

Scott Lurndal

May 12, 2005, 12:30:31 AM

Izo <I...@siol.net> writes:
>Kernel 2.4.x, 2.6.x
>

>The behaviour seems slightly confusing for me. Why ? The file system /
>kernel allows the shared object rewrite / replace without complaint and
>it harms the program run-time behaviour while it blocks the program's
>binary replacement while such action (in contrary to shared object
>replacement) does not harm the program's operation.
>
>What is the reason for such behaviour (program crash etc.) anyway ?

When the kernel pages out a code page, it just discards it, since it
can always reload it directly from the backing file itself (e.g. the
executable or shared object). So when you overwrite the .so, some of
its code pages will be in memory, but not all. When the next code page
is paged in, it comes from the new contents of the shared object.
Function calls from the old code pages then land on addresses that hold
something else in the new code pages, so arbitrary code gets executed,
which will quickly cause a fault.
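
The effect can be reproduced outside the dynamic loader. Below is a minimal
Python sketch, assuming Linux: it maps a temporary file read-only and
privately (roughly how code pages are mapped), overwrites the file in place
through a second descriptor, then reads the mapping. The function name
mapped_view_after_overwrite is made up for illustration, and the size is kept
fixed because shrinking a mapped file can raise SIGBUS on access.

```python
import mmap
import os
import tempfile


def mapped_view_after_overwrite(old: bytes, new: bytes) -> bytes:
    """Map a file privately, overwrite it in place, return what the mapping shows."""
    if not old or len(old) != len(new):
        raise ValueError("use non-empty buffers of equal length")
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, old)
        # Private, read-only mapping: nothing is copied until a page is touched.
        view = mmap.mmap(fd, len(old), flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)
        # Overwrite the same inode through another descriptor, without O_TRUNC.
        wfd = os.open(path, os.O_WRONLY)
        os.pwrite(wfd, new, 0)
        os.close(wfd)
        seen = view[:]  # pages are faulted in here, from the current file contents
        view.close()
        return seen
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(mapped_view_after_overwrite(b"A" * 4096, b"B" * 4096)[:8])
```

On Linux the mapping shows the new bytes, which is the same mechanism that
hands a running program pages from the replaced library.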

scott

>
>Izo

Paul Colquhoun

May 12, 2005, 7:20:02 AM

Unfortunately it doesn't work quite that way. When the library is opened,
it gets a file handle attached. This file handle acts just like a directory
entry. The file will not be deleted from the disk until all directory entries
and file handles are removed/closed.

When the library is updated, the new library gets its own disk blocks, and
a new directory entry, but the old library is still on disk and the file
handle pointer to it is still valid.

When the OS needs to page in from the library, it uses the file handle
it already has, not the (new) directory entry.

--
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/~paulcol
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro

Scott Lurndal

May 14, 2005, 12:41:44 PM

Paul Colquhoun <postm...@andor.dropbear.id.au> writes:
>On Thu, 12 May 2005 04:30:31 GMT, Scott Lurndal <sc...@slp53.sl.home> wrote:
>| Izo <I...@siol.net> writes:
>|>Kernel 2.4.x, 2.6.x
>|>
>|
>|>The behaviour seems slightly confusing for me. Why ? The file system /
>|>kernel allows the shared object rewrite / replace without complaint and
>|>it harms the program run-time behaviour while it blocks the program's
>|>binary replacement while such action (in contrary to shared object
>|>replacement) does not harm the program's operation.
>|>
>|>What is the reason for such behaviour (program crash etc.) anyway ?
>|
>| When the kernel pages out a code page, it just discards it, since it
>| can always reload it directly from the object itself (e.g. executable
>| or shared object). So, while some of the code pages will be
>| in memory when you overwrite the .so, not all will be. When the next
>| code page is paged in from the new shared object, the function calls
>| from the old code pages will call incorrect addresses in the new
>| code pages and will cause arbitrary random code to be executed which
>| will quickly cause a system fault.
>
>
>Unfortunately it doesn't work quite that way. When the library is opened,
>it gets a file handle attached. This file handle acts just like a directory
>entry. The file will not be deleted from the disk until all directory entries
>and file handles are removed/closed.

Actually, it does work the way I described. When you copy over a file,
the contents are replaced, not the directory entry. SVR3.2 was the
last release of unix which supported ETXTBSY; since SVR4 it is possible
to overwrite the _contents_ of both an a.out and a .so (shared object)
while it is executing. (Linux follows this for shared objects, though it
still refuses writes to a running executable with ETXTBSY.)

I never said anything about the file being deleted. It isn't. cp will
replace the contents of the file without creating a new inode. (mv is
different: it renames, so the name ends up referring to another inode.)

If you want to keep the original content of the file around, you must
explicitly unlink (man rm) the file first, then copy the new library
or a.out into place. If you do this, then your comments about "file
handles" (which is a windows-ism) are correct. File handles don't act
like directory entries in unix, however. There are reference counts on
on-disk inodes and reference counts on in-core inodes. The on-disk
reference count covers all hard links (i.e. directory entries) that
refer to the inode. The in-core count includes all references from open
file descriptors and mmapped memory regions (man mmap); the in-core
inode won't be released until it is no longer referenced. None of that
prevents cp from replacing the content of the file in toto without
replacing the inode.
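
The unlink-first procedure can be sketched in Python as follows. This is an
illustration only, not a recommended install script: the file name
libdemo.so and the function name unlink_then_replace are hypothetical, and an
open descriptor stands in for the running program's reference to the library.

```python
import os
import tempfile


def unlink_then_replace(old: bytes, new: bytes):
    """Return (bytes still readable via the old descriptor, whether the inodes differ)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "libdemo.so")  # hypothetical library name
        with open(path, "wb") as f:
            f.write(old)
        fd = os.open(path, os.O_RDONLY)  # keeps the old inode referenced
        old_ino = os.fstat(fd).st_ino
        os.unlink(path)  # remove the directory entry first
        with open(path, "wb") as f:  # then create a new file under the same name
            f.write(new)
        new_ino = os.stat(path).st_ino
        data = os.pread(fd, len(old), 0)  # the old contents are still there
        os.close(fd)  # the last reference goes away here
        return data, old_ino != new_ino


if __name__ == "__main__":
    print(unlink_then_replace(b"old code", b"new code"))
```

The old descriptor keeps reading the old bytes, and the name now refers to a
different inode, so the holder of the old reference is not disturbed.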

>
>When the library is updated, the new library gets its own disk blocks, and
>a new directory entry, but the old library is still on disk and the file
>handle pointer to it is still valid.

This is where you are incorrect. A directory entry in unix is simply a
name-value pair where the name is the leaf file name and the value is
the inode number. And when the new library is copied _over_ the old
library by cp, it doesn't change the directory entry, nor does it change
the inode number. It simply replaces the allocated blocks with newly
allocated blocks.
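
For comparison, here is a small Python sketch of the copy-over case. It
assumes cp opens an existing destination for writing with O_TRUNC and writes
the new bytes, which is roughly what GNU cp does; libdemo.so and
overwrite_in_place are hypothetical names used only for this example.

```python
import os
import tempfile


def overwrite_in_place(old: bytes, new: bytes):
    """Return (whether the inode number is unchanged, bytes seen via the old descriptor)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "libdemo.so")  # hypothetical library name
        with open(path, "wb") as f:
            f.write(old)
        fd = os.open(path, os.O_RDONLY)  # reference held before the overwrite
        before = os.stat(path).st_ino
        # Same name, same inode: only the contents are thrown away and rewritten.
        wfd = os.open(path, os.O_WRONLY | os.O_TRUNC)
        os.write(wfd, new)
        os.close(wfd)
        after = os.stat(path).st_ino
        data = os.pread(fd, len(new), 0)  # the old descriptor sees the new contents
        os.close(fd)
        return before == after, data


if __name__ == "__main__":
    print(overwrite_in_place(b"old code", b"new code!"))
```

The inode number stays the same and the pre-existing descriptor reads the new
data, which is why an existing reference gives no protection here.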

>
>When the OS needs to page in from the library, it uses the file handle
>it already has, not the (new) directory entry.

There is no new directory entry. The soi-disant file handle (the
inode number) hasn't changed, but the data contained in the file
referred to by the inode number has.

scott

(linux/unix kernel engineer for 20+ years)

Paul Colquhoun

May 14, 2005, 9:00:01 PM

Shit, this has been happening since SVR4! How embarrassing not to have noticed
for all this time. Admittedly I'm not a programmer, but this seems like a step
backwards to me.

And yes, I meant to say "file descriptor" instead of "file handle", but I resent
your implication that I am a Windows geek. I got the "handle" terminology from Perl.