Unused functions

Chuck Robey

unread,

Sep 14, 1998, 3:00:00 AM9/14/98

to

On Sun, 13 Sep 1998, Kent A Vander Velden wrote:

>
> Is the code associated with unused functions of a used archive ever removed
> from the executable that uses the archive? After linking I see with 'nm'
> there are many functions in the executable that are never called. This is
> making my executable rather large since the archive is huge. Would elf
> help in anyway with this?

Do you realize that many of the libc functions need each other as
dependencies? Many functions just don't work alone. At least for
shared executeables, they don't exist in your executeable until you
actually execute it, anyways, so they don't bloat your executeable at
all. They get linked in during the startup task of locating and loading
the executeable into memory, before it really starts to work.

----------------------------+-----------------------------------------------
Chuck Robey | Interests include any kind of voice or data
chu...@glue.umd.edu | communications topic, C programming, and Unix.
213 Lakeside Drive Apt T-1 |
Greenbelt, MD 20770 | I run Journey2 and picnic (FreeBSD-current)
(301) 220-2114 | and jaunt (NetBSD).
----------------------------+-----------------------------------------------

To Unsubscribe: send mail to majo...@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Kent Vander Velden

unread,

Sep 14, 1998, 3:00:00 AM9/14/98

to

>For statically linked images, only the functions that are actually
>used are linked in.
>
>For static linkage, the smalled chunk you can pull in during the
>link is one ".o" file from the archive (library). So if you have
>one ".o" file that resulted from a ".c" file that implements the
>functions "bob" and "superbob", you will get both these functions
>code, even if you only call one of them.

Just so I completely understand, if I truely use only one function in from
a .o file and no other function is using anything in this .o file, the
entire .o file is still pulled into the executable? So, there are could be
a lot of unused, unreachable code in an executable. Nothing can be done to
remove the bloat after the executable has been linked? Is this commonly the
way its done on other systems as well? I had always assumed that unused
functions and data were tosed out.

Thanks.

---
Kent Vander Velden
ke...@iastate.edu

Terry Lambert

unread,

Sep 14, 1998, 3:00:00 AM9/14/98

to

> Is the code associated with unused functions of a used archive ever removed
> from the executable that uses the archive? After linking I see with 'nm'
> there are many functions in the executable that are never called. This is
> making my executable rather large since the archive is huge. Would elf
> help in anyway with this?

Functions that are called by functions you call will cause code
to be drug in, even if you don't call them directly.

For shared libraries, the symbols are used to look up the
code addresses, and called through a table. Since the pages
aren't there unless they are used, this is good enough.

For statically linked images, only the functions that are actually
used are linked in.

For static linkage, the smalled chunk you can pull in during the
link is one ".o" file from the archive (library). So if you have
one ".o" file that resulted from a ".c" file that implements the
functions "bob" and "superbob", you will get both these functions
code, even if you only call one of them.

If this is a problem for you, then consider breaking the file
into two (or more) files to make them seperate compilation units,
and therefore seperate ".o" files, and therefore seperately
linked from the archive (library) file (".a").

Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

Mike Smith

unread,

Sep 14, 1998, 3:00:00 AM9/14/98

to

> Just so I completely understand, if I truely use only one function in from
> a .o file and no other function is using anything in this .o file, the
> entire .o file is still pulled into the executable? So, there are could be
> a lot of unused, unreachable code in an executable. Nothing can be done to
> remove the bloat after the executable has been linked? Is this commonly the
> way its done on other systems as well? I had always assumed that unused
> functions and data were tosed out.

In most object formats, reference information is kept on a per-object
basis (ie. per .o file). Keeping this sort of information on any
smaller granularity would lead to an insane increase in the complexity
and corresponding performance reduction of the link phase.

This is why an experienced programmer will group only related items in
a given object. It allows the programmer and the C scoping rules to
work together to determine what should be associated and what need not.

--
\\ Sometimes you're ahead, \\ Mike Smith
\\ sometimes you're behind. \\ mi...@smith.net.au
\\ The race is long, and in the \\ msm...@freebsd.org
\\ end it's only with yourself. \\ msm...@cdrom.com

Terry Lambert

unread,

Sep 15, 1998, 3:00:00 AM9/15/98

to

> In most object formats, reference information is kept on a per-object
> basis (ie. per .o file). Keeping this sort of information on any
> smaller granularity would lead to an insane increase in the complexity
> and corresponding performance reduction of the link phase.

...and corresponding reduction in time for *re*linking, and reduction
in size of the resulting binaries.

Basically, it's a compiler-write-benchmark-less-work-to-write thing.

> This is why an experienced programmer will group only related items in
> a given object.

Yes. This is why.

> It allows the programmer and the C scoping rules to
> work together to determine what should be associated and what need not.

Instead of the compiler merely calculating hamiltonian cycles in
the dependency graph to do dead code elimination.

Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majo...@FreeBSD.org

Terry Lambert

unread,

Sep 15, 1998, 3:00:00 AM9/15/98

to

> Just so I completely understand, if I truely use only one function in from
> a .o file and no other function is using anything in this .o file, the
> entire .o file is still pulled into the executable? So, there are could be
> a lot of unused, unreachable code in an executable.

Yes, if it's statically linked, and you poorly organize your code.

This is why software engineers make more money than mere programmers.

> Nothing can be done to remove the bloat after the executable has
> been linked?

Not quite. But nothing *is* done...

> Is this commonly the way its done on other systems as well?

Yes. This is a compiler/linker technology issue, not a system issue.

> I had always assumed that unused functions and data were tosed out.

No. Only for compilers and linkers that optimize for image size
and execution speed, instead of for compiler benchmarks.

Q: Do you buy a compiler that is very fast, or do you buy the
compiler that makes very fast code?

Terry Lambert

unread,

Sep 15, 1998, 3:00:00 AM9/15/98

to

> > > It allows the programmer and the C scoping rules to
> > > work together to determine what should be associated and what need not.
> >
> > Instead of the compiler merely calculating hamiltonian cycles in
> > the dependency graph to do dead code elimination.
>

> And if I happen to *want* all of the items in a given object (eg. I am
> using a scripting language that supports primitive lookup via the
> symbol table, or any other form of lazy runtime linking)?
>
> The current rules give the best of both worlds. Don't fix what isn't
> really broken.

Dynamic linking requires dlopen, requires a name-to-table-offset
mapping (*not* your standard symbol table mechanism!), and will
work, even on stripped executables.

Yes, dead-code elimination would be "bad" if you were loading
objects that called otherwise dead code in the program address
space (ie: used the program symbol space as a library symbol
space).

In such cases, I suggest you don't use -O2, since the compiler
will do dead code elimination on you within modules if it thinks
it can get away with it right now.

So I think the dynmaic linking case and the static case are two
*very* different cases.

Joel Ray Holveck

unread,

Sep 15, 1998, 3:00:00 AM9/15/98

to

>> Just so I completely understand, if I truely use only one function in from
>> a .o file and no other function is using anything in this .o file, the
>> entire .o file is still pulled into the executable? So, there are could be
>> a lot of unused, unreachable code in an executable.
> Yes, if it's statically linked, and you poorly organize your code.
> This is why software engineers make more money than mere programmers.

This is also why most libraries tend to define one exported function
per object file, or sometimes exported functions that are always used
together (foodb_open, foodb_close).

>> Nothing can be done to remove the bloat after the executable has
>> been linked?
> Not quite. But nothing *is* done...

What can be done in the compiler and linker alone? If an object file
contains both foo() and bar(), and foo calls bar, the linker doesn't
know that. Neither does it know about static baz(), which only quux()
calls. The user program doesn't call quux, and therefore doesn't call
baz, but the linker doesn't know that.

I don't see any way to resolve the issue without instrumenting the
object file (perhaps by extending stabs?). If that were to be done,
then something similar to lorder would do the trick.

>> I had always assumed that unused functions and data were tosed out.
> No. Only for compilers and linkers that optimize for image size
> and execution speed, instead of for compiler benchmarks.
> Q: Do you buy a compiler that is very fast, or do you buy the
> compiler that makes very fast code?

I prefer a compiler that has the choice. This isn't hard,
particularly since several optimizations-- including the one under
discussion-- are handled by passes over intermediate representations
of the code. Given no choice, I tend towards the one that makes fast
code.

Best,
joelh

--
Joel Ray Holveck - jo...@gnu.org - http://www.wp.com/piquan
Fourth law of programming:
Anything that can go wrong wi
sendmail: segmentation violation - core dumped

Terry Lambert

unread,

Sep 15, 1998, 3:00:00 AM9/15/98

to

> What can be done in the compiler and linker alone? If an object file
> contains both foo() and bar(), and foo calls bar, the linker doesn't
> know that.

What?

Then how can my definition of "printf" override the library
definition?

The object file format certainly knows about intra-object
dependencies.

> Neither does it know about static baz(), which only quux()
> calls.

True.

> The user program doesn't call quux, and therefore doesn't call
> baz, but the linker doesn't know that.

False.

Libraries are *not* stripped. There is also a difference between
strip, strip -d, and strip -x. if -x was used, then the symbol
will have been eliminated, and the code dependency will be known
(or agregated).

If you aim the gun at your foot and pull the trigger, it's UNIX's
job to ensure reliable delivery of the bullet to where you aimed
the gun (in this case, Mr. Foot).

> I don't see any way to resolve the issue without instrumenting the
> object file (perhaps by extending stabs?). If that were to be done,
> then something similar to lorder would do the trick.

That's what symbols are for. There are alread sufficient symbols
tro support this (presuing you don't -x anything).

Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majo...@FreeBSD.org

Joel Ray Holveck

unread,

Sep 15, 1998, 3:00:00 AM9/15/98

to

>> What can be done in the compiler and linker alone? If an object file
>> contains both foo() and bar(), and foo calls bar, the linker doesn't
>> know that.
> What?
> Then how can my definition of "printf" override the library
> definition?

Because printf is alone in its object file. (I am referring to the
.o, before it goes into libc.a or libc.so.*, of course.) There's
nothing else in that object file that calls it, so all calls are bound
at link time.

Try this simple test:

--- hello_lib.h ---
extern char* hello_str(void);
--- hello_lib.c ---
#include "hello_lib.h"

char*
hello_str1(void)
{
return "hello world";
}

char*
hello_str(void)
{
return hello_str1();
}
--- hello_tst.c ---
#include <stdio.h>
#include "hello_lib.h"

char*
hello_str1(void)
{
return "Hello, world!";
}

void
main(void)
{
puts(hello_str());
}
--- Makefile ---
LIB=hello
SRCS=hello_lib.c
SHLIB_MAJOR=1
SHLIB_MINOR=0

hello_tst:
gcc -o hello_tst hello_tst.c -L. -lhello

.include <bsd.lib.mk>
--- end ---

detlev$ make all hello_tst
Warning: Object directory not changed from original /home/joelh/src/hello
gcc -O -pipe -c hello_lib.c -o hello_lib.o
building standard hello library
ranlib libhello.a
gcc -fpic -DPIC -O -pipe -c hello_lib.c -o hello_lib.so
building shared hello library (version 1.0)
gcc -o hello_tst hello_tst.c -L. -lhello
detlev$ ./hello_tst
hello world
detlev$

Note that the message originally defined by the library was displayed,
implying that the binding of the call was done at hello_lib.o's link
time, not at hello_tst's.

> The object file format certainly knows about intra-object
> dependencies.

Where? I couldn't see anything about it in either the stabs info from
a .s file or from a ldd -v examination. I don't mind looking for
more, but in point of fact, I don't know where to look.

>> The user program doesn't call quux, and therefore doesn't call
>> baz, but the linker doesn't know that.
> False.
> Libraries are *not* stripped. There is also a difference between
> strip, strip -d, and strip -x. if -x was used, then the symbol
> will have been eliminated, and the code dependency will be known
> (or agregated).

Where are these code dependencies located?

Happy hacking,
joelh

--
Joel Ray Holveck - jo...@gnu.org - http://www.wp.com/piquan
Fourth law of programming:
Anything that can go wrong wi
sendmail: segmentation violation - core dumped

To Unsubscribe: send mail to majo...@FreeBSD.org

Eivind Eklund

unread,

Sep 15, 1998, 3:00:00 AM9/15/98

to

[Moved to -chat]

On Mon, Sep 14, 1998 at 06:06:24PM +0000, Terry Lambert wrote:
[... on dead code elimination at the final code level ...]

> > It allows the programmer and the C scoping rules to
> > work together to determine what should be associated and what need not.
>
> Instead of the compiler merely calculating hamiltonian cycles in
> the dependency graph to do dead code elimination.

I don't get what Hamilton cycles has to do with this. It looks like a
simple mark-and-sweep GC to me, and I can't see how looking for
Hamilton cycles are going to find.

Also, I can't think of a single case where I have written code that is
likely to have even a single Hamilton cycle - I usually don't call
main() from elsewhere in my program (and I certainly don't call
_start). If you can involve Hamilton cycles at all here, it sounds
like it must be on a subgraph. How?

For those following: A Hamilton cycle touch every node in the graph
exactly once, and forms a cycle.

Eivind.

Terry Lambert

unread,

Sep 16, 1998, 3:00:00 AM9/16/98

to

> [... on dead code elimination at the final code level ...]
> > > It allows the programmer and the C scoping rules to
> > > work together to determine what should be associated and what need not.
> >
> > Instead of the compiler merely calculating hamiltonian cycles in
> > the dependency graph to do dead code elimination.
>
> I don't get what Hamilton cycles has to do with this. It looks like a
> simple mark-and-sweep GC to me, and I can't see how looking for
> Hamilton cycles are going to find.

You are looking for code without cycles through known referenced
symbols (if it has a cyle through a known referenced symbol, you
add it to the list of symbols). Since you already have the tree
in memory, with the edges correctly arranged, this is simpler than
trying to run Warshall's across the entrie symbol table from the
point of view of each node (and cheaper by an order of magnitude).

> Also, I can't think of a single case where I have written code that is
> likely to have even a single Hamilton cycle - I usually don't call
> main() from elsewhere in my program (and I certainly don't call
> _start). If you can involve Hamilton cycles at all here, it sounds
> like it must be on a subgraph. How?

Consider all top level "known referenced" symbols to be connected
by an edge.

> For those following: A Hamilton cycle touch every node in the graph
> exactly once, and forms a cycle.

More importantly, you can tell as DAG from a DCG. Symbols in a DCG,
where "known refrenced" forms an edge to the rest of the graph, means
that you have to reference increasingly fewer nodes as you go on.

Note: this rather dry argument still applies only to dead code
elimination in library archives in statically linked images).

Consider the user's symbols seperate from other symbols (you have
to do this because of Mike Smith's point about the user's symbols
vs. library symbols). By definition, these symbols are considered
"used". Then you go through the archives (libraries), and examine
these symbols. These are second order symbols.

You sort these onto either the "used" or the "unknown" list.

Then you look for a DCG using the "used" as a list, and traverse only
the "unknown" entries. If you find a cycle, you add the "unknown"
symbols in the cycle to the "used" list.

You repeat this process until, after a pass, no new symbols are added
to the "used" list. The remaining symbols on the "unknown" list now
represent "dead code"/"dead data".

Now you eliminate this code from inclusion in the relocation and
image write phase of the link. This works, even in the case of
static symbols, since statics must also be relocated, and therefore
will have a pseudo-symbol. All code/data between an unused symbol
or pseudo-symbol and a used one is dead, and may be eliminated.

Actually, you should talk to SEF about this; he's an old compiler
guru, and he worked on the Microsoft compiler for SCO. He has an
interesting "shaggy dog" story about emission of pseudo-symbols
for data on something like:

main()
{
char *str = "Hello World!" + 6;

puts( str - 6);
puts( "Goodbye ");
puts( str);
}

[ oversimplification; the exact case was the loss of "Hello " through
dead-data elimination for the "whoami" program, if I remember
correctly... Microsoft failed to emit a correct pseudo-symbol for
the start of the str data, emitting only _str for "World!"... ]

Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majo...@FreeBSD.org

Eivind Eklund

unread,

Sep 16, 1998, 3:00:00 AM9/16/98

to

On Tue, Sep 15, 1998 at 09:27:00PM +0000, Terry Lambert wrote:
> > [... on dead code elimination at the final code level ...]
> > > > It allows the programmer and the C scoping rules to
> > > > work together to determine what should be associated and what need not.
> > >
> > > Instead of the compiler merely calculating hamiltonian cycles in
> > > the dependency graph to do dead code elimination.
> >
> > I don't get what Hamilton cycles has to do with this. It looks like a
> > simple mark-and-sweep GC to me, and I can't see how looking for
> > Hamilton cycles are going to find.
>
> You are looking for code without cycles through known referenced
> symbols (if it has a cyle through a known referenced symbol, you
> add it to the list of symbols). Since you already have the tree
> in memory, with the edges correctly arranged, this is simpler than
> trying to run Warshall's across the entrie symbol table from the
> point of view of each node (and cheaper by an order of magnitude).

Hmmm. I can't see any reason why you would want to do a
Floyd-Warshall - that calculate the connectivity of the entire graph,
which is really producing a lot of data you don't need.

I can see two ways of doing this that seem useful. Both of them
require (well, prefer) a bit of used/unused for each symbol (partial
pseudo-code follows below).

(1) Recursive mark'n'sweep
void
Symbol_MarkAllChildren(struct Symbol *pSymbol) {
int iChild;

if (!pSymbol->isUsed) {
pSymbol->isUsed = 1;
for (iChild = 0; iChild < pSymbol->nChildren; iChild++)
Symbol_MarkAllChildren(pSymbol->aChildren[iChild]);
}
}

Clear_isUsed_in_all_symbols();
for (pSymbol over each symbol in the initially used set) {
Symbol_MarkAllChildren(pSymbol);
}
/* Symbols marked with 'isUsed' are needed - the rest can be discarded */

(2) Non-recursive mark'n'sweep
void
Symbol_MarkChildren(struct Symbol *pSymbol) {
int iChild;
for (iChild = 0; iChild < pSymbol->nChildren; iChild++) {
if (!pSymbol->aChildren[iChild]->isUsed) {
pSymbol->aChildren[iChild]->isUsed = 1;
Symbol_TransferToTempUsedList(pSymbol->aChildren[iChild]);
}
}
}

Clear_isUsed_in_all_symbols();
for (pSymbol over each symbol in the initially used set) {
Symbol_MarkChildren(pSymbol);
Symbol_TransferToFinalUsedList(pSymbol);
}
do {
for (pSymbol over each symbol in the TempUsedList) {
if (pSymbol->isUsed) {
Symbol_MarkChildren(pSymbol);
Symbol_TransferToFinalUsedList(pSymbol);
}
}
} while(!List_isEmpty(TempUsedList));
/* Symbols marked isUsed are needed - the rest can be discarded. The
symbols to be used are in the 'FinalUsedList', while the ones that
are to be discarded are left in the inital pool */

Both algorithm 1 and 2 are o(N) and O(N) in the number of edges
connecting the set of used symbols. The only case you can avoid
examining all of these edges is if all symbols are included in the
final used set; then you could do a cutoff. I don't think it would be
worthwhile, though.

The algorithms are sort-of equal - algorithm 1 just use the stack for
storing examination order, while algorithm 2 does the same thing using
three lists, giving a slightly different order for the marking
(basically a depth-first vs a breadth-first walk).

> > Also, I can't think of a single case where I have written code that is
> > likely to have even a single Hamilton cycle - I usually don't call
> > main() from elsewhere in my program (and I certainly don't call
> > _start). If you can involve Hamilton cycles at all here, it sounds
> > like it must be on a subgraph. How?
>
> Consider all top level "known referenced" symbols to be connected
> by an edge.

Connected to _what_ by an edge?

> > For those following: A Hamilton cycle touch every node in the graph
> > exactly once, and forms a cycle.
>
> More importantly, you can tell as DAG from a DCG. Symbols in a DCG,
> where "known refrenced" forms an edge to the rest of the graph, means
> that you have to reference increasingly fewer nodes as you go on.

[...]

> You sort these onto either the "used" or the "unknown" list.
>
> Then you look for a DCG using the "used" as a list, and traverse only
> the "unknown" entries. If you find a cycle, you add the "unknown"
> symbols in the cycle to the "used" list.

Why is this easier/faster than the trivial algorithms I present above?
And where does hamilton cycles (as opposed to just cycles in the
graph) enter into this?

I can sort of get a picture of using cycles to separate used/unused
symbols, but I can't see any direct advantages to it, and I can't see
Hamiltonian cycles in there at all.

What am I missing here? (I'm interested due to sometimes needing this
type of algorithms, and if something else is more effective than the
techniques outlined above - well, then I want that :-)

> Actually, you should talk to SEF about this; he's an old compiler
> guru, and he worked on the Microsoft compiler for SCO.

SEF, this is your call. Speak up.

Eivind.

Terry Lambert

unread,

Sep 17, 1998, 3:00:00 AM9/17/98

to

> Hmmm. I can't see any reason why you would want to do a
> Floyd-Warshall - that calculate the connectivity of the entire graph,
> which is really producing a lot of data you don't need.

Right. Which is why I wanted to do a Hamiltonian, not a Warshall
(or a Floyd-Warshall).

> Both algorithm 1 and 2 are o(N) and O(N) in the number of edges
> connecting the set of used symbols. The only case you can avoid
> examining all of these edges is if all symbols are included in the
> final used set; then you could do a cutoff. I don't think it would be
> worthwhile, though.

Well, I think it would, at least if the theory were that you didn't
link against things you planned to use, and you were only eliminating
dead code in objects from an archive where you knew you were using
at least one or more symbols (which is the case under consideration).

> > Consider all top level "known referenced" symbols to be connected
> > by an edge.
>
> Connected to _what_ by an edge?

Each other. All used symbols are implicitly connected. This means
than any other used symbol implies a cycle.

> Why is this easier/faster than the trivial algorithms I present above?

Because it's O(2) for the test, and only walks the portion of the
tree related to the symbol being tested. 8-).

> And where does hamilton cycles (as opposed to just cycles in the
> graph) enter into this?

Before you test a symbol, you "connect" it to all of the other used
symbols. If you make it back to the root, then the symbol is used.

> I can sort of get a picture of using cycles to separate used/unused
> symbols, but I can't see any direct advantages to it, and I can't see
> Hamiltonian cycles in there at all.
>
> What am I missing here? (I'm interested due to sometimes needing this
> type of algorithms, and if something else is more effective than the
> techniques outlined above - well, then I want that :-)

You should look at the section of the Sedgewick C++ book on
Hamiltonian cycles.

This is really useful for runtime optimizations for things like
a hierarchical SMP lock manager. You attach all nodes into the
graph for the objects that are intended to be used, and calculate
a Warshalls. Then, as you add leaves, you update using standard
Floyd-Steinberg, so the Warshall's is always accurate.

This allows you to insert leaf nodes in O(1) along the vector to
the root of the graph.

For a list of locks you want to test for deadlock, the list itself
describes a vector.

Using this vector as an arc through the existing Warshall-complete
graph, you look for a Hamiltonian cycle. If the cycle exists, then
you have a deadlock condition.

This allows deadlock detection without having to do stack rewind
and/or resource precommit (presuming that the tentative lock vector's
elements are inserted using intention modes to prevent another
tentative lock from coming back "safe").

This algorithm was actually the basis of the lock manager in the
NetWare for UNIX server; it was capable of 20,000 transactions a
second -- basically 2 orders of magnitude better than Tuxedo. 8-).

You can further optimize this using the Dynix paper's finding on
shared memory multiprocessors, to wit, you establish seperate
intention zones for resource pools on a per processor basis for
things like a free page pool so that you don't have to hold the
top lock for page allocation, only for pool drain/refill.

The same algorithm applies cleanly to the system process table and
the system open file table; if you segment the table by the number
of expecte reeentrancies, then you can establish an intention write
for one processor that does not interfere with a traversal by
another processor (intention read/read).

The idea is to ensure maximal concurrancy. Of course the top
levels of the per processor hierarchies are connected (via an edge
through a system-wide lock), to allow deadlock avoidance. The
actual system resources are in a pseudo-per-cpu ownership zone,
so you lock a cpu zone and that zone, not a "big giant lock", in
order to do the fill/drain.

The computation of deadlock (transitive closure) is done, as in the
symbol table example, by inferring Hamiltonian cycles along the two
edges you know about (the real one and the pretend one made up of
the resources you will require for a given idempotent operation).

This is terrifically fun stuff to watch when it works. It's much
better than Djikstra's soloution (the banker's algorithm), in
that you wouldn't have to rebuild FreeBSD from the ground up to
make it work...

Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majo...@FreeBSD.org

Eivind Eklund

unread,

Sep 17, 1998, 3:00:00 AM9/17/98

to

[I'm keeping a lot of context to make this easy to read; I also note
that I didn't move this to -chat when I thought I did, but we're
getting into some locking-related discussions that are again related
to -hackers, so I keep it here now]

On Wed, Sep 16, 1998 at 09:32:41PM +0000, Terry Lambert wrote:
> > Both algorithm 1 and 2 are o(N) and O(N) in the number of edges
> > connecting the set of used symbols. The only case you can avoid
> > examining all of these edges is if all symbols are included in the
> > final used set; then you could do a cutoff. I don't think it would be
> > worthwhile, though.
>
> Well, I think it would, at least if the theory were that you didn't
> link against things you planned to use, and you were only eliminating
> dead code in objects from an archive where you knew you were using
> at least one or more symbols (which is the case under consideration).

Well, it's two lines extra code to check if you're all out of symbols
(ie, the list containing symbols of "unknown" status is empty).

> > Why is this easier/faster than the trivial algorithms I present above?
>
> Because it's O(2) for the test, and only walks the portion of the
> tree related to the symbol being tested. 8-).

In this case: 2 what's? And how is walking "the portion of the tree
related symbol being tested" compatible with O(2)?

Finding a Hamiltonian cycle is a classic NP-complete problem, ie, only
solvable in exponential time. I'm very curious how this improve on my
simple linear time. (And unfortunately I still don't get it any more
than I did intially :-(

[On what I'm missing wrt using Hamiltonian cycles for this]

> You should look at the section of the Sedgewick C++ book on
> Hamiltonian cycles.

I'll go out and buy this on monday, probably. I don't think I'll have
time before. (That it is missing is really is a gaping hole in my
bookcase...)

> This is really useful for runtime optimizations for things like
> a hierarchical SMP lock manager. You attach all nodes into the
> graph for the objects that are intended to be used, and calculate
> a Warshalls. Then, as you add leaves, you update using standard
> Floyd-Steinberg, so the Warshall's is always accurate.
>
> This allows you to insert leaf nodes in O(1) along the vector to
> the root of the graph.
>
> For a list of locks you want to test for deadlock, the list itself
> describes a vector.
>
> Using this vector as an arc through the existing Warshall-complete
> graph, you look for a Hamiltonian cycle. If the cycle exists, then
> you have a deadlock condition.

Don't you have a deadlock condition if _any_ cycle exists?

Ie, you grab a bunch of locks, blocking on an attempted lock of X.
Whatever is holding X end up blocking on one of your locks. You've
got a cycle, and thus you've got a deadlock.

Why do you need Hamiltonian cycles? They're a pain to find, and I
can't see that looking for them gets you anything; I can't see that
whether they are present make a difference for this case.

> This allows deadlock detection without having to do stack rewind
> and/or resource precommit (presuming that the tentative lock vector's
> elements are inserted using intention modes to prevent another
> tentative lock from coming back "safe").
>
> This algorithm was actually the basis of the lock manager in the
> NetWare for UNIX server; it was capable of 20,000 transactions a
> second -- basically 2 orders of magnitude better than Tuxedo. 8-).

Nice! :-)

> The computation of deadlock (transitive closure) is done, as in the
> symbol table example, by inferring Hamiltonian cycles along the two
> edges you know about (the real one and the pretend one made up of
> the resources you will require for a given idempotent operation).
>
> This is terrifically fun stuff to watch when it works. It's much
> better than Djikstra's soloution (the banker's algorithm), in
> that you wouldn't have to rebuild FreeBSD from the ground up to
> make it work...

That way to look for deadlocks look right for FreeBSD. However, I
still don't get why it need Hamiltonian (as opposed to plain) cycles
:-(

Eivind.

Terry Lambert

unread,

Sep 18, 1998, 3:00:00 AM9/18/98

to

> [I'm keeping a lot of context to make this easy to read; I also note
> that I didn't move this to -chat when I thought I did, but we're
> getting into some locking-related discussions that are again related
> to -hackers, so I keep it here now]

I'll take the compiler discussion to private email...

> > Using this vector as an arc through the existing Warshall-complete
> > graph, you look for a Hamiltonian cycle. If the cycle exists, then
> > you have a deadlock condition.
>
> Don't you have a deadlock condition if _any_ cycle exists?
>
> Ie, you grab a bunch of locks, blocking on an attempted lock of X.
> Whatever is holding X end up blocking on one of your locks. You've
> got a cycle, and thus you've got a deadlock.

Yes. The question is "how do I know when I have a cycle?". The
best way is to take the list of locks that an entity has, and consider
it as a connectivity vector through a graph. For the standard graph,
you have precalculated transitive closure via Warshall's before you
try this.

Basically, this means I have:

1) A DAG of hierarchically connected nodes that model the
granularity of the locking infrastructure for the system;
an individual node is called a "LOCKABLE".

2) An DAG of hierarchically related contexts that can hold
locks. In typical use, this will be a set of related
scheduling entities that can not be permitted to lock
against themselves (e.g., kernel threads, or more generally,
kernel schedulable entities, or KSE's). The most common
DAG will reduce to a simply connected DAG of KSE's, or
a "locking context vector". An individual node is called
a "LOCKING ENTITY".

3) A simply connected DAG (vector) that describes the locks
held by an individual locking entity *or* locakable. This
is a "LOCK VECTOR". An individual vector element is called
a "LOCK". ;-).

So what we have is the intersection of two graphs complexly connected
via one or more vectors that describe the connection points.

When I ask for a lock, I'm asking to extend a vector by one element
off (a) the locakable and (b) the locking entity.

In effect, I have a 5 dimensional graph that I need to search for a
cycle.

> Why do you need Hamiltonian cycles? They're a pain to find, and I
> can't see that looking for them gets you anything; I can't see that
> whether they are present make a difference for this case.

Because most of the work to find them has already been done, and then
cached, into the structure of the graph. I have the transitive
closure over the lockable graph, and I have the closure over the
the locking entity graph, and I inherit the vector back to the
"parent" (topmost) locking entity.

This means I can find a Hamiltonian by examining from the point I
want to lock in the lockable graph to the root of the lockable
graph, looking for instances of my locking entity. If I find it,
then I have a cycle.

So it's mostly precalculated, and the time it takes is the time
needed for exactly one traversal to the root.

In generally, this the depth of the locakable in the graph which
is being locked number of pointer dereferences + compares.

For locking, this is damn fast, and the cost of maintaining a
closure calculation when adding a leaf node (i.e., registering
a new lockable into the tree) is trivial, as is the cost of
inheriting the locks up.

> That way to look for deadlocks look right for FreeBSD. However, I
> still don't get why it need Hamiltonian (as opposed to plain) cycles
> :-(

Because they are nearly free, given the already cached data.

Effectively, each vector is reduced to a single element, and each
locking entity DAG is reduced to a single element.

One of the biggest wins here is that you don't have to take IPI's
for L1/L2 cache coherency for most locks.

The use of intention mode locking to implement the locks on top
of this framework justs ads gravy to already high concurrency.

Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majo...@FreeBSD.org