I'm trying to solve a combinatorial optimization problem that arises
from the "feature selection" task in pattern recognition.  I'm trying
to select, from N features (measurements), the K features that best
discriminate among Q classes (i.e., dimensionality reduction). 
Nevertheless, you may likely provide me a helpful suggestion without
knowing the origin of my optimization problem.
Here's my objective (criterion) function to maximize:
    SUM(i)[ ArgMAX(j)[x[i,j]*z[j] ]
    subject to: SUM(j)[z[j]] <= K
       where:
  z[j]; j={1,N}, z[]={0,1}
  x[i,j]; i={1,(Q,2)}
     (Q,2) is all combinations of Q items taken 2 at a time.
     x[]>0.  (The x[i,j] in my case are the pairwise interclass
distances
     in an L1-norm space...as mentioned above, their origin is not
     necessarily at issue).
This is generically an integer non-linear programming (INLP) problem. 
I believe it could be readily solved by Branch and Bound, since the
objective function optimum is monotonic with K.
Could anyone please provide suggestions/pointers to the computational
means to solve this problem?  Here are some specific departure points
I've explored:
A. Get a Branch and Bound (B&B) algorithm that takes a "wrappered"
criterion function.  Such a B&B function would allow one to
generically define the search tree as well.  I can't merely use an
algorithm that wants a static criterion function, as do so many
mixed-integer linear programming (MILP) codes which employ B&B. 
Preferably the algorithm function (source code) is already implemented
in Mathematica, MATLAB, (or C/C++ if no other options).
B. Run the MINLP program on-line at NEOS:
 http://www-neos.mcs.anl.gov/neos/server-solvers.html.
But would this program succeed?
   The scale of my problem:  N~{10-100}; K~{5-20}
C. Run the Mathematica application package Global Optimization 4.0
(specifically the functions InterchangeMethodMin or TabuSearchMin). 
Once again, are these functions applicable?
D. Get and compile the C program TOOLDIAG.  Does anyone know where to
get a recent version?  Is its B&B algorithm sufficiently flexible? 
How much junk code do I write to plug-in my problem?
E. Code up my own B&B algorithm. I'd prefer to work from something
more concrete than a journal paper, (i.e., good pseudocode), but
something less than a bunch of dense C code (i.e., don't want to
interpret TOOLDIAG code).
Any help appreciated!  Regards,
Frank J. Iannarilli, fra...@aerodyne.com
Aerodyne Research, Inc., 45 Manning Road, Billerica, MA 01821 USA
www.aerodyne.com/cosr.html
Hans Mittelmann
-- 
Hans D. Mittelmann			http://plato.la.asu.edu/
Arizona State University		Phone: (480) 965-6595
Department of Mathematics		Fax:   (480) 965-0461
Tempe, AZ 85287-1804			email: mitte...@asu.edu
Your problem setting interests me, since I work with MIP models for
discrimination and classification.
There are a few confusing points with your notation.  I think there's a
missing closing bracked in the objective, presumably at the end.  Also,
I'm used to the notation "argmax" meaning the argument (index, in this
case value of j) at which the maximum occurs, not the value of that
maximum.  So I assume what you meant was sum(i) max(j) x[i,j]*z[j].
The next one is a bit more critical.  I'm assuming that the x[i,j] are
precomputed (i.e., effectively constant for purposes of the dimension
reduction problem) and not variables to be solved for.  (From Hans
Mittelmann's question about quadratic problems, I suspect he took the x
symbols to be variables.  That's the common usage of the symbol x,
although in my work x tends to represent data rather than variables.) 
If I'm right, the only variables in your data reduction problem are the
0-1 z variables.
Charitably assuming I haven't fallen off the train yet, your problem
needs a little reformulation so that the feasible region of the
continuous relaxation (changing the domain of each z to the interval [0,
1]) is convex.  I think the following additions will do the trick, but
I'm not sure what I have in mind is the easiest way to do it.  Introduce
Q*N new continuous variables w[i,j], with 0 <= w[i,j] <= 1.  Retain the
constraint sum(j) z[j] <= K and the domain restriction z[j] in {0, 1}. 
Change the objective to maximizing sum(i) sum(j) w[i,j]*x[i,j].  Now add
the Q*(N+1) constraints
w[i,j] <= z[j] (j=1,...,N; i=1,...,Q)
and
sum(j) w[i,j] <= 1 (i=1,...,Q).
For a given i, the objective term sum(j) w[i,j]*x[i,j] is maximized
subject to w >= 0 and sum(j) w[i,j] <= 1 by allocating all the weight to
the largest possible x[i,j].  The upper bounds w[i,j] <= z[j] will
prevent any weight from being allocated to x[i,j] if z[j] is 0.  So I
think this duplicates your intended objective.
Hope this helps. (For that matter, I hope it's right. :-)
-- Paul Rubin
"Frank J. Iannarilli, Jr." wrote:
> 
-- 
***************************************************************************
Paul A. Rubin                                    Phone:    (517)
432-3509
Department of Management                         Fax:      (517)
432-1111
The Eli Broad Graduate School of Management      E-mail:   ru...@msu.edu
Michigan State University                       
http://www.msu.edu/~rubin/
East Lansing, MI  48824-1122  (USA)
***************************************************************************
Mathematicians are like Frenchmen:  whenever you say something to them,
they translate it into their own language, and at once it is something
entirely different.                                    J. W. v. GOETHE
Hi Frank,
you didn't mention the scale of (Q,2), which is of interest, if you would
consider to solve your problem by using a model as Paul A. Rubin
proposed.
Another question is, whether there is a time limit for solving one problem
instance?
I mean, if the method should be implemented for a real time pattern
recognition (for instance a door opens, when a person with a proper face
stands in front of it), then this should happen in a few seconds (for
convenience of the person who wants the door to be opened).
I played a little bit with your problem and coded this script at
http://www.tu-chemnitz.de/~luta/plt/pat_rec.html
which consists of a heuristic, that comes close to the optimum (it finds the
optimum for the small problem instances in about 90%). The JavaScript
could be easily translated into C, so you could use this as an lower bound
for a B&B method.
I guess, a person a bit smarter than me, could create an algorithm which
finds the optimum in polynomial time. I hope this person will help you.
Regards, Lutz Tautenhahn