~Subject: jmlr-announce: Special Issue on Learning Theory
~From: "David 'Pablo' Cohn" <David...@acm.org>
~Date: 24 Oct 2003 10:16:07 -0700
The Journal of Machine Learning Research (www.jmlr.org) is pleased to
announce publication of the Special Issue on Learning Theory, guest
edited by Ralf Herbrich and Thore Graepel:
--------------------------
Introduction to the Special Issue on Learning Theory
Ralf Herbrich, Thore Graepel; pp. 755-757.
On the Performance of Kernel Classes
Shahar Mendelson; pp. 759-771.
Path Kernels and Multiplicative Updates
Eiji Takimoto, Manfred K. Warmuth; pp. 773-818.
Tracking Linear-threshold Concepts with Winnow
Chris Mesterharm; pp. 819-838.
Generalization Error Bounds for Bayesian Mixture Algorithms
Ron Meir, Tong Zhang; pp. 839-860.
On the Rate of Convergence of Regularized Boosting Classifiers
Gilles Blanchard, Gabor Lugosi, Nicolas Vayatis; pp. 861-894.
Concentration Inequalities for the Missing Mass and for
Histogram Rule Error
David McAllester, Luis Ortiz; pp. 895-911.
----------------------------------------------------------------------------
On the Performance of Kernel Classes
Shahar Mendelson
JMLR 4(Oct):759-771, 2003.
Abstract
We present sharp bounds on the localized Rademacher averages of the unit
ball in a reproducing kernel Hilbert space in terms of the eigenvalues
of the integral operator associated with the kernel. We use this result
to estimate the performance of the empirical minimization algorithm when
the base class is the unit ball of the reproducing kernel Hilbert space.
----------------------------------------------------------------------------
Path Kernels and Multiplicative Updates
Eiji Takimoto, Manfred K. Warmuth
JMLR 4(Oct):773-818, 2003.
Abstract
Kernels are typically applied to linear algorithms whose weight vector
is a linear combination of the feature vectors of the examples. On-line
versions of these algorithms are sometimes called "additive updates"
because they add a multiple of the last feature vector to the current
weight vector.
In this paper we have found a way to use special convolution kernels to
efficiently implement "multiplicative" updates. The kernels are defined
by a directed graph. Each edge contributes an input. The inputs along a
path form a product feature and all such products build the feature
vector associated with the inputs. We also have a set of probabilities
on the edges so that the outflow from each vertex is one. We then
discuss multiplicative updates on these graphs where the prediction is
essentially a kernel computation and the update contributes a factor to
each edge. After adding the factors to the edges, the total outflow out
of each vertex is not one any more. However some clever algorithms
re-normalize the weights on the paths so that the total outflow out of
each vertex is one again. Finally, we show that if the digraph is built
from a regular expressions, then this can be used for speeding up the
kernel and re-normalization computations.
We reformulate a large number of multiplicative update algorithms using
path kernels and characterize the applicability of our method. The
examples include efficient algorithms for learning disjunctions and a
recent algorithm that predicts as well as the best pruning of a series
parallel digraphs.
----------------------------------------------------------------------------
Tracking Linear-threshold Concepts with Winnow
Chris Mesterharm
JMLR 4(Oct):819-838, 2003.
Abstract
In this paper, we give a mistake-bound for learning arbitrary
linear-threshold concepts that are allowed to change over time in the
on-line model of learning. We use a variation of the Winnow algorithm
and show that the bounds for learning shifting linear-threshold
functions have many of the same advantages that the traditional Winnow
algorithm has on fixed concepts. These benefits include a weak
dependence on the number of irrelevant attributes, inexpensive runtime,
and robust behavior against noise. In fact, we show that the bound for
tracking Winnow has even better performance with respect to irrelevant
attributes. Let X[0,1]n be an instance of the learning problem. In the
previous bounds, the number of mistakes depends on lnn. In this paper,
the shifting concept bound depends on max ln(||X||1). We show that this
behavior is a result of certain parameter choices in the tracking
version of Winnow, and we show how to use related parameters to get a
similar mistake bound for the traditional fixed concept version of
Winnow.
----------------------------------------------------------------------------
Generalization Error Bounds for Bayesian Mixture Algorithms
Ron Meir, Tong Zhang
JMLR 4(Oct):839-860, 2003.
Abstract
Bayesian approaches to learning and estimation have played a significant
role in the Statistics literature over many years. While they are often
provably optimal in a frequentist setting, and lead to excellent
performance in practical applications, there have not been many precise
characterizations of their performance for finite sample sizes under
general conditions. In this paper we consider the class of Bayesian
mixture algorithms, where an estimator is formed by constructing a
data-dependent mixture over some hypothesis space. Similarly to what is
observed in practice, our results demonstrate that mixture approaches
are particularly robust, and allow for the construction of highly
complex estimators, while avoiding undesirable overfitting effects. Our
results, while being data-dependent in nature, are insensitive to the
underlying model assumptions, and apply whether or not these hold. At a
technical level, the approach applies to unbounded functions,
constrained only by certain moment conditions. Finally, the bounds
derived can be directly applied to non-Bayesian mixture approaches such
as Boosting and Bagging.
----------------------------------------------------------------------------
On the Rate of Convergence of Regularized Boosting Classifiers
Gilles Blanchard, Gabor Lugosi, Nicolas Vayatis
JMLR 4(Oct):861-894, 2003.
Abstract
A regularized boosting method is introduced, for which regularization is
obtained through a penalization function. It is shown through oracle
inequalities that this method is model adaptive. The rate of convergence
of the probability of misclassification is investigated. It is shown
that for quite a large class of distributions, the probability of error
converges to the Bayes risk at a rate faster than n-(V+2)/(4(V+1)) where
V is the VC dimension of the "base" class whose elements are combined by
boosting methods to obtain an aggregated classifier. The
dimension-independent nature of the rates may partially explain the good
behavior of these methods in practical problems. Under Tsybakov's noise
condition the rate of convergence is even faster. We investigate the
conditions necessary to obtain such rates for different base classes.
The special case of boosting using decision stumps is studied in detail.
We characterize the class of classifiers realizable by aggregating
decision stumps. It is shown that some versions of boosting work
especially well in high-dimensional logistic additive models. It appears
that adding a limited labelling noise to the training data may in
certain cases improve the convergence, as has been also suggested by
other authors.
----------------------------------------------------------------------------
Concentration Inequalities for the Missing Mass and for Histogram Rule Error
David McAllester, Luis Ortiz;
JMLR 4(Oct):895-911, 2003.
Abstract
This paper gives distribution-free concentration inequalities for the
missing mass and the error rate of histogram rules. Negative association
methods can be used to reduce these concentration problems to
concentration questions about independent sums. Although the sums are
independent, they are highly heterogeneous. Such highly heterogeneous
independent sums cannot be analyzed using standard concentration
inequalities such as Hoeffding's inequality, the Angluin-Valiant bound,
Bernstein's inequality, Bennett's inequality, or McDiarmid's theorem.
The concentration inequality for histogram rule error is motivated by
the desire to construct a new class of bounds on the generalization
error of decision trees.
----------------------------------------------------------------------------
The papers in this issue are available electronically at http://www.jmlr.org in
PostScript and PDF formats. The papers of Volumes 1, 2 and 3 are also
available electronically from the JMLR website, and in hardcopy from the
MIT Press; please see http://mitpress.mit.edu/JMLR for details.
-David Cohn, <david...@acm.org>
[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to <com...@moderators.isc.org>, and ]
[ ask your news administrator to fix the problems with your system. ]