2006 | OriginalPaper | Chapter

# Linear Models for Classification

In the previous chapter, we explored a class of regression models having particularly simple analytical and computational properties. We now discuss an analogous class of models for solving classification problems. The goal in classification is to take an input vector x and to assign it to one of K discrete classes C k where k = 1, …, K. In the most common scenario, the classes are taken to be disjoint, so that each input is assigned to one and only one class. The input space is thereby divided into decision regions whose boundaries are called decision boundaries or decision surfaces. In this chapter, we consider linear models for classification, by which we mean that the decision surfaces are linear functions of the input vector x and hence are defined by (D − 1)-dimensional hyperplanes within the D-dimensional input space. Data sets whose classes can be separated exactly by linear decision surfaces are said to be linearly separable.For regression problems, the target variable t was simply the vector of real numbers whose values we wish to predict. In the case of classification, there are various ways of using target values to represent class labels. For probabilistic models, the most convenient, in the case of two-class problems, is the binary representation in which there is a single target variable t ∊ {0,1} such that t = 1 represents class C 1 and t = 0 represents class C2. We can interpret the value of t as the probability that the class is C1, with the values of probability taking only the extreme values of 0 and 1. For K > 2 classes, it is convenient to use a 1-of-K coding scheme in which t is a vector of length K such that if the class is C j , then all elements t k of t are zero except element t j , which takes the value 1. For instance, if we have K = 5 classes, then a pattern from class 2 would be given the target vector