Suppose you have evaluated a classifier’s performance on an independent testing set. To what extent can you trust your findings? When a flipped coin comes up heads eight times out of ten, any reasonable experimenter will suspect this to be nothing but a fluke, expecting that another set of ten tosses will give a result closer to reality. Similar caution is in order when measuring classification performance. To evaluate classification accuracy on a testing set is not enough; just as important is to develop some notion of the chances that the measured value is a reliable estimate of the classifier’s true behavior.
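A minimal sketch of this reasoning, assuming Python with SciPy; the test-set size and accuracy figures are made-up illustrations, not values from the chapter. It first asks how likely eight or more heads in ten tosses of a fair coin would be, and then attaches a rough normal-approximation confidence interval to a measured accuracy.

```python
from scipy.stats import binom, norm

# Chance of seeing 8 or more heads in 10 tosses of a fair coin:
p_fluke = binom.sf(7, n=10, p=0.5)   # P(X >= 8) = 1 - P(X <= 7)
print(f"P(>= 8 heads | fair coin) = {p_fluke:.3f}")   # about 0.055

# Normal-approximation 95% confidence interval for a measured accuracy
# (hypothetical numbers: 86 correct out of 100 test examples).
correct, n = 86, 100
acc = correct / n
se = (acc * (1 - acc) / n) ** 0.5   # standard error of the estimate
z = norm.ppf(0.975)                 # two-sided 95% critical value, about 1.96
print(f"accuracy = {acc:.2f} +/- {z * se:.3f}")
```

The point is the same as with the coin: a single measured number, by itself, says little until it is accompanied by an estimate of how far it may stray from the classifier’s true behavior.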
As explained in Sect. 12.1 in connection with the distribution of results obtained from different samples, we prefer the term standard error to the more general standard deviation.
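To make the terminological point concrete, here is a small sketch, assuming Python with NumPy; the true accuracy, test-set size, and number of samples are invented for illustration. The standard deviation of the results obtained from many independent testing sets is exactly what the text calls the standard error of the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: a classifier whose true accuracy is 0.85,
# evaluated on many independent testing sets of 100 examples each.
true_acc, n_test, n_samples = 0.85, 100, 1000
sample_accs = rng.binomial(n_test, true_acc, size=n_samples) / n_test

# The standard deviation of these per-sample results is the standard error.
print(f"standard error (empirical) = {sample_accs.std():.4f}")
print(f"standard error (formula)   = {np.sqrt(true_acc * (1 - true_acc) / n_test):.4f}")
```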
With more degrees of freedom, the curve would get closer to the normal distribution, becoming almost indistinguishable from it for 30 or more degrees of freedom.
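A brief sketch of this convergence, assuming SciPy, comparing two-sided 95% critical values of Student's t-distribution with the corresponding normal value for a few degrees of freedom; by 30 degrees of freedom the difference is already small.

```python
from scipy.stats import t, norm

# Two-sided 95% critical values: Student's t approaches the normal
# value (about 1.96) as the degrees of freedom grow.
for df in (5, 10, 30, 100):
    print(f"df = {df:>3}:  t = {t.ppf(0.975, df):.3f}   normal = {norm.ppf(0.975):.3f}")
```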
Chapter 12: Statistical Significance (Springer International Publishing)