
About this book

Machine learning techniques provide cost-effective alternatives to traditional methods for extracting underlying relationships between information and data and for predicting future events by processing existing information to train models. Efficient Learning Machines explores the major topics of machine learning, including knowledge discovery, classifications, genetic algorithms, neural networking, kernel methods, and biologically-inspired techniques.

Mariette Awad and Rahul Khanna’s synthetic approach weaves together the theoretical exposition, design principles, and practical applications of efficient machine learning. Their experiential emphasis, expressed in their close analysis of sample algorithms throughout the book, aims to equip engineers, students of engineering, and system designers to design and create new and more efficient machine learning systems. Readers of Efficient Learning Machines will learn how to recognize and analyze the problems that machine learning technology can solve for them, how to implement and deploy standard solutions to sample problems, and how to design new systems and solutions.

Advances in computing performance, storage, memory, unstructured information retrieval, and cloud computing have coevolved with a new generation of machine learning paradigms and big data analytics, which the authors present in the conceptual context of their traditional precursors. Awad and Khanna explore current developments in the deep learning techniques of deep neural networks, hierarchical temporal memory, and cortical algorithms.

Nature suggests sophisticated learning techniques that deploy simple rules to generate highly intelligent and organized behaviors with adaptive, evolutionary, and distributed properties. The authors examine the most popular biologically-inspired algorithms, together with a sample application to distributed datacenter management. They also discuss machine learning techniques for addressing problems of multi-objective optimization in which solutions in real-world systems are constrained and evaluated based on how well they perform with respect to multiple objectives in aggregate. Two chapters on support vector machines and their extensions focus on recent improvements to the classification and regression techniques at the core of machine learning.

Table of Contents

Open Access

Chapter 1. Machine Learning

Machine learning (ML) is a branch of artificial intelligence that systematically applies algorithms to synthesize the underlying relationships among data and information. For example, ML systems can be trained for automatic speech recognition (as in iPhone’s Siri) to convert acoustic information in a sequence of speech data into semantic structure expressed in the form of a string of words.

Mariette Awad, Rahul Khanna

Open Access

Chapter 2. Machine Learning and Knowledge Discovery

The field of data mining has made significant advances in recent years. Because of its ability to solve complex problems, data mining has been applied in diverse fields related to engineering, biological science, social media, medicine, and business intelligence. The primary objective for most of the applications is to characterize patterns in a complex stream of data. These patterns are then coupled with knowledge discovery and decision making. In the Internet age, information gathering and dynamic analysis of spatiotemporal data are key to innovation and developing better products and processes. When datasets are large and complex, it becomes difficult to process and analyze patterns using traditional statistical methods. Big data are data collected in volumes so large, and forms so complex and unstructured, that they cannot be handled using standard database management systems (DBMS), including relational ones (RDBMS). The emerging challenges associated with big data include dealing not only with increased volume, but also the wide variety and complexity of the data streams that need to be extracted, transformed, analyzed, stored, and visualized. Big data analysis uses inferential statistics to draw conclusions related to dependencies, behaviors, and predictions from large sets of data with low information density that are subject to random variations. Such systems are expected to model knowledge discovery in a format that produces reasonable answers when applied across a wide range of situations. The characteristics of big data are as follows:

Mariette Awad, Rahul Khanna

Open Access

Chapter 3. Support Vector Machines for Classification

This chapter covers details of the support vector machine (SVM) technique, a sparse kernel decision machine that avoids computing posterior probabilities when building its learning model. SVM offers a principled approach to problems because of its mathematical foundation in statistical learning theory. SVM constructs its solution in terms of a subset of the training input. SVM has been extensively used for classification, regression, novelty detection tasks, and feature reduction. This chapter focuses on SVM for supervised classification tasks only, providing SVM formulations for when the input space is linearly separable or linearly nonseparable and when the data are unbalanced, along with examples. The chapter also presents recent improvements to and extensions of the original SVM formulation. A case study concludes the chapter.

Mariette Awad, Rahul Khanna

Open Access

Chapter 4. Support Vector Regression

Rooted in statistical learning or Vapnik-Chervonenkis (VC) theory, support vector machines (SVMs) are well positioned to generalize on yet-to-be-seen data. The SVM concepts presented in Chapter 3 can be generalized to become applicable to regression problems. As in classification, support vector regression (SVR) is characterized by the use of kernels, sparse solution, and VC control of the margin and the number of support vectors. Although less popular than SVM, SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning approach, SVR trains using a symmetrical loss function, which equally penalizes high and low misestimates. Using Vapnik’s ε-insensitive approach, a flexible tube of minimal radius is formed symmetrically around the estimated function, such that the absolute values of errors less than a certain threshold are ignored both above and below the estimate. In this manner, points outside the tube are penalized, but those within the tube, either above or below the function, receive no penalty. One of the main advantages of SVR is that its computational complexity does not depend on the dimensionality of the input space. Additionally, it has excellent generalization capability, with high prediction accuracy.
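
The tube described in this summary can be illustrated with a minimal sketch of an insensitive loss. This is not the book's code: the function name, the sample values, and the tube radius of 0.1 are assumptions chosen for illustration. Errors smaller than the threshold cost nothing, and larger errors are penalized by the amount they exceed it, equally above and below the estimate.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-point loss: zero inside the tube, |error| - epsilon outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Hypothetical targets and predictions (illustrative values only).
targets = [1.0, 2.0, 3.0]
predictions = [1.05, 2.5, 2.0]

# The first point lies inside the tube; the other two lie outside,
# one above and one below the target.
losses = epsilon_insensitive_loss(targets, predictions, epsilon=0.1)
print(losses)
```

Because the loss depends only on the absolute error, an overestimate and an underestimate of the same size receive the same penalty.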

Mariette Awad, Rahul Khanna

Open Access

Chapter 5. Hidden Markov Model

Real-time processes produce observations that can be discrete, continuous, stationary, time variant, or noisy. The fundamental challenge is to characterize the observations as a parametric random process, the parameters of which should be estimated, using a well-defined approach. This allows us to construct a theoretical model of the underlying process that enables us to predict the process output as well as distinguish the statistical properties of the observation itself. The hidden Markov model (HMM) is one such statistical model. HMM interprets the (nonobservable) process by analyzing the pattern of a sequence of observed symbols. An HMM consists of a doubly stochastic process, in which the underlying (or hidden) stochastic process can be indirectly inferred by analyzing the sequence of observed symbols of another set of stochastic processes. HMM comprises (hidden) states that represent an unobservable, or latent, attribute of the process being modeled. HMM-based approaches are widely used to analyze features or observations, such as usage and activity profiles and transitions between different states of the process, to predict the most probable sequence of states. The HMM is a stochastic model of discrete events and a variation of the Markov chain, a chain of linked states or events, in which the next state depends only on the current state of the system. The states of an HMM are hidden (or can only be inferred from the observed symbols). For a given model and sequence of observations, HMM is used to analyze the solution to problems related to model selection, state-sequence determination, and model training (for more details, see the section “The Three Basic Problems of HMM”).
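
As a rough illustration of the state-sequence problem mentioned in this summary, the sketch below uses the Viterbi algorithm, a standard dynamic-programming method for finding the most probable hidden state sequence. The summary does not name this algorithm, and the two-state model, its probabilities, and all identifiers are hypothetical toy values, not taken from the book.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path and its probability."""
    # Each column maps a state to (best path probability, previous state).
    columns = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = columns[-1]
        column = {}
        for s in states:
            # Best predecessor of state s at this time step.
            prob, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], back)
        columns.append(column)

    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: columns[-1][s][0])
    best_prob = columns[-1][last][0]
    best_path = [last]
    for column in reversed(columns[1:]):
        best_path.append(column[best_path[-1]][1])
    best_path.reverse()
    return best_path, best_prob


# Hypothetical toy model: two hidden states, three observable symbols.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

The hidden states are never observed directly; the path is inferred only from the observed symbols and the model parameters.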

Mariette Awad, Rahul Khanna

Open Access

Chapter 6. Bioinspired Computing: Swarm Intelligence

Natural systems solve multifaceted problems using simple rules, and exhibit organized, complex, and intelligent behavior. Natural process control systems are adaptive, evolutionary, distributed (decentralized), reactive, and aware of their environment. Bioinspired computing (or biologically inspired computing) is a field of study that draws its inspiration from the sophistication of the natural world in adapting to environmental changes through self-management, self-organization, and self-learning. Bioinspired computational methods produce informatics tools that are predicated on the profound conceptions of self-adaptive distributed architectures seen in natural systems. Heuristics that imitate these natural processes can be expressed as theoretical methods of constrained optimization. Such heuristics define a representation, in the form of a fitness function. This function describes the problem, evaluates the quality of its solution, and uses its operators (such as crossover, mutation, and splicing) to generate a new set of solutions.

Mariette Awad, Rahul Khanna

Open Access

Chapter 7. Deep Neural Networks

Proposed in the 1940s as a simplified model of the elementary computing unit in the human cortex, artificial neural networks (ANNs) have since been an active research area. Among the many evolutions of ANN, deep neural networks (DNNs) (Hinton, Osindero, and Teh 2006) stand out as a promising extension of the shallow ANN structure. The best demonstration thus far of hierarchical learning based on DNN, along with other Bayesian inference and deduction reasoning techniques, has been the performance of the IBM supercomputer Watson in the legendary tournament on the game show Jeopardy!, in 2011.

Mariette Awad, Rahul Khanna

Open Access

Chapter 8. Cortical Algorithms

Computational models based on the structural and functional properties of the human brain have seen impressive gains since the mid-1980s, owing to significant discoveries in neuroscience and advancements in computing technology. Among these models, cortical algorithms (CAs) have emerged as a biologically inspired approach, modeled after the human visual cortex, which stores sequences of patterns in an invariant form and which recalls those patterns autoassociatively. This chapter details the structure and mathematical formulation of CA. We then present a case study of CA generalization accuracy in identifying isolated Arabic speech using an entropy-based weight update.

Mariette Awad, Rahul Khanna

Open Access

Chapter 9. Deep Learning

Deep learning is on the rise in the machine learning community, because the traditional shallow learning architectures have proved unfit for the more challenging tasks of machine learning and strong artificial intelligence (AI). The surge in and wide availability of increased computing power, coupled with the creation of efficient training algorithms and advances in neuroscience, have enabled the implementation, hitherto impossible, of deep learning principles. These developments have led to the formation of deep architecture algorithms that look to cognitive neuroscience to suggest biologically inspired learning solutions. This chapter presents the concepts of spiking neural networks (SNNs) and hierarchical temporal memory (HTM), whose associated techniques are the least mature of the techniques covered in this book.

Mariette Awad, Rahul Khanna

Open Access

Chapter 10. Multiobjective Optimization

Multiobjective optimization caters to achieving multiple goals, subject to a set of constraints, with a likelihood that the objectives will conflict with each other. Multiobjective optimization can also be explained as a multicriteria decision-making process, in which multiple objective functions have to be optimized simultaneously. In many cases, optimal decisions may require tradeoffs between conflicting objectives. Traditional optimization schemes use a weight vector to specify the relative importance of each objective and then combine the objectives into a scalar cost function. This strategy reduces the complexity of solving a multiobjective problem by converting it into a single-objective problem. Solution techniques for multiobjective optimization involve a tradeoff between model complexity and accuracy. Examples of multiobjective optimization can be found in economics (setting monetary policy), finance (risk–return analysis), engineering (process control, design tradeoff analysis), and many other applications in which conflicting objectives must be obtained.
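
The weight-vector strategy described in this summary can be sketched as follows. The two objective functions, the weights, and the candidate grid are assumptions chosen for illustration, and a simple grid search stands in for a real optimizer.

```python
def weighted_sum(objectives, weights):
    """Combine several objective functions into one scalar cost function."""
    def scalar_cost(x):
        return sum(w * f(x) for w, f in zip(weights, objectives))
    return scalar_cost


# Two conflicting hypothetical objectives: one is minimized at x = 0,
# the other at x = 2.
def f1(x):
    return x ** 2


def f2(x):
    return (x - 2) ** 2


# Equal weights treat both objectives as equally important.
cost = weighted_sum([f1, f2], [0.5, 0.5])

# Search a coarse grid of candidates in [0, 2] for the lowest scalar cost.
candidates = [i / 10 for i in range(21)]
best_x = min(candidates, key=cost)
print(best_x)
```

Changing the weights moves the chosen solution toward the objective that is weighted more heavily, which is how the weight vector expresses the tradeoff between conflicting goals.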

Mariette Awad, Rahul Khanna

Open Access

Chapter 11. Machine Learning in Action: Examples

Machine learning is an important means of synthesizing and interpreting the underlying relationship between data patterns and proactive optimization tasks. Machine learning exploits the power of generalization, which is an inherent and essential component of concept formation through human learning. The learning process constructs a knowledge base that is hardened by critical feedback to improve performance. The knowledge base system gathers a collection of facts and processes them through an inference engine that uses rules and logic to deduce new facts or inconsistencies.

Mariette Awad, Rahul Khanna