
About this book

Examine the latest technological advancements in building a scalable machine-learning model with big data using R. This second edition shows you how to work with machine-learning algorithms and use them to build an ML model from raw data. You will see how to use R programming with TensorFlow, avoiding the effort of learning Python if you are only comfortable with R.

As in the first edition, the authors have kept a fine balance of theory and application of machine learning through various real-world use cases, giving you a comprehensive collection of topics in machine learning. New chapters in this edition cover time series models and deep learning.

What You'll Learn

Understand machine learning algorithms using R

Master the process of building machine-learning models

Cover the theoretical foundations of machine-learning algorithms

See industry-focused real-world use cases

Tackle time series modeling in R

Apply deep learning using Keras and TensorFlow in R

Who This Book Is For

Data scientists, data science professionals, and researchers in academia who want to understand the nuances of machine-learning approaches/algorithms in practice using R.

Table of Contents

Chapter 1. Introduction to Machine Learning and R

Abstract
Beginners to machine learning are often confused by the plethora of algorithms and techniques taught in subjects like statistical learning, data mining, artificial intelligence, soft computing, and data science. It's natural to wonder how these subjects differ from one another and which is best for solving real-world problems. There is substantial overlap among them, and it's hard to draw a clear Venn diagram explaining the differences. Primarily, the foundation of these subjects is derived from probability and statistics. However, many statisticians probably won't agree that machine learning gave new life to statistics, giving rise to never-ending chicken-and-egg discussions. Fundamentally, without spending much effort on the pros and cons of this debate, it's fair to say that the power of statistics needed a pipeline to flow across different industries with challenging problems to be solved, and machine learning simply established that high-speed, frictionless pipeline. The other subjects that evolved from statistics and machine learning are simply broadening the scope of these two and placing them under a bigger banner.
Karthik Ramasubramanian, Abhishek Singh

Chapter 2. Data Preparation and Exploration

Abstract
As we emphasized in our introductory chapter on applying machine learning (ML) algorithms with a simplified process flow, in this chapter, we go deeper into the first block of machine learning process flow—data exploration and preparation.
Karthik Ramasubramanian, Abhishek Singh

Chapter 3. Sampling and Resampling Techniques

Abstract
In Chapter 2, we introduced data import and exploration techniques. Now you are equipped to load data from different sources and store it in an appropriate format. In this chapter, we discuss important data sampling methodologies and their role in machine learning algorithms. Sampling is an important block in our machine learning process flow: it serves the dual purpose of saving data collection costs and reducing computational cost without compromising the power of the machine learning model.
Karthik Ramasubramanian, Abhishek Singh
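As a taste of what the chapter covers, a simple random sample can be drawn in base R with `sample()`. This sketch is illustrative only; the built-in `mtcars` dataset and the 70% split ratio are assumptions for the example, not taken from the book:

```r
# Simple random sampling: draw 70% of the rows without replacement.
# 'mtcars' is a built-in dataset used here purely for illustration.
set.seed(42)                                   # reproducible draw
n          <- nrow(mtcars)
train_idx  <- sample(seq_len(n), size = floor(0.7 * n))
train_data <- mtcars[train_idx, ]              # sampled rows
test_data  <- mtcars[-train_idx, ]             # held-out remainder

nrow(train_data)  # 22 of the 32 rows
nrow(test_data)   # the remaining 10
```

The same idea underlies the cost savings the abstract mentions: a well-drawn sample lets you model a fraction of the data while preserving its statistical character.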

Chapter 4. Data Visualization in R

Abstract
Data visualization is the process of creating and studying visual representations of data to derive meaningful insights.
Karthik Ramasubramanian, Abhishek Singh

Chapter 5. Feature Engineering

Abstract
In machine learning, feature engineering is a blanket term covering both the statistical and business-judgment aspects of modeling real-world problems. The term was coined to give due importance to the domain knowledge required to select sets of features for machine learning algorithms. It is one of the reasons most machine learning professionals call it an informal process.
Karthik Ramasubramanian, Abhishek Singh

Chapter 6. Machine Learning Theory and Practice

Abstract
The world is quickly adopting machine learning (ML). Whether it's driverless cars, intelligent personal assistants, or machines playing games like Go and Jeopardy against humans, ML is pervasive. The availability and ease of collecting data, coupled with high computing power, have made this field even more attractive for researchers and businesses exploring data-driven solutions to some of the most challenging problems. This has led to a surge in the number of new startups and tools leveraging ML to solve problems in sectors such as healthcare, IT, HR, automobiles, and manufacturing, and the list is ever expanding.
Karthik Ramasubramanian, Abhishek Singh

Chapter 7. Machine Learning Model Evaluation

Abstract
Model evaluation is the most important step in developing any machine learning solution. At this stage in model development, we measure the model's performance and decide whether to go ahead with the model or revisit all our previous steps, as described in the PEBE, our machine learning process flow, in Chapter 1. In many cases, we may even discard the complete model based on the performance metrics. This phase of the PEBE plays a very critical role in the success of any ML-based project.
Karthik Ramasubramanian, Abhishek Singh
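For a binary classifier, for example, a confusion matrix and accuracy can be computed directly in base R. The label vectors below are made up for illustration and are not from the book:

```r
# Hypothetical predicted vs. actual class labels for a binary classifier.
actual    <- factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "no"))
predicted <- factor(c("yes", "no",  "no", "no", "yes", "yes", "yes", "no"))

# Confusion matrix: rows = predicted class, columns = actual class.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy: proportion of predictions that match the actual label.
accuracy <- mean(predicted == actual)
accuracy  # 0.75, since 6 of the 8 predictions are correct
```

Metrics like these are the quantitative basis for the go/no-go decision the abstract describes.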

Chapter 8. Model Performance Improvement

Abstract
Model performance is a broad term generally used to measure how a model performs on a new dataset, usually a test dataset. The performance metrics also act as thresholds for deciding whether the model can be put into actual decision-making systems or needs improvement. In the previous chapter, we discussed performance metrics for the continuous and discrete cases. In this chapter, we discuss how changing the modeling process can help us improve model performance on those metrics.
Karthik Ramasubramanian, Abhishek Singh
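One standard change to the modeling process that guards against overfitting is k-fold cross-validation. A minimal base-R sketch follows; the linear model on the built-in `mtcars` data is a stand-in example, not a model from the book:

```r
# 5-fold cross-validation for a simple linear model (illustrative only).
set.seed(7)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels

rmse_per_fold <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)     # fit on the other k-1 folds
  pred  <- predict(fit, newdata = test)        # predict the held-out fold
  sqrt(mean((test$mpg - pred)^2))              # fold RMSE
})

mean(rmse_per_fold)  # average out-of-fold error estimate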

Chapter 9. Time Series Modeling

Abstract
Recording data indexed by time is a long-standing way of collecting data for analysis. Time-indexed data primarily serves the purpose of observing events that are highly correlated with time, where a considerable part of the variance is due to the passage of time. This introduction to time series analysis will help you understand how to account for time-dependent variation.
Karthik Ramasubramanian, Abhishek Singh
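As a small illustration of modeling time-indexed data, base R's `stats` package can fit a simple autoregressive model. The built-in `lh` series (hormone-level measurements at 10-minute intervals) is used here as a stand-in dataset:

```r
# lh is a built-in ts object with 48 observations; inspect it first.
str(lh)
acf(lh, plot = FALSE)$acf[2]     # lag-1 autocorrelation of the series

# Fit an AR(1) model with a mean term.
fit <- arima(lh, order = c(1, 0, 0))
coef(fit)                        # the 'ar1' coefficient and 'intercept'

# Forecast the next three observations from the fitted model.
predict(fit, n.ahead = 3)$pred
```

An AR(1) fit like this captures exactly the kind of time-dependent variation the abstract refers to: each observation is modeled as depending on the one before it.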

Chapter 10. Scalable Machine Learning and Related Technologies

Abstract
A few years back, you would not have heard the word "scalable" in machine learning parlance. The reason was mainly the lack of infrastructure, data, and real-world applications. Machine learning was talked about mostly in the academic research community or in well-funded industry research labs. A prototype of any real-world application using machine learning was considered a big feat and a demonstration of breakthrough research. However, times have changed with the availability of powerful commodity hardware at reduced cost and the widespread adoption of big data technology. As a result, data has become easily accessible and software development is becoming more and more data savvy. Every single byte of data is being captured, even if its use is not clear in the near future.
Karthik Ramasubramanian, Abhishek Singh

Chapter 11. Deep Learning Using Keras and TensorFlow

Abstract
There was a time when data and computing resources were so scarce that not every data point generated by a business application or IT infrastructure was stored, and application design involved no data-driven thinking. The times have indeed changed: with the abundance of computing and storage resources, we now have "data first" thinking, and increasing volumes of data are available from every business application. Large enterprises are now built on business models in which revenue is generated from inferences drawn from data. The most promising advancement in this surge of data and availability of computing power is how differently we are looking at solving complex problems.
Karthik Ramasubramanian, Abhishek Singh