
About this book

Examine the latest technological advancements in building a scalable machine learning model with Big Data using R. This book shows you how to work with a machine learning algorithm and use it to build an ML model from raw data.

All practical demonstrations will be explored in R, a powerful programming language and software environment for statistical computing and graphics. The various packages and methods available in R will be used to explain the topics. For every machine learning algorithm covered in this book, a 3-D approach of theory, case study, and practice will be given. Where appropriate, the mathematics will be explained through visualization in R. All the images are available in color and hi-res as part of the code download.

This new paradigm of teaching machine learning will bring about a radical change in perception for many of those who think this subject is difficult to learn. Though theory sometimes looks difficult, especially when there is heavy mathematics involved, the seamless flow from the theoretical aspects to example-driven learning provided in this book makes it easy for someone to connect the dots.

What You'll Learn

Use the model building process flow

Apply theoretical aspects of machine learning

Review industry-based case studies

Understand ML algorithms using R

Build machine learning models using Apache Hadoop and Spark

Who This Book is For

Data scientists, data science professionals and researchers in academia who want to understand the nuances of machine learning approaches/algorithms along with ways to see them in practice using R.

The book will also benefit readers who want to understand the technology behind implementing a scalable machine learning model using Apache Hadoop, Hive, Pig and Spark.

Table of Contents

Chapter 2. Data Preparation and Exploration

Our introductory chapter emphasized applying machine learning (ML) algorithms with a simplified process flow. In this chapter, we go deeper into the first block of that process flow: data exploration and preparation.
Karthik Ramasubramanian, Abhishek Singh

Chapter 4. Data Visualization in R

Information visualization is the broadest term that could be taken to subsume all the developments described here. At this level, almost anything, if sufficiently organized, is information of a sort. Tables, graphs, maps, and even text, whether static or dynamic, provide some means to see what lies within, determine the answer to a question, find relations, and perhaps apprehend things which could not be seen so readily in other forms.
Karthik Ramasubramanian, Abhishek Singh

Chapter 7. Machine Learning Model Evaluation

In many cases, we may even discard the complete model based on the performance metrics. This phase of the PEBE plays a very critical role in the success of any ML-based project.
Karthik Ramasubramanian, Abhishek Singh

Chapter 9. Scalable Machine Learning and Related Technologies

A few years back, you would not have heard the word "scalable" in machine learning parlance. The main reason was the lack of infrastructure, data, and real-world applications. Machine learning was much talked about in the academic research community and in well-funded industry research labs. A prototype of any real-world application using machine learning was considered a big feat and a demonstration of breakthrough research. However, times have changed with the availability of powerful commodity hardware at reduced cost and the widespread adoption of big data technology. As a result, data has become easily accessible and software development is becoming more and more data savvy. Every single byte of data is being captured even if its use is not clear in the near future.
Karthik Ramasubramanian, Abhishek Singh