
About this book

This textbook presents the fundamental concepts and methods for understanding and working with images and video in a unique, easy-to-read style which ensures the material is accessible to a wide audience. Exploring more than just the basics of image processing, the text provides a specific focus on the practical design and implementation of real systems for processing video data. Features: includes more than 100 exercises, as well as C-code snippets of the key algorithms; covers topics on image acquisition, color images, point processing, neighborhood processing, morphology, BLOB analysis, segmentation in video, tracking, geometric transformation, and visual effects; requires only a minimal understanding of mathematics; presents two chapters dedicated to applications; provides a guide to defining suitable values for parameters in video and image processing systems, and to conversion between the RGB color representation and the HSI, HSV and YUV/YCbCr color representations.

Table of Contents

1. Introduction

Abstract
Our eyes capture images of the world surrounding us, and our brain is capable of extracting detailed information from such images. Video and image processing aim at replicating this ability by constructing a “seeing computer”. To this end, a camera replaces the eyes and the (video and image) processing software replaces the human brain. The purpose of this book is to present the basics of these two topics: cameras and video/image processing. This first chapter of the book motivates the need for video and image processing and points out the many possible application areas. The book is inspired by a systems approach, and a general framework for video and image processing is therefore discussed before an outline of the different chapters is provided.
Thomas B. Moeslund

2. Image Acquisition

Abstract
Before any video or image processing can commence, an image must be captured by a camera and converted into a manageable entity. This is the process known as image acquisition. The image acquisition process consists of three major components: energy reflected from the object of interest, an optical system which focuses the energy, and finally a sensor which measures the amount of energy. In this chapter each of these three steps is described in more detail. The chapter provides insights into the nature of light and explains the need for an optical system to focus the incoming light onto a sensor. Camera characteristics such as resolution, field-of-view, zoom, and depth-of-field are illustrated and discussed. Lastly, the digital image is introduced, and the relationship between incoming light and a pixel value is presented.
Thomas B. Moeslund
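To make the optics part of this abstract slightly more concrete, the small C sketch below computes two of the camera characteristics mentioned above: the image distance at which an object is in focus, via the thin lens equation 1/g + 1/b = 1/f, and the horizontal field of view of the camera. The snippet is an illustration only (not one of the book's code snippets), and the focal length, object distance, and sensor width are made-up example values.

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double PI = 3.14159265358979;
    double f = 0.012;   /* focal length in meters (example value) */
    double g = 2.0;     /* distance from the lens to the object in meters (example value) */
    double w = 0.0064;  /* physical width of the sensor in meters (example value) */

    /* Thin lens equation: 1/g + 1/b = 1/f  =>  image distance b where the object is in focus */
    double b = 1.0 / (1.0 / f - 1.0 / g);

    /* Horizontal field of view of the camera, in degrees */
    double fov = 2.0 * atan(w / (2.0 * f)) * 180.0 / PI;

    printf("Image distance: %.4f m\n", b);
    printf("Field of view:  %.1f degrees\n", fov);
    return 0;
}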

3. Color Images

Abstract
In the first two chapters the presentation was restricted to gray-scale images, but, as you might have noticed, the real world consists of colors. Going back some years, many cameras (and displays, e.g., TV monitors) only handled gray-scale images. As the technology matured, it became possible to capture (and visualize) color images, and today most cameras capture color images. In this chapter the focus is on color images. The nature of color images is described together with how they are captured and represented. Hereafter the photoreceptors in the human eye and their relation to the color spectrum are presented, together with the two fundamentally different ways of describing colors, namely the additive colors and the subtractive colors. Lastly the definition and representation of an RGB color image and its relation to other color representations, such as HSI and rgI, are laid out.
Thomas B. Moeslund
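As a concrete taste of the color representations mentioned above, the following C sketch converts a single RGB pixel to the rgI representation, in which r and g describe the color independently of how bright it is and I carries the intensity. It is a minimal illustration written for this overview, not one of the book's own snippets, and it assumes 8-bit RGB values.

#include <stdio.h>

/* Convert one 8-bit RGB pixel to the rgI representation:
   r and g are normalized color components (independent of intensity),
   I is the intensity, here taken as the mean of R, G, and B. */
void rgb_to_rgi(unsigned char R, unsigned char G, unsigned char B,
                double *r, double *g, double *I)
{
    double sum = (double)R + (double)G + (double)B;
    if (sum == 0.0) {        /* black pixel: avoid division by zero */
        *r = 0.0;
        *g = 0.0;
    } else {
        *r = R / sum;
        *g = G / sum;
    }
    *I = sum / 3.0;
}

int main(void)
{
    double r, g, I;
    rgb_to_rgi(200, 100, 50, &r, &g, &I);   /* example pixel */
    printf("r = %.3f  g = %.3f  I = %.1f\n", r, g, I);
    return 0;
}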

4. Point Processing

Abstract
This chapter introduces a number of algorithms that are characterized by changing the value of each pixel in an image independently of the values of the surrounding pixels. The key concept in this regard is gray-level mapping. First linear gray-level mapping is discussed and exemplified by changing the brightness and contrast in images. Next non-linear gray-level mapping is covered. Hereafter the chapter turns to one of the fundamental concepts within image processing, namely the image histogram. This is then combined with gray-level mapping, resulting in methods for improving the image quality and methods for binarizing the image by use of a threshold. Due to the importance of the thresholding approach, it is discussed in detail and different approaches are presented. The chapter finally presents how to combine two images through logic or arithmetic operations.
Thomas B. Moeslund
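Since this abstract revolves around gray-level mapping and thresholding, a minimal C sketch of the two operations is given below. It assumes a gray-scale image stored row by row as a one-dimensional array of 8-bit pixels; this layout and the function names are illustrative choices, not the book's code.

/* Linear gray-level mapping: out(x,y) = a * in(x,y) + b, clamped to [0, 255].
   a controls the contrast and b the brightness. */
void gray_level_map(const unsigned char *in, unsigned char *out,
                    int width, int height, double a, double b)
{
    int i;
    for (i = 0; i < width * height; i++) {
        double v = a * in[i] + b;
        if (v < 0.0)   v = 0.0;
        if (v > 255.0) v = 255.0;
        out[i] = (unsigned char)(v + 0.5);
    }
}

/* Thresholding: pixels at or above T become white (255), the rest black (0). */
void threshold_image(const unsigned char *in, unsigned char *out,
                     int width, int height, unsigned char T)
{
    int i;
    for (i = 0; i < width * height; i++)
        out[i] = (in[i] >= T) ? 255 : 0;
}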

5. Neighborhood Processing

Abstract
In the previous chapter pixel operations were independent of the surrounding pixels. This principle has many useful applications, but it cannot be applied to investigate the relationship between neighboring pixels. For example, if a significant change in intensity value occurs, this could indicate the boundary of an object, and by finding the boundary pixels the object can be located. In this (and the next) chapter a number of methods are presented where the neighboring pixels play a role when determining the output value of a pixel. One neighborhood operation for removing image noise is the median filter, which is discussed first. Hereafter the notion of correlation is introduced. Correlation is a general approach where a small part of the image is compared with a kernel. This is done for the entire image, resulting in a number of important algorithms such as the mean filter, template matching, edge detection, and image sharpening.
Thomas B. Moeslund
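To illustrate the correlation idea described above, here is a small C sketch of a 3x3 mean filter, one of the neighborhood operations mentioned in the abstract. The row-by-row image layout is an assumption for illustration, and the border pixels are simply copied unchanged, which is one of several common ways of handling the image edge.

/* 3x3 mean filter: each output pixel becomes the average of its 3x3 neighborhood,
   i.e. correlation with a kernel where all nine coefficients are 1/9.
   Border pixels, which lack a full neighborhood, are copied unchanged. */
void mean_filter_3x3(const unsigned char *in, unsigned char *out,
                     int width, int height)
{
    int x, y, i, j;

    for (i = 0; i < width * height; i++)
        out[i] = in[i];                       /* default: copy (handles the border) */

    for (y = 1; y < height - 1; y++) {
        for (x = 1; x < width - 1; x++) {
            int sum = 0;
            for (j = -1; j <= 1; j++)
                for (i = -1; i <= 1; i++)
                    sum += in[(y + j) * width + (x + i)];
            out[y * width + x] = (unsigned char)(sum / 9);
        }
    }
}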

6. Morphology

Abstract
In this chapter one important branch of neighborhood processing algorithms is presented, namely mathematical morphology, or simply morphology. Morphology is primarily applied to binary images in order to remove binary noise, but it can also serve other purposes. It is related to correlation in the sense that a kernel (here denoted a structuring element) is applied to each image part. In the chapter the description of morphology and its applications is divided into three levels. At the first level the two basic operations, Hit and Fit, are found. On top of them are built Dilation and Erosion, which basically expand or shrink the number of white pixels in the binary image, respectively. Exactly how depends on the content of the structuring element. At the third level the compound operations are located. They are combinations of the erosion and dilation operations. In the chapter three such operations are presented: Opening, Closing, and Boundary Detection.
Thomas B. Moeslund
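As a small illustration of the Fit operation and Erosion described above, the C sketch below erodes a binary image with a 3x3 structuring element of all ones: a pixel stays white only if the structuring element fits, i.e. its entire 3x3 neighborhood is white. The image layout and the choice of setting border pixels to black are assumptions made for this illustration, not the book's own snippet; Dilation would instead use the Hit condition (at least one white pixel under the structuring element).

/* Erosion of a binary image (pixel values 0 or 255) with a 3x3 structuring
   element of all ones. A pixel is kept white only if the element "fits",
   i.e. every pixel in its 3x3 neighborhood is white. Border pixels become black. */
void erode_3x3(const unsigned char *in, unsigned char *out,
               int width, int height)
{
    int x, y, i, j;
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            int fit = (x > 0 && y > 0 && x < width - 1 && y < height - 1);
            for (j = -1; j <= 1 && fit; j++)
                for (i = -1; i <= 1; i++)
                    if (in[(y + j) * width + (x + i)] == 0) {
                        fit = 0;
                        break;
                    }
            out[y * width + x] = fit ? 255 : 0;
        }
    }
}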

7. BLOB Analysis

Abstract
The purpose of this chapter is to describe methods that can locate and recognize particular objects. Imagine an image containing a number of coins and being faced with the task of designing an algorithm that can find all the big coins in the image. The approach is to first binarize the image using a method from a previous chapter. Each object (coin) in the binary image is now defined as a group of connected white pixels, a so-called BLOB. This process is known as BLOB extraction, and a grass-fire-inspired algorithm for this purpose is described in this chapter. The next step is to extract a number of characteristics, denoted features, for each BLOB. In the case of the coins, the relevant features would be size, roundness, and center of gravity. Lastly the features of each BLOB are compared with the features of a prototype model of the object the system is looking for (a big coin), and a classification is performed.
Thomas B. Moeslund
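The grass-fire idea mentioned in the abstract can be sketched compactly in C: start a "fire" at an unlabeled white pixel and let it spread to connected white neighbors, labeling every pixel it burns. The version below uses 4-connectivity and an explicit stack instead of recursion; these choices, as well as the image layout, are illustrative assumptions rather than the book's own snippet.

#include <stdlib.h>

/* Grass-fire BLOB extraction with 4-connectivity. Every group of connected
   white pixels (255) in the binary image receives its own label (1, 2, 3, ...)
   in the labels array, which must hold width*height ints. Returns the number
   of BLOBs found, or -1 if memory allocation fails. */
int blob_extract(const unsigned char *binary, int *labels, int width, int height)
{
    int n = width * height;
    int *stack = (int *)malloc(n * sizeof(int));
    int label = 0, p, top;

    if (stack == NULL)
        return -1;

    for (p = 0; p < n; p++)
        labels[p] = 0;                          /* 0 = not part of any BLOB yet */

    for (p = 0; p < n; p++) {
        if (binary[p] != 255 || labels[p] != 0)
            continue;                           /* only ignite at unlabeled white pixels */
        label++;
        labels[p] = label;
        top = 0;
        stack[top++] = p;
        while (top > 0) {                       /* let the fire spread */
            int q = stack[--top];
            int x = q % width, y = q / width;
            if (x > 0 && binary[q - 1] == 255 && labels[q - 1] == 0) {
                labels[q - 1] = label;  stack[top++] = q - 1;
            }
            if (x < width - 1 && binary[q + 1] == 255 && labels[q + 1] == 0) {
                labels[q + 1] = label;  stack[top++] = q + 1;
            }
            if (y > 0 && binary[q - width] == 255 && labels[q - width] == 0) {
                labels[q - width] = label;  stack[top++] = q - width;
            }
            if (y < height - 1 && binary[q + width] == 255 && labels[q + width] == 0) {
                labels[q + width] = label;  stack[top++] = q + width;
            }
        }
    }
    free(stack);
    return label;
}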

8. Segmentation in Video Data

Abstract
A video sequence is in principle a sequence of images. The methods presented in the previous chapters therefore apply equally well to a video sequence as to an image. One image is simply processed at a time. There are, however, two differences between a video sequence and an image. First, working with video allows us to consider temporal information and hence segment objects based on their motion. Second, video acquisition and image acquisition may not be the same, and that can have some consequences. The latter is considered first by describing the notion of the frame rate of the camera together with how video data is compressed. Next the chapter details the most fundamental segmentation algorithm related to video data, namely background subtraction. The principle of the core functionality is laid out, followed by different schemes for optimizing the method. Finally the related image differencing method is presented.
Thomas B. Moeslund
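The core of background subtraction, as described above, can be expressed in a few lines of C: compare each pixel of the current frame with a reference image of the empty scene and mark it as foreground if the difference is large enough. The sketch below is an illustration under the usual gray-scale, row-by-row assumptions, not the book's own snippet; feeding it the previous frame instead of a fixed background gives the related image differencing method.

#include <stdlib.h>

/* Basic background subtraction on gray-scale images: a pixel becomes
   foreground (255) if it differs from the stored background image by more
   than the threshold T, otherwise background (0). */
void background_subtraction(const unsigned char *frame,
                            const unsigned char *background,
                            unsigned char *foreground,
                            int width, int height, int T)
{
    int i;
    for (i = 0; i < width * height; i++) {
        int diff = abs((int)frame[i] - (int)background[i]);
        foreground[i] = (diff > T) ? 255 : 0;
    }
}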

9. Tracking

Abstract
One of the central questions in video processing is how to follow an object over time, or, in other words, how to find the trajectory of an object. This chapter introduces a framework for doing so, namely the so-called predict-match-update framework. First the notion of prediction is presented, which basically says something about where the object is expected to be in the future. For this purpose a motion model of the object is required, and different motion models are discussed. It is then described how prediction can be used to introduce a region of interest (ROI) in the following image and how this relates to the uncertainty in the tracking as such. Next the chapter describes how prediction can aid in the process of tracking multiple objects. Here a number of fundamental problems related to tracking are introduced. These include the merging and splitting of objects, together with the problematic situation where predicted objects cannot be detected in the image.
Thomas B. Moeslund
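To give the predict-match-update framework a concrete shape, the C sketch below models a tracked object with a constant-velocity motion model: the predict step guesses where the object will be in the next frame (around which a ROI can then be searched), and the update step refreshes position and velocity once the object has been matched to a detection. The structure and function names are assumptions for illustration, not the book's code.

/* State of one tracked object under a constant-velocity motion model. */
typedef struct {
    double x, y;      /* current position in the image */
    double vx, vy;    /* current velocity in pixels per frame */
} Track;

/* Predict: the position where the object is expected in the next frame.
   A region of interest (ROI) around this point is then searched. */
void predict(const Track *t, double *px, double *py)
{
    *px = t->x + t->vx;
    *py = t->y + t->vy;
}

/* Update: the object has been matched to a detection at (mx, my);
   re-estimate the velocity from the displacement and store the new position. */
void update(Track *t, double mx, double my)
{
    t->vx = mx - t->x;
    t->vy = my - t->y;
    t->x = mx;
    t->y = my;
}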

10. Geometric Transformations

Abstract
Most people have tried to do a geometric transformation of an image when preparing a presentation or when manipulating an image. The two most well-known are perhaps rotation and scaling, but others exist. This chapter describes how such transformations operate and discusses the issues that need to be considered when applying them. The term “geometric” transformation refers to the class of image transformations where the geometry of the image is changed but the actual pixel values remain unchanged. The chapter first focuses on the class of affine transformations, including translation, scaling, rotation, and shearing. Next, some of the practical issues that need to be considered when implementing such transformations are discussed. Lastly the chapter introduces the more advanced transformation known as homography and relates this to the well-known problems of keystoning and camera calibration.
Thomas B. Moeslund
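As an example of the geometric transformations discussed above, the C sketch below rotates a gray-scale image around its center. It uses backward mapping, i.e. for every output pixel it computes where that pixel came from in the input image, which avoids holes in the result, and it uses nearest-neighbor interpolation for simplicity. The image layout and the handling of pixels that fall outside the input are illustrative assumptions, not the book's own snippet.

#include <math.h>

/* Rotate a gray-scale image by 'angle' radians around its center using
   backward mapping with nearest-neighbor interpolation. Output pixels that
   map outside the input image are set to black. */
void rotate_image(const unsigned char *in, unsigned char *out,
                  int width, int height, double angle)
{
    double cx = width / 2.0, cy = height / 2.0;
    double c = cos(angle), s = sin(angle);
    int x, y;

    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            /* apply the inverse transformation: rotate by -angle around the center */
            double dx = x - cx, dy = y - cy;
            double sx =  c * dx + s * dy + cx;
            double sy = -s * dx + c * dy + cy;
            int ix = (int)floor(sx + 0.5);     /* nearest neighbor */
            int iy = (int)floor(sy + 0.5);
            if (ix >= 0 && ix < width && iy >= 0 && iy < height)
                out[y * width + x] = in[iy * width + ix];
            else
                out[y * width + x] = 0;
        }
    }
}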

11. Visual Effects

Abstract
In some situations the end goal of video and image processing is not to extract information, but rather to create some kind of visual effect, or, in other words, just for the fun of it. This can be done in many different ways, some of which are more interesting than others. In this chapter ten different methods for creating such visual effects are presented. The first five are based on manipulation of the actual pixel values and the last five on geometric transformations. The effect of each method is illustrated on the same two images so that a visual comparison is possible.
Thomas B. Moeslund

12. Application Example: Edutainment Game

Abstract
The purpose of most video and image processing is to construct systems that take video as input and deliver some useful information as output. In this chapter such a system is built from scratch in order to give the reader an insight into the design considerations required when building real systems. The concrete system designed in this chapter is an edutainment system. The chapter is written as a dialog between two software designers in order to highlight the considerations required when designing and constructing video and image processing systems.
Thomas B. Moeslund

13. Application Example: Coin Sorting Using a Robot

Abstract
This chapter is similar to the previous one, except that the concrete system being designed is different: here, a video processing system capable of sorting coins is designed.
Thomas B. Moeslund