About this book

This textbook introduces the “Fundamentals of Multimedia”, addressing real issues commonly faced in the workplace. The essential concepts are explained in a practical way to enable students to apply their existing skills to address problems in multimedia. Fully revised and updated, this new edition now includes coverage of such topics as 3D TV, social networks, high-efficiency video compression and conferencing, wireless and mobile networks, and their attendant technologies.
Features: presents an overview of the key concepts in multimedia, including color science; reviews lossless and lossy compression methods for image, video and audio data; examines the demands placed by multimedia communications on wired and wireless networks; discusses the impact of social media and cloud computing on information sharing and on multimedia content search and retrieval; includes study exercises at the end of each chapter; provides supplementary resources for both students and instructors at an associated website.

Table of Contents

Introduction and Multimedia Data Representations

Frontmatter

Chapter 1. Introduction to Multimedia

Abstract
In this chapter, we discuss the uses of the term “multimedia”, since people may have quite different, even opposing, viewpoints on what it means. This textbook is aimed at computer science or engineering students, and consequently emphasizes a more application-oriented view of what multimedia consists of. The convergence going on in this field, with computers, smartphones, games, digital TV including 3D, multimedia-based search, and so on converging in technology, means that multimedia is essentially mandatory for such students to study. Moreover, with the pervasive penetration of wireless mobile networks, the development of mobile applications for smartphones and tablets, and the advent of social media, the contents of a multimedia course arguably form the basis for much of the further study many students will engage in. The components of multimedia are first introduced, and then current multimedia research topics and projects are discussed to give a perspective on what is actually happening at the leading edge of the field. For a fuller perspective, the remarkably short history of multimedia is synopsized, from the development of the World Wide Web up to current pervasive social media and anytime/anywhere access. Since multimedia is indeed a practical field, Chapter 1 also supplies an overview of multimedia software tools, such as video editors and digital audio programs, that are typically used to produce multimedia products such as those created in a course on this subject.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 2. A Taste of Multimedia

Abstract
In Chapter 2, we introduce a set of tasks and concerns that are considered in studying multimedia, from the point of view of a technically comfortable reader. When it comes to multimedia production and presentation, the issues of graphics styles and fonts are discussed, with some surprising conclusions. To provide a further “taste” of multimedia, we show how simple animations may proceed. To round out the discussion of such tasks, we consider a “build your own” video-transition problem, where the intent is to generate one’s own video transition. We then go on to review the current and future state of multimedia sharing and distribution, outlining later discussions of social media, video sharing, and new forms of TV. Finally, the details of some popular multimedia tools are set out for a quick start into the field.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 3. Graphics and Image Data Representations

Abstract
In this chapter we discuss how images are stored, starting off with the simplest image data, 1 bit per pixel. Grayscale, 8-bit images are considered next, and we consider the problem of how to actually send images to a printer and what problems and solutions this entails. RGB 24-bit color images are then explored, looking at the three color planes as images in their own right, and capturing RGB information content via a three-dimensional histogram of colors is introduced. Returning to 8 bits, the concept of a color palette is discussed; but this then brings up the problem of just which colors should be singled out for inclusion in the palette. To this end, the problem of how to devise a color lookup table is considered, with the Median-Cut algorithm brought forward as one solution. Popular file formats, such as GIF, JPEG, PNG, TIFF, and PDF, are introduced. The GIF format is simple but representative, and for this reason it is set out in some detail. To wrap up the discussion, the PTM (Polynomial Texture Mapping) technique and format is included, with its capability of virtually exploring the surfaces of precious cultural heritage objects.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 4. Color in Image and Video

Abstract
In this chapter, color and color spaces are introduced; without excellent color, images and video would look unacceptable. But how is this accomplished? In the real world, light bounces off surfaces and the resulting spectrum enters the eye. Retinal sensors then take over, forming a trio of signal values in (combinations of) red, green, and blue. But if a camera is used instead, then the question arises of just what values should properly be stored, with the objective of eventually perceiving the right colors when the image is viewed on a display. Given that the display itself has spectral characteristics, and again the eye’s sensors come into play, what is the pipeline for producing pleasing-looking images, when starting off from camera images? This chapter sets out the basics of Color Science, which is the science of human vision. It then goes on to consider image formation in the eye and in cameras, arriving at the question of specifications for color display screens. The mismatch between human vision and display systems brings up the notion of out-of-gamut colors and white point correction, as well as perceptual color. Finally, other color-coordinate schemes are introduced, including color in printers and video.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 5. Fundamental Concepts in Video

Abstract
In this chapter, we introduce the principal notions needed to understand video. Here we consider the following aspects of video and how they impact multimedia applications: (1) Analog video; (2) Digital video; (3) Video display interfaces; (4) 3D video. Knowledge of video must include historically important standards in Analog Video. Digital Video has different ideas emphasized, and we go on to look at HDTV as well as Ultra High Definition TV. After a discussion of Video Display Interfaces, we enter into the new and exciting field of 3D Video and TV.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 6. Basics of Digital Audio

Abstract
Audio data has special and unique properties: e.g., while it is useful to occasionally drop a video frame from a stream, we simply cannot do the same with audio information or all sense will be lost from that dimension. Therefore, how to sensibly compress sound information is an important question. We begin with a discussion of just what makes up sound, and consider the digitization of sound information. We introduce the Nyquist Theorem as a fundamental property of sampling. Signal-to-Noise Ratio (SNR) is defined and adopted as a useful measure of audio (and in general, signal) quality, including the effect of quantization noise. Linear and nonlinear quantization, including companding for audio data, are discussed. Synthetic sounds are introduced, and we then go on to a thorough introduction to the use of MIDI as an enabling technology to capture, store, and play back musical notes. We look at some details of audio quantization, and give introductory information on how digital audio is dealt with for storage and transmission. This entails a first discussion of how subtraction of signals from predicted values yields numbers that are close to zero, and hence easier to deal with. Pulse Code Modulation (PCM) is introduced, followed by differential coding of audio and lossless predictive coding. Finally, Differential Pulse Code Modulation (DPCM) and Adaptive DPCM are introduced, and we take a look at encoder/decoder schema.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu
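
The remark in the Chapter 6 abstract that subtracting predicted values yields numbers close to zero can be illustrated with a minimal sketch. This is not code from the book: the function names, the test signal, and the choice of "previous sample" as the predictor are all assumptions made here for illustration.

```python
import math


def predictive_encode(samples):
    """Return residuals e[n] = f[n] - f[n-1]; the prediction is the previous sample."""
    residuals = []
    previous = 0  # assumed initial prediction before any sample is seen
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals


def predictive_decode(residuals):
    """Invert predictive_encode by accumulating the residuals."""
    samples = []
    previous = 0  # must match the encoder's initial prediction
    for e in residuals:
        previous = previous + e
        samples.append(previous)
    return samples


# Hypothetical test signal: one period of a slow sine, rounded to integers.
signal = [round(100 * math.sin(2 * math.pi * n / 64)) for n in range(64)]
residuals = predictive_encode(signal)
decoded = predictive_decode(residuals)

max_signal = max(abs(s) for s in signal)
max_residual = max(abs(e) for e in residuals)
print("largest sample magnitude:  ", max_signal)
print("largest residual magnitude:", max_residual)
print("lossless round trip:       ", decoded == signal)
```

Because neighboring samples of a smooth signal are similar, the residuals span a much smaller range than the samples themselves, which is what makes them cheaper to code.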

Multimedia Data Compression

Frontmatter

Chapter 7. Lossless Compression Algorithms

Abstract
In this chapter, data compression as it relates to multimedia information is studied from the point of view of lossless algorithms, where the input data is essentially exactly recoverable from the compressed data. Lossy algorithms, for which this is not the case, are presented in Chapter 8. Here we introduce the fundamentals of information theory and algorithms whose goal is a savings in bitrate given the entropy, especially Huffman Coding and its adaptive version. We then study Dictionary-based Coding (as in WinZip) and go on to a detailed discussion of Arithmetic Coding. Finally, Lossless Image Compression is examined specifically.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu
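
As a rough illustration of the link between entropy and Huffman Coding mentioned in the Chapter 7 abstract, the following sketch computes the entropy of a short message and the average Huffman code length for it. It is not taken from the book; the function names and the sample message are assumptions chosen for illustration.

```python
import heapq
import math
from collections import Counter


def entropy_bits_per_symbol(data):
    """Shannon entropy of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    counts = Counter(data)
    if len(counts) == 1:
        return {sym: 1 for sym in counts}  # a lone symbol still needs one bit
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(c, i, {sym: 0}) for i, (sym, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


message = "AAAABBCD"  # hypothetical input with probabilities 1/2, 1/4, 1/8, 1/8
h = entropy_bits_per_symbol(message)
lengths = huffman_code_lengths(message)
average_length = sum(lengths[s] for s in message) / len(message)
print("entropy (bits/symbol):       ", h)
print("Huffman average (bits/symbol):", average_length)
```

For this message the symbol probabilities are all powers of 1/2, so the Huffman average length meets the entropy exactly; in general it is only guaranteed to come within one bit per symbol of it.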

Chapter 8. Lossy Compression Algorithms

Abstract
In this chapter we examine compression algorithms for which the input data cannot be exactly reconstructed from the compressed version. This is termed “loss”. What we have, then, is a tradeoff between efficient compression and a less accurate version of the input data. This tradeoff is captured in Rate-Distortion Theory. Most of the loss occurs in quantization, and we introduce both Uniform and Nonuniform Scalar Quantization, and then Vector Quantization. Transform Coding, especially the Discrete Cosine Transform (DCT), is the main step in JPEG compression. We study the DCT at great length and provide several examples. A newer version, JPEG2000, is supported by Wavelet-Based Coding, so we introduce this method here and go on to study Wavelet Packets, the Embedded Zerotree of Wavelet Coefficients, and SPIHT.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu
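
The statement in the Chapter 8 abstract that most of the loss occurs in quantization can be seen in a minimal sketch of a uniform scalar quantizer. This is an illustrative assumption rather than the book's own formulation: the function names, the step size, and the sample values are hypothetical.

```python
def quantize(value, step):
    """Map a value to the index of the nearest multiple of `step`."""
    return round(value / step)


def dequantize(index, step):
    """Reconstruct the representative value for a quantization index."""
    return index * step


step = 8  # assumed step size; a larger step means fewer levels and more loss
values = [3, 17, 100, 200, 255]  # hypothetical 8-bit samples

indices = [quantize(v, step) for v in values]
reconstructed = [dequantize(i, step) for i in indices]
errors = [abs(v - r) for v, r in zip(values, reconstructed)]

print("indices:      ", indices)
print("reconstructed:", reconstructed)
print("abs. errors:  ", errors)
```

Only the indices need to be stored or transmitted. The reconstruction error never exceeds half the step size, but the original values cannot in general be recovered, which is the "loss".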

Chapter 9. Image Compression Standards

Abstract
Recent years have seen an explosion in the availability of digital images. In this chapter, we examine some current image compression standards and demonstrate how techniques presented in Chaps. 7 and 8 are applied in practice. We first describe how transform coding based on the DCT (Discrete Cosine Transform), quantization, and entropy coding are employed in the JPEG standard, which is used for most images, then go on to look at the wavelet-based JPEG2000 standard. Two other standards, JPEG-LS (aimed particularly at lossless coding, outside the main JPEG standard) and JBIG (for bilevel image compression), are included for completeness.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 10. Basic Video Compression Techniques

Abstract
A video can be viewed as a sequence of images stacked in the temporal dimension. Since the frame rate of the video is often relatively high, the contents of consecutive frames are usually similar. In other words, video has temporal redundancy. All digital video compression algorithms (including H.264 and H.265) adopt the so-called motion compensation based video compression technique to exploit temporal redundancy. In this chapter the fundamentals of motion compensation based video compression are introduced. We cover sufficient details of H.261, followed by a brief discussion on H.263. These form the foundation for all modern video compression standards.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu
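
To make the idea of exploiting temporal redundancy through motion compensation more concrete, here is a minimal sketch of exhaustive block matching between two tiny frames. It is not the procedure of H.261 or any other standard: the function names, the frame contents, the block size, and the search range are assumptions made here for illustration, and the sum of absolute differences (SAD) is just one common matching criterion.

```python
def sad(target, reference, tx, ty, rx, ry, n):
    """Sum of absolute differences between two n-by-n blocks."""
    return sum(
        abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def best_motion_vector(target, reference, x, y, n, search):
    """Exhaustively search +/- `search` pixels; return (cost, dx, dy) of the best match."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidate blocks that fall outside the reference frame.
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(target, reference, x, y, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


def make_frame(square_x, square_y, size=8):
    """Hypothetical frame: black background with a bright 2x2 square."""
    frame = [[0] * size for _ in range(size)]
    for j in range(2):
        for i in range(2):
            frame[square_y + j][square_x + i] = 255
    return frame


reference = make_frame(2, 2)  # previous frame
target = make_frame(3, 2)     # current frame: the square moved one pixel right

no_motion_cost = sad(target, reference, 3, 2, 3, 2, 2)
best = best_motion_vector(target, reference, x=3, y=2, n=2, search=2)
print("cost without motion compensation:", no_motion_cost)
print("best (cost, dx, dy):             ", best)
```

With the motion vector known, only the vector and the (here zero) residual need to be coded for this block instead of its pixel values, which is where the compression gain comes from.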

Chapter 11. MPEG Video Coding: MPEG-1, 2, 4, and 7

Abstract
In this chapter, we examine the ideas behind the MPEG standards, starting with MPEG-1 and -2, and then MPEG-4 and -7. In MPEG-1 and -2, bidirectional search for motion vectors is introduced. Interlaced video and high-definition TV (HDTV) are supported in MPEG-2. Moreover, MPEG-2 supports various forms of scalable coding, such as SNR, spatial, temporal, and their combinations. MPEG-4 and -7 address the issue of video coding based on video objects. Although the visual object-based video representation and compression approach developed in MPEG-4 and -7 has not been commonly used in current popular standards such as H.264 and H.265, it has great potential to be adopted in the future when the necessary Computer Vision technology for automatic object detection becomes more readily available.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 12. New Video Coding Standards: H.264 and H.265

Abstract
We introduced basic video compression techniques in Chaps. 10 and 11, where we examined the ideas behind the MPEG standards, starting with MPEG-1 and -2, and then MPEG-4 and -7. In this chapter, we introduce the newer video compression standards H.264 and H.265. For efficiency, an integer transform is adopted in place of the Discrete Cosine Transform (DCT). Other new features include quarter-pixel accuracy in motion vectors, predictive coding in intra frames, in-loop deblocking filtering, and Context-Adaptive Binary Arithmetic Coding (CABAC). Moreover, H.265 also facilitates parallel processing. With their superior compression performance over H.263 and MPEG-2, H.264 and H.265 are currently the leading candidates to carry a whole range of video content in many potential applications.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 13. Basic Audio Compression Techniques

Abstract
In this chapter, compression of audio information is reviewed, with special consideration paid to speech compression. To begin with, we recall some of the issues covered in Chap. 6 on digital audio in multimedia. Here, this is combined with techniques that exploit the temporal redundancy present in audio signals. We extend the Pulse Code Modulation (PCM) scheme to DPCM, prepending the word “Differential,” as briefly introduced in Chap. 6 but fleshed out here. Specifically, in this chapter, we look at ADPCM (Adaptive DPCM), Vocoders, and more general Speech Compression: LPC, CELP, MBE, and MELP. In speech coding, a number of standards have evolved and we set these out here, including some of their fundamental strategies. We then go on to study coders (encoding/decoding algorithms) specifically aimed at speech compression. The properties of Vocoders are examined, including the notion of phase insensitivity, channels, and formants. Next, LPC (Linear Predictive Coding) vocoders are discussed, followed by CELP (Code Excited Linear Prediction), a more complex family of coders. Hybrid Excitation Vocoders are another large class of speech coders, and we round the discussion off by having a look at MBE (Multi-Band Excitation) and MELP (Mixed Excitation Linear Prediction) vocoders.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 14. MPEG Audio Compression

Abstract
In this chapter, we consider the set of tools for audio compression applicable to general audio, such as music, as opposed to specifically speech compression, developed under the aegis of the Moving Picture Experts Group (MPEG). Surprisingly, this subject has much to do with psychology, specifically within the field of aural sense perception: psychoacoustics. The phenomena of frequency masking and temporal masking are exploited in a waveform coding approach that makes use of a psychoacoustic model of hearing, with the result generally referred to as perceptual coding. We look in some detail at audio compression as it benefits from psychoacoustics, and how this plays out in MPEG-1 Audio Compression (MP3) and later MPEG audio developments: MPEG-2 and -4, including MPEG Advanced Audio Coding (AAC). We begin the study of psychoacoustics as it applies here with the determination of the equal-loudness relations, which leads to a discussion of frequency masking. Critical Bands are introduced as well as the Bark Unit. Temporal Masking is a familiar phenomenon from our own experience. MPEG Audio is introduced to make use of these properties, along with MPEG Audio Layers including MP3. MPEG-2 AAC (Advanced Audio Coding) is considered next and MPEG-4 Audio is also discussed.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Multimedia Communications and Networking

Frontmatter

Chapter 15. Network Services and Protocols for Multimedia Communications

Abstract
Computer communication networks are essential to the modern computing environment we know and have come to rely upon. Multimedia communications and networking share all the major issues and technologies of computer communication networks. Indeed, the evolution of the Internet, particularly in the past two decades, has been largely driven by the ever-growing demands from numerous conventional and new generation multimedia applications. As such, multimedia communications and networking have become a very active area for research and industrial development. This chapter will start with a review of the common terminologies and techniques in modern computer communication networks, specifically the Internet, followed by an introduction to various network services and protocols for multimedia communications and content sharing, since they are becoming a central part of most contemporary multimedia systems. We also use Internet telephony as an example to illustrate the design and implementation of a typical interactive multimedia communication application.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 16. Internet Multimedia Content Distribution

Abstract
A simple client/server architecture can quickly become infeasible when more media content is made available online and more users are network- and multimedia-ready. There have been significant studies on efficient content distribution over the Internet, targeting a large number of users. Most of them were optimized for delivering conventional web objects or for file download. The huge size, intensive bandwidth use, and rich interactivity of streaming media, however, pose new challenges. Many emerging applications, such as Internet TV and live event broadcast, further demand real-time multimedia streaming services with a massive audience, and the scaling challenge can be enormous. In this chapter, we discuss content distribution mechanisms that enable high-quality and scalable multimedia content streaming, including proxy caching, multicast, content distribution networks, peer-to-peer, and HTTP streaming.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 17. Multimedia Over Wireless and Mobile Networks

Abstract
The rapid developments in computer and communication technologies have made ubiquitous computing a reality. From cordless phones in the early days to later cellular phones, wireless mobile communication has been the core technology that enables anywhere and anytime information access and sharing. The new generation of smart mobile devices that emerged only in recent years is driving the revolution further. Multimedia over wireless and mobile networks shares many similarities with multimedia over the wired Internet; yet the unique characteristics of wireless channels and the frequent movement of users also pose new challenges that must be addressed. This chapter reviews wireless channel characteristics, and describes representative wireless technologies for wide-area cellular networks (from 1G to 4G) and local area networks (WiFi and Bluetooth). We then examine the key issues for multimedia over wireless networks, including error detection and correction, error-resilient coding, error concealment, and re-synchronization. Finally, we examine mobility management, including both the network layer mobile IP and the link layer handoff.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Multimedia Information Sharing and Retrieval

Frontmatter

Chapter 18. Social Media Sharing

Abstract
Social media, a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, allow the creation and exchange of user-generated content. With the pervasive penetration of wireless mobile networks, the advanced development of smartphones and tablets, and the massive market of mobile applications, social media contents can now be easily generated and accessed at any time and anywhere. They have substantially changed the way organizations, communities, and individuals communicate.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 19. Cloud Computing for Multimedia Services

Abstract
The emergence of cloud computing has dramatically changed the service models for modern computer applications. Utilizing elastic resources in powerful data centers, it enables end users to conveniently access computing infrastructure, platforms, and software provided by remote cloud providers (e.g., Amazon, Google, and Microsoft) in a pay-as-you-go manner or with long-term lease contracts. This new computing paradigm, offering reliable, elastic, and cost-effective resource provisioning, can significantly mitigate the overhead for enterprises to construct and maintain their own computing, storage, and network infrastructures. It has provided countless new opportunities for both new and existing applications. In this chapter, we provide an overview of cloud computing, focusing on its impact on multimedia services. We then discuss multimedia content sharing with cloud storage and multimedia computation offloading to the cloud. We also use cloud gaming as a case study to examine the role of the cloud in the new generation of interactive multimedia services.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu

Chapter 20. Content-Based Retrieval in Digital Libraries

Abstract
This chapter is concerned with finding images or video in (possibly very large) collections. Each type of modality in multimedia information, e.g., text and image, provides its own type of semantic information to help in the search for content. That is, text-based search is bolstered by information in images and video, from low-level features to high-level semantic content. In this book we focus only on techniques and systems that make use of image features themselves, without text, to retrieve images or video from databases or from the web. Detail is provided on specific features useful for this purpose. Search engines built on these features are said to be content based: the search is guided by image similarity measures based on the statistical content of each image. At a higher semantic level, action recognition in video is also examined in some detail.
Ze-Nian Li, Mark S. Drew, Jiangchuan Liu