
About this book

Multicore Programming Using the ParC Language discusses the principles of practical parallel programming using shared memory on multicore machines. It uses a simple yet powerful parallel dialect of C called ParC as the basic programming language. Designed for an introductory course in parallel programming, the book covers basic and advanced concepts of parallel programming through ParC examples, and it also brings in related research directions, including issues in parallel operating systems and compilation techniques relevant to shared memory and multicore machines.

Multicore Programming Using the ParC Language provides a firm basis for the ‘delicate art’ of creating efficient parallel programs. Students can practice parallel programming using simulation software, portable across PC/Unix multicore computers, to gain experience without requiring specialist hardware. They can also cement their learning by completing the many challenging and exciting exercises that accompany each chapter.

Table of Contents

Chapter 1. Basic Concepts in Parallel Algorithms and Parallel Programming

Abstract
The world of parallel processing is complex and combines many different ideas. We first consider the question of what a parallel machine is, and answer it by presenting a model for building parallel machines. Separately, we consider the need to define what “parallel programs” are. We use partial orders to define the notion of parallel programs and show how they can potentially be executed on parallel machines.
Yosi Ben-Asher

Chapter 2. Principles of Shared Memory Parallel Programming Using ParC

Abstract
This chapter introduces the basic concepts of parallel programming. It is based on the ParC language, which is an extension of the C programming language with block-oriented parallel constructs that allow the programmer to express fine-grain parallelism in a shared memory model. It can be used to express parallel algorithms, and it is also conducive to the parallelization of sequential C programs. The chapter covers several topics in shared memory programming, each presented with simple examples demonstrating its utility (a minimal code sketch follows the list below). The chapter supplies the basic tools and concepts needed to write parallel programs and covers these topics:
  • Practical aspects of threads, the sequential “atoms” of parallel programs.
  • Closed constructs to create parallelism.
  • Possible bugs.
  • The structure of the software environment that surrounds parallel programs.
  • The extension of C scoping rules to support private variables and local memory accesses.
  • The semantics of parallelism.
  • The discrepancy between the limited number of physical processors and the much larger number of parallel threads used in a program.
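As an illustration of the kind of closed parallel construct described above, here is a minimal sketch. It uses OpenMP's parallel-for as a stand-in, since the exact ParC syntax is not reproduced in this overview; the array size and loop body are arbitrary choices for the example.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int a[N], b[N];

    int main(void) {
        /* A closed parallel construct: the iterations may run as
           separate threads over shared memory, and execution
           continues only after every iteration has finished. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] * 2;

        printf("a[0] = %d\n", a[0]);
        return 0;
    }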
Yosi Ben-Asher

Chapter 3. Locality and Synchronization

Abstract
The discussion so far has assumed a shared memory address space with a uniform access cost for every address. This assumption is not realistic, particularly for multicore machines, where some memory references made by a parallel program can take significantly longer than others. In this chapter we consider a simplified model of parallel machines that demonstrates this claim and serves as a “formal” model for studying ParC’s memory references. Though this model does not simulate a multicore machine, it can be regarded as an intermediate stage between a uniform cost of shared memory references and the complexity of real multicore machines.
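To make the idea of non-uniform access cost concrete, here is a hypothetical sketch: a reference to a thread's own memory module costs one time unit, while a reference to another module costs more. The constants and the cost function are illustrative assumptions, not the formal model defined in the chapter.

    #include <stdio.h>

    #define LOCAL_COST  1    /* assumed cost of a local reference  */
    #define REMOTE_COST 10   /* assumed cost of a remote reference */

    /* Cost of a memory reference issued by a thread placed on
       thread_module to an address stored in address_module. */
    int access_cost(int thread_module, int address_module) {
        return (thread_module == address_module) ? LOCAL_COST : REMOTE_COST;
    }

    int main(void) {
        printf("local: %d  remote: %d\n",
               access_cost(0, 0), access_cost(0, 3));
        return 0;
    }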
Yosi Ben-Asher

Chapter 4. Multicore Machines

Abstract
Multicore machines extend the single-core personal computer with several processors (cores) and a shared memory. As such, they are suitable for running parallel programs, including ParC programs, that use shared memory. Multicore machines have replaced single-processor personal computers and are thus widely used. In this work we use the term “core” rather than processor or CPU to indicate that several cores are packed into a single chip, actually into one die, functioning as a multiprocessor machine. Basically, the parallelism available in multicore machines is used by the operating system to execute several unrelated processes in parallel (e.g., running two compilations and a web search on different cores). We will demonstrate that multicore machines can also be used to execute parallel programs efficiently, namely programs that spawn many threads which all communicate through shared memory. However, because the multicore machine is an extension of a single-core machine designed for personal computers, its shared memory mechanism is not as effective as it would have been had the machine been designed from scratch as a parallel machine. In fact, the shared memory of multicore machines is essentially a simulation of shared memory over the single-port memory module of the single-core personal computer. It is therefore important to understand how the shared memory of multicore machines works in order to determine how ParC can be implemented and used on them.
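One practical consequence of layering shared memory over per-core caches is that threads updating neighboring addresses can contend for the same cache line ("false sharing"). The following sketch, using OpenMP threads as a stand-in for ParC threads, illustrates the effect; the iteration count and the assumed 64-byte line size are arbitrary choices, not figures from the book.

    #include <stdio.h>
    #include <omp.h>

    #define ITER 50000000L

    /* Two counters packed into one cache line: each core's write
       invalidates the other core's cached copy of that line. */
    volatile struct { long a; long b; } tight;

    /* The same counters separated by padding so that each sits on
       its own cache line (64 bytes is a typical, assumed size). */
    volatile struct { long a; char pad[64]; long b; } padded;

    int main(void) {
        double t0 = omp_get_wtime();
        #pragma omp parallel sections
        {
            #pragma omp section
            for (long i = 0; i < ITER; i++) tight.a++;
            #pragma omp section
            for (long i = 0; i < ITER; i++) tight.b++;
        }
        double t1 = omp_get_wtime();
        #pragma omp parallel sections
        {
            #pragma omp section
            for (long i = 0; i < ITER; i++) padded.a++;
            #pragma omp section
            for (long i = 0; i < ITER; i++) padded.b++;
        }
        double t2 = omp_get_wtime();
        printf("same line: %.2fs  separate lines: %.2fs\n",
               t1 - t0, t2 - t1);
        return 0;
    }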
Yosi Ben-Asher

Chapter 5. Improving the Performance of Parallel Programs: The Analytical Approach

Abstract
In this chapter we examine the gap between the expected execution time of a parallel algorithm and the actual running time achieved when its encoding as a ParC program is executed on an actual shared memory machine. Such a gap exists for sequential programs as well, but it is more problematic for parallel programs, where users often find that their programs fail to run as fast as expected. In particular, a parallel program that runs on a parallel machine with P processors is expected to run about P times faster than its sequential version. Two issues are involved with this problem (a short numeric illustration follows the list):
  • Determining the execution time of a parallel program and comparing it with a desired execution time.
  • If there is a significant gap between the two, we need to identify which factors in the parallel program should be corrected so that the performance is improved.
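The usual way to quantify this gap is to compare measured times: speedup is the sequential time divided by the parallel time, and efficiency is the speedup divided by P. The sketch below simply applies these standard definitions; the sample times and processor count are made-up values for illustration.

    #include <stdio.h>

    int main(void) {
        double T_seq = 12.0;   /* sequential running time (s), assumed */
        double T_par = 2.0;    /* parallel running time (s), assumed   */
        int    P     = 8;      /* number of processors, assumed        */

        double speedup    = T_seq / T_par;   /* 6.0                    */
        double efficiency = speedup / P;     /* 0.75, i.e. below the
                                                ideal P-fold speedup   */

        printf("speedup = %.2f, efficiency = %.2f\n", speedup, efficiency);
        return 0;
    }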
Yosi Ben-Asher

Chapter 6. Compilation Techniques

Abstract
Compiler technologies are crucial for the efficient execution of sequential programs, and the same is true for parallel programs. For parallel programs, however, it is the operating system that performs most of the work; as a result, the overhead of scheduling and of the distributed shared memory simulation increases. In this chapter we present simple compilation techniques that can be used to guarantee efficient execution of shared memory parallel programs, in particular on multicore machines. We address the difficulties involved in supporting preemption/context switching of threads in compiled code. Note that preemption is crucial for fair execution of shared memory programs, but not every thread necessarily needs to be preempted repeatedly.
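One common way to approximate preemption in compiled thread code is for the compiler to plant cheap checks at loop back-edges so a long-running thread can hand the core back to the scheduler. The sketch below illustrates that general idea only; yield_if_needed() and the check interval are assumptions for illustration, not the book's actual runtime interface.

    #include <stdio.h>

    static long work_counter = 0;

    /* Cheap check inserted by the compiler at loop back-edges;
       a real runtime would switch to another thread here. */
    static void yield_if_needed(void) {
        if (++work_counter % 10000 == 0) {
            /* context switch point */
        }
    }

    void compiled_thread_body(double *x, int n) {
        for (int i = 0; i < n; i++) {
            x[i] = x[i] * 2.0;
            yield_if_needed();   /* inserted by the compiler */
        }
    }

    int main(void) {
        double x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        compiled_thread_body(x, 8);
        printf("%f\n", x[0]);
        return 0;
    }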
Yosi Ben-Asher

Chapter 7. Working with Sequential Versions

Abstract
Here we propose a programming methodology for developing efficient parallel programs. We advocate the idea that the parallel program is obtained from its sequential version through a sequence of gradual changes, as follows:
  • Most of the developing stages are made sequentially and in gradual stages.
  • Each stage can be debugged sequentially, so that many bugs are found and removed before the parallel program is tested. Clearly, debugging a parallel program is harder than debugging a sequential one, because for a parallel program all possible execution orders must be tested.
  • Dependence analysis of the sequential program can reveal potential ways to introduce more parallelism into the desired parallel program, as sketched below.
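As a small illustration of the last point, consider a loop whose iterations carry no dependences: once this is established on the sequential version, the same loop can be turned into a parallel one. OpenMP's parallel-for stands in for ParC's parallel loop construct here, and the array sizes are arbitrary.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000
    double a[N], b[N], c[N];

    /* Sequential version: iteration i writes only c[i] and reads
       only a[i] and b[i], so dependence analysis shows the
       iterations are independent. */
    void add_seq(void) {
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    /* Parallel version obtained gradually from the sequential one. */
    void add_par(void) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    int main(void) {
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }
        add_seq();            /* debug the sequential stage first */
        add_par();            /* then switch on the parallelism   */
        printf("%f\n", c[N - 1]);
        return 0;
    }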
Yosi Ben-Asher

Chapter 8. Performance and Overhead Measurements

Abstract
The previous sections used a set of parameters (overheads, number of processors, etc.) to analyze and predict the performance of parallel programs. This section addresses the problem of measuring these and other important parameters for a given shared memory machine. These parameters are unlikely to be found in the manuals, since the combination of the operating system with the hardware is not usually documented (it depends, for instance, on the underlying compiler). In this section we use a specific multicore parallel machine called MC; however, the proposed methods should work for other machines as well.
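As a rough sketch of how one such parameter can be measured, the cost of opening and closing an (almost) empty parallel construct can be estimated by repeating it many times and averaging. The repetition count and the use of OpenMP are assumptions for illustration; the chapter's actual measurements target the ParC runtime on MC.

    #include <stdio.h>
    #include <omp.h>

    #define REPS 1000

    int main(void) {
        double t0 = omp_get_wtime();
        for (int r = 0; r < REPS; r++) {
            #pragma omp parallel
            { /* empty body: we only pay the fork/join overhead */ }
        }
        double t1 = omp_get_wtime();

        printf("avg cost of one parallel construct: %.2f us\n",
               (t1 - t0) / REPS * 1e6);
        return 0;
    }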
Yosi Ben-Asher