Code is often written in a serialized (or sequential) fashion. What is meant by the term serialized? Ignoring instruction-level parallelism (ILP), code is executed sequentially, one instruction after the next, without regard to the additional processors the program could exploit. Often, there are parts of a program whose performance can be improved through the use of threads.
With the increasing popularity of machines with symmetric multiprocessing (largely due to the rise of multicore processors), programming with threads is a valuable skill worth learning.
Why is it that most programs are sequential? One guess is that students are not taught how to program in a parallel fashion until late in their studies, or are taught it in a difficult-to-follow manner. To make matters worse, multithreading non-trivial code is difficult. Careful analysis of the problem, followed by a good design, is not optional for multithreaded programming; it is an absolute must.
We will dive into the world of threads with a little bit of background first. We will examine thread synchronization primitives and then present a tutorial on how to use POSIX pthreads.
Threads can provide benefits... for the right applications! Don't waste your time multithreading a portion of code or an entire program that isn't worth multithreading.
Gene Amdahl argued the theoretical maximum improvement that is possible for a computer program that is parallelized, under the premise that the program is strongly scaled (i.e., the program operates on a fixed problem size). His claim is a well-known assertion known as Amdahl's Law. Essentially, Amdahl's Law states that the speedup of a program due to parallelization can be no larger than the inverse of the portion of the program that is immutably sequential. For example, if 50% of your program is not parallelizable, then you can only expect a maximum speedup of 2x, regardless of the number of processors you throw at the problem. Of course, many problems and data sets that parallel programs process are not of fixed size, or the serial portion can be very close to zero. What is important for the reader here is to understand that most interesting problems solved by computer programs tend to have some limit on the amount of parallelism that can be effectively expressed (or introduced by the very mechanism used to parallelize) and exploited as threads or some other parallel construct.
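Stated in its standard form, with P the fraction of the program that can be parallelized and N the number of processors:

    speedup(N) = 1 / ((1 - P) + P / N)

As N grows, P/N approaches zero, so the maximum achievable speedup is 1 / (1 - P). With P = 0.5, this gives 1 / (1 - 0.5) = 2x, matching the example above.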
It must be underscored how important it is to understand the problem the computer program is trying to solve before simply jumping in head first. Careful planning is required: consider not only which parts of the program must be attacked in parallel, but also the algorithms employed to do so and the threading constructs by which they are delivered.
There is a common saying: "90% of processor cycles are spent in 10% of the code." This is more formally known as the Pareto Principle. Carefully analyze your code or your design plan; don't spend all of your time optimizing/parallelizing the 90% of the code that doesn't matter much! Code profiling and analysis are outside the scope of this document, but they are recommended reading for those unfamiliar with the subject.
There are different ways to use threads within a program. Here, two common thread design patterns are presented: boss/worker and pipeline. There is no hard and fast rule on which is the best; it depends on what the program is intended to tackle and in what context. It is up to you to decide which pattern or patterns best fit your needs.
In the boss/worker pattern, one thread (the boss, or master) dispatches other threads, usually part of a worker thread pool, to do useful work. This thread pool is usually pre-allocated before the boss begins dispatching work to the threads. Although threads are lightweight, they still incur overhead when they are created.
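As a minimal sketch of this pattern (the task counter, do_work(), and the sizes here are invented for illustration, and the mutex calls are explained later in this document), the boss pre-allocates a pool of workers that claim task numbers from a shared counter; compile with cc -pthread:

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_WORKERS 4
    #define NUM_TASKS   16

    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_task = 0;                 /* index of the next unclaimed task */

    static void do_work(int task) {           /* hypothetical unit of work */
        printf("processing task %d\n", task);
    }

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&queue_lock);  /* claim a task under the lock */
            int task = (next_task < NUM_TASKS) ? next_task++ : -1;
            pthread_mutex_unlock(&queue_lock);
            if (task < 0)
                break;                        /* no tasks left */
            do_work(task);                    /* the work happens outside the lock */
        }
        return NULL;
    }

    int main(void) {
        pthread_t pool[NUM_WORKERS];
        /* the boss pre-allocates the worker pool ... */
        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_create(&pool[i], NULL, worker, NULL);
        /* ... and waits for all workers to finish */
        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_join(pool[i], NULL);
        return 0;
    }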
In the pipeline pattern, similar to how pipelining works in a processor, each thread is part of a long chain in a processing factory. Each thread works on data processed by the previous thread and hands it off to the next thread. You must be careful to distribute work equally and take extra steps to ensure non-blocking behavior in this thread model, or you could experience pipeline "stalls."
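A rough sketch of a two-stage pipeline follows. The hand-off goes through a single shared slot; the condition-variable calls used here are another pthread synchronization primitive, and the stage names and item values are invented for illustration:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  slot_cond = PTHREAD_COND_INITIALIZER;
    static int slot = 0;    /* 0 = empty; otherwise holds an item */
    static int done = 0;    /* set by stage 1 when no more items are coming */

    /* Stage 1: "process" items and hand each off to stage 2. */
    static void *stage1(void *arg) {
        (void)arg;
        for (int i = 1; i <= 5; i++) {
            pthread_mutex_lock(&slot_lock);
            while (slot != 0)                        /* wait until stage 2 drains the slot */
                pthread_cond_wait(&slot_cond, &slot_lock);
            slot = i * 10;                           /* hand off the processed data */
            pthread_cond_signal(&slot_cond);
            pthread_mutex_unlock(&slot_lock);
        }
        pthread_mutex_lock(&slot_lock);
        while (slot != 0)
            pthread_cond_wait(&slot_cond, &slot_lock);
        done = 1;                                    /* announce end of the stream */
        pthread_cond_signal(&slot_cond);
        pthread_mutex_unlock(&slot_lock);
        return NULL;
    }

    /* Stage 2: pick up each item produced by stage 1. */
    static void *stage2(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&slot_lock);
            while (slot == 0 && !done)
                pthread_cond_wait(&slot_cond, &slot_lock);
            if (slot == 0 && done) {
                pthread_mutex_unlock(&slot_lock);
                break;
            }
            int item = slot;
            slot = 0;                                /* drain the slot ... */
            pthread_cond_signal(&slot_cond);         /* ... and wake stage 1 */
            pthread_mutex_unlock(&slot_lock);
            printf("stage 2 received %d\n", item);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, stage1, NULL);
        pthread_create(&t2, NULL, stage2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }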
Threads may operate on disparate data, but often threads must touch the same data. It is unsafe to allow concurrent access to such data or resources without some mechanism that defines a protocol for safe access! Threads must be explicitly instructed to block when other threads may potentially be accessing the same resources.
Mutual exclusion is the method of serializing access to shared resources. You do not want a thread to be modifying a variable that is already in the process of being modified by another thread! Another scenario is a dirty read, where a thread reads an old value while another thread is in the process of updating it.
Mutual exclusion allows the programmer to create a defined protocol for serializing access to shared data or resources. Logically, a mutex is a lock that one can virtually attach to some resource. If a thread wishes to modify or read a value from a shared resource, the thread must first gain the lock. Once it has the lock, it may do what it wants with the shared resource without concern about other threads accessing it, because other threads will have to wait. Once the thread finishes using the shared resource, it unlocks the mutex, which allows other threads to access the resource. This is a protocol that serializes access to the shared resource. Note that such a protocol must be enforced, across all threads that may touch it, for the data or resource the mutex is protecting. If the protocol is violated (e.g., a thread modifies a shared resource without first requesting a mutex lock), then the protocol defined by the programmer has failed. There is nothing preventing a thread programmer, whether unintentionally (most often the case, i.e., a bug -- see race conditions below) or intentionally, from implementing a flawed serialization protocol.
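A minimal sketch of this protocol using the POSIX mutex calls (the balance variable and loop counts are invented for illustration; compile with cc -pthread):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;
    static long balance = 0;                     /* shared resource guarded by balance_lock */

    static void *deposit(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&balance_lock);   /* gain the lock ... */
            balance++;                           /* ... modify the shared data ... */
            pthread_mutex_unlock(&balance_lock); /* ... and release it for other threads */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, deposit, NULL);
        pthread_create(&t2, NULL, deposit, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("balance = %ld\n", balance);      /* always 200000 with the mutex */
        return 0;
    }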
As an analogy, you can think of a mutex as a safe with only one key (for the standard mutex case), where the resource it is protecting lies within the safe. Only one person can hold the key at any time; that person is therefore the only one allowed to look at or modify the contents of the safe while holding the key.
The code between the lock and unlock calls to the mutex is referred to as a critical section. Minimizing time spent in the critical section allows for greater concurrency, because it potentially reduces the amount of time other threads must wait to gain the lock. Therefore, it is important for a thread programmer to minimize critical sections if possible.
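As an illustrative sketch (expensive() and the accumulator names are invented), the same update can be written with a large or a small critical section; the second version holds the lock only for the brief shared update:

    #include <pthread.h>

    static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;
    static double shared_sum = 0.0;

    /* Stand-in for some costly computation. */
    static double expensive(double x) {
        double r = x;
        for (int i = 0; i < 1000000; i++)
            r = r * 1.0000001 + 0.5;
        return r;
    }

    /* Large critical section: other threads wait out the whole computation. */
    static void accumulate_slow(double x) {
        pthread_mutex_lock(&sum_lock);
        shared_sum += expensive(x);     /* expensive work done under the lock */
        pthread_mutex_unlock(&sum_lock);
    }

    /* Small critical section: compute first, lock only for the update. */
    static void accumulate_fast(double x) {
        double local = expensive(x);    /* expensive work done outside the lock */
        pthread_mutex_lock(&sum_lock);
        shared_sum += local;
        pthread_mutex_unlock(&sum_lock);
    }

    int main(void) {
        accumulate_slow(1.0);
        accumulate_fast(2.0);
        return 0;
    }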
An important problem associated with mutexes is the possibility of deadlock. A program can deadlock if two (or more) threads have stopped execution or are spinning permanently, each waiting for something the other holds. For example, a simple deadlock situation: thread 1 locks lock A, thread 2 locks lock B, thread 1 wants lock B, and thread 2 wants lock A. Instant deadlock. You can prevent this from happening by making sure threads acquire locks in an agreed order (i.e., preservation of lock ordering). Deadlock can also happen if threads fail to unlock mutexes properly.
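The scenario above can be sketched as follows (lock and function names are invented; the two *_bad functions illustrate the hazard and are deliberately not run by main):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    /* Deadlock-prone pair: if one thread runs thread1_bad() while another
       runs thread2_bad(), each can grab its first lock and then wait
       forever for the other's. */
    static void *thread1_bad(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);      /* blocks while thread 2 holds B */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *thread2_bad(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_b);      /* opposite order: circular wait */
        pthread_mutex_lock(&lock_a);
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    /* Safe version: every thread agrees to take lock_a before lock_b,
       so a circular wait can never form. */
    static void *thread_safe(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        /* ... use both shared resources ... */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread_safe, NULL);
        pthread_create(&t2, NULL, thread_safe, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        puts("finished without deadlock");
        return 0;
    }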
A race condition occurs when non-deterministic behavior results from threads accessing shared data or resources without following a defined synchronization protocol for serializing such access. This can result in erroneous outcomes that cause failure or inconsistent behavior, making race conditions particularly difficult to debug. In addition to incorrectly synchronized access to shared resources, library calls outside of your program's control are common culprits. Make sure you take steps within your program to enforce serial access to shared file descriptors and other external resources. Most man pages will contain information about the thread safety of a particular function and, if it is not thread-safe, whether any alternatives exist (e.g., gethostbyname() and gethostbyname_r()).
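To see a race condition first-hand, here is the earlier counter sketch with the mutex removed (again, the names and loop counts are invented; compile with cc -pthread and run it several times):

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;   /* shared, but NOT protected by a mutex */

    static void *increment(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++)
            counter++;         /* read-modify-write race between the threads */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Frequently less than 200000, and different on every run:
           the increments interleave and updates are lost. */
        printf("counter = %ld\n", counter);
        return 0;
    }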
Another problem with mutexes is that contention for a mutex can lead to priority inversion: a higher-priority thread can end up waiting behind a lower-priority thread if the lower-priority thread holds a lock for which the higher-priority thread is waiting. This can be eliminated or reduced by limiting the number of mutexes shared between threads of different priorities. A famous case of priority inversion occurred on the Mars Pathfinder.
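Where the platform supports the optional POSIX priority-inheritance protocol (_POSIX_THREAD_PRIO_INHERIT), a mutex can also be initialized so that a low-priority holder temporarily inherits the priority of the highest-priority waiter; enabling priority inheritance on the offending mutex was the class of fix applied on Pathfinder. A minimal sketch:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t pi_lock;

    /* Initialize a mutex with the priority-inheritance protocol, where
       supported: a low-priority thread holding pi_lock temporarily runs
       at the priority of the highest-priority thread blocked on it. */
    static int init_pi_mutex(void) {
        pthread_mutexattr_t attr;
        int rc = pthread_mutexattr_init(&attr);
        if (rc != 0)
            return rc;
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        if (rc == 0)
            rc = pthread_mutex_init(&pi_lock, &attr);
        pthread_mutexattr_destroy(&attr);
        return rc;
    }

    int main(void) {
        if (init_pi_mutex() != 0) {
            puts("priority inheritance not available on this platform");
            return 1;
        }
        pthread_mutex_lock(&pi_lock);
        /* critical section protected by a priority-inheriting mutex */
        pthread_mutex_unlock(&pi_lock);
        pthread_mutex_destroy(&pi_lock);
        return 0;
    }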