#### System Architecture

### 9 Mutual Exclusion

Critical Section & Critical Region Busy-Waiting versus Blocking Lock, Semaphore, Monitor

> November 24 2008 Winter Term 2008/09 Gerd Liefländer



- HW Precondition
- Mutual Exclusion
  - Problem
  - Critical Regions
  - Critical Sections
- Requirements for valid solutions
- Implementation levels
  - User-level approaches
  - HW support
  - Kernel support
- Semaphores
- Monitors

### **J**Literature

- Bacon, J.: Operating Systems (9, 10, 11)
- Nehmer, J.: Grundlagen moderner BS (6, 7, 8)
- Silberschatz, A.: Operating System Concepts (4, 6, 7)
- Stallings, W.: Operating Systems (5, 6)
- Tanenbaum, A.: Modern Operating Systems (2)
- Research papers on various locks

# Atomic Instructions

- To understand concurrency, we need to know what the underlying indivisible HW instructions are
- Atomic instructions run to completion or not at all
  - It is indivisible: it cannot be stopped in the middle and its state cannot be modified by someone else in the middle
  - Fundamental building block: without atomic instructions ⇒ we have no way for threads to work together properly
- load, store of words are usually atomic
- However, some instructions are not atomic
  - VAX and IBM 360 had an instruction to copy a whole array

### **Mutual Exclusion**

#### Mutual Exclusion Problem

Assume at least two concurrent activities

- 1. Access to a physical or to a logical resource or to shared data has to be done exclusively
- 2. A program section of an activity, that has to be executed indivisibly and exclusively is called critical section CS<sup>1</sup>
- 3. We have to establish a specific execution protocol in front and after each CS in order to provide its mutual exclusive execution
- 4. Activities executing a CS are either threads or processes<sup>2</sup>

<sup>1</sup>Some textbooks require atomic critical sections

<sup>2</sup>In the kernel exception/interrupt handlers also have CSs

#### Example: Critical Section

| <pre>integer a, b =1;</pre> | {shared data}             |
|-----------------------------|---------------------------|
| {Thread 1}                  | {Thread 2}                |
| while true do               | while true do             |
| a = a + 1;                  | b = b + 2;                |
| b = b + 1;                  | a = a + 2;                |
| {do something else}<br>od   | {do something else}<br>od |

Both Threads read (and write to) shared global data a, b  $\Rightarrow$  data inconsistency, a !=b after some time



- All related CSs in the threads of a multithreaded application form a critical region
- A CS is related to another one iff both CSs should not run concurrently, e.g. in case they access
  - an exclusive resource
  - the same global data
- Non related CSs can be executed concurrently at will

#### **Second Second Second Example: Critical Regions**

Suppose: All  $T_i$  are KLTs of the same application task the IP of  $T_1$  is in its "red CS" All red CSs build up the red critical region CR All blue CSs build up another CR

<u>*Question:*</u> What other IP<sub>i</sub> are valid at the same time?



#### Framework of Critical Sections

Sections of code implementing this protocol:

- enter\_section
- critical section (CS)
- exit\_section

The remaining code (outside of a CS):

remainder section (RS)

We need a serialization protocol:

Results of involved threads no longer depend on arbitrary interleaving of their execution phases



- Mutual exclusion of two related critical sections
- The solution space depends on the design of the
  - waiting function in case of a locked CS
  - enter\_function 200 & exit\_function 200

## **Implementation Levels**

### Implementation Levels<sup>1</sup>

#### **User-level**

- Relies neither on HW instructions nor on specific kernel features
- HW-supported
  - Relies on special HW instructions
- Kernel-supported
  - Low-level
  - High-level



### Applications of Solution Levels

- A multi-threaded application consisting of CSs in its p>1 threads can be solved with
  - Coordination-objects provided by a thread library (e.g. userlevel monitor) in case of PULTs
  - Kernel-lock in case of KLTs
  - Specific HW instructions (portability problem?)
- An application consisting of CSs in different processes/tasks must be solved with
  - Kernel-locks or kernel-monitors or
  - Specific HW instructions (portability problem?)
  - Shared memory concept

# Requirements for Valid Solutions

#### Note:

In many textbooks only three requirements are postulated

#### **Four Necessary CS Requirements**

#### Exclusiveness

• At most one activity is in the related CS

#### Portability

- Make no assumptions about
  - speed
  - number of CPUs
  - scheduling policy
  - ...



#### Progress

 No activity running outside the related CS prevents another activity from entering its related CS

#### Bounded Waiting (no starvation)

• Waiting time of an activity in front of a CS must be limited

### Analysis: Necessary Requirements

#### Exclusiveness

- If not fulfilled the approach is incorrect
- Portability
  - Some approaches heavily depend on whether we have a
    - single- or a multi-processor
    - time-slice based preemptive scheduling
- Progress
  - Often violated by too simple approaches
- Bounded Waiting
  - Often violated by busy waiting <u>and</u> static priority scheduling

### Desirable Properties of CS

#### Performance<sup>1</sup>

- Overhead of entering and exiting a CS is small with respect to the execution time of the CS
- Nevertheless, avoid busy waiting whenever it's possible

#### Adaptability

- Depending on the current load
- Scalability
  - Does it still work well with t>>1 threads

#### Simplicity<sup>2</sup>

Should be easy to use

<sup>1</sup>One of the major goals in our research group

<sup>2</sup>very hard to decide ("do not use a sledgehammer to kill a fly"), however, that's exactly what we expect from you in the future

### **User-Level Solutions**

Study these approaches and "solutions" very carefully

There are many published approaches that do not fulfill all KIT criteria

#### User-Level Approaches/Solutions

#### Simplification:

We first only discuss problems with either two processes or with two KLTs of the same task or with two PULTs of the same task

- Synchronization is done via global variables
- User-level "solutions" work with busy-waiting
  - Waiting for an event to occur, by polling a condition variable, e.g.

#### while(condition!=true); //just spin

 Busy-waiting consume CPU-time, it is pure overhead, thus in many cases it is inefficient



Use global variable turn to denote who can enter CS

| while (TRUE) {                 | while (TRUE) {                |
|--------------------------------|-------------------------------|
| while (turn != 0) /* loop */ ; | while (turn != 1) /* loop */; |
| critical_region();             | critical_region();            |
|                                |                               |
| noncritical_region();          | noncritical_region();         |
| }                              | }                             |
| (a)                            | (b)                           |

Proposed solution to critical section problem (a) Thread 0. (b) Thread 1

### Analysis of Approach 1

Shared variable turn initialized.  $T_i$ 's CS executed iff turn = i  $T_i$  is busy waiting if  $T_j$  is in CS  $\Rightarrow$ mutual exclusion satisfied. Progress not satisfied since strict alternation of both CSs is presumed

```
turn = 0; /*shared*/
thread Ti:
repeat while(turn≠i){};
    CSi
    turn=j;
    RSi
forever
```

Analysis:

Suppose: long RS<sub>0</sub>, short RS<sub>1</sub>. If turn=0, T<sub>0</sub> may enter CS<sub>0</sub>, leaves it (turn=1), then is executing its very long RS<sub>0</sub>. Meanwhile T<sub>1</sub> in CS<sub>1</sub>, leaves it (turn=0), executes its short RS1. If T<sub>1</sub> tries to enter CS<sub>1</sub> soon after again, it must wait until T<sub>0</sub> leaves its long RS<sub>0</sub> and its following CS<sub>0</sub>

### Template for future Analysis

| Requirement          | Valid | Reason                                                                                              |
|----------------------|-------|-----------------------------------------------------------------------------------------------------|
| Mutual Exclusion     | yes   | Due to turn either thread is in ist CS                                                              |
| No. or speed of CPUs | yes   | Threads are alternating in their CS                                                                 |
| Scheduling policy    | NO    | On a single processor and static priority scheme none of the threads might enter their CS, lifelock |
| Progress             | NO    | A long RS prohibits, that the other thread can enter twice in a row                                 |
| Bounded Waiting      | yes   | In case of a fitting scheduling policy, no if one thread terminates earlier                         |
| Performance          | NO    | Busy waiting induces CPU overhead                                                                   |
| Scalability          | NO    | The disadvantages increase with more threads                                                        |



- What was the major problem of the first approach?
  - Yes, it is very simple, thus it is robust, but the TID of the PULTs/KLTs have been stored in the synchronization variable, to decide who is allowed to enter and who has to wait
- Next idea:
  - Every PULT/KLT has its own key for its critical section, thus we can achieve, that in case one thread terminates, the other one is still capable to enter its critical section CS
  - No every of the two PULTs/KLTs can compete independently from the other





```
enum boolean {false =0, true =1};
boolean flag[2]={false, false};
//indicating: no thread is initially in ist CS
```

```
Thread T0: Threwson Threwson Threwson Threwson Three Th
```

```
Thread T1:
while(true){
   while(flag[0]);
   flag[1]=true;
   CS;
   flag[1]=false;
   RS;
```

### Analysis of Approach 2

| Requirement          | Valid | Reason |
|----------------------|-------|--------|
| Mutual Exclusion     |       |        |
| No. or speed of CPUs |       |        |
| Scheduling policy    |       |        |
| Progress             |       |        |
| Bounded Waiting      |       |        |
| Performance          |       |        |
| Scalability          |       |        |

Analyze carefully and compare your analysis with the one in your textbook

| Approach3                                                                                                                                                                                                                                       | <pre>flag[2]={false,false};</pre>                                              |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| Keep boolean variables for each<br>KLT: <b>flag[0]</b> and <b>flag[1]</b><br>T <sub>i</sub> signals that it is ready to enter<br>CS by setting: <b>flag[i]=true</b><br>Mutual exclusion is satisfied <u>but</u><br>not the progress requirement | <pre>thread Ti: repeat flag[i]=true; while(flag[j]){}; CS flag[i]=false;</pre> |
| <u>Analysis:</u> Suppose following ex<br>$T_0$ : flag[0]=true<br>$T_1$ : flag[1]=true                                                                                                                                                           | RS<br>forever<br>ecution sequence holds:                                       |

What will happen?

<u>Result:</u> Both threads will wait forever, neither will ever enter its  $CS \Rightarrow classical deadlock$ 

### Analysis of Approach 3

| Requirement          | Valid | Reason |
|----------------------|-------|--------|
| Mutual Exclusion     |       |        |
| No. or speed of CPUs |       |        |
| Scheduling policy    |       |        |
| Progress             |       |        |
| Bounded Waiting      |       |        |
| Performance          |       |        |
| Scalability          |       |        |

Analyze carefully and compare your analysis with the one in your textbook



- One problem of approach 3 is, that the PULTs/KLTs set their state related to their CS without bothering the state of the other thread
- If both threads insist of being allowd to enter their CS, the result is a deadlock
- Idea:

Every thread sets its flag, indicating, that it wants to enter its CS, but it is willing to rest this flag, in order to give the other thread to enter first



```
enum boolean {false=0, true=1};
boolean flag[2]={false, false}; //no one in CS
Thread T0:
                                thread T1:
while(true) {
                                while(true) {
   flag[0]=true;
                                   flag[1]=true;
   while(flag[1]) {
                                   while(flag[0]{
    flag[0]=false;
                                       flag[1]=false
    //some delay
                                       //some delay
    flag[0]=true;
                                       flag[1]=true;
   }
                                     }
   CS;
                                    CS;
   flag[0]=false;
                                     flag[1]=false;
  RS;
                                    RS;
};
                                   };
```

### Analysis of Approach 4

| Requirement          | Valid | Reason |
|----------------------|-------|--------|
| Mutual Exclusion     |       |        |
| No. or speed of CPUs |       |        |
| Scheduling policy    |       |        |
| Progress             |       |        |
| Bounded Waiting      |       |        |
| Performance          |       |        |
| Scalability          |       |        |

Analyze carefully and compare your analysis with the one in your textbook

### Approach 5: Dekker-Algorithm<sup>1</sup>

```
enum boolean {false=0, true=1};
boolean flag[2]={false, false}; //no one in CS
int turn = 1; // signals what thread is to be preferred
Thread TO:
                            Thread T1:
while(true) {
                            while(true) {
   flag[0]=true;
                                      flag[1]=true;
   while(flag[1]) {
                               while(flag[0]{
     if (turn==1) {
                                   if [turn==0) {
    flag[0]=false;
                                   flag[1]=false;
    while(turn==1);
                                   while(turn==0);
    flag[0]=true;
                                   flag[1]=true;
   }
                                   }
   CS;
                                CS;
   turn=1;
                                turn=0;
   flag[0]=false;
                                flag[1]=false;
  RS:
                                RS:
                            };
};
```

<sup>1</sup>Published by Dutch mathematician T. Dekker 1965

### Analysis of Dekker Algorithm

| Requirement          | Valid | Reason |
|----------------------|-------|--------|
| Mutual Exclusion     |       |        |
| No. or speed of CPUs |       |        |
| Scheduling policy    |       |        |
| Progress             |       |        |
| Bounded Waiting      |       |        |
| Performance          |       |        |
| Scalability          |       |        |

Analyze carefully and compare your analysis with the one in your textbook

#### Approach 6: Peterson Algorithm

<u>Initialization:</u> flag[0]=flag[1]=false, and turn= 0

Willingness to enter CS specified by flag[i]=true

If both threads attempt to enter their CS simultaneously, the turn value decides which one will win

```
thread Ti:
repeat
flag[i]=true;
turn=j;
do {} while
(flag[j]and turn==j);
CS
flag[i]=false;
RS
forever
```

\*Stallings' notation slightly different from Tanenbaum's template

### "Proof" of Algorithm 3

#### To prove that mutual exclusion is preserved:

- T0 and T1 are both in their CS only if flag[0] = flag[1] = true and only if turn = i for each Ti (which is impossible by definition)
- To prove that the progress and bounded waiting requirements are satisfied:
  - Ti prevented from entering CS only if stuck in '*while()..*' with condition
  - 'flag[j] ' and 'turn = j '.
  - If Tj is not ready to enter CS then '! flag[ j] 'and Ti can enter its CS
  - If Tj has set 'flag[ j]' and is in its 'while()..', then either turn=i or turn=j
  - If *turn=i*, then Ti enters CS. If *turn=j* then Tj enters CS, but it will reset *flag[ j]* on exit: allowing Ti to enter CS
  - but if Tj has time to set *flag[j]*, it must also set *turn=1*
  - since Ti does not change value of *turn* while stuck in '*while()..'*,
     Ti will enter CS after at most one CS entry by Tj (bounded waiting)

### Analysis of Peterson's Algorithm

| Requirement          | Valid | Reason |
|----------------------|-------|--------|
| Mutual Exclusion     |       |        |
| No. or speed of CPUs |       |        |
| Scheduling policy    |       |        |
| Progress             |       |        |
| Bounded Waiting      |       |        |
| Performance          |       |        |
| Scalability          |       |        |

Analyze carefully and compare your analysis with the one in your textbook

## Problems with Faulty Threads

If all four necessary criteria (mutual exclusion, progress, portability, bounded waiting) are satisfied, a valid solution will provide robustness against bugs in the RS<sub>i</sub> of a KLT. Bugs within RS do not affect the other KLTs.

However, no valid solution can ever provide robustness, if a KLT fails within its critical section

A KLT failing within its CS might never perform exit\_section , i.e. no other KLT related to that CS can ever perform enter\_section successfully!

# Bakery Algorithm for n Threads

Before entering their CS, each  $T_{\rm i}$  receives a number. The holder of the smallest number enters its CS

If T<sub>i</sub> and T<sub>j</sub> receive the same number: if i < j then T<sub>i</sub> is served first, else T<sub>j</sub> is served first T<sub>i</sub> resets its number to 0 in its exit section

 $\begin{array}{ll} \underline{Notation:} & (a,b) < (c,d) \mbox{ if } a < c \mbox{ or if } a = c \mbox{ and } b < d \\ max(a_0,\ldots a_k) \mbox{ is a number } b \mbox{ such that: } b >= a_i \mbox{ for } i=0,\ldots k \end{array}$ 

<u>Shared data:</u> choosing: array[0..n-1] of boolean; initialized to false number: array[0..n-1] of integer; initialized to 0

Correctness relies on the following fact:

If T<sub>i</sub> is in CS and T<sub>k</sub> has already chosen its number[k] != 0, then (number[i],i) < (number[k],k)</pre>



## Summary: User-Level Approaches

- Activities trying to enter a locked critical section are busy waiting (⇒ wasting processor time)
- If the critical section CS has a long execution phase it is more efficient to block the activity
- On a single processor with static priority scheduling, busy waiting can always lead to starvation, i.e. to a life lock

# **HW-Support for CS**

Interrupt Locking Test And Set Instruction

### How to implement Atomic Operations?

### We need additional HW support:

- Disabling interrupts
  - Why can this prevent a thread switch?
  - For all systems?
- Atomic instructions
  - CPU and bus guarantee entire action will execute atomically
    - Test And Set (TAS instruction)
    - Compare And Swap



- It is a quite primitive mechanism, only valid for single processor systems
  - Portability requirement not fulfilled
  - Disabling interrupts for CPU0 does not prevent that on another CPU a conflicting CS is exectuted
  - However, for specific "very short CS" inside the kernel this approach can be a solution for ingle-processors
- Before an activity A<sub>i</sub> enters its CS, this A<sub>i</sub> disables all interrupts
  - Especially the time-slice interrupt, thus there is no possibility that a thread switch might be induced
  - Structure of mutual-exclusion protocol

```
Activity Ai:
while (true) {
   disable interrupt;
   CS;
   enable interrupt;
   RS;
};
```

### Disabling Interrupts at User Level?

#### Single processor:

Mutual exclusion preserved, but efficiency degraded: while in CS, no interrupt handling anymore

- No time-slicing anymore
- *Delay* of interrupt handling may affect the whole system
- Application programmers may abuse → system hangs

#### Multi processors: Not effective at all

activity Ai: while (true) { disable interrupt CS; enable interrupt; RS; };

<u>Summary:</u> Approach is unacceptable due to its side effects <u>Good news</u>: disable\_interrupts is privileged on CPUs, i.e. it can not be used at application level at all



- Problems with interrupt locking
  - CS must be very short,
    - Interrupts can not be delayed too long, otherwise interrupt signals might be lost
  - Suppose inside the CS, the KLT aborts, then the rest of the system is in a life lock, no interrupt can be handled anymore
  - Mechanism can be used only on single processors
- Summary:
  - Mechanism is not suitable for mutual exclusion and conditional synchronization at application level
  - However, it might be useful inside the kernel

### Disabling Interrupts at Kernel Level

Could help us in implementing atomic Spin Lock operations
 Acquire\_lock() & Release\_lock()

```
struct lock{
  int held = 0;
void Acquire lock(lock)
  Disable Interrupts;
  while (lock->held);
  lock->held=1;
  Enable Interrupts;
void Release lock(lock) {
  Disable Interrupts;
  lock->held=0
  Enable Interrupts;
```

Discuss this approach carefully

- Do we still have a race condition?
- Would you use this approach when you have to solve a critical section problem?

```
What are the major disadvantages?
```

Still Busy Waiting with Side Effects on System

# Severe Bug on Previous Slide

```
struct lock{
    int held = 0;
    }
void Acquire_lock(lock) {
    Disable_Interrupts;
    while (lock->held);
    lock->held=1;
    Enable_Interrupts;
    }
void Release_lock(lock){
    Disable_Interrupts;
    lock->held=0
    Enable_Interrupts;
  }
}
```

```
Improved Simple Spin Lock?
struct lock{
  int held = 0;
void acquire(lock) {
  Disable Interrupts;
  while (lock->held)
      Enable Interrupts;
                                -∃ possibility for a thread switch
      Disable Interrupts;
  } //systemno longer spinning with pending interrupts
  lock->held=1;
  Enable Interrupts;
void set free(lock) {
  Disable Interrupts;
  lock->held=0
  Enable Interrupts;
```

## More HW Support for a Spin-Lock

- If we could test and set the synchronization variable in one atomic instruction, we might have solved the spin-lock problem
- Some processors offer this atomic testandset instruction
- Two possible implementations of the TAS instruction:

```
TAS1:
boolean test_and_set( boolean *flag) {
  boolean old = *flag;
  *flag = true;
  return old;
}
TAS2:
boolean testset( int i) {
  if (i==0) {
    i=1; return true
  }
  else return false
  }
```

# TAS2 for Mutual Exclusion

```
const int t=100; // number of KLTs
int spinlock;
void T(inti) {
while(true) {
   while (!testset(spinlock)); //busy waiting
   CS;
   spinlock = 0;
   RS;
}
void main() {
  spinlock=0;// initially no one is in ist CS
  parbegin (T(1),T(2), ...,T(n));
}
```

# Spin Lock with TAS1

```
struct lock SMP{
  int held = false; /* initialization */
void Acquire SMP(lock SMP) {
  while (test and set(&lock SMP->held));
void Release SMP(lock SMP) {
  lock SMP->held=false
Do we still have a race condition at the variable lock_SMP_held?
Is this solution portable and efficient?
```

What happens when many KLTs try to acquire the same Spin Lock?

### Approach with Lock\_SMP

```
static lock SMP Spin = false;
thread Ti {
repeat
  Acquire SMP(Spin);
     CS
  Release SMP(Spin);
     RS
forever
}
```

## Analysis of TAS1 or TAS2

#### Advantages:

- Number of involved threads is not limited
- Quite simple approach and easy to understand
- You can use it also to control many critical regions CRs, as long as you provide a different spinlock variable per CR
- Disadvantages:
  - Busy waiting can always be inefficient
  - A KLT can starve in front of its critical section in case it has only low priority
  - Deadlocks and priority inversion can happen with nested CS

## Deeper Analysis of Lock\_SMP (1)

- Mutual exclusion is preserved
- However, if one T<sub>i</sub> is in its CS, all other T<sub>j</sub> –trying to enter their CS- perform busy waiting
- $\Rightarrow$  a potential efficiency problem

When  $T_i$  exits CS, selection of  $T_j$  that will enter its CS is arbitrary  $\Rightarrow$  no bounded waiting guaranteed

possible starvation of a T

However, what is the main problem with any of these "spin lock" solutions?

# Analysis of LOCK\_SMP (2)

Repeated test-and-set-instructions can monopolize the system bus affecting other activities (whether related to that critical section or whether not)

What about consequences concerning cache coherence?

Furthermore there is a severe danger of another sort of starvation on a single processor system (compare with busy waiting at application level)



<u>Result:</u>

Corollary:

"Ping-pong" effect between cache(CPU1) & cache(CPU2) wasting system-bus capacity Design & implement a better spin lock

## Other Atomic CPU Instructions

- (1) Some machines offer instructions that perform read-modify-write operations atomically (indivisible, same memory location):
  - inc [mem]
  - xchg [mem],reg
  - bts [mem]

{bit test and set}

(2) Some machines offer conditional LD/ST instructions instead:

| • | LDL | [mem] | processor becomes sensitive<br>for memory address <i>mem</i>                   |
|---|-----|-------|--------------------------------------------------------------------------------|
| • | STC | [mem] | fails if another processor executed STC<br>on the same address in the meantime |

- instructions like (1) execute mutually exclusive on multiple CPUs
- like (2) allow emulating mutually exclusive instructions



Can you use this Swap instruction to enable a suitable & portable CS protocol? (see assignments)

# **Kernel-Level Solutions**

Low-Level High-Level

### Mutex at Kernel-Level

- Instead of implementing Acquire\_lock() with busy waiting we can use our Kernel API, i.e. BLOCK(),
   UNBLOCK()
- If lock is currently held by another thread, the thread having called Acquire\_mutex () will be blocked
  - Put the thread to sleep until it can acquire the lock
  - Free the CPU for other KLTs to run
- However, we must also change Release mutex (), because in the meantime some thread could have been blocked waiting for the mutex
- Each mutex has an associated wait queue (similar to a semaphore)
- Design and implement a kernel object mutex with atomic Acquire\_mutex() and Release\_mutex() operations at least for a single-processor system

### First Approach: A Simple Lock\*

- A lock is an object (in main memory) providing the following two operations:
  - Acquire\_lock(): before entering a CS
    - If lock is held, KLT must wait in front of CS
  - Release lock(): after leaving a CS
    - Allows another KLT to enter the CS

### First Approach: Simple Lock

- After an Acquire lock() there must follow a Release lock()
  - Between Acquire\_lock() & Release\_lock(), a KLT is holding the lock (=current lock holder)
  - Acquire()\_lock with blocking waiting only returns when the caller is the current lock holder
- 1. What might happen if Acquire\_lock() and Release\_lock() calls are not paired?
- 2. What happens when the current lock holder tries to acquire the same lock once more?

## Using the Simple Lock

```
int withdraw(account, amount) {
 Acquire lock(lock1);
 balance = get_balance(account);
                                    CS
 balance -= amount;
 put balance(account, balance);
 Release lock(lock1);
 return balance;
}
```

#### **Mutual Exclusion**



What happens when thread 2 tries to acquire the lock?



Observation: There is a severe bug *Where?* 

# Implementing a Spin Lock

Problem:

Internals of both operations have critical sections

- Acquire\_lock() and Release\_lock() must be atomic
- The all or nothing principle (see: transactions)

Now, we face a really hard *dilemma*\*:

We've introduced spin locks to provide a mutual exclusive protocol to solve critical section problems, but our solution contains yet another critical section problem

What to do? Who can help us poor system architects?

Help comes from the processor architect He helps us to end the recursion

\*Baron Münchhausen Syndrome



- Waiting in front of an adaptive lock is done
  - either via spinning
  - or via blocking
- Criteria for spinning
  - When lock holder is currently running
  - When trying to acquire the lock a timer is started that blocks the caller after n time units
- Criteria for blocking
  - When lock holder is ready or waiting



- How to enhance a simple lock to be able to note that the current lock holder wants to acquire the lock again?
- What to do in the release function?

# **Counting Semaphore**

## Counting Semaphore for CS

- 1. Positive value of counter  $\rightarrow$  #threads that can enter the "CS" concurrently
  - If mutual exclusion is required, initialize the semaphore counter with 1
- 2. Negative value of counter  $\rightarrow$  #waiting threads in front of CS, i.e. queued at semaphore object
- 3. Counter  $== 0 \rightarrow$  no thread is waiting and maximal #threads are currently in CS

Still an open problem:

*How to establish atomic semaphore operations?* 

## Implement Counting Semaphores

```
module semaphore {
 export p, v
 import BLOCK, UNBLOCK
 type semaphore = record{
    Count: integer = 1
                                 {CS not yet locked}
    QWT: list of Threads = empty {no waiting threads}
   }
 p(S:semaphore) {
    S.Count = S.Count - 1
    if (S.Count < 0) {
     insert (S.QWT, myself) {+ 1 waiting thread}
     BLOCK(myself)
    }
 v(S:semaphore) {
    S.Count = S.Count + 1
                                   {unlock CS }
    if (S.Count \leq 0) {
     UNBLOCK(take any of S.QWT)
                                  {weak semaphore}
    } }
```

### **Atomic Semaphore Operations**

#### Problem:

p() and v() -each consisting of multiple machine instructionshave to be atomic!

#### Solution:

Use "another" type of critical sections, hopefully with shorter execution times, establishing atomic and exclusive semaphore operations

### "very short" enter\_section



"very short" exit\_section

# Revisiting Dijkstra's Semaphores

Short CS implementing atomic, exclusive p(S) and v(S)

Possible solutions for short CS around p(S), v(S):

Single processor:

- Disable interrupts as long as p() or v() running
- Contradiction to our recommendation not to manipulate interrupts at application level?

Multi processor:

Use special instructions (e.g. TAS)

# Single Processor Solution

```
v(sema S)
p(sema S)
                                begin
begin
DisableInterupt
                                DisableInterrupt
                                  s.count++
   s.count--
                                  if (s.count \leq 0) {
   if (s.count < 0) {
                                     remove T(s.QWT)
     insert T(s.QWT)
     BLOCK′ (S) ← - - - - -
                                    UNBLOCK' (S)
                                   }
   else
                                EnableInterrupt
 EnableInterrupt
                               end
 end
```

What happens, if switching to another thread? Interrupts still disabled?

## Multiprocessor Solution

```
V(semaS)
P(sema S)
                             begin
begin
                             while (TAS(S.flag)==1){};
 while (TAS(S.flag)==1){};
                                { busy waiting }
   { busy waiting }
                                S.Count= S.Count+1
   S.Count= S.Count-1
                                if S.Count \leq 0 {
   if (S.Count < 0) {
                                   remove T(S.QWT)
      insert T(S.QWT)
                                   UNBLOCK (S)
      BLOCK(S)
                                }
      {inkl.S.flag=0)!!!}
                              S.flag =0
                             end
   else S.flag =0
end
```

# Weak Counting Semaphores

```
p(S:semaphore)
  S.Count = S.Count - 1;
  if S.Count < 0 {
   insert(S.QWT, myself); {i.e. somewhere}
   BLOCK(myself);
   }
  fi.
v(S:semaphore)
  S.Count = S.Count + 1;
  if (S.Count \leq 0) {
   thread = take any of S.QWT; {no order}
   UNBLOCK (thread);
  fi
```



### Application of Counting Semaphores

Suppose: *n* concurrent threads

Initialize S.Count to  $1 \Rightarrow$  only 1 thread allowed to enter its CS (i.e. *mutual exclusion*)

Initialize S.Count to  $k>1 \implies k>1$ threads allowed to enter their CS *When to use this semantics?*  thread Ti: repeat **p(S)**; CS **v**(S); RS forever

# Producer/Consumer

A semaphore S to perform mutual exclusion on the buffer: Only one thread at a time should access the buffer

A semaphore N to synchronize producer and consumer on the number N (= in - out) of items in the buffer: An item can be consumed only after it has been created

Producer is free to add an item into the buffer at any time, but it has to do P(S) before appending and V(S) afterwards to prevent concurrent accesses by the consumer

It also performs V(N) after each append to increment N Consumer must first do P(N) to see if there is an item to consume, then it uses P(S) and V(S) while accessing the buffer

## Producer/Consumer (∞ Buffer)

Initialization: S.count:=1; N.count:=0; in:=out:=0; Auxiliary functions: append(v) { b[in]:=v; in++;}

take() {
w:=b[out];
out++;
return w;}

Producer: Consumer: repeat repeat produce v; p(N); p(S); p(S); append(v); w:=take(); v(S); v(S); v(N); consume(w); forever forever

## *Q: Semaphore Solutions*\*

- Why do we need mutual exclusion at the buffer?
- Why does the producer **v(N)** ?
- Why is the order of the p() in the consumer important?
- Is order of the v() in the producer important?
- Is this solution extensible to p>1 producers and/or c>1 consumers?

\*Be prepared for similar questions in assignments and exams

# Summary on Semaphores

Semaphores provide a primitive coordination tool

- for enforcing mutual exclusion and/or
- for synchronizing threads

p(S) and v(S) are scattered among several threads. Hence, it's difficult to understand all their effects

Usage must be correct in all threads

One buggy (malicious) thread can crash an entire application or sub system

Recommendation



personal recommendation, Jochen Liedtke

What to use instead of?

∃ better synchronizations tools?

**Monitors** 

# "Software" Monitors



Java language offers monitors You should be familiar with them

# Monitor (1)

- High-level "language construct"
- ~ semantics of binary semaphore, but easier to control
- Offered in some programming languages
  - Concurrent Pascal
  - Modula-3
  - Java
  - ••••
- Can be implemented using semaphores or other synchronization mechanisms



### A software module<sup>\*</sup> consisting of:

- one or more interface procedures
- an initialization sequence
- local data variables

Characteristics:

- Iocal variables accessible only inside monitor methods
- thread enters monitor by invoking a monitor method
- only one thread can run inside a monitor at any time, i.e. a monitor can be used to implement mutual exclusion

<sup>\*</sup>Java's synchronized classes enable monitor-objects



Monitor already ensures mutual exclusion  $\Rightarrow$  no need to program this constraint explicitly

Hence, shared data are protected automatically by placing them inside a monitor. Monitor locks its data whenever a thread enters

Additional thread synchronization inside the monitor can be done by the programmer using condition variables

A condition variable represents a certain condition (e.g. an event) that has to be met before a thread may continue to execute one of the monitor procedures

## Approach for a Monitor Solution\*

Cyclic buffer of N slots with interface operations **fetch()** and **deposit()** 



\*Detailed example for the development of a solution *"step by step"* 

```
Monitors
```



end monitor modul

Concurrent deposits or fetches are serialized, but you can still deposit to a full buffer and you can still try to fetch from an empty buffer!  $\Rightarrow$  two additional constraints have to be considered.

#### Monitors

```
monitor module bounded buffer
                                                                           n
 export fetch, deposit;
 import BLOCK, UNBLOCK;
                                                                      free
buffer object = record
   array buffer[1..n] of datatype
  head: integer = 1
                                     head
                                                              tail
  tail: integer = 1
                                     ⊢fetch
                                                               - deposit
   count: integer = 0
   SWT D,SWT F of threads = empty {2 waiting queues due to deposit, fetch}
 end
 procedure deposit(b:buffer object, d:datatype)
begin
  while (b.count == n) (do BLOCK(b.SWT D)) Also blocks the monitor
  b.buffer[b.tail] = d
  b.tail = b.tail \mod n + 1
  b.count = b.count + 1
   if (b.SWT F \neq empty) UNBLOCK(b.SWT F)
 end
procedure fetch(b:buffer object, result:datatype)
begin
   while (b.count == 0) (do BLOCK(b.SWT F)) Also blocks the monitor
  result = b.buffer[b.head]
  b.head = b.head mod n + 1
  b.count = b.count - 1
   if (b.SWT D \neq empty) UNBLOCK(b.SWT D)
 end
end monitor modul
```

No longer deposits to a full buffer or fetches from an empty buffer, but ...???

# Condition Variables

Local to the monitor (accessible only inside the monitor) can be accessed only by:

CondWait(cv)blocks execution of the calling thread on<br/>condition variable cv

This blocked thread can resume its execution only

if another thread executes CondSignal(cv)

**CondSignal(cv)** resumes execution of some thread

blocked on this condition variable cv

If there are several such threads: choose any one

If no such thread exists: void, i.e. nothing to do

### Monitors



Waiting threads are either in the entrance queue or in a condition queue

A thread puts itself into the condition queue cn by invoking CondWait(cn)

CondSignal(cn) enables one thread, waiting at condition queue cn, to continue

Hence CondSignal(cn) blocks the calling thread and puts it into the urgent queue (unless Condsignal is the last operation of the monitor procedure)





## Summary: Producer/Consumer

Two types of threads:

- Producer(s)
- Consumer(s)

Synchronization is now confined to the monitor

deposit(...) and fetch(...) are
monitor interface methods

If these 2 methods are correct, synchronization will be correct for all participating threads. **ProducerI:** repeat produce v; deposit(v); forever ConsumerI: repeat fetch(v); consume v; forever

# **Reader/Writer with Monitor**

Using monitors you can also solve reader/writer problems with either reader or writer preference,

Compare your solution with the one using semaphores in one of the text books

## Remarks and Open Questions

- Which of the 2 threads T and T' should continue when T executes CondSignal(cv) while T' was waiting due to a previous CondWait(cv)?
- A monitor must stay closed if some externally initiated event occurs, e.g. end of time slice (Otherwise no mutual exclusion anymore)
- However, what to do when a monitor method of monitor M invokes a method of monitor M'?