# µ-Kernel Construction (9)

## Local IPC Optimization for Multi-Threaded Applications



[Figure: Thread A, Monitor, Thread B]





## Load Distribution via IPC

[Figure: Server, Client A, Client B, Distributor, W<sub>1</sub>, W<sub>2</sub>]












- User-level threads would achieve the required speed
- But ...
  - Not known to the kernel
  - Execute in a single thread's context
  - Making them kernel-schedulable does not pay
  - Two concepts inelegant, contradicts minimality
- We want ...
  - Kernel-level threads
  - The speed of user-level threads



## Lazy Switching

- Assume IPC $t_1 \rightarrow t_2$, same address space
- Let $t_1$ execute $t_2$-code
- Postpone the real switch **until the kernel is activated**
- Pays if multiple lazy switches occur before the first kernel activation, e.g.:
  - $t_1 \rightarrow t_2$, work, $t_2 \rightarrow t_1$
    - Costs 0 kernel-level IPCs
  - client $\rightarrow$ $t_1$ $\rightarrow$ $t_2$ $\rightarrow$ client
    - Costs 2 kernel-level IPCs
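The two cost figures above can be reproduced with a toy model. This is only a sketch under stated assumptions: the class `LazySwitchModel`, the function `count_kernel_ipcs` and the thread names are hypothetical, and the model counts one kernel-level IPC for every transfer that leaves or enters the shared address space, while transfers inside it are treated as lazy switches that the kernel does not see.

```python
class LazySwitchModel:
    """Toy model (hypothetical): counts kernel-level IPCs under lazy switching."""

    def __init__(self, start_thread, same_space):
        self.user_current = start_thread    # thread whose code is really executing
        self.kernel_current = start_thread  # thread the kernel believes is running
        self.same_space = same_space        # threads sharing one address space
        self.kernel_ipcs = 0

    def kernel_activation(self):
        # The postponed real switch is performed here, at most once.
        self.kernel_current = self.user_current

    def ipc(self, dst):
        src = self.user_current
        if src in self.same_space and dst in self.same_space:
            # Lazy switch: the kernel is not involved and keeps its old view.
            self.user_current = dst
        else:
            # Cross-address-space IPC activates the kernel, which first
            # catches up on any postponed switch and then does the real IPC.
            self.kernel_activation()
            self.kernel_ipcs += 1
            self.user_current = dst
            self.kernel_current = dst


def count_kernel_ipcs(path, same_space):
    model = LazySwitchModel(path[0], same_space)
    for dst in path[1:]:
        model.ipc(dst)
    return model.kernel_ipcs


local = {"t1", "t2"}
print(count_kernel_ipcs(["t1", "t2", "t1"], local))                # 0
print(count_kernel_ipcs(["client", "t1", "t2", "client"], local))  # 2
```

In the second path, the switch $t_1 \rightarrow t_2$ is only made real when $t_2$ replies to the client, so a single kernel activation covers it.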



[Figure: $t_1$, $t_2$]



## Strict Switching

























$A \rightarrow B$: SendAndWaitForReply in user mode

    call IPC function, i.e. push A's instruction pointer
    if B is valid thread id and thread B waits for thread A then
        save A's stack pointer
        set A's status to "wait for B"
        set B's status to "run"
        load B's stack pointer
        current thread := B
        return, i.e. pop B's instruction pointer
    else
        more complicated IPC handling
    endif

Atomicity? Kernel data?
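The fast path above can be sketched as a small simulation. The names `Tcb`, `Machine` and `send_and_wait_for_reply` are hypothetical, the string return values stand in for the two branches, and pushing and popping the instruction pointer is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Tcb:
    """User-visible thread control block (hypothetical layout)."""
    name: str
    stack_pointer: int
    status: str  # "run" or "wait for <thread name>"


class Machine:
    def __init__(self, threads, current, sp):
        self.threads = {t.name: t for t in threads}
        self.current = current  # name of the current thread
        self.sp = sp            # live stack pointer register

    def send_and_wait_for_reply(self, a, b):
        """Return "fast" if the user-mode switch applied, else "slow"."""
        dst = self.threads.get(b)
        if dst is not None and dst.status == f"wait for {a}":
            src = self.threads[a]
            src.stack_pointer = self.sp   # save A's stack pointer
            src.status = f"wait for {b}"  # set A's status to "wait for B"
            dst.status = "run"            # set B's status to "run"
            self.sp = dst.stack_pointer   # load B's stack pointer
            self.current = b              # current thread := B
            return "fast"
        return "slow"  # stands in for "more complicated IPC handling"


m = Machine([Tcb("A", 0, "run"), Tcb("B", 0x2000, "wait for A")],
            current="A", sp=0x1000)
print(m.send_and_wait_for_reply("A", "B"), m.current, hex(m.sp))  # fast B 0x2000
```

After the call, A is marked as waiting for B, so the reply from B to A takes the same fast path in the opposite direction.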

## Atomicity

$A \rightarrow B$: SendAndWaitForReply in user mode

    call IPC function, i.e. push A's instruction pointer
    save A's stack pointer
    - restart point -
    if B is valid thread id and thread B waits for thread A then
        - forward point -
        set A's status to "wait for B"
        set B's status to "run"
        load B's stack pointer
        current thread := B
        - completion point -
        return, i.e. pop B's instruction pointer
    else
        more complicated IPC handling
    endif



Interruption between forward point and completion point:

    if is page fault then
        kill thread A
    else
        set A's status to "wait for B"
        set B's status to "run"
        load B's stack pointer
        current thread := B
        set interrupted instruction pointer to completion point
    endif
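A minimal sketch of this roll-forward fixup, assuming the four stores between forward point and completion point are idempotent, so the kernel can simply redo all of them regardless of how many had already executed. The state dictionary, `switch_steps`, `run_with_interrupt` and `fresh_state` are hypothetical names used only for illustration.

```python
def switch_steps(a, b):
    """The four stores between forward point and completion point."""
    return [
        lambda s: s["status"].update({a: f"wait for {b}"}),
        lambda s: s["status"].update({b: "run"}),
        lambda s: s.update(sp=s["saved_sp"][b]),
        lambda s: s.update(current=b),
    ]


def run_with_interrupt(state, a, b, interrupt_after, page_fault=False):
    """Execute `interrupt_after` steps in user mode, then interrupt and fix up."""
    steps = switch_steps(a, b)
    for step in steps[:interrupt_after]:
        step(state)
    # Kernel fixup for an interruption between forward and completion point.
    if page_fault:
        state["killed"] = a
        return state
    for step in steps:  # redo every store; each one is idempotent
        step(state)
    state["ip"] = "completion point"
    return state


def fresh_state():
    # A's stack pointer is assumed to be saved already (before the restart point).
    return {
        "status": {"A": "run", "B": "wait for A"},
        "saved_sp": {"A": 0x1000, "B": 0x2000},
        "sp": 0x1000,
        "current": "A",
        "ip": "forward point",
    }


results = [run_with_interrupt(fresh_state(), "A", "B", n) for n in range(5)]
print(all(r == results[0] for r in results))  # True
```

Whatever the interruption point, the resulting state is the same as if the switch had run to the completion point undisturbed.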



- A's TCB: stack pointer, status
- B's TCB: stack pointer, status
- current thread

- Stack pointer
  - Can be user accessible
- Status
  - User-level effects
    - Local to A's task: can be ignored
    - Indirect effects on other tasks: can be ignored
  - System-level effects
    - Must be avoided
    - Validate values or
    - Maintain twin variable in kernel









## Current\_thread Inconsistency

    if CurrentUTCB is valid UTCB then
        NewKTCB := CurrentUTCB.ktcb
        if NewKTCB is valid KTCB and NewKTCB.space = CurrentKTCB.space and NewKTCB.utcb = CurrentUTCB then
            update kernel state
            CurrentKTCB := NewKTCB
            return
        endif
    endif
    kill thread(CurrentKTCB)
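The check above could look roughly as follows in a simulation. The classes `Ktcb`, `Utcb` and `Kernel` and the method `fix_current_thread` are hypothetical, validity is modeled as membership in kernel-owned sets, and the "update kernel state" step is left out because the source does not spell it out.

```python
class Ktcb:
    """Kernel TCB: only the kernel can write it."""
    def __init__(self, space):
        self.space = space
        self.utcb = None


class Utcb:
    """User TCB: user-writable, so its ktcb field is never trusted."""
    def __init__(self, ktcb):
        self.ktcb = ktcb


class Kernel:
    def __init__(self):
        self.valid_ktcbs = set()
        self.valid_utcbs = set()
        self.current_ktcb = None
        self.killed = []

    def new_thread(self, space):
        ktcb = Ktcb(space)
        ktcb.utcb = Utcb(ktcb)
        self.valid_ktcbs.add(ktcb)
        self.valid_utcbs.add(ktcb.utcb)
        return ktcb

    def fix_current_thread(self, current_utcb):
        """On kernel entry: adopt the user-level current thread if it checks out."""
        if current_utcb in self.valid_utcbs:
            new_ktcb = current_utcb.ktcb
            if (new_ktcb in self.valid_ktcbs
                    and new_ktcb.space == self.current_ktcb.space
                    and new_ktcb.utcb is current_utcb):
                self.current_ktcb = new_ktcb  # "update kernel state" omitted
                return True
        self.killed.append(self.current_ktcb)  # kill thread(CurrentKTCB)
        return False


k = Kernel()
a, b, c = k.new_thread("AS1"), k.new_thread("AS1"), k.new_thread("AS2")
k.current_ktcb = a
print(k.fix_current_thread(b.utcb))  # True: b is a valid thread in a's space
b.utcb.ktcb = c                      # user code forges the back pointer
print(k.fix_current_thread(b.utcb))  # False: c lives in another address space
```

The back-pointer comparison `NewKTCB.utcb = CurrentUTCB` is what keeps a forged `ktcb` field from redirecting the kernel to an unrelated thread of the same address space.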











## IPC Performance – Prototype

- LIPC: 23 cycles
  - 1/15<sup>th</sup> of regular IPC (no sysops, no fastpath)
- Overhead on IPC due to LIPC extensions
  - 43 cycles intra-AS IPC
  - 146 cycles inter-AS IPC
    - UTCB synchronization
- Too much for real-world systems: P3 inter-AS IPC was only 180 cycles without LIPC support!
- Overhead due to kernel fixup: ???
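To put these numbers side by side, a rough back-of-the-envelope calculation follows. It assumes that "1/15th" can be inverted to estimate the regular IPC cost of the prototype, and that the 146-cycle overhead can be added directly to the 180-cycle P3 baseline. Both are simplifications, and the variable names are illustrative only.

```python
LIPC_CYCLES = 23
REGULAR_TO_LIPC_RATIO = 15   # LIPC is "1/15th of regular IPC"
INTER_AS_OVERHEAD = 146      # cycles added to inter-AS IPC by LIPC extensions
P3_INTER_AS_BASELINE = 180   # cycles for P3 inter-AS IPC without LIPC support

# Estimated regular IPC cost in the prototype (no sysops, no fastpath).
regular_ipc = LIPC_CYCLES * REGULAR_TO_LIPC_RATIO

# Inter-AS IPC cost if the overhead applied unchanged to the P3 baseline.
inter_as_with_lipc = P3_INTER_AS_BASELINE + INTER_AS_OVERHEAD
relative_overhead = INTER_AS_OVERHEAD / P3_INTER_AS_BASELINE

print(regular_ipc, inter_as_with_lipc, round(relative_overhead * 100))
# 345 326 81
```

Under these assumptions the LIPC extensions would slow inter-AS IPC by roughly 81 percent, which is the sense in which the overhead is too much for real-world systems.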



- Register-only IPC, no map/grant/string
- Always send and receive phase
- Infinite receive timeout

Tricky:

- Change from Wait\_for\_X to Wait\_for\_Any