Construction of Microkernels

Lecture of Prof. J. Liedtke, summer term 2000

Summary by Felix Hupfeld

1 Outline

Some features of L4:

2 Threads, Thread Switch

A thread has:


Operations for tcbs:

Thread switch:

Enter kernel (by int, exception, interrupt)

Kernel Stack

From hardware perspective there are two stacks in the system: The user-mode stack and the kernel-stack. There are two different modells for execution in the kernel. One uses one global kernel stack for every entry in the kernel mode (system call, interrupt, ...), the other switches to a dedicated stack per thread.

Kernel stack

Per Thread Stacks are feasible with a microkernel because their potential stack size is tightly bounded (no deep nestings of calls in the kernel). Otherwise stack size would not be bounded and a potential big stack per thread would be needed.

See: Draves, Bershad, Rashid et al.: Using Continuations to implement Thread management and Communication in Operation Systems

In L4 implementation, the kernel stack per thread is on bottom of the aligned tcb (find current tcb by getting alignment of current esp0).

Operations on Kernel entry

Kernel entry/exit on x86:

Resulting Stack Layout:

X permits to differentiate between stack layouts:

Thread Switch (x86)

MIPS R4600

3 TCBs, AS layout, booting

Basic TCB data structure (no AS)

additional fields:

Size 0.5 - 1kb

ThreadID -> TCB

Layout of initial AS

Current Limitations:



Loading of L4, sigma0, L4 init, OS modules in the physical memory. L4 init is run and destroyed, sigma0 constructs the initial AS, the pagers of the OS construct further ASes:.

4 Clans & Chiefs, IPC API

Clans & Chiefs

A clan consists of a number of tasks of which one is designated as being the chief. Intra-Clan IPC are direct IPC by the microkernel. IPC to other clans (Inter-Clan IPCs) are redirected to the next chief of the clan. He may forward the message to the next chief of the other clan, which forwards the message to the receiver. The chief acts as being the sender of the message and thus deceives the receiver of the message. All redirections are direction-preserving: Chiefs may only pretend that a message comes from outside for messages into the clan and that they come from inside for message ids inside the clan.

This results in some trust relationships. Clan members can here always trust their chief and so on....

Applications of this concept:


Messages must be copied, otherwise an attack called "ToC ToU" is possible: After checking the message (e.g. by the chief), the message can changed by the sender as it still located in the user space.

The Clans&Chiefs model is outdated, the new model uses a mechanism called IPC redirection.

Thread IDs

Varying number of bits for



Message Types


Parameters are in registers, wherever possible.

Message buffer is a memory region which contains some parameters and space for the direct string buffer and pointers and space for additional indirect string buffers.



L4 Version 2, X.0

L4 Version X.1, 4

6 IPC Implementation: Short IPC, Long IPC

Critical Path

Short IPC

Long IPC

7 Small Address Spaces

Context-switch costs (untagged TLB)

enter/exit kernel
20...200 cycles
switch thread
10 cycles
switch AS


flush TLB

50 cycles

refill TLB

9..96 refills
15...40 cycles/refill
100-4000 cycles

Small AS on x86 - emulating a tagged TLB

Include several L4 AS in one Hardware AS, separation via segmentation. All Hardware AS (HwAS) share the same Small ASes in a dedicated region.

source offset
destination offset
large -> large

temp mapping
switch HwAS

kernel com area
large -> small
direct (no HwAS switch)
dest AS base
large -> same
direct (no HwAS switch)
small -> same
source AS base
source AS base
small -> small
source AS base
dest AS base
small -> current large
source AS base
small -> non-current large
temp mapping, switch HwAS or
switch HwAS, direct
source AS base
kernel com area

TCB additions:

Potential problem during small -> large IPC: IPC is interrupted, HwAS changes, IPC is continued, then IPC is resumed in wrong HwAS. So IPC small AS must be marked to be in the partner space of the IPC so that the kernel switches back.

8 IPC Abort

During long IPC, sender is locked running, receiver is locked waiting, the pagers are waiting. Nested IPC (IPC during IPC) is needed on pagefault, which requires a thread-ipc stack. The stack size is bound, only one short IPC (which is atomic) to pager within normal ipc.

If pagefault IPC fails, the ipc-state is poped, ie. the pagefault IPC is canceled. Then the real IPC is aborted, ie. A and B get a cc of abort.

Abort is for threads currently sending/receiving a message (transfer has started), cancel is for threads waiting to receive/send a message (transfer not yet started).

Pagefault IPC can be cancelled because it's a short (atomic) IPC.

9 Dispatching, Timeouts

Thread Switch

Dispatcher/Idle thread

tcb[A].sp = SP;                 // save stack pointer (SP) of A
SP = dispatcher thread bottom  	// switch to (dispatcher) from A

B = select next ready thread
if B == nil
	idle                        // on next scheduling event, switch
                                // from here to first line of this code

SP := tcb[A].sp;                // discard stack, restore A's SP

if B != A
	switch from A to B 			// see section 1

Priorities, Preemption

Lazy dispatching

Timeouts & Wakeups

Data structure operations:

Unsorted/sorted list, sorted tree are to expensive.

Solution: Separation into wakeup classes (aka. calendar queue). The list of wakeups is divided into three intervals (t = tick intervall)

Insert puts the entry (a TCB) into the corresponding list, depending on the interval the entry's point in time is in. Find next has to search only the soon list for pending timeouts. Every s time steps, parts of the late list (those below s*t milliseconds to raisal) have to be sorted into the soon list ("correction phase"). Every ll time steps, those entries of the late late list, whose point in time is below ll*t from now have to be sorted into the late list.

10 Mapping, Implementation

grant and map induce a tree-like data structure, which contains the physical frames as root node, address spaces as nodes which are connected due to grants/map relations for the address spaces. There is a tree for every physical frame/tile, called tile-map tree.

Needed operations on this datastructure:

Tree Parsing:

Node structure:

Pointers between PTE and mapnode


To change a link you have to obtain the up and the down pointers between the linked nodes.

Store parent/child pointer XORed in one node entry. Depending on the direction of traversal, one of the entries is known from the traversal predecessor. The other can be calculated this way.

See: LN Reference Manual V2.2, 1.1 Implementing the Model

A.1 x86 page tables

2000, 2001 Felix Hupfeld - last change: 31.05.2001 15:05