Lecture #4: Causal ordering of messages and global state

These topics are from Sections 5.5-5.10 of Advanced Concepts in OS.

Topics for today


Birman-Schiper-Stephenson Protocol for the causal ordering of messages.
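The Birman-Schiper-Stephenson (BSS) protocol orders broadcasts causally with vector clocks. Before broadcasting, a process increments its own entry and attaches a copy of its vector $VT_m$. A receiver $P_j$ delays a message from $P_i$ until it is the next message from $P_i$ ($VT_j[i] = VT_m[i] - 1$) and $P_j$ has delivered everything the sender had seen ($VT_j[k] \ge VT_m[k]$ for all $k \ne i$). On delivery it takes the component-wise maximum of the two vectors. The Python sketch below illustrates these rules under simplifying assumptions: messages are handed directly to `receive` (there is no real network), and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: int
    vt: list[int]  # vector timestamp attached by the sender
    body: str


class BSSProcess:
    """Hypothetical process applying the BSS causal-delivery rules."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.vt = [0] * n
        self.buffer: list[Message] = []  # received but not yet deliverable
        self.delivered: list[str] = []

    def broadcast(self, body: str) -> Message:
        # Increment own entry, then timestamp the message with a copy of VT.
        self.vt[self.pid] += 1
        return Message(self.pid, list(self.vt), body)

    def _deliverable(self, m: Message) -> bool:
        i = m.sender
        # m must be the next message from P_i ...
        if self.vt[i] != m.vt[i] - 1:
            return False
        # ... and everything the sender had seen must already be delivered here.
        return all(self.vt[k] >= m.vt[k] for k in range(len(self.vt)) if k != i)

    def _deliver(self, m: Message) -> None:
        self.vt = [max(a, b) for a, b in zip(self.vt, m.vt)]
        self.delivered.append(m.body)

    def receive(self, m: Message) -> None:
        self.buffer.append(m)
        progress = True
        while progress:  # keep delivering until no buffered message is ready
            progress = False
            for msg in self.buffer:
                if self._deliverable(msg):
                    self.buffer.remove(msg)
                    self._deliver(msg)
                    progress = True
                    break


p0, p1, p2 = (BSSProcess(i, 3) for i in range(3))
m1 = p0.broadcast("m1")
p1.receive(m1)
m2 = p1.broadcast("m2")  # causally after m1
p2.receive(m2)           # arrives early at P2: buffered
p2.receive(m1)           # m1 is delivered, then the buffered m2
print(p2.delivered)      # ['m1', 'm2']
```

Buffering and rescanning after every delivery is the simplest way to release a chain of messages that were waiting on each other. A real implementation would index the buffer by sender instead of scanning it.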

Global State Problem

How can we record a coherent (consistent) snapshot of the state of an entire distributed system?

One application of this problem is implementing a breakpoint for debugging a distributed application. That is, we want to suspend all the processes so that we can examine what each of them is doing, and later resume them.

Banking Example

For consistency, we need to take into account the messages that are in transit.

Let n be the number of messages sent by A along a channel before A's state is recorded, and let n' be the number of messages sent by A along that channel before the channel's state is recorded. A consistent global state requires n = n'.

In the global state, we want to treat as in transit every message that was sent along a channel before the sender's state was recorded but had not yet been received when the receiver's state was recorded.

A global state includes snapshots of the states of all the channels along with the states of all the sites. Since the channels are passive, the snapshots of the channels must be computed by the sites to which they are connected.
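Here is a small numeric illustration of why channel states matter, using hypothetical balances. A starts with $500 and B with $200, and A transfers $50 to B. A's state is recorded after the debit, and B's state is recorded before the credit arrives. The two account snapshots alone then show $650. Counting the transfer as in transit on the channel from A to B restores the true $700. The helper `snapshot_total` is a hypothetical name.

```python
def snapshot_total(accounts: dict[str, int],
                   channels: dict[tuple[str, str], list[int]]) -> int:
    """Money in a recorded global state: balances plus in-transit transfers."""
    return sum(accounts.values()) + sum(sum(msgs) for msgs in channels.values())


TRUE_TOTAL = 700  # hypothetical: A starts with $500, B with $200

# A recorded after debiting $50 and sending "[B += $50]";
# B recorded before that message arrived.
accounts = {"A": 450, "B": 200}

print(snapshot_total(accounts, {}))                   # 650: the $50 in flight is lost
print(snapshot_total(accounts, {("A", "B"): [50]}))   # 700: channel state restores it
```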

Global State: Notation

For a site $S_i$, its local state, $LS_i$, at a given time is defined by the local context of the application.

$send(m_{ij})$ is the event of $S_i$ sending message $m_{ij}$ to $S_j$.

$rec(m_{ij})$ is the event of $S_j$ receiving message $m_{ij}$ from $S_i$.

$time(x)$ is the time at which state $x$ was recorded.

$time(send(m))$ is the time at which message $m$ was sent.

$send(m_{ij}) \in LS_i$ iff $time(send(m_{ij})) < time(LS_i)$

$rec(m_{ij}) \in LS_j$ iff $time(rec(m_{ij})) < time(LS_j)$

$GS = \{LS_1, LS_2, \ldots, LS_n\}$

Global State: Definitions

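$transit(LS_i, LS_j) = \{m_{ij} \mid send(m_{ij}) \in LS_i \land rec(m_{ij}) \notin LS_j\}$

$inconsistent(LS_i, LS_j) = \{m_{ij} \mid send(m_{ij}) \notin LS_i \land rec(m_{ij}) \in LS_j\}$

GS is consistent iff $inconsistent(LS_i, LS_j) = \emptyset$ for all $i, j$.
GS is transitless iff $transit(LS_i, LS_j) = \emptyset$ for all $i, j$.
GS is strongly consistent iff it is consistent and transitless.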

In a consistent global state, causes are recorded if the corresponding effects are recorded.

In a strongly consistent global state, causes are recorded iff the corresponding effects are recorded.
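The definitions can be checked mechanically if every event time and every recording time is known. That is an idealization, since real sites share no global clock. A minimal sketch with hypothetical names: each message records its send and receive times, and `ls_time[i]` stands for $time(LS_i)$. A message that is never received could be given `math.inf` as its receive time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    """Hypothetical record of one message m_ij and its event times."""
    src: int        # sending site i
    dst: int        # receiving site j
    t_send: float   # time(send(m_ij))
    t_recv: float   # time(rec(m_ij)); math.inf if never received


def classify(ls_time: dict[int, float], msgs: list[Msg]) -> str:
    """Classify GS = {LS_1, ..., LS_n}, where ls_time[i] = time(LS_i)."""
    def sent_in(m: Msg) -> bool:   # send(m_ij) in LS_i
        return m.t_send < ls_time[m.src]

    def recd_in(m: Msg) -> bool:   # rec(m_ij) in LS_j
        return m.t_recv < ls_time[m.dst]

    inconsistent = [m for m in msgs if not sent_in(m) and recd_in(m)]
    transit = [m for m in msgs if sent_in(m) and not recd_in(m)]
    if inconsistent:
        return "inconsistent"
    return "consistent" if transit else "strongly consistent"


m = Msg(src=1, dst=2, t_send=2.0, t_recv=5.0)
print(classify({1: 3.0, 2: 6.0}, [m]))  # strongly consistent: send and receive recorded
print(classify({1: 3.0, 2: 4.0}, [m]))  # consistent: m is in transit
print(classify({1: 1.0, 2: 6.0}, [m]))  # inconsistent: effect recorded, cause not
```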


$\{LS_{12}, LS_{23}, LS_{33}\}$ is consistent.

$\{LS_{11}, LS_{22}, LS_{32}\}$ is inconsistent.

$\{LS_{11}, LS_{21}, LS_{31}\}$ is strongly consistent.

Chandy-Lamport GS Recording Algorithm

It uses a marker message to initiate the snapshot and, within each channel, to separate the messages sent before the sender recorded its state from those sent after.

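Chandy-Lamport GS Recording Alg: Sending Rule

After a site records its state, it sends one marker along each outgoing channel C before sending any further messages along C. The initiating site starts the snapshot by applying this rule.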

Chandy-Lamport Receiving Rule

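When a marker is received on channel C:

If the site has not yet recorded its state, it records its state, records the state of C as the empty sequence, and then follows the marker sending rule.

Otherwise, it records the state of C as the sequence of messages received along C after the site recorded its state and before it received the marker along C.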

Chandy-Lamport Example

Suppose site S0 sends markers to sites S1 and S2, and site S2, which holds account B, receives its marker first, checkpointing the value of B in its local snapshot. The request message "[B+=$50]" arrives later, but before the marker on channel C1, and so is recorded as part of the state of that channel.
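A sketch of the marker sending and receiving rules that replays this example with hypothetical numbers. S1 starts with $500 and S2 (account B) with $200, and C1 is assumed to be the channel from S1 to S2. Chandy-Lamport assumes FIFO channels, which are modeled here as `deque`s. The `Site` class, the channel names `c01` and `c02`, and the balances are illustrative assumptions, and messages are delivered by hand in the order the example describes.

```python
from collections import deque

MARKER = object()  # sentinel distinguishing markers from application messages


class Site:
    """Hypothetical site holding one balance; applies the Chandy-Lamport rules."""

    def __init__(self, name, balance, in_channels, out_channels):
        self.name = name
        self.balance = balance
        self.in_channels = in_channels  # names of incoming channels
        self.out = out_channels         # outgoing channel name -> FIFO deque
        self.recorded_state = None      # local snapshot, once taken
        self.channel_state = {}         # incoming channel -> recorded messages
        self.recording = set()          # incoming channels still being recorded

    def record_and_send_markers(self):
        # Sending rule: record own state, then put a marker on every outgoing
        # channel before any further message is sent on it.
        self.recorded_state = self.balance
        self.recording = set(self.in_channels)
        for chan in self.out.values():
            chan.append(MARKER)

    def receive(self, chan, msg):
        if msg is MARKER:
            if self.recorded_state is None:
                # First marker: record own state; this channel is recorded empty.
                self.record_and_send_markers()
                self.channel_state[chan] = []
            else:
                # Later marker: the channel's state is what arrived since recording.
                self.channel_state.setdefault(chan, [])
            self.recording.discard(chan)
        else:
            if chan in self.recording:
                self.channel_state.setdefault(chan, []).append(msg)
            self.balance += msg  # apply the transfer to the account


chans = {name: deque() for name in ("c01", "c02", "C1")}
s0 = Site("S0", 0, [], {"c01": chans["c01"], "c02": chans["c02"]})
s1 = Site("S1", 500, ["c01"], {"C1": chans["C1"]})
s2 = Site("S2", 200, ["c02", "C1"], {})

s1.balance -= 50
chans["C1"].append(50)                     # S1 sends "[B+=$50]" on C1
s0.record_and_send_markers()               # S0 initiates the snapshot
s2.receive("c02", chans["c02"].popleft())  # S2 gets its marker first: records B = 200
s1.receive("c01", chans["c01"].popleft())  # S1 records 450, sends a marker on C1
s2.receive("C1", chans["C1"].popleft())    # transfer arrives before C1's marker
s2.receive("C1", chans["C1"].popleft())    # marker on C1 ends C1's recording
print(s2.recorded_state, s2.channel_state["C1"])  # 200 [50]
```

The recorded balances (0 + 450 + 200) plus the $50 recorded on C1 add up to the $700 in the system, even though no site ever held all of it at a single instant.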

How does this algorithm get the data back to the process that requested the snapshot? How does the algorithm terminate?

Usefulness of Recorded Global State

Termination Detection

System model:

Huang's Termination Algorithm

This algorithm views termination as a flow analysis problem.
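Huang's algorithm is usually presented as weight throwing. The controlling agent starts with weight 1, and every computation message carries part of the sender's weight. A process that becomes idle returns its weight to the controller, and the controller declares termination once its weight is back to 1. The "flow" being analyzed is this conserved weight. The following is a minimal sketch using exact fractions. Message delivery is instantaneous here, so weight on in-flight messages is not modeled, and the class names are hypothetical.

```python
from fractions import Fraction


class Controller:
    """Hypothetical controlling agent: starts with weight 1, reclaims returns."""

    def __init__(self) -> None:
        self.weight = Fraction(1)

    def start(self, proc: "Proc") -> None:
        # Send a computation message carrying half of the controller's weight.
        half = self.weight / 2
        self.weight -= half
        proc.receive(half)

    def reclaim(self, w: Fraction) -> None:
        self.weight += w

    def terminated(self) -> bool:
        # All weight is back, so no process or message still holds any.
        return self.weight == 1


class Proc:
    def __init__(self, controller: Controller) -> None:
        self.ctrl = controller
        self.weight = Fraction(0)
        self.active = False

    def receive(self, w: Fraction) -> None:
        # A computation message adds its weight and makes the process active.
        self.weight += w
        self.active = True

    def send(self, other: "Proc") -> None:
        # Only active processes send; keep half the weight, attach half.
        assert self.active
        half = self.weight / 2
        self.weight -= half
        other.receive(half)

    def go_idle(self) -> None:
        # On becoming idle, return all held weight to the controller.
        self.ctrl.reclaim(self.weight)
        self.weight = Fraction(0)
        self.active = False


ctrl = Controller()
p, q = Proc(ctrl), Proc(ctrl)
ctrl.start(p)             # p holds 1/2
p.send(q)                 # p keeps 1/4, q receives 1/4
p.go_idle()               # controller weight: 1/2 + 1/4 = 3/4
print(ctrl.terminated())  # False: q still holds 1/4
q.go_idle()
print(ctrl.terminated())  # True
```

Exact fractions avoid the underflow that repeated halving of floating-point weights would eventually cause.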


Proof of correctness

Things for you to do