# Topics for today

• Some inherent limitations of a distributed system and their implication.
• Lamport logical clocks
• Vector clocks

These topics are from Chapter 5-5.4 in Advanced Concepts in OS.

# Distributed systems

• A collection of computers that do not share a common clock and a common memory
• Processes in a distributed system exchange information over communication channels, and the message delay is unpredictable.

# Inherent limitations of a distributed system

## Distributed processes cannot rely on having an accurate view of global state, due to transmission delays.

Effectively, we cannot talk meaningfully about global state.

# Lamport's logical clocks

• The "time" concept in distributed systems -- used to order events in a distributed system.
• Assumptions:
  • The execution of a process is characterized by a sequence of events. An event can be the execution of one instruction or of one procedure.
  • Sending a message is one event; receiving a message is one event.
• The events in a distributed system are not total chaos. Under some conditions, it is possible to ascertain the order of the events. Lamport's logical clocks try to capture this.

## Lamport's "happened before" relation

The "happened before" relation (→) is defined as follows:

• A → B if A and B are within the same process (same sequential thread of control) and A occurred before B.
• A → B if A is the event of sending a message M in one process and B is the event of receiving M by another process.
• If A → B and B → C, then A → C.

Event A causally affects event B iff A → B.

Distinct events A and B are concurrent (A ∥ B) if neither A → B nor B → A holds.
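
The three rules can be checked mechanically on a small trace. The following is a minimal sketch, assuming events are labelled with strings, each process's local order is given as a list, and each message is a (send event, receive event) pair. The function names and the sample trace are hypothetical, not from the text.

```python
from itertools import product

def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a happened before b.

    process_events: dict mapping process name -> list of events in local order
    messages: list of (send_event, receive_event) pairs
    """
    hb = set()
    # Rule 1: earlier events in the same process precede later ones.
    for events in process_events.values():
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                hb.add((a, b))
    # Rule 2: sending a message precedes receiving it.
    hb.update(messages)
    # Rule 3: transitive closure (naive fixed-point iteration).
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb

def concurrent(a, b, hb):
    """Distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb

# Hypothetical trace: P1 runs a1, a2; P2 runs b1, b2, b3; a2 sends a message received at b2.
trace = {"P1": ["a1", "a2"], "P2": ["b1", "b2", "b3"]}
hb = happened_before(trace, [("a2", "b2")])
```

In this trace, a1 precedes b3 only through transitivity, while b1 is concurrent with both a1 and a2.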

# Lamport Logical Clocks

• are local to each process (processor?)
• do not measure real time
• only measure "events"
• are consistent with the happened-before relation
• are useful for totally ordering transactions, by using logical clock values as timestamps

# Logical Clock Conditions

Ci is the local clock for process Pi.

• If a and b are two successive events in Pi, then Ci(b) = Ci(a) + d1, where d1 > 0.
• If a is the sending of message m by Pi, then m is assigned timestamp tm = Ci(a).
• If b is the receipt of m by Pj, then Cj(b) = max{Cj(b), tm + d2}, where d2 > 0.
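
The conditions above can be sketched as a small class. This is an illustrative sketch, not a definitive implementation: the class and method names are hypothetical, d1 and d2 default to 1, and the receipt is assumed to count as a local event (so the clock first advances by d1 and is then raised to at least tm + d2).

```python
class LamportClock:
    """Logical clock for one process (hypothetical helper for illustration)."""

    def __init__(self, d1=1, d2=1):
        self.time = 0
        self.d1 = d1  # increment between successive local events, d1 > 0
        self.d2 = d2  # increment applied to a received timestamp, d2 > 0

    def tick(self):
        """Local event: advance the clock by d1."""
        self.time += self.d1
        return self.time

    def send(self):
        """Send event: tick, then use the new clock value as the timestamp tm."""
        return self.tick()

    def receive(self, tm):
        """Receive event: tick, then make sure the clock is at least tm + d2."""
        self.time = max(self.time + self.d1, tm + self.d2)
        return self.time

p1, p2 = LamportClock(), LamportClock()
p1.tick()               # local event in P1, clock = 1
tm = p1.send()          # send event in P1, tm = 2
t_recv = p2.receive(tm) # receive event in P2, clock = max(0 + 1, 2 + 1) = 3
```

The receive step guarantees that the timestamp of the receipt is larger than the timestamp of the send, which is what keeps the clocks consistent with the happened-before relation.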

# Logical Clock Conditions

The values of d1 and d2 could be 1, or they could approximate elapsed real time. For example, we could take d1 to be the elapsed local time, and d2 to be the estimated message transmission time. The latter solves the problem of waiting forever for a virtual time instant to pass.

# Total Ordering

We can extend the partial ordering of the happened-before relation to a total ordering on events by using the logical clocks and resolving any ties with an arbitrary rule based on the processor/process ID.

If a is an event in Pi and b is an event in Pj, then a ⇒ b iff

• Ci(a) < Cj(b), or
• Ci(a) = Cj(b) and Pi < Pj

where < is an arbitrary total ordering of the processes
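
As a sketch of this tie-breaking rule, events can be sorted by the pair (clock value, process ID). The `Event` record, the integer process IDs, and the sample values below are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical event record: logical clock value, process id, label.
Event = namedtuple("Event", ["clock", "pid", "label"])

def precedes(a, b):
    """Total order: smaller clock first; equal clocks are broken by process id."""
    return a.clock < b.clock or (a.clock == b.clock and a.pid < b.pid)

def total_order(events):
    """Sort events into the total order; tuple comparison mirrors precedes()."""
    return sorted(events, key=lambda e: (e.clock, e.pid))

events = [Event(2, 1, "b"), Event(1, 2, "c"), Event(2, 0, "d"), Event(1, 1, "a")]
ordered = [e.label for e in total_order(events)]
```

Here `ordered` is `["a", "c", "d", "b"]`: the two events with clock value 1 come first, and each tie is resolved by the smaller process ID.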

How useful is this? How close to real time?

# Example of Lamport Logical Clocks

C(a) < C(b) does not imply a → b

That is, the ordering we get from Lamport's clocks is not enough to guarantee that if two events precede one another in the ordering relation they are also causally related. The following Vector Clock scheme is intended to improve on this.
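
A tiny worked case makes the point concrete. The sketch below assumes two processes that never exchange a message, each keeping its own counter with an increment of 1; the variable names are hypothetical.

```python
# Two processes that never communicate; each keeps its own Lamport counter.
clock_p1 = 0
clock_p2 = 0

clock_p1 += 1   # event a in P1
c_a = clock_p1

clock_p2 += 1   # event x in P2
clock_p2 += 1   # event b in P2
c_b = clock_p2

# c_a < c_b, yet no message links P1 and P2, so a and b are concurrent.
print(c_a, c_b)  # prints "1 2"
```

The timestamps alone cannot distinguish this situation from one in which a really did happen before b.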

# Vector Clocks

• Clock values are vectors
• Vector length is n, the number of processes
• Ci[i](a) = local time of Pi at event a
• Ci[j](a) = time Cj[j](b) of last event b at Pj that is known to happen before local event a

# Vector Clock Algorithm

• If a and b are successive events in Pi, then Ci[i](b) = Ci[i](a) + d1.
• If a is the sending of m by Pi with vector timestamp tm, and b is the receipt of m by Pj, then for all k, Cj[k](b) = max{Cj[k](b), tm[k]}.
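
The two rules can be sketched as follows. This is a minimal sketch under stated assumptions: processes are numbered 0 to n-1, d1 defaults to 1, the receipt counts as a local event (so the receiver's own component is incremented before the component-wise maximum is taken), and the class and method names are hypothetical.

```python
class VectorClock:
    """Vector clock for process `pid` out of `n` processes (illustrative only)."""

    def __init__(self, pid, n, d=1):
        self.pid = pid
        self.v = [0] * n
        self.d = d  # increment d1 > 0

    def tick(self):
        """Local event: advance this process's own component."""
        self.v[self.pid] += self.d
        return list(self.v)

    def send(self):
        """Send event: tick and return a copy of the vector as timestamp tm."""
        return self.tick()

    def receive(self, tm):
        """Receive event: tick, then take the component-wise maximum with tm."""
        self.tick()
        self.v = [max(a, b) for a, b in zip(self.v, tm)]
        return list(self.v)

p0, p1, p2 = (VectorClock(i, 3) for i in range(3))
tm1 = p0.send()        # [1, 0, 0]
r1 = p1.receive(tm1)   # [1, 1, 0]
tm2 = p1.send()        # [1, 2, 0]
r2 = p2.receive(tm2)   # [1, 2, 1]
e = p0.tick()          # [2, 0, 0]
```

After these steps P2 knows about P0's first event through P1, even though P0 never sent a message to P2 directly.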

# Vector Clock Ordering Relation

• t = t′ ⇔ ∀i: t[i] = t′[i]
• t ≠ t′ ⇔ ∃i: t[i] ≠ t′[i]
• t ≤ t′ ⇔ ∀i: t[i] ≤ t′[i]
• t < t′ ⇔ (t ≤ t′ and t ≠ t′)
• t ∥ t′ ⇔ not (t < t′ or t′ < t)

The relation ≤ defined above is a partial ordering.
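
These relations translate directly into code. The sketch below assumes timestamps are equal-length sequences of integers; the function names are hypothetical.

```python
def vc_le(t, u):
    """t <= u: every component of t is at most the matching component of u."""
    return all(a <= b for a, b in zip(t, u))

def vc_lt(t, u):
    """t < u: t <= u and the two vectors differ in at least one component."""
    return vc_le(t, u) and list(t) != list(u)

def vc_concurrent(t, u):
    """t || u: neither t < u nor u < t (following the definition above)."""
    return not (vc_lt(t, u) or vc_lt(u, t))

ta, tb, tc = [1, 0, 0], [1, 2, 1], [2, 0, 0]
```

With these sample vectors, `vc_lt(ta, tb)` and `vc_lt(ta, tc)` hold, while `tb` and `tc` are concurrent because each is larger than the other in some component.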

# Vector Clocks

• a → b if ta < tb
• b → a if tb < ta
• otherwise a and b are concurrent

This is not a total ordering, but it is sufficient to guarantee a causal relationship, i.e.,

a → b iff ta < tb

How scalable is this?

Figure 5.5 in the book.

# Non-causal Ordering of Messages

Message delivery is said to be causal if the order in which messages are received is consistent with the order in which they are sent. That is, if Send(M1) → Send(M2), then for every recipient of both messages, M1 is received before M2.

# Enforcing Causal Ordering of Messages

Basic idea: Buffer each message until the message that immediately precedes it is delivered.

The text describes two protocols for implementing this idea:

• Birman-Schiper-Stephenson: assumes all messages are broadcast
• Schiper-Eggli-Sandoz: does not have this restriction

Note: These methods serialize the actions of the system. That makes the behavior more predictable, but also may mean loss of performance, due to idle time. That, plus scaling problems, means these algorithms are not likely to be of much use for high-performance computing.

## Birman-Schiper-Stephenson Causal Message Ordering

1. Before Pi broadcasts m, it increments VTPi[i] and timestamps m. Thus VTPi[i] - 1 is the number of messages from Pi preceding m.
2. When Pj (j ≠ i) receives message m with timestamp VTm from Pi, delivery is delayed locally until both of the following are satisfied:
   1. VTPj[i] = VTm[i] - 1
   2. VTPj[k] ≥ VTm[k] for all k ≠ i

   Delayed messages are queued at each process, sorted by their vector timestamps, with concurrent messages ordered by time of receipt.
3. When m is delivered to Pj, VTPj is updated as usual for vector clocks.
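
The delivery test in step 2 can be sketched as below. This is a simplified sketch, not the protocol as given in the text: the class and method names are hypothetical, the update on delivery is assumed to be a component-wise maximum, and the buffer is simply rescanned in arrival order after every delivery rather than kept sorted by vector timestamp.

```python
class BSSProcess:
    """One process in a broadcast group of n processes (illustrative only)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n
        self.buffer = []     # delayed messages: (sender, vtm, payload)
        self.delivered = []  # payloads in delivery order

    def broadcast(self, payload):
        """Step 1: increment own component and timestamp the message."""
        self.vt[self.pid] += 1
        return (self.pid, list(self.vt), payload)

    def _deliverable(self, sender, vtm):
        """Step 2: next message from sender, and nothing it depends on is missing."""
        return (self.vt[sender] == vtm[sender] - 1 and
                all(self.vt[k] >= vtm[k]
                    for k in range(len(vtm)) if k != sender))

    def receive(self, msg):
        """Buffer the message, then deliver everything that has become deliverable."""
        self.buffer.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.buffer):
                sender, vtm, payload = m
                if self._deliverable(sender, vtm):
                    self.buffer.remove(m)
                    # Step 3 (assumed): component-wise maximum on delivery.
                    self.vt = [max(a, b) for a, b in zip(self.vt, vtm)]
                    self.delivered.append(payload)
                    progress = True

p0, p1, p2 = (BSSProcess(i, 3) for i in range(3))
m1 = p0.broadcast("m1")   # timestamp [1, 0, 0]
p1.receive(m1)            # delivered at P1 immediately
m2 = p1.broadcast("m2")   # timestamp [1, 1, 0], causally after m1
p2.receive(m2)            # arrives first at P2, buffered: m1 is still missing
p2.receive(m1)            # m1 delivered, which then releases m2
```

In this run P2 receives m2 before m1 but delivers them in causal order, so `p2.delivered` is `["m1", "m2"]`.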

## Schiper-Eggli-Sandoz Protocol

Generalizes the above, so that messages do not need to be broadcast, but are just sent between pairs of processes, and the communication channels do not need to be FIFO.

How would you implement and test the above algorithms?