Lecture #3: Theoretical Foundations -- Clocks in a Distributed Environment

Topics for today

These topics are from Chapter 5 (through Section 5.4) of Advanced Concepts in OS.

Distributed systems

Inherent limitations of a distributed system

Absence of a global clock

Distributed processes cannot rely on having an accurate view of global state, due to transmission delays.

Effectively, we cannot talk meaningfully about global state.

The traditional notions of "time" and "state" do not work in distributed systems. We need to develop concepts that correspond to "time" and "state" in a uniprocessor system.

Lamport's logical clocks

Lamport's "happened before" relation

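The "happened before" relation (→) is defined as follows:

  1. If a and b are events in the same process and a occurs before b, then a → b.
  2. If a is the sending of a message by one process and b is the receipt of that message by another process, then a → b.
  3. If a → b and b → c, then a → c (transitivity).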

Event A can causally affect event B iff A → B.

Distinct events A and B are concurrent (A || B) if neither A → B nor B → A.
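On a small recorded trace, the happened-before relation can be computed mechanically by taking program order plus send/receive pairs and closing them under transitivity. The sketch below assumes a hypothetical event model: an event is a pair (process_id, local_index), and messages are given as explicit (send_event, receive_event) pairs. The functions `happened_before` and `concurrent` are illustrative names, not from the text.

```python
# Sketch: happened-before and concurrency for a small recorded trace.
# Assumed (hypothetical) event model: an event is (process_id, local_index),
# and each message is an explicit (send_event, receive_event) pair.

def happened_before(events, messages):
    """Return the set of pairs (a, b) such that a -> b."""
    reach = set(messages)                 # a send precedes its matching receive
    by_proc = {}
    for p, i in events:
        by_proc.setdefault(p, []).append(i)
    for p, idxs in by_proc.items():       # program order within each process
        idxs.sort()
        for x, y in zip(idxs, idxs[1:]):
            reach.add(((p, x), (p, y)))
    for k in events:                      # transitive closure (Warshall's algorithm)
        for a in events:
            if (a, k) in reach:
                for b in events:
                    if (k, b) in reach:
                        reach.add((a, b))
    return reach


def concurrent(a, b, hb):
    """a || b iff a and b are distinct and neither a -> b nor b -> a."""
    return a != b and (a, b) not in hb and (b, a) not in hb


# P1: send (1, 0), local (1, 1).  P2: local (2, 0), receive (2, 1), local (2, 2).
events = [(1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
hb = happened_before(events, [((1, 0), (2, 1))])
print(sorted(hb))
print(concurrent((1, 1), (2, 2), hb))
```

Here (1, 0) → (2, 2) holds only through transitivity (send, then receive, then a later local event). P1's second event is concurrent with every event on P2, because no message leaves P1 after it.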

Lamport Logical Clocks

Logical Clock Conditions

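Ci is the local clock for process Pi, and Ci(a) denotes the timestamp Pi assigns to event a. For the clocks to be consistent with the happened-before relation (if a → b then C(a) < C(b)), they must satisfy:

  1. C1: For any two events a and b in Pi, if a occurs before b, then Ci(a) < Ci(b).
  2. C2: If a is the sending of a message by Pi and b is the receipt of that message by Pj, then Ci(a) < Cj(b).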

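Implementation Rules

  1. IR1: Ci is incremented between any two successive events in Pi: Ci := Ci + d (d > 0).
  2. IR2: If a is the sending of message m by Pi, then m carries the timestamp tm = Ci(a). On receiving m, Pj sets Cj := max(Cj, tm + d) (d > 0).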

The increment d could be 1, or it could approximate elapsed real time. For example, we could take d1 (the increment between local events) to be the elapsed local time, and d2 (the increment applied when a message is received) to be the estimated message transmission time. Tying the clocks to real time in this way avoids the problem of waiting indefinitely for a given virtual time instant to pass.
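Below is a minimal sketch of a Lamport clock with a fixed increment d. It assumes the usual rules: Ci is incremented by d > 0 between successive events, a message carries the sender's clock value tm, and the receiver sets Cj := max(Cj, tm + d). The class name `LamportClock` and its methods are hypothetical. A receipt is itself an event, so this sketch ticks first and then applies the max. The net effect is Cj := max(Cj, tm) + d.

```python
class LamportClock:
    """Logical clock Ci for one process, with a fixed increment d > 0."""

    def __init__(self, d=1):
        if d <= 0:
            raise ValueError("d must be positive")
        self.d = d
        self.time = 0

    def tick(self):
        """Increment rule: advance Ci by d between successive local events."""
        self.time += self.d
        return self.time

    def send(self):
        """A send is an event: tick first, then use Ci as the message timestamp tm."""
        return self.tick()

    def receive(self, tm):
        """A receipt is an event (tick), and then Cj := max(Cj, tm + d)."""
        self.tick()
        self.time = max(self.time, tm + self.d)
        return self.time


p1, p2 = LamportClock(), LamportClock()
a = p1.tick()          # local event on P1
tm = p1.send()         # P1 sends m with timestamp tm
x = p2.tick()          # unrelated local event on P2
b = p2.receive(tm)     # P2 receives m
print(a, tm, x, b)
```

To follow the real-time variant described above, d would be passed per call (for example, the elapsed local time) instead of being fixed in the constructor.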

Total Ordering

We can extend the partial ordering given by the happened-before relation to a total ordering on events by using the logical clocks and breaking ties with an arbitrary rule based on the processor/process ID.

If a is an event in Pi and b is in Pj, then a ⇒ b iff either Ci(a) < Cj(b), or Ci(a) = Cj(b) and Pi < Pj,

where < on processes is an arbitrary total ordering of the processes.

How useful is this? How close to real time?
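The total order ⇒ amounts to sorting events by the pair (timestamp, process ID). This sketch assumes each event is recorded as a hypothetical (timestamp, process_id, name) triple, and it uses numeric order of process IDs as the arbitrary total order on processes.

```python
# Sketch of the total order "=>" on events built from Lamport timestamps.
# Assumption: each event is recorded as (timestamp, process_id, name), and the
# arbitrary total order on processes is numeric order of process IDs.

def precedes(a, b):
    """a => b iff C(a) < C(b), or C(a) == C(b) and P(a) < P(b)."""
    (ca, pa, _), (cb, pb, _) = a, b
    return ca < cb or (ca == cb and pa < pb)


def total_order(events):
    # Tuples compare lexicographically, so sorting on (C, pid) realizes =>.
    return sorted(events, key=lambda e: (e[0], e[1]))


events = [(2, 1, "send m"), (1, 2, "local"), (1, 1, "start"), (3, 2, "recv m")]
ordered = total_order(events)
print([name for _, _, name in ordered])
```

Timestamps strictly increase within one process, so no two distinct events share a (C, pid) pair, and the order is total. The tie between "start" and "local" (both at time 1) is resolved purely by process ID. This shows why the order says little about real time.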

Example of Lamport Logical Clocks

C(a) < C(b) does not imply a → b

That is, the ordering we get from Lamport's clocks does not guarantee that if one event precedes another in that ordering, the two are causally related. The vector clock scheme that follows is intended to fix this.
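The simplest counterexample takes two processes that never communicate. In the assumed scenario below, P1 performs three local events and P2 performs one. No event on one process can then have happened before any event on the other, yet their Lamport timestamps are still ordered. The helper `local_timestamps` is a hypothetical name.

```python
# Worked example (assumed scenario): P1 and P2 never exchange messages, so no
# event on one process happened before any event on the other.

def local_timestamps(count, d=1):
    """Lamport timestamps of `count` purely local events (increment rule only)."""
    clock, stamps = 0, []
    for _ in range(count):
        clock += d
        stamps.append(clock)
    return stamps


p1 = local_timestamps(3)    # P1: e11, e12, e13
p2 = local_timestamps(1)    # P2: e21
c_e21, c_e13 = p2[0], p1[2]
# C(e21) < C(e13), yet e21 and e13 are concurrent: nothing links P1 and P2.
print(c_e21, c_e13, c_e21 < c_e13)
```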

Vector Clocks

Vector Clock Algorithm

Vector Clock Ordering Relation

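For vector timestamps ta and tb, the relations are defined componentwise:

  1. ta = tb iff ta[i] = tb[i] for all i
  2. ta ≤ tb iff ta[i] ≤ tb[i] for all i
  3. ta < tb iff ta ≤ tb and ta ≠ tb
  4. ta || tb iff neither ta < tb nor tb < ta

The relation ≤ is a partial ordering.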
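The componentwise comparisons translate directly into code. This is a sketch assuming timestamps are equal-length tuples of integers; the function names are illustrative. For distinct events, equal vector timestamps do not arise, so `vt_concurrent` is only meaningful for different events.

```python
# Sketch of the componentwise comparisons on vector timestamps.
# Assumption: timestamps are equal-length tuples of integers.

def vt_leq(ta, tb):
    """ta <= tb iff ta[i] <= tb[i] for every component i."""
    if len(ta) != len(tb):
        raise ValueError("vector timestamps must have the same length")
    return all(x <= y for x, y in zip(ta, tb))


def vt_less(ta, tb):
    """ta < tb iff ta <= tb and ta != tb."""
    return vt_leq(ta, tb) and tuple(ta) != tuple(tb)


def vt_concurrent(ta, tb):
    """ta || tb iff neither ta < tb nor tb < ta."""
    return not vt_less(ta, tb) and not vt_less(tb, ta)


print(vt_less((1, 2, 0), (1, 2, 1)))        # componentwise smaller
print(vt_concurrent((2, 0, 0), (0, 1, 0)))  # incomparable pair: only a partial order
```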


This is not a total ordering, but unlike Lamport's clocks it characterizes causality exactly:

a → b iff ta < tb

How scalable is this?

Figure 5.5 in the book.
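A minimal vector clock sketch for n processes with IDs 0..n-1 follows. It assumes the algorithm as it is usually stated. A process increments its own entry before every event and attaches its whole vector to each outgoing message. On receipt, the receiver takes the componentwise max with the attached vector. `VectorClock` is a hypothetical class name.

```python
# Sketch of vector clocks for n processes (process IDs 0..n-1), assuming:
# increment your own entry before every event, attach the whole vector to each
# outgoing message, and on receipt take the componentwise max with it.

class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def tick(self):
        """Local event: increment this process's own component."""
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        """A send is an event; the returned vector is the message timestamp."""
        return self.tick()

    def receive(self, tm):
        """A receipt is an event; then merge in the sender's knowledge."""
        self.tick()
        self.v = [max(x, y) for x, y in zip(self.v, tm)]
        return tuple(self.v)


def vt_less(ta, tb):
    """ta < tb iff every component of ta is <= tb's and the vectors differ."""
    return all(x <= y for x, y in zip(ta, tb)) and tuple(ta) != tuple(tb)


p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
a = p0.tick()          # (1, 0)
tm = p0.send()         # (2, 0)
x = p1.tick()          # (0, 1)
r = p1.receive(tm)     # (2, 2)
c = p0.tick()          # (3, 0)
print(vt_less(a, r))                               # a -> r
print(not vt_less(c, r) and not vt_less(r, c))     # c || r
```

On scalability: every timestamp, and therefore every message, carries n integers. The overhead grows linearly with the number of processes.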

Non-causal Ordering of Messages

Message delivery is said to be causal if the order in which messages are delivered is consistent with the causal order of their send events. That is, if Send(M1) → Send(M2), then every recipient of both messages delivers M1 before M2.
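The causal-delivery definition can be checked after the fact on a recorded run, provided each send event carries a vector timestamp. Then Send(M1) → Send(M2) iff ts(M1) < ts(M2). In this sketch, `send_ts` and `deliveries` are hypothetical trace formats. The timestamps assume vector clocks that tick on every send and receive.

```python
# Sketch: detect causal-ordering violations in a recorded run.
# Assumptions: send_ts maps each message ID to the vector timestamp of its send
# event, and deliveries maps each recipient to the message IDs in the order it
# delivered them. Both structures are hypothetical trace formats.

def vt_less(ta, tb):
    """Vector-timestamp order: Send(M1) -> Send(M2) iff ts(M1) < ts(M2)."""
    return all(x <= y for x, y in zip(ta, tb)) and tuple(ta) != tuple(tb)


def causal_violations(send_ts, deliveries):
    """Return (recipient, should_be_first, delivered_first) for each bad pair."""
    bad = []
    for proc, order in deliveries.items():
        for i, later in enumerate(order):
            for earlier in order[:i]:
                # `earlier` was delivered first; wrong if Send(later) -> Send(earlier).
                if vt_less(send_ts[later], send_ts[earlier]):
                    bad.append((proc, later, earlier))
    return bad


# P0 broadcasts M1; P1 receives M1 and then broadcasts M2, so Send(M1) -> Send(M2).
send_ts = {"M1": (1, 0, 0), "M2": (1, 2, 0)}
deliveries = {1: ["M1"], 2: ["M2", "M1"]}    # P2 delivers M2 before M1
print(causal_violations(send_ts, deliveries))
```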

Enforcing Causal Ordering of Messages

Basic idea: Buffer each message at the recipient until every message that causally precedes it has been delivered.

The text describes two protocols for implementing this idea: the Birman-Schiper-Stephenson protocol and the Schiper-Eggli-Sandoz protocol.

Note: These methods serialize the actions of the system. That makes the behavior more predictable, but it may also cost performance, since processes sit idle while messages are held back. That, plus their scaling problems, means these algorithms are unlikely to be of much use in high-performance computing.

Birman-Schiper-Stephenson Causal Message Ordering

  1. Before Pi broadcasts m, it increments VTPi[i] and timestamps m. Thus VTPi[i]-1 is the number of messages from Pi preceding m.
  2. When Pj (j ≠ i) receives message m with timestamp VTm from Pi, delivery is delayed locally until both of the following are satisfied:
    1. VTPj[i] = VTm[i] - 1
    2. VTPj[k] ≥ VTm[k] for all k ≠ i
      Delayed messages are queued at each process, sorted by their vector timestamps, with concurrent messages ordered by time of receipt.
  3. When m is delivered to Pj, VTPj is updated as usual for vector clocks.
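The three steps above can be sketched as an in-memory simulation. The assumptions are n processes with IDs 0..n-1 and messages of the hypothetical form (sender, VTm, payload). For simplicity, the sketch keeps the hold-back queue in arrival order and rescans it after every delivery rather than sorting it by timestamp. It also does not model the sender's delivery of its own broadcast. `BSSProcess` is an illustrative name.

```python
# Sketch of Birman-Schiper-Stephenson delivery at one process, assuming n
# processes with IDs 0..n-1 and messages of the form (sender, VTm, payload).

class BSSProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n      # VT_Pj: broadcasts seen from each process
        self.pending = []      # received but not yet deliverable messages
        self.delivered = []    # payloads in delivery order (for inspection)

    def broadcast(self, payload):
        """Step 1: increment own entry, then timestamp the message with VT."""
        self.vt[self.pid] += 1
        return (self.pid, tuple(self.vt), payload)

    def _deliverable(self, msg):
        """Step 2: VT_Pj[i] == VTm[i] - 1 and VT_Pj[k] >= VTm[k] for all k != i."""
        i, vtm, _ = msg
        return self.vt[i] == vtm[i] - 1 and all(
            self.vt[k] >= vtm[k] for k in range(len(self.vt)) if k != i)

    def receive(self, msg):
        """Queue msg, then deliver queued messages until none is deliverable."""
        self.pending.append(msg)
        while True:
            ready = next((m for m in self.pending if self._deliverable(m)), None)
            if ready is None:
                return
            self.pending.remove(ready)
            self._deliver(ready)

    def _deliver(self, msg):
        """Step 3: update VT_Pj as usual (componentwise max with VTm)."""
        _, vtm, payload = msg
        self.vt = [max(a, b) for a, b in zip(self.vt, vtm)]
        self.delivered.append(payload)


p0, p1, p2 = (BSSProcess(i, 3) for i in range(3))
m1 = p0.broadcast("m1")
p1.receive(m1)
m2 = p1.broadcast("m2")      # m2 causally follows m1
p2.receive(m2)               # arrives first, so it is held back
held_back = len(p2.pending)
p2.receive(m1)               # delivers m1, then the queued m2
print(p2.delivered, p2.vt)
```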

Schiper-Eggli-Sandoz Protocol

Generalizes the above protocol: messages need not be broadcast but can be sent point-to-point between pairs of processes, and the communication channels need not be FIFO.
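Here is a minimal sketch of the Schiper-Eggli-Sandoz idea as it is commonly presented. Each process keeps a vector clock t and a set V of (destination, timestamp) pairs, and each message carries a copy of the sender's V. A message whose V contains an entry for the receiver is buffered until the receiver's clock has reached that timestamp. The exact bookkeeping is an assumption here and may differ in detail from the text: the clock ticks on send and on delivery, and V entries for the same destination are merged by componentwise max. `SESProcess` is a hypothetical name.

```python
# Sketch of Schiper-Eggli-Sandoz delivery, assuming n processes with IDs 0..n-1.
# Each process keeps a vector clock t and a map V from destination to the
# timestamp of the latest message sent to that destination in its causal past.

def vt_leq(ta, tb):
    return all(x <= y for x, y in zip(ta, tb))


class SESProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.t = [0] * n
        self.V = {}           # destination -> vector timestamp
        self.buffer = []
        self.delivered = []

    def send(self, dest, payload):
        """Tick, send (payload, tm, copy of V), then record (dest, tm) in V."""
        self.t[self.pid] += 1
        tm = tuple(self.t)
        msg = (payload, tm, dict(self.V))
        self.V[dest] = tm
        return msg

    def _deliverable(self, msg):
        """Deliverable unless V_M has an entry for us that our clock has not reached."""
        tv = msg[2].get(self.pid)
        return tv is None or vt_leq(tv, self.t)

    def receive(self, msg):
        self.buffer.append(msg)
        while True:
            ready = next((m for m in self.buffer if self._deliverable(m)), None)
            if ready is None:
                return
            self.buffer.remove(ready)
            self._deliver(ready)

    def _deliver(self, msg):
        payload, tm, vm = msg
        for dest, tv in vm.items():       # merge V_M into V, skipping our own entry
            if dest != self.pid:
                old = self.V.get(dest)
                self.V[dest] = tv if old is None else tuple(map(max, old, tv))
        self.t = [max(a, b) for a, b in zip(self.t, tm)]
        self.t[self.pid] += 1             # the delivery is an event
        self.delivered.append(payload)


p0, p1, p2 = (SESProcess(i, 3) for i in range(3))
m1 = p0.send(2, "m1")        # P0 -> P2
m2 = p0.send(1, "m2")        # P0 -> P1, carries (P2, ts(m1))
p1.receive(m2)
m3 = p1.send(2, "m3")        # P1 -> P2, causally after m1
p2.receive(m3)               # arrives before m1: buffered
held_back = len(p2.buffer)
p2.receive(m1)
print(p2.delivered)
```

No broadcast is used and no FIFO assumption is needed. The constraint "deliver m1 before m3 at P2" reaches P2 indirectly, through the V carried by m2 and then m3.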

How would you implement and test the above algorithms?