Lecture 16: 3-Phase Commit

These topics are from Chapter 13 (Fault Tolerance) in Advanced Concepts in OS.

Topics for Today

2-Phase Commit Blocking Problem

If a site fails, other sites may block until it recovers and completes its role in the protocol.

For example, if the coordinator fails in state w1, after sending COMMIT_REQUEST, the sites will be stuck waiting for the coordinator to follow up with an abort or commit message until the coordinator recovers.
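The cohort's side of this situation can be sketched in a few lines of Python. The names CohortState and on_coordinator_timeout are hypothetical, introduced only for illustration. The rule that a cohort which has not yet voted may abort on its own is assumed from the usual description of 2-phase commit; it is not stated in these notes.

```python
from enum import Enum


class CohortState(Enum):
    Q = "q"  # initial state: no vote has been sent yet
    W = "w"  # voted to agree, waiting for COMMIT or ABORT
    A = "a"  # aborted
    C = "c"  # committed


def on_coordinator_timeout(state: CohortState) -> str:
    """Return what a 2PC cohort can safely do when the coordinator goes silent."""
    if state is CohortState.Q:
        # No vote was cast, so the coordinator cannot have decided to commit.
        return "abort"
    if state is CohortState.W:
        # The coordinator may already have decided either way: keep waiting.
        return "block"
    # States a and c are final; the outcome is already known locally.
    return "none"


for s in CohortState:
    print(s.value, "->", on_coordinator_timeout(s))
```

The only case that returns "block" is the wait state, which is exactly the window in which a coordinator failure leaves the cohorts stuck.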

What happens in other failure cases? For example, what if a cohort fails in state wi?

Compare the impact in terms of locking effects if one cohort fails versus if the coordinator fails.

Nonblocking Commit Protocols Needed

How to get there?

Analysis: What Causes Blocking in the 2-Phase Commit?

How do we achieve reliable point-to-point communication?

How do we detect failure of a site?


Concurrency sets are an abstraction of what one site knows about the possible states of other sites.
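One way to make this definition concrete is to compute concurrency sets from a list of reachable global states. The sketch below does that in Python. The function name concurrency_sets and the list reachable are hypothetical, and the list is a small hand-written two-site illustration, not an exhaustive enumeration of any protocol in these notes.

```python
from collections import defaultdict


def concurrency_sets(global_states):
    """Map each local state to the states other sites may occupy alongside it."""
    csets = defaultdict(set)
    for g in global_states:
        for i, s in enumerate(g):
            # Every other site's state in this global state is concurrent with s.
            csets[s].update(t for j, t in enumerate(g) if j != i)
    return dict(csets)


# Assumed, illustrative global states (site 1 state, site 2 state).
reachable = [
    ("q1", "q2"),
    ("w1", "q2"),
    ("w1", "w2"),
    ("w1", "a2"),
    ("a1", "a2"),
    ("a1", "w2"),
    ("c1", "w2"),
    ("c1", "c2"),
]

sets = concurrency_sets(reachable)
print(sorted(sets["w2"]))
```

For this list, the set computed for w2 holds w1, a1 and c1: site 2 cannot tell from its own state which of these site 1 is in.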

Concurrency Set Examples: 3 sites

Suppose site 1 initiates the commit protocol, and sites 2 and 3 respond.

C(q1) = { q2, q3 }
C(q2) = { q1, w1, q3, a3, w3 }

Note that we cannot have a1 in C(q2), since site 1 must wait for responses from all of the other sites before it makes the transition from state w1.

C(w1) = { q2, a2, w2, q3, a3, w3 }
C(w2) = { w1, a1, c1, q3, a3, w3 }

Note that C(w2) contains both an abort state and a commit state for site 3. This means that it is unsafe at this point for site 2 to take any independent recovery action, because site 3 might choose a different action. For this reason, site 2 must block until it receives a message from the coordinator.

C(c1) = { w2, c2, w3, c3 }
C(f1) = { b2, c2, b3, c3 }

Match these up with the state diagrams above, and see why the sets contain the elements they do.

Conditions that Cause Blocking

If C(si) contains both commit and abort states, then site i cannot decide to abort the transaction, since some other site may be in a commit state.

It cannot commit, either, since some other site may be in the abort state.

Therefore, site i must block.

If a protocol contains a local state of a site with both abort and commit states in its concurrency set, then under independent recovery conditions it is not resilient to an arbitrary single failure.
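The blocking condition can be checked mechanically against the concurrency sets listed earlier in these notes. The following Python sketch copies those sets into a dictionary and looks for states whose set mixes abort and commit states. The names C and must_block are hypothetical, and the check assumes that the first letter of a state name identifies its kind (a for abort, c for commit).

```python
# Concurrency sets for the three-site 2PC example, as listed in the notes.
C = {
    "q1": {"q2", "q3"},
    "q2": {"q1", "w1", "q3", "a3", "w3"},
    "w1": {"q2", "a2", "w2", "q3", "a3", "w3"},
    "w2": {"w1", "a1", "c1", "q3", "a3", "w3"},
    "c1": {"w2", "c2", "w3", "c3"},
    "f1": {"b2", "c2", "b3", "c3"},
}


def must_block(cset):
    """True if the set contains both an abort state and a commit state."""
    kinds = {state[0] for state in cset}  # first letter: q, w, a, b, c or f
    return "a" in kinds and "c" in kinds


blocking = sorted(s for s, cset in C.items() if must_block(cset))
print(blocking)  # ['w2']
```

Of the listed states, only w2 fails the test, which matches the observation above that site 2 must block in that state.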

Simplified FSM Model of the 2-Phase Commit Protocol

The state diagrams in the text are a further abstraction from the ones shown above, with fewer states. The relationship is shown in the picture below.

The state f1 and the transitions to it are eliminated, states a1 and c1 are made into final states, and the states ai and bi are merged.

Conceptually, the elimination of state f1 amounts to modifying the protocol so that the coordinator does not block to wait for ACK messages.

One can then argue that merging states ai and bi is an allowable further simplification, since the only effect of the transition from ai to bi is to send the ACK that is now ignored.

The simplified diagram is no longer a complete description of a fault-tolerant protocol. Without the ACK messages from everyone, the Coordinator does not know that the Cohorts have caught up, and so cannot safely go on with its next computation.

However, the simplified diagram does make a clearer separation between abort states and commit states, which is the main focus of our interest.

Therefore, in the analysis below of whether the protocol permits independent recovery from failures, we follow the textbook and use the simplified diagram. Alternate diagrams are provided at some points, via links.

Simplified State Diagram used in Text

Concurrency Sets

3-Phase Commit Protocol

The 3-phase commit protocol splits state wi, thereby eliminating the problem of having both abort and commit states in the concurrency set of a single state, as happens with state wi of a cohort (for example, w2) in the 2-phase protocol.
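The effect of the split can be illustrated with a small Python check, using site 2 as the example cohort. The 2-phase set is the C(w2) listed earlier in these notes. The two 3-phase sets are assumptions, worked out by hand for three sites on the premise that the coordinator enters a prepared state p1 before committing and commits only after every cohort has reached its own prepared state pi; they should be compared against the textbook diagrams before being relied on.

```python
def mixes_abort_and_commit(cset):
    """True if the set contains both an abort state and a commit state."""
    kinds = {state[0] for state in cset}
    return "a" in kinds and "c" in kinds


# From the 2PC examples in these notes.
two_phase_w2 = {"w1", "a1", "c1", "q3", "a3", "w3"}

# Assumed 3PC sets (hand-derived, not taken from the notes).
three_phase_w2 = {"w1", "a1", "p1", "q3", "w3", "a3", "p3"}  # abort, no commit
three_phase_p2 = {"p1", "c1", "w3", "p3", "c3"}              # commit, no abort

for name, cset in [
    ("2PC C(w2)", two_phase_w2),
    ("3PC C(w2)", three_phase_w2),
    ("3PC C(p2)", three_phase_p2),
]:
    print(name, "mixes abort and commit:", mixes_abort_and_commit(cset))
```

Under these assumptions, the wait state can still see abort states and the prepared state can still see commit states, but no single state sees both.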

Concurrency Sets with 3-Phase Commit

As with the 2-Phase Commit, the state diagrams in the textbook for the 3-Phase Commit Protocol are simplified. There is no Coordinator state to receive the ACK messages generated when the Cohort makes the transition from wi to ai.

The following version includes the full state set.

Failure & Timeout Rules

3-Phase Commit Protocol with Failure Transitions

3-Phase Protocol Theorem

Rules 1 and 2 are sufficient for designing commit protocols resilient to a single site failure during a transaction.

Multiple Site Failure Theorems