COP 5611: OPERATING SYSTEMS -Spring 2003- Ted Baker

Distributed Deadlock Detection

These notes were used in a subsititute lecture, for Xiuwen Liu's class in Spring 2003.

The material is mainly from Chapter 7 (Distributed Deadlock Detection) in Advanced Concepts in OS.

You should be already familiar with deadlock handling in single-processor systems, from the undergraduate course that is prerequisite to this one. If not, you should review it now, as background for distributed deadlock.

For reference, see COP 4610 Notes on Deadlock.

Beyond that, you should have read and be familiar with Chapter 3 of the text for this course, especially the sections on graph models for deadlock detection.

Distributed Deadlock Models

Based on WFG (not GRG)

The text fudges a bit by forgetting that in some models the existence of a cycle is necessary but not sufficient, and the existence of a knot is sufficient but not necessary, leaving a gap.

Distributed Deadlock Handling Strategies

Distributed Deadlock Detection Issues & Requirements

What do we do when a deadlock is detected?

Categorization of Methods

Simple Centralized Control

What are the advantages and disadvantages?

False Deadlock Example

An external observer can see deadlock where there is none.

4 sites:

Two transactions run concurrently:

T1: lock R1 T2: lock R1
T1: unlock R1
T2: unlock R1
T1: lock R2
T2: lock R2
T1: unlock R2
T2: unlock R2

False Deadlock Example: Event Trace Diagram

False Deadlock Snapshots

What if each site maintains its local view of the system state, and periodically sends this to the control site?

Ho-Ramamoorthy Algorithm: Two-Phase

May still report false deadlocks.

What lesson(s) does this teach?

Ho-Ramamoorthy Algorithm: One-Phase

Notice the similarity in principle here to the Chandy-Lamport global state recording algorithm, i.e., we need to capture not just the states of the processes but also the states of the messages in transit.

Classification of Distributed Detection Algorithms

Obermarck's Path-Pushing Algorithm

How can we totally order the transactions?

Obermarck's Path-Pushing Algorithm

  1. wait for info from previous iteration of Step 3
  2. combine received info with local TWFG
    detect all cycles
    break local cycles
  3. send each cycle including the node Ex
    to the external nodes it is waiting for
  4. time-saver: only send path to other sites if first
    transaction is higher in lexical order than the last

Note: TWF = "transaction wait-for graph", just another name for a WFG, when applied to transactions.

Note: In the figures below, the rule above is reversed, i.e., we send the path of the first transaction is lower in lexical order than the first. This was an accident when I drew up the figures, but it does not matter. Either rule will work. The essential idea is simply to have one canonical representation of each path.

Note: The example above does not fully illustrate all the things that can go on in a more complicated graph. In particular, there are no nodes with multi-node AND-dependences.

Problem and Questions on Obermarck's Path-Pushing Algorithm

Detects false deadlocks, due to asynchronous snapshots at different sites. (lesson?)

Message complexity? Message size? Detection delay?

Exactly how are paths combined and checked?

Obermarck's Path-Pushing Algorithm: Performance

The O(n(n-1)/2) message bound is a little bit misleading, since the algorithm is described as being run continually, in an iterative fashion. That is, the sites exchange paths continually, even when no deadlock is detected.

Chandy-Misra-Haas Edge-Chasing Algorithm


Algorithm Initiation by Pi

if Pi is locally dependent on itself then declare a deadlock
else send probe (i, j, k) to home site of Pk for each j, k such that all of the following hold

Algorithm on receipt of probe (i,j,k) by node of Pk

check the following conditions

if these are all true, do the following

What is the purpose of the dependent relation?

The above example does not include any nontrivial AND-requests. What is it about this algorithm that allows it to handle AND-requests, but not OR-requests?

Chandy Misra Haas Complexity


Other Edge-Chasing Algorithms

We will not have time to go into these in detail in class.

Diffusion Based Algorithms: Chandy et al.

On receipt of query(i,j,k) by m

On receipt of reply(i,j,k) by k

The black dashed arrows indicate the engaging process for each engaged process. This information is needed to route the reply when the number of other processes for which the engaged process is waiting goes to zero.

Observe that the engaging process arrows form a spanning tree of the subgraph corresponding to the set of process for which the initiating process is waiting. If every process in this subgraph is blocked, we have a knot.

At this point, a knot has been detected.

Global State Detection Based Algorithms

Details differ.

Recall: What is graph reduction?

Graph Reduction

General idea: simulate the result of execution, assuming all unblocked processes complete without requesting any more resources

If you have never done one, it would be a good idea to work through an example of graph reduction. In particular, note that backtracking may be necessary with the more complex resource models such as consumable resource graphs.

© 1999, 2001, 2003 T. P. Baker
Last updated by Author: baker on Date: 2003/02/18