Real Time Systems: Notes

Linux Spinlocks

 

These notes explain how spinlocks were implemented in the Linux kernel at the time the notes were written. The details have evolved since, but the implementation explained here remains valid and is still a good example for anyone who wants to understand how spinlocks work at a detailed level.

Macros

#define spin_lock_init(x)       do { (x)->lock = 0; } while(0)

The above is semantically the same as "(x)->lock = 0". The "do { ... } while (0)" is a syntactic wrapper that prevents spin_lock_init from being used as an expression, i.e., it makes sure the macro belongs to the syntactic category statement. In general, it is important for the execution order of synchronization operations to be uniquely defined, as is the case for statement execution order; in contrast, the compiler is free to reorder the evaluation of subexpressions, subject to the limits of parenthesization. This is the conventional way of enclosing a block of code in a macro. To see why simple braces "{ ... }" would not do as well, consider the following.

        if (foo)
                spin_lock_init(x);
        else
                do_something_else();

This looks like (and should be) perfectly legal code, but if the macro were protected only with '{ }' you would get the following:

        if (foo)
                { (x)->lock = 0; };
        else
                do_something_else();

The {} block will be interpreted as the if-branch (fine so far). The semicolon will be taken as a null statement; since it is not an "else", the compiler concludes that the "if" has no matching "else". When it then sees the "else" on the next line, it treats that as a syntax error. A "do { ... } while (0)" is the correct way of making sure that the semicolon following the macro always gets consumed by the parser. A decent compiler will not generate any extra machine instructions for this degenerate form of loop, so it adds no execution-time overhead.
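The difference can be demonstrated outside the kernel. In the following sketch (the names init_if and the else-branch value are ours, for illustration only), the macro is used unbraced inside an if/else, which is well formed precisely because of the do { } while (0) wrapper:

```c
#include <assert.h>

typedef struct { volatile unsigned int lock; } spinlock_t;

/* Wrapped form: the macro plus its trailing semicolon parses as
   exactly one statement. */
#define spin_lock_init(x)  do { (x)->lock = 0; } while (0)

/* Unbraced use in an if/else.  With a bare { } wrapper, the stray
   semicolon after the macro would detach the else and break
   compilation, as described above. */
int init_if(int foo, spinlock_t *x)
{
    if (foo)
        spin_lock_init(x);
    else
        x->lock = 7;        /* arbitrary stand-in for other work */
    return (int)x->lock;
}
```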

Conditional Compilation

#ifndef __SMP__
...
#define spin_lock_init(lock)	do { } while(0)
...
#else	/* __SMP__ */
...
#define spin_lock_init(x)	do { (x)->lock = 0; } while(0)
...
#endif /* __SMP__ */

If the system is compiled without the symbol __SMP__ defined (the symbol selects support for Symmetric Multi-Processing, i.e., shared-memory multiprocessors), there is no need for spin locks, so in that case null bodies are provided for all the operations on spinlocks.
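The same pattern can be reproduced in ordinary user code. In this sketch the macro LOCK_OPS_ENABLED plays the role of __SMP__ (all names here are ours, not the kernel's):

```c
#include <assert.h>

#define LOCK_OPS_ENABLED 1   /* stand-in for __SMP__ */

typedef struct { volatile unsigned int lock; } spinlock_t;

#if LOCK_OPS_ENABLED
#define spin_lock_init(x)  do { (x)->lock = 0; } while (0)
#else
/* Uniprocessor build: every lock operation is an empty statement,
   so "locking" costs nothing at run time. */
#define spin_lock_init(x)  do { } while (0)
#endif

int demo(void)
{
    spinlock_t s = { 42 };
    spin_lock_init(&s);      /* real initialization when enabled */
    return (int)s.lock;
}
```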

Representation of Spin Lock Object

typedef struct {
	volatile unsigned int lock;
} spinlock_t;

The value of x.lock is an integer. The value is zero if-and-only-if the lock is in the unlocked state.

What does the word "volatile" mean here? Why is this essential?
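As a hint toward the answer, consider the following user-space sketch (the names and the pthread setup are ours, and it ignores the memory-ordering issues real lock code must handle). Another CPU changes the lock word behind the compiler's back; without volatile, an optimizing compiler could hoist the load out of the spin loop and loop forever on a stale register copy.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static volatile unsigned int lock_word = 1;   /* "held" initially */

static void *releaser(void *arg)
{
    (void)arg;
    usleep(1000);        /* pretend to hold the lock briefly */
    lock_word = 0;       /* release: the spinner must observe this */
    return NULL;
}

int wait_for_release(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, releaser, NULL) != 0)
        return -1;
    while (lock_word != 0)
        ;                /* volatile forces a fresh load each pass */
    pthread_join(t, NULL);
    return 0;
}
```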

#define spin_lock_string \
	"\n1:\t" \
	"lock ; btsl $0,%0\n\t" \
	"jc 2f\n" \
	".section .text.lock,\"ax\"\n" \
	"2:\t" \
	"testb $1,%0\n\t" \
	"jne 2b\n\t" \
	"jmp 1b\n" \
	".previous"

If the new-line and tab characters are expanded, the above in-line assembly language code appears as follows:

1:                        // a label
   lock                   // makes the next instruction atomic
   btsl $0,%0             // bit test and set long
   jc 2f                  // lock already held: jump forward to label 2
.section .text.lock,"ax"
2:
  testb $1,%0             // test bit 0 of the low-order byte
  jne 2b                  // still held: keep spinning
  jmp 1b                  // looks free: retry the atomic btsl
.previous

The "bit test and set long" instruction is explained in terms of register transfers as follows:

CF <- Bit(BitBase, BitOffset)
Bit(BitBase, BitOffset) <- 1

That is, the Carry Flag (CF) is set to the value that bit 0 of operand %0 had before the instruction executed, and in any case bit 0 of %0 is set to 1. Here, $0 denotes an immediate operand with value zero, used as the bit offset, and %0 refers to the first operand of the inline-assembly template, i.e., the lock word in memory.

Thus, the net effect of the first three instructions is to jump to label 2 if the lock is not free, and otherwise seize the lock.
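The register-transfer description of btsl can be modeled in C as follows (a non-atomic model for illustration only; the real instruction is made atomic by the lock prefix):

```c
#include <assert.h>

/* Model of btsl: return the old value of the selected bit (the value
   the instruction leaves in CF) and set that bit in the word. */
static int bit_test_and_set(unsigned int *word, unsigned int bit)
{
    int old = (*word >> bit) & 1u;  /* CF <- Bit(BitBase, BitOffset) */
    *word |= 1u << bit;             /* Bit(BitBase, BitOffset) <- 1  */
    return old;
}
```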

.section .text.lock,"ax"

The above pseudo-operation causes the assembler to put the following block of code, up to the matching .previous pseudo-operation, out of line, in a different text (i.e., code) section. We expect that usually the lock will be free, so that execution falls through the jc 2f instruction and continues with the code following the .previous pseudo-operation; the out-of-line spin loop is then never touched, which keeps the common path compact. (The letter "l" in btsl is the operand-size suffix, indicating a 32-bit, or "long", operand.)

2:
  testb $1,%0    // AND the low-order byte with 1, setting flags from bit 0
  jne 2b         // bit still set: the lock is held, keep spinning
  jmp 1b         // bit clear: go back to label 1 and retry the btsl

The above code does the "spinning", i.e., it repeatedly checks the value of the lock until the holder frees it, which the holder does by setting the value of the lock to zero. The spinning could have been done with a simple loop on the btsl, but every execution of that instruction forces a "locked cycle" on the memory bus, which hurts performance for all CPUs. It is much more efficient to do a normal fetch of the lock value until it changes, since the fetch is satisfied from the CPU's local cache until the value in memory is changed by the lock holder. The CPU has bus-snooping logic that will notice the write, update (or invalidate) the local cache, and cause the new value of the lock to be fetched once it has changed. Of course, another CPU might also be trying to grab the lock, so it is necessary to go back to the btsl in order to repeat the attempt at atomically seizing the lock.
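The same test-and-test-and-set structure can be sketched in portable C11 atomics (the names are ours; atomic_exchange stands in for lock; btsl, and the relaxed load plays the role of the testb spin):

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_uint lock; } ttas_lock_t;

static void ttas_lock(ttas_lock_t *s)
{
    for (;;) {
        /* Atomic test-and-set, like `lock; btsl`: an old value of 0
           means the lock was free and is now ours. */
        if (atomic_exchange_explicit(&s->lock, 1,
                                     memory_order_acquire) == 0)
            return;
        /* Spin with plain loads, like the testb/jne loop: reads hit
           the local cache until the holder writes the lock word. */
        while (atomic_load_explicit(&s->lock, memory_order_relaxed))
            ;
    }
}

static void ttas_unlock(ttas_lock_t *s)
{
    /* Like `lock; btrl`: clear the lock word, releasing the lock. */
    atomic_store_explicit(&s->lock, 0, memory_order_release);
}
```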

This kind of lock is not perfect for real-time systems for the following reasons:

  1. While a CPU is waiting for the lock, it is doing no useful work.
  2. Without more information about the hardware, it is indeterminate which CPU will get the lock. In principle, if there is a high degree of contention for the lock, this permits unbounded blocking.
  3. Priority is not taken into account. That is, if jobs on several CPUs are contending for the lock, which CPU gets the lock is not determined by the priority of the job running on each CPU. With priority-based scheduling algorithms, such priority inversion is undesirable, because it reduces the schedulable CPU utilization.

The effects of problem (1) can be bounded if the lengths of the critical sections (blocks of code over which the lock is held) for each lock are kept very short, and the frequencies of execution of critical sections are kept low.

Problems (2) and (3) can be addressed by using a more sophisticated locking algorithm that takes priorities into account. It is an interesting question how much additional execution-time overhead is incurred by using such an algorithm (which is necessarily somewhat slower), and how the processor utilization lost to this overhead compares against the utilizable processing capacity gained through reduced priority inversion.
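As one concrete illustration, a ticket lock grants the lock in FIFO order, bounding each waiter's delay by the number of contenders; it addresses problem (2), though not the priority problem (3). The sketch below uses C11 atomics and our own names, and is not the kernel's code:

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_uint next;    /* next ticket number to hand out */
    atomic_uint owner;   /* ticket number currently being served */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *t)
{
    /* Atomically take the next ticket. */
    unsigned int my = atomic_fetch_add_explicit(&t->next, 1,
                                                memory_order_relaxed);
    /* Spin with plain loads until our number comes up. */
    while (atomic_load_explicit(&t->owner, memory_order_acquire) != my)
        ;
}

static void ticket_unlock(ticket_lock_t *t)
{
    /* Serve the next ticket, in FIFO order. */
    atomic_fetch_add_explicit(&t->owner, 1, memory_order_release);
}
```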

The Unlock Operation

#define spin_unlock_string \
	"lock ; btrl $0,%0"

The above Bit Test and Reset (btrl) instruction stores the value of bit 0 of the operand into the CF flag and clears that bit in the memory operand. Clearing the bit sets the lock word back to zero, which is what releases the lock and allows a spinning CPU to seize it.

© 1998, 2003 T. P. Baker. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without written permission.
$Id: spinlock.html,v 1.1 2003/10/17 12:34:01 baker Exp baker $