## ECE1254H Modeling of Multiphysics Systems. Lecture 10: Nonlinear systems. Taught by Prof. Piero Triverio

October 21, 2014

## Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

## Nonlinear systems

On slides, some examples to motivate:

• struts
• fluids
• diode (exponential)

An example is the diode circuit of fig. 1.

fig. 1. Diode circuit

\label{eqn:multiphysicsL10:20}
I_d = I_s \lr{ e^{V_d/V_t} - 1 } = \frac{10 - V_d}{10}.

## Richardson and Linear Convergence

Seeking the exact solution $$x^\conj$$ for

\label{eqn:multiphysicsL10:40}
f(x^\conj) = 0,

Suppose that

\label{eqn:multiphysicsL10:60}
x^{k + 1} = x^k + f(x^k)

If $$f(x^k) = 0$$ the iterate no longer changes, so the iteration has converged to $$x^k = x^\conj$$.

### Convergence analysis

Write the iteration equations at a sample point and the solution as

\label{eqn:multiphysicsL10:80}
x^{k + 1} = x^k + f(x^k)

\label{eqn:multiphysicsL10:100}
x^{\conj} = x^\conj +
\underbrace{
f(x^\conj)
}_{=0}

Taking the difference we have

\label{eqn:multiphysicsL10:120}
x^{k+1} - x^\conj = x^k - x^\conj + \lr{ f(x^k) - f(x^\conj) }.

The last term can be quantified using the mean value theorem \ref{thm:multiphysicsL10:140}, giving

\label{eqn:multiphysicsL10:140}
x^{k+1} - x^\conj
= x^k - x^\conj +
\evalbar{\PD{x}{f}}{\tilde{x}} \lr{ x^k - x^\conj }
=
\lr{ x^k - x^\conj }
\lr{
1 + \evalbar{\PD{x}{f}}{\tilde{x}} }.

The absolute value is thus

\label{eqn:multiphysicsL10:160}
\Abs{x^{k+1} - x^\conj } =
\Abs{ x^k - x^\conj }
\Abs{
1 + \evalbar{\PD{x}{f}}{\tilde{x}} }.

We have convergence provided $$\Abs{ 1 + \evalbar{\PD{x}{f}}{\tilde{x}} } < 1$$ throughout the region over which we iterate. Whether that holds can depend strongly on the initial guess.

Stated more accurately we have convergence provided

\label{eqn:multiphysicsL10:180}
\Abs{
1 + \evalbar{\PD{x}{f}}{\tilde{x}} }
\le \gamma < 1
\qquad \forall \tilde{x} \in [ x^\conj - \delta, x^\conj + \delta ],

and $$\Abs{x^0 - x^\conj } < \delta$$. This is illustrated in fig. 3.

fig. 3. Convergence region

The convergence region can be difficult to determine in practice.

This method has some problems:

• Convergence is only linear.
• $$x$$ and $$f(x)$$ are not in the same units (and are potentially of different orders of magnitude). For example, $$x$$ could be a voltage and $$f(x)$$ could be a circuit current.
• (more on slides)
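
The iteration $$x^{k+1} = x^k + f(x^k)$$ analyzed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `richardson`, the tolerance, and the test function $$f(x) = \cos x - x$$ are not from the lecture. That function was chosen because $$\Abs{1 + f'(x)} = \Abs{\sin x} < 1$$ near its root, so the convergence condition holds.

```python
import math


def richardson(f, x0, tol=1e-10, max_iter=200):
    """Iterate x <- x + f(x) until |f(x)| < tol (hypothetical helper)."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, k  # k is the number of updates that were applied
        x = x + fx
    raise RuntimeError("no convergence within max_iter iterations")


# Assumed example: f(x) = cos(x) - x, starting guess x0 = 1.
root, iterations = richardson(lambda x: math.cos(x) - x, 1.0)
print(root, iterations)
```

Because the error only shrinks by a roughly constant factor per step, this takes tens of iterations to reach the tolerance, which is the linear convergence listed as the first problem above.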

Examples where we may want to use this:

• SPICE Gummel-Poon transistor model. Lots of diodes, …
• MOSFET model (30 page spec, lots of parameters).

## Newton’s method

The core idea of this method is sketched in fig. 4. To find the intersection with the x-axis, we follow the slope closer to the intersection.

fig. 4. Newton’s method

To do this, we expand $$f(x)$$ in Taylor series to first order around $$x^k$$, and then solve for $$f(x) = 0$$ in that approximation

\label{eqn:multiphysicsL10:200}
f( x^{k+1} ) \approx f( x^k ) + \evalbar{ \PD{x}{f} }{x^k} \lr{ x^{k+1} - x^k } = 0.

This gives

\label{eqn:multiphysicsL10:220}
\boxed{x^{k+1} = x^k - \frac{f( x^k )}{\evalbar{ \PD{x}{f} }{x^k}}}

### Example: Newton’s method

For the solution of

\label{eqn:multiphysicsL10:260}
f(x) = x^3 - 2,

it was found (table 1.1)

The error tails off fast as illustrated roughly in fig. 6.

fig. 6. Error by iteration
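
The boxed Newton update can be tried on $$f(x) = x^3 - 2$$ with a short Python sketch. The starting guess $$x^0 = 1$$, the tolerance, and the helper name `newton` are assumptions made for illustration, so the iterates printed here need not match the table from the lecture.

```python
def newton(f, dfdx, x0, tol=1e-12, max_iter=50):
    """Newton iteration x <- x - f(x)/f'(x); returns the root and all iterates."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        slope = dfdx(x)
        if slope == 0.0:
            raise ZeroDivisionError("derivative vanished; Newton step undefined")
        x_next = x - f(x) / slope
        history.append(x_next)
        # Stop only when both the step and the residual are small.
        if abs(x_next - x) < tol and abs(f(x_next)) < tol:
            return x_next, history
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


root, history = newton(lambda x: x**3 - 2, lambda x: 3 * x**2, 1.0)
exact = 2 ** (1 / 3)
for k, x in enumerate(history):
    print(k, x, abs(x - exact))  # iteration, iterate, error
```

Printing the error column shows the number of correct digits roughly doubling per step once the iterate is close to the root.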

### Convergence analysis

Each Newton iterate is defined by the condition

\label{eqn:multiphysicsL10:280}
0 = f(x^k) + \evalbar{ \PD{x}{f} }{x^k} \lr{ x^{k+1} - x^k }.

The Taylor series for $$f$$ around $$x^k$$, using a mean value formulation for the second order term, is

\label{eqn:multiphysicsL10:300}
f(x)
= f(x^k)
+ \evalbar{ \PD{x}{f} }{x^k} \lr{ x - x^k }
+ \inv{2} \evalbar{ \PDSq{x}{f} }{\tilde{x} \in [x^\conj, x^k]} \lr{ x - x^k }^2.

Evaluating at $$x^\conj$$ we have

\label{eqn:multiphysicsL10:320}
0 = f(x^k)
+ \evalbar{ \PD{x}{f} }{x^k} \lr{ x^\conj - x^k }
+ \inv{2} \evalbar{ \PDSq{x}{f} }{\tilde{x} \in [x^\conj, x^k]} \lr{ x^\conj - x^k }^2,

and subtracting this from \ref{eqn:multiphysicsL10:280} we are left with

\label{eqn:multiphysicsL10:340}
0 = \evalbar{\PD{x}{f}}{x^k} \lr{ x^{k+1} - x^k - x^\conj + x^k }
- \inv{2} \evalbar{\PDSq{x}{f}}{\tilde{x}} \lr{ x^\conj - x^k }^2.

Solving for the difference from the solution, the error is

\label{eqn:multiphysicsL10:360}
x^{k+1} - x^\conj
= \inv{2} \lr{ \evalbar{\PD{x}{f}}{x^k} }^{-1} \evalbar{\PDSq{x}{f}}{\tilde{x}} \lr{ x^k - x^\conj }^2,

or in absolute value

\label{eqn:multiphysicsL10:380}
\Abs{ x^{k+1} - x^\conj }
= \inv{2} \Abs{ \evalbar{\PD{x}{f}}{x^k} }^{-1} \Abs{ \evalbar{\PDSq{x}{f}}{\tilde{x}} } \Abs{ x^k - x^\conj }^2.

We see that convergence is quadratic in the error from the previous iteration. We will have trouble if the derivative becomes small at any point in the iteration region. For example, in fig. 7 we could easily end up in the zero derivative region.

fig. 7. Newton’s method with small derivative region

### When to stop iteration

One way to check is to see whether the difference between successive iterates is small

\label{eqn:multiphysicsL10:420}
\Norm{ x^{k+1} - x^k } < \epsilon_{\Delta x}.

However, when the function has a very steep slope, this may not be sufficient unless we also substitute our trial solution and check that $$f(x^{k+1})$$ matches zero as closely as desired.

Alternatively, if the slope is shallow as in fig. 7, then checking for just $$\Abs{ f(x^{k+1}) } < \epsilon_f$$ may also mean we are off target.

Finally, we may also need a relative error check to avoid false convergence. In fig. 10, we may have both

fig. 10. Possible relative error difference required

\label{eqn:multiphysicsL10:440}
\Abs{x^{k+1} - x^k} < \epsilon_{\Delta x}

\label{eqn:multiphysicsL10:460}
\Abs{f(x^{k+1}) } < \epsilon_{f},

however, we may also want a small relative difference

\label{eqn:multiphysicsL10:480}
\frac{\Abs{x^{k+1} - x^k}}{\Abs{x^k}}
< \epsilon_{f,r}.

This can become problematic in real world engineering examples such as the diode of fig. 11, where we have shallow regions and fast growing or dropping regions.

fig. 11. Diode current curve
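
The three stopping checks discussed in this section (absolute step, residual, and relative step) could be combined as in the following Python sketch. The function name `converged`, the default tolerances, and the fallback used when $$x^k = 0$$ are assumptions for illustration, not something prescribed in the lecture.

```python
def converged(x_old, x_new, f_new, eps_dx=1e-9, eps_f=1e-9, eps_rel=1e-9):
    """Return True only if all three stopping criteria hold (hypothetical helper)."""
    step = abs(x_new - x_old)
    small_step = step < eps_dx            # |x^{k+1} - x^k| < eps_dx
    small_residual = abs(f_new) < eps_f   # |f(x^{k+1})| < eps_f
    scale = abs(x_old)
    # Assumption: when x^k is exactly zero the relative test is undefined,
    # so only the absolute tests are applied in that case.
    small_relative = scale == 0.0 or step / scale < eps_rel
    return small_step and small_residual and small_relative


# Steep slope: the step is tiny but the residual is not, so this is rejected.
print(converged(1.0, 1.0 + 1e-12, 1e-3))
# Tiny solution scale: both absolute tests pass, the relative test fails.
print(converged(1e-12, 2e-12, 1e-12))
```

Requiring all three checks is the conservative choice; a real solver would usually scale the tolerances to the units of $$x$$ and $$f(x)$$.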

## Theorems

### Mean value theorem

For a continuous and differentiable function $$f(x)$$, the difference can be expressed in terms of the derivative at an intermediate point

\begin{equation*}
f(x_2) - f(x_1)
= \evalbar{ \PD{x}{f} }{\tilde{x}} \lr{ x_2 - x_1 }
\end{equation*}

where $$\tilde{x} \in [x_1, x_2]$$.

This is illustrated (roughly) in fig. 2.

fig. 2. Mean value theorem illustrated


## Fill ins

The problem of fill-ins in LU computations arises at zero positions where a non-zero of the pivot row crosses a non-zero of the pivot column.

Rows and columns can be permuted to deal with these. Here is an ad-hoc permutation of rows and columns that will result in fewer fill-ins.

\label{eqn:multiphysicsL7:180}
\begin{aligned}
&
\begin{bmatrix}
a & b & c & 0 \\
d & e & 0 & 0 \\
0 & f & g & 0 \\
0 & h & 0 & i \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
x_4
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
a & c & 0 & b \\
d & 0 & 0 & e \\
0 & g & 0 & f \\
0 & 0 & i & h \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_3 \\
x_4 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
0 & a & c & b \\
0 & d & 0 & e \\
0 & 0 & g & f \\
i & 0 & 0 & h \\
\end{bmatrix}
\begin{bmatrix}
x_4 \\
x_1 \\
x_3 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
i & 0 & 0 & h \\
0 & a & c & b \\
0 & d & 0 & e \\
0 & 0 & g & f \\
\end{bmatrix}
\begin{bmatrix}
x_4 \\
x_1 \\
x_3 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_4 \\
b_1 \\
b_2 \\
b_3 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
i & 0 & 0 & h \\
0 & c & a & b \\
0 & 0 & d & e \\
0 & g & 0 & f \\
\end{bmatrix}
\begin{bmatrix}
x_4 \\
x_3 \\
x_1 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_4 \\
b_1 \\
b_2 \\
b_3 \\
\end{bmatrix} \\
\end{aligned}

## Markowitz product

To facilitate such permutations, the Markowitz product is used to estimate the amount of fill-in that a candidate pivot would require.

### Definition: Markowitz product

\begin{equation*}
\begin{aligned}
\text{Markowitz product} =
&\lr{\text{Non zeros in unfactored part of row} - 1} \times \\
&\lr{\text{Non zeros in unfactored part of column} - 1}
\end{aligned}
\end{equation*}

In [1] it is stated “A still simpler alternative, which seems adequate generally, is to choose the pivot which minimizes the number of coefficients modified at each step (excluding those which are eliminated at the particular step). This is equivalent to choosing the non-zero element with minimum $$(\rho_i - 1)(\sigma_j - 1)$$.”

Note that this product is applied only to $$i j$$ positions that are non-zero, something not explicitly mentioned in the slides, nor in other locations like [2].

### Example: Markowitz product

For this matrix
\label{eqn:multiphysicsL7:220}
\begin{bmatrix}
a & b & c & 0 \\
d & e & 0 & 0 \\
0 & f & g & 0 \\
0 & h & 0 & i \\
\end{bmatrix},

the Markowitz products are

\label{eqn:multiphysicsL7:280}
\begin{bmatrix}
2 & 6 & 2 & \\
1 & 3 & & \\
& 3 & 1 & \\
& 3 & & 0 \\
\end{bmatrix}.
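
The definition can be checked mechanically. The Python sketch below computes $$(\rho_i - 1)(\sigma_j - 1)$$ for every non-zero position of a sparsity pattern, where $$\rho_i$$ and $$\sigma_j$$ are the row and column non-zero counts. The function name `markowitz_products` and the 0/1 encoding of the pattern are assumptions for illustration; zero positions are reported as `None` since the product is not applied there.

```python
def markowitz_products(pattern):
    """Markowitz product for each non-zero entry of a 0/1 sparsity pattern."""
    n_rows, n_cols = len(pattern), len(pattern[0])
    row_counts = [sum(1 for v in row if v) for row in pattern]
    col_counts = [
        sum(1 for r in range(n_rows) if pattern[r][c]) for c in range(n_cols)
    ]
    return [
        [
            (row_counts[r] - 1) * (col_counts[c] - 1) if pattern[r][c] else None
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Sparsity pattern of the example matrix (1 marks a non-zero entry).
pattern = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
products = markowitz_products(pattern)
for row in products:
    print(row)
```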

## Markowitz reordering

The Markowitz reordering procedure (copied directly from the slides) is

• For $$i = 1$$ to $$n$$:
  • Find the diagonal $$j \ge i$$ with the minimum Markowitz product.
  • Swap rows $$j \leftrightarrow i$$ and columns $$j \leftrightarrow i$$.
  • Factor the new row $$i$$ and update the Markowitz products.
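
The procedure above can be sketched symbolically, tracking only the sparsity pattern rather than numerical values. In this Python sketch the function name `markowitz_reorder`, the tie-breaking rule (first minimum found), and the decision to skip zero diagonal entries are all assumptions for illustration. It returns the chosen pivot order (as original row/column indices, zero-based) and the number of fill-ins created.

```python
def markowitz_reorder(pattern):
    """Symbolic diagonal-pivot Markowitz reordering of a 0/1 sparsity pattern."""
    n = len(pattern)
    a = [[bool(v) for v in row] for row in pattern]
    order = list(range(n))
    fill_ins = 0
    for i in range(n):
        # Find the non-zero diagonal j >= i with the minimum Markowitz product,
        # counting non-zeros only in the unfactored submatrix.
        best, best_product = None, None
        for j in range(i, n):
            if not a[j][j]:
                continue
            row_nz = sum(a[j][c] for c in range(i, n))
            col_nz = sum(a[r][j] for r in range(i, n))
            product = (row_nz - 1) * (col_nz - 1)
            if best is None or product < best_product:
                best, best_product = j, product
        if best is None:
            raise ValueError("no non-zero diagonal pivot available")
        # Swap rows and columns best <-> i.
        a[i], a[best] = a[best], a[i]
        for row in a:
            row[i], row[best] = row[best], row[i]
        order[i], order[best] = order[best], order[i]
        # Symbolic elimination: a zero becomes non-zero where the pivot row
        # and the pivot column both have non-zeros.
        for r in range(i + 1, n):
            if a[r][i]:
                for c in range(i + 1, n):
                    if a[i][c] and not a[r][c]:
                        a[r][c] = True
                        fill_ins += 1
    return order, fill_ins


pattern = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
order, fill_ins = markowitz_reorder(pattern)
print(order, fill_ins)
```

This sketch ignores numerical stability entirely, so a production sparse solver would combine it with a pivot magnitude threshold.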

### Example: Markowitz reordering

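Looking at the Markowitz products \ref{eqn:multiphysicsL7:280}, the minimum lies at position $$4,4$$. Swapping rows $$1, 4$$ and cycling column $$4$$ to the front moves that pivot to position $$1,1$$, giving the modified matrix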

\label{eqn:multiphysicsL7:300}
\begin{bmatrix}
i & 0 & h & 0 \\
0 & d & e & 0 \\
0 & 0 & f & g \\
0 & a & b & c \\
\end{bmatrix}

In this case, this reordering has completely avoided any requirement to do any actual Gaussian operations for this first stage reduction.

Presuming that the Markowitz products for the remaining 3×3 submatrix are only computed from that submatrix, the new products are
\label{eqn:multiphysicsL7:320}
\begin{bmatrix}
& & & \\
& 1 & 2 & \\
& & 2 & 1 \\
& 2 & 4 & 2 \\
\end{bmatrix}.

We have a minimal product in the pivot position, which happens to already lie on the diagonal. Note that it is not necessarily the best for numerical stability. It appears the off diagonal Markowitz products are not really of interest since the reordering algorithm swaps both rows and columns.

## Graph representation

It is possible to interpret the Markowitz products on the diagonal as connectivity of a graph that represents the interconnections of the nodes. Consider the circuit of fig. 2 as an example.

fig. 2. Simple circuit

The system equations for this circuit are of the form
\label{eqn:multiphysicsL7:340}
\begin{bmatrix}
x & x & x & 0 & 1 \\
x & x & x & 0 & 0 \\
x & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
-1 & 0 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_1 \\
V_2 \\
V_3 \\
V_4 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix}.

The Markowitz products along the diagonal are
\label{eqn:multiphysicsL7:360}
\begin{aligned}
M_{11} &= 9 \\
M_{22} &= 4 \\
M_{33} &= 9 \\
M_{44} &= 4 \\
M_{55} &= 4 \\
\end{aligned}

Compare these to the number of interconnections of each node in the graph of fig. 3, which represents the nodes of this circuit. We see that in each case the product is the square of the number of node interconnects.

fig. 3. Graph representation

Here a 5th node was introduced for the current $$i$$ between nodes $$4$$ and $$1$$. Observe that the Markowitz product of this node was counted as the number of non-zero values excluding the $$5,5$$ matrix position. However, that doesn’t matter too much since a Markowitz swap of row/column 1 with row/column 5 would put a zero in the $$1,1$$ position of the matrix, which is not desirable. We have to restrict the permutations of zero diagonal positions to pivots for numerical stability, or use a more advanced zero fill avoidance algorithm.

The minimum diagonal Markowitz products are in positions 2 or 4, with respective Markowitz reorderings of the form

\label{eqn:multiphysicsL7:380}
\begin{bmatrix}
x & x & x & 0 & 0 \\
x & x & x & 0 & 1 \\
x & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
0 & -1 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_2 \\
V_1 \\
V_3 \\
V_4 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix},

and
\label{eqn:multiphysicsL7:400}
\begin{bmatrix}
x & 0 & 0 & x & -1 \\
0 & x & x & x & 1 \\
0 & x & x & x & 0 \\
x & x & x & x & 0 \\
1 & -1 & 0 & 0 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_4 \\
V_1 \\
V_2 \\
V_3 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix}.

The original system had 7 zeros that could potentially be filled in the remaining $$4 \times 4$$ submatrix. After a first round of Gaussian elimination, our system matrices have the respective forms

\label{eqn:multiphysicsL7:420}
\begin{bmatrix}
x & x & x & 0 & 0 \\
0 & x & x & 0 & 1 \\
0 & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
0 & -1 & 0 & 1 & 0 \\
\end{bmatrix}

\label{eqn:multiphysicsL7:440}
\begin{bmatrix}
x & 0 & 0 & x & -1 \\
0 & x & x & x & 1 \\
0 & x & x & x & 0 \\
0 & x & x & x & x \\
0 & -1 & 0 & x & x \\
\end{bmatrix}

The remaining $$4 \times 4$$ submatrices have interconnect graphs sketched in fig. 4.

fig. 4. Graphs after one round of Gaussian elimination

From a graph point of view, we want to eliminate the least connected nodes first, since those have the smallest Markowitz products. This can be driven by the Markowitz products along the diagonal or directly with graph methods.

## Summary of factorization costs

### LU (dense)

• cost: $$O(n^3)$$
• cost depends only on size

### LU (sparse)

• cost: Diagonal and tridiagonal are $$O(n)$$, but we can have up to $$O(n^3)$$ depending on sparsity and the method of dealing with the sparsity.
• cost depends on size and sparsity

Computation can be affordable up to a few million elements.

### Iterative methods

Can be cheap if done right. Convergence requires careful preconditioning.

## Iterative methods

Suppose that we have an initial guess $$\Bx_0$$. Iterative methods are generally of the form

DO
$$\Br = \Bb – M \Bx_i$$
UNTIL $$\Norm{\Br} < \epsilon$$

The difference $$\Br$$ is called the residual. For as long as it is bigger than desired, continue improving the estimate $$\Bx_i$$.

The matrix vector product $$M \Bx_i$$, if dense, is of $$O(n^2)$$. Suppose, for example, that the method converges in ten iterations. If the matrix is dense, we then have $$10 \, O(n^2)$$ performance. If sparse, this can be worse than just direct computation.

The gradient method described next is a method for iterative solution of the equation $$M \Bx = \Bb$$.

This requires a symmetric positive definite matrix $$M = M^\T$$, with $$M > 0$$.

We introduce an energy function

\label{eqn:multiphysicsL7:60}
\Psi(\By) \equiv \inv{2} \By^\T M \By - \By^\T \Bb

For a two variable system this is illustrated in fig. 1.

fig. 1. Positive definite energy function

### Theorem: Energy function minimum

The energy function \ref{eqn:multiphysicsL7:60} has a minimum at

\label{eqn:multiphysicsL7:80}
\By = M^{-1} \Bb = \Bx.

To prove this, consider the coordinate representation

\label{eqn:multiphysicsL7:480}
\Psi = \inv{2} y_a M_{ab} y_b - y_b b_b,

for which the derivatives are
\label{eqn:multiphysicsL7:500}
\PD{y_i}{\Psi} =
\inv{2} M_{ib} y_b
+
\inv{2} y_a M_{ai}
- b_i
=
\lr{ M \By - \Bb }_i.

The last operation above was possible because $$M = M^\T$$. Setting all of these equal to zero, and rewriting this as a matrix relation we have

\label{eqn:multiphysicsL7:520}
M \By = \Bb,

as asserted.

This is called the gradient method because the negative gradient of the energy function points along the direction of steepest descent towards the minimum, if it exists.

The method is

\label{eqn:multiphysicsL7:100}
\Bx^{(k+1)} = \Bx^{(k)} +
\underbrace{ \alpha_k }_{\text{step size}}
\underbrace{ \Bd^{(k)} }_{\text{direction}},

where the direction is

\label{eqn:multiphysicsL7:120}
\Bd^{(k)} = - \spacegrad \Psi = \Bb - M \Bx^{(k)} = \Br^{(k)}.

### Optimal step size

For the minimization of $$\Psi \lr{ \Bx^{(k+1)} }$$, we note that

\label{eqn:multiphysicsL7:140}
\Psi \lr{ \Bx^{(k+1)} }
= \Psi\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }
=
\inv{2}
\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }^\T
M
\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }
-
\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }^\T \Bb.

If we take the derivative of both sides with respect to $$\alpha_k$$ to find the minimum, we have

\label{eqn:multiphysicsL7:540}
0 =
\inv{2}
\lr{ \Bd^{(k)} }^\T
M
\Bx^{(k)}
+
\inv{2}
\lr{ \Bx^{(k)} }^\T
M
\Bd^{(k)}
+
\alpha_k \lr{ \Bd^{(k)} }^\T
M
\Bd^{(k)}
-
\lr{ \Bd^{(k)} }^\T \Bb.

Because $$M$$ is symmetric, this is

\label{eqn:multiphysicsL7:560}
\alpha_k \lr{ \Bd^{(k)} }^\T
M
\Bd^{(k)}
=
\lr{ \Bd^{(k)} }^\T \lr{ \Bb - M \Bx^{(k)}}
=
\lr{ \Bd^{(k)} }^\T \Br^{(k)},

or

\label{eqn:multiphysicsL7:160}
\alpha_k
= \frac{
\lr{\Br^{(k)}}^\T
\Br^{(k)}
}{
\lr{\Br^{(k)}}^\T
M
\Br^{(k)}
}

We will see that this method is not optimal when we pick one direction and keep going down that path.
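
The gradient method with the optimal step size derived above can be sketched in plain Python. The helper names (`mat_vec`, `dot`, `gradient_method`), the tolerance, and the $$2 \times 2$$ test system are assumptions for illustration. The loop computes the residual $$\Br = \Bb - M \Bx$$, stops when its norm is small, and otherwise steps by $$\alpha = \Br^\T \Br / \Br^\T M \Br$$ along $$\Br$$.

```python
def mat_vec(M, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gradient_method(M, b, x0, eps=1e-10, max_iter=10_000):
    """Steepest descent for M x = b; M is assumed symmetric positive definite."""
    x = list(x0)
    for k in range(max_iter):
        r = [b_i - mx_i for b_i, mx_i in zip(b, mat_vec(M, x))]  # residual
        if dot(r, r) ** 0.5 < eps:
            return x, k
        alpha = dot(r, r) / dot(r, mat_vec(M, r))  # optimal step size
        x = [x_i + alpha * r_i for x_i, r_i in zip(x, r)]
    raise RuntimeError("no convergence within max_iter iterations")


# Assumed test system: symmetric positive definite, exact solution (1/11, 7/11).
M = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = gradient_method(M, b, [0.0, 0.0])
print(x, iterations)
```

Each iteration costs two matrix-vector products as written; caching $$M \Br$$ and updating the residual incrementally would reduce that to one.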

## Definitions and theorems

### Definition: Positive (negative) definite

A matrix $$M$$ is positive (negative) definite, denoted $$M > 0 (<0)$$ if $$\By^\T M \By > 0 (<0), \quad \forall \By \ne 0$$.

If a matrix is neither positive, nor negative definite, it is called indefinite.

### Theorem: Positive (negative) definite

A symmetric matrix $$M > 0 (<0)$$ iff $$\lambda_i > 0 (<0)$$ for all eigenvalues $$\lambda_i$$, or is indefinite iff its eigenvalues $$\lambda_i$$ are of mixed sign.
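
For a symmetric $$2 \times 2$$ matrix the eigenvalues have a closed form, so the theorem can be illustrated without a linear algebra library. In the Python sketch below, the function name `classify_symmetric_2x2` and the returned labels are assumptions for illustration; the matrix is taken to be $$\begin{bmatrix} a & b \\ b & d \end{bmatrix}$$.

```python
import math


def classify_symmetric_2x2(a, b, d):
    """Classify [[a, b], [b, d]] by the signs of its two real eigenvalues."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    lam_min, lam_max = mean - radius, mean + radius
    if lam_min > 0:
        label = "positive definite"
    elif lam_max < 0:
        label = "negative definite"
    else:
        label = "not definite"  # mixed signs, or a zero eigenvalue
    return label, (lam_min, lam_max)


print(classify_symmetric_2x2(4.0, 1.0, 3.0))
print(classify_symmetric_2x2(1.0, 2.0, 1.0))
```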

# References

[1] Harry M Markowitz. The elimination form of the inverse and its application to linear programming. Management Science, 3(3):255–269, 1957.

[2] Timothy Vismor. Pivoting To Preserve Sparsity, 2012. URL https://vismor.com/documents/network_analysis/matrix_algorithms/S8.SS3.php. [Online; accessed 15-Oct-2014].

## Differences between Locks, Mutexes, and Semaphores

James writes to me:

I’m having trouble finding concrete definitions for Locks, Mutexes, and Semaphores

What I do know is:

• Semaphores keep track of how many processes have gained access (“P”) to a resource and block (suspend and send to priority queue) processes once it’s maxed out. They have no concept of ownership and anyone can call unlock (“V”)
• Binary semaphores are the same as above except their max is 1
• Mutexes are the same as a binary semaphore except they have a concept of ownership
• Spinlocks are the same as mutexes except they do not perform blocking and instead are polled until unlocked

What I don’t know:

Where does the term “lock” fit in here?

• …Is it a broad category including all of these?
• …Or is it the equivalent of a mutex but to refer to threads within the same process, as opposed to processes within the same system?
• …Or is it something else?

What the hell is a lock? I’ve seen the above things referred to as one on some websites but then I’ve seen it listed as a separate thing.

These are all very good definitions. Before getting to the nebulous term lock, I’d refine the definitions of mutex and spinlock slightly. I view both of these as constructs that provide mutually exclusive access to shared memory among concurrent threads or processes.

A mutex has a mechanism to ensure that only one “holder” of the mutex resource exists at any point of time, but to be useful, must also include a mechanism to order access of memory protected by that mutex with the memory of the mutex resource itself.  An implementation of a mutex (or spinlock) will have an acquire operation that does

• an exclusive atomic operation
• “acquire” (or stronger) memory barrier
• if the atomic operation fails, there may be code to queue up on an operating system resource (such as a semaphore, or futex)

with the release operation doing:

• if there are waiters queued up, there may be code to post one or more of these waiters before continuing
• a “release” memory barrier instruction
• a store (or atomic) instruction to allow subsequent acquisition.

The memory barriers here are key. In some cases, appropriate memory barriers may be built into the atomic operations. For example, the Intel “LOCK XCHG” on a memory address is atomic and has barrier semantics suitable for an acquire operation. However, on PowerPC, in addition to an atomic operation used to acquire, an instruction like “isync” will be required to ensure that subsequent memory operations aren’t initiated before the acquire atomic operation has completed. Similarly, on PowerPC, you will need an instruction like “lwsync” on mutex release to ensure that memory operations initiated while the mutex was “held” complete before the store instruction that releases the mutex.

If you did just the following thinking that you have implemented your own mutex:

if ( pSharedMem->mutex.atomicword->atomic_bit_or(1) == successful )
{
    int x = pSharedMem->someInteger ;

    pSharedMem->someValue = v ;

    pSharedMem->mutex.atomicword = 0 ;
}


you would be very disappointed should the underlying hardware execute this as one of

pSharedMem->someValue = v ;

if ( pSharedMem->mutex.atomicword->atomic_bit_or(1) == successful )
{
    int x = pSharedMem->someInteger ;

    pSharedMem->mutex.atomicword = 0 ;
}


or

int x = pSharedMem->someInteger ;

if ( pSharedMem->mutex.atomicword->atomic_bit_or(1) == successful )
{
    pSharedMem->someValue = v ;

    pSharedMem->mutex.atomicword = 0 ;
}


Both of these can occur on platforms that allow out of order execution of memory operations (PowerPC, SPARC, ia64, …) if appropriate memory barrier instructions are not used as part of the mutex implementation.

You would be similarly disappointed with a mutex release operation if it allowed one of the following permutations:

if ( pSharedMem->mutex.atomicword->atomic_bit_or(1) == successful )
{
    int x ;

    pSharedMem->someValue = v ;

    pSharedMem->mutex.atomicword = 0 ;

    x = pSharedMem->someInteger ;
}


or

if ( pSharedMem->mutex.atomicword->atomic_bit_or(1) == successful )
{
    int x = pSharedMem->someInteger ;

    pSharedMem->mutex.atomicword = 0 ;

    pSharedMem->someValue = v ;
}


Both of these permutations can occur on out of order platforms if no appropriate memory barrier instructions are used. This surprises many people who attempt to use atomic operations to avoid contention bottlenecks imposed by a mutex that has significant traffic. One also has to be careful to make sure that the compiler does not move the instructions out of the “mutual exclusion region”. Generally atomic operations (or the plain store above, which could also be implemented with an atomic bitand if required) can be implemented in a way that signals to the compiler that it should not move loads and stores around the atomic.

Now, how about this term “lock”? The Wikipedia definition of lock is fairly consistent with most uses of mutex. However, it has been my experience that this is defined in whatever way suits the writer. I have not done an extensive survey, and may also be biased, since db2 internals uses “lock” to refer to a serialization mechanism that has a rich set of acquisition modes, as well as concepts of fairness that we don’t have in any of our mutex implementations.

It is common to see lock used in combination.  Examples are spin-locks and reader-writer locks.  I prefer reader writer mutex.  Adding to the confusion, the internal db2 implementation of a reader-writer mutex is referred to as a shared-latch (and we call an exclusive mutex a plain “latch”).  Perhaps this was to distinguish our latch from the already used “lock” term, but before mutex became popular for the same.

My nomenclature preferences are:

• use mutex for the common sorts of serialization that are required in shared memory multi-thread or multi-process systems (when talking to db2 developers about generic serialization mechanisms)
• use spinlock when referring to a mutex that does not use any sort of queuing mechanism for resolving conflict.
• use semaphore for operating system counting mechanisms (like the unix semop).
• use lock or latch when talking to db2 developers where these have well defined meaning.
• use shared latch when talking to db2 developers when I mean reader-writer mutex.
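
The counting and ownership distinctions in James's definitions can be observed with Python's standard `threading` module. This is only a sketch of those two properties: it says nothing about memory barriers, and `threading.RLock` stands in here for "a mutex with ownership" as an assumption for illustration.

```python
import threading

# A counting semaphore with two slots: the third non-blocking acquire fails.
slots = threading.Semaphore(2)
first = slots.acquire(blocking=False)
second = slots.acquire(blocking=False)
third = slots.acquire(blocking=False)

# No ownership: a thread that never acquired the semaphore may release it.
releaser = threading.Thread(target=slots.release)
releaser.start()
releaser.join()
fourth = slots.acquire(blocking=False)

# Ownership: an RLock held by this thread cannot be released by another one.
owned = threading.RLock()
owned.acquire()
errors = []


def try_release():
    try:
        owned.release()
    except RuntimeError as exc:
        errors.append(exc)


other = threading.Thread(target=try_release)
other.start()
other.join()
owned.release()  # the owning thread releases normally

print(first, second, third, fourth, len(errors))
```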

I’d be curious what names other products’ internal mutex implementations use.