
ECE1254H Modeling of Multiphysics Systems. Lecture 12: Struts and Joints, Node branch formulation. Taught by Prof. Piero Triverio

October 27, 2014


Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

Struts and Joints, Node branch formulation

Let’s consider the simple strut system of fig. 1 again.

lecture12Fig1

fig. 1. Simple strut system

 

Our unknowns are

  1. Forces. At each of the points we have a force with two components
    \begin{equation}\label{eqn:multiphysicsL12:20}
    \Bf_A = \lr{ f_{A,x}, f_{A,y} }
    \end{equation}
    We construct a total force vector
    \begin{equation}\label{eqn:multiphysicsL12:40}
    \Bf =
    \begin{bmatrix}
    f_{A,x} \\
    f_{A,y} \\
    f_{B,x} \\
    f_{B,y} \\
    \vdots
    \end{bmatrix}
    \end{equation}
  2. Positions of the joints
    \begin{equation}\label{eqn:multiphysicsL12:60}
    \Br =
    \begin{bmatrix}
    x_1 \\
    y_1 \\
    x_2 \\
    y_2 \\
    \vdots
    \end{bmatrix}
    \end{equation}

Our given variables are

  1. The load force \( \Bf_L \).
  2. The joint positions at rest.
  3. The parameters of the struts.

Conservation laws

The conservation laws are

\begin{equation}\label{eqn:multiphysicsL12:80}
\Bf_A + \Bf_B + \Bf_C = 0
\end{equation}
\begin{equation}\label{eqn:multiphysicsL12:100}
-\Bf_C + \Bf_D + \Bf_L = 0
\end{equation}

which we can put in matrix form

\begin{equation}\label{eqn:multiphysicsL12:120}
\begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
f_{A,x} \\
f_{A,y} \\
f_{B,x} \\
f_{B,y} \\
f_{C,x} \\
f_{C,y} \\
f_{D,x} \\
f_{D,y}
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
-f_{L,x} \\
-f_{L,y}
\end{bmatrix}
\end{equation}

Here the coefficient matrix is called the incidence matrix \( A \). Writing \( \Bf_L \) for the right hand side vector, which collects the load force contributions, we have

\begin{equation}\label{eqn:multiphysicsL12:160}
A \Bf = \Bf_L.
\end{equation}
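As a rough illustration of how such an incidence matrix could be assembled automatically, the sketch below expands a joint/strut sign table into \( 2 \times 2 \) blocks, one per force component. The table layout and the names `joints`, `struts` and `incidence_matrix` are assumptions made for this example, not something from the lecture.

```python
# Hypothetical connectivity table for the two free joints of fig. 1:
# +1 if the strut force enters the joint's balance with a plus sign, -1 otherwise.
struts = ["A", "B", "C", "D"]
joints = [
    {"A": 1, "B": 1, "C": 1},   # f_A + f_B + f_C = 0
    {"C": -1, "D": 1},          # -f_C + f_D + f_L = 0
]

def incidence_matrix(joints, struts):
    """Expand each scalar incidence entry into x and y component rows."""
    rows = []
    for joint in joints:
        for comp in range(2):            # 0 -> x equation, 1 -> y equation
            row = []
            for s in struts:
                sign = joint.get(s, 0)
                row += [sign if comp == 0 else 0, sign if comp == 1 else 0]
            rows.append(row)
    return rows

A = incidence_matrix(joints, struts)
for row in A:
    print(row)
```

Each joint contributes two rows (the x and y balances), and each strut contributes two columns, which is the pattern implied by the two conservation equations.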

Constitutive laws

Consider a pair of nodes as in fig. 2.

lecture12Fig2

fig. 2. Strut spanning nodes

 

Each strut has an equation relating its reaction forces to the positions of the nodes that it spans

\begin{equation}\label{eqn:multiphysicsL12:180}
f_{c,x} = S_x \lr{ x_1 - x_2, y_1 - y_2, p_c }
\end{equation}
\begin{equation}\label{eqn:multiphysicsL12:200}
f_{c,y} = S_y \lr{ x_1 - x_2, y_1 - y_2, p_c },
\end{equation}

where \( p_c \) represents the parameters of the strut. We write

\begin{equation}\label{eqn:multiphysicsL12:220}
\Bf =
\begin{bmatrix}
f_{A,x} \\
f_{A,y} \\
f_{B,x} \\
f_{B,y} \\
\vdots
\end{bmatrix}
=
\begin{bmatrix}
S_x \lr{ x_1 - x_2, y_1 - y_2, p_c } \\
S_y \lr{ x_1 - x_2, y_1 - y_2, p_c } \\
\vdots
\end{bmatrix},
\end{equation}

or

\begin{equation}\label{eqn:multiphysicsL12:260}
\Bf = S(\Br)
\end{equation}

Putting the pieces together

The node branch formulation is

\begin{equation}\label{eqn:multiphysicsL12:280}
\begin{aligned}
A \Bf - \Bf_L &= 0 \\
\Bf - S(\Br) &= 0
\end{aligned}
\end{equation}

We will want to solve this system with the Newton (Jacobian based) methods discussed, and can expect the Jacobian calculation to be potentially expensive. To move to the nodal formulation, we eliminate the forces (the equivalent of currents in this system)

\begin{equation}\label{eqn:multiphysicsL12:320}
A S(\Br) - \Bf_L = 0
\end{equation}

We cannot use this nodal formulation when we have struts that are so stiff that the positions of some of the nodes are fixed, but can work around that as before by introducing an additional unknown for each component of such a strut.

Damped Newton’s method

We want to be able to deal with the oscillation that we can have in examples like that of fig. 3.

lecture12Fig3

fig. 3. Oscillatory Newton’s iteration

 

Large steps can be dangerous, so we modify Newton’s method by scaling each step with a factor \( \alpha^k \). Our algorithm is

  • Guess \( \Bx^0, k = 0 \).
  • REPEAT
  • Compute \( F \) and \( J_F \) at \( \Bx^k \)
  • Solve linear system \( J_F(\Bx^k) \Delta \Bx^k = - F(\Bx^k) \)
  • \( \Bx^{k+1} = \Bx^k + \alpha^k \Delta \Bx^k \)
  • \( k = k + 1 \)
  • UNTIL converged

With \( \alpha^k = 1 \) we have the standard Newton’s method. We want to pick \( \alpha^k \) so that we minimize the residual norm \( \Norm{ F(\Bx^k + \alpha^k \Delta \Bx^k) } \) along the step direction.
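A minimal scalar sketch of the damped iteration follows. It assumes a simple backtracking rule (halve \( \alpha \) until \( \Abs{f} \) decreases) as a stand-in for a true minimization over \( \alpha \), and uses \( \arctan(x) = 0 \) as a hypothetical test problem, for which undamped Newton diverges from a start of \( x^0 = 3 \). The function name and tolerances are illustrative choices.

```python
import math

def damped_newton(f, df, x0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration with a backtracking choice of alpha.

    alpha starts at 1 (plain Newton) and is halved until |f| decreases,
    a simple stand-in for minimizing |f(x + alpha dx)| over alpha.
    """
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, k
        dx = -fx / df(x)                 # full Newton step
        alpha = 1.0
        while abs(f(x + alpha * dx)) >= abs(fx) and alpha > 1e-10:
            alpha *= 0.5                 # damp the step
        x += alpha * dx
    return x, max_iter

# atan(x) = 0, started far enough out that the undamped iteration diverges.
root, iterations = damped_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), 3.0)
print(root, iterations)
```

Halving is only one possible damping strategy; a proper line search over \( \alpha \) would fit the same loop.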

Continuation parameters

Newton’s method converges given a close initial guess. We can generate a sequence of problems where the previous problem generates a good initial guess for the next problem.

An example is a heat conducting bar, with a final heat distribution. We can start the numeric iteration with \( T = 0 \), and gradually increase the temperatures until we achieve the final desired heat distribution.

Suppose that we want to solve

\begin{equation}\label{eqn:multiphysicsL12:340}
F(\Bx) = 0.
\end{equation}

We modify this problem by introducing

\begin{equation}\label{eqn:multiphysicsL12:360}
\tilde{F}(\Bx(\lambda), \lambda) = 0,
\end{equation}

where

  • \( \tilde{F}(\Bx(0), 0) = 0 \) is easy to solve
  • \( \tilde{F}(\Bx(1), 1) = 0 \) is equivalent to \( F(\Bx) = 0 \).
  • (more on slides)

The source load stepping algorithm is

  • Solve \(\tilde{F}(\Bx(0), 0) = 0 \), with \( \Bx(\lambda_{\text{prev}}) = \Bx(0) \)
  • (more on slides)

This can still have problems, for example, when the parameterization is multivalued as in fig. 4.

lecture12Fig4

fig. 4. Multivalued parameterization

 

We can attempt to adjust \( \lambda \) so that we move along the parameterization curve.
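The load stepping idea can be sketched as follows, using the diode and resistor circuit from the nonlinear systems notes as a hypothetical test case. The parameter values `I_S`, `V_T`, `R`, the uniform \( \lambda \) schedule and the function names are all assumptions for this illustration; the algorithm on the slides may differ (for example by adapting the step in \( \lambda \)).

```python
import math

# Hypothetical diode parameters and a series resistor fed by a source v_src.
I_S, V_T, R = 1e-12, 0.025, 10.0

def residual(v, v_src):
    return I_S * (math.exp(v / V_T) - 1.0) - (v_src - v) / R

def d_residual(v):
    return I_S / V_T * math.exp(v / V_T) + 1.0 / R

def newton(v, v_src, tol=1e-12, max_iter=100):
    for _ in range(max_iter):
        dv = -residual(v, v_src) / d_residual(v)
        v += dv
        if abs(dv) < tol:
            return v
    raise RuntimeError("Newton did not converge")

def source_stepping(v_final, steps=10):
    v = 0.0                                  # lambda = 0: zero source, v = 0
    for i in range(1, steps + 1):
        lam = i / steps                      # continuation parameter
        v = newton(v, lam * v_final)         # previous solution seeds the next solve
    return v

v_d = source_stepping(10.0)
print(v_d, residual(v_d, 10.0))
```

Each intermediate problem only has to be solved well enough to provide a starting point for the next value of \( \lambda \).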

ECE1254H Modeling of Multiphysics Systems. Lecture 11: Nonlinear equations. Taught by Prof. Piero Triverio

October 22, 2014



Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

Solution of N nonlinear equations in N unknowns

We’d now like to move from solutions of nonlinear functions in one variable:

\begin{equation}\label{eqn:multiphysicsL11:200}
f(x^\conj) = 0,
\end{equation}

to multivariable systems of the form

\begin{equation}\label{eqn:multiphysicsL11:20}
\begin{aligned}
f_1(x_1, x_2, \cdots, x_N) &= 0 \\
\vdots & \\
f_N(x_1, x_2, \cdots, x_N) &= 0 \\
\end{aligned},
\end{equation}

where our unknowns are
\begin{equation}\label{eqn:multiphysicsL11:40}
\Bx =
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_N \\
\end{bmatrix}.
\end{equation}

Form the vector \( F \)

\begin{equation}\label{eqn:multiphysicsL11:60}
F(\Bx) =
\begin{bmatrix}
f_1(x_1, x_2, \cdots, x_N) \\
\vdots \\
f_N(x_1, x_2, \cdots, x_N) \\
\end{bmatrix},
\end{equation}

so that the equation to solve is

\begin{equation}\label{eqn:multiphysicsL11:80}
\boxed{
F(\Bx) = 0.
}
\end{equation}

The Taylor expansion of \( F \) around point \( \Bx_0 \) is, to first order,

\begin{equation}\label{eqn:multiphysicsL11:100}
F(\Bx) \approx F(\Bx_0) +
\underbrace{ J_F(\Bx_0) }_{\text{Jacobian}}
\lr{ \Bx - \Bx_0},
\end{equation}

where the Jacobian is

\begin{equation}\label{eqn:multiphysicsL11:120}
J_F(\Bx_0)
=
\begin{bmatrix}
\PD{x_1}{f_1} & \cdots & \PD{x_N}{f_1} \\
& \ddots & \\
\PD{x_1}{f_N} & \cdots & \PD{x_N}{f_N}
\end{bmatrix}
\end{equation}

Multivariable Newton’s iteration

Given \( \Bx^k \), expand \( F(\Bx) \) around \( \Bx^k \)

\begin{equation}\label{eqn:multiphysicsL11:140}
F(\Bx) \approx F(\Bx^k) + J_F(\Bx^k) \lr{ \Bx - \Bx^k }
\end{equation}

With the approximation

\begin{equation}\label{eqn:multiphysicsL11:160}
0 = F(\Bx^k) + J_F(\Bx^k) \lr{ \Bx^{k + 1} - \Bx^k },
\end{equation}

then multiplying by the inverse Jacobian, and rearranging, we have

\begin{equation}\label{eqn:multiphysicsL11:220}
\boxed{
\Bx^{k+1} = \Bx^k - J_F^{-1}(\Bx^k) F(\Bx^k).
}
\end{equation}

Our algorithm is

  • Guess \( \Bx^0, k = 0 \).
  • REPEAT
  • Compute \( F \) and \( J_F \) at \( \Bx^k \)
  • Solve linear system \( J_F(\Bx^k) \Delta \Bx^k = - F(\Bx^k) \)
  • \( \Bx^{k+1} = \Bx^k + \Delta \Bx^k \)
  • \( k = k + 1 \)
  • UNTIL converged

As in the single variable case, we declare convergence only once all of the following conditions are satisfied

\begin{equation}\label{eqn:multiphysicsL11:240}
\begin{aligned}
\Norm{ \Delta \Bx^k } &< \epsilon_1 \\
\Norm{ F(\Bx^{k+1}) } &< \epsilon_2 \\
\frac{\Norm{ \Delta \Bx^k }}{\Norm{\Bx^{k+1}}} &< \epsilon_3 \\
\end{aligned}
\end{equation}

Typical termination is some multiple of eps, where eps is the machine precision. This may be something like:

\begin{equation}\label{eqn:multiphysicsL11:260}
4 \times N \times \text{eps},
\end{equation}

where \( N \) is the “size of the problem”. Sometimes we may be able to find meaningful values for the problem. For example, for a voltage problem, we may not be interested in precisions greater than a millivolt.
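The algorithm above can be sketched for a small system as follows. The two equations (\( x^2 + y^2 = 4 \), \( x y = 1 \)), the starting guess, the tolerances, and the use of Cramer's rule for the \( 2 \times 2 \) linear solve are assumptions chosen to keep the example self-contained; a real implementation would use an LU solve and would likely include the relative step test as well.

```python
def solve_2x2(J, rhs):
    """Solve a 2x2 linear system by Cramer's rule (enough for this illustration)."""
    (a, b), (c, d) = J
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]

def F(x):
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]

def J_F(x):
    return [[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]]

def newton(x, eps_dx=1e-12, eps_f=1e-12, max_iter=50):
    for k in range(max_iter):
        Fx = F(x)
        dx = solve_2x2(J_F(x), [-Fx[0], -Fx[1]])   # J_F dx = -F
        x = [x[0] + dx[0], x[1] + dx[1]]
        norm_dx = max(abs(d) for d in dx)
        norm_f = max(abs(v) for v in F(x))
        if norm_dx < eps_dx and norm_f < eps_f:     # both checks must pass
            return x, k + 1
    raise RuntimeError("no convergence")

x, iterations = newton([2.0, 0.5])
print(x, iterations)
```

Note that the linear system is solved for \( \Delta \Bx^k \) directly; the inverse Jacobian is never formed.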

Automatic assembly of equations for nonlinear system

Nonlinear circuits

We will start off considering a non-linear resistor, designated within a circuit as sketched in fig. 2.

lecture11Fig2

fig. 2. Non-linear resistor

 

Example: diode, with \( i = g(v) \), such as

\begin{equation}\label{eqn:multiphysicsL11:280}
i = I_0 \lr{ e^{v/{\eta V_T}} - 1 }.
\end{equation}

Consider the example circuit of fig. 3. KCL’s at each of the nodes are

lecture11Fig3

fig. 3. Example circuit

 

  1. \( I_A + I_B + I_D - I_s = 0 \)
  2. \( - I_B + I_C - I_D = 0 \)

Introducing the constitutive equations this is

  1. \( g_A(V_1) + g_B(V_1 - V_2) + g_D (V_1 - V_2) - I_s = 0 \)
  2. \( - g_B(V_1 - V_2) + g_C(V_2) - g_D (V_1 - V_2) = 0 \)

In matrix form this is
\begin{equation}\label{eqn:multiphysicsL11:300}
\begin{bmatrix}
g_D & -g_D \\
-g_D & g_D
\end{bmatrix}
\begin{bmatrix}
V_1 \\
V_2
\end{bmatrix}
+
\begin{bmatrix}
g_A(V_1) &+ g_B(V_1 - V_2) & & - I_s \\
&- g_B(V_1 - V_2) & + g_C(V_2) & \\
\end{bmatrix}
=
0
.
\end{equation}

We can write the entire system as

\begin{equation}\label{eqn:multiphysicsL11:320}
\boxed{
F(\Bx) = G \Bx + F'(\Bx) = 0.
}
\end{equation}

The first term, the product of the nodal matrix \( G \) with the unknowns \( \Bx \), represents the linear subnetwork, and is filled with the stamps we are already familiar with.

The second term encodes the relationships of the nonlinear subnetwork. This non-linear component has been marked with a prime to distinguish it from the complete network function that includes both linear and non-linear elements.

Observe the similarity with the stamp analysis that we did previously. With \( g_A() \) connected on one end to ground we have it only once in the resulting vector, whereas the nonlinear elements connected to two non-zero nodes in the network occur once with each sign.

Stamp for nonlinear resistor

For the non-linear circuit element of fig. 4.

lecture11Fig4

fig. 4. Non-linear resistor circuit element

The stamp is

fprimex

Stamp for Jacobian

\begin{equation}\label{eqn:multiphysicsL11:360}
J_F(\Bx^k) = G + J_{F'}(\Bx^k).
\end{equation}

Here the stamp for the Jacobian, an \( N \times N \) matrix, is

jacobianNonlinearPart
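A possible implementation of these stamps for the two-node example circuit is sketched below. It assumes the usual sign convention (an element carrying \( i = g(V_{n_1} - V_{n_2}) \) from node \( n_1 \) to node \( n_2 \) adds \( +g \) to row \( n_1 \) and \( -g \) to row \( n_2 \), with the corresponding \( \pm g' \) entries in the Jacobian). The diode parameters, the values of `g_D` and `I_s`, and the function names are hypothetical.

```python
import math

I_0, V_T = 1e-12, 0.025          # hypothetical diode parameters

def g(v):                        # diode law i = g(v)
    return I_0 * (math.exp(v / V_T) - 1.0)

def dg(v):                       # its derivative dg/dv
    return I_0 / V_T * math.exp(v / V_T)

def stamp_nonlinear(Fp, Jp, n1, n2, V):
    """Stamp an element carrying i = g(V[n1] - V[n2]) from node n1 to node n2.

    A node index of None denotes ground (voltage 0, no equation).
    """
    v = (V[n1] if n1 is not None else 0.0) - (V[n2] if n2 is not None else 0.0)
    for row, s_row in ((n1, 1.0), (n2, -1.0)):
        if row is None:
            continue
        Fp[row] += s_row * g(v)
        for col, s_col in ((n1, 1.0), (n2, -1.0)):
            if col is not None:
                Jp[row][col] += s_row * s_col * dg(v)

def assemble(V, g_D=1e-3, I_s=1e-3):
    """F(V) = G V + F'(V) and J_F = G + J_F' for the two-node example."""
    G = [[g_D, -g_D], [-g_D, g_D]]           # linear stamp of g_D
    Fp = [-I_s, 0.0]                         # current source into node 1
    Jp = [[0.0, 0.0], [0.0, 0.0]]
    stamp_nonlinear(Fp, Jp, 0, None, V)      # g_A: node 1 to ground
    stamp_nonlinear(Fp, Jp, 0, 1, V)         # g_B: node 1 to node 2
    stamp_nonlinear(Fp, Jp, 1, None, V)      # g_C: node 2 to ground
    F = [sum(G[i][j] * V[j] for j in range(2)) + Fp[i] for i in range(2)]
    J = [[G[i][j] + Jp[i][j] for j in range(2)] for i in range(2)]
    return F, J

F, J = assemble([0.3, 0.1])
print(F)
print(J)
```

A grounded element touches only one row and one diagonal Jacobian entry, while an element between two non-ground nodes touches two rows and four Jacobian entries.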

ECE1254H Modeling of Multiphysics Systems. Lecture 10: Nonlinear systems. Taught by Prof. Piero Triverio

October 21, 2014


Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

Nonlinear systems

On slides, some examples to motivate:

  • struts
  • fluids
  • diode (exponential). Example in fig. 1.
    lecture10Fig1

    fig. 1. Diode circuit

     

    \begin{equation}\label{eqn:multiphysicsL10:20}
    I_d = I_s \lr{ e^{V_d/V_t} - 1 } = \frac{10 - V_d}{10}.
    \end{equation}

Richardson and Linear Convergence

We seek the exact solution \( x^\conj \) of

\begin{equation}\label{eqn:multiphysicsL10:40}
f(x^\conj) = 0.
\end{equation}

Suppose that we iterate using

\begin{equation}\label{eqn:multiphysicsL10:60}
x^{k + 1} = x^k + f(x^k)
\end{equation}

If \( f(x^k) = 0 \) then the iteration stops changing \( x^k \) and we have convergence, with \( x^k = x^\conj \).

Convergence analysis

Write the iteration equations at a sample point and the solution as

\begin{equation}\label{eqn:multiphysicsL10:80}
x^{k + 1} = x^k + f(x^k)
\end{equation}
\begin{equation}\label{eqn:multiphysicsL10:100}
x^{\conj} = x^\conj +
\underbrace{
f(x^\conj)
}_{=0}
\end{equation}

Taking the difference we have

\begin{equation}\label{eqn:multiphysicsL10:120}
x^{k+1} - x^\conj = x^k - x^\conj + \lr{ f(x^k) - f(x^\conj) }.
\end{equation}

The last term can be quantified using the mean value theorem \ref{thm:multiphysicsL10:140}, giving

\begin{equation}\label{eqn:multiphysicsL10:140}
x^{k+1} - x^\conj
= x^k - x^\conj +
\evalbar{\PD{x}{f}}{\tilde{x}} \lr{ x^k - x^\conj }
=
\lr{ x^k - x^\conj }
\lr{
1 + \evalbar{\PD{x}{f}}{\tilde{x}} }.
\end{equation}

The absolute value is thus

\begin{equation}\label{eqn:multiphysicsL10:160}
\Abs{x^{k+1} - x^\conj } =
\Abs{ x^k - x^\conj }
\Abs{
1 + \evalbar{\PD{x}{f}}{\tilde{x}} }.
\end{equation}

We have convergence provided \( \Abs{ 1 + \evalbar{\PD{x}{f}}{\tilde{x}} } < 1 \) throughout the region that we iterate over. This could easily be highly dependent on the initial guess.

Stated more accurately we have convergence provided

\begin{equation}\label{eqn:multiphysicsL10:180}
\Abs{
1 + \evalbar{\PD{x}{f}}{\tilde{x}} }
\le \gamma < 1
\qquad \forall \tilde{x} \in [ x^\conj - \delta, x^\conj + \delta ],
\end{equation}

and \( \Abs{x^0 – x^\conj } < \delta \). This is illustrated in fig. 3.

lecture10Fig3

fig. 3. Convergence region

 

It could very easily be difficult to determine the convergence regions.
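To see the linear convergence numerically, here is a small sketch using the hypothetical test function \( f(x) = \cos x - x \) (not from the lecture), for which \( \Abs{1 + f'(x)} = \Abs{\sin x} < 1 \) near the root. The ratio of successive errors should settle near \( \Abs{1 + f'(x^\conj)} \).

```python
import math

def f(x):
    return math.cos(x) - x       # hypothetical test function, root near 0.739

def richardson(x, iterations):
    history = [x]
    for _ in range(iterations):
        x = x + f(x)             # x^{k+1} = x^k + f(x^k)
        history.append(x)
    return history

xs = richardson(1.0, 60)
x_star = xs[-1]                  # last iterate used as the reference solution
# Ratio of successive errors; for linear convergence it settles near |1 + f'(x*)|.
ratios = [abs(xs[k + 1] - x_star) / abs(xs[k] - x_star) for k in range(5, 10)]
print(x_star, ratios, math.sin(x_star))
```

The error shrinks by roughly the same factor every iteration, which is what makes this method slow compared to Newton's method.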

We have some problems

  • Convergence is only linear
  • \( x, f(x) \) are not in the same units (and potentially of different orders). For example, \(x\) could be a voltage and \( f(x) \) could be a circuit current.
  • (more on slides)

Examples where we may want to use this:

  • SPICE Gummel-Poon transistor model. Lots of diodes, …
  • MOSFET model (30 page spec, lots of parameters).

Newton’s method

The core idea of this method is sketched in fig. 4. To find the intersection with the x-axis, we follow the slope closer to the intersection.

lecture10Fig4

fig. 4. Newton’s method

 

To do this, we expand \( f(x) \) in Taylor series to first order around \( x^k \), and then solve for \( f(x) = 0 \) in that approximation

\begin{equation}\label{eqn:multiphysicsL10:200}
f( x^{k+1} ) \approx f( x^k ) + \evalbar{ \PD{x}{f} }{x^k} \lr{ x^{k+1} - x^k } = 0.
\end{equation}

This gives

\begin{equation}\label{eqn:multiphysicsL10:220}
\boxed{x^{k+1} = x^k - \frac{f( x^k )}{\evalbar{ \PD{x}{f} }{x^k}}}
\end{equation}

Example: Newton’s method

For the solution of

\begin{equation}\label{eqn:multiphysicsL10:260}
f(x) = x^3 - 2,
\end{equation}

it was found (table 1.1)

Capture
The error tails off fast as illustrated roughly in fig. 6.

lecture10Fig6

fig. 6. Error by iteration
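A table of this kind can be regenerated with a few lines of code. In this sketch the starting guess of 10 and the number of iterations are assumptions, so the printed values need not match the lecture's table row for row.

```python
def newton_cubic(x, iterations=6):
    """Newton iterates for f(x) = x^3 - 2, returning (x_k, |x_k - 2^(1/3)|) pairs."""
    root = 2.0 ** (1.0 / 3.0)
    rows = []
    for _ in range(iterations):
        x = x - (x ** 3 - 2.0) / (3.0 * x ** 2)
        rows.append((x, abs(x - root)))
    return rows

rows = newton_cubic(10.0, 12)
for k, (x_k, err) in enumerate(rows, start=1):
    print(k, x_k, err)
```

Far from the root the error only shrinks by a roughly constant factor per step; once the iterate is close, the number of correct digits approximately doubles each iteration.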

Convergence analysis

The Newton iteration satisfies

\begin{equation}\label{eqn:multiphysicsL10:280}
0 = f(x^k) + \evalbar{ \PD{x}{f} }{x^k} \lr{ x^{k+1} - x^k }.
\end{equation}

The Taylor series for \( f \) around \( x^k \), using a mean value formulation is

\begin{equation}\label{eqn:multiphysicsL10:300}
f(x)
= f(x^k)
+ \evalbar{ \PD{x}{f} }{x^k} \lr{ x - x^k }
+ \inv{2} \evalbar{ \PDSq{x}{f} }{\tilde{x} \in [x^\conj, x^k]} \lr{ x - x^k }^2.
\end{equation}

Evaluating at \( x^\conj \) we have

\begin{equation}\label{eqn:multiphysicsL10:320}
0 = f(x^k)
+ \evalbar{ \PD{x}{f} }{x^k} \lr{ x^\conj - x^k }
+ \inv{2} \evalbar{ \PDSq{x}{f} }{\tilde{x} \in [x^\conj, x^k]} \lr{ x^\conj - x^k }^2,
\end{equation}

and subtracting this from \ref{eqn:multiphysicsL10:280} we are left with

\begin{equation}\label{eqn:multiphysicsL10:340}
0 = \evalbar{\PD{x}{f}}{x^k} \lr{ x^{k+1} - x^k - x^\conj + x^k }
- \inv{2} \evalbar{\PDSq{x}{f}}{\tilde{x}} \lr{ x^\conj - x^k }^2.
\end{equation}

Solving for the difference from the solution, the error is

\begin{equation}\label{eqn:multiphysicsL10:360}
x^{k+1} - x^\conj
= \inv{2} \lr{ \evalbar{\PD{x}{f}}{x^k} }^{-1} \evalbar{\PDSq{x}{f}}{\tilde{x}} \lr{ x^k - x^\conj }^2,
\end{equation}

or in absolute value

\begin{equation}\label{eqn:multiphysicsL10:380}
\Abs{ x^{k+1} - x^\conj }
= \inv{2} \Abs{ \PD{x}{f} }^{-1} \Abs{ \PDSq{x}{f} } \Abs{ x^k - x^\conj }^2.
\end{equation}

We see that convergence is quadratic in the error from the previous iteration. We will have trouble if the derivative goes small at any point in the iteration region, for example in fig. 7, we could easily end up in the zero derivative region.

lecture10Fig7

fig. 7. Newton’s method with small derivative region

 

When to stop iteration

One way to check is to look to see if the difference

\begin{equation}\label{eqn:multiphysicsL10:420}
\Norm{ x^{k+1} - x^k } < \epsilon_{\Delta x},
\end{equation}

however, when the function has a very steep slope this may not be sufficient unless we also substitute our trial solution and see if we have the match desired.

Alternatively, if the slope is shallow as in fig. 7, then checking for just \( \Abs{ f(x^{k+1}) } < \epsilon_f \) may also mean we are off target.

Finally, we may also need a relative error check to avoid false convergence. In fig. 10, we may have both

lecture10Fig10

fig. 10. Possible relative error difference required

 

\begin{equation}\label{eqn:multiphysicsL10:440}
\Abs{x^{k+1} - x^k} < \epsilon_{\Delta x}
\end{equation}
\begin{equation}\label{eqn:multiphysicsL10:460}
\Abs{f(x^{k+1}) } < \epsilon_{f},
\end{equation}

however, we may also want a small relative difference

\begin{equation}\label{eqn:multiphysicsL10:480}
\frac{\Abs{x^{k+1} - x^k}}{\Abs{x^k}}
< \epsilon_{f,r}.
\end{equation}
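One way the three tests could be combined in code is sketched below. The function name and default tolerances are illustrative assumptions; the relative test is written in multiplied form so that \( x^k = 0 \) does not cause a division by zero.

```python
def converged(x_new, x_old, f_new, eps_dx=1e-9, eps_f=1e-9, eps_rel=1e-9):
    """Return True only if all three stopping tests pass (tolerances are illustrative)."""
    step = abs(x_new - x_old)
    small_step = step < eps_dx
    small_residual = abs(f_new) < eps_f
    small_relative = step < eps_rel * abs(x_old)    # avoids dividing by x_old = 0
    return small_step and small_residual and small_relative

# Tiny step relative to x: accepted.
print(converged(1.0 + 1e-12, 1.0, 1e-12))
# Tiny absolute step, but comparable in size to x itself: rejected.
print(converged(1e-12, 2e-12, 1e-15))
```

The second call shows why the relative test matters: both absolute tests pass even though the iterate changed by half of its own magnitude.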

This can become problematic in real world engineering examples such as the diode of fig. 11, where we have shallow regions and fast growing or dropping regions.

lecture10Fig11

fig. 11. Diode current curve

 

Theorems

Mean value theorem

For a continuous and differentiable function \( f(x) \), the difference can be expressed in terms of the derivative at an intermediate point

\begin{equation*}
f(x_2) - f(x_1)
= \evalbar{ \PD{x}{f} }{\tilde{x}} \lr{ x_2 - x_1 }
\end{equation*}

where \( \tilde{x} \in [x_1, x_2] \).

This is illustrated (roughly) in fig. 2.

lecture10Fig2

fig. 2. Mean value theorem illustrated

ECE1254H Modeling of Multiphysics Systems. Lecture 7: Sparse factorization and iterative methods. Taught by Prof. Piero Triverio

October 21, 2014


Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

Fill ins

The problem of fill ins in LU computations arises in locations where rows and columns cross over zero positions.

Rows and columns can be permuted to deal with these. Here is an ad-hoc permutation of rows and columns that will result in fewer fill ins.

\begin{equation}\label{eqn:multiphysicsL7:180}
\begin{aligned}
&
\begin{bmatrix}
a & b & c & 0 \\
d & e & 0 & 0 \\
0 & f & g & 0 \\
0 & h & 0 & i \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
x_4
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
a & c & 0 & b \\
d & 0 & 0 & e \\
0 & g & 0 & f \\
0 & 0 & i & h \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_3 \\
x_4 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
0 & a & c & b \\
0 & d & 0 & e \\
0 & 0 & g & f \\
i & 0 & 0 & h \\
\end{bmatrix}
\begin{bmatrix}
x_4 \\
x_1 \\
x_3 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
i & 0 & 0 & h \\
0 & a & c & b \\
0 & d & 0 & e \\
0 & 0 & g & f \\
\end{bmatrix}
\begin{bmatrix}
x_4 \\
x_1 \\
x_3 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_4 \\
b_1 \\
b_2 \\
b_3 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
i & 0 & 0 & h \\
0 & c & a & b \\
0 & 0 & d & e \\
0 & g & 0 & f \\
\end{bmatrix}
\begin{bmatrix}
x_4 \\
x_3 \\
x_1 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_4 \\
b_1 \\
b_2 \\
b_3 \\
\end{bmatrix} \\
\end{aligned}
\end{equation}

Markowitz product

To facilitate such permutations, the Markowitz product is used to estimate the amount of fill in that a given pivot choice would require.

Definition Markowitz product

\begin{equation*}
\begin{aligned}
\text{Markowitz product} =
&\lr{\text{Non zeros in unfactored part of row} - 1} \times \\
&\lr{\text{Non zeros in unfactored part of column} - 1}
\end{aligned}
\end{equation*}

In [1] it is stated “A still simpler alternative, which seems adequate generally, is to choose the pivot which minimizes the number of coefficients modified at each step (excluding those which are eliminated at the particular step). This is equivalent to choosing the non-zero element with minimum \( (\rho_i - 1 )(\sigma_j - 1) \).”

Note that this product is applied only to \( i j \) positions that are
non-zero, something not explicitly mentioned in the slides, nor in other
locations like [2]

Example: Markowitz product

For this matrix
\begin{equation}\label{eqn:multiphysicsL7:220}
\begin{bmatrix}
a & b & c & 0 \\
d & e & 0 & 0 \\
0 & f & g & 0 \\
0 & h & 0 & i \\
\end{bmatrix},
\end{equation}

the Markowitz products are

\begin{equation}\label{eqn:multiphysicsL7:280}
\begin{bmatrix}
2 & 6 & 2 & \\
1 & 3 & & \\
& 3 & 1 & \\
& 3 & & 0 \\
\end{bmatrix}.
\end{equation}
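The products can be checked mechanically. The following sketch computes \( (\text{row count} - 1)(\text{column count} - 1) \) at each non-zero position of a sparsity pattern; the boolean encoding of the pattern and the function name are assumptions for this example.

```python
def markowitz_products(pattern):
    """pattern[i][j] is True where the matrix entry is non-zero.

    Returns (row count - 1) * (column count - 1) at non-zero positions, None elsewhere.
    """
    n = len(pattern)
    row_nnz = [sum(row) for row in pattern]
    col_nnz = [sum(pattern[i][j] for i in range(n)) for j in range(n)]
    return [[(row_nnz[i] - 1) * (col_nnz[j] - 1) if pattern[i][j] else None
             for j in range(n)] for i in range(n)]

# Sparsity pattern of the example matrix (a, b, c, ... are assumed non-zero).
pattern = [
    [True, True, True, False],
    [True, True, False, False],
    [False, True, True, False],
    [False, True, False, True],
]
for row in markowitz_products(pattern):
    print(row)
```

For the reordering procedure only the diagonal entries of this table are needed, and they would be recomputed on the remaining unfactored submatrix after each elimination step.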

Markowitz reordering

The Markowitz Reordering procedure (copied directly from the slides) is

  • For i = 1 to n
  • Find diagonal \( j \ge i \) with min Markowitz product
  • Swap rows \( j \leftrightarrow i \) and columns \( j \leftrightarrow i \)
  • Factor the new row \( i \) and update Markowitz products

Example: Markowitz reordering

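Looking at the Markowitz products \ref{eqn:multiphysicsL7:280}, the minimum is at position \( 4,4 \). Moving that element to the pivot position, by exchanging rows \( 1, 4 \) and moving column 4 to the front, gives the modified matrix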

\begin{equation}\label{eqn:multiphysicsL7:300}
\begin{bmatrix}
i & 0 & h & 0 \\
0 & d & e & 0 \\
0 & 0 & f & g \\
0 & a & b & c \\
\end{bmatrix}
\end{equation}

In this case, this reordering has completely avoided any requirement to do any actual Gaussian operations for this first stage reduction.

Presuming that the Markowitz products for the remaining 3×3 submatrix are only computed from that submatrix, the new products are
\begin{equation}\label{eqn:multiphysicsL7:320}
\begin{bmatrix}
& & & \\
& 1 & 2 & \\
& & 2 & 1 \\
& 2 & 4 & 2 \\
\end{bmatrix}.
\end{equation}

We have a minimal product in the pivot position, which happens to already lie on the diagonal. Note that it is not necessarily the best for numerical stability. It appears the off diagonal Markowitz products are not really of interest since the reordering algorithm swaps both rows and columns.

Graph representation

It is possible to interpret the Markowitz products on the diagonal as connectivity of a graph that represents the interconnections of the nodes. Consider the circuit of fig. 2 as an example

lecture7Fig2

fig. 2. Simple circuit

 

The system equations for this circuit are of the form
\begin{equation}\label{eqn:multiphysicsL7:340}
\begin{bmatrix}
x & x & x & 0 & 1 \\
x & x & x & 0 & 0 \\
x & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
-1 & 0 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_1 \\
V_2 \\
V_3 \\
V_4 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix}.
\end{equation}

The Markowitz products along the diagonal are
\begin{equation}\label{eqn:multiphysicsL7:360}
\begin{aligned}
M_{11} &= 9 \\
M_{22} &= 4 \\
M_{33} &= 9 \\
M_{44} &= 4 \\
M_{55} &= 4 \\
\end{aligned}
\end{equation}

Compare these to the number of interconnections of the graph fig. 3 of the nodes in this circuit. We see that these are the squares of the number of the node interconnects in each case.

 

lecture7Fig3

fig. 3. Graph representation

Here a 5th node was introduced for the current \( i \) between nodes \( 4 \) and \( 1 \). Observe that the Markowitz product of this node was counted as the number of non-zero values excluding the \( 5,5 \) matrix position. However, that doesn’t matter too much since a Markowitz swap of row/column 1 with row/column 5 would put a zero in the \( 1,1 \) position of the matrix, which is not desirable. We have to restrict the permutations of zero diagonal positions to pivots for numerical stability, or use a more advanced zero fill avoidance algorithm.

The minimum diagonal Markowitz products are in positions 2 or 4, with respective Markowitz reorderings of the form

\begin{equation}\label{eqn:multiphysicsL7:380}
\begin{bmatrix}
x & x & x & 0 & 0 \\
x & x & x & 0 & 1 \\
x & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
0 & -1 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_2 \\
V_1 \\
V_3 \\
V_4 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix},
\end{equation}

and
\begin{equation}\label{eqn:multiphysicsL7:400}
\begin{bmatrix}
x & 0 & 0 & x & -1 \\
0 & x & x & x & 1 \\
0 & x & x & x & 0 \\
x & x & x & x & 0 \\
1 & -1 & 0 & 0 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_4 \\
V_1 \\
V_2 \\
V_3 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix}.
\end{equation}

The original system had 7 zeros that could potentially be filled in the remaining \( 4 \times 4 \) submatrix. After a first round of Gaussian elimination, our system matrices have the respective forms

\begin{equation}\label{eqn:multiphysicsL7:420}
\begin{bmatrix}
x & x & x & 0 & 0 \\
0 & x & x & 0 & 1 \\
0 & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
0 & -1 & 0 & 1 & 0 \\
\end{bmatrix}
\end{equation}
\begin{equation}\label{eqn:multiphysicsL7:440}
\begin{bmatrix}
x & 0 & 0 & x & -1 \\
0 & x & x & x & 1 \\
0 & x & x & x & 0 \\
0 & x & x & x & 0 \\
0 & -1 & 0 & x & x \\
\end{bmatrix}
\end{equation}

The remaining \( 4 \times 4 \) submatrices have interconnect graphs sketched in fig. 4.

 

lecture7Fig4

fig. 4. Graphs after one round of Gaussian elimination

From a graph point of view, we want to eliminate the least connected nodes first, since these generate the fewest new connections (fill ins). This can be driven by the Markowitz products along the diagonal or directly with graph methods.

Summary of factorization costs

LU (dense)

  • cost: \( O(n^3) \)
  • cost depends only on size

LU (sparse)

  • cost: Diagonal and tridiagonal are \( O(n) \), but we can have up to \( O(n^3) \) depending on sparsity and the method of dealing with the sparsity.
  • cost depends on size and sparsity

Computation can be affordable up to a few million elements.

Iterative methods

Can be cheap if done right. Convergence requires careful preconditioning.

Iterative methods

Suppose that we have an initial guess \( \Bx_0 \). Iterative methods are generally of the form

DO
\(\Br = \Bb - M \Bx_i\)
UNTIL \(\Norm{\Br} < \epsilon \)

The difference \( \Br \) is called the residual. For as long as it is bigger than desired, continue improving the estimate \( \Bx_i \).

The matrix vector product \( M \Bx_i \), if dense, is of \( O(n^2) \). Suppose, for example, that we can perform the iteration in ten iterations. If the matrix is dense, we can have \( 10 \, O(n^2) \) performance. If sparse, this can be worse than just direct computation.

Gradient method

This is a method for iterative solution of the equation \( M \Bx = \Bb \).

This requires a symmetric positive definite matrix \( M = M^\T \), with \( M > 0 \).

We introduce an energy function

\begin{equation}\label{eqn:multiphysicsL7:60}
\Psi(\By) \equiv \inv{2} \By^\T M \By - \By^\T \Bb
\end{equation}

For a two variable system this is illustrated in fig. 1.

 

lecture7Fig1

fig. 1. Positive definite energy function

Theorem: Energy function minimum

The energy function \ref{eqn:multiphysicsL7:60} has a minimum at

\begin{equation}\label{eqn:multiphysicsL7:80}
\By = M^{-1} \Bb = \Bx.
\end{equation}

To prove this, consider the coordinate representation

\begin{equation}\label{eqn:multiphysicsL7:480}
\Psi = \inv{2} y_a M_{ab} y_b - y_b b_b,
\end{equation}

for which the derivatives are
\begin{equation}\label{eqn:multiphysicsL7:500}
\PD{y_i}{\Psi} =
\inv{2} M_{ib} y_b
+
\inv{2} y_a M_{ai}
- b_i
=
\lr{ M \By - \Bb }_i.
\end{equation}

The last operation above was possible because \( M = M^\T \). Setting all of these equal to zero, and rewriting this as a matrix relation we have

\begin{equation}\label{eqn:multiphysicsL7:520}
M \By = \Bb,
\end{equation}

as asserted.

This is called the gradient method because the gradient moves us along the path of steepest descent towards the minimum if it exists.

The method is

\begin{equation}\label{eqn:multiphysicsL7:100}
\Bx^{(k+1)} = \Bx^{(k)} +
\underbrace{ \alpha_k }_{\text{step size}}
\underbrace{ \Bd^{(k)} }_{\text{direction}},
\end{equation}

where the direction is

\begin{equation}\label{eqn:multiphysicsL7:120}
\Bd^{(k)} = - \spacegrad \Psi = \Bb - M \Bx^{(k)} = \Br^{(k)}.
\end{equation}

Optimal step size

For the minimization of \( \Psi \lr{ \Bx^{(k+1)} } \), we note

\begin{equation}\label{eqn:multiphysicsL7:140}
\Psi \lr{ \Bx^{(k+1)} }
= \Psi\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }
=
\inv{2}
\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }^\T
M
\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }
-
\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }^\T \Bb
\end{equation}

If we take the derivative of both sides with respect to \( \alpha_k \) to find the minimum, we have

\begin{equation}\label{eqn:multiphysicsL7:540}
0 =
\inv{2}
\lr{ \Bd^{(k)} }^\T
M
\Bx^{(k)}
+
\inv{2}
\lr{ \Bx^{(k)} }^\T
M
\Bd^{(k)}
+
\alpha_k \lr{ \Bd^{(k)} }^\T
M
\Bd^{(k)}
-
\lr{ \Bd^{(k)} }^\T \Bb.
\end{equation}

Because \( M \) is symmetric, this is

\begin{equation}\label{eqn:multiphysicsL7:560}
\alpha_k \lr{ \Bd^{(k)} }^\T
M
\Bd^{(k)}
=
\lr{ \Bd^{(k)} }^\T \lr{ \Bb - M \Bx^{(k)}}
=
\lr{ \Bd^{(k)} }^\T \Br^{(k)},
\end{equation}

or, since \( \Bd^{(k)} = \Br^{(k)} \),

\begin{equation}\label{eqn:multiphysicsL7:160}
\alpha_k
= \frac{
\lr{\Br^{(k)}}^\T
\Br^{(k)}
}{
\lr{\Br^{(k)}}^\T
M
\Br^{(k)}
}
\end{equation}

We will see that this method is not optimal when we pick one direction and keep going down that path.
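The gradient method with this step size can be sketched in a few lines. The \( 2 \times 2 \) symmetric positive definite matrix, the right hand side, the tolerance and the helper names below are hypothetical choices for illustration.

```python
def matvec(M, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradient_method(M, b, x, eps=1e-12, max_iter=1000):
    """Steepest descent for M x = b, with M symmetric positive definite."""
    for k in range(max_iter):
        r = [b_i - Mx_i for b_i, Mx_i in zip(b, matvec(M, x))]   # residual = direction
        if dot(r, r) ** 0.5 < eps:
            return x, k
        alpha = dot(r, r) / dot(r, matvec(M, r))                 # optimal step size
        x = [x_i + alpha * r_i for x_i, r_i in zip(x, r)]
    raise RuntimeError("no convergence")

M = [[4.0, 1.0], [1.0, 3.0]]     # hypothetical symmetric positive definite example
b = [1.0, 2.0]
x, iterations = gradient_method(M, b, [0.0, 0.0])
print(x, iterations)
```

Each iteration costs two matrix vector products here; with a little bookkeeping (updating the residual recursively) that can be reduced to one.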

Definitions and theorems

Definition: Positive (negative) definite

A matrix \( M \) is positive (negative) definite, denoted \( M > 0 (<0) \) if \( \By^\T M \By > 0 (<0), \quad \forall \By \ne 0 \).

If a matrix is neither positive, nor negative definite, it is called indefinite.

Theorem: Positive (negative) definite

A symmetric matrix \( M > 0 (<0)\) iff \( \lambda_i > 0 (<0)\) for all eigenvalues \( \lambda_i \), or is indefinite iff its eigenvalues \( \lambda_i \) are of mixed sign.

References

[1] Harry M Markowitz. The elimination form of the inverse and its application to linear programming. Management Science, 3(3):255–269, 1957.

[2] Timothy Vismor. Pivoting To Preserve Sparsity, 2012. URL https://vismor.com/documents/network_analysis/matrix_algorithms/S8.SS3.php. [Online; accessed 15-Oct-2014].

Illustrating the LU algorithm with pivots by example

October 15, 2014


Two previous examples of LU factorizations were given. I found one more to be the key to understanding how to implement this as a matlab algorithm, required for our problem set.

A matrix that contains both pivots and elementary matrix operations is

\begin{equation}\label{eqn:luAlgorithm:20}
M=
\begin{bmatrix}
0 & 0 & 2 & 1 \\
0 & 0 & 1 & 1 \\
2 & 0 & 2 & 0 \\
1 & 1 & 1 & 1
\end{bmatrix}
\end{equation}

Our objective is to apply a sequence of row permutations or elementary row operations to \( M \) that put \( M \) into upper triangular form. At the same time we wish to track all of the inverse operations. When no permutations are required to produce \( U \), we end up with a factorization \( M = L' U \) where \( L' \) is lower triangular.

Let’s express the row operations that we apply to \( M \) as

\begin{equation}\label{eqn:luAlgorithm:40}
U =
L_k^{-1}
L_{k-1}^{-1} \cdots
L_2^{-1}
L_1^{-1}
M,
\end{equation}

with

\begin{equation}\label{eqn:luAlgorithm:60}
L' = L_0 L_1 L_2 \cdots L_{k-1} L_k
\end{equation}

Here \( L_0 = I \), the identity matrix, and \( L_i^{-1} \) is either a permutation matrix interchanging two rows of the identity matrix, or it is an elementary row operation encoding the operation \( r_j \rightarrow r_j - m r_i \), where \( r_i \) is the pivot row, \( m \) is a scalar multiplier, and \( r_j, j > i \) are the rows that we are applying the Gaussian elimination operations to.

For our example matrix, we cannot use \( M_{11} \) as the pivot element since it is zero. In general, for numeric stability, we wish to pivot on the row with the largest absolute value in the column that we are operating on. In this case that is row 3. Our first row operation is therefore a \( 1,3 \) permutation

\begin{equation}\label{eqn:luAlgorithm:80}
L_1^{-1} =
\begin{bmatrix}
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix},
\end{equation}

which gives us

\begin{equation}\label{eqn:luAlgorithm:100}
M \rightarrow
L_1^{-1}
M
=
\begin{bmatrix}
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
0 & 0 & 2 & 1 \\
0 & 0 & 1 & 1 \\
2 & 0 & 2 & 0 \\
1 & 1 & 1 & 1 \\
\end{bmatrix}
=
\begin{bmatrix}
2 & 0 & 2 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 2 & 1 \\
1 & 1 & 1 & 1 \\
\end{bmatrix}.
\end{equation}
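The pivot choice described above can be sketched in a few lines. This is only an illustration in Python (not the Matlab required for the problem set), using zero-based indices, and the helper name `pivot_row` is a hypothetical one of my own choosing.

```python
def pivot_row(A, col):
    """Index of the row, at or below the diagonal, whose entry in
    column `col` has the largest absolute value (zero-based)."""
    return max(range(col, len(A)), key=lambda r: abs(A[r][col]))

M = [
    [0, 0, 2, 1],
    [0, 0, 1, 1],
    [2, 0, 2, 0],
    [1, 1, 1, 1],
]

# Zero-based index 2, that is, row 3 in the one-based numbering of the text.
p = pivot_row(M, 0)
```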

Computationally, we do not wish to actually do a matrix multiplication to achieve this permutation. Instead we want to just swap the two rows in question.
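A minimal sketch of such an in-place row swap, assuming the matrix is stored as a Python list of rows (the name `swap_rows` is hypothetical, and indices are zero-based):

```python
def swap_rows(A, i, j):
    """Interchange rows i and j of A in place (zero-based indices)."""
    A[i], A[j] = A[j], A[i]

M = [
    [0, 0, 2, 1],
    [0, 0, 1, 1],
    [2, 0, 2, 0],
    [1, 1, 1, 1],
]

swap_rows(M, 0, 2)  # the 1,3 permutation in one-based numbering
```

No arithmetic is performed at all, which is the point of avoiding the multiplication by the permutation matrix.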

The inverse of this operation is the same permutation, so for \( L' \) we compute

\begin{equation}\label{eqn:luAlgorithm:120}
L' \rightarrow L_0 L_1 = L_1.
\end{equation}

As before, we don’t wish to do a matrix multiplication. Since we have applied the permutation matrix from the right, it results in an exchange of columns \(1,3\) of our \( L_0 \) matrix (which happens to be the identity at this point). We can implement that matrix operation as a column exchange directly using submatrix notation.
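In a language without Matlab's submatrix notation, the same column exchange can be sketched as a loop over the rows. The name `swap_columns` and the list-of-rows storage are assumptions made for this illustration.

```python
def swap_columns(A, i, j):
    """Interchange columns i and j of A in place (zero-based indices)."""
    for row in A:
        row[i], row[j] = row[j], row[i]

# L_0 is the 4 x 4 identity at this point.
Lp = [[int(r == c) for c in range(4)] for r in range(4)]

swap_columns(Lp, 0, 2)  # columns 1,3 in one-based numbering
```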

We now proceed down the column, doing all the non-zero row elimination operations required. In this case, we have only one

\begin{equation}\label{eqn:luAlgorithm:140}
r_4 \rightarrow r_4 - \frac{1}{2} r_1.
\end{equation}

This has the matrix form

\begin{equation}\label{eqn:luAlgorithm:160}
L_2^{-1} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-1/2 & 0 & 0 & 1 \\
\end{bmatrix}.
\end{equation}

The next stage of the \( U \) computation is

\begin{equation}\label{eqn:luAlgorithm:180}
M
\rightarrow L_2^{-1} L_1^{-1} M
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-1/2 & 0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
2 & 0 & 2 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 2 & 1 \\
1 & 1 & 1 & 1 \\
\end{bmatrix}
=
\begin{bmatrix}
2 & 0 & 2 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 2 & 1 \\
0 & 1 & 0 & 1 \\
\end{bmatrix}.
\end{equation}

Again, we do not wish to do this operation as a matrix operation. Instead we act directly on the rows in question with \ref{eqn:luAlgorithm:140}.

Note that the inverse of this matrix operation is very simple. We’ve subtracted \( r_1/2 \) from \( r_4 \), so to invert this we have only to add back \( r_1/2 \). That is

\begin{equation}\label{eqn:luAlgorithm:200}
L_2
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
1/2 & 0 & 0 & 1 \\
\end{bmatrix}.
\end{equation}

Observe that when we apply this from the right, \( L_0 L_1 \rightarrow L_0 L_1 L_2\), the interpretation is a column operation

\begin{equation}\label{eqn:luAlgorithm:220}
c_1 \rightarrow c_1 + m c_4,
\end{equation}

where \( m = 1/2 \) here.

In general, if we apply the row operation

\begin{equation}\label{eqn:luAlgorithm:240}
r_j \rightarrow r_j - m r_i,
\end{equation}

to the current state of our matrix \( U \), then we must apply the column operation

\begin{equation}\label{eqn:luAlgorithm:260}
c_i \rightarrow c_i + m c_j,
\end{equation}

to the current state of our matrix \( L' \).
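This pairing of a row operation on \( U \) with a column operation on \( L' \) can be sketched as follows. The function name `eliminate` is hypothetical, indices are zero-based, and I use exact fractions only to keep the example free of rounding. The starting matrices are the states of \( U \) and \( L' \) right after the first permutation.

```python
from fractions import Fraction

def eliminate(U, Lp, i, j):
    """Apply r_j -> r_j - m r_i to U, and the matching column
    operation c_i -> c_i + m c_j to Lp. Returns the multiplier m."""
    m = Fraction(U[j][i]) / U[i][i]
    U[j] = [a - m * b for a, b in zip(U[j], U[i])]
    for row in Lp:
        row[i] += m * row[j]
    return m

U = [[2, 0, 2, 0], [0, 0, 1, 1], [0, 0, 2, 1], [1, 1, 1, 1]]
Lp = [[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]

m = eliminate(U, Lp, 0, 3)  # pivot row 1, target row 4 (one-based)
```

The column update is applied to every row of \( L' \), not just the target row, since right multiplication acts on whole columns.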

We are now ready to move on to reduction of column 2. We will have only a permutation operation

\begin{equation}\label{eqn:luAlgorithm:280}
L_3 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
\end{bmatrix},
\end{equation}

so we apply a \( 2,4 \) row interchange to \( U \), and a \( 2,4 \) column interchange to \( L' \). This gives us

\begin{equation}\label{eqn:luAlgorithm:300}
M \rightarrow
\begin{bmatrix}
2 & 0 & 2 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 2 & 1 \\
0 & 0 & 1 & 1 \\
\end{bmatrix}.
\end{equation}

Our final operation is a regular row operation

\begin{equation}\label{eqn:luAlgorithm:320}
r_4 \rightarrow r_4 - \inv{2} r_3,
\end{equation}

with matrix
\begin{equation}\label{eqn:luAlgorithm:340}
L_4^{-1} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & -1/2 & 1 \\
\end{bmatrix}.
\end{equation}
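
Carrying out this last row operation on the matrix of \ref{eqn:luAlgorithm:300} by hand (a check that I have added for completeness) should leave us with the upper triangular factor

\begin{equation}
U =
\begin{bmatrix}
2 & 0 & 2 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 1/2 \\
\end{bmatrix}.
\end{equation}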

We can also track all the permutations we have performed, which in this case was

\begin{equation}\label{eqn:luAlgorithm:360}
P = L_3 L_1 I.
\end{equation}

This should also be computed by performing row interchanges, not matrix multiplication.
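Putting the pieces together, the whole procedure can be sketched as below. This is a rough Python illustration under my own assumptions (list-of-rows storage, exact fractions, zero-based indices, a hypothetical function name `lu_with_pivots`), not the Matlab implementation asked for in the problem set.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def lu_with_pivots(M):
    """Return (P, Lp, U) with M = Lp U, using exact rational arithmetic."""
    n = len(M)
    U = [[Fraction(x) for x in row] for row in M]
    Lp, P = identity(n), identity(n)
    for i in range(n - 1):
        # Pivot on the largest magnitude entry at or below the diagonal.
        p = max(range(i, n), key=lambda r: abs(U[r][i]))
        if U[p][i] == 0:
            continue  # column is already zero at and below the diagonal
        if p != i:
            U[i], U[p] = U[p], U[i]          # row interchange on U
            P[i], P[p] = P[p], P[i]          # same row interchange on P
            for row in Lp:                   # column interchange on L'
                row[i], row[p] = row[p], row[i]
        for j in range(i + 1, n):
            m = U[j][i] / U[i][i]
            if m != 0:
                U[j] = [a - m * b for a, b in zip(U[j], U[i])]  # r_j -> r_j - m r_i
                for row in Lp:                                   # c_i -> c_i + m c_j
                    row[i] += m * row[j]
    return P, Lp, U

M = [
    [0, 0, 2, 1],
    [0, 0, 1, 1],
    [2, 0, 2, 0],
    [1, 1, 1, 1],
]

P, Lp, U = lu_with_pivots(M)
```

For the example matrix this performs the same sequence of operations as the walk through above: a 1,3 interchange, one elimination in column 1, a 2,4 interchange, and one elimination in column 3.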

Should we now wish to solve the system

\begin{equation}\label{eqn:luAlgorithm:380}
M \Bx = L' U \Bx = \Bb,
\end{equation}

we can equivalently solve

\begin{equation}\label{eqn:luAlgorithm:400}
P L' U \Bx = P \Bb.
\end{equation}

To do this let \( \By = U \Bx \), so that we wish to solve

\begin{equation}\label{eqn:luAlgorithm:420}
P L' \By = P \Bb.
\end{equation}

The matrix \( L = P L' \) is lower triangular, as \( P \) contains all the permutations that we applied along the way (FIXME: this is a statement, not a proof, and not obvious).
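
For this example, at least, the claim can be checked directly. Accumulating the column operations described above, my hand computation gives

\begin{equation}
L' = L_1 L_2 L_3 L_4 =
\begin{bmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 1/2 & 1 \\
1 & 0 & 0 & 0 \\
1/2 & 1 & 0 & 0 \\
\end{bmatrix},
\end{equation}

and applying the row interchanges of \( P \) (first \( 1,3 \), then \( 2,4 \)) gives

\begin{equation}
L = P L' =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1/2 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1/2 & 1 \\
\end{bmatrix},
\end{equation}

which is lower triangular with a unit diagonal. This checks one case only, and is not a proof of the general statement.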

We can solve the system

\begin{equation}\label{eqn:luAlgorithm:440}
L \By = P \Bb,
\end{equation}

using forward substitution. That leaves us to solve the upper triangular system

\begin{equation}\label{eqn:luAlgorithm:460}
\By = U \Bx,
\end{equation}

which now requires only back substitution.
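The two substitution passes can be sketched as follows. The entries of \( L \) and \( U \), and the row ordering `perm` that stands in for \( P \), were worked out by hand for the example matrix, so treat them as assumptions to verify. The function names and the right hand side `b` are hypothetical choices for this illustration.

```python
from fractions import Fraction

half = Fraction(1, 2)
L = [[1, 0, 0, 0], [half, 1, 0, 0], [0, 0, 1, 0], [0, 0, half, 1]]
U = [[2, 0, 2, 0], [0, 1, 0, 1], [0, 0, 2, 1], [0, 0, 0, half]]
perm = [2, 3, 0, 1]  # entry k of P b is b[perm[k]] (zero-based)

def forward_substitution(L, c):
    """Solve L y = c for lower triangular L, working from the top row down."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[k] * y[k] for k in range(i))
        y.append(Fraction(c[i] - s) / row[i])
    return y

def back_substitution(U, y):
    """Solve U x = y for upper triangular U, working from the bottom row up."""
    n = len(U)
    x = [0] * n
    for i in reversed(range(n)):
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = Fraction(y[i] - s) / U[i][i]
    return x

b = [3, 2, 4, 4]              # the row sums of M, so that x = (1, 1, 1, 1)
Pb = [b[k] for k in perm]     # apply P by reordering, not by multiplication
y = forward_substitution(L, Pb)
x = back_substitution(U, y)
```

Each pass touches only the triangle of nonzero entries, so once the factorization is available the solve costs on the order of \( n^2 \) operations.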