gradient method

Final notes for ECE1254, Modelling of Multiphysics Systems

December 27, 2014 ece1254 No comments , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

Capture

I’ve now finished my first grad course, Modelling of Multiphysics Systems, taught by Prof Piero Triverio.

I’ve posted notes for lectures and other material as I was taking the course, but now have an aggregated set of notes for the whole course posted.
This is now updated with all my notes from the lectures, solved problems, additional notes on auxillary topics I wanted to explore (like SVD), plus the notes from the Harmonic Balance report that Mike and I will be presenting in January.

This version of my notes also includes all the matlab figures regenerating using http://www.mathworks.com/matlabcentral/fileexchange/23629-export-fig, which allows a save-as pdf, which rescales much better than Matlab saveas() png’s when embedded in latex.  I’m not sure if that’s the best way to include Matlab figures in latex, but they are at least not fuzzy looking now.

All in all, I’m pretty pleased with my notes for this course.  They are a lot more readable than any of the ones I’ve done for the physics undergrad courses I was taking (http://peeterjoot.com/writing/).  While there was quite a lot covered in this course, the material really only requires an introductory circuits course and some basic math (linear algebra and intro calculus), so is pretty accessible.

This was a fun course.  I recall, back in ancient times when I was a first year student, being unsatisfied with all the ad-hoc strategies we used to solve circuits problems.  This finally answers the questions of how to tackle things more systematically.

Here’s the contents outline for these notes:

Preface
Lecture notes
1 nodal analysis
1.1 In slides
1.2 Mechanical structures example
1.3 Assembling system equations automatically. Node/branch method
1.4 Nodal Analysis
1.5 Modified nodal analysis (MNA)
2 solving large systems
2.1 Gaussian elimination
2.2 LU decomposition
2.3 Problems
3 numerical errors and conditioning
3.1 Strict diagonal dominance
3.2 Exploring uniqueness and existence
3.3 Perturbation and norms
3.4 Matrix norm
4 singular value decomposition, and conditioning number
4.1 Singular value decomposition
4.2 Conditioning number
5 sparse factorization
5.1 Fill ins
5.2 Markowitz product
5.3 Markowitz reordering
5.4 Graph representation
6 gradient methods
6.1 Summary of factorization costs
6.2 Iterative methods
6.3 Gradient method
6.4 Recap: Summary of Gradient method
6.5 Conjugate gradient method
6.6 Full Algorithm
6.7 Order analysis
6.8 Conjugate gradient convergence
6.9 Gershgorin circle theorem
6.10 Preconditioning
6.11 Symmetric preconditioning
6.12 Preconditioned conjugate gradient
6.13 Problems
7 solution of nonlinear systems
7.1 Nonlinear systems
7.2 Richardson and Linear Convergence
7.3 Newton’s method
7.4 Solution of N nonlinear equations in N unknowns
7.5 Multivariable Newton’s iteration
7.6 Automatic assembly of equations for nonlinear system
7.7 Damped Newton’s method
7.8 Continuation parameters
7.9 Singular Jacobians
7.10 Struts and Joints, Node branch formulation
7.11 Problems
8 time dependent systems
8.1 Assembling equations automatically for dynamical systems
8.2 Numerical solution of differential equations
8.3 Forward Euler method
8.4 Backward Euler method
8.5 Trapezoidal rule (TR)
8.6 Nonlinear differential equations
8.7 Analysis, accuracy and stability (Dt ! 0)
8.8 Residual for LMS methods
8.9 Global error estimate
8.10 Stability
8.11 Stability (continued)
8.12 Problems
9 model order reduction
9.1 Model order reduction
9.2 Moment matching
9.3 Model order reduction (cont).
9.4 Moment matching
9.5 Truncated Balanced Realization (1000 ft overview)
9.6 Problems
Final report
10 harmonic balance
10.1 Abstract
10.2 Introduction
10.2.1 Modifications to the netlist syntax
10.3 Background
10.3.1 Discrete Fourier Transform
10.3.2 Harmonic Balance equations
10.3.3 Frequency domain representation of MNA equations
10.3.4 Example. RC circuit with a diode.
10.3.5 Jacobian
10.3.6 Newton’s method solution
10.3.7 Alternative handling of the non-linear currents and Jacobians
10.4 Results
10.4.1 Low pass filter
10.4.2 Half wave rectifier
10.4.3 AC to DC conversion
10.4.4 Bridge rectifier
10.4.5 Cpu time and error vs N
10.4.6 Taylor series non-linearities
10.4.7 Stiff systems
10.5 Conclusion
10.6 Appendices
10.6.1 Discrete Fourier Transform inversion
Appendices
a singular value decomposition
b basic theorems and definitions
c norton equivalents
d stability of discretized linear differential equations
e laplace transform refresher
f discrete fourier transform
g harmonic balance, rough notes
g.1 Block matrix form, with physical parameter ordering
g.2 Block matrix form, with frequency ordering
g.3 Representing the linear sources
g.4 Representing non-linear sources
g.5 Newton’s method
g.6 A matrix formulation of Harmonic Balance non-linear currents
h matlab notebooks
i mathematica notebooks
Index
Bibliography

ECE1254H Modeling of Multiphysics Systems. Lecture 7: Sparse factorization and iterative methods. Taught by Prof. Piero Triverio

October 21, 2014 ece1254 No comments , , , , , , , , ,


[Click here for a PDF of this post with nicer formatting]

Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

Fill ins

The problem of fill ins in LU computations arise in locations where rows and columns cross over zero positions.

Rows and columns can be permuted to deal with these. Here is an ad-hoc permutation of rows and columns that will result in less fill ins.

\begin{equation}\label{eqn:multiphysicsL7:180}
\begin{aligned}
&
\begin{bmatrix}
a & b & c & 0 \\
d & e & 0 & 0 \\
0 & f & g & 0 \\
0 & h & 0 & i \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
x_4
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
a & c & 0 & b \\
d & 0 & 0 & e \\
0 & g & 0 & f \\
0 & 0 & i & h \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_4 \\
x_3 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
0 & a & c & b \\
0 & d & 0 & e \\
0 & 0 & g & f \\
i & 0 & 0 & h \\
\end{bmatrix}
\begin{bmatrix}
x_3 \\
x_4 \\
x_1 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\
b_2 \\
b_3 \\
b_4 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
i & 0 & 0 & h \\
0 & a & c & b \\
0 & d & 0 & e \\
0 & 0 & g & f \\
\end{bmatrix}
\begin{bmatrix}
x_3 \\
x_4 \\
x_1 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_4 \\
b_1 \\
b_2 \\
b_3 \\
\end{bmatrix} \\
\Rightarrow &
\begin{bmatrix}
i & 0 & 0 & h \\
0 & c & a & b \\
0 & 0 & d & e \\
0 & g & 0 & f \\
\end{bmatrix}
\begin{bmatrix}
x_3 \\
x_1 \\
x_4 \\
x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
b_4 \\
b_1 \\
b_2 \\
b_3 \\
\end{bmatrix} \\
\end{aligned}
\end{equation}

Markowitz product

To facilitate such permutations the Markowitz product that estimates the amount of fill in required.

Definition Markowitz product

\begin{equation*}
\begin{aligned}
\text{Markowitz product} =
&\lr{\text{Non zeros in unfactored part of Row -1}} \times \\
&\lr{\text{Non zeros in unfactored part of Col -1}}
\end{aligned}
\end{equation*}

In [1] it is stated “A still simpler alternative, which seems adequate generally, is to choose the pivot which minimizes the number of coefficients modified at each step (excluding those which are eliminated at the particular step). This is equivalent to choosing the non-zero element with minimum \( (\rho_i – 1 )(\sigma_j -1) \).”

Note that this product is applied only to \( i j \) positions that are
non-zero, something not explicitly mentioned in the slides, nor in other
locations like [2]

Example: Markowitz product

For this matrix
\begin{equation}\label{eqn:multiphysicsL7:220}
\begin{bmatrix}
a & b & c & 0 \\
d & e & 0 & 0 \\
0 & f & g & 0 \\
0 & h & 0 & i \\
\end{bmatrix},
\end{equation}

the Markowitz products are

\begin{equation}\label{eqn:multiphysicsL7:280}
\begin{bmatrix}
1 & 6 & 2 & \\
1 & 3 & & \\
& 3 & 1 & \\
& 3 & & 0 \\
\end{bmatrix}.
\end{equation}

Markowitz reordering

The Markowitz Reordering procedure (copied directly from the slides) is

  • For i = 1 to n
  • Find diagonal \( j >= i \) with min Markowitz product
  • Swap rows \( j \leftrightarrow i \) and columns \( j \leftrightarrow i \)
  • Factor the new row \( i \) and update Markowitz products

Example: Markowitz reordering

Looking at the Markowitz products \ref{eqn:multiphysicsL7:280} a swap of rows and columns \( 1, 4 \) gives the modified matrix

\begin{equation}\label{eqn:multiphysicsL7:300}
\begin{bmatrix}
i & 0 & h & 0 \\
0 & d & e & 0 \\
0 & 0 & f & g \\
0 & a & b & c \\
\end{bmatrix}
\end{equation}

In this case, this reordering has completely avoided any requirement to do any actual Gaussian operations for this first stage reduction.

Presuming that the Markowitz products for the remaining 3×3 submatrix are only computed from that submatrix, the new products are
\begin{equation}\label{eqn:multiphysicsL7:320}
\begin{bmatrix}
& & & \\
& 1 & 2 & \\
& & 2 & 1 \\
& 2 & 4 & 2 \\
\end{bmatrix}.
\end{equation}

We have a minimal product in the pivot position, which happens to already lie on the diagonal. Note that it is not necessarily the best for numerical stability. It appears the off diagonal Markowitz products are not really of interest since the reordering algorithm swaps both rows and columns.

Graph representation

It is possible to interpret the Markowitz products on the diagonal as connectivity of a graph that represents the interconnections of the nodes. Consider the circuit of fig. 2 as an example

lecture7Fig2

fig 2. Simple circuit

 

The system equations for this circuit is of the form
\begin{equation}\label{eqn:multiphysicsL7:340}
\begin{bmatrix}
x & x & x & 0 & 1 \\
x & x & x & 0 & 0 \\
x & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
-1 & 0 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_1 \\
V_2 \\
V_3 \\
V_4 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix}.
\end{equation}

The Markowitz products along the diagonal are
\begin{equation}\label{eqn:multiphysicsL7:360}
\begin{aligned}
M_{11} &= 9 \\
M_{22} &= 4 \\
M_{33} &= 9 \\
M_{44} &= 4 \\
M_{55} &= 4 \\
\end{aligned}
\end{equation}

Compare these to the number of interconnections of the graph fig. 3 of the nodes in this circuit. We see that these are the squares of the number of the node interconnects in each case.

 

lecture7Fig3

fig. 3. Graph representation

Here a 5th node was introduced for the current \( i \) between nodes \( 4 \) and \( 1 \). Observe that the Markowitz product of this node was counted as the number of non-zero values excluding the \( 5,5 \) matrix position. However, that doesn’t matter too much since a Markowitz swap of row/column 1 with row/column 5 would put a zero in the \( 1,1 \) position of the matrix, which is not desirable. We have to restrict the permutations of zero diagonal positions to pivots for numerical stability, or use a more advanced zero fill avoidance algorithm.

The minimum diagonal Markowitz products are in positions 2 or 4, with respective Markowitz reorderings of the form

\begin{equation}\label{eqn:multiphysicsL7:380}
\begin{bmatrix}
x & x & x & 0 & 0 \\
x & x & x & 0 & 1 \\
x & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
0 & -1 & 0 & 1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_2 \\
V_1 \\
V_3 \\
V_4 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix},
\end{equation}

and
\begin{equation}\label{eqn:multiphysicsL7:400}
\begin{bmatrix}
x & 0 & 0 & x & -1 \\
0 & x & x & x & 1 \\
0 & x & x & x & 0 \\
x & x & x & x & 0 \\
1 & -1 & 0 & 0 & 0 \\
\end{bmatrix}
\begin{bmatrix}
V_4 \\
V_1 \\
V_2 \\
V_3 \\
i \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
0 \\
0 \\
x \\
\end{bmatrix}.
\end{equation}

The original system had 7 zeros that could potentially be filled in the remaining \( 4 \times 4 \) submatrix. After a first round of Gaussian elimination, our system matrices have the respective forms

\begin{equation}\label{eqn:multiphysicsL7:420}
\begin{bmatrix}
x & x & x & 0 & 0 \\
0 & x & x & 0 & 1 \\
0 & x & x & x & 0 \\
0 & 0 & x & x & -1 \\
0 & -1 & 0 & 1 & 0 \\
\end{bmatrix}
\end{equation}
\begin{equation}\label{eqn:multiphysicsL7:440}
\begin{bmatrix}
x & 0 & 0 & x & -1 \\
0 & x & x & x & 1 \\
0 & x & x & x & 0 \\
0 & x & x & x & 0 \\
0 & -1 & 0 & x & x \\
\end{bmatrix}
\end{equation}

The remaining \( 4 \times 4 \) submatrices have interconnect graphs sketched in fig. 4.

 

lecture7Fig4

fig. 4. Graphs after one round of Gaussian elimination

From a graph point of view, we want to delete the most connected nodes. This can be driven by the Markowitz products along the diagonal or directly with graph methods.

Summary of factorization costs

LU (dense)

  • cost: \( O(n^3) \)
  • cost depends only on size

LU (sparse)

  • cost: Diagonal and tridiagonal are \( O(n) \), but we can have up to \( O(n^3) \) depending on sparsity and the method of dealing with the sparsity.
  • cost depends on size and sparsity

Computation can be affordable up to a few million elements.

Iterative methods

Can be cheap if done right. Convergence requires careful preconditioning.

Iterative methods

Suppose that we have an initial guess \( \Bx_0 \). Iterative methods are generally of the form

DO
\(\Br = \Bb – M \Bx_i\)
UNTIL \(\Norm{\Br} < \epsilon \)

The difference \( \Br \) is called the residual. For as long as it is bigger than desired, continue improving the estimate \( \Bx_i \).

The matrix vector product \( M \Bx_i \), if dense, is of \( O(n^2) \). Suppose, for example, that we can perform the iteration in ten iterations. If the matrix is dense, we can have \( 10 \, O(n^2) \) performance. If sparse, this can be worse than just direct computation.

Gradient method

This is a method for iterative solution of the equation \( M \Bx = \Bb \).

This requires symmetric positive definite matrix \( M = M^\T \), with \( M > 0 \).

We introduce an energy function

\begin{equation}\label{eqn:multiphysicsL7:60}
\Psi(\By) \equiv \inv{2} \By^\T M \By – \By^\T \Bb
\end{equation}

For a two variable system this is illustrated in fig. 1.

 

lecture7Fig1

fig. 1. Positive definite energy function

Theorem: Energy function minimum

The energy function \ref{eqn:multiphysicsL7:60} has a minimum at

\begin{equation}\label{eqn:multiphysicsL7:80}
\By = M^{-1} \Bb = \Bx.
\end{equation}

To prove this, consider the coordinate representation

\begin{equation}\label{eqn:multiphysicsL7:480}
\Psi = \inv{2} y_a M_{ab} y_b – y_b b_b,
\end{equation}

for which the derivatives are
\begin{equation}\label{eqn:multiphysicsL7:500}
\PD{y_i}{\Psi} =
\inv{2} M_{ib} y_b
+
\inv{2} y_a M_{ai}
– b_i
=
\lr{ M \By – \Bb }_i.
\end{equation}

The last operation above was possible because \( M = M^\T \). Setting all of these equal to zero, and rewriting this as a matrix relation we have

\begin{equation}\label{eqn:multiphysicsL7:520}
M \By = \Bb,
\end{equation}

as asserted.

This is called the gradient method because the gradient moves us along the path of steepest descent towards the minimum if it exists.

The method is

\begin{equation}\label{eqn:multiphysicsL7:100}
\Bx^{(k+1)} = \Bx^{(k)} +
\underbrace{ \alpha_k }_{step size}
\underbrace{ \Bd^{(k)} }_{direction},
\end{equation}

where the direction is

\begin{equation}\label{eqn:multiphysicsL7:120}
\Bd^{(k)} = – \spacegrad \Phi = \Bb – M \Bx^k = r^{(k)}.
\end{equation}

Optimal step size

Note that for the minimization of \( \Phi \lr{ \Bx^{(k+1)} } \), we note

\begin{equation}\label{eqn:multiphysicsL7:140}
\Phi \lr{ \Bx^{(k+1)} }
= \Phi\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }
=
\inv{2}
\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }^\T
M
\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }

\lr{ \Bx^{(k)} + \alpha_k \Bd^{(k)} }^\T \Bb
\end{equation}

If we take the derivative of both sides with respect to \( \alpha_k \) to find the minimum, we have

\begin{equation}\label{eqn:multiphysicsL7:540}
0 =
\inv{2}
\lr{ \Bd^{(k)} }^\T
M
\Bx^{(k)}
+
\inv{2}
\lr{ \Bx^{(k)} }^\T
M
\Bd^{(k)}
+
\alpha_k \lr{ \Bd^{(k)} }^\T
M
\Bd^{(k)}

\lr{ \Bd^{(k)} }^\T \Bb.
\end{equation}

Because \( M \) is symmetric, this is

\begin{equation}\label{eqn:multiphysicsL7:560}
\alpha_k \lr{ \Bd^{(k)} }^\T
M
\Bd^{(k)}
=
\lr{ \Bd^{(k)} }^\T \lr{ \Bb – M \Bx^{(k)}}
=
\lr{ \Bd^{(k)} }^\T r^{(k)},
\end{equation}

or

\begin{equation}\label{eqn:multiphysicsL7:160}
\alpha_k
= \frac{
\lr{\Br^{(k)}}^\T
\Br^{(k)}
}{
\lr{\Br^{(k)}}^\T
M
\Br^{(k)}
}
\end{equation}

We will see that this method is not optimal when we pick one direction and keep going down that path.

Definitions and theorems

Definition: Positive (negative) definite

A matrix \( M \) is positive (negative) definite, denoted \( M > 0 (<0) \) if \( \By^\T M \By > 0 (<0), \quad \forall \By \).

If a matrix is neither positive, nor negative definite, it is called indefinite.

Theorem: Positive (negative) definite

A symmetric matrix \( M > 0 (<0)\) iff \( \lambda_i > 0 (<0)\) for all eigenvalues \( \lambda_i \), or is indefinite iff its eigenvalues \( \lambda_i \) are of mixed sign.

References

[1] Harry M Markowitz. The elimination form of the inverse and its application to linear programming. Management Science, 3\penalty0 (3):\penalty0 255–269, 1957.

[2] Timothy Vismor. Pivoting To Preserve Sparsity, 2012. URL https://vismor.com/documents/network_analysis/matrix_algorithms/S8.SS3.php. [Online; accessed 15-Oct-2014].