
ECE1505H Convex Optimization. Lecture 7: Examples of convex and concave functions, local and global minimums. Taught by Prof. Stark Draper

February 2, 2017

[Click here for a PDF of this post with nicer formatting]

Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

Today

  • Local and global optimality
  • Compositions of functions
  • Examples

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:20}
\begin{aligned}
F(x) &= x^2 \\
F''(x) &= 2 > 0
\end{aligned}
\end{equation}

This is strictly convex.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:40}
\begin{aligned}
F(x) &= x^3 \\
F''(x) &= 6 x.
\end{aligned}
\end{equation}

Since \( F''(x) < 0 \) for \( x < 0 \), \( x^3 \) is not convex on \( \mathbb{R} \). However, \( x^3 \) is convex on \( \textrm{dom} F = \mathbb{R}_{+} \).

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:60}
\begin{aligned}
F(x) &= x^\alpha \\
F'(x) &= \alpha x^{\alpha-1} \\
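F''(x) &= \alpha(\alpha-1) x^{\alpha-2}.
\end{aligned}
\end{equation}

fig. 1. Powers of x.

This is convex on \( \mathbb{R}_{++} \) if \( \alpha \ge 1 \) or \( \alpha \le 0 \), since then \( \alpha(\alpha-1) \ge 0 \). It is concave on \( \mathbb{R}_{++} \) if \( 0 \le \alpha \le 1 \).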
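The sign condition can be spot-checked numerically. The Python sketch below (the helper names are illustrative, not from the lecture) evaluates \( F''(x) = \alpha(\alpha-1) x^{\alpha-2} \) on a grid of points in \( \mathbb{R}_{++} \) for a few exponents. Sampling a grid only illustrates the claim; it does not prove it.

```python
def second_derivative_power(alpha, x):
    """F''(x) for F(x) = x**alpha, valid for x > 0."""
    return alpha * (alpha - 1) * x ** (alpha - 2)

def convex_on_grid(alpha, grid):
    """True if F''(x) >= 0 at every sample point of the grid."""
    return all(second_derivative_power(alpha, x) >= 0 for x in grid)

grid = [0.01 * k for k in range(1, 1001)]  # sample points in (0, 10]
for alpha in (-2.0, -0.5, 0.0, 0.5, 1.0, 1.5, 3.0):
    print(alpha, convex_on_grid(alpha, grid))
```

Only the exponent in \( (0, 1) \) fails the test, consistent with \( x^\alpha \) being concave there.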

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:80}
\begin{aligned}
F(x) &= \log x \\
F'(x) &= \inv{x} \\
F''(x) &= -\inv{x^2} < 0
\end{aligned}
\end{equation}

This is (strictly) concave on \( \textrm{dom} F = \mathbb{R}_{++} \).

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:100}
\begin{aligned}
F(x) &= x\log x \\
F'(x) &= \log x + x \inv{x} = 1 + \log x \\
F''(x) &= \inv{x}
\end{aligned}
\end{equation}

This is strictly convex on
\( \mathbb{R}_{++} \), where
\( F''(x) > 0 \).

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:120}
\begin{aligned}
F(x) &= e^{\alpha x} \\
F'(x) &= \alpha e^{\alpha x} \\
F''(x) &= \alpha^2 e^{\alpha x} \ge 0
\end{aligned}
\end{equation}

fig. 2. Exponential.

Such functions are plotted in fig. 2, and are convex for all \( \alpha \in \mathbb{R} \).

Example:

For symmetric \( P \in S^n \)

\begin{equation}\label{eqn:convexOptimizationLecture7:140}
\begin{aligned}
F(\Bx) &= \Bx^\T P \Bx + 2 \Bq^\T \Bx + r \\
\spacegrad F &= (P + P^\T) \Bx + 2 \Bq = 2 P \Bx + 2 \Bq \\
\spacegrad^2 F &= 2 P.
\end{aligned}
\end{equation}

This is convex (concave) if \( P \ge 0 \) (\( P \le 0\)), i.e., if \( P \) is positive (negative) semi-definite.

Example:

A quadratic function

\begin{equation}\label{eqn:convexOptimizationLecture7:780}
F(x, y) = x^2 + y^2 + 3 x y,
\end{equation}

that is neither convex nor concave is plotted in fig. 3.

fig. 3. Function with saddle point (3d and contours)

This function can be put in matrix form

\begin{equation}\label{eqn:convexOptimizationLecture7:160}
F(x, y) = x^2 + y^2 + 3 x y
=
\begin{bmatrix}
x & y
\end{bmatrix}
\begin{bmatrix}
1 & 1.5 \\
1.5 & 1
\end{bmatrix}
\begin{bmatrix}
x \\
y
\end{bmatrix},
\end{equation}

and has the Hessian

\begin{equation}\label{eqn:convexOptimizationLecture7:180}
\begin{aligned}
\spacegrad^2 F
&=
\begin{bmatrix}
\partial_{xx} F & \partial_{xy} F \\
\partial_{yx} F & \partial_{yy} F \\
\end{bmatrix} \\
&=
\begin{bmatrix}
2 & 3 \\
3 & 2
\end{bmatrix} \\
&= 2 P.
\end{aligned}
\end{equation}

The plot suggests that \( P \) is not PSD. This can be confirmed by checking the eigenvalues

\begin{equation}\label{eqn:convexOptimizationLecture7:200}
\begin{aligned}
0
&=
\det ( P - \lambda I ) \\
&=
(1 - \lambda)^2 - 1.5^2,
\end{aligned}
\end{equation}

which has solutions

\begin{equation}\label{eqn:convexOptimizationLecture7:220}
\lambda = 1 \pm \frac{3}{2} = \frac{5}{2}, -\frac{1}{2}.
\end{equation}

Since \( P \) has one positive and one negative eigenvalue, it is neither PSD nor negative semi-definite, so \( F \) is neither convex nor concave.
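For a symmetric \( 2 \times 2 \) matrix \( \begin{bmatrix} a & b \\ b & d \end{bmatrix} \) the eigenvalues have the closed form \( (a + d)/2 \pm \sqrt{((a - d)/2)^2 + b^2} \), so the definiteness test above is easy to automate. A minimal Python sketch (the function names are illustrative):

```python
import math

def sym2x2_eigenvalues(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius

def classify(a, b, d):
    """Classify a symmetric 2x2 matrix by the signs of its eigenvalues."""
    lam_max, lam_min = sym2x2_eigenvalues(a, b, d)
    if lam_min >= 0:
        return "positive semi-definite"
    if lam_max <= 0:
        return "negative semi-definite"
    return "indefinite"

# P from the matrix form of F(x, y) = x^2 + y^2 + 3xy
print(sym2x2_eigenvalues(1.0, 1.5, 1.0))  # (2.5, -0.5)
print(classify(1.0, 1.5, 1.0))            # indefinite
```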

Along \( y = -x \),

\begin{equation}\label{eqn:convexOptimizationLecture7:240}
\begin{aligned}
F(x,y)
&=
F(x,-x) \\
&=
2 x^2 - 3 x^2 \\
&=
- x^2,
\end{aligned}
\end{equation}

so it is concave along this line. Along \( y = x \)

\begin{equation}\label{eqn:convexOptimizationLecture7:260}
\begin{aligned}
F(x,y)
&=
F(x,x) \\
&=
2 x^2 + 3 x^2 \\
&=
5 x^2,
\end{aligned}
\end{equation}

so it is convex along this line.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:280}
F(\Bx) = \sqrt{ x_1 x_2 },
\end{equation}

on \( \textrm{dom} F = \setlr{ x_1 \ge 0, x_2 \ge 0 } \)

To find the Hessian, first compute the first partial derivatives
\begin{equation}\label{eqn:convexOptimizationLecture7:300}
\begin{aligned}
\PD{x_1}{F} &= \frac{1}{2} x_1^{-1/2} x_2^{1/2} \\
\PD{x_2}{F} &= \frac{1}{2} x_2^{-1/2} x_1^{1/2}
\end{aligned}
\end{equation}

The Hessian components are

\begin{equation}\label{eqn:convexOptimizationLecture7:320}
\begin{aligned}
\PD{x_1}{} \PD{x_1}{F} &= -\frac{1}{4} x_1^{-3/2} x_2^{1/2} \\
\PD{x_1}{} \PD{x_2}{F} &= \frac{1}{4} x_2^{-1/2} x_1^{-1/2} \\
\PD{x_2}{} \PD{x_1}{F} &= \frac{1}{4} x_1^{-1/2} x_2^{-1/2} \\
\PD{x_2}{} \PD{x_2}{F} &= -\frac{1}{4} x_2^{-3/2} x_1^{1/2}
\end{aligned}
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture7:340}
\spacegrad^2 F
=
-\frac{\sqrt{x_1 x_2}}{4}
\begin{bmatrix}
\inv{x_1^2} & -\inv{x_1 x_2} \\
-\inv{x_1 x_2} & \inv{x_2^2}
\end{bmatrix}.
\end{equation}

Checking this for PSD against \( \Bv = (v_1, v_2) \), we have
\begin{equation}\label{eqn:convexOptimizationLecture7:360}
\begin{aligned}
\begin{bmatrix}
v_1 & v_2
\end{bmatrix}
\begin{bmatrix}
\inv{x_1^2} & -\inv{x_1 x_2} \\
-\inv{x_1 x_2} & \inv{x_2^2}
\end{bmatrix}
\begin{bmatrix}
v_1 \\ v_2
\end{bmatrix}
&=
\begin{bmatrix}
v_1 & v_2
\end{bmatrix}
\begin{bmatrix}
\inv{x_1^2} v_1 -\inv{x_1 x_2} v_2 \\
-\inv{x_1 x_2} v_1 + \inv{x_2^2} v_2
\end{bmatrix} \\
&=
\lr{ \inv{x_1^2} v_1 -\inv{x_1 x_2} v_2 } v_1 +
\lr{ -\inv{x_1 x_2} v_1 + \inv{x_2^2} v_2 } v_2
\\
&=
\inv{x_1^2} v_1^2
+ \inv{x_2^2} v_2^2
-2 \inv{x_1 x_2} v_1 v_2 \\
&=
\lr{
\frac{v_1}{x_1}
-\frac{v_2}{x_2}
}^2 \\
&\ge 0,
\end{aligned}
\end{equation}

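so \( \spacegrad^2 F \le 0 \), since the leading factor \( -\sqrt{x_1 x_2}/4 \) is non-positive. The Hessian is negative semi-definite, so \( F \) is concave. Observe that this check had to hold at every \( \Bx \) with \( x_1, x_2 > 0 \), where the Hessian is defined.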
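The sign of the quadratic form can also be spot-checked numerically. This Python sketch evaluates the Hessian formula above at random points of \( \mathbb{R}^2_{++} \) and random directions \( \Bv \). The sampling ranges and seed are arbitrary assumptions. A random check like this is a sanity test, not a substitute for the algebra above.

```python
import math
import random

def hessian_sqrt_product(x1, x2):
    """Hessian of F(x1, x2) = sqrt(x1 * x2) for x1, x2 > 0 (formula above)."""
    s = math.sqrt(x1 * x2)
    return [[-s / (4 * x1 ** 2), s / (4 * x1 * x2)],
            [s / (4 * x1 * x2), -s / (4 * x2 ** 2)]]

def quadratic_form(H, v):
    """Compute v^T H v for a 2x2 matrix H."""
    return sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

rng = random.Random(1505)  # fixed seed so the run is reproducible
largest = max(
    quadratic_form(hessian_sqrt_product(rng.uniform(0.1, 10), rng.uniform(0.1, 10)),
                   (rng.gauss(0, 1), rng.gauss(0, 1)))
    for _ in range(10_000)
)
print(largest)  # expected to be <= 0, apart from tiny round-off
```

The form vanishes along \( \Bv \propto \Bx \), where \( v_1/x_1 = v_2/x_2 \), so the Hessian is only semi-definite.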

This is the \( n = 2 \) case of a more general result: the geometric mean

\begin{equation}\label{eqn:convexOptimizationLecture7:380}
F(\Bx) = \lr{ \prod_{i = 1}^n x_i }^{1/n},
\end{equation}

is concave (to be proven on the homework).

Summary.

If \( F \) is twice differentiable on \R{n}, check the curvature of the function along all lines, i.e., at all locations and in all directions.

If the Hessian is PSD at all \( \Bx \in \textrm{dom} F \), that is

\begin{equation}\label{eqn:convexOptimizationLecture7:400}
\spacegrad^2 F \ge 0 \, \forall \Bx \in \textrm{dom} F,
\end{equation}

then the function is convex (provided \( \textrm{dom} F \) is a convex set).

More examples of convex, but not necessarily differentiable, functions

Example:

Over \( \textrm{dom} F = \mathbb{R}^n \)

\begin{equation}\label{eqn:convexOptimizationLecture7:420}
F(\Bx) = \max_{i = 1}^n x_i
\end{equation}

For example,
\begin{equation}\label{eqn:convexOptimizationLecture7:440}
\begin{aligned}
F((1,2)) &= 2 \\
F((3,-1)) &= 3
\end{aligned}
\end{equation}

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:460}
F(\Bx) = \max_{i = 1}^n F_i(\Bx),
\end{equation}

where

\begin{equation}\label{eqn:convexOptimizationLecture7:480}
F_i(\Bx)
=
… ?
\end{equation}

The pointwise maximum of a set of convex functions is a convex function.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:500}
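F(\Bx) =
x_{[1]} +
x_{[2]} +
x_{[3]}
\end{equation}

where \( x_{[k]} \) is the \( k \)-th largest component of \( \Bx \). Write

\begin{equation}\label{eqn:convexOptimizationLecture7:520}
F(\Bx) = \max_{(i,j,k)} \lr{ x_i + x_j + x_k },
\end{equation}

where the maximum is over all

\begin{equation}\label{eqn:convexOptimizationLecture7:540}
\binom{n}{3}
\end{equation}

choices of distinct indices \( i < j < k \). Each \( x_i + x_j + x_k \) is a linear function of \( \Bx \), so \( F \) is a maximum of linear functions and is therefore convex.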
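A small Python sketch, with illustrative helper names, checks that the two forms of \( F \) agree. It then spot-checks the midpoint convexity inequality on random vectors. The sampling range and seed are arbitrary, and sampling illustrates convexity without proving it.

```python
import random
from itertools import combinations

def top3_sum_sorted(x):
    """x_[1] + x_[2] + x_[3]: the sum of the three largest entries of x."""
    return sum(sorted(x, reverse=True)[:3])

def top3_sum_max(x):
    """The same value, written as a max of x_i + x_j + x_k over triples i < j < k."""
    return max(x[i] + x[j] + x[k] for i, j, k in combinations(range(len(x)), 3))

x = [4.0, -1.0, 7.0, 2.0, 0.0]
print(top3_sum_sorted(x), top3_sum_max(x))  # 13.0 13.0

def midpoint_convex_on_samples(F, dim, trials=1000, seed=7, tol=1e-12):
    """Check F((u + w)/2) <= (F(u) + F(w))/2 on random pairs (a spot check only)."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-5, 5) for _ in range(dim)]
        w = [rng.uniform(-5, 5) for _ in range(dim)]
        mid = [(a + b) / 2 for a, b in zip(u, w)]
        if F(mid) > (F(u) + F(w)) / 2 + tol:
            return False
    return True

print(midpoint_convex_on_samples(top3_sum_max, 6))  # True
```

The max form enumerates all \( \binom{n}{3} \) triples, so it is only practical for small \( n \). The sorted form computes the same value efficiently.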

Example:

For \( \Ba_i \in \mathbb{R}^n \) and \( b_i \in \mathbb{R} \), \( i = 1, \ldots, m \),

\begin{equation}\label{eqn:convexOptimizationLecture7:560}
\begin{aligned}
F(\Bx)
&= \sum_{i = 1}^m \log \frac{1}{b_i - \Ba_i^\T \Bx} \\
&= -\sum_{i = 1}^m \log( b_i - \Ba_i^\T \Bx ),
\end{aligned}
\end{equation}

with \( \textrm{dom} F = \setlr{ \Bx | \Ba_i^\T \Bx < b_i, i = 1, \ldots, m } \). Each argument \( b_i - \Ba_i^\T \Bx \) is an affine function of \( \Bx \), and composition with an affine function preserves convexity.

Since \( \log \) is concave, \( -\log \) is convex. A convex function of an affine function of \( \Bx \) is a convex function of \( \Bx \), and a sum of convex functions is convex, so \( F \) is convex.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:580}
F(\Bx) = \sup_{\By \in C} \Norm{ \Bx - \By }
\end{equation}

 

fig. 3. Max length function

 

Here \( C \subseteq \mathbb{R}^n \) is not necessarily convex. We use \( \sup \) here because the set \( C \) may be open, so the maximum need not be attained. This function is the distance from \( \Bx \) to the point(s) of \( C \) farthest from \( \Bx \).

  • For fixed \( \By \), \( \Bx - \By \) is an affine function of \( \Bx \).
  • \( g_\By(\Bx) = \Norm{\Bx - \By} \) is convex in \( \Bx \), since norms are convex functions and composition with an affine function preserves convexity.
  • \( F(\Bx) = \sup_{\By \in C} g_\By(\Bx) \) is the pointwise supremum of the family of convex functions \( g_\By \), indexed by \( \By \in C \), and is therefore convex.
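As a sanity check of this argument, the Python sketch below takes \( C \) to be a finite, deliberately nonconvex set of points in \( \mathbb{R}^2 \). This assumption makes the \( \sup \) a \( \max \). The sketch then tests the convexity inequality at random points. Passing such a test does not prove convexity; it only fails to find a counterexample.

```python
import math
import random

def farthest_distance(x, C):
    """F(x) = max over y in C of ||x - y||_2 (sup becomes max for a finite C)."""
    return max(math.dist(x, y) for y in C)

# Illustrative, deliberately nonconvex C: four isolated points in the plane.
C = [(0.0, 0.0), (3.0, 0.0), (0.0, 2.0), (5.0, 5.0)]

def convexity_violations(F, trials=2000, seed=3, tol=1e-9):
    """Count sampled (x, y, theta) with F(z) > theta F(x) + (1 - theta) F(y)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        y = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        theta = rng.random()
        z = tuple(theta * a + (1 - theta) * b for a, b in zip(x, y))
        if F(z) > theta * F(x) + (1 - theta) * F(y) + tol:
            count += 1
    return count

print(convexity_violations(lambda p: farthest_distance(p, C)))  # 0
```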

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:600}
F(\Bx) = \inf_{\By \in C} \Norm{ \Bx - \By }.
\end{equation}

Min and max of two convex functions are plotted in fig. 4.

fig. 4. Min and max

The max is observed to be convex, whereas the min is not necessarily so. For the min length function with a nonconvex \( C \), it is possible to find \( \Bx, \By \) and \( \theta \in [0,1] \) such that \( \Bz = \theta \Bx + (1-\theta) \By \) satisfies

\begin{equation}\label{eqn:convexOptimizationLecture7:800}
F(\Bz) = F(\theta \Bx + (1-\theta) \By) > \theta F(\Bx) + (1-\theta)F(\By).
\end{equation}

So \( F \) is not convex for every set \( C \subseteq \mathbb{R}^n \), because the \( \inf \) of a family of convex functions is not necessarily convex. However, if \( C \) is convex, then \( F(\Bx) \) is convex.
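A one-dimensional counterexample makes the failure concrete. Take the nonconvex set \( C = \setlr{-1, 1} \subset \mathbb{R} \) (an illustrative choice), \( x = -1 \), \( y = 1 \) and \( \theta = 1/2 \). Then \( F(x) = F(y) = 0 \), but the midpoint \( z = 0 \) has \( F(z) = 1 \). A short Python sketch:

```python
def nearest_distance(x, C):
    """F(x) = min over y in C of |x - y|, for a finite set C of real numbers."""
    return min(abs(x - y) for y in C)

C = [-1.0, 1.0]            # nonconvex: the points strictly between -1 and 1 are missing
x, y, theta = -1.0, 1.0, 0.5
z = theta * x + (1 - theta) * y                     # midpoint z = 0.0

lhs = nearest_distance(z, C)                        # F(z) = 1.0
rhs = theta * nearest_distance(x, C) + (1 - theta) * nearest_distance(y, C)  # 0.0
print(lhs, rhs, lhs > rhs)  # 1.0 0.0 True, so the convexity inequality fails
```

Replacing \( C \) by the convex interval \( [-1, 1] \) gives \( F(x) = \max(|x| - 1, 0) \), which is convex.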

Consequences of convexity for differentiable functions

  • Think about unconstrained functions, \( \textrm{dom} F = \mathbb{R}^n \).
  • By the first order condition, \( F \) is convex iff the domain is convex and
    \begin{equation}\label{eqn:convexOptimizationLecture7:620}
    F(\By) \ge F(\Bx) + \lr{ \spacegrad F(\Bx)}^\T (\By - \Bx) \, \forall \Bx, \By \in \textrm{dom} F.
    \end{equation}

If \( F \) is convex and one can find an \( \Bx^\conj \in \textrm{dom} F \) such that

\begin{equation}\label{eqn:convexOptimizationLecture7:640}
\spacegrad F(\Bx^\conj) = 0,
\end{equation}

then

\begin{equation}\label{eqn:convexOptimizationLecture7:660}
F(\By) \ge F(\Bx^\conj) \, \forall \By \in \textrm{dom} F.
\end{equation}

If you can find a point where the gradient is zero (such a point need not exist), then \( \Bx^\conj\) is a global minimum of \( F \). To see this, set \( \Bx = \Bx^\conj \) in the first order condition; the gradient term then vanishes.

Conversely, if \( \Bx^\conj \) is a global minimizer of \( F \), then \( \spacegrad F(\Bx^\conj) = 0 \) must hold (recall \( \textrm{dom} F = \mathbb{R}^n \) here). If that were not the case, you would be able to find a direction in which to move downhill, contradicting the optimality of \( \Bx^\conj\).
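Consider the convex quadratic \( F(\Bx) = \Bx^\T P \Bx + 2 \Bq^\T \Bx + r \) from the earlier example. When \( P \) is positive definite, setting \( \spacegrad F = 2 P \Bx + 2 \Bq = 0 \) gives \( \Bx^\conj = -P^{-1} \Bq \). The Python sketch below uses arbitrarily chosen (assumed) values of \( P, \Bq, r \). It solves for \( \Bx^\conj \) and spot-checks \( F(\By) \ge F(\Bx^\conj) \) at random \( \By \).

```python
import random

# Illustrative data (assumed): P symmetric positive definite.
P = [[2.0, 0.5], [0.5, 1.0]]
q = [1.0, -2.0]
r = 3.0

def F(x):
    """F(x) = x^T P x + 2 q^T x + r."""
    quad = sum(x[i] * P[i][j] * x[j] for i in range(2) for j in range(2))
    return quad + 2 * (q[0] * x[0] + q[1] * x[1]) + r

def grad_F(x):
    """grad F(x) = 2 P x + 2 q."""
    return [2 * (P[i][0] * x[0] + P[i][1] * x[1]) + 2 * q[i] for i in range(2)]

# Solve grad F = 0, i.e. P x = -q, using the explicit 2x2 inverse of P.
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
x_star = [(-P[1][1] * q[0] + P[0][1] * q[1]) / det,
          (P[1][0] * q[0] - P[0][0] * q[1]) / det]
f_star = F(x_star)

# Spot-check global optimality: no sampled y should give a smaller value.
rng = random.Random(0)
samples = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(5000)]
is_global_min_on_samples = all(F(y) >= f_star - 1e-9 for y in samples)
print(x_star, grad_F(x_star), is_global_min_on_samples)
```

For this quadratic, \( F(\By) - F(\Bx^\conj) = (\By - \Bx^\conj)^\T P (\By - \Bx^\conj) \ge 0 \), which is why the sampled check succeeds.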

Local vs Global optimum

 

fig. 6. Global and local minimums

Definition: Local optimum
\( \Bx^\conj \) is a local optimum of \( F \) if \( \exists \epsilon > 0 \) such that for all \( \Bx \) with \( \Norm{\Bx - \Bx^\conj} < \epsilon \), we have

\begin{equation*}
F(\Bx^\conj) \le F(\Bx)
\end{equation*}

 

fig. 5. min length function

Theorem:
Suppose \( F \) is twice continuously differentiable (not necessarily convex)

  • If \( \Bx^\conj\) is a local optimum, then
    \begin{equation*}
    \begin{aligned}
    \spacegrad F(\Bx^\conj) &= 0 \\
    \spacegrad^2 F(\Bx^\conj) &\ge 0.
    \end{aligned}
    \end{equation*}
  • If
    \begin{equation*}
    \begin{aligned}
    \spacegrad F(\Bx^\conj) &= 0 \\
    \spacegrad^2 F(\Bx^\conj) &> 0,
    \end{aligned}
    \end{equation*}
    then \( \Bx^\conj\) is a (strict) local optimum.

Proof:

  • Let \( \Bx^\conj \) be a local optimum. Pick any \( \Bv \in \mathbb{R}^n \).
    \begin{equation}\label{eqn:convexOptimizationLecture7:720}
    \lim_{t \rightarrow 0^{+}} \frac{ F(\Bx^\conj + t \Bv) - F(\Bx^\conj)}{t}
    = \lr{ \spacegrad F(\Bx^\conj) }^\T \Bv
    \ge 0.
    \end{equation}

Here the fraction is \( \ge 0 \) for all sufficiently small \( t > 0 \), since \( \Bx^\conj \) is a local optimum.

Since the choice of \( \Bv \) is arbitrary, the only way to ensure that this is \( \ge 0 \) for all \( \Bv \) is

\begin{equation}\label{eqn:convexOptimizationLecture7:740}
\spacegrad F(\Bx^\conj) = 0,
\end{equation}

(otherwise we could pick \( \Bv = -\spacegrad F(\Bx^\conj) \), which gives \( -\Norm{ \spacegrad F(\Bx^\conj) }^2 < 0 \)).

This means that \( \spacegrad F(\Bx^\conj) = 0 \) if \( \Bx^\conj \) is a local optimum.

Now consider the second order term. Use a second order Taylor expansion and \( \spacegrad F(\Bx^\conj) = 0 \):

\begin{equation}\label{eqn:convexOptimizationLecture7:760}
\begin{aligned}
\lim_{t \rightarrow 0} \frac{ F(\Bx^\conj + t \Bv) - F(\Bx^\conj)}{t^2}
&=
\lim_{t \rightarrow 0} \inv{t^2}
\lr{
F(\Bx^\conj) + t \lr{ \spacegrad F(\Bx^\conj) }^\T \Bv + \inv{2} t^2 \Bv^\T \spacegrad^2 F(\Bx^\conj) \Bv + O(t^3)
- F(\Bx^\conj)
} \\
&=
\inv{2} \Bv^\T \spacegrad^2 F(\Bx^\conj) \Bv \\
&\ge 0.
\end{aligned}
\end{equation}

Here the \( \ge \) condition again comes from the fraction. The fraction is non-negative for small \( t \) by the local optimality of \( \Bx^\conj \). This is true for every choice of \( \Bv \), thus \( \spacegrad^2 F(\Bx^\conj) \ge 0 \).

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

ECE1505H Convex Optimization. Lecture 6: First and second order conditions. Taught by Prof. Stark Draper

February 1, 2017



Today

  • First and second order conditions for convexity of differentiable functions.
  • Consequences of convexity: local and global optimality.
  • Properties.

Quasi-convex

\( F_1 \) and \( F_2 \) convex implies \( \max( F_1, F_2) \) convex.

 

fig. 1. Min and Max

Note that \( \min(F_1, F_2) \) is not, in general, convex.

If \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) is convex, then for all \( \Bx_0, \Bv \in \mathbb{R}^n \), \( F( \Bx_0 + t \Bv ) \) is convex in \( t \) on the set of \( t \in \mathbb{R} \) for which \( \Bx_0 + t \Bv \in \textrm{dom} F \).

Idea: Restrict to a line (line segment) in \( \textrm{dom} F \). Take a cross section or slice through \( F \) along the line. If the result is a 1D convex function for all slices, then \( F \) is convex.

This is useful for checking convexity, and is also convenient numerically. Testing a given function along some random lines can disprove convexity, since a single non-convex slice suffices. However, to show that \( F \) is convex all possible slices must be checked. This isn't possible numerically, but is in some circumstances possible analytically.
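The random-line idea can be sketched in Python as follows. Along a convex slice \( h(t) = F(\Bx_0 + t \Bv) \) we must have \( h(0) \le (h(-1) + h(1))/2 \), so any sampled line that violates this is a witness of non-convexity. The function names, sampling ranges, seed and tolerance are all illustrative assumptions.

```python
import random

def find_nonconvex_slice(F, dim, trials=1000, seed=0, tol=1e-9):
    """Search random lines x0 + t v for a violation of midpoint convexity.

    Returns the first (x0, v) with h(0) > (h(-1) + h(1)) / 2, or None.
    None does NOT prove convexity; it only means no witness was found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x0 = [rng.uniform(-5, 5) for _ in range(dim)]
        v = [rng.gauss(0, 1) for _ in range(dim)]
        minus = [a - b for a, b in zip(x0, v)]
        plus = [a + b for a, b in zip(x0, v)]
        if F(x0) > (F(minus) + F(plus)) / 2 + tol:
            return x0, v
    return None

def saddle(p):
    """x^2 + y^2 + 3xy: the indefinite quadratic example from Lecture 7."""
    return p[0] ** 2 + p[1] ** 2 + 3 * p[0] * p[1]

def bowl(p):
    """x^2 + y^2, which is convex."""
    return p[0] ** 2 + p[1] ** 2

print(find_nonconvex_slice(saddle, 2) is not None)  # True
print(find_nonconvex_slice(bowl, 2))                 # None
```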

Differentiable (convex) functions

Definition: First order condition.

If

\begin{equation*}
F : \mathbb{R}^n \rightarrow \mathbb{R}
\end{equation*}

is differentiable, then \( F \) is convex iff \( \textrm{dom} F \) is a convex set and \( \forall \Bx, \Bx_0 \in \textrm{dom} F \)

\begin{equation*}
F(\Bx) \ge F(\Bx_0) + \lr{\spacegrad F(\Bx_0)}^\T (\Bx - \Bx_0).
\end{equation*}

The right hand side is the first order Taylor expansion of \( F \) about \( \Bx_0 \). If \( n = 1 \), this is \( F(x) \ge F(x_0) + F'(x_0) ( x - x_0) \).

The first order condition says a convex function \underline{always} lies above its first order approximation, as sketched in fig. 2.

 

fig. 2. First order approximation lies below convex function

When \( F \) is differentiable, the supporting plane is the tangent plane.
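The first order condition can be probed on a grid. The Python sketch below (the grid and helper names are illustrative) computes the gap \( F(x) - [F(x_0) + F'(x_0)(x - x_0)] \) over all grid pairs. The gap stays non-negative for the convex \( e^x \), but becomes negative for \( x^3 \), which is not convex on \( \mathbb{R} \).

```python
import math

def first_order_gap(F, dF, x, x0):
    """F(x) - [F(x0) + F'(x0)(x - x0)]; non-negative for all x, x0 when F is convex."""
    return F(x) - (F(x0) + dF(x0) * (x - x0))

grid = [-3 + 0.05 * k for k in range(121)]   # sample points in [-3, 3]

def cube(t):
    return t ** 3

def cube_prime(t):
    return 3 * t ** 2

exp_gap = min(first_order_gap(math.exp, math.exp, x, x0) for x in grid for x0 in grid)
cube_gap = min(first_order_gap(cube, cube_prime, x, x0) for x in grid for x0 in grid)
print(exp_gap >= -1e-12, cube_gap < 0)   # True True
```

For example, the tangent line to \( x^3 \) at \( x_0 = -1 \) lies above the curve at \( x = 1 \): the gap there is \( 1 - 5 = -4 \).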

Definition: Second order condition

If \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) is twice differentiable, then \( F \) is convex iff \( \textrm{dom} F \) is a convex set and \( \spacegrad^2 F(\Bx) \ge 0 \,\forall \Bx \in \textrm{dom} F\).

When \( F \) is twice continuously differentiable, the Hessian is symmetric, but it is not necessarily positive semi-definite. Recall that the Hessian is the matrix of the second order partials \( (\spacegrad^2 F)_{ij} = \partial^2 F/(\partial x_i \partial x_j) \).

The scalar case is \( F''(x) \ge 0 \, \forall x \in \textrm{dom} F \).

To prove the first order condition in the scalar case, first show that if \( F \) is convex, then \( F(x) \ge F(x_0) + F'(x_0) (x - x_0) \,\forall x, x_0 \in \textrm{dom} F\).

Since \( F \) is convex, \( \textrm{dom} F \) is convex.

Consider any 2 points \( x, y \in \textrm{dom} F \), and \( \theta \in (0,1] \). Define

\begin{equation}\label{eqn:convexOptimizationLecture6:60}
z = (1-\theta) x + \theta y \in \textrm{dom} F,
\end{equation}

which is in the domain since \( \textrm{dom} F \) is convex. Since \( F \) is convex,

\begin{equation}\label{eqn:convexOptimizationLecture6:80}
F(z) =
F( (1-\theta) x + \theta y )
\le
(1-\theta) F(x) + \theta F(y).
\end{equation}

Reordering

\begin{equation}\label{eqn:convexOptimizationLecture6:220}
\theta F(y) \ge
\theta F(x) + F(z) - F(x),
\end{equation}

or, dividing by \( \theta \) and using \( z = x + \theta(y - x) \),
\begin{equation}\label{eqn:convexOptimizationLecture6:100}
F(y) \ge
F(x) + \frac{F(x + \theta(y-x)) - F(x)}{\theta},
\end{equation}

which is, in the limit \( \theta \rightarrow 0^{+} \),

\begin{equation}\label{eqn:convexOptimizationLecture6:120}
F(y) \ge
F(x) + F'(x) (y - x),
\end{equation}

completing one direction of the proof.

To prove the other direction, suppose that \( \textrm{dom} F \) is convex and that

\begin{equation}\label{eqn:convexOptimizationLecture6:140}
F(x) \ge F(x_0) + F'(x_0) (x - x_0),
\end{equation}

for all \( x, x_0 \in \textrm{dom} F \). We show that \( F \) is convex. Take any \( x, y \in \textrm{dom} F \) and any \( \theta \in [0,1] \). Define

\begin{equation}\label{eqn:convexOptimizationLecture6:160}
z = \theta x + (1 -\theta) y,
\end{equation}

which is in \( \textrm{dom} F \) by assumption. We want to show that

\begin{equation}\label{eqn:convexOptimizationLecture6:180}
F(z) \le \theta F(x) + (1-\theta) F(y).
\end{equation}

By assumption

  1. \( F(x) \ge F(z) + F'(z) (x - z) \)
  2. \( F(y) \ge F(z) + F'(z) (y - z) \)

Compute

\begin{equation}\label{eqn:convexOptimizationLecture6:200}
\begin{aligned}
\theta F(x) + (1-\theta) F(y)
&\ge
\theta \lr{ F(z) + F'(z) (x - z) }
+ (1-\theta) \lr{ F(z) + F'(z) (y - z) } \\
&=
F(z) + F'(z) \lr{ \theta( x - z) + (1-\theta) (y-z) } \\
&=
F(z) + F'(z) \lr{ \theta x + (1-\theta) y - \theta z - (1 -\theta) z } \\
&=
F(z) + F'(z) \lr{ \theta x + (1-\theta) y - z} \\
&=
F(z) + F'(z) \lr{ z - z} \\
&= F(z).
\end{aligned}
\end{equation}

Proof of the 2nd order case for \( n = 1 \)

Want to prove that if

\begin{equation}\label{eqn:convexOptimizationLecture6:240}
F : \mathbb{R} \rightarrow \mathbb{R}
\end{equation}

is a convex function, then \( F''(x) \ge 0 \,\forall x \in \textrm{dom} F \).

By the first order conditions \( \forall x \ne y \in \textrm{dom} F \)

\begin{equation}\label{eqn:convexOptimizationLecture6:260}
\begin{aligned}
F(y) &\ge F(x) + F'(x) (y - x) \\
F(x) &\ge F(y) + F'(y) (x - y)
\end{aligned}
\end{equation}

Combining these gives

\begin{equation}\label{eqn:convexOptimizationLecture6:280}
F'(x) (y-x) \le F(y) - F(x) \le F'(y)(y-x).
\end{equation}

Subtracting the outer terms, and dividing by \( (y - x)^2 > 0 \), gives

\begin{equation}\label{eqn:convexOptimizationLecture6:340}
\frac{(F'(y) - F'(x))(y - x)}{(y - x)^2} \ge 0,
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture6:300}
\frac{F'(y) - F'(x)}{y - x} \ge 0.
\end{equation}

In the limit as \( y \rightarrow x \), this is
\begin{equation}\label{eqn:convexOptimizationLecture6:320}
\boxed{
F''(x) \ge 0 \,\forall x \in \textrm{dom} F.
}
\end{equation}

Now prove the reverse condition:

We now show that if \( F''(x) \ge 0 \,\forall x \in \textrm{dom} F \subseteq \mathbb{R} \), with \( \textrm{dom} F \) an interval, then \( F : \mathbb{R} \rightarrow \mathbb{R} \) is convex.

Note that if \( F''(x) \ge 0 \), then \( F'(x) \) is non-decreasing in \( x \).

That is, if \( x < y \), where \( x, y \in \textrm{dom} F\), then

\begin{equation}\label{eqn:convexOptimizationLecture6:360}
F'(x) \le F'(y).
\end{equation}

Consider any \( x,y \in \textrm{dom} F\) such that \( x < y \). Then

\begin{equation}\label{eqn:convexOptimizationLecture6:380}
F(y) - F(x) = \int_x^y F'(t) dt \ge F'(x) \int_x^y 1 dt = F'(x) (y-x).
\end{equation}

This tells us that

\begin{equation}\label{eqn:convexOptimizationLecture6:400}
F(y) \ge F(x) + F'(x)(y - x),
\end{equation}

which is the first order condition. Similarly, for the same \( x < y \),

\begin{equation}\label{eqn:convexOptimizationLecture6:420}
F(y) - F(x) = \int_x^y F'(t) dt \le F'(y) \int_x^y 1 dt = F'(y) (y-x).
\end{equation}

This tells us that

\begin{equation}\label{eqn:convexOptimizationLecture6:440}
F(x) \ge F(y) + F'(y)(x - y).
\end{equation}

The first order condition therefore holds for every pair of points in \( \textrm{dom} F \), in either order, so \( F \) is convex.

Vector proof:

\( F \) is convex iff \( F(\Bx + t \Bv) \) is convex in \( t \) \( \forall \Bx,\Bv \in \mathbb{R}^n \), for those \( t \in \mathbb{R} \) that keep \( \Bx + t \Bv \in \textrm{dom} F\).

Let
\begin{equation}\label{eqn:convexOptimizationLecture6:460}
h(t ; \Bx, \Bv) = F(\Bx + t \Bv)
\end{equation}

then \( h(t) \) satisfies scalar first and second order conditions for all \( \Bx, \Bv \).

\begin{equation}\label{eqn:convexOptimizationLecture6:480}
h(t) = F(\Bx + t \Bv) = F(g(t)),
\end{equation}

where \( g(t) = \Bx + t \Bv \), and

\begin{equation}\label{eqn:convexOptimizationLecture6:500}
\begin{aligned}
F &: \mathbb{R}^n \rightarrow \mathbb{R} \\
g &: \mathbb{R} \rightarrow \mathbb{R}^n.
\end{aligned}
\end{equation}

This is expressing \( h(t) \) as a composition of two functions. By the first order condition for scalar functions we know that

\begin{equation}\label{eqn:convexOptimizationLecture6:520}
h(t) \ge h(0) + h'(0) t.
\end{equation}

Note that

\begin{equation}\label{eqn:convexOptimizationLecture6:540}
h(0) = \evalbar{F(\Bx + t \Bv)}{t = 0} = F(\Bx).
\end{equation}

Let’s figure out what \( h'(0) \) is. Recall that for any \( \tilde{F} : \mathbb{R}^n \rightarrow \mathbb{R}^m \)

\begin{equation}\label{eqn:convexOptimizationLecture6:560}
D \tilde{F} \in \mathbb{R}^{m \times n},
\end{equation}

and
\begin{equation}\label{eqn:convexOptimizationLecture6:580}
\lr{ D \tilde{F}(\Bx) }_{ij} = \PD{x_j}{\tilde{F}_i(\Bx)}.
\end{equation}

Each row holds the partials of one component function \( \tilde{F}_i \), for \( i \in [1,m], j \in [1,n] \). Applying the chain rule gives

\begin{equation}\label{eqn:convexOptimizationLecture6:600}
\begin{aligned}
\frac{d}{dt} F(\Bx + \Bv t)
&=
\frac{d}{dt} F( g(t) ) \\
&=
\frac{d}{dt} h(t) \\
&= D h(t) \\
&= D F(g(t)) \cdot D g(t)
\end{aligned}
\end{equation}

The first matrix is in \( \mathbb{R}^{1\times n} \) whereas the second is in \( \mathbb{R}^{n\times 1} \), since \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) and \( g : \mathbb{R} \rightarrow \mathbb{R}^n \). This gives

\begin{equation}\label{eqn:convexOptimizationLecture6:620}
\frac{d}{dt} F(\Bx + \Bv t)
= \evalbar{D F(\tilde{\Bx})}{\tilde{\Bx} = g(t)} \cdot D g(t).
\end{equation}

That first matrix is

\begin{equation}\label{eqn:convexOptimizationLecture6:640}
\begin{aligned}
\evalbar{D F(\tilde{\Bx})}{\tilde{\Bx} = g(t)}
&=
\evalbar{
\lr{\begin{bmatrix}
\PD{\tilde{x}_1}{ F(\tilde{\Bx})} &
\PD{\tilde{x}_2}{ F(\tilde{\Bx})} & \cdots &
\PD{\tilde{x}_n}{ F(\tilde{\Bx})}
\end{bmatrix}
}}{ \tilde{\Bx} = g(t) = \Bx + t \Bv } \\
&=
\evalbar{
\lr{ \spacegrad F(\tilde{\Bx}) }^\T
}{
\tilde{\Bx} = g(t)
} \\
&=
\lr{ \spacegrad F(g(t)) }^\T.
\end{aligned}
\end{equation}

The second Jacobian is

\begin{equation}\label{eqn:convexOptimizationLecture6:660}
D g(t)
=
D
\begin{bmatrix}
g_1(t) \\
g_2(t) \\
\vdots \\
g_n(t) \\
\end{bmatrix}
=
D
\begin{bmatrix}
x_1 + t v_1 \\
x_2 + t v_2 \\
\vdots \\
x_n + t v_n \\
\end{bmatrix}
=
\begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n \\
\end{bmatrix}
=
\Bv.
\end{equation}

so

\begin{equation}\label{eqn:convexOptimizationLecture6:680}
h'(t) = D h(t) = \lr{ \spacegrad F(g(t))}^\T \Bv,
\end{equation}

and
\begin{equation}\label{eqn:convexOptimizationLecture6:700}
h'(0) = \lr{ \spacegrad F(g(0))}^\T \Bv
=
\lr{ \spacegrad F(\Bx)}^\T \Bv.
\end{equation}

Finally

\begin{equation}\label{eqn:convexOptimizationLecture6:720}
\begin{aligned}
F(\Bx + t \Bv)
&\ge h(0) + h'(0) t \\
&= F(\Bx) + \lr{ \spacegrad F(\Bx) }^\T (t \Bv) \\
&= F(\Bx) + \innerprod{ \spacegrad F(\Bx) }{ t \Bv}.
\end{aligned}
\end{equation}

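This holds for all \( \Bx, \Bx + t \Bv \in \textrm{dom} F \). Note that the quantity \( t \Bv \) is a shift. Writing \( \By = \Bx + t \Bv \), this is the vector first order condition \( F(\By) \ge F(\Bx) + \lr{ \spacegrad F(\Bx) }^\T (\By - \Bx) \).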
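The key identity in this proof is \( h'(0) = \lr{ \spacegrad F(\Bx) }^\T \Bv \). It can be checked with a central finite difference. The function \( F \), the point \( \Bx \), the direction \( \Bv \) and the step size in this Python sketch are illustrative assumptions. The check only concerns the derivative identity and does not depend on convexity.

```python
import math

def F(x):
    """Illustrative function F(x1, x2) = x1^2 + x1 x2 + exp(x2)."""
    return x[0] ** 2 + x[0] * x[1] + math.exp(x[1])

def grad_F(x):
    """Analytic gradient of the illustrative F."""
    return [2 * x[0] + x[1], x[0] + math.exp(x[1])]

def h_prime_at_zero(F, x, v, step=1e-6):
    """Central difference estimate of h'(0), where h(t) = F(x + t v)."""
    plus = [a + step * b for a, b in zip(x, v)]
    minus = [a - step * b for a, b in zip(x, v)]
    return (F(plus) - F(minus)) / (2 * step)

x = [0.3, -0.7]
v = [1.5, 2.0]
numeric = h_prime_at_zero(F, x, v)
analytic = sum(g * vi for g, vi in zip(grad_F(x), v))   # (grad F(x))^T v
print(numeric, analytic)   # agree to several digits
```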

Epigraph

Recall that if \( (\Bx, t) \in \textrm{epi} F \) then \( t \ge F(\Bx) \).

\begin{equation}\label{eqn:convexOptimizationLecture6:740}
t \ge F(\Bx) \ge F(\Bx_0) + \lr{\spacegrad F(\Bx_0) }^\T (\Bx - \Bx_0),
\end{equation}

or

\begin{equation}\label{eqn:convexOptimizationLecture6:760}
0 \ge
-(t - F(\Bx_0)) + \lr{\spacegrad F(\Bx_0) }^\T (\Bx - \Bx_0),
\end{equation}

In block matrix form

\begin{equation}\label{eqn:convexOptimizationLecture6:780}
0 \ge
\begin{bmatrix}
\lr{ \spacegrad F(\Bx_0) }^\T & -1
\end{bmatrix}
\begin{bmatrix}
\Bx - \Bx_0 \\
t - F(\Bx_0)
\end{bmatrix}
\end{equation}

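With \( \Bw =
\begin{bmatrix}
\spacegrad F(\Bx_0) \\ -1
\end{bmatrix} \), this says that every point \( (\Bx, t) \) of the epigraph satisfies \( \Bw^\T \lr{ (\Bx, t) - (\Bx_0, F(\Bx_0)) } \le 0 \). The epigraph therefore lies in a half space whose boundary hyperplane, with normal \( \Bw \), supports \( \textrm{epi} F \) at \( (\Bx_0, F(\Bx_0)) \). This geometry is sketched in fig. 3.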

 

fig. 3. Half planes and epigraph.


ECE1505H Convex Optimization. Lecture 3: Matrix functions, SVD, and types of Sets. Taught by Prof. Stark Draper

January 19, 2017



Matrix inner product

Given real matrices \( X, Y \in \mathbb{R}^{m\times n} \), one possible matrix inner product definition is

\begin{equation}\label{eqn:convexOptimizationLecture3:20}
\begin{aligned}
\innerprod{X}{Y}
&= \textrm{Tr}( X^\T Y) \\
&= \sum_{j = 1}^n \lr{ X^\T Y }_{jj} \\
&= \sum_{k = 1}^m \sum_{j = 1}^n X_{kj} Y_{kj} \\
&= \sum_{i = 1}^m \sum_{j = 1}^n X_{ij} Y_{ij}.
\end{aligned}
\end{equation}

This inner product induces a norm on the (matrix) vector space, called the Frobenius norm

\begin{equation}\label{eqn:convexOptimizationLecture3:40}
\begin{aligned}
\Norm{X }_F
&= \sqrt{ \innerprod{X}{X} } \\
&= \sqrt{ \textrm{Tr}( X^\T X) } \\
&=
\sqrt{ \sum_{i = 1}^m \sum_{j = 1}^n X_{ij}^2 }.
\end{aligned}
\end{equation}
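A small Python sketch can confirm that the trace form and the elementwise sum define the same inner product. It also computes the induced Frobenius norm. The matrices are arbitrary illustrative values, and plain lists are used so that no external library is assumed.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def frobenius_inner(X, Y):
    """<X, Y> = sum_ij X_ij Y_ij."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

X = [[1, 2, 3], [4, 5, 6]]
Y = [[0, -1, 2], [1, 1, -2]]
print(trace(matmul(transpose(X), Y)), frobenius_inner(X, Y))  # 1 1
print(math.sqrt(frobenius_inner(X, X)))                        # ||X||_F = sqrt(91)
```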

Range, nullspace.

Definition: Range: Given \( A \in \mathbb{R}^{m \times n} \), the range of A is the set:

\begin{equation*}
\mathcal{R}(A) = \setlr{ A \Bx | \Bx \in \mathbb{R}^n }.
\end{equation*}

Definition: Nullspace: Given \( A \in \mathbb{R}^{m \times n} \), the nullspace of A is the set:

\begin{equation*}
\mathcal{N}(A) = \setlr{ \Bx | A \Bx = 0 }.
\end{equation*}

SVD.

To understand the operation of \( A \in \mathbb{R}^{m \times n} \), a representation of a linear transformation from \R{n} to \R{m}, decompose \( A \) using the singular value decomposition (SVD).

Definition: SVD: Given \( A \in \mathbb{R}^{m \times n} \), an operator on \( \Bx \in \mathbb{R}^n \), a decomposition of the following form is always possible

\begin{equation*}
\begin{aligned}
A &= U \Sigma V^\T \\
U &\in \mathbb{R}^{m \times r} \\
V &\in \mathbb{R}^{n \times r},
\end{aligned}
\end{equation*}

where \( r \) is the rank of \(A\), and both \( U \) and \( V \) have orthonormal columns

\begin{equation*}
\begin{aligned}
U^\T U &= I \in \mathbb{R}^{r \times r} \\
V^\T V &= I \in \mathbb{R}^{r \times r}.
\end{aligned}
\end{equation*}

Here \( \Sigma = \textrm{diag}( \sigma_1, \sigma_2, \cdots, \sigma_r ) \), is a diagonal matrix of “singular” values, where

\begin{equation*}
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0.
\end{equation*}

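For simplicity consider the square, full rank case \( m = n = r \). Here \( U \) and \( V \) are square orthogonal matrices, so that \( U U^\T = V V^\T = I \) as well.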

\begin{equation}\label{eqn:convexOptimizationLecture3:100}
A \Bx = \lr{ U \Sigma V^\T } \Bx.
\end{equation}

The first product \( V^\T \Bx \) is a rotation (possibly combined with a reflection), which can be checked by looking at the length

\begin{equation}\label{eqn:convexOptimizationLecture3:120}
\begin{aligned}
\Norm{ V^\T \Bx}_2
&= \sqrt{ \Bx^\T V V^\T \Bx } \\
&= \sqrt{ \Bx^\T \Bx } \\
&= \Norm{ \Bx }_2,
\end{aligned}
\end{equation}

which shows that the length of the vector is unchanged by the linear transformation represented by \( V^\T \). That operation is therefore a rotation (possibly combined with a reflection).

Similarly, the operation of \( U \) on \( \Sigma V^\T \Bx \) is also a rotation. The diagonal matrix \( \Sigma = \textrm{diag}( \sigma_1, \cdots, \sigma_r ) \) scales each component of the vector \( V^\T \Bx \) by the corresponding singular value.

All linear (square) transformations can therefore be thought of as a rotate-scale-rotate operation. Often the \( A \) of interest will be symmetric \( A = A^\T \).

Set of symmetric matrices

Let \( S^n \) be the set of real, symmetric \( n \times n \) matrices.

Theorem: Spectral theorem: When \( A \in S^n \) then it is possible to factor \( A \) as

\begin{equation*}
A = Q \Lambda Q^\T,
\end{equation*}

where \( Q \) is an orthogonal matrix, and \( \Lambda = \textrm{diag}( \lambda_1, \lambda_2, \cdots \lambda_n)\). Here \( \lambda_i \in \mathbb{R} \, \forall i \) are the (real) eigenvalues of \( A \).

A real symmetric matrix \( A \in S^n\) is “positive semi-definite” if

\begin{equation*}
\Bv^\T A \Bv \ge 0 \qquad\forall \Bv \in \mathbb{R}^n, \Bv \ne 0,
\end{equation*}
and is “positive definite” if

\begin{equation*}
\Bv^\T A \Bv > 0 \qquad\forall \Bv \in \mathbb{R}^n, \Bv \ne 0.
\end{equation*}

The set of such matrices is denoted \( S^n_{+} \), and \( S^n_{++} \) respectively.

Consider \( A \in S^n_{+} \) (or \( S^n_{++} \) )

\begin{equation}\label{eqn:convexOptimizationLecture3:200}
A = Q \Lambda Q^\T,
\end{equation}

possible since the matrix is symmetric. For such a matrix

\begin{equation}\label{eqn:convexOptimizationLecture3:220}
\begin{aligned}
\Bv^\T A \Bv
&=
\Bv^\T Q \Lambda Q^\T \Bv \\
&=
\Bw^\T \Lambda \Bw,
\end{aligned}
\end{equation}

where \( \Bw = Q^\T \Bv \). Such a product is

\begin{equation}\label{eqn:convexOptimizationLecture3:240}
\Bv^\T A \Bv
=
\sum_{i = 1}^n \lambda_i w_i^2.
\end{equation}

So, if \( \lambda_i \ge 0 \) (\(\lambda_i > 0 \)) then \( \sum_{i = 1}^n \lambda_i w_i^2 \) is non-negative (positive) \( \forall \Bw \in \mathbb{R}^n, \Bw \ne 0 \). Since \( \Bw \) is just a rotated version of \( \Bv \), this also holds for all \( \Bv \). Conversely, choosing \( \Bv \) to be the unit eigenvector for \( \lambda_i \) gives \( \Bv^\T A \Bv = \lambda_i \). A necessary and sufficient condition for \( A \in S^n_{+} \) (\( S^n_{++} \)) is therefore \( \lambda_i \ge 0 \) (\(\lambda_i > 0\)) for all \( i \).

Square root of positive semi-definite matrix

Real symmetric matrix power relationships such as

\begin{equation}\label{eqn:convexOptimizationLecture3:260}
A^2
=
Q \Lambda Q^\T
Q \Lambda Q^\T
=
Q \Lambda^2
Q^\T
,
\end{equation}

or more generally \( A^k = Q \Lambda^k Q^\T,\, k \in \mathbb{Z} \), can be further generalized to non-integral powers. In particular, a (non-unique) square root of a positive semi-definite matrix can be written

\begin{equation}\label{eqn:convexOptimizationLecture3:280}
A^{1/2} = Q
\begin{bmatrix}
\sqrt{\lambda_1} & & & \\
& \sqrt{\lambda_2} & & \\
& & \ddots & \\
& & & \sqrt{\lambda_n} \\
\end{bmatrix}
Q^\T,
\end{equation}

since \( A^{1/2} A^{1/2} = A \), regardless of the sign picked for the square roots in question.
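For a \( 2 \times 2 \) symmetric matrix, the factorization \( A = Q \Lambda Q^\T \) can be written down explicitly. \( Q \) is a rotation by the angle \( \theta \) with \( \tan 2\theta = 2b/(a - d) \). The Python sketch below uses that closed form (an illustrative choice, not the lecture's method) to build \( A^{1/2} = Q \Lambda^{1/2} Q^\T \). It also shows that flipping the sign of a square root still gives \( A^{1/2} A^{1/2} = A \).

```python
import math

def sym2x2_eig(a, b, d):
    """Eigen-decomposition of [[a, b], [b, d]]: (lam1, lam2, Q) with A = Q diag(lam1, lam2) Q^T."""
    theta = 0.5 * math.atan2(2 * b, a - d)      # rotation angle that diagonalizes A
    c, s = math.cos(theta), math.sin(theta)
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius, [[c, -s], [s, c]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def psd_sqrt(a, b, d, signs=(1, 1)):
    """A^{1/2} = Q diag(+/-sqrt(lam1), +/-sqrt(lam2)) Q^T for a PSD 2x2 matrix A."""
    lam1, lam2, Q = sym2x2_eig(a, b, d)
    # max(., 0.0) guards against tiny negative round-off in a zero eigenvalue.
    D = [[signs[0] * math.sqrt(max(lam1, 0.0)), 0.0],
         [0.0, signs[1] * math.sqrt(max(lam2, 0.0))]]
    Qt = [[Q[0][0], Q[1][0]], [Q[0][1], Q[1][1]]]
    return matmul2(matmul2(Q, D), Qt)

A = [[2.0, 1.0], [1.0, 2.0]]
for signs in [(1, 1), (1, -1)]:
    R = psd_sqrt(2.0, 1.0, 2.0, signs)
    print(signs, matmul2(R, R))   # both equal A up to round-off
```

Choosing all signs positive gives the unique positive semi-definite square root. The other sign choices give the non-unique alternatives mentioned above.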

Functions of matrices

Consider \( F : S^n \rightarrow \mathbb{R} \), and define

\begin{equation}\label{eqn:convexOptimizationLecture3:300}
F(X) = \log \det X,
\end{equation}

Here \( \textrm{dom} F = S^n_{++} \). The task is to find \( \spacegrad F \), which can be done by looking at the perturbation \( \log \det ( X + \Delta X ) \)

\begin{equation}\label{eqn:convexOptimizationLecture3:320}
\begin{aligned}
\log \det ( X + \Delta X )
&=
\log \det ( X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2}) X^{1/2} ) \\
&=
\log \det ( X (I + X^{-1/2} \Delta X X^{-1/2}) ) \\
&=
\log \det X + \log \det (I + X^{-1/2} \Delta X X^{-1/2}).
\end{aligned}
\end{equation}

Let \( M = X^{-1/2} \Delta X X^{-1/2} \), and let \( \lambda_i \) be the eigenvalues of \( M \), so that \( M \Bv = \lambda_i \Bv \) when \( \Bv \) is a corresponding eigenvector of \( M \). In particular

\begin{equation}\label{eqn:convexOptimizationLecture3:340}
(I + M) \Bv =
(1 + \lambda_i) \Bv,
\end{equation}

where \( 1 + \lambda_i \) are the eigenvalues of the \( I + M \) matrix. Since the determinant is the product of the eigenvalues, this gives

\begin{equation}\label{eqn:convexOptimizationLecture3:360}
\begin{aligned}
\log \det ( X + \Delta X )
&=
\log \det X +
\log \prod_{i = 1}^n (1 + \lambda_i) \\
&=
\log \det X +
\sum_{i = 1}^n \log (1 + \lambda_i).
\end{aligned}
\end{equation}

If \( \lambda_i \) are sufficiently “small”, then \( \log ( 1 + \lambda_i ) \approx \lambda_i \), giving

\begin{equation}\label{eqn:convexOptimizationLecture3:380}
\log \det ( X + \Delta X )
\approx
\log \det X +
\sum_{i = 1}^n \lambda_i
=
\log \det X +
\textrm{Tr}( X^{-1/2} \Delta X X^{-1/2} ),
\end{equation}

where the last step uses the fact that the sum of the eigenvalues of \( M \) is its trace.

Since
\begin{equation}\label{eqn:convexOptimizationLecture3:400}
\textrm{Tr}( A B ) = \textrm{Tr}( B A ),
\end{equation}

this trace operation can be written as

\begin{equation}\label{eqn:convexOptimizationLecture3:420}
\log \det ( X + \Delta X )
\approx
\log \det X +
\textrm{Tr}( X^{-1} \Delta X )
=
\log \det X +
\innerprod{ X^{-1}}{\Delta X},
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:440}
\spacegrad F(X) = X^{-1}.
\end{equation}

To check this, consider the simplest example with \( X \in \mathbb{R}^{1 \times 1} \), where we have

\begin{equation}\label{eqn:convexOptimizationLecture3:460}
\frac{d}{dX} \lr{ \log \det X } = \frac{d}{dX} \lr{ \log X } = \inv{X} = X^{-1}.
\end{equation}

This is a nice example demonstrating how the gradient can be obtained by performing a first order perturbation of the function. The gradient can then be read off from the result.
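The result \( \spacegrad F(X) = X^{-1} \) can also be spot-checked numerically for \( n > 1 \). A minimal sketch, assuming NumPy, a random point in \( S^n_{++} \) and a random symmetric direction \( D \): the central-difference directional derivative of \( \log \det \) along \( D \) should match \( \innerprod{X^{-1}}{D} = \textrm{Tr}(X^{-1} D) \).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)              # a point in S^n_{++}

def F(X):
    """log det X, computed stably via slogdet."""
    sign, logabsdet = np.linalg.slogdet(X)
    return logabsdet

D = rng.standard_normal((n, n))
D = (D + D.T) / 2                        # a symmetric perturbation direction
h = 1e-6
fd = (F(X + h * D) - F(X - h * D)) / (2 * h)    # central-difference directional derivative
predicted = np.trace(np.linalg.inv(X) @ D)      # <X^{-1}, D> = Tr(X^{-1} D)
print(fd, predicted)                            # should agree to many digits
```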

Second order perturbations

  • To get the first order approximation, we found the part that varied linearly in \( \Delta X \).
  • To get the second order part, perturb \( X^{-1} \) by \( \Delta X \) and see how that perturbation varies in \( \Delta X \).

For \( G(X) = X^{-1} \), this is

\begin{equation}\label{eqn:convexOptimizationLecture3:480}
\begin{aligned}
(X + \Delta X)^{-1}
&=
\lr{ X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2} ) X^{1/2} }^{-1} \\
&=
X^{-1/2} (I + X^{-1/2} \Delta X X^{-1/2} )^{-1} X^{-1/2}
\end{aligned}
\end{equation}

To be proven in the homework: for “small” \( A \)

\begin{equation}\label{eqn:convexOptimizationLecture3:500}
(I + A)^{-1} \approx I - A.
\end{equation}

This gives

\begin{equation}\label{eqn:convexOptimizationLecture3:520}
\begin{aligned}
(X + \Delta X)^{-1}
&\approx
X^{-1/2} (I - X^{-1/2} \Delta X X^{-1/2} ) X^{-1/2} \\
&=
X^{-1} - X^{-1} \Delta X X^{-1},
\end{aligned}
\end{equation}

or

\begin{equation}\label{eqn:convexOptimizationLecture3:800}
\begin{aligned}
G(X + \Delta X)
&\approx G(X) + (D G) \Delta X \\
&= G(X) + (\spacegrad G)^\T \Delta X,
\end{aligned}
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:820}
(\spacegrad G)^\T \Delta X
=
- X^{-1} \Delta X X^{-1}.
\end{equation}
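The first order inverse perturbation \( (X + \Delta X)^{-1} \approx X^{-1} - X^{-1} \Delta X X^{-1} \) can be sanity checked by shrinking \( \Delta X \). A minimal sketch, assuming NumPy and a random symmetric direction \( D \) with \( \Delta X = t D \): since the neglected term is second order in \( \Delta X \), reducing \( t \) by a factor of 10 should reduce the error by roughly a factor of 100.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)
Xinv = np.linalg.inv(X)
D = rng.standard_normal((n, n))
D = (D + D.T) / 2

errs = []
for t in [1e-2, 1e-3]:
    dX = t * D
    exact = np.linalg.inv(X + dX)
    first_order = Xinv - Xinv @ dX @ Xinv          # G(X) + (DG) dX
    errs.append(np.linalg.norm(exact - first_order))
print(errs[0] / errs[1])   # close to 100: the neglected term is second order in dX
```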

The Taylor expansion of \( F \) to second order is

\begin{equation}\label{eqn:convexOptimizationLecture3:840}
F(X + \Delta X)
=
F(X)
+
\textrm{Tr} \lr{ (\spacegrad F)^\T \Delta X}
+
\inv{2}
\lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X}.
\end{equation}

The first trace can be expressed as an inner product

\begin{equation}\label{eqn:convexOptimizationLecture3:860}
\begin{aligned}
\textrm{Tr} \lr{ (\spacegrad F)^\T \Delta X}
&=
\innerprod{ \spacegrad F }{\Delta X} \\
&=
\innerprod{ X^{-1} }{\Delta X}.
\end{aligned}
\end{equation}

The second order term also has the structure of an inner product

\begin{equation}\label{eqn:convexOptimizationLecture3:880}
\begin{aligned}
(\Delta X)^\T (\spacegrad^2 F) \Delta X
&=
\textrm{Tr} \lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X} \\
&=
\innerprod{ (\spacegrad^2 F)^\T \Delta X }{\Delta X},
\end{aligned}
\end{equation}

where a no-op trace could be inserted in the second order term since that quadratic form is already a scalar. This \( (\spacegrad^2 F)^\T \Delta X \) term has essentially been found implicitly by performing the linear variation of \( \spacegrad F \) in \( \Delta X \), showing that we must have

\begin{equation}\label{eqn:convexOptimizationLecture3:900}
\textrm{Tr} \lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X}
=
\innerprod{ - X^{-1} \Delta X X^{-1} }{\Delta X},
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:560}
F( X + \Delta X) \approx F(X) +
\innerprod{X^{-1}}{\Delta X}
+\inv{2} \innerprod{-X^{-1} \Delta X X^{-1}}{\Delta X},
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture3:580}
\log \det ( X + \Delta X) \approx \log \det X +
\textrm{Tr}( X^{-1} \Delta X )
- \inv{2} \textrm{Tr}( X^{-1} \Delta X X^{-1} \Delta X ).
\end{equation}
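The second order expansion of \( \log \det \) can be compared against the exact value. A minimal sketch, assuming NumPy and \( \Delta X = t D \) for a random symmetric \( D \): the first order error should shrink like \( t^2 \), and the error after including the \( -\inv{2} \textrm{Tr}( X^{-1} \Delta X X^{-1} \Delta X ) \) term should be much smaller still.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)
Xinv = np.linalg.inv(X)
D = rng.standard_normal((n, n))
D = (D + D.T) / 2

def logdet(M):
    return np.linalg.slogdet(M)[1]

def errors(t):
    """Errors of the first and second order expansions of log det(X + t D)."""
    dX = t * D
    exact = logdet(X + dX)
    first = logdet(X) + np.trace(Xinv @ dX)
    second = first - 0.5 * np.trace(Xinv @ dX @ Xinv @ dX)
    return abs(exact - first), abs(exact - second)

for t in [1e-1, 1e-2]:
    print(t, errors(t))   # the second order error should be much smaller than the first
```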

Convex Sets

  • Types of sets: Affine, convex, cones
  • Examples: Hyperplanes, polyhedra, balls, ellipses, norm balls, cone of PSD matrices.

Definition: Affine set:

A set \( C \subseteq \mathbb{R}^n \) is affine if \( \forall \Bx_1, \Bx_2 \in C \) then

\begin{equation*}
\theta \Bx_1 + (1 -\theta) \Bx_2 \in C, \qquad \forall \theta \in \mathbb{R}.
\end{equation*}

The affine sum above can be rewritten as

\begin{equation}\label{eqn:convexOptimizationLecture3:600}
\Bx_2 + \theta (\Bx_1 - \Bx_2).
\end{equation}

Since \( \theta \) ranges over all of \( \mathbb{R} \), this is the line through \( \Bx_2 \) in the direction \( \Bx_1 - \Bx_2 \), that is, the line passing through both points.

Observe that the solution to a set of linear equations

\begin{equation}\label{eqn:convexOptimizationLecture3:620}
C = \setlr{ \Bx | A \Bx = \Bb },
\end{equation}

is an affine set. To check, note that

\begin{equation}\label{eqn:convexOptimizationLecture3:640}
\begin{aligned}
A (\theta \Bx_1 + (1 - \theta) \Bx_2)
&=
\theta A \Bx_1 + (1 - \theta) A \Bx_2 \\
&=
\theta \Bb + (1 - \theta) \Bb \\
&= \Bb.
\end{aligned}
\end{equation}
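The affine property of \( \setlr{ \Bx | A \Bx = \Bb } \) can be illustrated numerically. A minimal sketch, assuming NumPy and an underdetermined random system (the sizes are illustrative): two distinct solutions are built from a particular solution plus null space directions, and every affine combination, including \( \theta \) outside \( [0,1] \), still satisfies \( A \Bx = \Bb \).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 4))          # 2 equations, 4 unknowns (illustrative sizes)
b = rng.standard_normal(2)

x_p = np.linalg.lstsq(A, b, rcond=None)[0]   # one particular solution of A x = b
_, _, Vt = np.linalg.svd(A)
N = Vt[2:].T                                  # columns span the null space of A (rank 2)
x1 = x_p + N @ rng.standard_normal(2)         # two distinct solutions
x2 = x_p + N @ rng.standard_normal(2)

for theta in [-3.0, 0.25, 7.5]:               # any real theta, not only theta in [0, 1]
    x = theta * x1 + (1 - theta) * x2
    print(theta, np.allclose(A @ x, b))       # True for each theta
```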

Definition: Affine combination: An affine combination of points \( \Bx_1, \Bx_2, \ldots, \Bx_n \) is

\begin{equation*}
\sum_{i = 1}^n \theta_i \Bx_i,
\end{equation*}

where \( \theta_i \in \mathbb{R} \) and

\begin{equation*}
\sum_{i = 1}^n \theta_i = 1.
\end{equation*}

An affine set contains all affine combinations of points in the set. A couple of examples of affine sets are sketched in fig. 1.1.

For comparison, a couple of non-affine sets are sketched in fig. 1.2.

Definition: Convex set: A set \( C \subseteq \mathbb{R}^n \) is convex if for all \( \Bx_1, \Bx_2 \in C \) and all \( \theta \in [0,1] \), the combination

\begin{equation}\label{eqn:convexOptimizationLecture3:700}
\theta \Bx_1 + (1 - \theta) \Bx_2 \in C.
\end{equation}

Definition: Convex combination: A convex combination of \( \Bx_1, \Bx_2, \ldots, \Bx_n \) is

\begin{equation*}
\sum_{i = 1}^n \theta_i \Bx_i,
\end{equation*}

where \( \theta_i \ge 0 \) for all \( i \), and

\begin{equation*}
\sum_{i = 1}^n \theta_i = 1.
\end{equation*}

Definition: Convex hull: The convex hull of a set \( C \) is the set of all convex combinations of points in \(C\), denoted

\begin{equation}\label{eqn:convexOptimizationLecture3:720}
\textrm{conv}(C) = \setlr{ \sum_{i=1}^n \theta_i \Bx_i | \Bx_i \in C, \theta_i \ge 0, \sum_{i=1}^n \theta_i = 1 }.
\end{equation}

The convex hull of a non-convex set is obtained by filling in all the convex combinations of points in the set (in particular, all the line segments connecting pairs of points), as sketched in fig. 1.3.

Definition: Cone: A set \(C\) is a cone if \( \forall \Bx \in C \) and \( \forall \theta \ge 0 \) we have \( \theta \Bx \in C\).

Geometrically, \( \theta \Bx \) is scaled outward from the origin if \(\theta > 1\), and inward if \(\theta < 1\).

A convex cone is a cone that is also a convex set. A conic combination is

\begin{equation*}
\sum_{i=1}^n \theta_i \Bx_i, \qquad \theta_i \ge 0.
\end{equation*}

A convex and a non-convex 2D cone are sketched in fig. 1.4.

A comparison of properties for different set types is tabulated in table 1.1.

Hyperplanes and half spaces

Definition: Hyperplane: A hyperplane is defined by

\begin{equation*}
\setlr{ \Bx | \Ba^\T \Bx = b }, \qquad \Ba \ne 0.
\end{equation*}

A line in \( \mathbb{R}^2 \) and a plane in \( \mathbb{R}^3 \) are examples of this general construct, as sketched in fig. 1.5.

An alternate view is possible should one find any specific \( \Bx_0 \) such that \( \Ba^\T \Bx_0 = b \)

\begin{equation}\label{eqn:convexOptimizationLecture3:740}
\setlr{\Bx | \Ba^\T \Bx = b }
=
\setlr{\Bx | \Ba^\T (\Bx -\Bx_0) = 0 }
\end{equation}

This shows that \( \Bx - \Bx_0 \) is perpendicular to \( \Ba \), that is, \( \Bx - \Bx_0 \in \Ba^\perp \), or

\begin{equation}\label{eqn:convexOptimizationLecture3:780}
\Bx
\in
\Bx_0 + \Ba^\perp.
\end{equation}

This is the subspace perpendicular to \( \Ba \), shifted by any \(\Bx_0\) satisfying \( \Ba^\T \Bx_0 = b \). As a set, that subspace is

\begin{equation}\label{eqn:convexOptimizationLecture3:760}
\Ba^\perp = \setlr{ \Bv | \Ba^\T \Bv = 0 }.
\end{equation}
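The "particular point plus \( \Ba^\perp \)" view of a hyperplane can be sketched numerically. A minimal sketch, assuming NumPy and illustrative values for \( \Ba \) and \( b \): a basis for \( \Ba^\perp \) is taken from the SVD of the \( 1 \times 3 \) matrix \( \Ba^\T \), and every point \( \Bx_0 + \Bv \) with \( \Bv \in \Ba^\perp \) satisfies \( \Ba^\T \Bx = b \).

```python
import numpy as np

a = np.array([1.0, 2.0, -2.0])           # normal vector (illustrative values)
b = 3.0

x0 = b * a / (a @ a)                     # one particular point with a^T x0 = b
# orthonormal basis for a^perp = {v : a^T v = 0}, from the SVD of the 1x3 matrix a^T
_, _, Vt = np.linalg.svd(a.reshape(1, -1))
P = Vt[1:].T                             # 3x2, columns orthogonal to a

rng = np.random.default_rng(5)
pts = [x0 + P @ rng.standard_normal(2) for _ in range(3)]   # x = x0 + (element of a^perp)
print([bool(np.isclose(a @ x, b)) for x in pts])            # [True, True, True]
```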

Half space

Definition: Half space: A half space is a set of the form
\begin{equation*}
\setlr{ \Bx | \Ba^\T \Bx \le b }
= \setlr{ \Bx | \Ba^\T (\Bx - \Bx_0) \le 0 },
\end{equation*}
where \( \Ba \ne 0 \) and \( \Bx_0 \) is any point with \( \Ba^\T \Bx_0 = b \).

This can also be expressed as \( \setlr{ \Bx | \innerprod{ \Ba }{\Bx – \Bx_0 } \le 0 } \).
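The inner product form gives a direct membership test. A minimal sketch, assuming NumPy; the helper `in_halfspace` and its tolerance are hypothetical, and the example half space \( x_1 + x_2 \le 1 \) is chosen only for illustration. The last line checks that convex combinations of two members stay inside, as expected for a convex set.

```python
import numpy as np

def in_halfspace(x, a, x0, tol=1e-12):
    """Test <a, x - x0> <= 0 (hypothetical helper; tol absorbs round-off)."""
    return float(np.dot(a, x - x0)) <= tol

a = np.array([1.0, 1.0])
x0 = np.array([1.0, 0.0])      # a^T x0 = 1, so the set is {x : x_1 + x_2 <= 1}

print(in_halfspace(np.array([0.0, 0.0]), a, x0))   # True: strictly inside
print(in_halfspace(np.array([0.5, 0.5]), a, x0))   # True: on the bounding hyperplane
print(in_halfspace(np.array([2.0, 2.0]), a, x0))   # False

# convexity: convex combinations of two members are again members
p, q = np.array([-3.0, 1.0]), np.array([0.2, 0.3])
print(all(in_halfspace(th * p + (1 - th) * q, a, x0) for th in np.linspace(0, 1, 11)))  # True
```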

Jacobian and Hessian matrices

January 15, 2017 ece1505


Motivation

In class this Friday the Jacobian and Hessian matrices were introduced, but I did not find the treatment terribly clear. Here is an alternate treatment, beginning with the gradient construction from [2], which uses a nice trick to frame the multivariable derivative operation as a single variable Taylor expansion.

Multivariable Taylor approximation

The Taylor series expansion for a scalar function \( g : {\mathbb{R}} \rightarrow {\mathbb{R}} \) about the origin is just

\begin{equation}\label{eqn:jacobianAndHessian:20}
g(t) = g(0) + t g'(0) + \frac{t^2}{2} g''(0) + \cdots
\end{equation}

In particular

\begin{equation}\label{eqn:jacobianAndHessian:40}
g(1) = g(0) + g'(0) + \frac{1}{2} g''(0) + \cdots
\end{equation}

Now consider \( g(t) = f( \Bx + \Ba t ) \), where \( f : {\mathbb{R}}^n \rightarrow {\mathbb{R}} \), \( g(0) = f(\Bx) \), and \( g(1) = f(\Bx + \Ba) \). The multivariable Taylor expansion now follows directly

\begin{equation}\label{eqn:jacobianAndHessian:60}
f( \Bx + \Ba)
= f(\Bx)
+ \evalbar{\frac{df(\Bx + \Ba t)}{dt}}{t = 0} + \frac{1}{2} \evalbar{\frac{d^2f(\Bx + \Ba t)}{dt^2}}{t = 0} + \cdots
\end{equation}

The first order term is

\begin{equation}\label{eqn:jacobianAndHessian:80}
\begin{aligned}
\evalbar{\frac{df(\Bx + \Ba t)}{dt}}{t = 0}
&=
\sum_{i = 1}^n
\frac{d( x_i + a_i t)}{dt}
\evalbar{\PD{(x_i + a_i t)}{f(\Bx + \Ba t)}}{t = 0} \\
&=
\sum_{i = 1}^n
a_i
\PD{x_i}{f(\Bx)} \\
&= \Ba \cdot \spacegrad f.
\end{aligned}
\end{equation}

Similarly, for the second order term

\begin{equation}\label{eqn:jacobianAndHessian:100}
\begin{aligned}
\evalbar{\frac{d^2 f(\Bx + \Ba t)}{dt^2}}{t = 0}
&=
\evalbar{\lr{
\frac{d}{dt}
\lr{
\sum_{i = 1}^n
a_i
\PD{(x_i + a_i t)}{f(\Bx + \Ba t)}
}
}
}{t = 0} \\
&=
\evalbar{
\lr{
\sum_{j = 1}^n
\frac{d(x_j + a_j t)}{dt}
\sum_{i = 1}^n
a_i
\frac{\partial^2 f(\Bx + \Ba t)}{\partial (x_j + a_j t) \partial (x_i + a_i t) }
}
}{t = 0} \\
&=
\sum_{i,j = 1}^n a_i a_j \frac{\partial^2 f}{\partial x_i \partial x_j} \\
&=
(\Ba \cdot \spacegrad)^2 f.
\end{aligned}
\end{equation}

The complete Taylor expansion of a scalar function \( f : {\mathbb{R}}^n \rightarrow {\mathbb{R}} \) is therefore

\begin{equation}\label{eqn:jacobianAndHessian:120}
f(\Bx + \Ba)
= f(\Bx) +
\Ba \cdot \spacegrad f +
\inv{2} \lr{ \Ba \cdot \spacegrad}^2 f + \cdots,
\end{equation}

so the Taylor expansion has an exponential structure

\begin{equation}\label{eqn:jacobianAndHessian:140}
f(\Bx + \Ba) = \sum_{k = 0}^\infty \inv{k!} \lr{ \Ba \cdot \spacegrad}^k f = e^{\Ba \cdot \spacegrad} f.
\end{equation}
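The truncated expansion \( f(\Bx) + \Ba \cdot \spacegrad f + \inv{2} (\Ba \cdot \spacegrad)^2 f \) can be compared with the exact function value. A minimal sketch, assuming NumPy and a hypothetical test function \( f(x_1, x_2) = e^{x_1} \sin x_2 + x_1 x_2^2 \) with hand-computed gradient and second derivatives; replacing \( \Ba \) by \( t \Ba \), the remainder should scale like \( t^3 \).

```python
import numpy as np

def f(x):
    return np.exp(x[0]) * np.sin(x[1]) + x[0] * x[1] ** 2

def grad_f(x):
    return np.array([np.exp(x[0]) * np.sin(x[1]) + x[1] ** 2,
                     np.exp(x[0]) * np.cos(x[1]) + 2 * x[0] * x[1]])

def hess_f(x):
    e, s, c = np.exp(x[0]), np.sin(x[1]), np.cos(x[1])
    return np.array([[e * s,            e * c + 2 * x[1]],
                     [e * c + 2 * x[1], -e * s + 2 * x[0]]])

x = np.array([0.3, -0.7])
a = np.array([1.0, 2.0])

def taylor2(t):
    """f(x) + (t a).grad f + (1/2) ((t a).grad)^2 f, the quadratic form written with the Hessian."""
    s = t * a
    return f(x) + s @ grad_f(x) + 0.5 * s @ hess_f(x) @ s

for t in [1e-1, 1e-2]:
    print(t, abs(f(x + t * a) - taylor2(t)))   # error should scale like t**3
```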

To approximate a vector valued function \( \Bf : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m \), it is only required to apply this expansion to each of the components, forming a column vector

\begin{equation}\label{eqn:jacobianAndHessian:160}
\Bf(\Bx + \Ba)
= \Bf(\Bx) +
[\Ba \cdot \spacegrad f_i]_i +
\inv{2} [\lr{ \Ba \cdot \spacegrad}^2 f_i]_i + \cdots,
\end{equation}

where \( [.]_i \) denotes a column vector over the rows \( i \in [1,m] \), and \( f_i \) are the coordinates of \( \Bf \).

The Jacobian matrix

In [1] the Jacobian \( D \Bf \) of a function \( \Bf : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m \) is defined in terms of the limit of the \( l_2 \) norm ratio

\begin{equation}\label{eqn:jacobianAndHessian:180}
\lim_{\Bz \rightarrow \Bx} \frac{\Norm{\Bf(\Bz) - \Bf(\Bx) - (D \Bf) (\Bz - \Bx)}_2 }{ \Norm{\Bz - \Bx}_2 } = 0,
\end{equation}

with the statement that the function \( \Bf \) is differentiable at \( \Bx \) if there exists a matrix \( D \Bf \) for which this limit holds. Here the Jacobian \( D \Bf \in {\mathbb{R}}^{m \times n} \) must be matrix valued.

Let \( \Bz = \Bx + \Ba \), so the first order expansion of \ref{eqn:jacobianAndHessian:160} is

\begin{equation}\label{eqn:jacobianAndHessian:200}
\Bf(\Bz)
\approx \Bf(\Bx) + [\lr{ \Bz - \Bx } \cdot \spacegrad f_i]_i
.
\end{equation}

With the (unproven) assumption that this Taylor expansion satisfies the norm limit criterion of \ref{eqn:jacobianAndHessian:180}, it is possible to extract the structure of the Jacobian by comparison

\begin{equation}\label{eqn:jacobianAndHessian:220}
\begin{aligned}
(D \Bf)
(\Bz - \Bx)
&=
{\begin{bmatrix}
\lr{ \Bz - \Bx } \cdot \spacegrad f_i
\end{bmatrix}}_i \\
&=
{\begin{bmatrix}
\sum_{j = 1}^n (z_j - x_j) \PD{x_j}{f_i}
\end{bmatrix}}_i \\
&=
{\begin{bmatrix}
\PD{x_j}{f_i}
\end{bmatrix}}_{ij}
(\Bz – \Bx),
\end{aligned}
\end{equation}

so
\begin{equation}\label{eqn:jacobianAndHessian:240}
\boxed{
(D \Bf)_{ij} = \PD{x_j}{f_i}
}
\end{equation}

Written out explicitly as a matrix, the Jacobian is

\begin{equation}\label{eqn:jacobianAndHessian:320}
D \Bf
=
\begin{bmatrix}
\PD{x_1}{f_1} & \PD{x_2}{f_1} & \cdots & \PD{x_n}{f_1} \\
\PD{x_1}{f_2} & \PD{x_2}{f_2} & \cdots & \PD{x_n}{f_2} \\
\vdots & \vdots & & \vdots \\
\PD{x_1}{f_m} & \PD{x_2}{f_m} & \cdots & \PD{x_n}{f_m} \\
\end{bmatrix}
=
\begin{bmatrix}
(\spacegrad f_1)^\T \\
(\spacegrad f_2)^\T \\
\vdots \\
(\spacegrad f_m)^\T
\end{bmatrix}.
\end{equation}

In particular, when the function is scalar valued
\begin{equation}\label{eqn:jacobianAndHessian:261}
D f = (\spacegrad f)^\T.
\end{equation}

With this notation, the first order Taylor expansion in terms of the Jacobian matrix is

\begin{equation}\label{eqn:jacobianAndHessian:260}
\boxed{
\Bf(\Bz)
\approx \Bf(\Bx) + (D \Bf) \lr{ \Bz - \Bx }.
}
\end{equation}
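The index convention \( (D \Bf)_{ij} = \PDi{x_j}{f_i} \), rows indexed by components of \( \Bf \) and columns by coordinates of \( \Bx \), can be made concrete with a finite-difference sketch. This assumes NumPy; the helper `jacobian_fd` and the example map \( \Bf : \mathbb{R}^2 \rightarrow \mathbb{R}^3 \) are hypothetical, chosen only for illustration. It also checks that the first order model \( \Bf(\Bx) + (D \Bf)(\Bz - \Bx) \) is accurate for nearby \( \Bz \).

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Central-difference estimate of D f, with (D f)_{ij} = d f_i / d x_j (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(f(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h)   # column j: d f / d x_j
    return J

def f(x):                                # example f : R^2 -> R^3
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

x = np.array([0.5, -1.5])
J = jacobian_fd(f, x)
J_exact = np.array([[x[1], x[0]],
                    [np.cos(x[0]), 0.0],
                    [0.0, 2 * x[1]]])
print(np.allclose(J, J_exact, atol=1e-6))            # True

# first order model f(z) ~ f(x) + (D f)(z - x) for a nearby z
z = x + np.array([1e-3, -2e-3])
print(np.linalg.norm(f(z) - (f(x) + J @ (z - x))))   # small: second order in |z - x|
```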

The Hessian matrix

For scalar valued functions, the text expresses the second order expansion of a function in terms of the Jacobian and Hessian matrices

\begin{equation}\label{eqn:jacobianAndHessian:271}
f(\Bz)
\approx f(\Bx) + (D f) \lr{ \Bz - \Bx }
+ \inv{2} \lr{ \Bz - \Bx }^\T (\spacegrad^2 f) \lr{ \Bz - \Bx }.
\end{equation}

Because \( \spacegrad^2 \) is the usual notation for a Laplacian operator, this \( \spacegrad^2 f \in {\mathbb{R}}^{n \times n}\) notation for the Hessian matrix is not ideal in my opinion. Ignoring that notational objection for this class, the structure of the Hessian matrix can be extracted by comparison with the coordinate expansion

\begin{equation}\label{eqn:jacobianAndHessian:300}
\Ba^\T (\spacegrad^2 f) \Ba
=
\sum_{r,s = 1}^n a_r a_s \frac{\partial^2 f}{\partial x_r \partial x_s}
\end{equation}

so
\begin{equation}\label{eqn:jacobianAndHessian:280}
\boxed{
(\spacegrad^2 f)_{ij}
=
\frac{\partial^2 f}{\partial x_i \partial x_j}.
}
\end{equation}

In explicit matrix form the Hessian is

\begin{equation}\label{eqn:jacobianAndHessian:340}
\spacegrad^2 f
=
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_n \partial x_n}
\end{bmatrix}.
\end{equation}
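The Hessian entries \( (\spacegrad^2 f)_{ij} = \partial^2 f / \partial x_i \partial x_j \) can likewise be estimated with a mixed central difference. A minimal sketch, assuming NumPy; the helper `hessian_fd` is hypothetical, and the test function \( e^{x_1} \sin x_2 + x_1 x_2^2 \) is chosen only for illustration, with its exact Hessian written out by hand for comparison. The estimate also comes out symmetric, as the matrix form above suggests for smooth \( f \).

```python
import numpy as np

def f(x):
    return np.exp(x[0]) * np.sin(x[1]) + x[0] * x[1] ** 2

def hessian_fd(f, x, h=1e-4):
    """Central-difference Hessian, H_{ij} = d^2 f / (dx_i dx_j) (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    E = h * np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
    return H

x = np.array([0.3, -0.7])
e, s, c = np.exp(x[0]), np.sin(x[1]), np.cos(x[1])
H_exact = np.array([[e * s,            e * c + 2 * x[1]],
                    [e * c + 2 * x[1], -e * s + 2 * x[0]]])
H = hessian_fd(f, x)
print(np.allclose(H, H_exact, atol=1e-5), np.allclose(H, H.T))   # True True
```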

Is there a similar nice matrix structure for the Hessian of a function \( f : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m \)?

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[2] D. Hestenes. New Foundations for Classical Mechanics. Kluwer Academic Publishers, 1999.

A comparison of Geometric Algebra electrodynamic potential methods

January 7, 2017 math and physics play


Motivation

Geometric algebra (GA) allows for a compact description of Maxwell’s equations in either an explicit 3D representation or a STA (SpaceTime Algebra [2]) representation. In both the 3D GA and STA representations, Maxwell’s equation has the form

\begin{equation}\label{eqn:potentialMethods:1280}
L \boldsymbol{\mathcal{F}} = J,
\end{equation}

where \( J \) represents the sources, \( L \) is a multivector gradient operator that includes partial derivative operator components for each of the space and time coordinates, and

\begin{equation}\label{eqn:potentialMethods:1020}
\boldsymbol{\mathcal{F}} = \boldsymbol{\mathcal{E}} + \eta I \boldsymbol{\mathcal{H}},
\end{equation}

is an electromagnetic field multivector, \( I = \Be_1 \Be_2 \Be_3 \) is the \R{3} pseudoscalar, and \( \eta = \sqrt{\mu/\epsilon} \) is the impedance of the medium.

When Maxwell’s equations are extended to include magnetic sources in addition to conventional electric sources (as used in antenna-theory [1] and microwave engineering [3]), they take the form

\begin{equation}\label{eqn:chapter3Notes:20}
\spacegrad \cross \boldsymbol{\mathcal{E}} = - \boldsymbol{\mathcal{M}} - \PD{t}{\boldsymbol{\mathcal{B}}}
\end{equation}
\begin{equation}\label{eqn:chapter3Notes:40}
\spacegrad \cross \boldsymbol{\mathcal{H}} = \boldsymbol{\mathcal{J}} + \PD{t}{\boldsymbol{\mathcal{D}}}
\end{equation}
\begin{equation}\label{eqn:chapter3Notes:60}
\spacegrad \cdot \boldsymbol{\mathcal{D}} = q_{\textrm{e}}
\end{equation}
\begin{equation}\label{eqn:chapter3Notes:80}
\spacegrad \cdot \boldsymbol{\mathcal{B}} = q_{\textrm{m}}.
\end{equation}

The corresponding GA Maxwell equations in their respective 3D and STA forms are

\begin{equation}\label{eqn:potentialMethods:300}
\lr{ \spacegrad + \inv{v} \PD{t}{} } \boldsymbol{\mathcal{F}}
=
\eta
\lr{ v q_{\textrm{e}} - \boldsymbol{\mathcal{J}} }
+ I \lr{ v q_{\textrm{m}} - \boldsymbol{\mathcal{M}} }
\end{equation}
\begin{equation}\label{eqn:potentialMethods:320}
\grad \boldsymbol{\mathcal{F}} = \eta J – I M,
\end{equation}

where the wave group velocity in the medium is \( v = 1/\sqrt{\epsilon\mu} \), and the medium is isotropic with
\( \boldsymbol{\mathcal{B}} = \mu \boldsymbol{\mathcal{H}} \), and \( \boldsymbol{\mathcal{D}} = \epsilon \boldsymbol{\mathcal{E}} \). In the STA representation, \( \grad, J, M \) are all four-vectors, the specific meanings of which will be spelled out below.

Determining the potential equations and the field representation from the conventional, distinct Maxwell’s equations \ref{eqn:chapter3Notes:20}, … is well known. The basic procedure is to consider the electric and magnetic sources in turn, and observe that in each case one of the electric or magnetic fields must have a curl representation. The STA approach is similar, except that the field must have a four-curl representation for each type of source. In the explicit 3D GA formalism \ref{eqn:potentialMethods:300}, how to formulate a natural potential representation is not as obvious. There is no longer a reason to set any component of the field equal to a curl, and the representation of the four-curl from the STA approach is awkward. Additionally, it is not obvious what form gauge invariance takes in the 3D GA representation.

Ideas explored in these notes

  • GA representation of Maxwell’s equations including magnetic sources.
  • STA GA formalism for Maxwell’s equations including magnetic sources.
  • Explicit form of the GA potential representation including both electric and magnetic sources.
  • Demonstration of exactly how the 3D and STA potentials are related.
  • Explore the structure of gauge transformations when magnetic sources are included.
  • Explore the structure of gauge transformations in the 3D GA formalism.
  • Specify the form of the Lorentz gauge in the 3D GA formalism.

Traditional vector algebra

No magnetic sources

When magnetic sources are omitted, it follows from \ref{eqn:chapter3Notes:80} that there is some \( \boldsymbol{\mathcal{A}}^{\mathrm{e}} \) for which

\begin{equation}\label{eqn:potentialMethods:20}
\boxed{
\boldsymbol{\mathcal{B}} = \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}},
}
\end{equation}

Substitution into Faraday’s law \ref{eqn:chapter3Notes:20} gives

\begin{equation}\label{eqn:potentialMethods:40}
\spacegrad \cross \boldsymbol{\mathcal{E}} = - \PD{t}{}\lr{ \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}} },
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:60}
\spacegrad \cross \lr{ \boldsymbol{\mathcal{E}} + \PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } = 0.
\end{equation}

Since the curl of a gradient vanishes, setting this curled quantity equal to a gradient, say \( -\spacegrad \phi \), provides the required zero

\begin{equation}\label{eqn:potentialMethods:80}
\boxed{
\boldsymbol{\mathcal{E}} = -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }.
}
\end{equation}
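That \( \boldsymbol{\mathcal{E}} = -\spacegrad \phi - \PDi{t}{\boldsymbol{\mathcal{A}}^{\mathrm{e}}} \) and \( \boldsymbol{\mathcal{B}} = \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}} \) satisfy Faraday’s law (with \( \boldsymbol{\mathcal{M}} = 0 \)) identically, for any choice of potentials, can be spot-checked numerically. A minimal sketch, assuming NumPy and central finite differences; the functions `phi` and `A` below are hypothetical smooth potentials chosen only for illustration, standing in for \( \phi \) and \( \boldsymbol{\mathcal{A}}^{\mathrm{e}} \). Because the central-difference operators commute, the residual \( \spacegrad \cross \boldsymbol{\mathcal{E}} + \PDi{t}{\boldsymbol{\mathcal{B}}} \) should be at round-off level.

```python
import numpy as np

# Hypothetical smooth potentials, chosen only for illustration.
def phi(r, t):
    x, y, z = r
    return np.sin(x + 2 * y) * np.cos(t) + z ** 2

def A(r, t):
    x, y, z = r
    return np.array([y * z * np.sin(t), np.exp(x) * np.cos(z - t), x * y ** 2 + t * z])

h = 1e-4

def d(F, r, t, k):
    """Central difference of F(r, t) along x_k (k = 0, 1, 2) or along t (k = 3)."""
    if k == 3:
        return (F(r, t + h) - F(r, t - h)) / (2 * h)
    e = np.zeros(3)
    e[k] = h
    return (F(r + e, t) - F(r - e, t)) / (2 * h)

def curl(F, r, t):
    dF = [d(F, r, t, k) for k in range(3)]   # dF[k][i] approximates dF_i / dx_k
    return np.array([dF[1][2] - dF[2][1],
                     dF[2][0] - dF[0][2],
                     dF[0][1] - dF[1][0]])

def E(r, t):                                 # E = -grad(phi) - dA/dt
    return -np.array([d(phi, r, t, k) for k in range(3)]) - d(A, r, t, 3)

def B(r, t):                                 # B = curl(A)
    return curl(A, r, t)

r0, t0 = np.array([0.2, -0.4, 0.7]), 0.3
residual = curl(E, r0, t0) + d(B, r0, t0, 3)   # curl E + dB/dt, with M = 0
print(np.max(np.abs(residual)))                # expected to be at round-off level
```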

The final two Maxwell equations yield

\begin{equation}\label{eqn:potentialMethods:100}
\begin{aligned}
-\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \spacegrad \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} } &= \mu \lr{ \boldsymbol{\mathcal{J}} + \epsilon \PD{t}{} \lr{ -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } } \\
\spacegrad \cdot \lr{ -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } &= q_e/\epsilon,
\end{aligned}
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:120}
\boxed{
\begin{aligned}
\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{e}} - \inv{v^2} \PDSq{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \spacegrad \lr{
\inv{v^2} \PD{t}{\phi}
+\spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}}
}
&= -\mu \boldsymbol{\mathcal{J}} \\
\spacegrad^2 \phi + \PD{t}{} \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} } &= -q_e/\epsilon.
\end{aligned}
}
\end{equation}

Note that the Lorentz condition \( \PDi{t}{(\phi/v^2)} + \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} = 0 \) can be imposed to decouple these, leaving non-homogeneous wave equations for the vector and scalar potentials respectively.

No electric sources

Without electric sources, a curl representation of the electric field can be assumed, satisfying Gauss’s law

\begin{equation}\label{eqn:potentialMethods:140}
\boxed{
\boldsymbol{\mathcal{D}} = - \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}}.
}
\end{equation}

Substitution into the Ampère-Maxwell law (with \( \boldsymbol{\mathcal{J}} = 0 \)) gives
\begin{equation}\label{eqn:potentialMethods:160}
\spacegrad \cross \lr{ \boldsymbol{\mathcal{H}} + \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}} } = 0.
\end{equation}

This is satisfied with any gradient, say, \( -\spacegrad \phi_m \), providing a potential representation for the magnetic field

\begin{equation}\label{eqn:potentialMethods:180}
\boxed{
\boldsymbol{\mathcal{H}} = -\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}.
}
\end{equation}

The remaining Maxwell equations provide the required constraints on the potentials

\begin{equation}\label{eqn:potentialMethods:220}
-\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{m}} + \spacegrad \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } = -\epsilon
\lr{
-\boldsymbol{\mathcal{M}} - \mu \PD{t}{}
\lr{
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}
}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:240}
\spacegrad \cdot
\lr{
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}
= \inv{\mu} q_m,
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:260}
\boxed{
\begin{aligned}
\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{m}} - \inv{v^2} \PDSq{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}} - \spacegrad \lr{ \inv{v^2} \PD{t}{\phi_m} + \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } &= -\epsilon \boldsymbol{\mathcal{M}} \\
\spacegrad^2 \phi_m + \PD{t}{}\lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } &= -\inv{\mu} q_m.
\end{aligned}
}
\end{equation}

The general solution to Maxwell’s equations is therefore
\begin{equation}\label{eqn:potentialMethods:280}
\begin{aligned}
\boldsymbol{\mathcal{E}} &=
-\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \inv{\epsilon} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
\boldsymbol{\mathcal{H}} &=
\inv{\mu} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}}
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}},
\end{aligned}
\end{equation}

subject to the constraints \ref{eqn:potentialMethods:120} and \ref{eqn:potentialMethods:260}.

Potential operator structure

Knowing that there is a simple underlying structure to the potential representation of the electromagnetic field in the STA formalism inspires the question of whether that structure can be found directly using the scalar and vector potentials determined above.

Specifically, what is the multivector representation \ref{eqn:potentialMethods:1020} of the electromagnetic field in terms of all the individual potential variables, and can an underlying structure for that field representation be found? The composite field is

\begin{equation}\label{eqn:potentialMethods:280b}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
-\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \inv{\epsilon} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&\quad + I \eta
\lr{
\inv{\mu} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}}
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}.
\end{aligned}
\end{equation}

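Can this be factored into a multivector operator acting on multivector potentials? Expanding the cross products using \( \Ba \cross \Bb = -I (\Ba \wedge \Bb) \) and \( \spacegrad \wedge \BA = \inv{2} \lr{ \rspacegrad \BA - \BA \lspacegrad } \) provides some direction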

\begin{equation}\label{eqn:potentialMethods:1040}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
- \PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \eta \PD{t}{I \boldsymbol{\mathcal{A}}^{\mathrm{m}}}
- \spacegrad \lr{ \phi + \eta I \phi_m } \\
&\quad + \frac{\eta}{2 \mu} \lr{ \rspacegrad \boldsymbol{\mathcal{A}}^{\mathrm{e}} - \boldsymbol{\mathcal{A}}^{\mathrm{e}} \lspacegrad }
+ \frac{1}{2 \epsilon} \lr{ \rspacegrad I \boldsymbol{\mathcal{A}}^{\mathrm{m}} - I \boldsymbol{\mathcal{A}}^{\mathrm{m}} \lspacegrad }.
\end{aligned}
\end{equation}

Since \( \eta/\mu = v \) and \( 1/\epsilon = \eta v \), the gradient and the time partials can be grouped together

\begin{equation}\label{eqn:potentialMethods:1060}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
- \PD{t}{ } \lr{\boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \boldsymbol{\mathcal{A}}^{\mathrm{m}}}
- \spacegrad \lr{ \phi + \eta I \phi_m }
+ \frac{v}{2} \lr{ \rspacegrad (\boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \boldsymbol{\mathcal{A}}^{\mathrm{m}}) - (\boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \boldsymbol{\mathcal{A}}^{\mathrm{m}}) \lspacegrad } \\
&=
\inv{2} \lr{
\lr{ \rspacegrad - \inv{v} {\stackrel{ \rightarrow }{\partial_t}} } \lr{ v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}} }
-
\lr{ v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}} \lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
} \\
&\quad + \inv{2} \lr{
\lr{ \rspacegrad - \inv{v} {\stackrel{ \rightarrow }{\partial_t}} } \lr{ -\phi - \eta I \phi_m }
- \lr{ \phi + \eta I \phi_m } \lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
}
,
\end{aligned}
\end{equation}

or

\begin{equation}\label{eqn:potentialMethods:1080}
\boxed{
\boldsymbol{\mathcal{F}}
=
\inv{2} \Biglr{
\lr{ \rspacegrad - \inv{v} {\stackrel{ \rightarrow }{\partial_t}} }
\lr{
- \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
- \eta I \phi_m
}
-
\lr{
\phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
+ \eta I \phi_m
}
\lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
}
.
}
\end{equation}

The potential has a conjugate structure on the two sides of the curl operation: only the scalar and pseudoscalar elements change sign. The reason for this becomes clearer in the STA formalism.

Potentials in the STA formalism

Maxwell’s equation in its explicit 3D form \ref{eqn:potentialMethods:300} can be converted to STA form by introducing a four-vector (Dirac) basis \( \setlr{ \gamma_\mu } \), in terms of which the spatial basis is \( \setlr{ \Be_k = \gamma_k \gamma_0 } \). Multiplying from the left with \( \gamma_0 \) then gives the STA form of Maxwell’s equation \ref{eqn:potentialMethods:320}, where
\begin{equation}\label{eqn:potentialMethods:340}
\begin{aligned}
J &= \gamma^\mu J_\mu = ( v q_e, \boldsymbol{\mathcal{J}} ) \\
M &= \gamma^\mu M_\mu = ( v q_m, \boldsymbol{\mathcal{M}} ) \\
\grad &= \gamma^\mu \partial_\mu = ( (1/v) \partial_t, \spacegrad ) \\
I &= \gamma_0 \gamma_1 \gamma_2 \gamma_3,
\end{aligned}
\end{equation}

Here the metric choice is \( \gamma_0^2 = 1 = -\gamma_k^2 \). Note that in this representation the electromagnetic field \( \boldsymbol{\mathcal{F}} = \boldsymbol{\mathcal{E}} + \eta I \boldsymbol{\mathcal{H}} \) is a bivector, not a general multivector as it is in the explicit (frame dependent) 3D representation of \ref{eqn:potentialMethods:300}.

A potential representation can be obtained as before by considering electric and magnetic sources in sequence and using superposition to assemble a complete potential.
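The STA relations used above (the metric \( \gamma_0^2 = 1 = -\gamma_k^2 \), the spatial basis \( \Be_k = \gamma_k \gamma_0 \) squaring to \( +1 \), and \( \Be_1 \Be_2 \Be_3 = \gamma_0 \gamma_1 \gamma_2 \gamma_3 = I \) with \( I^2 = -1 \)) can be spot-checked in a matrix representation. The notes themselves work purely algebraically; representing \( \gamma_\mu \) by the standard \( 4 \times 4 \) Dirac matrices, with the matrix product standing in for the geometric product, is an assumption of this NumPy sketch.

```python
import numpy as np

# Dirac-representation matrices for gamma_0..gamma_3 (one standard choice of representation)
sig = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z = np.eye(2, dtype=complex), np.zeros((2, 2), dtype=complex)
g0 = np.block([[I2, Z], [Z, -I2]])
g = [np.block([[Z, s], [-s, Z]]) for s in sig]     # gamma_1, gamma_2, gamma_3
one = np.eye(4)

e = [gk @ g0 for gk in g]                          # spatial basis e_k = gamma_k gamma_0
I = g0 @ g[0] @ g[1] @ g[2]                        # STA pseudoscalar gamma_0 gamma_1 gamma_2 gamma_3

print(np.allclose(g0 @ g0, one))                   # gamma_0^2 = +1
print(all(np.allclose(gk @ gk, -one) for gk in g)) # gamma_k^2 = -1
print(all(np.allclose(ek @ ek, one) for ek in e))  # e_k^2 = +1
print(np.allclose(e[0] @ e[1] @ e[2], I))          # e_1 e_2 e_3 equals the STA pseudoscalar
print(np.allclose(I @ I, -one))                    # I^2 = -1
```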

No magnetic sources

Without magnetic sources, Maxwell’s equation splits into vector and trivector terms of the form

\begin{equation}\label{eqn:potentialMethods:380}
\grad \cdot \boldsymbol{\mathcal{F}} = \eta J
\end{equation}
\begin{equation}\label{eqn:potentialMethods:400}
\grad \wedge \boldsymbol{\mathcal{F}} = 0.
\end{equation}

A four-vector curl representation of the field will satisfy \ref{eqn:potentialMethods:400} allowing an immediate potential solution

\begin{equation}\label{eqn:potentialMethods:560}
\boxed{
\begin{aligned}
&\boldsymbol{\mathcal{F}} = \grad \wedge {A^{\mathrm{e}}} \\
&\grad^2 {A^{\mathrm{e}}} - \grad \lr{ \grad \cdot {A^{\mathrm{e}}} } = \eta J.
\end{aligned}
}
\end{equation}

This can be put into correspondence with \ref{eqn:potentialMethods:120} by noting that

\begin{equation}\label{eqn:potentialMethods:460}
\begin{aligned}
\grad^2 &= (\gamma^\mu \partial_\mu) \cdot (\gamma^\nu \partial_\nu) = \inv{v^2} \partial_{tt} - \spacegrad^2 \\
\gamma_0 {A^{\mathrm{e}}} &= \gamma_0 \gamma^\mu {A^{\mathrm{e}}}_\mu = {A^{\mathrm{e}}}_0 + \Be_k {A^{\mathrm{e}}}_k = {A^{\mathrm{e}}}_0 + \BA^{\mathrm{e}} \\
\gamma_0 \grad &= \gamma_0 \gamma^\mu \partial_\mu = \inv{v} \partial_t + \spacegrad \\
\grad \cdot {A^{\mathrm{e}}} &= \partial_\mu {A^{\mathrm{e}}}^\mu = \inv{v} \partial_t {A^{\mathrm{e}}}_0 - \spacegrad \cdot \BA^{\mathrm{e}},
\end{aligned}
\end{equation}

so multiplying from the left with \( \gamma_0 \) gives

\begin{equation}\label{eqn:potentialMethods:480}
\lr{ \inv{v^2} \partial_{tt} - \spacegrad^2 } \lr{ {A^{\mathrm{e}}}_0 + \BA^{\mathrm{e}} } - \lr{ \inv{v} \partial_t + \spacegrad }\lr{ \inv{v} \partial_t {A^{\mathrm{e}}}_0 - \spacegrad \cdot \BA^{\mathrm{e}} } = \eta( v q_e - \boldsymbol{\mathcal{J}} ),
\end{equation}

or

\begin{equation}\label{eqn:potentialMethods:520}
\lr{ \inv{v^2} \partial_{tt} - \spacegrad^2 } \BA^{\mathrm{e}} - \spacegrad \lr{ \inv{v} \partial_t {A^{\mathrm{e}}}_0 - \spacegrad \cdot \BA^{\mathrm{e}} } = -\eta \boldsymbol{\mathcal{J}}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:540}
\spacegrad^2 {A^{\mathrm{e}}}_0 - \inv{v} \partial_t \lr{ \spacegrad \cdot \BA^{\mathrm{e}} } = -q_e/\epsilon.
\end{equation}

So \( {A^{\mathrm{e}}}_0 = \phi \) and \( -\ifrac{\BA^{\mathrm{e}}}{v} = \boldsymbol{\mathcal{A}}^{\mathrm{e}} \), or

\begin{equation}\label{eqn:potentialMethods:600}
\boxed{
{A^{\mathrm{e}}} = \gamma_0\lr{ \phi - v \boldsymbol{\mathcal{A}}^{\mathrm{e}} }.
}
\end{equation}

No electric sources

Without electric sources, Maxwell’s equation now splits into

\begin{equation}\label{eqn:potentialMethods:640}
\grad \cdot \boldsymbol{\mathcal{F}} = 0
\end{equation}
\begin{equation}\label{eqn:potentialMethods:660}
\grad \wedge \boldsymbol{\mathcal{F}} = -I M.
\end{equation}

Here the dual of an STA curl yields a solution

\begin{equation}\label{eqn:potentialMethods:680}
\boxed{
\boldsymbol{\mathcal{F}} = I ( \grad \wedge {A^{\mathrm{m}}} ).
}
\end{equation}

Substituting this gives

\begin{equation}\label{eqn:potentialMethods:720}
\begin{aligned}
0
&=
\grad \cdot (I ( \grad \wedge {A^{\mathrm{m}}} ) ) \\
&=
\gpgradeone{ \grad I ( \grad \wedge {A^{\mathrm{m}}} ) } \\
&=
-I \grad \wedge ( \grad \wedge {A^{\mathrm{m}}} ).
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:740}
\begin{aligned}
-I M
&=
\grad \wedge (I ( \grad \wedge {A^{\mathrm{m}}} ) ) \\
&=
\gpgradethree{ \grad I ( \grad \wedge {A^{\mathrm{m}}} ) } \\
&=
-I \grad \cdot ( \grad \wedge {A^{\mathrm{m}}} ).
\end{aligned}
\end{equation}

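The \( \grad \cdot \boldsymbol{\mathcal{F}} \) relation \ref{eqn:potentialMethods:720} is identically zero as desired. Cancelling the common factor of \( -I \) in \ref{eqn:potentialMethods:740}, and expanding \( \grad \cdot \lr{ \grad \wedge {A^{\mathrm{m}}} } = \grad^2 {A^{\mathrm{m}}} - \grad \lr{ \grad \cdot {A^{\mathrm{m}}} } \), leaves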

\begin{equation}\label{eqn:potentialMethods:760}
\boxed{
\grad^2 {A^{\mathrm{m}}} – \grad \lr{ \grad \cdot {A^{\mathrm{m}}} }
=
M.
}
\end{equation}

So the general solution with both electric and magnetic sources is

\begin{equation}\label{eqn:potentialMethods:800}
\boxed{
\boldsymbol{\mathcal{F}} = \grad \wedge {A^{\mathrm{e}}} + I (\grad \wedge {A^{\mathrm{m}}}),
}
\end{equation}

subject to the constraints of \ref{eqn:potentialMethods:560} and \ref{eqn:potentialMethods:760}. As before the four-potential \( {A^{\mathrm{m}}} \) can be put into correspondence with the conventional scalar and vector potentials by left multiplying with \( \gamma_0 \), which gives

\begin{equation}\label{eqn:potentialMethods:820}
\lr{ \inv{v^2} \partial_{tt} – \spacegrad^2 } \lr{ {A^{\mathrm{m}}}_0 + \BA^{\mathrm{m}} } – \lr{ \inv{v} \partial_t + \spacegrad }\lr{ \inv{v} \partial_t {A^{\mathrm{m}}}_0 – \spacegrad \cdot \BA^{\mathrm{m}} } = v q_m – \boldsymbol{\mathcal{M}},
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:860}
\lr{ \inv{v^2} \partial_{tt} – \spacegrad^2 } \BA^{\mathrm{m}} – \spacegrad \lr{ \inv{v} \partial_t {A^{\mathrm{m}}}_0 – \spacegrad \cdot \BA^{\mathrm{m}} } = – \boldsymbol{\mathcal{M}}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:880}
\spacegrad^2 {A^{\mathrm{m}}}_0 – \inv{v} \partial_t \spacegrad \cdot \BA^{\mathrm{m}} = -v q_m.
\end{equation}

Comparing with \ref{eqn:potentialMethods:260} shows that \( {A^{\mathrm{m}}}_0/v = \mu \phi_m \) and \( -\ifrac{\BA^{\mathrm{m}}}{v^2} = \mu \boldsymbol{\mathcal{A}}^{\mathrm{m}} \), or

\begin{equation}\label{eqn:potentialMethods:900}
\boxed{
{A^{\mathrm{m}}} = \gamma_0 \eta \lr{ \phi_m – v \boldsymbol{\mathcal{A}}^{\mathrm{m}} }.
}
\end{equation}

Potential operator structure

Observe that there is an underlying uniform structure to the differential operator that acts on the potential to produce the electromagnetic field. Expressed as a linear operator of the gradient and the potentials, \( \boldsymbol{\mathcal{F}} = L(\lrgrad, {A^{\mathrm{e}}}, {A^{\mathrm{m}}}) \), this structure is

\begin{equation}\label{eqn:potentialMethods:980}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
L(\grad, {A^{\mathrm{e}}}, {A^{\mathrm{m}}}) \\
&= \grad \wedge {A^{\mathrm{e}}} + I (\grad \wedge {A^{\mathrm{m}}}) \\
&=
\inv{2} \lr{ \rgrad {A^{\mathrm{e}}} – {A^{\mathrm{e}}} \lgrad }
+ \frac{I}{2} \lr{ \rgrad {A^{\mathrm{m}}} – {A^{\mathrm{m}}} \lgrad } \\
&=
\inv{2} \lr{ \rgrad {A^{\mathrm{e}}} – {A^{\mathrm{e}}} \lgrad }
+ \frac{1}{2} \lr{ -\rgrad I {A^{\mathrm{m}}} – I {A^{\mathrm{m}}} \lgrad } \\
&=
\inv{2} \lr{ \rgrad ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}}) – ({A^{\mathrm{e}}} + I {A^{\mathrm{m}}}) \lgrad }
,
\end{aligned}
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:1000}
\boxed{
\boldsymbol{\mathcal{F}}
=
\inv{2} \lr{ \rgrad ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}}) – ({A^{\mathrm{e}}} – I {A^{\mathrm{m}}})^\dagger \lgrad }
.
}
\end{equation}

Observe that \ref{eqn:potentialMethods:1000} can be
put into correspondence with \ref{eqn:potentialMethods:1080} using a factoring of unity \( 1 = \gamma_0 \gamma_0 \)

\begin{equation}\label{eqn:potentialMethods:1100}
\boldsymbol{\mathcal{F}}
=
\inv{2} \lr{ (-\rgrad \gamma_0) (-\gamma_0 ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}})) – (({A^{\mathrm{e}}} + I {A^{\mathrm{m}}}) \gamma_0)(\gamma_0 \lgrad) },
\end{equation}

where

\begin{equation}\label{eqn:potentialMethods:1140}
\begin{aligned}
-\grad \gamma_0
&=
-(\gamma^0 \partial_0 + \gamma^k \partial_k) \gamma_0 \\
&=
-\partial_0 – \gamma^k \gamma_0 \partial_k \\
&=
\spacegrad
-\inv{v} \partial_t
,
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:1160}
\begin{aligned}
\gamma_0 \grad
&=
\gamma_0 (\gamma^0 \partial_0 + \gamma^k \partial_k) \\
&=
\partial_0 – \gamma^k \gamma_0 \partial_k \\
&=
\spacegrad
+ \inv{v} \partial_t
,
\end{aligned}
\end{equation}

and
\begin{equation}\label{eqn:potentialMethods:1200}
\begin{aligned}
-\gamma_0 ( {A^{\mathrm{e}}} – I {A^{\mathrm{m}}} )
&=
-\gamma_0 \gamma_0 \lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ \phi_m – v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \\
&=
-\lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \phi_m – \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}} } \\
&=
– \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}
– \eta I \phi_m
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:1220}
\begin{aligned}
( {A^{\mathrm{e}}} + I {A^{\mathrm{m}}} )\gamma_0
&=
\lr{ \gamma_0 \lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} } + I \gamma_0 \eta \lr{ \phi_m – v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \gamma_0 \\
&=
\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \phi_m + I \eta v \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&=
\phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}
+ \eta I \phi_m
,
\end{aligned}
\end{equation}

This recovers \ref{eqn:potentialMethods:1080} as desired.
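The sign bookkeeping in the last few expansions relies on several identities:

- \( \gamma_0 \gamma^k = -\gamma^k \gamma_0 = \gamma_k \gamma_0 = \Be_k \)
- \( \gamma_0 \Be_k \gamma_0 = -\Be_k \)
- \( \Be_1 \Be_2 \Be_3 = I \)
- \( I \) anticommutes with every \( \gamma_\mu \)

These are easy to get wrong, so here is a minimal sketch of a spacetime algebra product that checks them. It assumes the \( (+,-,-,-) \) signature used above and encodes basis blades as bitmasks, where bit \( \mu \) set means \( \gamma_\mu \) is a factor. The blade encoding and helper names are illustrative, not taken from any particular GA library.

```python
# Minimal spacetime algebra (STA): multivectors are dicts {blade_bitmask: coefficient}.
METRIC = (1, -1, -1, -1)  # gamma_0^2 = +1, gamma_k^2 = -1 (signature + - - -)


def blade_mul(a, b):
    """Geometric product of basis blades a, b given as bitmasks; returns (sign, blade)."""
    swaps, x = 0, a >> 1
    while x:
        swaps += bin(x & b).count("1")  # transpositions needed to reach canonical order
        x >>= 1
    sign = -1 if swaps % 2 else 1
    common, i = a & b, 0
    while common:  # each repeated basis vector contracts to its metric signature
        if common & 1:
            sign *= METRIC[i]
        common >>= 1
        i += 1
    return sign, a ^ b


def mul(*mvs):
    """Geometric product of several multivectors, multiplied left to right."""
    out = mvs[0]
    for mv in mvs[1:]:
        acc = {}
        for ba, ca in out.items():
            for bb, cb in mv.items():
                s, blade = blade_mul(ba, bb)
                acc[blade] = acc.get(blade, 0) + s * ca * cb
        out = {k: v for k, v in acc.items() if v != 0}
    return out


def neg(mv):
    return {k: -v for k, v in mv.items()}


g = [{1 << k: 1} for k in range(4)]               # gamma_0 .. gamma_3
g_up = [g[0]] + [neg(g[k]) for k in (1, 2, 3)]     # reciprocal frame gamma^mu
e = [None] + [mul(g[k], g[0]) for k in (1, 2, 3)]  # relative vectors e_k = gamma_k gamma_0
I = mul(g[0], g[1], g[2], g[3])                    # STA pseudoscalar

checks = {
    "gamma_0 gamma^k == e_k": all(mul(g[0], g_up[k]) == e[k] for k in (1, 2, 3)),
    "-gamma^k gamma_0 == e_k": all(neg(mul(g_up[k], g[0])) == e[k] for k in (1, 2, 3)),
    "gamma_0 e_k gamma_0 == -e_k": all(mul(g[0], e[k], g[0]) == neg(e[k]) for k in (1, 2, 3)),
    "e_1 e_2 e_3 == I": mul(e[1], e[2], e[3]) == I,
    "I gamma_mu == -gamma_mu I": all(mul(I, g[m]) == neg(mul(g[m], I)) for m in range(4)),
}
for name, ok in checks.items():
    print(name, ok)
```

Each line should print True. Integer coefficients keep the comparisons exact.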

Potentials in the 3D Euclidean formalism

In the conventional scalar plus vector differential representation of Maxwell’s equations \ref{eqn:chapter3Notes:20}…, given electric (magnetic) sources, the structure of the electric (magnetic) potential follows from first setting the magnetic (electric) field equal to the curl of a vector potential. The procedure for the STA GA form of Maxwell’s equation was similar, where it was immediately evident that the field could be set to the four-curl of a four-vector potential (or the dual of such a curl for magnetic sources).

In the 3D GA representation, there is no immediate rationale for introducing a curl, or the equivalent of a four-curl representation of the field. This can be reconciled by recognizing that the ability to represent the field (or a component of it) as a curl is not actually fundamental. Instead, observe what the two-sided gradient action on the potential does in the STA representation of \ref{eqn:potentialMethods:1000}. It serves to select the grade-two component of the product of the gradient and the multivector potential \( {A^{\mathrm{e}}} - I {A^{\mathrm{m}}} \). This can in fact be written as a single-sided gradient operation on the potential, provided the multivector product is filtered with a four-bivector grade selection operation

\begin{equation}\label{eqn:potentialMethods:1240}
\boxed{
\boldsymbol{\mathcal{F}} = \gpgradetwo{ \grad \lr{ {A^{\mathrm{e}}} – I {A^{\mathrm{m}}} } }.
}
\end{equation}

Similarly, it can be observed that the specific function of the conjugate structure in the two-sided potential representation of \ref{eqn:potentialMethods:1080} is to discard all the scalar and pseudoscalar grades in the multivector product. This means that a single-sided gradient can also be used, provided the product is wrapped in a grade selection operation

\begin{equation}\label{eqn:potentialMethods:1260}
\boxed{
\boldsymbol{\mathcal{F}} =
\gpgrade{ \lr{ \spacegrad – \inv{v} \PD{t}{} }
\lr{
– \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
– \eta I \phi_m
} }{1,2}.
}
\end{equation}

It is this grade selection operation that is the fundamental defining action of the potential in both the STA and the conventional 3D representations of Maxwell’s equations. So, given Maxwell’s equation in the 3D GA representation, defining a potential representation for the field is really just a demand that the field have the structure

\begin{equation}\label{eqn:potentialMethods:1320}
\boldsymbol{\mathcal{F}} = \gpgrade{ (\alpha \spacegrad + \beta \partial_t)( A_0 + A_1 + I( A_0' + A_1' ) ) }{1,2}.
\end{equation}

This is a mandate that the electromagnetic field consists of the grade 1 and 2 components of the product of the space and time derivative operator with a multivector field \( A = \sum_{k=0}^3 A_k = A_0 + A_1 + I( A_0' + A_1' ) \) that can potentially have components of every grade. There are more degrees of freedom in this specification than required, since the multivector can absorb one of the \( \alpha \) or \( \beta \) coefficients. Without loss of generality, one of these (say \( \alpha \)) can be set to 1.

Expanding \ref{eqn:potentialMethods:1320} gives

\begin{equation}\label{eqn:potentialMethods:1340}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
\spacegrad A_0
+ \beta \partial_t A_1
– \spacegrad \cross A_1′
+ I (\spacegrad \cross A_1
+ \beta \partial_t A_1′
+ \spacegrad A_0′) \\
&=
\boldsymbol{\mathcal{E}} + I \eta \boldsymbol{\mathcal{H}}.
\end{aligned}
\end{equation}

This naturally has all the right mixes of curls, gradients and time derivatives, all following as direct consequences of applying a grade selection operation to the action of a “spacetime gradient” on a general multivector potential.
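The grade sorting in this expansion uses two \R{3} identities, with the gradient playing the role of \( \Ba \):

- \( \Ba \wedge \Bb = I (\Ba \cross \Bb) \)
- \( \Ba (I \Bb) = I (\Ba \cdot \Bb) - \Ba \cross \Bb \)

The sketch below checks both numerically with a hand-rolled Euclidean GA product. The blade bitmasks, helper names and random test vectors are illustrative assumptions.

```python
import random


def blade_mul(a, b):
    """Euclidean R^3 product of basis blades (bitmasks); returns (sign, blade)."""
    swaps, x = 0, a >> 1
    while x:
        swaps += bin(x & b).count("1")
        x >>= 1
    return (-1 if swaps % 2 else 1), a ^ b  # e_k^2 = +1, so no metric factor


def mul(p, q):
    out = {}
    for ba, ca in p.items():
        for bb, cb in q.items():
            s, blade = blade_mul(ba, bb)
            out[blade] = out.get(blade, 0.0) + s * ca * cb
    return out


def add(*mvs):
    out = {}
    for mv in mvs:
        for k, v in mv.items():
            out[k] = out.get(k, 0.0) + v
    return out


def scale(c, mv):
    return {k: c * v for k, v in mv.items()}


def vec(x):
    return {1: x[0], 2: x[1], 4: x[2]}  # bitmasks for e_1, e_2, e_3


def close(p, q, tol=1e-12):
    return all(abs(p.get(k, 0.0) - q.get(k, 0.0)) < tol for k in set(p) | set(q))


I = {7: 1.0}  # e_1 e_2 e_3
random.seed(1)
a = [random.uniform(-1, 1) for _ in range(3)]
b = [random.uniform(-1, 1) for _ in range(3)]
dot = sum(x * y for x, y in zip(a, b))
cross = [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

ab = mul(vec(a), vec(b))
wedge = {k: v for k, v in ab.items() if k in (3, 5, 6)}        # grade-2 part of a b
ok_wedge = close(wedge, mul(I, vec(cross)))                     # a ^ b == I (a x b)
ok_dual = close(mul(vec(a), mul(I, vec(b))),
                add(scale(dot, I), scale(-1.0, vec(cross))))    # a (I b) == I (a.b) - a x b
print(ok_wedge, ok_dual)  # True True
```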

The conclusion is that the potential representation of the field is

\begin{equation}\label{eqn:potentialMethods:1360}
\boldsymbol{\mathcal{F}} =
\gpgrade{ \lr{ \spacegrad – \inv{v} \PD{t}{} } A }{1,2},
\end{equation}

where \( A \) is a multivector potentially containing all grades. Grades 0 and 1 are required for electric sources, and grades 2 and 3 for magnetic sources. When it is desirable to refer back to the conventional scalar and vector potentials, this multivector potential can be written as \( A = -\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ -\phi_m + v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } \).

Gauge transformations

Recall that for electric sources the magnetic field is of the form

\begin{equation}\label{eqn:potentialMethods:1380}
\boldsymbol{\mathcal{B}} = \spacegrad \cross \boldsymbol{\mathcal{A}},
\end{equation}

so adding the gradient of any scalar field to the potential \( \boldsymbol{\mathcal{A}}’ = \boldsymbol{\mathcal{A}} + \spacegrad \psi \)
does not change the magnetic field

\begin{equation}\label{eqn:potentialMethods:1400}
\begin{aligned}
\boldsymbol{\mathcal{B}}’
&= \spacegrad \cross \lr{ \boldsymbol{\mathcal{A}} + \spacegrad \psi } \\
&= \spacegrad \cross \boldsymbol{\mathcal{A}} \\
&= \boldsymbol{\mathcal{B}}.
\end{aligned}
\end{equation}

The electric field with this changed vector potential, and a possibly changed scalar potential \( \phi' \), is

\begin{equation}\label{eqn:potentialMethods:1420}
\begin{aligned}
\boldsymbol{\mathcal{E}}'
&= -\spacegrad \phi' - \partial_t \lr{ \boldsymbol{\mathcal{A}} + \spacegrad \psi} \\
&= -\spacegrad \lr{ \phi' + \partial_t \psi } - \partial_t \boldsymbol{\mathcal{A}},
\end{aligned}
\end{equation}

so if
\begin{equation}\label{eqn:potentialMethods:1440}
\phi' = \phi - \partial_t \psi,
\end{equation}

the electric field will also be unaltered by this transformation.
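This invariance is easy to confirm numerically. The sketch below picks arbitrary smooth test functions for \( \phi \), \( \boldsymbol{\mathcal{A}} \) and \( \psi \); these are assumptions made purely for illustration. It builds the transformed pair \( \boldsymbol{\mathcal{A}}' = \boldsymbol{\mathcal{A}} + \spacegrad \psi \), \( \phi' = \phi - \partial_t \psi \), using analytic derivatives of \( \psi \). It then compares \( \boldsymbol{\mathcal{E}} = -\spacegrad \phi - \partial_t \boldsymbol{\mathcal{A}} \) and \( \boldsymbol{\mathcal{B}} = \spacegrad \cross \boldsymbol{\mathcal{A}} \) before and after, using central differences.

```python
import math

H = 1e-5  # central-difference step


# Arbitrary smooth test potentials and gauge function (illustrative choices only).
def phi(x, y, z, t):
    return math.sin(x) * math.cos(y) + z * t


def A(x, y, z, t):
    return [y * t, math.exp(-z * z) * x, math.sin(x + t)]


def psi_grad_dt(x, y, z, t):
    """Analytic (grad psi, d psi/dt) for psi = cos(x y) sin(z + 2 t)."""
    s, c = math.sin(z + 2 * t), math.cos(z + 2 * t)
    grad = [-y * math.sin(x * y) * s, -x * math.sin(x * y) * s, math.cos(x * y) * c]
    return grad, 2 * math.cos(x * y) * c


def phi_new(x, y, z, t):  # phi' = phi - d psi/dt
    return phi(x, y, z, t) - psi_grad_dt(x, y, z, t)[1]


def A_new(x, y, z, t):    # A' = A + grad psi
    g = psi_grad_dt(x, y, z, t)[0]
    return [a + gi for a, gi in zip(A(x, y, z, t), g)]


def partial(f, i, p):
    """Central difference of f (scalar or list-valued) w.r.t. coordinate i of p = (x, y, z, t)."""
    hi, lo = list(p), list(p)
    hi[i] += H
    lo[i] -= H
    fh, fl = f(*hi), f(*lo)
    if isinstance(fh, list):
        return [(u - w) / (2 * H) for u, w in zip(fh, fl)]
    return (fh - fl) / (2 * H)


def fields(phi_f, A_f, p):
    """E = -grad phi - dA/dt and B = curl A at the spacetime point p."""
    dA = [partial(A_f, i, p) for i in range(4)]  # dA[i][k] = d A_k / d x_i
    E = [-partial(phi_f, i, p) - dA[3][i] for i in range(3)]
    B = [dA[1][2] - dA[2][1], dA[2][0] - dA[0][2], dA[0][1] - dA[1][0]]
    return E, B


p = (0.3, -0.7, 0.5, 1.1)
E, B = fields(phi, A, p)
E2, B2 = fields(phi_new, A_new, p)
max_diff = max(abs(u - w) for u, w in zip(E + B, E2 + B2))
print(max_diff < 1e-6)  # True
```

The residual difference is pure discretization error from the mixed partials, far below the tolerance.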

In the STA representation, the potential can similarly be altered by adding any four-gradient without changing the field. For example, with only electric sources

\begin{equation}\label{eqn:potentialMethods:1460}
\boldsymbol{\mathcal{F}} = \grad \wedge (A + \grad \psi) = \grad \wedge A
\end{equation}

and for electric or magnetic sources

\begin{equation}\label{eqn:potentialMethods:1480}
\boldsymbol{\mathcal{F}} = \gpgradetwo{ \grad (A + \grad \psi) } = \gpgradetwo{ \grad A }.
\end{equation}

In the 3D GA representation, where the field is given by \ref{eqn:potentialMethods:1360}, there is no curled vector potential to which a gradient can be added. However, if the scalar and vector potentials transform as

\begin{equation}\label{eqn:potentialMethods:1500}
\begin{aligned}
\boldsymbol{\mathcal{A}} &\rightarrow \boldsymbol{\mathcal{A}} + \spacegrad \psi \\
\phi &\rightarrow \phi – \partial_t \psi,
\end{aligned}
\end{equation}

then the multivector potential transforms as
\begin{equation}\label{eqn:potentialMethods:1520}
-\phi + v \boldsymbol{\mathcal{A}}
\rightarrow -\phi + v \boldsymbol{\mathcal{A}} + \partial_t \psi + v \spacegrad \psi,
\end{equation}

so the electromagnetic field is unchanged when the multivector potential is transformed as

\begin{equation}\label{eqn:potentialMethods:1540}
A \rightarrow A + \lr{ \spacegrad + \inv{v} \partial_t } \psi,
\end{equation}

where \( \psi \) is any field that has scalar or pseudoscalar grades. Viewed in terms of grade selection, this makes perfect sense, since the transformed field is

\begin{equation}\label{eqn:potentialMethods:1560}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&\rightarrow
\gpgrade{ \lr{ \spacegrad – \inv{v} \PD{t}{} } \lr{ A + \lr{ \spacegrad + \inv{v} \partial_t } \psi } }{1,2} \\
&=
\gpgrade{ \lr{ \spacegrad – \inv{v} \PD{t}{} } A + \lr{ \spacegrad^2 – \inv{v^2} \partial_{tt} } \psi }{1,2} \\
&=
\gpgrade{ \lr{ \spacegrad – \inv{v} \PD{t}{} } A }{1,2}.
\end{aligned}
\end{equation}

The \( \psi \) contribution inside the grade selection is killed because \( \lr{ \spacegrad^2 - \inv{v^2} \partial_{tt} } \psi \), like \( \psi \) itself, has only scalar or pseudoscalar grades.
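For completeness, the operator product in the second step multiplies out as follows. The cross terms cancel because the space and time partials commute:

\begin{equation}
\begin{aligned}
\lr{ \spacegrad - \inv{v} \PD{t}{} } \lr{ \spacegrad + \inv{v} \PD{t}{} } \psi
&= \spacegrad^2 \psi + \inv{v} \spacegrad \PD{t}{\psi} - \inv{v} \PD{t}{} \spacegrad \psi - \inv{v^2} \PDSq{t}{\psi} \\
&= \lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \psi.
\end{aligned}
\end{equation}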

Lorenz gauge

Maxwell’s equations are completely decoupled if the potential can be found such that

\begin{equation}\label{eqn:potentialMethods:1580}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
\gpgrade{ \lr{ \spacegrad – \inv{v} \PD{t}{} } A }{1,2} \\
&=
\lr{ \spacegrad – \inv{v} \PD{t}{} } A.
\end{aligned}
\end{equation}

When this is the case, Maxwell’s equations are reduced to four non-homogeneous potential wave equations

\begin{equation}\label{eqn:potentialMethods:1620}
\lr{ \spacegrad^2 – \inv{v^2} \PDSq{t}{} } A = J,
\end{equation}

that is

\begin{equation}\label{eqn:potentialMethods:1600}
\begin{aligned}
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \phi &= - \inv{\epsilon} q_e \\
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \boldsymbol{\mathcal{A}}^{\mathrm{e}} &= - \mu \boldsymbol{\mathcal{J}} \\
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \phi_m &= - \inv{\mu} q_m \\
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \boldsymbol{\mathcal{A}}^{\mathrm{m}} &= - \epsilon \boldsymbol{\mathcal{M}}.
\end{aligned}
\end{equation}

It should not be assumed a priori that such a field representation has no scalar or pseudoscalar components. The explicit expansion in grades is

\begin{equation}\label{eqn:potentialMethods:1640}
\begin{aligned}
\lr{ \spacegrad – \inv{v} \PD{t}{} } A
&=
\lr{ \spacegrad – \inv{v} \PD{t}{} } \lr{ -\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ -\phi_m + v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \\
&=
\inv{v} \partial_t \phi
+ v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} \\
&-\spacegrad \phi
+ I \eta v \spacegrad \wedge \boldsymbol{\mathcal{A}}^{\mathrm{m}}
– \partial_t \boldsymbol{\mathcal{A}}^{\mathrm{e}} \\
&+ v \spacegrad \wedge \boldsymbol{\mathcal{A}}^{\mathrm{e}}
– \eta I \spacegrad \phi_m
– I \eta \partial_t \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&+ \eta I \inv{v} \partial_t \phi_m
+ I \eta v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}},
\end{aligned}
\end{equation}

so if this potential representation has only vector and bivector grades, it must be true that

\begin{equation}\label{eqn:potentialMethods:1660}
\begin{aligned}
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} &= 0 \\
\inv{v} \partial_t \phi_m + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} &= 0.
\end{aligned}
\end{equation}

The first is the well known Lorenz gauge condition, whereas the second is the dual of that condition for magnetic sources.

Should one of these expressions, say the Lorenz condition for the electric source potentials, be non-zero, then it is possible to make a potential transformation for which it becomes zero

\begin{equation}\label{eqn:potentialMethods:1680}
\begin{aligned}
0
&\ne
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} \\
&=
\inv{v} \partial_t (\phi’ – \partial_t \psi) + v \spacegrad \cdot (\boldsymbol{\mathcal{A}}’ + \spacegrad \psi) \\
&=
\inv{v} \partial_t \phi' + v \spacegrad \cdot \boldsymbol{\mathcal{A}}'
+ v \lr{ \spacegrad^2 – \inv{v^2} \partial_{tt} } \psi,
\end{aligned}
\end{equation}

so for \( \inv{v} \partial_t \phi' + v \spacegrad \cdot \boldsymbol{\mathcal{A}}' \) to be zero, \( \psi \) must be found such that
\begin{equation}\label{eqn:potentialMethods:1700}
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}}
= v \lr{ \spacegrad^2 – \inv{v^2} \partial_{tt} } \psi.
\end{equation}

References

[1] Constantine A Balanis. Antenna theory: analysis and design. John Wiley \& Sons, 3rd edition, 2005.

[2] C. Doran and A.N. Lasenby. Geometric algebra for physicists. Cambridge University Press New York, Cambridge, UK, 1st edition, 2003.

[3] David M Pozar. Microwave engineering. John Wiley \& Sons, 2009.

Transverse gauge

November 16, 2016 math and physics play

Jackson [1] has an interesting presentation of the transverse gauge. I’d like to walk through the details of this, but first want to translate the preliminaries to SI units (if I had the 3rd edition I’d not have to do this translation step).

Gauge freedom

The starting point is noting that, since \( \spacegrad \cdot \BB = 0 \), the magnetic field can be expressed as a curl

\begin{equation}\label{eqn:transverseGauge:20}
\BB = \spacegrad \cross \BA.
\end{equation}

Faraday’s law now takes the form
\begin{equation}\label{eqn:transverseGauge:40}
\begin{aligned}
0
&= \spacegrad \cross \BE + \PD{t}{\BB} \\
&= \spacegrad \cross \BE + \PD{t}{} \lr{ \spacegrad \cross \BA } \\
&= \spacegrad \cross \lr{ \BE + \PD{t}{\BA} }.
\end{aligned}
\end{equation}

Because this curl is zero, the interior sum can be expressed as a gradient

\begin{equation}\label{eqn:transverseGauge:60}
\BE + \PD{t}{\BA} \equiv -\spacegrad \Phi.
\end{equation}

This can now be substituted into the remaining two of Maxwell’s equations.

\begin{equation}\label{eqn:transverseGauge:80}
\begin{aligned}
\spacegrad \cdot \BD &= \rho_v \\
\spacegrad \cross \BH &= \BJ + \PD{t}{\BD}
\end{aligned}
\end{equation}

For Gauss’s law, in simple media, we have

\begin{equation}\label{eqn:transverseGauge:140}
\begin{aligned}
\rho_v
&=
\epsilon \spacegrad \cdot \BE \\
&=
\epsilon \spacegrad \cdot \lr{ -\spacegrad \Phi – \PD{t}{\BA} }
\end{aligned}
\end{equation}

For simple media again, the Ampere-Maxwell equation is

\begin{equation}\label{eqn:transverseGauge:100}
\inv{\mu} \spacegrad \cross \lr{ \spacegrad \cross \BA } = \BJ + \epsilon \PD{t}{} \lr{ -\spacegrad \Phi – \PD{t}{\BA} }.
\end{equation}

Expanding \( \spacegrad \cross \lr{ \spacegrad \cross \BA } = -\spacegrad^2 \BA + \spacegrad \lr{ \spacegrad \cdot \BA } \) gives
\begin{equation}\label{eqn:transverseGauge:120}
-\spacegrad^2 \BA + \spacegrad \lr{ \spacegrad \cdot \BA } + \epsilon \mu \PDSq{t}{\BA} = \mu \BJ – \epsilon \mu \spacegrad \PD{t}{\Phi}.
\end{equation}
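The vector identity \( \spacegrad \cross \lr{ \spacegrad \cross \BA } = -\spacegrad^2 \BA + \spacegrad \lr{ \spacegrad \cdot \BA } \) used above is easy to spot-check numerically. This sketch assumes an arbitrary smooth test field and compares the two sides with central differences at a single point. Agreement is therefore only up to discretization error:

```python
import math

H = 1e-3  # central-difference step


def A(x, y, z):
    """Arbitrary smooth test vector field (illustrative choice)."""
    return [x * y * z, math.sin(y) * z * z, math.exp(x) * y]


def d(f, i, p):
    """Central difference of a list-valued f with respect to coordinate i at point p."""
    hi, lo = list(p), list(p)
    hi[i] += H
    lo[i] -= H
    return [(u - w) / (2 * H) for u, w in zip(f(*hi), f(*lo))]


def curl(f):
    def c(*p):
        J = [d(f, i, p) for i in range(3)]  # J[i][k] = d f_k / d x_i
        return [J[1][2] - J[2][1], J[2][0] - J[0][2], J[0][1] - J[1][0]]
    return c


def div(f):
    # Returned as a one-element list so that d() can difference it.
    return lambda *p: [sum(d(f, i, p)[i] for i in range(3))]


def laplacian(f, p):
    fp, out = f(*p), [0.0, 0.0, 0.0]
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += H
        lo[i] -= H
        for k, (u, w) in enumerate(zip(f(*hi), f(*lo))):
            out[k] += (u - 2 * fp[k] + w) / (H * H)
    return out


p = (0.4, -0.2, 0.9)
lhs = curl(curl(A))(*p)
grad_div = [d(div(A), i, p)[0] for i in range(3)]
rhs = [-lap + gd for lap, gd in zip(laplacian(A, p), grad_div)]
err = max(abs(u - w) for u, w in zip(lhs, rhs))
print(err < 1e-4)  # True
```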

Maxwell’s equations are now reduced to
\begin{equation}\label{eqn:transverseGauge:180}
\boxed{
\begin{aligned}
\spacegrad^2 \BA – \spacegrad \lr{ \spacegrad \cdot \BA + \epsilon \mu \PD{t}{\Phi}} – \epsilon \mu \PDSq{t}{\BA} &= -\mu \BJ \\
\spacegrad^2 \Phi + \PD{t}{\spacegrad \cdot \BA} &= -\frac{\rho_v }{\epsilon}.
\end{aligned}
}
\end{equation}

There are two obvious constraints that we can impose
\begin{equation}\label{eqn:transverseGauge:200}
\spacegrad \cdot \BA + \epsilon \mu \PD{t}{\Phi} = 0,
\end{equation}

or
\begin{equation}\label{eqn:transverseGauge:220}
\spacegrad \cdot \BA = 0.
\end{equation}

The first constraint is the Lorenz gauge, which I’ve played with previously. It is particularly convenient in a relativistic context. In vacuum, with a four-vector potential \( A = (\Phi/c, \BA) \), it is the requirement that the four-divergence of the four-potential vanishes (\( \partial_\mu A^\mu = 0 \)).
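As an illustration of why this choice is convenient, substituting \( \spacegrad \cdot \BA = -\epsilon \mu \PD{t}{\Phi} \) into the boxed pair removes the coupling term from the first equation. In the second, it turns \( \PD{t}{\spacegrad \cdot \BA} \) into \( -\epsilon\mu \PDSq{t}{\Phi} \). This leaves two decoupled wave equations:

\begin{equation}
\begin{aligned}
\spacegrad^2 \BA - \epsilon \mu \PDSq{t}{\BA} &= -\mu \BJ \\
\spacegrad^2 \Phi - \epsilon \mu \PDSq{t}{\Phi} &= -\frac{\rho_v}{\epsilon}.
\end{aligned}
\end{equation}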

Transverse gauge

Jackson identifies the latter constraint as the transverse gauge, which I’m less familiar with. With this gauge selection, we have

\begin{equation}\label{eqn:transverseGauge:260}
\spacegrad^2 \BA – \epsilon \mu \PDSq{t}{\BA} = -\mu \BJ + \epsilon\mu \spacegrad \PD{t}{\Phi}
\end{equation}
\begin{equation}\label{eqn:transverseGauge:280}
\spacegrad^2 \Phi = -\frac{\rho_v }{\epsilon}.
\end{equation}

What’s not obvious is that the irrotational (zero curl) contribution due to \(\Phi\) in \ref{eqn:transverseGauge:260} cancels the corresponding irrotational term from the current. Jackson alludes to this using a transverse and longitudinal decomposition of the current, related to the Helmholtz theorem.

That decomposition follows from expanding \( \spacegrad^2 \lr{ \BJ/R } \), with \( R = \Abs{\Bx - \Bx'} \), in two ways: using the delta function representation \( -4 \pi \delta(\Bx - \Bx') = \spacegrad^2 1/R \), and directly

\begin{equation}\label{eqn:transverseGauge:300}
\begin{aligned}
– 4 \pi \BJ(\Bx)
&=
\int \spacegrad^2 \frac{\BJ(\Bx’)}{\Abs{\Bx – \Bx’}} d^3 x’ \\
&=
\spacegrad
\int \spacegrad \cdot \frac{\BJ(\Bx’)}{\Abs{\Bx – \Bx’}} d^3 x’
+
\spacegrad \cdot
\int \spacegrad \wedge \frac{\BJ(\Bx’)}{\Abs{\Bx – \Bx’}} d^3 x’ \\
&=
-\spacegrad
\int \BJ(\Bx’) \cdot \spacegrad’ \inv{\Abs{\Bx – \Bx’}} d^3 x’
+
\spacegrad \cdot \lr{ \spacegrad \wedge
\int \frac{\BJ(\Bx’)}{\Abs{\Bx – \Bx’}} d^3 x’
} \\
&=
-\spacegrad
\int \spacegrad’ \cdot \frac{\BJ(\Bx’)}{\Abs{\Bx – \Bx’}} d^3 x’
+\spacegrad
\int \frac{\spacegrad' \cdot \BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
-
\spacegrad \cross \lr{
\spacegrad \cross
\int \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
}
\end{aligned}
\end{equation}

The first term can be converted to a surface integral

\begin{equation}\label{eqn:transverseGauge:320}
-\spacegrad
\int \spacegrad’ \cdot \frac{\BJ(\Bx’)}{\Abs{\Bx – \Bx’}} d^3 x’
=
-\spacegrad
\int d\BA’ \cdot \frac{\BJ(\Bx’)}{\Abs{\Bx – \Bx’}},
\end{equation}

so provided the currents are either localized or \( \Abs{\BJ}/R \rightarrow 0 \) on an infinite sphere, we can make the identification

\begin{equation}\label{eqn:transverseGauge:340}
\BJ(\Bx)
=
-\spacegrad \inv{4 \pi} \int \frac{\spacegrad’ \cdot \BJ(\Bx’)}{\Abs{\Bx – \Bx’}} d^3 x’
+
\spacegrad \cross \spacegrad \cross \inv{4 \pi} \int \frac{\BJ(\Bx’)}{\Abs{\Bx – \Bx’}} d^3 x’
\equiv
\BJ_l +
\BJ_t,
\end{equation}

where \( \spacegrad \cross \BJ_l = 0 \) (irrotational, or longitudinal), whereas \( \spacegrad \cdot \BJ_t = 0 \) (solenoidal or transverse). The irrotational property is clear from inspection, and the transverse property can be verified readily

\begin{equation}\label{eqn:transverseGauge:360}
\begin{aligned}
\spacegrad \cdot \lr{ \spacegrad \cross \lr{ \spacegrad \cross \BX } }
&=
-\spacegrad \cdot \lr{ \spacegrad \cdot \lr{ \spacegrad \wedge \BX } } \\
&=
-\spacegrad \cdot \lr{ \spacegrad^2 \BX – \spacegrad \lr{ \spacegrad \cdot \BX } } \\
&=
-\spacegrad \cdot \lr{\spacegrad^2 \BX} + \spacegrad^2 \lr{ \spacegrad \cdot \BX } \\
&= 0.
\end{aligned}
\end{equation}

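Since, in this gauge, \( \Phi \) satisfies the Poisson equation \ref{eqn:transverseGauge:280}, it is given by

\begin{equation}\label{eqn:transverseGauge:380}
\Phi(\Bx, t)
=
\inv{4 \pi \epsilon} \int \frac{\rho_v(\Bx', t)}{\Abs{\Bx - \Bx'}} d^3 x',
\end{equation}

so, using the continuity equation \( \PD{t}{\rho_v} + \spacegrad' \cdot \BJ = 0 \), we have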

\begin{equation}\label{eqn:transverseGauge:400}
\begin{aligned}
\spacegrad \PD{t}{\Phi}
&=
\inv{4 \pi \epsilon} \spacegrad \int \frac{\partial_t \rho_v(\Bx’, t)}{\Abs{\Bx – \Bx’}} d^3 x’ \\
&=
\inv{4 \pi \epsilon} \spacegrad \int \frac{-\spacegrad’ \cdot \BJ}{\Abs{\Bx – \Bx’}} d^3 x’ \\
&=
\frac{\BJ_l}{\epsilon}.
\end{aligned}
\end{equation}

This means that the Ampere-Maxwell equation takes the form

\begin{equation}\label{eqn:transverseGauge:420}
\spacegrad^2 \BA – \epsilon \mu \PDSq{t}{\BA}
= -\mu \BJ + \mu \BJ_l
= -\mu \BJ_t.
\end{equation}

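This justifies the word transverse in the name transverse gauge. In this gauge, only the transverse (solenoidal) part of the current drives the vector potential.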

References

[1] JD Jackson. Classical Electrodynamics. John Wiley and Sons, 2nd edition, 1975.

Calculating the magnetostatic field from the moment

November 14, 2016 math and physics play

The vector potential, to first order, for a magnetostatic localized current distribution was found to be

\begin{equation}\label{eqn:magneticFieldFromMoment:20}
\BA(\Bx) = \frac{\mu_0}{4 \pi} \frac{\Bm \cross \Bx}{\Abs{\Bx}^3}.
\end{equation}

Initially, I tried to calculate the magnetic field from this, but ran into trouble. Here’s a new try.

\begin{equation}\label{eqn:magneticFieldFromMoment:40}
\begin{aligned}
\BB
&=
\frac{\mu_0}{4 \pi}
\spacegrad \cross \lr{ \Bm \cross \frac{\Bx}{r^3} } \\
&=
-\frac{\mu_0}{4 \pi}
\spacegrad \cdot \lr{ \Bm \wedge \frac{\Bx}{r^3} } \\
&=
-\frac{\mu_0}{4 \pi}
\lr{
(\Bm \cdot \spacegrad) \frac{\Bx}{r^3}
-\Bm \spacegrad \cdot \frac{\Bx}{r^3}
} \\
&=
\frac{\mu_0}{4 \pi}
\lr{
-\frac{(\Bm \cdot \spacegrad) \Bx}{r^3}
– \lr{ \Bm \cdot \lr{\spacegrad \inv{r^3} }} \Bx
+\Bm (\spacegrad \cdot \Bx) \inv{r^3}
+\Bm \lr{\spacegrad \inv{r^3} } \cdot \Bx
}.
\end{aligned}
\end{equation}

Here I’ve used \( \Ba \cross \lr{ \Bb \cross \Bc } = -\Ba \cdot \lr{ \Bb \wedge \Bc } \), and then expanded that with \( \Ba \cdot \lr{ \Bb \wedge \Bc } = (\Ba \cdot \Bb) \Bc – (\Ba \cdot \Bc) \Bb \). Since one of these vectors is the gradient, care must be taken to have it operate on the appropriate terms in such an expansion.

Since we have \( \spacegrad \cdot \Bx = 3 \), \( (\Bm \cdot \spacegrad) \Bx = \Bm \), and \( \spacegrad 1/r^n = -n \Bx/r^{n+2} \), this reduces to

\begin{equation}\label{eqn:magneticFieldFromMoment:60}
\begin{aligned}
\BB
&=
\frac{\mu_0}{4 \pi}
\lr{
– \frac{\Bm}{r^3}
+ 3 \frac{(\Bm \cdot \Bx) \Bx}{r^5} %
+ 3 \Bm \inv{r^3}
-3 \Bm \frac{\Bx}{r^5} \cdot \Bx
} \\
&=
\frac{\mu_0}{4 \pi}
\frac{3 (\Bm \cdot \ncap) \ncap -\Bm}{r^3},
\end{aligned}
\end{equation}

which is the desired result.
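Since the first attempt ran into trouble, an independent numerical check is reassuring. This sketch takes the curl of the dipole vector potential by central differences and compares it with the closed form above. The moment \( \Bm \), the field point and the step size are arbitrary illustrative choices. It uses \( \mu_0/4\pi \approx 10^{-7} \) (SI), although that constant scales both sides equally.

```python
import math

MU0_OVER_4PI = 1e-7  # approximately mu_0 / (4 pi) in SI units
H = 1e-6             # central-difference step

m = (0.3, -1.2, 0.8)  # illustrative dipole moment


def A(x, y, z):
    """Dipole vector potential mu0/(4 pi) (m x x) / |x|^3."""
    r3 = (x * x + y * y + z * z) ** 1.5
    mx = (m[1] * z - m[2] * y, m[2] * x - m[0] * z, m[0] * y - m[1] * x)
    return [MU0_OVER_4PI * c / r3 for c in mx]


def curl(f, p):
    def d(i):
        hi, lo = list(p), list(p)
        hi[i] += H
        lo[i] -= H
        return [(u - w) / (2 * H) for u, w in zip(f(*hi), f(*lo))]
    J = [d(i) for i in range(3)]  # J[i][k] = d f_k / d x_i
    return [J[1][2] - J[2][1], J[2][0] - J[0][2], J[0][1] - J[1][0]]


def dipole_B(p):
    """Closed form mu0/(4 pi) (3 (m.n) n - m) / r^3."""
    r = math.sqrt(sum(c * c for c in p))
    n = [c / r for c in p]
    mn = sum(a * b for a, b in zip(m, n))
    return [MU0_OVER_4PI * (3 * mn * ni - mi) / r ** 3 for ni, mi in zip(n, m)]


p = (0.7, 0.2, -0.5)
numeric, closed = curl(A, p), dipole_B(p)
rel_err = max(abs(u - w) for u, w in zip(numeric, closed)) / max(abs(w) for w in closed)
print(rel_err < 1e-6)  # True
```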

Spherical gradient, divergence, curl and Laplacian

November 9, 2016 math and physics play

Unit vectors

Two of the spherical unit vectors we can immediately write by inspection.

\begin{equation}\label{eqn:sphericalLaplacian:20}
\begin{aligned}
\rcap &= \Be_1 \sin\theta \cos\phi + \Be_2 \sin\theta \sin\phi + \Be_3 \cos\theta \\
\phicap &= -\Be_1 \sin\phi + \Be_2 \cos\phi
\end{aligned}
\end{equation}

We can compute \( \thetacap \) by utilizing the right hand triplet property

\begin{equation}\label{eqn:sphericalLaplacian:40}
\begin{aligned}
\thetacap
&=
\phicap \cross \rcap \\
&=
\begin{vmatrix}
\Be_1 & \Be_2 & \Be_3 \\
-S_\phi & C_\phi & 0 \\
S_\theta C_\phi & S_\theta S_\phi & C_\theta \\
\end{vmatrix} \\
&=
\Be_1 \lr{ C_\theta C_\phi }
+\Be_2 \lr{ C_\theta S_\phi }
+\Be_3 \lr{ -S_\theta \lr{ S_\phi^2 + C_\phi^2 } } \\
&=
\Be_1 \cos\theta \cos\phi
+\Be_2 \cos\theta \sin\phi
-\Be_3 \sin\theta.
\end{aligned}
\end{equation}

Here I’ve used \( C_\theta = \cos\theta, S_\phi = \sin\phi, \cdots \) as a convenient shorthand. Observe that with \( i = \Be_1 \Be_2 \), these unit vectors admit a small factorization that makes further manipulation easier

\begin{equation}\label{eqn:sphericalLaplacian:80}
\boxed{
\begin{aligned}
\rcap &= \Be_1 e^{i\phi} \sin\theta + \Be_3 \cos\theta \\
\thetacap &= \cos\theta \Be_1 e^{i\phi} – \sin\theta \Be_3 \\
\phicap &= \Be_2 e^{i\phi}
\end{aligned}
}
\end{equation}

It should also be the case that \( \rcap \thetacap \phicap = I \), where \( I = \Be_1 \Be_2 \Be_3 = \Be_{123}\) is the \R{3} pseudoscalar, which is straightforward to check

\begin{equation}\label{eqn:sphericalLaplacian:60}
\begin{aligned}
\rcap \thetacap \phicap
&=
\lr{ \Be_1 e^{i\phi} \sin\theta + \Be_3 \cos\theta }
\lr{ \cos\theta \Be_1 e^{i\phi} – \sin\theta \Be_3 }
\Be_2 e^{i\phi} \\
&=
\lr{ \sin\theta \cos\theta – \cos\theta \sin\theta + \Be_{31} e^{i\phi} \lr{ \cos^2\theta + \sin^2\theta } }
\Be_2 e^{i\phi} \\
&=
\Be_{31} \Be_2 e^{-i\phi} e^{i\phi} \\
&=
\Be_{123}.
\end{aligned}
\end{equation}

This property could also have been used to compute \(\thetacap\).
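In components, the unit vectors are:

- \( \rcap = (S_\theta C_\phi, S_\theta S_\phi, C_\theta) \)
- \( \thetacap = (C_\theta C_\phi, C_\theta S_\phi, -S_\theta) \)
- \( \phicap = (-S_\phi, C_\phi, 0) \)

These can be spot-checked for orthonormality and handedness at a handful of angles. For orthonormal vectors, the product \( \rcap \thetacap \phicap \) is a pure trivector whose coefficient is the scalar triple product. So \( \rcap \thetacap \phicap = I \) amounts to \( \rcap \cdot (\thetacap \cross \phicap) = 1 \). Here is a plain-Python sketch (no GA library assumed; the sample angles are arbitrary):

```python
import math


def unit_vectors(theta, phi):
    st, ct, sp, cp = math.sin(theta), math.cos(theta), math.sin(phi), math.cos(phi)
    return (st * cp, st * sp, ct), (ct * cp, ct * sp, -st), (-sp, cp, 0.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def triad_ok(theta, phi, tol=1e-12):
    """Orthonormal, thetahat = phihat x rhat, and rhat . (thetahat x phihat) = 1."""
    vs = unit_vectors(theta, phi)
    r, t, f = vs
    orthonormal = all(abs(dot(vs[i], vs[j]) - (i == j)) < tol for i in range(3) for j in range(3))
    right_handed = all(abs(a - b) < tol for a, b in zip(cross(f, r), t))
    unit_volume = abs(dot(r, cross(t, f)) - 1.0) < tol
    return orthonormal and right_handed and unit_volume


print(all(triad_ok(0.1 * i + 0.05, 0.37 * i) for i in range(30)))  # True
```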

Gradient

To compute the gradient, note that the coordinate vectors for the spherical parameterization are
\begin{equation}\label{eqn:sphericalLaplacian:120}
\begin{aligned}
\Bx_r
&= \PD{r}{\Br} \\
&= \PD{r}{\lr{r \rcap}} \\
&= \rcap + r \PD{r}{\rcap} \\
&= \rcap,
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:140}
\begin{aligned}
\Bx_\theta
&= \PD{\theta}{\lr{r \rcap} } \\
&= r \PD{\theta}{} \lr{ S_\theta \Be_1 e^{i\phi} + C_\theta \Be_3 } \\
&= r \lr{ C_\theta \Be_1 e^{i\phi} - S_\theta \Be_3 } \\
&= r \thetacap,
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:160}
\begin{aligned}
\Bx_\phi
&= \PD{\phi}{\lr{r \rcap} } \\
&= r \PD{\phi}{} \lr{ S_\theta \Be_1 e^{i\phi} + C_\theta \Be_3 } \\
&= r S_\theta \Be_2 e^{i\phi} \\
&= r \sin\theta \phicap.
\end{aligned}
\end{equation}

Since these are mutually orthogonal, the dual vectors defined by \( \Bx^j \cdot \Bx_k = \delta^j_k \) can be obtained by inspection
\begin{equation}\label{eqn:sphericalLaplacian:180}
\begin{aligned}
\Bx^r &= \rcap \\
\Bx^\theta &= \inv{r} \thetacap \\
\Bx^\phi &= \inv{r \sin\theta} \phicap.
\end{aligned}
\end{equation}

The gradient follows immediately
\begin{equation}\label{eqn:sphericalLaplacian:200}
\spacegrad =
\Bx^r \PD{r}{} +
\Bx^\theta \PD{\theta}{} +
\Bx^\phi \PD{\phi}{},
\end{equation}

or
\begin{equation}\label{eqn:sphericalLaplacian:240}
\boxed{
\spacegrad
=
\rcap \PD{r}{} +
\frac{\thetacap}{r} \PD{\theta}{} +
\frac{\phicap}{r\sin\theta} \PD{\phi}{}.
}
\end{equation}

More information on this general dual-vector technique of computing the gradient in curvilinear coordinate systems can be found in
[2].
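As a spot check of the boxed gradient, one can compare it against a Cartesian finite-difference gradient for a test scalar field. The last term is taken as a derivative with respect to the angle \( \phi \). The field, sample point and step size below are arbitrary illustrative assumptions:

```python
import math

H = 1e-6  # central-difference step


def f_cart(x, y, z):
    """Arbitrary smooth test scalar field."""
    return x * x * y + math.sin(z) * x + y * z


def f_sph(r, theta, phi):
    st = math.sin(theta)
    return f_cart(r * st * math.cos(phi), r * st * math.sin(phi), r * math.cos(theta))


def central(f, args, i):
    hi, lo = list(args), list(args)
    hi[i] += H
    lo[i] -= H
    return (f(*hi) - f(*lo)) / (2 * H)


def spherical_grad(r, theta, phi):
    """rhat d_r f + thetahat/r d_theta f + phihat/(r sin theta) d_phi f, in Cartesian components."""
    st, ct, sp, cp = math.sin(theta), math.cos(theta), math.sin(phi), math.cos(phi)
    rhat, thetahat, phihat = (st * cp, st * sp, ct), (ct * cp, ct * sp, -st), (-sp, cp, 0.0)
    a = (r, theta, phi)
    fr, ft, fp = (central(f_sph, a, i) for i in range(3))
    return [rh * fr + th * ft / r + ph * fp / (r * st) for rh, th, ph in zip(rhat, thetahat, phihat)]


r, theta, phi = 1.3, 0.8, 2.1
x = (r * math.sin(theta) * math.cos(phi), r * math.sin(theta) * math.sin(phi), r * math.cos(theta))
cart = [central(f_cart, x, i) for i in range(3)]
err = max(abs(u - w) for u, w in zip(spherical_grad(r, theta, phi), cart))
print(err < 1e-6)  # True
```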

Partials

To compute the divergence, curl and Laplacian, we’ll need the partials of each of the unit vectors \( \PDi{\theta}{\rcap}, \PDi{\phi}{\rcap}, \PDi{\theta}{\thetacap}, \PDi{\phi}{\thetacap}, \PDi{\theta}{\phicap}, \PDi{\phi}{\phicap} \).

The \( \thetacap \) partials are

\begin{equation}\label{eqn:sphericalLaplacian:260}
\begin{aligned}
\PD{\theta}{\thetacap}
&=
\PD{\theta}{} \lr{
C_\theta \Be_1 e^{i\phi} – S_\theta \Be_3
} \\
&=
-S_\theta \Be_1 e^{i\phi} – C_\theta \Be_3 \\
&=
-\rcap,
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:280}
\begin{aligned}
\PD{\phi}{\thetacap}
&=
\PD{\phi}{} \lr{
C_\theta \Be_1 e^{i\phi} – S_\theta \Be_3
} \\
&=
C_\theta \Be_2 e^{i\phi} \\
&=
C_\theta \phicap.
\end{aligned}
\end{equation}

The \( \phicap \) partials are

\begin{equation}\label{eqn:sphericalLaplacian:300}
\begin{aligned}
\PD{\theta}{\phicap}
&=
\PD{\theta}{} \Be_2 e^{i\phi} \\
&=
0.
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:320}
\begin{aligned}
\PD{\phi}{\phicap}
&=
\PD{\phi}{} \Be_2 e^{i \phi} \\
&=
-\Be_1 e^{i \phi} \\
&=
-\rcap \gpgradezero{ \rcap \Be_1 e^{i \phi} }
– \thetacap \gpgradezero{ \thetacap \Be_1 e^{i \phi} }
– \phicap \gpgradezero{ \phicap \Be_1 e^{i \phi} } \\
&=
-\rcap \gpgradezero{ \lr{
\Be_1 e^{i\phi} S_\theta + \Be_3 C_\theta
} \Be_1 e^{i \phi} }
– \thetacap \gpgradezero{ \lr{
C_\theta \Be_1 e^{i\phi} – S_\theta \Be_3
} \Be_1 e^{i \phi} } \\
&=
-\rcap \gpgradezero{ e^{-i\phi} S_\theta e^{i \phi} }
– \thetacap \gpgradezero{ C_\theta e^{-i\phi} e^{i \phi} } \\
&=
-\rcap S_\theta
– \thetacap C_\theta.
\end{aligned}
\end{equation}

The \( \rcap \) partials were computed as a side effect of evaluating \( \Bx_\theta \) and \( \Bx_\phi \), and are

\begin{equation}\label{eqn:sphericalLaplacian:340}
\PD{\theta}{\rcap}
=
\thetacap,
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:360}
\PD{\phi}{\rcap}
=
S_\theta \phicap.
\end{equation}

In summary
\begin{equation}\label{eqn:sphericalLaplacian:380}
\boxed{
\begin{aligned}
\partial_{\theta}{\rcap} &= \thetacap \\
\partial_{\phi}{\rcap} &= S_\theta \phicap \\
\partial_{\theta}{\thetacap} &= -\rcap \\
\partial_{\phi}{\thetacap} &= C_\theta \phicap \\
\partial_{\theta}{\phicap} &= 0 \\
\partial_{\phi}{\phicap} &= -\rcap S_\theta - \thetacap C_\theta.
\end{aligned}
}
\end{equation}
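The boxed partials are easy to spot-check numerically. The following is a minimal Python sketch, not part of the original derivation. It assumes \( i = \Be_1 \Be_2 \), so that the unit vectors reduce to the Cartesian triples \( \rcap = (S_\theta \cos\phi, S_\theta \sin\phi, C_\theta) \), \( \thetacap = (C_\theta \cos\phi, C_\theta \sin\phi, -S_\theta) \) and \( \phicap = (-\sin\phi, \cos\phi, 0) \). It then compares central-difference derivatives against each line of the summary. The function names are illustrative only.

```python
import math

# Cartesian forms of the spherical unit vectors (assumes i = e1 e2, so that
# e1 e^{i phi} = cos(phi) e1 + sin(phi) e2).
def rhat(t, p):
    return (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))

def thetahat(t, p):
    return (math.cos(t) * math.cos(p), math.cos(t) * math.sin(p), -math.sin(t))

def phihat(t, p):
    return (-math.sin(p), math.cos(p), 0.0)

def lincomb(*terms):
    """Sum of (coefficient, vector) pairs."""
    return tuple(sum(c * v[k] for c, v in terms) for k in range(3))

def partial(f, t, p, wrt, h=1e-6):
    """Central-difference derivative of a vector function f(theta, phi)."""
    if wrt == "theta":
        fp, fm = f(t + h, p), f(t - h, p)
    else:
        fp, fm = f(t, p + h), f(t, p - h)
    return tuple((a - b) / (2 * h) for a, b in zip(fp, fm))

def check_partials(t, p, tol=1e-8):
    """True if every line of the boxed summary holds at (theta, phi)."""
    s, c = math.sin(t), math.cos(t)
    r, th, ph = rhat(t, p), thetahat(t, p), phihat(t, p)
    expected = [
        (rhat, "theta", th),
        (rhat, "phi", lincomb((s, ph))),
        (thetahat, "theta", lincomb((-1.0, r))),
        (thetahat, "phi", lincomb((c, ph))),
        (phihat, "theta", (0.0, 0.0, 0.0)),
        (phihat, "phi", lincomb((-s, r), (-c, th))),
    ]
    for f, wrt, want in expected:
        got = partial(f, t, p, wrt)
        if any(abs(a - b) > tol for a, b in zip(got, want)):
            return False
    return True

print(check_partials(0.7, 1.3))
```

A check like this only samples a few points. It can catch a sign slip, but it does not replace the algebra above.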

Divergence and curl

The divergence and curl can be computed from the vector product of the spherical coordinate gradient and the spherical representation of a vector. That is

\begin{equation}\label{eqn:sphericalLaplacian:400}
\spacegrad \BA
= \spacegrad \cdot \BA + \spacegrad \wedge \BA
= \spacegrad \cdot \BA + I \spacegrad \cross \BA.
\end{equation}

That gradient vector product is

\begin{equation}\label{eqn:sphericalLaplacian:420}
\begin{aligned}
\spacegrad \BA
&=
\lr{
\rcap \partial_{r}
+ \frac{\thetacap}{r} \partial_{\theta}
+ \frac{\phicap}{rS_\theta} \partial_{\phi}
}
\lr{ \rcap A_r + \thetacap A_\theta + \phicap A_\phi} \\
&=
\rcap \partial_{r}
\lr{ \rcap A_r + \thetacap A_\theta + \phicap A_\phi} \\
&+ \frac{\thetacap}{r} \partial_{\theta}
\lr{ \rcap A_r + \thetacap A_\theta + \phicap A_\phi} \\
&+ \frac{\phicap}{rS_\theta} \partial_{\phi}
\lr{ \rcap A_r + \thetacap A_\theta + \phicap A_\phi} \\
&=
\lr{ \partial_r A_r + \rcap \thetacap \partial_r A_\theta + \rcap \phicap \partial_r A_\phi} \\
&+ \frac{1}{r}
\lr{
\thetacap (\partial_\theta \rcap) A_r + \thetacap (\partial_\theta \thetacap) A_\theta + \thetacap (\partial_\theta \phicap) A_\phi
+\thetacap \rcap \partial_\theta A_r + \partial_\theta A_\theta + \thetacap \phicap \partial_\theta A_\phi
} \\
&+ \frac{1}{rS_\theta}
\lr{
\phicap (\partial_\phi \rcap) A_r + \phicap (\partial_\phi \thetacap) A_\theta + \phicap (\partial_\phi \phicap) A_\phi
+\phicap \rcap \partial_\phi A_r + \phicap \thetacap \partial_\phi A_\theta + \partial_\phi A_\phi
} \\
&=
\lr{ \partial_r A_r + \rcap \thetacap \partial_r A_\theta + \rcap \phicap \partial_r A_\phi} \\
&+ \frac{1}{r}
\lr{
\thetacap (\thetacap) A_r + \thetacap (-\rcap) A_\theta + \thetacap (0) A_\phi
+\thetacap \rcap \partial_\theta A_r + \partial_\theta A_\theta + \thetacap \phicap \partial_\theta A_\phi
} \\
&+ \frac{1}{r S_\theta}
\lr{
\phicap (S_\theta \phicap) A_r + \phicap (C_\theta \phicap) A_\theta - \phicap (\rcap S_\theta + \thetacap C_\theta) A_\phi
+\phicap \rcap \partial_\phi A_r + \phicap \thetacap \partial_\phi A_\theta + \partial_\phi A_\phi
}.
\end{aligned}
\end{equation}

The scalar component of this is the divergence
\begin{equation}\label{eqn:sphericalLaplacian:440}
\begin{aligned}
\spacegrad \cdot \BA
&=
\partial_r A_r
+ \frac{A_r}{r}
+ \inv{r} \partial_\theta A_\theta
+ \frac{1}{r S_\theta}
\lr{ S_\theta A_r + C_\theta A_\theta + \partial_\phi A_\phi
} \\
&=
\partial_r A_r
+ 2 \frac{A_r}{r}
+ \inv{r} \partial_\theta A_\theta
+ \frac{1}{r S_\theta}
C_\theta A_\theta
+ \frac{1}{r S_\theta} \partial_\phi A_\phi,
\end{aligned}
\end{equation}

which can be factored as
\begin{equation}\label{eqn:sphericalLaplacian:460}
\boxed{
\spacegrad \cdot \BA
=
\inv{r^2} \partial_r (r^2 A_r)
+ \inv{r S_\theta} \partial_\theta (S_\theta A_\theta)
+ \frac{1}{r S_\theta} \partial_\phi A_\phi.
}
\end{equation}

The bivector grade of \( \spacegrad \BA \) is the bivector curl
\begin{equation}\label{eqn:sphericalLaplacian:480}
\begin{aligned}
\spacegrad \wedge \BA
&=
\lr{
\rcap \thetacap \partial_r A_\theta + \rcap \phicap \partial_r A_\phi
} \\
&\quad + \frac{1}{r}
\lr{
\thetacap (-\rcap) A_\theta
+\thetacap \rcap \partial_\theta A_r + \thetacap \phicap \partial_\theta A_\phi
} \\
&\quad +
\frac{1}{r S_\theta}
\lr{
-\phicap (\rcap S_\theta + \thetacap C_\theta) A_\phi
+\phicap \rcap \partial_\phi A_r + \phicap \thetacap \partial_\phi A_\theta
} \\
&=
\lr{
\rcap \thetacap \partial_r A_\theta - \phicap \rcap \partial_r A_\phi
} \\
&\quad + \frac{1}{r}
\lr{
\rcap \thetacap A_\theta
-\rcap \thetacap \partial_\theta A_r + \thetacap \phicap \partial_\theta A_\phi
} \\
&\quad +
\frac{1}{r S_\theta}
\lr{
-\phicap \rcap S_\theta A_\phi + \thetacap \phicap C_\theta A_\phi
+\phicap \rcap \partial_\phi A_r - \thetacap \phicap \partial_\phi A_\theta
} \\
&=
\thetacap \phicap \lr{
\inv{r S_\theta} C_\theta A_\phi
+\frac{1}{r} \partial_\theta A_\phi
-\frac{1}{r S_\theta} \partial_\phi A_\theta
} \\
&\quad +\phicap \rcap \lr{
-\partial_r A_\phi
+
\frac{1}{r S_\theta}
\lr{
-S_\theta A_\phi
+ \partial_\phi A_r
}
} \\
&\quad +\rcap \thetacap \lr{
\partial_r A_\theta
+ \frac{1}{r} A_\theta
- \inv{r} \partial_\theta A_r
} \\
&=
I
\rcap \lr{
\inv{r S_\theta} \partial_\theta (S_\theta A_\phi)
-\frac{1}{r S_\theta} \partial_\phi A_\theta
}
+ I \thetacap \lr{
\frac{1}{r S_\theta} \partial_\phi A_r
-\inv{r} \partial_r (r A_\phi)
}
+ I \phicap \lr{
\inv{r} \partial_r (r A_\theta)
- \inv{r} \partial_\theta A_r
}
\end{aligned}
\end{equation}

This gives
\begin{equation}\label{eqn:sphericalLaplacian:500}
\boxed{
\spacegrad \cross \BA
=
\rcap \lr{
\inv{r S_\theta} \partial_\theta (S_\theta A_\phi)
-\frac{1}{r S_\theta} \partial_\phi A_\theta
}
+ \thetacap \lr{
\frac{1}{r S_\theta} \partial_\phi A_r
-\inv{r} \partial_r (r A_\phi)
}
+ \phicap \lr{
\inv{r} \partial_r (r A_\theta)
- \inv{r} \partial_\theta A_r
}.
}
\end{equation}

This and the divergence result above both check against the back cover of [1].
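Both results can also be spot-checked numerically. The sketch below illustrates this under stated assumptions and is not part of the original post. It uses a hypothetical test field \( \BA = (xy, yz^2, x + z^3) \) in Cartesian form, chosen because its divergence \( y + 4z^2 \) and curl \( (-2yz, -1, -x) \) are easy to compute by hand. The field is projected onto \( \rcap, \thetacap, \phicap \), and the boxed spherical formulas are evaluated with central differences.

```python
import math

# Hypothetical test field (not from the text): A = (x y, y z^2, x + z^3), whose
# Cartesian divergence is y + 4 z^2 and whose curl is (-2 y z, -1, -x).
def field(x, y, z):
    return (x * y, y * z * z, x + z ** 3)

def point(r, t, p):
    return (r * math.sin(t) * math.cos(p), r * math.sin(t) * math.sin(p), r * math.cos(t))

def frame(t, p):
    """rcap, thetacap, phicap as Cartesian triples."""
    st, ct, sp, cp = math.sin(t), math.cos(t), math.sin(p), math.cos(p)
    return ((st * cp, st * sp, ct), (ct * cp, ct * sp, -st), (-sp, cp, 0.0))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def comp(k, r, t, p):
    """Spherical component k (0: r, 1: theta, 2: phi) of the test field."""
    return dot(field(*point(r, t, p)), frame(t, p)[k])

def d(g, x, i, h=1e-5):
    """Central difference of the scalar g(*x) in argument i."""
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (g(*up) - g(*dn)) / (2 * h)

# A[k](r, theta, phi) is the k-th spherical component as a plain function.
A = [lambda r, t, p, k=k: comp(k, r, t, p) for k in range(3)]

def sph_div(r, t, p):
    x, s = (r, t, p), math.sin(t)
    return (d(lambda a, b, c: a * a * A[0](a, b, c), x, 0) / r ** 2
            + d(lambda a, b, c: math.sin(b) * A[1](a, b, c), x, 1) / (r * s)
            + d(A[2], x, 2) / (r * s))

def sph_curl(r, t, p):
    x, s = (r, t, p), math.sin(t)
    c_r = (d(lambda a, b, c: math.sin(b) * A[2](a, b, c), x, 1) - d(A[1], x, 2)) / (r * s)
    c_t = d(A[0], x, 2) / (r * s) - d(lambda a, b, c: a * A[2](a, b, c), x, 0) / r
    c_p = (d(lambda a, b, c: a * A[1](a, b, c), x, 0) - d(A[0], x, 1)) / r
    return (c_r, c_t, c_p)

r, t, p = 1.3, 0.8, 0.5
X, Y, Z = point(r, t, p)
print(abs(sph_div(r, t, p) - (Y + 4 * Z * Z)))
curl_cart = (-2 * Y * Z, -1.0, -X)
print([abs(c - dot(curl_cart, e)) for c, e in zip(sph_curl(r, t, p), frame(t, p))])
```

Both printed discrepancies should be at the level of finite-difference error. A sign or factor slip in either boxed formula would show up as an order-one difference.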

Laplacian

It’s possible to compute the Laplacian from the divergence and curl, but we saw in cylindrical coordinates that doing it that way is much harder than computing it directly.

\begin{equation}\label{eqn:sphericalLaplacian:540}
\begin{aligned}
\spacegrad^2 \psi
&=
\lr{
\rcap \partial_{r} +
\frac{\thetacap}{r} \partial_{\theta} +
\frac{\phicap}{r S_\theta} \partial_{\phi}
}
\lr{
\rcap \partial_{r} \psi
+ \frac{\thetacap}{r} \partial_{\theta} \psi
+ \frac{\phicap}{r S_\theta} \partial_{\phi} \psi
} \\
&=
\partial_{rr} \psi
+ \rcap \thetacap \partial_r \lr{ \inv{r} \partial_\theta \psi}
+ \rcap \phicap \inv{S_\theta} \partial_r \lr{ \inv{r} \partial_\phi \psi } \\
&
\quad + \frac{\thetacap}{r} \partial_{\theta} \lr{ \rcap \partial_{r} \psi }
+ \frac{\thetacap}{r^2} \partial_{\theta} \lr{ \thetacap \partial_{\theta} \psi }
+ \frac{\thetacap}{r^2} \partial_{\theta} \lr{ \frac{\phicap}{S_\theta} \partial_{\phi} \psi } \\
&
\quad + \frac{\phicap}{r S_\theta} \partial_{\phi} \lr{ \rcap \partial_{r} \psi }
+ \frac{\phicap}{r^2 S_\theta} \partial_{\phi} \lr{ \thetacap \partial_{\theta} \psi }
+ \frac{\phicap}{r^2 S_\theta^2} \partial_{\phi} \lr{ \phicap \partial_{\phi} \psi } \\
&=
\partial_{rr} \psi
+ \rcap \thetacap \partial_r \lr{ \inv{r} \partial_\theta \psi}
+ \rcap \phicap \inv{S_\theta} \partial_r \lr{ \inv{r} \partial_\phi \psi } \\
&
\quad + \frac{\thetacap\rcap}{r} \partial_{\theta} \lr{ \partial_{r} \psi }
+ \frac{1}{r^2} \partial_{\theta \theta} \psi
+ \frac{\thetacap \phicap}{r^2} \partial_{\theta} \lr{ \frac{1}{S_\theta} \partial_{\phi} \psi } \\
&
\quad + \frac{\phicap \rcap}{r S_\theta} \partial_{\phi r} \psi
+ \frac{\phicap\thetacap}{r^2 S_\theta} \partial_{\phi\theta} \psi
+ \frac{1}{r^2 S_\theta^2} \partial_{\phi \phi} \psi \\
&
\quad + \frac{\thetacap}{r} (\partial_\theta \rcap) \partial_{r} \psi
+ \frac{\thetacap}{r^2} (\partial_\theta \thetacap) \partial_{\theta} \psi
+ \frac{\thetacap}{r^2} (\partial_\theta \phicap) \frac{\phicap}{S_\theta} \partial_{\phi} \psi \\
&
\quad + \frac{\phicap}{r S_\theta} (\partial_\phi \rcap) \partial_{r} \psi
+ \frac{\phicap}{r^2 S_\theta} (\partial_\phi \thetacap) \partial_{\theta} \psi
+ \frac{\phicap}{r^2 S_\theta^2} (\partial_\phi \phicap) \partial_{\phi} \psi \\
&=
\partial_{rr} \psi
+ \rcap \thetacap \partial_r \lr{ \inv{r} \partial_\theta \psi}
+ \rcap \phicap \inv{S_\theta} \partial_r \lr{ \inv{r} \partial_\phi \psi } \\
&
\quad + \frac{\thetacap\rcap}{r} \partial_{\theta} \lr{ \partial_{r} \psi }
+ \frac{1}{r^2} \partial_{\theta \theta} \psi
+ \frac{\thetacap \phicap}{r^2} \partial_{\theta} \lr{ \frac{1}{S_\theta} \partial_{\phi} \psi } \\
&
\quad + \frac{\phicap \rcap}{r S_\theta} \partial_{\phi r} \psi
+ \frac{\phicap\thetacap}{r^2 S_\theta} \partial_{\phi\theta} \psi
+ \frac{1}{r^2 S_\theta^2} \partial_{\phi \phi} \psi \\
&
\quad + \frac{\thetacap}{r} (\thetacap) \partial_{r} \psi
+ \frac{\thetacap}{r^2} (-\rcap) \partial_{\theta} \psi
+ \frac{\thetacap}{r^2} (0) \frac{\phicap}{S_\theta} \partial_{\phi} \psi \\
&
\quad + \frac{\phicap}{r S_\theta} (S_\theta \phicap) \partial_{r} \psi
+ \frac{\phicap}{r^2 S_\theta} (C_\theta \phicap) \partial_{\theta} \psi
+ \frac{\phicap}{r^2 S_\theta^2} (-\rcap S_\theta - \thetacap C_\theta) \partial_{\phi} \psi
\end{aligned}
\end{equation}

All the bivector factors are expected to cancel out, but this should be checked. Those with an \( \rcap \thetacap \) factor are

\begin{equation}\label{eqn:sphericalLaplacian:560}
\partial_r \lr{ \inv{r} \partial_\theta \psi}
- \frac{1}{r} \partial_{\theta r} \psi
+ \frac{1}{r^2} \partial_{\theta} \psi
=
-\inv{r^2} \partial_\theta \psi
+\inv{r} \partial_{r \theta} \psi
- \frac{1}{r} \partial_{\theta r} \psi
+ \frac{1}{r^2} \partial_{\theta} \psi
= 0,
\end{equation}

and those with a \( \thetacap \phicap \) factor are
\begin{equation}\label{eqn:sphericalLaplacian:580}
\frac{1}{r^2} \partial_{\theta} \lr{ \frac{1}{S_\theta} \partial_{\phi} \psi }
- \frac{1}{r^2 S_\theta} \partial_{\phi\theta} \psi
+ \frac{1}{r^2 S_\theta^2} C_\theta \partial_{\phi} \psi
=
- \frac{1}{r^2} \frac{C_\theta}{S_\theta^2} \partial_{\phi} \psi
+ \frac{1}{r^2 S_\theta} \partial_{\theta \phi} \psi
- \frac{1}{r^2 S_\theta} \partial_{\phi\theta} \psi
+ \frac{1}{r^2 S_\theta^2} C_\theta \partial_{\phi} \psi
= 0,
\end{equation}

and those with a \( \phicap \rcap \) factor are
\begin{equation}\label{eqn:sphericalLaplacian:600}
- \inv{S_\theta} \partial_r \lr{ \inv{r} \partial_\phi \psi }
+ \frac{1}{r S_\theta} \partial_{\phi r} \psi
- \frac{1}{r^2 S_\theta^2} S_\theta \partial_{\phi} \psi
=
\inv{S_\theta} \frac{1}{r^2} \partial_\phi \psi
- \inv{r S_\theta} \partial_{r \phi} \psi
+ \frac{1}{r S_\theta} \partial_{\phi r} \psi
- \frac{1}{r^2 S_\theta} \partial_{\phi} \psi
= 0.
\end{equation}

This leaves
\begin{equation}\label{eqn:sphericalLaplacian:620}
\spacegrad^2 \psi
=
\partial_{rr} \psi
+ \frac{2}{r} \partial_{r} \psi
+ \frac{1}{r^2} \partial_{\theta \theta} \psi
+ \frac{1}{r^2 S_\theta} C_\theta \partial_{\theta} \psi
+ \frac{1}{r^2 S_\theta^2} \partial_{\phi \phi} \psi.
\end{equation}

This factors nicely as

\begin{equation}\label{eqn:sphericalLaplacian:640}
\boxed{
\spacegrad^2 \psi
=
\inv{r^2} \PD{r}{} \lr{ r^2 \PD{r}{ \psi} }
+ \frac{1}{r^2 \sin\theta} \PD{\theta}{} \lr{ \sin\theta \PD{\theta}{ \psi } }
+ \frac{1}{r^2 \sin^2\theta} \PDSq{\phi}{ \psi}
,
}
\end{equation}

which checks against the back cover of Jackson [1]. This derivation also demonstrates explicitly that the operator expression is valid for multivector fields \( \psi \), not just scalar fields.
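As with the divergence and curl, a numerical spot check is cheap. This minimal sketch is an illustration only. It uses a hypothetical test function \( \psi = x^2 y + z^3 \), whose Cartesian Laplacian is \( 2y + 6z \), evaluates the unfactored form \ref{eqn:sphericalLaplacian:620} with central differences, and compares the two.

```python
import math

# Hypothetical test function (not from the text): psi = x^2 y + z^3,
# whose Cartesian Laplacian is 2 y + 6 z.
def psi_cart(x, y, z):
    return x * x * y + z ** 3

def psi(r, t, p):
    """The same function expressed in spherical coordinates (r, theta, phi)."""
    st = math.sin(t)
    return psi_cart(r * st * math.cos(p), r * st * math.sin(p), r * math.cos(t))

def d1(g, x, i, h):
    """Central first difference of g(*x) in argument i."""
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (g(*up) - g(*dn)) / (2 * h)

def d2(g, x, i, h):
    """Central second difference of g(*x) in argument i."""
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (g(*up) - 2 * g(*x) + g(*dn)) / (h * h)

def spherical_laplacian(g, r, t, p, h=1e-4):
    """Unfactored spherical Laplacian:
    g_rr + (2/r) g_r + g_tt / r^2 + cos(t) g_t / (r^2 sin t) + g_pp / (r sin t)^2.
    """
    x = (r, t, p)
    st, ct = math.sin(t), math.cos(t)
    return (d2(g, x, 0, h) + 2 / r * d1(g, x, 0, h)
            + d2(g, x, 1, h) / r ** 2
            + ct * d1(g, x, 1, h) / (r ** 2 * st)
            + d2(g, x, 2, h) / (r * st) ** 2)

r, t, p = 1.3, 0.8, 0.5
st = math.sin(t)
y, z = r * st * math.sin(p), r * math.cos(t)
print(abs(spherical_laplacian(psi, r, t, p) - (2 * y + 6 * z)))
```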

References

[1] JD Jackson. Classical Electrodynamics. John Wiley and Sons, 2nd edition, 1975.

[2] A. Macdonald. Vector and Geometric Calculus. CreateSpace Independent Publishing Platform, 2012.

Gradient, divergence, curl and Laplacian in cylindrical coordinates

November 6, 2016 math and physics play


In class it was suggested that the identity

\begin{equation}\label{eqn:laplacianCylindrical:20}
\spacegrad^2 \BA =
\spacegrad \lr{ \spacegrad \cdot \BA }
-\spacegrad \cross \lr{ \spacegrad \cross \BA },
\end{equation}

can be used to compute the Laplacian in non-rectangular coordinates. Is that the easiest way to do this?

How about just sequential applications of the gradient on the vector? Let’s start with the vector product of the gradient and the vector. First recall that the cylindrical representation of the gradient is

\begin{equation}\label{eqn:laplacianCylindrical:80}
\spacegrad = \rhocap \partial_\rho + \frac{\phicap}{\rho} \partial_\phi + \zcap \partial_z,
\end{equation}

where
\begin{equation}\label{eqn:laplacianCylindrical:100}
\begin{aligned}
\rhocap &= \Be_1 e^{\Be_1 \Be_2 \phi} \\
\phicap &= \Be_2 e^{\Be_1 \Be_2 \phi} \\
\zcap &= \Be_3.
\end{aligned}
\end{equation}

Taking \( \phi \) derivatives of \ref{eqn:laplacianCylindrical:100}, we have

\begin{equation}\label{eqn:laplacianCylindrical:120}
\begin{aligned}
\partial_\phi \rhocap &= \Be_1 \Be_1 \Be_2 e^{\Be_1 \Be_2 \phi} = \Be_2 e^{\Be_1 \Be_2 \phi} = \phicap \\
\partial_\phi \phicap &= \Be_2 \Be_1 \Be_2 e^{\Be_1 \Be_2 \phi} = -\Be_1 e^{\Be_1 \Be_2 \phi} = -\rhocap.
\end{aligned}
\end{equation}

The gradient of a vector \( \BA = \rhocap A_\rho + \phicap A_\phi + \zcap A_z \) is

\begin{equation}\label{eqn:laplacianCylindrical:60}
\begin{aligned}
\spacegrad \BA
&=
\lr{ \rhocap \partial_\rho + \frac{\phicap}{\rho} \partial_\phi + \zcap \partial_z }
\lr{ \rhocap A_\rho + \phicap A_\phi + \zcap A_z } \\
&=
\quad \rhocap \partial_\rho \lr{ \rhocap A_\rho + \phicap A_\phi + \zcap A_z } \\
&\quad + \frac{\phicap}{\rho} \partial_\phi \lr{ \rhocap A_\rho + \phicap A_\phi + \zcap A_z } \\
&\quad + \zcap \partial_z \lr{ \rhocap A_\rho + \phicap A_\phi + \zcap A_z } \\
&=
\quad \rhocap \lr{ \rhocap \partial_\rho A_\rho + \phicap \partial_\rho A_\phi + \zcap \partial_\rho A_z } \\
&\quad + \frac{\phicap}{\rho} \lr{ \partial_\phi(\rhocap A_\rho) + \partial_\phi(\phicap A_\phi) + \zcap \partial_\phi A_z } \\
&\quad + \zcap \lr{ \rhocap \partial_z A_\rho + \phicap \partial_z A_\phi + \zcap \partial_z A_z } \\
&=
\quad \partial_\rho A_\rho + \rhocap \phicap \partial_\rho A_\phi + \rhocap \zcap \partial_\rho A_z \\
&\quad +\frac{1}{\rho} \lr{ A_\rho + \phicap \rhocap \partial_\phi A_\rho - \phicap \rhocap A_\phi + \partial_\phi A_\phi + \phicap \zcap \partial_\phi A_z } \\
&\quad + \zcap \rhocap \partial_z A_\rho + \zcap \phicap \partial_z A_\phi + \partial_z A_z \\
&=
\quad \partial_\rho A_\rho + \frac{1}{\rho} \lr{ A_\rho + \partial_\phi A_\phi } + \partial_z A_z \\
&\quad +
\zcap \rhocap \lr{
\partial_z A_\rho
-\partial_\rho A_z
} \\
&\quad +
\phicap \zcap \lr{
\inv{\rho} \partial_\phi A_z
- \partial_z A_\phi
} \\
&\quad +
\rhocap \phicap \lr{
\partial_\rho A_\phi
- \inv{\rho} \lr{ \partial_\phi A_\rho - A_\phi }
},
\end{aligned}
\end{equation}

As expected, we see that the gradient splits nicely into a dot and curl

\begin{equation}\label{eqn:laplacianCylindrical:160}
\begin{aligned}
\spacegrad \BA
&= \spacegrad \cdot \BA + \spacegrad \wedge \BA \\
&= \spacegrad \cdot \BA + \rhocap \phicap \zcap (\spacegrad \cross \BA ),
\end{aligned}
\end{equation}

where the cylindrical representation of the divergence is seen to be

\begin{equation}\label{eqn:laplacianCylindrical:140}
\spacegrad \cdot \BA
=
\inv{\rho} \partial_\rho (\rho A_\rho) + \frac{1}{\rho} \partial_\phi A_\phi + \partial_z A_z,
\end{equation}

and the cylindrical representation of the curl is

\begin{equation}\label{eqn:laplacianCylindrical:180}
\spacegrad \cross \BA
=
\rhocap
\lr{
\inv{\rho} \partial_\phi A_z
- \partial_z A_\phi
}
+
\phicap
\lr{
\partial_z A_\rho
-\partial_\rho A_z
}
+
\inv{\rho} \zcap \lr{
\partial_\rho ( \rho A_\phi )
- \partial_\phi A_\rho
}.
\end{equation}
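The cylindrical divergence and curl can be spot-checked the same way. This sketch is illustrative only. The test field \( \BA = (xy, yz^2, x + z^3) \) is hypothetical, with hand-computed Cartesian divergence \( y + 4z^2 \) and curl \( (-2yz, -1, -x) \), and all derivatives are approximated by central differences.

```python
import math

# Hypothetical test field (not from the text): A = (x y, y z^2, x + z^3), whose
# Cartesian divergence is y + 4 z^2 and whose curl is (-2 y z, -1, -x).
def field(x, y, z):
    return (x * y, y * z * z, x + z ** 3)

def frame(p):
    """rhocap, phicap, zcap as Cartesian triples."""
    return ((math.cos(p), math.sin(p), 0.0), (-math.sin(p), math.cos(p), 0.0), (0.0, 0.0, 1.0))

def comp(k, rho, p, z):
    """Cylindrical component k (0: rho, 1: phi, 2: z) of the test field."""
    F = field(rho * math.cos(p), rho * math.sin(p), z)
    return sum(a * b for a, b in zip(F, frame(p)[k]))

def d(g, x, i, h=1e-5):
    """Central difference of the scalar g(*x) in argument i."""
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (g(*up) - g(*dn)) / (2 * h)

# A[k](rho, phi, z) is the k-th cylindrical component as a plain function.
A = [lambda a, b, c, k=k: comp(k, a, b, c) for k in range(3)]

def cyl_div(rho, p, z):
    x = (rho, p, z)
    return (d(lambda a, b, c: a * A[0](a, b, c), x, 0) / rho
            + d(A[1], x, 1) / rho
            + d(A[2], x, 2))

def cyl_curl(rho, p, z):
    x = (rho, p, z)
    return (d(A[2], x, 1) / rho - d(A[1], x, 2),
            d(A[0], x, 2) - d(A[2], x, 0),
            (d(lambda a, b, c: a * A[1](a, b, c), x, 0) - d(A[0], x, 1)) / rho)

rho, p, z = 1.4, 0.9, -0.6
X, Y = rho * math.cos(p), rho * math.sin(p)
print(abs(cyl_div(rho, p, z) - (Y + 4 * z * z)))
curl_cart = (-2 * Y * z, -1.0, -X)
print([abs(c - sum(a * b for a, b in zip(curl_cart, e))) for c, e in zip(cyl_curl(rho, p, z), frame(p))])
```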

Should we want to, it is now possible to evaluate the Laplacian of \( \BA \) using
\ref{eqn:laplacianCylindrical:20}
, which will have the following components

\begin{equation}\label{eqn:laplacianCylindrical:220}
\begin{aligned}
\rhocap \cdot \lr{ \spacegrad^2 \BA }
&=
\partial_\rho
\lr{
\inv{\rho} \partial_\rho (\rho A_\rho) + \frac{1}{\rho} \partial_\phi A_\phi + \partial_z A_z
}
-
\lr{
\inv{\rho} \partial_\phi \lr{
\inv{\rho} \lr{
\partial_\rho ( \rho A_\phi ) - \partial_\phi A_\rho
}
}
– \partial_z \lr{
\partial_z A_\rho -\partial_\rho A_z
}
} \\
&=
\partial_\rho \lr{ \inv{\rho} \partial_\rho (\rho A_\rho)}
+ \partial_\rho \lr{ \frac{1}{\rho} \partial_\phi A_\phi}
+ \partial_{\rho z} A_z
- \inv{\rho^2}\partial_{\phi \rho} ( \rho A_\phi )
+ \inv{\rho^2}\partial_{\phi\phi} A_\rho
+ \partial_{zz} A_\rho
- \partial_{z\rho} A_z \\
&=
\partial_\rho \lr{ \inv{\rho} \partial_\rho (\rho A_\rho)}
+ \inv{\rho^2}\partial_{\phi\phi} A_\rho
+ \partial_{zz} A_\rho
- \frac{1}{\rho^2} \partial_\phi A_\phi
+ \frac{1}{\rho} \partial_{\rho\phi} A_\phi
- \inv{\rho^2}\partial_{\phi} A_\phi
- \inv{\rho}\partial_{\phi\rho} A_\phi \\
&=
\partial_\rho \lr{ \inv{\rho} \partial_\rho (\rho A_\rho)}
+ \inv{\rho^2}\partial_{\phi\phi} A_\rho
+ \partial_{zz} A_\rho
- \frac{2}{\rho^2} \partial_\phi A_\phi \\
&=
\inv{\rho} \partial_\rho \lr{ \rho \partial_\rho A_\rho}
+ \inv{\rho^2}\partial_{\phi\phi} A_\rho
+ \partial_{zz} A_\rho
- \frac{A_\rho}{\rho^2}
- \frac{2}{\rho^2} \partial_\phi A_\phi,
\end{aligned}
\end{equation}

\begin{equation}\label{eqn:laplacianCylindrical:240}
\begin{aligned}
\phicap \cdot \lr{ \spacegrad^2 \BA }
&=
\inv{\rho} \partial_\phi
\lr{
\inv{\rho} \partial_\rho (\rho A_\rho) + \frac{1}{\rho} \partial_\phi A_\phi + \partial_z A_z
}
-
\lr{
\lr{
\partial_z \lr{
\inv{\rho} \partial_\phi A_z - \partial_z A_\phi
}
-\partial_\rho \lr{
\inv{\rho} \lr{ \partial_\rho ( \rho A_\phi ) - \partial_\phi A_\rho}
}
}
} \\
&=
\inv{\rho^2} \partial_{\phi\rho} (\rho A_\rho)
+ \frac{1}{\rho^2} \partial_{\phi\phi} A_\phi
+ \inv{\rho}\partial_{\phi z} A_z
- \inv{\rho} \partial_{z\phi} A_z
+ \partial_{z z} A_\phi
+\partial_\rho \lr{ \inv{\rho} \partial_\rho ( \rho A_\phi ) }
- \partial_\rho \lr{ \inv{\rho} \partial_\phi A_\rho} \\
&=
\partial_\rho \lr{ \inv{\rho} \partial_\rho ( \rho A_\phi ) }
+ \frac{1}{\rho^2} \partial_{\phi\phi} A_\phi
+ \partial_{z z} A_\phi
+ \inv{\rho^2} \partial_{\phi\rho} (\rho A_\rho)
+ \inv{\rho}\partial_{\phi z} A_z
- \inv{\rho} \partial_{z\phi} A_z
- \partial_\rho \lr{ \inv{\rho} \partial_\phi A_\rho} \\
&=
\partial_\rho \lr{ \inv{\rho} \partial_\rho ( \rho A_\phi ) }
+ \frac{1}{\rho^2} \partial_{\phi\phi} A_\phi
+ \partial_{z z} A_\phi
+ \inv{\rho^2} \partial_{\phi} A_\rho
+ \inv{\rho} \partial_{\phi\rho} A_\rho
+ \inv{\rho^2} \partial_\phi A_\rho
- \inv{\rho} \partial_{\rho\phi} A_\rho \\
&=
\partial_\rho \lr{ \inv{\rho} \partial_\rho ( \rho A_\phi ) }
+ \frac{1}{\rho^2} \partial_{\phi\phi} A_\phi
+ \partial_{z z} A_\phi
+ \frac{2}{\rho^2} \partial_{\phi} A_\rho \\
&=
\inv{\rho} \partial_\rho \lr{ \rho \partial_\rho A_\phi }
+ \frac{1}{\rho^2} \partial_{\phi\phi} A_\phi
+ \partial_{z z} A_\phi
+ \frac{2}{\rho^2} \partial_{\phi} A_\rho
- \frac{A_\phi}{\rho^2},
\end{aligned}
\end{equation}

\begin{equation}\label{eqn:laplacianCylindrical:260}
\begin{aligned}
\zcap \cdot \lr{ \spacegrad^2 \BA }
&=
\partial_z
\lr{
\inv{\rho} \partial_\rho (\rho A_\rho) + \frac{1}{\rho} \partial_\phi A_\phi + \partial_z A_z
}
-
\inv{\rho} \lr{
\partial_\rho \lr{ \rho \lr{
\partial_z A_\rho -\partial_\rho A_z
}
}
- \partial_\phi \lr{
\inv{\rho} \partial_\phi A_z - \partial_z A_\phi
}
} \\
&=
\inv{\rho} \partial_{z\rho} (\rho A_\rho)
+ \frac{1}{\rho} \partial_{z\phi} A_\phi
+ \partial_{zz} A_z
- \inv{\rho}\partial_\rho \lr{ \rho \partial_z A_\rho }
+ \inv{\rho}\partial_\rho \lr{ \rho \partial_\rho A_z }
+ \inv{\rho^2} \partial_{\phi\phi} A_z
- \inv{\rho} \partial_{\phi z} A_\phi \\
&=
\inv{\rho}\partial_\rho \lr{ \rho \partial_\rho A_z }
+ \inv{\rho^2} \partial_{\phi\phi} A_z
+ \partial_{zz} A_z
+ \inv{\rho} \partial_{z} A_\rho
+\partial_{z\rho} A_\rho
+ \frac{1}{\rho} \partial_{z\phi} A_\phi
- \inv{\rho}\partial_z A_\rho
- \partial_{\rho z} A_\rho
- \inv{\rho} \partial_{\phi z} A_\phi \\
&=
\inv{\rho}\partial_\rho \lr{ \rho \partial_\rho A_z }
+ \inv{\rho^2} \partial_{\phi\phi} A_z
+ \partial_{zz} A_z
\end{aligned}
\end{equation}

Evaluating these was a fairly tedious and mechanical job, better suited to a computer algebra system than to the hand calculation done here.
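Short of a full computer algebra system, a numerical spot check of the three component results \ref{eqn:laplacianCylindrical:220}, \ref{eqn:laplacianCylindrical:240} and \ref{eqn:laplacianCylindrical:260} is easy. The sketch below is an illustration, not the method used above. It assumes a hypothetical polynomial field \( \BA = (x^2 y, xz + y^3, xyz) \) in Cartesian form, whose componentwise Cartesian Laplacian is \( (2y, 6y, 0) \), and it approximates all derivatives with finite differences.

```python
import math

# Hypothetical test field (not from the text): A = (x^2 y, x z + y^3, x y z).
# Its componentwise Cartesian Laplacian is (2 y, 6 y, 0).
def field(x, y, z):
    return (x * x * y, x * z + y ** 3, x * y * z)

def comp(k, rho, p, z):
    """Cylindrical component k (0: rho, 1: phi, 2: z) of the test field."""
    P, Q, R = field(rho * math.cos(p), rho * math.sin(p), z)
    return (P * math.cos(p) + Q * math.sin(p), -P * math.sin(p) + Q * math.cos(p), R)[k]

def d1(g, x, i, h=1e-4):
    """Central first difference of g(*x) in argument i."""
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (g(*up) - g(*dn)) / (2 * h)

def d2(g, x, i, h=1e-4):
    """Central second difference of g(*x) in argument i."""
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (g(*up) - 2 * g(*x) + g(*dn)) / (h * h)

def scalar_lap(g, x):
    """(1/rho) d_rho(rho d_rho g) + g_phiphi / rho^2 + g_zz, in expanded form."""
    rho = x[0]
    return d2(g, x, 0) + d1(g, x, 0) / rho + d2(g, x, 1) / rho ** 2 + d2(g, x, 2)

def vector_lap(rho, p, z):
    """The rho, phi and z component results, built from the scalar operator."""
    x = (rho, p, z)
    A = [lambda a, b, c, k=k: comp(k, a, b, c) for k in range(3)]
    lap_rho = scalar_lap(A[0], x) - A[0](*x) / rho ** 2 - 2 / rho ** 2 * d1(A[1], x, 1)
    lap_phi = scalar_lap(A[1], x) - A[1](*x) / rho ** 2 + 2 / rho ** 2 * d1(A[0], x, 1)
    return (lap_rho, lap_phi, scalar_lap(A[2], x))

def expected(rho, p, z):
    """Cartesian result (2 y, 6 y, 0) projected onto rhocap, phicap, zcap."""
    y = rho * math.sin(p)
    return (2 * y * math.cos(p) + 6 * y * math.sin(p), -2 * y * math.sin(p) + 6 * y * math.cos(p), 0.0)

print([abs(a - b) for a, b in zip(vector_lap(1.2, 0.7, -0.4), expected(1.2, 0.7, -0.4))])
```

Dropping either the \( -A/\rho^2 \) terms or the \( \pm (2/\rho^2) \partial_\phi \) cross terms makes the discrepancy order one. The check therefore exercises exactly the parts of the result that differ from a naive componentwise Laplacian.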

Explicit cylindrical Laplacian

Let’s try this a different way. The most obvious potential strategy is to just apply the Laplacian to the vector itself, but we need to include the unit vectors in such an operation

\begin{equation}\label{eqn:laplacianCylindrical:280}
\spacegrad^2 \BA =
\spacegrad^2 \lr{ \rhocap A_\rho + \phicap A_\phi + \zcap A_z }.
\end{equation}

First we need to know the explicit form of the cylindrical Laplacian. From the painful expansion, we can guess that it is

\begin{equation}\label{eqn:laplacianCylindrical:300}
\spacegrad^2 \psi
=
\inv{\rho}\partial_\rho \lr{ \rho \partial_\rho \psi }
+ \inv{\rho^2} \partial_{\phi\phi} \psi
+ \partial_{zz} \psi.
\end{equation}

Let’s check that explicitly. Here I use the vector product where \( \rhocap^2 = \phicap^2 = \zcap^2 = 1 \), and these vectors anticommute when different

\begin{equation}\label{eqn:laplacianCylindrical:320}
\begin{aligned}
\spacegrad^2 \psi
&=
\lr{ \rhocap \partial_\rho + \frac{\phicap}{\rho} \partial_\phi + \zcap \partial_z }
\lr{ \rhocap \partial_\rho \psi + \frac{\phicap}{\rho} \partial_\phi \psi + \zcap \partial_z \psi } \\
&=
\rhocap \partial_\rho
\lr{ \rhocap \partial_\rho \psi + \frac{\phicap}{\rho} \partial_\phi \psi + \zcap \partial_z \psi }
+ \frac{\phicap}{\rho} \partial_\phi
\lr{ \rhocap \partial_\rho \psi + \frac{\phicap}{\rho} \partial_\phi \psi + \zcap \partial_z \psi }
+ \zcap \partial_z
\lr{ \rhocap \partial_\rho \psi + \frac{\phicap}{\rho} \partial_\phi \psi + \zcap \partial_z \psi } \\
&=
\partial_{\rho\rho} \psi
+ \rhocap \phicap \partial_\rho \lr{ \frac{1}{\rho} \partial_\phi \psi}
+ \rhocap \zcap \partial_{\rho z} \psi
+ \frac{\phicap}{\rho} \partial_\phi \lr{ \rhocap \partial_\rho \psi }
+ \frac{\phicap}{\rho} \partial_\phi \lr{ \frac{\phicap}{\rho} \partial_\phi \psi }
+ \frac{\phicap \zcap }{\rho} \partial_{\phi z} \psi
+ \zcap \rhocap \partial_{z\rho} \psi
+ \frac{\zcap \phicap}{\rho} \partial_{z\phi} \psi
+ \partial_{zz} \psi \\
&=
\partial_{\rho\rho} \psi
+ \inv{\rho} \partial_\rho \psi
+ \frac{1}{\rho^2} \partial_{\phi \phi} \psi
+ \partial_{zz} \psi
+ \rhocap \phicap
\lr{
-\frac{1}{\rho^2} \partial_\phi \psi
+\frac{1}{\rho} \partial_{\rho \phi} \psi
-\inv{\rho} \partial_{\phi \rho} \psi
+ \frac{1}{\rho^2} \partial_\phi \psi
}
+ \zcap \rhocap \lr{
-\partial_{\rho z} \psi
+ \partial_{z\rho} \psi
}
+ \phicap \zcap \lr{
\inv{\rho} \partial_{\phi z} \psi
- \inv{\rho} \partial_{z\phi} \psi
} \\
&=
\partial_{\rho\rho} \psi
+ \inv{\rho} \partial_\rho \psi
+ \frac{1}{\rho^2} \partial_{\phi \phi} \psi
+ \partial_{zz} \psi,
\end{aligned}
\end{equation}

so the Laplacian operator is

\begin{equation}\label{eqn:laplacianCylindrical:340}
\boxed{
\spacegrad^2
=
\inv{\rho} \PD{\rho}{} \lr{ \rho \PD{\rho}{} }
+ \frac{1}{\rho^2} \PDSq{\phi}{}
+ \PDSq{z}{}.
}
\end{equation}

All the bivector grades of the Laplacian operator are seen to explicitly cancel, regardless of the grade of \( \psi \), just as if we had expanded the scalar Laplacian as a dot product
\( \spacegrad^2 \psi = \spacegrad \cdot \lr{ \spacegrad \psi} \).
Unlike such a scalar expansion, this derivation is seen to be valid for any grade \( \psi \). We know now that we can trust this result when \( \psi \) is a scalar, a vector, a bivector, a trivector, or even a multivector.

Vector Laplacian

Now that we know the usual scalar form of the Laplacian applies equally well to multivectors, this cylindrical coordinate operator can be applied directly to a vector. Consider the projections onto each of the directions in turn:

\begin{equation}\label{eqn:laplacianCylindrical:360}
\spacegrad^2 \lr{ \rhocap A_\rho }
=
\rhocap \inv{\rho} \partial_\rho \lr{ \rho \partial_\rho A_\rho }
+ \frac{1}{\rho^2} \partial_{\phi\phi} \lr{\rhocap A_\rho}
+ \rhocap \partial_{zz} A_\rho
\end{equation}

\begin{equation}\label{eqn:laplacianCylindrical:380}
\begin{aligned}
\partial_{\phi\phi} \lr{\rhocap A_\rho}
&=
\partial_\phi \lr{ \phicap A_\rho + \rhocap \partial_\phi A_\rho } \\
&=
-\rhocap A_\rho
+\phicap \partial_\phi A_\rho
+ \phicap \partial_\phi A_\rho
+ \rhocap \partial_{\phi\phi} A_\rho \\
&=
\rhocap \lr{ \partial_{\phi\phi} A_\rho -A_\rho }
+ 2 \phicap \partial_\phi A_\rho
\end{aligned}
\end{equation}

so this component of the vector Laplacian is

\begin{equation}\label{eqn:laplacianCylindrical:400}
\begin{aligned}
\spacegrad^2 \lr{ \rhocap A_\rho }
&=
\rhocap
\lr{
\inv{\rho} \partial_\rho \lr{ \rho \partial_\rho A_\rho }
+ \inv{\rho^2} \partial_{\phi\phi} A_\rho
- \inv{\rho^2} A_\rho
+ \partial_{zz} A_\rho
}
+
\phicap
\lr{
2 \inv{\rho^2} \partial_\phi A_\rho
} \\
&=
\rhocap \lr{
\spacegrad^2 A_\rho
- \inv{\rho^2} A_\rho
}
+
\phicap
\frac{2}{\rho^2} \partial_\phi A_\rho
.
\end{aligned}
\end{equation}

The Laplacian for the projection of the vector onto the \( \phicap \) direction is

\begin{equation}\label{eqn:laplacianCylindrical:420}
\spacegrad^2 \lr{ \phicap A_\phi }
=
\phicap \inv{\rho} \partial_\rho \lr{ \rho \partial_\rho A_\phi }
+ \frac{1}{\rho^2} \partial_{\phi\phi} \lr{\phicap A_\phi}
+ \phicap \partial_{zz} A_\phi.
\end{equation}

Again, since the unit vectors are \( \phi \) dependent, the \( \phi \) derivatives have to be treated carefully

\begin{equation}\label{eqn:laplacianCylindrical:440}
\begin{aligned}
\partial_{\phi\phi} \lr{\phicap A_\phi}
&=
\partial_{\phi} \lr{-\rhocap A_\phi + \phicap \partial_\phi A_\phi} \\
&=
-\phicap A_\phi
-\rhocap \partial_\phi A_\phi
- \rhocap \partial_\phi A_\phi
+ \phicap \partial_{\phi \phi} A_\phi \\
&=
- 2 \rhocap \partial_\phi A_\phi
+
\phicap
\lr{
\partial_{\phi \phi} A_\phi
- A_\phi
},
\end{aligned}
\end{equation}

so the Laplacian of this projection is
\begin{equation}\label{eqn:laplacianCylindrical:460}
\begin{aligned}
\spacegrad^2 \lr{ \phicap A_\phi }
&=
\phicap
\lr{
\inv{\rho} \partial_\rho \lr{ \rho \partial_\rho A_\phi }
+ \inv{\rho^2} \partial_{\phi \phi} A_\phi
+ \partial_{zz} A_\phi
- \frac{A_\phi }{\rho^2}
}
- \rhocap \frac{2}{\rho^2} \partial_\phi A_\phi \\
&=
\phicap \lr{
\spacegrad^2 A_\phi
- \frac{A_\phi}{\rho^2}
}
- \rhocap \frac{2}{\rho^2} \partial_\phi A_\phi.
\end{aligned}
\end{equation}

Since \( \zcap \) is fixed we have

\begin{equation}\label{eqn:laplacianCylindrical:480}
\spacegrad^2 \zcap A_z
=
\zcap \spacegrad^2 A_z.
\end{equation}

Putting all the pieces together we have
\begin{equation}\label{eqn:laplacianCylindrical:500}
\boxed{
\spacegrad^2 \BA
=
\rhocap \lr{
\spacegrad^2 A_\rho
- \inv{\rho^2} A_\rho
- \frac{2}{\rho^2} \partial_\phi A_\phi
}
+\phicap \lr{
\spacegrad^2 A_\phi
- \frac{A_\phi}{\rho^2}
+ \frac{2}{\rho^2} \partial_\phi A_\rho
}
+
\zcap \spacegrad^2 A_z.
}
\end{equation}

This matches the results of \ref{eqn:laplacianCylindrical:220}, …, from the painful expansion of
\( \spacegrad \lr{ \spacegrad \cdot \BA } - \spacegrad \cross \lr{ \spacegrad \cross \BA } \).

Helmholtz theorem

October 1, 2016 math and physics play


This is a problem from ECE1228. I attempted solutions in a number of ways: one using Geometric Algebra, one devoid of that algebra, and then this method, which combines aspects of both. Of the three methods I tried, this is the most compact and elegant. It does, however, require a fair bit of Geometric Algebra knowledge, including the Fundamental Theorem of Geometric Calculus, as detailed in [1], [3] and [2].

Question: Helmholtz theorem

Prove the first Helmholtz theorem, i.e., if a vector \( \BM \) is defined by its divergence

\begin{equation}\label{eqn:helmholtzDerviationMultivector:20}
\spacegrad \cdot \BM = s
\end{equation}

and its curl
\begin{equation}\label{eqn:helmholtzDerviationMultivector:40}
\spacegrad \cross \BM = \BC
\end{equation}

within a region and its normal component \( \BM_{\textrm{n}} \) over the boundary, then \( \BM \) is
uniquely specified.

Answer

The gradient of the vector \( \BM \) can be written as a single even grade multivector

\begin{equation}\label{eqn:helmholtzDerviationMultivector:60}
\spacegrad \BM
= \spacegrad \cdot \BM + I \spacegrad \cross \BM
= s + I \BC.
\end{equation}

We will use this to attempt to discover the relation between the vector \( \BM \) and its divergence and curl. We can express \( \BM \) at the point of interest as a convolution with the delta function at all other points in space

\begin{equation}\label{eqn:helmholtzDerviationMultivector:80}
\BM(\Bx) = \int_V dV' \delta(\Bx - \Bx') \BM(\Bx').
\end{equation}

The Laplacian representation of the delta function in \R{3} is

\begin{equation}\label{eqn:helmholtzDerviationMultivector:100}
\delta(\Bx - \Bx') = -\inv{4\pi} \spacegrad^2 \inv{\Abs{\Bx - \Bx'}},
\end{equation}

so \( \BM \) can be represented as the following convolution

\begin{equation}\label{eqn:helmholtzDerviationMultivector:120}
\BM(\Bx) = -\inv{4\pi} \int_V dV' \spacegrad^2 \inv{\Abs{\Bx - \Bx'}} \BM(\Bx').
\end{equation}

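Using this relation, a few applications of the product rule, the fact that \( \spacegrad 1/\Abs{\Bx - \Bx'} = -\spacegrad' 1/\Abs{\Bx - \Bx'} \), and the Fundamental Theorem of Geometric Calculus to convert the total-derivative volume integral into a boundary integral, we find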

\begin{equation}\label{eqn:helmholtzDerviationMultivector:720}
\begin{aligned}
-4 \pi \BM(\Bx)
&= \int_V dV' \spacegrad^2 \inv{\Abs{\Bx - \Bx'}} \BM(\Bx') \\
&= \gpgradeone{\int_V dV' \spacegrad^2 \inv{\Abs{\Bx - \Bx'}} \BM(\Bx')} \\
&= -\gpgradeone{\int_V dV' \spacegrad \lr{ \spacegrad' \inv{\Abs{\Bx - \Bx'}}} \BM(\Bx')} \\
&= -\gpgradeone{\spacegrad \int_V dV' \lr{
\spacegrad' \frac{\BM(\Bx')}{\Abs{\Bx - \Bx'}}
-\frac{\spacegrad' \BM(\Bx')}{\Abs{\Bx - \Bx'}}
} } \\
&=
-\gpgradeone{\spacegrad \int_{\partial V} dA'
\ncap \frac{\BM(\Bx')}{\Abs{\Bx - \Bx'}}
}
+\gpgradeone{\spacegrad \int_V dV'
\frac{s(\Bx') + I\BC(\Bx')}{\Abs{\Bx - \Bx'}}
} \\
&=
-\gpgradeone{\spacegrad \int_{\partial V} dA'
\ncap \frac{\BM(\Bx')}{\Abs{\Bx - \Bx'}}
}
+\spacegrad \int_V dV'
\frac{s(\Bx')}{\Abs{\Bx - \Bx'}}
+\spacegrad \cdot \int_V dV'
\frac{I\BC(\Bx')}{\Abs{\Bx - \Bx'}}.
\end{aligned}
\end{equation}

By inserting a no-op grade selection operation in the second step, the trivector terms that would show up in subsequent steps are automatically filtered out. This leaves a boundary term that depends on the surface and on both the normal and tangential components of \( \BM \). Added to that is a pair of volume integrals that provide the unique dependence of \( \BM \) on its divergence and curl. When the surface is taken to infinity, which requires \( \Abs{\BM}/\Abs{\Bx - \Bx'} \rightarrow 0 \), the dependence of \( \BM \) on its divergence and curl is unique.
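The delta function representation \ref{eqn:helmholtzDerviationMultivector:100} can be sanity-checked numerically using the divergence theorem. The flux of \( \spacegrad 1/\Abs{\Bx - \Bx'} \) out of a closed surface should be \( -4\pi \) when \( \Bx' \) is enclosed and zero otherwise. The following is a minimal midpoint-rule sketch over the unit sphere. The source points and grid sizes are arbitrary choices made for illustration.

```python
import math

def grad_inv_dist(x, xp):
    """Gradient with respect to x of 1/|x - x'|, i.e. -(x - x')/|x - x'|^3."""
    d = [a - b for a, b in zip(x, xp)]
    n3 = math.sqrt(sum(c * c for c in d)) ** 3
    return [-c / n3 for c in d]

def unit_sphere_flux(xp, nt=200, nphi=400):
    """Midpoint-rule flux of grad(1/|x - x'|) out of the unit sphere |x| = 1.

    By the divergence theorem this equals the volume integral of
    laplacian(1/|x - x'|), which should be -4 pi if x' is inside the
    sphere (unit delta function weight) and 0 if x' is outside.
    """
    dt, dp = math.pi / nt, 2 * math.pi / nphi
    total = 0.0
    for i in range(nt):
        t = (i + 0.5) * dt
        st, ct = math.sin(t), math.cos(t)
        for j in range(nphi):
            p = (j + 0.5) * dp
            n = (st * math.cos(p), st * math.sin(p), ct)  # surface point = outward normal
            g = grad_inv_dist(n, xp)
            total += (g[0] * n[0] + g[1] * n[1] + g[2] * n[2]) * st * dt * dp
    return total

inside = unit_sphere_flux((0.2, -0.1, 0.3))
outside = unit_sphere_flux((2.0, 0.0, 0.0))
print(-inside / (4 * math.pi), outside)  # approximately 1 and 0
```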

To express the final result in traditional vector algebra form, a couple of transformations are required. The first is that

\begin{equation}\label{eqn:helmholtzDerviationMultivector:800}
\gpgradeone{ \Ba I \Bb } = I^2 \Ba \cross \Bb = -\Ba \cross \Bb.
\end{equation}

For the grade selection in the boundary integral, note that

\begin{equation}\label{eqn:helmholtzDerviationMultivector:740}
\begin{aligned}
\gpgradeone{ \spacegrad \ncap \BX }
&=
\gpgradeone{ \spacegrad (\ncap \cdot \BX) }
+
\gpgradeone{ \spacegrad (\ncap \wedge \BX) } \\
&=
\spacegrad (\ncap \cdot \BX)
+
\gpgradeone{ \spacegrad I (\ncap \cross \BX) } \\
&=
\spacegrad (\ncap \cdot \BX)
-
\spacegrad \cross (\ncap \cross \BX).
\end{aligned}
\end{equation}

These give

\begin{equation}\label{eqn:helmholtzDerviationMultivector:721}
\boxed{
\begin{aligned}
\BM(\Bx)
&=
\spacegrad \inv{4\pi} \int_{\partial V} dA' \ncap \cdot \frac{\BM(\Bx')}{\Abs{\Bx - \Bx'}}
-
\spacegrad \cross \inv{4\pi} \int_{\partial V} dA' \ncap \cross \frac{\BM(\Bx')}{\Abs{\Bx - \Bx'}} \\
&-\spacegrad \inv{4\pi} \int_V dV'
\frac{s(\Bx')}{\Abs{\Bx - \Bx'}}
+\spacegrad \cross \inv{4\pi} \int_V dV'
\frac{\BC(\Bx')}{\Abs{\Bx - \Bx'}}.
\end{aligned}
}
\end{equation}
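The two grade-selection identities \ref{eqn:helmholtzDerviationMultivector:800} and \ref{eqn:helmholtzDerviationMultivector:740} used to reach this form are algebraic. They can therefore be checked with any concrete representation of the geometric algebra of \R{3}. The sketch below uses Pauli matrices (\( \Be_k \rightarrow \sigma_k \), \( I \rightarrow i \)), a standard faithful representation. This is an illustrative check rather than part of the original derivation. For \ref{eqn:helmholtzDerviationMultivector:740} it checks only the algebraic form, with an ordinary vector \( \Ba \) standing in for \( \spacegrad \).

```python
# Pauli matrices: a faithful 2x2 complex matrix representation of the
# geometric algebra of R^3, with e_k -> sigma_k and I = e1 e2 e3 -> 1j * identity.
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)
PSEUDO = ((1j, 0), (0, 1j))

def mul(A, B):
    """2x2 matrix product, i.e. the geometric product."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)) for i in range(2))

def vec(a):
    """The vector a1 e1 + a2 e2 + a3 e3 as a matrix."""
    return tuple(tuple(sum(a[k] * SIGMA[k][i][j] for k in range(3)) for j in range(2)) for i in range(2))

def grade1(M):
    """Vector part of a multivector: component k is Re(tr(M sigma_k)) / 2."""
    out = []
    for s in SIGMA:
        P = mul(M, s)
        out.append(((P[0][0] + P[1][1]) / 2).real)
    return tuple(out)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))

a, b, n = (1.0, 2.0, -0.5), (0.3, -1.2, 2.0), (0.6, 0.0, 0.8)

# <a I b>_1 = -a x b
lhs = grade1(mul(mul(vec(a), PSEUDO), vec(b)))
print(close(lhs, tuple(-c for c in cross(a, b))))

# <a n X>_1 = a (n . X) - a x (n x X), with X = b and a standing in for the gradient
lhs = grade1(mul(mul(vec(a), vec(n)), vec(b)))
rhs = tuple(ai * dot(n, b) - ci for ai, ci in zip(a, cross(a, cross(n, b))))
print(close(lhs, rhs))
```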

References

[1] C. Doran and A.N. Lasenby. Geometric algebra for physicists. Cambridge University Press New York, Cambridge, UK, 1st edition, 2003.

[2] A. Macdonald. Vector and Geometric Calculus. CreateSpace Independent Publishing Platform, 2012.

[3] Garret Sobczyk and Omar León Sánchez. Fundamental theorem of calculus. Advances in Applied Clifford Algebras, 21:221–231, 2011. URL http://arxiv.org/abs/0809.4526.