
Potential solutions to the static Maxwell’s equation using geometric algebra

March 20, 2018


When neither the electromagnetic field strength \( F = \BE + I \eta \BH \), nor the current \( J = \eta (c \rho - \BJ) + I(c\rho_m - \BM) \) is a function of time, the geometric algebra form of Maxwell’s equations is the first order multivector (gradient) equation
\begin{equation}\label{eqn:staticPotentials:20}
\spacegrad F = J.
\end{equation}

While direct solutions to this equation are possible with the multivector Green’s function for the gradient
\begin{equation}\label{eqn:staticPotentials:40}
G(\Bx, \Bx') = \inv{4\pi} \frac{\Bx - \Bx'}{\Norm{\Bx - \Bx'}^3 },
\end{equation}
the aim in this post is to explore second order (potential) solutions in a geometric algebra context. Can we assume that it is possible to find a multivector potential \( A \) for which
\begin{equation}\label{eqn:staticPotentials:60}
F = \spacegrad A,
\end{equation}
is a solution to the Maxwell statics equation? If such a solution exists, then Maxwell’s equation is simply
\begin{equation}\label{eqn:staticPotentials:80}
\spacegrad^2 A = J,
\end{equation}
which can be easily solved using the scalar Green’s function for the Laplacian
\begin{equation}\label{eqn:staticPotentials:240}
G(\Bx, \Bx') = -\inv{4 \pi \Norm{\Bx - \Bx'} },
\end{equation}
a beastie that may be easier to convolve than the vector valued Green’s function for the gradient.
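As a sanity check on how the two Green’s functions relate, the sketch below compares a finite-difference gradient of the Laplacian Green’s function against the gradient Green’s function. It assumes the conventional \( 1/4\pi \) normalization for both, and the sample points and helper names are arbitrary choices for illustration.

```python
import math

def laplacian_green(x, xp):
    """Scalar Green's function for the 3D Laplacian, -1/(4 pi |x - x'|)."""
    return -1.0 / (4.0 * math.pi * math.dist(x, xp))

def gradient_green(x, xp):
    """Vector Green's function for the gradient, (x - x')/(4 pi |x - x'|^3)."""
    r = math.dist(x, xp)
    return [(a - b) / (4.0 * math.pi * r**3) for a, b in zip(x, xp)]

def numeric_gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at the point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

# Arbitrary field point x and source point x' (hypothetical sample values).
x, xp = [1.0, 2.0, 3.0], [0.5, -1.0, 0.25]
numeric = numeric_gradient(lambda p: laplacian_green(p, xp), x)
exact = gradient_green(x, xp)
max_error = max(abs(n - e) for n, e in zip(numeric, exact))
```

The two agree to within finite-difference error, which is consistent with the gradient Green’s function being the gradient of the Laplacian one.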

It is immediately clear that some restrictions must be imposed on the multivector potential \(A\). In particular, since the field \( F \) has only vector and bivector grades, this gradient must have no scalar, nor pseudoscalar grades. That is
\begin{equation}\label{eqn:staticPotentials:100}
\gpgrade{\spacegrad A}{0,3} = 0.
\end{equation}
This constraint on the potential can be avoided if a grade selection operation is built directly into the assumed potential solution, requiring that the field is given by
\begin{equation}\label{eqn:staticPotentials:120}
F = \gpgrade{\spacegrad A}{1,2}.
\end{equation}
However, after imposing such a constraint, Maxwell’s equation has a much less friendly form
\begin{equation}\label{eqn:staticPotentials:140}
\spacegrad^2 A - \spacegrad \gpgrade{\spacegrad A}{0,3} = J.
\end{equation}
Luckily, it is possible to introduce a transformation of potentials, called a gauge transformation, that eliminates the ugly grade selection term, and allows the potential equation to be expressed as a plain old Laplacian. We do so by assuming first that it is possible to find a solution of the Laplacian equation that has the desired grade restrictions. That is
\begin{equation}\label{eqn:staticPotentials:160}
\begin{aligned}
\spacegrad^2 A' &= J \\
\gpgrade{\spacegrad A'}{0,3} &= 0,
\end{aligned}
\end{equation}
for which \( F = \spacegrad A' \) is a grade 1,2 solution to \( \spacegrad F = J \). Suppose that \( A \) is any formal solution, free of any grade restrictions, to \( \spacegrad^2 A = J \), and \( F = \gpgrade{\spacegrad A}{1,2} \). Can we find a function \( \tilde{A} \) for which \( A = A' + \tilde{A} \)?

Maxwell’s equation in terms of \( A \) is
\begin{equation}\label{eqn:staticPotentials:180}
\begin{aligned}
J
&= \spacegrad \gpgrade{\spacegrad A}{1,2} \\
&= \spacegrad^2 A
- \spacegrad \gpgrade{\spacegrad A}{0,3} \\
&= \spacegrad^2 (A' + \tilde{A})
- \spacegrad \gpgrade{\spacegrad A}{0,3}
\end{aligned}
\end{equation}
or
\begin{equation}\label{eqn:staticPotentials:200}
\spacegrad^2 \tilde{A} = \spacegrad \gpgrade{\spacegrad A}{0,3}.
\end{equation}
This is a non-homogeneous Laplacian equation that can be solved as is for \( \tilde{A} \) using the Green’s function for the Laplacian. Alternatively, using the Green’s function for the gradient, we may solve the corresponding first order system
\begin{equation}\label{eqn:staticPotentials:220}
\spacegrad \tilde{A} = \gpgrade{\spacegrad A}{0,3}.
\end{equation}
Clearly \( \tilde{A} \) is not unique, as we can add any function \( \psi \) satisfying the homogeneous Laplacian equation \( \spacegrad^2 \psi = 0 \).

In summary, if \( A \) is any multivector solution to \( \spacegrad^2 A = J \), that is
\begin{equation}\label{eqn:staticPotentials:260}
A(\Bx)
= \int dV' G(\Bx, \Bx') J(\Bx')
= -\inv{4\pi} \int dV' \frac{J(\Bx')}{\Norm{\Bx - \Bx'} },
\end{equation}
then \( F = \spacegrad A' \) is a solution to Maxwell’s equation, where \( A' = A - \tilde{A} \), and \( \tilde{A} \) is a solution to the non-homogeneous Laplacian equation or the non-homogeneous gradient equation above.

Integral form of the gauge transformation.

Additional insight is possible by considering the gauge transformation in integral form. Suppose that
\begin{equation}\label{eqn:staticPotentials:280}
A(\Bx) = -\inv{4\pi} \int_V dV' \frac{J(\Bx')}{\Norm{\Bx - \Bx'} } - \tilde{A}(\Bx),
\end{equation}
is a solution of \( \spacegrad^2 A = J \), where \( \tilde{A} \) is a multivector solution to the homogeneous Laplacian equation \( \spacegrad^2 \tilde{A} = 0 \). Let’s look at the constraints on \( \tilde{A} \) that must be imposed for \( F = \spacegrad A \) to be a valid (i.e. grade 1,2) solution of Maxwell’s equation.
\begin{equation}\label{eqn:staticPotentials:300}
\begin{aligned}
F
&= \spacegrad A \\
&=
-\inv{4\pi} \int_V dV' \lr{ \spacegrad \inv{\Norm{\Bx - \Bx'} } } J(\Bx')
- \spacegrad \tilde{A}(\Bx) \\
&=
\inv{4\pi} \int_V dV' \lr{ \spacegrad' \inv{\Norm{\Bx - \Bx'} } } J(\Bx')
- \spacegrad \tilde{A}(\Bx) \\
&=
\inv{4\pi} \int_V dV' \spacegrad' \frac{J(\Bx')}{\Norm{\Bx - \Bx'} } - \inv{4\pi} \int_V dV' \frac{\spacegrad' J(\Bx')}{\Norm{\Bx - \Bx'} }
- \spacegrad \tilde{A}(\Bx) \\
&=
\inv{4\pi} \int_{\partial V} dA' \ncap' \frac{J(\Bx')}{\Norm{\Bx - \Bx'} } - \inv{4\pi} \int_V dV' \frac{\spacegrad' J(\Bx')}{\Norm{\Bx - \Bx'} }
- \spacegrad \tilde{A}(\Bx).
\end{aligned}
\end{equation}
Here \( \ncap' = (\Bx' - \Bx)/\Norm{\Bx' - \Bx} \) (the outward normal when \( V \) is a ball centered at \( \Bx \)), and the fundamental theorem of geometric calculus has been used to transform the gradient volume integral into an integral over the bounding surface. Operating on Maxwell’s equation with the gradient gives \( \spacegrad^2 F = \spacegrad J \), which has only grades 1,2 on the left hand side, meaning that \( J \) is constrained in a way that requires \( \spacegrad J \) to have only grades 1,2. This means that \( F \) has grades 1,2 if
\begin{equation}\label{eqn:staticPotentials:320}
\spacegrad \tilde{A}(\Bx)
= \inv{4\pi} \int_{\partial V} dA' \frac{ \gpgrade{\ncap' J(\Bx')}{0,3} }{\Norm{\Bx - \Bx'} }.
\end{equation}
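The grade 0,3 selection of the product \( \ncap J \), where \( J_1 = -\eta \BJ \) and \( J_2 = -I \BM \) are the vector and bivector parts of \( J \), expands to
\begin{equation}\label{eqn:staticPotentials:340}
\begin{aligned}
\gpgrade{\ncap J}{0,3}
&=
\gpgradezero{\ncap J_1} + \gpgradethree{\ncap J_2} \\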
&=
\ncap \cdot (-\eta \BJ) + \gpgradethree{\ncap (-I \BM)} \\
&=- \eta \ncap \cdot \BJ -I \ncap \cdot \BM,
\end{aligned}
\end{equation}
so
\begin{equation}\label{eqn:staticPotentials:360}
\spacegrad \tilde{A}(\Bx)
=
-\inv{4\pi} \int_{\partial V} dA' \frac{ \eta \ncap' \cdot \BJ(\Bx') + I \ncap' \cdot \BM(\Bx')}{\Norm{\Bx - \Bx'} }.
\end{equation}
Observe that if there is no flux of current density \( \BJ \) and (fictitious) magnetic current density \( \BM \) through the surface, then \( F = \spacegrad A \) is a solution to Maxwell’s equation without any gauge transformation. Alternatively \( F = \spacegrad A \) is also a solution if \( \lim_{\Bx' \rightarrow \infty} \BJ(\Bx')/\Norm{\Bx - \Bx'} = \lim_{\Bx' \rightarrow \infty} \BM(\Bx')/\Norm{\Bx - \Bx'} = 0 \) and the bounding volume is taken to infinity.


Generalizing Ampere’s law using geometric algebra.

March 16, 2018

All oriented integrals in this post have a clockwise direction.

The question I’d like to explore in this post is how Ampere’s law, the relationship between the line integral of the magnetic field and the enclosed current
\begin{equation}\label{eqn:flux:20}
\oint_{\partial A} d\Bx \cdot \BH = -\int_A dA \ncap \cdot \BJ,
\end{equation}
generalizes to geometric algebra where Maxwell’s equations for a statics configuration (all time derivatives zero) is
\begin{equation}\label{eqn:flux:40}
\spacegrad F = J,
\end{equation}
where the multivector fields and currents are
\begin{equation}\label{eqn:flux:60}
\begin{aligned}
F &= \BE + I \eta \BH \\
J &= \eta \lr{ c \rho - \BJ } + I \lr{ c \rho_\txtm - \BM }.
\end{aligned}
\end{equation}
Here the (fictitious) magnetic charge and current densities, which can be useful in antenna theory, have been included in the multivector current for generality.

My presumption is that it should be possible to utilize the fundamental theorem of geometric calculus, which relates the integral over an oriented surface to an integral over its boundary, applied directly to Maxwell’s equation. That integral theorem has the form
\begin{equation}\label{eqn:flux:80}
\int_A d^2 \Bx \boldpartial F = \oint_{\partial A} d\Bx F,
\end{equation}
where \( d^2 \Bx = d\Ba \wedge d\Bb \) is a two parameter bivector valued surface element, and \( \boldpartial \) is the vector derivative, the projection of the gradient onto the tangent space. I won’t try to explain all of geometric calculus here, and refer the interested reader to [1], which is an excellent reference on geometric calculus and integration theory.

The gotcha is that we actually want a surface integral with \( \spacegrad F \). We can split the gradient into the vector derivative and a normal component
\begin{equation}\label{eqn:flux:160}
\spacegrad = \boldpartial + \ncap (\ncap \cdot \spacegrad),
\end{equation}
so
\begin{equation}\label{eqn:flux:100}
\int_A d^2 \Bx \spacegrad F
=
\int_A d^2 \Bx \boldpartial F
+
\int_A d^2 \Bx \ncap \lr{ \ncap \cdot \spacegrad } F,
\end{equation}
so
\begin{equation}\label{eqn:flux:120}
\begin{aligned}
\oint_{\partial A} d\Bx F
&=
\int_A d^2 \Bx \lr{ J - \ncap \lr{ \ncap \cdot \spacegrad } F } \\
&=
\int_A dA \lr{ I \ncap J - \lr{ \ncap \cdot \spacegrad } I F }
\end{aligned}
\end{equation}

This is not nearly as nice as the magnetic flux relationship, in which the current and fields were cleanly separated. The \( d\Bx F \) product has all possible grades, as does the \( d^2 \Bx J \) product (in general). Observe, however, that the normal derivative term on the right has only grades 1,2, so we can split our line integral relation into a pair of equations, one with the grade 0,3 components and one with the grade 1,2 components
\begin{equation}\label{eqn:flux:140}
\begin{aligned}
\oint_{\partial A} \gpgrade{d\Bx F}{0,3}
&=
\int_A dA \gpgrade{ I \ncap J }{0,3} \\
\oint_{\partial A} \gpgrade{d\Bx F}{1,2}
&=
\int_A dA \lr{ \gpgrade{ I \ncap J }{1,2} - \lr{ \ncap \cdot \spacegrad } I F }.
\end{aligned}
\end{equation}

Let’s expand these explicitly in terms of the component fields and densities to check against the conventional relationships, and see if things look right. The line integrand expands to
\begin{equation}\label{eqn:flux:180}
\begin{aligned}
d\Bx F
&=
d\Bx \lr{ \BE + I \eta \BH }
=
d\Bx \cdot \BE + I \eta d\Bx \cdot \BH
+
d\Bx \wedge \BE + I \eta d\Bx \wedge \BH \\
&=
d\Bx \cdot \BE
- \eta (d\Bx \cross \BH)
+ I (d\Bx \cross \BE )
+ I \eta (d\Bx \cdot \BH),
\end{aligned}
\end{equation}
the current integrand expands to
\begin{equation}\label{eqn:flux:200}
\begin{aligned}
I \ncap J
&=
I \ncap
\lr{
\frac{\rho}{\epsilon} - \eta \BJ + I \lr{ c \rho_\txtm - \BM }
} \\
&=
\ncap I \frac{\rho}{\epsilon} - \eta \ncap I \BJ - \ncap c \rho_\txtm + \ncap \BM \\
&=
\ncap \cdot \BM
+ \eta (\ncap \cross \BJ)
- \ncap c \rho_\txtm
+ I (\ncap \cross \BM)
+ \ncap I \frac{\rho}{\epsilon}
- \eta I (\ncap \cdot \BJ).
\end{aligned}
\end{equation}

We are left with
\begin{equation}\label{eqn:flux:220}
\begin{aligned}
\oint_{\partial A}
\lr{
d\Bx \cdot \BE + I \eta (d\Bx \cdot \BH)
}
&=
\int_A dA
\lr{
\ncap \cdot \BM - \eta I (\ncap \cdot \BJ)
} \\
\oint_{\partial A}
\lr{
- \eta (d\Bx \cross \BH)
+ I (d\Bx \cross \BE )
}
&=
\int_A dA
\lr{
\eta (\ncap \cross \BJ)
- \ncap c \rho_\txtm
+ I (\ncap \cross \BM)
+ \ncap I \frac{\rho}{\epsilon}
-\PD{n}{} \lr{ I \BE – \eta \BH }
}.
\end{aligned}
\end{equation}
This is a crazy mess of dots, crosses, fields and sources. We can split it into one equation for each grade, which will probably look a little more regular. That is
\begin{equation}\label{eqn:flux:240}
\begin{aligned}
\oint_{\partial A} d\Bx \cdot \BE &= \int_A dA \ncap \cdot \BM \\
\oint_{\partial A} d\Bx \cross \BH
&=
\int_A dA
\lr{
- \ncap \cross \BJ
+ \frac{ \ncap \rho_\txtm }{\mu}
- \PD{n}{\BH}
} \\
\oint_{\partial A} d\Bx \cross \BE &=
\int_A dA
\lr{
\ncap \cross \BM
+ \frac{\ncap \rho}{\epsilon}
- \PD{n}{\BE}
} \\
\oint_{\partial A} d\Bx \cdot \BH &= -\int_A dA \ncap \cdot \BJ \\
\end{aligned}
\end{equation}
The first and last equations could have been obtained much more easily from Maxwell’s equations in their conventional form. The two cross product equations with the normal derivatives are not familiar to me, even without the fictitious magnetic sources. It is somewhat remarkable that so much can be packed into one multivector equation:
\begin{equation}\label{eqn:flux:260}
\oint_{\partial A} d\Bx F
=
I \int_A dA \lr{ \ncap J - \PD{n}{F} }.
\end{equation}
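The coefficients \( \rho/\epsilon \) and \( \rho_\txtm/\mu \) in the expansions above rely on the identities \( \eta c = 1/\epsilon \) and \( c/\eta = 1/\mu \). The sketch below checks them numerically, assuming the usual definitions \( \eta = \sqrt{\mu/\epsilon} \) and \( c = 1/\sqrt{\mu\epsilon} \) (not spelled out in this post) and approximate SI vacuum values.

```python
import math

# Assumed definitions (not stated in the post): eta = sqrt(mu/epsilon),
# c = 1/sqrt(mu epsilon), evaluated with approximate SI vacuum values.
epsilon = 8.8541878128e-12   # F/m
mu = 4.0e-7 * math.pi        # H/m (approximate)

eta = math.sqrt(mu / epsilon)
c = 1.0 / math.sqrt(mu * epsilon)

def relative_error(a, b):
    """Relative difference between a and the reference value b."""
    return abs(a - b) / abs(b)

# eta c rho = rho/epsilon: the electric charge term in I n J.
err_electric = relative_error(eta * c, 1.0 / epsilon)
# c rho_m / eta = rho_m/mu: the magnetic charge term after dividing by eta.
err_magnetic = relative_error(c / eta, 1.0 / mu)
```

Both relative errors are at the level of floating point roundoff, independent of the particular values chosen for \( \epsilon \) and \( \mu \).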

References

[1] A. Macdonald. Vector and Geometric Calculus. CreateSpace Independent Publishing Platform, 2012.

ECE1505H Convex Optimization. Lecture 7: Examples of convex and concave functions, local and global minimums. Taught by Prof. Stark Draper

February 2, 2017


Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

Today

  • Local and global optimality
  • Compositions of functions
  • Examples

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:20}
\begin{aligned}
F(x) &= x^2 \\
F''(x) &= 2 > 0
\end{aligned}
\end{equation}

strictly convex.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:40}
\begin{aligned}
F(x) &= x^3 \\
F''(x) &= 6 x.
\end{aligned}
\end{equation}

Not always non-negative, so not convex. However \( x^3 \) is convex on \( \textrm{dom} F = \mathbb{R}_{+} \).

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:60}
\begin{aligned}
F(x) &= x^\alpha \\
F'(x) &= \alpha x^{\alpha-1} \\
F''(x) &= \alpha(\alpha-1) x^{\alpha-2}.
\end{aligned}
\end{equation}

 

fig. 1. Powers of x.

This is convex on \( \mathbb{R}_{+} \), if \( \alpha \ge 1 \), or \( \alpha \le 0 \).
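The sign of \( F''(x) = \alpha(\alpha-1) x^{\alpha-2} \) can be tabulated for a few exponents. This is a minimal sketch, with the sample grid and the list of exponents chosen arbitrarily for illustration.

```python
def second_derivative(alpha, x):
    """F''(x) = alpha (alpha - 1) x^(alpha - 2) for F(x) = x^alpha, x > 0."""
    return alpha * (alpha - 1.0) * x ** (alpha - 2.0)

def is_convex_on_samples(alpha, xs):
    """True if F''(x) >= 0 at every sample point."""
    return all(second_derivative(alpha, x) >= 0.0 for x in xs)

# Sample points 0.1, 0.2, ..., 5.0 (strictly positive).
samples = [0.1 * k for k in range(1, 51)]
results = {alpha: is_convex_on_samples(alpha, samples)
           for alpha in (-1.0, 0.0, 0.5, 1.0, 2.0, 3.0)}
```

Only \( \alpha = 0.5 \) fails the check here, which matches the stated condition, since \( 0 < \alpha < 1 \) gives a negative second derivative.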

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:80}
\begin{aligned}
F(x) &= \log x \\
F'(x) &= \inv{x} \\
F''(x) &= -\inv{x^2} \le 0
\end{aligned}
\end{equation}

This is concave.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:100}
\begin{aligned}
F(x) &= x\log x \\
F'(x) &= \log x + x \inv{x} = 1 + \log x \\
F''(x) &= \inv{x}
\end{aligned}
\end{equation}

This is strictly convex on
\( \mathbb{R}_{++} \), where
\( F''(x) > 0 \).

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:120}
\begin{aligned}
F(x) &= e^{\alpha x} \\
F'(x) &= \alpha e^{\alpha x} \\
F''(x) &= \alpha^2 e^{\alpha x} \ge 0
\end{aligned}
\end{equation}

fig. 2. Exponential.

Such functions are plotted in fig. 2, and are convex for all \( \alpha \).

Example:

For symmetric \( P \in S^n \)

\begin{equation}\label{eqn:convexOptimizationLecture7:140}
\begin{aligned}
F(\Bx) &= \Bx^\T P \Bx + 2 \Bq^\T \Bx + r \\
\spacegrad F &= (P + P^\T) \Bx + 2 \Bq = 2 P \Bx + 2 \Bq \\
\spacegrad^2 F &= 2 P.
\end{aligned}
\end{equation}

This is convex (concave) if \( P \ge 0 \) (\( P \le 0 \)), that is, if \( P \) is positive (negative) semi-definite.

Example:

A quadratic function

\begin{equation}\label{eqn:convexOptimizationLecture7:780}
F(x, y) = x^2 + y^2 + 3 x y,
\end{equation}

that is neither convex nor concave is plotted in fig 3.

fig 3. Function with saddle point (3d and contours)

This function can be put in matrix form

\begin{equation}\label{eqn:convexOptimizationLecture7:160}
F(x, y) = x^2 + y^2 + 3 x y
=
\begin{bmatrix}
x & y
\end{bmatrix}
\begin{bmatrix}
1 & 1.5 \\
1.5 & 1
\end{bmatrix}
\begin{bmatrix}
x \\
y
\end{bmatrix},
\end{equation}

and has the Hessian

\begin{equation}\label{eqn:convexOptimizationLecture7:180}
\begin{aligned}
\spacegrad^2 F
&=
\begin{bmatrix}
\partial_{xx} F & \partial_{xy} F \\
\partial_{yx} F & \partial_{yy} F \\
\end{bmatrix} \\
&=
\begin{bmatrix}
2 & 3 \\
3 & 2
\end{bmatrix} \\
&= 2 P.
\end{aligned}
\end{equation}

From the plot we know that this is not PSD, but this can be confirmed by checking the eigenvalues

\begin{equation}\label{eqn:convexOptimizationLecture7:200}
\begin{aligned}
0
&=
\det ( P - \lambda I ) \\
&=
(1 - \lambda)^2 - 1.5^2,
\end{aligned}
\end{equation}

which has solutions

\begin{equation}\label{eqn:convexOptimizationLecture7:220}
\lambda = 1 \pm \frac{3}{2} = \frac{5}{2}, -\frac{1}{2}.
\end{equation}

This matrix is neither positive semi-definite nor negative semi-definite, because it has one positive and one negative eigenvalue, so the function is neither convex nor concave.

Along \( y = -x \),

\begin{equation}\label{eqn:convexOptimizationLecture7:240}
\begin{aligned}
F(x,y)
&=
F(x,-x) \\
&=
2 x^2 - 3 x^2 \\
&=
- x^2,
\end{aligned}
\end{equation}

so it is concave along this line. Along \( y = x \)

\begin{equation}\label{eqn:convexOptimizationLecture7:260}
\begin{aligned}
F(x,y)
&=
F(x,x) \\
&=
2 x^2 + 3 x^2 \\
&=
5 x^2,
\end{aligned}
\end{equation}

so it is convex along this line.
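The eigenvalues and the two line restrictions can be checked with a few lines of code. This sketch uses the closed form for the eigenvalues of a symmetric \( 2 \times 2 \) matrix. The helper names are arbitrary.

```python
import math

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean + radius, mean - radius

def F(x, y):
    """The quadratic x^2 + y^2 + 3 x y from the example."""
    return x**2 + y**2 + 3.0 * x * y

# Eigenvalues of P = [[1, 1.5], [1.5, 1]].
lam_max, lam_min = eigenvalues_sym2(1.0, 1.5, 1.0)

# Restrictions to the lines y = -x and y = x, evaluated at x = 2.
along_antidiagonal = F(2.0, -2.0)   # equals -x^2 at x = 2
along_diagonal = F(2.0, 2.0)        # equals 5 x^2 at x = 2
```

One eigenvalue is positive and one is negative, and the function curves down along \( y = -x \) and up along \( y = x \), as expected of a saddle.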

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:280}
F(\Bx) = \sqrt{ x_1 x_2 },
\end{equation}

on \( \textrm{dom} F = \setlr{ x_1 \ge 0, x_2 \ge 0 } \)

The first order partials are
\begin{equation}\label{eqn:convexOptimizationLecture7:300}
\begin{aligned}
\PD{x_1}{F} &= \frac{1}{2} x_1^{-1/2} x_2^{1/2} \\
\PD{x_2}{F} &= \frac{1}{2} x_2^{-1/2} x_1^{1/2}
\end{aligned}
\end{equation}

The Hessian components are

\begin{equation}\label{eqn:convexOptimizationLecture7:320}
\begin{aligned}
\PD{x_1}{} \PD{x_1}{F} &= -\frac{1}{4} x_1^{-3/2} x_2^{1/2} \\
\PD{x_1}{} \PD{x_2}{F} &= \frac{1}{4} x_2^{-1/2} x_1^{-1/2} \\
\PD{x_2}{} \PD{x_1}{F} &= \frac{1}{4} x_1^{-1/2} x_2^{-1/2} \\
\PD{x_2}{} \PD{x_2}{F} &= -\frac{1}{4} x_2^{-3/2} x_1^{1/2}
\end{aligned}
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture7:340}
\spacegrad^2 F
=
-\frac{\sqrt{x_1 x_2}}{4}
\begin{bmatrix}
\inv{x_1^2} & -\inv{x_1 x_2} \\
-\inv{x_1 x_2} & \inv{x_2^2}
\end{bmatrix}.
\end{equation}

Checking this for PSD against \( \Bv = (v_1, v_2) \), we have
\begin{equation}\label{eqn:convexOptimizationLecture7:360}
\begin{aligned}
\begin{bmatrix}
v_1 & v_2
\end{bmatrix}
\begin{bmatrix}
\inv{x_1^2} & -\inv{x_1 x_2} \\
-\inv{x_1 x_2} & \inv{x_2^2}
\end{bmatrix}
\begin{bmatrix}
v_1 \\ v_2
\end{bmatrix}
&=
\begin{bmatrix}
v_1 & v_2
\end{bmatrix}
\begin{bmatrix}
\inv{x_1^2} v_1 -\inv{x_1 x_2} v_2 \\
-\inv{x_1 x_2} v_1 + \inv{x_2^2} v_2
\end{bmatrix} \\
&=
\lr{ \inv{x_1^2} v_1 -\inv{x_1 x_2} v_2 } v_1 +
\lr{ -\inv{x_1 x_2} v_1 + \inv{x_2^2} v_2 } v_2
\\
&=
\inv{x_1^2} v_1^2
+ \inv{x_2^2} v_2^2
-2 \inv{x_1 x_2} v_1 v_2 \\
&=
\lr{
\frac{v_1}{x_1}
-\frac{v_2}{x_2}
}^2 \\
&\ge 0,
\end{aligned}
\end{equation}

so \( \spacegrad^2 F \le 0 \). The Hessian is negative semi-definite, so \( F \) is concave. Observe that this check required verifying semi-definiteness for all values of \( \Bx \).
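A randomized spot check of this result is sketched below. It evaluates the quadratic form \( \Bv^\T \spacegrad^2 F \Bv \) from the Hessian components above, and compares it with the closed form \( -\frac{\sqrt{x_1 x_2}}{4} \lr{ v_1/x_1 - v_2/x_2 }^2 \). The sampling ranges and seed are arbitrary, and such sampling can only fail to find a counterexample, not prove concavity.

```python
import math
import random

def hessian(x1, x2):
    """Hessian entries of F = sqrt(x1 x2) for x1, x2 > 0."""
    h11 = -0.25 * x1 ** -1.5 * x2 ** 0.5
    h12 = 0.25 * x1 ** -0.5 * x2 ** -0.5
    h22 = -0.25 * x2 ** -1.5 * x1 ** 0.5
    return h11, h12, h22

def quadratic_form(x1, x2, v1, v2):
    """v^T H v for the Hessian H at (x1, x2)."""
    h11, h12, h22 = hessian(x1, x2)
    return h11 * v1 * v1 + 2.0 * h12 * v1 * v2 + h22 * v2 * v2

def closed_form(x1, x2, v1, v2):
    """-(sqrt(x1 x2)/4) (v1/x1 - v2/x2)^2, the factored quadratic form."""
    return -0.25 * math.sqrt(x1 * x2) * (v1 / x1 - v2 / x2) ** 2

rng = random.Random(0)
largest = -math.inf   # largest quadratic form value seen
worst = 0.0           # largest mismatch against the closed form
for _ in range(1000):
    x1, x2 = rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0)
    v1, v2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    q = quadratic_form(x1, x2, v1, v2)
    largest = max(largest, q)
    worst = max(worst, abs(q - closed_form(x1, x2, v1, v2)))
```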

This is an example of a more general result

\begin{equation}\label{eqn:convexOptimizationLecture7:380}
F(\Bx) = \lr{ \prod_{i = 1}^n x_i }^{1/n},
\end{equation}

which is concave (prove on homework).

Summary.

If \( F \) is twice differentiable in \( \mathbb{R}^n \), then check the curvature of the function along all lines, i.e. at all locations and in all directions.

If the Hessian is PSD at all \( \Bx \in \textrm{dom} F \), that is

\begin{equation}\label{eqn:convexOptimizationLecture7:400}
\spacegrad^2 F \ge 0 \, \forall \Bx \in \textrm{dom} F,
\end{equation}

then the function is convex.

More examples of convex, but not necessarily differentiable, functions

Example:

Over \( \textrm{dom} F = \mathbb{R}^n \)

\begin{equation}\label{eqn:convexOptimizationLecture7:420}
F(\Bx) = \max_{i = 1}^n x_i
\end{equation}

i.e.
\begin{equation}\label{eqn:convexOptimizationLecture7:440}
\begin{aligned}
F((1,2)) &= 2 \\
F((3,-1)) &= 3
\end{aligned}
\end{equation}

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:460}
F(\Bx) = \max_{i = 1}^n F_i(\Bx),
\end{equation}

where

\begin{equation}\label{eqn:convexOptimizationLecture7:480}
F_i(\Bx)
=
… ?
\end{equation}

The max of a set of convex functions is a convex function.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:500}
F(x) =
x_{[1]} +
x_{[2]} +
x_{[3]}
\end{equation}

where

\( x_{[k]} \) is the k-th largest number in the list

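Write

\begin{equation}\label{eqn:convexOptimizationLecture7:520}
F(x) = \max_{(i,j,k)} \lr{ x_i + x_j + x_k },
\end{equation}

where the max is over the \( \binom{n}{3} \) index triples

\begin{equation}\label{eqn:convexOptimizationLecture7:540}
1 \le i < j < k \le n.
\end{equation}

Each \( x_i + x_j + x_k \) is linear in \( x \), so \( F \) is a max of convex functions, and is therefore convex.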
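The equivalence of the two descriptions (sum of the three largest entries, and max over all index triples) can be sketched directly. The sample list is arbitrary.

```python
from itertools import combinations

def sum_three_largest_sorted(x):
    """x_[1] + x_[2] + x_[3], computed by sorting in descending order."""
    return sum(sorted(x, reverse=True)[:3])

def sum_three_largest_max(x):
    """Maximum of x_i + x_j + x_k over all index triples i < j < k."""
    return max(x[i] + x[j] + x[k]
               for i, j, k in combinations(range(len(x)), 3))

x = [4, -1, 7, 3, 7, 0]
by_sorting = sum_three_largest_sorted(x)
by_max = sum_three_largest_max(x)
```

The max form is the one that exposes convexity, since each triple sum is a linear function of the list entries.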

Example:

For \( \Ba_i \in \mathbb{R}^n \) and \( b_i \in \mathbb{R} \)

\begin{equation}\label{eqn:convexOptimizationLecture7:560}
\begin{aligned}
F(\Bx)
&= \sum_{i = 1}^n \log( b_i - \Ba_i^\T \Bx )^{-1} \\
&= -\sum_{i = 1}^n \log( b_i - \Ba_i^\T \Bx )
\end{aligned}
\end{equation}

Each \( b_i - \Ba_i^\T \Bx \) is an affine function of \( \Bx \), so it doesn’t affect convexity.

Since \( \log \) is concave, \( -\log \) is convex. A convex function of an affine function of \( \Bx \) is a convex function of \( \Bx \), and a sum of convex functions is convex.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:580}
F(\Bx) = \sup_{\By \in C} \Norm{ \Bx - \By }
\end{equation}

 

fig. 3. Max length function

 

Here \( C \subseteq \mathbb{R}^n \) is not necessarily convex. We are using \( \sup \) here because the set \( C \) may be open. This function is the length of the line from \( \Bx \) to the point in \( C \) that is furthest from \( \Bx \).

  • \( \Bx - \By \) is affine in \( \Bx \)
  • \( g_\By(\Bx) = \Norm{\Bx - \By} \) is convex in \( \Bx \) since norms are convex functions.
  • \( F(\Bx) = \sup_{\By \in C} \Norm{ \Bx - \By } \). Each \( \By \) indexes a convex function \( g_\By \), and \( F \) is the max (sup) of those, so it is convex.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:600}
F(\Bx) = \inf_{\By \in C} \Norm{ \Bx - \By }.
\end{equation}

Min and max of two convex functions are plotted in fig. 4.

fig. 4. Min and max

The max is observed to be convex, whereas the min is not necessarily so, since there can be points \( \Bx, \By \) for which

\begin{equation}\label{eqn:convexOptimizationLecture7:800}
F(\Bz) = F(\theta \Bx + (1-\theta) \By) \ge \theta F(\Bx) + (1-\theta)F(\By).
\end{equation}

That is, \( F \) is not necessarily convex for all sets \( C \subseteq \mathbb{R}^n \), because the \( \inf \) of a collection of convex functions is not necessarily convex. However, if \( C \) is convex, then \( F(\Bx) \) is convex.
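A one-dimensional sketch of both cases follows. The sets used are hypothetical examples: a non-convex two point set \( C = \setlr{-1, 1} \), and the convex interval \( C = [-1, 1] \).

```python
def dist_to_points(x, points):
    """inf over a finite set C of |x - y| (one-dimensional case)."""
    return min(abs(x - y) for y in points)

def dist_to_interval(x, lo, hi):
    """inf over the convex set C = [lo, hi] of |x - y|."""
    return max(lo - x, 0.0, x - hi)

def midpoint_gap(f, x, y):
    """f(midpoint) minus the average of f(x), f(y); positive means convexity fails."""
    return f(0.5 * (x + y)) - 0.5 * (f(x) + f(y))

# Non-convex C = {-1, 1}: the midpoint 0 is farther from C than either endpoint.
gap_nonconvex = midpoint_gap(lambda x: dist_to_points(x, [-1.0, 1.0]), -1.0, 1.0)
# Convex C = [-1, 1]: the midpoint value does not exceed the endpoint average.
gap_convex = midpoint_gap(lambda x: dist_to_interval(x, -1.0, 1.0), -3.0, 2.0)
```

A positive gap for the two point set exhibits a violation of the convexity inequality, while the interval example shows no violation at the sampled pair.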

Consequences of convexity for differentiable functions

  • Think about unconstrained functions \( \textrm{dom} F = \mathbb{R}^n \).
  • By the first order condition, \( F \) is convex iff the domain is convex and
    \begin{equation}\label{eqn:convexOptimizationLecture7:620}
    F(\By) \ge F(\Bx) + \lr{ \spacegrad F(\Bx)}^\T (\By - \Bx) \, \forall \Bx, \By \in \textrm{dom} F.
    \end{equation}

If \( F \) is convex and one can find an \( \Bx^\conj \in \textrm{dom} F \) such that

\begin{equation}\label{eqn:convexOptimizationLecture7:640}
\spacegrad F(\Bx^\conj) = 0,
\end{equation}

then

\begin{equation}\label{eqn:convexOptimizationLecture7:660}
F(\By) \ge F(\Bx^\conj) \, \forall \By \in \textrm{dom} F.
\end{equation}

If you can find a point where the gradient is zero (such a point can’t always be found), then \( \Bx^\conj\) is a global minimum of \( F \). This follows from the first order condition evaluated at \( \Bx = \Bx^\conj \).

Conversely, if \( \Bx^\conj \) is a global minimizer of \( F \), then \( \spacegrad F(\Bx^\conj) = 0 \) must hold. If that were not the case, then you would be able to find a direction to move downhill, contradicting the optimality of \( \Bx^\conj\).
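As an illustration, consider a convex quadratic \( F(\Bx) = \Bx^\T P \Bx + 2 \Bq^\T \Bx + r \), for which the zero gradient condition is \( P \Bx = -\Bq \). The sketch below solves that system for a hypothetical positive definite \( P \), and spot checks \( F(\By) \ge F(\Bx^\conj) \) at random points. All numeric values are made up for the example.

```python
import random

# Hypothetical data: a positive definite P, with q and r chosen arbitrarily.
P = ((2.0, 0.5), (0.5, 1.0))
q = (1.0, -1.0)
r = 3.0

def F(x):
    """F(x) = x^T P x + 2 q^T x + r."""
    quad = sum(x[i] * P[i][j] * x[j] for i in range(2) for j in range(2))
    return quad + 2.0 * (q[0] * x[0] + q[1] * x[1]) + r

def stationary_point():
    """Solve grad F = 2 P x + 2 q = 0, i.e. P x = -q, by Cramer's rule."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    x0 = (-q[0] * P[1][1] + q[1] * P[0][1]) / det
    x1 = (-q[1] * P[0][0] + q[0] * P[1][0]) / det
    return (x0, x1)

x_star = stationary_point()
rng = random.Random(1)
trial_points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(1000)]
# Small tolerance to absorb floating point roundoff.
is_global_min = all(F(y) >= F(x_star) - 1e-12 for y in trial_points)
```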

Local vs Global optimum

 

fig. 6. Global and local minimums

Definition: Local optimum
\( \Bx^\conj \) is a local optimum of \( F \) if \( \exists \epsilon > 0 \) such that \( \forall \Bx \) with \( \Norm{\Bx - \Bx^\conj} < \epsilon \), we have

\begin{equation*}
F(\Bx^\conj) \le F(\Bx)
\end{equation*}

 

fig. 5. min length function

Theorem:
Suppose \( F \) is twice continuously differentiable (not necessarily convex)

  • If \( \Bx^\conj\) is a local optimum then
    \begin{equation*}
    \begin{aligned}
    \spacegrad F(\Bx^\conj) &= 0 \\
    \spacegrad^2 F(\Bx^\conj) &\ge 0.
    \end{aligned}
    \end{equation*}
  • If
    \begin{equation*}
    \begin{aligned}
    \spacegrad F(\Bx^\conj) &= 0 \\
    \spacegrad^2 F(\Bx^\conj) &> 0,
    \end{aligned}
    \end{equation*}
    then \( \Bx^\conj\) is a local optimum.

Proof:

  • Let \( \Bx^\conj \) be a local optimum. Pick any \( \Bv \in \mathbb{R}^n \).
    \begin{equation}\label{eqn:convexOptimizationLecture7:720}
    \lim_{t \rightarrow 0^+} \frac{ F(\Bx^\conj + t \Bv) - F(\Bx^\conj)}{t}
    = \lr{ \spacegrad F(\Bx^\conj) }^\T \Bv
    \ge 0.
    \end{equation}

Here the fraction is \( \ge 0 \) for sufficiently small \( t > 0 \), since \( \Bx^\conj \) is a local optimum.

Since the choice of \( \Bv \) is arbitrary, the only way to ensure that this is \( \ge 0 \) for all \( \Bv \) is

\begin{equation}\label{eqn:convexOptimizationLecture7:740}
\spacegrad F(\Bx^\conj) = 0,
\end{equation}

(or else we could pick \( \Bv = -\spacegrad F(\Bx^\conj) \), which makes the directional derivative negative).

This means that \( \spacegrad F(\Bx^\conj) = 0 \) if \( \Bx^\conj \) is a local optimum.

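Now consider the 2nd order term of the Taylor expansion, in which the first order term vanishes since \( \spacegrad F(\Bx^\conj) = 0 \)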

\begin{equation}\label{eqn:convexOptimizationLecture7:760}
\begin{aligned}
\lim_{t \rightarrow 0} \frac{ F(\Bx^\conj + t \Bv) - F(\Bx^\conj)}{t^2}
&=
\lim_{t \rightarrow 0} \inv{t^2}
\lr{
F(\Bx^\conj) + t \lr{ \spacegrad F(\Bx^\conj) }^\T \Bv + \inv{2} t^2 \Bv^\T \spacegrad^2 F(\Bx^\conj) \Bv + O(t^3)
- F(\Bx^\conj)
} \\
&=
\inv{2} \Bv^\T \spacegrad^2 F(\Bx^\conj) \Bv \\
&\ge 0.
\end{aligned}
\end{equation}

Here the \( \ge \) condition also comes from the fraction, based on the optimality of \( \Bx^\conj \). This is true for all choices of \( \Bv \), thus \( \spacegrad^2 F(\Bx^\conj) \ge 0 \).

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

ECE1505H Convex Optimization. Lecture 6: First and second order conditions. Taught by Prof. Stark Draper

February 1, 2017


Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

Today

  • First and second order conditions for convexity of differentiable functions.
  • Consequences of convexity: local and global optimality.
  • Properties.

Quasi-convex

\( F_1 \) and \( F_2 \) convex implies \( \max( F_1, F_2) \) convex.

 

fig. 1. Min and Max

Note that \( \min(F_1, F_2) \) is NOT necessarily convex.

If \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) is convex, then \( F( \Bx_0 + t \Bv ) \) is convex in \( t\,\forall t \in \mathbb{R}, \Bx_0 \in \mathbb{R}^n, \Bv \in \mathbb{R}^n \), provided \( \Bx_0 + t \Bv \in \textrm{dom} F \).

Idea: Restrict to a line (line segment) in \( \textrm{dom} F \). Take a cross section or slice through \( F \) along the line. If the result is a 1D convex function for all slices, then \( F \) is convex.

This is nice since it allows for checking for convexity, and is also nice numerically. Attempting to test a given data set for non-convexity with some random lines can help disprove convexity. However, to show that \( F \) is convex it is required to test all possible slices (which isn’t possible numerically, but is in some circumstances possible analytically).
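The line restriction test can be sketched as follows. The test function (a saddle-shaped quadratic), the two directions, and the sample grid are assumptions chosen for illustration. A detected violation disproves convexity, whereas the absence of one on sampled lines proves nothing.

```python
def restrict_to_line(F, x0, v):
    """Return h(t) = F(x0 + t v), the slice of F along a line."""
    return lambda t: F(tuple(a + t * b for a, b in zip(x0, v)))

def violates_midpoint_convexity(h, ts, tol=1e-12):
    """True if h((s + t)/2) > (h(s) + h(t))/2 for some sampled pair (s, t)."""
    return any(h(0.5 * (s + t)) > 0.5 * (h(s) + h(t)) + tol
               for s in ts for t in ts)

def F(p):
    """Hypothetical test function: x^2 + y^2 + 3 x y, which has a saddle."""
    x, y = p
    return x**2 + y**2 + 3.0 * x * y

ts = [0.5 * k for k in range(-4, 5)]   # -2, -1.5, ..., 2
along_diagonal = violates_midpoint_convexity(
    restrict_to_line(F, (0.0, 0.0), (1.0, 1.0)), ts)
along_antidiagonal = violates_midpoint_convexity(
    restrict_to_line(F, (0.0, 0.0), (1.0, -1.0)), ts)
```

The slice along \( (1,1) \) looks convex, but the slice along \( (1,-1) \) exposes a violation, so a single well chosen line is enough to rule out convexity of this function.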

Differentiable (convex) functions

Definition: First order condition.

If

\begin{equation*}
F : \mathbb{R}^n \rightarrow \mathbb{R}
\end{equation*}

is differentiable, then \( F \) is convex iff \( \textrm{dom} F \) is a convex set and \( \forall \Bx, \Bx_0 \in \textrm{dom} F \)

\begin{equation*}
F(\Bx) \ge F(\Bx_0) + \lr{\spacegrad F(\Bx_0)}^\T (\Bx - \Bx_0).
\end{equation*}

This is the first order Taylor expansion. If \( n = 1 \), this is \( F(x) \ge F(x_0) + F'(x_0) ( x - x_0) \).

The first order condition says a convex function \underline{always} lies above its first order approximation, as sketched in fig. 2.

 

fig. 2. First order approximation lies below convex function

When differentiable, the supporting plane is the tangent plane.
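A numeric illustration of the first order condition for \( n = 1 \) is sketched below. It evaluates the gap \( F(x) - F(x_0) - F'(x_0)(x - x_0) \) on a grid for the convex function \( e^x \), and, for contrast, for the non-convex function \( x^3 \). Both functions and the grid are choices made for this example.

```python
import math

def tangent_gap(F, dF, x, x0):
    """F(x) - [F(x0) + F'(x0)(x - x0)]; non-negative for convex F."""
    return F(x) - (F(x0) + dF(x0) * (x - x0))

grid = [0.25 * k for k in range(-12, 13)]   # -3, -2.75, ..., 3

# exp is convex: every tangent line lies on or below the function.
smallest_gap_exp = min(tangent_gap(math.exp, math.exp, x, x0)
                       for x in grid for x0 in grid)

# x^3 is not convex on the whole real line: some tangents cross above it.
smallest_gap_cubic = min(tangent_gap(lambda x: x**3, lambda x: 3.0 * x**2, x, x0)
                         for x in grid for x0 in grid)
```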

Definition: Second order condition

If \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) is twice differentiable, then \( F \) is convex iff \( \textrm{dom} F \) is a convex set and \( \spacegrad^2 F(\Bx) \ge 0 \,\forall \Bx \in \textrm{dom} F\).

The Hessian is always symmetric, but is not necessarily positive semi-definite. Recall that the Hessian is the matrix of the second order partials \( (\spacegrad^2 F)_{ij} = \partial^2 F/(\partial x_i \partial x_j) \).

The scalar case is \( F''(x) \ge 0 \, \forall x \in \textrm{dom} F \).

An implication is that if \( F \) is convex, then \( F(x) \ge F(x_0) + F'(x_0) (x - x_0) \,\forall x, x_0 \in \textrm{dom} F\).

Since \( F \) is convex, \( \textrm{dom} F \) is convex.

Consider any 2 points \( x, y \in \textrm{dom} F \), and \( \theta \in [0,1] \). Define

\begin{equation}\label{eqn:convexOptimizationLecture6:60}
z = (1-\theta) x + \theta y \in \textrm{dom} F,
\end{equation}

where \( z \in \textrm{dom} F \) since \( \textrm{dom} F \) is convex. Since \( F \) is convex

\begin{equation}\label{eqn:convexOptimizationLecture6:80}
F(z) =
F( (1-\theta) x + \theta y )
\le
(1-\theta) F(x) + \theta F(y )
\end{equation}

Reordering

\begin{equation}\label{eqn:convexOptimizationLecture6:220}
\theta F(y) \ge
\theta F(x) + F(z) - F(x),
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture6:100}
F(y) \ge
F(x) + \frac{F(x + \theta(y-x)) - F(x)}{\theta},
\end{equation}

which is, in the limit \( \theta \rightarrow 0 \),

\begin{equation}\label{eqn:convexOptimizationLecture6:120}
F(y) \ge
F(x) + F'(x) (y - x),
\end{equation}

completing one direction of the proof.

To prove the other direction, we show that

\begin{equation}\label{eqn:convexOptimizationLecture6:140}
F(x) \ge F(x_0) + F'(x_0) (x - x_0),
\end{equation}

implies that \( F \) is convex. Take any \( x, y \in \textrm{dom} F \) and any \( \theta \in [0,1] \). Define

\begin{equation}\label{eqn:convexOptimizationLecture6:160}
z = \theta x + (1 -\theta) y,
\end{equation}

which is in \( \textrm{dom} F \) by assumption. We want to show that

\begin{equation}\label{eqn:convexOptimizationLecture6:180}
F(z) \le \theta F(x) + (1-\theta) F(y).
\end{equation}

By assumption

  1. \( F(x) \ge F(z) + F'(z) (x - z) \)
  2. \( F(y) \ge F(z) + F'(z) (y - z) \)

Compute

\begin{equation}\label{eqn:convexOptimizationLecture6:200}
\begin{aligned}
\theta F(x) + (1-\theta) F(y)
&\ge
\theta \lr{ F(z) + F'(z) (x - z) }
+ (1-\theta) \lr{ F(z) + F'(z) (y - z) } \\
&=
F(z) + F'(z) \lr{ \theta( x - z) + (1-\theta) (y-z) } \\
&=
F(z) + F'(z) \lr{ \theta x + (1-\theta) y - \theta z - (1 -\theta) z } \\
&=
F(z) + F'(z) \lr{ \theta x + (1-\theta) y - z} \\
&=
F(z) + F'(z) \lr{ z - z} \\
&= F(z).
\end{aligned}
\end{equation}

Proof of the 2nd order case for \( n = 1 \)

Want to prove that if

\begin{equation}\label{eqn:convexOptimizationLecture6:240}
F : \mathbb{R} \rightarrow \mathbb{R}
\end{equation}

is a convex function, then \( F''(x) \ge 0 \,\forall x \in \textrm{dom} F \).

By the first order conditions \( \forall x \ne y \in \textrm{dom} F \)

\begin{equation}\label{eqn:convexOptimizationLecture6:260}
\begin{aligned}
F(y) &\ge F(x) + F'(x) (y - x) \\
F(x) &\ge F(y) + F'(y) (x - y)
\end{aligned}
\end{equation}

Can combine and get

\begin{equation}\label{eqn:convexOptimizationLecture6:280}
F'(x) (y-x) \le F(y) - F(x) \le F'(y)(y-x)
\end{equation}

Subtracting the two outer (derivative) terms, and dividing by \( (y - x)^2 > 0 \), gives

\begin{equation}\label{eqn:convexOptimizationLecture6:340}
\frac{(F'(y) - F'(x))(y - x)}{(y - x)^2} \ge 0,
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture6:300}
\frac{F'(y) - F'(x)}{y - x} \ge 0.
\end{equation}

In the limit as \( y \rightarrow x \), this is
\begin{equation}\label{eqn:convexOptimizationLecture6:320}
\boxed{
F''(x) \ge 0 \,\forall x \in \textrm{dom} F.
}
\end{equation}

Now prove the reverse condition:

If \( F''(x) \ge 0 \,\forall x \in \textrm{dom} F \subseteq \mathbb{R} \), then \( F : \mathbb{R} \rightarrow \mathbb{R} \) is convex.

Note that if \( F''(x) \ge 0 \), then \( F'(x) \) is non-decreasing in \( x \).

i.e. If \( x < y \), where \( x, y \in \textrm{dom} F\), then

\begin{equation}\label{eqn:convexOptimizationLecture6:360}
F'(x) \le F'(y).
\end{equation}

Consider any \( x,y \in \textrm{dom} F\) such that \( x < y \), where

\begin{equation}\label{eqn:convexOptimizationLecture6:380}
F(y) - F(x) = \int_x^y F'(t) dt \ge F'(x) \int_x^y 1 dt = F'(x) (y-x).
\end{equation}

This tells us that

\begin{equation}\label{eqn:convexOptimizationLecture6:400}
F(y) \ge F(x) + F'(x)(y - x),
\end{equation}

which is the first order condition. Similarly consider any \( x,y \in \textrm{dom} F\) such that \( x < y \), where

\begin{equation}\label{eqn:convexOptimizationLecture6:420}
F(y) - F(x) = \int_x^y F'(t) dt \le F'(y) \int_x^y 1 dt = F'(y) (y-x).
\end{equation}

This tells us that

\begin{equation}\label{eqn:convexOptimizationLecture6:440}
F(x) \ge F(y) + F'(y)(x - y).
\end{equation}
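
As an illustration of the second order condition (an example of my own choosing, not from the lecture), consider \( F(x) = -\log x \) on \( \textrm{dom} F = \setlr{ x | x > 0 } \), for which

\begin{equation*}
F'(x) = -\inv{x}, \qquad F''(x) = \inv{x^2} > 0.
\end{equation*}

The first order condition \( F(y) \ge F(x) + F'(x)(y - x) \) should then hold, and written out it is

\begin{equation*}
-\log y \ge -\log x - \frac{y - x}{x}
\quad\Leftrightarrow\quad
\log \frac{y}{x} \le \frac{y}{x} - 1,
\end{equation*}

which is the familiar bound \( \log u \le u - 1 \) for \( u > 0 \).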

Vector proof:

\( F \) is convex iff \( F(\Bx + t \Bv) \) is convex \( \forall \Bx,\Bv \in \mathbb{R}^n, t \in \mathbb{R} \), keeping \( \Bx + t \Bv \in \textrm{dom} F\).

Let
\begin{equation}\label{eqn:convexOptimizationLecture6:460}
h(t ; \Bx, \Bv) = F(\Bx + t \Bv)
\end{equation}

then \( h(t) \) satisfies scalar first and second order conditions for all \( \Bx, \Bv \).

\begin{equation}\label{eqn:convexOptimizationLecture6:480}
h(t) = F(\Bx + t \Bv) = F(g(t)),
\end{equation}

where \( g(t) = \Bx + t \Bv \), where

\begin{equation}\label{eqn:convexOptimizationLecture6:500}
\begin{aligned}
F &: \mathbb{R}^n \rightarrow \mathbb{R} \\
g &: \mathbb{R} \rightarrow \mathbb{R}^n.
\end{aligned}
\end{equation}

This is expressing \( h(t) \) as a composition of two functions. By the first order condition for scalar functions we know that

\begin{equation}\label{eqn:convexOptimizationLecture6:520}
h(t) \ge h(0) + h'(0) t.
\end{equation}

Note that

\begin{equation}\label{eqn:convexOptimizationLecture6:540}
h(0) = \evalbar{F(\Bx + t \Bv)}{t = 0} = F(\Bx).
\end{equation}

Let’s figure out what \( h'(0) \) is. Recall that for any \( \tilde{F} : \mathbb{R}^n \rightarrow \mathbb{R}^m \)

\begin{equation}\label{eqn:convexOptimizationLecture6:560}
D \tilde{F} \in \mathbb{R}^{m \times n},
\end{equation}

and
\begin{equation}\label{eqn:convexOptimizationLecture6:580}
{D \tilde{F}(\Bx)}_{ij} = \PD{x_j}{\tilde{F}_i(\Bx)}
\end{equation}

This is one function per row, for \( i \in [1,m], j \in [1,n] \). This gives

\begin{equation}\label{eqn:convexOptimizationLecture6:600}
\begin{aligned}
\frac{d}{dt} F(\Bx + \Bv t)
&=
\frac{d}{dt} F( g(t) ) \\
&=
\frac{d}{dt} h(t) \\
&= D h(t) \\
&= D F(g(t)) \cdot D g(t)
\end{aligned}
\end{equation}

The first matrix is in \( \mathbb{R}^{1\times n} \) whereas the second is in \( \mathbb{R}^{n\times 1} \), since \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) and \( g : \mathbb{R} \rightarrow \mathbb{R}^n \). This gives

\begin{equation}\label{eqn:convexOptimizationLecture6:620}
\frac{d}{dt} F(\Bx + \Bv t)
= \evalbar{D F(\tilde{\Bx})}{\tilde{\Bx} = g(t)} \cdot D g(t).
\end{equation}

That first matrix is

\begin{equation}\label{eqn:convexOptimizationLecture6:640}
\begin{aligned}
\evalbar{D F(\tilde{\Bx})}{\tilde{\Bx} = g(t)}
&=
\evalbar{
\lr{\begin{bmatrix}
\PD{\tilde{x}_1}{ F(\tilde{\Bx})} &
\PD{\tilde{x}_2}{ F(\tilde{\Bx})} & \cdots &
\PD{\tilde{x}_n}{ F(\tilde{\Bx})}
\end{bmatrix}
}}{ \tilde{\Bx} = g(t) = \Bx + t \Bv } \\
&=
\evalbar{
\lr{ \spacegrad F(\tilde{\Bx}) }^\T
}{
\tilde{\Bx} = g(t)
} \\
&=
\lr{ \spacegrad F(g(t)) }^\T.
\end{aligned}
\end{equation}

The second Jacobian is

\begin{equation}\label{eqn:convexOptimizationLecture6:660}
D g(t)
=
D
\begin{bmatrix}
g_1(t) \\
g_2(t) \\
\vdots \\
g_n(t) \\
\end{bmatrix}
=
D
\begin{bmatrix}
x_1 + t v_1 \\
x_2 + t v_2 \\
\vdots \\
x_n + t v_n \\
\end{bmatrix}
=
\begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n \\
\end{bmatrix}
=
\Bv.
\end{equation}

so

\begin{equation}\label{eqn:convexOptimizationLecture6:680}
h'(t) = D h(t) = \lr{ \spacegrad F(g(t))}^\T \Bv,
\end{equation}

and
\begin{equation}\label{eqn:convexOptimizationLecture6:700}
h'(0) = \lr{ \spacegrad F(g(0))}^\T \Bv
=
\lr{ \spacegrad F(\Bx)}^\T \Bv.
\end{equation}

Finally

\begin{equation}\label{eqn:convexOptimizationLecture6:720}
\begin{aligned}
F(\Bx + t \Bv)
&\ge h(0) + h'(0) t \\
&= F(\Bx) + \lr{ \spacegrad F(\Bx) }^\T (t \Bv) \\
&= F(\Bx) + \innerprod{ \spacegrad F(\Bx) }{ t \Bv}.
\end{aligned}
\end{equation}

Which is true for all \( \Bx, \Bx + t \Bv \in \textrm{dom} F \). Note that the quantity \( t \Bv \) is a shift.
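
A simple worked case (my own illustrative choice, not from the lecture) is \( F(\Bx) = \Bx^\T \Bx \), with \( \spacegrad F(\Bx) = 2 \Bx \). Here

\begin{equation*}
\begin{aligned}
h(t) &= \lr{ \Bx + t \Bv }^\T \lr{ \Bx + t \Bv } = \Bx^\T \Bx + 2 t \Bx^\T \Bv + t^2 \Bv^\T \Bv \\
h'(0) &= 2 \Bx^\T \Bv = \lr{ \spacegrad F(\Bx) }^\T \Bv,
\end{aligned}
\end{equation*}

so the gap in the first order inequality is

\begin{equation*}
F(\Bx + t \Bv) - F(\Bx) - \lr{ \spacegrad F(\Bx) }^\T (t \Bv) = t^2 \Bv^\T \Bv \ge 0,
\end{equation*}

consistent with the convexity of this \( F \).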

Epigraph

Recall that if \( (\Bx, t) \in \textrm{epi} F \) then \( t \ge F(\Bx) \).

\begin{equation}\label{eqn:convexOptimizationLecture6:740}
t \ge F(\Bx) \ge F(\Bx_0) + \lr{\spacegrad F(\Bx_0) }^\T (\Bx - \Bx_0),
\end{equation}

or

\begin{equation}\label{eqn:convexOptimizationLecture6:760}
0 \ge
-(t - F(\Bx_0)) + \lr{\spacegrad F(\Bx_0) }^\T (\Bx - \Bx_0).
\end{equation}

In block matrix form

\begin{equation}\label{eqn:convexOptimizationLecture6:780}
0 \ge
\begin{bmatrix}
\lr{ \spacegrad F(\Bx_0) }^\T & -1
\end{bmatrix}
\begin{bmatrix}
\Bx - \Bx_0 \\
t - F(\Bx_0)
\end{bmatrix}
\end{equation}

With \( \Bw =
\begin{bmatrix}
\lr{ \spacegrad F(\Bx_0) }^\T & -1
\end{bmatrix} \), the geometry of the epigraph relation to the half plane is sketched in fig. 3.

 

fig. 3. Half planes and epigraph.
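
For a concrete instance of this half space relation (a scalar example of my own, not from the lecture), take \( F(x) = x^2 \) and \( x_0 = 1 \), so that \( F(x_0) = 1 \) and \( F'(x_0) = 2 \). The block matrix inequality becomes

\begin{equation*}
0 \ge
\begin{bmatrix}
2 & -1
\end{bmatrix}
\begin{bmatrix}
x - 1 \\
t - 1
\end{bmatrix}
= 2 (x - 1) - (t - 1),
\end{equation*}

or \( t \ge 2 x - 1 \). Every point of the epigraph satisfies this, since \( t \ge x^2 \) and \( x^2 - (2 x - 1) = (x - 1)^2 \ge 0 \).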

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

ECE1505H Convex Optimization. Lecture 3: Matrix functions, SVD, and types of Sets. Taught by Prof. Stark Draper

January 19, 2017 ece1505


Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper.

Matrix inner product

Given real matrices \( X, Y \in \mathbb{R}^{m\times n} \), one possible matrix inner product definition is

\begin{equation}\label{eqn:convexOptimizationLecture3:20}
\begin{aligned}
\innerprod{X}{Y}
&= \textrm{Tr}( X^\T Y) \\
&= \textrm{Tr} \lr{ {\begin{bmatrix} \sum_{k = 1}^m X_{ki} Y_{kj} \end{bmatrix}}_{ij} } \\
&= \sum_{k = 1}^m \sum_{j = 1}^n X_{kj} Y_{kj} \\
&= \sum_{i = 1}^m \sum_{j = 1}^n X_{ij} Y_{ij}.
\end{aligned}
\end{equation}

This inner product induces a norm on the (matrix) vector space, called the Frobenius norm

\begin{equation}\label{eqn:convexOptimizationLecture3:40}
\begin{aligned}
\Norm{X }_F
&= \sqrt{ \innerprod{X}{X} } \\
&= \sqrt{ \textrm{Tr}( X^\T X) } \\
&=
\sqrt{ \sum_{i = 1}^m \sum_{j = 1}^n X_{ij}^2 }.
\end{aligned}
\end{equation}
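
As a small numerical illustration (matrices chosen arbitrarily, not from the lecture), take

\begin{equation*}
X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad
Y = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\end{equation*}

The trace and the elementwise forms of the inner product give the same value

\begin{equation*}
\innerprod{X}{Y}
= \textrm{Tr} \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}
= 5
= (1)(0) + (2)(1) + (3)(1) + (4)(0),
\end{equation*}

and the Frobenius norm of \( X \) is \( \Norm{X}_F = \sqrt{ 1 + 4 + 9 + 16 } = \sqrt{30} \).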

Range, nullspace.

Definition: Range: Given \( A \in \mathbb{R}^{m \times n} \), the range of A is the set:

\begin{equation*}
\mathcal{R}(A) = \setlr{ A \Bx | \Bx \in \mathbb{R}^n }.
\end{equation*}

Definition: Nullspace: Given \( A \in \mathbb{R}^{m \times n} \), the nullspace of A is the set:

\begin{equation*}
\mathcal{N}(A) = \setlr{ \Bx | A \Bx = 0 }.
\end{equation*}
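
For example (an illustrative matrix of my own choosing), with

\begin{equation*}
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},
\qquad
A \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \end{bmatrix},
\end{equation*}

the range and nullspace are the two coordinate axes

\begin{equation*}
\mathcal{R}(A) = \setlr{ \begin{bmatrix} s \\ 0 \end{bmatrix} | s \in \mathbb{R} },
\qquad
\mathcal{N}(A) = \setlr{ \begin{bmatrix} 0 \\ s \end{bmatrix} | s \in \mathbb{R} }.
\end{equation*}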

SVD.

To understand the operation of \( A \in \mathbb{R}^{m \times n} \), a representation of a linear transformation from \( \mathbb{R}^n \) to \( \mathbb{R}^m \), decompose \( A \) using the singular value decomposition (SVD).

Definition: SVD: Given \( A \in \mathbb{R}^{m \times n} \), an operator on \( \Bx \in \mathbb{R}^n \), a decomposition of the following form is always possible

\begin{equation*}
\begin{aligned}
A &= U \Sigma V^\T \\
U &\in \mathbb{R}^{m \times r} \\
V &\in \mathbb{R}^{n \times r},
\end{aligned}
\end{equation*}

where \( r \) is the rank of \(A\), and both \( U \) and \( V \) have orthonormal columns

\begin{equation*}
\begin{aligned}
U^\T U &= I \in \mathbb{R}^{r \times r} \\
V^\T V &= I \in \mathbb{R}^{r \times r}.
\end{aligned}
\end{equation*}

Here \( \Sigma = \textrm{diag}( \sigma_1, \sigma_2, \cdots, \sigma_r ) \), is a diagonal matrix of “singular” values, where

\begin{equation*}
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0.
\end{equation*}

For simplicity consider the square, full rank, case \( m = n = r \), for which \( V V^\T = I \) also holds

\begin{equation}\label{eqn:convexOptimizationLecture3:100}
A \Bx = \lr{ U \Sigma V^\T } \Bx.
\end{equation}

The first product \( V^\T \Bx \) is a rotation, which can be checked by looking at the length

\begin{equation}\label{eqn:convexOptimizationLecture3:120}
\begin{aligned}
\Norm{ V^\T \Bx}_2
&= \sqrt{ \Bx^\T V V^\T \Bx } \\
&= \sqrt{ \Bx^\T \Bx } \\
&= \Norm{ \Bx }_2,
\end{aligned}
\end{equation}

which shows that the length of the vector is unchanged after application of the linear transformation represented by \( V^\T \), so that operation must be a rotation (possibly combined with a reflection).

Similarly the operation of \( U \) on \( \Sigma V^\T \Bx \) also must be a rotation. The operation \( \Sigma = [\sigma_i]_i \) applies a scaling operation to each component of the vector \( V^\T \Bx \).

All linear (square) transformations can therefore be thought of as a rotate-scale-rotate operation. Often the \( A \) of interest will be symmetric \( A = A^\T \).
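
A hand-checkable example of this decomposition (my own choice of matrix, not from the lecture) is

\begin{equation*}
A
= \begin{bmatrix} 0 & 2 \\ 1 & 0 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}^\T
= U \Sigma V^\T.
\end{equation*}

Acting on \( \Bx = (x_1, x_2)^\T \), the factor \( V^\T \) exchanges the two coordinates (a length preserving orthogonal operation), \( \Sigma \) scales them by the singular values \( \sigma_1 = 2, \sigma_2 = 1 \), and \( U = I \) leaves the result alone

\begin{equation*}
\Bx \rightarrow \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}
\rightarrow \begin{bmatrix} 2 x_2 \\ x_1 \end{bmatrix}
= A \Bx.
\end{equation*}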

Set of symmetric matrices

Let \( S^n \) be the set of real, symmetric \( n \times n \) matrices.

Theorem: Spectral theorem: When \( A \in S^n \) then it is possible to factor \( A \) as

\begin{equation*}
A = Q \Lambda Q^\T,
\end{equation*}

where \( Q \) is an orthogonal matrix, and \( \Lambda = \textrm{diag}( \lambda_1, \lambda_2, \cdots \lambda_n)\). Here \( \lambda_i \in \mathbb{R} \, \forall i \) are the (real) eigenvalues of \( A \).

A real symmetric matrix \( A \in S^n\) is “positive semi-definite” if

\begin{equation*}
\Bv^\T A \Bv \ge 0 \qquad\forall \Bv \in \mathbb{R}^n, \Bv \ne 0,
\end{equation*}
and is “positive definite” if

\begin{equation*}
\Bv^\T A \Bv > 0 \qquad\forall \Bv \in \mathbb{R}^n, \Bv \ne 0.
\end{equation*}

The set of such matrices is denoted \( S^n_{+} \), and \( S^n_{++} \) respectively.

Consider \( A \in S^n_{+} \) (or \( S^n_{++} \) )

\begin{equation}\label{eqn:convexOptimizationLecture3:200}
A = Q \Lambda Q^\T,
\end{equation}

possible since the matrix is symmetric. For such a matrix

\begin{equation}\label{eqn:convexOptimizationLecture3:220}
\begin{aligned}
\Bv^\T A \Bv
&=
\Bv^\T Q \Lambda Q^\T \Bv \\
&=
\Bw^\T \Lambda \Bw,
\end{aligned}
\end{equation}

where \( \Bw = Q^\T \Bv \). Such a product is

\begin{equation}\label{eqn:convexOptimizationLecture3:240}
\Bv^\T A \Bv
=
\sum_{i = 1}^n \lambda_i w_i^2.
\end{equation}

So, if \( \lambda_i \ge 0 \) (\(\lambda_i > 0 \) ) then \( \sum_{i = 1}^n \lambda_i w_i^2 \) is non-negative (positive) \( \forall \Bw \in \mathbb{R}^n, \Bw \ne 0 \). Since \( \Bw \) is just a rotated version of \( \Bv \) this also holds for all \( \Bv \). A necessary and sufficient condition for \( A \in S^n_{+} \) (\( S^n_{++} \) ) is \( \lambda_i \ge 0 \) (\(\lambda_i > 0\)).
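
To see this eigenvalue criterion at work (an illustrative matrix of my own choosing), consider

\begin{equation*}
A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},
\qquad
Q = \inv{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},
\qquad
\Lambda = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.
\end{equation*}

With \( \Bw = Q^\T \Bv \), that is \( w_1 = (v_1 + v_2)/\sqrt{2} \) and \( w_2 = (v_1 - v_2)/\sqrt{2} \), the quadratic form is

\begin{equation*}
\Bv^\T A \Bv
= 2 v_1^2 + 2 v_1 v_2 + 2 v_2^2
= \frac{3}{2} (v_1 + v_2)^2 + \inv{2} (v_1 - v_2)^2
= 3 w_1^2 + w_2^2,
\end{equation*}

which is strictly positive for \( \Bv \ne 0 \), so this \( A \in S^n_{++} \), in agreement with both eigenvalues being positive.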

Square root of positive semi-definite matrix

Real symmetric matrix power relationships such as

\begin{equation}\label{eqn:convexOptimizationLecture3:260}
A^2
=
Q \Lambda Q^\T
Q \Lambda Q^\T
=
Q \Lambda^2
Q^\T
,
\end{equation}

or more generally \( A^k = Q \Lambda^k Q^\T,\, k \in \mathbb{Z} \), can be further generalized to non-integral powers. In particular, the square root (non-unique) of a positive semi-definite matrix can be written

\begin{equation}\label{eqn:convexOptimizationLecture3:280}
A^{1/2} = Q
\begin{bmatrix}
\sqrt{\lambda_1} & & & \\
& \sqrt{\lambda_2} & & \\
& & \ddots & \\
& & & \sqrt{\lambda_n} \\
\end{bmatrix}
Q^\T,
\end{equation}

since \( A^{1/2} A^{1/2} = A \), regardless of the sign picked for the square roots in question.
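
As a worked instance (my own example, picking the positive roots), the matrix \( A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \) has eigenvalues \( 3, 1 \) with \( Q = \inv{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \), so

\begin{equation*}
A^{1/2}
= Q
\begin{bmatrix}
\sqrt{3} & 0 \\
0 & 1
\end{bmatrix}
Q^\T
=
\inv{2}
\begin{bmatrix}
\sqrt{3} + 1 & \sqrt{3} - 1 \\
\sqrt{3} - 1 & \sqrt{3} + 1
\end{bmatrix}.
\end{equation*}

Squaring this confirms the result: the diagonal entries are \( \lr{ (\sqrt{3} + 1)^2 + (\sqrt{3} - 1)^2 }/4 = 2 \), and the off diagonal entries are \( 2 (\sqrt{3} + 1)(\sqrt{3} - 1)/4 = 1 \).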

Functions of matrices

Consider \( F : S^n \rightarrow \mathbb{R} \), and define

\begin{equation}\label{eqn:convexOptimizationLecture3:300}
F(X) = \log \det X,
\end{equation}

Here \( \textrm{dom} F = S^n_{++} \). The task is to find \( \spacegrad F \), which can be done by looking at the perturbation \( \log \det ( X + \Delta X ) \)

\begin{equation}\label{eqn:convexOptimizationLecture3:320}
\begin{aligned}
\log \det ( X + \Delta X )
&=
\log \det ( X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2}) X^{1/2} ) \\
&=
\log \det ( X (I + X^{-1/2} \Delta X X^{-1/2}) ) \\
&=
\log \det X + \log \det (I + X^{-1/2} \Delta X X^{-1/2}).
\end{aligned}
\end{equation}

Let \( M = X^{-1/2} \Delta X X^{-1/2} \), with eigenvalues \( \lambda_i \), so that \( M \Bv = \lambda_i \Bv \) when \( \Bv \) is the corresponding eigenvector of \( M \). In particular

\begin{equation}\label{eqn:convexOptimizationLecture3:340}
(I + M) \Bv =
(1 + \lambda_i) \Bv,
\end{equation}

where \( 1 + \lambda_i \) are the eigenvalues of the \( I + M \) matrix. Since the determinant is the product of the eigenvalues, this gives

\begin{equation}\label{eqn:convexOptimizationLecture3:360}
\begin{aligned}
\log \det ( X + \Delta X )
&=
\log \det X +
\log \prod_{i = 1}^n (1 + \lambda_i) \\
&=
\log \det X +
\sum_{i = 1}^n \log (1 + \lambda_i).
\end{aligned}
\end{equation}

If \( \lambda_i \) are sufficiently “small”, then \( \log ( 1 + \lambda_i ) \approx \lambda_i \), giving

\begin{equation}\label{eqn:convexOptimizationLecture3:380}
\log \det ( X + \Delta X )
\approx
\log \det X +
\sum_{i = 1}^n \lambda_i
=
\log \det X +
\textrm{Tr}( X^{-1/2} \Delta X X^{-1/2} ).
\end{equation}

Since
\begin{equation}\label{eqn:convexOptimizationLecture3:400}
\textrm{Tr}( A B ) = \textrm{Tr}( B A ),
\end{equation}

this trace operation can be written as

\begin{equation}\label{eqn:convexOptimizationLecture3:420}
\log \det ( X + \Delta X )
\approx
\log \det X +
\textrm{Tr}( X^{-1} \Delta X )
=
\log \det X +
\innerprod{ X^{-1}}{\Delta X},
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:440}
\spacegrad F(X) = X^{-1}.
\end{equation}

To check this, consider the simplest example with \( X \in \mathbb{R}^{1 \times 1} \), where we have

\begin{equation}\label{eqn:convexOptimizationLecture3:460}
\frac{d}{dX} \lr{ \log \det X } = \frac{d}{dX} \lr{ \log X } = \inv{X} = X^{-1}.
\end{equation}

This is a nice example demonstrating how the gradient can be obtained by performing a first order perturbation of the function. The gradient can then be read off from the result.
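
A slightly less trivial check (a sketch of my own, restricting both the matrix and the perturbation to be diagonal) uses \( X = \textrm{diag}(x_1, x_2) \) with \( x_1, x_2 > 0 \), and \( \Delta X = \textrm{diag}(\delta_1, \delta_2) \). Then

\begin{equation*}
\begin{aligned}
\log \det ( X + \Delta X )
&= \log (x_1 + \delta_1) + \log (x_2 + \delta_2) \\
&\approx \log x_1 + \log x_2 + \frac{\delta_1}{x_1} + \frac{\delta_2}{x_2} \\
&= \log \det X + \textrm{Tr}( X^{-1} \Delta X ),
\end{aligned}
\end{equation*}

which has the same first order structure as the general result.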

Second order perturbations

  • To get the first order approximation, we found the part that varied linearly in \( \Delta X \).
  • To get the second order part, perturb \( X^{-1} \) by \( \Delta X \) and see how that perturbation varies in \( \Delta X \).

For \( G(X) = X^{-1} \), this is

\begin{equation}\label{eqn:convexOptimizationLecture3:480}
\begin{aligned}
(X + \Delta X)^{-1}
&=
\lr{ X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2} ) X^{1/2} }^{-1} \\
&=
X^{-1/2} (I + X^{-1/2} \Delta X X^{-1/2} )^{-1} X^{-1/2}
\end{aligned}
\end{equation}

To be proven in the homework (for “small” A)

\begin{equation}\label{eqn:convexOptimizationLecture3:500}
(I + A)^{-1} \approx I - A.
\end{equation}

This gives

\begin{equation}\label{eqn:convexOptimizationLecture3:520}
\begin{aligned}
(X + \Delta X)^{-1}
&\approx
X^{-1/2} (I - X^{-1/2} \Delta X X^{-1/2} ) X^{-1/2} \\
&=
X^{-1} - X^{-1} \Delta X X^{-1},
\end{aligned}
\end{equation}

or

\begin{equation}\label{eqn:convexOptimizationLecture3:800}
\begin{aligned}
G(X + \Delta X)
&= G(X) + (D G) \Delta X \\
&= G(X) + (\spacegrad G)^\T \Delta X,
\end{aligned}
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:820}
(\spacegrad G)^\T \Delta X
=
- X^{-1} \Delta X X^{-1}.
\end{equation}
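
The small \( A \) inverse approximation used above can be made plausible with a short computation (a sketch, not the homework proof). Since \( (I + A)(I - A) = I - A^2 \), left multiplication by \( (I + A)^{-1} \) gives

\begin{equation*}
(I + A)^{-1} = I - A + (I + A)^{-1} A^2,
\end{equation*}

so the error is second order in \( A \). In the scalar case \( X = x \), \( \Delta X = \delta \), the perturbation result reduces to the familiar

\begin{equation*}
\inv{x + \delta} \approx \inv{x} - \frac{\delta}{x^2}.
\end{equation*}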

The Taylor expansion of \( F \) to second order is

\begin{equation}\label{eqn:convexOptimizationLecture3:840}
F(X + \Delta X)
=
F(X)
+
\textrm{Tr} \lr{ (\spacegrad F)^\T \Delta X}
+
\inv{2}
\lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X}.
\end{equation}

The first trace can be expressed as an inner product

\begin{equation}\label{eqn:convexOptimizationLecture3:860}
\begin{aligned}
\textrm{Tr} \lr{ (\spacegrad F)^\T \Delta X}
&=
\innerprod{ \spacegrad F }{\Delta X} \\
&=
\innerprod{ X^{-1} }{\Delta X}.
\end{aligned}
\end{equation}

The second trace also has the structure of an inner product

\begin{equation}\label{eqn:convexOptimizationLecture3:880}
\begin{aligned}
(\Delta X)^\T (\spacegrad^2 F) \Delta X
&=
\textrm{Tr} \lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X} \\
&=
\innerprod{ (\spacegrad^2 F)^\T \Delta X }{\Delta X},
\end{aligned}
\end{equation}

where a no-op trace could be inserted in the second order term since that quadratic form is already a scalar. This \( (\spacegrad^2 F)^\T \Delta X \) term has essentially been found implicitly by performing the linear variation of \( \spacegrad F \) in \( \Delta X \), showing that we must have

\begin{equation}\label{eqn:convexOptimizationLecture3:900}
\textrm{Tr} \lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X}
=
\innerprod{ - X^{-1} \Delta X X^{-1} }{\Delta X},
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:560}
F( X + \Delta X) = F(X) +
\innerprod{X^{-1}}{\Delta X}
+\inv{2} \innerprod{-X^{-1} \Delta X X^{-1}}{\Delta X},
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture3:580}
\log \det ( X + \Delta X) = \log \det X +
\textrm{Tr}( X^{-1} \Delta X )
- \inv{2} \textrm{Tr}( X^{-1} \Delta X X^{-1} \Delta X ).
\end{equation}
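
As a check of this second order result (my own scalar specialization), let \( X = x > 0 \) and \( \Delta X = \delta \). The traces are then just products, and the expansion should reduce to the second order Taylor series of the logarithm

\begin{equation*}
\log ( x + \delta )
= \log x + \log \lr{ 1 + \frac{\delta}{x} }
\approx \log x + \frac{\delta}{x} - \inv{2} \frac{\delta^2}{x^2},
\end{equation*}

which it does, term by term.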

Convex Sets

  • Types of sets: Affine, convex, cones
  • Examples: Hyperplanes, polyhedra, balls, ellipses, norm balls, cone of PSD matrices.

Definition: Affine set:

A set \( C \subseteq \mathbb{R}^n \) is affine if \( \forall \Bx_1, \Bx_2 \in C \) then

\begin{equation*}
\theta \Bx_1 + (1 -\theta) \Bx_2 \in C, \qquad \forall \theta \in \mathbb{R}.
\end{equation*}

The affine sum above can be rewritten as

\begin{equation}\label{eqn:convexOptimizationLecture3:600}
\Bx_2 + \theta (\Bx_1 - \Bx_2).
\end{equation}

Since \( \theta \) is a scaling, this is the line containing \( \Bx_2 \) in the direction between \( \Bx_1 \) and \( \Bx_2 \).

Observe that the solution to a set of linear equations

\begin{equation}\label{eqn:convexOptimizationLecture3:620}
C = \setlr{ \Bx | A \Bx = \Bb },
\end{equation}

is an affine set. To check, note that

\begin{equation}\label{eqn:convexOptimizationLecture3:640}
\begin{aligned}
A (\theta \Bx_1 + (1 - \theta) \Bx_2)
&=
\theta A \Bx_1 + (1 - \theta) A \Bx_2 \\
&=
\theta \Bb + (1 - \theta) \Bb \\
&= \Bb.
\end{aligned}
\end{equation}
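
For a concrete instance (numbers of my own choosing), let \( A = \begin{bmatrix} 1 & 1 \end{bmatrix} \) and \( \Bb = 1 \), so that \( C \) is the line \( x_1 + x_2 = 1 \). The points \( \Bx_1 = (1, 0)^\T \) and \( \Bx_2 = (0, 1)^\T \) both lie in \( C \), and with \( \theta = 3 \), a value outside of \( [0,1] \),

\begin{equation*}
\theta \Bx_1 + (1 - \theta) \Bx_2
= 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix}
- 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 3 \\ -2 \end{bmatrix},
\qquad
A \begin{bmatrix} 3 \\ -2 \end{bmatrix} = 1,
\end{equation*}

so the combination remains on the line.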

Definition: Affine combination: An affine combination of points \( \Bx_1, \Bx_2, \cdots \Bx_n \) is

\begin{equation*}
\sum_{i = 1}^n \theta_i \Bx_i,
\end{equation*}

such that for \( \theta_i \in \mathbb{R} \)

\begin{equation*}
\sum_{i = 1}^n \theta_i = 1.
\end{equation*}

An affine set contains all affine combinations of points in the set. Examples of a couple of affine sets are sketched in fig. 1.1.

For comparison, a couple of non-affine sets are sketched in fig. 1.2.

 

Definition: Convex set: A set \( C \subseteq \mathbb{R}^n \) is convex if \( \forall \Bx_1, \Bx_2 \in C \) and \( \forall \theta \in [0,1] \), the combination

\begin{equation}\label{eqn:convexOptimizationLecture3:700}
\theta \Bx_1 + (1 - \theta) \Bx_2 \in C.
\end{equation}

Definition: Convex combination: A convex combination of \( \Bx_1, \Bx_2, \cdots \Bx_n \) is

\begin{equation*}
\sum_{i = 1}^n \theta_i \Bx_i,
\end{equation*}

such that \( \theta_i \ge 0 \) for all \( i \), and

\begin{equation*}
\sum_{i = 1}^n \theta_i = 1
\end{equation*}

Definition: Convex hull: The convex hull of a set \( C \) is the set of all convex combinations of points in \(C\), denoted

\begin{equation}\label{eqn:convexOptimizationLecture3:720}
\textrm{conv}(C) = \setlr{ \sum_{i=1}^n \theta_i \Bx_i | \Bx_i \in C, \theta_i \ge 0, \sum_{i=1}^n \theta_i = 1 }.
\end{equation}

A non-convex set can be converted into a convex hull by filling in all the combinations of points connecting points in the set, as sketched in fig 1.3.
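
As a small example (my own, using a three point set), let \( C = \setlr{ (0,0)^\T, (1,0)^\T, (0,1)^\T } \), which is clearly not convex. Its convex hull is

\begin{equation*}
\textrm{conv}(C)
= \setlr{ \theta_1 \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \theta_2 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \theta_3 \begin{bmatrix} 0 \\ 1 \end{bmatrix} | \theta_i \ge 0, \sum_{i=1}^3 \theta_i = 1 }
= \setlr{ \begin{bmatrix} \theta_2 \\ \theta_3 \end{bmatrix} | \theta_2, \theta_3 \ge 0, \theta_2 + \theta_3 \le 1 },
\end{equation*}

which is the filled triangle with those three points as vertices.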

Definition: Cone: A set \(C\) is a cone if \( \forall \Bx \in C \) and \( \forall \theta \ge 0 \) we have \( \theta \Bx \in C\).

This scales out if \(\theta > 1\) and scales in if \(\theta < 1\).

A convex cone is a cone that is also a convex set. A conic combination is

\begin{equation*}
\sum_{i=1}^n \theta_i \Bx_i, \theta_i \ge 0.
\end{equation*}
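
Two simple examples in \( \mathbb{R}^2 \) (my own, for illustration) show the difference between a cone and a convex cone

\begin{equation*}
\begin{aligned}
C_1 &= \setlr{ \Bx | x_1 \ge 0, x_2 \ge 0 } \\
C_2 &= \setlr{ \Bx | x_1 \ge 0, x_2 \ge 0, x_1 x_2 = 0 }.
\end{aligned}
\end{equation*}

Both sets are closed under non-negative scaling, so both are cones. The quadrant \( C_1 \) is also convex, whereas \( C_2 \), the union of the two non-negative coordinate axes, is not: \( (1,0)^\T \) and \( (0,1)^\T \) are in \( C_2 \) but their midpoint \( (1/2, 1/2)^\T \) is not.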

A convex and a non-convex 2D cone are sketched in fig. 1.4.

A comparison of properties for different set types is tabulated in table 1.1.

Hyperplanes and half spaces

Definition: Hyperplane: A hyperplane is defined by

\begin{equation*}
\setlr{ \Bx | \Ba^\T \Bx = b, \Ba \ne 0 }.
\end{equation*}

A line and plane are examples of this general construct as sketched in fig. 1.5.

An alternate view is possible should one find any specific \( \Bx_0 \) such that \( \Ba^\T \Bx_0 = b \)

\begin{equation}\label{eqn:convexOptimizationLecture3:740}
\setlr{\Bx | \Ba^\T \Bx = b }
=
\setlr{\Bx | \Ba^\T (\Bx -\Bx_0) = 0 }
\end{equation}

This shows that \( \Bx - \Bx_0 = \Ba^\perp \) is perpendicular to \( \Ba \), or

\begin{equation}\label{eqn:convexOptimizationLecture3:780}
\Bx
=
\Bx_0 + \Ba^\perp.
\end{equation}

This is the subspace perpendicular to \( \Ba \) shifted by \(\Bx_0\), subject to \( \Ba^\T \Bx_0 = b \). As a set

\begin{equation}\label{eqn:convexOptimizationLecture3:760}
\Ba^\perp = \setlr{ \Bv | \Ba^\T \Bv = 0 }.
\end{equation}
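
For example (numbers of my own choosing), in \( \mathbb{R}^2 \) with \( \Ba = (1, 1)^\T \) and the scalar right hand side equal to 1, the point \( \Bx_0 = (1, 0)^\T \) satisfies \( \Ba^\T \Bx_0 = 1 \). The perpendicular subspace is spanned by \( (1, -1)^\T \), so the hyperplane is the line

\begin{equation*}
\Bx = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} 1 \\ -1 \end{bmatrix},
\qquad s \in \mathbb{R},
\end{equation*}

and indeed \( \Ba^\T \Bx = (1 + s) - s = 1 \) for every \( s \).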

Half space

Definition: Half space: The half space is defined as
\begin{equation*}
\setlr{ \Bx | \Ba^\T \Bx \le b }
= \setlr{ \Bx | \Ba^\T (\Bx - \Bx_0) \le 0 }.
\end{equation*}

This can also be expressed as \( \setlr{ \Bx | \innerprod{ \Ba }{\Bx – \Bx_0 } \le 0 } \).

Jacobian and Hessian matrices

January 15, 2017 ece1505


Motivation

In class this Friday the Jacobian and Hessian matrices were introduced, but I did not find the treatment terribly clear. Here is an alternate treatment, beginning with the gradient construction from [2], which uses a nice trick to frame the multivariable derivative operation as a single variable Taylor expansion.

Multivariable Taylor approximation

The Taylor series expansion for a scalar function \( g : {\mathbb{R}} \rightarrow {\mathbb{R}} \) about the origin is just

\begin{equation}\label{eqn:jacobianAndHessian:20}
g(t) = g(0) + t g'(0) + \frac{t^2}{2} g''(0) + \cdots
\end{equation}

In particular

\begin{equation}\label{eqn:jacobianAndHessian:40}
g(1) = g(0) + g'(0) + \frac{1}{2} g''(0) + \cdots
\end{equation}

Now consider \( g(t) = f( \Bx + \Ba t ) \), where \( f : {\mathbb{R}}^n \rightarrow {\mathbb{R}} \), \( g(0) = f(\Bx) \), and \( g(1) = f(\Bx + \Ba) \). The multivariable Taylor expansion now follows directly

\begin{equation}\label{eqn:jacobianAndHessian:60}
f( \Bx + \Ba)
= f(\Bx)
+ \evalbar{\frac{df(\Bx + \Ba t)}{dt}}{t = 0} + \frac{1}{2} \evalbar{\frac{d^2f(\Bx + \Ba t)}{dt^2}}{t = 0} + \cdots
\end{equation}

The first order term is

\begin{equation}\label{eqn:jacobianAndHessian:80}
\begin{aligned}
\evalbar{\frac{df(\Bx + \Ba t)}{dt}}{t = 0}
&=
\sum_{i = 1}^n
\frac{d( x_i + a_i t)}{dt}
\evalbar{\PD{(x_i + a_i t)}{f(\Bx + \Ba t)}}{t = 0} \\
&=
\sum_{i = 1}^n
a_i
\PD{x_i}{f(\Bx)} \\
&= \Ba \cdot \spacegrad f.
\end{aligned}
\end{equation}

Similarly, for the second order term

\begin{equation}\label{eqn:jacobianAndHessian:100}
\begin{aligned}
\evalbar{\frac{d^2 f(\Bx + \Ba t)}{dt^2}}{t = 0}
&=
\evalbar{\lr{
\frac{d}{dt}
\lr{
\sum_{i = 1}^n
a_i
\PD{(x_i + a_i t)}{f(\Bx + \Ba t)}
}
}
}{t = 0} \\
&=
\evalbar{
\lr{
\sum_{j = 1}^n
\frac{d(x_j + a_j t)}{dt}
\sum_{i = 1}^n
a_i
\frac{\partial^2 f(\Bx + \Ba t)}{\partial (x_j + a_j t) \partial (x_i + a_i t) }
}
}{t = 0} \\
&=
\sum_{i,j = 1}^n a_i a_j \frac{\partial^2 f}{\partial x_i \partial x_j} \\
&=
(\Ba \cdot \spacegrad)^2 f.
\end{aligned}
\end{equation}

The complete Taylor expansion of a scalar function \( f : {\mathbb{R}}^n \rightarrow {\mathbb{R}} \) is therefore

\begin{equation}\label{eqn:jacobianAndHessian:120}
f(\Bx + \Ba)
= f(\Bx) +
\Ba \cdot \spacegrad f +
\inv{2} \lr{ \Ba \cdot \spacegrad}^2 f + \cdots,
\end{equation}

so the Taylor expansion has an exponential structure

\begin{equation}\label{eqn:jacobianAndHessian:140}
f(\Bx + \Ba) = \sum_{k = 0}^\infty \inv{k!} \lr{ \Ba \cdot \spacegrad}^k f = e^{\Ba \cdot \spacegrad} f.
\end{equation}
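
As a check of this expansion (a polynomial example of my own, chosen so that the series terminates), let \( f(\Bx) = x_1^2 x_2 \). The directional derivative terms are

\begin{equation*}
\begin{aligned}
\Ba \cdot \spacegrad f &= 2 x_1 x_2 a_1 + x_1^2 a_2 \\
\inv{2} \lr{ \Ba \cdot \spacegrad}^2 f &= x_2 a_1^2 + 2 x_1 a_1 a_2 \\
\inv{3!} \lr{ \Ba \cdot \spacegrad}^3 f &= a_1^2 a_2,
\end{aligned}
\end{equation*}

with all higher order terms zero. Summing these with \( f(\Bx) \) reproduces the direct expansion

\begin{equation*}
f(\Bx + \Ba) = (x_1 + a_1)^2 (x_2 + a_2)
= x_1^2 x_2 + 2 x_1 x_2 a_1 + x_1^2 a_2 + x_2 a_1^2 + 2 x_1 a_1 a_2 + a_1^2 a_2.
\end{equation*}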

Should an approximation of a vector valued function \( \Bf : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m \) be desired it is only required to form a matrix of the components

\begin{equation}\label{eqn:jacobianAndHessian:160}
\Bf(\Bx + \Ba)
= \Bf(\Bx) +
[\Ba \cdot \spacegrad f_i]_i +
\inv{2} [\lr{ \Ba \cdot \spacegrad}^2 f_i]_i + \cdots,
\end{equation}

where \( [.]_i \) denotes a column vector over the rows \( i \in [1,m] \), and \( f_i \) are the coordinates of \( \Bf \).

The Jacobian matrix

In [1] the Jacobian \( D \Bf \) of a function \( \Bf : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m \) is defined in terms of the limit of the \( l_2 \) norm ratio

\begin{equation}\label{eqn:jacobianAndHessian:180}
\frac{\Norm{\Bf(\Bz) - \Bf(\Bx) - (D \Bf) (\Bz - \Bx)}_2 }{ \Norm{\Bz - \Bx}_2 },
\end{equation}

with the statement that the function \( \Bf \) has a derivative if this limit exists. Here the Jacobian \( D \Bf \in {\mathbb{R}}^{m \times n} \) must be matrix valued.

Let \( \Bz = \Bx + \Ba \), so the first order expansion of \ref{eqn:jacobianAndHessian:160} is

\begin{equation}\label{eqn:jacobianAndHessian:200}
\Bf(\Bz)
= \Bf(\Bx) + [\lr{ \Bz - \Bx } \cdot \spacegrad f_i]_i
.
\end{equation}

With the (unproven) assumption that this Taylor expansion satisfies the norm limit criterion of \ref{eqn:jacobianAndHessian:180}, it is possible to extract the structure of the Jacobian by comparison

\begin{equation}\label{eqn:jacobianAndHessian:220}
\begin{aligned}
(D \Bf)
(\Bz - \Bx)
&=
{\begin{bmatrix}
\lr{ \Bz - \Bx } \cdot \spacegrad f_i
\end{bmatrix}}_i \\
&=
{\begin{bmatrix}
\sum_{j = 1}^n (z_j - x_j) \PD{x_j}{f_i}
\end{bmatrix}}_i \\
&=
{\begin{bmatrix}
\PD{x_j}{f_i}
\end{bmatrix}}_{ij}
(\Bz - \Bx),
\end{aligned}
\end{equation}

so
\begin{equation}\label{eqn:jacobianAndHessian:240}
\boxed{
(D \Bf)_{ij} = \PD{x_j}{f_i}
}
\end{equation}

Written out explicitly as a matrix the Jacobian is

\begin{equation}\label{eqn:jacobianAndHessian:320}
D \Bf
=
\begin{bmatrix}
\PD{x_1}{f_1} & \PD{x_2}{f_1} & \cdots & \PD{x_n}{f_1} \\
\PD{x_1}{f_2} & \PD{x_2}{f_2} & \cdots & \PD{x_n}{f_2} \\
\vdots & \vdots & & \vdots \\
\PD{x_1}{f_m} & \PD{x_2}{f_m} & \cdots & \PD{x_n}{f_m} \\
\end{bmatrix}
=
\begin{bmatrix}
(\spacegrad f_1)^\T \\
(\spacegrad f_2)^\T \\
\vdots \\
(\spacegrad f_m)^\T
\end{bmatrix}.
\end{equation}

In particular, when the function is scalar valued
\begin{equation}\label{eqn:jacobianAndHessian:261}
D f = (\spacegrad f)^\T.
\end{equation}

With this notation, the first order Taylor expansion, in terms of the Jacobian matrix is

\begin{equation}\label{eqn:jacobianAndHessian:260}
\boxed{
\Bf(\Bz)
\approx \Bf(\Bx) + (D \Bf) \lr{ \Bz - \Bx }.
}
\end{equation}
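
As an illustration (a function of my own choosing, not from the text), consider \( \Bf : {\mathbb{R}}^2 \rightarrow {\mathbb{R}}^2 \) with components \( f_1 = x_1 x_2 \) and \( f_2 = x_1 + x_2^2 \), for which

\begin{equation*}
D \Bf
=
\begin{bmatrix}
x_2 & x_1 \\
1 & 2 x_2
\end{bmatrix}.
\end{equation*}

Expanding about \( \Bx = (1, 1)^\T \) with \( \Bz - \Bx = (a_1, a_2)^\T \), the linear approximation and the exact value are

\begin{equation*}
\Bf(\Bx) + (D \Bf) \lr{ \Bz - \Bx }
=
\begin{bmatrix}
1 + a_1 + a_2 \\
2 + a_1 + 2 a_2
\end{bmatrix},
\qquad
\Bf(\Bz)
=
\begin{bmatrix}
1 + a_1 + a_2 + a_1 a_2 \\
2 + a_1 + 2 a_2 + a_2^2
\end{bmatrix},
\end{equation*}

which differ only by terms that are second order in \( \Ba \).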

The Hessian matrix

For scalar valued functions, the text expresses the second order expansion of a function in terms of the Jacobian and Hessian matrices

\begin{equation}\label{eqn:jacobianAndHessian:271}
f(\Bz)
\approx f(\Bx) + (D f) \lr{ \Bz - \Bx }
+ \inv{2} \lr{ \Bz - \Bx }^\T (\spacegrad^2 f) \lr{ \Bz - \Bx }.
\end{equation}

Because \( \spacegrad^2 \) is the usual notation for a Laplacian operator, this \( \spacegrad^2 f \in {\mathbb{R}}^{n \times n}\) notation for the Hessian matrix is not ideal in my opinion. Ignoring that notational objection for this class, the structure of the Hessian matrix can be extracted by comparison with the coordinate expansion

\begin{equation}\label{eqn:jacobianAndHessian:300}
\Ba^\T (\spacegrad^2 f) \Ba
=
\sum_{r,s = 1}^n a_r a_s \frac{\partial^2 f}{\partial x_r \partial x_s}
\end{equation}

so
\begin{equation}\label{eqn:jacobianAndHessian:280}
\boxed{
(\spacegrad^2 f)_{ij}
=
\frac{\partial^2 f}{\partial x_i \partial x_j}.
}
\end{equation}

In explicit matrix form the Hessian is

\begin{equation}\label{eqn:jacobianAndHessian:340}
\spacegrad^2 f
=
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_n \partial x_n}
\end{bmatrix}.
\end{equation}
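
For a concrete case (my own example), the function \( f(\Bx) = x_1^2 x_2 \) on \( {\mathbb{R}}^2 \) has the Hessian

\begin{equation*}
\spacegrad^2 f
=
\begin{bmatrix}
2 x_2 & 2 x_1 \\
2 x_1 & 0
\end{bmatrix},
\end{equation*}

and the associated quadratic form matches the coordinate expansion of the second order directional derivative

\begin{equation*}
\Ba^\T (\spacegrad^2 f) \Ba
= 2 x_2 a_1^2 + 4 x_1 a_1 a_2
= \lr{ \Ba \cdot \spacegrad}^2 f.
\end{equation*}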

Is there a similar nice matrix structure for the Hessian of a function \( f : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m \)?

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

[2] D. Hestenes. New Foundations for Classical Mechanics. Kluwer Academic Publishers, 1999.

A comparison of Geometric Algebra electrodynamic potential methods

January 7, 2017 math and physics play


Motivation

Geometric algebra (GA) allows for a compact description of Maxwell’s equations in either an explicit 3D representation or a STA (SpaceTime Algebra [2]) representation. The 3D GA and STA representations of Maxwell’s equation both have the form

\begin{equation}\label{eqn:potentialMethods:1280}
L \boldsymbol{\mathcal{F}} = J,
\end{equation}

where \( J \) represents the sources, \( L \) is a multivector gradient operator that includes partial derivative operator components for each of the space and time coordinates, and

\begin{equation}\label{eqn:potentialMethods:1020}
\boldsymbol{\mathcal{F}} = \boldsymbol{\mathcal{E}} + \eta I \boldsymbol{\mathcal{H}},
\end{equation}

is an electromagnetic field multivector, \( I = \Be_1 \Be_2 \Be_3 \) is the \( \mathbb{R}^3 \) pseudoscalar, and \( \eta = \sqrt{\mu/\epsilon} \) is the impedance of the medium.

When Maxwell’s equations are extended to include magnetic sources in addition to conventional electric sources (as used in antenna-theory [1] and microwave engineering [3]), they take the form

\begin{equation}\label{eqn:chapter3Notes:20}
\spacegrad \cross \boldsymbol{\mathcal{E}} = - \boldsymbol{\mathcal{M}} - \PD{t}{\boldsymbol{\mathcal{B}}}
\end{equation}
\begin{equation}\label{eqn:chapter3Notes:40}
\spacegrad \cross \boldsymbol{\mathcal{H}} = \boldsymbol{\mathcal{J}} + \PD{t}{\boldsymbol{\mathcal{D}}}
\end{equation}
\begin{equation}\label{eqn:chapter3Notes:60}
\spacegrad \cdot \boldsymbol{\mathcal{D}} = q_{\textrm{e}}
\end{equation}
\begin{equation}\label{eqn:chapter3Notes:80}
\spacegrad \cdot \boldsymbol{\mathcal{B}} = q_{\textrm{m}}.
\end{equation}

The corresponding GA Maxwell equations in their respective 3D and STA forms are

\begin{equation}\label{eqn:potentialMethods:300}
\lr{ \spacegrad + \inv{v} \PD{t}{} } \boldsymbol{\mathcal{F}}
=
\eta
\lr{ v q_{\textrm{e}} - \boldsymbol{\mathcal{J}} }
+ I \lr{ v q_{\textrm{m}} - \boldsymbol{\mathcal{M}} }
\end{equation}
\begin{equation}\label{eqn:potentialMethods:320}
\grad \boldsymbol{\mathcal{F}} = \eta J - I M,
\end{equation}

where the wave group velocity in the medium is \( v = 1/\sqrt{\epsilon\mu} \), and the medium is isotropic with
\( \boldsymbol{\mathcal{B}} = \mu \boldsymbol{\mathcal{H}} \), and \( \boldsymbol{\mathcal{D}} = \epsilon \boldsymbol{\mathcal{E}} \). In the STA representation, \( \grad, J, M \) are all four-vectors, the specific meanings of which will be spelled out below.
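
As a consistency check of the 3D form \ref{eqn:potentialMethods:300} (a sketch of my own, assuming constant \( \epsilon, \mu \), and using \( \spacegrad \Ba = \spacegrad \cdot \Ba + I \spacegrad \cross \Ba \) for any vector field \( \Ba \), together with \( I^2 = -1 \)), the left hand side expands to

\begin{equation*}
\lr{ \spacegrad + \inv{v} \PD{t}{} } \boldsymbol{\mathcal{F}}
=
\spacegrad \cdot \boldsymbol{\mathcal{E}}
+ \lr{ \inv{v} \PD{t}{\boldsymbol{\mathcal{E}}} - \eta \spacegrad \cross \boldsymbol{\mathcal{H}} }
+ I \lr{ \spacegrad \cross \boldsymbol{\mathcal{E}} + \frac{\eta}{v} \PD{t}{\boldsymbol{\mathcal{H}}} }
+ \eta I \spacegrad \cdot \boldsymbol{\mathcal{H}}.
\end{equation*}

Equating the scalar, vector, bivector and pseudoscalar grades with those of the source multivector gives

\begin{equation*}
\begin{aligned}
\spacegrad \cdot \boldsymbol{\mathcal{E}} &= \eta v q_{\textrm{e}} \\
\inv{v} \PD{t}{\boldsymbol{\mathcal{E}}} - \eta \spacegrad \cross \boldsymbol{\mathcal{H}} &= -\eta \boldsymbol{\mathcal{J}} \\
\spacegrad \cross \boldsymbol{\mathcal{E}} + \frac{\eta}{v} \PD{t}{\boldsymbol{\mathcal{H}}} &= -\boldsymbol{\mathcal{M}} \\
\eta \spacegrad \cdot \boldsymbol{\mathcal{H}} &= v q_{\textrm{m}}.
\end{aligned}
\end{equation*}

Since \( \eta v = 1/\epsilon \) and \( \eta/v = \mu \), these are the four equations \ref{eqn:chapter3Notes:20} through \ref{eqn:chapter3Notes:80}, once \( \boldsymbol{\mathcal{D}} = \epsilon \boldsymbol{\mathcal{E}} \) and \( \boldsymbol{\mathcal{B}} = \mu \boldsymbol{\mathcal{H}} \) are substituted.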

How to determine the potential equations and the field representation using the conventional distinct Maxwell’s equations \ref{eqn:chapter3Notes:20}, … is well known. The basic procedure is to consider the electric and magnetic sources in turn, and observe that in each case one of the electric or magnetic fields must have a curl representation. The STA approach is similar, except that it can be observed that the field must have a four-curl representation for each type of source. In the explicit 3D GA formalism \ref{eqn:potentialMethods:300} how to formulate a natural potential representation is not as obvious. There is no longer any reason to set any component of the field equal to a curl, and the representation of the four curl from the STA approach is awkward. Additionally, it is not obvious what form gauge invariance takes in the 3D GA representation.

Ideas explored in these notes

  • GA representation of Maxwell’s equations including magnetic sources.
  • STA GA formalism for Maxwell’s equations including magnetic sources.
  • Explicit form of the GA potential representation including both electric and magnetic sources.
  • Demonstration of exactly how the 3D and STA potentials are related.
  • Explore the structure of gauge transformations when magnetic sources are included.
  • Explore the structure of gauge transformations in the 3D GA formalism.
  • Specify the form of the Lorenz gauge in the 3D GA formalism.

Traditional vector algebra

No magnetic sources

When magnetic sources are omitted, it follows from \ref{eqn:chapter3Notes:80} that there is some \( \boldsymbol{\mathcal{A}}^{\mathrm{e}} \) for which

\begin{equation}\label{eqn:potentialMethods:20}
\boxed{
\boldsymbol{\mathcal{B}} = \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}},
}
\end{equation}

Substitution into Faraday’s law \ref{eqn:chapter3Notes:20} gives

\begin{equation}\label{eqn:potentialMethods:40}
\spacegrad \cross \boldsymbol{\mathcal{E}} = - \PD{t}{}\lr{ \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}} },
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:60}
\spacegrad \cross \lr{ \boldsymbol{\mathcal{E}} + \PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } = 0.
\end{equation}

A gradient representation of this curled quantity, say \( -\spacegrad \phi \), will provide the required zero

\begin{equation}\label{eqn:potentialMethods:80}
\boxed{
\boldsymbol{\mathcal{E}} = -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }.
}
\end{equation}

The final two Maxwell equations yield

\begin{equation}\label{eqn:potentialMethods:100}
\begin{aligned}
-\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \spacegrad \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} } &= \mu \lr{ \boldsymbol{\mathcal{J}} + \epsilon \PD{t}{} \lr{ -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } } \\
\spacegrad \cdot \lr{ -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } &= q_e/\epsilon,
\end{aligned}
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:120}
\boxed{
\begin{aligned}
\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{e}} - \inv{v^2} \PDSq{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \spacegrad \lr{
\inv{v^2} \PD{t}{\phi}
+\spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}}
}
&= -\mu \boldsymbol{\mathcal{J}} \\
\spacegrad^2 \phi + \PD{t}{} \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} } &= -q_e/\epsilon.
\end{aligned}
}
\end{equation}

Note that the Lorenz condition \( \PDi{t}{(\phi/v^2)} + \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} = 0 \) can be imposed to decouple these, leaving non-homogeneous wave equations for the vector and scalar potentials respectively.
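As a quick check of that decoupling (a sketch that only substitutes the condition above into \ref{eqn:potentialMethods:120}), the gradient term in the first equation vanishes, and \( \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} = -\PDi{t}{(\phi/v^2)} \) can be eliminated from the second, giving

\begin{equation}
\begin{aligned}
\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{e}} - \inv{v^2} \PDSq{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } &= -\mu \boldsymbol{\mathcal{J}} \\
\spacegrad^2 \phi - \inv{v^2} \PDSq{t}{ \phi } &= -q_e/\epsilon.
\end{aligned}
\end{equation}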

No electric sources

Without electric sources, a curl representation of the electric field can be assumed, satisfying Gauss’s law

\begin{equation}\label{eqn:potentialMethods:140}
\boxed{
\boldsymbol{\mathcal{D}} = - \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}}.
}
\end{equation}

Substitution into the Ampere-Maxwell law \ref{eqn:chapter3Notes:40}, with \( \boldsymbol{\mathcal{J}} = 0 \), gives
\begin{equation}\label{eqn:potentialMethods:160}
\spacegrad \cross \lr{ \boldsymbol{\mathcal{H}} + \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}} } = 0.
\end{equation}

This is satisfied with any gradient, say, \( -\spacegrad \phi_m \), providing a potential representation for the magnetic field

\begin{equation}\label{eqn:potentialMethods:180}
\boxed{
\boldsymbol{\mathcal{H}} = -\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}.
}
\end{equation}

The remaining Maxwell equations provide the required constraints on the potentials

\begin{equation}\label{eqn:potentialMethods:220}
-\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{m}} + \spacegrad \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } = -\epsilon
\lr{
-\boldsymbol{\mathcal{M}} - \mu \PD{t}{}
\lr{
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}
}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:240}
\spacegrad \cdot
\lr{
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}
= \inv{\mu} q_m,
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:260}
\boxed{
\begin{aligned}
\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{m}} - \inv{v^2} \PDSq{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}} - \spacegrad \lr{ \inv{v^2} \PD{t}{\phi_m} + \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } &= -\epsilon \boldsymbol{\mathcal{M}} \\
\spacegrad^2 \phi_m + \PD{t}{}\lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } &= -\inv{\mu} q_m.
\end{aligned}
}
\end{equation}

The general solution to Maxwell’s equations is therefore
\begin{equation}\label{eqn:potentialMethods:280}
\begin{aligned}
\boldsymbol{\mathcal{E}} &=
-\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \inv{\epsilon} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
\boldsymbol{\mathcal{H}} &=
\inv{\mu} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}}
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}},
\end{aligned}
\end{equation}

subject to the constraints \ref{eqn:potentialMethods:120} and \ref{eqn:potentialMethods:260}.

Potential operator structure

Knowing that there is a simple underlying structure to the potential representation of the electromagnetic field in the STA formalism inspires the question of whether that structure can be found directly using the scalar and vector potentials determined above.

Specifically, what is the multivector representation \ref{eqn:potentialMethods:1020} of the electromagnetic field in terms of all the individual potential variables, and can an underlying structure for that field representation be found? The composite field is

\begin{equation}\label{eqn:potentialMethods:280b}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
-\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \inv{\epsilon} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&\quad + I \eta
\lr{
\inv{\mu} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}}
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}.
\end{aligned}
\end{equation}

Can this be factored into a multivector operator and a multivector potential? Expanding the cross products provides some direction

\begin{equation}\label{eqn:potentialMethods:1040}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
- \PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \eta \PD{t}{I \boldsymbol{\mathcal{A}}^{\mathrm{m}}}
- \spacegrad \lr{ \phi + \eta I \phi_m } \\
&\quad + \frac{\eta}{2 \mu} \lr{ \rspacegrad \boldsymbol{\mathcal{A}}^{\mathrm{e}} - \boldsymbol{\mathcal{A}}^{\mathrm{e}} \lspacegrad }
+ \frac{1}{2 \epsilon} \lr{ \rspacegrad I \boldsymbol{\mathcal{A}}^{\mathrm{m}} - I \boldsymbol{\mathcal{A}}^{\mathrm{m}} \lspacegrad }.
\end{aligned}
\end{equation}
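The bivector terms in this expansion follow from two standard 3D identities, stated here for a generic vector \( \mathbf{a} \) (a placeholder symbol, not one used elsewhere in these notes)

\begin{equation}
\begin{aligned}
\spacegrad \cross \mathbf{a} &= -I \lr{ \spacegrad \wedge \mathbf{a} } \\
\spacegrad \wedge \mathbf{a} &= \inv{2} \lr{ \rspacegrad \mathbf{a} - \mathbf{a} \lspacegrad }.
\end{aligned}
\end{equation}

Since the 3D pseudoscalar \( I \) commutes with all vectors and \( I^2 = -1 \), these give

\begin{equation}
\begin{aligned}
\frac{I \eta}{\mu} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}}
&= \frac{\eta}{\mu} \spacegrad \wedge \boldsymbol{\mathcal{A}}^{\mathrm{e}}
= \frac{\eta}{2 \mu} \lr{ \rspacegrad \boldsymbol{\mathcal{A}}^{\mathrm{e}} - \boldsymbol{\mathcal{A}}^{\mathrm{e}} \lspacegrad } \\
-\inv{\epsilon} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}}
&= \inv{\epsilon} I \lr{ \spacegrad \wedge \boldsymbol{\mathcal{A}}^{\mathrm{m}} }
= \inv{2 \epsilon} \lr{ \rspacegrad I \boldsymbol{\mathcal{A}}^{\mathrm{m}} - I \boldsymbol{\mathcal{A}}^{\mathrm{m}} \lspacegrad }.
\end{aligned}
\end{equation}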

Since \( \eta/\mu = v \) and \( 1/\epsilon = \eta v \), the gradient and the time partials can be grouped together

\begin{equation}\label{eqn:potentialMethods:1060}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
- \PD{t}{ } \lr{\boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \boldsymbol{\mathcal{A}}^{\mathrm{m}}}
- \spacegrad \lr{ \phi + \eta I \phi_m }
+ \frac{v}{2} \lr{ \rspacegrad (\boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \boldsymbol{\mathcal{A}}^{\mathrm{m}}) - (\boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \boldsymbol{\mathcal{A}}^{\mathrm{m}}) \lspacegrad } \\
&=
\inv{2} \lr{
\lr{ \rspacegrad - \inv{v} {\stackrel{ \rightarrow }{\partial_t}} } \lr{ v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}} }
-
\lr{ v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}} \lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
} \\
&\quad + \inv{2} \lr{
\lr{ \rspacegrad - \inv{v} {\stackrel{ \rightarrow }{\partial_t}} } \lr{ -\phi - \eta I \phi_m }
- \lr{ \phi + \eta I \phi_m } \lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
}
,
\end{aligned}
\end{equation}

or

\begin{equation}\label{eqn:potentialMethods:1080}
\boxed{
\boldsymbol{\mathcal{F}}
=
\inv{2} \Biglr{
\lr{ \rspacegrad - \inv{v} {\stackrel{ \rightarrow }{\partial_t}} }
\lr{
- \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
- \eta I \phi_m
}
-
\lr{
\phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
+ \eta I \phi_m
}
\lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
}
.
}
\end{equation}

There’s a conjugate structure to the potential on each side of the curl operation where we see a sign change for the scalar and pseudoscalar elements only. The reason for this becomes more clear in the STA formalism.

Potentials in the STA formalism.

Maxwell’s equation in its explicit 3D form \ref{eqn:potentialMethods:300} can be converted to STA form by introducing a four-vector (Dirac) basis \( \setlr{ \gamma_\mu } \), in terms of which the spatial basis is \( \setlr{ \Be_k = \gamma_k \gamma_0 } \). Multiplying from the left with \( \gamma_0 \) gives the STA form of Maxwell’s equation \ref{eqn:potentialMethods:320}, where
\begin{equation}\label{eqn:potentialMethods:340}
\begin{aligned}
J &= \gamma^\mu J_\mu = ( v q_e, \boldsymbol{\mathcal{J}} ) \\
M &= \gamma^\mu M_\mu = ( v q_m, \boldsymbol{\mathcal{M}} ) \\
\grad &= \gamma^\mu \partial_\mu = ( (1/v) \partial_t, \spacegrad ) \\
I &= \gamma_0 \gamma_1 \gamma_2 \gamma_3,
\end{aligned}
\end{equation}

Here the metric choice is \( \gamma_0^2 = 1 = -\gamma_k^2 \). Note that in this representation the electromagnetic field \( \boldsymbol{\mathcal{F}} = \boldsymbol{\mathcal{E}} + \eta I \boldsymbol{\mathcal{H}} \) is a bivector, not a multivector as it is in the explicit (frame dependent) 3D representation of \ref{eqn:potentialMethods:300}.

A potential representation can be obtained as before by considering electric and magnetic sources in sequence and using superposition to assemble a complete potential.

No magnetic sources

Without magnetic sources, Maxwell’s equation splits into vector and trivector terms of the form

\begin{equation}\label{eqn:potentialMethods:380}
\grad \cdot \boldsymbol{\mathcal{F}} = \eta J
\end{equation}
\begin{equation}\label{eqn:potentialMethods:400}
\grad \wedge \boldsymbol{\mathcal{F}} = 0.
\end{equation}

A four-vector curl representation of the field will satisfy \ref{eqn:potentialMethods:400} allowing an immediate potential solution

\begin{equation}\label{eqn:potentialMethods:560}
\boxed{
\begin{aligned}
&\boldsymbol{\mathcal{F}} = \grad \wedge {A^{\mathrm{e}}} \\
&\grad^2 {A^{\mathrm{e}}} - \grad \lr{ \grad \cdot {A^{\mathrm{e}}} } = \eta J.
\end{aligned}
}
\end{equation}

This can be put into correspondence with \ref{eqn:potentialMethods:120} by noting that

\begin{equation}\label{eqn:potentialMethods:460}
\begin{aligned}
\grad^2 &= (\gamma^\mu \partial_\mu) \cdot (\gamma^\nu \partial_\nu) = \inv{v^2} \partial_{tt} - \spacegrad^2 \\
\gamma_0 {A^{\mathrm{e}}} &= \gamma_0 \gamma^\mu {A^{\mathrm{e}}}_\mu = {A^{\mathrm{e}}}_0 + \Be_k {A^{\mathrm{e}}}_k = {A^{\mathrm{e}}}_0 + \BA^{\mathrm{e}} \\
\gamma_0 \grad &= \gamma_0 \gamma^\mu \partial_\mu = \inv{v} \partial_t + \spacegrad \\
\grad \cdot {A^{\mathrm{e}}} &= \partial_\mu {A^{\mathrm{e}}}^\mu = \inv{v} \partial_t {A^{\mathrm{e}}}_0 - \spacegrad \cdot \BA^{\mathrm{e}},
\end{aligned}
\end{equation}

so multiplying from the left with \( \gamma_0 \) gives

\begin{equation}\label{eqn:potentialMethods:480}
\lr{ \inv{v^2} \partial_{tt} - \spacegrad^2 } \lr{ {A^{\mathrm{e}}}_0 + \BA^{\mathrm{e}} } - \lr{ \inv{v} \partial_t + \spacegrad }\lr{ \inv{v} \partial_t {A^{\mathrm{e}}}_0 - \spacegrad \cdot \BA^{\mathrm{e}} } = \eta( v q_e - \boldsymbol{\mathcal{J}} ),
\end{equation}

or

\begin{equation}\label{eqn:potentialMethods:520}
\lr{ \inv{v^2} \partial_{tt} - \spacegrad^2 } \BA^{\mathrm{e}} - \spacegrad \lr{ \inv{v} \partial_t {A^{\mathrm{e}}}_0 - \spacegrad \cdot \BA^{\mathrm{e}} } = -\eta \boldsymbol{\mathcal{J}}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:540}
\spacegrad^2 {A^{\mathrm{e}}}_0 - \inv{v} \partial_t \lr{ \spacegrad \cdot \BA^{\mathrm{e}} } = -q_e/\epsilon.
\end{equation}

So \( {A^{\mathrm{e}}}_0 = \phi \) and \( -\ifrac{\BA^{\mathrm{e}}}{v} = \boldsymbol{\mathcal{A}}^{\mathrm{e}} \), or

\begin{equation}\label{eqn:potentialMethods:600}
\boxed{
{A^{\mathrm{e}}} = \gamma_0\lr{ \phi - v \boldsymbol{\mathcal{A}}^{\mathrm{e}} }.
}
\end{equation}
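As a check on this identification (a sketch, assuming \( \eta = \sqrt{\mu/\epsilon} \) so that \( \eta/v = \mu \)), substituting \( {A^{\mathrm{e}}}_0 = \phi \) and \( \BA^{\mathrm{e}} = -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} \) into \ref{eqn:potentialMethods:520} and dividing by \( -v \) gives

\begin{equation}
\lr{ \inv{v^2} \partial_{tt} - \spacegrad^2 } \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \spacegrad \lr{ \inv{v^2} \partial_t \phi + \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
= \frac{\eta}{v} \boldsymbol{\mathcal{J}}
= \mu \boldsymbol{\mathcal{J}},
\end{equation}

which is the first equation of \ref{eqn:potentialMethods:120} with both sides negated. The same substitution into \ref{eqn:potentialMethods:540} gives

\begin{equation}
\spacegrad^2 \phi + \partial_t \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} } = -q_e/\epsilon,
\end{equation}

matching the second.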

No electric sources

Without electric sources, Maxwell’s equation now splits into

\begin{equation}\label{eqn:potentialMethods:640}
\grad \cdot \boldsymbol{\mathcal{F}} = 0
\end{equation}
\begin{equation}\label{eqn:potentialMethods:660}
\grad \wedge \boldsymbol{\mathcal{F}} = -I M.
\end{equation}

Here the dual of an STA curl yields a solution

\begin{equation}\label{eqn:potentialMethods:680}
\boxed{
\boldsymbol{\mathcal{F}} = I ( \grad \wedge {A^{\mathrm{m}}} ).
}
\end{equation}

Substituting this gives

\begin{equation}\label{eqn:potentialMethods:720}
\begin{aligned}
0
&=
\grad \cdot (I ( \grad \wedge {A^{\mathrm{m}}} ) ) \\
&=
\gpgradeone{ \grad I ( \grad \wedge {A^{\mathrm{m}}} ) } \\
&=
-I \grad \wedge ( \grad \wedge {A^{\mathrm{m}}} ).
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:740}
\begin{aligned}
-I M
&=
\grad \wedge (I ( \grad \wedge {A^{\mathrm{m}}} ) ) \\
&=
\gpgradethree{ \grad I ( \grad \wedge {A^{\mathrm{m}}} ) } \\
&=
-I \grad \cdot ( \grad \wedge {A^{\mathrm{m}}} ).
\end{aligned}
\end{equation}

The \( \grad \cdot \boldsymbol{\mathcal{F}} \) relation \ref{eqn:potentialMethods:720} is identically zero as desired, leaving

\begin{equation}\label{eqn:potentialMethods:760}
\boxed{
\grad^2 {A^{\mathrm{m}}} - \grad \lr{ \grad \cdot {A^{\mathrm{m}}} }
=
M.
}
\end{equation}

So the general solution with both electric and magnetic sources is

\begin{equation}\label{eqn:potentialMethods:800}
\boxed{
\boldsymbol{\mathcal{F}} = \grad \wedge {A^{\mathrm{e}}} + I (\grad \wedge {A^{\mathrm{m}}}),
}
\end{equation}

subject to the constraints of \ref{eqn:potentialMethods:560} and \ref{eqn:potentialMethods:760}. As before the four-potential \( {A^{\mathrm{m}}} \) can be put into correspondence with the conventional scalar and vector potentials by left multiplying with \( \gamma_0 \), which gives

\begin{equation}\label{eqn:potentialMethods:820}
\lr{ \inv{v^2} \partial_{tt} - \spacegrad^2 } \lr{ {A^{\mathrm{m}}}_0 + \BA^{\mathrm{m}} } - \lr{ \inv{v} \partial_t + \spacegrad }\lr{ \inv{v} \partial_t {A^{\mathrm{m}}}_0 - \spacegrad \cdot \BA^{\mathrm{m}} } = v q_m - \boldsymbol{\mathcal{M}},
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:860}
\lr{ \inv{v^2} \partial_{tt} - \spacegrad^2 } \BA^{\mathrm{m}} - \spacegrad \lr{ \inv{v} \partial_t {A^{\mathrm{m}}}_0 - \spacegrad \cdot \BA^{\mathrm{m}} } = - \boldsymbol{\mathcal{M}}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:880}
\spacegrad^2 {A^{\mathrm{m}}}_0 - \inv{v} \partial_t \spacegrad \cdot \BA^{\mathrm{m}} = -v q_m.
\end{equation}

Comparing with \ref{eqn:potentialMethods:260} shows that \( {A^{\mathrm{m}}}_0/v = \mu \phi_m \) and \( -\ifrac{\BA^{\mathrm{m}}}{v^2} = \mu \boldsymbol{\mathcal{A}}^{\mathrm{m}} \), or

\begin{equation}\label{eqn:potentialMethods:900}
\boxed{
{A^{\mathrm{m}}} = \gamma_0 \eta \lr{ \phi_m - v \boldsymbol{\mathcal{A}}^{\mathrm{m}} }.
}
\end{equation}
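The comparison can be spelled out as follows (a sketch, again assuming \( \eta = \sqrt{\mu/\epsilon} \), so that \( v \mu = \eta \) and \( 1/(v \eta) = \epsilon \)). Substituting \( {A^{\mathrm{m}}}_0 = \eta \phi_m \) and \( \BA^{\mathrm{m}} = -v \eta \boldsymbol{\mathcal{A}}^{\mathrm{m}} \) into \ref{eqn:potentialMethods:880} and \ref{eqn:potentialMethods:860} gives

\begin{equation}
\begin{aligned}
\eta \lr{ \spacegrad^2 \phi_m + \partial_t \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } &= -v q_m \\
-v \eta \lr{
\lr{ \inv{v^2} \partial_{tt} - \spacegrad^2 } \boldsymbol{\mathcal{A}}^{\mathrm{m}}
+ \spacegrad \lr{ \inv{v^2} \partial_t \phi_m + \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} }
} &= - \boldsymbol{\mathcal{M}}.
\end{aligned}
\end{equation}

Dividing the first by \( \eta \) and the second by \( v \eta \) recovers the two equations of \ref{eqn:potentialMethods:260}, with right hand sides \( -q_m/\mu \) and \( -\epsilon \boldsymbol{\mathcal{M}} \) respectively.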

Potential operator structure

Observe that there is an underlying uniform structure of the differential operator that acts on the potential to produce the electromagnetic field. Expressed as a linear operator of the gradient and the potentials, \( \boldsymbol{\mathcal{F}} = L(\lrgrad, {A^{\mathrm{e}}}, {A^{\mathrm{m}}}) \), that is

\begin{equation}\label{eqn:potentialMethods:980}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
L(\grad, {A^{\mathrm{e}}}, {A^{\mathrm{m}}}) \\
&= \grad \wedge {A^{\mathrm{e}}} + I (\grad \wedge {A^{\mathrm{m}}}) \\
&=
\inv{2} \lr{ \rgrad {A^{\mathrm{e}}} - {A^{\mathrm{e}}} \lgrad }
+ \frac{I}{2} \lr{ \rgrad {A^{\mathrm{m}}} - {A^{\mathrm{m}}} \lgrad } \\
&=
\inv{2} \lr{ \rgrad {A^{\mathrm{e}}} - {A^{\mathrm{e}}} \lgrad }
+ \frac{1}{2} \lr{ -\rgrad I {A^{\mathrm{m}}} - I {A^{\mathrm{m}}} \lgrad } \\
&=
\inv{2} \lr{ \rgrad ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}}) - ({A^{\mathrm{e}}} + I {A^{\mathrm{m}}}) \lgrad }
,
\end{aligned}
\end{equation}

or
\begin{equation}\label{eqn:potentialMethods:1000}
\boxed{
\boldsymbol{\mathcal{F}}
=
\inv{2} \lr{ \rgrad ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}}) - ({A^{\mathrm{e}}} - I {A^{\mathrm{m}}})^\dagger \lgrad }
.
}
\end{equation}
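The dagger here is assumed to denote reversion. A sketch of why the reverse reproduces the sign change between the two potential factors: four-vectors are unchanged by reversion, the STA pseudoscalar satisfies \( I^\dagger = I \), and \( I \) anticommutes with four-vectors, so

\begin{equation}
\begin{aligned}
\lr{ I {A^{\mathrm{m}}} }^\dagger &= {A^{\mathrm{m}}} I = -I {A^{\mathrm{m}}} \\
\lr{ {A^{\mathrm{e}}} - I {A^{\mathrm{m}}} }^\dagger &= {A^{\mathrm{e}}} + I {A^{\mathrm{m}}}.
\end{aligned}
\end{equation}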

Observe that \ref{eqn:potentialMethods:1000} can be
put into correspondence with \ref{eqn:potentialMethods:1080} using a factoring of unity \( 1 = \gamma_0 \gamma_0 \)

\begin{equation}\label{eqn:potentialMethods:1100}
\boldsymbol{\mathcal{F}}
=
\inv{2} \lr{ (-\rgrad \gamma_0) (-\gamma_0 ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}})) - (({A^{\mathrm{e}}} + I {A^{\mathrm{m}}}) \gamma_0)(\gamma_0 \lgrad) },
\end{equation}

where

\begin{equation}\label{eqn:potentialMethods:1140}
\begin{aligned}
-\grad \gamma_0
&=
-(\gamma^0 \partial_0 + \gamma^k \partial_k) \gamma_0 \\
&=
-\partial_0 - \gamma^k \gamma_0 \partial_k \\
&=
\spacegrad
-\inv{v} \partial_t
,
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:1160}
\begin{aligned}
\gamma_0 \grad
&=
\gamma_0 (\gamma^0 \partial_0 + \gamma^k \partial_k) \\
&=
\partial_0 - \gamma^k \gamma_0 \partial_k \\
&=
\spacegrad
+ \inv{v} \partial_t
,
\end{aligned}
\end{equation}

and
\begin{equation}\label{eqn:potentialMethods:1200}
\begin{aligned}
-\gamma_0 ( {A^{\mathrm{e}}} - I {A^{\mathrm{m}}} )
&=
-\gamma_0 \gamma_0 \lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ \phi_m - v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \\
&=
-\lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \phi_m - \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}} } \\
&=
- \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}
- \eta I \phi_m
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:potentialMethods:1220}
\begin{aligned}
( {A^{\mathrm{e}}} + I {A^{\mathrm{m}}} )\gamma_0
&=
\lr{ \gamma_0 \lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} } + I \gamma_0 \eta \lr{ \phi_m - v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \gamma_0 \\
&=
\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \phi_m + I \eta v \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&=
\phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}
+ \eta I \phi_m
,
\end{aligned}
\end{equation}

This recovers \ref{eqn:potentialMethods:1080} as desired.

Potentials in the 3D Euclidean formalism

In the conventional scalar plus vector differential representation of Maxwell’s equations \ref{eqn:chapter3Notes:20}…, given electric(magnetic) sources the structure of the electric(magnetic) potential follows from first setting the magnetic(electric) field equal to the curl of a vector potential. The procedure for the STA GA form of Maxwell’s equation was similar, where it was immediately evident that the field could be set to the four-curl of a four-vector potential (or the dual of such a curl for magnetic sources).

In the 3D GA representation, there is no immediate rationale for introducing a curl or the equivalent to a four-curl representation of the field. Reconciliation of this is possible by recognizing that the fact that the field (or a component of it) may be represented by a curl is not actually fundamental. Instead, observe that the two sided gradient action on a potential to generate the electromagnetic field in the STA representation of \ref{eqn:potentialMethods:1000} serves to select the grade two component of the product of the gradient and the multivector potential \( {A^{\mathrm{e}}} - I {A^{\mathrm{m}}} \), and that this can in fact be written as a single sided gradient operation on a potential, provided the multivector product is filtered with a four-bivector grade selection operation

\begin{equation}\label{eqn:potentialMethods:1240}
\boxed{
\boldsymbol{\mathcal{F}} = \gpgradetwo{ \grad \lr{ {A^{\mathrm{e}}} - I {A^{\mathrm{m}}} } }.
}
\end{equation}

Similarly, it can be observed that the
specific function of the conjugate structure in the two sided potential representation of
\ref{eqn:potentialMethods:1080}
is to discard all the scalar and pseudoscalar grades in the multivector product. This means that a single sided potential can also be used, provided it is wrapped in a grade selection operation

\begin{equation}\label{eqn:potentialMethods:1260}
\boxed{
\boldsymbol{\mathcal{F}} =
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} }
\lr{
- \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
- \eta I \phi_m
} }{1,2}.
}
\end{equation}

It is this grade selection operation that is really the fundamental defining action in the potential of the STA and conventional 3D representations of Maxwell’s equations. So, given Maxwell’s equation in the 3D GA representation, defining a potential representation for the field is really just a demand that the field have the structure

\begin{equation}\label{eqn:potentialMethods:1320}
\boldsymbol{\mathcal{F}} = \gpgrade{ (\alpha \spacegrad + \beta \partial_t)( A_0 + A_1 + I( A_0' + A_1' ) ) }{1,2}.
\end{equation}

This is a mandate that the electromagnetic field is the grades 1 and 2 components of the product of a space and time derivative operator with a multivector field \( A = \sum_{k=0}^3 A_k = A_0 + A_1 + I( A_0' + A_1' ) \) that can potentially have any grade components. There are more degrees of freedom in this specification than required, since the multivector can absorb one of the \( \alpha \) or \( \beta \) coefficients, so without loss of generality, one of these (say \( \alpha\)) can be set to 1.

Expanding \ref{eqn:potentialMethods:1320} gives

\begin{equation}\label{eqn:potentialMethods:1340}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
\spacegrad A_0
+ \beta \partial_t A_1
- \spacegrad \cross A_1'
+ I (\spacegrad \cross A_1
+ \beta \partial_t A_1'
+ \spacegrad A_0') \\
&=
\boldsymbol{\mathcal{E}} + I \eta \boldsymbol{\mathcal{H}}.
\end{aligned}
\end{equation}

This naturally has all the right mixes of curls, gradients and time derivatives, all following as direct consequences of applying a grade selection operation to the action of a “spacetime gradient” on a general multivector potential.
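To make the discarded pieces explicit, here is the full grade by grade split of the product (a sketch with \( \alpha = 1 \), using \( \spacegrad \wedge \mathbf{a} = I \lr{ \spacegrad \cross \mathbf{a} } \) and \( \spacegrad \lr{ I \mathbf{a} } = I \lr{ \spacegrad \cdot \mathbf{a} } - \spacegrad \cross \mathbf{a} \) for a generic 3D vector \( \mathbf{a} \))

\begin{equation}
\begin{aligned}
\gpgrade{ (\spacegrad + \beta \partial_t) A }{0} &= \spacegrad \cdot A_1 + \beta \partial_t A_0 \\
\gpgrade{ (\spacegrad + \beta \partial_t) A }{1} &= \spacegrad A_0 + \beta \partial_t A_1 - \spacegrad \cross A_1' \\
\gpgrade{ (\spacegrad + \beta \partial_t) A }{2} &= I \lr{ \spacegrad \cross A_1 + \beta \partial_t A_1' + \spacegrad A_0' } \\
\gpgrade{ (\spacegrad + \beta \partial_t) A }{3} &= I \lr{ \spacegrad \cdot A_1' + \beta \partial_t A_0' }.
\end{aligned}
\end{equation}

Only the grade 1 and grade 2 rows survive the selection. The grade 0 and grade 3 rows are the pieces that the grade selection throws away.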

The conclusion is that the potential representation of the field is

\begin{equation}\label{eqn:potentialMethods:1360}
\boldsymbol{\mathcal{F}} =
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} } A }{1,2},
\end{equation}

where \( A \) is a multivector potentially containing all grades, where grades 0,1 are required for electric sources, and grades 2,3 are required for magnetic sources. When it is desirable to refer back to the conventional scalar and vector potentials this multivector potential can be written as \( A = -\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ -\phi_m + v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } \).

Gauge transformations

Recall that for electric sources the magnetic field is of the form

\begin{equation}\label{eqn:potentialMethods:1380}
\boldsymbol{\mathcal{B}} = \spacegrad \cross \boldsymbol{\mathcal{A}},
\end{equation}

so adding the gradient of any scalar field to the potential \( \boldsymbol{\mathcal{A}}' = \boldsymbol{\mathcal{A}} + \spacegrad \psi \)
does not change the magnetic field

\begin{equation}\label{eqn:potentialMethods:1400}
\begin{aligned}
\boldsymbol{\mathcal{B}}'
&= \spacegrad \cross \lr{ \boldsymbol{\mathcal{A}} + \spacegrad \psi } \\
&= \spacegrad \cross \boldsymbol{\mathcal{A}} \\
&= \boldsymbol{\mathcal{B}}.
\end{aligned}
\end{equation}

The electric field with this changed vector potential, and a scalar potential \( \phi' \) that is still to be determined, is

\begin{equation}\label{eqn:potentialMethods:1420}
\begin{aligned}
\boldsymbol{\mathcal{E}}'
&= -\spacegrad \phi' - \partial_t \lr{ \boldsymbol{\mathcal{A}} + \spacegrad \psi} \\
&= -\spacegrad \lr{ \phi' + \partial_t \psi } - \partial_t \boldsymbol{\mathcal{A}},
\end{aligned}
\end{equation}

so if
\begin{equation}\label{eqn:potentialMethods:1440}
\phi' = \phi - \partial_t \psi,
\end{equation}

the electric field will also be unaltered by this transformation.

In the STA representation, the potential can similarly be altered by adding any four-gradient without changing the field. For example with only electric sources

\begin{equation}\label{eqn:potentialMethods:1460}
\boldsymbol{\mathcal{F}} = \grad \wedge (A + \grad \psi) = \grad \wedge A
\end{equation}

and for electric or magnetic sources

\begin{equation}\label{eqn:potentialMethods:1480}
\boldsymbol{\mathcal{F}} = \gpgradetwo{ \grad (A + \grad \psi) } = \gpgradetwo{ \grad A }.
\end{equation}

In the 3D GA representation, where the field is given by \ref{eqn:potentialMethods:1360}, there is no field that is being curled to add a gradient to. However, if the scalar and vector potentials transform as

\begin{equation}\label{eqn:potentialMethods:1500}
\begin{aligned}
\boldsymbol{\mathcal{A}} &\rightarrow \boldsymbol{\mathcal{A}} + \spacegrad \psi \\
\phi &\rightarrow \phi - \partial_t \psi,
\end{aligned}
\end{equation}

then the multivector potential transforms as
\begin{equation}\label{eqn:potentialMethods:1520}
-\phi + v \boldsymbol{\mathcal{A}}
\rightarrow -\phi + v \boldsymbol{\mathcal{A}} + \partial_t \psi + v \spacegrad \psi,
\end{equation}

so the electromagnetic field is unchanged when the multivector potential is transformed as

\begin{equation}\label{eqn:potentialMethods:1540}
A \rightarrow A + \lr{ \spacegrad + \inv{v} \partial_t } \psi,
\end{equation}

where \( \psi \) is any field that has scalar or pseudoscalar grades. Viewed in terms of grade selection, this makes perfect sense, since the transformed field is

\begin{equation}\label{eqn:potentialMethods:1560}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&\rightarrow
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} } \lr{ A + \lr{ \spacegrad + \inv{v} \partial_t } \psi } }{1,2} \\
&=
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} } A + \lr{ \spacegrad^2 - \inv{v^2} \partial_{tt} } \psi }{1,2} \\
&=
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} } A }{1,2}.
\end{aligned}
\end{equation}

The \( \psi \) contribution to the grade selection operator is killed because it has scalar or pseudoscalar grades.
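The product of the two operators in that calculation can be spelled out (a sketch, assuming \( \psi \) is smooth enough that the mixed partials commute)

\begin{equation}
\begin{aligned}
\lr{ \spacegrad - \inv{v} \partial_t } \lr{ \spacegrad + \inv{v} \partial_t } \psi
&=
\spacegrad^2 \psi
+ \inv{v} \lr{ \spacegrad \partial_t \psi - \partial_t \spacegrad \psi }
- \inv{v^2} \partial_{tt} \psi \\
&=
\lr{ \spacegrad^2 - \inv{v^2} \partial_{tt} } \psi.
\end{aligned}
\end{equation}

Since \( \spacegrad^2 - (1/v^2) \partial_{tt} \) is a scalar operator, the result has exactly the grades of \( \psi \), so a scalar or pseudoscalar \( \psi \) contributes nothing to the grade 1,2 selection.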

Lorenz gauge

Maxwell’s equations are completely decoupled if the potential can be found such that

\begin{equation}\label{eqn:potentialMethods:1580}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} } A }{1,2} \\
&=
\lr{ \spacegrad - \inv{v} \PD{t}{} } A.
\end{aligned}
\end{equation}

When this is the case, Maxwell’s equations are reduced to four non-homogeneous potential wave equations

\begin{equation}\label{eqn:potentialMethods:1620}
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } A = J,
\end{equation}

that is

\begin{equation}\label{eqn:potentialMethods:1600}
\begin{aligned}
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \phi &= - \inv{\epsilon} q_e \\
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \boldsymbol{\mathcal{A}}^{\mathrm{e}} &= - \mu \boldsymbol{\mathcal{J}} \\
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \phi_m &= - \inv{\mu} q_m \\
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \boldsymbol{\mathcal{A}}^{\mathrm{m}} &= - \epsilon \boldsymbol{\mathcal{M}}.
\end{aligned}
\end{equation}
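These follow by matching grades on both sides of the wave equation. This is a sketch, using \( A = -\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ -\phi_m + v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } \), the multivector source \( J = \eta \lr{ v q_e - \boldsymbol{\mathcal{J}} } + I \lr{ v q_m - \boldsymbol{\mathcal{M}} } \) from the right hand side of \ref{eqn:potentialMethods:300}, and assuming \( \eta = \sqrt{\mu/\epsilon} \). The grade 0, 1, 3 and 2 components are, in that order,

\begin{equation}
\begin{aligned}
-\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \phi &= \eta v q_e \\
v \lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \boldsymbol{\mathcal{A}}^{\mathrm{e}} &= -\eta \boldsymbol{\mathcal{J}} \\
-\eta I \lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \phi_m &= I v q_m \\
\eta v I \lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \boldsymbol{\mathcal{A}}^{\mathrm{m}} &= -I \boldsymbol{\mathcal{M}}.
\end{aligned}
\end{equation}

The pseudoscalar \( I \) cancels from both sides of the last two, and the constants reduce with \( \eta v = 1/\epsilon \), \( \eta/v = \mu \), \( v/\eta = 1/\mu \) and \( 1/(\eta v) = \epsilon \).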

There should be no a-priori assumption that such a field representation has no scalar or pseudoscalar components. The explicit expansion in grades is

\begin{equation}\label{eqn:potentialMethods:1640}
\begin{aligned}
\lr{ \spacegrad - \inv{v} \PD{t}{} } A
&=
\lr{ \spacegrad - \inv{v} \PD{t}{} } \lr{ -\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ -\phi_m + v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \\
&=
\inv{v} \partial_t \phi
+ v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} \\
&-\spacegrad \phi
+ I \eta v \spacegrad \wedge \boldsymbol{\mathcal{A}}^{\mathrm{m}}
- \partial_t \boldsymbol{\mathcal{A}}^{\mathrm{e}} \\
&+ v \spacegrad \wedge \boldsymbol{\mathcal{A}}^{\mathrm{e}}
- \eta I \spacegrad \phi_m
- I \eta \partial_t \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&+ \eta I \inv{v} \partial_t \phi_m
+ I \eta v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}},
\end{aligned}
\end{equation}

so if this potential representation has only vector and bivector grades, it must be true that

\begin{equation}\label{eqn:potentialMethods:1660}
\begin{aligned}
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} &= 0 \\
\inv{v} \partial_t \phi_m + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} &= 0.
\end{aligned}
\end{equation}

The first is the well known Lorenz gauge condition, whereas the second is the dual of that condition for magnetic sources.

Should one of these conditions, say the Lorenz condition for the electric source potentials, be non-zero, then it is possible to make a potential transformation for which this condition is zero

\begin{equation}\label{eqn:potentialMethods:1680}
\begin{aligned}
0
&\ne
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} \\
&=
\inv{v} \partial_t (\phi' - \partial_t \psi) + v \spacegrad \cdot (\boldsymbol{\mathcal{A}}' + \spacegrad \psi) \\
&=
\inv{v} \partial_t \phi' + v \spacegrad \cdot \boldsymbol{\mathcal{A}}'
+ v \lr{ \spacegrad^2 - \inv{v^2} \partial_{tt} } \psi,
\end{aligned}
\end{equation}

so if \( \inv{v} \partial_t \phi' + v \spacegrad \cdot \boldsymbol{\mathcal{A}}' \) is to be zero, \( \psi \) must be found such that
\begin{equation}\label{eqn:potentialMethods:1700}
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}}
= v \lr{ \spacegrad^2 - \inv{v^2} \partial_{tt} } \psi.
\end{equation}

References

[1] Constantine A Balanis. Antenna theory: analysis and design. John Wiley \& Sons, 3rd edition, 2005.

[2] C. Doran and A.N. Lasenby. Geometric algebra for physicists. Cambridge University Press New York, Cambridge, UK, 1st edition, 2003.

[3] David M Pozar. Microwave engineering. John Wiley \& Sons, 2009.

Transverse gauge

November 16, 2016 math and physics play

Jackson [1] has an interesting presentation of the transverse gauge. I’d like to walk through the details of this, but first want to translate the preliminaries to SI units (if I had the 3rd edition I’d not have to do this translation step).

Gauge freedom

The starting point is noting that, because \( \spacegrad \cdot \BB = 0 \), the magnetic field can be expressed as a curl

\begin{equation}\label{eqn:transverseGauge:20}
\BB = \spacegrad \cross \BA.
\end{equation}

Faraday’s law now takes the form
\begin{equation}\label{eqn:transverseGauge:40}
\begin{aligned}
0
&= \spacegrad \cross \BE + \PD{t}{\BB} \\
&= \spacegrad \cross \BE + \PD{t}{} \lr{ \spacegrad \cross \BA } \\
&= \spacegrad \cross \lr{ \BE + \PD{t}{\BA} }.
\end{aligned}
\end{equation}

Because this curl is zero, the interior sum can be expressed as a gradient

\begin{equation}\label{eqn:transverseGauge:60}
\BE + \PD{t}{\BA} \equiv -\spacegrad \Phi.
\end{equation}

This can now be substituted into the remaining two Maxwell’s equations.

\begin{equation}\label{eqn:transverseGauge:80}
\begin{aligned}
\spacegrad \cdot \BD &= \rho_v \\
\spacegrad \cross \BH &= \BJ + \PD{t}{\BD} \\
\end{aligned}
\end{equation}

For Gauss’s law, in simple media, we have

\begin{equation}\label{eqn:transverseGauge:140}
\begin{aligned}
\rho_v
&=
\epsilon \spacegrad \cdot \BE \\
&=
\epsilon \spacegrad \cdot \lr{ -\spacegrad \Phi - \PD{t}{\BA} }
\end{aligned}
\end{equation}

For simple media again, the Ampere-Maxwell equation is

\begin{equation}\label{eqn:transverseGauge:100}
\inv{\mu} \spacegrad \cross \lr{ \spacegrad \cross \BA } = \BJ + \epsilon \PD{t}{} \lr{ -\spacegrad \Phi - \PD{t}{\BA} }.
\end{equation}

Expanding \( \spacegrad \cross \lr{ \spacegrad \cross \BA } = -\spacegrad^2 \BA + \spacegrad \lr{ \spacegrad \cdot \BA } \) gives
\begin{equation}\label{eqn:transverseGauge:120}
-\spacegrad^2 \BA + \spacegrad \lr{ \spacegrad \cdot \BA } + \epsilon \mu \PDSq{t}{\BA} = \mu \BJ - \epsilon \mu \spacegrad \PD{t}{\Phi}.
\end{equation}

Maxwell’s equations are now reduced to
\begin{equation}\label{eqn:transverseGauge:180}
\boxed{
\begin{aligned}
\spacegrad^2 \BA - \spacegrad \lr{ \spacegrad \cdot \BA + \epsilon \mu \PD{t}{\Phi}} - \epsilon \mu \PDSq{t}{\BA} &= -\mu \BJ \\
\spacegrad^2 \Phi + \PD{t}{\spacegrad \cdot \BA} &= -\frac{\rho_v }{\epsilon}.
\end{aligned}
}
\end{equation}

There are two obvious constraints that we can impose
\begin{equation}\label{eqn:transverseGauge:200}
\spacegrad \cdot \BA + \epsilon \mu \PD{t}{\Phi} = 0,
\end{equation}

or
\begin{equation}\label{eqn:transverseGauge:220}
\spacegrad \cdot \BA = 0.
\end{equation}

The first constraint is the Lorentz gauge, which I’ve played with previously. It happens to be really nice in a relativistic context since, in vacuum with a four-vector potential \( A = (\Phi/c, \BA) \), that is a requirement that the four-divergence of the four-potential vanishes (\( \partial_\mu A^\mu = 0 \)).
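
As a quick check of why the first constraint is attractive (a sketch that assumes only the boxed pair of potential equations above, with the gauge condition taken in the form that cancels the gradient term), substituting \( \spacegrad \cdot \BA = -\epsilon \mu \PDi{t}{\Phi} \) decouples the pair into two wave equations
\begin{equation}
\begin{aligned}
\spacegrad^2 \BA - \epsilon \mu \PDSq{t}{\BA} &= -\mu \BJ \\
\spacegrad^2 \Phi - \epsilon \mu \PDSq{t}{\Phi} &= -\frac{\rho_v }{\epsilon}.
\end{aligned}
\end{equation}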

Transverse gauge

Jackson identifies the latter constraint as the transverse gauge, which I’m less familiar with. With this gauge selection, we have

\begin{equation}\label{eqn:transverseGauge:260}
\spacegrad^2 \BA - \epsilon \mu \PDSq{t}{\BA} = -\mu \BJ + \epsilon\mu \spacegrad \PD{t}{\Phi}
\end{equation}
\begin{equation}\label{eqn:transverseGauge:280}
\spacegrad^2 \Phi = -\frac{\rho_v }{\epsilon}.
\end{equation}

What’s not obvious is the fact that the irrotational (zero curl) contribution due to \(\Phi\) in \ref{eqn:transverseGauge:260} cancels the corresponding irrotational term from the current. Jackson uses a transverse and longitudinal decomposition of the current, related to the Helmholtz theorem to allude to this.

That decomposition follows from expanding \( \spacegrad^2 \BJ/R \), where \( R = \Abs{\Bx - \Bx'} \), in two ways: using the delta function representation \( -4 \pi \delta(\Bx - \Bx') = \spacegrad^2 1/R \), and by direct expansion

\begin{equation}\label{eqn:transverseGauge:300}
\begin{aligned}
- 4 \pi \BJ(\Bx)
&=
\int \spacegrad^2 \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x' \\
&=
\spacegrad
\int \spacegrad \cdot \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
+
\spacegrad \cdot
\int \spacegrad \wedge \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x' \\
&=
-\spacegrad
\int \BJ(\Bx') \cdot \spacegrad' \inv{\Abs{\Bx - \Bx'}} d^3 x'
+
\spacegrad \cdot \lr{ \spacegrad \wedge
\int \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
} \\
&=
-\spacegrad
\int \spacegrad' \cdot \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
+\spacegrad
\int \frac{\spacegrad' \cdot \BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
-
\spacegrad \cross \lr{
\spacegrad \cross
\int \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
}
\end{aligned}
\end{equation}

The first term can be converted to a surface integral

\begin{equation}\label{eqn:transverseGauge:320}
-\spacegrad
\int \spacegrad' \cdot \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
=
-\spacegrad
\int d\BA' \cdot \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}},
\end{equation}

so provided the currents are either localized or \( \Abs{\BJ}/R \rightarrow 0 \) on an infinite sphere, we can make the identification

\begin{equation}\label{eqn:transverseGauge:340}
\BJ(\Bx)
=
-\spacegrad \inv{4 \pi} \int \frac{\spacegrad' \cdot \BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
+
\spacegrad \cross \spacegrad \cross \inv{4 \pi} \int \frac{\BJ(\Bx')}{\Abs{\Bx - \Bx'}} d^3 x'
\equiv
\BJ_l +
\BJ_t,
\end{equation}

where \( \spacegrad \cross \BJ_l = 0 \) (irrotational, or longitudinal), whereas \( \spacegrad \cdot \BJ_t = 0 \) (solenoidal or transverse). The irrotational property is clear from inspection, and the transverse property can be verified readily

\begin{equation}\label{eqn:transverseGauge:360}
\begin{aligned}
\spacegrad \cdot \lr{ \spacegrad \cross \lr{ \spacegrad \cross \BX } }
&=
-\spacegrad \cdot \lr{ \spacegrad \cdot \lr{ \spacegrad \wedge \BX } } \\
&=
-\spacegrad \cdot \lr{ \spacegrad^2 \BX - \spacegrad \lr{ \spacegrad \cdot \BX } } \\
&=
-\spacegrad \cdot \lr{\spacegrad^2 \BX} + \spacegrad^2 \lr{ \spacegrad \cdot \BX } \\
&= 0.
\end{aligned}
\end{equation}
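
For comparison, the irrotational property can also be written out. This is a small sketch in which the scalar \( \chi \) is shorthand introduced here (not in Jackson's text) for the integral in the longitudinal term, so that \( \BJ_l = -\spacegrad \chi \). Using \( \spacegrad \wedge \BX = I \spacegrad \cross \BX \), the curl of a gradient vanishes because the mixed partials commute
\begin{equation}
\spacegrad \cross \BJ_l
= -\spacegrad \cross \lr{ \spacegrad \chi }
= I \lr{ \spacegrad \wedge \spacegrad } \chi
= 0.
\end{equation}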

Since

\begin{equation}\label{eqn:transverseGauge:380}
\Phi(\Bx, t)
=
\inv{4 \pi \epsilon} \int \frac{\rho_v(\Bx', t)}{\Abs{\Bx - \Bx'}} d^3 x',
\end{equation}

we have

\begin{equation}\label{eqn:transverseGauge:400}
\begin{aligned}
\spacegrad \PD{t}{\Phi}
&=
\inv{4 \pi \epsilon} \spacegrad \int \frac{\partial_t \rho_v(\Bx', t)}{\Abs{\Bx - \Bx'}} d^3 x' \\
&=
\inv{4 \pi \epsilon} \spacegrad \int \frac{-\spacegrad' \cdot \BJ}{\Abs{\Bx - \Bx'}} d^3 x' \\
&=
\frac{\BJ_l}{\epsilon}.
\end{aligned}
\end{equation}

This means that the Ampere-Maxwell equation takes the form

\begin{equation}\label{eqn:transverseGauge:420}
\spacegrad^2 \BA - \epsilon \mu \PDSq{t}{\BA}
= -\mu \BJ + \mu \BJ_l
= -\mu \BJ_t.
\end{equation}

This justifies the “transverse” in the label transverse gauge: only the transverse part of the current acts as a source for the vector potential.
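
The middle step in the evaluation of \( \spacegrad \PDi{t}{\Phi} \) above relies on charge conservation. As a brief reminder, using only the two source equations quoted earlier, taking the divergence of the Ampere-Maxwell equation gives
\begin{equation}
0
= \spacegrad \cdot \lr{ \spacegrad \cross \BH }
= \spacegrad \cdot \BJ + \PD{t}{} \lr{ \spacegrad \cdot \BD }
= \spacegrad \cdot \BJ + \PD{t}{\rho_v},
\end{equation}
so \( \partial_t \rho_v = -\spacegrad \cdot \BJ \), which is the substitution made under the integral.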

References

[1] JD Jackson. Classical Electrodynamics. John Wiley and Sons, 2nd edition, 1975.

Calculating the magnetostatic field from the moment

November 14, 2016 math and physics play

The vector potential, to first order, for a magnetostatic localized current distribution was found to be

\begin{equation}\label{eqn:magneticFieldFromMoment:20}
\BA(\Bx) = \frac{\mu_0}{4 \pi} \frac{\Bm \cross \Bx}{\Abs{\Bx}^3}.
\end{equation}

Initially, I tried to calculate the magnetic field from this, but ran into trouble. Here’s a new try.

\begin{equation}\label{eqn:magneticFieldFromMoment:40}
\begin{aligned}
\BB
&=
\frac{\mu_0}{4 \pi}
\spacegrad \cross \lr{ \Bm \cross \frac{\Bx}{r^3} } \\
&=
-\frac{\mu_0}{4 \pi}
\spacegrad \cdot \lr{ \Bm \wedge \frac{\Bx}{r^3} } \\
&=
-\frac{\mu_0}{4 \pi}
\lr{
(\Bm \cdot \spacegrad) \frac{\Bx}{r^3}
-\Bm \spacegrad \cdot \frac{\Bx}{r^3}
} \\
&=
\frac{\mu_0}{4 \pi}
\lr{
-\frac{(\Bm \cdot \spacegrad) \Bx}{r^3}
- \lr{ \Bm \cdot \lr{\spacegrad \inv{r^3} }} \Bx
+\Bm (\spacegrad \cdot \Bx) \inv{r^3}
+\Bm \lr{\spacegrad \inv{r^3} } \cdot \Bx
}.
\end{aligned}
\end{equation}

Here I’ve used \( \Ba \cross \lr{ \Bb \cross \Bc } = -\Ba \cdot \lr{ \Bb \wedge \Bc } \), and then expanded that with \( \Ba \cdot \lr{ \Bb \wedge \Bc } = (\Ba \cdot \Bb) \Bc – (\Ba \cdot \Bc) \Bb \). Since one of these vectors is the gradient, care must be taken to have it operate on the appropriate terms in such an expansion.

Since we have \( \spacegrad \cdot \Bx = 3 \), \( (\Bm \cdot \spacegrad) \Bx = \Bm \), and \( \spacegrad 1/r^n = -n \Bx/r^{n+2} \), this reduces to

\begin{equation}\label{eqn:magneticFieldFromMoment:60}
\begin{aligned}
\BB
&=
\frac{\mu_0}{4 \pi}
\lr{
- \frac{\Bm}{r^3}
+ 3 \frac{(\Bm \cdot \Bx) \Bx}{r^5}
+ 3 \Bm \inv{r^3}
-3 \Bm \frac{\Bx}{r^5} \cdot \Bx
} \\
&=
\frac{\mu_0}{4 \pi}
\frac{3 (\Bm \cdot \ncap) \ncap -\Bm}{r^3},
\end{aligned}
\end{equation}

which is the desired result.
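
As a sanity check of this formula, consider the illustrative choice \( \Bm = m \Be_3 \) (an assumption made here only for the example). On the axis, where \( \ncap = \Be_3 \), and in the equatorial plane, where \( \ncap \cdot \Be_3 = 0 \), the field evaluates to
\begin{equation}
\begin{aligned}
\BB_{\mathrm{axis}} &= \frac{\mu_0}{4 \pi} \frac{3 m \Be_3 - m \Be_3}{r^3} = \frac{\mu_0}{2 \pi} \frac{m}{r^3} \Be_3 \\
\BB_{\mathrm{equator}} &= -\frac{\mu_0}{4 \pi} \frac{m}{r^3} \Be_3,
\end{aligned}
\end{equation}
which are the familiar dipole values: the on-axis field is parallel to the moment and twice the magnitude of the antiparallel equatorial field.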

Spherical gradient, divergence, curl and Laplacian

November 9, 2016 math and physics play

Unit vectors

Two of the spherical unit vectors we can immediately write by inspection.

\begin{equation}\label{eqn:sphericalLaplacian:20}
\begin{aligned}
\rcap &= \Be_1 \sin\theta \cos\phi + \Be_2 \sin\theta \sin\phi + \Be_3 \cos\theta \\
\phicap &= -\Be_1 \sin\phi + \Be_2 \cos\phi
\end{aligned}
\end{equation}

We can compute \( \thetacap \) by utilizing the right hand triplet property

\begin{equation}\label{eqn:sphericalLaplacian:40}
\begin{aligned}
\thetacap
&=
\phicap \cross \rcap \\
&=
\begin{vmatrix}
\Be_1 & \Be_2 & \Be_3 \\
-S_\phi & C_\phi & 0 \\
S_\theta C_\phi & S_\theta S_\phi & C_\theta \\
\end{vmatrix} \\
&=
\Be_1 \lr{ C_\theta C_\phi }
+\Be_2 \lr{ C_\theta S_\phi }
+\Be_3 \lr{ -S_\theta \lr{ S_\phi^2 + C_\phi^2 } } \\
&=
\Be_1 \cos\theta \cos\phi
+\Be_2 \cos\theta \sin\phi
-\Be_3 \sin\theta.
\end{aligned}
\end{equation}

Here I’ve used \( C_\theta = \cos\theta, S_\phi = \sin\phi, \cdots \) as a convenient shorthand. Observe that with \( i = \Be_1 \Be_2 \), these unit vectors admit a small factorization that makes further manipulation easier

\begin{equation}\label{eqn:sphericalLaplacian:80}
\boxed{
\begin{aligned}
\rcap &= \Be_1 e^{i\phi} \sin\theta + \Be_3 \cos\theta \\
\thetacap &= \cos\theta \Be_1 e^{i\phi} - \sin\theta \Be_3 \\
\phicap &= \Be_2 e^{i\phi}
\end{aligned}
}
\end{equation}
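
The factorization can be confirmed by expanding the exponentials, a short check that uses only \( i = \Be_1 \Be_2 \) and \( e^{i\phi} = \cos\phi + i \sin\phi \)
\begin{equation}
\begin{aligned}
\Be_1 e^{i\phi} &= \Be_1 \cos\phi + \Be_1 \Be_1 \Be_2 \sin\phi = \Be_1 \cos\phi + \Be_2 \sin\phi \\
\Be_2 e^{i\phi} &= \Be_2 \cos\phi + \Be_2 \Be_1 \Be_2 \sin\phi = \Be_2 \cos\phi - \Be_1 \sin\phi.
\end{aligned}
\end{equation}
The first is the projection of \( \rcap \) onto the x-y plane (scaled by \( 1/\sin\theta \)), and the second is \( \phicap \).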

It should also be the case that \( \rcap \thetacap \phicap = I \), where \( I = \Be_1 \Be_2 \Be_3 = \Be_{123}\) is the \( \mathbb{R}^3 \) pseudoscalar, which is straightforward to check

\begin{equation}\label{eqn:sphericalLaplacian:60}
\begin{aligned}
\rcap \thetacap \phicap
&=
\lr{ \Be_1 e^{i\phi} \sin\theta + \Be_3 \cos\theta }
\lr{ \cos\theta \Be_1 e^{i\phi} - \sin\theta \Be_3 }
\Be_2 e^{i\phi} \\
&=
\lr{ \sin\theta \cos\theta - \cos\theta \sin\theta + \Be_{31} e^{i\phi} \lr{ \cos^2\theta + \sin^2\theta } }
\Be_2 e^{i\phi} \\
&=
\Be_{31} \Be_2 e^{-i\phi} e^{i\phi} \\
&=
\Be_{123}.
\end{aligned}
\end{equation}

This property could also have been used to compute \(\thetacap\).

Gradient

To compute the gradient, note that the coordinate vectors for the spherical parameterization are
\begin{equation}\label{eqn:sphericalLaplacian:120}
\begin{aligned}
\Bx_r
&= \PD{r}{\Br} \\
&= \PD{r}{\lr{r \rcap}} \\
&= \rcap + r \PD{r}{\rcap} \\
&= \rcap,
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:140}
\begin{aligned}
\Bx_\theta
&= \PD{\theta}{\lr{r \rcap} } \\
&= r \PD{\theta}{} \lr{ S_\theta \Be_1 e^{i\phi} + C_\theta \Be_3 } \\
&= r \lr{ C_\theta \Be_1 e^{i\phi} - S_\theta \Be_3 } \\
&= r \thetacap,
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:160}
\begin{aligned}
\Bx_\phi
&= \PD{\phi}{\lr{r \rcap} } \\
&= r \PD{\phi}{} \lr{ S_\theta \Be_1 e^{i\phi} + C_\theta \Be_3 } \\
&= r S_\theta \Be_2 e^{i\phi} \\
&= r \sin\theta \phicap.
\end{aligned}
\end{equation}

Since these are mutually orthogonal, the dual vectors defined by \( \Bx^j \cdot \Bx_k = \delta^j_k \) can be obtained by inspection
\begin{equation}\label{eqn:sphericalLaplacian:180}
\begin{aligned}
\Bx^r &= \rcap \\
\Bx^\theta &= \inv{r} \thetacap \\
\Bx^\phi &= \inv{r \sin\theta} \phicap.
\end{aligned}
\end{equation}

The gradient follows immediately
\begin{equation}\label{eqn:sphericalLaplacian:200}
\spacegrad =
\Bx^r \PD{r}{} +
\Bx^\theta \PD{\theta}{} +
\Bx^\phi \PD{\phi}{},
\end{equation}

or
\begin{equation}\label{eqn:sphericalLaplacian:240}
\boxed{
\spacegrad
=
\rcap \PD{r}{} +
\frac{\thetacap}{r} \PD{\theta}{} +
\frac{\phicap}{r\sin\theta} \PD{\phi}{}.
}
\end{equation}

More information on this general dual-vector technique of computing the gradient in curvilinear coordinate systems can be found in
[2].
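
As an illustrative check of the spherical gradient (the test function is chosen here, and is not part of the original derivation), take \( \psi = r \cos\theta \), which is the Cartesian coordinate \( z = \Bx \cdot \Be_3 \). The gradient should therefore be \( \Be_3 \), and indeed
\begin{equation}
\begin{aligned}
\spacegrad \lr{ r \cos\theta }
&= \rcap \cos\theta - \thetacap \sin\theta \\
&= \lr{ \Be_1 e^{i\phi} \sin\theta + \Be_3 \cos\theta } \cos\theta
- \lr{ \cos\theta \Be_1 e^{i\phi} - \sin\theta \Be_3 } \sin\theta \\
&= \Be_3 \lr{ \cos^2\theta + \sin^2\theta } \\
&= \Be_3.
\end{aligned}
\end{equation}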

Partials

To compute the divergence, curl and Laplacian, we’ll need the partials of each of the unit vectors \( \PDi{\theta}{\rcap}, \PDi{\phi}{\rcap}, \PDi{\theta}{\thetacap}, \PDi{\phi}{\thetacap}, \PDi{\theta}{\phicap}, \PDi{\phi}{\phicap} \).

The \( \thetacap \) partials are

\begin{equation}\label{eqn:sphericalLaplacian:260}
\begin{aligned}
\PD{\theta}{\thetacap}
&=
\PD{\theta}{} \lr{
C_\theta \Be_1 e^{i\phi} - S_\theta \Be_3
} \\
&=
-S_\theta \Be_1 e^{i\phi} - C_\theta \Be_3 \\
&=
-\rcap,
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:280}
\begin{aligned}
\PD{\phi}{\thetacap}
&=
\PD{\phi}{} \lr{
C_\theta \Be_1 e^{i\phi} - S_\theta \Be_3
} \\
&=
C_\theta \Be_2 e^{i\phi} \\
&=
C_\theta \phicap.
\end{aligned}
\end{equation}

The \( \phicap \) partials are

\begin{equation}\label{eqn:sphericalLaplacian:300}
\begin{aligned}
\PD{\theta}{\phicap}
&=
\PD{\theta}{} \Be_2 e^{i\phi} \\
&=
0.
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:320}
\begin{aligned}
\PD{\phi}{\phicap}
&=
\PD{\phi}{} \Be_2 e^{i \phi} \\
&=
-\Be_1 e^{i \phi} \\
&=
-\rcap \gpgradezero{ \rcap \Be_1 e^{i \phi} }
- \thetacap \gpgradezero{ \thetacap \Be_1 e^{i \phi} }
- \phicap \gpgradezero{ \phicap \Be_1 e^{i \phi} } \\
&=
-\rcap \gpgradezero{ \lr{
\Be_1 e^{i\phi} S_\theta + \Be_3 C_\theta
} \Be_1 e^{i \phi} }
- \thetacap \gpgradezero{ \lr{
C_\theta \Be_1 e^{i\phi} - S_\theta \Be_3
} \Be_1 e^{i \phi} } \\
&=
-\rcap \gpgradezero{ e^{-i\phi} S_\theta e^{i \phi} }
- \thetacap \gpgradezero{ C_\theta e^{-i\phi} e^{i \phi} } \\
&=
-\rcap S_\theta
- \thetacap C_\theta.
\end{aligned}
\end{equation}

The \( \rcap \) partials were computed as a side effect of evaluating \( \Bx_\theta \) and \( \Bx_\phi \), and are

\begin{equation}\label{eqn:sphericalLaplacian:340}
\PD{\theta}{\rcap}
=
\thetacap,
\end{equation}
\begin{equation}\label{eqn:sphericalLaplacian:360}
\PD{\phi}{\rcap}
=
S_\theta \phicap.
\end{equation}

In summary
\begin{equation}\label{eqn:sphericalLaplacian:380}
\boxed{
\begin{aligned}
\partial_{\theta}{\rcap} &= \thetacap \\
\partial_{\phi}{\rcap} &= S_\theta \phicap \\
\partial_{\theta}{\thetacap} &= -\rcap \\
\partial_{\phi}{\thetacap} &= C_\theta \phicap \\
\partial_{\theta}{\phicap} &= 0 \\
\partial_{\phi}{\phicap} &= -\rcap S_\theta - \thetacap C_\theta.
\end{aligned}
}
\end{equation}
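
One way to gain some confidence in this table is to confirm that it preserves orthonormality, since the derivative of any dot product of two of the unit vectors must vanish. A few representative checks, using only the summary above, are
\begin{equation}
\begin{aligned}
\partial_\theta \lr{ \rcap \cdot \thetacap } &= \thetacap \cdot \thetacap + \rcap \cdot \lr{ -\rcap } = 0 \\
\partial_\phi \lr{ \rcap \cdot \phicap } &= S_\theta \phicap \cdot \phicap + \rcap \cdot \lr{ -\rcap S_\theta - \thetacap C_\theta } = 0 \\
\partial_\phi \lr{ \thetacap \cdot \phicap } &= C_\theta \phicap \cdot \phicap + \thetacap \cdot \lr{ -\rcap S_\theta - \thetacap C_\theta } = 0.
\end{aligned}
\end{equation}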

Divergence and curl

The divergence and curl can be computed from the vector product of the spherical coordinate gradient and the spherical representation of a vector. That is

\begin{equation}\label{eqn:sphericalLaplacian:400}
\spacegrad \BA
= \spacegrad \cdot \BA + \spacegrad \wedge \BA
= \spacegrad \cdot \BA + I \spacegrad \cross \BA.
\end{equation}

That gradient vector product is

\begin{equation}\label{eqn:sphericalLaplacian:420}
\begin{aligned}
\spacegrad \BA
&=
\lr{
\rcap \partial_{r}
+ \frac{\thetacap}{r} \partial_{\theta}
+ \frac{\phicap}{rS_\theta} \partial_{\phi}
}
\lr{ \rcap A_r + \thetacap A_\theta + \phicap A_\phi} \\
&=
\rcap \partial_{r}
\lr{ \rcap A_r + \thetacap A_\theta + \phicap A_\phi} \\
&+ \frac{\thetacap}{r} \partial_{\theta}
\lr{ \rcap A_r + \thetacap A_\theta + \phicap A_\phi} \\
&+ \frac{\phicap}{rS_\theta} \partial_{\phi}
\lr{ \rcap A_r + \thetacap A_\theta + \phicap A_\phi} \\
&=
\lr{ \partial_r A_r + \rcap \thetacap \partial_r A_\theta + \rcap \phicap \partial_r A_\phi} \\
&+ \frac{1}{r}
\lr{
\thetacap (\partial_\theta \rcap) A_r + \thetacap (\partial_\theta \thetacap) A_\theta + \thetacap (\partial_\theta \phicap) A_\phi
+\thetacap \rcap \partial_\theta A_r + \partial_\theta A_\theta + \thetacap \phicap \partial_\theta A_\phi
} \\
&+ \frac{1}{rS_\theta}
\lr{
\phicap (\partial_\phi \rcap) A_r + \phicap (\partial_\phi \thetacap) A_\theta + \phicap (\partial_\phi \phicap) A_\phi
+\phicap \rcap \partial_\phi A_r + \phicap \thetacap \partial_\phi A_\theta + \partial_\phi A_\phi
} \\
&=
\lr{ \partial_r A_r + \rcap \thetacap \partial_r A_\theta + \rcap \phicap \partial_r A_\phi} \\
&+ \frac{1}{r}
\lr{
\thetacap (\thetacap) A_r + \thetacap (-\rcap) A_\theta + \thetacap (0) A_\phi
+\thetacap \rcap \partial_\theta A_r + \partial_\theta A_\theta + \thetacap \phicap \partial_\theta A_\phi
} \\
&+ \frac{1}{r S_\theta}
\lr{
\phicap (S_\theta \phicap) A_r + \phicap (C_\theta \phicap) A_\theta - \phicap (\rcap S_\theta + \thetacap C_\theta) A_\phi
+\phicap \rcap \partial_\phi A_r + \phicap \thetacap \partial_\phi A_\theta + \partial_\phi A_\phi
}.
\end{aligned}
\end{equation}

The scalar component of this is the divergence
\begin{equation}\label{eqn:sphericalLaplacian:440}
\begin{aligned}
\spacegrad \cdot \BA
&=
\partial_r A_r
+ \frac{A_r}{r}
+ \inv{r} \partial_\theta A_\theta
+ \frac{1}{r S_\theta}
\lr{ S_\theta A_r + C_\theta A_\theta + \partial_\phi A_\phi
} \\
&=
\partial_r A_r
+ 2 \frac{A_r}{r}
+ \inv{r} \partial_\theta A_\theta
+ \frac{1}{r S_\theta}
C_\theta A_\theta
+ \frac{1}{r S_\theta} \partial_\phi A_\phi,
\end{aligned}
\end{equation}

which can be factored as
\begin{equation}\label{eqn:sphericalLaplacian:460}
\boxed{
\spacegrad \cdot \BA
=
\inv{r^2} \partial_r (r^2 A_r)
+ \inv{r S_\theta} \partial_\theta (S_\theta A_\theta)
+ \frac{1}{r S_\theta} \partial_\phi A_\phi.
}
\end{equation}
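
As a simple check of the boxed divergence, consider the position vector itself, \( \BA = \Bx = r \rcap \) (a test field picked here for illustration), for which \( A_r = r \) and \( A_\theta = A_\phi = 0 \). Only the radial term contributes
\begin{equation}
\spacegrad \cdot \Bx
= \inv{r^2} \partial_r \lr{ r^2 \, r }
= \inv{r^2} 3 r^2
= 3,
\end{equation}
which matches the Cartesian result.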

The bivector grade of \( \spacegrad \BA \) is the bivector curl
\begin{equation}\label{eqn:sphericalLaplacian:480}
\begin{aligned}
\spacegrad \wedge \BA
&=
\lr{
\rcap \thetacap \partial_r A_\theta + \rcap \phicap \partial_r A_\phi
} \\
&\quad + \frac{1}{r}
\lr{
\thetacap (-\rcap) A_\theta
+\thetacap \rcap \partial_\theta A_r + \thetacap \phicap \partial_\theta A_\phi
} \\
&\quad +
\frac{1}{r S_\theta}
\lr{
-\phicap (\rcap S_\theta + \thetacap C_\theta) A_\phi
+\phicap \rcap \partial_\phi A_r + \phicap \thetacap \partial_\phi A_\theta
} \\
&=
\lr{
\rcap \thetacap \partial_r A_\theta - \phicap \rcap \partial_r A_\phi
} \\
&\quad + \frac{1}{r}
\lr{
\rcap \thetacap A_\theta
-\rcap \thetacap \partial_\theta A_r + \thetacap \phicap \partial_\theta A_\phi
} \\
&\quad +
\frac{1}{r S_\theta}
\lr{
-\phicap \rcap S_\theta A_\phi + \thetacap \phicap C_\theta A_\phi
+\phicap \rcap \partial_\phi A_r - \thetacap \phicap \partial_\phi A_\theta
} \\
&=
\thetacap \phicap \lr{
\inv{r S_\theta} C_\theta A_\phi
+\frac{1}{r} \partial_\theta A_\phi
-\frac{1}{r S_\theta} \partial_\phi A_\theta
} \\
&\quad +\phicap \rcap \lr{
-\partial_r A_\phi
+
\frac{1}{r S_\theta}
\lr{
-S_\theta A_\phi
+ \partial_\phi A_r
}
} \\
&\quad +\rcap \thetacap \lr{
\partial_r A_\theta
+ \frac{1}{r} A_\theta
- \inv{r} \partial_\theta A_r
} \\
&=
I
\rcap \lr{
\inv{r S_\theta} \partial_\theta (S_\theta A_\phi)
-\frac{1}{r S_\theta} \partial_\phi A_\theta
}
+ I \thetacap \lr{
\frac{1}{r S_\theta} \partial_\phi A_r
-\inv{r} \partial_r (r A_\phi)
}
+ I \phicap \lr{
\inv{r} \partial_r (r A_\theta)
- \inv{r} \partial_\theta A_r
}
\end{aligned}
\end{equation}

This gives
\begin{equation}\label{eqn:sphericalLaplacian:500}
\boxed{
\spacegrad \cross \BA
=
\rcap \lr{
\inv{r S_\theta} \partial_\theta (S_\theta A_\phi)
-\frac{1}{r S_\theta} \partial_\phi A_\theta
}
+ \thetacap \lr{
\frac{1}{r S_\theta} \partial_\phi A_r
-\inv{r} \partial_r (r A_\phi)
}
+ \phicap \lr{
\inv{r} \partial_r (r A_\theta)
- \inv{r} \partial_\theta A_r
}.
}
\end{equation}

This and the divergence result above both check against the back cover of [1].
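
A concrete check of the curl formula is also possible. The test field below is an illustrative choice made here: the rigid rotation field \( \BA = \Be_3 \cross \Bx = r \sin\theta \phicap \), for which \( A_\phi = r S_\theta \) and \( A_r = A_\theta = 0 \). In Cartesian coordinates the curl of this field is \( 2 \Be_3 \). The spherical formula gives
\begin{equation}
\begin{aligned}
\spacegrad \cross \BA
&=
\rcap \inv{r S_\theta} \partial_\theta \lr{ r S_\theta^2 }
- \thetacap \inv{r} \partial_r \lr{ r^2 S_\theta } \\
&=
2 \lr{ \rcap C_\theta - \thetacap S_\theta } \\
&=
2 \Be_3,
\end{aligned}
\end{equation}
where the last step uses \( \rcap \cos\theta - \thetacap \sin\theta = \Be_3 \), which follows from the factored form of the unit vectors.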

Laplacian

It’s possible to compute the Laplacian from the divergence and curl, but we saw in cylindrical coordinates that doing so was much harder than computing it directly.

\begin{equation}\label{eqn:sphericalLaplacian:540}
\begin{aligned}
\spacegrad^2 \psi
&=
\lr{
\rcap \partial_{r} +
\frac{\thetacap}{r} \partial_{\theta} +
\frac{\phicap}{r S_\theta} \partial_{\phi}
}
\lr{
\rcap \partial_{r} \psi
+ \frac{\thetacap}{r} \partial_{\theta} \psi
+ \frac{\phicap}{r S_\theta} \partial_{\phi} \psi
} \\
&=
\partial_{rr} \psi
+ \rcap \thetacap \partial_r \lr{ \inv{r} \partial_\theta \psi}
+ \rcap \phicap \inv{S_\theta} \partial_r \lr{ \inv{r} \partial_\phi \psi } \\
&
\quad + \frac{\thetacap}{r} \partial_{\theta} \lr{ \rcap \partial_{r} \psi }
+ \frac{\thetacap}{r^2} \partial_{\theta} \lr{ \thetacap \partial_{\theta} \psi }
+ \frac{\thetacap}{r^2} \partial_{\theta} \lr{ \frac{\phicap}{S_\theta} \partial_{\phi} \psi } \\
&
\quad + \frac{\phicap}{r S_\theta} \partial_{\phi} \lr{ \rcap \partial_{r} \psi }
+ \frac{\phicap}{r^2 S_\theta} \partial_{\phi} \lr{ \thetacap \partial_{\theta} \psi }
+ \frac{\phicap}{r^2 S_\theta^2} \partial_{\phi} \lr{ \phicap \partial_{\phi} \psi } \\
&=
\partial_{rr} \psi
+ \rcap \thetacap \partial_r \lr{ \inv{r} \partial_\theta \psi}
+ \rcap \phicap \inv{S_\theta} \partial_r \lr{ \inv{r} \partial_\phi \psi } \\
&
\quad + \frac{\thetacap\rcap}{r} \partial_{\theta} \lr{ \partial_{r} \psi }
+ \frac{1}{r^2} \partial_{\theta \theta} \psi
+ \frac{\thetacap \phicap}{r^2} \partial_{\theta} \lr{ \frac{1}{S_\theta} \partial_{\phi} \psi } \\
&
\quad + \frac{\phicap \rcap}{r S_\theta} \partial_{\phi r} \psi
+ \frac{\phicap\thetacap}{r^2 S_\theta} \partial_{\phi\theta} \psi
+ \frac{1}{r^2 S_\theta^2} \partial_{\phi \phi} \psi \\
&
\quad + \frac{\thetacap}{r} (\partial_\theta \rcap) \partial_{r} \psi
+ \frac{\thetacap}{r^2} (\partial_\theta \thetacap) \partial_{\theta} \psi
+ \frac{\thetacap}{r^2} (\partial_\theta \phicap) \frac{\phicap}{S_\theta} \partial_{\phi} \psi \\
&
\quad + \frac{\phicap}{r S_\theta} (\partial_\phi \rcap) \partial_{r} \psi
+ \frac{\phicap}{r^2 S_\theta} (\partial_\phi \thetacap) \partial_{\theta} \psi
+ \frac{\phicap}{r^2 S_\theta^2} (\partial_\phi \phicap) \partial_{\phi} \psi \\
&=
\partial_{rr} \psi
+ \rcap \thetacap \partial_r \lr{ \inv{r} \partial_\theta \psi}
+ \rcap \phicap \inv{S_\theta} \partial_r \lr{ \inv{r} \partial_\phi \psi } \\
&
\quad + \frac{\thetacap\rcap}{r} \partial_{\theta} \lr{ \partial_{r} \psi }
+ \frac{1}{r^2} \partial_{\theta \theta} \psi
+ \frac{\thetacap \phicap}{r^2} \partial_{\theta} \lr{ \frac{1}{S_\theta} \partial_{\phi} \psi } \\
&
\quad + \frac{\phicap \rcap}{r S_\theta} \partial_{\phi r} \psi
+ \frac{\phicap\thetacap}{r^2 S_\theta} \partial_{\phi\theta} \psi
+ \frac{1}{r^2 S_\theta^2} \partial_{\phi \phi} \psi \\
&
\quad + \frac{\thetacap}{r} (\thetacap) \partial_{r} \psi
+ \frac{\thetacap}{r^2} (-\rcap) \partial_{\theta} \psi
+ \frac{\thetacap}{r^2} (0) \frac{\phicap}{S_\theta} \partial_{\phi} \psi \\
&
\quad + \frac{\phicap}{r S_\theta} (S_\theta \phicap) \partial_{r} \psi
+ \frac{\phicap}{r^2 S_\theta} (C_\theta \phicap) \partial_{\theta} \psi
+ \frac{\phicap}{r^2 S_\theta^2} (-\rcap S_\theta - \thetacap C_\theta) \partial_{\phi} \psi
\end{aligned}
\end{equation}

All the bivector factors are expected to cancel out, but this should be checked. Those with an \( \rcap \thetacap \) factor are

\begin{equation}\label{eqn:sphericalLaplacian:560}
\partial_r \lr{ \inv{r} \partial_\theta \psi}
- \frac{1}{r} \partial_{\theta r} \psi
+ \frac{1}{r^2} \partial_{\theta} \psi
=
-\inv{r^2} \partial_\theta \psi
+\inv{r} \partial_{r \theta} \psi
- \frac{1}{r} \partial_{\theta r} \psi
+ \frac{1}{r^2} \partial_{\theta} \psi
= 0,
\end{equation}

and those with a \( \thetacap \phicap \) factor are
\begin{equation}\label{eqn:sphericalLaplacian:580}
\frac{1}{r^2} \partial_{\theta} \lr{ \frac{1}{S_\theta} \partial_{\phi} \psi }
- \frac{1}{r^2 S_\theta} \partial_{\phi\theta} \psi
+ \frac{1}{r^2 S_\theta^2} C_\theta \partial_{\phi} \psi
=
- \frac{1}{r^2} \frac{C_\theta}{S_\theta^2} \partial_{\phi} \psi
+ \frac{1}{r^2 S_\theta} \partial_{\theta \phi} \psi
- \frac{1}{r^2 S_\theta} \partial_{\phi\theta} \psi
+ \frac{1}{r^2 S_\theta^2} C_\theta \partial_{\phi} \psi
= 0,
\end{equation}

and those with a \( \phicap \rcap \) factor are
\begin{equation}\label{eqn:sphericalLaplacian:600}
- \inv{S_\theta} \partial_r \lr{ \inv{r} \partial_\phi \psi }
+ \frac{1}{r S_\theta} \partial_{\phi r} \psi
- \frac{1}{r^2 S_\theta^2} S_\theta \partial_{\phi} \psi
=
\inv{S_\theta} \frac{1}{r^2} \partial_\phi \psi
- \inv{r S_\theta} \partial_{r \phi} \psi
+ \frac{1}{r S_\theta} \partial_{\phi r} \psi
- \frac{1}{r^2 S_\theta} \partial_{\phi} \psi
= 0.
\end{equation}

This leaves
\begin{equation}\label{eqn:sphericalLaplacian:620}
\spacegrad^2 \psi
=
\partial_{rr} \psi
+ \frac{2}{r} \partial_{r} \psi
+ \frac{1}{r^2} \partial_{\theta \theta} \psi
+ \frac{1}{r^2 S_\theta} C_\theta \partial_{\theta} \psi
+ \frac{1}{r^2 S_\theta^2} \partial_{\phi \phi} \psi.
\end{equation}

This factors nicely as

\begin{equation}\label{eqn:sphericalLaplacian:640}
\boxed{
\spacegrad^2 \psi
=
\inv{r^2} \PD{r}{} \lr{ r^2 \PD{r}{ \psi} }
+ \frac{1}{r^2 \sin\theta} \PD{\theta}{} \lr{ \sin\theta \PD{\theta}{ \psi } }
+ \frac{1}{r^2 \sin^2\theta} \PDSq{\phi}{ \psi}
,
}
\end{equation}

which checks against the back cover of Jackson. Here it has been demonstrated explicitly that this operator expression is valid for multivector fields \( \psi \) as well as scalar fields \( \psi \).
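
A couple of scalar test functions (chosen here purely for illustration) give a quick check of the boxed Laplacian. For \( \psi = r^2 \) the Cartesian result is \( \spacegrad^2 (x^2 + y^2 + z^2) = 6 \), and for the harmonic function \( \psi = r \cos\theta = z \) the result should be zero. The spherical form reproduces both
\begin{equation}
\begin{aligned}
\spacegrad^2 r^2
&= \inv{r^2} \PD{r}{} \lr{ r^2 \, 2 r } = 6 \\
\spacegrad^2 \lr{ r \cos\theta }
&= \inv{r^2} \PD{r}{} \lr{ r^2 \cos\theta }
+ \frac{1}{r^2 \sin\theta} \PD{\theta}{} \lr{ -r \sin^2\theta } \\
&= \frac{2 \cos\theta}{r} - \frac{2 \cos\theta}{r} \\
&= 0.
\end{aligned}
\end{equation}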

References

[1] JD Jackson. Classical Electrodynamics. John Wiley and Sons, 2nd edition, 1975.

[2] A. Macdonald. Vector and Geometric Calculus. CreateSpace Independent Publishing Platform, 2012.