## superkill!

January 20, 2017 Mainframe

I was amused to find the following in the z/OS C/C++ runtime library reference:

```c
#include <signal.h>

int __superkill(pid_t pid);
```


where the documentation includes:

“The __superkill() function generates a more robust version of the SIGKILL signal to
the process with pid as the process ID. The SIGKILL will be able to break through
almost all of the current signal deterrents that can be an obstacle to the normal
delivery of a SIGKILL and the resulting termination of the target process.”

The obvious question, not addressed in the documentation, is whether or not this API can kill zombies.

## ECE1505H Convex Optimization. Lecture 3: Matrix functions, SVD, and types of Sets. Taught by Prof. Stark Draper

### Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper.

## Matrix inner product

Given real matrices $$X, Y \in \mathbb{R}^{m\times n}$$, one possible matrix inner product definition is

\label{eqn:convexOptimizationLecture3:20}
\begin{aligned}
\innerprod{X}{Y}
&= \textrm{Tr}( X^\T Y) \\
&= \sum_{j = 1}^n \lr{ X^\T Y }_{jj} \\
&= \sum_{j = 1}^n \sum_{k = 1}^m X_{kj} Y_{kj} \\
&= \sum_{i = 1}^m \sum_{j = 1}^n X_{ij} Y_{ij}.
\end{aligned}

This inner product induces a norm on the (matrix) vector space, called the Frobenius norm

\label{eqn:convexOptimizationLecture3:40}
\begin{aligned}
\Norm{X }_F
&= \sqrt{ \textrm{Tr}( X^\T X) } \\
&= \sqrt{ \innerprod{X}{X} } \\
&=
\sqrt{ \sum_{i = 1}^m \sum_{j = 1}^n X_{ij}^2 }.
\end{aligned}

## Range, nullspace.

Definition: Range: Given $$A \in \mathbb{R}^{m \times n}$$, the range of A is the set:

\begin{equation*}
\mathcal{R}(A) = \setlr{ A \Bx | \Bx \in \mathbb{R}^n }.
\end{equation*}

Definition: Nullspace: Given $$A \in \mathbb{R}^{m \times n}$$, the nullspace of A is the set:

\begin{equation*}
\mathcal{N}(A) = \setlr{ \Bx | A \Bx = 0 }.
\end{equation*}

## SVD.

To understand the operation of $$A \in \mathbb{R}^{m \times n}$$, a representation of a linear transformation from $$\mathbb{R}^n$$ to $$\mathbb{R}^m$$, decompose $$A$$ using the singular value decomposition (SVD).

Definition: SVD: Given $$A \in \mathbb{R}^{m \times n}$$, an operator on $$\Bx \in \mathbb{R}^n$$, a decomposition of the following form is always possible

\begin{equation*}
\begin{aligned}
A &= U \Sigma V^\T \\
U &\in \mathbb{R}^{m \times r} \\
V &\in \mathbb{R}^{n \times r},
\end{aligned}
\end{equation*}

where $$r$$ is the rank of $$A$$, and both $$U$$ and $$V$$ are orthogonal

\begin{equation*}
\begin{aligned}
U^\T U &= I \in \mathbb{R}^{r \times r} \\
V^\T V &= I \in \mathbb{R}^{r \times r}.
\end{aligned}
\end{equation*}

Here $$\Sigma = \textrm{diag}( \sigma_1, \sigma_2, \cdots, \sigma_r )$$ is a diagonal matrix of “singular” values, where

\begin{equation*}
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r.
\end{equation*}

For simplicity, consider the square case $$m = n$$

\label{eqn:convexOptimizationLecture3:100}
A \Bx = \lr{ U \Sigma V^\T } \Bx.

The first product $$V^\T \Bx$$ is a rotation, which can be checked by looking at the length

\label{eqn:convexOptimizationLecture3:120}
\begin{aligned}
\Norm{ V^\T \Bx}_2
&= \sqrt{ \Bx^\T V V^\T \Bx } \\
&= \sqrt{ \Bx^\T \Bx } \\
&= \Norm{ \Bx }_2,
\end{aligned}

which shows that the length of the vector is unchanged by the linear transformation represented by $$V^\T$$, so that operation must be a rotation (or reflection).

Similarly, the operation of $$U$$ on $$\Sigma V^\T \Bx$$ must also be a rotation. The diagonal operator $$\Sigma = [\sigma_i]_i$$ scales each component of the vector $$V^\T \Bx$$.

All linear (square) transformations can therefore be thought of as a rotate-scale-rotate operation. Often the $$A$$ of interest will be symmetric $$A = A^\T$$.
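As a numerical aside (a sketch that assumes NumPy is available; not part of the lecture), the factors returned by `numpy.linalg.svd` can be checked against the properties above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# A = U diag(s) V^T, with singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A)

# V^T preserves lengths (orthogonal), matching the rotation argument.
x = rng.standard_normal(3)
assert np.isclose(np.linalg.norm(Vt @ x), np.linalg.norm(x))

# Rotate-scale-rotate: A x = U (s * (V^T x)).
assert np.allclose(A @ x, U @ (s * (Vt @ x)))
assert np.all(np.diff(s) <= 0)  # sigma_1 >= sigma_2 >= sigma_3
```

For rectangular or rank-deficient $$A$$ the reduced ($$r$$ column) factors of the definition correspond to keeping only the columns associated with nonzero singular values.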

## Set of symmetric matrices

Let $$S^n$$ be the set of real, symmetric $$n \times n$$ matrices.

Theorem: Spectral theorem: When $$A \in S^n$$ then it is possible to factor $$A$$ as

\begin{equation*}
A = Q \Lambda Q^\T,
\end{equation*}

where $$Q$$ is an orthogonal matrix, and $$\Lambda = \textrm{diag}( \lambda_1, \lambda_2, \cdots \lambda_n)$$. Here $$\lambda_i \in \mathbb{R} \, \forall i$$ are the (real) eigenvalues of $$A$$.

A real symmetric matrix $$A \in S^n$$ is “positive semi-definite” if

\begin{equation*}
\Bv^\T A \Bv \ge 0 \qquad\forall \Bv \in \mathbb{R}^n, \Bv \ne 0,
\end{equation*}
and is “positive definite” if

\begin{equation*}
\Bv^\T A \Bv > 0 \qquad\forall \Bv \in \mathbb{R}^n, \Bv \ne 0.
\end{equation*}

The set of such matrices is denoted $$S^n_{+}$$, and $$S^n_{++}$$ respectively.

Consider $$A \in S^n_{+}$$ (or $$S^n_{++}$$ )

\label{eqn:convexOptimizationLecture3:200}
A = Q \Lambda Q^\T,

possible since the matrix is symmetric. For such a matrix

\label{eqn:convexOptimizationLecture3:220}
\begin{aligned}
\Bv^\T A \Bv
&=
\Bv^\T Q \Lambda Q^\T \Bv \\
&=
\Bw^\T \Lambda \Bw,
\end{aligned}

where $$\Bw = Q^\T \Bv$$. Such a product is

\label{eqn:convexOptimizationLecture3:240}
\Bv^\T A \Bv
=
\sum_{i = 1}^n \lambda_i w_i^2.

So, if $$\lambda_i \ge 0$$ ($$\lambda_i > 0$$ ) then $$\sum_{i = 1}^n \lambda_i w_i^2$$ is non-negative (positive) $$\forall \Bw \in \mathbb{R}^n, \Bw \ne 0$$. Since $$\Bw$$ is just a rotated version of $$\Bv$$ this also holds for all $$\Bv$$. A necessary and sufficient condition for $$A \in S^n_{+}$$ ($$S^n_{++}$$ ) is $$\lambda_i \ge 0$$ ($$\lambda_i > 0$$).
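This eigenvalue condition can be illustrated numerically (a minimal pure Python sketch for the $$2 \times 2$$ symmetric case; the sample matrix is an arbitrary choice): the eigenvalues come from the quadratic formula, and the quadratic form is non-negative everywhere when they are non-negative.

```python
import math
import random

def eigvals_sym2(a, b, c):
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]].
    mean = (a + c) / 2.0
    disc = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - disc, mean + disc

def quad_form(a, b, c, v):
    # v^T A v for A = [[a, b], [b, c]].
    x, y = v
    return a * x * x + 2 * b * x * y + c * y * y

A = (2.0, 1.0, 2.0)  # [[2, 1], [1, 2]]: eigenvalues 1 and 3, so PSD
lo, hi = eigvals_sym2(*A)
random.seed(1)
samples = [quad_form(*A, (random.uniform(-1, 1), random.uniform(-1, 1)))
           for _ in range(1000)]
print(lo, hi, min(samples))
```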

## Square root of positive semi-definite matrix

Real symmetric matrix power relationships such as

\label{eqn:convexOptimizationLecture3:260}
A^2
=
Q \Lambda Q^\T
Q \Lambda Q^\T
=
Q \Lambda^2
Q^\T
,

or more generally $$A^k = Q \Lambda^k Q^\T,\, k \in \mathbb{Z}$$, can be further generalized to non-integral powers. In particular, the square root (non-unique) of a symmetric positive semi-definite matrix can be written

\label{eqn:convexOptimizationLecture3:280}
A^{1/2} = Q
\begin{bmatrix}
\sqrt{\lambda_1} & & & \\
& \sqrt{\lambda_2} & & \\
& & \ddots & \\
& & & \sqrt{\lambda_n} \\
\end{bmatrix}
Q^\T,

since $$A^{1/2} A^{1/2} = A$$, regardless of the sign picked for the square roots in question.
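A numeric check of this factorization (a sketch assuming NumPy; note that `numpy.linalg.eigh` returns the eigenvalues in ascending order):

```python
import numpy as np

# A symmetric positive definite sample matrix.
A = np.array([[2.0, 1.0], [1.0, 2.0]])

# Spectral factorization A = Q diag(lam) Q^T.
lam, Q = np.linalg.eigh(A)

# Principal square root: take the positive roots of the eigenvalues.
A_half = Q @ np.diag(np.sqrt(lam)) @ Q.T
assert np.allclose(A_half @ A_half, A)

# Flipping the sign of any root still squares back to A (non-uniqueness).
A_half_alt = Q @ np.diag(np.sqrt(lam) * np.array([-1.0, 1.0])) @ Q.T
assert np.allclose(A_half_alt @ A_half_alt, A)
```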

## Functions of matrices

Consider $$F : S^n \rightarrow \mathbb{R}$$, and define

\label{eqn:convexOptimizationLecture3:300}
F(X) = \log \det X.

Here $$\textrm{dom} F = S^n_{++}$$. The task is to find $$\spacegrad F$$, which can be done by looking at the perturbation $$\log \det ( X + \Delta X )$$

\label{eqn:convexOptimizationLecture3:320}
\begin{aligned}
\log \det ( X + \Delta X )
&=
\log \det ( X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2}) X^{1/2} ) \\
&=
\log \det ( X (I + X^{-1/2} \Delta X X^{-1/2}) ) \\
&=
\log \det X + \log \det (I + X^{-1/2} \Delta X X^{-1/2}).
\end{aligned}

Let $$M = X^{-1/2} \Delta X X^{-1/2}$$, and let $$\lambda_i$$ be the eigenvalues of $$M$$, so that $$M \Bv = \lambda_i \Bv$$ when $$\Bv$$ is an eigenvector of $$M$$. In particular

\label{eqn:convexOptimizationLecture3:340}
(I + M) \Bv =
(1 + \lambda_i) \Bv,

where $$1 + \lambda_i$$ are the eigenvalues of the $$I + M$$ matrix. Since the determinant is the product of the eigenvalues, this gives

\label{eqn:convexOptimizationLecture3:360}
\begin{aligned}
\log \det ( X + \Delta X )
&=
\log \det X +
\log \prod_{i = 1}^n (1 + \lambda_i) \\
&=
\log \det X +
\sum_{i = 1}^n \log (1 + \lambda_i).
\end{aligned}

If $$\lambda_i$$ are sufficiently “small”, then $$\log ( 1 + \lambda_i ) \approx \lambda_i$$, giving

\label{eqn:convexOptimizationLecture3:380}
\log \det ( X + \Delta X )
\approx
\log \det X +
\sum_{i = 1}^n \lambda_i
=
\log \det X +
\textrm{Tr}( X^{-1/2} \Delta X X^{-1/2} ).

Since
\label{eqn:convexOptimizationLecture3:400}
\textrm{Tr}( A B ) = \textrm{Tr}( B A ),

this trace operation can be written as

\label{eqn:convexOptimizationLecture3:420}
\log \det ( X + \Delta X )
\approx
\log \det X +
\textrm{Tr}( X^{-1} \Delta X )
=
\log \det X +
\innerprod{ X^{-1}}{\Delta X},

so

\label{eqn:convexOptimizationLecture3:440}
\spacegrad F = X^{-1}.

To check this, consider the simplest example with $$X \in \mathbb{R}^{1 \times 1}$$, where we have

\label{eqn:convexOptimizationLecture3:460}
\frac{d}{dX} \lr{ \log \det X } = \frac{d}{dX} \lr{ \log X } = \inv{X} = X^{-1}.

This is a nice example demonstrating how the gradient can be obtained by performing a first order perturbation of the function. The gradient can then be read off from the result.
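This gradient can also be sanity checked with a finite difference (a NumPy sketch, not from the lecture; the matrix and perturbation direction are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
X = B @ B.T + 4 * np.eye(4)  # symmetric positive definite

def logdet(M):
    # log det via slogdet for numerical stability.
    sign, val = np.linalg.slogdet(M)
    return val

# A symmetric perturbation direction.
D = rng.standard_normal((4, 4))
D = (D + D.T) / 2

# First order: logdet(X + t D) - logdet(X) ~ t <X^{-1}, D> = t Tr(X^{-1} D).
t = 1e-6
fd = (logdet(X + t * D) - logdet(X)) / t
analytic = np.trace(np.linalg.inv(X) @ D)
assert abs(fd - analytic) < 1e-4
```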

## Second order perturbations

• To get the first order approximation, we found the part of the perturbation that varied linearly in $$\Delta X$$.
• To get the second order part, perturb $$X^{-1}$$ by $$\Delta X$$ and see how that perturbation varies in $$\Delta X$$.

For $$G(X) = X^{-1}$$, this is

\label{eqn:convexOptimizationLecture3:480}
\begin{aligned}
(X + \Delta X)^{-1}
&=
\lr{ X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2} ) X^{1/2} }^{-1} \\
&=
X^{-1/2} (I + X^{-1/2} \Delta X X^{-1/2} )^{-1} X^{-1/2}
\end{aligned}

To be proven in the homework (for “small” A)

\label{eqn:convexOptimizationLecture3:500}
(I + A)^{-1} \approx I - A.

This gives

\label{eqn:convexOptimizationLecture3:520}
\begin{aligned}
(X + \Delta X)^{-1}
&\approx
X^{-1/2} (I - X^{-1/2} \Delta X X^{-1/2} ) X^{-1/2} \\
&=
X^{-1} - X^{-1} \Delta X X^{-1},
\end{aligned}

or

\label{eqn:convexOptimizationLecture3:800}
\begin{aligned}
G(X + \Delta X)
&= G(X) + (D G) \Delta X \\
&= G(X) + (\spacegrad G)^\T \Delta X,
\end{aligned}

so
\label{eqn:convexOptimizationLecture3:820}
(D G) \Delta X
=
- X^{-1} \Delta X X^{-1}.

The Taylor expansion of $$F$$ to second order is

\label{eqn:convexOptimizationLecture3:840}
F(X + \Delta X)
=
F(X)
+
\textrm{Tr} \lr{ (\spacegrad F)^\T \Delta X}
+
\inv{2}
\lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X}.

The first trace can be expressed as an inner product

\label{eqn:convexOptimizationLecture3:860}
\begin{aligned}
\textrm{Tr} \lr{ (\spacegrad F)^\T \Delta X}
&=
\innerprod{ \spacegrad F }{\Delta X} \\
&=
\innerprod{ X^{-1} }{\Delta X}.
\end{aligned}

The second trace also has the structure of an inner product

\label{eqn:convexOptimizationLecture3:880}
\begin{aligned}
(\Delta X)^\T (\spacegrad^2 F) \Delta X
&=
\textrm{Tr} \lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X} \\
&=
\innerprod{ (\spacegrad^2 F)^\T \Delta X }{\Delta X},
\end{aligned}

where a no-op trace could be inserted in the second order term since that quadratic form is already a scalar. This $$(\spacegrad^2 F)^\T \Delta X$$ term has essentially been found implicitly by performing the linear variation of $$\spacegrad F$$ in $$\Delta X$$, showing that we must have

\label{eqn:convexOptimizationLecture3:900}
\textrm{Tr} \lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X}
=
\innerprod{ – X^{-1} \Delta X X^{-1} }{\Delta X},

so
\label{eqn:convexOptimizationLecture3:560}
F( X + \Delta X) \approx F(X) +
\innerprod{X^{-1}}{\Delta X}
+\inv{2} \innerprod{-X^{-1} \Delta X X^{-1}}{\Delta X},

or
\label{eqn:convexOptimizationLecture3:580}
\log \det ( X + \Delta X) \approx \log \det X +
\textrm{Tr}( X^{-1} \Delta X )
- \inv{2} \textrm{Tr}( X^{-1} \Delta X X^{-1} \Delta X ).
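This second order expansion can be checked numerically (a NumPy sketch with arbitrarily chosen $$X$$ and $$\Delta X$$, not from the lecture): the second order error shrinks like $$t^3$$ and beats the first order approximation.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((3, 3))
X = B @ B.T + 3 * np.eye(3)  # symmetric positive definite
D = rng.standard_normal((3, 3))
D = (D + D.T) / 2  # symmetric perturbation direction

Xinv = np.linalg.inv(X)

def approx2(t):
    # Second order expansion of log det(X + t D).
    M = t * Xinv @ D
    return np.linalg.slogdet(X)[1] + np.trace(M) - 0.5 * np.trace(M @ M)

t = 1e-2
exact = np.linalg.slogdet(X + t * D)[1]
err2 = abs(exact - approx2(t))
err1 = abs(exact - (np.linalg.slogdet(X)[1] + t * np.trace(Xinv @ D)))
assert err2 < err1   # second order beats first order
assert err2 < 1e-4   # error is O(t^3)
```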

## Convex Sets

• Types of sets: Affine, convex, cones
• Examples: Hyperplanes, polyhedra, balls, ellipses, norm balls, cone of PSD matrices.

Definition: Affine set:

A set $$C \subseteq \mathbb{R}^n$$ is affine if $$\forall \Bx_1, \Bx_2 \in C$$ then

\begin{equation*}
\theta \Bx_1 + (1 -\theta) \Bx_2 \in C, \qquad \forall \theta \in \mathbb{R}.
\end{equation*}

The affine sum above can
be rewritten as

\label{eqn:convexOptimizationLecture3:600}
\Bx_2 + \theta (\Bx_1 – \Bx_2).

Since $$\theta$$ ranges over all of $$\mathbb{R}$$, this is the entire line through $$\Bx_2$$ in the direction $$\Bx_1 - \Bx_2$$, that is, the line through both points.

Observe that the solution to a set of linear equations

\label{eqn:convexOptimizationLecture3:620}
C = \setlr{ \Bx | A \Bx = \Bb },

is an affine set. To check, note that

\label{eqn:convexOptimizationLecture3:640}
\begin{aligned}
A (\theta \Bx_1 + (1 – \theta) \Bx_2)
&=
\theta A \Bx_1 + (1 – \theta) A \Bx_2 \\
&=
\theta \Bb + (1 – \theta) \Bb \\
&= \Bb.
\end{aligned}
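This closure property is easy to verify numerically (a minimal Python sketch; the equation $$2x + y = 4$$ is an arbitrary example):

```python
# Two solutions of the single linear equation 2x + y = 4 (an affine set in R^2).
x1 = (0.0, 4.0)
x2 = (2.0, 0.0)

def combine(theta, p, q):
    # theta * p + (1 - theta) * q, componentwise.
    return tuple(theta * a + (1 - theta) * b for a, b in zip(p, q))

# Any affine combination (theta unrestricted) stays in the solution set.
for theta in (-3.0, -0.5, 0.0, 0.7, 1.0, 5.0):
    x, y = combine(theta, x1, x2)
    assert abs(2 * x + y - 4) < 1e-12
```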

Definition: Affine combination: An affine combination of points $$\Bx_1, \Bx_2, \cdots \Bx_n$$ is

\begin{equation*}
\sum_{i = 1}^n \theta_i \Bx_i,
\end{equation*}

such that for $$\theta_i \in \mathbb{R}$$

\begin{equation*}
\sum_{i = 1}^n \theta_i = 1.
\end{equation*}

An affine set contains all affine combinations of points in the set. Examples of a couple affine sets are sketched in fig 1.1

For comparison, a couple of non-affine sets are sketched in fig 1.2

Definition: Convex set: A set $$C \subseteq \mathbb{R}^n$$ is convex if $$\forall \Bx_1, \Bx_2 \in C$$ and $$\forall \theta \in [0,1]$$, the combination

\label{eqn:convexOptimizationLecture3:700}
\theta \Bx_1 + (1 – \theta) \Bx_2 \in C.

Definition: Convex combination: A convex combination of $$\Bx_1, \Bx_2, \cdots \Bx_n$$ is

\begin{equation*}
\sum_{i = 1}^n \theta_i \Bx_i,
\end{equation*}

such that $$\forall \theta_i \ge 0$$

\begin{equation*}
\sum_{i = 1}^n \theta_i = 1
\end{equation*}

Definition: Convex hull: The convex hull of a set $$C$$ is the set of all convex combinations of points in $$C$$, denoted

\label{eqn:convexOptimizationLecture3:720}
\textrm{conv}(C) = \setlr{ \sum_{i=1}^n \theta_i \Bx_i | \Bx_i \in C, \theta_i \ge 0, \sum_{i=1}^n \theta_i = 1 }.

A non-convex set can be converted into a convex hull by filling in all the convex combinations of its points, as sketched in fig 1.3.

Definition: Cone: A set $$C$$ is a cone if $$\forall \Bx \in C$$ and $$\forall \theta \ge 0$$ we have $$\theta \Bx \in C$$.

This scales out if $$\theta > 1$$ and scales in if $$\theta < 1$$.

A convex cone is a cone that is also a convex set. A conic combination is

\begin{equation*}
\sum_{i=1}^n \theta_i \Bx_i, \theta_i \ge 0.
\end{equation*}

A convex and non-convex 2D cone is sketched in fig. 1.4

A comparison of properties for different set types is tabulated in table 1.1

## Hyperplanes and half spaces

Definition: Hyperplane: A hyperplane is defined by

\begin{equation*}
\setlr{ \Bx | \Ba^\T \Bx = b, \Ba \ne 0 }.
\end{equation*}

A line and plane are examples of this general construct as sketched in
fig. 1.5

An alternate view is possible should one
find any specific $$\Bx_0$$ such that $$\Ba^\T \Bx_0 = b$$

\label{eqn:convexOptimizationLecture3:740}
\setlr{\Bx | \Ba^\T \Bx = b }
=
\setlr{\Bx | \Ba^\T (\Bx -\Bx_0) = 0 }

This shows that $$\Bx - \Bx_0$$ lies in the subspace $$\Ba^\perp$$ perpendicular to $$\Ba$$, or

\label{eqn:convexOptimizationLecture3:780}
\Bx
=
\Bx_0 + \Ba^\perp.

This is the subspace perpendicular to $$\Ba$$ shifted by $$\Bx_0$$, subject to $$\Ba^\T \Bx_0 = b$$. As a set

\label{eqn:convexOptimizationLecture3:760}
\Ba^\perp = \setlr{ \Bv | \Ba^\T \Bv = 0 }.

## Half space

Definition: Half space: The half space is defined as
\begin{equation*}
\setlr{ \Bx | \Ba^\T \Bx \le b }
= \setlr{ \Bx | \Ba^\T (\Bx - \Bx_0) \le 0 }.
\end{equation*}

This can also be expressed as $$\setlr{ \Bx | \innerprod{ \Ba }{\Bx – \Bx_0 } \le 0 }$$.

## ECE1505H Convex Optimization. Lecture 2: Mathematical background. Taught by Prof. Stark Draper

### Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

### Topics

• Calculus: Derivatives and Jacobians, Gradients, Hessians, approximation functions.
• Linear algebra, Matrices, decompositions, …

## Norms

Vector space:

A set of elements (vectors) that is closed under vector addition and scaling.

This generalizes the directed arrow concept of vector space (fig. 1) that is familiar from geometry.

Normed vector spaces:

A vector space with a notion of length of any single vector, the “norm”.

Inner product space:
A normed vector space with a notion of a real angle between any pair of vectors.

This course focuses on optimization in $$\mathbb{R}^n$$. Complex spaces in the context of this course can be handled with a mapping $$\mathbb{C}^n \rightarrow \mathbb{R}^{2 n}$$.

Norm:
A norm is a function operating on a vector

\begin{equation*}
\Bx = ( x_1, x_2, \cdots, x_n )
\end{equation*}

that provides a mapping

\begin{equation*}
\Norm{ \cdot } : \mathbb{R}^{n} \rightarrow \mathbb{R},
\end{equation*}

where

• $$\Norm{ \Bx } \ge 0$$
• $$\Norm{ \Bx } = 0 \qquad \iff \Bx = 0$$
• $$\Norm{ t \Bx } = \Abs{t} \Norm{ \Bx }$$
• $$\Norm{ \Bx + \By } \le \Norm{ \Bx } + \Norm{\By}$$. This is the triangle inequality.

### Example: Euclidean norm

\label{eqn:convex-optimizationLecture2:24}
\Norm{\Bx} = \sqrt{ \sum_{i = 1}^n x_i^2 }

### Example: $$l_p$$-norms

\label{eqn:convex-optimizationLecture2:44}
\Norm{\Bx}_p = \lr{ \sum_{i = 1}^n \Abs{x_i}^p }^{1/p}.

For $$p = 1$$, this is

\label{eqn:convex-optimizationLecture2:64}
\Norm{\Bx}_1 = \sum_{i = 1}^n \Abs{x_i},

For $$p = 2$$, this is the Euclidean norm \ref{eqn:convex-optimizationLecture2:24}.
For $$p = \infty$$, this is

\label{eqn:convex-optimizationLecture2:324}
\Norm{\Bx}_\infty = \max_{i = 1}^n \Abs{x_i}.
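These norms are easy to compute directly (a minimal Python sketch, not from the lecture); note the ordering $$\Norm{\Bx}_\infty \le \Norm{\Bx}_2 \le \Norm{\Bx}_1$$:

```python
import math

def norm_p(x, p):
    # l_p norm for finite p >= 1.
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def norm_inf(x):
    # l_infinity norm: largest absolute component.
    return max(abs(xi) for xi in x)

x = [3.0, -4.0, 1.0]
l1 = norm_p(x, 1)    # 3 + 4 + 1 = 8
l2 = norm_p(x, 2)    # sqrt(9 + 16 + 1) = sqrt(26)
linf = norm_inf(x)   # 4
print(l1, l2, linf)
```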

Unit ball:

\begin{equation*}
\setlr{ \Bx | \Norm{\Bx} \le 1 }
\end{equation*}

The regions of the unit ball under the $$l_1$$, $$l_2$$, and $$l_\infty$$ norms are plotted in fig. 2.

fig. 2. Some unit ball regions.

The $$l_2$$ norm is not only familiar, but can be “induced” by an inner product

\label{eqn:convex-optimizationLecture2:84}
\left\langle \Bx, \By \right\rangle = \Bx^\T \By = \sum_{i = 1}^n x_i y_i,

which is not true for all norms. The norm induced by this inner product is

\label{eqn:convex-optimizationLecture2:104}
\Norm{\Bx}_2 = \sqrt{ \left\langle \Bx, \Bx \right\rangle }.

Inner product spaces have a notion of angle (fig. 3) given by

\label{eqn:convex-optimizationLecture2:124}
\left\langle \Bx, \By \right\rangle = \Norm{\Bx} \Norm{\By} \cos \theta,

fig. 3. Inner product induced angle.

and always satisfy the Cauchy-Schwarz inequality

\label{eqn:convex-optimizationLecture2:144}
\Abs{ \left\langle \Bx, \By \right\rangle } \le \Norm{\Bx}_2 \Norm{\By}_2.

In an inner product space we say $$\Bx$$ and $$\By$$ are orthogonal vectors $$\Bx \perp \By$$ if $$\left\langle \Bx, \By \right\rangle = 0$$, as sketched in fig. 4.

fig. 4. Orthogonality.

## Dual norm

Let $$\Norm{ \cdot }$$ be a norm in $$\mathbb{R}^n$$. The “dual” norm $$\Norm{ \cdot }_\conj$$ is defined as

\begin{equation*}
\Norm{\Bz}_\conj = \sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx} \le 1 },
\end{equation*}

where $$\sup$$ denotes the supremum, the least upper bound.

This is a supremum over the unit ball of $$\Norm{\cdot}$$.

### $$l_2$$ dual

The dual of the $$l_2$$ norm is the $$l_2$$ norm.

fig. 5. l_2 dual norm determination.

Proof:

\label{eqn:convex-optimizationLecture2:164}
\begin{aligned}
\Norm{\Bz}_\conj
&= \sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx}_2 \le 1 } \\
&= \sup_\Bx \setlr{ \Norm{\Bz}_2 \Norm{\Bx}_2 \cos\theta | \Norm{\Bx}_2 \le 1 } \\
&\le \sup_\Bx \setlr{ \Norm{\Bz}_2 \Norm{\Bx}_2 | \Norm{\Bx}_2 \le 1 } \\
&\le
\Norm{\Bz}_2.
\end{aligned}

This upper bound is attained by the unit vector $$\Bx = \Bz/\Norm{\Bz}_2$$, for which $$\Bz^\T \Bx = \Norm{\Bz}_2$$, so the dual norm equals $$\Norm{\Bz}_2$$.

### $$l_1$$ dual

For $$l_1$$, the dual is the $$l_\infty$$ norm. Proof:

\label{eqn:convex-optimizationLecture2:184}
\Norm{\Bz}_\conj
=
\sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx}_1 \le 1 },

but
\label{eqn:convex-optimizationLecture2:204}
\Bz^\T \Bx
=
\sum_{i=1}^n z_i x_i \le
\Abs{
\sum_{i=1}^n z_i x_i
}
\le
\sum_{i=1}^n \Abs{z_i x_i },

so
\label{eqn:convex-optimizationLecture2:224}
\begin{aligned}
\Norm{\Bz}_\conj
&\le
\sup_{\Norm{\Bx}_1 \le 1} \sum_{i=1}^n \Abs{z_i}\Abs{ x_i } \\
&\le \lr{ \max_{j} \Abs{z_j} }
\sup_{\Norm{\Bx}_1 \le 1} \sum_{i=1}^n \Abs{ x_i } \\
&\le \max_{j} \Abs{z_j} \\
&=
\Norm{\Bz}_\infty.
\end{aligned}

This bound is attained by choosing $$x_k = \pm 1$$ for an index $$k$$ that maximizes $$\Abs{z_k}$$, and $$x_i = 0$$ otherwise, so the dual of the $$l_1$$ norm is exactly the $$l_\infty$$ norm.
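Since the supremum over the $$l_1$$ unit ball is attained at one of its $$2n$$ vertices (the signed standard basis vectors), the dual norm can be computed exactly by checking only those points (a small Python sketch; `dual_norm_l1` is a hypothetical helper name, not from the lecture):

```python
def dual_norm_l1(z):
    # sup of z^T x over ||x||_1 <= 1 is attained at a vertex of the l1 ball,
    # so it suffices to check the 2n signed unit vectors.
    n = len(z)
    best = 0.0
    for k in range(n):
        for sign in (1.0, -1.0):
            x = [0.0] * n
            x[k] = sign
            best = max(best, sum(zi * xi for zi, xi in zip(z, x)))
    return best

z = [1.0, -3.0, 2.0]
assert dual_norm_l1(z) == max(abs(zi) for zi in z)  # equals ||z||_inf
```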

### $$l_\infty$$ dual


fig. 6. l_1 dual norm determination.

fig. 7. l_\infinity dual norm determination.

\label{eqn:convex-optimizationLecture2:244}
\Norm{\Bz}_\conj
=
\sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx}_\infty \le 1 }.

Here
\label{eqn:convex-optimizationLecture2:264}
\begin{aligned}
\Bz^\T \Bx
&=
\sum_{i=1}^n z_i x_i \\
&\le
\sum_{i=1}^n \Abs{z_i}\Abs{ x_i } \\
&\le
\lr{ \max_j \Abs{ x_j } }
\sum_{i=1}^n \Abs{z_i} \\
&=
\Norm{\Bx}_\infty
\sum_{i=1}^n \Abs{z_i}.
\end{aligned}

So
\label{eqn:convex-optimizationLecture2:284}
\Norm{\Bz}_\conj
\le
\sum_{i=1}^n \Abs{z_i}
=
\Norm{\Bz}_1.

Statement from the lecture: this is the choice of $$\Bx$$ that attains the supremum, showing that the bound is achieved with equality:

\label{eqn:convex-optimizationLecture2:304}
x_i^\conj
=
\left\{
\begin{array}{l l}
+1 & \quad \mbox{$$z_i \ge 0$$} \\
-1 & \quad \mbox{$$z_i \le 0$$}
\end{array}
\right.

# References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

## Jacobian and Hessian matrices

January 15, 2017 ece1505

## Motivation

In class this Friday the Jacobian and Hessian matrices were introduced, but I did not find the treatment terribly clear. Here is an alternate treatment, beginning with the gradient construction from [2], which uses a nice trick to frame the multivariable derivative operation as a single variable Taylor expansion.

## Multivariable Taylor approximation

The Taylor series expansion for a scalar function $$g : {\mathbb{R}} \rightarrow {\mathbb{R}}$$ about the origin is just

\label{eqn:jacobianAndHessian:20}
g(t) = g(0) + t g'(0) + \frac{t^2}{2} g''(0) + \cdots

In particular

\label{eqn:jacobianAndHessian:40}
g(1) = g(0) + g'(0) + \frac{1}{2} g''(0) + \cdots

Now consider $$g(t) = f( \Bx + \Ba t )$$, where $$f : {\mathbb{R}}^n \rightarrow {\mathbb{R}}$$, $$g(0) = f(\Bx)$$, and $$g(1) = f(\Bx + \Ba)$$. The multivariable Taylor expansion now follows directly

\label{eqn:jacobianAndHessian:60}
f( \Bx + \Ba)
= f(\Bx)
+ \evalbar{\frac{df(\Bx + \Ba t)}{dt}}{t = 0} + \frac{1}{2} \evalbar{\frac{d^2f(\Bx + \Ba t)}{dt^2}}{t = 0} + \cdots

The first order term is

\label{eqn:jacobianAndHessian:80}
\begin{aligned}
\evalbar{\frac{df(\Bx + \Ba t)}{dt}}{t = 0}
&=
\sum_{i = 1}^n
\frac{d( x_i + a_i t)}{dt}
\evalbar{\PD{(x_i + a_i t)}{f(\Bx + \Ba t)}}{t = 0} \\
&=
\sum_{i = 1}^n
a_i
\PD{x_i}{f(\Bx)} \\
\end{aligned}

Similarly, for the second order term

\label{eqn:jacobianAndHessian:100}
\begin{aligned}
\evalbar{\frac{d^2 f(\Bx + \Ba t)}{dt^2}}{t = 0}
&=
\evalbar{\lr{
\frac{d}{dt}
\lr{
\sum_{i = 1}^n
a_i
\PD{(x_i + a_i t)}{f(\Bx + \Ba t)}
}
}
}{t = 0} \\
&=
\evalbar{
\lr{
\sum_{j = 1}^n
\frac{d(x_j + a_j t)}{dt}
\sum_{i = 1}^n
a_i
\frac{\partial^2 f(\Bx + \Ba t)}{\partial (x_j + a_j t) \partial (x_i + a_i t) }
}
}{t = 0} \\
&=
\sum_{i,j = 1}^n a_i a_j \frac{\partial^2 f}{\partial x_i \partial x_j} \\
&=
\lr{ \Ba \cdot \spacegrad }^2 f.
\end{aligned}

The complete Taylor expansion of a scalar function $$f : {\mathbb{R}}^n \rightarrow {\mathbb{R}}$$ is therefore

\label{eqn:jacobianAndHessian:120}
f(\Bx + \Ba)
= f(\Bx) +
\lr{ \Ba \cdot \spacegrad} f +
\inv{2} \lr{ \Ba \cdot \spacegrad}^2 f + \cdots,

so the Taylor expansion has an exponential structure

\label{eqn:jacobianAndHessian:140}
f(\Bx + \Ba) = \sum_{k = 0}^\infty \inv{k!} \lr{ \Ba \cdot \spacegrad}^k f = e^{\Ba \cdot \spacegrad} f.

Should an approximation of a vector valued function $$\Bf : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m$$ be desired it is only required to form a matrix of the components

\label{eqn:jacobianAndHessian:160}
\Bf(\Bx + \Ba)
= \Bf(\Bx) +
[\lr{ \Ba \cdot \spacegrad} f_i]_i +
\inv{2} [\lr{ \Ba \cdot \spacegrad}^2 f_i]_i + \cdots,

where $$[.]_i$$ denotes a column vector over the rows $$i \in [1,m]$$, and $$f_i$$ are the coordinates of $$\Bf$$.

## The Jacobian matrix

In [1] the Jacobian $$D \Bf$$ of a function $$\Bf : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m$$ is defined in terms of the limit, as $$\Bz \rightarrow \Bx$$, of the $$l_2$$ norm ratio

\label{eqn:jacobianAndHessian:180}
\frac{\Norm{\Bf(\Bz) - \Bf(\Bx) - (D \Bf) (\Bz - \Bx)}_2 }{ \Norm{\Bz - \Bx}_2 },

with the statement that the function $$\Bf$$ has a derivative if this limit exists. Here the Jacobian $$D \Bf \in {\mathbb{R}}^{m \times n}$$ must be matrix valued.

Let $$\Bz = \Bx + \Ba$$, so the first order expansion of \ref{eqn:jacobianAndHessian:160} is

\label{eqn:jacobianAndHessian:200}
\Bf(\Bz)
= \Bf(\Bx) + [\lr{ \Bz – \Bx } \cdot \spacegrad f_i]_i
.

With the (unproven) assumption that this Taylor expansion satisfies the norm limit criteria of \ref{eqn:jacobianAndHessian:180}, it is possible to extract the structure of the Jacobian by comparison

\label{eqn:jacobianAndHessian:220}
\begin{aligned}
(D \Bf)
(\Bz – \Bx)
&=
{\begin{bmatrix}
\lr{ \Bz – \Bx } \cdot \spacegrad f_i
\end{bmatrix}}_i \\
&=
{\begin{bmatrix}
\sum_{j = 1}^n (z_j – x_j) \PD{x_j}{f_i}
\end{bmatrix}}_i \\
&=
{\begin{bmatrix}
\PD{x_j}{f_i}
\end{bmatrix}}_{ij}
(\Bz – \Bx),
\end{aligned}

so
\label{eqn:jacobianAndHessian:240}
\boxed{
(D \Bf)_{ij} = \PD{x_j}{f_i}
}

Written out explicitly as a matrix, the Jacobian is

\label{eqn:jacobianAndHessian:320}
D \Bf
=
\begin{bmatrix}
\PD{x_1}{f_1} & \PD{x_2}{f_1} & \cdots & \PD{x_n}{f_1} \\
\PD{x_1}{f_2} & \PD{x_2}{f_2} & \cdots & \PD{x_n}{f_2} \\
\vdots & \vdots & & \vdots \\
\PD{x_1}{f_m} & \PD{x_2}{f_m} & \cdots & \PD{x_n}{f_m} \\
\end{bmatrix}
=
\begin{bmatrix}
(\spacegrad f_1)^\T \\
(\spacegrad f_2)^\T \\
\vdots \\
(\spacegrad f_m)^\T \\
\end{bmatrix}.

In particular, when the function is scalar valued, the Jacobian is the transpose of the gradient

\label{eqn:jacobianAndHessian:261}
D f = \lr{ \spacegrad f }^\T.

With this notation, the first order Taylor expansion, in terms of the Jacobian matrix, is

\label{eqn:jacobianAndHessian:260}
\boxed{
\Bf(\Bz)
\approx \Bf(\Bx) + (D \Bf) \lr{ \Bz – \Bx }.
}
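The boxed Jacobian structure can be verified with central finite differences (a Python sketch; the function $$f$$ here is a hypothetical example, not from the text):

```python
import math

def f(x, y):
    # A sample map R^2 -> R^2: f(x, y) = (x^2 y, x + sin y).
    return (x * x * y, x + math.sin(y))

def jacobian_analytic(x, y):
    # (D f)_{ij} = d f_i / d x_j, computed by hand.
    return [[2 * x * y, x * x],
            [1.0, math.cos(y)]]

def jacobian_fd(x, y, h=1e-6):
    # Central differences in each input variable.
    fx_p, fx_m = f(x + h, y), f(x - h, y)
    fy_p, fy_m = f(x, y + h), f(x, y - h)
    return [[(fx_p[i] - fx_m[i]) / (2 * h), (fy_p[i] - fy_m[i]) / (2 * h)]
            for i in range(2)]

J_a = jacobian_analytic(1.2, 0.7)
J_n = jacobian_fd(1.2, 0.7)
assert all(abs(J_a[i][j] - J_n[i][j]) < 1e-5 for i in range(2) for j in range(2))
```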

## The Hessian matrix

For scalar valued functions, the text expresses the second order expansion of a function in terms of the Jacobian and Hessian matrices

\label{eqn:jacobianAndHessian:271}
f(\Bz)
\approx f(\Bx) + (D f) \lr{ \Bz – \Bx }
+ \inv{2} \lr{ \Bz – \Bx }^\T (\spacegrad^2 f) \lr{ \Bz – \Bx }.

Because $$\spacegrad^2$$ is the usual notation for a Laplacian operator, this $$\spacegrad^2 f \in {\mathbb{R}}^{n \times n}$$ notation for the Hessian matrix is not ideal in my opinion. Ignoring that notational objection for this class, the structure of the Hessian matrix can be extracted by comparison with the coordinate expansion

\label{eqn:jacobianAndHessian:300}
\Ba^\T (\spacegrad^2 f) \Ba
=
\sum_{r,s = 1}^n a_r a_s \frac{\partial^2 f}{\partial x_r \partial x_s},

where $$\Ba = \Bz - \Bx$$,

so
\label{eqn:jacobianAndHessian:280}
\boxed{
(\spacegrad^2 f)_{ij}
=
\frac{\partial^2 f}{\partial x_i \partial x_j}.
}

In explicit matrix form the Hessian is

\label{eqn:jacobianAndHessian:340}
\spacegrad^2 f
=
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots &\frac{\partial^2 f}{\partial x_n \partial x_n}
\end{bmatrix}.

Is there a similar nice matrix structure for the Hessian of a function $$f : {\mathbb{R}}^n \rightarrow {\mathbb{R}}^m$$?

# References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

[2] D. Hestenes. New Foundations for Classical Mechanics. Kluwer Academic Publishers, 1999.

## UofT ece1505 convex optimization: Introduction (taught by Prof. Stark Draper)

January 12, 2017 ece1505

## Peeter’s lecture notes. May be incoherent or rough.

• The science of optimization.
• Problem formulation, design, and analysis of engineering systems.

## Basic concepts

• Basic concepts. convex sets, functions, problems.
• Theory (about 40 % of the material). Specifically Lagrangian duality.
• Algorithms: gradient descent, Newton’s, interior point, …

Homework will involve computational work (solving problems, …)

## Goals

• Recognize and formulate engineering problems as convex optimization problems.
• To develop (Matlab) code to solve problems numerically.
• To characterize the solutions via duality theory
• NOT a math course, but lots of proofs.
• NOT a communications course, but lots of … (?)
• NOT a CS course, but lots of useful algorithms.

## Mathematical program

\label{eqn:intro:20}
\min_\Bx F_0(\Bx)

where $$\Bx = (x_1, x_2, \cdots, x_m) \in \mathbb{R}^m$$ is subject to constraints $$F_i : \mathbb{R}^m \rightarrow \mathbb{R}$$

\label{eqn:intro:40}
F_i(\Bx) \le 0, \qquad i = 1, \cdots, m.

The function $$F_0 : \mathbb{R}^m \rightarrow \mathbb{R}$$ is called the “objective function”.

Solving the problem produces an optimal $$\Bx^\conj$$: a feasible value of $$\Bx$$ that gives the smallest value of the objective function $$F_0$$ among all feasible points. Such a function is sketched in fig. 1.

fig. 1. Convex objective function.

• A convex objective looks like a bowl, it “holds water”.
• If we connect any two feasible points on the bowl with a line segment, that segment lies above the bowl.

A non-convex function is illustrated in fig. 2, which has a number of local minima.

fig. 2. Non-convex (wavy) function with a number of local minima.

## Example: Line fitting.

A linear fit of some points distributed around a line $$y = a x + b$$ is plotted in fig. 3. Here $$a, b$$ are the optimization variables $$\Bx = (a, b)$$.

fig. 3. Linear fit of points around a line.

How is the solution for such a best fit line obtained?

### Approach 1: Calculus minimization of a multivariable error function.

Define an error function that describes how far from the line a given point is

\label{eqn:intro:100}
y_i - (a x_i + b).

Because this can be positive or negative, we can define a squared variant of this, and then sum over all data points.

\label{eqn:intro:120}
F_0 = \sum_{i=1}^n \lr{ y_i – (a x_i + b) }^2.

One way to solve (for $$a, b$$): Take the derivatives

\label{eqn:intro:140}
\begin{aligned}
\PD{a}{F_0} &= \sum_{i=1}^n 2 ( y_i – (a x_i + b) )(-x_i) = 0 \\
\PD{b}{F_0} &= \sum_{i=1}^n 2 ( y_i – (a x_i + b) )(-1) = 0.
\end{aligned}

This yields

\label{eqn:intro:160}
\begin{aligned}
\sum_{i = 1}^n y_i &= \lr{\sum_{i = 1}^n x_i} a + \lr{\sum_{i = 1}^n 1} b \\
\sum_{i = 1}^n x_i y_i &= \lr{\sum_{i = 1}^n x_i^2} a + \lr{\sum_{i = 1}^n x_i} b.
\end{aligned}

In matrix form, this is

\label{eqn:intro:180}
\begin{bmatrix}
\sum x_i y_i \\
\sum y_i
\end{bmatrix}
=
\begin{bmatrix}
\sum x_i^2 & \sum x_i \\
\sum x_i & n
\end{bmatrix}
\begin{bmatrix}
a \\
b
\end{bmatrix}.

If this matrix is invertible, we have an analytic solution for $$(a^\conj, b^\conj)$$. This is a convex optimization problem because $$F_0$$ is a convex quadratic, a “quadratic program”. In general a quadratic program has the structure

\label{eqn:intro:200}
F(a, b) = (\cdots) a^2 + (\cdots) a b + (\cdots) b^2.

### Approach 2: Linear algebraic formulation.

\label{eqn:intro:220}
\begin{bmatrix}
y_1 \\
\vdots \\
y_n
\end{bmatrix}
=
\begin{bmatrix}
x_1 & 1 \\
\vdots & \vdots \\
x_n & 1
\end{bmatrix}
\begin{bmatrix}
a \\
b
\end{bmatrix}
+
\begin{bmatrix}
z_1 \\
\vdots \\
z_n
\end{bmatrix}
,

or
\label{eqn:intro:240}
\By = H \Bv + \Bz,

where $$\Bz$$ is the error vector. The problem is now reduced to: fit $$H \Bv$$ to be as close to $$\By$$ as possible, that is, to minimize the norm of the error vector, or

\label{eqn:intro:260}
\begin{aligned}
\min_\Bv \Norm{ \By – H \Bv }^2_2
&= \min_\Bv \lr{ \By – H \Bv }^\T \lr{ \By – H \Bv } \\
&= \min_\Bv
\lr{ \By^\T \By – \By^\T H \Bv – \Bv^\T H \By + \Bv^\T H^\T H \Bv } \\
&= \min_\Bv
\lr{ \By^\T \By – 2 \By^\T H \Bv + \Bv^\T H^\T H \Bv }.
\end{aligned}

It is now possible to take the derivative with respect to the $$\Bv$$ vector (i.e. the gradient with respect to the coordinates of the parameter vector)

\label{eqn:intro:280}
\PD{\Bv}{}
\lr{ \By^\T \By – 2 \By^\T H \Bv + \Bv^\T H^\T H \Bv }
=
– 2 \By^\T H + 2 \Bv^\T H^\T H
= 0,

or

\label{eqn:intro:300}
(H^\T H) \Bv = H^\T \By,

so, assuming that $$H^\T H$$ is invertible, the optimization problem has solution

\label{eqn:intro:320}
\Bv^\conj =
(H^\T H)^{-1} H^\T \By,

where

\label{eqn:intro:340}
\begin{aligned}
H^\T H
&=
\begin{bmatrix}
x_1 & \cdots & x_n \\
1 & \cdots & 1 \\
\end{bmatrix}
\begin{bmatrix}
x_1 & 1 \\
\vdots & \vdots \\
x_n & 1
\end{bmatrix} \\
&=
\begin{bmatrix}
\sum x_i^2 & \sum x_i \\
\sum x_i & n
\end{bmatrix}
,
\end{aligned}

as seen in the calculus approach.
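Both approaches reduce to the same $$2 \times 2$$ normal equations, which can be solved directly by Cramer's rule (a small Python sketch; `fit_line` and the synthetic data are illustrative assumptions, not from the lecture):

```python
import random

def fit_line(points):
    # Solve the 2x2 normal equations for the least squares line y = a x + b.
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = sxx * n - sx * sx  # assumed nonzero (points not all at one x)
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Synthetic data scattered around the line y = 2x + 1.
random.seed(4)
pts = [(x, 2.0 * x + 1.0 + random.uniform(-0.1, 0.1)) for x in range(10)]
a, b = fit_line(pts)
print(a, b)  # close to the true slope 2 and intercept 1
```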

## Maximum Likelihood Estimation (MLE).

It is reasonable to ask why the 2-norm was picked for the objective function.

• One justification is practical: Because we can solve the derivative equation.
• Another justification: In statistics the error vector $$\Bz = \By – H \Bv$$ can be modelled as an IID (Independently and Identically Distributed) Gaussian random variable (i.e. noise). Under this model, the use of the 2-norm can be viewed as a consequence of such an ML estimation problem (see [1] ch. 7).

A Gaussian IID model (fig. 4) is given by

\label{eqn:intro:360}
y_i = a x_i + b

\label{eqn:intro:380}
z_i = y_i - a x_i - b \sim N(0, \sigma^2)

\label{eqn:intro:400}
P_Z(z) = \inv{\sqrt{2 \pi} \sigma} \exp\lr{ -\inv{2} z^2/\sigma^2 }.

fig. 4. Gaussian probability distribution.

### MLE: Maximum Likelihood Estimator

Pick $$(a,b)$$ to maximize the probability of observed data.

\label{eqn:intro:420}
\begin{aligned}
(a^\conj, b^\conj)
&= \arg \max P( x, y ; a, b ) \\
&= \arg \max P_Z( y - (a x + b) ) \\
&= \arg \max \prod_{i = 1}^n P_Z( y_i - (a x_i + b) ) \\
&= \arg \max \prod_{i = 1}^n \inv{\sqrt{2 \pi} \sigma} \exp\lr{ -\inv{2} (y_i - a x_i - b)^2/\sigma^2 }.
\end{aligned}

Taking logs gives
\label{eqn:intro:440}
\begin{aligned}
(a^\conj, b^\conj)
&= \arg \max
\lr{
\textrm{constant}
-\inv{2} \sum_i (y_i – a x_i – b)^2/\sigma^2
} \\
&= \arg \min
\inv{2} \sum_i (y_i – a x_i – b)^2/\sigma^2 \\
&= \arg \min
\sum_i (y_i - a x_i - b)^2.
\end{aligned}

Here $$\arg \max$$ is not the maximum of the function, but the value of the parameter (the argument) that maximizes the function.

### Double sided exponential noise

A double sided exponential distribution is plotted in fig. 5, and has the mathematical form

\label{eqn:intro:460}
P_Z(z) = \inv{2 c} \exp\lr{ -\inv{c} \Abs{z} }.

fig. 5. Double sided exponential probability distribution.

The optimization problem is

\label{eqn:intro:480}
\begin{aligned}
\max_{a,b} \prod_{i = 1}^n P_z(z_i)
&=
\max_{a,b} \prod_{i = 1}^n
\inv{2 c} \exp\lr{ -\inv{c} \Abs{z_i} } \\
&=
\max_{a,b} \prod_{i = 1}^n
\inv{2 c} \exp\lr{ -\inv{c} \Abs{y_i – a x_i – b} } \\
&=
\max_{a,b}
\lr{\inv{2 c}}^n \exp\lr{ -\inv{c} \sum_{i=1}^n \Abs{y_i – a x_i – b} }.
\end{aligned}

This is an L1 norm problem

\label{eqn:intro:500}
\min_{a,b} \sum_{i = 1}^n \Abs{ y_i – a x_i – b }.

i.e.

\label{eqn:intro:520}
\min_\Bv \Norm{ \By – H \Bv }_1.

This is still convex, but has no analytic solution; it is an example of a linear program.

### Solution of linear program

Introduce helper variables $$t_1, \cdots, t_n$$, and minimize $$\sum_i t_i$$, such that

\label{eqn:intro:540}
\Abs{ y_i – a x_i – b } \le t_i,

This is now an optimization problem for $$a, b, t_1, \cdots t_n$$. A linear program is defined as

\label{eqn:intro:560}
\min_{a, b, t_1, \cdots t_n} \sum_i t_i

such that
\label{eqn:intro:580}
\begin{aligned}
y_i - a x_i - b &\le t_i \\
y_i - a x_i - b &\ge -t_i.
\end{aligned}
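
The helper-variable LP above maps directly onto `scipy.optimize.linprog`. A minimal sketch, with hypothetical data that includes an outlier to illustrate the robustness of the L1 fit:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data on the line y = 2x + 1, plus one large outlier.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 20.0])
n = len(x)

# Variables z = [a, b, t_1, ..., t_n]; minimize sum_i t_i.
c = np.concatenate([[0.0, 0.0], np.ones(n)])

# Constraints y_i - a x_i - b <= t_i and y_i - a x_i - b >= -t_i,
# rewritten in the A_ub @ z <= b_ub form that linprog expects.
A_ub = np.block([
    [-x[:, None], -np.ones((n, 1)), -np.eye(n)],   # -a x_i - b - t_i <= -y_i
    [ x[:, None],  np.ones((n, 1)), -np.eye(n)],   #  a x_i + b - t_i <=  y_i
])
b_ub = np.concatenate([-y, y])

# a and b are free; the helper variables t_i are non-negative.
bounds = [(None, None), (None, None)] + [(0, None)] * n
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
a, b = res.x[:2]
print(a, b)   # the outlier barely moves the L1 fit off (2, 1)
```

The squared error solution for the same data would be dragged substantially toward the outlier, which is one practical reason the L1 objective is sometimes preferred despite the lack of an analytic solution.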

### Single sided exponential

What if the noise doesn't look double sided, instead taking only values $$z > 0$$? A single sided probability distribution can be defined, as that of fig. 6.

fig. 6. Single sided exponential distribution.

\label{eqn:intro:600}
P_Z(z) =
\left\{
\begin{array}{l l}
\inv{c} e^{-z/c} & \quad \mbox{$$z \ge 0$$} \\
0 & \quad \mbox{$$z < 0$$}
\end{array}
\right.

i.e. all $$z_i$$ error values are always non-negative.

\label{eqn:intro:620}
\log P_Z(z) =
\left\{
\begin{array}{l l}
\textrm{const} - z/c & \quad \mbox{$$z > 0$$} \\
-\infty & \quad \mbox{$$z < 0$$}
\end{array}
\right.

The problem becomes

\label{eqn:intro:640}
\min_{a, b} \sum_i \lr{ y_i - a x_i - b }

such that
\label{eqn:intro:660}
y_i - a x_i - b \ge 0 \qquad \forall i.

### Uniform noise

Consider noise that is uniformly distributed in a range, as that of fig. 7: constant in the range $$[-c,c]$$ and zero outside that range.

fig. 7. Uniform probability distribution.

\label{eqn:intro:680}
P_Z(z) =
\left\{
\begin{array}{l l}
\inv{2 c} & \quad \mbox{$$\Abs{z} \le c$$} \\
0 & \quad \mbox{$$\Abs{z} > c.$$}
\end{array}
\right.

or

\label{eqn:intro:700}
\log P_Z(z) =
\left\{
\begin{array}{l l}
\textrm{const} & \quad \mbox{$$\Abs{z} \le c$$} \\
-\infty & \quad \mbox{$$\Abs{z} > c.$$}
\end{array}
\right.

MLE solution

\label{eqn:intro:720}
\max_{a,b} \prod_{i = 1}^n P(x, y; a, b)
=
\max_{a,b} \sum_{i = 1}^n \log P_Z( y_i – a x_i – b )

Here the argument is constant if $$-c \le y_i – a x_i – b \le c$$, so an ML solution is \underline{any} $$(a,b)$$ such that

\label{eqn:intro:740}
\Abs{ y_i – a x_i – b } \le c \qquad \forall i \in 1, \cdots, n.

This is a linear program known as a “feasibility problem”.

\label{eqn:intro:760}
\min d

such that

\label{eqn:intro:780}
\begin{aligned}
y_i – a x_i – b &\le d \\
y_i – a x_i – b &\ge -d
\end{aligned}

If $$d^\conj \le c$$, then the problem is feasible, however, if $$d^\conj > c$$ it is infeasible.
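
This feasibility check can also be sketched with `scipy.optimize.linprog`; the data and the noise bound $$c$$ below are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: samples of y = 2x + 1 with small bounded noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.2, 2.9, 5.1, 6.8])
c_bound = 0.5   # assumed half-width of the uniform noise

# Variables z = [a, b, d]; minimize d subject to |y_i - a x_i - b| <= d.
n = len(x)
cost = np.array([0.0, 0.0, 1.0])
A_ub = np.block([
    [-x[:, None], -np.ones((n, 1)), -np.ones((n, 1))],  # y_i - a x_i - b <= d
    [ x[:, None],  np.ones((n, 1)), -np.ones((n, 1))],  # y_i - a x_i - b >= -d
])
b_ub = np.concatenate([-y, y])
bounds = [(None, None), (None, None), (0, None)]
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)

d_star = res.x[2]
# here d* comes out well within the assumed bound, so the problem is feasible
print("feasible" if d_star <= c_bound else "infeasible", d_star)
```

Minimizing $$d$$ like this is a minimax (Chebyshev) fit; the feasibility question is just whether the resulting $$d^\conj$$ fits under the assumed noise bound $$c$$.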

### Method comparison

The double sided exponential, single sided exponential, and uniform probability distributions of fig 1.8 respectively produce point plots of the form of fig 1.9. The double sided exponential samples are distributed on both sides of the line, the single sided samples lie strictly on or above the line, and the uniform samples form an error band distributed around the line of best fit.

## References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

## gdb set target-charset

January 9, 2017 C/C++ development and debugging. No comments , ,

I was looking for a way to convert ASCII and EBCDIC strings in gdb debugging sessions and was experimenting with gdb python script extensions. I managed to figure out how to add my own command that read a gdb variable, and print it out, but it failed when I tried to run a character conversion function. In the process of debugging that char encoding error, I found that there’s a built in way to do exactly what I wanted to do:

(gdb) p argv[0]
$16 = 0x7fd8fbda0108 "\323\326\303\301\323\305\303\326"
(gdb) set target-charset EBCDIC-US
(gdb) p argv[0]
$17 = 0x7fd8fbda0108 "LOCALECO"
(gdb) set target-charset ASCII
(gdb) p argv[0]
$18 = 0x7fd8fbda0108 "\323\326\303\301\323\305\303\326"
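
The same conversion is easy to reproduce in plain Python (for instance when experimenting with gdb python script extensions). A small sketch, assuming `cp037` is an acceptable stand-in for the EBCDIC-US code page:

```python
# The raw bytes that gdb displayed in octal: \323\326\303\301\323\305\303\326
raw = bytes([0o323, 0o326, 0o303, 0o301, 0o323, 0o305, 0o303, 0o326])

# Decode as EBCDIC; cp037 is Python's name for the US EBCDIC code page.
print(raw.decode("cp037"))   # LOCALECO

# Round trip back to the octal escape form that gdb showed.
print("".join("\\%o" % b for b in raw))
```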


## Motivation

Geometric algebra (GA) allows for a compact description of Maxwell’s equations in either an explicit 3D representation or a STA (SpaceTime Algebra [2]) representation. In both the 3D GA and STA representations, Maxwell’s equations take the form

\label{eqn:potentialMethods:1280}
L \boldsymbol{\mathcal{F}} = J,

where $$J$$ represents the sources, $$L$$ is a multivector gradient operator that includes partial derivative operator components for each of the space and time coordinates, and

\label{eqn:potentialMethods:1020}
\boldsymbol{\mathcal{F}} = \boldsymbol{\mathcal{E}} + \eta I \boldsymbol{\mathcal{H}},

is an electromagnetic field multivector, $$I = \Be_1 \Be_2 \Be_3$$ is the \R{3} pseudoscalar, and $$\eta = \sqrt{\mu/\epsilon}$$ is the impedance of the media.

When Maxwell’s equations are extended to include magnetic sources in addition to conventional electric sources (as used in antenna-theory [1] and microwave engineering [3]), they take the form

\label{eqn:chapter3Notes:20}
\spacegrad \cross \boldsymbol{\mathcal{E}} = – \boldsymbol{\mathcal{M}} – \PD{t}{\boldsymbol{\mathcal{B}}}

\label{eqn:chapter3Notes:40}
\spacegrad \cross \boldsymbol{\mathcal{H}} = \boldsymbol{\mathcal{J}} + \PD{t}{\boldsymbol{\mathcal{D}}}

\label{eqn:chapter3Notes:60}
\spacegrad \cdot \boldsymbol{\mathcal{D}} = q_{\textrm{e}}

\label{eqn:chapter3Notes:80}
\spacegrad \cdot \boldsymbol{\mathcal{B}} = q_{\textrm{m}}.
The corresponding GA Maxwell equations in their respective 3D and STA forms are

\label{eqn:potentialMethods:300}
\lr{ \spacegrad + \inv{v} \PD{t}{} } \boldsymbol{\mathcal{F}}
=
\eta
\lr{ v q_{\textrm{e}} – \boldsymbol{\mathcal{J}} }
+ I \lr{ v q_{\textrm{m}} – \boldsymbol{\mathcal{M}} }

\label{eqn:potentialMethods:320}
\grad \boldsymbol{\mathcal{F}} = \eta J – I M,

where the wave group velocity in the medium is $$v = 1/\sqrt{\epsilon\mu}$$, and the medium is isotropic with
$$\boldsymbol{\mathcal{B}} = \mu \boldsymbol{\mathcal{H}}$$, and $$\boldsymbol{\mathcal{D}} = \epsilon \boldsymbol{\mathcal{E}}$$. In the STA representation, $$\grad, J, M$$ are all four-vectors, the specific meanings of which will be spelled out below.

How to determine the potential equations and the field representation using the conventional distinct Maxwell’s \ref{eqn:chapter3Notes:20}, … is well known. The basic procedure is to consider the electric and magnetic sources in turn, and observe that in each case one of the electric or magnetic fields must have a curl representation. The STA approach is similar, except that it can be observed that the field must have a four-curl representation for each type of source. In the explicit 3D GA formalism
\ref{eqn:potentialMethods:300} how to formulate a natural potential representation is not as obvious. There is no longer a reason to set any component of the field equal to a curl, and the representation of the four curl from the STA approach is awkward. Additionally, it is not obvious what form gauge invariance takes in the 3D GA representation.

### Ideas explored in these notes

• GA representation of Maxwell’s equations including magnetic sources.
• STA GA formalism for Maxwell’s equations including magnetic sources.
• Explicit form of the GA potential representation including both electric and magnetic sources.
• Demonstration of exactly how the 3D and STA potentials are related.
• Explore the structure of gauge transformations when magnetic sources are included.
• Explore the structure of gauge transformations in the 3D GA formalism.
• Specify the form of the Lorentz gauge in the 3D GA formalism.

### No magnetic sources

When magnetic sources are omitted, it follows from \ref{eqn:chapter3Notes:80} that there is some $$\boldsymbol{\mathcal{A}}^{\mathrm{e}}$$ for which

\label{eqn:potentialMethods:20}
\boxed{
\boldsymbol{\mathcal{B}} = \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}}.
}

Substitution into Faraday’s law \ref{eqn:chapter3Notes:20} gives

\label{eqn:potentialMethods:40}
\spacegrad \cross \boldsymbol{\mathcal{E}} = - \PD{t}{} \lr{ \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}} },
or
\label{eqn:potentialMethods:60}
\spacegrad \cross \lr{ \boldsymbol{\mathcal{E}} + \PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } = 0.

A gradient representation of this curled quantity, say $$-\spacegrad \phi$$, will provide the required zero

\label{eqn:potentialMethods:80}
\boxed{
\boldsymbol{\mathcal{E}} = -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }.
}

The final two Maxwell equations yield

\label{eqn:potentialMethods:100}
\begin{aligned}
-\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \spacegrad \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} } &= \mu \lr{ \boldsymbol{\mathcal{J}} + \epsilon \PD{t}{} \lr{ -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } } \\
\spacegrad \cdot \lr{ -\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} } } &= \inv{\epsilon} q_e,
\end{aligned}

or
\label{eqn:potentialMethods:120}
\boxed{
\begin{aligned}
\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{e}} - \inv{v^2} \PDSq{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \spacegrad \lr{
\inv{v^2} \PD{t}{\phi}
+ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}}
}
&= -\mu \boldsymbol{\mathcal{J}} \\
\spacegrad^2 \phi + \PD{t}{} \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} } &= -\inv{\epsilon} q_e.
\end{aligned}
}

Note that the Lorenz condition $$\PDi{t}{(\phi/v^2)} + \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} = 0$$ can be imposed to decouple these, leaving non-homogeneous wave equations for the vector and scalar potentials respectively.

### No electric sources

Without electric sources, a curl representation of the electric field can be assumed, satisfying Gauss’s law

\label{eqn:potentialMethods:140}
\boxed{
\boldsymbol{\mathcal{D}} = – \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}}.
}

Substitution into the Ampere-Maxwell law \ref{eqn:chapter3Notes:40} gives
\label{eqn:potentialMethods:160}
\spacegrad \cross \lr{ \boldsymbol{\mathcal{H}} + \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}} } = 0.

This is satisfied with any gradient, say, $$-\spacegrad \phi_m$$, providing a potential representation for the magnetic field

\label{eqn:potentialMethods:180}
\boxed{
\boldsymbol{\mathcal{H}} = -\spacegrad \phi_m – \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}.
}

The remaining Maxwell equations provide the required constraints on the potentials

\label{eqn:potentialMethods:220}
\spacegrad \cross \lr{ - \inv{\epsilon} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}} }
=
-\boldsymbol{\mathcal{M}} - \mu \PD{t}{}
\lr{
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}

\label{eqn:potentialMethods:240}
\spacegrad \cdot
\lr{
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}
= \inv{\mu} q_m,

or
\label{eqn:potentialMethods:260}
\boxed{
\begin{aligned}
\spacegrad^2 \boldsymbol{\mathcal{A}}^{\mathrm{m}} - \inv{v^2} \PDSq{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}} - \spacegrad \lr{ \inv{v^2} \PD{t}{\phi_m} + \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } &= -\epsilon \boldsymbol{\mathcal{M}} \\
\spacegrad^2 \phi_m + \PD{t}{} \lr{ \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} } &= -\inv{\mu} q_m.
\end{aligned}
}

The general solution to Maxwell’s equations is therefore
\label{eqn:potentialMethods:280}
\begin{aligned}
\boldsymbol{\mathcal{E}} &=
-\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \inv{\epsilon} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
\boldsymbol{\mathcal{H}} &=
\inv{\mu} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}}
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}},
\end{aligned}

subject to the constraints \ref{eqn:potentialMethods:120} and \ref{eqn:potentialMethods:260}.

### Potential operator structure

Knowing that there is a simple underlying structure to the potential representation of the electromagnetic field in the STA formalism inspires the question of whether that structure can be found directly using the scalar and vector potentials determined above.

Specifically, what is the multivector representation \ref{eqn:potentialMethods:1020} of the electromagnetic field in terms of all the individual potential variables, and can an underlying structure for that field representation be found? The composite field is

\label{eqn:potentialMethods:280b}
\boldsymbol{\mathcal{F}}
=
-\spacegrad \phi -\PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \inv{\epsilon} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{m}}
+ I \eta
\lr{
\inv{\mu} \spacegrad \cross \boldsymbol{\mathcal{A}}^{\mathrm{e}}
-\spacegrad \phi_m - \PD{t}{\boldsymbol{\mathcal{A}}^{\mathrm{m}}}
}.

Can this be factored into a multivector operator and multivector potentials? Expanding the cross products provides some direction

\label{eqn:potentialMethods:1040}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
- \PD{t}{ \boldsymbol{\mathcal{A}}^{\mathrm{e}} }
- \eta \PD{t}{I \boldsymbol{\mathcal{A}}^{\mathrm{m}}}
- \spacegrad \lr{ \phi + \eta I \phi_m } \\
&\quad
+ \frac{v}{2} \lr{ \rspacegrad \boldsymbol{\mathcal{A}}^{\mathrm{e}} - \boldsymbol{\mathcal{A}}^{\mathrm{e}} \lspacegrad }
+ \frac{1}{2 \epsilon} \lr{ \rspacegrad I \boldsymbol{\mathcal{A}}^{\mathrm{m}} - I \boldsymbol{\mathcal{A}}^{\mathrm{m}} \lspacegrad }.
\end{aligned}

Observe that the
gradient and the time partials can be grouped together

\label{eqn:potentialMethods:1060}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
- \PD{t}{ } \lr{\boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \boldsymbol{\mathcal{A}}^{\mathrm{m}}}
- \spacegrad \lr{ \phi + \eta I \phi_m }
+ \frac{v}{2} \lr{ \rspacegrad (\boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \boldsymbol{\mathcal{A}}^{\mathrm{m}}) - (\boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \boldsymbol{\mathcal{A}}^{\mathrm{m}}) \lspacegrad } \\
&=
\inv{2} \lr{
\lr{ \rspacegrad - \inv{v} {\stackrel{ \rightarrow }{\partial_t}} } \lr{ v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}} }
-
\lr{ v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}} \lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
} \\
&\quad
+ \inv{2} \lr{
\lr{ \rspacegrad - \inv{v} {\stackrel{ \rightarrow }{\partial_t}} } \lr{ -\phi - \eta I \phi_m }
- \lr{ \phi + \eta I \phi_m } \lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
}
,
\end{aligned}

or

\label{eqn:potentialMethods:1080}
\boxed{
\boldsymbol{\mathcal{F}}
=
\inv{2} \Biglr{
\lr{ \rspacegrad – \inv{v} {\stackrel{ \rightarrow }{\partial_t}} }
\lr{
– \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
– \eta I \phi_m
}
-
\lr{
\phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
+ \eta I \phi_m
}
\lr{ \lspacegrad + \inv{v} {\stackrel{ \leftarrow }{\partial_t}} }
}
.
}

There’s a conjugate structure to the potential on each side of the curl operation where we see a sign change for the scalar and pseudoscalar elements only. The reason for this becomes more clear in the STA formalism.

## Potentials in the STA formalism.

Maxwell’s equation in its explicit 3D form \ref{eqn:potentialMethods:300} can be
converted to STA form, by introducing a four-vector basis $$\setlr{ \gamma_\mu }$$, where the spatial basis
$$\setlr{ \Be_k = \gamma_k \gamma_0 }$$
is expressed in terms of the Dirac basis $$\setlr{ \gamma_\mu }$$.
By multiplying from the left with $$\gamma_0$$ a STA form of Maxwell’s equation
\ref{eqn:potentialMethods:320}
is obtained,
where
\label{eqn:potentialMethods:340}
\begin{aligned}
J &= \gamma^\mu J_\mu = ( v q_e, \boldsymbol{\mathcal{J}} ) \\
M &= \gamma^\mu M_\mu = ( v q_m, \boldsymbol{\mathcal{M}} ) \\
I &= \gamma_0 \gamma_1 \gamma_2 \gamma_3.
\end{aligned}

Here the metric choice is $$\gamma_0^2 = 1 = -\gamma_k^2$$. Note that in this representation the electromagnetic field $$\boldsymbol{\mathcal{F}} = \boldsymbol{\mathcal{E}} + \eta I \boldsymbol{\mathcal{H}}$$ is a bivector, not a multivector as it is in the explicit (frame dependent) 3D representation of \ref{eqn:potentialMethods:300}.

A potential representation can be obtained as before by considering electric and magnetic sources in sequence and using superposition to assemble a complete potential.

### No magnetic sources

Without magnetic sources, Maxwell’s equation splits into vector and trivector terms of the form

\label{eqn:potentialMethods:380}
\grad \cdot \boldsymbol{\mathcal{F}} = \eta J

\label{eqn:potentialMethods:400}
\grad \wedge \boldsymbol{\mathcal{F}} = 0.
A four-vector curl representation of the field will satisfy \ref{eqn:potentialMethods:400} allowing an immediate potential solution

\label{eqn:potentialMethods:560}
\boxed{
\begin{aligned}
\boldsymbol{\mathcal{F}} &= \grad \wedge {A^{\mathrm{e}}} \\
\grad \cdot \lr{ \grad \wedge {A^{\mathrm{e}}} } &= \eta J.
\end{aligned}
}

This can be put into correspondence with \ref{eqn:potentialMethods:120} by noting that

\label{eqn:potentialMethods:460}
\begin{aligned}
\grad^2 &= (\gamma^\mu \partial_\mu) \cdot (\gamma^\nu \partial_\nu) = \inv{v^2} \partial_{tt} – \spacegrad^2 \\
\gamma_0 {A^{\mathrm{e}}} &= \gamma_0 \gamma^\mu {A^{\mathrm{e}}}_\mu = {A^{\mathrm{e}}}_0 + \Be_k {A^{\mathrm{e}}}_k = {A^{\mathrm{e}}}_0 + \BA^{\mathrm{e}} \\
\gamma_0 \grad &= \gamma_0 \gamma^\mu \partial_\mu = \inv{v} \partial_t + \spacegrad \\
\grad \cdot {A^{\mathrm{e}}} &= \partial_\mu {A^{\mathrm{e}}}^\mu = \inv{v} \partial_t {A^{\mathrm{e}}}_0 – \spacegrad \cdot \BA^{\mathrm{e}},
\end{aligned}

so multiplying from the left with $$\gamma_0$$ gives

\label{eqn:potentialMethods:480}
\lr{ \inv{v^2} \partial_{tt} – \spacegrad^2 } \lr{ {A^{\mathrm{e}}}_0 + \BA^{\mathrm{e}} } – \lr{ \inv{v} \partial_t + \spacegrad }\lr{ \inv{v} \partial_t {A^{\mathrm{e}}}_0 – \spacegrad \cdot \BA^{\mathrm{e}} } = \eta( v q_e – \boldsymbol{\mathcal{J}} ),

or

\label{eqn:potentialMethods:520}
\lr{ \inv{v^2} \partial_{tt} – \spacegrad^2 } \BA^{\mathrm{e}} – \spacegrad \lr{ \inv{v} \partial_t {A^{\mathrm{e}}}_0 – \spacegrad \cdot \BA^{\mathrm{e}} } = -\eta \boldsymbol{\mathcal{J}}

\label{eqn:potentialMethods:540}
\spacegrad^2 {A^{\mathrm{e}}}_0 – \inv{v} \partial_t \lr{ \spacegrad \cdot \BA^{\mathrm{e}} } = -q_e/\epsilon.

So $${A^{\mathrm{e}}}_0 = \phi$$ and $$-\ifrac{\BA^{\mathrm{e}}}{v} = \boldsymbol{\mathcal{A}}^{\mathrm{e}}$$, or

\label{eqn:potentialMethods:600}
\boxed{
{A^{\mathrm{e}}} = \gamma_0\lr{ \phi – v \boldsymbol{\mathcal{A}}^{\mathrm{e}} }.
}

### No electric sources

Without electric sources, Maxwell’s equation now splits into

\label{eqn:potentialMethods:640}
\grad \cdot \boldsymbol{\mathcal{F}} = 0

\label{eqn:potentialMethods:660}
\grad \wedge \boldsymbol{\mathcal{F}} = -I M.

Here the dual of an STA curl yields a solution

\label{eqn:potentialMethods:680}
\boxed{
\boldsymbol{\mathcal{F}} = I ( \grad \wedge {A^{\mathrm{m}}} ).
}

Substituting this gives

\label{eqn:potentialMethods:720}
\begin{aligned}
0
&= \grad \cdot \boldsymbol{\mathcal{F}} \\
&= \gpgrade{ \grad I \lr{ \grad \wedge {A^{\mathrm{m}}} } }{1} \\
&= -I \lr{ \grad \wedge \grad \wedge {A^{\mathrm{m}}} }
\end{aligned}

\label{eqn:potentialMethods:740}
\begin{aligned}
-I M
&= \grad \wedge \boldsymbol{\mathcal{F}} \\
&= \gpgrade{ \grad I \lr{ \grad \wedge {A^{\mathrm{m}}} } }{3} \\
&= -I \lr{ \grad \cdot \lr{ \grad \wedge {A^{\mathrm{m}}} } }.
\end{aligned}

The $$\grad \cdot \boldsymbol{\mathcal{F}}$$ relation \ref{eqn:potentialMethods:720} is identically zero as desired, leaving

\label{eqn:potentialMethods:760}
\boxed{
\grad \cdot \lr{ \grad \wedge {A^{\mathrm{m}}} }
=
M.
}

So the general solution with both electric and magnetic sources is

\label{eqn:potentialMethods:800}
\boxed{
\boldsymbol{\mathcal{F}} = \grad \wedge {A^{\mathrm{e}}} + I \lr{ \grad \wedge {A^{\mathrm{m}}} },
}

subject to the constraints of \ref{eqn:potentialMethods:560} and \ref{eqn:potentialMethods:760}. As before the four-potential $${A^{\mathrm{m}}}$$ can be put into correspondence with the conventional scalar and vector potentials by left multiplying with $$\gamma_0$$, which gives

\label{eqn:potentialMethods:820}
\lr{ \inv{v^2} \partial_{tt} – \spacegrad^2 } \lr{ {A^{\mathrm{m}}}_0 + \BA^{\mathrm{m}} } – \lr{ \inv{v} \partial_t + \spacegrad }\lr{ \inv{v} \partial_t {A^{\mathrm{m}}}_0 – \spacegrad \cdot \BA^{\mathrm{m}} } = v q_m – \boldsymbol{\mathcal{M}},

or
\label{eqn:potentialMethods:860}
\lr{ \inv{v^2} \partial_{tt} – \spacegrad^2 } \BA^{\mathrm{m}} – \spacegrad \lr{ \inv{v} \partial_t {A^{\mathrm{m}}}_0 – \spacegrad \cdot \BA^{\mathrm{m}} } = – \boldsymbol{\mathcal{M}}

\label{eqn:potentialMethods:880}
\spacegrad^2 {A^{\mathrm{m}}}_0 - \inv{v} \partial_t \lr{ \spacegrad \cdot \BA^{\mathrm{m}} } = -v q_m.
Comparing with \ref{eqn:potentialMethods:260} shows that $${A^{\mathrm{m}}}_0/v = \mu \phi_m$$ and $$-\ifrac{\BA^{\mathrm{m}}}{v^2} = \mu \boldsymbol{\mathcal{A}}^{\mathrm{m}}$$, or

\label{eqn:potentialMethods:900}
\boxed{
{A^{\mathrm{m}}} = \gamma_0 \eta \lr{ \phi_m – v \boldsymbol{\mathcal{A}}^{\mathrm{m}} }.
}

### Potential operator structure

Observe that there is an underlying uniform structure of the differential operator that acts on the potential to produce the electromagnetic field. Expressed as a linear operator of the
gradient and the potentials, that is

$$\boldsymbol{\mathcal{F}} = L(\lrgrad, {A^{\mathrm{e}}}, {A^{\mathrm{m}}})$$

\label{eqn:potentialMethods:980}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
\grad \wedge {A^{\mathrm{e}}} + I \lr{ \grad \wedge {A^{\mathrm{m}}} } \\
&=
\inv{2} \lr{ \rgrad {A^{\mathrm{e}}} - {A^{\mathrm{e}}} \lgrad }
+ \frac{I}{2} \lr{ \rgrad {A^{\mathrm{m}}} - {A^{\mathrm{m}}} \lgrad } \\
&=
\inv{2} \lr{ \rgrad {A^{\mathrm{e}}} - {A^{\mathrm{e}}} \lgrad }
+ \frac{1}{2} \lr{ -\rgrad I {A^{\mathrm{m}}} - I {A^{\mathrm{m}}} \lgrad } \\
&=
\inv{2} \lr{ \rgrad ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}}) - ({A^{\mathrm{e}}} + I {A^{\mathrm{m}}}) \lgrad }
,
\end{aligned}

or
\label{eqn:potentialMethods:1000}
\boxed{
\boldsymbol{\mathcal{F}}
=
\inv{2} \lr{ \rgrad ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}}) – ({A^{\mathrm{e}}} – I {A^{\mathrm{m}}})^\dagger \lgrad }
.
}

Observe that \ref{eqn:potentialMethods:1000} can be
put into correspondence with \ref{eqn:potentialMethods:1080} using a factoring of unity $$1 = \gamma_0 \gamma_0$$

\label{eqn:potentialMethods:1100}
\boldsymbol{\mathcal{F}}
=
\inv{2} \lr{ (-\rgrad \gamma_0) (-\gamma_0 ({A^{\mathrm{e}}} -I {A^{\mathrm{m}}})) – (({A^{\mathrm{e}}} + I {A^{\mathrm{m}}}) \gamma_0)(\gamma_0 \lgrad) },

where

\label{eqn:potentialMethods:1140}
\begin{aligned}
-\rgrad \gamma_0
&=
-(\gamma^0 \partial_0 + \gamma^k \partial_k) \gamma_0 \\
&=
-\partial_0 - \gamma^k \gamma_0 \partial_k \\
&=
\spacegrad
-\inv{v} \partial_t
,
\end{aligned}

\label{eqn:potentialMethods:1160}
\begin{aligned}
\gamma_0 \lgrad
&=
\gamma_0 (\gamma^0 \partial_0 + \gamma^k \partial_k) \\
&=
\partial_0 - \gamma^k \gamma_0 \partial_k \\
&=
\spacegrad
+ \inv{v} \partial_t
,
\end{aligned}

and
\label{eqn:potentialMethods:1200}
\begin{aligned}
-\gamma_0 ( {A^{\mathrm{e}}} – I {A^{\mathrm{m}}} )
&=
-\gamma_0 \gamma_0 \lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ \phi_m – v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \\
&=
-\lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \phi_m – \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}} } \\
&=
– \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}
– \eta I \phi_m
\end{aligned}

\label{eqn:potentialMethods:1220}
\begin{aligned}
( {A^{\mathrm{e}}} + I {A^{\mathrm{m}}} )\gamma_0
&=
\lr{ \gamma_0 \lr{ \phi -v \boldsymbol{\mathcal{A}}^{\mathrm{e}} } + I \gamma_0 \eta \lr{ \phi_m – v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \gamma_0 \\
&=
\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + I \eta \phi_m + I \eta v \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&=
\phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta v I \boldsymbol{\mathcal{A}}^{\mathrm{m}}
+ \eta I \phi_m
,
\end{aligned}

This recovers \ref{eqn:potentialMethods:1080} as desired.

## Potentials in the 3D Euclidean formalism

In the conventional scalar plus vector differential representation of Maxwell’s equations \ref{eqn:chapter3Notes:20}…, given electric(magnetic) sources the structure of the electric(magnetic) potential follows from first setting the magnetic(electric) field equal to the curl of a vector potential. The procedure for the STA GA form of Maxwell’s equation was similar, where it was immediately evident that the field could be set to the four-curl of a four-vector potential (or the dual of such a curl for magnetic sources).

In the 3D GA representation, there is no immediate rationale for introducing a curl or the equivalent to a four-curl representation of the field. Reconciliation of this is possible by recognizing that the fact that the field (or a component of it) may be represented by a curl is not actually fundamental. Instead, observe that the two sided gradient action on a potential to generate the electromagnetic field in the STA representation of \ref{eqn:potentialMethods:1000} serves to select the grade two component product of the gradient and the multivector potential $${A^{\mathrm{e}}} – I {A^{\mathrm{m}}}$$, and that this can in fact be written as
a single sided gradient operation on a potential, provided the multivector product is filtered with a four-bivector grade selection operation

\label{eqn:potentialMethods:1240}
\boxed{
\boldsymbol{\mathcal{F}} = \gpgrade{ \grad \lr{ {A^{\mathrm{e}}} - I {A^{\mathrm{m}}} } }{2}.
}

Similarly, it can be observed that the
specific function of the conjugate structure in the two sided potential representation of
\ref{eqn:potentialMethods:1080}
is to discard all the scalar and pseudoscalar grades in the multivector product. This means that a single sided potential can also be used, provided it is wrapped in a grade selection operation

\label{eqn:potentialMethods:1260}
\boxed{
\boldsymbol{\mathcal{F}} =
\gpgrade{
\lr{ \spacegrad - \inv{v} \PD{t}{} }
\lr{
- \phi
+ v \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ \eta I v \boldsymbol{\mathcal{A}}^{\mathrm{m}}
- \eta I \phi_m
} }{1,2}.
}

It is this grade selection operation that is really the fundamental defining action in the potential of the STA and conventional 3D representations of Maxwell’s equations. So, given Maxwell’s equation in the 3D GA representation, defining a potential representation for the field is really just a demand that the field have the structure

\label{eqn:potentialMethods:1320}
\boldsymbol{\mathcal{F}} = \gpgrade{ (\alpha \spacegrad + \beta \partial_t)( A_0 + A_1 + I( A_0' + A_1' ) ) }{1,2}.

This is a mandate that the electromagnetic field is the grades 1 and 2 components of the vector product of space and time derivative operators on a multivector field $$A = \sum_{k=0}^3 A_k = A_0 + A_1 + I( A_0′ + A_1′ )$$ that can potentially have any grade components. There are more degrees of freedom in this specification than required, since the multivector can absorb one of the $$\alpha$$ or $$\beta$$ coefficients, so without loss of generality, one of these (say $$\alpha$$) can be set to 1.

Expanding \ref{eqn:potentialMethods:1320} gives

\label{eqn:potentialMethods:1340}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
\spacegrad A_0
+ \spacegrad \wedge A_1
+ \beta \partial_t A_1
+ I \spacegrad A_0'
+ I \lr{ \spacegrad \wedge A_1' }
+ I \beta \partial_t A_1' \\
&=
\boldsymbol{\mathcal{E}} + I \eta \boldsymbol{\mathcal{H}}.
\end{aligned}

This naturally has all the right mixes of curls, gradients and time derivatives, all following as direct consequences of applying a grade selection operation to the action of a “spacetime gradient” on a general multivector potential.

The conclusion is that the potential representation of the field is

\label{eqn:potentialMethods:1360}
\boldsymbol{\mathcal{F}} =
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} } A }{1,2},
where $$A$$ is a multivector potentially containing all grades, where grades 0,1 are required for electric sources, and grades 2,3 are required for magnetic sources. When it is desirable to refer back to the conventional scalar and vector potentials this multivector potential can be written as $$A = -\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ -\phi_m + v \boldsymbol{\mathcal{A}}^{\mathrm{m}} }$$.

## Gauge transformations

Recall that for electric sources the magnetic field is of the form

\label{eqn:potentialMethods:1380}
\boldsymbol{\mathcal{B}} = \spacegrad \cross \boldsymbol{\mathcal{A}},
so adding the gradient of any scalar field to the potential $$\boldsymbol{\mathcal{A}}’ = \boldsymbol{\mathcal{A}} + \spacegrad \psi$$
does not change the magnetic field

\label{eqn:potentialMethods:1400}
\begin{aligned}
\boldsymbol{\mathcal{B}}'
&= \spacegrad \cross \lr{ \boldsymbol{\mathcal{A}} + \spacegrad \psi } \\
&= \spacegrad \cross \boldsymbol{\mathcal{A}} \\
&= \boldsymbol{\mathcal{B}}.
\end{aligned}

The electric field with this changed potential is

\label{eqn:potentialMethods:1420}
\begin{aligned}
\boldsymbol{\mathcal{E}}'
&= -\spacegrad \phi' - \partial_t \lr{ \boldsymbol{\mathcal{A}} + \spacegrad \psi } \\
&= -\spacegrad \lr{ \phi' + \partial_t \psi } - \partial_t \boldsymbol{\mathcal{A}},
\end{aligned}

so if
\label{eqn:potentialMethods:1440}
\phi' = \phi - \partial_t \psi,

the electric field will also be unaltered by this transformation.

In the STA representation, the field can similarly be altered by adding any (four)gradient to the potential. For example with only electric sources

\label{eqn:potentialMethods:1460}
\boldsymbol{\mathcal{F}} = \grad \wedge \lr{ {A^{\mathrm{e}}} + \grad \psi } = \grad \wedge {A^{\mathrm{e}}},

and for electric or magnetic sources

\label{eqn:potentialMethods:1480}
\boldsymbol{\mathcal{F}}
= \grad \wedge \lr{ {A^{\mathrm{e}}} + \grad \psi } + I \lr{ \grad \wedge \lr{ {A^{\mathrm{m}}} + \grad \chi } }
= \grad \wedge {A^{\mathrm{e}}} + I \lr{ \grad \wedge {A^{\mathrm{m}}} },

where $$\psi$$ and $$\chi$$ are arbitrary scalar functions.
In the 3D GA representation, where the field is given by \ref{eqn:potentialMethods:1360}, there is no field that is being curled to add a gradient to. However, if the scalar and vector potentials transform as

\label{eqn:potentialMethods:1500}
\begin{aligned}
\boldsymbol{\mathcal{A}} &\rightarrow \boldsymbol{\mathcal{A}} + \spacegrad \psi \\
\phi &\rightarrow \phi – \partial_t \psi,
\end{aligned}

then the multivector potential transforms as
\label{eqn:potentialMethods:1520}
-\phi + v \boldsymbol{\mathcal{A}}
\rightarrow -\phi + v \boldsymbol{\mathcal{A}} + \partial_t \psi + v \spacegrad \psi,

so the electromagnetic field is unchanged when the multivector potential is transformed as

\label{eqn:potentialMethods:1540}
A \rightarrow A + \lr{ \spacegrad + \inv{v} \partial_t } \psi,

where $$\psi$$ is any field that has scalar or pseudoscalar grades. Viewed in terms of grade selection, this makes perfect sense, since the transformed field is

\label{eqn:potentialMethods:1560}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&\rightarrow
\gpgrade{ \lr{ \spacegrad – \inv{v} \PD{t}{} } \lr{ A + \lr{ \spacegrad + \inv{v} \partial_t } \psi } }{1,2} \\
&=
\gpgrade{ \lr{ \spacegrad – \inv{v} \PD{t}{} } A + \lr{ \spacegrad^2 – \inv{v^2} \partial_{tt} } \psi }{1,2} \\
&=
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} } A }{1,2}.
\end{aligned}

The $$\psi$$ contribution is killed by the grade selection operation because $$\lr{ \spacegrad^2 - \inv{v^2} \partial_{tt} } \psi$$ has only scalar and pseudoscalar grades.

## Lorenz gauge

Maxwell’s equations are completely decoupled if the potential can be found such that

\label{eqn:potentialMethods:1580}
\begin{aligned}
\boldsymbol{\mathcal{F}}
&=
\gpgrade{ \lr{ \spacegrad - \inv{v} \PD{t}{} } A }{1,2} \\
&=
\lr{ \spacegrad - \inv{v} \PD{t}{} } A.
\end{aligned}

When this is the case, Maxwell’s equations are reduced to four non-homogeneous potential wave equations

\label{eqn:potentialMethods:1620}
\lr{ \spacegrad^2 – \inv{v^2} \PDSq{t}{} } A = J,

that is

\label{eqn:potentialMethods:1600}
\begin{aligned}
\lr{ \spacegrad^2 – \inv{v^2} \PDSq{t}{} } \phi &= – \inv{\epsilon} q_e \\
\lr{ \spacegrad^2 – \inv{v^2} \PDSq{t}{} } \boldsymbol{\mathcal{A}}^{\mathrm{e}} &= – \mu \boldsymbol{\mathcal{J}} \\
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \phi_m &= - \inv{\mu} q_m \\
\lr{ \spacegrad^2 - \inv{v^2} \PDSq{t}{} } \boldsymbol{\mathcal{A}}^{\mathrm{m}} &= - \epsilon \boldsymbol{\mathcal{M}}.
\end{aligned}

There should be no a-priori assumption that such a field representation has no scalar or pseudoscalar components. That explicit expansion in grades is

\label{eqn:potentialMethods:1640}
\begin{aligned}
\lr{ \spacegrad - \inv{v} \PD{t}{} } A
&=
\lr{ \spacegrad - \inv{v} \PD{t}{} } \lr{ -\phi + v \boldsymbol{\mathcal{A}}^{\mathrm{e}} + \eta I \lr{ -\phi_m + v \boldsymbol{\mathcal{A}}^{\mathrm{m}} } } \\
&=
\inv{v} \partial_t \phi
+ v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} \\
&\quad
- \spacegrad \phi
- \partial_t \boldsymbol{\mathcal{A}}^{\mathrm{e}}
+ I \eta v \spacegrad \wedge \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&\quad
+ v \spacegrad \wedge \boldsymbol{\mathcal{A}}^{\mathrm{e}}
- I \eta \spacegrad \phi_m
- I \eta \partial_t \boldsymbol{\mathcal{A}}^{\mathrm{m}} \\
&\quad
+ \eta I \inv{v} \partial_t \phi_m
+ I \eta v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}},
\end{aligned}

so if this potential representation has only vector and bivector grades, it must be true that

\label{eqn:potentialMethods:1660}
\begin{aligned}
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} &= 0 \\
\inv{v} \partial_t \phi_m + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{m}} &= 0.
\end{aligned}

The first is the well known Lorenz gauge condition, whereas the second is the dual of that condition for magnetic sources.

If one of these conditions, say the Lorenz condition for the electric source potentials, is non-zero, then it is possible to make a potential transformation for which the transformed condition is zero

\label{eqn:potentialMethods:1680}
\begin{aligned}
0
&\ne
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}} \\
&=
\inv{v} \partial_t (\phi’ – \partial_t \psi) + v \spacegrad \cdot (\boldsymbol{\mathcal{A}}’ + \spacegrad \psi) \\
&=
\inv{v} \partial_t \phi' + v \spacegrad \cdot \boldsymbol{\mathcal{A}}'
+ v \lr{ \spacegrad^2 – \inv{v^2} \partial_{tt} } \psi,
\end{aligned}

so for $$\inv{v} \partial_t \phi' + v \spacegrad \cdot \boldsymbol{\mathcal{A}}'$$ to be zero, $$\psi$$ must be found such that
\label{eqn:potentialMethods:1700}
\inv{v} \partial_t \phi + v \spacegrad \cdot \boldsymbol{\mathcal{A}}^{\mathrm{e}}
= v \lr{ \spacegrad^2 – \inv{v^2} \partial_{tt} } \psi.


## Total internal reflection

From Snell’s second law we have

\label{eqn:brewsters:20}
\theta_t = \arcsin\lr{ \frac{n_i}{n_t} \sin\theta_i }.

This is plotted in fig. 3.

fig. 3. Transmission angle vs incident angle.

For the $$n_i > n_t$$ case, for example, like shining from glass into air, there is a critical incident angle beyond which there is no real value of $$\theta_t$$. That critical incident angle occurs when $$\theta_t = \pi/2$$, which is

\label{eqn:brewsters:40}
\sin\theta_{ic} = \frac{n_t}{n_i} \sin(\pi/2).

With
\label{eqn:brewsters:340}
n = n_t/n_i

the critical angle is
\label{eqn:brewsters:60}
\theta_{ic} = \arcsin n.

Note that Snell’s law can also be expressed in terms of this critical angle, allowing for the solution of the transmission angle in a convenient way
\label{eqn:brewsters:360}
\begin{aligned}
\sin\theta_i
&= \frac{n_t}{n_i} \sin\theta_t \\
&= n \sin\theta_t \\
&= \sin\theta_{ic} \sin\theta_t,
\end{aligned}

or

\label{eqn:brewsters:380}
\sin\theta_t = \frac{\sin\theta_i}{\sin\theta_{ic}}.
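As a quick numeric sanity check of these critical angle relations, here is a minimal sketch, assuming a hypothetical glass to air interface ($$n_i = 1.5$$, $$n_t = 1$$):

```python
import math

# Hypothetical indices: glass (n_i = 1.5) into air (n_t = 1.0).
n_i, n_t = 1.5, 1.0

# Critical angle: sin(theta_ic) = n_t / n_i.
theta_ic = math.asin(n_t / n_i)

# Below the critical angle, the critical-angle form of Snell's law
# reproduces the direct form n_i sin(theta_i) = n_t sin(theta_t).
theta_i = math.radians(30.0)
theta_t = math.asin(math.sin(theta_i) / math.sin(theta_ic))

print(f"theta_ic = {math.degrees(theta_ic):.2f} deg")  # ~41.81 deg
print(f"theta_t(30 deg) = {math.degrees(theta_t):.2f} deg")
```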

Still for $$n_i > n_t$$, at angles past $$\theta_{ic}$$, the transmitted wave angle becomes complex as outlined in [2], namely

\label{eqn:brewsters:400}
\begin{aligned}
\cos^2\theta_t
&=
1 – \sin^2 \theta_t \\
&=
1 –
\frac{\sin^2\theta_i}{\sin^2\theta_{ic}} \\
&=
-\lr{
\frac{\sin^2\theta_i}{\sin^2\theta_{ic}}
-1
},
\end{aligned}

or
\label{eqn:brewsters:420}
\cos\theta_t =
j \sqrt{
\frac{\sin^2\theta_i}{\sin^2\theta_{ic}}
-1
}.

Following the convention that puts the normal propagation direction along z, and the interface along x, the wave vector direction is
\label{eqn:brewsters:440}
\begin{aligned}
\kcap_t
&= \Be_3 e^{ \Be_{31} \theta_t } \\
&= \Be_3 \cos\theta_t + \Be_1 \sin\theta_t.
\end{aligned}

The phase factor for the transmitted field is

\label{eqn:brewsters:460}
\begin{aligned}
\exp\lr{ j \omega t \pm j \Bk_t \cdot \Bx }
&=
\exp\lr{ j \omega t \pm j k \kcap_t \cdot \Bx } \\
&=
\exp\lr{ j \omega t \pm j k \lr{ z \cos\theta_t + x \sin\theta_t } } \\
&=
\exp\lr{
j \omega t
\pm j k \lr{ z j \sqrt{ \frac{\sin^2\theta_i}{\sin^2\theta_{ic}} -1 } + x \frac{\sin\theta_i}{\sin\theta_{ic}} }
} \\
&=
\exp\lr{
j \omega t \pm k
\lr{
j x \frac{\sin\theta_i}{\sin\theta_{ic}}
– z \sqrt{ \frac{\sin^2\theta_i}{\sin^2\theta_{ic}} -1 }
}
}.
\end{aligned}

The propagation is channelled along the x axis, while the field in the second medium decays exponentially with depth (the exponentially growing solution being unphysical), penetrating only a short distance past the surface.

What is the average power transmission into the medium? We are interested in the time average of the normal component of the Poynting vector $$\BS \cdot \ncap$$.

\label{eqn:brewsters:480}
\begin{aligned}
\BS
&= \inv{2} \BE \cross \BH^\conj \\
&= \inv{2} \BE \cross \lr{ \inv{\eta} \kcap_t \cross \BE^\conj } \\
&= -\inv{2 \eta} \BE \cdot \lr{ \kcap_t \wedge \BE^\conj } \\
&= -\inv{2 \eta} \lr{
(\BE \cdot \kcap_t) \BE^\conj
-
\kcap_t \BE \cdot \BE^\conj
} \\
&=
\inv{2 \eta}
\kcap_t \Abs{\BE}^2.
\end{aligned}

\label{eqn:brewsters:500}
\begin{aligned}
\kcap_t \cdot \ncap
&= \lr{ \Be_3 \cos\theta_t + \Be_1 \sin\theta_t } \cdot \Be_3 \\
&= \cos\theta_t \\
&=
j \sqrt{
\frac{\sin^2\theta_i}{\sin^2\theta_{ic}}
-1
}.
\end{aligned}

Note that this is purely imaginary. The time average real power transmission is

\label{eqn:brewsters:520}
\begin{aligned}
\expectation{\BS \cdot \ncap}
&=
\textrm{Re} \lr{
j \sqrt{
\frac{\sin^2\theta_i}{\sin^2\theta_{ic}}
-1
}
\frac{1}{2 \eta} \Abs{\BE}^2
} \\
&= 0.
\end{aligned}

There is no power transmission into the second medium at or past the critical angle for total internal reflection.
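That vanishing transmitted power is easy to confirm numerically. A minimal sketch, again assuming hypothetical glass to air indices, with incidence past the critical angle:

```python
import math

# Hypothetical indices: glass (n_i = 1.5) into air (n_t = 1.0).
n_i, n_t = 1.5, 1.0
theta_ic = math.asin(n_t / n_i)   # critical angle, ~41.81 deg
theta_i = math.radians(60.0)      # incidence past the critical angle

# sin(theta_t) = sin(theta_i)/sin(theta_ic) > 1, so cos(theta_t) is
# purely imaginary past the critical angle.
ratio = math.sin(theta_i) / math.sin(theta_ic)
cos_theta_t = 1j * math.sqrt(ratio**2 - 1)

# <S . n> is proportional to Re(cos(theta_t)), which vanishes identically.
S_normal = cos_theta_t.real
print(ratio, cos_theta_t, S_normal)
```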

## Brewster’s angle

Brewster’s angle is the angle for which the amplitude of the reflected component of the field is zero. Recall that when the electric field is parallel (perpendicular) to the plane of incidence, the reflection amplitude is ([1] eq. 4.38)

\label{eqn:brewsters:80}
r_\parallel
=
\frac
{
\frac{ n_t }{\mu_t} \cos \theta_i
-\frac{ n_i }{\mu_i} \cos \theta_t
}
{
\frac{ n_t }{\mu_t} \cos \theta_i
+\frac{ n_i }{\mu_i} \cos \theta_t
}

\label{eqn:brewsters:100}
r_\perp
=
\frac
{
\frac{ n_i }{\mu_i} \cos \theta_i
-\frac{ n_t }{\mu_t} \cos \theta_t
}
{
\frac{ n_i }{\mu_i} \cos \theta_i
+\frac{ n_t }{\mu_t} \cos \theta_t
}

There are limited conditions for which $$r_\perp$$ is zero, at least for $$\mu_i = \mu_t$$. Using Snell’s second law $$n_i \sin\theta_i = n_t \sin\theta_t$$, that zero is found at

\label{eqn:brewsters:120}
\begin{aligned}
n_i \cos \theta_i
&= n_t \cos \theta_t \\
&= n_t \sqrt{ 1 – \sin^2 \theta_t } \\
&= n_t \sqrt{ 1 – \frac{n_i^2}{n_t^2} \sin^2 \theta_i },
\end{aligned}

or

\label{eqn:brewsters:140}
\frac{n_i^2}{n_t^2} \cos^2 \theta_i = 1 – \frac{n_i^2}{n_t^2} \sin^2 \theta_i,

or
\label{eqn:brewsters:160}
\frac{n_i^2}{n_t^2} \lr{ \cos^2 \theta_i + \sin^2 \theta_i } = 1.

This has solutions only when $$n_i = \pm n_t$$. The $$n_i = n_t$$ case is of no interest, since that is just propagation, so naturally there is no reflection. The $$n_i = -n_t$$ case is possible with the transmission into a negative index of refraction material that is matched in absolute magnitude with the index of refraction in the incident medium.

There are richer solutions for the $$r_\parallel$$ zero. Again considering $$\mu_i = \mu_t$$, those occur when

\label{eqn:brewsters:180}
\begin{aligned}
n_t \cos \theta_i
&= n_i \cos \theta_t \\
&= n_i \sqrt{ 1 - \sin^2 \theta_t } \\
&= n_i \sqrt{ 1 - \frac{n_i^2}{n_t^2} \sin^2 \theta_i }.
\end{aligned}

Let $$n = n_t/n_i$$, and square both sides. This gives

\label{eqn:brewsters:200}
\begin{aligned}
n^2 \cos^2 \theta_i
&= 1 – \inv{n^2} \sin^2 \theta_i \\
&= 1 – \inv{n^2} (1 – \cos^2 \theta_i),
\end{aligned}

or

\label{eqn:brewsters:220}
\cos^2 \theta_i \lr{ n^2 - \inv{n^2}} = 1 - \inv{n^2},

or
\label{eqn:brewsters:240}
\begin{aligned}
\cos^2 \theta_i
&= \frac{1 – \inv{n^2}}{ n^2 – \inv{n^2} } \\
&= \frac{n^2 – 1}{ n^4 – 1 } \\
&= \frac{n^2 – 1}{ (n^2 – 1)(n^2 + 1) } \\
&= \frac{1}{ n^2 + 1 }.
\end{aligned}

We also have

\label{eqn:brewsters:260}
\begin{aligned}
\sin^2 \theta_i
&=
1 – \frac{1}{ n^2 + 1 } \\
&=
\frac{n^2}{ n^2 + 1 },
\end{aligned}

so
\label{eqn:brewsters:280}
\tan^2 \theta_i = n^2,

and
\label{eqn:brewsters:300}
\tan \theta_{iB} = \pm n.

For normal media where $$n_i > 0, n_t > 0$$, only the positive solution is physically relevant, which is

\label{eqn:brewsters:320}
\boxed{
\theta_{iB} = \arctan\lr{ \frac{n_t}{n_i} }.
}
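As a numeric spot check of this boxed result, a minimal sketch (assuming hypothetical air to glass indices $$n_i = 1$$, $$n_t = 1.5$$, and $$\mu_i = \mu_t$$) verifies that $$r_\parallel$$ vanishes at this angle:

```python
import math

# Hypothetical indices: air (n_i = 1.0) into glass (n_t = 1.5), mu_i = mu_t.
n_i, n_t = 1.0, 1.5

theta_B = math.atan(n_t / n_i)                      # Brewster's angle, ~56.31 deg
theta_t = math.asin(n_i / n_t * math.sin(theta_B))  # Snell's law

# Parallel-polarization reflection amplitude, with mu_i = mu_t.
r_par = (n_t * math.cos(theta_B) - n_i * math.cos(theta_t)) \
      / (n_t * math.cos(theta_B) + n_i * math.cos(theta_t))

print(f"theta_B = {math.degrees(theta_B):.2f} deg, r_par = {r_par:.2e}")
```

Note that $$\theta_{iB} + \theta_t = \pi/2$$ here: at Brewster's angle the reflected and transmitted rays are perpendicular.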

# References

[1] E. Hecht. Optics. 1998.

[2] JD Jackson. Classical Electrodynamics. John Wiley and Sons, 2nd edition, 1975.

## Political correctness

I saw an article on facebook about some recent idiocy at Queen’s university.

The idiocy isn’t what is being dubbed a racist party, but the fact that a costume party is dubbed racist.

A comment on this (Leon) that I thought summed things up nicely was:

“It is people who criticize a bunch of kids dressing as racists who make incidents of real racism greatly diminished.”

There is an alarming trend of perverting language in politically correct circles that is mystifying:

• A kiss without a contract, triple signed and witnessed, is now being called rape, or its seeming legal equivalent “sexual assault”.  There are consent posters all over UofT that outline the legalistic contracting required for sexuality in this PC age.  I was too inhibited when I was an undergrad to have had much sexual activity, but I’m glad that I’m not an undergrad now, subject to the current guidelines.  It’s definitely not okay to take advantage of somebody who is drunk, but this has been flipped on its head.  Sex after consensual co-drunkenness now appears to be sexual assault in some places.
• Failing to use the “correct” gendered pronoun is now “hate speech”, and is perceived as, or at least mislabelled as, explicit violence.  I’m a firm believer that people should have complete freedom to engage in hate speech or discrimination of any sort.  Let people dig themselves their own social graves instead of trying to legislate speech.
• Costume parties, even at halloween, are now being mislabelled racist.  Attempting to point that out at some PC universities resulted in so much PC backlash that resignations followed.

I keep hearing about instance after instance of such events.  It seems like most of the people who are pushing the political correctness agenda really desperately need dictionaries.  Just because you can label two things as identical, doesn’t mean that they are.  A perfect example of this is the use of “sexual assault” now instead of rape.  The two are now identified as identical, even though sexual assault is a much broader term that includes groping.

There was lots in the recent US election media circus about Trump’s bragging of pussy grabbing and aggressive kissing, acts that were facilitated by stardom.  One of the debate moderators explicitly called that sexual assault.  I don’t like the phrase sexual assault, because it is ambiguous, and has connotations of rape, while not necessarily being rape.  It seems to be a phrase designed to have the emotional impact of rape, while being something lesser.

Whether or not Trump was bragging about sexual assault is probably dependent on state law.  Ambiguous language identifies unequal events with the same weight, and seems to be a characteristic of politically correct speech and activism.  For example, calling pussy grabbing rape would be an obvious example of the misuse of language.  That’s why PC speech uses sexual assault instead.  A side effect of such PC speech is that actual rape, a horribly abusive event, is trivialized.  The irony in the Trump case was that the media could have focused on actual rape.  For example, Trump and his pedophile buddy Jeffrey Epstein are codefendants in an actual rape case (which I understand has now unfortunately been dropped due to technicalities).  Characteristic of many of the charges laid against Epstein, this one is also of a child, in this case a 13 year old.

Of his buddy Epstein Trump said

“I’ve known Jeff for fifteen years. Terrific guy. He’s a lot of fun to be with. It is even said that he likes beautiful women as much as I do, and many of them are on the younger side.”

It remains to be seen whether Trump is a sexual predator on par with Bill Clinton.  My gut feeling about why pussy grabbing got so much attention while Trump’s case with Epstein did not is that Bill is also a good friend of Epstein, and had been down to Epstein’s pedophile island many times.  Raising attention to that would have distracted from Hillary’s campaign (perhaps even raised the issue that she’d also “partied” there, in ways currently unspecified).

I digress.

How can political correctness be combatted?  One way is calling out explicit misuse of language.  Be very careful to use accurate words, and not to conflate things in order to push an agenda.

Because the political correctness movement is anti-intellectual, I suspect that purely linguistic techniques for fighting it are doomed.  Are there active social techniques that would be effective?

I came up with one idea that I amused myself with.  Perhaps it is time to start hosting some explicitly politically incorrect parties, just to push back.  Imagine a Halloween party that you are not allowed into, unless you are offending some minority group.  Suggested costume ideas include Hitler, blackface, transvestites or red-indians.  If you aren’t insulting somebody, then you can’t come in.  If you don’t think that Hitler is offensive enough, perhaps the host would allow you in if you dressed as some other psychopathic killer like Kissinger or Churchill, but that risks turning the party into a political party instead of an anti-PC party.  Costume prize adjudication would be biased against those that are in a visible minority group, so you should get extra points if you are a cis gendered white male.  Bonus points to the hosts of the party should they hold it on a university campus.

## Fresnel angular sum and difference formulas

November 22, 2016 math and physics play No comments , ,

In [1] there are some angle sum and difference formulations of the Fresnel formulas, given a $$\mu_1 = \mu_2$$ constraint.  The proof of these trig Fresnel equations is left as an exercise there, and is worked through here.

\label{eqn:fresnelSumAndDifferenceAngleFormulas:20}
\begin{aligned}
\sin(a + b)
&=
\textrm{Im}\lr{ e^{j(a + b)} } \\
&=
\textrm{Im}\lr{
e^{ja} e^{jb}
} \\
&=
\textrm{Im}\lr{
(\cos a + j \sin a) (\cos b + j \sin b)
} \\
&=
\sin a \cos b + \cos a \sin b.
\end{aligned}

Allowing for both signs we have

\label{eqn:fresnelSumAndDifferenceAngleFormulas:240}
\begin{aligned}
\sin(a + b) &= \sin a \cos b + \cos a \sin b \\
\sin(a – b) &= \sin a \cos b – \cos a \sin b.
\end{aligned}

The mixed sine and cosine product can be expressed as a sum of sines

\label{eqn:fresnelSumAndDifferenceAngleFormulas:40}
2 \sin a \cos b = \sin(a + b) + \sin(a – b).

With $$2 x = a + b, 2 y = a – b$$, or $$a = x + y, b = x – y$$, we find

\label{eqn:fresnelSumAndDifferenceAngleFormulas:60}
\begin{aligned}
2 \sin(x + y) \cos (x – y) &= \sin( 2 x ) + \sin( 2 y ) \\
2 \sin(x – y) \cos (x + y) &= \sin( 2 x ) – \sin( 2 y ).
\end{aligned}
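Since these product-to-sum identities do the heavy lifting below, a quick numerical spot check (at arbitrary angles) is easy:

```python
import math

# Spot check of 2 sin(x+y) cos(x-y) = sin(2x) + sin(2y) and its partner,
# at arbitrary (hypothetical) test angles.
x, y = 0.7, 0.3

lhs_sum = 2 * math.sin(x + y) * math.cos(x - y)
rhs_sum = math.sin(2 * x) + math.sin(2 * y)

lhs_diff = 2 * math.sin(x - y) * math.cos(x + y)
rhs_diff = math.sin(2 * x) - math.sin(2 * y)

print(lhs_sum - rhs_sum, lhs_diff - rhs_diff)  # both ~0
```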

Returning to the problem, when $$\mu_1 = \mu_2$$ the Fresnel equations were found to be

\label{eqn:fresnelSumAndDifferenceAngleFormulas:100}
\begin{aligned}
r^{\textrm{TE}} &= \frac { n_1 \cos\theta_i – n_2 \cos\theta_t } { n_1 \cos\theta_i + n_2 \cos\theta_t } \\
r^{\textrm{TM}} &= \frac{n_2 \cos\theta_i – n_1 \cos\theta_t }{ n_2 \cos\theta_i + n_1 \cos\theta_t } \\
t^{\textrm{TE}} &= \frac{ 2 n_1 \cos\theta_i } { n_1 \cos\theta_i + n_2 \cos\theta_t } \\
t^{\textrm{TM}} &= \frac{2 n_1 \cos\theta_i }{ n_2 \cos\theta_i + n_1 \cos\theta_t }.
\end{aligned}

Using Snell’s law, one of $$n_1, n_2$$ can be eliminated, for example

\label{eqn:fresnelSumAndDifferenceAngleFormulas:120}
n_1 = n_2 \frac{\sin \theta_t}{\sin\theta_i}.

Inserting this and proceeding with the application of the trig identities above, we have

\label{eqn:fresnelSumAndDifferenceAngleFormulas:160}
\begin{aligned}
r^{\textrm{TE}}
&= \frac { n_2 \frac{\sin\theta_t}{\sin\theta_i} \cos\theta_i – n_2 \cos\theta_t } { n_2 \frac{\sin\theta_t}{\sin\theta_i} \cos\theta_i + n_2 \cos\theta_t } \\
&=
\frac {
\sin\theta_t \cos\theta_i – \cos\theta_t \sin\theta_i
} {
\sin\theta_t \cos\theta_i + \cos\theta_t \sin\theta_i
} \\
&=
\frac {
\sin( \theta_t – \theta_i )
} {
\sin( \theta_t + \theta_i )
}
\end{aligned}

\label{eqn:fresnelSumAndDifferenceAngleFormulas:180}
\begin{aligned}
r^{\textrm{TM}}
&= \frac{n_2 \cos\theta_i – n_2 \frac{\sin\theta_t}{\sin\theta_i} \cos\theta_t }{ n_2 \cos\theta_i + n_2 \frac{\sin\theta_t}{\sin\theta_i} \cos\theta_t } \\
&= \frac{
\sin\theta_i \cos\theta_i – \sin\theta_t \cos\theta_t
}{
\sin\theta_i \cos\theta_i + \sin\theta_t \cos\theta_t
} \\
&= \frac{\inv{2} \sin(2 \theta_i) – \inv{2} \sin(2 \theta_t) }{ \inv{2} \sin(2 \theta_i) + \inv{2} \sin(2 \theta_t) } \\
&= \frac
{\sin(\theta_i – \theta_t)\cos(\theta_i + \theta_t) }
{\sin(\theta_i + \theta_t)\cos(\theta_i – \theta_t) } \\
&=
\frac
{\tan(\theta_i -\theta_t)}
{\tan(\theta_i +\theta_t)}
\end{aligned}

\label{eqn:fresnelSumAndDifferenceAngleFormulas:200}
\begin{aligned}
t^{\textrm{TE}}
&= \frac{ 2 n_2 \frac{\sin\theta_t}{\sin\theta_i} \cos\theta_i } { n_2 \frac{\sin\theta_t}{\sin\theta_i} \cos\theta_i + n_2 \cos\theta_t } \\
&= \frac{ 2 \sin\theta_t \cos\theta_i } { \sin\theta_t \cos\theta_i + \cos\theta_t \sin\theta_i } \\
&= \frac{ 2 \sin\theta_t \cos\theta_i }
{ \sin(\theta_i + \theta_t) }
\end{aligned}

\label{eqn:fresnelSumAndDifferenceAngleFormulas:220}
\begin{aligned}
t^{\textrm{TM}}
&= \frac{2 n_2 \frac{\sin\theta_t}{\sin\theta_i} \cos\theta_i }{ n_2 \cos\theta_i + n_2 \frac{\sin\theta_t}{\sin\theta_i} \cos\theta_t } \\
&= \frac{2 \sin\theta_t \cos\theta_i }{ \sin\theta_i \cos\theta_i + \sin\theta_t \cos\theta_t } \\
&= \frac{2 \sin\theta_t \cos\theta_i }
{ \inv{2} \sin(2 \theta_i) + \inv{2} \sin(2 \theta_t) } \\
&= \frac{2 \sin\theta_t \cos\theta_i }
{ \sin(\theta_i + \theta_t) \cos(\theta_i – \theta_t) }
\end{aligned}
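The four trig forms above can be spot checked numerically against the index forms that they were derived from. A minimal sketch, assuming hypothetical values $$n_1 = 1, n_2 = 1.5$$ and a 40 degree incident angle:

```python
import math

# Hypothetical values: n1 = 1.0 into n2 = 1.5 at 40 degrees incidence.
n1, n2 = 1.0, 1.5
theta_i = math.radians(40.0)
theta_t = math.asin(n1 / n2 * math.sin(theta_i))   # Snell's law

ci, ct = math.cos(theta_i), math.cos(theta_t)

# Index forms of the Fresnel coefficients (mu_1 = mu_2).
r_TE = (n1 * ci - n2 * ct) / (n1 * ci + n2 * ct)
r_TM = (n2 * ci - n1 * ct) / (n2 * ci + n1 * ct)
t_TE = 2 * n1 * ci / (n1 * ci + n2 * ct)
t_TM = 2 * n1 * ci / (n2 * ci + n1 * ct)

# Trig forms derived above.
r_TE_trig = math.sin(theta_t - theta_i) / math.sin(theta_t + theta_i)
r_TM_trig = math.tan(theta_i - theta_t) / math.tan(theta_i + theta_t)
t_TE_trig = 2 * math.sin(theta_t) * ci / math.sin(theta_i + theta_t)
t_TM_trig = 2 * math.sin(theta_t) * ci \
    / (math.sin(theta_i + theta_t) * math.cos(theta_i - theta_t))

print(r_TE - r_TE_trig, r_TM - r_TM_trig,
      t_TE - t_TE_trig, t_TM - t_TM_trig)  # all ~0
```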

# References

[1] E. Hecht. Optics. 1998.