Notes.

Due to limitations in the MathJax-Latex package, all the oriented integrals in this blog post should be interpreted as having a clockwise orientation. [See the PDF version of this post for more sophisticated formatting.]

Guts.

Given a two dimensional generating vector space, there are two instances of the fundamental theorem for multivector integration
\label{eqn:unpackingFundamentalTheorem:20}
\int_S F d\Bx \lrpartial G = \evalbar{F G}{\Delta S},

and
\label{eqn:unpackingFundamentalTheorem:40}
\int_S F d^2\Bx \lrpartial G = \oint_{\partial S} F d\Bx G.

The first case is trivial. Given a parameterized curve $$x = x(u)$$, it just states
\label{eqn:unpackingFundamentalTheorem:60}
\int_{u(0)}^{u(1)} du \PD{u}{}\lr{FG} = F(u(1))G(u(1)) - F(u(0))G(u(0)),

for all multivectors $$F, G$$, regardless of the signature of the underlying space.

The surface integral is more interesting. Let’s first look at the area element for this surface integral, which is
\label{eqn:unpackingFundamentalTheorem:80}
d^2 \Bx = d\Bx_u \wedge d \Bx_v.

Geometrically, this has the area of the parallelogram spanned by $$d\Bx_u$$ and $$d\Bx_v$$, but weighted by the pseudoscalar of the space. This is explored algebraically in the following problem and illustrated in fig. 1.

fig. 1. 2D vector space and area element.

Problem: Expansion of 2D area bivector.

Let $$\setlr{e_1, e_2}$$ be an orthonormal basis for a two dimensional space, with reciprocal frame $$\setlr{e^1, e^2}$$. Expand the area bivector $$d^2 \Bx$$ in coordinates relating the bivector to the Jacobian and the pseudoscalar.

With parameterization $$x = x(u,v) = x^\alpha e_\alpha = x_\alpha e^\alpha$$, we have
\label{eqn:unpackingFundamentalTheorem:120}
\Bx_u \wedge \Bx_v
=
\lr{ \PD{u}{x^\alpha} e_\alpha } \wedge
\lr{ \PD{v}{x^\beta} e_\beta }
=
\PD{u}{x^\alpha}
\PD{v}{x^\beta}
e_\alpha \wedge
e_\beta
=
\PD{(u,v)}{(x^1,x^2)} e_1 e_2,

or
\label{eqn:unpackingFundamentalTheorem:160}
\Bx_u \wedge \Bx_v
=
\lr{ \PD{u}{x_\alpha} e^\alpha } \wedge
\lr{ \PD{v}{x_\beta} e^\beta }
=
\PD{u}{x_\alpha}
\PD{v}{x_\beta}
e^\alpha \wedge
e^\beta
=
\PD{(u,v)}{(x_1,x_2)} e^1 e^2.

The upper and lower index pseudoscalars are related by
\label{eqn:unpackingFundamentalTheorem:180}
e^1 e^2 e_1 e_2 =
-e^1 e^2 e_2 e_1 =
-1,

so with $$I = e_1 e_2$$,
\label{eqn:unpackingFundamentalTheorem:200}
e^1 e^2 = -I^{-1},

leaving us with
\label{eqn:unpackingFundamentalTheorem:140}
d^2 \Bx
= \PD{(u,v)}{(x^1,x^2)} du dv\, I
= -\PD{(u,v)}{(x_1,x_2)} du dv\, I^{-1}.

We see that the area bivector is proportional to either the upper or lower index Jacobian and to the pseudoscalar for the space.
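
As a quick numerical illustration of this proportionality, here is a minimal sketch that assumes a Euclidean orthonormal basis and a polar parameterization $$x(r,\theta)$$ (both are illustrative choices, not part of the problem above). The helper names `position`, `partial` and `wedge_coefficient` are hypothetical. The coefficient of $$e_1 e_2$$ in $$\Bx_r \wedge \Bx_\theta$$ should match the familiar polar Jacobian $$r$$.

```python
import math

def position(r, theta):
    """Polar parameterization x(r, theta) = (x^1, x^2), Euclidean basis assumed."""
    return (r * math.cos(theta), r * math.sin(theta))

def partial(f, args, index, h=1e-6):
    """Central-difference partial derivative of a vector-valued function."""
    lo = list(args)
    hi = list(args)
    lo[index] -= h
    hi[index] += h
    return tuple((p - m) / (2 * h) for p, m in zip(f(*hi), f(*lo)))

def wedge_coefficient(a, b):
    """Coefficient of e_1 e_2 in the 2D wedge product a ^ b."""
    return a[0] * b[1] - a[1] * b[0]

r, theta = 1.7, 0.4
x_r = partial(position, (r, theta), 0)      # tangent vector along r
x_theta = partial(position, (r, theta), 1)  # tangent vector along theta
area_coefficient = wedge_coefficient(x_r, x_theta)
jacobian = r  # known value of d(x^1, x^2)/d(r, theta) for polar coordinates

print(area_coefficient, jacobian)
```

The two printed numbers should agree to within finite-difference error, which is the statement that $$d^2\Bx$$ is the Jacobian times $$I\, dr\, d\theta$$ for this parameterization.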

We may write the fundamental theorem for a 2D space as
\label{eqn:unpackingFundamentalTheorem:680}
\int_S du dv \, \PD{(u,v)}{(x^1,x^2)} F I \lrgrad G = \oint_{\partial S} F d\Bx G,

where we have dispensed with the vector derivative and use the gradient instead, since they are identical in a two parameter two dimensional space. Of course, unless we are using $$x^1, x^2$$ as our parameterization, we still want the curvilinear representation of the gradient $$\grad = \Bx^u \PDi{u}{} + \Bx^v \PDi{v}{}$$.

Problem: Standard basis expansion of fundamental surface relation.

For a parameterization $$x = x^1 e_1 + x^2 e_2$$, where $$\setlr{ e_1, e_2 }$$ is a standard (orthogonal) basis, expand the fundamental theorem for surface integrals for the single sided $$F = 1$$ case. Consider functions $$G$$ of each grade (scalar, vector, bivector.)

From \ref{eqn:unpackingFundamentalTheorem:140} we see that the fundamental theorem takes the form
\label{eqn:unpackingFundamentalTheorem:220}
\int_S dx^1 dx^2\, F I \lrgrad G = \oint_{\partial S} F d\Bx G.

In a Euclidean space, the operator $$I \lrgrad$$ is a $$\pi/2$$ rotation of the gradient, but it has a rotation-like structure in all metrics:
\label{eqn:unpackingFundamentalTheorem:240}
I \lrgrad
=
e_1 e_2 \lr{ e^1 \partial_1 + e^2 \partial_2 }
=
-e_2 \partial_1 + e_1 \partial_2.

• $$F = 1$$ and $$G \in \bigwedge^0$$ or $$G \in \bigwedge^2$$. For $$F = 1$$ and scalar or bivector $$G$$ we have
\label{eqn:unpackingFundamentalTheorem:260}
\int_S dx^1 dx^2\, \lr{ -e_2 \partial_1 + e_1 \partial_2 } G = \oint_{\partial S} d\Bx G,

where, for $$x^1 \in [x^1(0),x^1(1)]$$ and $$x^2 \in [x^2(0),x^2(1)]$$, the RHS written explicitly is
\label{eqn:unpackingFundamentalTheorem:280}
\oint_{\partial S} d\Bx G
=
\int dx^1 e_1
\lr{ G(x^1, x^2(1)) - G(x^1, x^2(0)) }
- dx^2 e_2
\lr{ G(x^1(1),x^2) - G(x^1(0), x^2) }.

This is sketched in fig. 2. Since a 2D bivector $$G$$ can be written as $$G = I g$$, where $$g$$ is a scalar, we may write the pseudoscalar case as
\label{eqn:unpackingFundamentalTheorem:300}
\int_S dx^1 dx^2\, \lr{ -e_2 \partial_1 + e_1 \partial_2 } g = \oint_{\partial S} d\Bx g,

after right multiplying both sides with $$I^{-1}$$. Algebraically the scalar and pseudoscalar cases can be thought of as identical scalar relationships.
• $$F = 1, G \in \bigwedge^1$$. For $$F = 1$$ and vector $$G$$ the 2D fundamental theorem for surfaces can be split into scalar
\label{eqn:unpackingFundamentalTheorem:320}
\int_S dx^1 dx^2\, \lr{ -e_2 \partial_1 + e_1 \partial_2 } \cdot G = \oint_{\partial S} d\Bx \cdot G,

and bivector relations
\label{eqn:unpackingFundamentalTheorem:340}
\int_S dx^1 dx^2\, \lr{ -e_2 \partial_1 + e_1 \partial_2 } \wedge G = \oint_{\partial S} d\Bx \wedge G.

To expand \ref{eqn:unpackingFundamentalTheorem:320}, let
\label{eqn:unpackingFundamentalTheorem:360}
G = g_1 e^1 + g_2 e^2,

for which
\label{eqn:unpackingFundamentalTheorem:380}
\lr{ -e_2 \partial_1 + e_1 \partial_2 } \cdot G
=
\lr{ -e_2 \partial_1 + e_1 \partial_2 } \cdot
\lr{ g_1 e^1 + g_2 e^2 }
=
\partial_2 g_1 - \partial_1 g_2,

and
\label{eqn:unpackingFundamentalTheorem:400}
d\Bx \cdot G
=
\lr{ dx^1 e_1 - dx^2 e_2 } \cdot \lr{ g_1 e^1 + g_2 e^2 }
=
dx^1 g_1 - dx^2 g_2,

so \ref{eqn:unpackingFundamentalTheorem:320} expands to
\label{eqn:unpackingFundamentalTheorem:500}
\int_S dx^1 dx^2\, \lr{ \partial_2 g_1 - \partial_1 g_2 }
=
\int
\evalbar{dx^1 g_1}{\Delta x^2} - \evalbar{ dx^2 g_2 }{\Delta x^1}.

This coordinate expansion illustrates how the pseudoscalar nature of the area element results in a duality transformation, as we end up with a curl-like operation on the LHS, despite the dot product nature of the decomposition that we used. That can also be seen directly for vector $$G$$, since
\label{eqn:unpackingFundamentalTheorem:560}
dA (I \grad) \cdot G
=
dA \gpgradezero{ I \grad G }
=
dA I \lr{ \grad \wedge G },

since the scalar selection of $$I \lr{ \grad \cdot G }$$ is zero.

In the grade-2 relation \ref{eqn:unpackingFundamentalTheorem:340}, we expect a pseudoscalar cancellation on both sides, leaving a scalar (divergence-like) relationship. This time, we use upper index coordinates for the vector $$G$$, letting
\label{eqn:unpackingFundamentalTheorem:440}
G = g^1 e_1 + g^2 e_2,

so
\label{eqn:unpackingFundamentalTheorem:460}
\lr{ -e_2 \partial_1 + e_1 \partial_2 } \wedge G
=
\lr{ -e_2 \partial_1 + e_1 \partial_2 } \wedge
\lr{ g^1 e_1 + g^2 e_2 }
=
e_1 e_2 \lr{ \partial_1 g^1 + \partial_2 g^2 },

and
\label{eqn:unpackingFundamentalTheorem:480}
d\Bx \wedge G
=
\lr{ dx^1 e_1 - dx^2 e_2 } \wedge
\lr{ g^1 e_1 + g^2 e_2 }
=
e_1 e_2 \lr{ dx^1 g^2 + dx^2 g^1 }.

So \ref{eqn:unpackingFundamentalTheorem:340}, after multiplication of both sides by $$I^{-1}$$, is
\label{eqn:unpackingFundamentalTheorem:520}
\int_S dx^1 dx^2\,
\lr{ \partial_1 g^1 + \partial_2 g^2 }
=
\int
\evalbar{dx^1 g^2}{\Delta x^2} + \evalbar{dx^2 g^1 }{\Delta x^1}.

As before, we’ve implicitly performed a duality transformation, and end up with a divergence operation. That can be seen directly without coordinate expansion, by rewriting the wedge as a grade two selection, and expanding the gradient action on the vector $$G$$, as follows
\label{eqn:unpackingFundamentalTheorem:580}
dA (I \grad) \wedge G
=
dA \gpgradetwo{ I \grad G }
=
dA I \lr{ \grad \cdot G },

since $$I \lr{ \grad \wedge G }$$ has only a scalar component.

fig. 2. Line integral around rectangular boundary.

Theorem 1.1: Green’s theorem [1].

Let $$S$$ be a Jordan region with a piecewise-smooth boundary $$C$$. If $$P, Q$$ are continuously differentiable on an open set that contains $$S$$, then
\begin{equation*}
\int dx dy \lr{ \PD{y}{P} - \PD{x}{Q} } = \oint P dx + Q dy.
\end{equation*}

Problem: Relationship to Green’s theorem.

If the space is Euclidean, show that \ref{eqn:unpackingFundamentalTheorem:500} and \ref{eqn:unpackingFundamentalTheorem:520} are both instances of Green’s theorem with suitable choices of $$P$$ and $$Q$$.

I will omit the subtleties related to general regions and consider just the case of an infinitesimal square region.

Start proof:

Let’s start with \ref{eqn:unpackingFundamentalTheorem:500}. With $$g_1 = P$$, $$g_2 = Q$$, and $$x^1 = x, x^2 = y$$, the LHS is
\label{eqn:unpackingFundamentalTheorem:600}
\int dx dy \lr{ \PD{y}{P} - \PD{x}{Q} }.

On the RHS we have
\label{eqn:unpackingFundamentalTheorem:620}
\int \evalbar{dx P}{\Delta y} - \evalbar{ dy Q }{\Delta x}
=
\int dx \lr{ P(x, y_1) - P(x, y_0) } - \int dy \lr{ Q(x_1, y) - Q(x_0, y) }.

This pair of integrals is plotted in fig. 3, from which we see that \ref{eqn:unpackingFundamentalTheorem:620} can be expressed as the line integral, leaving us with
\label{eqn:unpackingFundamentalTheorem:640}
\int dx dy \lr{ \PD{y}{P} - \PD{x}{Q} }
=
\oint dx P + dy Q,

which is Green’s theorem over the infinitesimal square integration region.

For the equivalence of \ref{eqn:unpackingFundamentalTheorem:520} to Green’s theorem, let $$g^2 = P$$, and $$g^1 = -Q$$. Plugging into the LHS, we find the Green’s theorem integrand. On the RHS, the integrand expands to
\label{eqn:unpackingFundamentalTheorem:660}
\evalbar{dx g^2}{\Delta y} + \evalbar{dy g^1 }{\Delta x}
=
dx \lr{ P(x,y_1) - P(x, y_0)}
+
dy \lr{ -Q(x_1, y) + Q(x_0, y)},

which is exactly what we found in \ref{eqn:unpackingFundamentalTheorem:620}.

End proof.
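
The clockwise form of Green’s theorem used above can be spot-checked numerically. The following is a minimal sketch under assumed inputs: the fields $$P = x y$$ and $$Q = x^2$$ and the rectangle $$[0,1] \times [0,2]$$ are arbitrary illustrative choices, and `midpoint` is a hypothetical helper implementing a simple midpoint rule.

```python
def P(x, y):
    return x * y

def Q(x, y):
    return x * x

def dP_dy(x, y):
    return x

def dQ_dx(x, y):
    return 2.0 * x

def midpoint(f, a, b, n=200):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

x0, x1, y0, y1 = 0.0, 1.0, 0.0, 2.0

# Area integral of (dP/dy - dQ/dx) over the rectangle.
area_integral = midpoint(
    lambda x: midpoint(lambda y: dP_dy(x, y) - dQ_dx(x, y), y0, y1),
    x0, x1)

# Clockwise boundary integral: top edge minus bottom edge for P dx,
# and minus (right edge minus left edge) for Q dy.
boundary_integral = (
    midpoint(lambda x: P(x, y1) - P(x, y0), x0, x1)
    - midpoint(lambda y: Q(x1, y) - Q(x0, y), y0, y1))

print(area_integral, boundary_integral)
```

Both sides evaluate to $$-1$$ for these particular fields, which matches a hand calculation: the area integrand is $$-x$$, and the boundary contributions are $$1$$ and $$-2$$.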

fig. 3. Path for Green’s theorem.

We may also relate multivector gradient integrals in 2D to the normal integral around the bounding curve. That relationship is as follows.

\begin{equation*}
\begin{aligned}
\int J du dv \rgrad G &= \oint I^{-1} d\Bx G = \int J \lr{ \Bx^v du + \Bx^u dv } G \\
\int J du dv F \lgrad &= \oint F I^{-1} d\Bx = \int J F \lr{ \Bx^v du + \Bx^u dv },
\end{aligned}
\end{equation*}
where $$J = \partial(x^1, x^2)/\partial(u,v)$$ is the Jacobian of the parameterization $$x = x(u,v)$$. In terms of the coordinates $$x^1, x^2$$, this reduces to
\begin{equation*}
\begin{aligned}
\int dx^1 dx^2 \rgrad G &= \oint I^{-1} d\Bx G = \int \lr{ e^2 dx^1 + e^1 dx^2 } G \\
\int dx^1 dx^2 F \lgrad &= \oint F I^{-1} d\Bx = \int F \lr{ e^2 dx^1 + e^1 dx^2 }.
\end{aligned}
\end{equation*}
The vector $$I^{-1} d\Bx$$ is orthogonal to the tangent vector along the boundary, and for Euclidean spaces it can be identified as the outwards normal.

Start proof:

Respectively setting $$F = 1$$, and $$G = 1$$ in \ref{eqn:unpackingFundamentalTheorem:680}, we have
\label{eqn:unpackingFundamentalTheorem:940}
\int I^{-1} d^2 \Bx \rgrad G = \oint I^{-1} d\Bx G,

and
\label{eqn:unpackingFundamentalTheorem:960}
\int F d^2 \Bx \lgrad I^{-1} = \oint F d\Bx I^{-1}.

Starting with \ref{eqn:unpackingFundamentalTheorem:940} we find
\label{eqn:unpackingFundamentalTheorem:700}
\int I^{-1} J du dv I \rgrad G = \oint I^{-1} d\Bx G,

which, since $$I^{-1} I = 1$$ and $$J du dv = dx^1 dx^2$$, gives $$\int dx^1 dx^2 \rgrad G = \oint I^{-1} d\Bx G$$, as desired. In terms of a parameterization $$x = x(u,v)$$, the pseudoscalar for the space is
\label{eqn:unpackingFundamentalTheorem:720}
I = \frac{\Bx_u \wedge \Bx_v}{J},

so
\label{eqn:unpackingFundamentalTheorem:740}
I^{-1} = \frac{J}{\Bx_u \wedge \Bx_v}.

Also note that $$\lr{\Bx_u \wedge \Bx_v}^{-1} = \Bx^v \wedge \Bx^u$$, so
\label{eqn:unpackingFundamentalTheorem:760}
I^{-1} = J \lr{ \Bx^v \wedge \Bx^u },

and
\label{eqn:unpackingFundamentalTheorem:780}
I^{-1} d\Bx
= I^{-1} \cdot d\Bx
= J \lr{ \Bx^v \wedge \Bx^u } \cdot \lr{ \Bx_u du - \Bx_v dv }
= J \lr{ \Bx^v du + \Bx^u dv },

so the right acting gradient integral is
\label{eqn:unpackingFundamentalTheorem:800}
\int J du dv \grad G =
\int
\evalbar{J \Bx^v G}{\Delta v} du + \evalbar{J \Bx^u G dv}{\Delta u},

which we write in abbreviated form as $$\int J \lr{ \Bx^v du + \Bx^u dv} G$$.

For the $$G = 1$$ case, from \ref{eqn:unpackingFundamentalTheorem:960} we find
\label{eqn:unpackingFundamentalTheorem:820}
\int J du dv F I \lgrad I^{-1} = \oint F d\Bx I^{-1}.

However, in a 2D space, regardless of metric, we have $$I a = - a I$$ for any vector $$a$$ (e.g. $$\grad$$ or $$d\Bx$$), so we may commute the outer pseudoscalars in
\label{eqn:unpackingFundamentalTheorem:840}
\int J du dv F I \lgrad I^{-1} = \oint F d\Bx I^{-1},

so
\label{eqn:unpackingFundamentalTheorem:850}
-\int J du dv F I I^{-1} \lgrad = -\oint F I^{-1} d\Bx.

After cancelling the negative sign on both sides, we have the claimed result.

To see that $$I a$$, for any vector $$a$$ is normal to $$a$$, we can compute the dot product
\label{eqn:unpackingFundamentalTheorem:860}
\lr{ I a } \cdot a
= \gpgradezero{ I a a }
= a^2 \gpgradezero{ I }
= 0,

since the scalar selection of a bivector is zero. Since $$I^{-1} = \pm I$$, the same argument shows that $$I^{-1} d\Bx$$ must be orthogonal to $$d\Bx$$.

End proof.
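
The two algebraic facts used in this proof, $$I a = -a I$$ and $$\lr{I a} \cdot a = 0$$, can be checked for each metric with a small hand-rolled 2D geometric product. This is only a sketch: the tuple layout (scalar, $$e_1$$, $$e_2$$, $$e_1 e_2$$), the function names `gp` and `check_signature`, and the test vector are all assumptions made for illustration, with $$s_1 = (e_1)^2$$ and $$s_2 = (e_2)^2$$ encoding the signature.

```python
def gp(a, b, s1, s2):
    """Geometric product of 2D multivectors stored as (scalar, e1, e2, e12).

    s1 and s2 are the squares of the orthogonal basis vectors e1 and e2.
    """
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + s1 * a1 * b1 + s2 * a2 * b2 - s1 * s2 * a3 * b3,
        a0 * b1 + a1 * b0 - s2 * a2 * b3 + s2 * a3 * b2,
        a0 * b2 + a2 * b0 + s1 * a1 * b3 - s1 * a3 * b1,
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,
    )

def check_signature(s1, s2):
    """Return (does I a = -a I hold?, scalar part of (I a) a)."""
    pseudoscalar = (0.0, 0.0, 0.0, 1.0)  # I = e1 e2
    a = (0.0, 3.0, -2.0, 0.0)            # an arbitrary vector 3 e1 - 2 e2
    Ia = gp(pseudoscalar, a, s1, s2)
    aI = gp(a, pseudoscalar, s1, s2)
    anticommutes = all(abs(p + q) < 1e-12 for p, q in zip(Ia, aI))
    scalar_part = gp(Ia, a, s1, s2)[0]   # (I a) . a for the vector I a
    return anticommutes, scalar_part

# Euclidean, mixed (spacetime-like), and negative-definite (spacelike) signatures.
signatures = [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0)]
results = {sig: check_signature(*sig) for sig in signatures}

for sig, (anticommutes, scalar_part) in results.items():
    print(sig, anticommutes, scalar_part)
```

For all three signatures the anticommutation check passes and the scalar part vanishes, consistent with $$I^{-1} d\Bx$$ being orthogonal to $$d\Bx$$ regardless of metric.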

Let’s look at the geometry of the normal $$I^{-1} d\Bx$$ in a couple of 2D vector spaces. We use a rectangular integration region to simplify the boundary term expressions.

• Euclidean: With a parameterization $$x(u,v) = u\Be_1 + v \Be_2$$, and Euclidean basis vectors $$(\Be_1)^2 = (\Be_2)^2 = 1$$, the fundamental theorem integrated over the rectangle $$[x_0,x_1] \times [y_0,y_1]$$ is
\label{eqn:unpackingFundamentalTheorem:880}
\int dx dy \grad G =
\int
\Be_2 \lr{ G(x,y_1) - G(x,y_0) } dx +
\Be_1 \lr{ G(x_1,y) - G(x_0,y) } dy.

Each of the terms in the integrand above is illustrated in fig. 4, and we see that this is a path integral weighted by the outwards normal.

fig. 4. Outwards oriented normal for Euclidean space.

• Spacetime: Let $$x(u,v) = u \gamma_0 + v \gamma_1$$, where $$(\gamma_0)^2 = -(\gamma_1)^2 = 1$$. With $$u = t, v = x$$, the gradient integral over a $$[t_0,t_1] \times [x_0,x_1]$$ region of spacetime is
\label{eqn:unpackingFundamentalTheorem:900}
\begin{aligned}
\int dt dx \grad G
&=
\int
\gamma^1 dt \lr{ G(t, x_1) - G(t, x_0) }
+
\gamma^0 dx \lr{ G(t_1, x) - G(t_0, x) } \\
&=
\int
\gamma_1 dt \lr{ -G(t, x_1) + G(t, x_0) }
+
\gamma_0 dx \lr{ G(t_1, x) - G(t_0, x) }
.
\end{aligned}

With $$t$$ plotted along the horizontal axis, and $$x$$ along the vertical, each of the terms of this integrand is illustrated graphically in fig. 5. For this mixed signature space, there is no longer any good geometrical characterization of the normal.

fig. 5. Orientation of the boundary normal for a spacetime basis.

• Spacelike:
Let $$x(u,v) = u \gamma_1 + v \gamma_2$$, where $$(\gamma_1)^2 = (\gamma_2)^2 = -1$$. With $$u = x, v = y$$, the gradient integral over a $$[x_0,x_1] \times [y_0,y_1]$$ region of this space is
\label{eqn:unpackingFundamentalTheorem:920}
\begin{aligned}
\int dx dy \grad G
&=
\int
\gamma^2 dx \lr{ G(x, y_1) - G(x, y_0) }
+
\gamma^1 dy \lr{ G(x_1, y) - G(x_0, y) } \\
&=
\int
\gamma_2 dx \lr{ -G(x, y_1) + G(x, y_0) }
+
\gamma_1 dy \lr{ -G(x_1, y) + G(x_0, y) }
.
\end{aligned}

Referring to fig. 6, where the elements of the integrand are illustrated, we see that the normal $$I^{-1} d\Bx$$ for the boundary of this region can be characterized as inwards.

fig. 6. Inwards oriented normal for a Dirac spacelike basis.

References

[1] S.L. Salas and E. Hille. Calculus: one and several variables. Wiley New York, 1990.

New version of classical mechanics notes

I’ve posted a new version of my classical mechanics notes compilation.  This version is not yet live on amazon, but you shouldn’t buy a copy of this “book” anyways, as it is horribly rough (if you want a copy, grab the free PDF instead.)  [I am going to buy a copy so that I can continue to edit a paper copy of it, but nobody else should.]

This version includes additional background material on Space Time Algebra (STA), i.e. the geometric algebra name for the Dirac/Clifford-algebra in 3+1 dimensions.  In particular, I’ve added material on reciprocal frames, the gradient and vector derivatives, line and surface integrals and the fundamental theorem for both.  Some of the integration theory content might make sense to move to a different book, but I’ll keep it with the rest of these STA notes for now.

Relativistic multivector surface integrals

Background.

This post is a continuation of:

Surface integrals.

[If mathjax doesn’t display properly for you, click here for a PDF of this post]

We’ve now covered line integrals and the fundamental theorem for line integrals, so it’s now time to move on to surface integrals.

Definition 1.1: Surface integral.

Given a two variable parameterization $$x = x(u,v)$$, we write $$d^2\Bx = \Bx_u \wedge \Bx_v du dv$$, and call
\begin{equation*}
\int F d^2\Bx\, G,
\end{equation*}
a surface integral, where $$F,G$$ are arbitrary multivector functions.

Like our multivector line integral, this is intrinsically multivector valued, with a product of $$F$$ with arbitrary grades, a bivector $$d^2 \Bx$$, and $$G$$, also potentially with arbitrary grades. Let’s consider an example.

Problem: Surface area integral example.

Given the hyperbolic surface parameterization $$x(\rho,\alpha) = \rho \gamma_0 e^{-\vcap \alpha}$$, where $$\vcap = \gamma_{20}$$, evaluate the indefinite integral
\label{eqn:relativisticSurface:40}
\int \gamma_1 e^{\gamma_{21}\alpha} d^2 \Bx\, \gamma_2.

We have $$\Bx_\rho = \gamma_0 e^{-\vcap \alpha}$$ and $$\Bx_\alpha = \rho\gamma_{2} e^{-\vcap \alpha}$$, so
\label{eqn:relativisticSurface:60}
\begin{aligned}
d^2 \Bx
&=
(\Bx_\rho \wedge \Bx_\alpha) d\rho d\alpha \\
&=
\gpgradetwo{
\gamma_{0} e^{-\vcap \alpha} \rho\gamma_{2} e^{-\vcap \alpha}
}
d\rho d\alpha \\
&=
\rho \gamma_{02} d\rho d\alpha,
\end{aligned}

so the integral is
\label{eqn:relativisticSurface:80}
\begin{aligned}
\int \rho \gamma_1 e^{\gamma_{21}\alpha} \gamma_{022} d\rho d\alpha
&=
-\inv{2} \rho^2 \int \gamma_1 e^{\gamma_{21}\alpha} \gamma_{0} d\alpha \\
&=
\frac{\gamma_{01}}{2} \rho^2 \int e^{\gamma_{21}\alpha} d\alpha \\
&=
\frac{\gamma_{01}}{2} \rho^2 \gamma^{12} e^{\gamma_{21}\alpha} \\
&=
\frac{\rho^2 \gamma_{20}}{2} e^{\gamma_{21}\alpha}.
\end{aligned}

Because $$F$$ and $$G$$ were both vectors, the resulting integral could only have been a multivector with grades 0,2,4. As it happens, there were no scalar nor pseudoscalar grades in the end result, and we ended up with the spacetime plane between $$\gamma_0$$, and $$\gamma_2 e^{\gamma_{21}\alpha}$$, which are rotations of $$\gamma_2$$ in the x,y plane. This is illustrated in fig. 1 (omitting scale and sign factors.)

fig. 1. Spacetime plane.

Fundamental theorem for surfaces.

For line integrals we saw that $$d\Bx \cdot \grad = \gpgradezero{ d\Bx \partial }$$, and obtained the fundamental theorem for multivector line integrals by omitting the grade selection and using the multivector operator $$d\Bx \partial$$ in the integrand directly. We have the same situation for surface integrals. In particular, we know that the $$\mathbb{R}^3$$ Stokes theorem can be expressed in terms of $$d^2 \Bx \cdot \spacegrad$$.

Problem: GA form of 3D Stokes’ theorem integrand.

Given an $$\mathbb{R}^3$$ vector field $$\Bf$$, show that
\label{eqn:relativisticSurface:180}
\int dA \ncap \cdot \lr{ \spacegrad \cross \Bf }
=
-\int \lr{d^2\Bx \cdot \spacegrad } \cdot \Bf.

Let $$d^2 \Bx = I \ncap dA$$, implicitly fixing the relative orientation of the bivector area element compared to the chosen surface normal direction.
\label{eqn:relativisticSurface:200}
\begin{aligned}
\int \lr{d^2\Bx \cdot \spacegrad } \cdot \Bf
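&=
\int dA \gpgradeone{ I \ncap \spacegrad } \cdot \Bf \\
&=
\int dA \lr{ I \lr{ \ncap \wedge \spacegrad} } \cdot \Bf \\
&=
\int dA \gpgradezero{ I^2 \lr{ \ncap \cross \spacegrad} \Bf } \\
&=
-\int dA \lr{ \ncap \cross \spacegrad} \cdot \Bf \\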
&=
-\int dA \ncap \cdot \lr{ \spacegrad \cross \Bf }.
\end{aligned}

The moral of the story is that the conventional dual form of the $$\mathbb{R}^3$$ Stokes’ theorem can be written directly by projecting the gradient onto the surface area element. Geometrically, this projection operation has a rotational effect as well, since for bivector $$B$$, and vector $$x$$, the bivector-vector dot product $$B \cdot x$$ is the component of $$x$$ that lies in the plane $$B \wedge x = 0$$, but also rotated 90 degrees.

For multivector integration, we do not want an integral operator that includes such dot products. In the line integral case, we were able to achieve the same projective operation by using the vector derivative instead of a dot product, and can do the same for the surface integral case. In particular:

Theorem 1.1: Projection of gradient onto the tangent space.

Given a curvilinear representation of the gradient with respect to parameters $$u^0, u^1, u^2, u^3$$
\begin{equation*}
\grad = \Bx^\mu \PD{u^\mu}{},
\end{equation*}
the surface projection onto the tangent space associated with any two of those parameters satisfies
\begin{equation*}
d^2 \Bx \cdot \grad = d^2 \Bx \cdot \partial.
\end{equation*}

Start proof:

Without loss of generality, we may pick $$u^0, u^1$$ as the parameters associated with the tangent space. The area element for the surface is
\label{eqn:relativisticSurface:100}
d^2 \Bx = \Bx_0 \wedge \Bx_1 \,
du^0 du^1.

Dotting this with the gradient gives
\label{eqn:relativisticSurface:120}
\begin{aligned}
d^2 \Bx \cdot \grad
&=
du^0 du^1
\lr{ \Bx_0 \wedge \Bx_1 } \cdot \Bx^\mu \PD{u^\mu}{} \\
&=
du^0 du^1
\lr{
\Bx_0
\lr{\Bx_1 \cdot \Bx^\mu }
-
\Bx_1
\lr{\Bx_0 \cdot \Bx^\mu }
}
\PD{u^\mu}{} \\
&=
du^0 du^1
\lr{
\Bx_0 \PD{u^1}{}
-
\Bx_1 \PD{u^0}{}
}.
\end{aligned}

On the other hand, the vector derivative for this surface is
\label{eqn:relativisticSurface:140}
\partial
=
\Bx^0 \PD{u^0}{}
+
\Bx^1 \PD{u^1}{},

so
\label{eqn:relativisticSurface:160}
\begin{aligned}
d^2 \Bx \cdot \partial
&=
du^0 du^1\,
\lr{ \Bx_0 \wedge \Bx_1 } \cdot
\lr{
\Bx^0 \PD{u^0}{}
+
\Bx^1 \PD{u^1}{}
} \\
&=
du^0 du^1
\lr{
\Bx_0 \PD{u^1}{}
-
\Bx_1 \PD{u^0}{}
}.
\end{aligned}

End proof.

We now want to formulate the geometric algebra form of the fundamental theorem for surface integrals.

Theorem 1.2: Fundamental theorem for surface integrals.

Given multivector functions $$F, G$$, and surface area element $$d^2 \Bx = \lr{ \Bx_u \wedge \Bx_v }\, du dv$$, associated with a two parameter surface $$x(u,v)$$, then
\begin{equation*}
\int_S F d^2\Bx \lrpartial G = \int_{\partial S} F d^1\Bx G,
\end{equation*}
where $$S$$ is the integration surface, and $$\partial S$$ designates its boundary, and the line integral on the RHS is really short hand for
\begin{equation*}
\int
\evalbar{ \lr{ F (-d\Bx_v) G } }{\Delta u}
+
\int
\evalbar{ \lr{ F (d\Bx_u) G } }{\Delta v},
\end{equation*}
which is a line integral that traverses the boundary of the surface with the opposite orientation to the circulation of the area element.

Start proof:

The vector derivative for this surface is
\label{eqn:relativisticSurface:220}
\partial =
\Bx^u \PD{u}{}
+
\Bx^v \PD{v}{},

so
\label{eqn:relativisticSurface:240}
F d^2\Bx \lrpartial G
=
\PD{u}{} \lr{ F d^2\Bx\, \Bx^u G }
+
\PD{v}{} \lr{ F d^2\Bx\, \Bx^v G },

where $$d^2\Bx\, \Bx^u$$ is held constant with respect to $$u$$, and $$d^2\Bx\, \Bx^v$$ is held constant with respect to $$v$$ (since the partials of the vector derivative act on $$F, G$$, but not on the area element, nor on the reciprocal vectors of $$\lrpartial$$ itself.) Note that
\label{eqn:relativisticSurface:260}
d^2\Bx \wedge \Bx^u
=
du dv\, \lr{ \Bx_u \wedge \Bx_v } \wedge \Bx^u = 0,

since $$\Bx^u \in \text{span} \setlr{ \Bx_u, \Bx_v }$$, so
\label{eqn:relativisticSurface:280}
\begin{aligned}
d^2\Bx\, \Bx^u
&=
d^2\Bx \cdot \Bx^u
+
d^2\Bx \wedge \Bx^u \\
&=
d^2\Bx \cdot \Bx^u \\
&=
du dv\, \lr{ \Bx_u \wedge \Bx_v } \cdot \Bx^u \\
&=
-du dv\, \Bx_v.
\end{aligned}

Similarly
\label{eqn:relativisticSurface:300}
\begin{aligned}
d^2\Bx\, \Bx^v
&=
d^2\Bx \cdot \Bx^v \\
&=
du dv\, \lr{ \Bx_u \wedge \Bx_v } \cdot \Bx^v \\
&=
du dv\, \Bx_u.
\end{aligned}

This leaves us with
\label{eqn:relativisticSurface:320}
F d^2\Bx \lrpartial G
=
-du dv\,
\PD{u}{} \lr{ F \Bx_v G }
+
du dv\,
\PD{v}{} \lr{ F \Bx_u G },

where $$\Bx_v, \Bx_u$$ are held constant with respect to $$u,v$$ respectively. Fortuitously, this constant condition can be dropped, since the antisymmetry of the wedge in the area element results in perfect cancellation. If these line elements are not held constant then
\label{eqn:relativisticSurface:340}
-\PD{u}{} \lr{ F \Bx_v G }
+
\PD{v}{} \lr{ F \Bx_u G }
=
F \lr{
\PD{v}{\Bx_u}
-
\PD{u}{\Bx_v}
} G
-
\lr{
\PD{u}{F} \Bx_v G
+
F \Bx_v \PD{u}{G}
}
+
\lr{
\PD{v}{F} \Bx_u G
+
F \Bx_u \PD{v}{G}
}
,

but the mixed partial contribution is zero
\label{eqn:relativisticSurface:360}
\begin{aligned}
\PD{v}{\Bx_u}
-
\PD{u}{\Bx_v}
&=
\PD{v}{} \PD{u}{x}
-
\PD{u}{} \PD{v}{x} \\
&=
0,
\end{aligned}

by equality of mixed partials. We have two perfect differentials, and can evaluate each of these integrals
\label{eqn:relativisticSurface:380}
\begin{aligned}
\int F d^2\Bx \lrpartial G
&=
-\int
du dv\,
\PD{u}{} \lr{ F \Bx_v G }
+
\int
du dv\,
\PD{v}{} \lr{ F \Bx_u G } \\
&=
-\int
dv\,
\evalbar{ \lr{ F \Bx_v G } }{\Delta u}
+
\int
du\,
\evalbar{ \lr{ F \Bx_u G } }{\Delta v} \\
&=
\int
\evalbar{ \lr{ F (-d\Bx_v) G } }{\Delta u}
+
\int
\evalbar{ \lr{ F (d\Bx_u) G } }{\Delta v}.
\end{aligned}

We use the shorthand $$d^1 \Bx = d\Bx_u - d\Bx_v$$ to write
\label{eqn:relativisticSurface:400}
\int_S F d^2\Bx \lrpartial G = \int_{\partial S} F d^1\Bx G,

with the understanding that this is really instructions to evaluate the line integrals in the last step of \ref{eqn:relativisticSurface:380}.

Problem: Integration in the t,y plane.

Let $$x(t,y) = c t \gamma_0 + y \gamma_2$$. Write out both sides of the fundamental theorem explicitly.

Let’s designate the tangent basis vectors as
\label{eqn:relativisticSurface:420}
\Bx_0 = \PD{t}{x} = c \gamma_0,

and
\label{eqn:relativisticSurface:440}
\Bx_2 = \PD{y}{x} = \gamma_2,

so the vector derivative is
\label{eqn:relativisticSurface:460}
\partial
= \inv{c} \gamma^0 \PD{t}{}
+ \gamma^2 \PD{y}{},

and the area element is
\label{eqn:relativisticSurface:480}
d^2 \Bx = c \gamma_0 \gamma_2 \, dt dy.

The fundamental theorem of surface integrals is just a statement that
\label{eqn:relativisticSurface:500}
\int_{t_0}^{t_1} c dt
\int_{y_0}^{y_1} dy
F \gamma_0 \gamma_2 \lr{
\inv{c} \gamma^0 \PD{t}{}
+ \gamma^2 \PD{y}{}
} G
=
\int F \lr{ c \gamma_0 dt - \gamma_2 dy } G,

where the RHS, when stated explicitly, really means
\label{eqn:relativisticSurface:520}
\begin{aligned}
\int &F \lr{ c \gamma_0 dt - \gamma_2 dy } G
=
\int_{t_0}^{t_1} c dt \lr{ F(t,y_1) \gamma_0 G(t, y_1) - F(t,y_0) \gamma_0 G(t, y_0) } \\
&-\int_{y_0}^{y_1} dy \lr{ F(t_1,y) \gamma_2 G(t_1, y) - F(t_0,y) \gamma_2 G(t_0, y) }.
\end{aligned}

In this particular case, since $$\Bx_0 = c \gamma_0, \Bx_2 = \gamma_2$$ are both constant functions that depend on neither $$t$$ nor $$y$$, it is easy to derive the full expansion of \ref{eqn:relativisticSurface:520} directly from the LHS of \ref{eqn:relativisticSurface:500}.
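
To make this concrete, here is a minimal numerical sketch of the $$t,y$$ plane statement. Everything specific in it is an assumption for illustration: the multivector fields `F` and `G` (chosen linear so their partials are constants), the value of `c`, the unit square integration region, and the helper names. Multivectors are stored as (scalar, $$\gamma_0$$, $$\gamma_2$$, $$\gamma_0 \gamma_2$$) components, which is a closed algebra for this plane, with $$(\gamma_0)^2 = 1$$ and $$(\gamma_2)^2 = -1$$.

```python
def gp(a, b, s1=1.0, s2=-1.0):
    """Geometric product in the algebra generated by e1 = gamma_0, e2 = gamma_2."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + s1 * a1 * b1 + s2 * a2 * b2 - s1 * s2 * a3 * b3,
        a0 * b1 + a1 * b0 - s2 * a2 * b3 + s2 * a3 * b2,
        a0 * b2 + a2 * b0 + s1 * a1 * b3 - s1 * a3 * b1,
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,
    )

def add(*terms):
    return tuple(sum(c) for c in zip(*terms))

def scale(k, a):
    return tuple(k * c for c in a)

def prod(*factors):
    result = factors[0]
    for f in factors[1:]:
        result = gp(result, f)
    return result

c = 2.0
G0 = (0.0, 1.0, 0.0, 0.0)        # gamma_0
G2 = (0.0, 0.0, 1.0, 0.0)        # gamma_2
B = (0.0, 0.0, 0.0, c)           # bivector part of the area element, c gamma_0 gamma_2
X_T = (0.0, 1.0 / c, 0.0, 0.0)   # reciprocal vector gamma^0 / c
X_Y = (0.0, 0.0, -1.0, 0.0)      # reciprocal vector gamma^2 = -gamma_2

def F(t, y):
    return (1.0 + t, y, 2.0 * t, t - y)

def G(t, y):
    return (y, 1.0 - t, t + y, 2.0)

# Constant partial derivatives of the linear fields above.
DF_DT, DF_DY = (1.0, 0.0, 2.0, 1.0), (0.0, 1.0, 0.0, -1.0)
DG_DT, DG_DY = (0.0, -1.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0)

def integrand(t, y):
    """F B (bidirectional vector derivative) G, with B and the reciprocals held fixed."""
    f, g = F(t, y), G(t, y)
    return add(prod(DF_DT, B, X_T, g), prod(f, B, X_T, DG_DT),
               prod(DF_DY, B, X_Y, g), prod(f, B, X_Y, DG_DY))

def midpoint(func, a, b, n):
    h = (b - a) / n
    return scale(h, add(*(func(a + (i + 0.5) * h) for i in range(n))))

t0, t1, y0, y1 = 0.0, 1.0, 0.0, 1.0
lhs = midpoint(lambda t: midpoint(lambda y: integrand(t, y), y0, y1, 50), t0, t1, 50)
rhs = add(
    midpoint(lambda t: scale(c, add(prod(F(t, y1), G0, G(t, y1)),
                                    scale(-1.0, prod(F(t, y0), G0, G(t, y0))))), t0, t1, 400),
    midpoint(lambda y: add(scale(-1.0, prod(F(t1, y), G2, G(t1, y))),
                           prod(F(t0, y), G2, G(t0, y))), y0, y1, 400))
print(lhs)
print(rhs)
```

The two printed tuples should agree component by component up to quadrature error, grade by grade, which is the content of \ref{eqn:relativisticSurface:500} for these sample fields.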

Problem: A cylindrical hyperbolic surface.

Generalizing the example surface integral from \ref{eqn:relativisticSurface:40}, let
\label{eqn:relativisticSurface:540}
x(\rho, \alpha) = \rho e^{-\vcap \alpha/2} x(0,1) e^{\vcap \alpha/2},

where $$\rho$$ is a scalar, and $$\vcap = \cos\theta_k\gamma_{k0}$$ is a unit spatial bivector, and $$\cos\theta_k$$ are direction cosines of that vector. This is a composite transformation, where the $$\alpha$$ variation boosts the $$x(0,1)$$ four-vector, and the $$\rho$$ parameter contracts or increases the magnitude of this vector, resulting in $$x$$ spanning a hyperbolic region of spacetime.

Compute the tangent and reciprocal basis, the area element for the surface, and explicitly state both sides of the fundamental theorem.

For the tangent basis vectors we have
\label{eqn:relativisticSurface:560}
\Bx_\rho = \PD{\rho}{x} =
e^{-\vcap \alpha/2} x(0,1) e^{\vcap \alpha/2} = \frac{x}{\rho},

and
\label{eqn:relativisticSurface:580}
\Bx_\alpha = \PD{\alpha}{x} =
\lr{-\vcap/2} x
+
x \lr{ \vcap/2 }
=
x \cdot \vcap.

These vectors $$\Bx_\rho, \Bx_\alpha$$ are orthogonal, as $$x \cdot \vcap$$ is the projection of $$x$$ onto the spacetime plane $$x \wedge \vcap = 0$$, but rotated so that $$x \cdot \lr{ x \cdot \vcap } = 0$$. Because of this orthogonality, the vector derivative for this tangent space is
\label{eqn:relativisticSurface:600}
\partial =
\inv{x \cdot \vcap} \PD{\alpha}{}
+
\frac{\rho}{x}
\PD{\rho}{}
.

The area element is
\label{eqn:relativisticSurface:620}
\begin{aligned}
d^2 \Bx
&=
d\rho d\alpha\,
\frac{x}{\rho} \wedge \lr{ x \cdot \vcap } \\
&=
\inv{\rho} d\rho d\alpha\,
x \lr{ x \cdot \vcap }
.
\end{aligned}

The full statement of the fundamental theorem for this surface is
\label{eqn:relativisticSurface:640}
\int_S
d\rho d\alpha\,
F
\lr{
\inv{\rho} x \lr{ x \cdot \vcap }
}
\lr{
\inv{x \cdot \vcap} \PD{\alpha}{}
+
\frac{\rho}{x}
\PD{\rho}{}
}
G
=
\int_{\partial S}
F \lr{ d\rho \frac{x}{\rho} - d\alpha \lr{ x \cdot \vcap } } G.

As in the previous example, due to the orthogonality of the tangent basis vectors, it’s easy to find the RHS directly from the LHS.

Problem: Simple example with non-orthogonal tangent space basis vectors.

Let $$x(u,v) = u a + v b$$, where $$u,v$$ are scalar parameters, and $$a, b$$ are non-null and non-colinear constant four-vectors. Write out the fundamental theorem for surfaces with respect to this parameterization.

The tangent basis vectors are just $$\Bx_u = a, \Bx_v = b$$, with reciprocals
\label{eqn:relativisticSurface:660}
\Bx^u = \Bx_v \cdot \inv{ \Bx_u \wedge \Bx_v } = b \cdot \inv{ a \wedge b },

and
\label{eqn:relativisticSurface:680}
\Bx^v = -\Bx_u \cdot \inv{ \Bx_u \wedge \Bx_v } = -a \cdot \inv{ a \wedge b }.

The fundamental theorem, with respect to this surface, when written out explicitly takes the form
\label{eqn:relativisticSurface:700}
\int F \, du dv\, \lr{ a \wedge b } \inv{ a \wedge b } \cdot \lr{ a \PD{v}{} - b \PD{u}{} } G
=
\int F \lr{ a du - b dv } G.

This is a good example to illustrate the geometry of the line integral circulation.
Suppose that we are integrating over $$u \in [0,1], v \in [0,1]$$. In this case, the line integral really means
\label{eqn:relativisticSurface:720}
\begin{aligned}
\int &F \lr{ a du - b dv } G
=
\int F(u,1) (+a du) G(u,1)
+
\int F(u,0) (-a du) G(u,0) \\
&+
\int F(1,v) (-b dv) G(1,v)
+
\int F(0,v) (+b dv) G(0,v),
\end{aligned}

which is a path around the spacetime parallelogram spanned by $$a, b$$, as illustrated in fig. 2, which shows the orientation of the bivector area element with the arrows around the exterior of the parallelogram: $$0 \rightarrow a \rightarrow a + b \rightarrow b \rightarrow 0$$.

fig. 2. Line integral orientation.

A couple more reciprocal frame examples.

[If mathjax doesn’t display properly for you, click here for a PDF of this post]

This post logically follows both of the following:

The PDF linked above contains all the content from this post plus (1.) above [to be edited later into a more logical sequence.]

More examples.

Here are a few additional examples of reciprocal frame calculations.

Problem: Unidirectional arbitrary functional dependence.

Let
\label{eqn:reciprocal:2540}
x = a f(u),

where $$a$$ is a constant vector and $$f(u)$$ is some arbitrary differentiable function with a non-zero derivative in the region of interest.

Here we have just a single tangent space direction (a line in spacetime) with tangent vector
\label{eqn:reciprocal:2400}
\Bx_u = a \PD{u}{f} = a f_u,

so we see that the tangent space vectors are just rescaled values of the direction vector $$a$$.
This is a simple enough parameterization that we can compute the reciprocal frame vector explicitly using the gradient. We expect that $$\Bx^u = 1/\Bx_u$$, and find
\label{eqn:reciprocal:2420}
\inv{a} \cdot x = f(u),

but for constant $$a$$, we know that $$\grad a \cdot x = a$$, so taking gradients of both sides we find
\label{eqn:reciprocal:2440}
\inv{a} = \grad f = f_u \grad u,

so the reciprocal vector is
\label{eqn:reciprocal:2460}
\Bx^u = \grad u = \inv{a f_u},

as expected.
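
This result is easy to spot-check numerically. The sketch below makes several assumptions that are not part of the problem: a 2D Euclidean metric (so that $$1/a = a/a^2$$ and the gradient is the ordinary one), the invertible choice $$f(u) = e^u$$, and a particular vector `a`. The helper names are hypothetical.

```python
import math

a = (2.0, 1.0)  # constant direction vector (illustrative, Euclidean metric assumed)

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

def u_of_x(x):
    """Invert (1/a) . x = f(u) for the illustrative choice f(u) = exp(u)."""
    return math.log(dot(a, x) / dot(a, a))

def gradient(func, x, h=1e-6):
    """Central-difference Euclidean gradient of a scalar function."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += h
        lo[i] -= h
        grads.append((func(hi) - func(lo)) / (2 * h))
    return tuple(grads)

u0 = 0.3
x = tuple(component * math.exp(u0) for component in a)  # the point x = a f(u0)

grad_u = gradient(u_of_x, x)
f_u = math.exp(u0)                                       # derivative of f at u0
expected = tuple(component / (dot(a, a) * f_u) for component in a)  # 1/(a f_u)

print(grad_u, expected)
```

The numerically computed $$\grad u$$ should match $$1/(a f_u)$$ component by component, up to finite-difference error.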

Problem: Linear two variable parameterization.

Let $$x = a u + b v$$, where $$x \wedge a \wedge b = 0$$ represents a spacetime plane (also the tangent space). Find the curvilinear coordinates and their reciprocals.

The frame vectors are easy to compute, as they are just
\label{eqn:reciprocal:1960}
\begin{aligned}
\Bx_u &= \PD{u}{x} = a \\
\Bx_v &= \PD{v}{x} = b.
\end{aligned}

This is an example of a parametric equation that we can easily invert, as we have
\label{eqn:reciprocal:1980}
\begin{aligned}
x \wedge a &= – v \lr{ a \wedge b } \\
x \wedge b &= u \lr{ a \wedge b },
\end{aligned}

so
\label{eqn:reciprocal:2000}
\begin{aligned}
u
&= \inv{ a \wedge b } \cdot \lr{ x \wedge b } \\
&= \inv{ \lr{a \wedge b}^2 } \lr{ a \wedge b } \cdot \lr{ x \wedge b } \\
&=
\frac{
\lr{b \cdot x} \lr{ a \cdot b }
-
\lr{a \cdot x} \lr{ b \cdot b }
}{ \lr{a \wedge b}^2 }
\end{aligned}

\label{eqn:reciprocal:2020}
\begin{aligned}
v &= -\inv{ a \wedge b } \cdot \lr{ x \wedge a } \\
&= -\inv{ \lr{a \wedge b}^2 } \lr{ a \wedge b } \cdot \lr{ x \wedge a } \\
&=
-\frac{
\lr{b \cdot x} \lr{ a \cdot a }
-
\lr{a \cdot x} \lr{ a \cdot b }
}{ \lr{a \wedge b}^2 }
\end{aligned}

Recall that $$\grad \lr{ a \cdot x} = a$$, if $$a$$ is a constant, so our gradients are just
\label{eqn:reciprocal:2040}
\begin{aligned}
\grad u
&=
\frac{
b \lr{ a \cdot b }
-
a
\lr{ b \cdot b }
}{ \lr{a \wedge b}^2 } \\
&=
b \cdot \inv{ a \wedge b },
\end{aligned}

and
\label{eqn:reciprocal:2060}
\begin{aligned}
\grad v
&=
-\frac{
b \lr{ a \cdot a }
-
a \lr{ a \cdot b }
}{ \lr{a \wedge b}^2 } \\
&=
-a \cdot \inv{ a \wedge b }.
\end{aligned}

Expressed in terms of the frame vectors, this is just
\label{eqn:reciprocal:2080}
\begin{aligned}
\Bx^u &= \Bx_v \cdot \inv{ \Bx_u \wedge \Bx_v } \\
\Bx^v &= -\Bx_u \cdot \inv{ \Bx_u \wedge \Bx_v },
\end{aligned}

so we were able to show, for this special two parameter linear case, that the explicit evaluation of the gradients has the exact structure that we intuited that the reciprocals must have, provided they are constrained to the spacetime plane $$a \wedge b$$. It is interesting to observe how this structure falls out of the linear system solution so directly. Also note that these reciprocals are constant, so they are defined everywhere in the $$(u,v)$$ parameter space, provided $$a \wedge b$$ is invertible.
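As a quick numerical sanity check of these reciprocal expressions, here is a minimal Python/NumPy sketch. It assumes vectors are stored as arrays of coordinates with respect to $$\setlr{\gamma_\mu}$$; the helper names (`dot`, `reciprocals_2d`) and the sample vectors are illustrative assumptions, not anything from the derivation above.

```python
import numpy as np

# Metric with signature (+,-,-,-), matching gamma_0^2 = 1 = -gamma_k^2.
G = np.diag([1.0, -1.0, -1.0, -1.0])

def dot(p, q):
    """Spacetime dot product of two four-vectors given by gamma_mu coordinates."""
    return p @ G @ q

def reciprocals_2d(a, b):
    """Return (x^u, x^v) for frame vectors x_u = a, x_v = b, restricted to span{a, b}."""
    # (a ^ b)^2 = (a.b)^2 - (a.a)(b.b); this must be non-zero for the inverse to exist.
    wedge_sq = dot(a, b) ** 2 - dot(a, a) * dot(b, b)
    xu = (b * dot(a, b) - a * dot(b, b)) / wedge_sq    # b . (a ^ b)^{-1}
    xv = -(b * dot(a, a) - a * dot(a, b)) / wedge_sq   # -a . (a ^ b)^{-1}
    return xu, xv

# Arbitrary (hypothetical) sample vectors.
a = np.array([1.0, 0.5, 0.0, 0.0])
b = np.array([0.2, 1.0, 0.3, 0.0])
xu, xv = reciprocals_2d(a, b)

print(dot(xu, a), dot(xu, b))  # approximately 1, 0
print(dot(xv, a), dot(xv, b))  # approximately 0, 1
```

Because the result does not depend on $$u$$ or $$v$$, one evaluation covers every point of the plane for this linear parameterization.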

Now consider a variation of the previous problem, with $$x = a u^2 + b v^2$$. Find the curvilinear coordinates and their reciprocals.

\label{eqn:reciprocal:2100}
\begin{aligned}
\Bx_u &= \PD{u}{x} = 2 u a \\
\Bx_v &= \PD{v}{x} = 2 v b.
\end{aligned}

Our tangent space is still the $$a \wedge b$$ plane (as is the surface itself), but the spacing of the cells starts getting wider in proportion to $$u, v$$.
Utilizing the work from the previous problem, we have
\label{eqn:reciprocal:2120}
\begin{aligned}
2 u \grad u &= b \cdot \inv{ a \wedge b } \\
2 v \grad v &= -a \cdot \inv{ a \wedge b }.
\end{aligned}

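A bit of rearrangement shows that this is equivalent to the reciprocal frame identities. For example, since $$\Bx_u \wedge \Bx_v = 4 u v \lr{ a \wedge b }$$, we have $$\Bx_v \cdot \inv{ \Bx_u \wedge \Bx_v } = \inv{2 u} b \cdot \inv{ a \wedge b } = \grad u$$, provided $$u \ne 0$$. This is a second demonstration that the gradient and the algebraic formulations for the reciprocals match, at least for these special cases of non-coupled parameterizations. Note that these reciprocals are not defined where $$u = 0$$ or $$v = 0$$.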

Problem: Reciprocal frame for generalized cylindrical parameterization.

Let the vector parameterization be $$x(\rho,\theta) = \rho e^{-i\theta/2} x(\rho_0, \theta_0) e^{i \theta/2}$$, where $$i^2 = \pm 1$$ is a unit bivector ($$+1$$ for a boost, and $$-1$$ for a rotation), and where $$\theta, \rho$$ are scalars. Find the tangent space vectors and their reciprocals.

fig. 1. “Cylindrical” boost parameterization.

Note that this is a cylindrical parameterization for the rotation case, and traces out hyperbolic regions for the boost case. The boost case is illustrated in fig. 1, where hyperbolas in the light cone are found for boosts of $$\gamma_0$$ with various values of $$\rho$$, and the spacelike hyperbolas are boosts of $$\gamma_1$$, again for various values of $$\rho$$.

The tangent space vectors are
\label{eqn:reciprocal:2480}
\Bx_\rho = \frac{x}{\rho},

and

\label{eqn:reciprocal:2500}
\begin{aligned}
\Bx_\theta
&= -\frac{i}{2} x + x \frac{i}{2} \\
&= x \cdot i.
\end{aligned}

Recall that $$x \cdot i$$ lies perpendicular to $$x$$ (in the plane $$i$$), as illustrated in fig. 2. This means that $$\Bx_\rho$$ and $$\Bx_\theta$$ are orthogonal, so we can find the reciprocal vectors by just inverting them
\label{eqn:reciprocal:2520}
\begin{aligned}
\Bx^\rho &= \frac{\rho}{x} \\
\Bx^\theta &= \frac{1}{x \cdot i}.
\end{aligned}

fig. 2. Projection and rejection geometry.

Parameterization of a general linear transformation.

Given $$N$$ parameters $$u^0, u^1, \cdots u^{N-1}$$, a general linear transformation from the parameter space to the vector space has the form
\label{eqn:reciprocal:2160}
x =
{a^\alpha}_\beta \gamma_\alpha u^\beta,

where $$\beta \in [0, \cdots, N-1]$$ and $$\alpha \in [0,3]$$.
For such a general transformation, observe that the curvilinear basis vectors are
\label{eqn:reciprocal:2180}
\begin{aligned}
\Bx_\mu
&= \PD{u^\mu}{x} \\
&= \PD{u^\mu}{}
{a^\alpha}_\beta \gamma_\alpha u^\beta \\
&=
{a^\alpha}_\mu \gamma_\alpha.
\end{aligned}

We find an interpretation of $${a^\alpha}_\mu$$ by dotting $$\Bx_\mu$$ with the reciprocal frame vectors of the standard basis
\label{eqn:reciprocal:2200}
\begin{aligned}
\Bx_\mu \cdot \gamma^\nu
&=
{a^\alpha}_\mu \lr{ \gamma_\alpha \cdot \gamma^\nu } \\
&=
{a^\nu}_\mu,
\end{aligned}

so
\label{eqn:reciprocal:2220}
x = \Bx_\mu u^\mu.

We are able to reinterpret \ref{eqn:reciprocal:2160} as a contraction of the tangent space vectors with the parameters, scaling and summing these direction vectors to characterize all the points in the tangent plane.

Theorem 1.1: Projecting onto the tangent space.

Let $$T$$ represent the tangent space. The projection of a vector onto the tangent space has the form
\label{eqn:reciprocal:2560}
\textrm{Proj}_{\textrm{T}} y = \lr{ y \cdot \Bx^\mu } \Bx_\mu = \lr{ y \cdot \Bx_\mu } \Bx^\mu.

Start proof:

Let’s designate $$a$$ as the portion of the vector $$y$$ that lies outside of the tangent space
\label{eqn:reciprocal:2260}
y = y^\mu \Bx_\mu + a.

If we knew the coordinates $$y^\mu$$, we would have a recipe for the projection.
Algebraically, requiring that $$a$$ lies outside of the tangent space is equivalent to stating $$a \cdot \Bx_\mu = a \cdot \Bx^\mu = 0$$. We use that fact, and then take dot products
\label{eqn:reciprocal:2280}
\begin{aligned}
y \cdot \Bx^\nu
&= \lr{ y^\mu \Bx_\mu + a } \cdot \Bx^\nu \\
&= y^\nu,
\end{aligned}

so
\label{eqn:reciprocal:2300}
y = \lr{ y \cdot \Bx^\mu } \Bx_\mu + a.

Similarly, the tangent space projection can be expressed as a linear combination of reciprocal basis elements
\label{eqn:reciprocal:2320}
y = y_\mu \Bx^\mu + a.

Dotting with $$\Bx_\mu$$, we have
\label{eqn:reciprocal:2340}
\begin{aligned}
y \cdot \Bx_\mu
&= \lr{ y_\alpha \Bx^\alpha + a } \cdot \Bx_\mu \\
&= y_\mu,
\end{aligned}

so
\label{eqn:reciprocal:2360}
y = \lr{ y \cdot \Bx_\mu } \Bx^\mu + a.

We find the two stated ways of computing the projection.

Observe that, for the special case that all of $$\setlr{ \Bx_\mu }$$ are orthogonal, the equivalence of these two projection methods follows directly, since
\label{eqn:reciprocal:2380}
\begin{aligned}
\lr{ y \cdot \Bx^\mu } \Bx_\mu
&=
\lr{ y \cdot \inv{\Bx_\mu} } \inv{\Bx^\mu} \\
&=
\lr{ y \cdot \frac{\Bx_\mu}{\lr{\Bx_\mu}^2 } } \frac{\Bx^\mu}{\lr{\Bx^\mu}^2} \\
&=
\lr{ y \cdot \Bx_\mu } \Bx^\mu.
\end{aligned}
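The two projection formulas can also be compared numerically. The following is a minimal Python/NumPy sketch under stated assumptions: vectors are stored as coordinates with respect to $$\setlr{\gamma_\mu}$$, and the reciprocal frame is obtained by inverting the Gram matrix $$\Bx_\mu \cdot \Bx_\nu$$, which is a stand-in technique that yields reciprocals restricted to the tangent space (it is not the construction used in these notes). The function names and sample vectors are hypothetical.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])  # metric with signature (+,-,-,-)

def reciprocal_frame(frame):
    """Rows of `frame` are tangent vectors x_mu (gamma_mu coordinates).
    Returns rows x^mu, restricted to the span of the x_mu."""
    gram = frame @ G @ frame.T            # g_{mu nu} = x_mu . x_nu
    return np.linalg.solve(gram, frame)   # x^mu = g^{mu nu} x_nu

def project_with_reciprocals(y, frame):
    """Compute (y . x^mu) x_mu."""
    recip = reciprocal_frame(frame)
    return (recip @ G @ y) @ frame

def project_with_frame(y, frame):
    """Compute (y . x_mu) x^mu."""
    recip = reciprocal_frame(frame)
    return (frame @ G @ y) @ recip

# Arbitrary (hypothetical) two parameter tangent space and test vector.
frame = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.2, 1.0, 0.3, 0.0]])
y = np.array([0.7, -0.4, 1.1, 2.0])

p1 = project_with_reciprocals(y, frame)
p2 = project_with_frame(y, frame)
residual = y - p1                  # the portion of y outside the tangent space

print(np.allclose(p1, p2))         # both forms of the projection agree
print(frame @ G @ residual)        # residual . x_mu, approximately zero
```

The residual plays the role of the vector $$a$$ in the proof: it has no component along any of the tangent space vectors.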

Curvilinear coordinates and gradient in spacetime, and reciprocal frames.

Motivation.

I started pondering some aspects of spacetime integration theory, and found that there were some aspects of the concepts of reciprocal frames that were not clear to me. In the process of sorting those ideas out for myself, I wrote up the following notes.

In the notes below, I will introduce many of the prerequisite ideas that are needed to express and apply the fundamental theorem of geometric calculus in a 4D relativistic context. The focus will be Dirac’s algebra of special relativity, known as STA (Space Time Algebra) in geometric algebra parlance. If desired, it should be clear how to apply these ideas to lower or higher dimensional spaces, and to plain old Euclidean metrics.

On notation.

In Euclidean space we use bold face reciprocal frame vectors $$\Bx^i \cdot \Bx_j = {\delta^i}_j$$, which nicely distinguishes them from the generalized coordinates $$x_i, x^j$$ associated with the basis or the reciprocal frame, that is
\label{eqn:reciprocalblog:640}
\Bx = x^i \Bx_i = x_j \Bx^j.

On the other hand, it is conventional to use non-bold face for both the four-vectors and their coordinates in STA, such as the following standard basis decomposition
\label{eqn:reciprocalblog:660}
x = x^\mu \gamma_\mu = x_\mu \gamma^\mu.

If we use non-bold face $$x^\mu, x_\nu$$ for the coordinates with respect to a specified frame, then we cannot also use non-bold face for the curvilinear basis vectors.

To resolve this notational ambiguity, I’ve chosen to use bold face $$\Bx^\mu, \Bx_\nu$$ symbols as the curvilinear basis elements in this relativistic context, as we do for Euclidean spaces.

Definition 1.1: Standard Dirac basis.

The Dirac basis elements are $$\setlr{ \gamma_0, \gamma_1, \gamma_2, \gamma_3 }$$, satisfying
\label{eqn:reciprocalblog:1940}
\gamma_0^2 = 1 = -\gamma_k^2, \quad \forall k = 1,2,3,

and
\label{eqn:reciprocalblog:740}
\gamma_\mu \cdot \gamma_\nu = 0, \quad \forall \mu \ne \nu.

A conventional way of summarizing these orthogonality relationships is $$\gamma_\mu \cdot \gamma_\nu = \eta_{\mu\nu}$$, where $$\eta_{\mu\nu}$$ are the elements of the metric $$G = \text{diag}(+,-,-,-)$$.

Definition 1.2: Reciprocal basis for the standard Dirac basis.

We define a reciprocal basis $$\setlr{ \gamma^0, \gamma^1, \gamma^2, \gamma^3}$$ satisfying $$\gamma^\mu \cdot \gamma_\nu = {\delta^\mu}_\nu, \forall \mu,\nu \in 0,1,2,3$$.

Theorem 1.1: Reciprocal basis uniqueness.

This reciprocal basis is unique, and for our choice of metric has the values
\label{eqn:reciprocalblog:1960}
\gamma^0 = \gamma_0, \quad \gamma^k = -\gamma_k, \quad \forall k = 1,2,3.

Proof is left to the reader.

Definition 1.3: Coordinates.

We define the coordinates of a vector with respect to the standard basis as $$x^\mu$$ satisfying
\label{eqn:reciprocalblog:1980}
x = x^\mu \gamma_\mu,

and define the coordinates of a vector with respect to the reciprocal basis as $$x_\mu$$ satisfying
\label{eqn:reciprocalblog:2000}
x = x_\mu \gamma^\mu.

Theorem 1.2: Coordinates.

Given the definitions above, we may compute the coordinates of a vector, simply by dotting with the basis elements
\label{eqn:reciprocalblog:2020}
x^\mu = x \cdot \gamma^\mu,

and
\label{eqn:reciprocalblog:2040}
x_\mu = x \cdot \gamma_\mu.

Start proof:

This follows by straightforward computation
\label{eqn:reciprocalblog:840}
\begin{aligned}
x \cdot \gamma^\mu
&=
\lr{ x^\nu \gamma_\nu } \cdot \gamma^\mu \\
&=
x^\nu \lr{ \gamma_\nu \cdot \gamma^\mu } \\
&=
x^\nu {\delta_\nu}^\mu \\
&=
x^\mu,
\end{aligned}

and
\label{eqn:reciprocalblog:860}
\begin{aligned}
x \cdot \gamma_\mu
&=
\lr{ x_\nu \gamma^\nu } \cdot \gamma_\mu \\
&=
x_\nu \lr{ \gamma^\nu \cdot \gamma_\mu } \\
&=
x_\nu {\delta^\nu}_\mu \\
&=
x_\mu.
\end{aligned}

Derivative operators.

We’d like to determine the form of the (spacetime) gradient operator. The gradient can be defined in terms of coordinates directly, but we choose an implicit definition, in terms of the directional derivative.

Definition 1.4: Directional derivative and gradient.

Let $$F = F(x)$$ be a four-vector parameterized multivector. The directional derivative of $$F$$ with respect to the (four-vector) direction $$a$$ is denoted
\label{eqn:reciprocalblog:2060}
\lr{ a \cdot \grad } F = \lim_{\epsilon \rightarrow 0} \frac{ F(x + \epsilon a) - F(x) }{ \epsilon },

where $$\grad$$ is called the space time gradient.

The standard basis representation of the gradient is
\label{eqn:reciprocalblog:2080}
\grad = \gamma^\mu \partial_\mu,

where
\label{eqn:reciprocalblog:2100}
\partial_\mu = \PD{x^\mu}{}.

Start proof:

The Dirac gradient pops naturally out of the coordinate representation of the directional derivative, as we can see by expanding $$F(x + \epsilon a)$$ in Taylor series
\label{eqn:reciprocalblog:900}
\begin{aligned}
F(x + \epsilon a)
&= F(x) + \epsilon \frac{dF(x + \epsilon a)}{d\epsilon} + O(\epsilon^2) \\
&= F(x) + \epsilon \PD{\lr{x^\mu + \epsilon a^\mu}}{F} \PD{\epsilon}{\lr{x^\mu + \epsilon a^\mu}} \\
&= F(x) + \epsilon \PD{\lr{x^\mu + \epsilon a^\mu}}{F} a^\mu.
\end{aligned}

The directional derivative is
\label{eqn:reciprocalblog:920}
\begin{aligned}
\lim_{\epsilon \rightarrow 0}
\frac{F(x + \epsilon a) - F(x)}{\epsilon}
&=
\lim_{\epsilon \rightarrow 0}\,
a^\mu
\PD{\lr{x^\mu + \epsilon a^\mu}}{F} \\
&=
a^\mu
\PD{x^\mu}{F} \\
&=
\lr{a^\nu \gamma_\nu} \cdot \gamma^\mu \PD{x^\mu}{F} \\
&=
a \cdot \lr{ \gamma^\mu \partial_\mu } F.
\end{aligned}
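The equivalence of the limit definition and the coordinate form $$a^\mu \partial_\mu F$$ is easy to check numerically. Here is a minimal Python/NumPy sketch, assuming a sample scalar field $$F(x) = x \cdot x$$ (chosen only for illustration, for which $$\lr{a \cdot \grad} F = 2 a \cdot x$$), with hypothetical helper names and sample values.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])  # metric with signature (+,-,-,-)

def F(x):
    """Sample scalar field F(x) = x . x, with x given by its coordinates x^mu."""
    return x @ G @ x

def partials(f, x, h=1e-6):
    """Central-difference estimates of the partial derivatives dF/dx^mu."""
    out = np.zeros(4)
    for mu in range(4):
        e = np.zeros(4)
        e[mu] = h
        out[mu] = (f(x + e) - f(x - e)) / (2 * h)
    return out

# Arbitrary (hypothetical) evaluation point and direction.
x = np.array([1.0, 2.0, -0.5, 0.3])
a = np.array([0.4, -1.0, 0.2, 0.7])

eps = 1e-6
limit_form = (F(x + eps * a) - F(x)) / eps   # definition of (a . grad) F, small epsilon
coord_form = a @ partials(F, x)              # a^mu dF/dx^mu
exact = 2 * (a @ G @ x)                      # 2 a . x for this particular F

print(limit_form, coord_form, exact)         # all three agree to about 1e-5
```

Note that the coordinate form contracts $$a^\mu$$ directly with $$\partial_\mu F$$, with no metric factor, which is exactly what dotting $$a$$ with $$\gamma^\mu \partial_\mu$$ produces.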

Curvilinear bases.

Curvilinear bases are the foundation of the fundamental theorem of multivector calculus. This form of integral calculus is defined over parameterized surfaces (called manifolds) that satisfy some specific non-degeneracy and continuity requirements.

A parameterized vector $$x(u,v, \cdots w)$$ can be thought of as tracing out a hypersurface (curve, surface, volume, …), where the dimension of the hypersurface depends on the number of parameters. At each point, a basis can be constructed from the differentials of the parameterized vector. The span of such a basis is called the tangent space to the surface at the point in question. Our curvilinear bases will be related to these differentials. We will also be interested in a dual basis that is restricted to the span of the tangent space. This dual basis will be called the reciprocal frame, and like the basis of the tangent space itself, it also varies from point to point on the surface.

Fig 1a. One parameter curve, with illustration of tangent space along the curve.

Fig 1b. Two parameter surface, with illustration of tangent space along the surface.

One and two parameter spaces are illustrated in fig. 1a and fig. 1b. The tangent space basis at a specific point of a two parameter surface, $$x(u^0, u^1)$$, is illustrated in fig. 2. The differential directions that span the tangent space are
\label{eqn:reciprocalblog:1040}
\begin{aligned}
d\Bx_0 &= \PD{u^0}{x} du^0 \\
d\Bx_1 &= \PD{u^1}{x} du^1,
\end{aligned}

and the tangent space itself is $$\mbox{Span}\setlr{ d\Bx_0, d\Bx_1 }$$. We may form an oriented surface area element $$d\Bx_0 \wedge d\Bx_1$$ over this surface.

Fig 2. Two parameter surface.

Tangent spaces associated with 3 or more parameters cannot be easily visualized in three dimensions, but the idea generalizes algebraically without trouble.

Definition 1.5: Tangent basis and space.

Given a parameterization $$x = x(u^0, \cdots, u^N)$$, where $$N < 4$$, the span of the vectors
\label{eqn:reciprocalblog:2120}
\Bx_\mu = \PD{u^\mu}{x},

is called the tangent space for the hypersurface associated with the parameterization, and its basis is
$$\setlr{ \Bx_\mu }$$.

Later we will see that parameterization constraints must be imposed, as not all surfaces generated by a set of parameterizations are useful for integration theory. In particular, degenerate parameterizations for which the wedge products of the tangent space basis vectors are zero, or those wedge products cannot be inverted, are not physically meaningful. Properly behaved surfaces of this sort are called manifolds.

Having introduced curvilinear coordinates associated with a parameterization, we can now determine the form of the gradient with respect to a parameterization of spacetime.

Given a spacetime parameterization $$x = x(u^0, u^1, u^2, u^3)$$, the gradient with respect to the parameters $$u^\mu$$ is
\label{eqn:reciprocalblog:2140}
\grad = \sum_\mu \Bx^\mu
\PD{u^\mu}{},

where
\label{eqn:reciprocalblog:2160}
\Bx^\mu = \grad u^\mu.

The vectors $$\Bx^\mu$$ are called the reciprocal frame vectors, and the ordered set $$\setlr{ \Bx^0, \Bx^1, \Bx^2, \Bx^3 }$$ is called the reciprocal basis. It is convenient to define $$\partial_\mu \equiv \PDi{u^\mu}{}$$, so that the gradient can be expressed in mixed index representation
\label{eqn:reciprocalblog:2180}
\grad = \Bx^\mu \partial_\mu.

This introduces some notational ambiguity, since we used $$\partial_\mu = \PDi{x^\mu}{}$$ for the standard basis derivative operators too, but we will be careful to be explicit when there is any doubt about what is intended.

Start proof:

The proof follows by application of the chain rule.
\label{eqn:reciprocalblog:960}
\begin{aligned}
\grad F
&=
\gamma^\alpha \PD{x^\alpha}{F} \\
&=
\gamma^\alpha
\PD{x^\alpha}{u^\mu}
\PD{u^\mu}{F} \\
&=
\lr{ \grad u^\mu } \PD{u^\mu}{F} \\
&=
\Bx^\mu \PD{u^\mu}{F}.
\end{aligned}

Theorem 1.5: Reciprocal relationship.

The vectors $$\Bx^\mu = \grad u^\mu$$, and $$\Bx_\mu = \PDi{u^\mu}{x}$$ satisfy the reciprocal relationship
\label{eqn:reciprocalblog:2200}
\Bx^\mu \cdot \Bx_\nu = {\delta^\mu}_\nu.

Start proof:

\label{eqn:reciprocalblog:1020}
\begin{aligned}
\Bx^\mu \cdot \Bx_\nu
&=
\grad u^\mu \cdot \PD{u^\nu}{x} \\
&=
\lr{
\gamma^\alpha \PD{x^\alpha}{u^\mu}
}
\cdot
\lr{
\PD{u^\nu}{x^\beta} \gamma_\beta
} \\
&=
{\delta^\alpha}_\beta \PD{x^\alpha}{u^\mu}
\PD{u^\nu}{x^\beta} \\
&=
\PD{x^\alpha}{u^\mu} \PD{u^\nu}{x^\alpha} \\
&=
\PD{u^\nu}{u^\mu} \\
&=
{\delta^\mu}_\nu
.
\end{aligned}

End proof.

It is instructive to consider an example. Here is a parameterization that scales the proper time parameter, and uses polar coordinates in the $$x-y$$ plane.

Problem: Compute the curvilinear and reciprocal basis.

Given
\label{eqn:reciprocalblog:2360}
x(t,\rho,\theta,z) = c t \gamma_0 + \gamma_1 \rho e^{i \theta} + z \gamma_3,

where $$i = \gamma_1 \gamma_2$$, compute the curvilinear frame vectors and their reciprocals.

The frame vectors are all easy to compute
\label{eqn:reciprocalblog:1180}
\begin{aligned}
\Bx_0 &= \PD{t}{x} = c \gamma_0 \\
\Bx_1 &= \PD{\rho}{x} = \gamma_1 e^{i \theta} \\
\Bx_2 &= \PD{\theta}{x} = \rho \gamma_1 \gamma_1 \gamma_2 e^{i \theta} = - \rho \gamma_2 e^{i \theta} \\
\Bx_3 &= \PD{z}{x} = \gamma_3.
\end{aligned}

The $$\Bx_1$$ vector is radial, and $$\Bx_2$$ is perpendicular to it (tangent to the circle of constant $$\rho$$), as plotted in fig 3.

Fig3: Tangent space direction vectors.

All of these particular frame vectors happen to be mutually perpendicular, something that will not generally be true for a more arbitrary parameterization.

To compute the reciprocal frame vectors, we must express our parameters in terms of $$x^\mu$$ coordinates, and use implicit differentiation techniques to deal with the coupling of the rotational terms. First observe that
\label{eqn:reciprocalblog:1200}
\gamma_1 e^{i\theta}
= \gamma_1 \lr{ \cos\theta + \gamma_1 \gamma_2 \sin\theta }
= \gamma_1 \cos\theta - \gamma_2 \sin\theta,

so
\label{eqn:reciprocalblog:1220}
\begin{aligned}
x^0 &= c t \\
x^1 &= \rho \cos\theta \\
x^2 &= -\rho \sin\theta \\
x^3 &= z.
\end{aligned}

We can easily evaluate the $$t, z$$ gradients
\label{eqn:reciprocalblog:1240}
\begin{aligned}
\grad t &= \frac{\gamma^0 }{c} \\
\grad z &= \gamma^3,
\end{aligned}

but the $$\rho, \theta$$ gradients are not as easy. First writing
\label{eqn:reciprocalblog:1260}
\rho^2 = \lr{x^1}^2 + \lr{x^2}^2,

we find
\label{eqn:reciprocalblog:1280}
\begin{aligned}
2 \rho \grad \rho
&= 2 \lr{ x^1 \gamma^1 + x^2 \gamma^2 } \\
&= 2 \rho \lr{ \cos\theta \gamma^1 - \sin\theta \gamma^2 } \\
&= 2 \rho \gamma^1 \lr{ \cos\theta - \gamma_1 \gamma^2 \sin\theta } \\
&= 2 \rho \gamma^1 e^{i\theta},
\end{aligned}

so
\label{eqn:reciprocalblog:1300}
\grad \rho = \gamma^1 e^{i\theta}.

For the $$\theta$$ gradient, we can write
\label{eqn:reciprocalblog:1320}
\tan\theta = -\frac{x^2}{x^1},

so
\label{eqn:reciprocalblog:1340}
\begin{aligned}
\frac{\grad \theta}{\cos^2\theta}
&= -\frac{\gamma^2}{x^1} - x^2 \frac{-\gamma^1}{\lr{x^1}^2} \\
&= \inv{\lr{x^1}^2} \lr{ - \gamma^2 x^1 + \gamma^1 x^2 } \\
&= \frac{\rho}{\rho^2 \cos^2\theta } \lr{ - \gamma^2 \cos\theta - \gamma^1 \sin\theta } \\
&= -\frac{1}{\rho \cos^2\theta } \gamma^2 \lr{ \cos\theta + \gamma_2 \gamma^1 \sin\theta } \\
&= -\frac{\gamma^2 e^{i\theta} }{\rho \cos^2\theta },
\end{aligned}

or
\label{eqn:reciprocalblog:1360}
\grad \theta = -\inv{\rho} \gamma^2 e^{i\theta}.

In summary,
\label{eqn:reciprocalblog:1380}
\begin{aligned}
\Bx^0 &= \frac{\gamma^0}{c} \\
\Bx^1 &= \gamma^1 e^{i\theta} \\
\Bx^2 &= -\inv{\rho} \gamma^2 e^{i\theta} \\
\Bx^3 &= \gamma^3.
\end{aligned}
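These gradients can be cross-checked with finite differences. The sketch below is a minimal Python/NumPy illustration, assuming $$c = 1$$ and arbitrary sample parameter values; the function names are hypothetical. Each row of the numerical Jacobian $$\partial u^\mu/\partial x^\alpha$$ holds the components of $$\grad u^\mu$$ with respect to the reciprocal basis $$\setlr{\gamma^\alpha}$$, and is compared with the summary above after expanding $$\gamma^1 e^{i\theta} = \gamma^1 \cos\theta - \gamma^2 \sin\theta$$ and $$\gamma^2 e^{i\theta} = \gamma^1 \sin\theta + \gamma^2 \cos\theta$$.

```python
import numpy as np

c = 1.0  # assumption: units with c = 1

def params(x):
    """Invert x(t, rho, theta, z): coordinates x^mu -> (t, rho, theta, z)."""
    t = x[0] / c
    rho = np.hypot(x[1], x[2])
    theta = np.arctan2(-x[2], x[1])  # from x^1 = rho cos(theta), x^2 = -rho sin(theta)
    return np.array([t, rho, theta, x[3]])

def gradient_components(x, h=1e-6):
    """Row mu holds d u^mu / d x^alpha: the gamma^alpha components of grad u^mu."""
    jac = np.zeros((4, 4))
    for alpha in range(4):
        e = np.zeros(4)
        e[alpha] = h
        jac[:, alpha] = (params(x + e) - params(x - e)) / (2 * h)
    return jac

# Arbitrary (hypothetical) sample point, away from rho = 0.
rho, theta = 2.0, 0.6
C, S = np.cos(theta), np.sin(theta)
x = np.array([c * 0.3, rho * C, -rho * S, 1.5])

numeric = gradient_components(x)
expected = np.array([
    [1 / c, 0, 0, 0],             # x^0 = gamma^0 / c
    [0, C, -S, 0],                # x^1 = gamma^1 e^{i theta}
    [0, -S / rho, -C / rho, 0],   # x^2 = -(1/rho) gamma^2 e^{i theta}
    [0, 0, 0, 1],                 # x^3 = gamma^3
])
print(np.allclose(numeric, expected, atol=1e-6))
```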

Despite being a fairly simple parameterization, it was still fairly difficult to solve for the gradients when the parameterization introduced coupling between the coordinates. In this particular case, we could have solved for the parameters in terms of the coordinates (but it was easier not to), but that will not generally be true. We want a less labor intensive strategy to find the reciprocal frame. When we have a full parameterization of spacetime, then we can do this with nothing more than a matrix inversion.

Theorem 1.6: Reciprocal frame matrix equations.

Given a spacetime basis $$\setlr{\Bx_0, \cdots \Bx_3}$$, let $$[\Bx_\mu]$$ and $$[\Bx^\nu]$$ be column matrices with the coordinates of these vectors and their reciprocals, with respect to the standard basis $$\setlr{\gamma_0, \gamma_1, \gamma_2, \gamma_3 }$$. Let
\label{eqn:reciprocalblog:2220}
A =
\begin{bmatrix}
[\Bx_0] & \cdots & [\Bx_{3}]
\end{bmatrix}, \qquad
X =
\begin{bmatrix}
[\Bx^0] & \cdots & [\Bx^{3}]
\end{bmatrix}.

The coordinates of the reciprocal frame vectors can be found by solving
\label{eqn:reciprocalblog:2240}
A^\T G X = 1,

where $$G = \text{diag}(1,-1,-1,-1)$$ and the RHS is a $$4 \times 4$$ identity matrix.

Start proof:

Let $$\Bx_\mu = {a_\mu}^\alpha \gamma_\alpha, \Bx^\nu = b^{\nu\beta} \gamma_\beta$$, so that
\label{eqn:reciprocalblog:140}
A =
\begin{bmatrix}
{a_\nu}^\mu
\end{bmatrix},

and
\label{eqn:reciprocalblog:160}
X =
\begin{bmatrix}
b^{\nu\mu}
\end{bmatrix},

where $$\mu \in [0,3]$$ are the row indexes and $$\nu \in [0,N-1]$$ are the column indexes. The reciprocal frame satisfies $$\Bx_\mu \cdot \Bx^\nu = {\delta_\mu}^\nu$$, which has the coordinate representation of
\label{eqn:reciprocalblog:180}
\begin{aligned}
\Bx_\mu \cdot \Bx^\nu
&=
\lr{
{a_\mu}^\alpha \gamma_\alpha
}
\cdot
\lr{
b^{\nu\beta} \gamma_\beta
} \\
&=
{a_\mu}^\alpha
\eta_{\alpha\beta}
b^{\nu\beta} \\
&=
{[A^\T G X]_\mu}^\nu,
\end{aligned}

where $$\mu$$ is the row index and $$\nu$$ is the column index.

Problem: Matrix inversion reciprocals.

For the parameterization of \ref{eqn:reciprocalblog:2360}, find the reciprocal frame vectors by matrix inversion.

We expanded $$\Bx_1$$ explicitly in \ref{eqn:reciprocalblog:1200}. Doing the same for $$\Bx_2$$, we have
\label{eqn:reciprocalblog:1201}
\Bx_2 =
-\rho \gamma_2 e^{i\theta}
= -\rho \gamma_2 \lr{ \cos\theta + \gamma_1 \gamma_2 \sin\theta }
= - \rho \lr{ \gamma_2 \cos\theta + \gamma_1 \sin\theta}.

Reading off the coordinates of our frame vectors, we have
\label{eqn:reciprocalblog:1400}
X =
\begin{bmatrix}
c & 0 & 0 & 0 \\
0 & C & -\rho S & 0 \\
0 & -S & -\rho C & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix},

where $$C = \cos\theta$$ and $$S = \sin\theta$$. We want
\label{eqn:reciprocalblog:1420}
Y =
{\begin{bmatrix}
c & 0 & 0 & 0 \\
0 & -C & S & 0 \\
0 & \rho S & \rho C & 0 \\
0 & 0 & 0 & -1 \\
\end{bmatrix}}^{-1}
=
\begin{bmatrix}
\inv{c} & 0 & 0 & 0 \\
0 & -C & \frac{S}{\rho} & 0 \\
0 & S & \frac{C}{\rho} & 0 \\
0 & 0 & 0 & -1 \\
\end{bmatrix}.

We can read off the coordinates of the reciprocal frame vectors
\label{eqn:reciprocalblog:1440}
\begin{aligned}
\Bx^0 &= \inv{c} \gamma_0 \\
\Bx^1 &= -\cos\theta \gamma_1 + \sin\theta \gamma_2 \\
\Bx^2 &= \inv{\rho} \lr{ \sin\theta \gamma_1 + \cos\theta \gamma_2 } \\
\Bx^3 &= -\gamma_3.
\end{aligned}

Factoring out $$\gamma^1$$ from the $$\Bx^1$$ terms, we find
\label{eqn:reciprocalblog:1460}
\begin{aligned}
\Bx^1
&= -\cos\theta \gamma_1 + \sin\theta \gamma_2 \\
&= \gamma^1 \lr{ \cos\theta + \gamma_1 \gamma_2 \sin\theta } \\
&= \gamma^1 e^{i\theta}.
\end{aligned}

Similarly for $$\Bx^2$$,
\label{eqn:reciprocalblog:1480}
\begin{aligned}
\Bx^2
&= \inv{\rho} \lr{ \sin\theta \gamma_1 + \cos\theta \gamma_2 } \\
&= \frac{\gamma^2}{\rho} \lr{ \sin\theta \gamma_2 \gamma_1 - \cos\theta } \\
&= -\frac{\gamma^2}{\rho} e^{i\theta}.
\end{aligned}

This matches \ref{eqn:reciprocalblog:1380}, as expected, but required only algebraic work to compute.
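The matrix solution is also easy to reproduce numerically. Here is a minimal Python/NumPy sketch that solves $$X^\T G Y = 1$$ for this parameterization, assuming arbitrary sample values for $$c, \rho, \theta$$ (those values, and the use of `numpy.linalg.solve`, are illustrative choices).

```python
import numpy as np

c, rho, theta = 1.0, 2.0, 0.6   # hypothetical sample values
C, S = np.cos(theta), np.sin(theta)
G = np.diag([1.0, -1.0, -1.0, -1.0])

# Columns hold the gamma_mu coordinates of x_0, x_1, x_2, x_3.
X = np.array([
    [c, 0, 0, 0],
    [0, C, -rho * S, 0],
    [0, -S, -rho * C, 0],
    [0, 0, 0, 1],
])

# Solve X^T G Y = 1 for Y; the columns of Y are the reciprocal frame coordinates.
Y = np.linalg.solve(X.T @ G, np.eye(4))

# The closed form inverse stated above.
expected = np.array([
    [1 / c, 0, 0, 0],
    [0, -C, S / rho, 0],
    [0, S, C / rho, 0],
    [0, 0, 0, -1],
])
print(np.allclose(Y, expected))
print(np.allclose(X.T @ G @ Y, np.eye(4)))
```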

There will be circumstances where we parameterize only a subset of spacetime, and are interested in calculating quantities associated with such a surface. For example, suppose that
\label{eqn:reciprocalblog:1500}
x(\rho,\theta) = \gamma_1 \rho e^{i \theta},

where $$i = \gamma_1 \gamma_2$$ as before. We are now parameterizing only the $$x-y$$ plane. We will still find
\label{eqn:reciprocalblog:1520}
\begin{aligned}
\Bx_1 &= \gamma_1 e^{i \theta} \\
\Bx_2 &= -\gamma_2 \rho e^{i \theta}.
\end{aligned}

We can compute the reciprocals of these vectors using the gradient method. It’s possible to state matrix equations representing the reciprocal relationship of \ref{eqn:reciprocalblog:2200}, which, in this case, is $$X^\T G Y = 1$$, where the RHS is a $$2 \times 2$$ identity matrix, and $$X, Y$$ are $$4\times 2$$ matrices of coordinates, with
\label{eqn:reciprocalblog:1540}
X =
\begin{bmatrix}
0 & 0 \\
C & -\rho S \\
-S & -\rho C \\
0 & 0
\end{bmatrix}.

We no longer have a square matrix problem to solve, and our solution set is multivalued. In particular, this matrix equation has solutions
\label{eqn:reciprocalblog:1560}
\begin{aligned}
\Bx^1 &= \gamma^1 e^{i\theta} + \alpha \gamma^0 + \beta \gamma^3 \\
\Bx^2 &= -\frac{\gamma^2}{\rho} e^{i\theta} + \alpha' \gamma^0 + \beta' \gamma^3,
\end{aligned}

where $$\alpha, \alpha', \beta, \beta'$$ are arbitrary constants. In the example we considered, we saw that our $$\rho, \theta$$ parameters were functions of only $$x^1, x^2$$, so taking gradients could not introduce any $$\gamma^0, \gamma^3$$ dependence in $$\Bx^1, \Bx^2$$. It seems reasonable to assert that we seek an algebraic method of computing a set of vectors that satisfies the reciprocal relationships, where that set of vectors is restricted to the tangent space. We will need to figure out how to prove that this reciprocal construction is identical to the parameter gradients, but let’s start with figuring out what such a tangent space restricted solution looks like.
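The non-uniqueness is simple to see numerically. The minimal Python/NumPy sketch below assumes sample values for $$\rho, \theta$$, and picks out the tangent space restricted solution as $$Y = X \lr{X^\T G X}^{-1}$$; that Gram matrix formula is a stand-in used only for this check, not the algebraic construction developed in these notes. It then shows that adding arbitrary $$\gamma_0, \gamma_3$$ components still satisfies $$X^\T G Y = 1$$.

```python
import numpy as np

rho, theta = 2.0, 0.6   # hypothetical sample values
C, S = np.cos(theta), np.sin(theta)
G = np.diag([1.0, -1.0, -1.0, -1.0])

# Columns hold the gamma_mu coordinates of x_1 and x_2.
X = np.array([
    [0, 0],
    [C, -rho * S],
    [-S, -rho * C],
    [0, 0],
])

# Tangent space restricted solution: columns of Y lie in the span of the columns of X.
gram = X.T @ G @ X
Y = X @ np.linalg.inv(gram)

# Another solution: add arbitrary gamma_0 and gamma_3 components (rows 0 and 3).
Y_other = Y.copy()
Y_other[0, :] = [0.3, -1.2]
Y_other[3, :] = [2.0, 0.5]

print(np.allclose(X.T @ G @ Y, np.eye(2)))        # reciprocal relationship holds
print(np.allclose(X.T @ G @ Y_other, np.eye(2)))  # and still holds for the other solution
```

Only `Y` has columns confined to the tangent space, which is the extra constraint argued for above.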

Theorem 1.7: Reciprocal frame for two parameter subspace.

Given two vectors, $$\Bx_1, \Bx_2$$, the vectors $$\Bx^1, \Bx^2 \in \mbox{Span}\setlr{ \Bx_1, \Bx_2 }$$ such that $$\Bx^\mu \cdot \Bx_\nu = {\delta^\mu}_\nu$$ are given by
\label{eqn:reciprocalblog:2260}
\begin{aligned}
\Bx^1 &= \Bx_2 \cdot \inv{\Bx_1 \wedge \Bx_2} \\
\Bx^2 &= -\Bx_1 \cdot \inv{\Bx_1 \wedge \Bx_2},
\end{aligned}

provided $$\Bx_1 \wedge \Bx_2 \ne 0$$ and
$$\lr{ \Bx_1 \wedge \Bx_2 }^2 \ne 0$$.

Start proof:

The most general set of vectors that satisfy the span constraint are
\label{eqn:reciprocalblog:1580}
\begin{aligned}
\Bx^1 &= a \Bx_1 + b \Bx_2 \\
\Bx^2 &= c \Bx_1 + d \Bx_2.
\end{aligned}

We can use wedge products with either $$\Bx_1$$ or $$\Bx_2$$ to eliminate the other from the RHS
\label{eqn:reciprocalblog:1600}
\begin{aligned}
\Bx^1 \wedge \Bx_2 &= a \lr{ \Bx_1 \wedge \Bx_2 } \\
\Bx^1 \wedge \Bx_1 &= – b \lr{ \Bx_1 \wedge \Bx_2 } \\
\Bx^2 \wedge \Bx_2 &= c \lr{ \Bx_1 \wedge \Bx_2 } \\
\Bx^2 \wedge \Bx_1 &= – d \lr{ \Bx_1 \wedge \Bx_2 },
\end{aligned}

and then dot both sides with $$\Bx_1 \wedge \Bx_2$$ to produce four scalar equations
\label{eqn:reciprocalblog:1640}
\begin{aligned}
a \lr{ \Bx_1 \wedge \Bx_2 }^2
&= \lr{ \Bx^1 \wedge \Bx_2 } \cdot \lr{ \Bx_1 \wedge \Bx_2 } \\
&=
\lr{ \Bx_2 \cdot \Bx_1 } \lr{ \Bx^1 \cdot \Bx_2 }
-
\lr{ \Bx_2 \cdot \Bx_2 } \lr{ \Bx^1 \cdot \Bx_1 } \\
&=
\lr{ \Bx_2 \cdot \Bx_1 } (0)
-
\lr{ \Bx_2 \cdot \Bx_2 } (1) \\
&= - \Bx_2 \cdot \Bx_2
\end{aligned}

\label{eqn:reciprocalblog:1660}
\begin{aligned}
- b \lr{ \Bx_1 \wedge \Bx_2 }^2
&=
\lr{ \Bx^1 \wedge \Bx_1 } \cdot \lr{ \Bx_1 \wedge \Bx_2 } \\
&=
\lr{ \Bx^1 \cdot \Bx_2 } \lr{ \Bx_1 \cdot \Bx_1 }
-
\lr{ \Bx^1 \cdot \Bx_1 } \lr{ \Bx_1 \cdot \Bx_2 } \\
&=
(0) \lr{ \Bx_1 \cdot \Bx_1 }
-
(1) \lr{ \Bx_1 \cdot \Bx_2 } \\
&= - \Bx_1 \cdot \Bx_2
\end{aligned}

\label{eqn:reciprocalblog:1680}
\begin{aligned}
c \lr{ \Bx_1 \wedge \Bx_2 }^2
&= \lr{ \Bx^2 \wedge \Bx_2 } \cdot \lr{ \Bx_1 \wedge \Bx_2 } \\
&=
\lr{ \Bx_2 \cdot \Bx_1 } \lr{ \Bx^2 \cdot \Bx_2 }
-
\lr{ \Bx_2 \cdot \Bx_2 } \lr{ \Bx^2 \cdot \Bx_1 } \\
&=
\lr{ \Bx_2 \cdot \Bx_1 } (1)
-
\lr{ \Bx_2 \cdot \Bx_2 } (0) \\
&= \Bx_2 \cdot \Bx_1
\end{aligned}

\label{eqn:reciprocalblog:1700}
\begin{aligned}
- d \lr{ \Bx_1 \wedge \Bx_2 }^2
&= \lr{ \Bx^2 \wedge \Bx_1 } \cdot \lr{ \Bx_1 \wedge \Bx_2 } \\
&=
\lr{ \Bx_1 \cdot \Bx_1 } \lr{ \Bx^2 \cdot \Bx_2 }
-
\lr{ \Bx_1 \cdot \Bx_2 } \lr{ \Bx^2 \cdot \Bx_1 } \\
&=
\lr{ \Bx_1 \cdot \Bx_1 } (1)
-
\lr{ \Bx_1 \cdot \Bx_2 } (0) \\
&= \Bx_1 \cdot \Bx_1.
\end{aligned}

Putting the pieces together we have
\label{eqn:reciprocalblog:1740}
\begin{aligned}
\Bx^1
&= \frac{ - \lr{ \Bx_2 \cdot \Bx_2 } \Bx_1 + \lr{ \Bx_1 \cdot \Bx_2 } \Bx_2
}{\lr{\Bx_1 \wedge \Bx_2}^2} \\
&=
\frac{
\Bx_2 \cdot \lr{ \Bx_1 \wedge \Bx_2 }
}{\lr{\Bx_1 \wedge \Bx_2}^2} \\
&=
\Bx_2 \cdot \inv{\Bx_1 \wedge \Bx_2}
\end{aligned}

\label{eqn:reciprocalblog:1760}
\begin{aligned}
\Bx^2
&=
\frac{ \lr{ \Bx_1 \cdot \Bx_2 } \Bx_1 - \lr{ \Bx_1 \cdot \Bx_1 } \Bx_2
}{\lr{\Bx_1 \wedge \Bx_2}^2} \\
&=
\frac{ -\Bx_1 \cdot \lr{ \Bx_1 \wedge \Bx_2 } }
{\lr{\Bx_1 \wedge \Bx_2}^2} \\
&=
-\Bx_1 \cdot \inv{\Bx_1 \wedge \Bx_2}
\end{aligned}

Lemma 1.1: Distribution identity.

Given k-vectors $$B, C$$ and a vector $$a$$, where the grade of $$C$$ is greater than that of $$B$$, then
\label{eqn:reciprocalblog:2280}
\lr{a \wedge B} \cdot C = a \cdot \lr{ B \cdot C }.

See [1] for a proof.

Theorem 1.8: Higher order tangent space reciprocals.

Given an $$N$$ parameter tangent space with basis $$\setlr{ \Bx_0, \Bx_1, \cdots \Bx_{N-1} }$$, the reciprocals are given by
\label{eqn:reciprocalblog:2300}
\Bx^\mu = (-1)^\mu
\lr{ \Bx_0 \wedge \cdots \check{\Bx_\mu} \cdots \wedge \Bx_{N-1} } \cdot I_N^{-1},

where the checked term ($$\check{\Bx_\mu}$$) indicates that all terms are included in the wedges except the $$\Bx_\mu$$ term, and $$I_N = \Bx_0 \wedge \cdots \Bx_{N-1}$$ is the pseudoscalar for the tangent space.

Start proof:

I’ll outline the proof for the three parameter tangent space case, from which the pattern will be clear. The motivation for this proof is a reexamination of the algebraic structure of the two vector solution. Suppose we have a tangent space basis $$\setlr{\Bx_0, \Bx_1}$$, for which we’ve shown that
\label{eqn:reciprocalblog:1860}
\begin{aligned}
\Bx^0
&= \Bx_1 \cdot \inv{\Bx_0 \wedge \Bx_1} \\
&= \frac{\Bx_1 \cdot \lr{\Bx_0 \wedge \Bx_1} }{\lr{ \Bx_0 \wedge \Bx_1}^2 }.
\end{aligned}

If we dot with $$\Bx_0$$ and $$\Bx_1$$ respectively, we find
\label{eqn:reciprocalblog:1800}
\begin{aligned}
\Bx_0 \cdot \Bx^0
&=
\Bx_0 \cdot \frac{ \Bx_1 \cdot \lr{ \Bx_0 \wedge \Bx_1 } }{\lr{ \Bx_0 \wedge \Bx_1}^2 } \\
&=
\lr{ \Bx_0 \wedge \Bx_1 } \cdot \frac{ \Bx_0 \wedge \Bx_1 }{\lr{ \Bx_0 \wedge \Bx_1}^2 }.
\end{aligned}

We end up with unity as expected. Here the
“factored” out vector is reincorporated into the pseudoscalar using the distribution identity \ref{eqn:reciprocalblog:2280}.
Similarly, dotting with $$\Bx_1$$, we find
\label{eqn:reciprocalblog:0810}
\begin{aligned}
\Bx_1 \cdot \Bx^0
&=
\Bx_1 \cdot \frac{ \Bx_1 \cdot \lr{ \Bx_0 \wedge \Bx_1 } }{\lr{ \Bx_0 \wedge \Bx_1}^2 } \\
&=
\lr{ \Bx_1 \wedge \Bx_1 } \cdot \frac{ \Bx_0 \wedge \Bx_1 }{\lr{ \Bx_0 \wedge \Bx_1}^2 }.
\end{aligned}

This is zero, since wedging a vector with itself is zero. We can perform such an operation in reverse, taking the square of the tangent space pseudoscalar, and factoring out one of the basis vectors. After this, division by that squared pseudoscalar will normalize things.

For a three parameter tangent space with basis $$\setlr{ \Bx_0, \Bx_1, \Bx_2 }$$, we can factor out any of the tangent vectors like so
\label{eqn:reciprocalblog:1880}
\begin{aligned}
\lr{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 }^2
&= \Bx_0 \cdot \lr{ \lr{ \Bx_1 \wedge \Bx_2 } \cdot \lr{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 } } \\
&= (-1) \Bx_1 \cdot \lr{ \lr{ \Bx_0 \wedge \Bx_2 } \cdot \lr{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 } } \\
&= (-1)^2 \Bx_2 \cdot \lr{ \lr{ \Bx_0 \wedge \Bx_1 } \cdot \lr{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 } }.
\end{aligned}

The toggling of sign reflects the number of permutations required to move the vector of interest to the front of the wedge sequence. Having factored out any one of the vectors, we can rearrange to find the vector that is its inverse and is perpendicular to all the others.
\label{eqn:reciprocalblog:1900}
\begin{aligned}
\Bx^0 &= (-1)^0 \lr{ \Bx_1 \wedge \Bx_2 } \cdot \inv{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 } \\
\Bx^1 &= (-1)^1 \lr{ \Bx_0 \wedge \Bx_2 } \cdot \inv{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 } \\
\Bx^2 &= (-1)^2 \lr{ \Bx_0 \wedge \Bx_1 } \cdot \inv{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 }.
\end{aligned}

End proof.
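
As a check of the three parameter result, the dot products of these reciprocal vectors with the tangent vectors can be expanded directly. This is only a sketch, assuming that the same distribution identity used in the two vector case also lets a vector be folded into the bivector factor, and using the shorthand $$I_3 = \Bx_0 \wedge \Bx_1 \wedge \Bx_2$$ (a symbol introduced here for brevity).
\begin{equation*}
\begin{aligned}
\Bx_0 \cdot \Bx^0 &= \lr{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 } \cdot \inv{I_3} = 1 \\
\Bx_1 \cdot \Bx^0 &= \lr{ \Bx_1 \wedge \Bx_1 \wedge \Bx_2 } \cdot \inv{I_3} = 0 \\
\Bx_1 \cdot \Bx^1 &= -\lr{ \Bx_1 \wedge \Bx_0 \wedge \Bx_2 } \cdot \inv{I_3} = \lr{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 } \cdot \inv{I_3} = 1.
\end{aligned}
\end{equation*}
The first line recovers unity, the second vanishes because of the repeated wedge factor, and in the third the sign $$(-1)^1$$ carried by $$\Bx^1$$ is exactly what is needed to restore the ordering of the pseudoscalar.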

In the same fashion, should we want the reciprocal frame for all of spacetime, given a four dimensional tangent space, we can state it trivially
\label{eqn:reciprocalblog:1920}
\begin{aligned}
\Bx^0 &= (-1)^0 \lr{ \Bx_1 \wedge \Bx_2 \wedge \Bx_3 } \cdot \inv{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 \wedge \Bx_3 } \\
\Bx^1 &= (-1)^1 \lr{ \Bx_0 \wedge \Bx_2 \wedge \Bx_3 } \cdot \inv{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 \wedge \Bx_3 } \\
\Bx^2 &= (-1)^2 \lr{ \Bx_0 \wedge \Bx_1 \wedge \Bx_3 } \cdot \inv{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 \wedge \Bx_3 } \\
\Bx^3 &= (-1)^3 \lr{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 } \cdot \inv{ \Bx_0 \wedge \Bx_1 \wedge \Bx_2 \wedge \Bx_3 }.
\end{aligned}

This is probably not an efficient way to compute all these reciprocals, since we can utilize a single matrix inversion to solve them in one shot. However, there are theoretical advantages to this construction that will be useful when we get to integration theory.
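
The matrix alternative mentioned above can be sketched as follows. This is the standard Gram matrix construction rather than anything derived in this post, and the symbols $$g_{\mu\nu}$$ and $$g^{\mu\nu}$$ are notation assumed for this illustration.
\begin{equation*}
\begin{aligned}
g_{\mu\nu} &= \Bx_\mu \cdot \Bx_\nu \\
g^{\mu\alpha} g_{\alpha\nu} &= {\delta^\mu}_\nu \\
\Bx^\mu &= g^{\mu\nu} \Bx_\nu.
\end{aligned}
\end{equation*}
Dotting the last line with $$\Bx_\beta$$ gives $$\Bx^\mu \cdot \Bx_\beta = g^{\mu\nu} g_{\nu\beta} = {\delta^\mu}_\beta$$, which is the defining property of the reciprocal frame, so a single inversion of $$g_{\mu\nu}$$ supplies all of the reciprocal vectors at once.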

On degeneracy.

Degeneracy was briefly mentioned above. Regardless of metric, $$\Bx_0 \wedge \Bx_1 = 0$$ means that this pair of vectors is collinear. A tangent space with such a pseudoscalar is clearly undesirable, and we must construct parameterizations for which the area element is non-zero in all regions of interest.

Things get more interesting in mixed signature spaces where we can have vectors that square to zero (i.e. lightlike). If the tangent space pseudoscalar has a lightlike factor, then that pseudoscalar will not be invertible. Such a degeneracy will likely lead to many other troubles, and parameterizations of this sort should be avoided.

The following problem illustrates an example of this sort of degenerate parameterization.

Problem: Degenerate surface parameterization.

Given a spacetime plane parameterization $$x(u,v) = u a + v b$$, where
\label{eqn:reciprocalblog:480}
a = \gamma_0 + \gamma_1 + \gamma_2 + \gamma_3,

\label{eqn:reciprocalblog:500}
b = \gamma_0 - \gamma_1 + \gamma_2 - \gamma_3,

show that this is a degenerate parameterization, and find the bivector that represents the tangent space. Are these vectors lightlike, spacelike, or timelike? Comment on whether this parameterization represents a physically relevant spacetime surface.

To characterize the vectors, we square them
\label{eqn:reciprocalblog:1080}
a^2 = b^2 =
\gamma_0^2 +
\gamma_1^2 +
\gamma_2^2 +
\gamma_3^2
=
1 - 3
= -2,

so $$a, b$$ are both spacelike vectors. The tangent space is clearly just $$\mbox{Span}\setlr{ a, b } = \mbox{Span}\setlr{ e, f }$$ where
\label{eqn:reciprocalblog:1100}
\begin{aligned}
e &= \gamma_0 + \gamma_2 \\
f &= \gamma_1 + \gamma_3.
\end{aligned}

Observe that $$a = e + f, b = e - f$$, and $$e$$ is lightlike ($$e^2 = 0$$), whereas $$f$$ is spacelike ($$f^2 = -2$$), and $$e \cdot f = 0$$, so $$e f = - f e$$. The bivector for the tangent plane is
\label{eqn:reciprocalblog:1120}
\gpgradetwo{
a b
}
=
\gpgradetwo{
(e + f) (e - f)
}
=
\gpgradetwo{
e^2 - f^2 - 2 e f
}
= -2 e f,

where
\label{eqn:reciprocalblog:1140}
e f = \gamma_{01} + \gamma_{21} + \gamma_{23} + \gamma_{03}.

Because $$e$$ is lightlike (zero square), and $$e f = - f e$$,
the bivector $$e f$$ squares to zero
\label{eqn:reciprocalblog:1780}
\lr{ e f }^2
= -e^2 f^2
= 0,

which shows that the parameterization is degenerate.

This parameterization can also be expressed as
\label{eqn:reciprocalblog:1160}
x(u,v)
= u ( e + f ) + v ( e - f )
= (u + v) e + (u - v) f,

a linear combination of a lightlike and spacelike vector. Intuitively, we expect that a physically meaningful spacetime surface involves linear combinations of spacelike vectors, or combinations of a timelike vector with spacelike vectors. This beastie is something entirely different.
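
For contrast, here is a sketch of a non-degenerate tangent plane. The pair $$\setlr{\gamma_0, \gamma_1}$$ is chosen purely for illustration and is not part of the problem statement. Its bivector squares to a non-zero scalar and is therefore invertible, whereas the bivector $$a \wedge b = -2 e f$$ found above is not.
\begin{equation*}
\begin{aligned}
\lr{ \gamma_0 \wedge \gamma_1 }^2 &= \gamma_0 \gamma_1 \gamma_0 \gamma_1 = -\gamma_0^2 \gamma_1^2 = 1 \\
\inv{ \gamma_0 \wedge \gamma_1 } &= \gamma_0 \gamma_1 \\
\lr{ a \wedge b }^2 &= 4 \lr{ e f }^2 = 0.
\end{aligned}
\end{equation*}
Since no multivector can multiply $$a \wedge b$$ to give unity when its square is zero, the reciprocal frame construction fails for the degenerate parameterization.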

Final notes.

There are a few loose ends above. In particular, we haven’t conclusively proven that the set of reciprocal vectors $$\Bx^\mu = \grad u^\mu$$ are exactly those obtained through algebraic means. For a full parameterization of spacetime, they are necessarily the same, since both are unique. So we know that \ref{eqn:reciprocalblog:1920} must equal the reciprocals obtained by evaluating the gradient for a full parameterization (and this must also equal the reciprocals that we can obtain through matrix inversion.) We have also not proved explicitly that the three parameter construction of the reciprocals in \ref{eqn:reciprocalblog:1900} is in the tangent space, but that is a fairly trivial observation, so it can be dismissed as an exercise for the reader. Some additional thought about this is probably required, but it seems reasonable to put that on the back burner and move on to some applications.

References

[1] Peeter Joot. Geometric Algebra for Electrical Engineers. Kindle Direct Publishing, 2019.

Motivation.

In my old classical mechanics notes it appears that I did covariant derivations of the Lorentz force equations a number of times, using different trial Lagrangians (relativistic and non-relativistic), and using both geometric algebra and tensor methods. However, none of these appear to have been done concisely, and a number not even coherently.

The following document has been drafted as replacement text for those incoherent classical mechanics notes. I’ll attempt to cover

• a lightning review of the geometric algebra STA (Space Time Algebra),
• relations between Dirac matrix algebra and STA,
• derivation of the relativistic form of the Euler-Lagrange equations from the covariant form of the action,
• relationship of the STA form of the Euler-Lagrange equations to their tensor equivalents,
• derivation of the Lorentz force equation from the STA Lorentz force Lagrangian,
• relationship of the STA Lorentz force equation to its equivalent in the tensor formalism,
• relationship of the STA Lorentz force equation to the traditional vector form.

Note that some of the prerequisite ideas and auxiliary details are presented as problems with solutions. If the reader has sufficient background to attempt those problems themselves, they are encouraged to do so.

The STA and geometric algebra ideas used here are not complete to learn from in isolation. The reader is referred to [1] for a more complete exposition of both STA and geometric algebra.

Definition 1.1: Index conventions.

Latin indexes $$i, j, k, r, s, t, \cdots$$ are used to designate values in the range $$\setlr{ 1,2,3 }$$. Greek indexes $$\alpha, \beta, \mu, \nu, \cdots$$ are used for indexes of spacetime quantities $$\setlr{0,1,2,3}$$.
The Einstein convention of implied summation for mixed upper and lower Greek indexes will be used, for example
\begin{equation*}
x^\alpha x_\alpha \equiv \sum_{\alpha = 0}^3 x^\alpha x_\alpha.
\end{equation*}

Space Time Algebra (STA.)

In the geometric algebra literature, the Dirac algebra of quantum field theory has been rebranded Space Time Algebra (STA). The differences between STA and the Dirac theory that uses matrices ($$\gamma_\mu$$) are as follows

• STA completely omits any representation of the Dirac basis vectors $$\gamma_\mu$$. In particular, any possible matrix representation is irrelevant.
• STA provides a rich set of fundamental operations (grade selection, generalized dot and wedge products for multivector elements, rotation and reflection operations, …)
• Matrix trace, and commutator and anticommutator operations are nowhere to be found in STA, as geometrically grounded equivalents are available instead.
• The “slashed” quantities from Dirac theory, such as $$\gamma_\mu p^\mu$$ are nothing more than vectors in their entirety in STA (where the basis is no longer implicit, as is the case for coordinates.)

Our basis vectors have the following properties.

Definition 1.2: Standard basis.

Let the four-vector standard basis be designated $$\setlr{\gamma_0, \gamma_1, \gamma_2, \gamma_3 }$$, where the basis vectors satisfy
\label{eqn:lorentzForceCovariant:1540}
\begin{aligned}
\gamma_0^2 &= -\gamma_i^2 = 1 \\
\gamma_\alpha \cdot \gamma_\beta &= 0, \forall \alpha \ne \beta.
\end{aligned}

Problem: Commutator properties of the STA basis.

In Dirac theory, the anticommutator property of the Dirac matrices is considered fundamental, namely
\begin{equation*}
\symmetric{\gamma_\mu}{\gamma_\nu} = 2 \eta_{\mu\nu}.
\end{equation*}

Show that this follows from the axiomatic assumptions of geometric algebra, and describe how the dot and wedge products are related to the anticommutator and commutator products of Dirac theory.

The anticommutator is defined as the symmetric sum of products
\label{eqn:lorentzForceCovariant:1040}
\symmetric{\gamma_\mu}{\gamma_\nu}
\equiv
\gamma_\mu \gamma_\nu
+
\gamma_\nu \gamma_\mu,

but this is just twice the dot product in its geometric algebra form $$a \cdot b = (a b + b a)/2$$. Observe that the properties of the basis vectors defined in \ref{eqn:lorentzForceCovariant:1540} may be summarized as
\label{eqn:lorentzForceCovariant:1060}
\gamma_\mu \cdot \gamma_\nu = \eta_{\mu\nu},

where $$\eta_{\mu\nu} = \text{diag}(+,-,-,-) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \\ \end{bmatrix}$$ is the conventional metric tensor. This means
\label{eqn:lorentzForceCovariant:1080}
\symmetric{\gamma_\mu}{\gamma_\nu} = 2 \gamma_\mu \cdot \gamma_\nu = 2 \eta_{\mu\nu},

as claimed.

Similarly, observe that the commutator, defined as the antisymmetric sum of products
\label{eqn:lorentzForceCovariant:1100}
\antisymmetric{\gamma_\mu}{\gamma_\nu} \equiv
\gamma_\mu \gamma_\nu
-
\gamma_\nu \gamma_\mu,

is twice the wedge product $$a \wedge b = (a b - b a)/2$$. This provides geometric identifications for the anticommutator and commutator products respectively
\label{eqn:lorentzForceCovariant:1120}
\begin{aligned}
\symmetric{\gamma_\mu}{\gamma_\nu} &= 2 \gamma_\mu \cdot \gamma_\nu \\
\antisymmetric{\gamma_\mu}{\gamma_\nu} &= 2 \gamma_\mu \wedge \gamma_\nu,
\end{aligned}
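
A few concrete cases may make these identifications more tangible. The following sketch evaluates the products for $$\gamma_0$$ and $$\gamma_1$$ only (an arbitrary choice of indexes for illustration), using nothing but the standard basis properties stated above.
\begin{equation*}
\begin{aligned}
\symmetric{\gamma_0}{\gamma_0} &= 2 \gamma_0^2 = 2 = 2 \eta_{00} \\
\symmetric{\gamma_1}{\gamma_1} &= 2 \gamma_1^2 = -2 = 2 \eta_{11} \\
\symmetric{\gamma_0}{\gamma_1} &= \gamma_0 \gamma_1 + \gamma_1 \gamma_0 = 0 = 2 \eta_{01} \\
\antisymmetric{\gamma_0}{\gamma_1} &= \gamma_0 \gamma_1 - \gamma_1 \gamma_0 = 2 \gamma_0 \gamma_1 = 2 \gamma_0 \wedge \gamma_1.
\end{aligned}
\end{equation*}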

Definition 1.3: Pseudoscalar.

The pseudoscalar for the space is denoted $$I = \gamma_0 \gamma_1 \gamma_2 \gamma_3$$.

Problem: Pseudoscalar.

Show that the STA pseudoscalar $$I$$ defined by \ref{eqn:lorentzForceCovariant:1540} satisfies
\begin{equation*}
\tilde{I} = I,
\end{equation*}
where the tilde operator designates reversion. Also show that $$I$$ has the properties of an imaginary number
\begin{equation*}
I^2 = -1.
\end{equation*}
Finally, show that, unlike the spatial pseudoscalar that commutes with all grades, $$I$$ anticommutes with any vector or trivector, and commutes with any bivector.

Since $$\gamma_\alpha \gamma_\beta = -\gamma_\beta \gamma_\alpha$$ for any $$\alpha \ne \beta$$, each interchange of a pair of adjacent factors of $$I$$ changes the sign once. In particular
\label{eqn:lorentzForceCovariant:680}
\begin{aligned}
I &=
\gamma_0
\gamma_1
\gamma_2
\gamma_3 \\
&=
-
\gamma_1
\gamma_2
\gamma_3
\gamma_0 \\
&=
-
\gamma_2
\gamma_3
\gamma_1
\gamma_0 \\
&=
+
\gamma_3
\gamma_2
\gamma_1
\gamma_0
= \tilde{I}.
\end{aligned}

Using this, we have
\label{eqn:lorentzForceCovariant:700}
\begin{aligned}
I^2
&= I \tilde{I} \\
&=
(
\gamma_0
\gamma_1
\gamma_2
\gamma_3
)(
\gamma_3
\gamma_2
\gamma_1
\gamma_0
) \\
&=
\lr{\gamma_0}^2
\lr{\gamma_1}^2
\lr{\gamma_2}^2
\lr{\gamma_3}^2 \\
&=
(+1)
(-1)
(-1)
(-1) \\
&= -1.
\end{aligned}

To illustrate the anticommutation property with any vector basis element, consider the following two examples:
\label{eqn:lorentzForceCovariant:720}
\begin{aligned}
I \gamma_0 &=
\gamma_0
\gamma_1
\gamma_2
\gamma_3
\gamma_0 \\
&=
-
\gamma_0
\gamma_0
\gamma_1
\gamma_2
\gamma_3 \\
&=
-
\gamma_0 I,
\end{aligned}

\label{eqn:lorentzForceCovariant:740}
\begin{aligned}
I \gamma_2
&=
\gamma_0
\gamma_1
\gamma_2
\gamma_3
\gamma_2 \\
&=
-
\gamma_0
\gamma_1
\gamma_2
\gamma_2
\gamma_3 \\
&=
-
\gamma_2
\gamma_0
\gamma_1
\gamma_2
\gamma_3 \\
&= -\gamma_2 I.
\end{aligned}

A total of three sign swaps is required to “percolate” any given $$\gamma_\alpha$$ through the factors of $$I$$, resulting in an overall sign change of $$-1$$.

For any bivector basis element $$\gamma_\alpha \gamma_\beta$$, with $$\alpha \ne \beta$$,
\label{eqn:lorentzForceCovariant:760}
\begin{aligned}
I \gamma_\alpha \gamma_\beta
&=
-\gamma_\alpha I \gamma_\beta \\
&=
+\gamma_\alpha \gamma_\beta I.
\end{aligned}

Similarly, for any trivector basis element $$\gamma_\alpha \gamma_\beta \gamma_\sigma$$, with $$\alpha, \beta, \sigma$$ all distinct,
\label{eqn:lorentzForceCovariant:780}
\begin{aligned}
I \gamma_\alpha \gamma_\beta \gamma_\sigma
&=
-\gamma_\alpha I \gamma_\beta \gamma_\sigma \\
&=
+\gamma_\alpha \gamma_\beta I \gamma_\sigma \\
&=
-\gamma_\alpha \gamma_\beta \gamma_\sigma I.
\end{aligned}
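
The pattern in these expansions can be summarized in one line. As a sketch, let $$A_k$$ (notation assumed here) be a product of $$k$$ distinct basis vectors. Each vector factor that $$I$$ is moved past contributes one factor of $$-1$$, so
\begin{equation*}
I A_k = (-1)^k A_k I,
\end{equation*}
which gives anticommutation for $$k = 1, 3$$ and commutation for $$k = 0, 2, 4$$. By linearity, the same rule holds for any grade $$k$$ multivector, since that is a sum of such products.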

Definition 1.4: Reciprocal basis.

The reciprocal basis $$\setlr{ \gamma^0, \gamma^1, \gamma^2, \gamma^3 }$$ is defined such that the property $$\gamma^\alpha \cdot \gamma_\beta = {\delta^\alpha}_\beta$$ holds.

Observe that $$\gamma^0 = \gamma_0$$ and $$\gamma^i = -\gamma_i$$.

Theorem 1.1: Coordinates.

Coordinates are defined in terms of dot products with the standard basis, or reciprocal basis
\begin{equation*}
\begin{aligned}
x^\alpha &= x \cdot \gamma^\alpha \\
x_\alpha &= x \cdot \gamma_\alpha,
\end{aligned}
\end{equation*}

Start proof:

Suppose that a coordinate representation of the following form is assumed
\label{eqn:lorentzForceCovariant:820}
x = x^\alpha \gamma_\alpha = x_\beta \gamma^\beta.

We wish to determine the representation of the $$x^\alpha$$ or $$x_\beta$$ coordinates in terms of $$x$$ and the basis elements. Taking the dot product with any standard basis element, we find
\label{eqn:lorentzForceCovariant:840}
\begin{aligned}
x \cdot \gamma_\mu
&= (x_\beta \gamma^\beta) \cdot \gamma_\mu \\
&= x_\beta {\delta^\beta}_\mu \\
&= x_\mu,
\end{aligned}

as claimed. Similarly, dotting with a reciprocal frame vector, we find
\label{eqn:lorentzForceCovariant:860}
\begin{aligned}
x \cdot \gamma^\mu
&= (x^\beta \gamma_\beta) \cdot \gamma^\mu \\
&= x^\beta {\delta_\beta}^\mu \\
&= x^\mu.
\end{aligned}

End proof.

Observe that raising or lowering a spatial index toggles the sign of a coordinate, but timelike indexes are left unchanged.
\label{eqn:lorentzForceCovariant:880}
\begin{aligned}
x^0 &= x_0 \\
x^i &= -x_i \\
\end{aligned}
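
A small numerical example may help fix the sign conventions. The vector $$x = 2 \gamma_0 + 3 \gamma_1$$ below is chosen arbitrarily for illustration, and the coordinates are computed from the dot products of the coordinates theorem above.
\begin{equation*}
\begin{aligned}
x^0 &= x \cdot \gamma^0 = 2, & x^1 &= x \cdot \gamma^1 = 3 \\
x_0 &= x \cdot \gamma_0 = 2, & x_1 &= x \cdot \gamma_1 = 3 \gamma_1^2 = -3 \\
x^2 &= x^\alpha x_\alpha = (2)(2) + (3)(-3) = -5.
\end{aligned}
\end{equation*}
The same square follows from direct multiplication, $$x^2 = 4 \gamma_0^2 + 9 \gamma_1^2 + 6 \lr{ \gamma_0 \gamma_1 + \gamma_1 \gamma_0 } = 4 - 9 = -5$$, so this particular vector is spacelike.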

Definition 1.5: Spacetime gradient.

The spacetime gradient is
\begin{equation*}
\grad = \gamma^\mu \partial_\mu = \gamma_\nu \partial^\nu,
\end{equation*}
where
\begin{equation*}
\partial_\mu = \PD{x^\mu}{},
\end{equation*}
and
\begin{equation*}
\partial^\mu = \PD{x_\mu}{}.
\end{equation*}

This definition of gradient is consistent with the Dirac gradient (sometimes denoted as a slashed $$\partial$$).

Definition 1.6: Timelike and spacelike components of a four-vector.

Given a four vector $$x = \gamma_\mu x^\mu$$, that would be designated $$x^\mu = \setlr{ x^0, \Bx}$$ in conventional special relativity, we write
\begin{equation*}
x^0 = x \cdot \gamma_0,
\end{equation*}
and
\begin{equation*}
\Bx = x \wedge \gamma_0,
\end{equation*}
or
\begin{equation*}
x = (x^0 + \Bx) \gamma_0.
\end{equation*}
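
The last form of this split can be checked in one line. This is a sketch that assumes only $$\gamma_0^2 = 1$$ and the decomposition of the geometric product of two vectors into dot and wedge parts, $$x \gamma_0 = x \cdot \gamma_0 + x \wedge \gamma_0$$.
\begin{equation*}
x = x \gamma_0^2 = \lr{ x \gamma_0 } \gamma_0 = \lr{ x \cdot \gamma_0 + x \wedge \gamma_0 } \gamma_0 = \lr{ x^0 + \Bx } \gamma_0.
\end{equation*}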

The spacetime split of a four-vector $$x$$ is relative to the frame. In the relativistic lingo, one would say that it is “observer dependent”, as the same operations with $${\gamma_0}'$$, the timelike basis vector for a different frame, would yield a different set of coordinates.

While the dot and wedge products above provide an effective mechanism to split a four vector into a set of timelike and spacelike quantities, the spatial component of a vector has a bivector representation in STA. Consider the following coordinate expansion of a spatial vector
\label{eqn:lorentzForceCovariant:1000}
\Bx =
x \wedge \gamma_0
=
\lr{ x^\mu \gamma_\mu } \wedge \gamma_0
=
\sum_{k = 1}^3 x^k \gamma_k \gamma_0.

Definition 1.7: Spatial basis.

We designate
\label{eqn:lorentzForceCovariant:1560}
\Be_i = \gamma_i \gamma_0,

as the standard basis vectors for $$\mathbb{R}^3$$.

In the literature, this bivector representation of the spatial basis may be designated $$\sigma_i = \gamma_i \gamma_0$$, as these bivectors have the properties of the Pauli matrices $$\sigma_i$$. Because I intend to expand these notes to include purely non-relativistic applications, I won’t use the Pauli notation here.

Problem: Orthonormality of the spatial basis.

Show that the spatial basis $$\setlr{ \Be_1, \Be_2, \Be_3 }$$, defined by \ref{eqn:lorentzForceCovariant:1560}, is orthonormal.

\label{eqn:lorentzForceCovariant:620}
\begin{aligned}
\Be_i \cdot \Be_j
&= \gpgradezero{ \gamma_i \gamma_0 \gamma_j \gamma_0 } \\
&= -\gpgradezero{ \gamma_i \gamma_j } \\
&= - \gamma_i \cdot \gamma_j.
\end{aligned}

This is zero for all $$i \ne j$$, and unity for any $$i = j$$.

Problem: Spatial pseudoscalar.

Show that the STA pseudoscalar $$I = \gamma_0 \gamma_1 \gamma_2 \gamma_3$$ equals the spatial pseudoscalar $$I = \Be_1 \Be_2 \Be_3$$.

The spatial pseudoscalar, expanded in terms of the STA basis vectors, is
\label{eqn:lorentzForceCovariant:1020}
\begin{aligned}
I
&= \Be_1 \Be_2 \Be_3 \\
&= \lr{ \gamma_1 \gamma_0 }
\lr{ \gamma_2 \gamma_0 }
\lr{ \gamma_3 \gamma_0 } \\
&= \lr{ \gamma_1 \gamma_0 } \gamma_2 \lr{ \gamma_0 \gamma_3 } \gamma_0 \\
&= \lr{ -\gamma_0 \gamma_1 } \gamma_2 \lr{ -\gamma_3 \gamma_0 } \gamma_0 \\
&= \gamma_0 \gamma_1 \gamma_2 \gamma_3 \lr{ \gamma_0 \gamma_0 } \\
&= \gamma_0 \gamma_1 \gamma_2 \gamma_3,
\end{aligned}

as claimed.

Problem: Characteristics of the Pauli matrices.

The Pauli matrices obey the following anticommutation relations:
\label{eqn:lorentzForceCovariant:660}
\symmetric{ \sigma_a}{\sigma_b } = 2 \delta_{a b},

and commutation relations:
\label{eqn:lorentzForceCovariant:640}
\antisymmetric{ \sigma_a}{ \sigma_b } = 2 i \epsilon_{a b c}\,\sigma_c,

Show how these relate to the geometric algebra dot and wedge products, and determine the geometric algebra representation of the imaginary $$i$$ above.
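
A sketch of one way to approach this problem follows. It assumes that the bivectors $$\Be_a = \gamma_a \gamma_0$$ of the spatial basis definition stand in for the Pauli matrices $$\sigma_a$$, and that $$I = \Be_1 \Be_2 \Be_3$$ is the spatial pseudoscalar, which commutes with each $$\Be_a$$.
\begin{equation*}
\begin{aligned}
\symmetric{\Be_a}{\Be_b} &= 2 \Be_a \cdot \Be_b = 2 \delta_{a b} \\
\antisymmetric{\Be_1}{\Be_2} &= 2 \Be_1 \wedge \Be_2 = 2 \Be_1 \Be_2 = 2 \Be_1 \Be_2 \Be_3 \Be_3 = 2 I \Be_3.
\end{aligned}
\end{equation*}
The other index pairs follow by cyclic permutation, giving $$\antisymmetric{\Be_a}{\Be_b} = 2 I \epsilon_{a b c} \Be_c$$, which suggests identifying the imaginary $$i$$ of the Pauli algebra with the pseudoscalar $$I$$.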

Euler-Lagrange equations.

I’ll start at ground zero, with the derivation of the relativistic form of the Euler-Lagrange equations from the action. A relativistic action for a single particle system has the form
\label{eqn:lorentzForceCovariant:20}
S = \int d\tau L(x, \dot{x}),

where $$x$$ is the spacetime coordinate, $$\dot{x} = dx/d\tau$$ is the four-velocity, and $$\tau$$ is proper time.

Theorem 1.2: Relativistic Euler-Lagrange equations.

Let $$x \rightarrow x + \delta x$$ be any variation of the Lagrangian four-vector coordinates, where $$\delta x = 0$$ at the boundaries of the action integral. The variation of the action is
\label{eqn:lorentzForceCovariant:1580}
\delta S = \int d\tau \delta x \cdot \delta L(x, \dot{x}),

with
\label{eqn:lorentzForceCovariant:1600}
\delta L = \grad L - \frac{d}{d\tau} (\grad_v L),

where $$\grad = \gamma^\mu \partial_\mu$$, and where we construct a similar velocity-gradient with respect to the proper-time derivatives of the coordinates $$\grad_v = \gamma^\mu \partial/\partial \dot{x}^\mu$$. The action is extremized when $$\delta S = 0$$, or when $$\delta L = 0$$. This latter condition is called the Euler-Lagrange equations.

Start proof:

Let $$\epsilon = \delta x$$, and expand the Lagrangian in Taylor series to first order
\label{eqn:lorentzForceCovariant:60}
\begin{aligned}
S &\rightarrow S + \delta S \\
&= \int d\tau L( x + \epsilon, \dot{x} + \dot{\epsilon}) \\
&=
\int d\tau \lr{
L(x, \dot{x}) + \epsilon \cdot \grad L + \dot{\epsilon} \cdot \grad_v L
}.
\end{aligned}

Subtracting off $$S$$ and integrating by parts, leaves
\label{eqn:lorentzForceCovariant:80}
\delta S =
\int d\tau \epsilon \cdot \lr{
\grad L - \frac{d}{d\tau} (\grad_v L)
}
+
\int d\tau \frac{d}{d\tau} \lr{ (\grad_v L ) \cdot \epsilon }.

The boundary integral
\label{eqn:lorentzForceCovariant:100}
\int d\tau \frac{d}{d\tau} \lr{ (\grad_v L ) \cdot \epsilon }
=
\evalbar{(\grad_v L ) \cdot \epsilon}{\Delta \tau} = 0,

is zero since the variation $$\epsilon$$ is required to vanish on the boundaries. So, if $$\delta S = 0$$, we must have
\label{eqn:lorentzForceCovariant:120}
0 =
\int d\tau \epsilon \cdot \lr{
\grad L - \frac{d}{d\tau} (\grad_v L)
},

for all variations $$\epsilon$$. Clearly, this requires that
\label{eqn:lorentzForceCovariant:140}
\delta L = \grad L - \frac{d}{d\tau} (\grad_v L) = 0,

or
\label{eqn:lorentzForceCovariant:145}
\grad L = \frac{d}{d\tau} (\grad_v L),

which is the coordinate free statement of the Euler-Lagrange equations.

Problem: Coordinate form of the Euler-Lagrange equations.

Working in coordinates, use the action argument to show that the Euler-Lagrange equations have the form
\begin{equation*}
\PD{x^\mu}{L} = \frac{d}{d\tau} \PD{\dot{x}^\mu}{L}
\end{equation*}
Observe that this is identical to the statement of \ref{eqn:lorentzForceCovariant:1600} after contraction with $$\gamma^\mu$$.

In terms of coordinates, the first order Taylor expansion of the action is
\label{eqn:lorentzForceCovariant:180}
\begin{aligned}
S &\rightarrow S + \delta S \\
&= \int d\tau L( x^\alpha + \epsilon^\alpha, \dot{x}^\alpha + \dot{\epsilon}^\alpha) \\
&=
\int d\tau \lr{
L(x^\alpha, \dot{x}^\alpha) + \epsilon^\mu \PD{x^\mu}{L} + \dot{\epsilon}^\mu \PD{\dot{x}^\mu}{L}
}.
\end{aligned}

As before, we integrate by parts to separate out a pure boundary term
\label{eqn:lorentzForceCovariant:200}
\delta S =
\int d\tau \epsilon^\mu
\lr{
\PD{x^\mu}{L} - \frac{d}{d\tau} \PD{\dot{x}^\mu}{L}
}
+
\int d\tau \frac{d}{d\tau} \lr{
\epsilon^\mu \PD{\dot{x}^\mu}{L}
}.

The boundary term is killed since $$\epsilon^\mu = 0$$ at the end points of the action integral. We conclude that extremization of the action ($$\delta S = 0$$, for all $$\epsilon^\mu$$) requires
\label{eqn:lorentzForceCovariant:220}
\PD{x^\mu}{L} - \frac{d}{d\tau} \PD{\dot{x}^\mu}{L} = 0.
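
As a simple illustration of the coordinate form, consider the free particle Lagrangian $$L = (1/2) m \dot{x}^\alpha \dot{x}_\alpha$$. This example is a sketch added for concreteness, and it assumes a constant mass $$m$$ and the constant metric $$\eta_{\alpha\beta}$$ for index lowering.
\begin{equation*}
\begin{aligned}
\PD{x^\mu}{L} &= 0 \\
\PD{\dot{x}^\mu}{L} &= \inv{2} m \eta_{\alpha\beta} \PD{\dot{x}^\mu}{} \lr{ \dot{x}^\alpha \dot{x}^\beta } = m \dot{x}_\mu \\
0 &= \PD{x^\mu}{L} - \frac{d}{d\tau} \PD{\dot{x}^\mu}{L} = - m \ddot{x}_\mu.
\end{aligned}
\end{equation*}
The four-acceleration is zero, so the free particle moves with constant four-velocity, as expected.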

Theorem 1.3: Lorentz force.

The relativistic Lagrangian for a charged particle is
\label{eqn:lorentzForceCovariant:1640}
L = \inv{2} m v^2 + q A \cdot v/c.

Application of the Euler-Lagrange equations to this Lagrangian yields the Lorentz-force equation
\label{eqn:lorentzForceCovariant:1660}
\frac{dp}{d\tau} = q F \cdot v/c,

where $$p = m v$$ is the proper momentum, $$F$$ is the Faraday bivector $$F = \grad \wedge A$$, and $$c$$ is the speed of light.

Start proof:

To make life easier, let’s take advantage of the linearity of the Lagrangian, and break it into the free particle Lagrangian $$L_0 = (1/2) m v^2$$ and a potential term $$L_1 = q A \cdot v/c$$. For the free particle case we have
\label{eqn:lorentzForceCovariant:240}
\begin{aligned}
\delta L_0
&= – \frac{d}{d\tau} (m v) \\
&= – \frac{dp}{d\tau}.
\end{aligned}

For the potential contribution we have
\label{eqn:lorentzForceCovariant:260}
\begin{aligned}
\delta L_1
&= \frac{q}{c} \lr{ \grad (A \cdot v) - \frac{d}{d\tau} \lr{ \grad_v (A \cdot v)} } \\
&= \frac{q}{c} \lr{ \grad (A \cdot v) - \frac{dA}{d\tau} }.
\end{aligned}

The proper time derivative can be evaluated using the chain rule
\label{eqn:lorentzForceCovariant:280}
\frac{dA}{d\tau}
=
\frac{\partial x^\mu}{\partial \tau} \partial_\mu A
=
v^\mu \partial_\mu A
=
(v \cdot \grad) A.

Putting all the pieces back together we have
\label{eqn:lorentzForceCovariant:300}
\begin{aligned}
0
&= \delta L \\
&=
-\frac{dp}{d\tau} + \frac{q}{c} \lr{ \grad (A \cdot v) - (v \cdot \grad) A } \\
&=
-\frac{dp}{d\tau} + \frac{q}{c} \lr{ \grad \wedge A } \cdot v.
\end{aligned}

Problem: Gradient of a squared position vector.

Show that
\begin{equation*}
\grad (a \cdot x) = a,
\end{equation*}
and
\begin{equation*}
\grad x^2 = 2 x.
\end{equation*}
It should be clear that the same ideas can be used for the velocity gradient, where we obtain $$\grad_v (v^2) = 2 v$$, and $$\grad_v (A \cdot v) = A$$, as used in the derivation above.

The first identity follows easily by expansion in coordinates
\label{eqn:lorentzForceCovariant:320}
\begin{aligned}
\grad (a \cdot x)
&=
\gamma^\mu \partial_\mu a_\alpha x^\alpha \\
&=
\gamma^\mu a_\alpha \delta_\mu^\alpha \\
&=
\gamma^\mu a_\mu \\
&=
a.
\end{aligned}

The second identity follows from the product rule for the gradient, holding each factor of $$x \cdot x$$ fixed in turn
\label{eqn:lorentzForceCovariant:340}
\begin{aligned}
\grad x^2
&=
\grad (x \cdot x) \\
&=
\evalbar{\lr{\grad (x \cdot a)}}{a = x}
+
\evalbar{\lr{\grad (b \cdot x)}}{b = x} \\
&=
\evalbar{a}{a = x}
+
\evalbar{b}{b = x} \\
&=
2x.
\end{aligned}

It is desirable to put this relativistic Lorentz force equation into the usual vector and tensor forms for comparison.

Theorem 1.4: Tensor form of the Lorentz force equation.

The tensor form of the Lorentz force equation is
\label{eqn:lorentzForceCovariant:1620}
\frac{dp^\mu}{d\tau} = \frac{q}{c} F^{\mu\nu} v_\nu,

where the antisymmetric Faraday tensor is defined as $$F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu$$.

Start proof:

We have only to dot both sides with $$\gamma^\mu$$. On the left we have
\label{eqn:lorentzForceCovariant:380}
\gamma^\mu \cdot \frac{dp}{d\tau}
=
\frac{dp^\mu}{d\tau}.

On the right, we have
\label{eqn:lorentzForceCovariant:400}
\begin{aligned}
\gamma^\mu \cdot \lr{ \frac{q}{c} F \cdot v }
&=
\frac{q}{c} (( \grad \wedge A ) \cdot v ) \cdot \gamma^\mu \\
&=
\frac{q}{c} ( \grad ( A \cdot v ) - (v \cdot \grad) A ) \cdot \gamma^\mu \\
&=
\frac{q}{c} \lr{ (\partial^\mu A^\nu) v_\nu - v_\nu \partial^\nu A^\mu } \\
&=
\frac{q}{c} F^{\mu\nu} v_\nu.
\end{aligned}

Problem: Tensor expansion of $$F$$.

An alternate way to demonstrate \ref{eqn:lorentzForceCovariant:1620} is to first expand $$F = \grad \wedge A$$ in terms of coordinates, an expansion that can be expressed in terms of a second rank antisymmetric tensor $$F^{\mu\nu}$$. Find that expansion, and re-evaluate the dot products of \ref{eqn:lorentzForceCovariant:400} using that.

\label{eqn:lorentzForceCovariant:900}
\begin{aligned}
F &=
\grad \wedge A \\
&=
\lr{ \gamma_\mu \partial^\mu } \wedge \lr{ \gamma_\nu A^\nu } \\
&=
\lr{ \gamma_\mu \wedge \gamma_\nu } \partial^\mu A^\nu.
\end{aligned}

To this we can apply the usual tensor trick (add self to self, change indexes, and divide by two), to give
\label{eqn:lorentzForceCovariant:920}
\begin{aligned}
F &=
\inv{2} \lr{
\lr{ \gamma_\mu \wedge \gamma_\nu } \partial^\mu A^\nu
+
\lr{ \gamma_\nu \wedge \gamma_\mu } \partial^\nu A^\mu
} \\
&=
\inv{2}
\lr{ \gamma_\mu \wedge \gamma_\nu } \lr{
\partial^\mu A^\nu
-
\partial^\nu A^\mu
},
\end{aligned}

which is just
\label{eqn:lorentzForceCovariant:940}
F =
\inv{2} \lr{ \gamma_\mu \wedge \gamma_\nu } F^{\mu\nu}.

Now, let’s expand $$(F \cdot v) \cdot \gamma^\mu$$ to compare to the earlier expansion in terms of $$\grad$$ and $$A$$.
\label{eqn:lorentzForceCovariant:960}
\begin{aligned}
(F \cdot v) \cdot \gamma^\mu
&=
\inv{2}
F^{\alpha\nu}
\lr{ \lr{ \gamma_\alpha \wedge \gamma_\nu } \cdot \lr{ \gamma^\beta v_\beta } } \cdot \gamma^\mu \\
&=
\inv{2}
F^{\alpha\nu} v_\beta
\lr{
{\delta_\nu}^\beta {\delta_\alpha}^\mu
-
{\delta_\alpha}^\beta {\delta_\nu}^\mu
} \\
&=
\inv{2}
\lr{
F^{\mu\beta} v_\beta
-
F^{\beta\mu} v_\beta
} \\
&=
F^{\mu\nu} v_\nu.
\end{aligned}

This alternate expansion illustrates some of the connectivity between the geometric algebra approach and the traditional tensor formalism.

Problem: Lorentz force direct tensor derivation.

Instead of using the geometric algebra form of the Lorentz force equation as a stepping stone, we may derive the tensor form from the Lagrangian directly, provided the Lagrangian is put into tensor form
\begin{equation*}
L = \inv{2} m v^\mu v_\mu + q A^\mu v_\mu /c.
\end{equation*}
Evaluate the Euler-Lagrange equations in coordinate form and compare to \ref{eqn:lorentzForceCovariant:1620}.

Let $$\delta_\mu L = \gamma_\mu \cdot \delta L$$, so that we can write the Euler-Lagrange equations as
\label{eqn:lorentzForceCovariant:460}
0 = \delta_\mu L = \PD{x^\mu}{L} - \frac{d}{d\tau} \PD{\dot{x}^\mu}{L}.

Operating on the kinetic term of the Lagrangian, we have
\label{eqn:lorentzForceCovariant:480}
\delta_\mu L_0 = – \frac{d}{d\tau} m v_\mu.

For the potential term
\label{eqn:lorentzForceCovariant:500}
\begin{aligned}
\delta_\mu L_1
&=
\frac{q}{c} \lr{
v_\nu \PD{x^\mu}{A^\nu} - \frac{d}{d\tau} A_\mu
} \\
&=
\frac{q}{c} \lr{
v_\nu \PD{x^\mu}{A^\nu} - \frac{dx_\alpha}{d\tau} \PD{x_\alpha}{ A_\mu }
} \\
&=
\frac{q}{c} v^\nu \lr{
\partial_\mu A_\nu - \partial_\nu A_\mu
} \\
&=
\frac{q}{c} v^\nu F_{\mu\nu}.
\end{aligned}

Putting the pieces together gives
\label{eqn:lorentzForceCovariant:520}
\frac{d}{d\tau} (m v_\mu) = \frac{q}{c} v^\nu F_{\mu\nu},

which is identical\footnote{Some minor index raising and lowering gymnastics are required.} to the tensor form that we found by expanding the geometric algebra form of the Lorentz force equation in coordinates.

Theorem 1.5: Vector Lorentz force equation.

Relative to a fixed observer’s frame, the Lorentz force equation of \ref{eqn:lorentzForceCovariant:1660} splits into a spatial rate of change of momentum, and (timelike component) rate of change of energy, as follows
\label{eqn:lorentzForceCovariant:1680}
\begin{aligned}
\ddt{(\gamma m \Bv)} &= q \lr{ \BE + \Bv \cross \BB } \\
\ddt{(\gamma m c^2)} &= q \Bv \cdot \BE,
\end{aligned}

where $$F = \BE + I c \BB$$, $$\gamma = 1/\sqrt{1 - \Bv^2/c^2 }$$.

Start proof:

The first step is to eliminate the proper time dependencies in the Lorentz force equation. Consider first the coordinate representation of an arbitrary position four-vector $$x$$
\label{eqn:lorentzForceCovariant:1140}
x = c t \gamma_0 + x^k \gamma_k.

The corresponding four-vector velocity is
\label{eqn:lorentzForceCovariant:1160}
v = \ddtau{x} = c \ddtau{t} \gamma_0 + \ddtau{t} \ddt{x^k} \gamma_k.

By construction, $$v^2 = c^2$$ is a Lorentz invariant quantity (this is one of the relativistic postulates), so the RHS of \ref{eqn:lorentzForceCovariant:1160} must have the same square. That is
\label{eqn:lorentzForceCovariant:1240}
c^2 = \lr{ \ddtau{t} }^2 \lr{ c^2 - \Bv^2 },

where $$\Bv = \ddt{\Bx}$$ is the observer-frame velocity of the spatial vector $$\Bx = x \wedge \gamma_0$$. This shows that we may make the identification
\label{eqn:lorentzForceCovariant:1260}
\gamma = \ddtau{t} = \inv{\sqrt{1 - \Bv^2/c^2 }},

and
\label{eqn:lorentzForceCovariant:1280}
\ddtau{} = \ddtau{t} \ddt{} = \gamma \ddt{}.
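
For a sense of scale, consider a numerical example, with the speed chosen arbitrarily for illustration. If $$\Bv^2 = (0.6 c)^2$$, then
\begin{equation*}
\gamma = \inv{\sqrt{1 - \Bv^2/c^2}} = \inv{\sqrt{1 - 0.36}} = \inv{0.8} = 1.25,
\end{equation*}
so that each proper time derivative is 1.25 times the corresponding observer time derivative for a particle moving at this speed.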

We may now factor the four-velocity $$v$$ into its spacetime split
\label{eqn:lorentzForceCovariant:1300}
v = \gamma \lr{ c + \Bv } \gamma_0.

In particular the LHS of the Lorentz force equation can be rewritten as
\label{eqn:lorentzForceCovariant:1320}
\ddtau{p} = \gamma \ddt{}\lr{ m \gamma \lr{ c + \Bv } } \gamma_0,

and the RHS of the Lorentz force equation can be rewritten as
\label{eqn:lorentzForceCovariant:1340}
\frac{q}{c} F \cdot v
=
\frac{\gamma q}{c} F \cdot \lr{ (c + \Bv) \gamma_0 }.

Equating timelike and spacelike components leaves us
\label{eqn:lorentzForceCovariant:1380}
\ddt{ (m \gamma c) } = \frac{q}{c} \lr{ F \cdot \lr{ (c + \Bv) \gamma_0 } } \cdot \gamma_0,

\label{eqn:lorentzForceCovariant:1400}
\ddt{ (m \gamma \Bv) } = \frac{q}{c} \lr{ F \cdot \lr{ (c + \Bv) \gamma_0 } } \wedge \gamma_0.

Evaluating these products requires some care, but is an essentially manual process. The reader is encouraged to do so once, but the end result may also be obtained easily using software (see lorentzForce.nb in [2]). One finds
\label{eqn:lorentzForceCovariant:1440}
F = \BE + I c \BB
=
E^1 \gamma_{10}
+ E^2 \gamma_{20}
+ E^3 \gamma_{30}
- c B^1 \gamma_{23}
- c B^2 \gamma_{31}
- c B^3 \gamma_{12},

\label{eqn:lorentzForceCovariant:1460}
\frac{q}{c} \lr{ F \cdot \lr{ (c + \Bv) \gamma_0 } } \cdot \gamma_0
= \frac{q}{c} \BE \cdot \Bv,

\label{eqn:lorentzForceCovariant:1480}
\frac{q}{c} \lr{ F \cdot \lr{ (c + \Bv) \gamma_0 } } \wedge \gamma_0
= q \lr{ \BE + \Bv \cross \BB }.
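
To show what the manual evaluation involves, here is a partial sketch that covers only the electric field contributions. It assumes the vector-bivector identity $$(a \wedge b) \cdot c = a (b \cdot c) - b (a \cdot c)$$, writes $$\BE = \sum_j E^j \gamma_j \gamma_0$$, and uses $$(c + \Bv) \gamma_0 = c \gamma_0 + \sum_k v^k \gamma_k$$, where $$v^k$$ are the components of $$\Bv$$.
\begin{equation*}
\begin{aligned}
\BE \cdot \lr{ c \gamma_0 } &= c \sum_j E^j \lr{ \gamma_j \lr{ \gamma_0 \cdot \gamma_0 } - \gamma_0 \lr{ \gamma_j \cdot \gamma_0 } } = c \sum_j E^j \gamma_j \\
\BE \cdot \lr{ \sum_k v^k \gamma_k } &= \sum_{j,k} E^j v^k \lr{ \gamma_j \lr{ \gamma_0 \cdot \gamma_k } - \gamma_0 \lr{ \gamma_j \cdot \gamma_k } } = \lr{ \BE \cdot \Bv } \gamma_0 \\
\lr{ \BE \cdot \lr{ (c + \Bv) \gamma_0 } } \cdot \gamma_0 &= \BE \cdot \Bv \\
\lr{ \BE \cdot \lr{ (c + \Bv) \gamma_0 } } \wedge \gamma_0 &= c \sum_j E^j \gamma_j \gamma_0 = c \BE.
\end{aligned}
\end{equation*}
After multiplication by $$q/c$$, these reproduce the $$\BE \cdot \Bv$$ power term and the $$q \BE$$ force term. The magnetic contribution $$I c \BB$$ can be handled with the same identity, and supplies the $$\Bv \cross \BB$$ term.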

Problem: Algebraic spacetime split of the Lorentz force equation.

Derive the results of \ref{eqn:lorentzForceCovariant:1440} through \ref{eqn:lorentzForceCovariant:1480} algebraically.

Problem: Spacetime split of the Lorentz force tensor equation.

Show that \ref{eqn:lorentzForceCovariant:1680} also follows from the tensor form of the Lorentz force equation (\ref{eqn:lorentzForceCovariant:1620}) provided we identify
\label{eqn:lorentzForceCovariant:1500}
F^{k0} = E^k,

and
\label{eqn:lorentzForceCovariant:1520}
F^{rs} = -\epsilon^{rst} B^t.

Also verify that the identifications of \ref{eqn:lorentzForceCovariant:1500} and \ref{eqn:lorentzForceCovariant:1520} are consistent with the geometric algebra Faraday bivector $$F = \BE + I c \BB$$, and the associated coordinate expansion of the field $$F = (1/2) (\gamma_\mu \wedge \gamma_\nu) F^{\mu\nu}$$.

References

[1] C. Doran and A.N. Lasenby. Geometric algebra for physicists. Cambridge University Press New York, Cambridge, UK, 1st edition, 2003.

[2] Peeter Joot. Mathematica modules for Geometric Algebra’s GA(2,0), GA(3,0), and GA(1,3), 2017. URL https://github.com/peeterjoot/gapauli. [Online; accessed 24-Oct-2020].

Potential solutions to the static Maxwell’s equation using geometric algebra

When neither the electromagnetic field strength $$F = \BE + I \eta \BH$$, nor the current $$J = \eta (c \rho - \BJ) + I(c\rho_m - \BM)$$ is a function of time, then the geometric algebra form of Maxwell’s equations is the first order multivector (gradient) equation
\label{eqn:staticPotentials:20}
\spacegrad F = J.

While direct solutions to this equation are possible with the multivector Green’s function for the gradient
\label{eqn:staticPotentials:40}
G(\Bx, \Bx') = \inv{4\pi} \frac{\Bx - \Bx'}{\Norm{\Bx - \Bx'}^3 },

the aim in this post is to explore second order (potential) solutions in a geometric algebra context. Can we assume that it is possible to find a multivector potential $$A$$ for which
\label{eqn:staticPotentials:60}
F = \spacegrad A,

is a solution to the Maxwell statics equation? If such a solution exists, then Maxwell’s equation is simply
\label{eqn:staticPotentials:80}
\spacegrad^2 A = J,

which can be easily solved using the scalar Green’s function for the Laplacian
\label{eqn:staticPotentials:240}
G(\Bx, \Bx') = -\inv{4 \pi \Norm{\Bx - \Bx'} },

a beastie that may be easier to convolve than the vector valued Green’s function for the gradient.
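
The relationship between the two Green’s functions can be checked numerically. This is a minimal sketch, assuming the conventional $$1/(4\pi)$$ normalization of the Laplacian Green’s function and a source point at the origin; the sample point and step sizes are arbitrary choices. It uses finite differences to check that $$-1/(4 \pi \Norm{\Bx})$$ is harmonic away from the source, and that its gradient reproduces the vector valued Green’s function for the gradient.

```python
import math


def green_laplacian(x: float, y: float, z: float) -> float:
    """Candidate Green's function -1/(4 pi r), with the source at the origin."""
    return -1.0 / (4.0 * math.pi * math.sqrt(x * x + y * y + z * z))


def fd_laplacian(f, p, h=1e-3):
    """Second order central difference estimate of the Laplacian of f at p."""
    x, y, z = p
    return (
        f(x + h, y, z) + f(x - h, y, z)
        + f(x, y + h, z) + f(x, y - h, z)
        + f(x, y, z + h) + f(x, y, z - h)
        - 6.0 * f(x, y, z)
    ) / (h * h)


def fd_gradient(f, p, h=1e-5):
    """Central difference estimate of the gradient of f at p."""
    x, y, z = p
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )


point = (1.0, 2.0, 2.0)  # r = 3, away from the singularity
laplacian = fd_laplacian(green_laplacian, point)
gradient = fd_gradient(green_laplacian, point)
# Gradient Green's function x / (4 pi r^3) evaluated at the same point.
expected = tuple(c / (4.0 * math.pi * 27.0) for c in point)
print(laplacian, gradient, expected)
```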

It is immediately clear that some restrictions must be imposed on the multivector potential $$A$$. In particular, since the field $$F$$ has only vector and bivector grades, the gradient $$\spacegrad A$$ must have neither scalar nor pseudoscalar grades. That is
\label{eqn:staticPotentials:100}
\gpgrade{\spacegrad A}{0,3} = 0.

This constraint on the potential can be avoided if a grade selection operation is built directly into the assumed potential solution, requiring that the field is given by
\label{eqn:staticPotentials:120}
F = \gpgrade{\spacegrad A}{1,2}.

However, after imposing such a constraint, Maxwell’s equation has a much less friendly form
\label{eqn:staticPotentials:140}
\spacegrad^2 A - \spacegrad \gpgrade{\spacegrad A}{0,3} = J.

Luckily, it is possible to introduce a transformation of potentials, called a gauge transformation, that eliminates the ugly grade selection term, and allows the potential equation to be expressed as a plain old Laplacian. We do so by assuming first that it is possible to find a solution of the Laplacian equation that has the desired grade restrictions. That is
\label{eqn:staticPotentials:160}
\begin{aligned}
\spacegrad^2 A' &= J \\
\gpgrade{\spacegrad A'}{0,3} &= 0,
\end{aligned}

for which $$F = \spacegrad A’$$ is a grade 1,2 solution to $$\spacegrad F = J$$. Suppose that $$A$$ is any formal solution, free of any grade restrictions, to $$\spacegrad^2 A = J$$, and $$F = \gpgrade{\spacegrad A}{1,2}$$. Can we find a function $$\tilde{A}$$ for which $$A = A’ + \tilde{A}$$?

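Maxwell’s equation in terms of $$A$$ is
\label{eqn:staticPotentials:180}
\begin{aligned}
J
&= \spacegrad \gpgrade{\spacegrad A}{1,2} \\
&= \spacegrad^2 A - \spacegrad \gpgrade{\spacegrad A}{0,3} \\
&= \spacegrad^2 \lr{ A' + \tilde{A} } - \spacegrad \gpgrade{\spacegrad A}{0,3},
\end{aligned}

or, since $$\spacegrad^2 A' = J$$,
\label{eqn:staticPotentials:200}
\spacegrad^2 \tilde{A} = \spacegrad \gpgrade{\spacegrad A}{0,3}.

This is a non-homogeneous Laplacian equation that can be solved as is for $$\tilde{A}$$ using the Green’s function for the Laplacian. Alternatively, we may also solve the equivalent first order system using the Green’s function for the gradient
\label{eqn:staticPotentials:220}
\spacegrad \tilde{A} = \gpgrade{\spacegrad A}{0,3}.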

Clearly $$\tilde{A}$$ is not unique, as we can add any function $$\psi$$ satisfying the homogeneous Laplacian equation $$\spacegrad^2 \psi = 0$$.

In summary, if $$A$$ is any multivector solution to $$\spacegrad^2 A = J$$, that is
\label{eqn:staticPotentials:260}
A(\Bx)
= \int dV' G(\Bx, \Bx') J(\Bx')
= -\inv{4\pi} \int dV' \frac{J(\Bx')}{\Norm{\Bx - \Bx'} },

then $$F = \spacegrad A’$$ is a solution to Maxwell’s equation, where $$A’ = A – \tilde{A}$$, and $$\tilde{A}$$ is a solution to the non-homogeneous Laplacian equation or the non-homogeneous gradient equation above.

Integral form of the gauge transformation.

Additional insight is possible by considering the gauge transformation in integral form. Suppose that
\label{eqn:staticPotentials:280}
A(\Bx) = -\inv{4\pi} \int_V dV' \frac{J(\Bx')}{\Norm{\Bx - \Bx'} } - \tilde{A}(\Bx),

is a solution of $$\spacegrad^2 A = J$$, where $$\tilde{A}$$ is a multivector solution to the homogeneous Laplacian equation $$\spacegrad^2 \tilde{A} = 0$$. Let’s look at the constraints on $$\tilde{A}$$ that must be imposed for $$F = \spacegrad A$$ to be a valid (i.e. grade 1,2) solution of Maxwell’s equation.
\label{eqn:staticPotentials:300}
\begin{aligned}
F
&=
-\inv{4\pi} \int_V dV' \lr{ \spacegrad \inv{\Norm{\Bx - \Bx'} } } J(\Bx')
- \spacegrad \tilde{A}(\Bx) \\
&=
\inv{4\pi} \int_V dV' \lr{ \spacegrad' \inv{\Norm{\Bx - \Bx'} } } J(\Bx')
- \spacegrad \tilde{A}(\Bx) \\
&=
\inv{4\pi} \int_V dV' \spacegrad' \frac{J(\Bx')}{\Norm{\Bx - \Bx'} } - \inv{4\pi} \int_V dV' \frac{\spacegrad' J(\Bx')}{\Norm{\Bx - \Bx'} }
- \spacegrad \tilde{A}(\Bx) \\
&=
\inv{4\pi} \int_{\partial V} dA' \ncap' \frac{J(\Bx')}{\Norm{\Bx - \Bx'} } - \inv{4\pi} \int_V dV' \frac{\spacegrad' J(\Bx')}{\Norm{\Bx - \Bx'} }
- \spacegrad \tilde{A}(\Bx)
\end{aligned}

where $$\ncap' = (\Bx' - \Bx)/\Norm{\Bx' - \Bx}$$, and the fundamental theorem of geometric calculus has been used to transform the gradient volume integral into an integral over the bounding surface. Operating on Maxwell’s equation with the gradient gives $$\spacegrad^2 F = \spacegrad J$$, which has only grades 1,2 on the left hand side, meaning that $$J$$ is constrained in a way that requires $$\spacegrad J$$ to have only grades 1,2. This means that $$F$$ has grades 1,2 if
\label{eqn:staticPotentials:320}
\gpgrade{\spacegrad \tilde{A}(\Bx)}{0,3}
= \inv{4\pi} \int_{\partial V} dA' \frac{ \gpgrade{\ncap' J(\Bx')}{0,3} }{\Norm{\Bx - \Bx'} }.

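The grade 0,3 selection of the product $$\ncap J$$ expands to
\label{eqn:staticPotentials:340}
\begin{aligned}
\gpgrade{\ncap J}{0,3}
&=
\gpgrade{\ncap \lr{ \eta (c \rho - \BJ) + I(c\rho_m - \BM) }}{0,3} \\
&=
\ncap \cdot (-\eta \BJ) + \gpgradethree{\ncap (-I \BM)} \\
&=- \eta \ncap \cdot \BJ -I \ncap \cdot \BM,
\end{aligned}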

so
\label{eqn:staticPotentials:360}
\gpgrade{\spacegrad \tilde{A}(\Bx)}{0,3}
=
-\inv{4\pi} \int_{\partial V} dA' \frac{ \eta \ncap' \cdot \BJ(\Bx') + I \ncap' \cdot \BM(\Bx')}{\Norm{\Bx - \Bx'} }.

Observe that if there is no flux of current density $$\BJ$$ and (fictitious) magnetic current density $$\BM$$ through the surface, then $$F = \spacegrad A$$ is a solution to Maxwell’s equation without any gauge transformation. Alternatively $$F = \spacegrad A$$ is also a solution if $$\lim_{\Bx' \rightarrow \infty} \BJ(\Bx')/\Norm{\Bx - \Bx'} = \lim_{\Bx' \rightarrow \infty} \BM(\Bx')/\Norm{\Bx - \Bx'} = 0$$ and the bounding volume is taken to infinity.

Generalizing Ampere’s law using geometric algebra.

The question I’d like to explore in this post is how Ampere’s law, the relationship between the line integral of the magnetic field and the enclosed current
\label{eqn:flux:20}
\oint_{\partial A} d\Bx \cdot \BH = -\int_A dA \ncap \cdot \BJ,

generalizes to geometric algebra, where Maxwell’s equation for a statics configuration (all time derivatives zero) is
\label{eqn:flux:40}
\spacegrad F = J,

where the multivector fields and currents are
\label{eqn:flux:60}
\begin{aligned}
F &= \BE + I \eta \BH \\
J &= \eta \lr{ c \rho – \BJ } + I \lr{ c \rho_\txtm – \BM }.
\end{aligned}

Here the (fictitious) magnetic charge and current densities, which can be useful in antenna theory, have been included in the multivector current for generality.

My presumption is that it should be possible to utilize the fundamental theorem of geometric calculus, which relates an integral over an oriented surface to an integral over its boundary, but applied directly to Maxwell’s equation. That integral theorem has the form
\label{eqn:flux:80}
\int_A d^2 \Bx \boldpartial F = \oint_{\partial A} d\Bx F,

where $$d^2 \Bx = d\Ba \wedge d\Bb$$ is a two parameter bivector valued surface area element, and $$\boldpartial$$ is the vector derivative, the projection of the gradient onto the tangent space. I won’t try to explain all of geometric calculus here, and refer the interested reader to [1], which is an excellent reference on geometric calculus and integration theory.

The gotcha is that we actually want a surface integral with $$\spacegrad F$$. We can split the gradient into the vector derivative and a normal component
\label{eqn:flux:160}
\spacegrad = \boldpartial + \ncap \lr{ \ncap \cdot \spacegrad },

so
\label{eqn:flux:100}
\int_A d^2 \Bx \spacegrad F
=
\int_A d^2 \Bx \boldpartial F
+
\int_A d^2 \Bx \ncap \lr{ \ncap \cdot \spacegrad } F,

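and, after substituting Maxwell’s equation $$\spacegrad F = J$$ and the area element $$d^2 \Bx = I \ncap dA$$,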
\label{eqn:flux:120}
\begin{aligned}
\oint_{\partial A} d\Bx F
&=
\int_A d^2 \Bx \lr{ J – \ncap \lr{ \ncap \cdot \spacegrad } F } \\
&=
\int_A dA \lr{ I \ncap J – \lr{ \ncap \cdot \spacegrad } I F }
\end{aligned}

This is not nearly as nice as the magnetic flux relationship, in which the current and fields were cleanly separated. The $$d\Bx F$$ product has all possible grades, as does the $$d^2 \Bx J$$ product (in general). Observe, however, that the normal derivative term on the right has only grades 1,2, so we can split our line integral relation into a pair of equations, one with grades 0,3 and one with grades 1,2
\label{eqn:flux:140}
\begin{aligned}
\oint_{\partial A} \gpgrade{ d\Bx F }{0,3}
&=
\int_A dA \gpgrade{ I \ncap J }{0,3} \\
\oint_{\partial A} \gpgrade{ d\Bx F }{1,2}
&=
\int_A dA \lr{ \gpgrade{ I \ncap J }{1,2} - \lr{ \ncap \cdot \spacegrad } I F }.
\end{aligned}

Let’s expand these explicitly in terms of the component fields and densities to check against the conventional relationships, and see if things look right. The line integrand expands to
\label{eqn:flux:180}
\begin{aligned}
d\Bx F
&=
d\Bx \lr{ \BE + I \eta \BH }
=
d\Bx \cdot \BE + I \eta d\Bx \cdot \BH
+
d\Bx \wedge \BE + I \eta d\Bx \wedge \BH \\
&=
d\Bx \cdot \BE
– \eta (d\Bx \cross \BH)
+ I (d\Bx \cross \BE )
+ I \eta (d\Bx \cdot \BH),
\end{aligned}

the current integrand expands to
\label{eqn:flux:200}
\begin{aligned}
I \ncap J
&=
I \ncap
\lr{
\frac{\rho}{\epsilon} – \eta \BJ + I \lr{ c \rho_\txtm – \BM }
} \\
&=
\ncap I \frac{\rho}{\epsilon} – \eta \ncap I \BJ – \ncap c \rho_\txtm + \ncap \BM \\
&=
\ncap \cdot \BM
+ \eta (\ncap \cross \BJ)
– \ncap c \rho_\txtm
+ I (\ncap \cross \BM)
+ \ncap I \frac{\rho}{\epsilon}
– \eta I (\ncap \cdot \BJ).
\end{aligned}

We are left with
\label{eqn:flux:220}
\begin{aligned}
\oint_{\partial A}
\lr{
d\Bx \cdot \BE + I \eta (d\Bx \cdot \BH)
}
&=
\int_A dA
\lr{
\ncap \cdot \BM – \eta I (\ncap \cdot \BJ)
} \\
\oint_{\partial A}
\lr{
– \eta (d\Bx \cross \BH)
+ I (d\Bx \cross \BE )
}
&=
\int_A dA
\lr{
\eta (\ncap \cross \BJ)
– \ncap c \rho_\txtm
+ I (\ncap \cross \BM)
+ \ncap I \frac{\rho}{\epsilon}
-\PD{n}{} \lr{ I \BE – \eta \BH }
}.
\end{aligned}

This is a crazy mess of dots, crosses, fields and sources. We can split it into one equation for each grade, which will probably look a little more regular. That is
\label{eqn:flux:240}
\begin{aligned}
\oint_{\partial A} d\Bx \cdot \BE &= \int_A dA \ncap \cdot \BM \\
\oint_{\partial A} d\Bx \cross \BH
&=
\int_A dA
\lr{
– \ncap \cross \BJ
+ \frac{ \ncap \rho_\txtm }{\mu}
– \PD{n}{\BH}
} \\
\oint_{\partial A} d\Bx \cross \BE &=
\int_A dA
\lr{
\ncap \cross \BM
+ \frac{\ncap \rho}{\epsilon}
– \PD{n}{\BE}
} \\
\oint_{\partial A} d\Bx \cdot \BH &= -\int_A dA \ncap \cdot \BJ \\
\end{aligned}

The first and last equations could have been obtained much more easily from Maxwell’s equations in their conventional form. The two cross product equations with the normal derivatives are not familiar to me, even without the fictitious magnetic sources. It is somewhat remarkable that so much can be packed into one multivector equation:
\label{eqn:flux:260}
\oint_{\partial A} d\Bx F
=
I \int_A dA \lr{ \ncap J – \PD{n}{F} }.

References

[1] A. Macdonald. Vector and Geometric Calculus. CreateSpace Independent Publishing Platform, 2012.

ECE1505H Convex Optimization. Lecture 7: Examples of convex and concave functions, local and global minimums. Taught by Prof. Stark Draper

Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

Today

• Local and global optimality
• Compositions of functions
• Examples

Example:

\label{eqn:convexOptimizationLecture7:20}
\begin{aligned}
F(x) &= x^2 \\
F''(x) &= 2 > 0
\end{aligned}

strictly convex.

Example:

\label{eqn:convexOptimizationLecture7:40}
\begin{aligned}
F(x) &= x^3 \\
F''(x) &= 6 x.
\end{aligned}

Not always non-negative, so not convex. However $$x^3$$ is convex on $$\textrm{dom} F = \mathbb{R}_{+}$$.

Example:

\label{eqn:convexOptimizationLecture7:60}
\begin{aligned}
F(x) &= x^\alpha \\
F'(x) &= \alpha x^{\alpha-1} \\
F''(x) &= \alpha(\alpha-1) x^{\alpha-2}.
\end{aligned}

fig. 1. Powers of x.

This is convex on $$\mathbb{R}_{+}$$, if $$\alpha \ge 1$$, or $$\alpha \le 0$$.

Example:

\label{eqn:convexOptimizationLecture7:80}
\begin{aligned}
F(x) &= \log x \\
F'(x) &= \inv{x} \\
F''(x) &= -\inv{x^2} \le 0
\end{aligned}

This is concave.

Example:

\label{eqn:convexOptimizationLecture7:100}
\begin{aligned}
F(x) &= x\log x \\
F'(x) &= \log x + x \inv{x} = 1 + \log x \\
F''(x) &= \inv{x}
\end{aligned}

This is strictly convex on
$$\mathbb{R}_{++}$$, where
$$F''(x) > 0$$.

Example:

\label{eqn:convexOptimizationLecture7:120}
\begin{aligned}
F(x) &= e^{\alpha x} \\
F'(x) &= \alpha e^{\alpha x} \\
F''(x) &= \alpha^2 e^{\alpha x} \ge 0
\end{aligned}

fig. 2. Exponential.

Such functions are plotted in fig. 2, and are convex functions for all $$\alpha$$.
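
The sign of the second derivative in these scalar examples can be spot checked with a central difference. This is a minimal sketch at a single hypothetical sample point ($$x = 2$$, with $$\alpha = -2$$ for the exponential); a single point can only illustrate the sign of the curvature, it does not prove convexity.

```python
import math


def second_difference(f, x, h=1e-4):
    """Central difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


x0 = 2.0  # sample point in R_{++}
curvature = {
    "x log x": second_difference(lambda x: x * math.log(x), x0),
    "log x": second_difference(math.log, x0),
    "exp(-2 x)": second_difference(lambda x: math.exp(-2.0 * x), x0),
}
for name, value in curvature.items():
    print(name, value)
```

The estimates should be close to $$1/x$$, $$-1/x^2$$ and $$\alpha^2 e^{\alpha x}$$ respectively, matching the signs found analytically.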

Example:

For symmetric $$P \in S^n$$

\label{eqn:convexOptimizationLecture7:140}
\begin{aligned}
F(\Bx) &= \Bx^\T P \Bx + 2 \Bq^\T \Bx + r \\
\spacegrad F &= (P + P^\T) \Bx + 2 \Bq = 2 P \Bx + 2 \Bq \\
\spacegrad^2 F &= 2 P.
\end{aligned}

This is convex (concave) if $$P \ge 0$$ ($$P \le 0$$).

Example:

\label{eqn:convexOptimizationLecture7:780}
F(x, y) = x^2 + y^2 + 3 x y,

which is neither convex nor concave, is plotted in fig 3.

fig 3. Function with saddle point (3d and contours)

This function can be put in matrix form

\label{eqn:convexOptimizationLecture7:160}
F(x, y) = x^2 + y^2 + 3 x y
=
\begin{bmatrix}
x & y
\end{bmatrix}
\begin{bmatrix}
1 & 1.5 \\
1.5 & 1
\end{bmatrix}
\begin{bmatrix}
x \\
y
\end{bmatrix},

and has the Hessian

\label{eqn:convexOptimizationLecture7:180}
\begin{aligned}
\spacegrad^2 F
&=
\begin{bmatrix}
\partial_{xx} F & \partial_{xy} F \\
\partial_{yx} F & \partial_{yy} F \\
\end{bmatrix} \\
&=
\begin{bmatrix}
2 & 3 \\
3 & 2
\end{bmatrix} \\
&= 2 P.
\end{aligned}

From the plot we know that this is not PSD, but this can be confirmed by checking the eigenvalues

\label{eqn:convexOptimizationLecture7:200}
\begin{aligned}
0
&=
\det ( P - \lambda I ) \\
&=
(1 - \lambda)^2 - 1.5^2,
\end{aligned}

which has solutions

\label{eqn:convexOptimizationLecture7:220}
\lambda = 1 \pm \frac{3}{2} = \frac{5}{2}, -\frac{1}{2}.

This matrix is neither PSD nor negative semi-definite, because it has one positive and one negative eigenvalue, so $$F$$ is neither convex nor concave.
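
The eigenvalues can also be checked numerically. This is a minimal sketch using the closed form for a symmetric $$2 \times 2$$ matrix; the helper name `sym2_eigenvalues` is hypothetical.

```python
import math


def sym2_eigenvalues(a: float, b: float, d: float):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


# P for F(x, y) = x^2 + y^2 + 3 x y.
smallest, largest = sym2_eigenvalues(1.0, 1.5, 1.0)
is_indefinite = smallest < 0.0 < largest
print(smallest, largest, is_indefinite)
```

One eigenvalue of each sign means the quadratic form is indefinite, which is the saddle seen in the plot.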

Along $$y = -x$$,

\label{eqn:convexOptimizationLecture7:240}
\begin{aligned}
F(x,y)
&=
F(x,-x) \\
&=
2 x^2 – 3 x^2 \\
&=
– x^2,
\end{aligned}

so it is concave along this line. Along $$y = x$$

\label{eqn:convexOptimizationLecture7:260}
\begin{aligned}
F(x,y)
&=
F(x,x) \\
&=
2 x^2 + 3 x^2 \\
&=
5 x^2,
\end{aligned}

so it is convex along this line.

Example:

\label{eqn:convexOptimizationLecture7:280}
F(\Bx) = \sqrt{ x_1 x_2 },

on $$\textrm{dom} F = \setlr{ x_1 \ge 0, x_2 \ge 0 }$$

The gradient components are
\label{eqn:convexOptimizationLecture7:300}
\begin{aligned}
\PD{x_1}{F} &= \frac{1}{2} x_1^{-1/2} x_2^{1/2} \\
\PD{x_2}{F} &= \frac{1}{2} x_2^{-1/2} x_1^{1/2}
\end{aligned}

The Hessian components are

\label{eqn:convexOptimizationLecture7:320}
\begin{aligned}
\PD{x_1}{} \PD{x_1}{F} &= -\frac{1}{4} x_1^{-3/2} x_2^{1/2} \\
\PD{x_1}{} \PD{x_2}{F} &= \frac{1}{4} x_2^{-1/2} x_1^{-1/2} \\
\PD{x_2}{} \PD{x_1}{F} &= \frac{1}{4} x_1^{-1/2} x_2^{-1/2} \\
\PD{x_2}{} \PD{x_2}{F} &= -\frac{1}{4} x_2^{-3/2} x_1^{1/2}
\end{aligned}

or
\label{eqn:convexOptimizationLecture7:340}
\spacegrad^2 F
=
-\frac{\sqrt{x_1 x_2}}{4}
\begin{bmatrix}
\inv{x_1^2} & -\inv{x_1 x_2} \\
-\inv{x_1 x_2} & \inv{x_2^2}
\end{bmatrix}.

Checking this for PSD against $$\Bv = (v_1, v_2)$$, we have
\label{eqn:convexOptimizationLecture7:360}
\begin{aligned}
\begin{bmatrix}
v_1 & v_2
\end{bmatrix}
\begin{bmatrix}
\inv{x_1^2} & -\inv{x_1 x_2} \\
-\inv{x_1 x_2} & \inv{x_2^2}
\end{bmatrix}
\begin{bmatrix}
v_1 \\ v_2
\end{bmatrix}
&=
\begin{bmatrix}
v_1 & v_2
\end{bmatrix}
\begin{bmatrix}
\inv{x_1^2} v_1 -\inv{x_1 x_2} v_2 \\
-\inv{x_1 x_2} v_1 + \inv{x_2^2} v_2
\end{bmatrix} \\
&=
\lr{ \inv{x_1^2} v_1 -\inv{x_1 x_2} v_2 } v_1 +
\lr{ -\inv{x_1 x_2} v_1 + \inv{x_2^2} v_2 } v_2
\\
&=
\inv{x_1^2} v_1^2
+ \inv{x_2^2} v_2^2
-2 \inv{x_1 x_2} v_1 v_2 \\
&=
\lr{
\frac{v_1}{x_1}
-\frac{v_2}{x_2}
}^2 \\
&\ge 0,
\end{aligned}

so $$\spacegrad^2 F \le 0$$. The Hessian is negative semi-definite, so $$F$$ is concave. Observe that this check required verifying the condition for all values of $$\Bx$$ in the domain.
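
The sign of the Hessian quadratic form can be sampled numerically. This is a minimal sketch, assuming randomly sampled points in the interior of the domain and random directions (the ranges and seed are arbitrary); it evaluates $$\Bv^\T (\spacegrad^2 F) \Bv$$ from the Hessian components listed above. Sampling can only fail to find a counterexample, so it supplements the algebraic argument rather than replacing it.

```python
import random


def hessian_quadratic_form(x1, x2, v1, v2):
    """Evaluate v^T (Hessian of sqrt(x1 x2)) v from the component formulas."""
    h11 = -0.25 * x1 ** -1.5 * x2 ** 0.5
    h12 = 0.25 * x1 ** -0.5 * x2 ** -0.5
    h22 = -0.25 * x2 ** -1.5 * x1 ** 0.5
    return h11 * v1 * v1 + 2.0 * h12 * v1 * v2 + h22 * v2 * v2


rng = random.Random(0)
samples = []
for _ in range(1000):
    x1, x2 = rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0)
    v1, v2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    samples.append(hessian_quadratic_form(x1, x2, v1, v2))

worst = max(samples)  # should not be positive, up to rounding error
print(worst)
```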

This is an example of a more general result

\label{eqn:convexOptimizationLecture7:380}
F(x) = \lr{ \prod_{i = 1}^n x_i }^{1/n},

which is concave (prove on homework).

Summary.

If $$F$$ is twice differentiable in $$\mathbb{R}^n$$, then check the curvature of the function along all lines, i.e. at all locations and in all directions.

If the Hessian is PSD at all $$\Bx \in \textrm{dom} F$$, that is

\label{eqn:convexOptimizationLecture7:400}
\spacegrad^2 F \ge 0 \, \forall \Bx \in \textrm{dom} F,

then the function is convex.

Example:

Over $$\textrm{dom} F = \mathbb{R}^n$$

\label{eqn:convexOptimizationLecture7:420}
F(\Bx) = \max_{i = 1}^n x_i

i.e.
\label{eqn:convexOptimizationLecture7:440}
\begin{aligned}
F((1,2)) &= 2 \\
F((3,-1)) &= 3
\end{aligned}

Example:

\label{eqn:convexOptimizationLecture7:460}
F(\Bx) = \max_{i = 1}^n F_i(\Bx),

where

\label{eqn:convexOptimizationLecture7:480}
F_i(\Bx)
=
… ?

The max of a set of convex functions is a convex function.

Example:

\label{eqn:convexOptimizationLecture7:500}
F(x) =
x_{[1]} +
x_{[2]} +
x_{[3]}

where

$$x_{[k]}$$ is the k-th largest number in the list

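Write

\label{eqn:convexOptimizationLecture7:520}
F(x) = \max \lr{ x_i + x_j + x_k },

where the max is taken over all triples of distinct indices

\label{eqn:convexOptimizationLecture7:540}
(i,j,k) \in \binom{n}{3}.

Each sum $$x_i + x_j + x_k$$ is linear, hence convex, so $$F$$ is a max of convex functions and is therefore convex.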
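
The equivalence between the sorted form and the max over triples can be checked by brute force. This is a minimal sketch with a hypothetical sample list; the function names are illustrative only.

```python
from itertools import combinations


def sum_three_largest(xs):
    """x_[1] + x_[2] + x_[3]: the sum of the three largest entries."""
    return sum(sorted(xs, reverse=True)[:3])


def max_over_triples(xs):
    """Max of x_i + x_j + x_k over all triples of distinct indices."""
    return max(a + b + c for a, b, c in combinations(xs, 3))


xs = [4.0, -1.0, 7.5, 2.0, 3.0]
print(sum_three_largest(xs), max_over_triples(xs))
```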

Example:

For $$\Ba \in \mathbb{R}^n$$ and $$b_i \in \mathbb{R}$$

\label{eqn:convexOptimizationLecture7:560}
\begin{aligned}
F(\Bx)
&= \sum_{i = 1}^n \log( b_i – \Ba^\T \Bx )^{-1} \\
&= -\sum_{i = 1}^n \log( b_i – \Ba^\T \Bx )
\end{aligned}

Here $$b_i - \Ba^\T \Bx$$ is an affine function of $$\Bx$$, so composing with it doesn’t affect convexity.

Since $$\log$$ is concave, $$-\log$$ is convex. A convex function of an affine function of $$\Bx$$ is a convex function of $$\Bx$$, and a sum of convex functions is convex.

Example:

\label{eqn:convexOptimizationLecture7:580}
F(\Bx) = \sup_{\By \in C} \Norm{ \Bx – \By }

fig. 3. Max length function

Here $$C \subseteq \mathbb{R}^n$$ is not necessarily convex. We are using $$\sup$$ here because the set $$C$$ may be open. This function is the length of the line from $$\Bx$$ to the point in $$C$$ that is furthest from $$\Bx$$.

• $$\Bx - \By$$ is affine in $$\Bx$$.
• $$g_\By(\Bx) = \Norm{\Bx - \By}$$ is convex in $$\Bx$$ since norms are convex functions, and a convex function of an affine function is convex.
• $$F(\Bx) = \sup_{\By \in C} \Norm{ \Bx - \By }$$. Each $$\By$$ indexes a convex function $$g_\By$$, and $$F$$ is the pointwise supremum of those functions, so $$F$$ is convex.

Example:

\label{eqn:convexOptimizationLecture7:600}
F(\Bx) = \inf_{\By \in C} \Norm{ \Bx – \By }.

Min and max of two convex functions are plotted in fig. 4.

fig. 4. Min and max

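The max is observed to be convex, whereas the min is not necessarily so. For the min there can be points $$\Bx, \By$$ and $$\theta \in [0,1]$$ for which the convexity inequality is reversed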

\label{eqn:convexOptimizationLecture7:800}
F(\Bz) = F(\theta \Bx + (1-\theta) \By) \ge \theta F(\Bx) + (1-\theta)F(\By).

This is not necessarily convex for all sets $$C \subseteq \mathbb{R}^n$$, because the $$\inf$$ of a bunch of convex functions is not necessarily convex. However, if $$C$$ is convex, then $$F(\Bx)$$ is convex.
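
A one dimensional example makes the failure concrete. This is a minimal sketch, assuming the hypothetical non-convex set $$C = \{-1, 1\}$$ on the real line; it evaluates both the nearest point and the furthest point distance functions at the midpoint of the two members of $$C$$.

```python
def nearest_distance(x, points):
    """inf over y in C of |x - y|, for a finite set C."""
    return min(abs(x - y) for y in points)


def furthest_distance(x, points):
    """sup over y in C of |x - y|, for a finite set C."""
    return max(abs(x - y) for y in points)


C = [-1.0, 1.0]  # a non-convex set (two isolated points)
x, y, theta = -1.0, 1.0, 0.5
z = theta * x + (1.0 - theta) * y  # midpoint, z = 0

inf_lhs = nearest_distance(z, C)
inf_rhs = theta * nearest_distance(x, C) + (1.0 - theta) * nearest_distance(y, C)
sup_lhs = furthest_distance(z, C)
sup_rhs = theta * furthest_distance(x, C) + (1.0 - theta) * furthest_distance(y, C)
print(inf_lhs, inf_rhs, sup_lhs, sup_rhs)
```

The nearest point distance at the midpoint exceeds the average of the endpoint values, so the convexity inequality fails, while the furthest point distance satisfies it for the same points.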

Consequences of convexity for differentiable functions

• Think about unconstrained functions $$\textrm{dom} F = \mathbb{R}^n$$.
• By the first order condition, $$F$$ is convex iff the domain is convex and
\label{eqn:convexOptimizationLecture7:620}
F(\By) \ge F(\Bx) + \lr{ \spacegrad F(\Bx)}^\T (\By - \Bx) \, \forall \Bx, \By \in \textrm{dom} F.

If $$F$$ is convex and one can find an $$\Bx^\conj \in \textrm{dom} F$$ such that

\label{eqn:convexOptimizationLecture7:640}
\spacegrad F(\Bx^\conj) = 0,

then

\label{eqn:convexOptimizationLecture7:660}
F(\By) \ge F(\Bx^\conj) \, \forall \By \in \textrm{dom} F.

If you can find the point where the gradient is zero (which can’t always be found), then $$\Bx^\conj$$ is a global minimum of $$F$$.

Conversely, if $$\Bx^\conj$$ is a global minimizer of $$F$$, then $$\spacegrad F(\Bx^\conj) = 0$$ must hold. If that were not the case, then you would be able to find a direction to move downhill, contradicting the optimality of $$\Bx^\conj$$.

Local vs Global optimum

fig. 6. Global and local minimums

Definition: Local optimum
$$\Bx^\conj$$ is a local optimum of $$F$$ if $$\exists \epsilon > 0$$ such that $$\forall \Bx$$, $$\Norm{\Bx – \Bx^\conj} < \epsilon$$, we have

\begin{equation*}
F(\Bx^\conj) \le F(\Bx)
\end{equation*}

fig. 5. min length function

Theorem:
Suppose $$F$$ is twice continuously differentiable (not necessarily convex)

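• If $$\Bx^\conj$$ is a local optimum then
\begin{equation*}
\begin{aligned}
\spacegrad F(\Bx^\conj) &= 0 \\
\spacegrad^2 F(\Bx^\conj) &\ge 0.
\end{aligned}
\end{equation*}
• If
\begin{equation*}
\begin{aligned}
\spacegrad F(\Bx^\conj) &= 0 \\
\spacegrad^2 F(\Bx^\conj) &> 0,
\end{aligned}
\end{equation*}
then $$\Bx^\conj$$ is a local optimum.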

Proof:

• Let $$\Bx^\conj$$ be a local optimum. Pick any $$\Bv \in \mathbb{R}^n$$.
\label{eqn:convexOptimizationLecture7:720}
\lim_{t \rightarrow 0} \frac{ F(\Bx^\conj + t \Bv) - F(\Bx^\conj)}{t}
= \lr{ \spacegrad F(\Bx^\conj) }^\T \Bv
\ge 0.

Here the fraction is $$\ge 0$$ since $$\Bx^\conj$$ is a local optimum.

Since the choice of $$\Bv$$ is arbitrary, the only way to ensure that this is $$\ge 0, \forall \Bv$$ is

\label{eqn:convexOptimizationLecture7:740}
\spacegrad F(\Bx^\conj) = 0,

(or else we could pick $$\Bv = -\spacegrad F(\Bx^\conj)$$, which makes the inner product negative).

This means that $$\spacegrad F(\Bx^\conj) = 0$$ if $$\Bx^\conj$$ is a local optimum.

Consider the 2nd order derivative

\label{eqn:convexOptimizationLecture7:760}
\begin{aligned}
\lim_{t \rightarrow 0} \frac{ F(\Bx^\conj + t \Bv) – F(\Bx^\conj)}{t^2}
&=
\lim_{t \rightarrow 0} \inv{t^2}
\lr{
F(\Bx^\conj) + t \lr{ \spacegrad F(\Bx^\conj) }^\T \Bv + \inv{2} t^2 \Bv^\T \spacegrad^2 F(\Bx^\conj) \Bv + O(t^3)
– F(\Bx^\conj)
} \\
&=
\inv{2} \Bv^\T \spacegrad^2 F(\Bx^\conj) \Bv \\
&\ge 0.
\end{aligned}

Here the $$\ge$$ condition also comes from the fraction, based on the optimality of $$\Bx^\conj$$. This is true for all choices of $$\Bv$$, thus $$\spacegrad^2 F(\Bx^\conj) \ge 0$$.

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

ECE1505H Convex Optimization. Lecture 6: First and second order conditions. Taught by Prof. Stark Draper

Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

Today

• First and second order conditions for convexity of differentiable functions.
• Consequences of convexity: local and global optimality.
• Properties.

Quasi-convex

$$F_1$$ and $$F_2$$ convex implies $$\max( F_1, F_2)$$ convex.

fig. 1. Min and Max

Note that $$\min(F_1, F_2)$$ is not necessarily convex.

If $$F : \mathbb{R}^n \rightarrow \mathbb{R}$$ is convex, then $$F( \Bx_0 + t \Bv )$$ is convex in $$t\,\forall t \in \mathbb{R}, \Bx_0 \in \mathbb{R}^n, \Bv \in \mathbb{R}^n$$, provided $$\Bx_0 + t \Bv \in \textrm{dom} F$$.

Idea: Restrict to a line (line segment) in $$\textrm{dom} F$$. Take a cross section or slice through $$F$$ along the line. If the result is a 1D convex function for all slices, then $$F$$ is convex.

This is nice since it allows for checking for convexity, and is also nice numerically. Attempting to test a given data set for non-convexity with some random lines can help disprove convexity. However, to show that $$F$$ is convex it is required to test all possible slices (which isn’t possible numerically, but is in some circumstances possible analytically).
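
The random line test described above can be sketched as follows. This is a minimal illustration, not a prescribed algorithm: the helper name `find_convexity_violation`, the sampling box $$[-1,1]^n$$, the tolerance, and the two test functions are all assumptions made for the example. A returned triple disproves convexity, while `None` only means that no violation was found among the sampled segments.

```python
import random


def find_convexity_violation(F, dim, trials=200, seed=0, tol=1e-9):
    """Sample random segments and return (x, y, theta) violating convexity, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        y = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        theta = rng.uniform(0.0, 1.0)
        z = [theta * a + (1.0 - theta) * b for a, b in zip(x, y)]
        # Convexity requires F(z) <= theta F(x) + (1 - theta) F(y).
        if F(z) > theta * F(x) + (1.0 - theta) * F(y) + tol:
            return x, y, theta
    return None


def saddle(p):
    """Indefinite quadratic x^2 + y^2 + 3 x y (not convex)."""
    return p[0] ** 2 + p[1] ** 2 + 3.0 * p[0] * p[1]


def bowl(p):
    """Convex quadratic x^2 + y^2."""
    return p[0] ** 2 + p[1] ** 2


print(find_convexity_violation(saddle, 2))
print(find_convexity_violation(bowl, 2))
```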

Differentiable (convex) functions

Definition: First order condition.

If

\begin{equation*}
F : \mathbb{R}^n \rightarrow \mathbb{R}
\end{equation*}

is differentiable, then $$F$$ is convex iff $$\textrm{dom} F$$ is a convex set and $$\forall \Bx, \Bx_0 \in \textrm{dom} F$$

\begin{equation*}
F(\Bx) \ge F(\Bx_0) + \lr{\spacegrad F(\Bx_0)}^\T (\Bx – \Bx_0).
\end{equation*}

This is the first order Taylor expansion. If $$n = 1$$, this is $$F(x) \ge F(x_0) + F'(x_0) ( x – x_0)$$.

The first order condition says a convex function always lies above its first order approximation, as sketched in fig. 2.

fig. 2. First order approximation lies below convex function

When differentiable, the supporting plane is the tangent plane.

Definition: Second order condition

If $$F : \mathbb{R}^n \rightarrow \mathbb{R}$$ is twice differentiable, then $$F$$ is convex iff $$\textrm{dom} F$$ is a convex set and $$\spacegrad^2 F(\Bx) \ge 0 \,\forall \Bx \in \textrm{dom} F$$.

The Hessian is always symmetric, but is not necessarily positive semi-definite. Recall that the Hessian is the matrix of the second order partials $$(\spacegrad^2 F)_{ij} = \partial^2 F/(\partial x_i \partial x_j)$$.

The scalar case is $$F''(x) \ge 0 \, \forall x \in \textrm{dom} F$$.

An implication is that if $$F$$ is convex, then $$F(x) \ge F(x_0) + F'(x_0) (x – x_0) \,\forall x, x_0 \in \textrm{dom} F$$

Since $$F$$ is convex, $$\textrm{dom} F$$ is convex.

Consider any 2 points $$x, y \in \textrm{dom} F$$, and $$\theta \in [0,1]$$. Define

\label{eqn:convexOptimizationLecture6:60}
z = (1-\theta) x + \theta y \in \textrm{dom} F,

then since $$F$$ is convex

\label{eqn:convexOptimizationLecture6:80}
F(z) =
F( (1-\theta) x + \theta y )
\le
(1-\theta) F(x) + \theta F(y )

Reordering

\label{eqn:convexOptimizationLecture6:220}
\theta F(y) \ge
\theta F(x) + F(z) – F(x),

or
\label{eqn:convexOptimizationLecture6:100}
F(y) \ge
F(x) + \frac{F(x + \theta(y-x)) – F(x)}{\theta},

which is, in the limit,

\label{eqn:convexOptimizationLecture6:120}
F(y) \ge
F(x) + F'(x) (y – x),

completing one direction of the proof.

To prove the other direction, we show that

\label{eqn:convexOptimizationLecture6:140}
F(x) \ge F(x_0) + F'(x_0) (x – x_0),

implies that $$F$$ is convex. Take any $$x, y \in \textrm{dom} F$$ and any $$\theta \in [0,1]$$. Define

\label{eqn:convexOptimizationLecture6:160}
z = \theta x + (1 -\theta) y,

which is in $$\textrm{dom} F$$ by assumption. We want to show that

\label{eqn:convexOptimizationLecture6:180}
F(z) \le \theta F(x) + (1-\theta) F(y).

By assumption

1. $$F(x) \ge F(z) + F'(z) (x – z)$$
2. $$F(y) \ge F(z) + F'(z) (y – z)$$

Compute

\label{eqn:convexOptimizationLecture6:200}
\begin{aligned}
\theta F(x) + (1-\theta) F(y)
&\ge
\theta \lr{ F(z) + F'(z) (x – z) }
+ (1-\theta) \lr{ F(z) + F'(z) (y – z) } \\
&=
F(z) + F'(z) \lr{ \theta( x – z) + (1-\theta) (y-z) } \\
&=
F(z) + F'(z) \lr{ \theta x + (1-\theta) y – \theta z – (1 -\theta) z } \\
&=
F(z) + F'(z) \lr{ \theta x + (1-\theta) y – z} \\
&=
F(z) + F'(z) \lr{ z – z} \\
&= F(z).
\end{aligned}

Proof of the 2nd order case for $$n = 1$$

Want to prove that if

\label{eqn:convexOptimizationLecture6:240}
F : \mathbb{R} \rightarrow \mathbb{R}

is a convex function, then $$F''(x) \ge 0 \,\forall x \in \textrm{dom} F$$.

By the first order conditions $$\forall x \ne y \in \textrm{dom} F$$

\label{eqn:convexOptimizationLecture6:260}
\begin{aligned}
F(y) &\ge F(x) + F'(x) (y - x) \\
F(x) &\ge F(y) + F'(y) (x - y)
\end{aligned}

Can combine and get

\label{eqn:convexOptimizationLecture6:280}
F'(x) (y-x) \le F(y) – F(x) \le F'(y)(y-x)

Subtract the two derivative terms for

\label{eqn:convexOptimizationLecture6:340}
\frac{(F'(y) – F'(x))(y – x)}{(y – x)^2} \ge 0,

or
\label{eqn:convexOptimizationLecture6:300}
\frac{F'(y) – F'(x)}{y – x} \ge 0.

In the limit as $$y \rightarrow x$$, this is
\label{eqn:convexOptimizationLecture6:320}
\boxed{
F''(x) \ge 0 \,\forall x \in \textrm{dom} F.
}

Now prove the reverse direction: that $$F''(x) \ge 0 \,\forall x \in \textrm{dom} F \subseteq \mathbb{R}$$ implies that $$F : \mathbb{R} \rightarrow \mathbb{R}$$ is convex.

Note that if $$F''(x) \ge 0$$, then $$F'(x)$$ is non-decreasing in $$x$$.

i.e. If $$x < y$$, where $$x, y \in \textrm{dom} F$$, then

\label{eqn:convexOptimizationLecture6:360}
F'(x) \le F'(y).

Consider any $$x,y \in \textrm{dom} F$$ such that $$x < y$$, where

\label{eqn:convexOptimizationLecture6:380}
F(y) – F(x) = \int_x^y F'(t) dt \ge F'(x) \int_x^y 1 dt = F'(x) (y-x).

This tells us that

\label{eqn:convexOptimizationLecture6:400}
F(y) \ge F(x) + F'(x)(y – x),

which is the first order condition. Similarly consider any $$x,y \in \textrm{dom} F$$ such that $$x < y$$, where

\label{eqn:convexOptimizationLecture6:420}
F(y) – F(x) = \int_x^y F'(t) dt \le F'(y) \int_x^y 1 dt = F'(y) (y-x).

This tells us that

\label{eqn:convexOptimizationLecture6:440}
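F(x) \ge F(y) + F'(y)(x - y),

so the first order condition holds for either ordering of the two points, which implies that $$F$$ is convex.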

Vector proof:

$$F$$ is convex iff $$F(\Bx + t \Bv)$$ is convex $$\forall \Bx,\Bv \in \mathbb{R}^n, t \in \mathbb{R}$$, keeping $$\Bx + t \Bv \in \textrm{dom} F$$.

Let
\label{eqn:convexOptimizationLecture6:460}
h(t ; \Bx, \Bv) = F(\Bx + t \Bv)

then $$h(t)$$ satisfies scalar first and second order conditions for all $$\Bx, \Bv$$.

\label{eqn:convexOptimizationLecture6:480}
h(t) = F(\Bx + t \Bv) = F(g(t)),

where $$g(t) = \Bx + t \Bv$$, where

\label{eqn:convexOptimizationLecture6:500}
\begin{aligned}
F &: \mathbb{R}^n \rightarrow \mathbb{R} \\
g &: \mathbb{R} \rightarrow \mathbb{R}^n.
\end{aligned}

This is expressing $$h(t)$$ as a composition of two functions. By the first order condition for scalar functions we know that

\label{eqn:convexOptimizationLecture6:520}
h(t) \ge h(0) + h'(0) t.

Note that

\label{eqn:convexOptimizationLecture6:540}
h(0) = \evalbar{F(\Bx + t \Bv)}{t = 0} = F(\Bx).

Let’s figure out what $$h'(0)$$ is. Recall that for any $$\tilde{F} : \mathbb{R}^n \rightarrow \mathbb{R}^m$$

\label{eqn:convexOptimizationLecture6:560}
D \tilde{F} \in \mathbb{R}^{m \times n},

and
\label{eqn:convexOptimizationLecture6:580}
{D \tilde{F}(\Bx)}_{ij} = \PD{x_j}{\tilde{F_i}(\Bx)}

This is one function per row, for $$i \in [1,m], j \in [1,n]$$. This gives

\label{eqn:convexOptimizationLecture6:600}
\begin{aligned}
\frac{d}{dt} F(\Bx + \Bv t)
&=
\frac{d}{dt} F( g(t) ) \\
&=
\frac{d}{dt} h(t) \\
&= D h(t) \\
&= D F(g(t)) \cdot D g(t)
\end{aligned}

The first matrix is in $$\mathbb{R}^{1\times n}$$ whereas the second is in $$\mathbb{R}^{n\times 1}$$, since $$F : \mathbb{R}^n \rightarrow \mathbb{R}$$ and $$g : \mathbb{R} \rightarrow \mathbb{R}^n$$. This gives

\label{eqn:convexOptimizationLecture6:620}
\frac{d}{dt} F(\Bx + \Bv t)
= \evalbar{D F(\tilde{\Bx})}{\tilde{\Bx} = g(t)} \cdot D g(t).

That first matrix is

\label{eqn:convexOptimizationLecture6:640}
\begin{aligned}
\evalbar{D F(\tilde{\Bx})}{\tilde{\Bx} = g(t)}
&=
\evalbar{
\lr{\begin{bmatrix}
\PD{\tilde{x}_1}{ F(\tilde{\Bx})} &
\PD{\tilde{x}_2}{ F(\tilde{\Bx})} & \cdots &
\PD{\tilde{x}_n}{ F(\tilde{\Bx})}
\end{bmatrix}
}}{ \tilde{\Bx} = g(t) = \Bx + t \Bv } \\
&=
\evalbar{
\lr{ \spacegrad F(\tilde{\Bx}) }^\T
}{
\tilde{\Bx} = g(t)
} \\
&=
\lr{ \spacegrad F(g(t)) }^\T.
\end{aligned}

The second Jacobian is

\label{eqn:convexOptimizationLecture6:660}
D g(t)
=
D
\begin{bmatrix}
g_1(t) \\
g_2(t) \\
\vdots \\
g_n(t) \\
\end{bmatrix}
=
D
\begin{bmatrix}
x_1 + t v_1 \\
x_2 + t v_2 \\
\vdots \\
x_n + t v_n \\
\end{bmatrix}
=
\begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n \\
\end{bmatrix}
=
\Bv.

so

\label{eqn:convexOptimizationLecture6:680}
h'(t) = D h(t) = \lr{ \spacegrad F(g(t))}^\T \Bv,

and
\label{eqn:convexOptimizationLecture6:700}
h'(0) = \lr{ \spacegrad F(g(0))}^\T \Bv
=
\lr{ \spacegrad F(\Bx) }^\T \Bv.

Finally

\label{eqn:convexOptimizationLecture6:720}
\begin{aligned}
F(\Bx + t \Bv)
&\ge h(0) + h'(0) t \\
&= F(\Bx) + \lr{ \spacegrad F(\Bx) }^\T (t \Bv) \\
&= F(\Bx) + \innerprod{ \spacegrad F(\Bx) }{ t \Bv}.
\end{aligned}

This is true for all $$\Bx, \Bx + t \Bv \in \textrm{dom} F$$. Note that the quantity $$t \Bv$$ is a shift.
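
The chain rule result $$h'(0) = \lr{ \spacegrad F(\Bx) }^\T \Bv$$ and the resulting first order bound can be checked numerically. This is a minimal sketch, assuming a hypothetical convex quadratic $$F(\Bx) = x_1^2 + x_1 x_2 + 2 x_2^2$$ and arbitrary sample values for $$\Bx$$ and $$\Bv$$.

```python
def F(p):
    """A convex quadratic test function (its Hessian [[2, 1], [1, 4]] is PD)."""
    x1, x2 = p
    return x1 * x1 + x1 * x2 + 2.0 * x2 * x2


def grad_F(p):
    """Analytic gradient of F."""
    x1, x2 = p
    return (2.0 * x1 + x2, x1 + 4.0 * x2)


def h(t, x, v):
    """Restriction of F to the line x + t v."""
    return F([xi + t * vi for xi, vi in zip(x, v)])


x, v = (1.0, -2.0), (0.5, 1.0)

# h'(0) from the chain rule, and from a central difference in t.
slope_chain_rule = sum(g * vi for g, vi in zip(grad_F(x), v))
step = 1e-6
slope_numeric = (h(step, x, v) - h(-step, x, v)) / (2.0 * step)

# First order condition along the line: h(t) >= h(0) + h'(0) t.
bound_holds = all(
    h(t, x, v) >= h(0.0, x, v) + slope_chain_rule * t
    for t in (-2.0, -0.5, 0.5, 2.0)
)
print(slope_chain_rule, slope_numeric, bound_holds)
```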

Epigraph

Recall that if $$(\Bx, t) \in \textrm{epi} F$$ then $$t \ge F(\Bx)$$.

\label{eqn:convexOptimizationLecture6:740}
t \ge F(\Bx) \ge F(\Bx_0) + \lr{\spacegrad F(\Bx_0) }^\T (\Bx – \Bx_0),

or

\label{eqn:convexOptimizationLecture6:760}
0 \ge
-(t – F(\Bx_0)) + \lr{\spacegrad F(\Bx_0) }^\T (\Bx – \Bx_0),

In block matrix form

\label{eqn:convexOptimizationLecture6:780}
0 \ge
\begin{bmatrix}
\lr{ \spacegrad F(\Bx_0) }^\T & -1
\end{bmatrix}
\begin{bmatrix}
\Bx – \Bx_0 \\
t – F(\Bx_0)
\end{bmatrix}

With $$\Bw = \begin{bmatrix} \lr{ \spacegrad F(\Bx_0) }^\T & -1 \end{bmatrix}$$, the geometry of the epigraph relation to the half plane is sketched in fig. 3.

fig. 3. Half planes and epigraph.

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.