math and physics play

Transverse electric and magnetic field relations.

August 10, 2025 math and physics play , , , , , , , , , , , , , , ,

[Click here for a PDF version of this post]

I found a sign error in my book. Here’s I’ll re-derive all the results for myself here in a standalone fashion, also verifying signs as I go.

Setup

Suppose that a field is propagating in a medium along the z-axis. We may represent that field as the real part of
\begin{equation}\label{eqn:transverseField:20}
F = F(x,y) e^{j(\omega t – k z)}.
\end{equation}
This is a doubly complex relationship, as we have a scalar complex imaginary \( j \), as well as the spatial imaginary \(I = \Be_1 \Be_2 \Be_3 \) that is part of the multivector field itself
\begin{equation}\label{eqn:transverseField:40}
F = \BE + I \eta \BH.
\end{equation}

Let’s call
\begin{equation}\label{eqn:transverseField:60}
F_z = \lr{ \BE \cdot \Be_3} \Be_3 + I \eta \lr{ \BH \cdot \Be_3 } \Be_3,
\end{equation}
the propagation component of the field and \( F_t = F – F_z \) the transverse component of the field. We can write these in a more symmetric fashion by expanding the dot products and regrouping
\begin{equation}\label{eqn:transverseField:80}
\begin{aligned}
F_z
&= \lr{ \BE \cdot \Be_3} \Be_3 + I \eta \lr{ \BH \cdot \Be_3 } \Be_3 \\
&= \inv{2} \lr{ \BE \Be_3 + \Be_3 \BE } \Be_3 + \frac{I \eta}{2} \lr{ \BH \Be_3 + \Be_3 \BH} \Be_3 \\
&= \inv{2} \lr{ \BE + \Be_3 \BE \Be_3 } + \frac{I \eta}{2} \lr{ \BH + \Be_3 \BH \Be_3} \Be_3 \\
&= \inv{2} \lr{ F + \Be_3 F \Be_3 }.
\end{aligned}
\end{equation}
By subtraction, we also have
\begin{equation}\label{eqn:transverseField:100}
F_t = \inv{2} \lr{ F – \Be_3 F \Be_3 }.
\end{equation}

Relating the transverse and propagation direction fields

The multivector form of Maxwell’s equation, for source free conditions, is
\begin{equation}\label{eqn:transverseField:120}
0 = \lr{ \spacegrad + \inv{c} \partial_t } F.
\end{equation}
We split the gradient into a propagation direction component and a transverse component \( \spacegrad_t \)
\begin{equation}\label{eqn:transverseField:140}
\spacegrad = \spacegrad_t + \Be_3 \partial_z,
\end{equation}
so
\begin{equation}\label{eqn:transverseField:160}
\begin{aligned}
0
&= \lr{ \spacegrad_t + \Be_3 \partial_z + \inv{c} \partial_t } F \\
&= \lr{ \spacegrad_t + \Be_3 \partial_z + \inv{c} \partial_t } F(x,y) e^{j(\omega t – k z) } \\
&= \lr{ \spacegrad_t – j\Be_3 k + j\frac{\omega}{c} } F(x,y) e^{j(\omega t – k z) },
\end{aligned}
\end{equation}
or
\begin{equation}\label{eqn:transverseField:180}
-j \lr{ \frac{\omega}{c} – k \Be_3 } F = \spacegrad_t F.
\end{equation}

Observe that
\begin{equation}\label{eqn:transverseField:200}
-j \lr{ \frac{\omega}{c} – k \Be_3 } \Be_3 F \Be_3 = -\spacegrad_t \Be_3 F \Be_3,
\end{equation}
which means that
\begin{equation}\label{eqn:transverseField:220}
-j \lr{ \frac{\omega}{c} – k \Be_3 } \inv{2} \lr{ F \pm \Be_3 F \Be_3 } = \spacegrad_t \inv{2} \lr{ F \mp \Be_3 F \Be_3 },
\end{equation}
or
\begin{equation}\label{eqn:transverseField:240}
\begin{aligned}
-j \lr{ \frac{\omega}{c} – k \Be_3 } F_z &= \spacegrad_t F_t \\
-j \lr{ \frac{\omega}{c} – k \Be_3 } F_t &= \spacegrad_t F_z.
\end{aligned}
\end{equation}

Provided \( \omega^2 \ne k^2 c^2 \), this can be inverted, meaning that \( F_t \) fully specifies \( F_z \) if known, as well as the opposite.

That inversion provides the propagation direction field in terms of the transverse
\begin{equation}\label{eqn:transverseField:260a}
F_z = j \frac{ \frac{\omega}{c} + k \Be_3 }{ \omega^2 \mu \epsilon – k^2 } \spacegrad_t F_t,
\end{equation}
and the transverse field in terms of the propagation direction field
\begin{equation}\label{eqn:transverseField:260b}
F_t = j \frac{ \frac{\omega}{c} + k \Be_3 }{ \omega^2 \mu \epsilon – k^2 } \spacegrad_t F_z.
\end{equation}

Transverse field in terms of propagation

Let’s expand \ref{eqn:transverseField:260b} in terms of component electric and magnetic fields. First note that
\begin{equation}\label{eqn:transverseField:280}
\begin{aligned}
\spacegrad_t F_z
&= \spacegrad_t \Be_3 \lr{ E_z + I \eta H_z } \\
&= -\Be_3 \spacegrad_t \lr{ E_z + I \eta H_z }.
\end{aligned}
\end{equation}
so
\begin{equation}\label{eqn:transverseField:300}
F_t = -j \frac{ \frac{\omega}{c} \Be_3 + k }{ \omega^2 \mu \epsilon – k^2 } \spacegrad_t \lr{ E_z + I \eta H_z }.
\end{equation}
This may now be split into electric and magnetic fields, but first note that the multivector operator
\begin{equation}\label{eqn:transverseField:320}
\begin{aligned}
\Be_3 \spacegrad_t
&=
\Be_3 \cdot \spacegrad_t + \Be_3 \wedge \spacegrad_t \\
&=
\Be_3 \wedge \spacegrad_t,
\end{aligned}
\end{equation}
has only a bivector component.

For the transverse electric field component, we have
\begin{equation}\label{eqn:transverseField:340}
\begin{aligned}
\gpgradeone{ \lr{ \frac{\omega}{c} \Be_3 + k } \spacegrad_t \lr{ E_z + I \eta H_z } }
&=
k \spacegrad_t E_z + \frac{\omega}{c} \Be_3 \wedge \spacegrad_t \lr{ I \eta H_z } \\
&=
k \spacegrad_t E_z – \frac{\eta \omega}{c} \Be_3 \cross \spacegrad_t H_z.
\end{aligned}
\end{equation}
and for the magnetic field component
\begin{equation}\label{eqn:transverseField:360}
\begin{aligned}
\gpgradetwo{ \lr{ \frac{\omega}{c} \Be_3 + k } \spacegrad_t \lr{ E_z + I \eta H_z } }
=
\frac{\omega}{c} \Be_3 \wedge \spacegrad_t E_z + I \eta k \spacegrad_t H_z
\end{aligned}
\end{equation}

This means that
\begin{equation}\label{eqn:transverseField:380}
\begin{aligned}
\BE_t &= \frac{j}{\omega^2 \mu \epsilon – k^2 } \lr{ -k \spacegrad_t E_z + \frac{\eta \omega}{c} \Be_3 \cross \spacegrad_t H_z } \\
\eta I \BH_t &= -\frac{j}{\omega^2 \mu \epsilon – k^2 } \lr{ \frac{\omega}{c} \Be_3 \wedge \spacegrad_t E_z + I \eta k \spacegrad_t H_z }
\end{aligned}
\end{equation}

Cancelling out the \( \eta I \) factors in the magnetic field component, and substituting \( \eta/c = \mu, 1/(c\eta) = \epsilon \), leaves us with
\begin{equation}\label{eqn:transverseField:400}
\begin{aligned}
\BE_t &= \frac{j}{\omega^2 \mu \epsilon – k^2 } \lr{ -k \spacegrad_t E_z + \mu \omega \Be_3 \cross \spacegrad_t H_z } \\
\BH_t &= -\frac{j}{\omega^2 \mu \epsilon – k^2 } \lr{ \epsilon \omega \Be_3 \cross \spacegrad_t E_z + k \spacegrad_t H_z }.
\end{aligned}
\end{equation}

Propagation field in terms of transverse.

Now let’s invert \ref{eqn:transverseField:260a}. We seek the grade selections
\begin{equation}\label{eqn:transverseField:420}
\gpgrade{ \lr{ \frac{\omega}{c} + k \Be_3 } \spacegrad_t F_t }{1,2}
\end{equation}

Performing each of these four grade selections in turn, for the \( \spacegrad_t F_t \) products we have
\begin{equation}\label{eqn:transverseField:440}
\begin{aligned}
\gpgradeone{ \spacegrad_t F_t }
&=
\gpgradeone{ \spacegrad_t \lr{ \BE_t + I \eta \BH_t } } \\
&=
\eta \gpgradeone{ I \spacegrad_t \BH_t } \\
&=
\eta I \lr{ \spacegrad_t \wedge \BH_t } \\
&=
-\eta \lr{ \spacegrad_t \cross \BH_t }.
\end{aligned}
\end{equation}
Because \( \spacegrad_t \BE_t \) has only 0,2 grades, so the grade-one selection was zero, leaving us with only \( \BH_t \) dependence.

For the grade two selection of the same, we have
\begin{equation}\label{eqn:transverseField:460}
\begin{aligned}
\gpgradetwo{ \spacegrad_t F_t }
&=
\gpgradetwo{ \spacegrad_t \lr{ \BE_t + I \eta \BH_t } } \\
&=
\spacegrad_t \wedge \BE_t \\
&=
I \lr{ \spacegrad_t \cross \BE_t }.
\end{aligned}
\end{equation}
This time we note that the vector-bivector product \( \spacegrad_t (I \BH_t) \) has only 1,3 grades, and is killed by the grade-2 selection.

For the \( \Be_3 \spacegrad_t F_t \) products, we have
\begin{equation}\label{eqn:transverseField:480}
\begin{aligned}
\gpgradeone{ \Be_3 \spacegrad_t F_t }
&=
\gpgradeone{ \Be_3 \spacegrad_t \lr{ \BE_t + I \eta \BH_t } } \\
&=
\gpgradeone{ \lr{ \Be_3 \cdot \spacegrad_t + \Be_3 \wedge \spacegrad_t } \BE_t }
+
\eta \gpgradeone{ I \Be_3 \lr{ \spacegrad_t \cdot \BH_t + \spacegrad_t \wedge \BH_t } } \\
&=
\gpgradeone{ I \lr{ \Be_3 \cross \spacegrad_t } \BE_t } \\
&=
-\lr{ \Be_3 \cross \spacegrad_t } \cross \BE_t.
\end{aligned}
\end{equation}
Observe that we’ve made use of \( \Be_3 \cdot \spacegrad_t = 0 \), regardless of what it operates on. For the \( \BH_t \) dependence, we had a bivector-scalar product \( (I \Be_3) (\spacegrad_t \cdot \BH_t) \), and a bivector-bivector product \( (I \Be_3) (\spacegrad_t \wedge \BH_t) \), neither of which have any vector grades.

Finally
\begin{equation}\label{eqn:transverseField:500}
\begin{aligned}
\gpgradetwo{ \Be_3 \spacegrad_t F_t }
&=
\eta \gpgradetwo{ I \Be_3 \spacegrad_t \BH_t } \\
&=
-\eta \gpgradetwo{ \lr{\Be_3 \cross \spacegrad_t} \BH_t } \\
&=
-\eta I \lr{\Be_3 \cross \spacegrad_t} \cross \BH_t.
\end{aligned}
\end{equation}
Here we’ve discarded the \( \BE_t \) dependent terms, since the bivector-vector product \( \lr{ \Be_3 \wedge \spacegrad_t } \BE_t \) has only grades 1,3, and we seek grade 2 only.

Putting all the pieces together, noting that \( \eta/c = \mu \) and \( 1/(c \eta) = \epsilon \), we have
we have
\begin{equation}\label{eqn:transverseField:520}
\BE_z = -\frac{j}{\omega^2 \mu \epsilon – k^2 } \lr{ \omega \mu \lr{ \spacegrad_t \cross \BH_t } + k \lr{ \Be_3 \cross \spacegrad_t } \cross \BE_t },
\end{equation}
and
\begin{equation}\label{eqn:transverseField:540}
\BH_z = \frac{j}{\omega^2 \mu \epsilon – k^2 } \lr{ \omega \epsilon \lr{ \spacegrad_t \cross \BE_t } – k \lr{\Be_3 \cross \spacegrad_t} \cross \BH_t }.
\end{equation}

A generalized Gaussian integral.

July 26, 2025 math and physics play , , , , , , ,

[Click here for a PDF version of this post]

Here’s another problem from [1]. The point is to show that
\begin{equation}\label{eqn:generalizedGaussian:20}
G(x,x’,\tau) = \inv{2\pi} \int_{-\infty}^\infty e^{i k\lr{ x – x’ } } e^{-k^2 \tau} dk,
\end{equation}
has the value
\begin{equation}\label{eqn:generalizedGaussian:40}
G(x,x’,\tau) = \inv{\sqrt{4 \pi \tau} } e^{-\lr{ x – x’}^2/4 \tau },
\end{equation}
not just for real \(\tau\), but also for purely imaginary \(\tau\).

Real case.

The authors claim the real case is easy, but I don’t think the real case is that trivial. The trivial part is basically just completing the square. Writing \( x – x ‘ = \Delta x \), that is
\begin{equation}\label{eqn:generalizedGaussian:60}
\begin{aligned}
-k^2 \tau + i k\lr{ x – x’ }
&=
-\tau \lr{ k^2 – i k \Delta x/\tau } \\
&=
-\tau \lr{ \lr{ k – i \Delta x/2\tau }^2 – \lr{ – i \Delta x/2\tau }^2 } \\
&=
-\tau \lr{ \lr{ k – i \Delta x/2\tau }^2 + \lr{ \Delta x/2\tau }^2 }.
\end{aligned}
\end{equation}
So we have
\begin{equation}\label{eqn:generalizedGaussian:80}
G(x,x’,\tau) = \inv{2\pi} e^{-(\Delta x/2)^2/\tau} \int_{-\infty}^\infty e^{-\tau \lr{ k – i \Delta x/2\tau }^2 } dk.
\end{equation}
Let’s call the integral factor part of this \( I \)
\begin{equation}\label{eqn:generalizedGaussian:100}
I = \int_{-\infty}^\infty e^{-\tau \lr{ k – i \Delta x/2\tau }^2 } dk.
\end{equation}
However, making a change of variables makes this an integral over a complex path
\begin{equation}\label{eqn:generalizedGaussian:120}
I = \int_{-\infty – i \Delta x/2\tau }^{\infty – i \Delta x/2\tau} e^{-\tau k^2 } dk.
\end{equation}
If you are lazy you could say that \( \pm \infty \) adjusted by a constant, even if that constant is imaginary, leaves the integration limits unchanged. That’s clearly true if the constant is real, but I don’t think it’s that obvious if the constant is imaginary.

To answer that question more exactly, let’s consider the integral
\begin{equation}\label{eqn:generalizedGaussian:140}
0 = I + J + K + L = \oint e^{-\tau z^2} dz,
\end{equation}
where the path is illustrated in fig. 1.

fig. 1. Contour for the Gaussian.

Since there are no enclosed poles, we have
\begin{equation}\label{eqn:generalizedGaussian:160}
I = \int_{-\infty}^\infty e^{-\tau k^2 } dk + K + L,
\end{equation}
where
\begin{equation}\label{eqn:generalizedGaussian:180}
\begin{aligned}
K &= \int_{- i \Delta x/2\tau}^0 e^{-\tau z^2} dz \\
L &= \int_0^{- i \Delta x/2\tau} e^{-\tau z^2} dz.
\end{aligned}
\end{equation}
We see now that we see perfect cancellation of \( K \) and \( L \), which justifies the change of variables, and the corresponding integration limits.

We can use the usual trick to evaluate \( I^2 \), to find
\begin{equation}\label{eqn:generalizedGaussian:200}
\begin{aligned}
I^2
&=
\int_{-\infty}^\infty e^{-\tau k^2 } dk
\int_{-\infty}^\infty e^{-\tau m^2 } dm \\
&=
2 \pi \int_0^\infty r e^{-\tau r^2} r dr \\
&=
2 \pi \evalrange{ – \frac{e^{-\tau r^2}}{-2 \tau} }{0}{\infty} \\
&=
\frac{\pi}{\tau}.
\end{aligned}
\end{equation}

So, for real values of \( \tau \) we have
\begin{equation}\label{eqn:generalizedGaussian:220}
G(x,x’,\tau) = \inv{2\pi} \sqrt{\frac{\pi}{\tau}}e^{-(\Delta x/2)^2/\tau},
\end{equation}
as expected.

Imaginary case.

For the imaginary case, let \( \tau = i \alpha \). Let’s recomplete the square from scratch with this substitution
\begin{equation}\label{eqn:generalizedGaussian:240}
\begin{aligned}
-k^2 i \alpha + i k \Delta x
&=
– i \alpha \lr{ k^2 – \frac{i k \Delta x}{i \alpha} } \\
&=
– i \alpha \lr{ k^2 – \frac{k \Delta x}{\alpha} } \\
&=
– i \alpha \lr{ \lr{ k – \frac{\Delta x}{2 \alpha} }^2 – \lr{ \frac{\Delta x}{2 \alpha} }^2 }.
\end{aligned}
\end{equation}
So we have
\begin{equation}\label{eqn:generalizedGaussian:260}
\begin{aligned}
G(x,x’,\tau)
&= \inv{2\pi} e^{i \alpha(\Delta x/2\alpha)^2} \int_{-\infty}^\infty e^{- i \alpha \lr{ k – \frac{\Delta x}{2 \alpha } }^2 } dk \\
&= \inv{\sqrt{\pi^2 \alpha}} e^{- (\Delta x)^2/4\tau} \int_0^\infty e^{- i m^2 } dm.
\end{aligned}
\end{equation}
This time we can make a \( m = \sqrt{\alpha} \lr{k – \frac{\Delta x}{2 \alpha }} \) change of variables, but don’t have to worry about imaginary displacements of the integration limits.

The task is now reduced to the evaluation of an imaginary Gaussian like integral, and we are given the hint integrate \( e^{-z^2} \) over a pie shaped contour fig. 2.

fig. 2. Pie shaped integration contour.

Over \( I \) we set \( z = x, x \in [0, R] \), over \( J \), we set \( z = R e^{i\theta}, \theta \in [0, \pi/4] \), and on \( K \) we set \( z = u e^{i\pi/4}, u \in [R, 0] \). Since there are no enclosed poles we have
\begin{equation}\label{eqn:generalizedGaussian:280}
\begin{aligned}
0 &= I + J + K \\
&= \int_0^R e^{- x^2} dx + \int_0^{\pi/4} e^{-R^2 e^{2 i \theta} } i R e^{i\theta} d\theta – \int_0^R e^{-i u^2 } e^{i \pi/4} du.
\end{aligned}
\end{equation}
In the limit we have
\begin{equation}\label{eqn:generalizedGaussian:300}
\begin{aligned}
\int_0^\infty e^{-i u^2 } du
&= e^{-i\pi/4} \int_0^\infty e^{- x^2} dx + L \\
&= e^{-i\pi/4} \sqrt{\pi}/2 + L,
\end{aligned}
\end{equation}
where
\begin{equation}\label{eqn:generalizedGaussian:320}
L = \lim_{R\rightarrow \infty} e^{-i\pi/4} \int_0^{\pi/4} e^{-R^2 e^{2 i \theta} } i R e^{i\theta} d\theta.
\end{equation}
We hope that this is zero in the limit, but showing this requires a bit of care around the \( \pi/4 \) endpoint. We start with
\begin{equation}\label{eqn:generalizedGaussian:340}
\begin{aligned}
\Abs{L}
&\le \lim_{R\rightarrow \infty} \int_0^{\pi/4} R e^{-R^2 \cos\lr{2 \theta} } d\theta \\
&= \lim_{R\rightarrow \infty} \inv{2} \int_{\pi/4 – \epsilon/2}^{\pi/4} R e^{-R^2 \cos\lr{2 \theta} } (2 d\theta),
\end{aligned}
\end{equation}
Now we make a change of variables
\begin{equation}\label{eqn:generalizedGaussian:360}
t = \frac{\pi}{2} – 2 \theta.
\end{equation}
At the limits we have
\begin{equation}\label{eqn:generalizedGaussian:380}
\begin{aligned}
t(\pi/4 – \epsilon/2) &= \epsilon \\
t(\pi/4) &= 0.
\end{aligned}
\end{equation}
Also,
\begin{equation}\label{eqn:generalizedGaussian:400}
\begin{aligned}
\cos\lr{ 2 \theta }
&= \textrm{Re} \lr{ e^{2 i \theta} } \\
&= \textrm{Re} \lr{ e^{i \lr{\pi/2 – t} } } \\
&= \textrm{Re} \lr{ i e^{-i t } } \\
&= \sin t,
\end{aligned}
\end{equation}
so
\begin{equation}\label{eqn:generalizedGaussian:420}
\Abs{L} \le \lim_{R\rightarrow \infty} \inv{2} \int_0^\epsilon R e^{-R^2 \sin t} dt.
\end{equation}
Since we have forced \( t \) small, we can use the small angle approximation for the sine
\begin{equation}\label{eqn:generalizedGaussian:440}
\begin{aligned}
\int_0^\epsilon R e^{-R^2 \sin t} dt
&\approx
\int_0^\epsilon R e^{-R^2 t} dt \\
&= \evalrange{ \inv{-2 R} e^{-R^2 t} }{0}{\epsilon} \\
&= \frac{ 1 – e^{-R^2 \epsilon }}{2 R}.
\end{aligned}
\end{equation}
The numerator goes to zero for either \( \epsilon \rightarrow 0 \), or \( R \rightarrow \infty \), and the denominator to infinity, so we have the desired zero in the limit. This means that
\begin{equation}\label{eqn:generalizedGaussian:460}
\begin{aligned}
G(x,x’,\tau)
&= \frac{e^{-i\pi/4}}{\sqrt{4 \pi \alpha}} e^{- (\Delta x)^2/4\tau} \\
&= \frac{1}{\sqrt{4 \pi i \alpha}} e^{- (\Delta x)^2/4\tau} \\
&= \frac{1}{\sqrt{4 \pi \tau}} e^{- (\Delta x)^2/4\tau},
\end{aligned}
\end{equation}
as expected.

A fun application, also noted in the problem, is that we can decompose the imaginary integral
\begin{equation}\label{eqn:generalizedGaussian:480}
\int_{-\infty}^\infty e^{-i u^2} du = \sqrt{\pi} e^{-i\pi/4},
\end{equation}
into real and imaginary parts, to find
\begin{equation}\label{eqn:generalizedGaussian:500}
\int_{-\infty}^\infty \cos u^2 du = \int_{-\infty}^\infty \sin u^2 du = \sqrt{\frac{\pi}{2}}.
\end{equation}
Despite being real valued integrals, it it not at all obvious how one would go about finding those without these contour integration tricks.

References

[1] F.W. Byron and R.W. Fuller. Mathematics of Classical and Quantum Physics. Dover Publications, 1992.

A first pytorch program to build a simple pattern recognizer.

July 12, 2025 math and physics play , , , , ,

What is it called if you use AI to build an AI?  That’s what I’ve done to understand a bit of how PyTorch works, building a little program to train a simple network to attempt to recognize Fibonacci like sequences.

Fibonacci recap.

Recall that the Fibonacci series is defined by a recurrence relationship where the next term is the sum of the previous two terms
\begin{equation}\label{eqn:pytorch:10}
F_k = F_{k-1} + F_{k-2}.
\end{equation}
Two specific values are required to start things off, and the usual two such starting values are \( 0, 1 \), or \( 1, 1 \).

I liked the idea of using something deterministic for training data, and asked Claude to build me a simple NN that used Fibonacci like sequences
\begin{equation}\label{eqn:pytorch:20}
F_k = \alpha F_{k-1} + \beta F_{k-2},
\end{equation}
as training data, using \( a = F_0, b = F_1 \). It turns out that choices of \( \alpha, \beta \) greater than one make the neural network training blow up, as the values increase quickly, getting out of control. It was possible to work around that in the NN training, but renormalizing the input data using a log transformation, and then re-exponentiating it afterwards. However, I decided that such series were less interesting than those closer to the Fibonacci series itself, and disabled that log renormalization by default (a command line option –logtx is available to force that, required for both training and inferrence, if used.)

Neural Network Architecture

There are a couple building blocks that the network uses.

  • \(\mathrm{ReLU} \) (Rectified Linear Unit) is an activation function in PyTorch
    \begin{equation*}
    \textrm{ReLU}(x) = \max(0, x).
    \end{equation*}
    If input is positive, then the output is the input, but if the input is negative or zero, the output is zero.
  • \( \mathrm{Dropout} \).
    Dropout is a regularization technique that randomly sets some neurons to zero during training to prevent overfitting.During training, for each neuron:
    \begin{equation*}
    y_i =
    \begin{cases}
    0 & \text{with probability } p \\
    \frac{x_i}{1-p} & \text{with probability } 1-p
    \end{cases}
    \end{equation*}
    where \( p \) is the dropout probability, \( x_i \) is the input to the neuron, and \( y_i \) is the output.With the 10\% dropout probability, this means that some inputs are zeroed randomly, with whatever is left increased slightly (approximately 1.1x).

The model is a feedforward neural network with the following structure:

Input Layer

  • Input: (\mathbf{x} = [f_{k-2}, f_{k-1}] \in \mathbb{R}^2\)
  • Output: \(f_k \in \mathbb{R}\)

Hidden Layers

The network has 3 hidden layers with ReLU activations:

\begin{equation*}
\mathbf{h}_1 = \textrm{ReLU}(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1)
\end{equation*}
\begin{equation*}
\mathbf{h}_1′ = \textrm{Dropout}(\mathbf{h}_1, p=0.1)
\end{equation*}

\begin{equation*}
\mathbf{h}_2 = \textrm{ReLU}(\mathbf{W}_2 \mathbf{h}_1′ + \mathbf{b}_2)
\end{equation*}
\begin{equation*}
\mathbf{h}_2′ = \textrm{Dropout}(\mathbf{h}_2, p=0.1)
\end{equation*}

\begin{equation*}
\mathbf{h}_3 = \textrm{ReLU}(\mathbf{W}_3 \mathbf{h}_2′ + \mathbf{b}_3)
\end{equation*}

Output Layer

\begin{equation*}
\hat{f}_k = \mathbf{W}_4 \mathbf{h}_3 + \mathbf{b}_4
\end{equation*}

Where:

  • \(\mathbf{W}_1 \in \mathbb{R}^{32 \times 2}, \mathbf{b}_1 \in \mathbb{R}^{32}\)
  • \(\mathbf{W}_2, \mathbf{W}_3 \in \mathbb{R}^{32 \times 32}, \mathbf{b}_2, \mathbf{b}_3 \in \mathbb{R}^{32}\)
  • \(\mathbf{W}_4 \in \mathbb{R}^{1 \times 32}, \mathbf{b}_4 \in \mathbb{R}^{1}\)

Essentially, we have some FMA like operations, but using matrices, not floats, with some functional filters between layers.  Eons ago, I recall using sigmoid functions as the filters, which were non-linear.  It looks like modern networks use operations that are more amenable to parallel computation.

The setup for the network layers is pretty simple

class SequencePredictor(nn.Module):
    def __init__(self, input_size=2, hidden_size=32, output_size=1):
        super(SequencePredictor, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.fc4 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.relu(self.fc3(x))
        x = self.fc4(x)
        return x

Training.

Training only takes a couple lines of code:

    # Convert to PyTorch tensors
    X_tensor = torch.FloatTensor(X_norm)
    y_tensor = torch.FloatTensor(y_norm).unsqueeze(1)
    
    # Create model
    model = SequencePredictor()
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Training loop
    print("Training model...")
    losses = []

    for epoch in range(epochs):
        # Forward pass
        predictions = model(X_tensor)
        loss = criterion(predictions, y_tensor)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        losses.append(loss.item())

        if (epoch + 1) % 200 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.6f}')

A lot of this is a blackbox, but unlike more complicated and sophisticated models, it looks pretty feasible to learn what each of these steps is doing. It looks like the optimizer is probably doing a step of gradient descent in the current neighbourhood.

Results

The classic Fibonacci series wasn’t in the training data, and the model only achieves order of magnitude predictions for it:

Testing on α=1, β=1, a=1, b=1...

Step True     Predicted  Absolute Error %     Relative Error %    
----------------------------------------------------------------
0    1.0      1.0        0.0                  0.0                 
1    1.0      1.0        0.0                  0.0                 
2    2.0      2.7        0.7                  35.9                
3    3.0      4.2        1.2                  39.3                
4    5.0      6.4        1.4                  27.8                
5    8.0      9.5        1.5                  19.0                
6    13.0     13.6       0.6                  4.9                 
7    21.0     18.8       2.2                  10.6                
8    34.0     25.0       9.0                  26.6                
9    55.0     33.4       21.6                 39.3                

Total absolute error: 38.30111360549927
Total relative error: 203.36962890625

Of the training data, one particular sequence matches fairly closely:

Testing on α=0.911811, β=0.857173, a=1.45565, b=1.65682...

Step True     Predicted  Absolute Error %     Relative Error %
----------------------------------------------------------------
0    1.5      1.5        0.0                  0.0
1    1.7      1.7        0.0                  0.0
2    2.8      3.7        0.9                  32.8
3    3.9      5.4        1.5                  37.9
4    6.0      8.2        2.3                  38.0
5    8.8      12.2       3.4                  38.5
6    13.1     17.5       4.3                  33.0
7    19.5     23.7       4.2                  21.3
8    29.0     31.7       2.7                  9.2
9    43.2     43.2       0.1                  0.1

Total absolute error: 19.270923376083374
Total relative error: 210.85977474600077

The worse matching from the training data has increasing relative error as the sequence progresses:

Testing on α=0.942149, β=1.03861, a=1.36753, b=1.94342...

Step True     Predicted  Absolute Error %     Relative Error %
----------------------------------------------------------------
0    1.4      1.4        0.0                  0.0
1    1.9      1.9        0.0                  0.0
2    3.3      3.7        0.5                  15.1
3    5.1      5.6        0.6                  11.0
4    8.2      8.5        0.3                  4.2 
5    13.0     12.5       0.4                  3.3 
6    20.7     17.8       2.9                  14.1
7    33.0     24.0       9.0                  27.3
8    52.6     32.1       20.4                 38.9
9    83.8     43.7       40.0                 47.8

Total absolute error: 74.20500636100769
Total relative error: 161.64879322052002

To get a better idea of how the model does against the training data, here are a couple plots, one with good matching initially, and then divergence:

and one with good average on the whole, but no great specific matching anywhere:

and plots against all the training data:

 

It would be more interesting to train this against something that isn’t completely monotonic, perhaps some set of characteristic functions, sine, sinc, exp, …  Since the model ends up extrapolating against the first couple elements, having training data that starts with a wider variation would be useful.

Weighted geometric series

March 23, 2025 math and physics play , , ,

[Click here for a PDF version of this post]

Karl needed to evaluate the sum:

\begin{equation}\label{eqn:weightedGeometric:20}
S = \sum_{k = 0}^9 \frac{a + b k}{\lr{ 1 + i }^k}
\end{equation}

He ended up using a spreadsheet, which was a quick and effective way to deal with the problem. I was curious about this sum, since he asked me how to sum it symbolically, and I didn’t know.

Mathematica doesn’t have any problem with it, as seen in fig. 1.

fig. 1. A funky sum.

How can we figure this out?

Let’s write \( r = 1/(1+i) \) to start with, and break up the sum into constituent parts
\begin{equation}\label{eqn:weightedGeometric:40}
\begin{aligned}
S_n
&= \sum_{k = 0}^n \frac{a + b k}{\lr{ 1 + i }^k} \\
&= a \sum_{k = 0}^n r^k + b \sum_{k = 0}^n k r^k.
\end{aligned}
\end{equation}
We can evaluate the geometric part of this easily using the usual trick. Let
\begin{equation}\label{eqn:weightedGeometric:60}
T_n = \sum_{k = 0}^n r^k,
\end{equation}
then
\begin{equation}\label{eqn:weightedGeometric:80}
r T_n – T_n = r^{n+1} – 1,
\end{equation}
so
\begin{equation}\label{eqn:weightedGeometric:100}
T_n = \frac{r^{n+1} – 1}{r – 1}.
\end{equation}
Now we just have to figure out how to sum
\begin{equation}\label{eqn:weightedGeometric:120}
G_n = \sum_{k = 0}^n k r^k = \sum_{k = 1}^n k r^k.
\end{equation}
This looks suspiciously like the derivative of a geometric series. Let’s evaluate such a derivative, as a function of r:
\begin{equation}\label{eqn:weightedGeometric:140}
\begin{aligned}
\frac{d T_n(r)}{dr}
&= \sum_{k = 0}^n \frac{d}{dr} r^k \\
&= \sum_{k = 0}^n k r^{k-1} \\
&= \sum_{k = 1}^n k r^{k-1} \\
&= \inv{r} \sum_{k = 1}^n k r^k.
\end{aligned}
\end{equation}
Having summed the geometric series, we may also take the derivative of that summed result
\begin{equation}\label{eqn:weightedGeometric:160}
\begin{aligned}
\frac{d T_n(r)}{dr}
&= \frac{d}{dr} \lr{ \frac{r^{n+1} – 1}{r – 1} } \\
&= \frac{(n+1)r^n}{r – 1} – \frac{r^{n+1} – 1}{\lr{r -1}^2} \\
&= \frac{(n+1)r^n(r-1) – \lr{r^{n+1} – 1}}{\lr{r -1}^2}.
\end{aligned}
\end{equation}
Putting the pieces together, we have
\begin{equation}\label{eqn:weightedGeometric:180}
T_n = \frac{r}{\lr{r -1}^2} \lr{ (n+1)r^n(r-1) – \lr{r^{n+1} – 1} }.
\end{equation}

This means that our sum is
\begin{equation}\label{eqn:weightedGeometric:200}
S_n = a \frac{r^{n+1} – 1}{r – 1} + b \frac{r}{\lr{r -1}^2} \lr{ (n+1)r^n(r-1) – \lr{r^{n+1} – 1} }.
\end{equation}
Putting back \( r = 1/(1+i) \), and subsequent simplification, gives the Mathematica result. It’s not pretty, but at least we can do it if we want to.

Shout out to Grok that pointed out the derivative trick for the second series.  I’d forgotten that one.

Impedance refresher.

March 21, 2025 math and physics play , , , , , , , , , , , ,

[Click here for a PDF version of this post]

Karl is taking his circuits course right now, which means that I get a chance to field some questions. I don’t get an excuse to think about this stuff any more. It’s fun material, since most of the ideas are all really simple, and you can figure out everything from first principles.

Karl just started sinusoidal circuits, which I think is a bit exciting. They are such a nice special case, as complex calculations are all effectively reduced to \( V = I R \) style computations.

Solving RLC circuits for general time dependent sources.

To contrast the simple case of sinusoidal sources, let’s consider what we have to do in order to solve a general case RLC circuit. The simple basic RC circuit sketched in fig. 1 provides a good illustrative example, even though it does not include any inductance.

With \( v \) as the voltage at the capacitor, the equations that describe the circuit are
\begin{equation}\label{eqn:impedance:20}
\begin{aligned}
v_s – v &= i R \\
i &= C \frac{dv}{dt}.
\end{aligned}
\end{equation}

We can combine these into one equation for \( v \). Letting \( \tau = RC \), that is
\begin{equation}\label{eqn:impedance:40}
v + \tau \frac{dv}{dt} = v_s.
\end{equation}
Here \( v_s = v_s(t) \) can be an arbitrary function of time. This is a simple enough differential equation, and can probably be solved in various ways (integrating factors, Fourier transforms, Laplace transforms, …)

For illustration purposes, let’s tackle this little equation with Fourier transforms, a method logically equivalent to the computation of the Green’s function for the system.

Let’s use a symmetric representation of the Fourier transform
\begin{equation}\label{eqn:impedance:60}
\begin{aligned}
F(\omega) &= \inv{\sqrt{2 \pi}} \int_{-\infty}^\infty e^{-j\omega t} f(t) dt \\
f(t) &= \inv{\sqrt{2 \pi}} \int_{-\infty}^\infty e^{j\omega t} F(\omega) d\omega \\
\end{aligned}
\end{equation}
Recall that the Fourier transform of the derivative is just a \( j \omega \) scaled frequency domain function, which we show with integration by parts
\begin{equation}\label{eqn:impedance:80}
\begin{aligned}
\inv{\sqrt{2 \pi}} \int_{-\infty}^\infty e^{-j\omega t} \frac{df}{dt} dt
&= \inv{\sqrt{2 \pi}} \int_{-\infty}^\infty \lr{ \frac{d}{dt} \lr{ f(t) e^{-j \omega t} } – f(t) \frac{d}{dt}\lr{ e^{-j\omega t}} } dt \\
&= j \omega F(\omega).
\end{aligned}
\end{equation}
That means that the frequency domain equivalent of our system is
\begin{equation}\label{eqn:impedance:100}
V + j \omega \tau V = V_s,
\end{equation}
or
\begin{equation}\label{eqn:impedance:120}
V(\omega) = \frac{V_s(\omega)}{1 + j \omega \tau}.
\end{equation}
Inverse transformation yields
\begin{equation}\label{eqn:impedance:140}
\begin{aligned}
v(t)
&= \inv{\sqrt{2 \pi}} \int_{-\infty}^\infty e^{j\omega t} \frac{V_s(\omega)}{1 + j \omega \tau} d\omega \\
&= \inv{2 \pi} \iint_{-\infty}^\infty e^{j\omega t} \frac{1}{1 + j \omega \tau} d\omega e^{-j\omega t’} v_s(t’) dt’ \\
&= \int_{-\infty}^\infty dt’ v_s(t’) \inv{2\pi} \int_{-\infty}^\infty \frac{e^{j\omega(t-t’)}}{1 + j \omega \tau} d\omega,
\end{aligned}
\end{equation}
or with
\begin{equation}\label{eqn:impedance:160}
G(u) = \inv{2\pi} \int_{-\infty}^\infty \frac{e^{j\omega u}}{1 + j \omega \tau} d\omega,
\end{equation}
\begin{equation}\label{eqn:impedance:180}
v(t) = \int_{-\infty}^\infty v_s(t’) G(t – t’) dt’.
\end{equation}
We just need to evaluate the Green’s function \( G(u) \) to proceed, which we can do with standard contour integration, first writing:

\begin{equation}\label{eqn:impedance:200}
\begin{aligned}
G(u)
&= \inv{2\pi j \tau} \int_{-\infty}^\infty \frac{e^{j\omega u}}{\inv{j\tau} + \omega} d\omega \\
&= \inv{2\pi j \tau} \oint \frac{e^{j z u}}{\inv{j\tau} + z} dz.
\end{aligned}
\end{equation}
This has a single pole at \( z = j/\tau \). We need an infinite semicircular contour in the lower half plane for \( u < 0 \), and can use the upper half plane infinite semicircular contour (surrounding the pole) for \( u > 0 \). That gives
\begin{equation}\label{eqn:impedance:220}
\begin{aligned}
G(u)
&= \Theta(u) \frac{2 \pi j}{2\pi j \tau} \evalbar{e^{j z u}}{z = j/\tau} \\
&= \frac{\Theta(u)}{\tau} e^{- u/\tau}.
\end{aligned}
\end{equation}

The solution to the problem, for any Fourier integrable source \( v_s(t) \), is
\begin{equation}\label{eqn:impedance:240}
\boxed{
v(t) = \int_{-\infty}^t v_s(t’) \frac{e^{- \lr{t – t’}/\tau}}{\tau} dt’.
}
\end{equation}

As a check, let’s evaluate this convolution integral for a step source \( v_s(t) = V \Theta(t) \), to find
\begin{equation}\label{eqn:impedance:260}
\begin{aligned}
v(t)
&= \frac{V e^{-t/\tau}}{\tau} \int_0^t e^{t’/\tau} dt’ \\
&= V e^{-t/\tau} \evalrange{e^{t’/\tau} }{0}{t} \\
&= V e^{-t/\tau} \lr{ e^{t/\tau} – 1 } \\
&= V \lr{ 1 – e^{-t/\tau} }.
\end{aligned}
\end{equation}
This is the damped time domain response that we remember for an RC circuit. In Karl’s first year engineering notes, this was presented as a given (without the step factor), and he had to verify that it worked by differentiation (for \( t > 0 \).)

Solving this exactly, even for arbitrary sources, as we’ve done above, is not strictly hard, if you have all the required tools. But the first year engineering student doesn’t have all those tools to start with. This is where the beauty of the phasor techniques for sinusoidal sources comes in. Let’s now see how that works.

Phasor approach.

Let’s consider the three simplest RLC circuits, each with just a single element, and a variable voltage source. I’ll depict that element with a box as in fig. 2.

Resistor case.

If the element is a resistor with value \( R \), our equations are simple
\begin{equation}\label{eqn:impedance:280}
v_s = i R.
\end{equation}
Clearly \( i \) is directly proportional to the source voltage. In particular, if \( v_s(t) \) has a sinusoidal character, such as
\begin{equation}\label{eqn:impedance:300}
v_s(t) = V \cos(\omega t),
\end{equation}
then
\begin{equation}\label{eqn:impedance:320}
i(t) = \frac{V}{R} \cos\lr{ \omega t }.
\end{equation}
In particular, if we let \( i(t) = I \cos\lr{ \omega t } \), then we have
\begin{equation}\label{eqn:impedance:340}
I = \frac{V}{R},
\end{equation}
or \( V = I R \).

Capacitor case.

If the load element is a capacitor with capacitance \( C \), then the equation for the system is
\begin{equation}\label{eqn:impedance:360}
i = C \frac{dv_s(t)}{dt}.
\end{equation}
If we just plug in \( v_s(t) = V \cos(\omega t) \), as before, we get a bit of a mess
\begin{equation}\label{eqn:impedance:380}
i = -C \omega V \sin\lr{ \omega t }.
\end{equation}
We no longer have a nice simple proportionality relationship between the current and the voltage source, as the capacitor has introduced a phase shift into the mix.
We can figure out that phase factor by solving the equation
\begin{equation}\label{eqn:impedance:400}
-\sin x = \cos\lr{ x + \phi }.
\end{equation}
The easiest way to solve this is to express the sinusoids in complex exponential form
\begin{equation}\label{eqn:impedance:420}
\textrm{Re} \lr{ e^{j + \phi} } = \Real \lr{ j e^{j x} } = \Real \lr{ e^{j \pi/2} e^{j x} }.
\end{equation}
We see that the phase factor is a \( \pi/2 \) shift. However, even better, we have a strong hint that working with complex exponentials may be a better approach to formulating the problem.

Let’s write
\begin{equation}\label{eqn:impedance:440}
v_s(t) = V \cos\lr{ \omega t } = \textrm{Re} \lr{ V e^{j \omega t} }.
\end{equation}
Then we have
\begin{equation}\label{eqn:impedance:460}
i(t) = C V \textrm{Re} \lr{ \frac{d}{dt} e^{j \omega t} }.
\end{equation}
If we also assume that we can write
\begin{equation}\label{eqn:impedance:480}
i(t) = \textrm{Re} \lr{ I e^{j \omega t} },
\end{equation}
then if the real parts are equal, we must also have
\begin{equation}\label{eqn:impedance:500}
I e^{j \omega t} = j \omega C V e^{j \omega t},
\end{equation}
or
\begin{equation}\label{eqn:impedance:520}
I = j \omega C V.
\end{equation}
We have a \( V = I R \) relationship, which we write as
\begin{equation}\label{eqn:impedance:540}
V = I Z,
\end{equation}
where
\begin{equation}\label{eqn:impedance:560}
Z = \inv{j \omega C}.
\end{equation}
This is the phasor description of the circuit.

Inductive case.

If the circuit has an inductive load, then the system equation is
\begin{equation}\label{eqn:impedance:580}
v_s(t) = L \frac{di}{dt}.
\end{equation}
Again, we can write \( v_s(t) = \textrm{Re} \lr{ V e^{j\omega t} } \), and assume that \( i = \Real \lr{ I e^{j\omega t} } \). We then require
\begin{equation}\label{eqn:impedance:600}
V e^{j \omega t} = L \frac{d}{dt} \lr{ I e^{j \omega t} },
\end{equation}
or
\begin{equation}\label{eqn:impedance:620}
V = j \omega L I.
\end{equation}
We write
\begin{equation}\label{eqn:impedance:640}
Z = j \omega L,
\end{equation}
so once again \( V = I Z \).

Solving a more complex RLC configuration.

An example of a more complicated RLC circuit is sketched in fig. 3.

Here we have two impedances in parallel
\begin{equation}\label{eqn:impedance:660}
\begin{aligned}
Z_C &= \inv{j \omega C} \\
Z_L &= j \omega L.
\end{aligned}
\end{equation}
The parallel impedance through that reactive load is
\begin{equation}\label{eqn:impedance:680}
\begin{aligned}
Z
&= \lr{ \inv{Z_C} + \inv{Z_L} }^{-1} \\
&= \lr{ j \omega C + \inv{ j \omega L } }^{-1}.
\end{aligned}
\end{equation}
We can compute the current through \( R \) now
\begin{equation}\label{eqn:impedance:700}
I = \frac{V_s}{ R + \lr{ j \omega C + \inv{ j \omega L } }^{-1} }.
\end{equation}
We also have
\begin{equation}\label{eqn:impedance:720}
\frac{V_s – V}{R} = I,
\end{equation}
or
\begin{equation}\label{eqn:impedance:740}
\begin{aligned}
V
&= V_s – I R \\
&= V_s \lr{ 1 – \frac{R}{ R + \lr{ j \omega C + \inv{ j \omega L } }^{-1} } } \\
&= V_s \frac{\lr{ j \omega C + \inv{ j \omega L } }^{-1} }{ R + \lr{ j \omega C + \inv{ j \omega L } }^{-1} } \\
&= \frac{V_s}{ R \lr{ j \omega C + \inv{ j \omega L } } + 1 }.
\end{aligned}
\end{equation}
The complicated time response for this system is reduced to a trivial voltage divider calculation. We see that it was kind of pointless to run an inductor and capacitor in parallel, as they are both purely reactive (imaginary). That’s a detail that I didn’t remember, since it’s been decades since I did any practical circuits applications. However, the point is, by using a complex exponential source representation, these types of systems are reduced from systems of coupled differential equations to simple linear systems. Imagine how messy it would be to try to solve this system using the Green’s function methods that we used above!