Motivation
The conventional form for the Pauli matrices is
\begin{equation}\label{eqn:pauliMatrixXYgeometry:20}
\begin{aligned}
\sigma_x &=
\begin{bmatrix}
0 & 1 \\
1 & 0 \\
\end{bmatrix} \\
\sigma_y &=
\begin{bmatrix}
0 & -i \\
i & 0 \\
\end{bmatrix} \\
\sigma_z &=
\begin{bmatrix}
1 & 0 \\
0 & -1 \\
\end{bmatrix}
\end{aligned}.
\end{equation}
In [1] these forms are derived based on the commutation relations
\begin{equation}\label{eqn:pauliMatrixXYgeometry:40}
\antisymmetric{\sigma_r}{\sigma_s} = 2 i \epsilon_{r s t} \sigma_t,
\end{equation}
by defining raising and lowering operators \( \sigma_{\pm} = \sigma_x \pm i \sigma_y \) and figuring out what form the matrix must take. I noticed an interesting geometrical relation hiding in that derivation if \( \sigma_{+} \) is not assumed to be real.
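As a quick sanity check of \ref{eqn:pauliMatrixXYgeometry:40}, here is a minimal Python sketch that verifies the conventional matrices satisfy the commutation relations. It uses plain nested lists of complex numbers (no numerical library assumed), and the helper names (`matmul`, `commutator`, `levi_civita`, `satisfies_pauli_algebra`) are my own, chosen for illustration.

```python
from itertools import product

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    """Entrywise comparison with a floating point tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

def levi_civita(r, s, t):
    """Permutation symbol for indices in {0, 1, 2}: +1, -1, or 0."""
    return (r - s) * (s - t) * (t - r) // 2

def satisfies_pauli_algebra(sig):
    """True if [sig[r], sig[s]] == 2i sum_t eps_{rst} sig[t] for every r, s."""
    for r, s in product(range(3), repeat=2):
        rhs = [[sum(2j * levi_civita(r, s, t) * sig[t][i][j] for t in range(3))
                for j in range(2)] for i in range(2)]
        if not close(commutator(sig[r], sig[s]), rhs):
            return False
    return True

# The conventional matrices, indexed 0, 1, 2 for x, y, z.
pauli = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
print(satisfies_pauli_algebra(pauli))  # True
```

Flipping the sign of \( \sigma_y \) breaks the check, since \( \antisymmetric{\sigma_x}{-\sigma_y} = -2 i \sigma_z \).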
Derivation
For completeness, I’ll repeat the argument of [1], which builds on the commutation relations of the raising and lowering operators. Those are
\begin{equation}\label{eqn:pauliMatrixXYgeometry:60}
\begin{aligned}
\antisymmetric{\sigma_z}{\sigma_{\pm}}
&=
\sigma_z \lr{ \sigma_x \pm i \sigma_y }
-\lr{ \sigma_x \pm i \sigma_y } \sigma_z \\
&=
\antisymmetric{\sigma_z}{\sigma_x} \pm i \antisymmetric{\sigma_z}{\sigma_y} \\
&=
2 i \sigma_y \pm i (-2 i) \sigma_x \\
&= \pm 2 \lr{ \sigma_x \pm i \sigma_y } \\
&= \pm 2 \sigma_{\pm},
\end{aligned}
\end{equation}
and
\begin{equation}\label{eqn:pauliMatrixXYgeometry:80}
\begin{aligned}
\antisymmetric{\sigma_{+}}{\sigma_{-}}
&=
\lr{ \sigma_x + i \sigma_y } \lr{ \sigma_x - i \sigma_y }
-\lr{ \sigma_x - i \sigma_y } \lr{ \sigma_x + i \sigma_y } \\
&=
-i \sigma_x \sigma_y + i \sigma_y \sigma_x
- i \sigma_x \sigma_y + i \sigma_y \sigma_x \\
&= 2 i \antisymmetric{ \sigma_y }{\sigma_x} \\
&= 2 i (-2i) \sigma_z \\
&= 4 \sigma_z
\end{aligned}
\end{equation}
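Both ladder operator relations can be spot-checked with the conventional matrices. This sketch (hypothetical helper names, plain Python complex arithmetic) builds \( \sigma_{\pm} \) from \( \sigma_x, \sigma_y \) and tests \ref{eqn:pauliMatrixXYgeometry:60} and \ref{eqn:pauliMatrixXYgeometry:80} entrywise, within a floating point tolerance.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lincomb(a, A, b, B):
    """The 2x2 matrix a*A + b*B."""
    return [[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]

def scale(s, A):
    """The 2x2 matrix s*A."""
    return [[s * A[i][j] for j in range(2)] for i in range(2)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    return lincomb(1, matmul(A, B), -1, matmul(B, A))

def close(A, B, tol=1e-12):
    """Entrywise comparison with a floating point tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]
sigma_z = [[1, 0], [0, -1]]

sigma_plus = lincomb(1, sigma_x, 1j, sigma_y)    # sigma_x + i sigma_y
sigma_minus = lincomb(1, sigma_x, -1j, sigma_y)  # sigma_x - i sigma_y

checks = {
    "[sz, s+] = +2 s+": close(commutator(sigma_z, sigma_plus), scale(2, sigma_plus)),
    "[sz, s-] = -2 s-": close(commutator(sigma_z, sigma_minus), scale(-2, sigma_minus)),
    "[s+, s-] = 4 sz": close(commutator(sigma_plus, sigma_minus), scale(4, sigma_z)),
}
print(checks)  # every value is True
```

With the conventional matrices, \( \sigma_{+} \) comes out as \( \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix} \).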
With \( \sigma_z \) taken to be the diagonal matrix of \ref{eqn:pauliMatrixXYgeometry:20}, the raising operator can be written in terms of unknown complex entries. Let
\begin{equation}\label{eqn:pauliMatrixXYgeometry:100}
\sigma_{+} =
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}.
\end{equation}
The commutator with \( \sigma_z \) can be computed
\begin{equation}\label{eqn:pauliMatrixXYgeometry:120}
\begin{aligned}
\antisymmetric{\sigma_z}{\sigma_{+}}
&=
\begin{bmatrix}
1 & 0 \\
0 & -1 \\
\end{bmatrix}
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
-
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
0 & -1 \\
\end{bmatrix}
\\
&=
\begin{bmatrix}
a & b \\
-c & -d
\end{bmatrix}
-
\begin{bmatrix}
a & -b \\
c & -d
\end{bmatrix} \\
&=
2
\begin{bmatrix}
0 & b \\
-c & 0
\end{bmatrix}
\end{aligned}
\end{equation}
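The matrix arithmetic above is easy to spot-check: for random complex entries \( a, b, c, d \), the commutator should equal \( 2 \begin{bmatrix} 0 & b \\ -c & 0 \end{bmatrix} \), independent of the diagonal entries. A minimal sketch with a fixed random seed so it is repeatable; `closed_form` and `random_complex` are hypothetical helper names.

```python
import random

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    """Entrywise comparison with a floating point tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

sigma_z = [[1, 0], [0, -1]]

def closed_form(a, b, c, d):
    """Result of the derivation: 2 [[0, b], [-c, 0]] (a and d drop out)."""
    return [[0, 2 * b], [-2 * c, 0]]

def random_complex(rng):
    """A complex number with real and imaginary parts uniform in [-1, 1]."""
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

rng = random.Random(2024)
agree = True
for _ in range(100):
    a, b, c, d = (random_complex(rng) for _ in range(4))
    M = [[a, b], [c, d]]
    agree = agree and close(commutator(sigma_z, M), closed_form(a, b, c, d))
print(agree)  # True
```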
Now compare this with \ref{eqn:pauliMatrixXYgeometry:60}
\begin{equation}\label{eqn:pauliMatrixXYgeometry:140}
2
\begin{bmatrix}
0 & b \\
-c & 0
\end{bmatrix}
=
2 \sigma_{+}
=
2
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}.
\end{equation}
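This shows that \( a = 0 \) and \( d = 0 \). Since \( \sigma_x \) and \( \sigma_y \) are Hermitian, the lowering operator is the Hermitian conjugate of the raising operator, \( \sigma_{-} = \sigma_{+}^\dagger \), and the \( \sigma_z \) commutator with the lowering operator is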
\begin{equation}\label{eqn:pauliMatrixXYgeometry:160}
\begin{aligned}
\antisymmetric{\sigma_z}{\sigma_{-}}
&=
\begin{bmatrix}
1 & 0 \\
0 & -1 \\
\end{bmatrix}
\begin{bmatrix}
0 & c^\conj \\
b^\conj & 0
\end{bmatrix}
-
\begin{bmatrix}
0 & c^\conj \\
b^\conj & 0
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
0 & -1 \\
\end{bmatrix}
\\
&=
\begin{bmatrix}
0 & c^\conj \\
-b^\conj & 0
\end{bmatrix}
-
\begin{bmatrix}
0 & -c^\conj \\
b^\conj & 0
\end{bmatrix} \\
&=
-2
\begin{bmatrix}
0 & -c^\conj \\
b^\conj & 0
\end{bmatrix}
\end{aligned}
\end{equation}
Again comparing to \ref{eqn:pauliMatrixXYgeometry:60}, we have
\begin{equation}\label{eqn:pauliMatrixXYgeometry:180}
-2
\begin{bmatrix}
0 & -c^\conj \\
b^\conj & 0
\end{bmatrix}
= -2 \sigma_{-}
= -2
\begin{bmatrix}
0 & c^\conj \\
b^\conj & 0
\end{bmatrix},
\end{equation}
so \( c = 0 \). The commutator of the raising and lowering operators then fixes the magnitude of \( b \)
\begin{equation}\label{eqn:pauliMatrixXYgeometry:200}
\begin{aligned}
\antisymmetric{\sigma_{+}}{\sigma_{-}}
&=
\begin{bmatrix}
0 & b \\
0 & 0 \\
\end{bmatrix}
\begin{bmatrix}
0 & 0 \\
b^\conj & 0 \\
\end{bmatrix}
-
\begin{bmatrix}
0 & 0 \\
b^\conj & 0 \\
\end{bmatrix}
\begin{bmatrix}
0 & b \\
0 & 0 \\
\end{bmatrix} \\
&=
\begin{bmatrix}
\Abs{b}^2 & 0 \\
0 & 0
\end{bmatrix}
-
\begin{bmatrix}
0 & 0 \\
0 & \Abs{b}^2 \\
\end{bmatrix} \\
&=
\Abs{b}^2
\begin{bmatrix}
1 & 0 \\
0 & -1 \\
\end{bmatrix}
\\
&=
\Abs{b}^2 \sigma_z.
\end{aligned}
\end{equation}
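The result \( \antisymmetric{\sigma_{+}}{\sigma_{-}} = \Abs{b}^2 \sigma_z \) holds for any complex \( b \), and matches \ref{eqn:pauliMatrixXYgeometry:80} only when \( \Abs{b} = 2 \). This sketch takes \( \sigma_{-} \) to be the Hermitian conjugate of \( \sigma_{+} \) (which follows from \( \sigma_x, \sigma_y \) being Hermitian); `dagger` and `ladder_commutator` are hypothetical helper names.

```python
import cmath

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    """Entrywise comparison with a floating point tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

sigma_z = [[1, 0], [0, -1]]

def ladder_commutator(b):
    """[sigma_+, sigma_-] for sigma_+ = [[0, b], [0, 0]] and sigma_- = sigma_+^dagger."""
    s_plus = [[0, b], [0, 0]]
    return commutator(s_plus, dagger(s_plus))

# [sigma_+, sigma_-] = |b|^2 sigma_z for arbitrary complex b.
general = all(
    close(ladder_commutator(b), [[abs(b) ** 2 * x for x in row] for row in sigma_z])
    for b in (1, 2j, 0.5 - 1.5j, 3 * cmath.exp(0.9j))
)
print(general)  # True

# Only |b| = 2 reproduces [sigma_+, sigma_-] = 4 sigma_z.
four_sz = [[4, 0], [0, -4]]
print(close(ladder_commutator(2 * cmath.exp(1.3j)), four_sz))  # True
print(close(ladder_commutator(1 + 1j), four_sz))                # False
```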
From \ref{eqn:pauliMatrixXYgeometry:80} it must be that \( \Abs{b}^2 = 4 \), so \( b = 2 e^{i \phi} \) for some real phase \( \phi \), and the most general form of the raising operator is
\begin{equation}\label{eqn:pauliMatrixXYgeometry:220}
\sigma_{+}
=
2
\begin{bmatrix}
0 & e^{i \phi} \\
0 & 0
\end{bmatrix}.
\end{equation}
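Since only \( \Abs{b} \) is constrained, the phase \( \phi \) is free. The following sketch checks all three ladder relations for a handful of phases, again taking \( \sigma_{-} = \sigma_{+}^\dagger \); `raising` and `satisfies_ladder_relations` are hypothetical names.

```python
import cmath
import math

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def scale(s, A):
    """The 2x2 matrix s*A."""
    return [[s * A[i][j] for j in range(2)] for i in range(2)]

def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    """Entrywise comparison with a floating point tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

sigma_z = [[1, 0], [0, -1]]

def raising(phi):
    """sigma_+ = 2 [[0, e^{i phi}], [0, 0]] for an arbitrary real phase phi."""
    return [[0, 2 * cmath.exp(1j * phi)], [0, 0]]

def satisfies_ladder_relations(phi):
    """Check [sz, s+] = 2 s+, [sz, s-] = -2 s-, and [s+, s-] = 4 sz."""
    sp = raising(phi)
    sm = dagger(sp)
    return (close(commutator(sigma_z, sp), scale(2, sp))
            and close(commutator(sigma_z, sm), scale(-2, sm))
            and close(commutator(sp, sm), scale(4, sigma_z)))

phases = [0.0, 0.3, math.pi / 2, 2.5, -1.0]
print(all(satisfies_ladder_relations(phi) for phi in phases))  # True
```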
Observation
The conventional choice is to set \( \phi = 0 \), but I found it interesting to see the form of \( \sigma_x, \sigma_y \) without that choice. Inverting \( \sigma_{\pm} = \sigma_x \pm i \sigma_y \), with \( \sigma_{-} = \sigma_{+}^\dagger \), gives
\begin{equation}\label{eqn:pauliMatrixXYgeometry:240}
\begin{aligned}
\sigma_x
&= \inv{2} \lr{ \sigma_{+} + \sigma_{-} } \\
&=
\begin{bmatrix}
0 & e^{i \phi} \\
e^{-i \phi} & 0 \\
\end{bmatrix}
\end{aligned}
\end{equation}
\begin{equation}\label{eqn:pauliMatrixXYgeometry:260}
\begin{aligned}
\sigma_y
&= \inv{2 i} \lr{ \sigma_{+} - \sigma_{-} } \\
&=
\begin{bmatrix}
0 & -i e^{i \phi} \\
i e^{-i \phi} & 0 \\
\end{bmatrix} \\
&=
\begin{bmatrix}
0 & e^{i (\phi - \pi/2) } \\
e^{-i (\phi - \pi/2)} & 0 \\
\end{bmatrix}.
\end{aligned}
\end{equation}
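These two results can be reproduced numerically from \( \sigma_{\pm} \), and the resulting \( \sigma_x, \sigma_y \), together with the diagonal \( \sigma_z \), still satisfy \ref{eqn:pauliMatrixXYgeometry:40} for any \( \phi \). A minimal sketch; `pauli_xy` and the other helpers are hypothetical names, and the commutation check is a plain Python Levi-Civita sum.

```python
import cmath
import math
from itertools import product

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    """Entrywise comparison with a floating point tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

def levi_civita(r, s, t):
    """Permutation symbol for indices in {0, 1, 2}."""
    return (r - s) * (s - t) * (t - r) // 2

def satisfies_pauli_algebra(sig):
    """True if [sig[r], sig[s]] == 2i sum_t eps_{rst} sig[t] for every r, s."""
    for r, s in product(range(3), repeat=2):
        rhs = [[sum(2j * levi_civita(r, s, t) * sig[t][i][j] for t in range(3))
                for j in range(2)] for i in range(2)]
        if not close(commutator(sig[r], sig[s]), rhs):
            return False
    return True

def pauli_xy(phi):
    """sigma_x, sigma_y from sigma_+ = 2[[0, e^{i phi}], [0, 0]] and sigma_- = sigma_+^dagger."""
    e = cmath.exp(1j * phi)
    sp = [[0, 2 * e], [0, 0]]
    sm = [[0, 0], [2 * e.conjugate(), 0]]
    sx = [[(sp[i][j] + sm[i][j]) / 2 for j in range(2)] for i in range(2)]
    sy = [[(sp[i][j] - sm[i][j]) / 2j for j in range(2)] for i in range(2)]
    return sx, sy

sigma_z = [[1, 0], [0, -1]]
phi = 0.7
sx, sy = pauli_xy(phi)
e = cmath.exp(1j * phi)
print(close(sx, [[0, e], [e.conjugate(), 0]]))             # True
print(close(sy, [[0, -1j * e], [1j * e.conjugate(), 0]]))  # True
print(satisfies_pauli_algebra([sx, sy, sigma_z]))          # True
```

Setting \( \phi = 0 \) recovers the conventional \( \sigma_x \) and \( \sigma_y \).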
Notice that \( \sigma_x \) and \( \sigma_y \) have the same structure, an antidiagonal matrix with entries \( e^{i \theta} \) and \( e^{-i \theta} \), where \( \theta = \phi \) for \( \sigma_x \) and \( \theta = \phi - \pi/2 \) for \( \sigma_y \). Their phases differ by \(90^\circ\). That \( 90^\circ \) separation isn’t obvious in the standard form \ref{eqn:pauliMatrixXYgeometry:20}.
It’s a small detail, but I thought it was kind of cool that the orthogonality of the \( x \) and \( y \) directions represented by these matrices is built directly into the phases of their off-diagonal entries.
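One way to make that concrete: the ratio of the upper right entries of \( \sigma_x \) and \( \sigma_y \) is \( e^{i\phi} / e^{i(\phi - \pi/2)} = i \) for every \( \phi \), a \( 90^\circ \) rotation. Using the trace inner product \( \tfrac{1}{2} \mathrm{tr}(A^\dagger B) \) (my choice of inner product here, not something taken from [1]), \( \sigma_x \) and \( \sigma_y \) are also orthogonal with unit norm. A small sketch with hypothetical helper names:

```python
import cmath
import math

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def trace_inner(A, B):
    """Trace inner product <A, B> = tr(A^dagger B) / 2."""
    P = matmul(dagger(A), B)
    return (P[0][0] + P[1][1]) / 2

def sigma_xy(phi):
    """The phi dependent sigma_x and sigma_y from the derivation."""
    e = cmath.exp(1j * phi)
    sx = [[0, e], [e.conjugate(), 0]]
    sy = [[0, -1j * e], [1j * e.conjugate(), 0]]
    return sx, sy

phases = [0.0, 0.4, 1.9, -2.7]
# Phase of (upper right entry of sigma_x) / (upper right entry of sigma_y).
relative_phases = [cmath.phase(sx[0][1] / sy[0][1]) for sx, sy in map(sigma_xy, phases)]
overlaps = [abs(trace_inner(*sigma_xy(p))) for p in phases]
norms = [trace_inner(sx, sx).real for sx, _ in map(sigma_xy, phases)]

print(all(math.isclose(r, math.pi / 2) for r in relative_phases))  # True
print(max(overlaps) < 1e-12)                                        # True
```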
References
[1] B. R. Desai. Quantum Mechanics with Basic Field Theory. Cambridge University Press, 2009.