ECE1505H Convex Optimization. Lecture 4: Sets and convexity. Taught by Prof. Stark Draper
Disclaimer
Peeter’s lecture notes from class. These may be incoherent and rough.
These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, covering [1] content.
Today
- more on various sets: hyperplanes, half-spaces, polyhedra, balls, ellipses, norm balls, cone of PSD
- generalized inequalities
- operations that preserve convexity
- separating and supporting hyperplanes.
Hyperplanes
Find some \( \Bx_0 \in \mathbb{R}^n \) such that \( \Ba^\T \Bx_0 = b \), so
\begin{equation}\label{eqn:convexOptimizationLecture4:20}
\begin{aligned}
\setlr{ \Bx | \Ba^\T \Bx = b }
&=
\setlr{ \Bx | \Ba^\T \Bx = \Ba^\T \Bx_0 } \\
&=
\setlr{ \Bx | \Ba^\T (\Bx - \Bx_0) = 0 } \\
&=
\Bx_0 + \Ba^\perp,
\end{aligned}
\end{equation}
where
\begin{equation}\label{eqn:convexOptimizationLecture4:40}
\Ba^\perp = \setlr{ \Bv | \Ba^\T \Bv = 0 }.
\end{equation}
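As a sanity check of this decomposition, here is a small numpy sketch (the particular \( \Ba \) and \( b \) are arbitrary choices): the null space of \( \Ba^\T \) gives \( \Ba^\perp \).

```python
import numpy as np

# hypothetical a and b, chosen only for illustration
a = np.array([1.0, 2.0, 3.0])
b = 4.0

x0 = b * a / a.dot(a)   # a^T x0 = b, so x0 lies on the hyperplane

# rows of Vt after the first span the null space of the 1 x n matrix a^T
_, _, Vt = np.linalg.svd(a.reshape(1, -1))
N = Vt[1:].T            # columns of N form a basis for a-perp

# x0 plus any element of a-perp stays on the hyperplane
x = x0 + N @ np.array([0.7, -1.3])
print(np.isclose(a.dot(x), b))   # True
```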
Recall the dual norm
\begin{equation}\label{eqn:convexOptimizationLecture4:60}
\Norm{\Bz}_\conj = \sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx} \le 1 }.
\end{equation}
Denote the optimizer of the above as \( \Bx^\conj \). By definition
\begin{equation}\label{eqn:convexOptimizationLecture4:80}
\Bz^\T \Bx^\conj \ge \Bz^\T \Bx \quad \forall \Bx, \Norm{\Bx} \le 1
\end{equation}
This defines a half space in which the unit ball lies
\begin{equation}\label{eqn:convexOptimizationLecture4:100}
\setlr{ \Bx | \Bz^\T (\Bx - \Bx^\conj) \le 0 }.
\end{equation}
Start with the \( l_1 \) norm; the dual of \( l_1 \) is the \( l_\infty \) norm.
Similar pic for \( l_\infty \), for which the dual is the \( l_1 \) norm, as sketched in fig. 3. Here the optimizer point is at \( (1,1) \)
and a similar pic for \( l_2 \), which is sketched in fig. 4.
Q: What was this optimizer point?
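These dual norm facts are easy to check numerically. A small numpy sketch (the random \( \Bz \) is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=4)

# dual of l1: sup of z^T x over ||x||_1 <= 1 is attained at a signed
# standard basis vector, and equals ||z||_inf
k = np.argmax(np.abs(z))
x_star = np.sign(z[k]) * np.eye(4)[k]
print(z @ x_star, np.abs(z).max())          # equal

# no other feasible point does better
X = rng.normal(size=(10000, 4))
X /= np.abs(X).sum(axis=1, keepdims=True)   # scale onto the l1 sphere
print(np.all(X @ z <= z @ x_star))          # True

# l2 is self-dual: the optimizer is z / ||z||_2, achieving ||z||_2
print(z @ (z / np.linalg.norm(z)), np.linalg.norm(z))   # equal
```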
Polyhedra
\begin{equation}\label{eqn:convexOptimizationLecture4:120}
\begin{aligned}
\mathcal{P}
&= \setlr{ \Bx |
\Ba_j^\T \Bx \le b_j, j \in [1,m],
\Bc_i^\T \Bx = d_i, i \in [1,p]
} \\
&=
\setlr{ \Bx | A \Bx \le \Bb, C \Bx = \Bd },
\end{aligned}
\end{equation}
where the final inequality and equality are componentwise.
Proving \( \mathcal{P} \) is convex:
- Pick \(\Bx_1 \in \mathcal{P}\), \(\Bx_2 \in \mathcal{P} \)
- Pick any \(\theta \in [0,1]\)
- Test \( \theta \Bx_1 + (1-\theta) \Bx_2 \). Is it in \(\mathcal{P}\)?
\begin{equation}\label{eqn:convexOptimizationLecture4:140}
\begin{aligned}
A \lr{ \theta \Bx_1 + (1-\theta) \Bx_2 }
&=
\theta A \Bx_1 + (1-\theta) A \Bx_2 \\
&\le
\theta \Bb + (1-\theta) \Bb \\
&=
\Bb.
\end{aligned}
\end{equation}
The equality constraint follows the same way: \( C \lr{ \theta \Bx_1 + (1-\theta) \Bx_2 } = \theta \Bd + (1-\theta) \Bd = \Bd \), so the convex combination is in \( \mathcal{P} \).
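Here is a minimal numeric version of this membership check (the particular \( A \) and \( \Bb \), a triangle in \( \mathbb{R}^2 \), are hypothetical):

```python
import numpy as np

# hypothetical polyhedron { x | A x <= b }: the triangle x >= 0, x1 + x2 <= 1
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 0.0])

def in_P(x, tol=1e-12):
    return bool(np.all(A @ x <= b + tol))

x1, x2 = np.array([0.2, 0.3]), np.array([0.5, 0.1])
assert in_P(x1) and in_P(x2)

# every convex combination stays in the polyhedron
for theta in np.linspace(0.0, 1.0, 11):
    assert in_P(theta * x1 + (1 - theta) * x2)
print("all convex combinations remained in P")
```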
Balls
Euclidean ball for \( \Bx_c \in \mathbb{R}^n \), \( r > 0 \)
\begin{equation}\label{eqn:convexOptimizationLecture4:160}
\mathcal{B}(\Bx_c, r)
= \setlr{ \Bx | \Norm{\Bx - \Bx_c}_2 \le r },
\end{equation}
or
\begin{equation}\label{eqn:convexOptimizationLecture4:180}
\mathcal{B}(\Bx_c, r)
= \setlr{ \Bx | \lr{\Bx - \Bx_c}^\T \lr{\Bx - \Bx_c} \le r^2 }.
\end{equation}
Let \( \Bx_1, \Bx_2 \in \mathcal{B}(\Bx_c, r) \), and \( \theta \in [0,1] \). Then
\begin{equation}\label{eqn:convexOptimizationLecture4:200}
\begin{aligned}
\Norm{ \theta \Bx_1 + (1-\theta) \Bx_2 - \Bx_c }_2
&=
\Norm{ \theta (\Bx_1 - \Bx_c) + (1-\theta) (\Bx_2 - \Bx_c) }_2 \\
&\le
\Norm{ \theta (\Bx_1 - \Bx_c)}_2 + \Norm{(1-\theta) (\Bx_2 - \Bx_c) }_2 \\
&=
\Abs{\theta} \Norm{ \Bx_1 - \Bx_c}_2 + \Abs{1 -\theta} \Norm{ \Bx_2 - \Bx_c }_2 \\
&=
\theta \Norm{ \Bx_1 - \Bx_c}_2 + \lr{1 -\theta} \Norm{ \Bx_2 - \Bx_c }_2 \\
&\le
\theta r + (1 - \theta) r \\
&= r.
\end{aligned}
\end{equation}
Ellipse
\begin{equation}\label{eqn:convexOptimizationLecture4:220}
\mathcal{E}(\Bx_c, P)
=
\setlr{ \Bx | (\Bx - \Bx_c)^\T P^{-1} (\Bx - \Bx_c) \le 1 },
\end{equation}
where \( P \in S^n_{++} \).
- Euclidean ball is an ellipse with \( P = r^2 I \)
- Ellipse is image of Euclidean ball \( \mathcal{B}(0,1) \) under affine mapping.
Given
\begin{equation}\label{eqn:convexOptimizationLecture4:240}
F(\Bu) = P^{1/2} \Bu + \Bx_c
\end{equation}
\begin{equation}\label{eqn:convexOptimizationLecture4:260}
\begin{aligned}
\setlr{ F(\Bu) | \Norm{\Bu}_2 \le r }
&=
\setlr{ P^{1/2} \Bu + \Bx_c | \Bu^\T \Bu \le r^2 } \\
&=
\setlr{ \Bx | \Bx = P^{1/2} \Bu + \Bx_c, \Bu^\T \Bu \le r^2 } \\
&=
\setlr{ \Bx | \Bu = P^{-1/2} (\Bx - \Bx_c), \Bu^\T \Bu \le r^2 } \\
&=
\setlr{ \Bx | (\Bx - \Bx_c)^\T P^{-1} (\Bx - \Bx_c) \le r^2 }.
\end{aligned}
\end{equation}
With \( r = 1 \) this is exactly \( \mathcal{E}(\Bx_c, P) \).
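A numeric sketch of this image-of-a-ball characterization (the \( P \), \( \Bx_c \), and sampling are arbitrary choices, with \( P^{1/2} \) built from the eigendecomposition):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical P in S^2_{++} and center x_c
M = rng.normal(size=(2, 2))
P = M @ M.T + 0.1 * np.eye(2)
xc = np.array([1.0, -2.0])

# P^{1/2} from the eigendecomposition P = Q diag(lambda) Q^T
lam, Q = np.linalg.eigh(P)
P_half = Q @ np.diag(np.sqrt(lam)) @ Q.T

# push unit ball points through F(u) = P^{1/2} u + x_c
U = rng.normal(size=(1000, 2))
U /= np.maximum(1.0, np.linalg.norm(U, axis=1, keepdims=True))  # ||u||_2 <= 1
X = U @ P_half + xc                                             # P^{1/2} is symmetric

# every image point satisfies the ellipse inequality
d = X - xc
q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(P), d)
print(np.all(q <= 1 + 1e-9))   # True
```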
Geometry of an ellipse
Decomposition of positive definite matrix \( P \in S^n_{++} \subset S^n \) is:
\begin{equation}\label{eqn:convexOptimizationLecture4:280}
\begin{aligned}
P &= Q \textrm{diag}(\lambda_i) Q^\T \\
Q^\T Q &= Q Q^\T = I
\end{aligned},
\end{equation}
where \( \lambda_i \in \mathbb{R}\), and \(\lambda_i > 0 \). The ellipse is defined by
\begin{equation}\label{eqn:convexOptimizationLecture4:300}
(\Bx - \Bx_c)^\T Q \textrm{diag}(1/\lambda_i) Q^\T (\Bx - \Bx_c) \le 1
\end{equation}
The term \( (\Bx - \Bx_c)^\T Q \) projects \( \Bx - \Bx_c \) onto the columns of \( Q \). Those columns are perpendicular since \( Q \) is an orthogonal matrix. Let
\begin{equation}\label{eqn:convexOptimizationLecture4:320}
\tilde{\Bx} = Q^\T (\Bx - \Bx_c),
\end{equation}
which shifts the origin to \( \Bx_c \), with \( Q^\T \) rotating into the eigenvector coordinate system. The ellipse is therefore
\begin{equation}\label{eqn:convexOptimizationLecture4:340}
\tilde{\Bx}^\T
\begin{bmatrix}
\inv{\lambda_1} & & & \\
& \inv{\lambda_2} & & \\
& & \ddots & \\
& & & \inv{\lambda_n}
\end{bmatrix}
\tilde{\Bx}
=
\sum_{i = 1}^n \frac{\tilde{x}_i^2}{\lambda_i} \le 1.
\end{equation}
An example is sketched for \( \lambda_1 > \lambda_2 \) below.
- \( \sqrt{\lambda_i} \) is the length of the \( i \)th semi-axis.
- Larger \( \lambda_i \) means \( \tilde{x}_i^2 \) can be bigger and still satisfy the constraint \( \le 1 \).
- Volume of the ellipse is proportional to \( \sqrt{ \det P } = \sqrt{ \prod_{i = 1}^n \lambda_i } \).
- When any \( \lambda_i \rightarrow 0 \) a dimension is lost, the volume goes to zero, and \( P \) loses the invertibility that the definition requires. (These facts are checked numerically below.)
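A minimal numpy sketch of the semi-axis and volume facts, assuming an arbitrary \( P \) built with eigenvalues \( 4 \) and \( 1 \):

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical P with eigenvalues 4 and 1 in a random orientation
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)))
P = Q @ np.diag([4.0, 1.0]) @ Q.T

lam = np.linalg.eigvalsh(P)
print(np.sqrt(lam))      # semi-axis lengths: [1. 2.]

# area of { x | x^T P^{-1} x <= 1 } is pi times the product of semi-axes
print(np.pi * np.sqrt(np.linalg.det(P)))   # 2 pi
```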
Ellipses will be seen a lot in this course, since we are interested in “bowl” like geometries (and the ellipse is the image of a Euclidean ball).
Norm ball
The norm ball
\begin{equation}\label{eqn:convexOptimizationLecture4:360}
\mathcal{B} = \setlr{ \Bx | \Norm{\Bx} \le 1 },
\end{equation}
is a convex set for all norms. Proof:
Take any \( \Bx, \By \in \mathcal{B} \), and \( \theta \in [0,1] \). Then
\begin{equation}\label{eqn:convexOptimizationLecture4:380}
\Norm{ \theta \Bx + (1 - \theta) \By }
\le
\Abs{\theta} \Norm{ \Bx } + \Abs{1 - \theta} \Norm{ \By }
=
\theta \Norm{ \Bx } + \lr{1 - \theta} \Norm{ \By }
\le
\theta + \lr{1 - \theta}
=
1.
\end{equation}
This is true for any p-norm with \( p \ge 1 \), \( \Norm{\Bx}_p = \lr{ \sum_{i = 1}^n \Abs{x_i}^p }^{1/p} \).
The shape of a \( p < 1 \) norm unit ball is sketched below (lines connecting points in such a region can exit the region).
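For example, with \( p = 1/2 \), the points \( (1,0) \) and \( (0,1) \) satisfy the unit "ball" constraint, but their midpoint does not:

```python
import numpy as np

def p_quasi_norm(x, p):
    # the p-norm formula; for p < 1 this is not actually a norm
    return float((np.abs(x) ** p).sum() ** (1.0 / p))

p = 0.5
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(p_quasi_norm(x, p), p_quasi_norm(y, p))   # 1.0 1.0: both in the "ball"
print(p_quasi_norm(0.5 * x + 0.5 * y, p))       # 2.0 > 1: the midpoint escapes
```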
Cones
Recall that \( C \) is a cone if \( \forall \Bx \in C, \theta \ge 0, \theta \Bx \in C \).
An important cone is that of the PSD matrices
\begin{equation}\label{eqn:convexOptimizationLecture4:400}
\begin{aligned}
S^n &= \setlr{ X \in \mathbb{R}^{n \times n} | X = X^\T } \\
S^n_{+} &= \setlr{ X \in S^n | \Bv^\T X \Bv \ge 0, \quad \forall \Bv \in \mathbb{R}^n } \\
S^n_{++} &= \setlr{ X \in S^n_{+} | \Bv^\T X \Bv > 0, \quad \forall \Bv \in \mathbb{R}^n, \Bv \ne 0 }
\end{aligned}
\end{equation}
These have eigenvalues, respectively,
- \( \lambda_i \in \mathbb{R} \)
- \( \lambda_i \in \mathbb{R}_{+} \)
- \( \lambda_i \in \mathbb{R}_{++} \)
\( S^n_{+} \) is a cone: if \( X \in S^n_{+}\), then \( \theta X \in S^n_{+} \) for all \( \theta \ge 0 \), since
\begin{equation}\label{eqn:convexOptimizationLecture4:420}
\Bv^\T (\theta X) \Bv
= \theta \Bv^\T X \Bv
\ge 0,
\end{equation}
since \( \theta \ge 0 \) and because \( X \in S^n_{+} \).
Shorthand:
\begin{equation}\label{eqn:convexOptimizationLecture4:440}
\begin{aligned}
X &\in S^n_{+} \Rightarrow X \succeq 0 \\
X &\in S^n_{++} \Rightarrow X \succ 0.
\end{aligned}
\end{equation}
Further \( S^n_{+} \) is a convex cone.
Let \( A \in S^n_{+} \), \( B \in S^n_{+} \), \( \theta_1, \theta_2 \ge 0, \theta_1 + \theta_2 = 1 \), or \( \theta_2 = 1 - \theta_1 \).
Show that \( \theta_1 A + \theta_2 B \in S^n_{+} \) :
\begin{equation}\label{eqn:convexOptimizationLecture4:460}
\Bv^\T \lr{ \theta_1 A + \theta_2 B } \Bv
=
\theta_1 \Bv^\T A \Bv
+\theta_2 \Bv^\T B \Bv
\ge 0,
\end{equation}
since \( \theta_1 \ge 0, \theta_2 \ge 0, \Bv^\T A \Bv \ge 0, \Bv^\T B \Bv \ge 0 \).
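A numeric sketch (random samples, not a proof) of both the cone and the convexity properties:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_psd(n):
    M = rng.normal(size=(n, n))
    return M @ M.T                    # M M^T is always PSD

def is_psd(X, tol=1e-10):
    return bool(np.all(np.linalg.eigvalsh(X) >= -tol))

A, B = random_psd(3), random_psd(3)
assert is_psd(2.5 * A)                # cone: nonnegative scaling stays PSD
for theta in np.linspace(0.0, 1.0, 11):
    assert is_psd(theta * A + (1 - theta) * B)
print("S^n_+ behaved as a convex cone on these samples")
```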
Generalized inequalities
Start with a proper cone \( K \subseteq \mathbb{R}^n \)
- closed, convex
- non-empty interior (“solid”)
- “pointed” (contains no lines)
Such a \( K \) defines a generalized inequality on \( \mathbb{R}^n \), written “\(\le_K\)”.
Interpreting
\begin{equation}\label{eqn:convexOptimizationLecture4:480}
\begin{aligned}
\Bx \le_K \By &\leftrightarrow \By - \Bx \in K \\
\Bx <_K \By &\leftrightarrow \By - \Bx \in \textrm{int}\, K.
\end{aligned}
\end{equation}
Why pointed? We want antisymmetry: if \( \Bx \le_K \By \) and \( \By \le_K \Bx \), then \( \Bx = \By \). If \( K \) contained a line (as a half space does), that implication would fail.
Example 1: \( K = \mathbb{R}^n_{+} \), \( \Bx \in \mathbb{R}^n \), \( \By \in \mathbb{R}^n \)
\begin{equation}\label{eqn:convexOptimizationLecture4:500}
\Bx \le_K \By \Rightarrow \By - \Bx \in K
\end{equation}
say:
\begin{equation}\label{eqn:convexOptimizationLecture4:520}
\begin{bmatrix}
y_1 - x_1 \\
y_2 - x_2
\end{bmatrix}
\in \mathbb{R}^2_{+}
\end{equation}
Also:
\begin{equation}\label{eqn:convexOptimizationLecture4:540}
K = \mathbb{R}^1_{+}
\end{equation}
(pointed, since it contains no lines)
\begin{equation}\label{eqn:convexOptimizationLecture4:560}
\Bx \le_K \By ,
\end{equation}
with respect to \( K = \mathbb{R}^n_{+} \) means that \( x_i \le y_i \) for all \( i \in [1,n]\).
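A minimal sketch of this componentwise test, which also shows that \( \le_K \) is only a partial order (some pairs are incomparable):

```python
import numpy as np

def le_K(x, y):
    # x <=_K y for K = R^n_+ iff y - x is componentwise nonnegative
    return bool(np.all(y - x >= 0))

print(le_K(np.array([1.0, 2.0]), np.array([1.5, 2.0])))   # True
print(le_K(np.array([1.5, 2.0]), np.array([1.0, 2.0])))   # False
print(le_K(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # False: incomparable pair
```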
Example 2: For \( K = S^n_{+} \subseteq S^n \),
\begin{equation}\label{eqn:convexOptimizationLecture4:580}
X \le_K Y,
\end{equation}
means that
\begin{equation}\label{eqn:convexOptimizationLecture4:600}
Y - X \in K = S^n_{+}.
\end{equation}
- The difference \( Y - X \) is always in \( S^n \)
- check if in \( K \) by checking whether all eigenvalues of \( Y - X \) are \( \ge 0 \).
- \( S^n_{++} \) is the interior of \( S^n_{+} \).
We’ll use these with vectors and matrices so often that the \( K \) subscript will usually be dropped, writing instead (for vectors)
\begin{equation}\label{eqn:convexOptimizationLecture4:640}
\begin{aligned}
\Bx \le \By &\leftrightarrow \By - \Bx \in \mathbb{R}^n_{+} \\
\Bx < \By &\leftrightarrow \By - \Bx \in \mathbb{R}^n_{++}
\end{aligned}
\end{equation}
and for matrices
\begin{equation}\label{eqn:convexOptimizationLecture4:660}
\begin{aligned}
X \preceq Y &\leftrightarrow Y - X \in S^n_{+} \\
X \prec Y &\leftrightarrow Y - X \in S^n_{++}.
\end{aligned}
\end{equation}
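A minimal numeric version of the matrix test (check the eigenvalues of \( Y - X \)); the diagonal matrices are arbitrary examples:

```python
import numpy as np

def psd_leq(X, Y, tol=1e-10):
    # X <= Y in the S^n_+ sense iff every eigenvalue of Y - X is >= 0
    return bool(np.all(np.linalg.eigvalsh(Y - X) >= -tol))

X = np.diag([1.0, 2.0])
Y = np.diag([1.0, 3.0])
print(psd_leq(X, Y))   # True:  Y - X = diag(0, 1) is PSD
print(psd_leq(Y, X))   # False: X - Y = diag(0, -1) is not
```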
Intersection
Take the intersection of (perhaps infinitely many) sets \( S_\alpha \):
If \( S_\alpha \) is (affine, convex, conic) for all \( \alpha \in A \), then
\begin{equation}\label{eqn:convexOptimizationLecture4:680}
\cap_\alpha S_\alpha
\end{equation}
is (affine, convex, conic). To prove in homework:
\begin{equation}\label{eqn:convexOptimizationLecture4:700}
\mathcal{P} = \setlr{ \Bx | \Ba_i^\T \Bx \le b_i, \Bc_j^\T \Bx = d_j, \quad \forall i, j }
\end{equation}
This is convex, since it is the intersection of a bunch of half space and hyperplane constraints.
Operations that preserve convexity
Let \( F \) be an affine mapping, \( F(\Bx) = A \Bx + \Bb \).
- If \( S \subseteq \mathbb{R}^n \) is convex then
\begin{equation}\label{eqn:convexOptimizationLecture4:720}
F(S) = \setlr{ F(\Bx) | \Bx \in S }
\end{equation}
is convex.
- If \( S \subseteq \mathbb{R}^m \) is convex then
\begin{equation}\label{eqn:convexOptimizationLecture4:740}
F^{-1}(S) = \setlr{ \Bx | F(\Bx) \in S }
\end{equation}
is convex. Such a mapping is sketched in fig. 14.
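A numeric sketch of the inverse image property, taking \( S \) to be the unit ball in \( \mathbb{R}^2 \) and a random affine \( F \) (both arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 3))
b = rng.normal(size=2)

# S = unit ball in R^2 (convex); F^{-1}(S) = { x | ||A x + b||_2 <= 1 }
def in_preimage(x):
    return np.linalg.norm(A @ x + b) <= 1 + 1e-12

# A x0 + b = 0, so x0 is inside the preimage; step away by less than
# 1 / ||A||_2 to stay inside
x0 = np.linalg.lstsq(A, -b, rcond=None)[0]
r = 0.9 / np.linalg.norm(A, 2)
d1, d2 = rng.normal(size=3), rng.normal(size=3)
x1 = x0 + r * d1 / np.linalg.norm(d1)
x2 = x0 + r * d2 / np.linalg.norm(d2)
assert in_preimage(x1) and in_preimage(x2)

# every convex combination stays in F^{-1}(S)
assert all(in_preimage(t * x1 + (1 - t) * x2) for t in np.linspace(0, 1, 21))
print("convex combinations stayed in the preimage")
```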
References
[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.