Month: January 2017

ECE1505H Convex Optimization. Lecture 5: Sets, epigraphs, quasi-convexity, and sublevel sets. Taught by Prof. Stark Draper

January 26, 2017 ece1505 , , , , , , , ,

[Click here for a PDF of this post with nicer formatting]

Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

Last time

  • examples of sets: planes, half spaces, balls, ellipses, cone of positive semi-definite matrices
  • generalized inequalities
  • examples of convexity preserving operations

Today

  • more examples of convexity preserving operations
  • separating and supporting hyperplanes
  • basic definitions of convex functions
  • epigraphs, quasi-convexity, sublevel sets
  • first and second order conditions for convexity of differentiable functions.

Operations that preserve convexity

If \( S_\alpha \) is convex \( \forall \alpha \in A \), then

\begin{equation}\label{eqn:convexOptimizationLecture5:40}
\cup_{\alpha \in A} S_\alpha,
\end{equation}

is convex.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture5:60}
F(\Bx) = A \Bx + \Bb
\end{equation}

\begin{equation}\label{eqn:convexOptimizationLecture5:80}
\begin{aligned}
\Bx &\in \mathbb{R}^n \\
A &\in \mathbb{R}^{m \times n} \\
F &: \mathbb{R}^{n} \rightarrow \mathbb{R}^m \\
\Bb &\in \mathbb{R}^m
\end{aligned}
\end{equation}

  1. If \( S \in \mathbb{R}^n \) is convex, then\begin{equation}\label{eqn:convexOptimizationLecture5:100}
    F(S) = \setlr{ F(\Bx) | \Bx \in S }
    \end{equation}is convex if \( F \) is affine.
  2. If \( S \in \mathbb{R}^m \) is convex, then\begin{equation}\label{eqn:convexOptimizationLecture5:120}
    F^{-1}(S) = \setlr{ \Bx | F(\Bx) \in S }
    \end{equation}

    is convex.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture5:140}
\setlr{ \By | \By = A \Bx + \Bb, \Norm{\Bx} \le 1}
\end{equation}

is convex. Here \( A \Bx + \Bb \) is an affine function (\(F(\Bx)\). This is the image of a (convex) unit ball, through an affine map.

Earlier saw when defining ellipses

\begin{equation}\label{eqn:convexOptimizationLecture5:160}
\By = P^{1/2} \Bx + \Bx_c
\end{equation}

Example :

\begin{equation}\label{eqn:convexOptimizationLecture5:180}
\setlr{ \Bx | \Norm{ A \Bx + \Bb } \le 1 },
\end{equation}

is convex. This can be seen by writing

\begin{equation}\label{eqn:convexOptimizationLecture5:200}
\begin{aligned}
\setlr{ \Bx | \Norm{ A \Bx + \Bb } \le 1 }
&=
\setlr{ \Bx | \Norm{ F(\Bx) } \le 1 } \\
&=
\setlr{ \Bx | F(\Bx) \in \mathcal{B} },
\end{aligned}
\end{equation}

where \( \mathcal{B} = \setlr{ \By | \Norm{\By} \le 1 } \). This is the pre-image (under \(F()\)) of a unit norm ball.

Example:

\begin{equation}\label{eqn:convexOptimizationLecture5:220}
\setlr{ \Bx \in \mathbb{R}^n | x_1 A_1 + x_2 A_2 + \cdots x_n A_n \le \mathcal{B} }
\end{equation}

where \( A_i \in S^m \) and \( \mathcal{B} \in S^m \), and the inequality is a matrix inequality. This is a convex set. The constraint is a “linear matrix inequality” (LMI).

This has to do with an affine map:

\begin{equation}\label{eqn:convexOptimizationLecture5:240}
F(\Bx) = B – 1 x_1 A_1 – x_2 A_2 – \cdots x_n A_n \ge 0
\end{equation}

(positive semi-definite inequality). This is a mapping

\begin{equation}\label{eqn:convexOptimizationLecture5:480}
F : \mathbb{R}^n \rightarrow S^m,
\end{equation}

since all \( A_i \) and \( B \) are in \( S^m \).

This \( F(\Bx) = B – A(\Bx) \) is a constant and a factor linear in x, so is affine. Can be written

\begin{equation}\label{eqn:convexOptimizationLecture5:260}
\setlr{ \Bx | B – A(\Bx) \ge 0 }
=
\setlr{ \Bx | B – A(\Bx) \in S^m_{+} }
\end{equation}

This is a pre-image of a cone of PSD matrices, which is convex. Therefore, this is a convex set.

Separating hyperplanes

Theorem: Separating hyperplanes

If \( S, T \subseteq \mathbb{R}^n \) are convex and disjoint
i.e. \( S \cup T = 0\), then
there exists on \( \Ba \in \mathbb{R}^n \) \( \Ba \ne 0 \) and a \( \Bb \in \mathbb{R}^n \) such that

\begin{equation*}
\Ba^\T \Bx \ge \Bb \, \forall \Bx \in S
\end{equation*}

and
\begin{equation*}
\Ba^\T \Bx < \Bb \,\forall \Bx \in T.
\end{equation*}

An example of a hyperplanes that separates two sets and two sets that are not separable is sketched in fig 1.1

Proof in the book.

Theorem: Supporting hyperplane
If \( S \) is convex then \( \forall x_0 \in \partial S = \textrm{cl}(S) \ \textrm{int}(S) \), where
\( \partial S \) is the boundary of \( S \), then \( \exists \) an \( \Ba \ne 0 \in \mathbb{R}^n \) such that \( \Ba^\T \Bx \le \Ba^\T x_0 \, \forall \Bx \in S \).

Here \( \ \) denotes “without”.

An example is sketched in fig. 3, for which

fig. 3. Supporting hyperplane.

  • The vector \( \Ba \) perpendicular to tangent plane.
  • inner product \( \Ba^\T (\Bx – \Bx_0) \le 0 \).

A set with a supporting hyperplane is sketched in fig 4a whereas fig 4b shows that there is not necessarily a unique supporting hyperplane at any given point, even if \( S \) is convex.

 

fig 4a. Set with supporting hyperplane.

 

fig 4b. No unique supporting hyperplane possible.

basic definitions of convex functions

Theorem: Convex functions
If \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) is defined on a convex domain (i.e. \( \textrm{dom} F \subseteq \mathbb{R}^n \) is a convex set), then \( F \) is convex if \( \forall \Bx, \By \in \textrm{dom} F \), \( \forall \theta \in [0,1] \in \mathbb{R} \)

\begin{equation}\label{eqn:convexOptimizationLecture5:340}
F( \theta \Bx + (1-\theta) \By \le \theta F(\Bx) + (1-\theta) F(\By)
\end{equation}

An example is sketched in fig. 5.

 

fig. 5. Example of convex function.

Remarks

  • Require \( \textrm{dom} F \) to be a convex set. This is required so that the function at the point \( \theta u + (1-\theta) v \) can be evaluated. i.e. so that \( F(\theta u + (1-\theta) v) \) is well defined. Example: \( \textrm{dom} F = (-\infty, 0] \cup [1, \infty) \) is not okay, because a linear combination in \( (0,1) \) would be undesirable.
  • Parameter \( \theta \) is “how much up” the line segment connecting \( (u, F(u) \) and \( (v, F(v) \). This line segment never below the bottom of the bowl.
    The function is \underlineAndIndex{concave}, if \( -F \) is convex.
    i.e. If the convex function is flipped upside down. That is\begin{equation}\label{eqn:convexOptimizationLecture5:360}
    F(\theta \Bx + (1-\theta) \By ) \ge \theta F(\Bx) + (1-\theta) F(\By) \,\forall \Bx,\By \in \textrm{dom} F, \theta \in [0,1].
    \end{equation}
  • a “strictly” convex function means \(\forall \theta \in [0,1] \)\begin{equation}\label{eqn:convexOptimizationLecture5:380}
    F(\theta \Bx + (1-\theta) \By ) < \theta F(\Bx) + (1-theta) F(\By).
    \end{equation}
  • Strictly concave function \( F \) means \( -F \) is strictly convex.
  • Examples:\imageFigure{../figures/ece1505-convex-optimization/l5Fig6a}{}{fig:l5:l5Fig6a}{0.2}

    fig 6a. Not convex or concave.

     

    fig 6b. Not strictly convex

Definition: Epigraph of a function

The epigraph \( \textrm{epi} F \) of a function \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) is

\begin{equation*}
\textrm{epi} F = \setlr{ (\Bx,t) \in \mathbb{R}^{n +1} | \Bx \in \textrm{dom} F, t \ge F(\Bx) },
\end{equation*}

where \( \Bx \in \mathbb{R}^n, t \in \mathbb{R} \).

 

fig. 7. Epigraph.

Theorem: Convexity and epigraph.
If \( F \) is convex implies \( \textrm{epi} F \) is a convex set.

Proof:

For convex function, a line segment connecting any 2 points on function is above the function. i.e. it is \( \textrm{epi} F \).

Many authors will go the other way around, showing \ref{dfn:convexOptimizationLecture5:400} from \ref{thm:convexOptimizationLecture5:420}. That is:

Pick any 2 points in \( \textrm{epi} F \), \( (\Bx,\mu) \in \textrm{epi} F\) and \( (\By, \nu) \in \textrm{epi} F \). Consider convex combination

\begin{equation}\label{eqn:convexOptimizationLecture5:420}
\theta( \Bx, \mu ) + (1-\theta) (\By, \nu) =
(\theta \Bx (1-\theta) \By, \theta \mu (1-\theta) \nu )
\in \textrm{epi} F,
\end{equation}

since \( \textrm{epi} F \) is a convex set.

By definition of \( \textrm{epi} F \)

\begin{equation}\label{eqn:convexOptimizationLecture5:440}
F( \theta \Bx (1-\theta) \By ) \le \theta \mu (1-\theta) \nu.
\end{equation}

Picking \( \mu = F(\Bx), \nu = F(\By) \) gives
\begin{equation}\label{eqn:convexOptimizationLecture5:460}
F( \theta \Bx (1-\theta) \By ) \le \theta F(\Bx) (1-\theta) F(\By).
\end{equation}

Extended value function

Sometimes convenient to work with “extended value function”

\begin{equation}\label{eqn:convexOptimizationLecture5:500}
\tilde{F}(\Bx) =
\left\{
\begin{array}{l l}
F(\Bx) & \quad \mbox{If \( \Bx \in \textrm{dom} F\)} \\
\infty & \quad \mbox{otherwise.}
\end{array}
\right.
\end{equation}

Examples:

  • Linear (affine) functions (fig. 8) are both convex and concave.

    fig. 8. Linear functions.

  • \( x^2 \) is convex, sketched in fig. 9.

    fig. 9. Convex (quadratic.)

  • \( \log x, \textrm{dom} F = \mathbb{R}_{+} \) concave, sketched in fig. 10.

    fig. 10. Concave (logarithm.)

  • \( \Norm{\Bx} \) is convex. \( \Norm{ \theta \Bx + (1-\theta) \By } \le \theta \Norm{ \Bx } + (1-\theta) \Norm{\By } \).
  • \( 1/x \) is convex on \( \setlr{ x | x > 0 } = \textrm{dom} F \), and concave on \( \setlr{ x | x < 0 } = \textrm{dom} F \). \begin{equation}\label{eqn:convexOptimizationLecture5:520} \tilde{F}(x) = \left\{ \begin{array}{l l} \inv{x} & \quad \mbox{If \( x > 0 \)} \\
    \infty & \quad \mbox{else.}
    \end{array}
    \right.
    \end{equation}

Definition: Sublevel

The sublevel set of a function \( F : \mathbb{R}^n \rightarrow \mathbb{R} \) is

\begin{equation*}
C(\alpha) = \setlr{ \Bx \in \textrm{dom} F | F(\Bx) \le \alpha }
\end{equation*}

 

Convex sublevel

 

Non-convex sublevel.

Theorem:
If \( F \) is convex then \( C(\alpha) \) is a convex set \( \forall \alpha \).

This is not an if and only if condition, as illustrated in fig. 12.

 

fig. 12. Convex sublevel does not imply convexity.

There \( C(\alpha) \) is convex, but the function itself is not.

Proof:

Since \( F \) is convex, then \( \textrm{epi} F \) is a convex set.

  • Let\begin{equation}\label{eqn:convexOptimizationLecture5:580}
    \mathcal{A} = \setlr{ (\Bx,t) | t = \alpha }
    \end{equation}is a convex set.
  • \( \mathcal{A} \cap \textrm{epi} F \)is a convex set since it is the intersection of convex sets.
  • Project \( \mathcal{A} \cap \textrm{epi} F \) onto \R{n} (i.e. domain of \( F \) ). The projection is an affine mapping. Image of a convex set through affine mapping is a convex set.

Definition: Quasi-convex.

A function is quasi-convex if \underline{all} of its sublevel sets are convex.

Composing convex functions

Properties of convex functions:

  • If \( F \) is convex, then \( \alpha F \) is convex \( \forall \alpha > 0 \).
  • If \( F_1, F_2 \) are convex, then the sum \( F_1 + F_2 \) is convex.
  • If \( F \) is convex, then \( g(\Bx) = F(A \Bx + \Bb) \) is convex \( \forall \Bx \in \setlr{ \Bx | A \Bx + \Bb \in \textrm{dom} F } \).

Note: for the last

\begin{equation}\label{eqn:convexOptimizationLecture5:620}
\begin{aligned}
g &: \mathbb{R}^m \rightarrow \mathbb{R} \\
F &: \mathbb{R}^n \rightarrow \mathbb{R} \\
\Bx &\in \mathbb{R}^m \\
A &\in \mathbb{R}^{n \times m} \\
\Bb &\in \mathbb{R}^n
\end{aligned}
\end{equation}

Proof (of last):

\begin{equation}\label{eqn:convexOptimizationLecture5:640}
\begin{aligned}
g( \theta \Bx + (1-\theta) \By )
&=
F( \theta (A \Bx + \Bb) + (1-\theta) (A \By + \Bb) ) \\
&\le
\theta F( A \Bx + \Bb) + (1-\theta) F (A \By + \Bb) \\
&= \theta g(\Bx) + (1-\theta) g(\By).
\end{aligned}
\end{equation}

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

ECE1505H Convex Optimization. Lecture 4: Sets and convexity. Taught by Prof. Stark Draper

January 25, 2017 ece1505 , , , , , , ,

ECE1505H Convex Optimization. Lecture 4: Sets and convexity. Taught by Prof. Stark Draper

[Click here for a PDF of this post with nicer formatting]

Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, covering [1] content.

Today

  • more on various sets: hyperplanes, half-spaces, polyhedra, balls, ellipses, norm balls, cone of PSD
  • generalize inequalities
  • operations that preserve convexity
  • separating and supporting hyperplanes.

Hyperplanes

Find some \( \Bx_0 \in \mathbb{R}^n \) such that \( \Ba^\T \Bx_0 = \Bb \), so

\begin{equation}\label{eqn:convexOptimizationLecture4:20}
\begin{aligned}
\setlr{ \Bx | \Ba^\T \Bx = \Bb }
&=
\setlr{ \Bx | \Ba^\T \Bx = \Ba^\T \Bx_0 } \\
&=
\setlr{ \Bx | \Ba^\T (\Bx – \Bx_0) } \\
&=
\Bx_0 + \Ba^\perp,
\end{aligned}
\end{equation}

where

\begin{equation}\label{eqn:convexOptimizationLecture4:40}
\Ba^\perp = \setlr{ \Bv | \Ba^\T \Bv = 0 }.
\end{equation}

fig. 1. Parallel hyperplanes.

 

Recall

\begin{equation}\label{eqn:convexOptimizationLecture4:60}
\Norm{\Bz}_\conj = \sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx} \le 1 }
\end{equation}

Denote the optimizer of above as \( \Bx^\conj \). By definition

\begin{equation}\label{eqn:convexOptimizationLecture4:80}
\Bz^\T \Bx^\conj \ge \Bz^\T \Bx \quad \forall \Bx, \Norm{\Bx} \le 1
\end{equation}

This defines a half space in which the unit ball

\begin{equation}\label{eqn:convexOptimizationLecture4:100}
\setlr{ \Bx | \Bz^\T (\Bx – \Bx^\conj \le 0 }
\end{equation}

Start with the \( l_1 \) norm, duals of \( l_1 \) is \( l_\infty \)

 

fig. 2. Half space containing unit ball.

Similar pic for \( l_\infty \), for which the dual is the \( l_1 \) norm, as sketched in fig. 3.  Here the optimizer point is at \( (1,1) \)

 

fig. 3. Half space containing the unit ball for l_infinity

and a similar pic for \( l_2 \), which is sketched in fig. 4.

 

fig. 4. Half space containing for l_2 unit ball.

Q: What was this optimizer point?

Polyhedra

\begin{equation}\label{eqn:convexOptimizationLecture4:120}
\begin{aligned}
\mathcal{P}
&= \setlr{ \Bx |
\Ba_j^\T \Bx \le \Bb_j, j \in [1,m],
\Bc_i^\T \Bx = \Bd_i, i \in [1,p]
} \\
&=
\setlr{ \Bx | A \Bx \le \Bb, C \Bx = d },
\end{aligned}
\end{equation}

where the final inequality and equality are component wise.

Proving \( \mathcal{P} \) is convex:

  • Pick \(\Bx_1 \in \mathcal{P}\), \(\Bx_2 \in \mathcal{P} \)
  • Pick any \(\theta \in [0,1]\)
  • Test \( \theta \Bx_1 + (1-\theta) \Bx_2 \). Is it in \(\mathcal{P}\)?

\begin{equation}\label{eqn:convexOptimizationLecture4:140}
\begin{aligned}
A \lr{ \theta \Bx_1 + (1-\theta) \Bx_2 }
&=
\theta A \Bx_1 + (1-\theta) A \Bx_2 \\
&\le
\theta \Bb + (1-\theta) \Bb \\
&=
\Bb.
\end{aligned}
\end{equation}

Balls

Euclidean ball for \( \Bx_c \in \mathbb{R}^n, r \in \mathbb{R} \)

\begin{equation}\label{eqn:convexOptimizationLecture4:160}
\mathcal{B}(\Bx_c, r)
= \setlr{ \Bx | \Norm{\Bx – \Bx_c}_2 \le r },
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture4:180}
\mathcal{B}(\Bx_c, r)
= \setlr{ \Bx | \lr{\Bx – \Bx_c}^\T \lr{\Bx – \Bx_c} \le r^2 }.
\end{equation}

Let \( \Bx_1, \Bx_2 \), \(\theta \in [0,1]\)

\begin{equation}\label{eqn:convexOptimizationLecture4:200}
\begin{aligned}
\Norm{ \theta \Bx_1 + (1-\theta) \Bx_2 – \Bx_c }_2
&=
\Norm{ \theta (\Bx_1 – \Bx_c) + (1-\theta) (\Bx_2 – \Bx_c) }_2 \\
&\le
\Norm{ \theta (\Bx_1 – \Bx_c)}_2 + \Norm{(1-\theta) (\Bx_2 – \Bx_c) }_2 \\
&=
\Abs{\theta} \Norm{ \Bx_1 – \Bx_c}_2 + \Abs{1 -\theta} \Norm{ \Bx_2 – \Bx_c }_2 \\
&=
\theta \Norm{ \Bx_1 – \Bx_c}_2 + \lr{1 -\theta} \Norm{ \Bx_2 – \Bx_c }_2 \\
&\le
\theta r + (1 – \theta) r \\
&= r
\end{aligned}
\end{equation}

Ellipse

\begin{equation}\label{eqn:convexOptimizationLecture4:220}
\mathcal{E}(\Bx_c, P)
=
\setlr{ \Bx | (\Bx – \Bx_c)^\T P^{-1} (\Bx – \Bx_c) \le 1 },
\end{equation}

where \( P \in S^n_{++} \).

  • Euclidean ball is an ellipse with \( P = I r^2 \)
  • Ellipse is image of Euclidean ball \( \mathcal{B}(0,1) \) under affine mapping.

 

fig. 5. Circle and ellipse.

Given

\begin{equation}\label{eqn:convexOptimizationLecture4:240}
F(\Bu) = P^{1/2} \Bu + \Bx_c
\end{equation}

\begin{equation}\label{eqn:convexOptimizationLecture4:260}
\begin{aligned}
\setlr{ F(\Bu) | \Norm{\Bu}_2 \le r }
&=
\setlr{ P^{1/2} \Bu + \Bx_c | \Bu^\T \Bu \le r^2 } \\
&=
\setlr{ \Bx | \Bx = P^{1/2} \Bu + \Bx_c, \Bu^\T \Bu \le r^2 } \\
&=
\setlr{ \Bx | \Bu = P^{-1/2} (\Bx – \Bx_c), \Bu^\T \Bu \le r^2 } \\
&=
\setlr{ \Bx | (\Bx – \Bx_c)^\T P^{-1} (\Bx – \Bx_c) \le r^2 }
\end{aligned}
\end{equation}

Geometry of an ellipse

Decomposition of positive definite matrix \( P \in S^n_{++} \subset S^n \) is:

\begin{equation}\label{eqn:convexOptimizationLecture4:280}
\begin{aligned}
P &= Q \textrm{diag}(\lambda_i) Q^\T \\
Q^\T Q &= 1
\end{aligned},
\end{equation}

where \( \lambda_i \in \mathbb{R}\), and \(\lambda_i > 0 \). The ellipse is defined by

\begin{equation}\label{eqn:convexOptimizationLecture4:300}
(\Bx – \Bx_c)^\T Q \textrm{diag}(1/\lambda_i) (\Bx – \Bx_c) Q \le r^2
\end{equation}

The term \( (\Bx – \Bx_c)^\T Q \) projects \( \Bx – \Bx_c \) onto the columns of \( Q \). Those columns are perpendicular since \( Q \) is an orthogonal matrix. Let

\begin{equation}\label{eqn:convexOptimizationLecture4:320}
\tilde{\Bx} = Q^\T (\Bx – \Bx_c),
\end{equation}

this shifts the origin around \( \Bx_c \) and \( Q \) rotates into a new coordinate system. The ellipse is therefore

\begin{equation}\label{eqn:convexOptimizationLecture4:340}
\tilde{\Bx}^\T
\begin{bmatrix}
\inv{\lambda_1} & & & \\
&\inv{\lambda_2} & & \\
& \ddots & \\
& & & \inv{\lambda_n}
\end{bmatrix}
\tilde{\Bx}
=
\sum_{i = 1}^n \frac{\tilde{x}_i^2}{\lambda_i} \le 1.
\end{equation}

An example is sketched for \( \lambda_1 > \lambda_2 \) below.

 

Ellipse with \( \lambda_1 > \lambda_2 \).

  • \( \lambda_i \) tells us length of the semi-major axis.
  • Larger \( \lambda_i \) means \( \tilde{x}_i^2 \) can be bigger and still satisfy constraint \( \le 1 \).
  • Volume of ellipse if proportional to \( \sqrt{ \det P } = \sqrt{ \prod_{i = 1}^n \lambda_i } \).
  • When any \( \lambda_i \rightarrow 0 \) a dimension is lost and the volume goes to zero. That removes the invertibility required.

Ellipses will be seen a lot in this course, since we are interested in “bowl” like geometries (and the ellipse is the image of a Euclidean ball).

Norm ball.

The norm ball

\begin{equation}\label{eqn:convexOptimizationLecture4:360}
\mathcal{B} = \setlr{ \Bx | \Norm{\Bx} \le 1 },
\end{equation}

is a convex set for all norms. Proof:

Take any \( \Bx, \By \in \mathcal{B} \)

\begin{equation}\label{eqn:convexOptimizationLecture4:380}
\Norm{ \theta \Bx + (1 – \theta) \By }
\le
\Abs{\theta} \Norm{ \Bx } + \Abs{1 – \theta} \Norm{ \By }
=
\theta \Norm{ \Bx } + \lr{1 – \theta} \Norm{ \By }
\lr
\theta + \lr{1 – \theta}
=
1.
\end{equation}

This is true for any p-norm \( 1 \le p \), \( \Norm{\Bx}_p = \lr{ \sum_{i = 1}^n \Abs{x_i}^p }^{1/p} \).

 

Norm ball.

The shape of a \( p < 1 \) norm unit ball is sketched below (lines connecting points in such a region can exit the region).

 

Cones

Recall that \( C \) is a cone if \( \forall \Bx \in C, \theta \ge 0, \theta \Bx \in C \).

Impt cone of PSD matrices

\begin{equation}\label{eqn:convexOptimizationLecture4:400}
\begin{aligned}
S^n &= \setlr{ X \in \mathbb{R}^{n \times n} | X = X^\T } \\
S^n_{+} &= \setlr{ X \in S^n | \Bv^\T X \Bv \ge 0, \quad \forall v \in \mathbb{R}^n } \\
S^n_{++} &= \setlr{ X \in S^n_{+} | \Bv^\T X \Bv > 0, \quad \forall v \in \mathbb{R}^n } \\
\end{aligned}
\end{equation}

These have respectively

  • \( \lambda_i \in \mathbb{R} \)
  • \( \lambda_i \in \mathbb{R}_{+} \)
  • \( \lambda_i \in \mathbb{R}_{++} \)

\( S^n_{+} \) is a cone if:

\( X \in S^n_{+}\), then \( \theta X \in S^n_{+}, \quad \forall \theta \ge 0 \)

\begin{equation}\label{eqn:convexOptimizationLecture4:420}
\Bv^\T (\theta X) \Bv
= \theta \Bv^\T \Bv
\ge 0,
\end{equation}

since \( \theta \ge 0 \) and because \( X \in S^n_{+} \).

Shorthand:

\begin{equation}\label{eqn:convexOptimizationLecture4:440}
\begin{aligned}
X &\in S^n_{+} \Rightarrow X \succeq 0
X &\in S^n_{++} \Rightarrow X \succ 0.
\end{aligned}
\end{equation}

Further \( S^n_{+} \) is a convex cone.

Let \( A \in S^n_{+} \), \( B \in S^n_{+} \), \( \theta_1, \theta_2 \ge 0, \theta_1 + \theta_2 = 1 \), or \( \theta_2 = 1 – \theta_1 \).

Show that \( \theta_1 A + \theta_2 B \in S^n_{+} \) :

\begin{equation}\label{eqn:convexOptimizationLecture4:460}
\Bv^\T \lr{ \theta_1 A + \theta_2 B } \Bv
=
\theta_1 \Bv^\T A \Bv
+\theta_2 \Bv^\T B \Bv
\ge 0,
\end{equation}

since \( \theta_1 \ge 0, \theta_2 \ge 0, \Bv^\T A \Bv \ge 0, \Bv^\T B \Bv \ge 0 \).

 

fig. 8. Cone.

Inequalities:

Start with a proper cone \( K \subseteq \mathbb{R}^n \)

  • closed, convex
  • non-empty interior (“solid”)
  • “pointed” (contains no lines)

The \( K \) defines a generalized inequality in \R{n} defined as “\(\le_K\)”

Interpreting

\begin{equation}\label{eqn:convexOptimizationLecture4:480}
\begin{aligned}
\Bx \le_K \By &\leftrightarrow \By – \Bx \in K
\Bx \end{aligned}
\end{equation}

Why pointed? Want if \( \Bx \le_K \By \) and \( \By \le_K \Bx \) with this \( K \) is a half space.

Example:1: \( K = \mathbb{R}^n_{+}, \Bx \in \mathbb{R}^n, \By \in \mathbb{R}^n \)

 

fig. 12. K is non-negative “orthant”

\begin{equation}\label{eqn:convexOptimizationLecture4:500}
\Bx \le_K \By \Rightarrow \By – \Bx \in K
\end{equation}

say:

\begin{equation}\label{eqn:convexOptimizationLecture4:520}
\begin{bmatrix}
y_1 – x_1
y_2 – x_2
\end{bmatrix}
\in R^2_{+}
\end{equation}

Also:

\begin{equation}\label{eqn:convexOptimizationLecture4:540}
K = R^1_{+}
\end{equation}

(pointed, since it contains no rays)

\begin{equation}\label{eqn:convexOptimizationLecture4:560}
\Bx \le_K \By ,
\end{equation}

with respect to \( K = \mathbb{R}^n_{+} \) means that \( x_i \le y_i \) for all \( i \in [1,n]\).

Example:2: For \( K = PSD \subseteq S^n \),

\begin{equation}\label{eqn:convexOptimizationLecture4:580}
\Bx \le_K \By ,
\end{equation}

means that

\begin{equation}\label{eqn:convexOptimizationLecture4:600}
\By – \Bx \in K = S^n_{+}.
\end{equation}

  • Difference \( \By – \Bx \) is always in \( S \)
  • check if in \( K \) by checking if all eigenvalues \( \ge 0 \).
  • \( S^n_{++} \) is the interior of \( S^n_{+} \).

Interpretation:

\begin{equation}\label{eqn:convexOptimizationLecture4:620}
\begin{aligned}
\Bx \le_K \By &\leftrightarrow \By – \Bx \in K \\
\Bx \end{aligned}
\end{equation}

We’ll use these with vectors and matrices so often the \( K \) subscript will often be dropped, writing instead (for vectors)

\begin{equation}\label{eqn:convexOptimizationLecture4:640}
\begin{aligned}
\Bx \le \By &\leftrightarrow \By – \Bx \in \mathbb{R}^n_{+} \\
\Bx < \By &\leftrightarrow \By – \Bx \in \textrm{int} \mathbb{R}^n_{++}
\end{aligned}
\end{equation}

and for matrices

\begin{equation}\label{eqn:convexOptimizationLecture4:660}
\begin{aligned}
\Bx \le \By &\leftrightarrow \By – \Bx \in S^n_{+} \\
\Bx < \By &\leftrightarrow \By – \Bx \in \textrm{int} S^n_{++}.
\end{aligned}
\end{equation}

Intersection

Take the intersection of (perhaps infinitely many) sets \( S_\alpha \):

If \( S_\alpha \) is (affine,convex, conic) for all \( \alpha \in A \) then

\begin{equation}\label{eqn:convexOptimizationLecture4:680}
\cap_\alpha S_\alpha
\end{equation}

is (affine,convex, conic). To prove in homework:

\begin{equation}\label{eqn:convexOptimizationLecture4:700}
\mathcal{P} = \setlr{ \Bx | \Ba_i^\T \Bx \le \Bb_i, \Bc_j^\T \Bx = \Bd_j, \quad \forall i \cdots j }
\end{equation}

This is convex since the intersection of a bunch of hyperplane and half space constraints.

  1. If \( S \subseteq \mathbb{R}^n \) is convex then\begin{equation}\label{eqn:convexOptimizationLecture4:720}
    F(S) = \setlr{ F(\Bx) | \Bx \in S }
    \end{equation}is convex.
  2. If \( S \subseteq \mathbb{R}^m \) then\begin{equation}\label{eqn:convexOptimizationLecture4:740}
    F^{-1}(S) = \setlr{ \Bx | F(\Bx) \in S }
    \end{equation}is convex. Such a mapping is sketched in fig. 14.

 

fig. 14. Mapping functions of sets.

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

superkill!

January 20, 2017 Mainframe , , , ,

I was amused to find the following in the z/OS C/C++ runtime library reference

#include <signal.h>

int __superkill(pid_t pid);

where the documentation includes:

“The __superkill() function generates a more robust version of the SIGKILL signal to
the process with pid as the process ID. The SIGKILL will be able to break through
almost all of the current signal deterrents that can be an obstacle to the normal
delivery of a SIGKILL and the resulting termination of the target process.”

The obvious question, not mentioned in the documentation, is whether or not this API can kill zombies?

ECE1505H Convex Optimization. Lecture 3: Matrix functions, SVD, and types of Sets. Taught by Prof. Stark Draper

January 19, 2017 ece1505 , , , , , , , , , , , , , , , , , , , ,

[Click here for a PDF of this post with nicer formatting]

Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper.

Matrix inner product

Given real matrices \( X, Y \in \mathbb{R}^{m\times n} \), one possible matrix inner product definition is

\begin{equation}\label{eqn:convexOptimizationLecture3:20}
\begin{aligned}
\innerprod{X}{Y}
&= \textrm{Tr}( X^\T Y) \\
&= \textrm{Tr} \lr{ \sum_{k = 1}^m X_{ki} Y_{kj} } \\
&= \sum_{k = 1}^m \sum_{j = 1}^n X_{kj} Y_{kj} \\
&= \sum_{i = 1}^m \sum_{j = 1}^n X_{ij} Y_{ij}.
\end{aligned}
\end{equation}

This inner product induces a norm on the (matrix) vector space, called the Frobenius norm

\begin{equation}\label{eqn:convexOptimizationLecture3:40}
\begin{aligned}
\Norm{X }_F
&= \textrm{Tr}( X^\T X) \\
&= \sqrt{ \innerprod{X}{X} } \\
&=
\sum_{i = 1}^m \sum_{j = 1}^n X_{ij}^2.
\end{aligned}
\end{equation}

Range, nullspace.

Definition: Range: Given \( A \in \mathbb{R}^{m \times n} \), the range of A is the set:

\begin{equation*}
\mathcal{R}(A) = \setlr{ A \Bx | \Bx \in \mathbb{R}^n }.
\end{equation*}

Definition: Nullspace: Given \( A \in \mathbb{R}^{m \times n} \), the nullspace of A is the set:

\begin{equation*}
\mathcal{N}(A) = \setlr{ \Bx | A \Bx = 0 }.
\end{equation*}

SVD.

To understand operation of \( A \in \mathbb{R}^{m \times n} \), a representation of a linear transformation from \R{n} to \R{m}, decompose \( A \) using the singular value decomposition (SVD).

Definition: SVD: Given \( A \in \mathbb{R}^{m \times n} \), an operator on \( \Bx \in \mathbb{R}^n \), a decomposition of the following form is always possible

\begin{equation*}
\begin{aligned}
A &= U \Sigma V^\T \\
U &\in \mathbb{R}^{m \times r} \\
V &\in \mathbb{R}^{n \times r},
\end{aligned}
\end{equation*}

where \( r \) is the rank of \(A\), and both \( U \) and \( V \) are orthogonal

\begin{equation*}
\begin{aligned}
U^\T U &= I \in \mathbb{R}^{r \times r} \\
V^\T V &= I \in \mathbb{R}^{r \times r}.
\end{aligned}
\end{equation*}

Here \( \Sigma = \textrm{diag}( \sigma_1, \sigma_2, \cdots, \sigma_r ) \), is a diagonal matrix of “singular” values, where

\begin{equation*}
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r.
\end{equation*}

For simplicity consider square case \( m = n \)

\begin{equation}\label{eqn:convexOptimizationLecture3:100}
A \Bx = \lr{ U \Sigma V^\T } \Bx.
\end{equation}

The first product \( V^\T \Bx \) is a rotation, which can be checked by looking at the length

\begin{equation}\label{eqn:convexOptimizationLecture3:120}
\begin{aligned}
\Norm{ V^\T \Bx}_2
&= \sqrt{ \Bx^\T V V^\T \Bx } \\
&= \sqrt{ \Bx^\T \Bx } \\
&= \Norm{ \Bx }_2,
\end{aligned}
\end{equation}

which shows that the length of the vector is unchanged after application of the linear transformation represented by \( V^\T \) so that operation must be a rotation.

Similarly the operation of \( U \) on \( \Sigma V^\T \Bx \) also must be a rotation. The operation \( \Sigma = [\sigma_i]_i \) applies a scaling operation to each component of the vector \( V^\T \Bx \).

All linear (square) transformations can therefore be thought of as a rotate-scale-rotate operation. Often the \( A \) of interest will be symmetric \( A = A^\T \).

Set of symmetric matrices

Let \( S^n \) be the set of real, symmetric \( n \times n \) matrices.

Theorem: Spectral theorem: When \( A \in S^n \) then it is possible to factor \( A \) as

\begin{equation*}
A = Q \Lambda Q^\T,
\end{equation*}

where \( Q \) is an orthogonal matrix, and \( \Lambda = \textrm{diag}( \lambda_1, \lambda_2, \cdots \lambda_n)\). Here \( \lambda_i \in \mathbb{R} \, \forall i \) are the (real) eigenvalues of \( A \).

A real symmetric matrix \( A \in S^n\) is “positive semi-definite” if

\begin{equation*}
\Bv^\T A \Bv \ge 0 \qquad\forall \Bv \in \mathbb{R}^n, \Bv \ne 0,
\end{equation*}
and is “positive definite” if

\begin{equation*}
\Bv^\T A \Bv > 0 \qquad\forall \Bv \in \mathbb{R}^n, \Bv \ne 0.
\end{equation*}

The set of such matrices is denoted \( S^n_{+} \), and \( S^n_{++} \) respectively.

Consider \( A \in S^n_{+} \) (or \( S^n_{++} \) )

\begin{equation}\label{eqn:convexOptimizationLecture3:200}
A = Q \Lambda Q^\T,
\end{equation}

possible since the matrix is symmetric. For such a matrix

\begin{equation}\label{eqn:convexOptimizationLecture3:220}
\begin{aligned}
\Bv^\T A \Bv
&=
\Bv^\T Q \Lambda A^\T \Bv \\
&=
\Bw^\T \Lambda \Bw,
\end{aligned}
\end{equation}

where \( \Bw = A^\T \Bv \). Such a product is

\begin{equation}\label{eqn:convexOptimizationLecture3:240}
\Bv^\T A \Bv
=
\sum_{i = 1}^n \lambda_i w_i^2.
\end{equation}

So, if \( \lambda_i \ge 0 \) (\(\lambda_i > 0 \) ) then \( \sum_{i = 1}^n \lambda_i w_i^2 \) is non-negative (positive) \( \forall \Bw \in \mathbb{R}^n, \Bw \ne 0 \). Since \( \Bw \) is just a rotated version of \( \Bv \) this also holds for all \( \Bv \). A necessary and sufficient condition for \( A \in S^n_{+} \) (\( S^n_{++} \) ) is \( \lambda_i \ge 0 \) (\(\lambda_i > 0\)).

Square root of positive semi-definite matrix

Real symmetric matrix power relationships such as

\begin{equation}\label{eqn:convexOptimizationLecture3:260}
A^2
=
Q \Lambda Q^\T
Q \Lambda Q^\T
=
Q \Lambda^2
Q^\T
,
\end{equation}

or more generally \( A^k = Q \Lambda^k Q^\T,\, k \in \mathbb{Z} \), can be further generalized to non-integral powers. In particular, the square root (non-unique) of a square matrix can be written

\begin{equation}\label{eqn:convexOptimizationLecture3:280}
A^{1/2} = Q
\begin{bmatrix}
\sqrt{\lambda_1} & & & \\
& \sqrt{\lambda_2} & & \\
& & \ddots & \\
& & & \sqrt{\lambda_n} \\
\end{bmatrix}
Q^\T,
\end{equation}

since \( A^{1/2} A^{1/2} = A \), regardless of the sign picked for the square roots in question.

Functions of matrices

Consider \( F : S^n \rightarrow \mathbb{R} \), and define

\begin{equation}\label{eqn:convexOptimizationLecture3:300}
F(X) = \log \det X,
\end{equation}

Here \( \textrm{dom} F = S^n_{++} \). The task is to find \( \spacegrad F \), which can be done by looking at the perturbation \( \log \det ( X + \Delta X ) \)

\begin{equation}\label{eqn:convexOptimizationLecture3:320}
\begin{aligned}
\log \det ( X + \Delta X )
&=
\log \det ( X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2}) X^{1/2} ) \\
&=
\log \det ( X (I + X^{-1/2} \Delta X X^{-1/2}) ) \\
&=
\log \det X + \log \det (I + X^{-1/2} \Delta X X^{-1/2}).
\end{aligned}
\end{equation}

Let \( X^{-1/2} \Delta X X^{-1/2} = M \) where \( \lambda_i \) are the eigenvalues of \( M : M \Bv = \lambda_i \Bv \) when \( \Bv \) is an eigenvector of \( M \). In particular

\begin{equation}\label{eqn:convexOptimizationLecture3:340}
(I + M) \Bv =
(1 + \lambda_i) \Bv,
\end{equation}

where \( 1 + \lambda_i \) are the eigenvalues of the \( I + M \) matrix. Since the determinant is the product of the eigenvalues, this gives

\begin{equation}\label{eqn:convexOptimizationLecture3:360}
\begin{aligned}
\log \det ( X + \Delta X )
&=
\log \det X +
\log \prod_{i = 1}^n (1 + \lambda_i) \\
&=
\log \det X +
\sum_{i = 1}^n \log (1 + \lambda_i).
\end{aligned}
\end{equation}

If \( \lambda_i \) are sufficiently “small”, then \( \log ( 1 + \lambda_i ) \approx \lambda_i \), giving

\begin{equation}\label{eqn:convexOptimizationLecture3:380}
\log \det ( X + \Delta X )
=
\log \det X +
\sum_{i = 1}^n \lambda_i
\approx
\log \det X +
\textrm{Tr}( X^{-1/2} \Delta X X^{-1/2} ).
\end{equation}

Since
\begin{equation}\label{eqn:convexOptimizationLecture3:400}
\textrm{Tr}( A B ) = \textrm{Tr}( B A ),
\end{equation}

this trace operation can be written as

\begin{equation}\label{eqn:convexOptimizationLecture3:420}
\log \det ( X + \Delta X )
\approx
\log \det X +
\textrm{Tr}( X^{-1} \Delta X )
=
\log \det X +
\innerprod{ X^{-1}}{\Delta X},
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:440}
\spacegrad F(X) = X^{-1}.
\end{equation}

To check this, consider the simplest example with \( X \in \mathbb{R}^{1 \times 1} \), where we have

\begin{equation}\label{eqn:convexOptimizationLecture3:460}
\frac{d}{dX} \lr{ \log \det X } = \frac{d}{dX} \lr{ \log X } = \inv{X} = X^{-1}.
\end{equation}

This is a nice example demonstrating how the gradient can be obtained by performing a first order perturbation of the function. The gradient can then be read off from the result.

Second order perturbations

  • To get first order approximation found the part that varied linearly in \( \Delta X \).
  • To get the second order part, perturb \( X^{-1} \) by \( \Delta X \) and see how that perturbation varies in \( \Delta X \).

For \( G(X) = X^{-1} \), this is

\begin{equation}\label{eqn:convexOptimizationLecture3:480}
\begin{aligned}
(X + \Delta X)^{-1}
&=
\lr{ X^{1/2} (I + X^{-1/2} \Delta X X^{-1/2} ) X^{1/2} }^{-1} \\
&=
X^{-1/2} (I + X^{-1/2} \Delta X X^{-1/2} )^{-1} X^{-1/2}
\end{aligned}
\end{equation}

To be proven in the homework (for “small” A)

\begin{equation}\label{eqn:convexOptimizationLecture3:500}
(I + A)^{-1} \approx I – A.
\end{equation}

This gives

\begin{equation}\label{eqn:convexOptimizationLecture3:520}
\begin{aligned}
(X + \Delta X)^{-1}
&=
X^{-1/2} (I – X^{-1/2} \Delta X X^{-1/2} ) X^{-1/2} \\
&=
X^{-1} – X^{-1} \Delta X X^{-1},
\end{aligned}
\end{equation}

or

\begin{equation}\label{eqn:convexOptimizationLecture3:800}
\begin{aligned}
G(X + \Delta X)
&= G(X) + (D G) \Delta X \\
&= G(X) + (\spacegrad G)^\T \Delta X,
\end{aligned}
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:820}
(\spacegrad G)^\T \Delta X
=
– X^{-1} \Delta X X^{-1}.
\end{equation}

The Taylor expansion of \( F \) to second order is

\begin{equation}\label{eqn:convexOptimizationLecture3:840}
F(X + \Delta X)
=
F(X)
+
\textrm{Tr} \lr{ (\spacegrad F)^\T \Delta X}
+
\inv{2}
\lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X}.
\end{equation}

The first trace can be expressed as an inner product

\begin{equation}\label{eqn:convexOptimizationLecture3:860}
\begin{aligned}
\textrm{Tr} \lr{ (\spacegrad F)^\T \Delta X}
&=
\innerprod{ \spacegrad F }{\Delta X} \\
&=
\innerprod{ X^{-1} }{\Delta X}.
\end{aligned}
\end{equation}

The second trace also has the structure of an inner product

\begin{equation}\label{eqn:convexOptimizationLecture3:880}
\begin{aligned}
(\Delta X)^\T (\spacegrad^2 F) \Delta X
&=
\textrm{Tr} \lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X} \\
&=
\innerprod{ (\spacegrad^2 F)^\T \Delta X }{\Delta X},
\end{aligned}
\end{equation}

where a no-op trace could be inserted in the second order term since that quadratic form is already a scalar. This \( (\spacegrad^2 F)^\T \Delta X \) term has essentially been found implicitly by performing the linear variation of \( \spacegrad F \) in \( \Delta X \), showing that we must have

\begin{equation}\label{eqn:convexOptimizationLecture3:900}
\textrm{Tr} \lr{ (\Delta X)^\T (\spacegrad^2 F) \Delta X}
=
\innerprod{ – X^{-1} \Delta X X^{-1} }{\Delta X},
\end{equation}

so
\begin{equation}\label{eqn:convexOptimizationLecture3:560}
F( X + \Delta X) = F(X) +
\innerprod{X^{-1}}{\Delta X}
+\inv{2} \innerprod{-X^{-1} \Delta X X^{-1}}{\Delta X},
\end{equation}

or
\begin{equation}\label{eqn:convexOptimizationLecture3:580}
\log \det ( X + \Delta X) = \log \det X +
\textrm{Tr}( X^{-1} \Delta X )
– \inv{2} \textrm{Tr}( X^{-1} \Delta X X^{-1} \Delta X ).
\end{equation}

Convex Sets

  • Types of sets: Affine, convex, cones
  • Examples: Hyperplanes, polyhedra, balls, ellipses, norm balls, cone of PSD matrices.

Definition: Affine set:

A set \( C \subseteq \mathbb{R}^n \) is affine if \( \forall \Bx_1, \Bx_2 \in C \) then

\begin{equation*}
\theta \Bx_1 + (1 -\theta) \Bx_2 \in C, \qquad \forall \theta \in \mathbb{R}.
\end{equation*}

The affine sum above can
be rewritten as

\begin{equation}\label{eqn:convexOptimizationLecture3:600}
\Bx_2 + \theta (\Bx_1 – \Bx_2).
\end{equation}

Since \( \theta \) is a scaling, this is the line containing \( \Bx_2 \) in the direction between \( \Bx_1 \) and \( \Bx_2 \).

Observe that the solution to a set of linear equations

\begin{equation}\label{eqn:convexOptimizationLecture3:620}
C = \setlr{ \Bx | A \Bx = \Bb },
\end{equation}

is an affine set. To check, note that

\begin{equation}\label{eqn:convexOptimizationLecture3:640}
\begin{aligned}
A (\theta \Bx_1 + (1 – \theta) \Bx_2)
&=
\theta A \Bx_1 + (1 – \theta) A \Bx_2 \\
&=
\theta \Bb + (1 – \theta) \Bb \\
&= \Bb.
\end{aligned}
\end{equation}

Definition: Affine combination: An affine combination of points \( \Bx_1, \Bx_2, \cdots \Bx_n \) is

\begin{equation*}
\sum_{i = 1}^n \theta_i \Bx_i,
\end{equation*}

such that for \( \theta_i \in \mathbb{R} \)

\begin{equation*}
\sum_{i = 1}^n \theta_i = 1.
\end{equation*}

An affine set contains all affine combinations of points in the set. Examples of a couple affine sets are sketched in fig 1.1

For comparison, a couple of non-affine sets are sketched in fig 1.2

 

Definition: Convex set: A set \( C \subseteq \mathbb{R}^n \) is convex if \( \forall \Bx_1, \Bx_2 \in C \) and \( \forall \theta \in \mathbb{R}, \theta \in [0,1] \), the combination

\begin{equation}\label{eqn:convexOptimizationLecture3:700}
\theta \Bx_1 + (1 – \theta) \Bx_2 \in C.
\end{equation}

Definition: Convex combination: A convex combination of \( \Bx_1, \Bx_2, \cdots \Bx_n \) is

\begin{equation*}
\sum_{i = 1}^n \theta_i \Bx_i,
\end{equation*}

such that \( \forall \theta_i \ge 0 \)

\begin{equation*}
\sum_{i = 1}^n \theta_i = 1
\end{equation*}

Definition: Convex hull: Convex hull of a set \( C \) is a set of all convex combinations of points in \(C\), denoted

\begin{equation}\label{eqn:convexOptimizationLecture3:720}
\textrm{conv}(C) = \setlr{ \sum_{i=1}^n \theta_i \Bx_i | \Bx_i \in C, \theta_i \ge 0, \sum_{i=1}^n \theta_i = 1 }.
\end{equation}

A non-convex set can be converted into a convex hull by filling in all the combinations of points connecting points in the set, as sketched in fig 1.3.

Definition: Cone: A set \(C\) is a cone if \( \forall \Bx \in C \) and \( \forall \theta \ge 0 \) we have \( \theta \Bx \in C\).

This scales out if \(\theta > 1\) and scales in if \(\theta < 1\).

A convex cone is a cone that is also a convex set. A conic combination is

\begin{equation*}
\sum_{i=1}^n \theta_i \Bx_i, \theta_i \ge 0.
\end{equation*}

A convex and non-convex 2D cone is sketched in fig. 1.4

A comparison of properties for different set types is tabulated in table 1.1

Hyperplanes and half spaces

Definition: Hyperplane: A hyperplane is defined by

\begin{equation*}
\setlr{ \Bx | \Ba^\T \Bx = \Bb, \Ba \ne 0 }.
\end{equation*}

A line and plane are examples of this general construct as sketched in
fig. 1.5

An alternate view is possible should one
find any specific \( \Bx_0 \) such that \( \Ba^\T \Bx_0 = \Bb \)

\begin{equation}\label{eqn:convexOptimizationLecture3:740}
\setlr{\Bx | \Ba^\T \Bx = b }
=
\setlr{\Bx | \Ba^\T (\Bx -\Bx_0) = 0 }
\end{equation}

This shows that \( \Bx – \Bx_0 = \Ba^\perp \) is perpendicular to \( \Ba \), or

\begin{equation}\label{eqn:convexOptimizationLecture3:780}
\Bx
=
\Bx_0 + \Ba^\perp.
\end{equation}

This is the subspace perpendicular to \( \Ba \) shifted by \(\Bx_0\), subject to \( \Ba^\T \Bx_0 = \Bb \). As a set

\begin{equation}\label{eqn:convexOptimizationLecture3:760}
\Ba^\perp = \setlr{ \Bv | \Ba^\T \Bv = 0 }.
\end{equation}

Half space

Definition: Half space: The half space is defined as
\begin{equation*}
\setlr{ \Bx | \Ba^\T \Bx = \Bb }
= \setlr{ \Bx | \Ba^\T (\Bx – \Bx_0) \le 0 }.
\end{equation*}

This can also be expressed as \( \setlr{ \Bx | \innerprod{ \Ba }{\Bx – \Bx_0 } \le 0 } \).

ECE1505H Convex Optimization. Lecture 2: Mathematical background. Taught by Prof. Stark Draper

January 17, 2017 ece1505 , , , , , , , ,

[Click here for a PDF of this post with nicer formatting]

Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

Topics

  • Calculus: Derivatives and Jacobians, Gradients, Hessians, approximation functions.
  • Linear algebra, Matrices, decompositions, …

Norms

Vector space:

A set of elements (vectors) that is closed under vector addition and scaling.

This generalizes the directed arrow concept of vector space (fig. 1) that is familiar from geometry.

fig. 1. Vector addition.

Normed vector spaces:

A vector space with a notion of length of any single vector, the “norm”.

Inner product space:
A normed vector space with a notion of a real angle between any pair of vectors.

This course has a focus on optimization in \R{n}. Complex spaces in the context of this course can be considered with a mapping \( \text{\C{n}} \rightarrow \mathbb{R}^{2 n} \).

Norm:
A norm is a function operating on a vector

\begin{equation*}
\Bx = ( x_1, x_2, \cdots, x_n )
\end{equation*}

that provides a mapping

\begin{equation*}
\Norm{ \cdot } : \mathbb{R}^{n} \rightarrow \mathbb{R},
\end{equation*}

where

  • \( \Norm{ \Bx } \ge 0 \)
  • \( \Norm{ \Bx } = 0 \qquad \iff \Bx = 0 \)
  • \( \Norm{ t \Bx } = \Abs{t} \Norm{ \Bx } \)
  • \( \Norm{ \Bx + \By } \le \Norm{ \Bx } + \Norm{\By} \). This is the triangle inequality.

Example: Euclidean norm

\begin{equation}\label{eqn:convex-optimizationLecture2:24}
\Norm{\Bx} = \sqrt{ \sum_{i = 1}^n x_i^2 }
\end{equation}

Example: \(l_p\)-norms

\begin{equation}\label{eqn:convex-optimizationLecture2:44}
\Norm{\Bx}_p = \lr{ \sum_{i = 1}^n \Abs{x_i}^p }^{1/p}.
\end{equation}

For \( p = 1 \), this is

\begin{equation}\label{eqn:convex-optimizationLecture2:64}
\Norm{\Bx}_1 = \sum_{i = 1}^n \Abs{x_i},
\end{equation}

For \( p = 2 \), this is the Euclidean norm \ref{eqn:convex-optimizationLecture2:24}.
For \( p = \infty \), this is

\begin{equation}\label{eqn:convex-optimizationLecture2:324}
\Norm{\Bx}_\infty = \max_{i = 1}^n \Abs{x_i}.
\end{equation}

Unit ball:

\begin{equation*}
\setlr{ \Bx | \Norm{\Bx} \le 1 }
\end{equation*}

The regions of the unit ball under the \( l_1, l_2, and l_\infty \) norms are plotted in fig. 2.

fig. 2. Some unit ball regions.

The \( l_2 \) norm is not only familiar, but can be “induced” by an inner product

\begin{equation}\label{eqn:convex-optimizationLecture2:84}
\left\langle \Bx, \By \right\rangle = \Bx^\T \By = \sum_{i = 1}^n x_i y_i,
\end{equation}

which is not true for all norms. The norm induced by this inner product is

\begin{equation}\label{eqn:convex-optimizationLecture2:104}
\Norm{\Bx}_2 = \sqrt{ \left\langle \Bx, \By \right\rangle }
\end{equation}

Inner product spaces have a notion of angle (fig. 3) given by

\begin{equation}\label{eqn:convex-optimizationLecture2:124}
\left\langle \Bx, \By \right\rangle = \Norm{\Bx} \Norm{\By} \cos \theta,
\end{equation}

 

fig. 3. Inner product induced angle.

and always satisfy the Cauchy-Schwartz inequality

\begin{equation}\label{eqn:convex-optimizationLecture2:144}
\left\langle \Bx, \By \right\rangle \le \Norm{\Bx}_2 \Norm{\By}_2.
\end{equation}

In an inner product space we say \( \Bx \) and \( \By \) are orthogonal vectors \( \Bx \perp \By \) if \( \left\langle \Bx, \By \right\rangle = 0 \), as sketched in fig. 4.

 

fig. 4. Orthogonality.

Dual norm

Let \( \Norm{ \cdot } \) be a norm in \R{n}. The “dual” norm \( \Norm{ \cdot }_\conj \) is defined as

\begin{equation*}
\Norm{\Bz}_\conj = \sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx} \le 1 }.
\end{equation*}

where \( \sup \) is roughly the “least upper bound”.
\index{sup}

This is a limit over the unit ball of \( \Norm{\cdot} \).

\( l_2 \) dual

Dual of the \( l_2 \) is the \( l_2 \) norm.

fig. 5. l_2 dual norm determination.

 

Proof:

\begin{equation}\label{eqn:convex-optimizationLecture2:164}
\begin{aligned}
\Norm{\Bz}_\conj
&= \sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx}_2 \le 1 } \\
&= \sup_\Bx \setlr{ \Norm{\Bz}_2 \Norm{\Bx}_2 \cos\theta | \Norm{\Bx}_2 \le 1 } \\
&\le \sup_\Bx \setlr{ \Norm{\Bz}_2 \Norm{\Bx}_2 | \Norm{\Bx}_2 \le 1 } \\
&\le
\Norm{\Bz}_2
\Norm{
\frac{\Bz}{ \Norm{\Bz}_2 }
}_2 \\
&=
\Norm{\Bz}_2.
\end{aligned}
\end{equation}

\( l_1 \) dual

For \( l_1 \), the dual is the \( l_\infty \) norm. Proof:

\begin{equation}\label{eqn:convex-optimizationLecture2:184}
\Norm{\Bz}_\conj
=
\sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx}_1 \le 1 },
\end{equation}

but
\begin{equation}\label{eqn:convex-optimizationLecture2:204}
\Bz^\T \Bx
=
\sum_{i=1}^n z_i x_i \le
\Abs{
\sum_{i=1}^n z_i x_i
}
\le
\sum_{i=1}^n \Abs{z_i x_i },
\end{equation}

so
\begin{equation}\label{eqn:convex-optimizationLecture2:224}
\begin{aligned}
\Norm{\Bz}_\conj
&=
\sum_{i=1}^n \Abs{z_i}\Abs{ x_i } \\
&\le \lr{ \max_{j=1}^n \Abs{z_j} }
\sum_{i=1}^n \Abs{ x_i } \\
&\le \lr{ \max_{j=1}^n \Abs{z_j} } \\
&=
\Norm{\Bz}_\infty.
\end{aligned}
\end{equation}

 

\( l_\infty \) dual

.

fig. 6. l_1 dual norm determination.

fig. 7. l_\infinity dual norm determination.

\begin{equation}\label{eqn:convex-optimizationLecture2:244}
\Norm{\Bz}_\conj
=
\sup_\Bx \setlr{ \Bz^\T \Bx | \Norm{\Bx}_\infty \le 1 }.
\end{equation}

Here
\begin{equation}\label{eqn:convex-optimizationLecture2:264}
\begin{aligned}
\Bz^\T \Bx
&=
\sum_{i=1}^n z_i x_i \\
&\le
\sum_{i=1}^n \Abs{z_i}\Abs{ x_i } \\
&\le
\lr{ \max_j \Abs{ x_j } }
\sum_{i=1}^n \Abs{z_i} \\
&=
\Norm{\Bx}_\infty
\sum_{i=1}^n \Abs{z_i}.
\end{aligned}
\end{equation}

So
\begin{equation}\label{eqn:convex-optimizationLecture2:284}
\Norm{\Bz}_\conj
\le
\sum_{i=1}^n \Abs{z_i}
=
\Norm{\Bz}_1.
\end{equation}

Statement from the lecture: I’m not sure where this fits:

\begin{equation}\label{eqn:convex-optimizationLecture2:304}
x_i^\conj
=
\left\{
\begin{array}{l l}
+1 & \quad \mbox{\( z_i \ge 0 \)} \\
-1 & \quad \mbox{\( z_i \le 0 \)}
\end{array}
\right.
\end{equation}

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.