[Click here for a PDF of this post with nicer formatting]

### Disclaimer

Peeter’s lecture notes from class. These may be incoherent and rough.

These are notes for the UofT course ECE1505H, Convex Optimization, taught by Prof. Stark Draper, from [1].

### Today

- Local and global optimality
- Compositions of functions
- Examples

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:20}

\begin{aligned}

F(x) &= x^2 \\

F”(x) &= 2 > 0

\end{aligned}

\end{equation}

strictly convex.

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:40}

\begin{aligned}

F(x) &= x^3 \\

F”(x) &= 6 x.

\end{aligned}

\end{equation}

Not always non-negative, so not convex. However \( x^3 \) is convex on \( \textrm{dom} F = \mathbb{R}_{+} \).

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:60}

\begin{aligned}

F(x) &= x^\alpha \\

F'(x) &= \alpha x^{\alpha-1} \\

F”(x) &= \alpha(\alpha-1) x^{\alpha-2}.

\end{aligned}

\end{equation}

This is convex on \( \mathbb{R}_{+} \), if \( \alpha \ge 1 \), or \( \alpha \le 0 \).

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:80}

\begin{aligned}

F(x) &= \log x \\

F'(x) &= \inv{x} \\

F”(x) &= -\inv{x^2} \le 0

\end{aligned}

\end{equation}

This is concave.

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:100}

\begin{aligned}

F(x) &= x\log x \\

F'(x) &= \log x + x \inv{x} = 1 + \log x \\

F”(x) &= \inv{x}

\end{aligned}

\end{equation}

This is strictly convex on

\( \mathbb{R}_{++} \), where

\( F”(x) \ge 0 \).

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:120}

\begin{aligned}

F(x) &= e^{\alpha x} \\

F'(x) &= \alpha e^{\alpha x} \\

F”(x) &= \alpha^2 e^{\alpha x} \ge 0

\end{aligned}

\end{equation}

Such functions are plotted in fig. 2, and are convex function for all \( \alpha \).

### Example:

For symmetric \( P \in S^n \)

\begin{equation}\label{eqn:convexOptimizationLecture7:140}

\begin{aligned}

F(\Bx) &= \Bx^\T P \Bx + 2 \Bq^\T \Bx + r \\

\spacegrad F &= (P + P^\T) \Bx + 2 \Bq = 2 P \Bx + 2 \Bq \\

\spacegrad^2 F &= 2 P.

\end{aligned}

\end{equation}

This is convex(concave) if \( P \ge 0 \) (\( P \le 0\)).

### Example:

A quadratic function

\begin{equation}\label{eqn:convexOptimizationLecture7:780}

F(x, y) = x^2 + y^2 + 3 x y,

\end{equation}

that is neither convex nor concave is plotted in fig 3.

This function can be put in matrix form

\begin{equation}\label{eqn:convexOptimizationLecture7:160}

F(x, y) = x^2 + y^2 + 3 x y

=

\begin{bmatrix}

x & y

\end{bmatrix}

\begin{bmatrix}

1 & 1.5 \\

1.5 & 1

\end{bmatrix}

\begin{bmatrix}

x \\

y

\end{bmatrix},

\end{equation}

and has the Hessian

\begin{equation}\label{eqn:convexOptimizationLecture7:180}

\begin{aligned}

\spacegrad^2 F

&=

\begin{bmatrix}

\partial_{xx} F & \partial_{xy} F \\

\partial_{yx} F & \partial_{yy} F \\

\end{bmatrix} \\

&=

\begin{bmatrix}

2 & 3 \\

3 & 2

\end{bmatrix} \\

&= 2 P.

\end{aligned}

\end{equation}

From the plot we know that this is not PSD, but this can be confirmed by checking the eigenvalues

\begin{equation}\label{eqn:convexOptimizationLecture7:200}

\begin{aligned}

0

&=

\det ( P – \lambda I ) \\

&=

(1 – \lambda)^2 – 1.5^2,

\end{aligned}

\end{equation}

which has solutions

\begin{equation}\label{eqn:convexOptimizationLecture7:220}

\lambda = 1 \pm \frac{3}{2} = \frac{3}{2}, -\frac{1}{2}.

\end{equation}

This is not PSD nor negative semi-definite, because it has one positive and one negative eigenvalues. This is neither convex nor concave.

Along \( y = -x \),

\begin{equation}\label{eqn:convexOptimizationLecture7:240}

\begin{aligned}

F(x,y)

&=

F(x,-x) \\

&=

2 x^2 – 3 x^2 \\

&=

– x^2,

\end{aligned}

\end{equation}

so it is concave along this line. Along \( y = x \)

\begin{equation}\label{eqn:convexOptimizationLecture7:260}

\begin{aligned}

F(x,y)

&=

F(x,x) \\

&=

2 x^2 + 3 x^2 \\

&=

5 x^2,

\end{aligned}

\end{equation}

so it is convex along this line.

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:280}

F(\Bx) = \sqrt{ x_1 x_2 },

\end{equation}

on \( \textrm{dom} F = \setlr{ x_1 \ge 0, x_2 \ge 0 } \)

For the Hessian

\begin{equation}\label{eqn:convexOptimizationLecture7:300}

\begin{aligned}

\PD{x_1}{F} &= \frac{1}{2} x_1^{-1/2} x_2^{1/2} \\

\PD{x_2}{F} &= \frac{1}{2} x_2^{-1/2} x_1^{1/2}

\end{aligned}

\end{equation}

The Hessian components are

\begin{equation}\label{eqn:convexOptimizationLecture7:320}

\begin{aligned}

\PD{x_1}{} \PD{x_1}{F} &= -\frac{1}{4} x_1^{-3/2} x_2^{1/2} \\

\PD{x_1}{} \PD{x_2}{F} &= \frac{1}{4} x_2^{-1/2} x_1^{-1/2} \\

\PD{x_2}{} \PD{x_1}{F} &= \frac{1}{4} x_1^{-1/2} x_2^{-1/2} \\

\PD{x_2}{} \PD{x_2}{F} &= -\frac{1}{4} x_2^{-3/2} x_1^{1/2}

\end{aligned}

\end{equation}

or

\begin{equation}\label{eqn:convexOptimizationLecture7:340}

\spacegrad^2 F

=

-\frac{\sqrt{x_1 x_2}}{4}

\begin{bmatrix}

\inv{x_1^2} & -\inv{x_1 x_2} \\

-\inv{x_1 x_2} & \inv{x_2^2}

\end{bmatrix}.

\end{equation}

Checking this for PSD against \( \Bv = (v_1, v_2) \), we have

\begin{equation}\label{eqn:convexOptimizationLecture7:360}

\begin{aligned}

\begin{bmatrix}

v_1 & v_2

\end{bmatrix}

\begin{bmatrix}

\inv{x_1^2} & -\inv{x_1 x_2} \\

-\inv{x_1 x_2} & \inv{x_2^2}

\end{bmatrix}

\begin{bmatrix}

v_1 \\ v_2

\end{bmatrix}

&=

\begin{bmatrix}

v_1 & v_2

\end{bmatrix}

\begin{bmatrix}

\inv{x_1^2} v_1 -\inv{x_1 x_2} v_2 \\

-\inv{x_1 x_2} v_1 + \inv{x_2^2} v_2

\end{bmatrix} \\

&=

\lr{ \inv{x_1^2} v_1 -\inv{x_1 x_2} v_2 } v_1 +

\lr{ -\inv{x_1 x_2} v_1 + \inv{x_2^2} v_2 } v_2

\\

&=

\inv{x_1^2} v_1^2

+ \inv{x_2^2} v_2^2

-2 \inv{x_1 x_2} v_1 v_2 \\

&=

\lr{

\frac{v_1}{x_1}

-\frac{v_2}{x_2}

}^2 \\

&\ge 0,

\end{aligned}

\end{equation}

so \( \spacegrad^2 F \le 0 \). This is a negative semi-definite function (concave). Observe that this check required checking PSD for all values of \( \Bx \).

This is an example of a more general result

\begin{equation}\label{eqn:convexOptimizationLecture7:380}

F(x) = \lr{ \prod_{i = 1}^n x_i }^{1/n},

\end{equation}

which is concave (prove on homework).

### Summary.

If \( F \) is differentiable in \R{n}, then check the curvature of the function along all lines. i.e. At all locations and in all directions.

If the Hessian is PSD at all \( \Bx \in \textrm{dom} F \), that is

\begin{equation}\label{eqn:convexOptimizationLecture7:400}

\spacegrad^2 F \ge 0 \, \forall \Bx \in \textrm{dom} F,

\end{equation}

then the function is convex.

### more examples of convex, but not necessarily differentiable functions

### Example:

Over \( \textrm{dom} F = \mathbb{R}^n \)

\begin{equation}\label{eqn:convexOptimizationLecture7:420}

F(\Bx) = \max_{i = 1}^n x_i

\end{equation}

i.e.

\begin{equation}\label{eqn:convexOptimizationLecture7:440}

\begin{aligned}

F((1,2) &= 2 \\

F((3,-1) &= 3

\end{aligned}

\end{equation}

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:460}

F(\Bx) = \max_{i = 1}^n F_i(\Bx),

\end{equation}

where

\begin{equation}\label{eqn:convexOptimizationLecture7:480}

F_i(\Bx)

=

… ?

\end{equation}

max of a set of convex functions is a convex function.

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:500}

F(x) =

x_{[1]} +

x_{[2]} +

x_{[3]}

\end{equation}

where

\( x_{[k]} \) is the k-th largest number in the list

Write

\begin{equation}\label{eqn:convexOptimizationLecture7:520}

F(x) = \max x_i + x_j + x_k

\end{equation}

\begin{equation}\label{eqn:convexOptimizationLecture7:540}

(i,j,k) \in \binom{n}{3}

\end{equation}

### Example:

For \( \Ba \in \mathbb{R}^n \) and \( b_i \in \mathbb{R} \)

\begin{equation}\label{eqn:convexOptimizationLecture7:560}

\begin{aligned}

F(\Bx)

&= \sum_{i = 1}^n \log( b_i – \Ba^\T \Bx )^{-1} \\

&= -\sum_{i = 1}^n \log( b_i – \Ba^\T \Bx )

\end{aligned}

\end{equation}

This \( b_i – \Ba^\T \Bx \) is an affine function of \( \Bx \) so it doesn’t affect convexity.

Since \( \log \) is concave, \( -\log \) is convex. Convex functions of affine function of \( \Bx \) is convex function of \( \Bx \).

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:580}

F(\Bx) = \sup_{\By \in C} \Norm{ \Bx – \By }

\end{equation}

Here \( C \subseteq \mathbb{R}^n \) is not necessarily convex. We are using \( \sup \) here because the set \( C \) may be open. This function is the length of the line from \( \Bx \) to the point in \( C \) that is furthest from \( \Bx \).

- \( \Bx – \By \) is linear in \( \Bx \)
- \( g_\By(\Bx) = \Norm{\Bx – \By} \) is convex in \( \Bx \) since norms are convex functions.
- \( F(\Bx) = \sup_{\By \in C} \Norm{ \Bx – \By } \). Each \( \By \) index is a convex function. Taking max of those.

### Example:

\begin{equation}\label{eqn:convexOptimizationLecture7:600}

F(\Bx) = \inf_{\By \in C} \Norm{ \Bx – \By }.

\end{equation}

Min and max of two convex functions are plotted in fig. 4.

The max is observed to be convex, whereas the min is not necessarily so.

\begin{equation}\label{eqn:convexOptimizationLecture7:800}

F(\Bz) = F(\theta \Bx + (1-\theta) \By) \ge \theta F(\Bx) + (1-\theta)F(\By).

\end{equation}

This is not necessarily convex for all sets \( C \subseteq \mathbb{R}^n \), because the \( \inf \) of a bunch of convex function is not necessarily convex. However, if \( C \) is convex, then \( F(\Bx) \) is convex.

### Consequences of convexity for differentiable functions

- Think about unconstrained functions \( \textrm{dom} F = \mathbb{R}^n \).
- By first order condition \( F \) is convex iff the domain is convex and

\begin{equation}\label{eqn:convexOptimizationLecture7:620}

F(\Bx) \ge \lr{ \spacegrad F(\Bx)}^\T (\By – \Bx) \, \forall \Bx, \By \in \textrm{dom} F.

\end{equation}

If \( F \) is convex and one can find an \( \Bx^\conj \in \textrm{dom} F \) such that

\begin{equation}\label{eqn:convexOptimizationLecture7:640}

\spacegrad F(\Bx^\conj) = 0,

\end{equation}

then

\begin{equation}\label{eqn:convexOptimizationLecture7:660}

F(\By) \ge F(\Bx^\conj) \, \forall \By \in \textrm{dom} F.

\end{equation}

If you can find the point where the gradient is zero (which can’t always be found), then \( \Bx^\conj\) is a global minimum of \( F \).

Conversely, if \( \Bx^\conj \) is a global minimizer of \( F \), then \( \spacegrad F(\Bx^\conj) = 0 \) must hold. If that were not the case, then you would be able to find a direction to move downhill, contracting the optimality of \( \Bx^\conj\).

### Local vs Global optimum

Definition: Local optimum

\( \Bx^\conj \) is a local optimum of \( F \) if \( \exists \epsilon > 0 \) such that \( \forall \Bx \), \( \Norm{\Bx – \Bx^\conj} < \epsilon \), we have

\begin{equation*}

F(\Bx^\conj) \le F(\Bx)

\end{equation*}

Theorem:

Suppose \( F \) is twice continuously differentiable (not necessarily convex)

- If \( \Bx^\conj\) is a local optimum then\begin{equation*}

\begin{aligned}

\spacegrad F(\Bx^\conj) &= 0 \\

\spacegrad^2 F(\Bx^\conj) \ge 0

\end{aligned}

\end{equation*} - If

\begin{equation*}

\begin{aligned}

\spacegrad F(\Bx^\conj) &= 0 \\

\spacegrad^2 F(\Bx^\conj) \ge 0

\end{aligned},

\end{equation*}then \( \Bx^\conj\) is a local optimum.

Proof:

- Let \( \Bx^\conj \) be a local optimum. Pick any \( \Bv \in \mathbb{R}^n \).\begin{equation}\label{eqn:convexOptimizationLecture7:720}

\lim_{t \rightarrow 0} \frac{ F(\Bx^\conj + t \Bv) – F(\Bx^\conj)}{t}

= \lr{ \spacegrad F(\Bx^\conj) }^\T \Bv

\ge 0.

\end{equation}

Here the fraction is \( \ge 0 \) since \( \Bx^\conj \) is a local optimum.

Since the choice of \( \Bv \) is arbitrary, the only case that you can ensure that \( \ge 0, \forall \Bv \) is

\begin{equation}\label{eqn:convexOptimizationLecture7:740}

\spacegrad F = 0,

\end{equation}

( or else could pick \( \Bv = -\spacegrad F(\Bx^\conj) \).

This means that \( \spacegrad F(\Bx^\conj) = 0 \) if \( \Bx^\conj \) is a local optimum.

Consider the 2nd order derivative

\begin{equation}\label{eqn:convexOptimizationLecture7:760}

\begin{aligned}

\lim_{t \rightarrow 0} \frac{ F(\Bx^\conj + t \Bv) – F(\Bx^\conj)}{t^2}

&=

\lim_{t \rightarrow 0} \inv{t^2}

\lr{

F(\Bx^\conj) + t \lr{ \spacegrad F(\Bx^\conj) }^\T \Bv + \inv{2} t^2 \Bv^\T \spacegrad^2 F(\Bx^\conj) \Bv + O(t^3)

– F(\Bx^\conj)

} \\

&=

\inv{2} \Bv^\T \spacegrad^2 F(\Bx^\conj) \Bv \\

&\ge 0.

\end{aligned}

\end{equation}

Here the \( \ge \) condition also comes from the fraction, based on the optimiality of \( \Bx^\conj \). This is true for all choice of \( \Bv \), thus \( \spacegrad^2 F(\Bx^\conj) \).

# References

[1] Stephen Boyd and Lieven Vandenberghe. *Convex optimization*. Cambridge university press, 2004.