# Jensen's inequality


By definition $\,\phi\,$ is convex if and only if

$\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) \phi(b)$

whenever $\,0 \leq t \leq 1\,$ and $\,a, b\,$ are in the domain of $\,\phi\,$.

It follows by induction on $\,n\,$ that if $\,t_j \geq 0\,$ for $\,j = 1, 2, \ldots, n\,$ and $\,\sum t_j = 1\,$ then

$\phi(\sum t_j a_j) \leq \sum t_j \phi(a_j)$           (1)
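As a quick numerical sanity check (not part of the argument), inequality (1) can be verified for a sample convex function such as $\phi(x) = e^x$; the weights and points below are arbitrary choices:

```python
import math

def finite_jensen_gap(phi, ts, xs):
    """Return RHS - LHS of inequality (1): sum t_j*phi(a_j) - phi(sum t_j*a_j).

    This is non-negative whenever phi is convex and the t_j are
    non-negative weights summing to 1.
    """
    assert abs(sum(ts) - 1.0) < 1e-12, "weights must sum to 1"
    lhs = phi(sum(t * x for t, x in zip(ts, xs)))
    rhs = sum(t * phi(x) for t, x in zip(ts, xs))
    return rhs - lhs

# phi = exp is convex, so the gap is >= 0; for an affine phi it is 0.
gap = finite_jensen_gap(math.exp, [0.2, 0.3, 0.5], [-1.0, 0.0, 2.0])
print(gap >= 0)  # True
```

For an affine $\phi$ the gap vanishes, which matches the equality case of the definition of convexity.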

Jensen's inequality says this:
If $\,\mu\,$ is a probability measure on $\,X\,$,
$\,f\,$ is a real-valued function on $\,X\,$,
$\,f\,$ is integrable, and
$\,\phi\,$ is convex on the range of $\,f\,$, then

$\phi(\int f \, d \mu) \leq \int \phi \circ f \, d \mu\qquad$          (2)

**Proof 1:** By some limiting argument we can assume that $\,f\,$ is simple. (This limiting argument is a detail omitted from this proof.)
That is, $\,X\,$ is the disjoint union of $\,X_1, \ldots, X_n\,$ and $\,f\,$ is constant on each $\,X_j\,$.

Say $\,t_j=\mu(X_j)\,$ and $\,a_j\,$ is the value of $\,f\,$ on $\,X_j\,$.

Then (1) and (2) say exactly the same thing. QED.
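The reduction can be sketched in code (assuming a finite probability space, where every function is simple; the masses and values below are arbitrary sample data). With $t_j = \mu(X_j)$ and $a_j$ the value of $f$ on $X_j$, the two integrals in (2) are literally the two sums in (1):

```python
# A simple f on X = X_1 ∪ X_2 ∪ X_3, encoded by the masses
# t_j = mu(X_j) and the values a_j of f on X_j (sample data).
t = [0.25, 0.25, 0.5]   # probability masses, summing to 1
a = [1.0, 4.0, 9.0]     # values of f on the pieces

phi = lambda x: x * x   # a convex sample function

# integral of f      =  sum of t_j * a_j
int_f = sum(tj * aj for tj, aj in zip(t, a))
# integral of phi∘f  =  sum of t_j * phi(a_j)
int_phi_f = sum(tj * phi(aj) for tj, aj in zip(t, a))

print(phi(int_f) <= int_phi_f)  # True: exactly inequality (1)
```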

**Proof 2:**

Lemma. If $\,a < b,\, \,a' < b',\, \,a \leq a'\,$ and $\,b \leq b'\,$ then

$\,(\phi(b) - \phi(a)) / (b - a) \leq (\phi(b') - \phi(a')) / (b' - a')\quad\diamond$
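The lemma says that chord slopes of a convex function increase as the interval moves to the right. A numeric illustration with the convex sample $\phi(x) = x^2$ (the intervals are arbitrary choices satisfying the hypotheses):

```python
def chord_slope(phi, a, b):
    """Slope of the chord of phi over [a, b] (requires a != b)."""
    return (phi(b) - phi(a)) / (b - a)

phi = lambda x: x * x  # convex sample

# a < b, a' < b', a <= a', b <= b': the lemma predicts s1 <= s2.
s1 = chord_slope(phi, -1.0, 0.5)   # chord over [a, b]
s2 = chord_slope(phi, 0.0, 2.0)    # chord over [a', b']
print(s1 <= s2)  # True
```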

The lemma shows:

* $\,\phi\,$ has a right-hand derivative at every point, and
* the graph of $\,\phi\,$ lies above the "tangent" line through any point on the graph with slope equal to the right derivative.

Say $\,a = \int f \, d \mu\,$, let $\,m\,$ be the right derivative of $\,\phi\,$ at $\,a\,$, and let

$\,L(t) = \phi(a) + m(t-a)\,$

The bullets above say $\,\phi(t)\geq L(t)\,$ for all $\,t\,$ in the domain of $\,\phi\,$. So

$\begin{array}{rl}\int \phi \circ f &\geq \int L \circ f\\
&= \int (\phi(a) + m(f - a))\\
&= \phi(a) + (m \int f) - ma\\
&= \phi(a)\\
&= \phi(\int f)\end{array}$
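The key step in the chain above is that $\phi$ stays above the support line $L$. A quick numerical check with the convex sample $\phi(x) = x^2$, whose right derivative at $a$ is $2a$ (the point $a$ and the test grid are arbitrary choices):

```python
phi = lambda x: x * x        # convex sample
dphi = lambda x: 2 * x       # its (right) derivative

a = 0.7                      # stands in for the integral of f
L = lambda t: phi(a) + dphi(a) * (t - a)   # support line at a

# phi(t) >= L(t) at every grid point: the graph lies above the line.
ok = all(phi(t) >= L(t) for t in range(-3, 4))
print(ok)  # True
```

Here $\phi(t) - L(t) = (t - a)^2 \geq 0$, so the check succeeds at every point, with equality exactly at $t = a$.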

$\,-\,$ David C. Ullrich