Jensen's inequality

From Wikimization


By definition $LaTeX: \,\phi\,$ is convex if and only if

$LaTeX: \phi(ta + (1-t)b) \leq t \phi(a) + (1-t) \phi(b)$

whenever $LaTeX: \,0 \leq t \leq 1\,$ and $LaTeX: \,a\,, b\,$ are in the domain of $LaTeX: \,\phi\,$ .

It follows by induction on $LaTeX: \,n\,$ that if $LaTeX: \,t_j \geq 0\,$ for $LaTeX: \,j = 1, 2\ldots n\,$ , $LaTeX: \,\sum t_j = 1\,$ , and each $LaTeX: \,a_j\,$ is in the domain of $LaTeX: \,\phi\,$ , then

$LaTeX: \phi(\sum t_j a_j) \leq \sum t_j \phi(a_j)$ (1)
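A quick numerical spot check of (1) may help fix ideas. The sketch below is only an illustration, not part of the proof: the helper names `jensen_gap` and `random_weights` are made up here, the weights are normalized to sum to 1, and the choice of exp as the convex function is an assumption.

```python
import math
import random


def jensen_gap(phi, weights, points):
    """Return sum_j t_j*phi(a_j) - phi(sum_j t_j*a_j) for weights t_j and points a_j."""
    mean = sum(t * a for t, a in zip(weights, points))
    return sum(t * phi(a) for t, a in zip(weights, points)) - phi(mean)


def random_weights(n, rng):
    """Nonnegative weights normalized so that they sum to 1."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(0)  # fixed seed so the run is reproducible
gaps = []
for _ in range(1000):
    n = rng.randint(2, 6)
    t = random_weights(n, rng)
    a = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    gaps.append(jensen_gap(math.exp, t, a))

# For a convex phi every gap should be nonnegative, up to rounding error.
print(min(gaps) >= -1e-12)
```

A random search like this can only fail to find a counterexample; the induction is what establishes (1).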

Jensen's inequality says this:
If $LaTeX: \,\mu\,$ is a probability measure on $LaTeX: \,X\,$ ,
$LaTeX: \,f\,$ is a real-valued function on $LaTeX: \,X\,$ ,
$LaTeX: \,f\,$ is integrable, and
$LaTeX: \,\phi\,$ is convex on an interval containing the range of $LaTeX: \,f\,$ then

$LaTeX: \phi(\int f d \mu) \leq \int \phi \circ f d \mu\qquad$ (2)

Proof 1: By a limiting argument we may assume that $LaTeX: \,f\,$ is simple. (The limiting argument is omitted here, so this proof is incomplete as written.)
That is, $LaTeX: \,X\,$ is the disjoint union of $LaTeX: \,X_1 \,\ldots\, X_n\,$ and $LaTeX: \,f\,$ is constant on each $LaTeX: \,X_j\,$ .

Say $LaTeX: \,t_j=\mu(X_j)\,$ and $LaTeX: \,a_j\,$ is the value of $LaTeX: \,f\,$ on $LaTeX: \,X_j\,$ .

Then (1) and (2) say exactly the same thing. QED.
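To see concretely why the two statements coincide for a simple function, here is a minimal sketch on a hypothetical four-point probability space. The space `mu`, the function `f`, and the choice of the convex function u ↦ u² are all assumptions made for illustration. Exact rational arithmetic is used so that the equalities hold without rounding.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: point -> mass, masses sum to 1.
mu = {"x1": Fraction(1, 2), "x2": Fraction(1, 4), "x3": Fraction(1, 8), "x4": Fraction(1, 8)}
# A simple function f on X; it takes the value 3 on two different points.
f = {"x1": Fraction(-1), "x2": Fraction(3), "x3": Fraction(3), "x4": Fraction(0)}


def phi(u):
    """An illustrative convex function."""
    return u * u


def integral(g):
    """Integral of g against mu, which is a finite sum on this space."""
    return sum(g(x) * mu[x] for x in mu)


# Group X into level sets X_j = {f = a_j} and record t_j = mu(X_j).
t = defaultdict(Fraction)
for x, mass in mu.items():
    t[f[x]] += mass

lhs_1 = phi(sum(t_j * a_j for a_j, t_j in t.items()))  # left side of (1)
rhs_1 = sum(t_j * phi(a_j) for a_j, t_j in t.items())  # right side of (1)
lhs_2 = phi(integral(lambda x: f[x]))                   # left side of (2)
rhs_2 = integral(lambda x: phi(f[x]))                   # right side of (2)

print(lhs_1 == lhs_2, rhs_1 == rhs_2, lhs_2 <= rhs_2)
```

Here the level sets carry masses 1/2, 3/8 and 1/8, and both sides of (1) agree with the corresponding sides of (2).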

Proof 2:

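Lemma. Let $LaTeX: \,\phi\,$ be convex. If $LaTeX: \,a < b,\, \,a' < b',\, \,a \leq a'\,$ and $LaTeX: \,b \leq b'\,$ , with all four points in the domain of $LaTeX: \,\phi\,$ , then

$LaTeX: \,(\phi(a) - \phi(b)) / (a - b) \leq (\phi(a') - \phi(b')) / (a' - b')\quad\diamond$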
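The lemma says that chord slopes of a convex function increase as either endpoint moves to the right. A rough numerical check, taking exp as the convex function (an assumption) and using a made-up helper `chord_slope`, might look like this:

```python
import math
import random


def chord_slope(g, a, b):
    """Slope of the chord of g between a and b (requires a != b)."""
    return (g(a) - g(b)) / (a - b)


rng = random.Random(1)  # fixed seed so the run is reproducible
violations = 0
for _ in range(1000):
    # Draw a <= a2 and b <= b2, where a2 and b2 play the roles of a' and b'.
    a, a2 = sorted(rng.uniform(-2.0, 2.0) for _ in range(2))
    b, b2 = sorted(rng.uniform(-2.0, 2.0) for _ in range(2))
    # Keep samples with a < b and a2 < b2; skip near-degenerate chords,
    # whose slopes suffer from floating-point cancellation.
    if not (b - a > 1e-3 and b2 - a2 > 1e-3):
        continue
    if chord_slope(math.exp, a, b) > chord_slope(math.exp, a2, b2) + 1e-9:
        violations += 1

print(violations)
```

A count of zero is consistent with the lemma but is of course no substitute for its proof.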

The lemma shows:

* $LaTeX: \,\phi\,$ has a right-hand derivative at every point, and
* the graph of $LaTeX: \,\phi\,$ lies above the "tangent" line through any point on the graph with slope equal to the right derivative.

Say $LaTeX: \,a = \int f d \mu\,$

Let $LaTeX: \,m\,$ be the right derivative of $LaTeX: \,\phi\,$ at $LaTeX: \,a\,$ , and let

$LaTeX: \,L(t) = \phi(a) + m(t-a)\,$

The bullets above say $LaTeX: \,\phi(t)\geq L(t)\,$ for all $LaTeX: \,t\,$ in the domain of $LaTeX: \,\phi\,$ . So

$LaTeX: \begin{array}{rl}\int \phi \circ f &\geq \int L \circ f\\ &= \int (\phi(a) + m(f - a))\\ &= \phi(a) + (m \int f) - ma\\ &= \phi(a)\\ &= \phi(\int f)\end{array}$
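The chain above can be traced on a small example where the convex function has a corner exactly at the mean, so the right derivative really matters. The sketch assumes a hypothetical three-atom probability space `atoms` and takes the convex function to be the absolute value; neither choice comes from the original argument.

```python
from fractions import Fraction

# Hypothetical finite probability space, listed as (mass, value of f) pairs.
atoms = [
    (Fraction(1, 4), Fraction(-2)),
    (Fraction(1, 4), Fraction(0)),
    (Fraction(1, 2), Fraction(1)),
]


def phi(t):
    """phi(t) = |t|, convex but not differentiable at 0."""
    return abs(t)


def right_derivative(t):
    """Right-hand derivative of |t|: +1 for t >= 0, -1 for t < 0."""
    return 1 if t >= 0 else -1


a = sum(mass * value for mass, value in atoms)  # a = integral of f
m = right_derivative(a)


def L(t):
    """Line through (a, phi(a)) with slope m."""
    return phi(a) + m * (t - a)


# phi >= L pointwise on the values taken by f ...
assert all(phi(value) >= L(value) for _, value in atoms)

int_phi_f = sum(mass * phi(value) for mass, value in atoms)
int_L_f = sum(mass * L(value) for mass, value in atoms)

# ... so integrating gives int phi(f) >= int L(f) = phi(a).
print(int_phi_f, int_L_f, phi(a))
```

In this example the mean of f is 0, the line is L(t) = t, and the three printed quantities are 1, 0 and 0, matching the first and last lines of the chain.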

$LaTeX: \,-\,$ David C. Ullrich

Retrieved from "http://www.convexoptimization.com/wikimization/index.php/Jensen%27s_inequality"
