Jensen's inequality

From Wikimization

(Difference between revisions)
Jump to: navigation, search
(New page: First, by definition <math>\,\phi\,</math> is convex if and only if <math>\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) phi(b)</math> whenever <math>\,0 \leq t \leq 1\,</math> and <math>\,a...)
Line 1: Line 1:
-
First, by definition <math>\,\phi\,</math> is convex if and only if
+
By definition <math>\,\phi\,</math> is convex if and only if
-
<math>\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) phi(b)</math>
+
<math>\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) \phi(b)</math>
Line 9: Line 9:
-
*(iii) phi(sum t_j a_j) <= sum t_j phi(a_j).
+
<math>\phi(\sum t_j a_j) \leq \sum t_j \phi(a_j) </math> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (1)
-
Jensen's inequality says this: If mu is a probability
+
Jensen's inequality says this: If <math>\,mu\,</math> is a probability
-
measure on X, f is a real-valued function on X,
+
measure on <math>\,X\,</math>, <math>\,f\,</math> is a real-valued function on <math>\,X\,</math>,
-
f is integrable, and phi is convex on the range
+
<math>\,f\,</math> is integrable, and <math>\,phi\,</math> is convex on the range
-
of f then
+
of <math>\,f\,</math> then
-
(iv) phi(int f d mu) <= int phi o f d mu.
+
<math>\phi(\int f d \mu) \leq \int \phi \circ f d \mu\qquad</math> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(2)
-
Proof 1: By some limiting argument we can assume
+
'''Proof 1:''' By some limiting argument we can assume
-
that f is simple (this limiting argument is the missing
+
that <math>\,f\,</math> is simple (this limiting argument is the missing
-
detail). That is, X is the disjoint union of X_1, ... X_n
+
detail). That is, <math>\,X\,</math> is the disjoint union of <math>\,X_1 \,\ldots\, X_n\,</math>
-
and f is constant on each X_j.
+
and <math>\,f\,</math> is constant on each <math>\,X_j\,</math>.
-
Say t_j = mu(X_j) and a_j is the value of f on X_j.
+
Say <math>\,t_j=\mu(X_j)\,</math> and <math>\,a_j\,</math> is the value of <math>\,f\,</math> on <math>\,X_j\,</math>.
-
Then (iii) and (iv) say exactly the same thing. QED.
+
Then (1) and (2) say exactly the same thing. QED.
-
That's worth noting because it seems to me it explains
 
-
"why" the thing's true. Here's the elegant complete proof:
 
-
 
+
'''Proof 2:''' The lemma shows that <math>\,\phi\,</math> has a right-hand
-
Proof 2: The lemma shows that phi has a right-hand
+
derivative at every point and that the graph of <math>\,\phi\,</math>
-
derivative at every point and that the graph of phi
+
lies above the "tangent" line through any point on the
lies above the "tangent" line through any point on the
graph with slope = the right derivative.
graph with slope = the right derivative.
-
Say a = int f d mu, let m = the right derivative of phi
+
Say <math>\,a = \int f d \mu\,</math>, let <math>\,m =\,</math> the right derivative of <math>\,\phi\,</math>
-
at a, and let
+
at <math>\,a\,</math>, and let
 +
 
 +
<math>\,L(t) = \phi(a) + m(t-a)\,</math>
-
L(t) = phi(a) + m(t-a).
 
 +
The comment above says that <math>\,\phi(t) \geq L(t)\,</math> for
 +
all <math>\,t\,</math> in the range of <math>\,\phi\,</math>. So
-
The comment above says that phi(t) >= L(t) for
 
-
all t in the range of phi. So
 
 +
<math>\begin{array}{rl}\int \phi \circ f &\geq \int L \circ f\\
 +
&= \int (\phi(a) + m(f - a))\\
 +
&= \phi(a) + (m \int f) - ma\\
 +
&= \phi(a)\\
 +
&= \phi(\int f)\end{array}</math>
-
int phi o f >= int L o f
+
<math>\,-</math>Ullrich
-
= int (phi(a) + m(f - a))
+
-
= phi(a) + (m int f) - ma
+
-
= phi(a)
+
-
= phi(int f).
+

Revision as of 22:29, 13 July 2008

By definition LaTeX: \,\phi\, is convex if and only if

LaTeX: \phi(ta + (1-t)b) \leq t \phi(a) + (1-t) \phi(b)


whenever LaTeX: \,0 \leq t \leq 1\, and LaTeX: \,a, b\, are in the range of LaTeX: \,phi\,. It follows by induction on LaTeX: \,n\, that if LaTeX: \,t_j \geq 0\, for LaTeX: \,j = 1, 2\ldots n\, then


LaTeX: \phi(\sum t_j a_j) \leq \sum t_j \phi(a_j)           (1)


Jensen's inequality says this: If LaTeX: \,mu\, is a probability measure on LaTeX: \,X\,, LaTeX: \,f\, is a real-valued function on LaTeX: \,X\,, LaTeX: \,f\, is integrable, and LaTeX: \,phi\, is convex on the range of LaTeX: \,f\, then


LaTeX: \phi(\int f d \mu) \leq \int \phi \circ f d \mu\qquad          (2)


Proof 1: By some limiting argument we can assume that LaTeX: \,f\, is simple (this limiting argument is the missing detail). That is, LaTeX: \,X\, is the disjoint union of LaTeX: \,X_1 \,\ldots\, X_n\, and LaTeX: \,f\, is constant on each LaTeX: \,X_j\,.


Say LaTeX: \,t_j=\mu(X_j)\, and LaTeX: \,a_j\, is the value of LaTeX: \,f\, on LaTeX: \,X_j\,. Then (1) and (2) say exactly the same thing. QED.


Proof 2: The lemma shows that LaTeX: \,\phi\, has a right-hand derivative at every point and that the graph of LaTeX: \,\phi\, lies above the "tangent" line through any point on the graph with slope = the right derivative.


Say LaTeX: \,a = \int f d \mu\,, let LaTeX: \,m =\, the right derivative of LaTeX: \,\phi\, at LaTeX: \,a\,, and let


LaTeX: \,L(t) = \phi(a) + m(t-a)\,


The comment above says that LaTeX: \,\phi(t) \geq L(t)\, for all LaTeX: \,t\, in the range of LaTeX: \,\phi\,. So


LaTeX: \begin{array}{rl}\int \phi \circ f &\geq \int L \circ f\\ 
</p>
<pre>        &= \int (\phi(a) + m(f - a))\\ 
        &= \phi(a) + (m \int f) - ma\\
        &= \phi(a)\\
        &= \phi(\int f)\end{array}

LaTeX: \,-Ullrich

Personal tools