Jensen's inequality
From Wikimization
Revision as of 22:37, 13 July 2008
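By definition <math>\,\phi\,</math> is convex if and only if
<br><math>\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) \phi(b)</math>
<br>whenever <math>\,0 \leq t \leq 1\,</math> and <math>\,a, b\,</math> are in the domain of <math>\,\phi\,</math>.
It follows by induction on <math>\,n\,</math> that if <math>\,t_j \geq 0\,</math> for <math>\,j = 1, 2, \ldots, n\,</math>, <math>\,\sum t_j = 1\,</math>, and each <math>\,a_j\,</math> is in the domain of <math>\,\phi\,</math>, then
<br><math>\phi(\sum t_j a_j) \leq \sum t_j \phi(a_j)\qquad</math> (1)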
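<br>The finite inequality (1) can be spot-checked numerically. The following is only a minimal sketch: the helper name <code>jensen_gap</code>, the weights, and the sample points are assumptions chosen for illustration and do not come from the original text.

```python
import math

def jensen_gap(phi, weights, points):
    """Return sum_j t_j*phi(a_j) - phi(sum_j t_j*a_j).

    For convex phi and weights t_j >= 0 that sum to 1, inequality (1)
    says this gap is nonnegative.
    """
    if any(t < 0 for t in weights):
        raise ValueError("weights must be nonnegative")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    mean = sum(t * a for t, a in zip(weights, points))
    return sum(t * phi(a) for t, a in zip(weights, points)) - phi(mean)

# Illustrative data (assumed, not taken from the text).
weights = [0.2, 0.3, 0.5]
points = [-1.0, 0.0, 2.0]

gap_exp = jensen_gap(math.exp, weights, points)                 # exp is convex
gap_affine = jensen_gap(lambda x: 3 * x + 1, weights, points)   # affine case
print(gap_exp, gap_affine)
```

For an affine function the two sides of (1) agree, so the second gap should vanish up to rounding, while the convex exponential gives a strictly positive gap on these points.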
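<br>Jensen's inequality says this: <br>If <math>\,\mu\,</math> is a probability measure on <math>\,X\,</math>, <br><math>\,f\,</math> is a real-valued function on <math>\,X\,</math>, <br><math>\,f\,</math> is integrable, and <br><math>\,\phi\,</math> is convex on an interval containing the range of <math>\,f\,</math>, then
<br><math>\phi(\int f d \mu) \leq \int \phi \circ f d \mu\qquad</math> (2)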
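<br>As a quick illustration of (2), with choices that are assumptions made only for this example, take <math>\,X = [0,1]\,</math> with Lebesgue measure, <math>\,f(x) = x\,</math> and <math>\,\phi(y) = y^2\,</math>. Then
<br><math>\phi(\int f d \mu) = \left(\int_0^1 x\,dx\right)^2 = \frac{1}{4} \leq \frac{1}{3} = \int_0^1 x^2\,dx = \int \phi \circ f d \mu</math>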
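<br>'''Proof 1:''' By some limiting argument we can assume that <math>\,f\,</math> is simple (this limiting argument is the missing detail).
<br>That is, <math>\,X\,</math> is the disjoint union of <math>\,X_1, \ldots, X_n\,</math> and <math>\,f\,</math> is constant on each <math>\,X_j\,</math>.
Say <math>\,t_j=\mu(X_j)\,</math> and <math>\,a_j\,</math> is the value of <math>\,f\,</math> on <math>\,X_j\,</math>.
Then <math>\,\sum t_j = \mu(X) = 1\,</math>, <math>\,\int f d \mu = \sum t_j a_j\,</math> and <math>\,\int \phi \circ f d \mu = \sum t_j \phi(a_j)\,</math>, so (1) and (2) say exactly the same thing. QED.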
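<br>The reduction in Proof 1 can be sketched in a few lines of code. This is a hedged illustration only: the partition measures, the values of the simple function, the choice of a squaring function for the convex map, and the helper name <code>integrate_simple</code> are all assumptions introduced for the example.

```python
def integrate_simple(values, measures):
    """Integral of a simple function: sum over j of a_j * mu(X_j)."""
    return sum(a * t for a, t in zip(values, measures))

# Hypothetical partition X = X_1 u X_2 u X_3 of a probability space.
measures = [0.5, 0.25, 0.25]   # t_j = mu(X_j), nonnegative, summing to 1
values = [1.0, 2.0, 4.0]       # a_j = value of f on X_j

def phi(y):
    return y * y               # a convex function, assumed for the example

lhs = phi(integrate_simple(values, measures))                # phi(int f dmu)
rhs = integrate_simple([phi(a) for a in values], measures)   # int phi(f) dmu
print(lhs, rhs)
```

Here the left side is the convex function applied to the weighted mean and the right side is the weighted mean of the transformed values, which is the same comparison as in (1).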
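<br>'''Proof 2:''' The lemma shows that <math>\,\phi\,</math> has a right-hand derivative at every point and that the graph of <math>\,\phi\,</math> lies above the "tangent" line through any point on the graph whose slope is the right derivative at that point.
<br>Say <math>\,a = \int f d \mu\,</math>, let <math>\,m\,</math> be the right derivative of <math>\,\phi\,</math> at <math>\,a\,</math>, and let
<br><math>\,L(t) = \phi(a) + m(t-a)\,</math>
<br>The comment above says that <math>\,\phi(t) \geq L(t)\,</math> for all <math>\,t\,</math> in the domain of <math>\,\phi\,</math>. So, since <math>\,\mu\,</math> is a probability measure,
<br><math>\begin{array}{rl}\int \phi \circ f &\geq \int L \circ f\\
&= \int (\phi(a) + m(f - a))\\
&= \phi(a) + m(\int f - a)\\
&= \phi(a)\\
&= \phi(\int f)\end{array}</math>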
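<br>The supporting-line argument of Proof 2 can be illustrated on a small discrete example. The sketch below makes several assumptions that are not in the original: the absolute value serves as the convex function (it is not differentiable at 0, so the right derivative matters), its right derivative is hard-coded, and the three-point probability measure is hypothetical.

```python
import math

def phi(t):
    return abs(t)  # convex, with a corner at t = 0

def right_derivative(a):
    # Right-hand derivative of |t|: +1 for a >= 0, -1 for a < 0.
    return 1.0 if a >= 0 else -1.0

def supporting_line(a):
    """Return L(t) = phi(a) + m*(t - a), with m the right derivative at a."""
    m = right_derivative(a)
    return lambda t: phi(a) + m * (t - a)

# Hypothetical probability measure on three points and the values of f there.
weights = [0.25, 0.25, 0.5]
values = [-2.0, 1.0, 0.5]

a = sum(w * v for w, v in zip(weights, values))            # a = int f dmu
L = supporting_line(a)

int_L = sum(w * L(v) for w, v in zip(weights, values))     # int L(f) dmu
int_phi = sum(w * phi(v) for w, v in zip(weights, values)) # int phi(f) dmu
print(a, int_L, int_phi)
```

Because the line is affine and the weights sum to 1, integrating it against the measure returns its value at the mean, and the pointwise bound then gives the inequality.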
<math>\,-\,</math>D. Ullrich