Jensen's inequality
From Wikimization
(New page: First, by definition <math>\,\phi\,</math> is convex if and only if <math>\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) phi(b)</math> whenever <math>\,0 \leq t \leq 1\,</math> and <math>\,a...) |
|||
| Line 1: | Line 1: | ||
| - | + | By definition <math>\,\phi\,</math> is convex if and only if | |
| - | <math>\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) phi(b)</math> | + | <math>\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) \phi(b)</math> |
| Line 9: | Line 9: | ||
| - | + | <math>\phi(\sum t_j a_j) \leq \sum t_j \phi(a_j) </math> (1) | |
| - | Jensen's inequality says this: If mu is a probability | + | Jensen's inequality says this: If <math>\,mu\,</math> is a probability |
| - | measure on X, f is a real-valued function on X, | + | measure on <math>\,X\,</math>, <math>\,f\,</math> is a real-valued function on <math>\,X\,</math>, |
| - | f is integrable, and phi is convex on the range | + | <math>\,f\,</math> is integrable, and <math>\,phi\,</math> is convex on the range |
| - | of f then | + | of <math>\,f\,</math> then |
| - | + | <math>\phi(\int f d \mu) \leq \int \phi \circ f d \mu\qquad</math> (2) | |
| - | Proof 1: By some limiting argument we can assume | + | '''Proof 1:''' By some limiting argument we can assume |
| - | that f is simple (this limiting argument is the missing | + | that <math>\,f\,</math> is simple (this limiting argument is the missing |
| - | detail). That is, X is the disjoint union of X_1, | + | detail). That is, <math>\,X\,</math> is the disjoint union of <math>\,X_1 \,\ldots\, X_n\,</math> |
| - | and f is constant on each X_j. | + | and <math>\,f\,</math> is constant on each <math>\,X_j\,</math>. |
| - | Say t_j = mu(X_j) and a_j is the value of f on X_j. | + | Say <math>\,t_j=\mu(X_j)\,</math> and <math>\,a_j\,</math> is the value of <math>\,f\,</math> on <math>\,X_j\,</math>. |
| - | Then ( | + | Then (1) and (2) say exactly the same thing. QED. |
| - | That's worth noting because it seems to me it explains | ||
| - | "why" the thing's true. Here's the elegant complete proof: | ||
| - | + | '''Proof 2:''' The lemma shows that <math>\,\phi\,</math> has a right-hand | |
| - | Proof 2: The lemma shows that phi has a right-hand | + | derivative at every point and that the graph of <math>\,\phi\,</math> |
| - | derivative at every point and that the graph of phi | + | |
lies above the "tangent" line through any point on the | lies above the "tangent" line through any point on the | ||
graph with slope = the right derivative. | graph with slope = the right derivative. | ||
| - | Say a = int f d mu, let m = the right derivative of phi | + | Say <math>\,a = \int f d \mu\,</math>, let <math>\,m =\,</math> the right derivative of <math>\,\phi\,</math> |
| - | at a, and let | + | at <math>\,a\,</math>, and let |
| + | |||
| + | <math>\,L(t) = \phi(a) + m(t-a)\,</math> | ||
| - | L(t) = phi(a) + m(t-a). | ||
| + | The comment above says that <math>\,\phi(t) \geq L(t)\,</math> for | ||
| + | all <math>\,t\,</math> in the range of <math>\,\phi\,</math>. So | ||
| - | The comment above says that phi(t) >= L(t) for | ||
| - | all t in the range of phi. So | ||
| + | <math>\begin{array}{rl}\int \phi \circ f &\geq \int L \circ f\\ | ||
| + | &= \int (\phi(a) + m(f - a))\\ | ||
| + | &= \phi(a) + (m \int f) - ma\\ | ||
| + | &= \phi(a)\\ | ||
| + | &= \phi(\int f)\end{array}</math> | ||
| - | + | <math>\,-</math>Ullrich | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | + | ||
Revision as of 22:29, 13 July 2008
By definition is convex if and only if
whenever and
are in the range of
.
It follows by induction on
that if
for
then
(1)
Jensen's inequality says this: If is a probability
measure on
,
is a real-valued function on
,
is integrable, and
is convex on the range
of
then
(2)
Proof 1: By some limiting argument we can assume
that is simple (this limiting argument is the missing
detail). That is,
is the disjoint union of
and
is constant on each
.
Say and
is the value of
on
.
Then (1) and (2) say exactly the same thing. QED.
Proof 2: The lemma shows that has a right-hand
derivative at every point and that the graph of
lies above the "tangent" line through any point on the
graph with slope = the right derivative.
Say , let
the right derivative of
at
, and let
The comment above says that for
all
in the range of
. So
Ullrich