Jensen's inequality
From Wikimization
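By definition <math>\,\phi\,</math> is convex if and only if
<math>\phi(ta + (1-t)b) \leq t \phi(a) + (1-t) \phi(b)</math>
whenever <math>\,0 \leq t \leq 1\,</math> and <math>\,a, b\,</math> are in the domain of <math>\,\phi\,</math>.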
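It follows by induction on <math>\,n\,</math> that if <math>\,t_j \geq 0\,</math> for <math>\,j = 1, \ldots, n\,</math>, <math>\,\sum_{j=1}^n t_j = 1\,</math>, and the <math>\,a_j\,</math> are in the domain of <math>\,\phi\,</math>, then
<math>\phi\left(\sum_{j=1}^n t_j a_j\right) \leq \sum_{j=1}^n t_j \phi(a_j)</math> (1)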
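Inequality (1) can be sanity-checked numerically. The following is only a sketch, not part of the original argument: the helper names `jensen_gap` and `random_weights`, the choice of `math.exp` as the convex function, and the sampling ranges are all assumptions made for illustration.

```python
import math
import random


def jensen_gap(phi, weights, points):
    """Return sum_j t_j*phi(a_j) - phi(sum_j t_j*a_j).

    By inequality (1) this is nonnegative whenever phi is convex,
    the weights are nonnegative, and the weights sum to 1.
    """
    mean_point = sum(t * a for t, a in zip(weights, points))
    mean_value = sum(t * phi(a) for t, a in zip(weights, points))
    return mean_value - phi(mean_point)


def random_weights(n, rng):
    """Draw n nonnegative weights and normalize them to sum to 1."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(0)  # fixed seed so the run is repeatable
gaps = []
for _ in range(1000):
    n = rng.randint(2, 6)
    weights = random_weights(n, rng)
    points = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    gaps.append(jensen_gap(math.exp, weights, points))

# Up to floating-point rounding, no sampled gap should be negative.
smallest_gap = min(gaps)
```

A random check like this proves nothing, but a clearly negative `smallest_gap` would indicate a mistake in the statement of (1) or in the code.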
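Jensen's inequality says this:
If <math>\,\mu\,</math> is a probability measure on <math>\,X\,</math>,
<math>\,f\,</math> is a real-valued function on <math>\,X\,</math>,
<math>\,f\,</math> is integrable, and
<math>\,\phi\,</math> is convex on the range of <math>\,f\,</math>, then
<math>\phi\left(\int f \, d\mu\right) \leq \int \phi \circ f \, d\mu</math> (2)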
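Proof 1: By some limiting argument we can assume that <math>\,f\,</math> is simple. (This limiting argument is a detail missing from this proof.)
That is, <math>\,X\,</math> is the disjoint union of <math>\,X_1, \ldots, X_n\,</math> and <math>\,f\,</math> is constant on each <math>\,X_j\,</math>.
Say <math>\,t_j=\mu(X_j)\,</math> and <math>\,a_j\,</math> is the value of <math>\,f\,</math> on <math>\,X_j\,</math>.
Then <math>\,\int f \, d\mu = \sum_j t_j a_j\,</math> and <math>\,\int \phi \circ f \, d\mu = \sum_j t_j \phi(a_j)\,</math>, so (1) and (2) say exactly the same thing. QED.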
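To make the bookkeeping in Proof 1 concrete, here is a small sketch on a finite probability space. The space, the masses in `mu`, the partition in `blocks`, the values of `f`, and the choice of `math.exp` are all hypothetical and serve only as an illustration.

```python
import math

# Hypothetical finite probability space X = {0, ..., 5}; mu[x] is the mass of x.
mu = [0.1, 0.2, 0.1, 0.25, 0.15, 0.2]

# Partition X into X_1, X_2, X_3; f takes the value a_j on X_j.
blocks = [[0, 1], [2, 3, 4], [5]]
values = [-1.0, 0.5, 2.0]

phi = math.exp  # assumed convex test function

# The simple function f, stored as a lookup table on X.
f = {x: a for block, a in zip(blocks, values) for x in block}

# t_j = mu(X_j)
t = [sum(mu[x] for x in block) for block in blocks]

# The two integrals in (2), computed point by point over X.
integral_f = sum(mu[x] * f[x] for x in range(len(mu)))
integral_phi_f = sum(mu[x] * phi(f[x]) for x in range(len(mu)))

# The two sums in (1), computed from t_j and a_j.
sum_ta = sum(tj * aj for tj, aj in zip(t, values))
sum_t_phi = sum(tj * phi(aj) for tj, aj in zip(t, values))
```

Up to rounding, `integral_f` agrees with `sum_ta` and `integral_phi_f` agrees with `sum_t_phi`, which is the sense in which (1) and (2) coincide for simple functions.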
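Proof 2:
Lemma. If <math>\,\phi\,</math> is convex, <math>\,a < b\,</math>, <math>\,a' < b'\,</math>, <math>\,a \leq a'\,</math> and <math>\,b \leq b'\,</math>, then
<math>\,(\phi(a) - \phi(b)) / (a - b) \leq (\phi(a') - \phi(b')) / (a' - b')\quad\diamond</math>
The lemma shows:
- <math>\,\phi\,</math> has a right-hand derivative at every point, and
- the graph of <math>\,\phi\,</math> lies above the "tangent" line through any point on the graph with slope equal to the right derivative.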
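The lemma says that chord slopes of a convex function do not decrease as the endpoints move to the right. The sketch below checks this on random inputs and then lists difference quotients at one point for shrinking step sizes. The helper `slope`, the test function `math.exp`, the sampling ranges, and the rounding tolerance are assumptions chosen for illustration.

```python
import math
import random


def slope(phi, a, b):
    """Slope of the chord of phi between a and b (requires a != b)."""
    return (phi(a) - phi(b)) / (a - b)


phi = math.exp  # assumed convex test function
rng = random.Random(1)  # fixed seed so the run is repeatable

lemma_holds = True
for _ in range(1000):
    # Build a < b and a2 < b2 with a <= a2 and b <= b2
    # (a2 and b2 play the roles of a' and b' in the lemma).
    a = rng.uniform(-2.0, 2.0)
    a2 = a + rng.uniform(0.0, 1.0)
    b = a + rng.uniform(0.1, 1.0)
    b2 = max(b, a2) + rng.uniform(0.1, 1.0)
    # Allow a small tolerance for floating-point rounding.
    if slope(phi, a, b) > slope(phi, a2, b2) + 1e-9:
        lemma_holds = False

# Difference quotients (phi(h) - phi(0)) / h for h = 1, 1/2, 1/4, ..., 1/1024.
quotients = [slope(phi, 0.0, 2.0 ** -k) for k in range(11)]
```

The quotients decrease as the step shrinks and stay bounded below, which is the mechanism behind the existence of the right-hand derivative in the first bullet.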
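Say <math>\,a = \int f \, d\mu\,</math>.
Let <math>\,m\,</math> be the right derivative of <math>\,\phi\,</math> at <math>\,a\,</math>, and let
<math>\,L(t) = \phi(a) + m(t-a)\,</math>
The bullets above say <math>\,\phi(t)\geq L(t)\,</math> for all <math>\,t\,</math> in the domain of <math>\,\phi\,</math>. So
<math>\begin{array}{rl}\int \phi \circ f \, d\mu &\geq \int L \circ f \, d\mu\\
&= \int (\phi(a) + m(f - a)) \, d\mu\\
&= \phi(a) + (m \int f \, d\mu) - ma\\
&= \phi(a)\\
&= \phi(\int f \, d\mu)\end{array}</math>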
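The supporting-line argument of Proof 2 can be traced on a small example where the convex function has a kink. This is a sketch under stated assumptions: the function is taken to be the absolute value, its right-hand derivative is written out by hand in `right_derivative`, and the three-point distribution in `probs` and `values` is hypothetical.

```python
def phi(t):
    """Assumed convex test function: the absolute value."""
    return abs(t)


def right_derivative(t):
    """Right-hand derivative of |t|: +1 for t >= 0, -1 for t < 0."""
    return 1.0 if t >= 0 else -1.0


# Hypothetical discrete probability measure: f takes values[i] with mass probs[i].
probs = [0.25, 0.25, 0.5]
values = [-2.0, 1.0, 0.5]

a = sum(p * v for p, v in zip(probs, values))  # a is the integral of f
m = right_derivative(a)


def L(t):
    """The "tangent" line through (a, phi(a)) with slope m."""
    return phi(a) + m * (t - a)


# The line stays below the graph of phi on a sample grid.
grid = [k / 10 for k in range(-30, 31)]
line_below_graph = all(phi(t) >= L(t) for t in grid)

# Integrating both sides of phi >= L against the measure.
integral_phi_f = sum(p * phi(v) for p, v in zip(probs, values))
integral_L_f = sum(p * L(v) for p, v in zip(probs, values))
```

Here the mean of `f` lands exactly on the kink of the absolute value, so the ordinary derivative does not exist there, yet the line built from the right derivative still gives `integral_L_f` equal to `phi(a)` and `integral_phi_f` at least as large.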
David C. Ullrich