Sketch 7
It Turns Out Measure Theory Actually Is Useful
Huh...
I was motivated to make this note because I was confused about why we need to use measure theory to study probability. What kinds of questions does probability theory ask that inspire a measure theoretic approach?
If you are reading this without having seen any measure theory (to be honest, it's probably best to see a bit of measure theory first), I still want it to be understandable to people who have only taken basic probability, so here are some terms I may use that are not too complicated to understand intuitively:
- Measurable: A set is measurable if we want to and can "measure" the probability of that set occurring. For example, the set $\{(R, R), (R, P)\}$ is "measurable" below, in the sense that we can find the probability of the event "both players play rock OR the first player plays rock and the second player plays paper".
- $\sigma$-algebra: This is essentially the collection of sets which are measurable (a small example follows this list). Do note that while I motivate things backwards, we generally start by picking a $\sigma$-algebra, and then define all the sets in that $\sigma$-algebra to be measurable.
- Power set: The set of all subsets of a set.
If you find any other complicated-sounding terms I didn't define above, let me know at aathreyakadambi@gmail.com. To be honest, I anticipate a good number of complaints.
Date Started: June 20, 2024
Date Finished: June 20, 2024

An Example of Measure Theoretic Notation on a Classical Problem
Let's examine a game of rock-paper-scissors. We have two players. They play five games, and whoever wins the most games wins the match.
We want to know what strategies the players should use. How do we write our model for this in the language of measure theory? Here's one idea. Since each round is independent, it suffices to figure out what probabilities the players should play with to maximize their chances of winning an individual round. Suppose the first player plays R, P, and S with probabilities $p_R$, $p_P$, and $p_S$, and the second player plays them with probabilities $q_R$, $q_P$, and $q_S$, each player randomizing independently of the other.
Our sample space here is the 9-element space
$$\Omega = \{R, P, S\} \times \{R, P, S\},$$
where the first coordinate is the first player's move and the second coordinate is the second player's.
Now whether the first player wins, loses, or ties is recorded by the payoff random variable
$$X(\omega) = \begin{cases} 1 & \text{if the first player wins on the outcome } \omega, \\ 0 & \text{if the players tie on } \omega, \\ -1 & \text{if the first player loses on } \omega. \end{cases}$$
We can compute the expected payoff for the first player as
$$\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\, \mathbb{P}(\{\omega\}) = p_R(q_S - q_P) + p_P(q_R - q_S) + p_S(q_P - q_R).$$
Now what is the best mixed strategy for the first player? I won't explain the details here, but we can proceed with the usual computations to find the best mixed strategy:
$$p_R = p_P = p_S = \tfrac{1}{3}.$$
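To make the computation concrete, here is a minimal Python sketch (my own, not from any of the references) that evaluates the expected payoff formula above, assuming the payoff convention win = +1, tie = 0, loss = -1. It spot-checks that the uniform strategy earns expected payoff 0 no matter what the opponent plays, which is the sense in which it is "best".

```python
import itertools

def expected_payoff(p, q):
    """Expected single-round payoff to player 1, who plays the mixed
    strategy p = (pR, pP, pS), against player 2 playing q = (qR, qP, qS)."""
    # payoff[i][j] = payoff to player 1 when player 1 plays move i and
    # player 2 plays move j; moves are ordered (R, P, S).
    payoff = [[ 0, -1,  1],   # R: ties R, loses to P, beats S
              [ 1,  0, -1],   # P: beats R, ties P, loses to S
              [-1,  1,  0]]   # S: loses to R, beats P, ties S
    return sum(p[i] * q[j] * payoff[i][j]
               for i, j in itertools.product(range(3), repeat=2))

uniform = (1/3, 1/3, 1/3)
# Against any opponent strategy, the uniform mix earns expected payoff 0.
for q in [(1, 0, 0), (0.5, 0.3, 0.2), (0, 0, 1)]:
    print(q, expected_payoff(uniform, q))  # each prints (approximately) 0.0
```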
So far, note that we haven't actually needed measure theory for anything. All of the ideas come from Bayes' rule and classical probability, just wrapped in the language of measure theory. But that's not "needing" measure theory; that's just wanting to make simple things fancier.
The Need for Measure Theory
In the above problem, we saw how the expectation of our random variable was simply the integral of a simple function (a function taking only finitely many values). If all our random variables are simple functions, the theory reduces to the point where there is no need for more advanced machinery.
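Concretely, if $X = \sum_{i=1}^n a_i \mathbf{1}_{A_i}$ is a simple function (finitely many values $a_i$ on disjoint measurable sets $A_i$), then the "integral" is just a finite sum:
$$\mathbb{E}[X] = \int_\Omega X \, d\mathbb{P} = \sum_{i=1}^n a_i\, \mathbb{P}(A_i).$$
In the rock-paper-scissors example this sum has only three terms, one for each of the values $-1$, $0$, and $1$.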
In fact, our random variable $X$ above is a simple function: it only ever takes the values $-1$, $0$, and $1$. In any case, to see a more illustrative example of where measure theory is more practical, it thus makes sense to consider cases where our random variables are not simple functions.
Where Bayes' Rule Breaks Down
I have spent half of today plagued by a horrible and painful misunderstanding, and I think I finally understand it. All my life, I have thought of the expected value of a random variable as an "average". In other words, it is just one real number.
I was absolutely (as opposed to conditionally ;) confused when I read in Appendix B of Øksendal that the expected value with respect to a $\sigma$-algebra is not a single number at all: it is itself a random variable, a function from $\Omega$ to $\mathbb{R}$.
Of course, if you know what you're doing, that all makes sense. But the key is actually a subtle change in perspective.
When I lied back in the first section, I was viewing conditional probability as something obtained by restricting your sample space. In other words, it comes from the following perspective:
Idea: When you gain information, you restrict your sample space.
In that perspective, once we know that the first player picks paper, our sample space "collapses" into
$$\{(P, R),\ (P, P),\ (P, S)\},$$
and we renormalize the probabilities on what is left.
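For example, with the independent mixed strategies above, conditioning by restriction is just Bayes' rule:
$$\mathbb{P}(\text{first player wins} \mid \text{first player plays P}) = \frac{\mathbb{P}(\{(P, R)\})}{\mathbb{P}(\text{first player plays P})} = \frac{p_P\, q_R}{p_P} = q_R.$$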
Now there is another perspective, which motivates the one in Øksendal.
Idea: When you gain information, you add to your $\sigma$-algebra.
In this perspective, our entire model is different from what I had discussed above! We keep the whole sample space $\Omega$, but we work with two different $\sigma$-algebras on it: the full $\sigma$-algebra $\mathcal{F}$ (here, the power set of $\Omega$), and a smaller $\sigma$-algebra $\mathcal{G}$ generated by the first player's choice, which only encodes the information "what did the first player play?".
Now if we take the expectation of $X$ with respect to the smaller $\sigma$-algebra $\mathcal{G}$, we do not get a single number: we get a new random variable, written $\mathbb{E}[X \mid \mathcal{G}]$, which is constant on each set of outcomes that $\mathcal{G}$ cannot tell apart. In other words, when we take expectations with respect to a certain $\sigma$-algebra, we "smudge" our function based on the level of information we know to obtain a new function from $\Omega$ to $\mathbb{R}$.
To answer the question of what the first player's expected payoff is once we know they picked paper, we simply evaluate this new function at any outcome in which the first player plays paper. In other words, to relate the two perspectives: the value of $\mathbb{E}[X \mid \mathcal{G}]$ at an outcome $\omega$ is exactly the expectation we would have computed in the first perspective, on the restricted sample space containing $\omega$.
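To see the "smudging" concretely in the rock-paper-scissors model: with $\mathcal{G}$ generated by the first player's choice, $\mathbb{E}[X \mid \mathcal{G}]$ is the function on $\Omega$ that is constant on each of the three blocks of outcomes $\mathcal{G}$ can distinguish:
$$\mathbb{E}[X \mid \mathcal{G}](\omega) = \begin{cases} q_S - q_P & \text{if the first player plays R in } \omega,\\ q_R - q_S & \text{if the first player plays P in } \omega,\\ q_P - q_R & \text{if the first player plays S in } \omega. \end{cases}$$
Averaging these three values with weights $p_R$, $p_P$, $p_S$ recovers the overall expectation $\mathbb{E}[X]$ computed earlier, which is the law of total expectation in miniature.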
Which Perspective is Better?
The first idea embodies conditional probability and Bayes' rule, and at least for me, that is foundational to probability. On the other hand, the second idea motivates the idea of a filtration: an increasing sequence of $\sigma$-algebras, one for each point in time, recording the information available so far.
There is actually a nice discussion of this difference in Rosenthal's book A First Look at Rigorous Probability Theory (although I only found it after spending hours figuring this out on my own, with no reference that didn't confuse me more 😭). In the end, Rosenthal gives a great and very reasonable example of why the second perspective may be better.
He mentions that it is difficult to define the conditional probability $\mathbb{P}(A \mid B)$ classically when the conditioning event $B$ has probability zero.
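The issue is visible directly in the classical formula
$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)},$$
which is simply undefined when $\mathbb{P}(B) = 0$; and for a continuous random variable $Y$, every event of the form $\{Y = y\}$ has probability zero.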
However, here's Rosenthal's example: consider conditioning on the value of a continuous random variable $Y$. Every event of the form $\{Y = y\}$ has probability zero, so the first perspective has nothing sensible to restrict to. If we create a $\sigma$-algebra $\sigma(Y)$ generated by $Y$ and take expectations with respect to it, we instead get a single $\sigma(Y)$-measurable random variable that handles all of these probability-zero events at once, and we can reason about it by "smudging" exactly as before.
Of course, I'm being very hand wavy with this smudging idea. This is mainly so that it appeals to intuition, and for readability. A more thorough mathematical construction of the idea is in the appendix.
Filtrations and Martingales 🤩
I should probably preface by saying I don't know enough about filtrations and martingales to be writing this section of this sketch, but I'm writing it anyway because... that's the idea behind the "sketches": they're just sketches. I just think filtrations and martingales sound ultracool.
In the context of stochastic processes, things become more interesting with the notion of a filtration.
A stochastic process (stochastic is just a fancy word for "random") is simply a collection of random variables indexed by time:
$$\{X_t\}_{t \in T}, \qquad X_t : \Omega \to \mathbb{R},$$
where the index set $T$ may be discrete (e.g. $T = \mathbb{N}$) or continuous (e.g. $T = [0, \infty)$).
At each point in time $t$, we take the pre-images of our topology on $\mathbb{R}$ (more precisely, of the Borel sets) under the random variables observed so far; these pre-images generate a $\sigma$-algebra $\mathcal{F}_t$ recording the information available at time $t$.
To understand the idea of a filtration, it's most illustrative to consider a series of coin tosses. Suppose someone is tossing coins repeatedly, in an interesting stochastic demonstration. Our sample space is the set of all infinite sequences of tosses:
$$\Omega = \{H, T\}^{\mathbb{N}} = \{\omega = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_i \in \{H, T\}\}.$$
Now after she tosses the coin once, we obtain a new $\sigma$-algebra
$$\mathcal{F}_1 = \big\{\ \emptyset,\ \{\omega : \omega_1 = H\},\ \{\omega : \omega_1 = T\},\ \Omega\ \big\},$$
the collection of events we can decide using only the first toss.
This is the sense in which we mean "resolution": as our $\sigma$-algebras grow with each toss, we can resolve finer and finer events about the full sequence of outcomes.
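As a sketch of the first few stages:
$$\mathcal{F}_0 = \{\emptyset, \Omega\} \ \subset\ \mathcal{F}_1 \ \subset\ \mathcal{F}_2 = \sigma\big(\{\omega : \omega_1 = a,\ \omega_2 = b\} : a, b \in \{H, T\}\big) \ \subset\ \cdots$$
Here $\mathcal{F}_1$ contains $4$ sets, $\mathcal{F}_2$ contains $16$, and in general $\mathcal{F}_n$ contains $2^{2^n}$ sets: every extra toss lets us distinguish more outcomes.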
Now what does it mean for a random variable to be measurable starting at time $n$? It means the random variable is measurable with respect to $\mathcal{F}_n$: its value is completely determined by the information available after the first $n$ tosses.
A martingale is a stochastic process $\{X_n\}$ that is adapted to a filtration $\{\mathcal{F}_n\}$ (each $X_n$ is $\mathcal{F}_n$-measurable), is integrable, and satisfies
$$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n \qquad \text{for all } n:$$
conditioned on everything we know so far, our best guess for the next value is the current one.
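The canonical example fits the coin-tossing setup above. Let $\xi_1, \xi_2, \ldots$ be independent fair coin tosses taking values $\pm 1$, let $S_n = \xi_1 + \cdots + \xi_n$ be your winnings after $n$ rounds of a fair one-dollar bet, and let $\mathcal{F}_n$ be the $\sigma$-algebra generated by the first $n$ tosses. Then
$$\mathbb{E}[S_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[S_n + \xi_{n+1} \mid \mathcal{F}_n] = S_n + \mathbb{E}[\xi_{n+1}] = S_n,$$
since $S_n$ is $\mathcal{F}_n$-measurable and $\xi_{n+1}$ is independent of $\mathcal{F}_n$. A fair game is a martingale: knowing the whole history doesn't change your best guess for tomorrow's fortune.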
Another interesting question, though, is how to model information and order in these random systems. Perhaps there are some deep ideas built into the foundations of our theory that can give us insights into what information is in the real world? I might be spouting pure nonsense, but if there is something interesting there, I'll make another sketch later.
What Michael Greinecker and Chris Evans Have to Say
To end this sketch, I'll finish off by discussing some ideas from the Stack Exchange posts on this question (linked in the references at the end).
I'll start with Michael Greinecker's answer.
He starts by mentioning that there is another problem which has a very simple answer when expressed measure theoretically:
Problem. Let $X$ and $Y$ be two arbitrary (possibly continuous) random variables. What should it mean for $X$ and $Y$ to be independent?
Of course, my guy doesn't actually say the answers to his questions in his post, leaving us all to suffer and think on our own in a cruel but I guess guru-like fashion.
Classically, this indeed does seem hard. The definition I have seen is that, conditional on any value of one of the random variables, the distribution of the other should be unchanged.
To be honest, the whole notion seems a bit fuzzy and painful, because the way we are thinking about it above forces us to condition on events like $\{Y = y\}$, which may each have probability zero.
The definition of independence is clearer from a measure-theoretic perspective, because making random variables be measurable functions lets us easily say whether random variables (or collections of events, for that matter) are independent. It does this by formulating the independence of random variables as the independence of the collections of events generated by the random variables, which comes straight from the fundamental idea of measurability. The definition (taken from Øksendal) is:
Definition (Independence of Collections of Families of Measurable Sets). A collection $\mathcal{A} = \{\mathcal{H}_i : i \in I\}$ of families $\mathcal{H}_i$ of measurable sets is independent if
$$\mathbb{P}(H_{i_1} \cap \cdots \cap H_{i_k}) = \mathbb{P}(H_{i_1}) \cdots \mathbb{P}(H_{i_k})$$
for all choices of finitely many distinct indices $i_1, \ldots, i_k \in I$ and all $H_{i_j} \in \mathcal{H}_{i_j}$.
Then we define random variables $X_1, X_2, \ldots$ to be independent if the $\sigma$-algebras they generate, $\sigma(X_i) = \{X_i^{-1}(B) : B \subseteq \mathbb{R} \text{ Borel}\}$, form an independent collection in the sense above.
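In the rock-paper-scissors model this is easy to check by hand. Let $X_1$ be the first player's move and $X_2$ the second's; the generated $\sigma$-algebras $\sigma(X_1)$ and $\sigma(X_2)$ consist of events that only look at one player's choice, and independence reduces to
$$\mathbb{P}(X_1 = a,\ X_2 = b) = p_a\, q_b = \mathbb{P}(X_1 = a)\, \mathbb{P}(X_2 = b) \qquad \text{for all } a, b \in \{R, P, S\},$$
which is how the mixed-strategy model was set up in the first place (each player randomizing independently).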
He also mentions notions of convergence, which I would personally say is more on the measure theory side of things than the probability side anyway, although I might be being naive. Continuous time stochastic processes are also a good point, which I discussed above. It is certainly a very pretty result that paths of Brownian motion can be chosen to be almost surely continuous. This does indeed seem to be interesting, and I have a feeling it uses Borel-Cantelli in its proof (although I have to figure it out).
I will have to read up on all of Kolmogorov's work as well, since he seems to have tons of beautiful results in probability theory. For example, there's Kolmogorov's zero-one law, which Greinecker mentions in his post, and there are also Kolmogorov's extension theorem, Kolmogorov's continuity theorem, and so many others. I can't say I understand his work yet, but now that I've finally gotten over this misunderstanding, hopefully I'll be able to figure out what it's all about in the upcoming weeks!
Now I'll discuss Evans's answer. He again discusses the joint-probability ideas; we've seen this point made time and time again. He does mention that maybe distribution theory can resolve the issue as well. I actually happened to take PDEs alongside measure theory, so at the time I was very curious about the relationship between distribution spaces and collections of real-valued (not just nonnegative) measures. This is something I should surely read about in the future.
Appendix: Radon-Nikodym to the Rescue!
This should be quite short, but it turns out that we can use the Radon-Nikodym theorem to more rigorously construct our notion of conditional expectation $\mathbb{E}[X \mid \mathcal{G}]$ outlined above (for a random variable $X$ and a sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$). To come up with an appropriate definition, it's a good idea to list the properties we would like it to have:
- $\mathbb{E}[X \mid \mathcal{G}](\omega_1) = \mathbb{E}[X \mid \mathcal{G}](\omega_2)$ whenever $\omega_1$ and $\omega_2$ are not separable in $\mathcal{G}$ (i.e. any set in $\mathcal{G}$ either contains both $\omega_1$ and $\omega_2$ or contains neither).
- $\mathbb{E}[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable (so that we get the property that $\mathbb{E}[\,\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{G}\,] = \mathbb{E}[X \mid \mathcal{G}]$; once we've smudged it to the level of $\mathcal{G}$, it is already refined to the level of $\mathcal{G}$).
- $\mathbb{E}[X \mid \mathcal{G}]$ is as close to $X$ as possible, while still being $\mathcal{G}$-measurable.
This last point is the main idea which motivates the definition: consider the requirement that $\mathbb{E}[X \mid \mathcal{G}]$ be a $\mathcal{G}$-measurable function satisfying
$$\int_A \mathbb{E}[X \mid \mathcal{G}] \, d\mathbb{P} = \int_A X \, d\mathbb{P} \qquad \text{for every } A \in \mathcal{G}.$$
Here, one may see that we have interpreted the notion of closeness in the sense that the smudged function must have the same average as $X$ over every set that $\mathcal{G}$ can actually see.
There exists a unique (up to a difference on a set of measure zero) function satisfying this condition; this is pretty much exactly the statement of the Radon-Nikodym theorem, applied to the measure $A \mapsto \int_A X \, d\mathbb{P}$ on $\mathcal{G}$, which is absolutely continuous with respect to $\mathbb{P}$ restricted to $\mathcal{G}$. The random variable we obtain this way is exactly the conditional expectation $\mathbb{E}[X \mid \mathcal{G}]$.
In some sense, the Radon-Nikodym theorem essentially gives us a way to "restrict" our measurable function onto a smaller $\sigma$-algebra while preserving its averages over the sets that the smaller $\sigma$-algebra can see.
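As a sanity check against the rock-paper-scissors example: take $A = \{\text{first player plays R}\} \in \mathcal{G}$. On $A$, the smudged function is constant and equal to $q_S - q_P$ (the first player's expected payoff given that they played rock), so
$$\int_A \mathbb{E}[X \mid \mathcal{G}]\, d\mathbb{P} = (q_S - q_P)\,\mathbb{P}(A) = p_R(q_S - q_P),$$
while the right-hand side of the defining condition sums $X$ over the outcomes $(R, R), (R, P), (R, S)$ with weights $p_R q_R, p_R q_P, p_R q_S$, giving $p_R q_S \cdot 1 + p_R q_P \cdot (-1) + p_R q_R \cdot 0 = p_R(q_S - q_P)$ as well.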
Reading List
As with every post, this has given me a long reading list to check out:
- Rosenthal's book.
- All of Kolmogorov's Theorems (I say all, but I'm low key a little afraid to see how many "all" is considering how much cool stuff Kolmogorov did).
- Can we resolve any issues discussed above with distribution theory, and are the ideas equivalent to what we study in measure theory? Although, there are differences where we can have weird distributions, or something... I'll have to refresh on the ideas.
- How do filtrations and martingales relate to the theory of information and thermodynamics?
- Stroock and Varadhan's book, and dang, Stroock has a lot of interesting work I'll have to check out in general.
References
- Bass, Richard F. Real Analysis for Graduate Students, Version 4.3.
- Øksendal, B. K. Stochastic Differential Equations: An Introduction with Applications. Springer, 2013.
- Rosenthal, Jeffrey S. A First Look at Rigorous Probability Theory. World Scientific, 2011.
- Shannon, Claude Elwood, et al. The Mathematical Theory Of Communication. University of Illinois Press, 1949.
- Shreve, Steven E. Stochastic Calculus for Finance II: Continuous-Time Models. Springer New York, 2004.
- "Why Measure Theory for Probability?" Mathematics Stack Exchange, 1 Dec. 1958, math.stackexchange.com/questions/393712/why-measure-theory-for-probability.
- "What Can I Do with Measure Theory That I Can't with Probability and Statistics." Mathematics Stack Exchange, 1 Sept. 1959, math.stackexchange.com/questions/668752/what-can-i-do-with-measure-theory-that-i-cant-with-probability-and-statistics.