Measure, Integral and Probability : Summary

I found this book very challenging when I went over it a few years ago; it was a subject I found very difficult to understand. The main reason was that I had never taken a real analysis course, so, needless to say, my understanding was shallow. The offshoot of this limited understanding has come to plague me now that my task is to customize a model that is based entirely on general measures. The model's creator has given a few guidelines and that's about it, and the guidelines point to heavy usage of martingale theory. I realized that my understanding of general measures and martingales was pathetic, so I decided to go over this book again, well aware that it would be very challenging.

Reading this book after 3.5 years, I realized that it is amazingly useful for a persistent reader. Unlike last time, however, I was prepared with the prerequisites. This book has a tall order of prerequisites, something the title fails to convey: a reader must have good knowledge of analysis on the real line, finite-dimensional vector spaces and metric spaces, and at least a 10,000 ft view of functional analysis. Summarizing a math book is very tricky and usually no summary can do justice to its contents. My intent in writing this summary is to highlight one important aspect of the book, i.e., the way probability theory and measure theory concepts are explained in parallel. The intent behind the theorems is made crystal clear by relevant applications in probability theory and math finance. As you go over Lebesgue theory, you will be able to connect it to the various probability concepts that rest on it. In that way, you will not find the book dry at all. Let me try to summarize the main chapters of the book from this point of view.

Chapter 1 : Motivation and Preliminaries

As the title of the chapter suggests, it contains the real analysis concepts needed to follow the rest of the book. It starts off by asking a simple question: “What is the probability of choosing a random number between (0,1)?”. For a reader exposed only to classical probability, where there are finitely many outcomes, answering such a question poses a challenge: how do you compute probabilities when there are infinitely many outcomes? Such questions can only be answered with the help of a mathematical framework. The chapter introduces some set theory notation and topological concepts. It then moves on to arguments for an alternative to the Riemann integral. The classic Dirichlet function (1 on the rationals, 0 on the irrationals) is used to show the reader that there are functions which are not Riemann integrable. In fact there are tons of issues with Riemann integration, and a reader would be well advised to read other books before reading this concise Riemann bashing. The authors state that the focus of the book is to show the essential role of Lebesgue measure in the description of probabilistic phenomena based on infinite sample spaces.
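A quick simulation shows why the uniform model on (0,1) calls for lengths rather than counting. This is only an illustrative sketch under my own assumptions (Python's `random` module as the source of uniform numbers, an arbitrary interval [0.2, 0.5] and seed; `hit_frequency` is a hypothetical helper, not from the book): the fraction of draws landing in an interval approaches its length, while a single point gets probability zero.

```python
import random

def hit_frequency(a, b, n=100_000, seed=42):
    """Estimate P(a <= U <= b) for U uniform on (0, 1) by simulation."""
    rng = random.Random(seed)
    hits = sum(a <= rng.random() <= b for _ in range(n))
    return hits / n

# The estimate approaches the length b - a = 0.3 of the interval.
print(hit_frequency(0.2, 0.5))
# A single point is a degenerate interval of length 0, so its probability is 0.
print(hit_frequency(0.5, 0.5))
```

Counting outcomes gives no answer here (there are infinitely many), but assigning each interval its length does, which is the idea Lebesgue measure makes rigorous.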

Chapter 2 : Measure

A probability is basically a measure with total mass 1, defined on a probability space. In order to understand probability spaces, one needs a thorough knowledge of sigma fields and Lebesgue measure: a probability measure is a countably additive set function on a sigma algebra of subsets of the outcome space, assigning mass 1 to the whole space. This chapter does a superb job of showing where all the pieces fit: outer measure, Lebesgue measure, the Lebesgue measurable sets, the Borel sigma algebra, etc. The chapter starts off by defining null sets, as they are key to understanding the measure of sets like the rationals, finite sets and the Cantor set, all of which have Lebesgue measure 0. Outer measure is then defined and its properties are discussed. The typical problem with outer measure is explained: on the full power set of R it is only countably subadditive, not countably additive. Lebesgue measure is introduced precisely to solve this issue: by restricting outer measure to a subfamily of the power set of R (the Lebesgue measurable sets), countable additivity is achieved. Topological properties of Lebesgue measurable sets are discussed so that one can use appropriate definitions and concepts from topology to prove and derive new theorems. Borel sets and the Borel sigma algebra are then introduced so that one can conveniently talk about a family of Lebesgue measurable sets with the necessary properties. It is also shown that the Lebesgue sigma algebra is strictly larger than the Borel sigma algebra, though for probability applications one can conveniently ignore the difference and work with the Borel sigma algebra.
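To make the null-set claims concrete, here is a small sketch in Python using exact `fractions.Fraction` arithmetic (the function names are my own, not the book's). It runs the Cantor construction for a few steps and tracks the total length left, which is (2/3)^n and so tends to 0. It also shows the standard covering trick for a countable set such as the rationals: giving the k-th point an interval of length ε/2^(k+1) keeps the total length below ε, however many points are covered.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third from every closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def total_length(intervals):
    """Sum of the lengths of disjoint intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))

def cover_length(num_points, eps):
    """Total length when the k-th point gets an interval of length eps / 2**(k + 1)."""
    return sum((eps / 2 ** (k + 1) for k in range(num_points)), Fraction(0))

intervals = [(Fraction(0), Fraction(1))]
for n in range(1, 6):
    intervals = cantor_step(intervals)
    # After n steps: 2**n intervals with total length (2/3)**n.
    print(n, len(intervals), total_length(intervals))

# The cover stays below eps no matter how many points it covers.
print(cover_length(1000, Fraction(1, 100)) < Fraction(1, 100))
```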

Chapter 3 : Measurable Functions

Random variable, a 101 term in probability theory, is actually a misnomer: it is not a variable at all but a function on the sample space. To understand such functions and their properties, this chapter talks about measurable functions. The beauty of these functions is that limiting operations on a sequence of measurable functions preserve measurability. One often comes across “the sigma field generated by X” in theorems, and this chapter makes the term precise by giving a definition: it is the family of preimages X^{-1}(B) of Borel sets B. Intuitively, the sigma field generated by X is coarser than (contained in) the sigma field of the measurable space. Once you understand that a probability distribution is nothing but a set function on a coarser sigma algebra, you can organize your thoughts the following way:

You have Omega, which is the set of all outcomes
You define a measure on a sigma algebra; on the real line this is the sigma algebra generated by intervals (the Borel sigma algebra)
The measure is defined in such a way that it is countably additive on that sigma algebra
On the real line, the measure that assigns every interval its length is Lebesgue measure
Each element of the sigma algebra is an event
You define a random variable on the sample space
So the sigma algebra generated by this random variable is coarser
The probability measure on this coarser sigma algebra can be summarized with the help of a distribution function / discrete measures / absolutely continuous measures / a mix of discrete and absolutely continuous measures.
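The list above can be checked by brute force on a finite sample space. A minimal sketch, assuming a hypothetical toy model of my own (two fair coin tosses, X = number of heads): sigma(X) is built as the preimages of all subsets of X's range, it has 8 members against 16 in the full power set, and the distribution of X is simply the set function B -> P(X^{-1}(B)).

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical toy model: two fair coin tosses, X = number of heads.
omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega}

def X(w):
    return w.count("H")

def powerset(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def generated_sigma_field(omega, X):
    """sigma(X): the preimages X^{-1}(B) of all subsets B of X's (finite) range."""
    values = {X(w) for w in omega}
    return {frozenset(w for w in omega if X(w) in B) for B in powerset(values)}

def distribution(B):
    """The distribution of X: the set function B -> P(X^{-1}(B))."""
    return sum((P[w] for w in omega if X(w) in B), Fraction(0))

sigma_X = generated_sigma_field(omega, X)
print(len(sigma_X), len(powerset(omega)))  # 8 16: sigma(X) is coarser
print(distribution({1}))                   # 1/2
```

Note that {HT} is not in sigma(X): knowing X = 1 does not tell you which toss came up heads, which is exactly what "coarser" means.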

Chapter 4 : Lebesgue Integral

“What does a probability distribution have to do with integration?” is a beginner's question, and answering it opens up the whole theory of the Lebesgue integral. An integral is basically the area under the curve of a function, at least that's the intuitive notion. Now, given a non-negative measurable function f whose integral over the whole space is 1, one can define a probability measure by integrating f over each set: P(A) = ∫_A f dm. Such measures are called absolutely continuous measures. From a computational angle, one always writes integrals for continuous random variables to calculate moments, probabilities, etc. But behind the integral lies the entire theory of Lebesgue measurable functions: a non-negative Lebesgue integrable function with integral 1 serves as the density function of an absolutely continuous random variable. For a very loooooong time I thought there were only discrete, continuous, or mixed discrete and continuous random variables. However, after going over the Cantor set and its mysterious properties, I realized that it is very much possible to construct measures which belong to none of {discrete, absolutely continuous, mix of discrete and absolutely continuous}. They go by the name singular measures; the Cantor distribution is the classic example. Anyway, this chapter does not discuss such measures. It provides the background needed to understand concepts like the density function, distribution function, expectation and characteristic function of a random variable. The topics are:

Definition of Lebesgue integral via Simple functions
Properties of Lebesgue integral
Convergence Theorems like Monotone Convergence theorem and Dominated Convergence theorem
Relation between the Riemann integral and the Lebesgue integral
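The definition via simple functions can be tried out numerically. A sketch, assuming f(x) = x^2 on [0, 1] (my choice, because its level sets are intervals whose Lebesgue measure is just their length): the simple functions phi_n = floor(2^n f) / 2^n increase pointwise to f, and their integrals increase to 1/3, which is the monotone convergence theorem at work. `simple_integral` is a hypothetical helper name.

```python
import math

def simple_integral(n):
    """Integral of phi_n = floor(2**n * f) / 2**n for f(x) = x**2 on [0, 1].

    phi_n equals k / 2**n on the level set {x : k / 2**n <= x**2 < (k + 1) / 2**n},
    i.e. the interval [sqrt(k / N), sqrt((k + 1) / N)) with N = 2**n, so its
    Lebesgue measure is its length. (The point x = 1, where f = 1, has measure 0.)
    """
    N = 2 ** n
    total = 0.0
    for k in range(N):
        level = k / N
        measure = math.sqrt((k + 1) / N) - math.sqrt(k / N)
        total += level * measure
    return total

for n in (1, 2, 4, 8, 12):
    print(n, simple_integral(n))  # increases towards 1/3
```

Since 0 <= f - phi_n < 2^-n, the integral of phi_n is within 2^-n of 1/3. Unlike a Riemann sum, the partition is of the range of f, not of its domain.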

Chapter 5 : Spaces of Integrable Functions

Moments of random variables are usually introduced in a computational way: for a discrete or a continuous random variable, there are formulae introduced in any undergraduate course. However, to reconcile them with the concept of “X is a measurable function”, one needs some functional analysis; moment calculations and characteristic function computations are deeply related to it. This chapter introduces the concepts of metric space, vector space and inner product space. The view of a Lebesgue measurable function is expanded to spaces in which each individual function is a point. Vector spaces of Lebesgue integrable functions (the Lp spaces) are introduced, with their norms defined. Subsequently the Hilbert space L2 is introduced, where each point is a Lebesgue measurable function (strictly, an equivalence class of functions equal almost everywhere) with finite L2 norm. Hilbert space is important because it carries an inner product, so one can make use of the orthogonality of its elements. The norm induced by the inner product gives more flexibility in working with orthogonal random variables and Fourier functions. Even though Fourier series are not talked about in the book, Hilbert spaces are extremely useful in relation to Fourier series. The other interesting aspect of Lp spaces is that they are complete, meaning every Cauchy sequence of functions converges to a point in the space.
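Since Fourier series are not in the book, this is only an aside illustrating orthogonality in L2. A sketch, assuming the space L2[0, 2π] and approximating the inner product <f, g> = ∫ f g dx by a midpoint sum (my choice of quadrature; `inner`, `norm` and `sin_k` are hypothetical helpers): distinct sine functions are orthogonal, and each has squared norm π.

```python
import math

def inner(f, g, n_points=1000):
    """Approximate the L2[0, 2*pi] inner product <f, g> = integral of f(x) g(x) dx (midpoint rule)."""
    h = 2 * math.pi / n_points
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n_points))

def norm(f):
    """The norm induced by the inner product: ||f|| = sqrt(<f, f>)."""
    return math.sqrt(inner(f, f))

def sin_k(k):
    return lambda x: math.sin(k * x)

print(inner(sin_k(2), sin_k(3)))  # ~0: orthogonal
print(norm(sin_k(3)) ** 2)        # ~pi
```

For trigonometric polynomials of low degree the midpoint sum over a full period is exact up to rounding, which is why the results are so clean.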

One of the highlights of this chapter is the construction of conditional expectation, where Hilbert spaces are used to give a preliminary introduction: for X in L2, E(X|G) is the orthogonal projection of X onto the closed subspace of G-measurable functions in L2. Conditional expectation is a very interesting concept and opens up a completely different way to look at random variables: it is the best mean-square estimate of a random variable using only the information in a sub sigma-algebra. For a discrete random variable, calculating E(X|Y) is easy. For a continuous one, it is easy if the conditional density is given. This is the way it is presented in undergrad books, where all discussion of sub sigma-algebras is avoided. However, if Y is a general random variable, then the expectation of X given Y has no formula; it is defined only implicitly. The existence of such a random variable follows from a deep theorem of measure theory, the Radon-Nikodym theorem. This theorem is covered in a separate chapter of the book, a chapter which I found the most difficult to follow.
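On a finite sample space the projection picture can be verified exactly. A sketch, assuming a hypothetical toy model (two fair coin tosses, X = number of heads, Y = the first toss; all names are illustrative): E(X|Y) is computed by averaging X over each level set of Y, and the residual X - E(X|Y) comes out orthogonal, in the L2(P) inner product E[UV], to every indicator of a Y-level set, which is the defining property of an orthogonal projection.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: two fair coin tosses, X = number of heads, Y = first toss.
omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega}
X = {w: w.count("H") for w in omega}
Y = {w: w[0] for w in omega}

def conditional_expectation(X, Y, P):
    """E[X | Y] as a random variable: on each level set {Y = y}, the P-average of X."""
    mass = defaultdict(Fraction)
    weighted = defaultdict(Fraction)
    for w in P:
        mass[Y[w]] += P[w]
        weighted[Y[w]] += P[w] * X[w]
    return {w: weighted[Y[w]] / mass[Y[w]] for w in P}

def inner(U, V, P):
    """L2(P) inner product E[U V]."""
    return sum(P[w] * U[w] * V[w] for w in P)

E = conditional_expectation(X, Y, P)
residual = {w: X[w] - E[w] for w in omega}
for y in sorted(set(Y.values())):
    indicator = {w: int(Y[w] == y) for w in omega}
    print(y, inner(residual, indicator, P))  # 0: residual is orthogonal to functions of Y
```

For a general Y there are no level sets of positive probability to average over, which is why the general existence proof needs Radon-Nikodym.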

Chapter 6 : Product Measure

The natural transition from a one-dimensional probability space to a multidimensional one is shown in this chapter. For simplicity's sake, if there are two sigma fields, one must work on the product sigma field to make sense of random variables defined on both spaces. Let's say X is defined on sigma algebra F1 and Y is defined on sigma algebra F2; then their joint distribution is defined on the product sigma algebra. For this new space, one obviously needs a measure. The notion of length in one dimension is extended to rectangles (products of intervals) in the plane, whose measure is the product of the side lengths, to define the measure. The chapter gives a very good construction of the product measure by introducing projections. It culminates in Fubini's theorem, which gives conditions under which the order of integration in a multiple integral can be swapped. All these fundamentals are extremely useful in describing joint distributions, convolutions, conditional densities and characteristic functions. There is a long proof showing that characteristic functions determine the distributions of random variables.
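A discrete sketch of the product measure, assuming two independent fair dice as a hypothetical example: the product measure gives each pair (x, y) mass P(x)Q(y), the iterated sums of a non-negative function agree in either order (Fubini/Tonelli, trivially true on a finite space), and the distribution of the sum X + Y is the convolution of the two marginals.

```python
from fractions import Fraction

# Two independent fair dice: the product measure assigns die[x] * die[y] to (x, y).
die = {k: Fraction(1, 6) for k in range(1, 7)}
joint = {(x, y): die[x] * die[y] for x in die for y in die}

def g(x, y):
    return x * y  # a non-negative function, so Tonelli/Fubini apply

# Fubini: integrate over y first, or over x first; the answers agree.
x_first = sum(sum(g(x, y) * joint[(x, y)] for y in die) for x in die)
y_first = sum(sum(g(x, y) * joint[(x, y)] for x in die) for y in die)

def convolve(p, q):
    """Distribution of X + Y for independent X ~ p, Y ~ q (discrete convolution)."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * qy
    return out

total = convolve(die, die)
print(x_first, y_first, total[7])  # 49/4 49/4 1/6
```

On infinite spaces the swap can fail for functions that are neither non-negative nor integrable, which is why the theorem's hypotheses matter.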

Chapter 7 : Radon Nikodym Theorem

I think this is the toughest chapter of all, as far too many concepts are introduced. At times I felt overwhelmed. It starts off by introducing a property of measures called “absolute continuity”. A measure m1 is said to be absolutely continuous with respect to a measure m2 if every m2-null set is also m1-null, i.e. m2(A) = 0 implies m1(A) = 0. So why is it relevant to explore this property? The answer is the Radon-Nikodym theorem. For sigma-finite measures where m1 is absolutely continuous with respect to m2, the relation between them is given by the Radon-Nikodym derivative h = dm1/dm2, a non-negative measurable function with m1(A) = ∫_A h dm2 for every measurable set A. If I were to describe it naively, it would be like this: say you have a space Omega equipped with a measure m. If one looks at another measure m1 which is absolutely continuous with respect to m, then all computations with respect to m1 can automatically be written as an equivalent problem with respect to m, since ∫ f dm1 = ∫ f (dm1/dm) dm. (Option pricing math thrives on the existence of the Radon-Nikodym derivative.) The section ends with the Lebesgue decomposition theorem, which states that a sigma-finite measure m2 can be written as the sum of two measures, one absolutely continuous with respect to m1 and the other singular (perpendicular) to m1. This smells like the case of a vector in a finite-dimensional space, which can be decomposed into its projection onto a subspace and a component perpendicular to that subspace. In this case, instead of vectors, we are dealing with measures.
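On a finite space the Radon-Nikodym derivative is just a ratio of point masses, which makes the change-of-measure identity easy to check. A sketch, assuming a hypothetical fair die Q and loaded die P of my own choosing (the function names are illustrative, not from the book): E_P[f] equals E_Q[f · dP/dQ]. It ends with a finite-space version of the Lebesgue decomposition, splitting nu into the part living where mu has mass and the part living on mu-null points.

```python
from fractions import Fraction

# Hypothetical example on the finite space {1, ..., 6}.
Q = {k: Fraction(1, 6) for k in range(1, 7)}                          # fair die
P = {1: Fraction(1, 2), **{k: Fraction(1, 10) for k in range(2, 7)}}  # loaded die

def absolutely_continuous(P, Q):
    """P << Q: every Q-null point is also P-null."""
    return all(P.get(w, 0) == 0 for w in Q if Q[w] == 0)

def rn_derivative(P, Q):
    """dP/dQ on a finite space: the ratio of point masses wherever Q > 0."""
    return {w: P[w] / Q[w] for w in Q if Q[w] > 0}

def face(w):
    return w  # the function being integrated: the face shown

L = rn_derivative(P, Q)
e_p = sum(face(w) * P[w] for w in P)         # E_P[face]
e_q = sum(face(w) * L[w] * Q[w] for w in Q)  # E_Q[face * dP/dQ]
print(absolutely_continuous(P, Q), e_p, e_q)  # True 5/2 5/2

def lebesgue_decomposition(nu, mu):
    """Split nu = nu_ac + nu_s, with nu_ac << mu and nu_s living on the mu-null points."""
    ac = {w: nu[w] if mu[w] > 0 else Fraction(0) for w in nu}
    s = {w: nu[w] if mu[w] == 0 else Fraction(0) for w in nu}
    return ac, s

mu = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)}
nu = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
nu_ac, nu_s = lebesgue_decomposition(nu, mu)
print(nu_ac, nu_s)
```

In option pricing the same identity, with the ratio replaced by a genuine Radon-Nikodym derivative, is what lets expectations under one measure be rewritten under another.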

The chapter then moves on to the Lebesgue-Stieltjes measure. Frankly, I did not understand this section at first and had to go over a book on Lebesgue-Stieltjes integration to make sense of it. The effort of going over a separate book and then coming back proved useful. It took me some time, and I think I have got the crux of it, though I intend to read it a few more times later. The crux of the Lebesgue-Stieltjes measure is this: if you want to weight the interval over which a function is integrated in a different way, say by a non-decreasing, right-continuous function F instead of plain length, then you need the Lebesgue-Stieltjes integral. When F is a distribution function, the key questions to answer are:

Does F induce a measure?
Does F induce a density function?

Both are very important questions. It is shown methodically that F does induce a measure mF, with mF((a,b]) = F(b) - F(a).
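A sketch of the measure induced by F, assuming a hypothetical mixed distribution function of my own: an atom of mass 1/2 at 0 plus mass 1/2 spread uniformly over (0, 1). The measure of (a, b] is F(b) - F(a), the jump at 0 shows up as a point mass, and a Riemann-Stieltjes sum approximates the integral of g(x) = x with respect to F, whose exact value is 0 · 1/2 + ∫_0^1 x/2 dx = 1/4. Because of the jump, this F is not absolutely continuous and has no density.

```python
def F(x):
    """Hypothetical mixed distribution function: atom 1/2 at 0 plus Uniform(0, 1) mass 1/2."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5 + 0.5 * x
    return 1.0

def m_F(a, b):
    """Lebesgue-Stieltjes measure of the half-open interval (a, b]."""
    return F(b) - F(a)

def stieltjes_sum(g, lo, hi, n):
    """Riemann-Stieltjes sum for the integral of g dF over (lo, hi], using right endpoints."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return sum(g(xs[i]) * (F(xs[i]) - F(xs[i - 1])) for i in range(1, n + 1))

print(m_F(-1e-9, 0.0))                             # 0.5: the point mass at 0
print(m_F(0.2, 0.6))                               # about 0.2
print(stieltjes_sum(lambda x: x, -1.0, 1.0, 2000))  # about 0.25
```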

However, F must satisfy an additional condition in order to induce a density: F must be absolutely continuous. Only when F is absolutely continuous can one talk about the density of such a distribution function. What is the connection between this density and Lebesgue measure? The Radon-Nikodym derivative again comes to the rescue: the density is exactly the Radon-Nikodym derivative of mF with respect to Lebesgue measure. Other measure decompositions, such as the Hahn-Jordan decomposition, are also discussed. An elegant description of these decompositions can be found at A Mind for Madness. Connecting these concepts to probability applications, the chapter explores the properties of conditional expectation. It then discusses martingales, which are the most important mathematical objects as far as math-fin is concerned. Typically a stats student spends his time looking at random variables, distributions and limit theorems like the weak law, the strong law, the central limit theorem, etc. However, this exposure is not enough for understanding random processes. Just as one spends time understanding i.i.d. sequences, one must spend time understanding martingales. One slowly begins to realize that martingales have their own set of rules, limit theorems, inequalities, etc. So even though this book covers martingales in a few pages, one should not mistake that for all there is to it. This chapter requires multiple readings, as it has too many concepts to be digested in one go. I am certain to revisit it over the course of my working life.
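The martingale property can be checked exactly for a short random walk. A sketch, assuming a hypothetical ±1 walk S_k with up-probability p (the helper names are mine): E[S_{k+1} | first k steps] is computed by averaging over every full path that extends a given prefix, and it equals S_k for all prefixes exactly when p = 1/2.

```python
from fractions import Fraction
from itertools import product

def path_prob(path, p):
    """Probability of a sequence of +1/-1 steps, each +1 with probability p."""
    prob = Fraction(1)
    for step in path:
        prob *= p if step == 1 else 1 - p
    return prob

def conditional_next(prefix, n, p):
    """E[S_{k+1} | first k steps = prefix], by averaging over all length-n extensions."""
    k = len(prefix)
    num, den = Fraction(0), Fraction(0)
    for tail in product((1, -1), repeat=n - k):
        path = prefix + tail
        weight = path_prob(path, p)
        num += weight * sum(path[: k + 1])
        den += weight
    return num / den

def is_martingale(n, p):
    """Check E[S_{k+1} | F_k] == S_k for every prefix of length k < n."""
    return all(conditional_next(prefix, n, p) == sum(prefix)
               for k in range(n) for prefix in product((1, -1), repeat=k))

print(is_martingale(6, Fraction(1, 2)))  # True: fair walk
print(is_martingale(6, Fraction(2, 3)))  # False: the walk drifts upwards
```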

Chapter 8 : Limit Laws

This is the most useful chapter of the book from a statistics perspective. To begin with, it discusses the various modes of convergence for random variables: uniform, pointwise, almost everywhere, in Lp norm, and in probability. One needs a clear picture of the relationships between these modes of convergence. The chapter then introduces the weak law of large numbers and the strong law of large numbers, where the former is about convergence in probability of the sample mean and the latter about its almost sure convergence.
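The weak law can be watched in a simulation. A sketch under my own assumptions (fair coin flips from Python's `random` module, tolerance ε = 0.05, 1000 trials, a fixed seed; `deviation_frequency` is a hypothetical helper): the fraction of trials in which the sample mean misses 1/2 by more than ε estimates P(|mean - 1/2| > ε), and it shrinks towards 0 as n grows, which is convergence in probability.

```python
import random

def deviation_frequency(n, eps=0.05, trials=1000, seed=0):
    """Fraction of trials in which the mean of n fair coin flips misses 1/2 by more than eps."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        if abs(heads / n - 0.5) > eps:
            misses += 1
    return misses / trials

for n in (10, 100, 1000):
    print(n, deviation_frequency(n))  # shrinks as n grows
```

A simulation like this only illustrates the weak law; the strong law's almost sure statement is about entire infinite sequences and cannot be checked this way.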

Convergence in distribution is a very different animal compared to the other modes of convergence, which is why it is also called weak convergence: it only asks that the distribution functions F_n converge to F at every point where F is continuous. The random variables themselves need not get close to each other, or even be defined on the same probability space; only their distributions matter, hence the name convergence in distribution. The central limit theorem is one of the theorems introduced in various elementary statistics courses, business statistics courses in MBA programs, etc. One usually comes across the statement and wonders why it is true: why should, say, a suitably normalized sum of i.i.d. random variables converge to the standard normal distribution? The question is natural, but understanding the rationale behind the answer requires a good understanding of limit laws. In that sense this chapter is unique. However, the proofs of the theorems themselves are very long-winded. I have gone through them, but I will certainly go over them again on some long rainy day.
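Convergence in distribution can be measured exactly in the simplest case. A sketch, assuming S_n ~ Binomial(n, 1/2) (a sum of n fair coin flips, my choice of example) standardized as Z_n = (S_n - n/2) / sqrt(n/4): it computes the Kolmogorov distance sup_z |P(Z_n <= z) - Φ(z)| to the standard normal CDF Φ. Since the CDF of Z_n is a step function and Φ is continuous, checking both sides of each jump is enough. The distance shrinks roughly like 1/sqrt(n).

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def kolmogorov_distance(n):
    """sup_z |P(Z_n <= z) - Phi(z)| for Z_n = (S_n - n/2) / sqrt(n/4), S_n ~ Binomial(n, 1/2)."""
    total = 2 ** n
    cumulative = 0
    cdf_left = 0.0  # P(S_n <= k - 1), the CDF just before the jump at k
    worst = 0.0
    for k in range(n + 1):
        cumulative += math.comb(n, k)
        cdf_right = cumulative / total  # P(S_n <= k), from exact integers
        z = (k - n / 2) / math.sqrt(n / 4)
        phi = normal_cdf(z)
        worst = max(worst, abs(cdf_right - phi), abs(cdf_left - phi))
        cdf_left = cdf_right
    return worst

for n in (25, 100, 400):
    print(n, kolmogorov_distance(n))  # decreases as n grows
```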

Takeaway:

The authors build measure theory from the ground up and motivate the concepts by showing applications in mathematical finance. In every chapter, whatever concepts are introduced, an immediate application in the math-fin area is provided, making seemingly dry measure theory an interesting read.