Probability Theory in Finance : Book Review
This book reminds me of “Elementary Stochastic Calculus with Finance in View”, a book by Thomas Mikosch, in terms of the overall goal: making the reader understand the nuts and bolts of the Black Scholes pricing formula. Probability theory, Lebesgue integration and Ito calculus are the main ingredients of the Black Scholes formula, and these rely on set theory, analysis and an axiomatic approach to mathematics. Anything in math is built from the ground up. This means that every idea/proof/lemma/axiom is pieced together in a logical manner so that the overall framework makes sense. This book introduces all the necessary ingredients in a pleasant way. There are some challenging exercises at the end of every chapter and the reader is advised to work through all of them. The author motivates the reader by saying:
An hour or two attempting a problem is never a waste of time and to make sure this happened, exercises were the focus of our small-group weekly workshops
Chapter 1: Money and Markets
The first chapter gives a basic introduction to the time value of money and serves as a basic refresher to calculus.
Chapter 2: Fair Games
The irrelevance of expectation pricing in finance is wonderfully illustrated in Baxter and Rennie’s book on Financial Calculus. Why is expectation-based pricing dangerous? The reason is that expectation pricing is not enforceable. Only one kind of pricing is enforceable, and any other pricing technique is dangerous. The enforceable technique goes by the name ``arbitrage pricing''. This chapter starts off with a basic example where two people, John and Mark, play a betting game with each other. Expectation pricing will make sense in this case if they play a lot of games with each other.
The second example is where John and Mark place bets with a bookmaker on a two horse race. The bookmaker offers odds for each horse. These odds can be used to back out the implied probability of each horse winning. If the bookmaker does not quote odds based on arbitrage pricing, he risks going bankrupt. There is no place for expectation-based pricing here. Based on the odds quoted, a particular horse might have two different probabilities with respect to John and Mark. Basically this means that John and Mark are operating in different probability spaces. If there is a single player betting on each horse, the bookmaker can quote odds and be done with it, assuring himself a guaranteed profit. However, as the bets start accumulating, he becomes more and more exposed to the risk of a huge loss. He must either change the odds or hedge the exposure. To remove uncertainty from his exposure, the bookmaker can place a bet with another bookmaker on the horse whose win is likely to make him bankrupt. In this way he gets a guaranteed profit or can at least break even.
If you think from a bookmaker’s perspective, all the activities he performs, like quoting the odds, hedging and changing the odds, are the same activities that a seller of a derivative contract performs. In fact this chapter is a superb introduction to the concept of derivative pricing under an equivalent martingale measure. I had loved this introduction in Baxter and Rennie’s book and was thrilled to see the same kind of intro in this book. In fact I think any book on derivative pricing should have the slogan “Banish expectation pricing, embrace arbitrage pricing” at the very beginning.
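To make the bookmaker's arithmetic in the two horse example concrete, here is a minimal Python sketch. The odds and stakes are made-up numbers of mine, not taken from the book, and the function names are hypothetical. It backs out implied probabilities from quoted odds and computes the bookmaker's profit for each possible winner.

```python
from fractions import Fraction

def implied_probability(odds_against):
    """Odds of n-to-1 against a horse imply a win probability of 1 / (n + 1)."""
    return 1 / (1 + Fraction(odds_against))

def bookmaker_profit(stakes, odds_against):
    """Profit for each possible winner: all stakes collected, minus the
    payout (stake returned plus winnings) on the winning horse."""
    total_staked = sum(stakes.values())
    return {
        horse: total_staked - stakes[horse] * (1 + odds_against[horse])
        for horse in stakes
    }

# Hypothetical quotes: A at 1-to-2 against, B at 3-to-2 against.
odds = {"A": Fraction(1, 2), "B": Fraction(3, 2)}
implied = {horse: implied_probability(o) for horse, o in odds.items()}
overround = sum(implied.values())  # exceeds 1: the bookmaker's margin

# Stakes in proportion to the implied probabilities: same profit either way.
balanced = bookmaker_profit({"A": 100, "B": 60}, odds)

# Bets pile up on B: the bookmaker now loses money if B wins.
lopsided = bookmaker_profit({"A": 100, "B": 100}, odds)

print(implied, overround)
print(balanced)
print(lopsided)
```

In the balanced book the profit does not depend on which horse wins, and no real world probability enters the calculation. In the lopsided book the bookmaker is exposed to B, which is exactly the situation where he must change the odds or hedge.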
Chapter 3: Set Theory
The chapter begins by motivating the reader to go through theorems, proofs, lemmas and corollaries, as they are the organizing principles of any field. By learning these abstract ideas, one can apply the learning to various situations, which is a much better way than learning case-specific results. In the author’s words:
The axiomatic approach does contain a degree of uncertainty not experienced when the classical approach of learning by rote and reproducing mathematical responses in carefully controlled situations is followed, but this uncertainty gradually diminishes as progress is achieved. The cause of this uncertainty is easy to identify. Over time one becomes confident that understanding has been acquired, but then a new example or result or failure to solve a problem or even to understand a solution may destroy this confidence and everything becomes confused. A reexamination is required, and with clarification, a better understanding is achieved and confidence returns. This cycle of effort, understanding, confidence and confusion keeps repeating itself, but with each cycle progress is achieved. To initiate this new way of thinking usually requires a change of attitude towards mathematics and this takes time.
Having given this motivation, the author talks about the mysterious ``infinity'' that is at the heart of mathematical abstraction. The reader is taken through countability, least upper bounds, greatest lower bounds and related concepts. After this selective journey through the real numbers, the chapter talks about sigma algebras. Maybe some other books carry a visual of a filtration; I don’t recollect one now. In any case, the best thing about this chapter is the visual that it provides for a discrete filtration.
In a typical undergrad probability course one works with situations where the entire outcome space is visible right away, so one does not need the concept of sigma algebras. In finance though, there is a time element for a random stock price or any random quantity. You do not know the entire outcome space. Information gets added as you move from one day to the next. So, in one sense, one needs to be comfortable with sigma algebras that are subsets of the master sigma algebra, a term that I am using just to make things easier to write. Not all subsets of the outcome space qualify as events at a given time. Definitions of sigma algebra and measurable space are given. Subsequently, the concept of a ``generating set for the sigma field'' is explored. The generating sets are usually small in size. A nice example is given where one is asked to describe the sigma field generated by a collection of subsets. Soon enough you realize that the exercise becomes tedious if you try to include unions and intersections manually. The mess is too painful to wade through. An alternative solution using partitions is presented in the chapter that makes the exercise of finding the ``sigma field generated by a collection of subsets'' workable. The key idea is to relate partitions to equivalence classes and then use these equivalence classes to quickly generate the sigma algebra. In Shreve, I came across the phrase ``Sets are resolved by the sigma algebra''. In this chapter though, that phrase is aptly illustrated by many visuals. Getting a good idea about filtrations in discrete time will mightily help in transitioning to continuous time.
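The partition trick is easy to try out on a small finite outcome space. The sketch below is my own illustration, not the book's worked example; the outcome space, the generating sets and the function names are assumptions. Two outcomes are treated as equivalent when they belong to exactly the same generating sets, and every event in the generated sigma algebra is then a union of the resulting equivalence classes.

```python
from itertools import combinations

def atoms(omega, generators):
    """Partition omega: two outcomes are equivalent when they lie in
    exactly the same generating sets."""
    classes = {}
    for w in omega:
        signature = tuple(w in g for g in generators)
        classes.setdefault(signature, set()).add(w)
    return [frozenset(c) for c in classes.values()]

def generated_sigma_algebra(omega, generators):
    """On a finite space every event is a union of atoms, so enumerate
    all unions (including the empty union)."""
    parts = atoms(omega, generators)
    events = set()
    for r in range(len(parts) + 1):
        for chosen in combinations(parts, r):
            events.add(frozenset().union(*chosen))
    return events

# Hypothetical example: six outcomes and two generating sets.
omega = {1, 2, 3, 4, 5, 6}
generators = [{1, 2}, {2, 3, 4}]

partition = atoms(omega, generators)
sigma = generated_sigma_algebra(omega, generators)
print(len(partition), len(sigma))
```

With k atoms the sigma algebra has 2^k events, which is why going through the partition is so much less painful than closing the generators under unions and complements by hand.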
Chapter 4 : Random Variables
Random variables are basically functions that map from outcome space to R. Ideally one can just be in (Omega, F) and do all the computations necessary. However to take advantage of the rich structure of R, it should be connected with F. This is typically done using the concept of measurability. So, in a way one is talking about moving to a different space for computational ease. To move from one space to another, the structure has to be preserved. This structure goes by the name sigma field. The mappings are called measurable functions.

For measurable spaces (Omega, F), you have measurable functions to move to R

For probability spaces (Omega, F, P), you have random variables to move to R
Whenever one talks about measurable functions, there are some core concepts that one needs to grasp. Firstly, the inverse image of every Borel set under the function should lie in F. Only then does it make sense. An intuitive way of thinking about measurable functions is that they are carriers of information. Once this basic criterion is satisfied, one gives the function some recognition: the collection of inverse images of all the Borel sets in R is called the ``sigma algebra generated by the random variable'' X, denoted by F_X. Typically F_X is a sub sigma algebra of F. In order to check whether a function is measurable on a measurable space, one needs test candidates. The Borel sigma algebra is huge and hence testing each set is going to take ages. A convenient way is to pick a collection of sets A that generates the Borel sigma algebra B. So, instead of checking every set in the Borel sigma algebra, one can merely check whether the inverse images of the sets in A are in F. The total information available from an experiment is embedded in the sigma field of events F of the sample space Omega.
The book puts it this way :
The real valued function X on Omega may be regarded as a carrier of information in the same way as the satellite relays message from one source to another. The receiver will hopefully extract from the Borel set B information. An important requirement when transmitting information is accuracy. If X is not measurable, then inverse mapping is not an observable event and information. For this reason we require X to be measurable. If X is measurable, then the information transferred will be about events in F_X. Complete information will be transferred if F_X = F, and at times this may be desirable. On the other hand, F may be extremely large and we may only be interested in extracting certain information. In such cases, if X secures the required information, we may consider X as a file storing the information about all the events in F_X. In the case of a filtration, we obtain a filing system.
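On a finite outcome space both F_X and the measurability check can be computed directly. The following sketch uses a two coin toss space of my own choosing, and the helper names are hypothetical. It builds F_X from the level sets of X and tests measurability only on the sets {X <= a}, which for a finite valued X is enough because these sets generate all inverse images of Borel sets.

```python
from itertools import combinations

def sigma_generated_by(X, omega):
    """F_X on a finite space: all unions of the level sets {X = x}."""
    levels = {}
    for w in omega:
        levels.setdefault(X[w], set()).add(w)
    blocks = [frozenset(b) for b in levels.values()]
    return {
        frozenset().union(*chosen)
        for r in range(len(blocks) + 1)
        for chosen in combinations(blocks, r)
    }

def is_measurable(X, omega, F):
    """Check {X <= a} in F for each value a taken by X, assuming F is a
    sigma algebra on the finite set omega."""
    for a in set(X.values()):
        event = frozenset(w for w in omega if X[w] <= a)
        if event not in F:
            return False
    return True

omega = ["HH", "HT", "TH", "TT"]

# Information after the first toss only (assumed example).
F1 = {
    frozenset(),
    frozenset({"HH", "HT"}),
    frozenset({"TH", "TT"}),
    frozenset(omega),
}

first_toss = {w: 1 if w[0] == "H" else 0 for w in omega}
head_count = {w: w.count("H") for w in omega}

print(is_measurable(first_toss, omega, F1))
print(is_measurable(head_count, omega, F1))
```

The indicator of the first toss carries only information that F1 already contains, while the head count needs to know the second toss as well, so it is not F1-measurable.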
Chapter 5 : Probability Spaces
If you take a measurable space and then attach a measure, it becomes a measure space. It can be called a probability space if the measure of the entire outcome space is 1 and the measure satisfies countable additivity. With an experiment you can associate a probability space. Let’s say you have two variables with their probability spaces. If you want to combine the two probability spaces and create a more generic structure, you can go about it this way: first define a combined outcome space, then define the product sigma algebra, and then extend the probability measure on to events in the product sigma algebra using the probability measures of the individual spaces.
Random variables are introduced in this chapter. These are measurable functions that are defined with a restriction on the type of measure. The random variables are mappings from the outcome space to the real line. These random variables in turn generate a sigma algebra, and thus there are two probability spaces that one can think of in the context of a random variable: the original space, and the induced probability space whose outcome space is the real line and whose event space is the Borel sigma algebra. Two random variables on the same measurable space can be dependent or independent depending on the probability measure attached to that space. The chapter gives conditions that are necessary for the independence of two random variables. If the sigma fields generated by the random variables are independent, then the variables are independent. Whether those sigma fields are independent is itself decided by the probability measure attached to the common event space.
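The point that independence depends on the probability measure can be seen with a tiny example. The sketch below is my own and uses assumed numbers: the same two random variables on the same two coin toss space are independent under a uniform measure P and dependent under a measure Q that puts all its mass on the outcomes where the tosses agree.

```python
from fractions import Fraction
from itertools import product

def independent(X, Y, P):
    """Check P(X = x, Y = y) == P(X = x) * P(Y = y) for all value pairs."""
    for x, y in product(set(X.values()), set(Y.values())):
        joint = sum(p for w, p in P.items() if X[w] == x and Y[w] == y)
        px = sum(p for w, p in P.items() if X[w] == x)
        py = sum(p for w, p in P.items() if Y[w] == y)
        if joint != px * py:
            return False
    return True

omega = ["HH", "HT", "TH", "TT"]
first = {w: 1 if w[0] == "H" else 0 for w in omega}
second = {w: 1 if w[1] == "H" else 0 for w in omega}

# Two hypothetical probability measures on the same measurable space.
P = {w: Fraction(1, 4) for w in omega}
Q = {"HH": Fraction(1, 2), "HT": Fraction(0), "TH": Fraction(0), "TT": Fraction(1, 2)}

print(independent(first, second, P))
print(independent(first, second, Q))
```

Nothing about the functions or the sigma algebra changed between the two checks; only the measure did.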
Chapter 6 : Expected Values
Expected values, lengths and areas are measurements which share a common mathematical background. Basically if you want to measure something, you try to divide it into arbitrary lengths and then approximate the lengths by a suitable number. During the final decade of the 19th century, mathematicians in Paris began investigating which sets in Euclidean space R^n were capable of being measured. Emile Borel made the fundamental observation that all figures used in science could be obtained from simple figures such as line segments, squares and cubes by forming countable unions and taking complements. He suggested the term sigma field for a collection of sets large enough to cover most of the stuff that we come across. He defined the measure of a set as a limit of the measurements obtained by taking countable unions. He did not succeed, as he could not show that the resulting measure of a set was independent of the way it was built up from simple sets.
Henri Lebesgue used Borel’s ideas on countability and complements but proceeded in a different way. He defined measurability differently and in the process led to the introduction of ``Lebesgue measure'', ``Lebesgue measurable set'' and ``Lebesgue measurable spaces''.
There are 4 types of mathematical objects described in this chapter, namely simple random variables, positive bounded random variables, positive measurable variables and integrable random variables. Firstly, simple random variables are just step functions for which expectations are easy to compute. When you talk about expectation, you can use either E or the integral sign to denote it. Survival of the fittest does not seem to have happened in the notation for expectation. Using E for expectation is better for denoting conditional expectation, martingales etc. However the integral sign is good for showing expectation over disjoint subsets etc. It is important to pay attention to this matter of notation.
The second level of sophistication is defining a positive bounded random variable. Any positive bounded random variable has a canonical representation that involves simple random variables. All the properties of positive bounded random variables can be explored using the canonical representation.
The third level of sophistication is defining a positive random variable. Any positive random variable can be represented as the limit of an increasing sequence of positive bounded variables. Using this fact, all the properties of positive random variables are stated and proved.
The fourth level of sophistication is defining an integrable random variable. Any integrable random variable can be represented as the difference of two positive random variables.
The pattern that is used across the four mathematical objects is the following :

Form two increasing sequences of a specific type of variable (simple random variables, positive bounded random variables, positive measurable variables and integrable random variables).

Assume they are pointwise convergent

Prove that the expectations of both sequences converge
The reason for introducing these 4 types of mathematical objects is this: use the expectation of simpler random variables to form the expectation of a more sophisticated random variable. The chapter ends by proving two important theorems, the Monotone Convergence Theorem and the Dominated Convergence Theorem. These theorems are used in most of the proofs.
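The idea of building an expectation from simpler random variables can be checked numerically. The sketch below is my own illustration and not the book's construction verbatim: it assumes X is uniform on [0, 1) and approximates it from below by the simple random variables X_n = floor(2^n X) / 2^n. The function name is hypothetical.

```python
from fractions import Fraction

def expectation_of_dyadic_approximation(n):
    """E[X_n] for X_n = floor(2^n X) / 2^n, with X uniform on [0, 1).

    X_n takes the value k / 2^n on the interval [k / 2^n, (k + 1) / 2^n),
    and each such interval has probability 1 / 2^n.
    """
    step = Fraction(1, 2**n)
    return sum(k * step * step for k in range(2**n))

approximations = [expectation_of_dyadic_approximation(n) for n in range(1, 8)]
for n, value in enumerate(approximations, start=1):
    print(n, value)
```

The expectations of the simple approximations increase with n and close in on 1/2, the expectation of X, which is the behaviour the Monotone Convergence Theorem guarantees for an increasing sequence of positive random variables.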
Chapter 7 : Continuity and Integrability
The following are the highlights of this chapter :

There is a connection between integrability and convergence of series of real numbers. The fact that every absolutely convergent series is convergent can be proved using a random variable defined on a probability triple.

Independence of two random variables can be characterized by expected values. If the two variables are independent, the expectation of their product splits into the product of the individual expectations.

Measures are defined and introduced.

The Riemann integral is defined for continuous functions and a criterion for Riemann integrability is given. Given this context, the Lebesgue integral is introduced as an integral of a random variable with respect to a probability measure.

The Law of the Unconscious Statistician is stated

Convergence Pointwise implies Convergence almost surely which in turn implies Convergence in Distribution

Chebyshev’s Inequality

Convex functions and Jensen’s inequality
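A few of the results listed above can be sanity checked on a small discrete distribution. The sketch below uses a fair die, which is my own choice of example, and hypothetical names. It computes E[g(X)] directly from the distribution of X (the Law of the Unconscious Statistician) and then checks Chebyshev's inequality and Jensen's inequality for the convex function x^2.

```python
from fractions import Fraction

# Distribution of a fair six sided die (assumed example).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def expect(g, pmf):
    """E[g(X)] computed from the distribution of X alone."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)
variance = expect(lambda x: (x - mean) ** 2, pmf)

# Chebyshev: P(|X - mean| >= c) <= variance / c^2
c = 2
tail = sum(p for x, p in pmf.items() if abs(x - mean) >= c)
chebyshev_bound = variance / c**2

# Jensen with the convex function x^2: (E[X])^2 <= E[X^2]
second_moment = expect(lambda x: x**2, pmf)

print(mean, variance)
print(tail, chebyshev_bound, tail <= chebyshev_bound)
print(mean**2, second_moment, mean**2 <= second_moment)
```

The Chebyshev bound is far from tight here, which is typical: it holds for every distribution with a finite variance, so it cannot be sharp for any particular one.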
Chapter 8 : Conditional Expectation
Instead of diving into the topic of conditional expectation right away, the chapter introduces a 2 step binomial tree and shows the realizations of a conditional expectation variable for various sample paths. Concepts such as a discrete filtration and a process adapted to a filtration are discussed in the context of the two step tree. If the conditioning sigma algebra is generated by a countable partition, then the conditional expectation variable can be interpreted pointwise. However in other general cases, the conditional expectation needs to be inferred from a few defining conditions. There is no specific formula for conditional expectation. In this context, the Radon Nikodym theorem is stated without proof. Certain basic properties of conditional expectation are also stated and proved.
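When the conditioning sigma algebra comes from a finite partition, the pointwise interpretation is just a probability weighted average over each block. Here is a minimal sketch on a two step binomial tree. The starting price, the up and down factors and the probabilities are assumptions of mine, not the book's numbers, and the function name is hypothetical.

```python
from fractions import Fraction

def conditional_expectation(X, P, partition):
    """E[X | G] where G is generated by a finite partition: on each block,
    the value is the probability weighted average of X over that block."""
    result = {}
    for block in partition:
        block_prob = sum(P[w] for w in block)
        average = sum(X[w] * P[w] for w in block) / block_prob
        for w in block:
            result[w] = average
    return result

# Assumed tree: S0 = 4, up factor 2, down factor 1/2, each move with probability 1/2.
omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega}
S2 = {"HH": 16, "HT": 4, "TH": 4, "TT": 1}

# Information after the first step: we only know whether the first move was up or down.
F1_partition = [{"HH", "HT"}, {"TH", "TT"}]

cond = conditional_expectation(S2, P, F1_partition)
print(cond)

# Tower property: averaging the conditional expectation recovers E[S2].
mean_S2 = sum(S2[w] * P[w] for w in omega)
mean_cond = sum(cond[w] * P[w] for w in omega)
print(mean_S2, mean_cond)
```

The conditional expectation is itself a random variable: it is constant on each block of the partition, which is the same thing as saying it is measurable with respect to the information available after the first step.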
Chapter 9: Martingales
The chapter starts off with a basic definition of a discrete martingale adapted to a filtration on a probability space. Four examples are provided which give the reader enough knowledge to apply the principles in option pricing. The thing I liked about this chapter is the martingale convergence section. If you have a martingale that is bounded in L^1, then it converges almost surely. This property of martingale convergence is extremely useful in various problems. The line of attack for producing a distribution function for a random variable is to figure out a martingale that has this variable in it, apply the martingale convergence theorem, find the limiting value of the martingale and, from that expression, extract the distribution of the random variable. The chapter also defines continuous martingales and mentions the Girsanov theorem in its elementary form. Chapter 10 talks about the Black Scholes option pricing formula. The last chapter is a good rigorous introduction to stochastic integration.
Takeaway:
This book painstakingly builds up all the relevant concepts in probability from scratch. The only prerequisite for this text is “persistence”, as there are a ton of theorems and lemmas throughout the book. The pleasant thing about the book is that there are good visuals explaining various concepts. One may forget a proof, but a visual sticks for a long time. So, in that sense this book scores over many other books.