An Introduction to Probability and Inductive Logic

With total silence around me and my mind wanting to immerse itself in a book, I picked up this book from my inventory. I came across a reference to this work in Aaron Brown’s book on Risk Management.

First something about the cover:

The young woman on the right is the classical Goddess Fortuna, whom today we might call Lady Luck. The young man on the left is Chance. Fortuna is holding an enormous bunch of fruits, symbolizing the good luck that she can bring. But notice that she has only one sandal. That means that she can also bring bad luck. And she is sitting on a soap bubble! This is to indicate that what you get from luck does not last. Chance is holding lottery tickets. Dosso Dossi was a court painter in the northern Italian city of Ferrara, which is near Venice. Venice had recently introduced a state lottery to raise money. It was not so different from modern state-run lotteries, except that Venice gave you better odds than any state-run lottery today. Art critics say that Dosso Dossi believed that life is a lottery for everyone. Do you agree that life is a lottery for everyone? The painting is in the J. Paul Getty Museum, Los Angeles, and the above note is adapted from notes for a Dossi exhibit, 1999.

The book opens with a set of 7 questions, and it is suggested that readers solve them before proceeding.

Logic

The first chapter deals with some basic terminology that logicians use. The following terms are defined and examples are given to explain each of them in detail:

Argument: A point or series of reasons presented to support a proposition which is the conclusion of the argument.
Premises + Conclusion: An argument can be divided into premises and a conclusion.
Propositions: Premises and conclusion are propositions, statements that can be either true or false.
Validity of an argument: Validity has to do with the logical connection between premises and conclusion, and not with the truth of the premises or the conclusion. An argument is invalid if its conclusion can be false even when all its premises are true; a valid argument that has a false premise can still have a false conclusion.
Soundness of an argument: Soundness for deductive logic has to do with both validity and the truth of the premises.
Validity vs. Truth: Validity is not truth. Logic takes the premises as given and checks whether the conclusion follows from them. If a premise is false, the reasoning can still be valid, but the conclusion need not be true.
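To make the idea of validity concrete, here is a minimal Python sketch, not taken from the book, that checks simple propositional arguments by brute force. The helper name `is_valid` and the two example arguments are my own illustrations. An argument counts as valid here if no assignment of truth values makes every premise true and the conclusion false.

```python
from itertools import product

def is_valid(premises, conclusion, n_vars):
    """Return True if no truth assignment makes all premises true and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # found a counterexample, so the argument is invalid
    return True

# Modus ponens: "if p then q", "p", therefore "q".
modus_ponens = is_valid(
    premises=[lambda p, q: (not p) or q, lambda p, q: p],
    conclusion=lambda p, q: q,
    n_vars=2,
)

# Affirming the consequent: "if p then q", "q", therefore "p".
affirming_consequent = is_valid(
    premises=[lambda p, q: (not p) or q, lambda p, q: q],
    conclusion=lambda p, q: p,
    n_vars=2,
)

print(modus_ponens, affirming_consequent)
```

Note that the check never asks whether the premises are actually true. That is the division of labor: the code plays the logician, and someone else has to vouch for the premises.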

Logic is concerned only with the reasoning. Given the premises, it can tell you whether the conclusion follows or not. It cannot say anything about the veracity of the premises. Hence there are two ways to criticize a deduction: 1) A premise is false, 2) The argument is invalid. So there is a division of labor. Who is an expert on the truth of premises? Detectives, nurses, surgeons, pollsters, historians, astrologers, zoologists, investigative reporters, you and me. Who is an expert on validity? A logician.

The takeaway of the chapter is that valid arguments are risk-free arguments, i.e. given true premises, you are guaranteed a true conclusion.

Inductive Logic

The chapter introduces risky arguments and inductive logic as a mechanism for reasoning about them. Valid arguments are risk-free arguments. A risky argument is one that is very good, yet its conclusion can be false, even when the premises are true. Inductive logic studies risky arguments. There are many forms of risky argument, such as making a statement about a population from a statement about a sample, making a statement about a sample from a statement about a population, or making a statement about one sample based on a statement about another sample. Not all of these can be studied via inductive logic, and there may be more to risky arguments than inductive logic covers. The terms introduced in this chapter are:

Inference to the best explanation
Risky Argument
Inductive Logic
Testimony
Decision theory

The takeaway of the chapter is that inductive logic analyzes risky arguments using probability ideas.

The Gambler’s fallacy

This chapter discusses the gambler’s fallacy: a gambler justifies betting on red at a roulette wheel on the grounds that the last X outcomes on the wheel have been black. His premise is that the wheel is fair, but his action contradicts that premise, because it denies the independence of outcomes. Informal definitions are given for bias, randomness, complexity and no regularity. Serious thinking about risks, which uses probability models, can go wrong in two very different ways. 1) The model may not represent reality well. That is a mistake about the real world. 2) We can draw wrong conclusions from the model. That is a logical error. Criticizing the model is like challenging the premises. Criticizing the analysis of the model is like challenging the reasoning.
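A quick simulation makes the independence point tangible. The sketch below is my own illustration, not from the book, and it assumes a simplified wheel with only red and black, each with probability 0.5 (no green zero). It compares the overall share of reds with the share of reds that come right after a streak of blacks.

```python
import random

def red_rate_after_black_streak(n_spins=200_000, streak=3, p_red=0.5, seed=0):
    """Simulate a fair two-colour wheel; return (overall red rate, red rate after a black streak)."""
    rng = random.Random(seed)
    spins = [rng.random() < p_red for _ in range(n_spins)]  # True means red
    overall = sum(spins) / n_spins
    # Keep only the spins whose previous `streak` outcomes were all black.
    followers = [
        spins[i] for i in range(streak, n_spins)
        if not any(spins[i - streak:i])
    ]
    after_streak = sum(followers) / len(followers)
    return overall, after_streak

overall, after_streak = red_rate_after_black_streak()
print(round(overall, 3), round(after_streak, 3))
```

Under the fair-wheel model both rates come out close to one half, so a run of blacks gives no reason to favor red. If the two rates differed noticeably on a real wheel, that would be a criticism of the model (the premise), not of the reasoning.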

Elementary Probability Ideas

This chapter introduces some basic ideas of events, ways to compute probability of compound events etc. The chapter also gives an idea of the different terminologies used by statisticians and logicians, though they mean the same thing. Logicians are interested in arguments that go from premises to conclusions. Premises and conclusions are propositions. So, inductive logic textbooks usually talk about the probability of propositions. Most statisticians and most textbooks on probability talk about the probability of events. So there are two languages of probability. Why learn two languages when one will do? Because some students will talk the event language, and others will talk the proposition language. Some students will go on to learn more statistics, and talk the event language. Other students will follow logic, and talk the proposition language. The important thing is to be able to understand anyone who has something useful to say.

Conditional Probability

This chapter gives formulae for computing conditional probabilities. All the conditioning is done for a discrete random variable. Anything more sophisticated than a discrete RV would have alienated non-math readers of the book. A few examples are given to solidify the notions of conditional probability.
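As a small illustration of the kind of formula involved, the sketch below uses the standard definition P(A given B) = P(A and B) / P(B) on a single roll of a fair die. The die example and the helper names `prob` and `cond_prob` are my own, not the book's.

```python
from fractions import Fraction

outcomes = range(1, 7)  # one roll of a fair six-sided die

def prob(event):
    """Probability of an event, given as a predicate on outcomes, under equal weights."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(a given b) = P(a and b) / P(b); assumes P(b) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def is_six(o):
    return o == 6

def is_even(o):
    return o % 2 == 0

p_six = prob(is_six)                          # 1/6
p_six_given_even = cond_prob(is_six, is_even)  # 1/3
print(p_six, p_six_given_even)
```

Learning that the roll is even narrows the possibilities to three outcomes, so the chance of a six doubles from 1/6 to 1/3.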

The Basic Rules of Probability & Bayes Rule

Rules of probability such as normality, additivity, total probability and statistical independence are explained via visuals. I think this chapter and the previous three are geared towards a person who is a total novice in probability theory. The book also gives an intuition into Bayes’ rule using elementary examples that anyone can understand. Concepts such as reliability testing are also discussed.
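Here is a minimal sketch of Bayes’ rule for a yes/no hypothesis, in the spirit of a reliability-testing example. The numbers (a 1% base rate, a test that flags 95% of true cases and wrongly flags 5% of the rest) are hypothetical values I picked for illustration; they are not from the book.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a hypothesis H and its negation, given one piece of evidence."""
    numerator = prior * p_evidence_given_h
    # Total probability of the evidence, summed over H and not-H.
    total = numerator + (1 - prior) * p_evidence_given_not_h
    return numerator / total

# Hypothetical numbers: 1% base rate, 95% hit rate, 5% false-positive rate.
p_h_given_positive = posterior(prior=0.01, p_evidence_given_h=0.95,
                               p_evidence_given_not_h=0.05)
print(round(p_h_given_positive, 3))
```

With these assumed numbers the posterior is only about 16%, because true cases are so rare that false positives outnumber them. This is the usual surprise in such examples.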

How to combine Probabilities and Utilities?

There are three chapters under this section. The chapter on expected value introduces a measure of the utility of a consequence and explores various lottery situations to show that the cards are stacked against every lottery buyer and the lottery owner always holds an edge. The chapter on maximizing expected value says that one way to choose among a set of actions is to pick the one with the highest expected value. To compute the expected value, one has to represent degrees of belief by probabilities and the consequences of actions by utiles (which can be converted into equivalent monetary units). Despite the obviousness of the expected value rule, there are a few paradoxes, and those are explored in the chapter, the popular one being the Allais Paradox. All these paradoxes have a common message: the expected value rule does not factor in attitudes such as risk aversion and other behavioral biases, and hence might just be a way to define utilities in the first place. So the expected value rule is not as watertight as it might seem.

There are also situations where decision theory cannot help. One may disagree about the probability of the consequences; one may also disagree about the utilities (how dangerous or desirable the consequences are). Often there is disagreement about both probability and utility. Decision theory cannot settle such disagreements. But at least it can analyze the disagreement, so that both parties can see what they are arguing about.

The last chapter in this section deals with decision theory. The three decision rules explained in the chapter are 1) Dominance rule 2) Expected value rule 3) Dominant expected value rule. Pascal’s wager is introduced to explain the three decision rules. The basic framework is to come up with a partition of possible states of affairs, the possible acts that agents can undertake, and the utilities of the consequences of each possible act in each possible state of affairs in the partition.
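The framework of states, acts and utilities translates almost directly into a small table. The sketch below is my own illustration with a hypothetical lottery (a ticket costing 1 unit with a 1-in-1000 chance of a 500 prize); the function names are assumptions, not the book's notation. It applies the expected value rule and checks for dominance.

```python
def expected_value(utilities, probabilities):
    """Probability-weighted sum of the utilities of one act across all states."""
    return sum(probabilities[s] * utilities[s] for s in probabilities)

def best_act(table, probabilities):
    """Expected value rule: choose the act with the highest expected value."""
    return max(table, key=lambda act: expected_value(table[act], probabilities))

def dominates(table, a, b):
    """Act a dominates b if it is at least as good in every state and better in some state."""
    states = table[a].keys()
    return (all(table[a][s] >= table[b][s] for s in states)
            and any(table[a][s] > table[b][s] for s in states))

# Hypothetical lottery: states of affairs with their probabilities, acts with net utilities.
probabilities = {"win": 0.001, "lose": 0.999}
table = {
    "buy ticket": {"win": 499, "lose": -1},
    "keep money": {"win": 0, "lose": 0},
}

ev_buy = expected_value(table["buy ticket"], probabilities)
choice = best_act(table, probabilities)
print(round(ev_buy, 3), choice)
```

Neither act dominates the other here, since buying is better if you win and worse if you lose, so the dominance rule is silent and the expected value rule does the work. The ticket has a negative expected value, which is the edge the lottery owner holds.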

Kinds of Probability

What do you mean?

This chapter brings out the real meaning of the word “probability” and is probably the most important chapter of the book.

(1) This coin is biased toward heads. The probability of getting heads is about 0.6.
(2) It is probable that the dinosaurs were made extinct by a giant asteroid hitting the Earth.
(2.a) The probability that the dinosaurs were made extinct by a giant asteroid hitting the Earth is very high, about 0.9.
(3) Taking all the evidence into consideration, the probability that the dinosaurs were made extinct by a giant asteroid hitting the Earth is about 90%.
(4) The dinosaurs were made extinct by a giant asteroid hitting the Earth.

Statements (1) and (4) [but not (3)] are similar in one respect. Statement (4), like (1), is either true or false, regardless of what we know about the dinosaurs. If (4) is true, it is because of how the world is, especially what happened at the end of the dinosaur era. If (3) is true, it is not true because of “how the world is,” but because of how well the evidence supports statement (4). If (3) is true, it is because of inductive logic, not because of how the world is. The evidence mentioned in (3) will go back to laws of physics (iridium), geology (the asteroid), geophysics, climatology, and biology. But these special sciences do not explain why (3) is true. Statement (3) states a relation between the evidence provided by these special sciences, and statement (4), about dinosaurs. We cannot do experiments to test (3). Notice that the tests of (1) may involve repeated tosses of the coin. But it makes no sense at all to talk about repeatedly testing (3). Statement (2.a) is different from (3), because it does not mention evidence. Unfortunately, there are at least two ways to understand (2.a). When people say that so and so is probable, they mean that relative to the available evidence, so and so is probable. This is the interpersonal/evidential way. The other way to understand (2.a) is based on a personal sense of belief.

Statement (4) was a proposition about dinosaur extinction; (2) and (3) are about how credible (believable) (4) is. They are about the degree to which someone believes, or should believe, (4). They are about how confident one can or should be, in the light of that evidence. The uses of the word probability in statements (2) and (3) are related to ideas such as belief, credibility, confidence and evidence, and the general name used to describe them is “belief-type probability”.

In contrast, the truth of statement (1) seems to have nothing to do with what we believe. We seem to be making a completely factual statement about a material object, namely the coin (and the device for tossing it). We could be simply wrong, whether we know it or not. This might be a fair coin, and we may simply have been misled by the small number of times we tossed it. We are talking about a physical property of the coin, which can be investigated by experiment. The use of probability in (1) is related to ideas such as frequency, propensity and disposition, and the general name used to describe these is “frequency-type probability”.

Belief-type probabilities have been called “epistemic”— from episteme, a Greek word for knowledge. Frequency-type probabilities have been called “aleatory,” from alea, a Latin word for games of chance, which provide clear examples of frequency-type probabilities. These words have never caught on. And it is much easier for most of us to remember plain English words rather than fancy Greek and Latin ones.

Frequency-type probability statements state how the world is. They state, for example, a physical property about a coin and tossing device, or the production practices of Acme and Bolt. Belief-type probability statements express a person’s confidence in a belief, or state the credibility of a conjecture or proposition in the light of evidence.

The takeaway from the chapter is that any statement containing the word “probability” can carry one of two meanings, belief-type or frequency-type. It is important to understand the exact type of probability that is being talked about in any statement.

Theories about Probability

The chapter describes four theories of probability:

Belief type - Personal Probability
Belief type - Logical Probability - Interpersonal /Evidential probability
Frequency type – Limiting frequency based
Frequency type – Propensity based

Probability as Measure of Belief

Personal Probabilities

This chapter explains how degrees of belief can be represented as betting rates or odds. Suppose my friend and I bet on an event A, say, “India wins the next cricket world cup”. If I think that India is 3 times more likely to win than to lose, then to translate this belief into a bet, I would invite my friend to take part in a bet where the total stake is Rs 4000. My friend bets Rs 1000 AGAINST the event and I take the other side of the bet by offering Rs 3000. Why does this bet match my beliefs? My expected payoff is (1000*3/4)+(-3000*1/4) = 0. My friend’s expected payoff is (-1000*3/4)+(3000*1/4) = 0. Hence from my point of view it is a fair bet. The same bet can be read as my bet ON the event: I bet Rs 3000 on the event and my friend is on the other side with Rs 1000. It is again fair by my beliefs, as my expected value is (1000*3/4)+(-3000*1/4) = 0 and my friend’s expected value is (-1000*3/4)+(3000*1/4) = 0. By agreeing to place a bet on or against the event, my friend and I are quantifying MY degree of belief as betting fractions, i.e. my bet/total stake = 3/4 and my friend’s bet/total stake = 1/4.

It is important to note that this might not be a fair bet according to my FRIEND’s belief system. He might think that the event “India wins the next cricket world cup” has a 50/50 chance. In that case, if my friend’s belief is right, he has an edge betting against the event (staking Rs 1000 to win Rs 3000) and is at a disadvantage betting for the event at the same rate (staking Rs 3000 to win Rs 1000). Why? In the former case, his expected payoff would be (-1000*1/2)+(3000*1/2) > 0 and in the latter case, it would be (1000*1/2)+(-3000*1/2) < 0. As you can see, a bet in place means that the bet matches the belief system of at least one of the two players. Generalizing this to a market where investors buy and sell securities and there is a market maker, you get the picture that placing bets on securities is an act of quantifying the implicit belief system of the investors. A bookmaker or market maker never quotes fair bets; he always adds a component that keeps him safe, i.e., he doesn’t go bankrupt. The first example I ever came across in the context of pricing financial derivatives was in the book by Baxter and Rennie. Their introductory comments describing arbitrage pricing and expectation pricing set the tone for a beautiful adventure of reading the book.
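The arithmetic of the cricket bet can be checked with a few lines of Python. This is only a sketch of the calculation described above: the helper name `expected_payoff` is mine, and exact fractions are used so the zeros come out exactly.

```python
from fractions import Fraction

def expected_payoff(p_event, payoff_if_event, payoff_if_not):
    """Expected payoff of a bet, computed under a given belief p_event."""
    return p_event * payoff_if_event + (1 - p_event) * payoff_if_not

my_belief = Fraction(3, 4)      # I think India is 3 times more likely to win than to lose
friend_belief = Fraction(1, 2)  # my friend thinks it is 50/50

# I stake 3000 on the event; my friend stakes 1000 against it.
me_on_event = expected_payoff(my_belief, 1000, -3000)
friend_against_by_my_belief = expected_payoff(my_belief, -1000, 3000)

# The same stakes, judged by my friend's own belief.
friend_against = expected_payoff(friend_belief, -1000, 3000)
friend_on_event = expected_payoff(friend_belief, 1000, -3000)

# My betting rate is my stake divided by the total stake.
my_betting_rate = Fraction(3000, 3000 + 1000)

print(me_on_event, friend_against_by_my_belief, friend_against, friend_on_event, my_betting_rate)
```

The bet is fair (expected payoff zero) only under the belief that produced the stakes. Under the 50/50 belief the side staking Rs 1000 expects to gain Rs 1000 and the side staking Rs 3000 expects to lose Rs 1000.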

The takeaways of this chapter are: 1) belief cannot be measured exactly, 2) you can think of artificial randomizers to calibrate degrees of belief.

Coherence

This chapter explains that betting rates ought to satisfy the basic rules of probability. There are three steps to the argument:

Personal degrees of belief can be represented by betting rates.
Personal betting rates should be coherent.
A set of betting rates is coherent if and only if it satisfies the basic rules of probability.

Via examples, the chapter shows that any inconsistency in the odds a person quotes for and against an event opens that person to arbitrage, i.e. a set of bets that loses money whatever happens. Hence the betting fractions or the odds should satisfy the basic rules of probability.
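A tiny example shows how incoherent betting rates lead to a sure loss. The sketch below is my own and assumes the usual convention that a bet at rate r with total stake S costs r*S and pays S if it wins. The rates 3/5 and 3/5 are hypothetical; they add up to more than 1, which breaks additivity.

```python
from fractions import Fraction

def payoffs_betting_on_both(rate_a, rate_not_a, stake=100):
    """Net result, in each state, for someone who bets on A and on not-A at their own rates."""
    # A bet at rate r costs r * stake; it wins (1 - r) * stake or loses r * stake.
    if_a = (1 - rate_a) * stake - rate_not_a * stake          # bet on A wins, bet on not-A loses
    if_not_a = -rate_a * stake + (1 - rate_not_a) * stake     # bet on A loses, bet on not-A wins
    return {"A happens": if_a, "A does not happen": if_not_a}

incoherent = payoffs_betting_on_both(Fraction(3, 5), Fraction(3, 5))
coherent = payoffs_betting_on_both(Fraction(3, 4), Fraction(1, 4))
print(incoherent, coherent)
```

With the incoherent rates the bettor is down 20 whichever way the event turns out, while the coherent pair breaks even in both states. If the two rates summed to less than 1, an opponent could take the other side of both bets and collect a sure gain the same way.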

The first systematic theory of personal probability was presented in 1926 by F. P. Ramsey, in a talk he gave to a philosophy club in Cambridge, England. He mentioned that if your betting rates don’t satisfy the basic rules of probability, then you are open to a sure-loss contract. But he had a much more profound— and difficult— argument that personal degrees of belief should satisfy the probability rules. In 1930, another young man, the Italian mathematician Bruno de Finetti, independently pioneered the theory of personal probability. He invented the word “coherence,” and did make considerable use of the sure-loss argument.

Learning from Experience

This chapter talks about the application of Bayes’ rule. It is basically a way to combine a personal probability with evidence to arrive at an updated personal probability. The theory of personal probability was independently invented by Frank Ramsey and Bruno de Finetti. But the credit for the idea, and the very name “personal probability”, goes to the American statistician L. J. Savage (1917–1971). He clarified the idea of personal probability and combined it with Bayes’ Rule. The chapter also talks about the contributions of various statisticians and scientists such as Richard Jeffrey, Harold Jeffreys, Rudolf Carnap, L. J. Savage, and I. J. Good.

Probability as Frequency

The four chapters under this section explore frequentist ideas. The section starts off by describing some deductive connections between probability rules and our intuitions about stable frequencies. Subsequently, a core idea of frequency-type inductive inference, the significance idea, is presented. The last chapter in the section presents a second core idea of frequency-type inductive inference, the confidence idea. This idea explains the way opinion polls are now reported. It also explains how we can think of the use of statistics as inductive behavior. Basically, all the chapters give a crash course on classical statistics without too much math.
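The confidence idea can be seen in a simulation of opinion polls. The sketch below is my own, not the book's. It assumes a hypothetical population in which 40% answer yes, polls of 1000 people, and the common normal-approximation interval (estimate plus or minus 1.96 standard errors). It then counts how often the reported interval contains the true value.

```python
import math
import random

def coverage(true_p=0.4, n=1000, n_polls=2000, z=1.96, seed=1):
    """Share of simulated polls whose approximate 95% interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_polls):
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        # Normal-approximation margin of error for a sample proportion.
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / n_polls

share_covering = coverage()
print(round(share_covering, 3))
```

The share should land near 95%. The claim is about the method over many uses, not about any single poll, which is the sense in which statistics can be seen as inductive behavior.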

Probability applied to Philosophy

The book introduces David Hume’s idea that there is no justification for inductive inferences. Karl Popper, another philosopher, agreed with Hume but held that this does not matter, because we do not need inductive inferences at all. According to Popper, “The only good reasoning is deductively valid reasoning. And that is all we need in order to get around in the world or do science”. There are two chapters that talk about evading Hume’s problem, one via the Bayesian evasion (which argues that Bayes’ Rule shows us the rational way to learn from experience) and the other via the behavior evasion (which argues that although there is no justification for any individual inductive inference, there is still a justification for inductive behavior).

The Bayesian’s response to Hume is:

Hume, you’re right. Given a set of premises, supposed to be all the reasons bearing on a conclusion, you can form any opinion you like. But you’re not addressing the issue that concerns us! At any point in our grown-up lives (let’s leave babies out of this), we have a lot of opinions and various degrees of belief about our opinions. The question is not whether these opinions are “rational.” The question is whether we are reasonable in modifying these opinions in the light of new experience, new evidence. That is where the theory of personal probability comes in. On pain of incoherence, we should always have a belief structure that satisfies the probability axioms. That means that there is a uniquely reasonable way to learn from experience— using Bayes’ Rule.

The Bayesian evades Hume’s problem by saying that Hume is right. But, continues the Bayesian, all we need is a model of reasonable change in belief. That is sufficient for us to be rational agents in a changing world.

The frequentist response to Hume is:

We do our work in two steps: 1) Actively interfering in the course of nature, using a randomized experimental design. 2) Using a method of inference which is right most of the time, say, 95% of the time. The frequentist says: “Hume, you are right. I do not have reasons for believing any one conclusion. But I have a reason for using my method of inference, namely that it is right most of the time.”

The chapter ends with a single-case objection and discusses the arguments used by Charles Sanders Peirce. In essence, the chapters in this section point to Peirce’s conclusion:

An argument form is deductively valid if the conclusion of an argument of such a form is always true when the premises are true.
An argument form is inductively good if the conclusion of an argument of such a form is usually true when the premises are true.
An argument form is inductively 95% good if the conclusion of an argument of such a form is true in 95% of the cases where the premises are true.

Takeaway:

The field of probability was not discovered; rather, it was created by the confusion of two concepts. The first is the frequency with which certain events recur, and the second is the degree of belief to attach to a proposition. If you want to understand these two schools of thought from a logician’s perspective and get a grasp of the various philosophical takes on the word “probability”, then this book is a suitable text, as it gives a thorough exposition without too much math.