"Let me give you an example in which probabilities make all the difference. It echoes the public debate that erupted in Europe when the smallpox vaccine was first introduced. Unexpectedly, data showed that more people died from smallpox inoculations than from smallpox itself. Naturally, some people used this information to argue that inoculation should be banned, when in fact it was saving lives by eradicating smallpox."
We implement this example as a Causal Bayesian network. "Causal" means that the arc directions represent causal relationships between the variables.
In this network, the green node #Dead is a Function Node that calculates the number of children who died within a population of 1 million.
We created a WebSimulator that allows you to experiment with this model and try out different scenarios: https://simulator.bayesialab.com/#!simulator/685025884871
"I can empathize with the parents who might march to the health department with signs saying, 'Vaccines kill!' And the data seem to be on their side; the vaccinations indeed cause more deaths than smallpox itself. But is logic on their side? Should we ban vaccination or take into account the deaths prevented?" (Pearl, p. 44)
We attempt to answer this counterfactual question in BayesiaLab.
To do so, we need to set Vaccinated=False as Hard Evidence, thus simulating a counterfactual world in which no children are vaccinated.
The Bayesian network infers that not vaccinating would cost the lives of 4,000 children, as shown in the green Function Node.
To replicate the same step in the WebSimulator you need to move the slider Vaccinated=False to 100.
You can download the XBL file and open it with any version of BayesiaLab: BoW_VaccineSmallpox.xbl
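If you prefer to check the arithmetic outside of BayesiaLab, the following minimal Python sketch reproduces the counterfactual count in the green Function Node. The vaccination and fatality rates below are our own assumptions, based on our reading of Pearl's narrative; substitute the values from the XBL file if they differ.

```python
# Back-of-the-envelope version of the #Dead Function Node for a population of 1 million.
# The rates below are assumptions; replace them with the values from BoW_VaccineSmallpox.xbl.
POPULATION = 1_000_000

P_REACTION = 0.01         # P(vaccine reaction | vaccinated), assumed
P_DEATH_REACTION = 0.01   # P(death | vaccine reaction), assumed
P_SMALLPOX = 0.02         # P(smallpox | not vaccinated), assumed
P_DEATH_SMALLPOX = 0.20   # P(death | smallpox), assumed

def expected_deaths(vaccination_rate: float) -> float:
    """Expected number of deaths for a given vaccination rate."""
    vaccinated = POPULATION * vaccination_rate
    unvaccinated = POPULATION - vaccinated
    deaths_from_vaccine = vaccinated * P_REACTION * P_DEATH_REACTION
    deaths_from_smallpox = unvaccinated * P_SMALLPOX * P_DEATH_SMALLPOX
    return deaths_from_vaccine + deaths_from_smallpox

print(expected_deaths(0.99))  # factual world: vaccine deaths outnumber smallpox deaths
print(expected_deaths(0.0))   # counterfactual world, Vaccinated=False: 4,000 deaths
```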
"To see how Bayes’s method works, let’s start with a simple example about customers in a teahouse, for whom we have data documenting their preferences. Data, as we know from Chapter 1, are totally oblivious to cause-effect asymmetries and hence should offer us a way to resolve the inverse-probability puzzle."
Upon completing the data import, the two variables, Tea and Scones, are represented as nodes.
Now we manually add an arc from Tea to Scones to represent a relationship between the nodes.
Then, we let BayesiaLab estimate the probabilities of this relationship using Maximum Likelihood Estimation: Main Menu > Learning > Parameter Estimation.
Note that the arc between Tea and Scones does not have any causal meaning here. It merely represents the association between Tea and Scones.
As a result, we could invert the arc without changing the representation of this non-causal example.
You can experiment with this model in BayesiaLab or via this WebSimulator page: https://simulator.bayesialab.com/#!simulator/160655093718
The following screen capture from the WebSimulator illustrates that the proportion of customers who ordered both tea and scones is indeed 1/3, i.e., the Joint Probability equals 1/3, as shown in the Output Panel on the right.
"... This innocent-looking equation came to be known as “Bayes’s rule.” If we look carefully at what it says, we find that it offers a general solution to the inverse-probability problem." (Pearl, p. 101)
To answer the inverse question, i.e., the probability that a customer ordered tea given that he ordered scones, we need to perform probabilistic inference with the WebSimulator by setting Scones to Yes.
Then, the WebSimulator automatically infers the probability of Tea=Yes, which is now 80%.
"We can also look at Bayes’s rule as a way to update our belief in a particular hypothesis. This is extremely important to understand because a large part of human belief about future events rests on the frequency with which they or similar events have occurred in the past. [...]
As we saw, Bayes’s rule is formally an elementary consequence of his definition of conditional probability. But epistemologically, it is far from elementary. It acts, in fact, as a normative rule for updating beliefs in response to evidence. In other words, we should view Bayes’s rule not just as a convenient definition of the new concept of “conditional probability” but as an empirical claim to faithfully represent the English expression “given that I know.” (Pearl, pp. 101-102)
To reason about this domain, we first import a small CSV file, which represents Table 3.1 from the book, into BayesiaLab. Tea-Scones.csv
The resulting Bayesian network is available in XBL format here: Teahouse.xbl
"... let P(T) denote the probability that a customer orders tea and P(S) denote the probability he orders scones. If we already know a customer has ordered tea, then P(S | T) denotes the probability that he orders scones. (Remember that the vertical line stands for “given that.”) Likewise, P(T | S) denotes the probability that he orders tea, given that we already know he ordered scones ...
The Joint Probability of 41.67% corresponds to the Marginal Likelihood P(Scones=Yes), i.e., the prior probability of a customer ordering a scone.
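For readers who want to verify these numbers without BayesiaLab, here is a minimal Python sketch of Bayes's rule for the teahouse example. The values of P(T) and P(S | T) are our reading of Table 3.1 and should be treated as assumptions; they are consistent with the joint probability of 1/3 and the 80% posterior quoted above.

```python
# Bayes's rule for the teahouse example: P(T | S) = P(S | T) * P(T) / P(S).
# P_T, P_S_GIVEN_T, and P_S_GIVEN_NOT_T are assumptions consistent with the text.
P_T = 2 / 3              # P(Tea = Yes)
P_S_GIVEN_T = 1 / 2      # P(Scones = Yes | Tea = Yes)
P_S_GIVEN_NOT_T = 1 / 4  # P(Scones = Yes | Tea = No)

p_joint = P_S_GIVEN_T * P_T                            # P(Tea, Scones)      = 1/3
p_s = P_S_GIVEN_T * P_T + P_S_GIVEN_NOT_T * (1 - P_T)  # marginal likelihood = 41.67%
p_t_given_s = P_S_GIVEN_T * P_T / p_s                  # posterior           = 80%

print(f"P(Tea and Scones) = {p_joint:.4f}")
print(f"P(Scones)         = {p_s:.4f}")
print(f"P(Tea | Scones)   = {p_t_given_s:.4f}")
```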
"Subjectivity (Ed, i.e., the prior) is sometimes seen as a deficiency of Bayesian inference. Others regard it as a powerful advantage; it permits us to express our personal experience mathematically and combine it with data in a principled and transparent way. Bayes’s rule informs our reasoning in cases where ordinary intuition fails us or where emotion might lead us astray. We will demonstrate this power in a situation familiar to all of us.
Suppose you take a medical test to see if you have a disease, and it comes back positive. How likely is it that you have the disease? For specificity, let’s say the disease is breast cancer, and the test is a mammogram."
We implement this example as a causal Bayesian network, which means the arc between Breast Cancer and Mammogram represents a causal relationship.
You can also experiment with this model via our WebSimulator: https://simulator.bayesialab.com/#!simulator/186824514911
"Suppose a forty-year-old woman gets a mammogram to check for breast cancer, and it comes back positive. The hypothesis, D (for “disease”), is that she has cancer. The evidence, T (for “test”), is the result of the mammogram. How strongly should she believe the hypothesis? Should she have surgery?" (Pearl, p. 105)
We use the probabilities described by Pearl to set the parameters of the Causal Bayesian Network:
For a typical forty-year-old woman, the probability of getting breast cancer in the next year is about one in seven hundred, 0.14%. We use that as our prior;
The sensitivity (true-positive) of a mammogram is 73%;
The specificity (true-negative) of a mammogram is 88%.
Notice the Input component Breast Cancer—Your Prior Estimate in the WebSimulator. This allows you to set your own initial belief that a patient has breast cancer.
Upon setting Mammogram=Positive as Hard Evidence, the probability of Breast Cancer=True increases from 0.14% to 0.86%.
"The conclusion is startling. I think that most forty-year-old women who have a positive mammogram would be astounded to learn that they still have less than a 1 percent chance of having breast cancer. Figure 3.3 might make the reason easier to understand: the tiny number of true positives (i.e., women with breast cancer) is overwhelmed by the number of false positives."(Pearl, p. 106)
"However, the story would be very different if our patient had a gene that put her at high risk for breast cancer—say, a one-in-twenty chance within the next year. [...]
For a woman in this situation, the chances that the test provides lifesaving information are much higher. That is why the task force continued recommending annual mammograms for high-risk women.
This example shows that P(disease | test) is not the same for everyone; it is context-dependent (Ed: it depends on the prior). If you know that you are at high risk for a disease to begin with, Bayes’s rule allows you to factor that information in. Or if you know that you are immune, you need not even bother with the test!" (Pearl, pp. 107–108)
To answer this question with BayesiaLab, you can either modify the model by setting the prior of Breast Cancer to 5% via the Node Editor, or you can set a Probabilistic Evidence via the Monitor.
In the WebSimulator, you would set the Input Breast Cancer—Your Prior Estimate (initial belief) to 5%.
Upon setting Mammogram=Positive, the probability of Breast Cancer=True increases to 24.25%.
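Both posteriors follow directly from Bayes's rule, using the prior, sensitivity, and specificity listed above. Here is a minimal Python sketch of that arithmetic; it is an illustration only, not a substitute for the network in the XBL file.

```python
def posterior_cancer(prior: float, sensitivity: float = 0.73, specificity: float = 0.88) -> float:
    """P(Breast Cancer = True | Mammogram = Positive) via Bayes's rule."""
    p_pos_given_cancer = sensitivity       # true-positive rate
    p_pos_given_healthy = 1 - specificity  # false-positive rate
    p_positive = p_pos_given_cancer * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_cancer * prior / p_positive

print(f"{posterior_cancer(1 / 700):.2%}")  # typical forty-year-old: ~0.86%
print(f"{posterior_cancer(0.05):.2%}")     # high-risk patient, prior of 5%: ~24.25%
```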
To illustrate the impact of the prior (or prevalence), we added a parent node to Breast Cancer that defines this prior. This is what we call a "hyperparameter."
You can now set Mammogram=Positive as Hard Evidence.
With this evidence set, you can use Target Mean Analysis to explore a range of values for the prior, from 0% to 100%: Main Menu > Analysis > Visual > Target > Target's Posterior > Curves > Total Effects.
You will obtain a plot in which the x-axis represents the prior of Breast Cancer=True, i.e., the hyperparameter.
The y-axis represents the updated probability of Breast Cancer=True given a positive mammogram result.
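Outside of BayesiaLab, you can approximate this curve by sweeping the prior and applying Bayes's rule at each point, as in the following sketch:

```python
# Sweep the prior P(Breast Cancer = True) from 0% to 100% and report the posterior
# given Mammogram = Positive, mimicking the curve produced by Target Mean Analysis.
SENSITIVITY, SPECIFICITY = 0.73, 0.88

for prior_pct in range(0, 101, 10):
    prior = prior_pct / 100
    p_positive = SENSITIVITY * prior + (1 - SPECIFICITY) * (1 - prior)
    posterior = SENSITIVITY * prior / p_positive if p_positive > 0 else 0.0
    print(f"prior = {prior_pct:3d}%  ->  P(Breast Cancer | Positive) = {posterior:.2%}")
```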
The resulting Bayesian network in XBL format is available here: BoW_BreastCancer.xbl
The updated network, including the hyperparameter, is available here: BoW_BreastCancer_Prevalence.xbl
"This one introduces a new kind of bias, called 'M-bias' (named for the shape of the graph). [...]
M-bias puts a finger on what is wrong with the traditional approach. It is incorrect to call a variable, like B, a confounder merely because it is associated with both X and Y. To reiterate, X and Y are unconfounded if we do not control for B. B only becomes a confounder when you control for it!" (Pearl, pp. 161–162)
The structure of this example seems simple and can be easily analyzed in BayesiaLab:
Given that B is a collider, there is no open path and, thus, there is no effect of X on Y at all.
As a result, nothing needs to be blocked.
However, as Pearl explains, if one were to apply the traditional test for a confounder, namely that B is associated with both X and Y, one might (incorrectly) conclude that B should be controlled for as a confounder.
Let's try this scenario in BayesiaLab and see what happens.
By controlling for B, we inadvertently open up a noncausal path between X and Y, i.e., we are introducing a bias.
The Influence Path Analysis highlights the M-shape after which this bias is named.
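To make the M-bias concrete, here is a small Monte Carlo sketch with an arbitrary linear data-generating process (the parameters are our own, in the same spirit as the arbitrary probabilities used elsewhere in these examples): X and Y share no common cause and are marginally uncorrelated, yet they become correlated once we condition on the collider B.

```python
import random

random.seed(0)

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# M-structure: X <- A -> B <- C -> Y, with no causal link between X and Y.
X, B, Y = [], [], []
for _ in range(200_000):
    a, c = random.gauss(0, 1), random.gauss(0, 1)  # A and C, the two root causes
    X.append(a + random.gauss(0, 1))               # X <- A
    B.append(a + c + random.gauss(0, 1))           # A -> B <- C  (collider)
    Y.append(c + random.gauss(0, 1))               # Y <- C

print("corr(X, Y), no conditioning:", round(corr(X, Y), 3))  # ~0: no open path

# "Controlling for B": restrict the sample to a narrow stratum of B.
idx = [i for i, b in enumerate(B) if abs(b) < 0.2]
print("corr(X, Y), given B near 0: ",
      round(corr([X[i] for i in idx], [Y[i] for i in idx]), 3))  # clearly negative: bias
```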
Judea Pearl concludes Chapter 4 in The Book of Why with a model from a paper by Andrew Forbes and Elizabeth Williamson on the effect of smoking (X) on adult asthma (Y). It is the final example illustrating the Back-Door Criterion. (Pearl, p. 164)
Here is the causal Bayesian network for this problem domain:
You can use Main Menu > Analysis > Visual > Graph > Influence Paths to Target to find all paths from Smoking to Asthma.
As it turns out, there are 14 noncausal paths and one causal path!
Our task is to block all 14 noncausal paths and keep the one causal path open. If we can't do that, we won't be able to estimate the causal effect of Smoking on Asthma.
In this example, the variable Predisposition toward Asthma provides an extra challenge. It is a non-observable (or hidden) variable. Hence, you cannot adjust for it, which means you cannot use it to block any of the 14 noncausal paths.
In the end, you have to adjust for five variables (highlighted in green in the screen capture) to block all noncausal paths and estimate the causal effect of Smoking on Asthma.
After controlling for these variables, only one causal path remains, representing the relationship of interest, i.e., the effect of Smoking on Asthma.
"In the mid-1960s, Jacob Yerushalmy pointed out that a mother’s smoking during pregnancy seemed to benefit the health of her newborn baby, if the baby happened to be born underweight."
Pearl, Judea. The Book of Why: The New Science of Cause and Effect (p. 183). Basic Books. Kindle Edition.
We implemented this counterintuitive example as a causal Bayesian network, which means the arcs represent causal relationships.
Since the problem's description in the book is purely qualitative, and no data is available, we associated arbitrary probability distributions with the nodes. Although arbitrary, we specified the probabilities so that the network produces the paradoxical behavior described by Pearl.
The birth-weight paradox can be highlighted with two observations:
Babies of smokers have a lower birth weight than babies of non-smokers.
Low-birth-weight babies of smoking mothers have a higher survival rate compared to those of non-smokers.
"Smoking may be harmful in that it contributes to low birth weight, but certain other causes of low birth weight, such as serious or life-threatening genetic abnormalities, are much more harmful. There are two possible explanations for low birth weight in one particular baby: it might have a smoking mother, or it might be affected by one of those other causes." (Pearl, pp. 184–185)
In other words, Low-Birth-Weight is a collider in the structure Smoking Mother → Low-Birth-Weight ← Birth Defect.
In BayesiaLab, we can illustrate what happens by highlighting all information paths:
Set Mortality of Child as Target Node.
Set evidence on Low-Birth-Weight.
Select Smoking Mother. Then, run Main Menu > Analysis > Visual > Graph > Influence Paths to Target.
Now, all influence paths are visible.
If we observe Smoking Mother=True, this explains away Low-Birth-Weight=True and reduces the probability of Birth Defect=True;
On the other hand, if we observe Smoking Mother=False, the probability of Birth Defect=True increases, and the probability of Mortality of Child=True increases, too.
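The following minimal Python sketch illustrates this explaining-away effect with made-up probabilities (just as the probabilities in our network are arbitrary). Because birth defects are the far more dangerous cause of low birth weight, the higher defect probability among low-birth-weight babies of non-smoking mothers translates into higher mortality.

```python
# Explaining-away at the collider Low-Birth-Weight, with made-up probabilities.
P_DEFECT = 0.01  # prior P(Birth Defect = True), arbitrary

# P(Low-Birth-Weight = True | Smoking Mother, Birth Defect), arbitrary values
P_LBW = {
    (True, True): 0.90,    # smoker, defect
    (True, False): 0.40,   # smoker, no defect
    (False, True): 0.90,   # non-smoker, defect
    (False, False): 0.05,  # non-smoker, no defect
}

def p_defect_given_lbw(smoker: bool) -> float:
    """P(Birth Defect = True | Low-Birth-Weight = True, Smoking Mother = smoker)."""
    numerator = P_DEFECT * P_LBW[(smoker, True)]
    denominator = numerator + (1 - P_DEFECT) * P_LBW[(smoker, False)]
    return numerator / denominator

print(f"P(Defect | LBW, Smoker=True)  = {p_defect_given_lbw(True):.3f}")   # low: smoking explains the LBW
print(f"P(Defect | LBW, Smoker=False) = {p_defect_given_lbw(False):.3f}")  # much higher
```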
You can download this Bayesian network in XBL format and open it with any version of BayesiaLab:
Alternatively, you can experiment with this model using our WebSimulator:
By observing Low-Birth-Weight, we open a noncausal ("back-door") path between Smoking Mother and Mortality of Child, which gives rise to the paradox. Please see our discussion of the Back-Door Criterion for more details on noncausal paths.
Alternatively, we can use the WebSimulator to replicate these two scenarios:
"Suppose that a prisoner is about to be executed by a firing squad. A certain chain of events must occur for this to happen. First, the court orders the execution. The order goes to a captain, who signals the soldiers on the firing squad (A and B) to fire. We’ll assume that they are obedient and expert marksmen, so they only fire on command, and if either one of them shoots, the prisoner dies."
We implemented Pearl's Firing Squad problem as a Causal Bayesian Network in BayesiaLab.
"Causal" means that the arc directions represent causal relationships between the nodes.
You can download this XBL file and open it with any version of BayesiaLab: BoW_Firing Squad.xbl
Alternatively, you can experiment with the Firing Squad model on this WebSimulator page: https://simulator.bayesialab.com/#!simulator/211001563973
"Using this diagram, we can start answering causal questions from different rungs of the ladder. First, we can answer questions of association (i.e., what one fact tells us about another). If the prisoner is dead, does that mean the court order was given?" (Pearl, p. 40)
To answer this question in BayesiaLab, you set a Hard Positive Evidence on Death=True (double-click on the state True) to indicate that you learned that the prisoner had been executed (first rung of the ladder).
In the WebSimulator, you move the slider Death=True to 100%. The Observed Box is automatically checked upon releasing the mouse button, and the evidence is propagated in the network to update the probability distributions of the other variables.
Once we know that the prisoner is dead, we infer that both soldiers fired and that the court order was given.
"Suppose we find out that A fired. What does that tell us about B? By following the arrows, the computer concludes that B must have fired too. (A would not have fired if the captain hadn’t signaled, so B must have fired as well.) This is true even though A does not cause B (there is no arrow from A to B)." (Pearl, p. 40)
This is another observational query, the "first rung of the ladder."
In BayesiaLab, you set a Hard Positive Evidence on Soldier A=True (double-click on the state True) to indicate that you found out that Soldier A fired.
In the WebSimulator, you move the slider Soldier A=True to 100%.
Given that we know that Soldier A fired, we infer that the court order was given and that Soldier B also fired.
"Going up the Ladder of Causation, we can ask questions about intervention. What if Soldier A decides on his own initiative to fire, without waiting for the captain’s command? Will the prisoner be dead or alive?" (Pearl, p. 40)
You can answer this causal question in BayesiaLab by setting Soldier A to Intervention Mode (Monitor Context Menu > Intervention) and then setting Soldier A=True.
This triggers the "mutilation of the graph" (or "graph surgery"), which blocks the associational path between Soldier A and Death that goes via Captain.
Alternatively, instead of setting Soldier A to Intervention Mode, you can hold constant the probability distribution of Captain to block the path: Monitor Context Menu > Fix Probabilities.
Then, you set Hard Evidence on Soldier A=True.
In the WebSimulator, you can simulate this intervention by first controlling for Court Order, i.e., checking the box Observed, and then setting Soldier A=True to 100%.
If Soldier A decides to fire on his own initiative, this implies the death of the prisoner without affecting our belief regarding the other variables.
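For readers who want to see the numbers, the following Python sketch enumerates a deterministic version of the firing-squad model. The 50% prior on the court order is our own assumption; the mechanisms follow the quotes above.

```python
# Deterministic firing-squad model: CO = court order, C = captain signals,
# A/B = soldiers fire, D = prisoner dies. P(CO) = 0.5 is our own assumption.
P_CO = 0.5

def mechanism(co, do_a=None):
    c = co
    a = c if do_a is None else do_a   # an intervention overrides A's usual mechanism
    b = c
    d = a or b
    return {"CO": co, "C": c, "A": a, "B": b, "D": d}

# Rung 1 (association): observe D = True and infer the court order.
joint = [(P_CO, mechanism(True)), (1 - P_CO, mechanism(False))]
p_death = sum(p for p, w in joint if w["D"])
p_order_given_death = sum(p for p, w in joint if w["D"] and w["CO"]) / p_death
print("P(Court Order = True | Death = True)  =", p_order_given_death)                     # 1.0

# Rung 2 (intervention): do(Soldier A = True), i.e., A fires on his own initiative.
joint_do = [(P_CO, mechanism(True, do_a=True)), (1 - P_CO, mechanism(False, do_a=True))]
print("P(Death = True | do(A = True))        =", sum(p for p, w in joint_do if w["D"]))   # 1.0
print("P(Court Order = True | do(A = True))  =", sum(p for p, w in joint_do if w["CO"]))  # unchanged, 0.5
```

Note how the intervention determines the prisoner's fate without changing our belief about the court order, whereas the purely observational query in the first print does update that belief.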
We are starting this new section to share and explain Bayesian networks inspired by Judea Pearl's The Book of Why.
For each example, we provide an excerpt from Pearl's book to introduce the problem domain and then offer a solution in the form of a Bayesian network.
We share all networks in BayesiaLab's XBL format and publish them as interactive WebSimulators so you can experiment with the models without installing BayesiaLab.
As the originator of an entire school of thought on causality, Judea Pearl is certainly at liberty to take a more light-hearted and playful approach in presenting this serious topic. He titled Chapter 4 of The Book of Why "Confounding and Deconfounding: Or, Slaying the Lurking Variable." In fact, Pearl presents the task of "deconfounding" for causal effect estimation as a series of "games," which we now wish to illustrate with Bayesian networks.
We begin with a selection of quotes from the beginning of Chapter 4 to provide motivation for the forthcoming examples.
"To understand the back-door criterion, it helps first to have an intuitive sense of how information flows in a causal diagram. I like to think of the links as pipes that convey information from a starting point X to a finish Y. Keep in mind that the conveying of information goes in both directions, causal and noncausal, as we saw in Chapter 3.
In fact, the noncausal paths are precisely the source of confounding."
"To deconfound two variables X and Y, we need only to block every noncausal path between them without blocking or perturbing any causal paths."
"With these rules, deconfounding becomes so simple and fun that you can treat it like a game."
For each of the proposed games in Chapter 4, we prepare a corresponding Bayesian network in BayesiaLab. These networks allow you to experiment with the "pipes that convey information" as if they were set up in a laboratory, where you can look inside the tubes and measure the flows in pipes:
"So far I have emphasized only one aspect of Bayesian networks—namely, the diagram and its arrows that preferably point from cause to effect. Indeed, the diagram is like the engine of the Bayesian network. But like any engine, a Bayesian network runs on fuel. The fuel is called a conditional probability table [...]
Let’s look at a concrete example, suggested by Stefan Conrady and Lionel Jouffe of BayesiaLab, Inc. It’s a scenario familiar to all travelers: we can call it “Where Is My Bag?” Suppose you’ve just landed in Zanzibar after making a tight connection in Aachen, and you’re waiting for your suitcase to appear on the carousel. Other passengers have started to get their bags, but you keep waiting… and waiting… and waiting. What are the chances that your suitcase did not actually make the connection from Aachen to Zanzibar? The answer depends, of course, on how long you have been waiting. If the bags have just started to show up on the carousel, perhaps you should be patient and wait a little bit longer. If you’ve been waiting a long time, then things are looking bad."
"This table, though large, should be easy to understand. The first eleven rows say that if your bag didn’t make it onto the plane (bag on plane = false) then, no matter how much time has elapsed, it won’t be on the carousel (carousel = false). That is, P(carousel = false | bag on plane = false) is 100 percent. That is the meaning of the 100s in the first eleven rows. The other eleven rows say that the bags are unloaded from the plane at a steady rate. If your bag is indeed on the plane, there is a 10 percent probability it will be unloaded in the first minute, a 10 percent probability in the second minute, and so forth. For example, after 5 minutes there is a 50 percent probability it has been unloaded, so we see a 50 for P(carousel = true | bag on plane = true, time = 5). After ten minutes, all the bags have been unloaded, so P(carousel = true | bag on plane = true, time = 10) is 100 percent. Thus we see a 100 in the last entry of the table." (Pearl, p. 119)
"The most interesting thing to do with this Bayesian network, as with most Bayesian networks, is to solve the inverse-probability problem: if x minutes have passed and I still haven’t gotten my bag, what is the probability that it was on the plane? Bayes’s rule automates this computation and reveals an interesting pattern. After one minute, there is still a 47 percent chance that it was on the plane. (Remember that our prior assumption was a 50 percent probability.) After five minutes, the probability drops to 33 percent. After ten minutes, of course, it drops to zero." (Pearl, p. 119)
In BayesiaLab, you can automatically generate and plot the "Curve of Abandoning Hope."
First, you need to define Bag on Plane as your Target Node.
Then set Bag on Carousel=False as Hard Evidence.
Finally, select Main Menu > Analysis > Visual > Target > Target's Posterior > Histogram.
The x-axis represents Elapsed Time, the y-axis the posterior probability of Bag on Plane=True given Bag on Carousel=False and Elapsed Time.
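Outside of BayesiaLab, the same "Curve of Abandoning Hope" can be traced with a few lines of Python, using the 50% prior and the steady 10%-per-minute unloading rate described in the quotes above.

```python
# P(Bag on Plane = True | Bag on Carousel = False, Elapsed Time = t), via Bayes's rule.
PRIOR_ON_PLANE = 0.5

for t in range(0, 11):
    p_not_on_carousel_if_on_plane = 1 - t / 10  # bags are unloaded at 10% per minute
    p_not_on_carousel_if_missing = 1.0          # a missing bag never appears
    numerator = PRIOR_ON_PLANE * p_not_on_carousel_if_on_plane
    denominator = numerator + (1 - PRIOR_ON_PLANE) * p_not_on_carousel_if_missing
    print(f"t = {t:2d} min  ->  P(Bag on Plane | not on carousel) = {numerator / denominator:.2%}")
```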
The causal Bayesian network for Game 1 is available for download here:
In the spirit of games, we took advantage of BayesiaLab's ability to embellish nodes and added "start" and "finish" icons for the variables X and Y, respectively.
In each game, you need to determine the set of variables to adjust for (if any) in order to estimate the causal effect of X (Start) on Y (Finish) without bias.
As with all networks presented here, we will reason purely based on the causal structure and do not need to consider parameters or numerical values.
For demonstration purposes, the Bayesian networks you can download do contain states and numerical values. However, we chose them arbitrarily, and you should feel free to replace them with any other values of your choice. As long as you maintain the causal structure, the content of the nodes, e.g., numerical or categorical, does not matter at all.
BayesiaLab offers the Influence Paths to Target function, which highlights causal and noncausal paths in a network.
This feature analyzes the paths from the selected node X to the Target Node Y.
To start the function, select Main Menu > Analysis > Visual > Graph > Influence Paths to Target.
This analysis highlights causal paths in blue and noncausal paths in pink.
Here, no pink paths appear, which means that no noncausal paths exist from X to Y.
As a result, no noncausal paths need to be blocked, and, therefore, we do not need to control for any variables to estimate the causal effect of X on Y. The association between X and Y corresponds to the causal effect.
Causal effect estimation is the topic of a dedicated chapter in our book. In that context, we discuss the central role of confounders and non-confounders in identifying and estimating causal effects. Much of what we explain in that chapter is a practical illustration of Judea Pearl's teaching on causality.
You can find the original and complete description of this example in The Book of Why.
We encoded the problem domain as a Bayesian network in XBL format, which you can download here:
The Conditional Probability Table shown below is associated with the node Bag on Carousel and encodes our assumptions regarding the delivery of bags at the airport.
You can experiment with this model in our WebSimulator and see how the probabilities evolve as a function of time.
"In this example you should think of A, B, C, and D as “pretreatment” variables. (The treatment, as usual, is X.) Now there is one back-door path X←A→ B←D→E→Y. This path is already blocked by the collider at B, so we don’t need to control for anything." (Pearl, p. 160)
For Game 2, we have once again created a causal Bayesian network, which is available for download here:
Note that the associated probability tables are fictitious. For our purposes, only the causal graph is relevant.
As before, we select Main Menu > Analysis > Visual > Graph > Influence Paths to Target to analyze the paths from X to Y.
We can see that there is no noncausal path.
Hence, there is no need to control for any variables.
"In Games 1 and 2 you didn’t have to do anything, but this time you do. There is one back-door path from X to Y, X←B→Y, which can only be blocked by controlling for B. If B is unobservable, then there is no way of estimating the effect of X on Y without running a randomized controlled experiment. Some (in fact, most) statisticians in this situation would control for A, as a proxy for the unobservable variable B, but this only partially eliminates the confounding bias and introduces a new collider bias." (Pearl, p. 160)
As with the earlier games, we encode Game 3 as a causal Bayesian network graph:
Again, the probabilities are fictitious and irrelevant.
We select Main Menu > Analysis > Visual > Graph > Influence Paths to Target to analyze the paths from X to Y.
Given the presence of a noncausal path (highlighted in pink), it becomes clear that we need to control for B to block that path.
Here, "fixing the probabilities" of B is a practical way of controlling for that variable. Note that the states and the values of the variable are irrelevant.
Now, after controlling for B, only one causal path remains, highlighted in blue, which allows us to estimate the effect of X on Y.
However, if B were unobservable ("not observable" or "hidden" in BayesiaLab terminology), some statisticians would perhaps propose to control for A as a proxy of B.
Let's try that scenario as well. We are now fixing A while leaving B "open."
The Influence Path Analysis reveals that controlling for proxy A does not achieve our objective.
Not only does controlling for A fail to block the noncausal path X←B→Y, it also introduces an additional noncausal path X→A←B→Y, i.e., another bias that prevents us from estimating the effect of X on Y.
This phenomenon is known as "collider bias," as it is produced by conditioning on a collider, such as A.
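You can also check Pearl's point numerically. The linear simulation below is a sketch with coefficients we made up for the occasion: adjusting for B recovers the assumed true effect of X on Y, whereas adjusting for the proxy A removes only part of the confounding bias and adds collider bias.

```python
import random

random.seed(1)
N = 200_000
TRUE_EFFECT = 1.0  # assumed causal coefficient of X on Y

B, X, A, Y = [], [], [], []
for _ in range(N):
    b = random.gauss(0, 1)                        # confounder (possibly unobservable)
    x = b + random.gauss(0, 1)                    # B -> X
    a = 0.5 * x + 0.5 * b + random.gauss(0, 1)    # X -> A <- B  (A is a proxy and a collider)
    y = TRUE_EFFECT * x + b + random.gauss(0, 1)  # X -> Y <- B
    B.append(b); X.append(x); A.append(a); Y.append(y)

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

def effect_of_x_adjusting_for(z):
    """Coefficient of X in the least-squares regression of Y on X and Z."""
    det = cov(X, X) * cov(z, z) - cov(X, z) ** 2
    return (cov(Y, X) * cov(z, z) - cov(Y, z) * cov(X, z)) / det

print("no adjustment:   ", round(cov(Y, X) / cov(X, X), 2))          # ~1.5, confounded
print("adjusting for B: ", round(effect_of_x_adjusting_for(B), 2))   # ~1.0, the true effect
print("adjusting for A: ", round(effect_of_x_adjusting_for(A), 2))   # ~1.33, still biased
```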
“Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, ‘Do you want to pick door #2?’ Is it to your advantage to switch your choice of doors?”
Pearl, Judea. The Book of Why: The New Science of Cause and Effect (p. 190). Basic Books. Kindle Edition.
The problem that Judea Pearl describes is based on the popular game show, Let's Make a Deal. The show, launched in 1963, was hosted for nearly 30 years by Monty Hall. Given its counterintuitive (and controversial) solution, the stated problem has been debated extensively in academia and popular science and became widely known as the "Monty Hall Problem."
We implemented the Monty Hall Problem as a Causal Bayesian Network in which arcs represent causal relationships.
The game can be described with three nodes, each of which has the three states Door #1, Door #2, and Door #3:
Your Door, which represents the initial choice that you make as the contestant in this game. We assume a uniform prior distribution, i.e., each door has the same probability of being picked by you.
Location of Car, as the name implies, refers to the door behind which the car is hidden. We also assume a uniform prior distribution, i.e., you do not have any knowledge as to where the prize might be located.
Door Opened is the door that Monty Hall, the game host, opens. He chooses the door according to the following two rules:
He won't open the door that you just selected;
He also knows where the car is, so he won't open the door behind which the car is located.
We represent these two rules in the following Conditional Probability Table:
We can interpret the first row of this Conditional Probability Table as "If you choose Door #1 and the car is behind Door #1, then Monty Hall will open either Door #2 or Door #3."
The second row reads: "If you choose Door #1 and the car is behind Door #2, then Monty Hall can only open Door #3."
Analogously for the third row, "If you choose Door #1 and the car is behind Door #3, then Monty Hall can only open Door #2."
Alternatively, you can experiment with different game scenarios via our WebSimulator: https://simulator.bayesialab.com/#!simulator/137248128314
If you experiment with this network in BayesiaLab or the WebSimulator, you will quickly discover that the optimal policy is always to change your door choice.
Let's go through this step by step and try it out in BayesiaLab or the WebSimulator:
You choose Door #1 and set evidence accordingly on the corresponding node: Your Door=Door #1
As per the game rules, Monty Hall cannot open Door #1, which you just picked.
As a result, Monty Hall could only open Door #2 or Door #3.
However, behind one of the two doors is the car, and Monty Hall knows where the car is.
As per the game rules, he won't reveal the car and, therefore, must open a door that presents a goat.
We simulate that Monty Hall opens Door #2 and set the evidence Door Opened=Door #2.
With Door #2 having revealed a goat, the car can only hide behind Door #1 or Door #3.
Given these pieces of evidence set so far, the Bayesian network updates the distribution of the node Location of Car:
Door #1 remains at 1/3.
Door #3 increases from 1/3 to 2/3.
This means that the grand prize, the car, is twice as likely to be behind Door #3 compared to Door #1.
As a result, you should indeed revise your original door selection and pick the other closed door instead.
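If you would like to verify the 1/3 versus 2/3 split outside of BayesiaLab, the following short Python sketch enumerates the joint distribution implied by the Conditional Probability Table above:

```python
DOORS = ("#1", "#2", "#3")

def p_opened(opened, your_door, car):
    """Monty's rule: never open the player's door, never open the door hiding the car."""
    allowed = [d for d in DOORS if d != your_door and d != car]
    return 1 / len(allowed) if opened in allowed else 0.0

# P(Location of Car | Your Door = #1, Door Opened = #2), by enumeration over the car's location.
your_door, opened = "#1", "#2"
joint = {car: (1 / 3) * p_opened(opened, your_door, car) for car in DOORS}
total = sum(joint.values())
for car, p in joint.items():
    print(f"P(Car behind {car} | chose {your_door}, Monty opened {opened}) = {p / total:.4f}")
```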
The optimal decision policy could be different if we also considered the psychological cost of regret and the expected utility from the prizes.
"If I should switch no matter what door I originally chose, then it means that the producers somehow read my mind. How else could they position the car so that it is more likely to be behind the door I did not choose?
The key element in resolving this paradox is that we need to take into account not only the data (i.e., the fact that the host opened a particular door) but also the data-generating process—in other words, the rules of the game." (Pearl, pp. 191–192)
The Causal Bayesian Network we encoded above does indeed represent the data-generating process of this domain: Monty Hall decides which door to open based on two criteria, (1) the choice of the contestant and (2) the location of the car.
This particular arrangement of two causes and their common effect is called a V-structure. In such a V-structure, we call the common effect a Collider. In our example, the node Door Opened is such a Collider.
V-structures have important characteristics: parent nodes, which are initially independent, become dependent if we set any pieces of evidence on their common child node, i.e., the Collider or any of the Collider's descendants. In other words, in a V-structure, parent nodes are marginally independent but conditionally dependent given evidence on their descendants.
Please see Structures Within a DAG in Chapter 10 of our book to learn more about the important characteristics of different network structures.
Now we know that conditioning on Door Opened allows for information to flow from Your Door to Location of Car, as visualized by the green arrow below:
"It is a bizarre dependence for sure, one of a type that most of us are unaccustomed to. It is a dependence that has no cause." (Pearl, p. 194)
It is precisely the difficulty in understanding this conditional dependency that has made this game so intriguing.
Most casual observers, however, would attempt to reason about this problem the way we illustrate in the following noncausal Bayesian network.
In this context, the following Conditional Probability Tables would apply:
For the node Door Opened: Monty Hall cannot open the door the player chose:
For Location of Car: It cannot be behind the door that Monty Hall opened:
Now that we have formally encoded such a (mis)understanding of the domain, we can simulate the game again:
We pick Door #1, then Monty Hall opens Door #2.
With the given (incorrect) network, we would infer that Door #1 and Door #3 have equal probabilities of containing the car. As a result, there would be no reason to reconsider our initial choice.
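A sketch of the same kind of enumeration, now applying the incorrect rules described above, shows where the 50/50 intuition comes from: in this model, the car is simply assumed to sit with equal probability behind either unopened door.

```python
DOORS = ("#1", "#2", "#3")

def p_car_naive(car, opened):
    """Incorrect rule: the car is 'somewhere other than the opened door', with equal odds."""
    remaining = [d for d in DOORS if d != opened]
    return 1 / len(remaining) if car in remaining else 0.0

your_door, opened = "#1", "#2"  # Monty only avoids the player's door in this (wrong) model
for car in DOORS:
    print(f"P(Car behind {car} | chose {your_door}, Monty opened {opened}) = {p_car_naive(car, opened):.4f}")
```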
The disagreement between the normative choice we explained earlier (What's the Optimal Decision?) and the "common-sense" solution presented just now (An Incorrect "Common-Sense" Approach) has fueled fierce debates and puzzled great minds for decades. With all that has been written about this paradox over the years (and we have used the Monty Hall Problem extensively as an example in our training sessions), we should let the puzzle's namesake illuminate us. The matter was settled once and for all in 1991 with an experiment at the dining table of Monty Hall's residence in Beverly Hills. New York Times journalist John Tierney shares Monty Hall's perspective on the controversy in his article, Behind Monty Hall's Doors: Puzzle, Debate and Answer?
You can download this Bayesian network in XBL format here: BoW_MontyHall.xbl
"Game 5 is just Game 4 with a little extra wrinkle. Now a second back-door path X←B←C→Y needs to be closed. If we close this path by controlling for B, then we open up the M-shaped path X←A→B←C→Y. To close that path, we must control for A or C as well. However, notice that we could just control for C alone; that would close the path X←B←C→Y and not affect the other path." (Pearl, p. 162)
Here we have the causal Bayesian network corresponding to Game 5:
Select Main Menu > Analysis > Visual > Graph > Influence Paths to Target and see that there is a noncausal path that needs to be blocked.
You can block this path by fixing the probability distribution of variable C.
You can check if this proposed approach is correct by setting your evidence — Fix Probabilities on C — and then running the Influence Paths Analysis again.
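As with Game 3, Pearl's claim can be checked numerically. The linear simulation below is a sketch with made-up coefficients for Game 5's structure: adjusting for C alone recovers the assumed true effect, whereas adjusting for B alone does not, because it opens the M-shaped path.

```python
import random

random.seed(2)
N = 200_000
TRUE_EFFECT = 1.0  # assumed causal coefficient of X on Y

A, B, C, X, Y = [], [], [], [], []
for _ in range(N):
    a, c = random.gauss(0, 1), random.gauss(0, 1)
    b = a + c + random.gauss(0, 1)                # A -> B <- C
    x = a + b + random.gauss(0, 1)                # A -> X <- B
    y = TRUE_EFFECT * x + c + random.gauss(0, 1)  # X -> Y <- C
    A.append(a); B.append(b); C.append(c); X.append(x); Y.append(y)

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

def effect_of_x_adjusting_for(z):
    """Coefficient of X in the least-squares regression of Y on X and Z."""
    det = cov(X, X) * cov(z, z) - cov(X, z) ** 2
    return (cov(Y, X) * cov(z, z) - cov(Y, z) * cov(X, z)) / det

print("no adjustment:   ", round(cov(Y, X) / cov(X, X), 2))          # biased (open back-door paths)
print("adjusting for B: ", round(effect_of_x_adjusting_for(B), 2))   # still biased (opens the M-path)
print("adjusting for C: ", round(effect_of_x_adjusting_for(C), 2))   # ~1.0, the true effect
```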