The causal Bayesian network for Game 1 is available for download here:
In the spirit of games, we took advantage of BayesiaLab's ability to embellish nodes and added "start" and "finish" icons for the variables X and Y, respectively.
In each game, you need to determine the set of variables (if any) you must adjust for to estimate the causal effect of X (Start) on Y (Finish) without bias.
As with all networks presented here, we will reason purely based on the causal structure and do not need to consider parameters or numerical values.
For demonstration purposes, the Bayesian networks you can download do contain states and numerical values. However, we chose them arbitrarily, and you should feel free to replace them with any values of your choice. As long as you maintain the causal structure, the type of content in the nodes, e.g., numerical or categorical, does not matter at all.
BayesiaLab offers the Influence Paths to Target function, which highlights causal and noncausal paths in a network.
This feature analyzes the paths from the selected node X to the Target Node Y.
To start the function, select Main Menu > Analysis > Visual > Graph > Influence Paths to Target.
This analysis highlights causal paths in blue and noncausal paths in pink.
However, no pink paths appear, which means that no noncausal paths exist from X to Y.
As a result, no noncausal paths need to be blocked, and, therefore, we do not need to control for any variables to estimate the causal effect of X on Y. The association between X and Y corresponds to the causal effect.
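As a sketch of what this analysis does under the hood, the short Python snippet below enumerates every path between X and Y in a toy graph and classifies each one the way the blue/pink highlighting does: a path is causal only if every arrow along it points from X toward Y. The graph used here is a made-up example for illustration, not Game 1's actual structure.

```python
# Toy graph (hypothetical, NOT Game 1's published structure):
# X -> Y, X -> M -> Y, A -> X
edges = [("X", "Y"), ("X", "M"), ("M", "Y"), ("A", "X")]

def all_paths(edges, x, y):
    """All simple paths between x and y in the undirected skeleton.
    Each step is (node, direction, next): '>' follows an arrow,
    '<' traverses one against its direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append((b, ">"))
        nbrs.setdefault(b, []).append((a, "<"))
    def walk(n, seen, acc):
        if n == y:
            yield acc
        else:
            for m, d in nbrs.get(n, []):
                if m not in seen:
                    yield from walk(m, seen | {m}, acc + [(n, d, m)])
    return list(walk(x, {x}, []))

def is_causal(path):
    """A path is causal iff every edge is traversed in its arrow direction."""
    return all(d == ">" for _, d, _ in path)

kinds = ["causal" if is_causal(p) else "noncausal"
         for p in all_paths(edges, "X", "Y")]
print(kinds)  # both paths (X -> Y and X -> M -> Y) are causal
```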
As the originator of an entire school of thought on causality, Judea Pearl is certainly at liberty to take a more light-hearted and playful approach to presenting this serious topic. He titled Chapter 4 of The Book of Why "Confounding and Deconfounding: Or, Slaying the Lurking Variable." In fact, Pearl presents the task of "deconfounding" for causal effect estimation as a series of "games," which we now illustrate with Bayesian networks.
We begin with a selection of quotes from the beginning of Chapter 4 to provide motivation for the forthcoming examples.
"To understand the back-door criterion, it helps first to have an intuitive sense of how information flows in a causal diagram. I like to think of the links as pipes that convey information from a starting point X to a finish Y. Keep in mind that the conveying of information goes in both directions, causal and noncausal, as we saw in Chapter 3.
In fact, the noncausal paths are precisely the source of confounding."
"To deconfound two variables X and Y, we need only to block every noncausal path between them without blocking or perturbing any causal paths."
"With these rules, deconfounding becomes so simple and fun that you can treat it like a game."
For each of the proposed games in Chapter 4, we prepare a corresponding Bayesian network in BayesiaLab. These networks allow you to experiment with the "pipes that convey information" as if they were set up in a laboratory, where you can look inside the tubes and measure the flows:
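Pearl's blocking rules can themselves be sketched in a few lines of code. The helper below is a generic illustration (not BayesiaLab's implementation): a chain or fork is blocked when its middle node is conditioned on, while a collider is blocked unless it, or one of its descendants, is conditioned on. A toy graph with one fork (F) and one collider (C) between X and Y exercises both rules.

```python
def all_paths(edges, x, y):
    """All simple paths between x and y in the undirected skeleton.
    Each step is (node, direction, next): '>' follows an arrow,
    '<' traverses one against its direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append((b, ">"))
        nbrs.setdefault(b, []).append((a, "<"))
    def walk(n, seen, acc):
        if n == y:
            yield acc
        else:
            for m, d in nbrs.get(n, []):
                if m not in seen:
                    yield from walk(m, seen | {m}, acc + [(n, d, m)])
    return list(walk(x, {x}, []))

def desc(edges, n):
    """All descendants of n (needed for the collider rule)."""
    out, todo = set(), [n]
    while todo:
        k = todo.pop()
        for a, b in edges:
            if a == k and b not in out:
                out.add(b)
                todo.append(b)
    return out

def is_blocked(path, Z, edges):
    """Pearl's rules: a chain or fork is blocked when its middle node is
    in Z; a collider is blocked unless it or a descendant is in Z."""
    for (_, d1, mid), (_, d2, _) in zip(path, path[1:]):
        if d1 == ">" and d2 == "<":                     # -> mid <- : collider
            if not (({mid} | desc(edges, mid)) & Z):
                return True
        elif mid in Z:                                  # chain or fork
            return True
    return False

# Toy graph (hypothetical): one fork F and one collider C between X and Y
edges = [("F", "X"), ("F", "Y"), ("X", "C"), ("Y", "C")]

def open_paths(Z):
    return [p for p in all_paths(edges, "X", "Y") if not is_blocked(p, Z, edges)]

n_open_empty = len(open_paths(set()))   # 1: the fork X<-F->Y is open
n_open_fork = len(open_paths({"F"}))    # 0: conditioning on F closes the fork
n_open_coll = len(open_paths({"C"}))    # 2: the fork is open again, and
                                        #    conditioning on C unblocks the collider
```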
Causal effect estimation is the topic of . In this context, we discuss the central role of confounders and non-confounders in identifying and estimating causal effects. Much of what we explain in that chapter is a practical illustration of Judea Pearl's teaching on causality.
"In this example you should think of A, B, C, and D as “pretreatment” variables. (The treatment, as usual, is X.) Now there is one back-door path X←A→ B←D→E→Y. This path is already blocked by the collider at B, so we don’t need to control for anything." (Pearl, p. 160)
For Game 2, we have once again created a causal Bayesian network, which is available for download here:
Note that the associated probability tables are fictitious. For our purposes, only the causal graph is relevant.
As before, we select Main Menu > Analysis > Visual > Graph > Influence Paths to Target to analyze the paths from X to Y.
We can see that there is no noncausal path.
Hence, there is no need to control for any variables.
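For readers who want to verify this programmatically, here is a sketch. The edge list follows the back-door path Pearl quotes (X←A→B←D→E→Y); since the causal side of Game 2's diagram is not reproduced in the quote, a direct edge X→Y serves as an assumed stand-in for it.

```python
def all_paths(edges, x, y):
    """All simple paths between x and y in the undirected skeleton.
    Each step is (node, direction, next): '>' follows an arrow,
    '<' traverses one against its direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append((b, ">"))
        nbrs.setdefault(b, []).append((a, "<"))
    def walk(n, seen, acc):
        if n == y:
            yield acc
        else:
            for m, d in nbrs.get(n, []):
                if m not in seen:
                    yield from walk(m, seen | {m}, acc + [(n, d, m)])
    return list(walk(x, {x}, []))

def desc(edges, n):
    """All descendants of n (needed for the collider rule)."""
    out, todo = set(), [n]
    while todo:
        k = todo.pop()
        for a, b in edges:
            if a == k and b not in out:
                out.add(b)
                todo.append(b)
    return out

def is_blocked(path, Z, edges):
    """Pearl's rules: a chain or fork is blocked when its middle node is
    in Z; a collider is blocked unless it or a descendant is in Z."""
    for (_, d1, mid), (_, d2, _) in zip(path, path[1:]):
        if d1 == ">" and d2 == "<":                     # -> mid <- : collider
            if not (({mid} | desc(edges, mid)) & Z):
                return True
        elif mid in Z:                                  # chain or fork
            return True
    return False

def is_causal(path):
    return all(d == ">" for _, d, _ in path)

# Game 2's quoted back-door path: X <- A -> B <- D -> E -> Y.
# The causal side of the graph is not quoted; X -> Y is our stand-in assumption.
edges = [("A", "X"), ("A", "B"), ("D", "B"), ("D", "E"), ("E", "Y"), ("X", "Y")]

paths = all_paths(edges, "X", "Y")
noncausal = [p for p in paths if not is_causal(p)]
open_noncausal = [p for p in noncausal if not is_blocked(p, set(), edges)]
# len(noncausal) == 1 (the back-door path), but len(open_noncausal) == 0:
# the collider at B already blocks it, so nothing needs to be controlled for.
```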
"In Games 1 and 2 you didn’t have to do anything, but this time you do. There is one back-door path from X to Y, X←B→Y, which can only be blocked by controlling for B. If B is unobservable, then there is no way of estimating the effect of X on Y without running a randomized controlled experiment. Some (in fact, most) statisticians in this situation would control for A, as a proxy for the unobservable variable B, but this only partially eliminates the confounding bias and introduces a new collider bias." (Pearl, p. 160)
As with the earlier games, we encode Game 3 as a causal Bayesian network graph:
Again, the probabilities are fictitious and irrelevant.
We select Main Menu > Analysis > Visual > Graph > Influence Paths to Target to analyze the paths from X to Y.
Given the presence of a noncausal path (highlighted in pink), it becomes clear that we need to control for B to block that path.
Here, "fixing the probabilities" of B is a practical way of controlling for that variable. Note that the states and the values of the variable are irrelevant.
Now, after controlling for B, only one causal path remains, highlighted in blue, which allows us to estimate the effect of X on Y.
However, if B were unobservable ("not observable" or "hidden" in BayesiaLab terminology), some statisticians would perhaps propose to control for A as a proxy of B.
Let's try that scenario as well. We are now fixing A while leaving B "open."
The Influence Path Analysis reveals that controlling for proxy A does not achieve our objective.
Not only does controlling for A fail to block the noncausal path X←B→Y, it also introduces an additional noncausal path X→A←B→Y, i.e., another bias that prevents us from estimating the effect of X on Y.
This phenomenon is known as "collider bias," as it is produced by conditioning on a collider, such as A.
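Both scenarios can be replayed in code. Game 3's arrows can be read off the text above: B→X, B→Y, X→Y, plus X→A and B→A (making A a collider). As a sketch:

```python
def all_paths(edges, x, y):
    """All simple paths between x and y in the undirected skeleton.
    Each step is (node, direction, next): '>' follows an arrow,
    '<' traverses one against its direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append((b, ">"))
        nbrs.setdefault(b, []).append((a, "<"))
    def walk(n, seen, acc):
        if n == y:
            yield acc
        else:
            for m, d in nbrs.get(n, []):
                if m not in seen:
                    yield from walk(m, seen | {m}, acc + [(n, d, m)])
    return list(walk(x, {x}, []))

def desc(edges, n):
    """All descendants of n (needed for the collider rule)."""
    out, todo = set(), [n]
    while todo:
        k = todo.pop()
        for a, b in edges:
            if a == k and b not in out:
                out.add(b)
                todo.append(b)
    return out

def is_blocked(path, Z, edges):
    """Pearl's rules: a chain or fork is blocked when its middle node is
    in Z; a collider is blocked unless it or a descendant is in Z."""
    for (_, d1, mid), (_, d2, _) in zip(path, path[1:]):
        if d1 == ">" and d2 == "<":                     # -> mid <- : collider
            if not (({mid} | desc(edges, mid)) & Z):
                return True
        elif mid in Z:                                  # chain or fork
            return True
    return False

def is_causal(path):
    return all(d == ">" for _, d, _ in path)

# Game 3, as read off the text: B -> X, B -> Y, X -> Y, X -> A, B -> A
edges = [("B", "X"), ("B", "Y"), ("X", "Y"), ("X", "A"), ("B", "A")]

def open_noncausal(Z):
    return [p for p in all_paths(edges, "X", "Y")
            if not is_causal(p) and not is_blocked(p, Z, edges)]

open_with_B = open_noncausal({"B"})  # []: controlling B closes X<-B->Y
open_with_A = open_noncausal({"A"})  # 2 open paths: X<-B->Y stays open and
                                     # X->A<-B->Y is newly opened (collider bias)
```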
"This one introduces a new kind of bias, called 'M-bias' (named for the shape of the graph). [...]
M-bias puts a finger on what is wrong with the traditional approach. It is incorrect to call a variable, like B, a confounder merely because it is associated with both X and Y. To reiterate, X and Y are unconfounded if we do not control for B. B only becomes a confounder when you control for it!" (Pearl, pp. 161–162)
The structure of this example seems simple and can be easily analyzed in BayesiaLab:
Given that B is a collider, there is no open path and, thus, no effect of X on Y at all.
As a result, nothing needs to be blocked.
However, as Pearl explains, if one were to apply a traditional three-step test for a confounder, one might (incorrectly) conclude that B should be controlled for as a confounder.
Let's try this scenario in BayesiaLab and see what happens.
By controlling for B, we inadvertently open up a noncausal path between X and Y, i.e., we are introducing a bias.
The Influence Path Analysis highlights the M-shape for which this bias is named.
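The M-graph is small enough to check in a few lines. Its edges follow the quoted path X←A→B←C→Y, with no edge between X and Y:

```python
def all_paths(edges, x, y):
    """All simple paths between x and y in the undirected skeleton.
    Each step is (node, direction, next): '>' follows an arrow,
    '<' traverses one against its direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append((b, ">"))
        nbrs.setdefault(b, []).append((a, "<"))
    def walk(n, seen, acc):
        if n == y:
            yield acc
        else:
            for m, d in nbrs.get(n, []):
                if m not in seen:
                    yield from walk(m, seen | {m}, acc + [(n, d, m)])
    return list(walk(x, {x}, []))

def desc(edges, n):
    """All descendants of n (needed for the collider rule)."""
    out, todo = set(), [n]
    while todo:
        k = todo.pop()
        for a, b in edges:
            if a == k and b not in out:
                out.add(b)
                todo.append(b)
    return out

def is_blocked(path, Z, edges):
    """Pearl's rules: a chain or fork is blocked when its middle node is
    in Z; a collider is blocked unless it or a descendant is in Z."""
    for (_, d1, mid), (_, d2, _) in zip(path, path[1:]):
        if d1 == ">" and d2 == "<":                     # -> mid <- : collider
            if not (({mid} | desc(edges, mid)) & Z):
                return True
        elif mid in Z:                                  # chain or fork
            return True
    return False

# The M-graph: A -> X, A -> B, C -> B, C -> Y (no edge between X and Y)
edges = [("A", "X"), ("A", "B"), ("C", "B"), ("C", "Y")]

def open_paths(Z):
    return [p for p in all_paths(edges, "X", "Y") if not is_blocked(p, Z, edges)]

open_before = open_paths(set())  # []: the collider B blocks the only path
open_after = open_paths({"B"})   # 1 path: conditioning on B opens X<-A->B<-C->Y
```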
"Game 5 is just Game 4 with a little extra wrinkle. Now a second back-door path X←B←C→Y needs to be closed. If we close this path by controlling for B, then we open up the M-shaped path X←A→B←C→Y. To close that path, we must control for A or C as well. However, notice that we could just control for C alone; that would close the path X←B←C→Y and not affect the other path." (Pearl, p. 162)
Here we have the causal Bayesian network corresponding to Game 5:
Select Main Menu > Analysis > Visual > Graph > Influence Paths to Target and see that there is a noncausal path that needs to be blocked.
You can block this path by fixing the probability distribution of variable C.
You can check if this proposed approach is correct by setting your evidence — Fix Probabilities on C — and then running the Influence Paths Analysis again.
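As a closing sketch, the same kind of check confirms Pearl's recommendation. The edge list below follows the two quoted back-door paths; a direct edge X→Y is an assumed stand-in for Game 5's causal path, which the quote does not spell out.

```python
def all_paths(edges, x, y):
    """All simple paths between x and y in the undirected skeleton.
    Each step is (node, direction, next): '>' follows an arrow,
    '<' traverses one against its direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append((b, ">"))
        nbrs.setdefault(b, []).append((a, "<"))
    def walk(n, seen, acc):
        if n == y:
            yield acc
        else:
            for m, d in nbrs.get(n, []):
                if m not in seen:
                    yield from walk(m, seen | {m}, acc + [(n, d, m)])
    return list(walk(x, {x}, []))

def desc(edges, n):
    """All descendants of n (needed for the collider rule)."""
    out, todo = set(), [n]
    while todo:
        k = todo.pop()
        for a, b in edges:
            if a == k and b not in out:
                out.add(b)
                todo.append(b)
    return out

def is_blocked(path, Z, edges):
    """Pearl's rules: a chain or fork is blocked when its middle node is
    in Z; a collider is blocked unless it or a descendant is in Z."""
    for (_, d1, mid), (_, d2, _) in zip(path, path[1:]):
        if d1 == ">" and d2 == "<":                     # -> mid <- : collider
            if not (({mid} | desc(edges, mid)) & Z):
                return True
        elif mid in Z:                                  # chain or fork
            return True
    return False

def is_causal(path):
    return all(d == ">" for _, d, _ in path)

# Game 5's quoted back-door paths give: A -> X, A -> B, B -> X, C -> B, C -> Y.
# X -> Y is our stand-in for the (unquoted) causal path.
edges = [("A", "X"), ("A", "B"), ("B", "X"), ("C", "B"), ("C", "Y"), ("X", "Y")]

def open_noncausal(Z):
    return [p for p in all_paths(edges, "X", "Y")
            if not is_causal(p) and not is_blocked(p, Z, edges)]

open_with_C = open_noncausal({"C"})  # []: C alone closes both back-door paths
open_with_B = open_noncausal({"B"})  # 1 path: the M-shaped X<-A->B<-C->Y opens
```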