it's very hard to derive a causal graph (i.e., a model of causal structure) from observational data alone; this task is called causal discovery. Consider that ML finds
usually we build such a graph modelling causal relationships using reasonable assumptions/heuristics.
then we can start to answer 'what if' questions about varying causal factors, i.e. estimate the size of the causal effect in these 'what if' scenarios from observational data. this is called causal inference.
assuming the markov conditions hold lets us transform a graph into an expression for the joint probability distribution of all the graph variables, e.g.
local markov conditions and global markov conditions are different only in terms of language. they are both just saying how markov conditions would be applied to the graph.
a hidden driving factor can cause two of its targets to be correlated.
a driver's dependence on a third factor can cause that third factor to be correlated with the driver's targets.
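the two correlation effects above can be checked with a small simulation. this is only a sketch under assumed settings: the linear-Gaussian mechanisms, the unit noise scales, and the names `hidden`, `third`, `driver`, `target` are all made up for illustration and are not taken from these notes.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Case 1: a hidden factor drives both a and b; a and b never influence each other.
hidden = [rng.gauss(0, 1) for _ in range(n)]
a = [h + rng.gauss(0, 1) for h in hidden]
b = [h + rng.gauss(0, 1) for h in hidden]
corr_ab = pearson(a, b)

# Case 2: third -> driver -> target; third has no direct arrow into target.
third = [rng.gauss(0, 1) for _ in range(n)]
driver = [t + rng.gauss(0, 1) for t in third]
target = [d + rng.gauss(0, 1) for d in driver]
corr_third_target = pearson(third, target)

print(f"corr(a, b)          = {corr_ab:.2f}")
print(f"corr(third, target) = {corr_third_target:.2f}")
```

under these assumed mechanisms the theoretical correlations are 0.5 and about 0.58, so both pairs look strongly related even though neither has a direct causal arrow between them.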
we model causal relationships using a DAG
![[DAG example.png|250]]
Markov condition: if the causal relationships drawn in the causal graph are true, then the following statistical independence is known to hold: every variable $X_i$, when conditioned on its own parents, is independent of any non-descendant $Y$. $$ (X_i \perp\!\!\!\perp Y \mid \text{parents}(X_i)) $$
when we assume the markov condition is true, we can model the joint probability distribution of all variables as the product of each variable's conditional probability given its parents, $P(X_1, \dots, X_n) = \prod_i P(X_i \mid \text{parents}(X_i))$.
in the above example we have
the above formulation of the markov condition, for a variable
for three disjoint sets of variables $\mathbf{X}, \mathbf{Y}, \mathbf{S}$, the set $\mathbf{X}$ is d-separated from $\mathbf{Y}$, conditional on $\mathbf{S}$, iff all paths between any member of $\mathbf{X}$ and any member of $\mathbf{Y}$ are blocked by $\mathbf{S}$.
path: in the global condition, 'path' means any connection between two nodes, regardless of arrow direction in the causal graph.
a path $q$ is blocked by $\mathbf{S}$ if
- $q$ contains a chain $i \to m \to j$ or a fork $i \leftarrow m \to j$ where $m$ is in $\mathbf{S}$, or
- $q$ contains a collider $i \to m \leftarrow j$ such that $m$ is not in $\mathbf{S}$, and no descendant of $m$ is in $\mathbf{S}$
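the blocking rules above translate almost line for line into code. the sketch below enumerates every path in the undirected skeleton and applies the chain/fork and collider rules; it is a naive illustration for small graphs, not an efficient algorithm. the edge list `EDGES` is an assumed sprinkler-style graph (the actual example images are not reproduced here), and all function names are hypothetical.

```python
# Assumed toy DAG as (parent, child) pairs; not taken from the notes' figures.
EDGES = [
    ("Season", "Sprinkler"), ("Season", "Rain"),
    ("Sprinkler", "Wet"), ("Rain", "Wet"), ("Wet", "Slippery"),
]


def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(edges, start, goal):
    """Every simple path from start to goal, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == goal:
            yield path
            return
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([start])


def is_blocked(path, edges, s):
    """Apply the two blocking rules to each consecutive triple i, m, j."""
    edge_set = set(edges)
    for i, m, j in zip(path, path[1:], path[2:]):
        is_collider = (i, m) in edge_set and (j, m) in edge_set
        if is_collider:
            # a collider blocks unless it, or one of its descendants, is in s
            if m not in s and not (descendants(edges, m) & s):
                return True
        elif m in s:
            # a chain or fork blocks when its middle node is in s
            return True
    return False


def d_separated(edges, xs, ys, s):
    """True iff every path between xs and ys is blocked by s."""
    return all(
        is_blocked(path, edges, s)
        for x in xs for y in ys
        for path in simple_paths(edges, x, y)
    )


print(d_separated(EDGES, {"Sprinkler"}, {"Rain"}, {"Season"}))              # True
print(d_separated(EDGES, {"Sprinkler"}, {"Rain"}, {"Season", "Slippery"}))  # False
```

in the second call, conditioning on `Slippery` (a descendant of the collider `Wet`) opens the path through `Wet`, which is exactly the collider rule in action.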
note for the path to be blocked,
We use this notation to say
consider that the independences read off a graph (using the markov conditions) may not actually hold statistically in the data.
we denote the statistical independence observed in a probability distribution
when there are no additional independent conditional probability terms in the true probability distribution, compared to the graph, i.e.
the assumption that the true distribution contains no independences beyond those implied by the graph (the converse of the markov condition) is called the causal faithfulness assumption.
when we add observations to the simple DAG causal graph, we have a causal bayesian net.
e.g. ![[bayesian network example.png|500]]
some natural 'what if' questions arise as follows.
- conditioning
  e.g. what is the chance the pavement will be slippery if we find the sprinkler off?
  $$P(\text{Slippery} \mid \text{Sprinkler} = \text{OFF})$$
- intervention
  e.g. what is the chance the pavement will be slippery if we TURN the sprinkler off?
  $$P(\text{Slippery} \mid do(\text{Sprinkler} = \text{OFF}))$$
- counterfactual reasoning
  e.g. what is the chance that the pavement would be slippery if the sprinkler had not been on, given the sprinkler IS on and the pavement IS NOT slippery?
  $$P(\text{Slippery}_{\text{Sprinkler}=\text{OFF}} \mid \text{Sprinkler} = \text{ON}, \text{Slippery} = \text{False})$$
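the difference between conditioning and intervention can be made concrete by brute-force enumeration over a small causal bayesian net. everything numeric below is an assumption for illustration: the graph (dry season → sprinkler, dry season → rain, sprinkler/rain → wet → slippery) and all conditional probabilities are invented, not read from the example image. the intervention is modelled in the usual way, by dropping the sprinkler's own conditional term and fixing its value.

```python
from itertools import product

NAMES = ("dry", "sprinkler", "rain", "wet", "slippery")


def bern(p_true, value):
    """Probability that a binary variable takes `value` when P(True) = p_true."""
    return p_true if value else 1 - p_true


def joint(dry, sprinkler, rain, wet, slippery, do_sprinkler=None):
    """Joint probability as a product of per-node conditionals (assumed numbers)."""
    p = bern(0.5, dry)
    if do_sprinkler is None:
        p *= bern(0.8 if dry else 0.1, sprinkler)  # observational mechanism
    elif sprinkler != do_sprinkler:
        return 0.0  # under do(), the sprinkler is forced to one value
    p *= bern(0.1 if dry else 0.7, rain)
    if sprinkler and rain:
        p_wet = 0.99
    elif sprinkler or rain:
        p_wet = 0.9
    else:
        p_wet = 0.0
    p *= bern(p_wet, wet)
    p *= bern(0.8 if wet else 0.0, slippery)
    return p


def prob(event, given=lambda world: True, do_sprinkler=None):
    """P(event | given), optionally under do(sprinkler = do_sprinkler)."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        p = joint(*values, do_sprinkler=do_sprinkler)
        if given(world):
            denominator += p
            if event(world):
                numerator += p
    return numerator / denominator


# finding the sprinkler off is evidence about the season, and hence about rain
p_conditioning = prob(lambda w: w["slippery"], given=lambda w: not w["sprinkler"])
# turning the sprinkler off tells us nothing about the season
p_intervention = prob(lambda w: w["slippery"], do_sprinkler=False)

print(round(p_conditioning, 3))  # 0.425
print(round(p_intervention, 3))  # 0.288
```

with these assumed numbers, seeing the sprinkler off makes a rainy (non-dry) season more likely, so the conditional probability of a slippery pavement is higher than the interventional one.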
how do we answer these 'what if' questions, from observational data? i.e. how can we estimate these probabilities from observation?
ideally we use randomised controlled experiments.
For example we want to find the effects of treatment
how might we use our heuristic construction of a causal graph to aid our estimation of causal effects? we can use the back-door adjustment or the front-door adjustment.
A set of variables
- no node in $\mathbf Z$ is a descendant of $X$
- $\mathbf Z$ blocks every non-causal path from $X$ to $Y$.
  - to construct such a non-causal path, flip arrows targeting $X$, then ignore arrow directions and see if you can reach $Y$.
note 'ordered pair' means
example: ![[backdoor example.png|200]]
if
now we consider a fixed value of
conditioning on all possible value combinations in the back-door
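a numerical sketch of the back-door adjustment, using the standard formula $P(y \mid do(x)) = \sum_z P(y \mid x, z) P(z)$. the data-generating process is an assumption made up for this example: a single binary confounder `z` that drives both the treatment `x` and the outcome `y`, with a true effect of `x` on `y` of 0.3. the helper names are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed observational data: z -> x, z -> y, x -> y.
rows = []
for _ in range(200_000):
    z = rng.random() < 0.5
    x = rng.random() < (0.8 if z else 0.2)
    y = rng.random() < (0.1 + 0.3 * x + 0.5 * z)  # true effect of x is 0.3
    rows.append((z, x, y))


def mean_y(x, z=None):
    """Empirical P(y = 1 | x), or P(y = 1 | x, z) when z is given."""
    selected = [yy for zz, xx, yy in rows if xx == x and (z is None or zz == z)]
    return sum(selected) / len(selected)


def p_z(z):
    """Empirical marginal P(z)."""
    return sum(1 for zz, _, _ in rows if zz == z) / len(rows)


def backdoor(x):
    """Back-door estimate of P(y = 1 | do(x)): average over strata of z."""
    return sum(mean_y(x, z) * p_z(z) for z in (True, False))


naive_effect = mean_y(True) - mean_y(False)         # confounded comparison
adjusted_effect = backdoor(True) - backdoor(False)  # back-door adjusted

print(f"naive difference:    {naive_effect:.3f}")
print(f"adjusted difference: {adjusted_effect:.3f}")
```

under this assumed process the naive difference lands near 0.6 while the adjusted one lands near the true 0.3, because `z` satisfies both back-door conditions: it is not a descendant of `x`, and it blocks the only non-causal path `x ← z → y`.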
A set of variables
- $\mathbf Z$ intercepts all directed paths from $X$ to $Y$.
- there is no unblocked back-door path from $X$ to $\mathbf{Z}$.
- all back-door paths from $\mathbf{Z}$ to $Y$ are blocked by $X$.
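a sketch of the front-door adjustment, using the standard formula $P(y \mid do(x)) = \sum_z P(z \mid x) \sum_{x'} P(y \mid x', z) P(x')$. the model is an assumption for illustration: a hidden confounder `u` drives `x` and `y`, and `x` affects `y` only through the mediator `z`. the probabilities are invented, and everything is computed exactly from tables rather than from samples, so the estimate can be compared with the ground truth directly.

```python
from itertools import product

B = (0, 1)


def bern(p_one, value):
    """Probability that a binary variable equals `value` when P(1) = p_one."""
    return p_one if value == 1 else 1 - p_one


def full_joint(u, x, z, y):
    """Assumed full model: u -> x, u -> y, x -> z, z -> y (u is hidden)."""
    return (
        bern(0.5, u)
        * bern(0.8 if u else 0.2, x)
        * bern(0.9 if x else 0.1, z)
        * bern(0.1 + 0.4 * z + 0.4 * u, y)
    )


# What an observer sees: the joint over (x, z, y) with u marginalised out.
observed = {
    (x, z, y): sum(full_joint(u, x, z, y) for u in B)
    for x, z, y in product(B, repeat=3)
}


def p(**fixed):
    """Marginal probability of the fixed values, from the observed joint only."""
    total = 0.0
    for (x, z, y), pr in observed.items():
        values = {"x": x, "z": z, "y": y}
        if all(values[name] == v for name, v in fixed.items()):
            total += pr
    return total


def front_door(x, y=1):
    """Front-door estimate of P(y | do(x)) using observed quantities only."""
    total = 0.0
    for z in B:
        p_z_given_x = p(x=x, z=z) / p(x=x)
        inner = sum(p(x=x2, z=z, y=y) / p(x=x2, z=z) * p(x=x2) for x2 in B)
        total += p_z_given_x * inner
    return total


def truth(x, y=1):
    """Ground-truth P(y | do(x)), computed with access to the hidden u."""
    return sum(
        bern(0.5, u) * bern(0.9 if x else 0.1, z) * bern(0.1 + 0.4 * z + 0.4 * u, y)
        for u in B for z in B
    )


naive = p(x=1, y=1) / p(x=1)  # plain conditioning, biased by the hidden u
print(round(naive, 3), round(front_door(1), 3), round(truth(1), 3))
```

the front-door estimate matches the ground truth without ever looking at `u`, while plain conditioning does not; this only works because `z` meets all three conditions in the assumed graph.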
now we consider a fixed value of
if
is it possible to estimate causal effects from observational data?
- if we have constructed a causal graph from reasonable assumptions, then yes, we can estimate causal effects i.e. answer our 'what if' questions.
however, can we find a causal structure, i.e. our causal graph, from observational data alone ('causal discovery')? i.e. from empirical construction of