Playing with Bayesian Causal Graphs.

The main references have so far been the following:

References

References for the dowhy package:

Material which helped my introductory reading so far by :

Brief list of concepts I found useful:

  • at the center of interest is the calculation of potential outcomes / counterfactuals $\rightarrow$ what Y would have been if X had been different (evaluated after all observations have been made) $\rightarrow$ for example, comparing the outcome when a certain drug was taken with the outcome under a placebo and measuring the difference in effect
  • potential outcomes and counterfactuals can be seen as being the same thing, see Rubin et al. 2005 (really more of a useful thing to know when reading the literature; man, was I confused before knowing this)
  • potential outcomes can be calculated without a graphical model, but graphical models help by indicating which variables can legitimately be conditioned on
  • graphical models themselves require justification
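
To make the drug-versus-placebo idea concrete, here is a minimal sketch of a simulated randomized trial in which both potential outcomes are known for every unit. The function name `simulate_trial`, the effect size of 2.0 and the Gaussian baseline are all assumptions made up for illustration; none of them come from the references above.

```python
import random


def simulate_trial(n=10_000, effect=2.0, seed=0):
    """Simulate a randomized drug/placebo trial with known potential outcomes."""
    rng = random.Random(seed)
    observed_treated, observed_control, unit_effects = [], [], []
    for _ in range(n):
        baseline = rng.gauss(0.0, 1.0)
        y0 = baseline            # potential outcome under the placebo
        y1 = baseline + effect   # potential outcome under the drug
        unit_effects.append(y1 - y0)
        # In reality only one of the two potential outcomes is observed per unit.
        if rng.random() < 0.5:
            observed_treated.append(y1)
        else:
            observed_control.append(y0)
    # True average treatment effect (needs both potential outcomes).
    true_ate = sum(unit_effects) / n
    # Estimate from observed data only: difference in group means.
    estimated_ate = (sum(observed_treated) / len(observed_treated)
                     - sum(observed_control) / len(observed_control))
    return true_ate, estimated_ate


true_ate, estimated_ate = simulate_trial()
print(f"true ATE: {true_ate:.3f}, estimated ATE: {estimated_ate:.3f}")
```

Because the assignment is randomized, the difference in observed group means approximates the true average effect even though no unit reveals both of its potential outcomes.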

Technical note: In order to run this notebook, you'll need a symbolic link from /notebooks/bcg to ./bcg.

Small example using $Y = a \cdot X_0^b + c + X_1$ to generate observations

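import pandas as pd
from scipy import stats

n = 1000  # number of observations
a, b, c = 1.5, 1., 0  # parameters of Y = a * X0**b + c + X1
target = 'Y'
obs = pd.DataFrame(columns=['X0', 'X1', target])
obs['X0'] = stats.uniform.rvs(loc=-1, scale=2, size=n)  # uniform on [-1, 1]
# obs['X1'] = stats.norm.rvs(loc=0, scale=.1, size=n)
obs['X1'] = stats.uniform.rvs(loc=-.5, scale=1, size=n)  # uniform on [-0.5, 0.5]
obs[target] = a * obs.X0 ** b + c + obs.X1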

obs.head()
         X0        X1         Y
0 -0.806061  0.370908 -0.838184
1  0.667759  0.228029  1.229667
2  0.158370  0.352362  0.589917
3  0.103955 -0.018955  0.136977
4 -0.755793 -0.387319 -1.521008
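
For reference, the effect that the estimation further down should recover can be worked out directly from the generating equation. This is a short derivation under the simulation settings used here ($a = 1.5$, $b = 1$, $c = 0$, and $X_1$ uniform on $[-0.5, 0.5]$, drawn independently of $X_0$). The do-notation is an addition for illustration and is not used elsewhere in these notes.

```latex
\begin{aligned}
E[Y \mid do(X_0 = x)] &= a \cdot x^b + c + E[X_1] = 1.5 \cdot x + 0 + 0 = 1.5 \cdot x \\
\text{ATE}(X_0\colon 0 \to 1) &= E[Y \mid do(X_0 = 1)] - E[Y \mid do(X_0 = 0)] = a \cdot (1^b - 0^b) = 1.5
\end{aligned}
```

So, for a control value of 0 and a treatment value of 1, an estimate close to 1.5 is what we should expect.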

Inspecting the observed data

plot_target_vs_rest(obs)
plot_var_hists(obs)
show_correlations(obs)

Generating what is probably the simplest possible causal graphical model

gg = GraphGenerator(obs)
print(f'target var: {gg.target}, not target vars: {", ".join(gg.not_targets)}')
target var: Y, not target vars: X0, X1
g = gg.get_only_Xi_to_Y()
gml = gg.get_gml(g)
gg.vis_g(g)
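
The `GraphGenerator` helper lives in the local `bcg` package, so its internals are not shown here. As a rough guess at what the "only Xi to Y" graph looks like once serialized, here is a minimal sketch that builds a GML string by hand. The function `only_xi_to_y_gml` is hypothetical, and the exact layout `gg.get_gml` produces may well differ.

```python
def only_xi_to_y_gml(features, target):
    """Build a GML string with one directed edge from every feature to the target."""
    nodes = list(features) + [target]
    node_lines = [f'  node [ id "{name}" label "{name}" ]' for name in nodes]
    edge_lines = [f'  edge [ source "{name}" target "{target}" ]'
                  for name in features]
    return "\n".join(["graph [", "  directed 1", *node_lines, *edge_lines, "]"])


# Hypothetical stand-in for the graph used in this notebook: X0 -> Y and X1 -> Y.
gml_sketch = only_xi_to_y_gml(["X0", "X1"], "Y")
print(gml_sketch)
```

The point is only that the model assumes no edges between the `Xi` themselves: every non-target variable is a direct cause of `Y` and nothing else.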
treatment = ['X0', ] # 'X1'
causal_model = dw.CausalModel(data=obs,  treatment=treatment, 
                              outcome=target, graph=gml)
INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named "Unobserved Confounders" to reflect this.
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['X0'] on outcome ['Y']

Note how CausalModel added an unobserved confounder variable U

causal_model.view_model()
WARNING:dowhy.causal_graph:Warning: Pygraphviz cannot be loaded. Check that graphviz and pygraphviz are installed.
INFO:dowhy.causal_graph:Using Matplotlib for plotting

Identifying the estimand

identified_estimand = causal_model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['U']
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.
INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d                  
─────(Expectation(Y))
d[X₀]                
Estimand assumption 1, Unconfoundedness: If U→{X0} and U→Y then P(Y|X0,,U) = P(Y|X0,)
### Estimand : 2
Estimand name: iv
No such variable found!

Computing the causal treatment estimate

method_name = 'backdoor.linear_regression'

effect_kwargs = dict(
    method_name=method_name,
    control_value = 0,
    treatment_value = 1,
    target_units = 'ate',
    test_significance = True
)
causal_estimate = causal_model.estimate_effect(identified_estimand,
                                        **effect_kwargs)
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Y~X0+X0*X1
print(causal_estimate)
*** Causal Estimate ***

## Target estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d                  
─────(Expectation(Y))
d[X₀]                
Estimand assumption 1, Unconfoundedness: If U→{X0} and U→Y then P(Y|X0,,U) = P(Y|X0,)
### Estimand : 2
Estimand name: iv
No such variable found!

## Realized estimand
b: Y~X0+X0*X1
## Estimate
Value: 1.4816271269256485

## Statistical Significance
p-value: <0.001
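
As a sanity check on the value above, here is a minimal sketch that regenerates data from the same equation and fits a plain least-squares slope of `Y` on `X0` by hand. It is not dowhy's estimator (which, per the log, also includes an `X0*X1` term); the helper `ols_slope`, the seed and the use of the standard library's `random` module instead of `scipy.stats` are assumptions for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line y ~ x, i.e. cov(x, y) / var(x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


rng = random.Random(42)  # arbitrary seed, so numbers differ from the notebook run
n, a, b, c = 1000, 1.5, 1.0, 0.0
x0 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x1 = [rng.uniform(-0.5, 0.5) for _ in range(n)]
y = [a * v ** b + c + w for v, w in zip(x0, x1)]

slope = ols_slope(x0, y)
# Effect of moving X0 from the control value 0 to the treatment value 1.
effect_0_to_1 = slope * (1 - 0)
print(f"estimated effect: {effect_0_to_1:.3f}")
```

With `b = 1` the relationship is linear, so the slope should land close to `a = 1.5`, in line with the dowhy estimate.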

Trying to poke holes in the causal treatment effect

method_name = 'placebo_treatment_refuter'

refute_kwargs = dict(
    method_name=method_name,
    placebo_type = "permute",  # relevant for placebo refutation
)

refute_res = causal_model.refute_estimate(identified_estimand, 
                                   causal_estimate, 
                                   **refute_kwargs)
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Y~placebo+placebo*X1
print(refute_res)
Refute: Use a Placebo Treatment
Estimated effect:(1.4816271269256485,)
New effect:(0.10464178347001926,)
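
The idea behind the placebo check can be sketched by hand: shuffle the treatment column so that any real link to the outcome is destroyed, refit, and see whether the effect drops towards zero. The code below is a minimal illustration under the same data-generating equation, not dowhy's implementation; `ols_slope` and the seed are assumptions.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line y ~ x, i.e. cov(x, y) / var(x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


rng = random.Random(7)  # arbitrary seed for reproducibility
n = 1000
x0 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x1 = [rng.uniform(-0.5, 0.5) for _ in range(n)]
y = [1.5 * v + w for v, w in zip(x0, x1)]  # a = 1.5, b = 1, c = 0

# Placebo treatment: same values as X0, but randomly reassigned to units.
placebo = x0[:]
rng.shuffle(placebo)

original_effect = ols_slope(x0, y)
placebo_effect = ols_slope(placebo, y)
print(f"original effect: {original_effect:.3f}, placebo effect: {placebo_effect:.3f}")
```

A placebo effect near zero, next to an original effect near 1.5, is what one would hope to see if the estimate reflects a real dependence of `Y` on `X0`.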