In the rapidly evolving world of data science and machine learning, one crucial concept continues to garner attention for its potential to revolutionize how we draw conclusions from data: causal inference. Whether we're evaluating the effectiveness of a policy, determining if a new product feature drove user engagement, or understanding the societal impact of a public health intervention, we are essentially seeking causal answers.

Traditional machine learning methods often fall short in this regard because they rely on associational patterns. While these models are excellent for prediction within known contexts, they may fail catastrophically when the data environment changes, for example due to an intervention or an external shift such as climate change. This is where causal models stand out: they are built to withstand such changes by capturing the underlying mechanisms that govern relationships between variables.

Why Causality?

We often hear phrases like "doing X may reduce Y," but then we read the small print that says "this is an observational study, so we can't say anything about causation." That's like Schrödinger's conclusion: both causal and not causal at the same time.

Many important questions are inherently causal:

  • What would've happened if we hadn't changed the policy?
  • Will this new product actually lead to higher sales?
  • Why did this outcome occur?

Most machine learning models can't handle these questions reliably because they rely on correlation: they predict things based on patterns seen in the past. What happens if the environment changes, or you plan to intervene in the system? That's where causal models are more reliable, because they account for why variables affect each other, not just that they do.

Randomised Trials vs. Real Life

Of course, the gold standard for causal inference is the Randomised Controlled Trial (RCT). You randomly assign people to a treatment and control group and then compare outcomes. That works well in medicine and controlled experiments.
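
To see why randomisation matters, here is a minimal, self-contained sketch (plain NumPy, nothing DoWhy-specific, with made-up effect sizes as an illustrative assumption): a confounder drives both treatment and outcome, so a naive comparison of treated vs. untreated units in observational data is badly biased, while a randomly assigned treatment recovers the true effect.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0

# Observational world: a confounder drives both treatment take-up and the outcome
confounder = rng.normal(size=n)
treatment = (confounder + rng.normal(size=n) > 0).astype(int)
outcome = true_effect * treatment + 3.0 * confounder + rng.normal(size=n)

naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
print(f"Naive observational difference: {naive:.2f}")  # biased, well above 2.0

# RCT world: treatment assigned by a coin flip, independent of the confounder
treatment_rct = rng.integers(0, 2, size=n)
outcome_rct = true_effect * treatment_rct + 3.0 * confounder + rng.normal(size=n)

rct = outcome_rct[treatment_rct == 1].mean() - outcome_rct[treatment_rct == 0].mean()
print(f"Randomised difference in means: {rct:.2f}")  # close to 2.0

That clean comparison is exactly what randomisation buys you.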

But what if:

  • You can't run a trial (it's too expensive, unethical, or simply impossible)?
  • You only have historical data?

The good news? We can still estimate causal effects from observational data, provided we bring in some domain knowledge and use the right tools.

Modeling Causality with DoWhy

DoWhy is a Python library that enforces a structured, four-step approach to causal inference:

Step 1: Model the Causal Problem

The first step is to use domain knowledge to create a causal graph (a Directed Acyclic Graph, or DAG) that outlines the relationships between variables. This graph is crucial because it makes your assumptions explicit: no more hidden confounders lurking beneath a black-box model. Simply put, the graph shows how you believe the variables are related causally.

You also specify:

  • treatment — The variable you're changing
  • outcome — the result you're measuring

Example:

  • Variable A (training program) → Variable B (income)
  • Variable C (education) → A and B

Let's use the classic Lalonde dataset (which looks at whether a job training program improved participants' income):

from dowhy import CausalModel
import pandas as pd

# Load data
data = pd.read_csv("lalonde.csv")  # Assumes you have the dataset

# Define causal graph
causal_graph = """
digraph {
    age -> training;
    education -> training;
    education -> earnings;
    training -> earnings;
    race -> earnings;
    race -> training;
}
"""

# Create the causal model
model = CausalModel(
    data=data,
    treatment="training",
    outcome="earnings",
    graph=causal_graph
)
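
As a quick sanity check of your assumptions, DoWhy can also draw the graph you just specified (this needs a Graphviz backend such as pygraphviz or pydot installed):

# Optional: render the assumed DAG to verify it matches your domain knowledge
model.view_model()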

Step 2: Identify the Causal Effect

DoWhy then checks whether the causal effect you're interested in (say, the effect of training on income) can be computed from the graph and your data. It outputs something called an estimand, which is just a fancy way of saying "this is how we'll estimate the effect."

identified_estimand = model.identify_effect()
print(identified_estimand)

Step 3: Estimate the Effect

The library offers a range of estimation strategies, from simple regression to advanced machine learning techniques (including integration with EconML). Using the Lalonde dataset, DoWhy estimates that participating in the training program leads to an average increase of $1,629 in wages.

DoWhy supports a variety of estimation methods under the method_name parameter in the estimate_effect() function. These methods are specified using a naming convention like: {identification_strategy}.{estimation_method}

The most common identification strategies are:

  • backdoor
  • iv (instrumental variables)
  • frontdoor

Backdoor Estimation Methods in DoWhy

backdoor.linear_regression — Standard OLS regression controlling for confounders

backdoor.propensity_score_matching — Matches treated and control units with similar propensity scores

backdoor.propensity_score_weighting — Weights samples by inverse probability of treatment

backdoor.propensity_score_stratification — Stratifies population into propensity score groups

backdoor.doublyrobust — Combines outcome and treatment models for double robustness

backdoor.econml.dml.LinearDML — Uses Double ML from the EconML library

backdoor.econml.drlearner.DRLearner — DR Learner for heterogeneous treatment effects

backdoor.econml.causalforest.CausalForestDML — Causal forest method (nonlinear, tree-based models)

backdoor.econml.orf.OrthoForest — OrthoForest method for flexible causal modeling

backdoor.econml.meta.IntervalEstimation — Interval estimation using ML meta-learners

Instrumental Variable (IV) Estimation Methods in DoWhy

iv.two_stage_least_squares — Standard 2SLS (Two-Stage Least Squares) estimation

iv.instrumental_variable_regression — General IV regression using statsmodels

iv.econml.ivdr.DRLearner — DR Learner for IV-based estimation (via EconML)

iv.econml.deepiv.DeepIVEstimator — Deep IV estimation using neural networks

iv.econml.dml.LinearDML — Double ML with IVs from EconML

iv.econml.orf.OrthoForest — OrthoForest with instrumental variables

iv.econml.local.LinearLocalIV — Local IV for estimating marginal effects

iv.econml.two_stage_least_squares.TwoStageLeastSquares — 2SLS from EconML
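
Note that the Lalonde graph defined earlier contains no instrument, so the following is only a hypothetical sketch: it assumes your graph declared some instrument for the treatment (say, an extra edge z -> training) and uses DoWhy's built-in instrumental-variable estimator.

# Hypothetical: assumes the causal graph includes an instrument for the treatment,
# e.g. an extra edge "z -> training;" in the DOT string above
iv_estimate = model.estimate_effect(
    identified_estimand,
    method_name="iv.instrumental_variable"
)
print("IV estimate:", iv_estimate.value)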

Frontdoor Estimation Methods in DoWhy

frontdoor.linear_regression — Basic frontdoor adjustment using linear regression

frontdoor.nonparametric — Nonparametric frontdoor adjustment (experimental in some forks; not always available in core DoWhy)

Note: Frontdoor methods are limited because valid frontdoor adjustment requires strict conditions that are rarely met in practice. Most use cases revolve around backdoor and IV strategies.

estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_matching"
)
print(estimate.value)

Note: you need to install econml for the methods under backdoor.econml.*:

pip install econml
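
As a rough sketch of how the EconML-backed methods are called (assuming econml and scikit-learn are installed; the nuisance models below are illustrative choices, not requirements), extra constructor and fit arguments are passed through method_params:

from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Double Machine Learning via EconML's LinearDML
dml_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.econml.dml.LinearDML",
    method_params={
        "init_params": {
            "model_y": GradientBoostingRegressor(),   # outcome model (illustrative)
            "model_t": GradientBoostingClassifier(),  # treatment model (illustrative)
            "discrete_treatment": True,
        },
        "fit_params": {},
    },
)
print("Double ML estimate:", dml_estimate.value)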

Sample code for an alternate method:

estimate = model.estimate_effect(
    identified_estimand,
    method_name="frontdoor.linear_regression"
)
print("Estimated Effect:", estimate.value)

Step 4: Refute the Estimate

Finally, DoWhy includes tools to test the robustness of the findings, i.e., to make sure the result isn't just noise or coincidence. For example, a placebo test randomly reassigns the treatment and checks whether the estimated effect vanishes (which it should). This helps ensure the result isn't a statistical artifact.

refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="placebo_treatment_refuter"
)
print(refutation)

If the estimated effect under the placebo treatment drops to near zero, that's a good sign that your original result was meaningful.

Refutation Methods in DoWhy

placebo_treatment_refuter — Randomly permutes the treatment variable to test if the effect vanishes (i.e., placebo test).

data_subset_refuter — Re-estimates the causal effect on random subsets of the data to test consistency.

add_unobserved_common_cause — Simulates the impact of an unobserved confounder on the estimated effect.

random_common_cause — Adds a randomly generated covariate to test if it spuriously influences the estimate.

dummy_outcome_refuter — Replaces the outcome variable with random noise to verify the estimator doesn't falsely detect an effect.

bootstrap_refuter — Applies bootstrapping (resampling) to assess the stability and variance of the causal estimate.
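
In practice it is worth running more than one refuter. A short sketch using two of the methods above (the subset_fraction value is just an illustrative choice):

# Add a randomly generated covariate; the estimate should barely move
res_random = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="random_common_cause"
)
print(res_random)

# Re-estimate on random subsets of the data; results should stay roughly stable
res_subset = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="data_subset_refuter",
    subset_fraction=0.8
)
print(res_subset)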

Bonus: Counterfactuals and "What If" Analysis

One of the most powerful features of DoWhy is the ability to simulate counterfactual scenarios using the do() operator.

For example:

  • What if everyone in the dataset had received training?
  • What if no one had?

DoWhy lets you estimate these hypothetical outcomes, which is gold for policy evaluation and planning. For example:

# Assuming you've already built your model and estimated the effect
# Now use the do() operator to simulate intervention scenarios

# Scenario 1: What if everyone received the treatment?
# Set the treatment variable to 1 for everyone
counterfactual_treated = model.do(x={"training": 1})
print("Counterfactual outcome if everyone received training:")
print(counterfactual_treated.mean())

# Scenario 2: What if no one received the treatment?
# Set the treatment variable to 0 for everyone
counterfactual_untreated = model.do(x={"training": 0})
print("Counterfactual outcome if no one received training:")
print(counterfactual_untreated.mean())

What This Does

  • model.do(x={"training": 1}) simulates the average outcome under the condition that everyone received training.
  • model.do(x={"training": 0}) simulates the same, but under the condition that no one received training.

This lets you explore policy-relevant questions like:

  • "Should we scale this program to everyone?"
  • "What would the impact be if we didn't run the program at all?"

Note: You can also pass in a DataFrame to get individual-level counterfactuals.

# Assume your model is already defined and fitted
# You can pass your data to simulate individual outcomes under a given treatment

# Scenario: What if each individual had received the training?
individual_counterfactual_treated = model.do(x={"training": 1}, data=data)
data["cf_outcome_if_treated"] = individual_counterfactual_treated

# Scenario: What if each individual had NOT received the training?
individual_counterfactual_untreated = model.do(x={"training": 0}, data=data)
data["cf_outcome_if_untreated"] = individual_counterfactual_untreated

# Compare actual vs. counterfactual
print(data[["training", "earnings", "cf_outcome_if_treated", "cf_outcome_if_untreated"]].head())

What happens?

For each individual, you now have two new columns:

  • cf_outcome_if_treated: What their income would have been if they had received the training.
  • cf_outcome_if_untreated: What their income would have been if they had not.

This enables individual-level comparison and is especially helpful for:

  • Personalized decision-making
  • Evaluating fairness
  • Policy targeting
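
With those two columns in place, the per-person effect is simply their difference. A minimal follow-on sketch (plain pandas, assuming the counterfactual columns created above hold numeric outcome values):

# Individual-level effect: counterfactual outcome if treated minus if untreated
data["individual_effect"] = (
    data["cf_outcome_if_treated"] - data["cf_outcome_if_untreated"]
)
print(data["individual_effect"].describe())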

Causal inference is no longer just a theoretical concept reserved for academia or randomized trials. With tools like DoWhy, we can now apply causal thinking directly to observational data — which is what most of us work with in the real world.

DoWhy makes this possible through:

  • A transparent, structured 4-step framework
  • Built-in support for backdoor, frontdoor, and instrumental variable strategies
  • Multiple robust estimation and refutation techniques
  • Powerful capabilities for counterfactual and individual-level analysis

By combining your domain knowledge with solid statistical tools, you can go beyond "what is associated with what" and start answering what causes what and, more importantly, what would happen if things were different.

Whether you work in public policy, business, healthcare, or research, learning and applying causal inference with DoWhy can lead to deeper insight, stronger conclusions, and more impactful decisions.