In the written part of my qualifying exam, I was asked how to analyze the effect of a continuous, rather than binary, treatment using propensity score analysis. I skipped the question on the written portion, but I spent a few days looking up how to do this in case I was asked during my oral examination. Sadly, no one asked me, even when I asked them to, so here is a blog post detailing my explorations.
Binary Treatment
For a review of propensity score analysis with binary treatment, please see Stuart (2010). Below, let \(e(X)\) denote the propensity scores, \(D\) denote a binary treatment, and \(X\) denote all the observed confounders. In the case of binary treatment, propensity scores represent the probability of receiving the treatment given the covariates:
\[e(X) = P(D = 1|X)\]
We estimate the scores using logistic regression or machine learning techniques like generalized boosted models.
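As a minimal sketch of that estimation step (simulated data with made-up variable names, not the dataset used later in this post), the propensity scores can be obtained as the fitted values of a logistic regression fit with glm():

# simulate covariates and a binary treatment whose probability depends on them
set.seed(20)
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
d  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.4 * x2))          # binary treatment

ps_mod <- glm(d ~ x1 + x2, family = binomial(link = "logit"))   # logistic regression
e_hat  <- fitted.values(ps_mod)                                 # e(X) = P(D = 1 | X)
summary(e_hat)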
Extension to Continuous Treatment
In the binary treatment context, we assume that the potential outcomes (\(Y(1)\) and \(Y(0)\)) are independent of treatment given \(X\):
\[Y(1), Y(0) \perp\!\!\!\perp D |X\]
and by extension are independent given the propensity scores:
\[Y(0), Y(1) \perp\!\!\!\perp D|e(X)\]
Hirano & Imbens (2004) introduced the assumption of weak unconfoundedness in the context of continuous treatment. They stated: “we do not require joint independence of all potential outcomes. Instead, we require conditional independence to hold for each value of the treatment.” Below, let \(T\) denote a continuous treatment variable. The potential outcome when \(T = t\) is unrelated to the treatment given the set of covariates:
\[Y(t) \perp\!\!\!\perp T |X\]
In the case of continuous treatment, we cannot calculate the propensity scores as the probability that the treatment (\(T\)) equals a given value \(t\): the probability of a continuous variable taking on any exact value is zero. Instead, we work with the conditional density of \(T\) at \(t\) given \(X\). Below, let \(r(t,x)\) denote the propensity scores (the generalized propensity score in Hirano & Imbens, 2004). The right-hand side of the equation below is the probability density function of a normal distribution. To estimate the scores, we run a linear regression predicting the treatment from the set of covariates (Austin, 2019). From that model we get the fitted values (\(X\hat{\beta}\)) and the estimated model variance (\(\hat{\sigma}^2\)); the fitted values take the place of the mean in the density function.
\[ r(t, x) = f_{T|X}(t \mid x) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}} \, e^{-\frac{(t - X\hat{\beta})^2}{2\hat{\sigma}^2}}\]
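To connect the formula to code, here is a minimal sketch on simulated data (the variables x1, x2, and trt are made up for illustration): fit the treatment model with lm(), then evaluate the normal density either with dnorm() or by writing out the formula above.

set.seed(20)
n   <- 1000
x1  <- rnorm(n)
x2  <- rbinom(n, 1, 0.4)
trt <- 0.5 * x1 - 0.3 * x2 + rnorm(n)                 # continuous treatment

mod_t <- lm(trt ~ x1 + x2)                            # treatment model
mu    <- unname(fitted.values(mod_t))                 # X beta-hat
sig   <- summary(mod_t)$sigma                         # model sigma

r_dnorm   <- dnorm(trt, mean = mu, sd = sig)          # density via dnorm()
r_formula <- 1 / sqrt(2 * pi * sig^2) * exp(-(trt - mu)^2 / (2 * sig^2))
all.equal(r_dnorm, r_formula)                         # identical up to numerical error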
Conditional on the propensity scores, we can assume that each potential outcome is independent of treatment:
\[Y(t) \perp\!\!\!\perp T |r(t,x)\]
Hirano & Imbens (2004) state that: “Within strata with the same value of \(r(t,X)\), the probability that \(T = t\) does not depend on the value of \(X\).” I have seen \(1\) and \(I\) in front of the \((T = t)\), denoting the indicator function (Bia & Mattei, 2008; Hirano & Imbens, 2004).
\[X \perp\!\!\!\perp 1(T = t)|r(t,x)\]
Calculating Weights
Following the same logic as the inverse propensity weights (IPW) for the estimation of the average treatment effect (ATE) for a binary treatment, we calculate the inverse of the propensity scores as the weights:
\[\frac{1}{f_{T|X}(t \mid x)}\]
However, Robins, Hernán, & Brumback (2000) noted that such weights can have infinite variance (Austin, 2019). They suggested using stabilized weights instead:
\[\frac{f_{T}(t)}{f_{T|X}(t \mid x)}\]
Here the numerator represents the marginal density of treatment:
\[f_{T}(t) = \frac{1}{\sqrt{2\pi\hat{\sigma}_t^2}} \, e^{-\frac{(t - \hat{\mu}_t)^2}{2\hat{\sigma}_t^2}}\]
The stabilized weights make the distribution of the IPW narrower as there is less difference between the numerators and the denominators (Wal & Geskus, 2011).
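Here is a small simulated illustration of that point (again with made-up variables, not the data analyzed below): the unstabilized weights \(1/f_{T|X}(t \mid x)\) have a long right tail, while the stabilized weights \(f_{T}(t)/f_{T|X}(t \mid x)\) are concentrated around 1.

set.seed(20)
n   <- 1000
x   <- rnorm(n)
trt <- 0.6 * x + rnorm(n)                    # continuous treatment

den_mod <- lm(trt ~ x)                       # conditional (denominator) model
num_mod <- lm(trt ~ 1)                       # marginal (numerator) model

f_tx <- dnorm(trt, mean = fitted.values(den_mod), sd = summary(den_mod)$sigma)
f_t  <- dnorm(trt, mean = fitted.values(num_mod), sd = summary(num_mod)$sigma)

summary(1 / f_tx)                            # unstabilized weights: long right tail
summary(f_t / f_tx)                          # stabilized weights: concentrated near 1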
Real Data Analysis Example
The data I use here are from the High School Longitudinal Study of 2009, which can be used to analyze the effect of dropping out of high school on later math achievement. The missing data in the original dataset have been replaced with one iteration of imputation using mice (Van Buuren & Groothuis-Oudshoorn, 2011). This is not an appropriate way to handle missing data, but for the purpose of the example I just use the one completed dataset. For the sake of this example, rather than the binary dropout indicator, let's analyze the effect of math efficacy, a continuous treatment, on later math achievement.
Loading the Data

library(tidyverse)

dat <- read_csv("https://raw.githubusercontent.com/meghapsimatrix/datasets/master/causal/HSLS09_complete.csv")
The Numerators
Here I am getting the numerators of the IPW, the marginal densities. I have regressed math_efficacy on just the intercept and used the dnorm function to extract the densities.
# the numerator
mod_num <- lm(math_efficacy ~ 1, data = dat)

num <- dnorm(x = dat$math_efficacy,            # treatment variable
             mean = fitted.values(mod_num),    # fitted values (marginal mean)
             sd = summary(mod_num)$sigma)      # model sigma
The Denominators
Here I am getting the denominators of the IPW, the conditional densities. I have regressed math_efficacy on \(X\), the observed covariates, and used the dnorm function to extract the densities. I am not quite sure whether to use the model sigma, which divides the sum of squared residuals by the residual degrees of freedom before taking the square root, or whether I should just take the standard deviation of the residuals (see the quick check after the code below). With a large sample size, the difference between the two is negligible.
# the denominator
mod_den <- lm(math_efficacy ~ sex + race + language + repeated_grade + IEP + locale + region + SES, data = dat)

den <- dnorm(x = dat$math_efficacy,            # treatment variable
             mean = fitted.values(mod_den),    # fitted values (conditional mean)
             sd = summary(mod_den)$sigma)      # model sigma
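As a quick check of the sigma question raised above (my own aside, using the denominator model just fit), the two quantities can simply be compared:

# model sigma: sqrt(sum of squared residuals / residual degrees of freedom)
summary(mod_den)$sigma

# plain standard deviation of the residuals (divides by n - 1 instead)
sd(residuals(mod_den))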
The IPW
Below I calculate the stabilized weights:
dat <- dat %>%
  mutate(ipw_s = num / den)

summary(dat$ipw_s)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1398 0.9186 0.9778 1.0001 1.0390 5.9782
Checking Balance and Outcome Analysis
Short story: to check balance, we calculate weighted correlations between the treatment and the covariates (which should be close to zero after weighting), and for the outcome analysis we estimate the expected outcome at each treatment level using the weights and compare (Austin, 2019).
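As a rough sketch of what those steps might look like (not a definitive analysis): cov.wt() from base R gives weighted correlations between the treatment and a covariate such as SES, and a weighted regression gives expected outcomes at chosen treatment levels. Note that math_score below is an assumed column name for later math achievement; substitute the actual outcome variable in the dataset.

# balance: weighted correlation between treatment and a covariate should be
# near zero after weighting (shown here for SES only; SES is assumed numeric)
bal <- cov.wt(cbind(math_efficacy = dat$math_efficacy, SES = dat$SES),
              wt = dat$ipw_s, cor = TRUE)
bal$cor["math_efficacy", "SES"]

# outcome analysis: weighted regression of the outcome on the treatment,
# then expected outcomes at selected treatment levels
# NOTE: math_score is an assumed column name for later math achievement
out_mod <- lm(math_score ~ math_efficacy, data = dat, weights = ipw_s)
predict(out_mod, newdata = data.frame(math_efficacy = 1:4))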