10 Types of Treatment Effect You Should Know About
Contents
- 1 Causal effects as comparisons between potential outcomes
- 2 Individual treatment effects
- 3 Average treatment effects
- 4 Population and sample average treatment effects
- 5 Conditional average treatment effects
- 6 Intent-to-treat effects
- 7 Quantile Average Treatment Effects
- 8 Average marginal component effect
- 9 Direct and indirect effects and Eliminated Effects
- 10 Treatment Effects for Binary Outcomes: Log-Odds Treatment Effects and Attributable Effects
- 11 References
This guide describes estimands, the theoretical parameters of interest that researchers might want to identify. It should help researchers choose an estimand that addresses their specific research question. It builds on 10 Things to Know About Causal Inference.
1 Causal effects as comparisons between potential outcomes
If a person who took an aspirin would have had a worse headache if she had not taken the aspirin, then we say that the aspirin caused an improvement in her headache. That is, the outcome for this suffering person after taking an aspirin could have been either a bad headache or a mild headache. But the aspirin caused her to experience, and a researcher to observe, a mild headache rather than the potential bad headache. This is an illustration of what we mean when we say that causal effects are comparisons of potential outcomes (see our guide on 10 Things to Know About Causal Inference to learn more about potential outcomes).
Researchers in every field are interested in causal effects. Sometimes interest focuses on the average (say the average difference in headache severity across multiple people), or perhaps some other quantity like a ratio. This guide describes a few common causal effects that focus on averages.
2 Individual treatment effects
A first quantity of interest is the individual treatment effect. For this estimand a researcher would be interested in the effect of a treatment on each individual’s outcome. An individual could be a person, a household, a city, or any other level at which outcomes are measured. In the aspirin example, we could write \(Z_i=1\) to indicate that person \(i\) took an aspirin and \(Z_i=0\) to mean that person did not take an aspirin. We use the variable \(Z\) to refer to the experimental treatment and \(Y\) will be the outcome.
In order to see an effect of a treatment on a person’s outcome, one would want to observe that person’s potential outcome under treatment \(Y_i(Z_i=1)\) (which we write \(Y_i(1)\) as a shorthand) and the potential outcome under control \(Y_i(0)\). Taking the difference between these quantities would give the additive individual treatment effect for subject \(i\): \(τ_{i}=Y_{i}(1)−Y_{i}(0)\). If we wanted to learn about the effect of a treatment on a ratio of outcomes (say, if income were the outcome), we would write \(\tau_{i}=Y_{i}(1)/Y_{i}(0)\) as another individual treatment effect. Since we can only observe either \(Y_{i}(1)\) or \(Y_{i}(0)\) for any one unit, these individual treatment effects (and any others using those two potential outcomes) are unknowable. We can, however, hypothesize about individual treatment effects and test those hypotheses, or estimate individual treatment effects using models of outcomes.1
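To make the notation concrete, the following minimal sketch builds a made-up table of potential outcomes for four people. The severity scores and the field names `y1`, `y0`, and `z` are illustrative assumptions, not data from any study. It computes \(τ_i\) from both potential outcomes, which is only possible because the table is invented, and then shows the single outcome per person that a researcher would actually observe.

```python
# Hypothetical potential outcomes (headache severity, 0-10) for four people.
# In real data only one of y1 / y0 is ever observed for each person.
units = [
    {"id": 1, "y1": 2, "y0": 6, "z": 1},
    {"id": 2, "y1": 3, "y0": 3, "z": 0},
    {"id": 3, "y1": 1, "y0": 5, "z": 1},
    {"id": 4, "y1": 4, "y0": 7, "z": 0},
]


def additive_effect(unit):
    """tau_i = Y_i(1) - Y_i(0); needs both potential outcomes."""
    return unit["y1"] - unit["y0"]


def observed_outcome(unit):
    """Y_i = Z_i * Y_i(1) + (1 - Z_i) * Y_i(0): what the researcher sees."""
    return unit["z"] * unit["y1"] + (1 - unit["z"]) * unit["y0"]


taus = [additive_effect(u) for u in units]
observed = [observed_outcome(u) for u in units]
```

Here `taus` requires both columns of the table, while `observed` uses only the column selected by `z`, which is why the individual effects cannot be read off real data.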
3 Average treatment effects
Some researchers want to learn about the average treatment effect (ATE) across all observations in an experiment, which we can define as the average of the individual-level additive treatment effects across all units in the study.
\[ATE \equiv \frac{1}{N}\sum_{i=1}^{N}\tau_{i}=\frac{\sum_{i=1}^{N}Y_{i}(1)}{N}-\frac{\sum_{i=1}^{N}Y_{i}(0)}{N}\]
See 10 Strategies for Figuring out if X Causes Y for information on how we can estimate the ATE.2
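As one illustration of why the ATE is estimable even though individual effects are not, the sketch below uses a small invented table of potential outcomes and enumerates every way of assigning two of four units to treatment. It assumes complete random assignment and uses the simple difference in observed means as the estimator; both choices are assumptions of this example rather than something prescribed by this guide.

```python
from itertools import combinations
from statistics import mean

# Hypothetical potential outcomes for four units (illustrative values only).
y1 = [2, 3, 1, 4]  # Y_i(1)
y0 = [6, 3, 5, 7]  # Y_i(0)
n = len(y1)

# The estimand: the average of the individual additive effects.
true_ate = mean(a - b for a, b in zip(y1, y0))

# Every way of assigning exactly two of the four units to treatment.
estimates = []
for treated in combinations(range(n), 2):
    control = [i for i in range(n) if i not in treated]
    diff_in_means = mean(y1[i] for i in treated) - mean(y0[i] for i in control)
    estimates.append(diff_in_means)

# Averaging the estimator over all equally likely assignments recovers the ATE.
average_estimate = mean(estimates)
```

Any single assignment can give an estimate far from `true_ate`, but the average over the six equally likely assignments equals it, which is the sense in which the difference in means is unbiased here.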
4 Population and sample average treatment effects
When defining the average treatment effect, it isn’t immediately clear which units should be included in the “all” part of “all units”. Often we want to use our sample to make statements about some broader population.3 Let \(S_i\) be an indicator for whether a subject is in our sample. The sample average treatment effect (SATE) is defined simply as \(SATE = E(Y_i(1)−Y_i(0)|S_i=1)\) and the population average treatment effect (PATE) as \(PATE = E(Y_i(1)−Y_i(0))\). If we are interested in the effect of the treatment within our sample, the SATE is the relevant estimand. To learn about the effect of the treatment on units that we do not observe but could have observed, we would instead be interested in the PATE. With a large random sample from a well-defined population and full compliance with treatment, our SATE and PATE are equal in expectation.4
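The distinction can be sketched with a small simulation. Everything here is assumed for illustration: a finite population of 1,000 units whose individual effects are drawn from a normal distribution, simple random samples of 50 units, and a fixed seed. The PATE is taken to be the average effect in that finite population.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 units with heterogeneous effects tau_i.
population_tau = [rng.gauss(2.0, 1.0) for _ in range(1000)]
pate = mean(population_tau)


def sate_of_random_sample(sample_size):
    """Average effect among the units that happen to be sampled (S_i = 1)."""
    return mean(rng.sample(population_tau, sample_size))


# The SATE of any one sample generally differs from the PATE...
one_sate = sate_of_random_sample(50)

# ...but across many random samples the SATEs average out to the PATE.
many_sates = [sate_of_random_sample(50) for _ in range(2000)]
average_sate = mean(many_sates)
```

If the sample were not drawn at random from the population, nothing would guarantee that `average_sate` lands near `pate`.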
5 Conditional average treatment effects
Sometimes the theory or policy intervention motivating the study suggests that the treatment effect should differ for different sorts of people.
Random assignment ensures that treatment is independent of potential outcomes and any (observed and unobserved) covariates.5 Sometimes, however, we have additional information about the experimental units as they existed before the experiment was fielded. We can call such a baseline variable or “covariate”, \(X_{i}\). This information can help us understand how treatment effects vary across subgroups. For example, we may suspect that men and women respond differently to treatment, and we can define an estimand that will have different values for each subgroup separately. We can define the conditional average treatment effect as:
\[CATE=E(Y_{i}(1)−Y_{i}(0)∣Z_{i},X_{i})\]
One version of a conditional ATE is the average treatment effect defined specifically for those who receive the treatment. We define the average treatment effect among the treated (ATT) and among the controls (ATC) as follows:
\[ATT=E(Y_i(1)-Y_i(0)|Z_i=1)=E(Y_i(1)|Z_i=1)-E(Y_i(0)|Z_i=1)\] \[ATC=E(Y_i(1)-Y_i(0)|Z_i=0)=E(Y_i(1)|Z_i=0)-E(Y_i(0)|Z_i=0)\]
Informally, the ATT is the effect for those that we treated; ATC is what the effect would be for those we did not treat.
Those interested in further reading on conditional average treatment effects should see chapter 9 of Gerber and Green (2012).
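The following sketch shows these conditional estimands as averages over different subsets of the same invented table of potential outcomes. The covariate labels, outcome values, and field names are hypothetical, and both potential outcomes are visible only because the data are made up.

```python
from statistics import mean

# Hypothetical units: covariate x, assignment z, and both potential outcomes.
units = [
    {"x": "woman", "z": 1, "y1": 8, "y0": 4},
    {"x": "woman", "z": 0, "y1": 7, "y0": 5},
    {"x": "man", "z": 1, "y1": 5, "y0": 4},
    {"x": "man", "z": 0, "y1": 6, "y0": 6},
]


def average_effect(rows):
    """Mean of Y_i(1) - Y_i(0) over the given rows."""
    return mean(r["y1"] - r["y0"] for r in rows)


cate_women = average_effect([u for u in units if u["x"] == "woman"])
cate_men = average_effect([u for u in units if u["x"] == "man"])
att = average_effect([u for u in units if u["z"] == 1])
atc = average_effect([u for u in units if u["z"] == 0])
ate = average_effect(units)
```

In this toy table the ATT and ATC differ only because four units cannot balance perfectly; under random assignment they are equal in expectation.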
6 Intent-to-treat effects
Outside of a controlled laboratory setting, the subjects we assign to treatment often are not the same as the subjects who actually receive the treatment. For example, a public health agency could send postcards encouraging people to take aspirin when they have a headache, but it cannot force people to swallow the pills. Some people will take a pill because they were encouraged to do so by the postcards, but others will not comply with the instructions. We call this issue noncompliance.6 The intent-to-treat effect (ITT) is the effect of giving someone the opportunity to receive treatment. In the absence of noncompliance, the ITT is the same as the ATE. We can define ITT as:
\[ITT = E[Y_i(Z=1)] - E[Y_i(Z=0)]\]
Where \(Z\) is an indicator for whether a subject has been assigned to treatment rather than indicating whether they received treatment.
What if you are interested in the effects of a treatment on those people who actually took up the treatment, and not just those who were assigned to it? In this case, you would be interested in the complier average causal effect (CACE) or local average treatment effect (LATE). If we write \(D_i(Z_i=1)=1\) to mean that a person would take a “dose” of an assigned treatment and \(D_i(Z_i=0)=0\) to mean that a person would not take a dose of a treatment if it was not assigned, we can write the CACE as:
\[CACE= E[Y_i(1)-Y_i(0)|D_i(1)-D_i(0) = 1]\]
Which is the average effect conditional on the subject being a complier, a type that takes the treatment when assigned to treatment and does not take the treatment when assigned to control.7
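A small worked example may help separate the ITT from the CACE. The table below is invented, the field names are hypothetical, and the sketch assumes that assignment affects outcomes only through the dose actually taken and that there are no defiers. Under those assumptions, dividing the ITT by the share of compliers recovers the CACE.

```python
from statistics import mean

# Hypothetical units. d1 and d0 are D_i(1) and D_i(0): whether the unit takes
# the dose when assigned to treatment and when assigned to control.
# y_dose and y_no_dose are the outcomes with and without the dose
# (assumption: assignment has no effect except through the dose).
units = [
    {"type": "complier", "d1": 1, "d0": 0, "y_dose": 2, "y_no_dose": 6},
    {"type": "complier", "d1": 1, "d0": 0, "y_dose": 1, "y_no_dose": 5},
    {"type": "never-taker", "d1": 0, "d0": 0, "y_dose": 3, "y_no_dose": 4},
    {"type": "never-taker", "d1": 0, "d0": 0, "y_dose": 4, "y_no_dose": 7},
    {"type": "always-taker", "d1": 1, "d0": 1, "y_dose": 2, "y_no_dose": 5},
]


def outcome_under_assignment(unit, z):
    """Y_i(Z=z): the outcome implied by the dose taken under assignment z."""
    dose = unit["d1"] if z == 1 else unit["d0"]
    return unit["y_dose"] if dose == 1 else unit["y_no_dose"]


# ITT: effect of assignment, averaged over everyone.
itt = mean(outcome_under_assignment(u, 1) for u in units) - mean(
    outcome_under_assignment(u, 0) for u in units
)

# Effect of assignment on taking the dose, i.e. the share of compliers.
itt_d = mean(u["d1"] - u["d0"] for u in units)

# CACE: average effect of the dose among compliers only.
cace = mean(u["y_dose"] - u["y_no_dose"] for u in units if u["d1"] - u["d0"] == 1)

ratio = itt / itt_d
```

The ITT is smaller in magnitude than the CACE here because never-takers and always-takers contribute zero to the effect of assignment.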
7 Quantile Average Treatment Effects
The ATE focuses on the effect for a typical person, but we often also care about the distributional consequences of our treatment. We want to know not just whether our treatment raised average income, but also whether it made the distribution of income in the study more or less equal.
Claims about distributions are difficult. Even though we can estimate the ATE from a difference of sample means, in general, we cannot make statements about the joint distribution of potential outcomes \((F(Y_i(1),Y_i(0)))\) without further assumptions. Typically, these assumptions either limit our analysis to a specific sub-population8 or require us to assume some form of rank invariance in the distribution of responses to treatment effects.9
If these assumptions are justified for our data, we can obtain consistent estimates of quantile treatment effects (QTE) using quantile regression.10 Just as linear regression estimates the ATE as a difference in means (or, when covariates are used in the model, from a conditional mean), quantile regression fits a linear model to a conditional quantile, and this model can then be used to estimate the effects of treatment for that particular quantile of the outcome. The approach can be extended to include covariates and instruments for non-compliance. Note that the interpretation of the QTE is for a given quantile, not an individual at that quantile.
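As a rough illustration of how a treatment can leave the mean unchanged while shifting the tails, the sketch below compares sample quantiles of treated and control outcomes. It is a simplification: it uses a plain difference in nearest-rank sample quantiles in place of quantile regression, with no covariates, and the outcome values and the helper `quantile` are invented for this example.

```python
import math
from statistics import mean


def quantile(values, q):
    """Nearest-rank empirical quantile: the ceil(q * n)-th smallest value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical observed outcomes in each arm.
control = [1, 2, 3, 4, 5, 6, 7, 8, 9]
treated = [2 * y - 5 for y in control]  # same mean, twice the spread

difference_in_means = mean(treated) - mean(control)

# Difference between arms at the 10th, 50th, and 90th percentiles.
qte = {q: quantile(treated, q) - quantile(control, q) for q in (0.1, 0.5, 0.9)}
```

The difference in means is zero, yet the lower quantile falls and the upper quantile rises, so the treated distribution is more unequal. As noted above, each entry of `qte` describes a quantile of the distribution, not the effect on whichever person sits at that quantile.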
8 Average marginal component effect
For a conjoint experiment (see 10 Things to Know About Survey Experiments), one might be interested in the marginal effect of changing one attribute. The average marginal component effect (AMCE) gives the average causal effect of changing an attribute from one value to another, while holding equal the joint distribution of the other attributes in the design, averaged over this distribution.
The probabilities associated with each factor are also informed by the choice of estimand. For the uniform AMCE (uAMCE), each factor is independently and uniformly marginalized. In contrast, the population AMCE (pAMCE) is marginalized over the target population distribution of profiles. The AMCE is always defined with respect to the distribution used for the random assignment; so if you change the randomization distribution, the interpretation changes.11
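The dependence on the averaging distribution can be seen in a toy conjoint with two attributes. The support values, attribute names, and party shares below are all invented, and the helper `amce_gender` is a hypothetical name; the point is only that the same profile-level effects yield different AMCEs under uniform and population-style weights.

```python
# Hypothetical average support for each candidate profile (gender, party).
support = {
    ("woman", "A"): 0.6,
    ("man", "A"): 0.5,
    ("woman", "B"): 0.4,
    ("man", "B"): 0.5,
}


def amce_gender(party_weights):
    """Effect of changing gender from man to woman, averaged over party."""
    return sum(
        weight * (support[("woman", party)] - support[("man", party)])
        for party, weight in party_weights.items()
    )


uniform_amce = amce_gender({"A": 0.5, "B": 0.5})  # uAMCE-style weights
population_amce = amce_gender({"A": 0.8, "B": 0.2})  # pAMCE-style weights
```

With uniform weights the positive effect within party A and the negative effect within party B cancel, while weighting toward party A gives a positive AMCE.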
9 Direct and indirect effects and Eliminated Effects
When discussing the effects of a treatment, one could be interested in the total effect of the treatment on the outcome, as assumed above, or components of the effect of the treatment. Specifically, one might be interested in the direct effect of the treatment or any number of indirect effects of a treatment. Largely, interest in direct and indirect effects comes from an interest in mechanisms. It might be of interest to a researcher how the effect happens, rather than simply knowing whether all of the mechanisms together produce an effect.
When trying to decompose an effect into the components of indirect and direct effects, one might look to mediation analysis. Suppose that there is one mediator. Potential outcomes can now be written as \(Y_i(Z_i)=Y_i(Z_i,M_i(Z_i))\), where \(Z_i\) is the treatment status (say, \(Z_i=1\) or \(Z_i=0\)) and \(M_i(Z_i)\) is a function for how the mediator responds to a given treatment assignment. Writing out the total effect in direct and indirect effects gives the following:
\[\begin{align} \underbrace{Y_i(1)-Y_i(0)}_{\text{Total Effect}} = \underbrace{Y_i(1,M_i(1))-Y_i(1,M_i(0))}_{\text{Indirect Effect}}+\underbrace{Y_i(1,M_i(0))-Y_i(0,M_i(0))}_{\text{Direct Effect}} \nonumber \end{align}\]
Equivalently,
\[\begin{align} \underbrace{Y_i(1)-Y_i(0)}_{\text{Total Effect}} = \underbrace{Y_i(0,M_i(1))-Y_i(0,M_i(0))}_{\text{Indirect Effect}}+\underbrace{Y_i(1,M_i(1))-Y_i(0,M_i(1))}_{\text{Direct Effect}} \nonumber \end{align}\]
See Glynn (2021) for more about mediation analysis.
When interested in indirect effects, one potential estimand is the eliminated effect. From the decompositions above, it is clear that getting at the indirect effect is, at minimum, challenging. It is not immediately clear what it means to have the mediator respond to the treatment without the treatment being received, as in \(Y_i(0,M_i(1))\). These cross-world potential outcomes are not possible to observe.
Despite this difficulty, it might be possible in some settings to manipulate the mediator, getting at controlled direct effects, where the researcher forces the mediator to respond in a certain way. Let the potential outcome where the subject is assigned to treatment and the mediator is forced to respond as if assigned to control be written as \(Y_i(1,0)\). The eliminated effect is the total effect minus the controlled direct effect.
\[\begin{align} \underbrace{Y_i(1)-Y_i(0)}_{\text{Total Effect}} &- \underbrace{Y_i(1,0)-Y_i(0,0)}_{\text{Controlled Direct Effect}} \nonumber \\ &= \underbrace{Y_i(1,M_i(1))-Y_i(1,M_i(0))}_{\text{Indirect Effect}} \nonumber \\ &+ \underbrace{[(Y_i(1,1)-Y_i(0,1)) - (Y_i(1,0)- Y_i(0,0))]*M_i(0)}_{\text{Mediated Interaction}} \nonumber \end{align}\]
Equivalently:
\[\begin{align} \underbrace{Y_i(1)-Y_i(0)}_{\text{Total Effect}} &- \underbrace{Y_i(1,0)-Y_i(0,0)}_{\text{Controlled Direct Effect}} \nonumber \\ &= \underbrace{Y_i(0,M_i(1))-Y_i(0,M_i(0))}_{\text{Indirect Effect}} \nonumber \\ &+ \underbrace{[(Y_i(1,1)-Y_i(0,1)) - (Y_i(1,0)- Y_i(0,0))]*M_i(1)}_{\text{Mediated Interaction}} \nonumber \end{align}\]
This effect can be estimated, since both the total effect and the controlled direct effect can be estimated so long as the mediator can be manipulated.12
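The two decompositions of the eliminated effect can be checked mechanically. The sketch below assumes a binary mediator and, for convenience, binary outcomes, and enumerates every combination of \(Y_i(z,m)\) and \(M_i(z)\). The function name `check_decompositions` is an invention of this example. It confirms that the total effect minus the controlled direct effect at \(m=0\) equals the indirect effect plus the mediated interaction in both versions.

```python
from itertools import product


def check_decompositions():
    """Check both eliminated-effect decompositions for every binary case."""
    checked = 0
    for y00, y01, y10, y11, m0, m1 in product([0, 1], repeat=6):
        y = {(0, 0): y00, (0, 1): y01, (1, 0): y10, (1, 1): y11}  # Y_i(z, m)
        m = {0: m0, 1: m1}  # M_i(z)

        total = y[1, m[1]] - y[0, m[0]]
        controlled_direct = y[1, 0] - y[0, 0]
        eliminated = total - controlled_direct
        interaction = (y[1, 1] - y[0, 1]) - (y[1, 0] - y[0, 0])

        # First version: indirect effect holds treatment at 1,
        # interaction is scaled by M_i(0).
        first = (y[1, m[1]] - y[1, m[0]]) + interaction * m[0]
        # Second version: indirect effect holds treatment at 0,
        # interaction is scaled by M_i(1).
        second = (y[0, m[1]] - y[0, m[0]]) + interaction * m[1]

        assert eliminated == first == second
        checked += 1
    return checked


n_checked = check_decompositions()
```

The identities rely on the mediator taking only the values 0 and 1; with a many-valued mediator the interaction term would need a different form.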
10 Treatment Effects for Binary Outcomes: Log-Odds Treatment Effects and Attributable Effects
Although a difference in proportion is an easy to interpret quantity (and it is the ATE when the outcomes are binary), there are at least two other effects that are in use when outcomes are binary.
Average treatment effects seem a bit hard to interpret when outcomes are not continuous. For example, a very common binary outcome in the study of elections is coded as 1 when subjects voted, and 0 when they did not. The average effect might be 0.2, but what does it really mean to say that a treatment increased voting by 0.2 for an individual?
10.1 Log-Odds Treatment Effects
A common quantity of causal interest for dichotomous outcomes is our treatment’s effect on the log-odds of success, defined for the experimental pool as:
\[\Delta = \log\frac{E(Y_i(1))}{1-E(Y_i(1))} - \log\frac{E(Y_i(0))}{1-E(Y_i(0))}\]
Freedman (2008) shows that the coefficient from a logistic regression adjusting for covariates in a randomized experiment produces biased estimates of this causal effect. The basic intuition for Freedman’s argument comes from the fact that taking the log of averages is not the same as taking the average of logs, and so the treatment coefficient estimated from a logistic regression conditioning on covariates will not provide a consistent estimator of the log-odds of success. Instead, Freedman recommends computing predicted probabilities while varying subjects’ treatment status but maintaining their observed covariate profiles, which produces a consistent estimator of the log-odds.
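The "log of averages versus average of logs" intuition can be shown with arithmetic alone. The sketch below is not Freedman's derivation or a fitted logistic regression; it assumes two equally sized covariate strata with invented success probabilities and compares the within-stratum log-odds effect with \(\Delta\) computed from probabilities averaged over the whole pool, in the spirit of the plug-in approach described above.

```python
import math


def log_odds(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))


# Hypothetical success probabilities (under control, under treatment)
# in two equally sized covariate strata.
strata = [(0.1, 0.3), (0.7, 0.9)]

# Log-odds effect within each stratum, then averaged ("average of logs").
conditional_effects = [log_odds(p1) - log_odds(p0) for p0, p1 in strata]
average_conditional = sum(conditional_effects) / len(strata)

# Average the probabilities over the pool first, then take log-odds
# ("log of averages"); this matches the definition of Delta.
p0_bar = sum(p0 for p0, _ in strata) / len(strata)
p1_bar = sum(p1 for _, p1 in strata) / len(strata)
delta = log_odds(p1_bar) - log_odds(p0_bar)
```

Both strata have the same conditional log-odds effect (about 1.35), yet \(\Delta\) for the pool is about 0.81, so a quantity defined conditional on covariates does not target \(\Delta\).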
10.2 Attributable Effects
An alternative approach with binary outcomes is to infer about the sum of successes rather than the difference in proportions. Rosenbaum (2002) introduces this quantity in the context of matched observational studies and Hansen and Bowers (2009) use it in a randomized field experiment where voting or not voting is measured at the individual level.
Consider a simple case with a dichotomous outcome and treatment. Let \(A\) be the number of outcomes attributable to treatment, that is, the number of cases in which \(Y_i\) equaled 1 among treated subjects which would not have occurred had these units been assigned to control. For a range of \(A\)’s, we adjust the observed contingency table of outcomes among the treated, and compare this resulting distribution to a known null distribution (the distribution of outcomes we would have observed had treatment had no effect). The resulting range of \(A\)’s for which our test continues to reject the null hypothesis of no effect provides a range of effects that are attributable to our treatment.
Table 1 | \(D=1\) | \(D=0\) |
---|---|---|
\(Y=1\) | \(\sum Y_iD_i-A\) | \(\sum Y_i(1-D_i)\) |
\(Y=0\) | \(\sum (1-Y_i)D_i+A\) | \(\sum (1-Y_i)(1-D_i)\) |
First, we define an attributable effect as \(A=∑_iZ_iτ_i\), where \(τ_i=Y_i(1)−Y_i(0)\) and \(Y_i∈\{0,1\}\). That is, the attributable effect is the number of “yes” or “success” or other “1” responses among those treated that we would not have seen if they had been assigned control.
Second, notice that if we write the set \(U\) as the experimental pool, and the set of control units is a subset of the whole pool, \(C⊆U\), then we can write \(∑_{i∈C}(Y_i−Y_i(0))=0\), because control units reveal \(Y_i(0)\). This means that we can represent \(A\) using totals:
\[A = ∑_{i=1}^NZ_iτ_i=∑_{i=1}^NZ_i(Y_i(1)−Y_i(0))=∑_{i∉C}Y_i(1)−∑_{i∉C}Y_i(0)\] \[ = ∑_{i∉C}Y_i−∑_{i∉C}Y_i(0)=∑_{i=1}^NY_i−∑_{i=1}^NY_i(0)=t_U−t_C\]
Third, this representation allows us to produce a design-based confidence interval for \(\hat{A}\) by drawing on the survey sampling literature about statistical inference for sample totals because the observed total number of successful outcomes (the total number of people voting in the example), \(t_U\), is fixed across randomizations.
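The representation of \(A\) as a difference of totals can be verified on a small invented example. The binary potential outcomes and the assignment vector below are hypothetical, and \(t_C\) is computable only because both potential outcomes are written down; in a real study it is the unknown total that inference targets.

```python
# Hypothetical binary potential outcomes (1 = voted) and assignment.
y1 = [1, 1, 0, 1, 0, 1]  # Y_i(1)
y0 = [0, 1, 0, 0, 0, 1]  # Y_i(0)
z = [1, 1, 1, 0, 0, 0]  # Z_i

# Attributable effect from its definition: the sum of Z_i * tau_i.
a_definition = sum(zi * (a - b) for zi, a, b in zip(z, y1, y0))

# Observed outcomes: Y_i(1) for treated units, Y_i(0) for control units.
y_obs = [a if zi == 1 else b for zi, a, b in zip(z, y1, y0)]

t_u = sum(y_obs)  # observed total number of successes
t_c = sum(y0)  # total that would occur if every unit were in control
a_totals = t_u - t_c
```

Both routes give the same count, since the control units contribute \(Y_i - Y_i(0) = 0\) to the difference of totals.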
11 References
Footnotes
See our 10 Things to Know about Hypothesis Testing to learn about hypothesis testing for causal inference.
See also Chapter 2 of Gerber and Green (2012) or Chapter 2 of Angrist and Pischke (2008) for more information.
See Imai, King, and Stuart (2008) and Coppock (2019) for a more detailed review of the issues discussed in this section.
TODO link to an appropriate guide
See 10 Things You Need to Know About the Local Average Treatment Effect for more on non-compliance.
Detailed discussion of both ITT and CACE can be found in chapters 5 and 6 of Gerber and Green (2012).
Abadie, Angrist, and Imbens (2002)
Chernozhukov and Hansen (2005). That is, treatment can have heterogeneous effects but the ordering of potential outcomes is preserved. See Angrist and Pischke (2008, Mostly Harmless Econometrics: An Empiricist’s Companion, Princeton University Press) and Frölich and Melly (2010) for fairly concise discussions of these issues, and Abbring and Heckman (2007, “Econometric Evaluation of Social Programs, Part III: Distributional Treatment Effects, Dynamic Treatment Effects, Dynamic Discrete Choice, and General Equilibrium Policy Evaluation.” Handbook of Econometrics 6. Elsevier: 5145–5303) for a thorough overview.
See Koenker and Hallock (2001, “Quantile Regression: An Introduction.” Journal of Economic Perspectives 15 (4): 43–56) for a concise overview of quantile regression.
For more on the AMCE and conjoint experiments, see Bansak et al. (2022) and Liu and Shiraito (2023).