10 Things You Need to Know About the Local Average Treatment Effect

Summary

Sometimes a treatment or program is offered, but only some individuals or groups actually take it up. In that case it can be hard to estimate treatment effects for the whole population: for example, the people for whom the treatment would have had a big effect may have decided not to take it up. It is still possible, however, to estimate what is called the “Local Average Treatment Effect,” or LATE. This guide discusses the LATE: what it is, how to estimate it, and how to interpret it.¹

1 What the LATE is²

When subjects do not receive the treatment to which they were assigned, the experimenter faces a “noncompliance” problem. Some subjects may need the treatment so badly that they will always take up treatment, irrespective of whether they are assigned to the treatment or to the control group. These are called “Always-Takers”. Other subjects may not take the treatment even if they are assigned to the treatment group: the “Never-Takers”. Some subjects are “Compliers”. These are the subjects that do what they are supposed to do: they are treated when assigned to the treatment group, and they are not treated when they are assigned to the control group. Finally, some subjects do the exact opposite of what they are supposed to do. They are called “Defiers”. Table 1 shows these four different types of subjects in the population.

Noncompliance can make it impossible to estimate the average treatment effect (ATE) for the population. For example, say that in a population of 200, 100 people are randomly assigned to treatment and we find that only 80 people are actually treated. What is the impact of the treatment? One method to answer this question is simply to ignore the noncompliance and compare the outcome in the treatment (100 people) and control (100 people) groups. This method estimates the average intention-to-treat effect (ITT). (See our guide on different kinds of treatment effects for more on the ITT.) While informative, this method does not give a measure of the effect of the treatment itself. Another approach would be to compare the 120 really-untreated and 80 really-treated subjects. Doing so, however, might give you biased estimates. The reason is that the 20 subjects that did not comply with their assignment are likely to be a nonrandom subset of those that were assigned to treatment.
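To make the contrast between these two comparisons concrete, the sketch below uses a small made-up dataset (the same construction as the simulated data in the instrumental variables section of this guide, so it contains both Never-Takers and Always-Takers). The outcome values are illustrative assumptions, not real data: by construction the treatment raises a Complier's outcome by 50, Never-Takers have high outcomes, and Always-Takers have low outcomes.

```r
# Made-up data: 100 subjects, 50 assigned to treatment (Z = 1)
Z <- rep(0:1, 50)
D <- Z             # Compliers take the treatment if and only if assigned
D[1:10]  <- 0      # 10 Never-Takers
D[11:20] <- 1      # 10 Always-Takers
Y <- 50 * D        # Compliers: Y = 50 if treated, 0 if not
Y[1:10]  <- 100    # Never-Takers have high outcomes
Y[11:20] <- 0      # Always-Takers have low outcomes

# Intention-to-treat comparison: groups defined by random assignment
itt_estimate <- mean(Y[Z == 1]) - mean(Y[Z == 0])

# "As-treated" comparison: groups defined by treatment actually received
as_treated_estimate <- mean(Y[D == 1]) - mean(Y[D == 0])

print(c(ITT = itt_estimate, as_treated = as_treated_estimate))
```

In this constructed example the as-treated comparison is far from the effect of 50 that was built in for Compliers, because the treated and untreated groups differ in their mix of subject types.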

So what now? In some cases it is possible to estimate the “Local Average Treatment Effect” (LATE), also known as the “Complier Average Causal Effect” (CACE). The LATE is the average treatment effect for the Compliers. Under assumptions discussed below, the LATE equals the ITT effect divided by the share of compliers in the population.
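Written as a formula, the last sentence reads as follows. The notation is an assumption borrowed from the instrumental variables section of this guide: Z is the treatment assignment, D the treatment actually received, and Y the outcome; ITT_D denotes the effect of assignment on treatment receipt, which under the assumptions discussed below equals the share of Compliers.

```latex
\mathrm{LATE}
  = \frac{\mathrm{ITT}}{\mathrm{ITT}_D}
  = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}
```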

2 With one-sided noncompliance you need to satisfy an exclusion restriction to estimate the LATE

The example introduced above is termed one-sided noncompliance: 80% of the population respond to the treatment assignment (the “Compliers”) and 20% do not (the “Never-Takers”). Say that after the treatment, the experimenter measures the average outcome to be 50 in the treatment group and 10 in the control group. This situation is illustrated in Table 2. Note that only those indicated with blue in Table 2 were in fact treated.

Before we can calculate the LATE under one-sided noncompliance we need to make an assumption. The exclusion restriction (also called “excludability”) stipulates that outcomes respond to treatments, not treatment assignments. In plain terms, this means that the outcome for a Never-Taker is the same regardless of whether they are assigned to the treatment or control group: in both cases the subject is not treated, and that is what matters.

Because the treatment was randomly assigned, we know that if there are 20% Never-Takers in the treatment group (left column), there are probably about 20% Never-Takers in the control group. Because of the exclusion restriction, the Never-Takers have the same outcome under both assignment conditions, and thus the difference in average outcomes (40) cannot be attributed to the Never-Takers. We can thus attribute the entire ITT effect to the Compliers. The LATE can therefore be estimated by dividing the ITT estimate by the share of Compliers: 40/0.8 = 50.
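The role of the exclusion restriction can be checked numerically. The sketch below takes the group means (50 and 10) and shares (80% Compliers, 20% Never-Takers) from the example and tries several hypothetical values for the Never-Takers' average outcome, which the experimenter does not need to know. The candidate values and the function name are assumptions made for illustration.

```r
share_compliers    <- 0.8
share_never_takers <- 0.2
mean_treatment_group <- 50
mean_control_group   <- 10

# Each group mean is a weighted average of Complier and Never-Taker means.
# Under the exclusion restriction the Never-Taker mean is the same in both
# groups, so we can solve each group mean for the Complier mean.
late_given_never_taker_mean <- function(never_taker_mean) {
  complier_mean_if_treated <-
    (mean_treatment_group - share_never_takers * never_taker_mean) / share_compliers
  complier_mean_if_untreated <-
    (mean_control_group - share_never_takers * never_taker_mean) / share_compliers
  complier_mean_if_treated - complier_mean_if_untreated
}

# Hypothetical Never-Taker means: the implied LATE does not depend on them
late_values <- sapply(c(0, 10, 50), late_given_never_taker_mean)
print(late_values)
```

Whatever value is plugged in, the Never-Takers' contribution cancels out of the difference, which is why dividing the ITT estimate by the share of Compliers is enough.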

3 With two-sided noncompliance the LATE can be estimated assuming both the exclusion restriction and a “no defiers” assumption

The experimenter may also face two-sided noncompliance. In this case, some subjects in the treatment group go untreated and some in the control group receive the treatment. In this world, the population consists of the Compliers, the Never-Takers, the Always-Takers, and the Defiers. To estimate LATE under two-sided noncompliance we need a second assumption: that the population contains no Defiers (the assumption is also called the “monotonicity” assumption). To see the use of this assumption look at Table 3, which illustrates our example under two-sided noncompliance. Again, after the treatment the experimenter measures the average outcome to be 50 in the treatment group and 10 in the control group. Note that those subjects in blue were in fact treated.

With the exclusion restriction and the no Defiers assumption we can estimate the LATE. Because of the exclusion restriction, we can attribute the entire ITT effect to Compliers and not to Never-Takers or Always-Takers. Given that we have an estimate of the ITT effect (40), what remains is to estimate the share of Compliers in the population. (Table 3 shows the subjects’ types as seen by an omniscient deity. The experimenter cannot observe these types, but can estimate their shares, as explained below.)

We cannot observe whether any given subject is a Complier, but we do observe whether they took the treatment. In the treatment group, we observe that 90 people took the treatment (and thus must be either Compliers or Always-Takers) and 10 did not (and thus must be Never-Takers, since there are no Defiers). Thus, 10% of the treatment group are Never-Takers, and since treatment was assigned randomly, we estimate that about 10% of the control group are Never-Takers. Similarly, in the control group, we observe that 10 people took the treatment (and thus must be Always-Takers, since there are no Defiers) and 90 did not (and thus must be either Compliers or Never-Takers). Thus, 10% of the control group are Always-Takers, and we estimate that about 10% of the treatment group are Always-Takers. From all this, we can estimate the share of Compliers as 100% - 10% (Never-Takers) - 10% (Always-Takers) = 80%. Finally, we can estimate the LATE as 40/0.8=50.
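The bookkeeping in the paragraph above can be sketched as a small helper. The function name and argument names are hypothetical; the inputs are the counts and group means from the example, and the calculation assumes both the exclusion restriction and no Defiers.

```r
# Estimate type shares and the LATE from observed counts and group means
estimate_late <- function(n_treated_z1, n_z1, n_treated_z0, n_z0,
                          mean_y_z1, mean_y_z0) {
  # Untreated subjects in the treatment group must be Never-Takers
  share_never_takers  <- 1 - n_treated_z1 / n_z1
  # Treated subjects in the control group must be Always-Takers
  share_always_takers <- n_treated_z0 / n_z0
  # Everyone else is a Complier (no Defiers by assumption)
  share_compliers     <- 1 - share_never_takers - share_always_takers
  itt <- mean_y_z1 - mean_y_z0
  c(never_takers = share_never_takers,
    always_takers = share_always_takers,
    compliers = share_compliers,
    ITT = itt,
    LATE = itt / share_compliers)
}

result <- estimate_late(n_treated_z1 = 90, n_z1 = 100,
                        n_treated_z0 = 10, n_z0 = 100,
                        mean_y_z1 = 50, mean_y_z0 = 10)
print(result)
```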

4 You can estimate the LATE using an instrumental variables approach

The LATE estimate is equivalent to an instrumental variables estimate. This is most easily illustrated with a set of regressions. Say that 50 individuals from a population of 100 are randomly assigned to treatment. Regressing treatment status (D) on the treatment assignment (Z) gives the estimated share of Compliers: 80%. The ITT effect is estimated by regressing the outcome (Y) on the treatment assignment (Z). Again, the LATE is estimated by dividing the ITT estimate by the estimated share of Compliers. A researcher will get exactly the same point estimate when running a two-stage least squares (2SLS) regression in which the outcome (Y) is regressed on the treatment (D), using the treatment assignment (Z) as an instrumental variable. This is shown in the code below.

Code

Z <- rep(0:1,50) # Assign 50 to treatment group (Z = 1), 50 to control group (Z = 0)
D <- Z           # Compliers have D (treatment received) = Z (treatment assignment)
D[1:10]  <- 0    # 10 Never Takers
D[11:20] <- 1    # 10 Always Takers
Y        <- 50*D # Compliers have Y = 50 if treated, 0 if not treated
Y[1:10]  <- 100  # Never takers have high Y
Y[11:20] <- 0    # Always takers have low Y
# Estimated share of compliers
ITTD <- coef(lm(D~Z))[2]
# Estimated intention-to-treat effect
ITT  <- coef(lm(Y~Z))[2]
# LATE estimate
LATE <- ITT / ITTD
cbind(Y_1 = mean(Y[Z==1]), Y_0=mean(Y[Z==0]), ITTD, ITT, LATE)

  Y_1 Y_0 ITTD ITT LATE
Z  50  10  0.8  40   50

Code

library(AER) # provides ivreg()
summary(ivreg(Y~ D | Z))


Call:
ivreg(formula = Y ~ D | Z)

Residuals:
   Min     1Q Median     3Q    Max 
   -55     -5     -5     -5     95 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    5.000      5.660   0.883    0.379    
D             50.000      8.839   5.657 1.53e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 35.36 on 98 degrees of freedom
Multiple R-Squared: -0.1136,    Adjusted R-squared: -0.125 
Wald test:    32 on 1 and 98 DF,  p-value: 1.528e-07

The LATE estimate is calculated as the intention-to-treat estimate (ITT) divided by the estimated share of Compliers in the population. With noncompliance, the share of Compliers is smaller than one. As a result, the LATE estimate will always be at least as large in absolute value as the ITT estimate, and strictly larger whenever the ITT estimate is nonzero. Another way to look at this is that, under the exclusion restriction (reminder: the exclusion restriction states that the outcome for a Never-Taker or Always-Taker is the same regardless of whether they are assigned to the treatment or control group), the ITT effect for the Never-Takers and the Always-Takers is zero. Thus, given any positive share of Never-Takers and/or Always-Takers, the ITT effect is a diluted version of the LATE: whenever the LATE is nonzero, the ITT effect has the same sign but a smaller magnitude.
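The dilution argument can be written out as a short decomposition. The symbols are assumptions introduced here for illustration: π_C, π_NT, and π_AT denote the population shares of Compliers, Never-Takers, and Always-Takers (with no Defiers, they sum to one).

```latex
\mathrm{ITT}
  = \pi_{C} \cdot \mathrm{LATE} + \pi_{NT} \cdot 0 + \pi_{AT} \cdot 0
  = \pi_{C} \cdot \mathrm{LATE},
\qquad\text{so}\qquad
|\mathrm{ITT}| = \pi_{C}\,|\mathrm{LATE}| \le |\mathrm{LATE}|
\quad\text{because } 0 < \pi_{C} \le 1 .
```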

5 How to conduct statistical inference about the LATE

Because the LATE equals the ratio of the ITT to the share of Compliers, the LATE will be equal to zero whenever the ITT equals zero. To test the null hypothesis that the LATE is zero, you can thus rely on tests of the null hypothesis that the ITT is zero. This is a straightforward option if you would like to rely on randomization inference for hypothesis tests. (See our guides on hypothesis testing and randomization inference for more.) Alternatively, you can make use of the conventional standard errors and confidence intervals generated by instrumental variables regression, which rely on parametric assumptions about the sampling distribution. As a rule of thumb, the standard error of the LATE estimate will be roughly equal to the standard error of the ITT estimate divided by the share of Compliers.
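A minimal sketch of the randomization inference route follows, reusing the made-up data from the instrumental variables example. It assumes complete random assignment of 50 of the 100 subjects and tests the sharp null hypothesis that assignment has no effect on any subject; the seed, the number of simulated assignments, and the helper name are arbitrary choices made for illustration.

```r
# Made-up data from the instrumental variables example
Z <- rep(0:1, 50)
D <- Z
D[1:10]  <- 0
D[11:20] <- 1
Y <- 50 * D
Y[1:10]  <- 100
Y[11:20] <- 0

# Difference in mean outcomes between assigned groups
itt <- function(y, z) mean(y[z == 1]) - mean(y[z == 0])
observed_itt <- itt(Y, Z)

# Under the sharp null, outcomes are fixed and only the assignment varies,
# so we re-shuffle Z many times and recompute the ITT estimate each time
set.seed(12345)
n_sims <- 5000
null_itts <- replicate(n_sims, itt(Y, sample(Z)))

# Two-sided p-value: share of shuffled assignments at least as extreme
p_value <- mean(abs(null_itts) >= abs(observed_itt))
print(c(observed_ITT = observed_itt, p_value = p_value))
```

Rejecting the null of a zero ITT here also rejects the null of a zero LATE, since the two are zero together.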

6 The LATE only reflects treatment effects among compliers

While a LATE estimate is better than nothing, it provides a consistent estimate of the average treatment effect only for a subgroup of the population: the Compliers. It does not capture the average effect of the treatment among everyone in the experimental sample (the ATE). For instance, Angrist and Evans (1998) study the effect of childbearing on a mother’s labor supply. In their paper, Compliers account for only 6% of the total population, while we would like to know the effect for everyone, or at least for a larger subgroup of the population of interest.³

Whether obtaining the effect only for Compliers is problematic depends on the experimenter’s objectives. Sometimes the LATE is exactly what the researcher is interested in: the average effect on those who actually comply with the assignment to treatment. Moreover, the LATE might not be very different from the ATE if the share of Compliers is large and the treatment effects for the different types in the population are similar enough. To explore the latter, researchers can compare the background attributes of the Compliers and Never-Takers in the treatment group. Another approach is to compare the average outcomes of Always-Takers and Compliers among those who are treated (those in blue in Table 3), and the average outcomes of Never-Takers and Compliers among those who are not treated.⁴

Finally, it is important to keep in mind that whether a subject is a Defier, Complier, Never-Taker, or Always-Taker also depends on the experimental design and the context in which the experiment is conducted. For example, using phone calls instead of a monetary incentive to encourage treatment take-up can alter the share of Compliers in the population. As a result, different instruments will estimate different LATEs.

7 The LATE is an important estimand in “encouragement” designs and in downstream experiments

Encouragement designs: In an encouragement design, subjects are randomly invited to participate in the treatment. The reason to do so is that in some cases it might be unethical to make subjects adhere to the treatment assignment. In other cases, it might require unnecessarily large incentives to obtain adherence. As a result, in encouragement designs subjects self-select into treatment. For example, Hirano et al. (2000) study the impact of a letter encouraging inoculation of patients at risk for flu, sent to a randomly selected set of physicians. The outcome of interest is an indicator for flu-related hospital visits. The encouragement (the letter) has only a limited effect on the actual treatment received (a flu shot given to the patient), and thus the study population will consist of Compliers, Always-Takers, and Never-Takers. (We assume there are no Defiers, that is, patients who would get the flu shot if and only if their physician did not get the letter.) Such an encouragement design therefore corresponds to the two-sided noncompliance case.⁵

Downstream experiments: Downstream experiments are studies in which an initial randomization (e.g. distribution of school vouchers) causes a change in an outcome (e.g. education level), and this outcome is then considered a treatment affecting a subsequent outcome (e.g. income).⁶ These experiments also correspond to our two-sided noncompliance setup. Noncompliance occurs because the random intervention is just one of many “encouragements” that cause people to take the treatment. Downstream experiments place particular pressure on the exclusion restriction, which requires (following the example) that school vouchers influence income only through education. This assumption would be violated if school vouchers affected income for reasons other than education.

8 You can use a placebo-controlled design to identify the LATE

Another strategy to estimate the LATE involves designing your experiment in such a way that you can find out who the Compliers are even among those who did not receive the treatment. One way to do so is to create a placebo group that receives a version of the treatment without the treatment’s “active ingredient.” Gerber et al. (2010), for example, study the effect of an automated phone call that delivers an encouragement to vote on whether study participants turn out to vote.⁷ The experiment included a placebo group in which study participants received automated phone calls that encouraged them to recycle. Because respondents in the placebo group also received calls, it is possible to observe which respondents in the placebo group answered the phone. Under the assumption that the content of the phone call does not affect who answers the phone, one can estimate the LATE by comparing turnout rates among those who answered the phone in the treatment group to turnout rates among those who answered the phone in the placebo group. The key assumption which makes this strategy work is that respondents who comply with the treatment (pick up the voting phone call) also comply with the placebo (pick up the recycling phone call) and vice versa. Gerber et al. (2010) discuss the advantages of having both a pure control and a placebo group for estimating the LATE.
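The comparison described above can be sketched with made-up counts. Every number below is a hypothetical chosen for illustration (1,000 subjects per arm, 300 of whom answer the phone in each arm); none of it comes from Gerber et al. (2010). The second calculation treats the placebo arm as if it were a control group, which additionally assumes that the recycling call itself does not affect turnout.

```r
# Hypothetical treatment arm: 300 answer the voting call, 700 do not
answered_treatment <- rep(c(1, 0), times = c(300, 700))
voted_treatment <- c(rep(c(1, 0), times = c(150, 150)),  # answered
                     rep(c(1, 0), times = c(210, 490)))  # did not answer

# Hypothetical placebo arm: 300 answer the recycling call, 700 do not
answered_placebo <- rep(c(1, 0), times = c(300, 700))
voted_placebo <- c(rep(c(1, 0), times = c(120, 180)),    # answered
                   rep(c(1, 0), times = c(210, 490)))    # did not answer

# Placebo-design estimate: compare turnout among those who answered
late_placebo <- mean(voted_treatment[answered_treatment == 1]) -
  mean(voted_placebo[answered_placebo == 1])

# Ratio estimate for comparison: ITT divided by the contact rate
itt <- mean(voted_treatment) - mean(voted_placebo)
contact_rate <- mean(answered_treatment)
late_ratio <- itt / contact_rate

print(c(placebo_design = late_placebo, ratio = late_ratio))
```

The counts were constructed so that the two estimates coincide; with real data they would generally differ because of sampling variability.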

9 Addressing partial compliance can be complicated

“Partial compliance” occurs when a subject is assigned to a treatment but receives less than “all” of the treatment. This is possible in designs with compound treatments, multi-arm designs like factorial designs, and in dose-response trials where the treatment variable is continuous. For example, subjects assigned to a three-session job training program may only attend two of the three sessions. Patients in a clinical trial assigned to receive 100 mg dosages of an experimental drug once every week for five weeks may only receive four of the five assigned doses.

Addressing partial compliance can be especially complicated because the effective number of treatment conditions exceeds the number intended in the original design. This expansion of the number of treatment conditions affects the definition of the LATE and how to estimate it. First, the number and definition of compliance statuses change. The categories used in designs with a binary treatment (Always-Takers, Never-Takers, Compliers, and Defiers) no longer suffice. Instead, the set of possible compliance statuses is determined by all possible combinations of treatment assignment and treatment receipt. In the binary case, we ruled out Defiers. In the partial compliance case, we can make similar (design-specific) monotonicity assumptions that rule out some theoretically possible compliance statuses. Second, we are no longer interested in a single LATE: partial compliance increases the number of quantities we are trying to estimate.

Unfortunately, the IV/2SLS estimator used under one- and two-sided noncompliance in two-group designs is a biased estimator of LATEs under partial compliance. Instead, Bayesian approaches have emerged as an alternative method for inference.⁸
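To get a feel for how quickly the set of compliance statuses grows, consider a rough sketch for a hypothetical design in which subjects are assigned either to control or to the three-session program and could attend anywhere from zero to three sessions under either assignment. The variable names and the two monotonicity rules below are illustrative assumptions, not a prescription for any particular study.

```r
# A compliance status is a pair: sessions attended if assigned to control,
# and sessions attended if assigned to the program
statuses <- expand.grid(sessions_if_control = 0:3,
                        sessions_if_treatment = 0:3)
n_statuses <- nrow(statuses)

# One possible monotonicity assumption: assignment to the program never
# reduces the number of sessions a subject attends
monotone <- subset(statuses, sessions_if_treatment >= sessions_if_control)
n_monotone <- nrow(monotone)

# A stronger, design-based restriction: control subjects cannot attend at all
one_sided <- subset(monotone, sessions_if_control == 0)
n_one_sided <- nrow(one_sided)

print(c(all = n_statuses, monotone = n_monotone, one_sided = n_one_sided))
```

Compared with the four types in the binary case, even this simple design starts with many more statuses, and each assumption removes some of them.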

10 Statistical power analysis for the LATE

Performing a statistical power analysis for the LATE can be an extremely valuable step in one’s research design (see our guide on statistical power). A power analysis can, for instance, allow one to determine the necessary sample size or to understand the magnitude of causal effects that can be detected given a fixed budget. However, fully analyzing power for estimators of the LATE is more complicated than for the ITT (or ATE). The problem is that, while the LATE is equivalent to the ITT scaled by compliance as explained above, the ITT and compliance rate must both be estimated. Furthermore, the ITT and compliance rate estimates will be correlated, and they can be correlated in any way (positively or negatively) depending upon the unique features of the data-generating process. As a result, as prior research has shown, ITT and related power analysis results diverge from the true power of tests of the LATE, and sometimes dramatically so.⁹

To see the problem in another way, consider that for any power analysis, a key element is the underlying parameters that factor into an estimator’s variance (i.e. the things that factor into the “background noise” as described in our statistical power guide). For the ITT or ATE, there are two such “distribution parameters”: the variances of the outcome in the control and treatment conditions. In contrast, there are nine such parameters for the LATE—related to the distributions of the treated and control potential outcomes for the Compliers, Always-Takers, and Never-Takers.¹⁰ Typical instrumental-variable power analyses make simplifying assumptions that can be unrealistic and/or require supplying parameters one may not have good information on. Meanwhile, simulation-based approaches require specifying all the parameters, or making modeling choices that abstract some of them away. The end result is that, except in special high-information circumstances, many power analyses for the LATE can be misleading.

As a solution, Bansak (2020) proposes a bounds-based approach that does not require specifying or making assumptions about the distribution parameters and provides conservative (though still practical) worst-case power bounds. (At a conceptual level, one can think of the method as providing a single formula-based answer that captures the worst-case scenario out of the countless simulations one would otherwise need to run.) In doing so, the method provides conservative answers for the required sample size and minimum detectable effects for the LATE. The method allows for the LATE to be specified either in terms of a raw effect or a standardized effect size. The method is implemented in the R package powerLATE. A comprehensive tutorial explaining how to use the package within one’s research context is provided at https://github.com/kbansak/powerLATE_tutorial, along with more information and examples at https://github.com/kbansak/powerLATE. The package is easy to use and provides straightforward outputs. For instance:

Code

library(powerLATE)

Code

powerLATE(pi = c(0.6, 0.7, 0.8), power = 0.8, sig.level = 0.05, kappa = 0.25)

Power analysis for two-sided test that LATE equals zero

 pZ = 0.5
 pi = Multiple values inputted (see table below)
 kappa = 0.25
 Power = 0.8
 sig.level  = 0.05 

Given these parameter values, the conservative (upper) bound for N (required sample size):
  N         User-inputted pi
1 1688.3812 0.6             
2 1216.3567 0.7             
3  907.0362 0.8             

NOTE:  
The Ordered-Means assumption is not being employed. If the user would like to make this assumption to narrow the bounds, set the argument assume.ord.means to TRUE.

References

Angrist, J. D. 2006. “Instrumental Variables Methods in Experimental Criminological Research: What, Why and How.” Journal of Experimental Criminology 2: 23–44.

Angrist, J. D., and W. N. Evans. 1998. “Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size.” American Economic Review 88 (3): 450–77.

Angrist, J. D., G. W. Imbens, and D. B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91: 444–72.

Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics.

Baiocchi, M., J. Cheng, and D. S. Small. 2014. “Instrumental Variable Methods for Causal Inference.” Statistics in Medicine 33: 2297–2340.

Dunning, T. 2012. Natural Experiments in the Social Sciences.

Gerber, Alan S., Donald P. Green, Edward H. Kaplan, and Holger L. Kern. 2010. “Baseline, Placebo, and Treatment: Efficient Estimation for Three-Group Experiments.” Political Analysis 18 (3): 297–315.

Green, Donald P., and Alan S. Gerber. 2002. “The Downstream Benefits of Experimentation.” Political Analysis 10 (4): 394–402.

Hirano, K., G. W. Imbens, D. B. Rubin, and X. Zhou. 2000. “Assessing the Effect of an Influenza Vaccine in an Encouragement Design.” Biostatistics 1 (1): 69–88.

Imbens, G. W. 2010. “Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009).” Journal of Economic Literature 48 (2): 399–423.

Imbens, G. W., and J. D. Angrist. 1994. “Identification and Estimation of Local Average Treatment Effects.” Econometrica 62 (2): 467–75.

Imbens, G. W., and J. J. Wooldridge. 2007. “Lecture Notes and Slides.”

Jin, H., and D. B. Rubin. 2008. “Principal Stratification for Causal Inference with Extended Partial Compliance.” Journal of the American Statistical Association 103: 101–11.

———. 2009. “Public Schools Versus Private Schools: Causal Inference with Partial Compliance.” Journal of Educational and Behavioral Statistics 34: 24–45.

Long, Qi, Roderick J. A. Little, and Xihong Lin. 2010. “Estimating Causal Effects in Trials Involving Multi-Treatment Arms Subject to Non-Compliance: A Bayesian Framework.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 59 (3): 513–31.

Footnotes

¹ Originating author: Peter van der Windt, 20 October 2014. Revisions: Winston Lin, 22 August 2016. The guide is a live document and subject to updating by EGAP members at any time; contributors listed are not responsible for subsequent edits. Thanks to Albert Fang for point 9. Thanks to Kirk Bansak for point 10.
² For more extensive overviews, see: Angrist, Imbens, and Rubin (1996); Angrist (2006); Angrist and Pischke (2009), sections 4.4-4.6 and 6.2; Dunning (2012); Baiocchi, Cheng, and Small (2014) (with correction in Statistics in Medicine 33: 4859-4860).
³ See Angrist and Evans (1998).
⁴ See: Imbens (2010); Imbens and Angrist (1994); Imbens and Wooldridge (2007).
⁵ See Hirano et al. (2000).
⁶ See Green and Gerber (2002).
⁷ See Gerber et al. (2010).
⁸ See Long, Little, and Lin (2010); Jin and Rubin (2009); Jin and Rubin (2008).
⁹ This has been shown by the following studies: Baiocchi, Michael, Jing Cheng, and Dylan S. Small. “Instrumental variable methods for causal inference.” Statistics in Medicine 33, no. 13 (2014): 2297-2340. Bansak, Kirk. “A generalized approach to power analysis for local average treatment effects.” Statistical Science 35, no. 2 (2020): 254-271. Jo, Booil. “Statistical power in randomized intervention studies with noncompliance.” Psychological Methods 7, no. 2 (2002): 178-193.
¹⁰ See Bansak, Kirk. “A generalized approach to power analysis for local average treatment effects.” Statistical Science 35, no. 2 (2020): 254-271.
