Chapter 18 – Analysis of Covariance (ANCOVA)
In Chapter 17 you learned how to analyse mixed-model designs with both between-subjects and within-subjects factors. In this chapter we introduce a closely related idea: using a covariate to statistically control for pre-existing differences between participants.
Analysis of covariance (ANCOVA) combines the logic of regression and ANOVA. It answers questions like:
Do two treatment groups differ on a post-test after controlling for baseline differences?
Does a new therapy reduce anxiety over and above what we can predict from initial symptom severity?
Throughout the chapter we will work with a simple psychology example and use pystatsv1 and pingouin to fit and interpret an ANCOVA model.
Learning goals
After working through this chapter you should be able to:
explain what a covariate is and why researchers include covariates in experimental designs,
distinguish between raw (unadjusted) group means and adjusted means from an ANCOVA,
describe the assumptions of ANCOVA (linearity, homogeneity of regression slopes, reliability of the covariate),
run a basic one-way ANCOVA in Python using pingouin,
interpret the output (F statistic, p-value, effect size, adjusted means),
understand how ANCOVA is related to multiple regression.
18.1 Statistical control and covariates
Suppose a researcher is evaluating a new study-skills workshop designed to improve exam performance. Students volunteer and are randomly assigned to either a control group (no workshop) or a treatment group (workshop). Everyone completes a pre-test measuring current study skills and a final exam at the end of term.
In an ideal randomized experiment, random assignment ensures that the two groups are similar on average before the intervention. In practice, however, there will always be some pre-existing differences. In our example, some students may start with better study skills or higher motivation.
A covariate is a continuous variable that is:
measured prior to the manipulation (e.g., pre-test score),
related to the outcome (e.g., final exam score), and
not directly affected by the experimental treatment.
ANCOVA uses the covariate to statistically control for pre-existing differences. Conceptually, we are asking:
“If all students had started with the same pre-test score, would the treatment and control groups still differ on the exam?”
18.2 The logic of ANCOVA
ANCOVA can be viewed in two equivalent ways:
as an ANOVA that has been extended to include a continuous predictor, or
as a multiple regression in which group membership is coded as a categorical predictor and the covariate is a continuous predictor.
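The regression view can be sketched with nothing more than a least-squares fit. The block below is a minimal illustration, not the lab script: the sample size, seed, and data-generating values are assumptions chosen for the example. It dummy-codes the group (0 = control, 1 = workshop), adds the pre-test as a continuous predictor, and compares the raw difference in post-test means with the covariate-adjusted group coefficient.

```python
import numpy as np

rng = np.random.default_rng(18)           # fixed seed so the sketch is reproducible
n_per_group = 40                          # hypothetical sample size per group

# Simulated study (assumed values): pre-test covariate, 0/1 group code, post-test outcome.
group = np.repeat([0, 1], n_per_group)    # 0 = control, 1 = workshop
pre = rng.normal(50, 10, size=group.size)
post = 20 + 0.8 * pre + 5.0 * group + rng.normal(0, 8, size=group.size)

# Design matrix: intercept, dummy-coded group, continuous covariate.
X = np.column_stack([np.ones(group.size), group, pre])
coef, *_ = np.linalg.lstsq(X, post, rcond=None)
intercept, group_effect, slope = coef

raw_diff = post[group == 1].mean() - post[group == 0].mean()
pre_diff = pre[group == 1].mean() - pre[group == 0].mean()

print(f"raw difference in post-test means: {raw_diff:.2f}")
print(f"baseline difference in pre-test:   {pre_diff:.2f}")
print(f"covariate-adjusted group effect:   {group_effect:.2f}")
print(f"common slope for the pre-test:     {slope:.2f}")
```

In this model the raw difference equals the adjusted group effect plus the slope times the baseline difference, so the two estimates diverge only to the extent that the groups happened to start at different pre-test levels.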
The key idea is to partition the variance in the outcome into:
variance explained by the covariate,
variance explained by the group factor after controlling for the covariate, and
residual (error) variance.
If the group factor explains more variance over and above the covariate than we would expect by chance, given the residual variance, the ANCOVA will yield a significant F statistic for the group effect. The adjusted means then show the direction and size of that effect.
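The partition can be made concrete by comparing nested regression models. The following is a sketch under assumed simulation settings (seed, sample size, and effect sizes are illustrative, and `rss` is a helper defined here, not part of any library). The covariate's share is computed sequentially, that is, before the group factor enters the model, so it may differ from the covariate row that a particular package reports.

```python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(18)
group = np.repeat([0, 1], 40)             # 0 = control, 1 = workshop (assumed design)
pre = rng.normal(50, 10, size=group.size)
post = 20 + 0.8 * pre + 5.0 * group + rng.normal(0, 8, size=group.size)

ones = np.ones(group.size)
ss_total = rss(ones[:, None], post)                          # intercept only
rss_cov = rss(np.column_stack([ones, pre]), post)            # intercept + covariate
rss_full = rss(np.column_stack([ones, pre, group]), post)    # + group factor

ss_covariate = ss_total - rss_cov    # variance explained by the covariate
ss_group = rss_cov - rss_full        # explained by group after the covariate
df_group = 1                         # two groups -> one degree of freedom
df_error = group.size - 3            # N minus intercept, covariate, and group terms

f_group = (ss_group / df_group) / (rss_full / df_error)
p_group = stats.f.sf(f_group, df_group, df_error)
partial_eta_sq = ss_group / (ss_group + rss_full)

print(f"SS covariate = {ss_covariate:.1f}, SS group = {ss_group:.1f}, SS error = {rss_full:.1f}")
print(f"F({df_group}, {df_error}) = {f_group:.2f}, p = {p_group:.4f}, partial eta^2 = {partial_eta_sq:.3f}")
```

With 80 simulated participants the error term has 77 degrees of freedom, which is why an ANCOVA with two groups and one covariate is reported as F(1, N − 3).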
18.3 Adjusted means and interpretation
Because the covariate is continuous, each participant has a unique combination of covariate value and outcome value. ANCOVA uses the regression of the outcome on the covariate to compute adjusted means for each group at a common reference value of the covariate (often the overall mean).
In our example, imagine that we adjust all students to have the same pre-test score. The adjusted means then tell us what the average exam score would have been for each group if they had started at the same baseline.
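One common way to compute adjusted means by hand is to shift each group's raw mean along the pooled within-group regression line to the grand mean of the covariate. The sketch below assumes simulated data (the seed and parameter values are illustrative) and uses the overall pre-test mean as the reference value.

```python
import numpy as np

rng = np.random.default_rng(18)
group = np.repeat([0, 1], 40)             # 0 = control, 1 = workshop (assumed design)
pre = rng.normal(50, 10, size=group.size)
post = 20 + 0.8 * pre + 5.0 * group + rng.normal(0, 8, size=group.size)

# Pooled within-group slope: centre pre and post within each group, then regress.
pre_centred = pre.copy()
post_centred = post.copy()
for g in (0, 1):
    mask = group == g
    pre_centred[mask] -= pre[mask].mean()
    post_centred[mask] -= post[mask].mean()
b_within = (pre_centred @ post_centred) / (pre_centred @ pre_centred)

# Adjusted mean = raw mean minus slope * (group covariate mean - grand covariate mean).
grand_pre = pre.mean()
raw_means = {}
adjusted_means = {}
for g, label in ((0, "control"), (1, "workshop")):
    mask = group == g
    raw_means[label] = post[mask].mean()
    adjusted_means[label] = raw_means[label] - b_within * (pre[mask].mean() - grand_pre)

for label in raw_means:
    print(f"{label:9s} raw = {raw_means[label]:.2f}  adjusted = {adjusted_means[label]:.2f}")
```

A group that started above the grand pre-test mean has its mean adjusted downward, and a group that started below has it adjusted upward. The difference between the adjusted means equals the group coefficient from the corresponding regression model.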
When reporting ANCOVA results in APA style, researchers typically:
report the F statistic, degrees of freedom, p-value, and effect size for the group effect,
describe the direction and magnitude of the adjusted group difference,
mention the covariate and whether it was a significant predictor of the outcome.
For example:
“Controlling for pre-test study skills, students in the workshop condition scored higher on the final exam than those in the control condition, \(F(1, 77) = 8.42\), \(p = .005\), partial \(\eta^2 = .10\).”
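The effect size in a report like this can be recovered from the F statistic and its degrees of freedom, which is a quick way to check a results sentence for internal consistency. The helper name below is hypothetical and is defined only for this illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Values taken from the example sentence: F(1, 77) = 8.42.
eta_p2 = partial_eta_squared(8.42, 1, 77)
print(f"partial eta squared = {eta_p2:.3f}")   # prints 0.099, reported as .10
```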
18.4 Assumptions of ANCOVA
ANCOVA shares many assumptions with regression and ANOVA:
Linearity – the relationship between the covariate and outcome is approximately linear within each group.
Homogeneity of regression slopes – the slope relating the covariate to the outcome is similar for each group. If the slopes differ substantially, a model with an interaction between group and covariate may be more appropriate.
Independence of observations – the usual assumption for between-subjects designs.
Normality and homogeneity of variance – residuals are approximately normal and have similar variance across groups.
Reliable covariate – the covariate should be measured with reasonable reliability; noisy covariates provide little benefit and can even reduce power.
In practice, researchers check these assumptions using plots (e.g., scatterplots and residual plots) and model diagnostics.
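Alongside plots, one frequently used check for homogeneity of regression slopes is to add a group-by-covariate interaction and test whether it improves on the parallel-slopes model. The sketch below uses assumed simulated data in which the true slopes are equal; `rss` is a helper defined here for the illustration.

```python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(18)
group = np.repeat([0, 1], 40)             # 0 = control, 1 = workshop (assumed design)
pre = rng.normal(50, 10, size=group.size)
post = 20 + 0.8 * pre + 5.0 * group + rng.normal(0, 8, size=group.size)

ones = np.ones(group.size)
X_common = np.column_stack([ones, group, pre])                 # parallel slopes (ANCOVA)
X_separate = np.column_stack([ones, group, pre, group * pre])  # slopes free to differ

rss_common = rss(X_common, post)
rss_separate = rss(X_separate, post)
df_error = group.size - 4                 # four coefficients in the interaction model

# F test for the single interaction term.
f_inter = (rss_common - rss_separate) / (rss_separate / df_error)
p_inter = stats.f.sf(f_inter, 1, df_error)

# Per-group slopes, for reading alongside the test.
slopes = {g: np.polyfit(pre[group == g], post[group == g], 1)[0] for g in (0, 1)}

print(f"slope in control:  {slopes[0]:.3f}")
print(f"slope in workshop: {slopes[1]:.3f}")
print(f"interaction F(1, {df_error}) = {f_inter:.2f}, p = {p_inter:.3f}")
```

A small p-value for the interaction suggests that the slopes differ and that a single adjusted group difference may be misleading. A non-significant result is not proof of equal slopes, so the scatterplot is still worth inspecting.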
18.5 PyStatsV1 Lab – One-way ANCOVA with a pre-test covariate
The Chapter 18 lab shows how to run a simple one-way ANCOVA using a synthetic psychology dataset.
The script scripts.psych_ch18_ancova:
simulates data for a control and treatment group,
includes a pre-test covariate that is correlated with the post-test exam score,
compares an ordinary one-way ANOVA on the post-test scores to a one-way ANCOVA that controls for the pre-test,
uses pingouin.ancova() to fit the ANCOVA and report the F statistic, p-value, partial \(\eta^2\), and adjusted means,
saves the synthetic dataset and ANCOVA table to the usual data/synthetic and outputs/track_b folders, and
produces a simple plot that visualizes the group effect before and after adjusting for the covariate.
To run the lab from the command line, use the Makefile target:
make psych-ch18
or, equivalently:
python -m scripts.psych_ch18_ancova
To run the tests for this chapter only:
make test-psych-ch18
As in earlier chapters, the tests provide a lightweight “contract” for the simulation:
the covariate must be positively correlated with the outcome,
the ANCOVA model must show a significant treatment effect when it is present in the data-generating process, and
the adjusted mean for the treatment group should exceed that of the control group.
Concept check
Why might an experimenter include a pre-test covariate instead of simply comparing post-test scores with a t-test or one-way ANOVA?
What does it mean to say that ANCOVA “controls for” a covariate?
How are adjusted means different from raw means?
What does the assumption of homogeneity of regression slopes require?
How is ANCOVA related to multiple regression?
In the next chapter we will turn to non-parametric statistics – tools that relax some of the assumptions we have relied on so far and allow us to analyse ordinal and highly non-normal data.