Chapter 18 – Analysis of Covariance (ANCOVA)
==============================================

.. contents:: Chapter overview
   :local:
   :depth: 2

In Chapter 17 you learned how to analyse *mixed-model* designs with both
between-subjects and within-subjects factors. In this chapter we introduce
a closely related idea: using a *covariate* to statistically control for
pre-existing differences between participants.

Analysis of covariance (ANCOVA) combines the logic of regression and ANOVA.
It answers questions like:

* Do two treatment groups differ on a post-test **after controlling for**
  baseline differences?
* Does a new therapy reduce anxiety **over and above** what we can predict
  from initial symptom severity?

Throughout the chapter we will work with a simple psychology example and
use :mod:`pystatsv1` and :mod:`pingouin` to fit and interpret an ANCOVA model.

Learning goals
--------------

After working through this chapter you should be able to:

* explain what a covariate is and why researchers include covariates
  in experimental designs,
* distinguish between **raw** (unadjusted) group means and **adjusted**
  means from an ANCOVA,
* describe the key assumptions of ANCOVA (including linearity, homogeneity
  of regression slopes, and reliability of the covariate),
* run a basic one-way ANCOVA in Python using :mod:`pingouin`,
* interpret the output (F statistic, p-value, effect size, adjusted means),
* understand how ANCOVA is related to multiple regression.

18.1 Statistical control and covariates
---------------------------------------

Suppose a researcher is evaluating a new study-skills workshop designed
to improve exam performance. Students volunteer and are randomly assigned
to either a **control** group (no workshop) or a **treatment** group
(workshop). Everyone completes a pre-test measuring current study skills
and a final exam at the end of term.

In an ideal randomized experiment, random assignment makes the two
groups equivalent *on average* before the intervention. Any single sample,
however, will still show some chance differences. In our example, some
students may start with better study skills or higher motivation.

A **covariate** is a continuous variable that is:

* measured prior to the manipulation (e.g., pre-test score),
* related to the outcome (e.g., final exam score), and
* **not** directly affected by the experimental treatment.

ANCOVA uses the covariate to statistically control for pre-existing
differences. Conceptually, we are asking:

    “If all students had started with the **same** pre-test score,
    would the treatment and control groups still differ on the exam?”

18.2 The logic of ANCOVA
------------------------

ANCOVA can be viewed in two equivalent ways:

* as an ANOVA that has been extended to include a continuous predictor, or
* as a multiple regression in which group membership is coded as a
  categorical predictor and the covariate is a continuous predictor.
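
To make the regression view concrete, the sketch below fits that regression
with plain NumPy least squares. It is an illustration under stated
assumptions, not the lab script: the variable names ``treat``, ``pre`` and
``post`` and every simulation setting (2,000 students per group, a pre-test
slope of 0.8, a true treatment effect of 5 exam points) are hypothetical.
The coefficient on the 0/1 group dummy estimates the group difference at any
fixed pre-test score.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(18)  # fixed seed so the sketch is reproducible

   # Hypothetical simulated data (all settings are illustrative assumptions)
   n_per_group = 2000
   treat = np.repeat([0, 1], n_per_group)          # 0 = control, 1 = workshop
   pre = rng.normal(50, 10, size=treat.size)       # pre-test covariate
   true_effect = 5.0                               # assumed treatment effect
   post = 20 + 0.8 * pre + true_effect * treat + rng.normal(0, 5, size=treat.size)

   # Regression view of ANCOVA: intercept + group dummy + covariate
   X = np.column_stack([np.ones(treat.size), treat, pre])
   coef, *_ = np.linalg.lstsq(X, post, rcond=None)
   intercept, group_diff, slope = coef

   print(f"adjusted group difference: {group_diff:.2f}")
   print(f"common pre-test slope:     {slope:.2f}")

Adding further covariates simply adds columns to ``X``, which is the sense in
which ANCOVA is a special case of multiple regression.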

The key idea is to partition the variance in the outcome into:

* variance explained by the covariate,
* variance explained by the group factor *after controlling for the
  covariate*, and
* residual (error) variance.

If the group factor explains more variance **over and above** the
covariate than we would expect by chance, the ANCOVA yields a significant
F statistic for the group effect. The adjusted means then show the
direction and size of that effect.
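
The partition can be sketched by comparing nested least-squares fits: adding
the group dummy to a covariate-only model reduces the residual sum of
squares, and that reduction is the group's sum of squares after controlling
for the covariate. The sketch assumes NumPy and simulated data for 80
hypothetical students (so the error degrees of freedom are 80 - 3 = 77); the
helper name ``sse`` is illustrative.

.. code-block:: python

   import numpy as np

   def sse(X, y):
       """Residual sum of squares from an ordinary least-squares fit of y on X."""
       coef, *_ = np.linalg.lstsq(X, y, rcond=None)
       resid = y - X @ coef
       return float(resid @ resid)

   rng = np.random.default_rng(7)
   treat = np.repeat([0, 1], 40)                   # 80 hypothetical students
   pre = rng.normal(50, 10, size=treat.size)
   post = 20 + 0.8 * pre + 4.0 * treat + rng.normal(0, 6, size=treat.size)
   ones = np.ones(treat.size)

   ss_total = float(((post - post.mean()) ** 2).sum())
   sse_cov = sse(np.column_stack([ones, pre]), post)            # covariate only
   sse_full = sse(np.column_stack([ones, pre, treat]), post)    # covariate + group

   ss_covariate = ss_total - sse_cov   # explained by the covariate
   ss_group_adj = sse_cov - sse_full   # explained by group, controlling for covariate
   ss_error = sse_full                 # residual (error)

   df_error = post.size - 3            # n minus 3 estimated coefficients
   F_group = (ss_group_adj / 1) / (ss_error / df_error)
   print(f"F(1, {df_error}) = {F_group:.2f}")

Entering the covariate first makes the three pieces add up exactly to the
total sum of squares. The group piece is the adjusted sum of squares used in
the group F test, but ANCOVA software may compute the covariate's own row
differently (for example, adjusted for group), so that row need not equal
``ss_covariate``.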

18.3 Adjusted means and interpretation
--------------------------------------

Because the groups can differ on the covariate, their raw means mix any
treatment effect with baseline differences. ANCOVA uses the pooled
within-group regression of the outcome on the covariate to compute
**adjusted means**: the estimated mean of each group at a common
reference value of the covariate (usually the overall mean).

In our example, imagine that we adjust all students to have the same
pre-test score. The adjusted means then tell us what the average exam
score *would have been* for each group **if** they had started at the
same baseline.
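
For a one-covariate ANCOVA, the adjusted mean of group :math:`j` is its raw
mean moved along the pooled within-group slope :math:`b_w` to the grand mean
of the covariate,
:math:`\bar{y}_j^{\mathrm{adj}} = \bar{y}_j - b_w(\bar{x}_j - \bar{x})`.
The minimal NumPy sketch below implements that formula; ``adjusted_means``
is a hypothetical helper, and the simulation deliberately gives the
treatment group a 5-point head start on the pre-test so that raw and
adjusted means differ.

.. code-block:: python

   import numpy as np

   def adjusted_means(group, x, y):
       """Hypothetical helper: group means of y adjusted to the grand mean of x.

       Uses the pooled within-group slope b_w, i.e. the single common slope
       that a one-way ANCOVA assumes for every group.
       """
       group = np.asarray(group)
       x, y = np.asarray(x, float), np.asarray(y, float)
       levels = np.unique(group)
       # Pooled within-group cross-products and sum of squares of x
       sxy = sum(((x[group == g] - x[group == g].mean())
                  * (y[group == g] - y[group == g].mean())).sum() for g in levels)
       sxx = sum(((x[group == g] - x[group == g].mean()) ** 2).sum() for g in levels)
       b_w = sxy / sxx
       means = {str(g): float(y[group == g].mean()
                              - b_w * (x[group == g].mean() - x.mean()))
                for g in levels}
       return means, float(b_w)

   rng = np.random.default_rng(3)
   group = np.repeat(["control", "treatment"], 50)
   # Assumption for illustration: the treatment group starts 5 points ahead
   pre = rng.normal(50, 10, 100) + np.where(group == "treatment", 5.0, 0.0)
   post = 20 + 0.8 * pre + 3.0 * (group == "treatment") + rng.normal(0, 5, 100)

   adj, b_w = adjusted_means(group, pre, post)
   raw = {g: float(post[group == g].mean()) for g in ("control", "treatment")}
   print("raw means:     ", raw)
   print("adjusted means:", adj)

   # Cross-check against the regression view: the adjusted difference equals
   # the coefficient on the group dummy, and b_w equals the covariate slope
   X = np.column_stack([np.ones(100), (group == "treatment").astype(float), pre])
   coef, *_ = np.linalg.lstsq(X, post, rcond=None)
   print("OLS group coefficient:", round(float(coef[1]), 3))

Because the difference between the adjusted means equals the group
coefficient in the regression, the adjusted means and the group F test
describe the same covariate-adjusted difference.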

When reporting ANCOVA results in APA style, researchers typically:

* report the F statistic, degrees of freedom, p-value, and effect size
  for the group effect,
* describe the direction and magnitude of the adjusted group difference,
* mention the covariate and whether it was a significant predictor of
  the outcome.

For example:

    “Controlling for pre-test study skills, students in the workshop
    condition scored higher on the final exam than those in the control
    condition, :math:`F(1, 77) = 8.42`, :math:`p = .005`, partial
    :math:`\eta^2 = .10`.”
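
The statistics in such a report are linked. For an effect with
:math:`df_1` numerator and :math:`df_2` denominator degrees of freedom,
partial :math:`\eta^2 = F\,df_1 / (F\,df_1 + df_2)`, and :math:`p` is the
upper-tail probability of the :math:`F(df_1, df_2)` distribution. A quick
consistency check of the example sentence, assuming SciPy is available:

.. code-block:: python

   from scipy import stats

   # Values from the example report: F(1, 77) = 8.42
   F, df1, df2 = 8.42, 1, 77

   p = stats.f.sf(F, df1, df2)                   # upper-tail probability of F
   partial_eta_sq = (F * df1) / (F * df1 + df2)  # partial eta squared from F and dfs

   print(f"p = {p:.3f}, partial eta^2 = {partial_eta_sq:.2f}")

Both values round to those in the example (:math:`p = .005`, partial
:math:`\eta^2 = .10`). The identity holds because partial :math:`\eta^2` is
:math:`SS_\text{effect} / (SS_\text{effect} + SS_\text{error})` and
:math:`F = (SS_\text{effect}/df_1) / (SS_\text{error}/df_2)`.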

18.4 Assumptions of ANCOVA
--------------------------

ANCOVA shares many assumptions with regression and ANOVA:

* **Linearity** – the relationship between the covariate and outcome
  is approximately linear within each group.
* **Homogeneity of regression slopes** – the slope relating the covariate
  to the outcome is similar for each group. If the slopes differ
  substantially, a model with an interaction between group and covariate
  may be more appropriate.
* **Independence of observations** – the usual assumption for
  between-subjects designs.
* **Normality and homogeneity of variance** – residuals are approximately
  normal and have similar variance across groups.
* **Reliable covariate** – the covariate should be measured with reasonable
  reliability; noisy covariates provide little benefit and can even reduce
  power.

In practice, researchers check these assumptions using plots (e.g.,
scatterplots and residual plots) and model diagnostics.
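
One common diagnostic for homogeneity of regression slopes is to add a
group × covariate interaction and test whether it improves the fit. The
minimal NumPy sketch below assumes two groups and one covariate; the helper
name ``slopes_interaction_F`` is hypothetical, and the simulated slopes
(0.3 versus 1.2) deliberately differ so the violation is easy to see.

.. code-block:: python

   import numpy as np

   def slopes_interaction_F(treat, x, y):
       """Hypothetical helper: F for the group-by-covariate interaction.

       Compares y ~ treat + x (common slope) with
       y ~ treat + x + treat:x (a separate slope for each group).
       """
       treat, x, y = (np.asarray(a, float) for a in (treat, x, y))
       ones = np.ones_like(x)
       X_common = np.column_stack([ones, treat, x])
       X_separate = np.column_stack([ones, treat, x, treat * x])

       def sse(X):
           coef, *_ = np.linalg.lstsq(X, y, rcond=None)
           r = y - X @ coef
           return float(r @ r)

       sse_common, sse_separate = sse(X_common), sse(X_separate)
       df_error = y.size - X_separate.shape[1]
       F = (sse_common - sse_separate) / (sse_separate / df_error)  # 1 numerator df
       return F, (1, df_error)

   rng = np.random.default_rng(42)
   treat = np.repeat([0, 1], 100)
   pre = rng.normal(50, 10, treat.size)
   slope = np.where(treat == 1, 1.2, 0.3)   # assumed: slopes differ by group
   post = 10 + slope * pre + rng.normal(0, 5, treat.size)

   F, dfs = slopes_interaction_F(treat, pre, post)
   print(f"interaction F{dfs} = {F:.1f}")

A large interaction F, as in this simulation, signals that one common slope
is a poor summary; the group difference then depends on the covariate value,
and a model that keeps the interaction may be more appropriate.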

18.5 PyStatsV1 Lab – One-way ANCOVA with a pre-test covariate
-------------------------------------------------------------

The Chapter 18 lab shows how to run a simple one-way ANCOVA using a
synthetic psychology dataset.

The script :mod:`scripts.psych_ch18_ancova`:

* simulates data for a control and treatment group,
* includes a **pre-test** covariate that is correlated with the **post-test**
  exam score,
* compares an ordinary one-way ANOVA on the post-test scores to a one-way
  ANCOVA that controls for the pre-test,
* uses :func:`pingouin.ancova` to fit the ANCOVA and report the F statistic,
  p-value, partial :math:`\eta^2`, and adjusted means,
* saves the synthetic dataset and ANCOVA table to the usual
  ``data/synthetic`` and ``outputs/track_b`` folders, and
* produces a simple plot that visualizes the group effect before and after
  adjusting for the covariate.
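
The lab's central comparison, an ordinary one-way ANOVA versus an ANCOVA on
the same post-test scores, can be sketched without any statistics package.
This is not the lab script itself: the data-generating values below
(pre-test SD of 10, slope 0.8, a 5-point effect, 500 students per group) are
assumptions chosen only to show why a covariate that is strongly related to
the outcome usually sharpens the test of the group effect.

.. code-block:: python

   import numpy as np

   def sse(X, y):
       """Residual sum of squares of an OLS fit of y on X."""
       coef, *_ = np.linalg.lstsq(X, y, rcond=None)
       r = y - X @ coef
       return float(r @ r)

   rng = np.random.default_rng(2024)
   treat = np.repeat([0.0, 1.0], 500)                  # assumed group sizes
   pre = rng.normal(50, 10, treat.size)
   post = 10 + 0.8 * pre + 5 * treat + rng.normal(0, 6, treat.size)
   ones = np.ones(treat.size)
   n = post.size

   # One-way ANOVA as a model comparison: intercept only vs intercept + group
   sse_null = sse(ones[:, None], post)
   sse_group = sse(np.column_stack([ones, treat]), post)
   F_anova = (sse_null - sse_group) / (sse_group / (n - 2))

   # ANCOVA: covariate only vs covariate + group
   sse_cov = sse(np.column_stack([ones, pre]), post)
   sse_full = sse(np.column_stack([ones, pre, treat]), post)
   F_ancova = (sse_cov - sse_full) / (sse_full / (n - 3))

   print(f"ANOVA  F(1, {n - 2}) = {F_anova:.1f}")
   print(f"ANCOVA F(1, {n - 3}) = {F_ancova:.1f}")

Because the pre-test absorbs much of the within-group variability, the error
term shrinks and the group F rises. With a weak or unreliable covariate the
gain would be smaller, and the lost degree of freedom could even cost power.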

To run the lab from the command line, use the Makefile target:

.. code-block:: bash

   make psych-ch18

or, equivalently:

.. code-block:: bash

   python -m scripts.psych_ch18_ancova

To run the tests for this chapter only:

.. code-block:: bash

   make test-psych-ch18

As in earlier chapters, the tests provide a lightweight “contract” for the
simulation:

* the covariate must be positively correlated with the outcome,
* the ANCOVA model must show a significant treatment effect when it is
  present in the data-generating process, and
* the adjusted mean for the treatment group should exceed that of the
  control group.
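
The three conditions can be written as plain checks. The sketch below is
hypothetical: it is not the chapter's actual test file, the function name
``contract_checks`` and the simulated data are illustrative, and it assumes
NumPy, SciPy and a 0/1 treatment code.

.. code-block:: python

   import numpy as np
   from scipy import stats

   def contract_checks(treat, pre, post, alpha=0.05):
       """Hypothetical helper: the three contract conditions as booleans."""
       treat, pre, post = (np.asarray(a, float) for a in (treat, pre, post))
       ones = np.ones_like(pre)
       n = post.size

       # 1. Covariate positively correlated with the outcome
       r = np.corrcoef(pre, post)[0, 1]

       # 2. Significant treatment effect in the ANCOVA (nested-model F test)
       X_cov = np.column_stack([ones, pre])
       X_full = np.column_stack([ones, pre, treat])
       coef_cov, *_ = np.linalg.lstsq(X_cov, post, rcond=None)
       coef_full, *_ = np.linalg.lstsq(X_full, post, rcond=None)
       sse_cov = float(((post - X_cov @ coef_cov) ** 2).sum())
       sse_full = float(((post - X_full @ coef_full) ** 2).sum())
       F = (sse_cov - sse_full) / (sse_full / (n - 3))
       p = stats.f.sf(F, 1, n - 3)

       # 3. Adjusted treatment mean above adjusted control mean; with a common
       #    slope this difference equals the coefficient on the treatment dummy
       adj_diff = coef_full[2]

       return {"positive_r": bool(r > 0),
               "significant": bool(p < alpha),
               "treatment_higher": bool(adj_diff > 0)}

   rng = np.random.default_rng(0)
   treat = np.repeat([0, 1], 60)
   pre = rng.normal(50, 10, treat.size)
   post = 15 + 0.7 * pre + 6 * treat + rng.normal(0, 5, treat.size)  # assumed effect
   checks = contract_checks(treat, pre, post)
   print(checks)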

Concept check
-------------

* Why might an experimenter include a pre-test covariate instead of simply
  comparing post-test scores with a t-test or one-way ANOVA?
* What does it mean to say that ANCOVA “controls for” a covariate?
* How are adjusted means different from raw means?
* What does the assumption of homogeneity of regression slopes require?
* How is ANCOVA related to multiple regression?

In the next chapter we will turn to non-parametric statistics – tools that
relax some of the assumptions we have relied on so far and allow us to
analyse ordinal and highly non-normal data.