Chapter 13 – Factorial Designs and the Two-Way ANOVA

In Chapters 10–12 you learned how to compare two or more groups on one independent variable (IV):

independent-samples t-tests for two groups (between-subjects),
paired-samples t-tests for repeated measures (within-subjects),
one-way ANOVA for three or more groups on a single factor.

In real psychological research, however, we rarely care about just one factor at a time. We ask questions like:

Does a training program help more under high stress than low stress?
Does a therapy work better for some age groups than others?
Does feedback style interact with time pressure to affect performance?

These questions involve more than one independent variable. The standard tool is the factorial design, analyzed with a two-way ANOVA (for two factors).

Our goals in this chapter are to help you:

understand the logic of factorial designs and the 2 × 2 notation,
distinguish main effects from interactions,
interpret interactions as “differences of differences”,
see examples of spreading vs. crossover interactions,
appreciate the idea of simple main effects as follow-up tests, and
run and interpret a two-way ANOVA using PyStatsV1 on a balanced design.

Design Logic: Two Factors at Once

We will use a simple 2 × 2 example throughout.

Suppose a lab studies a stress-management training program. Participants are randomly assigned to:

Training factor (Factor A)
- control – no training
- cbt – brief cognitive–behavioral training

and then complete a challenging task either under:

Context factor (Factor B)
- low_stress – quiet room, no time pressure
- high_stress – noisy room, strict time pressure

The dependent variable is a continuous stress_score (higher = more stress).

This design has two factors (Training and Context), each with two levels, so we call it a 2 × 2 factorial design.

Notation for Factorial Designs

A shorthand like 2 × 2 tells you:

the number of levels of each factor, and
how many experimental cells there are.

Examples:

2 × 2
- Factor A has 2 levels, Factor B has 2 levels.
- Total cells: \(2 \times 2 = 4\) (control/low, control/high, cbt/low, cbt/high).
2 × 3
- Factor A has 2 levels, Factor B has 3 levels.
- Total cells: \(2 \times 3 = 6\).
2 × 2 × 2
- Three factors, each with 2 levels.
- Total cells: \(2 \times 2 \times 2 = 8\).

In this chapter we focus on the two-way case (two factors). The ideas generalize to more complex designs, but two-way ANOVA is the workhorse for most undergraduate research projects.
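The shorthand can be checked mechanically. The sketch below lists the four cells of the running 2 × 2 example and counts the cells implied by the other shorthands. The helper name n_cells is hypothetical and used only for illustration.

```python
from itertools import product
from math import prod

# Factor levels from the running example.
training_levels = ["control", "cbt"]
context_levels = ["low_stress", "high_stress"]

# Each cell pairs one Training level with one Context level.
cells = list(product(training_levels, context_levels))
print(cells)


def n_cells(*levels_per_factor):
    """Number of cells implied by a shorthand such as 2 x 3 or 2 x 2 x 2 (hypothetical helper)."""
    return prod(levels_per_factor)


print(n_cells(2, 2), n_cells(2, 3), n_cells(2, 2, 2))
```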

Main Effects

A main effect is the overall effect of one factor, averaging over the levels of the other factor.

The main effect of Training asks:

On average across both Context conditions, do participants in cbt differ from those in control?

The main effect of Context asks:

On average across both Training conditions, do participants in high_stress differ from those in low_stress?

We compute marginal means to answer these questions. For example:

Mean stress in control: average of (control, low_stress) and (control, high_stress).
Mean stress in cbt: average of (cbt, low_stress) and (cbt, high_stress).

If those marginal means differ, there is evidence of a main effect for that factor.
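As a minimal sketch, marginal means can be computed directly from the four cell means. The values below are the illustrative cell means from the sample output later in this chapter, and the helper marginal_mean is a hypothetical name. Averaging cell means this way is only appropriate when every cell has the same sample size.

```python
# Illustrative cell means (from the sample console output in this chapter).
cell_means = {
    ("control", "low_stress"): 17.9,
    ("control", "high_stress"): 23.4,
    ("cbt", "low_stress"): 16.8,
    ("cbt", "high_stress"): 18.9,
}


def marginal_mean(means, factor_index, level):
    """Average the cell means that share `level` on the chosen factor.

    factor_index 0 selects Training, 1 selects Context.
    Assumes a balanced design (equal n in every cell).
    """
    values = [m for cell, m in means.items() if cell[factor_index] == level]
    return sum(values) / len(values)


training_marginals = {
    level: marginal_mean(cell_means, 0, level) for level in ("control", "cbt")
}
context_marginals = {
    level: marginal_mean(cell_means, 1, level) for level in ("low_stress", "high_stress")
}

print("Training:", training_marginals)
print("Context: ", context_marginals)
```

With these numbers the Training marginal means are about 20.65 (control) and 17.85 (cbt), and the Context marginal means are about 17.35 (low_stress) and 21.15 (high_stress).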

Interactions: The “It Depends” Effect

The most important idea in factorial designs is the interaction.

An interaction occurs when the effect of one factor depends on the level of the other factor.

In our example, we ask:

Does the benefit of CBT training depend on whether the context is low-stress or high-stress?

Mathematically, we can think of an interaction as a difference of differences.

Let:

\(\bar{X}_{\text{control, low}}\) be the mean stress for control/low,
\(\bar{X}_{\text{control, high}}\) for control/high,
\(\bar{X}_{\text{cbt, low}}\) for cbt/low,
\(\bar{X}_{\text{cbt, high}}\) for cbt/high.

Compute the Training effect within each Context:

\[\text{Training effect at low stress} = \bar{X}_{\text{control, low}} - \bar{X}_{\text{cbt, low}},\]

\[\text{Training effect at high stress} = \bar{X}_{\text{control, high}} - \bar{X}_{\text{cbt, high}}.\]

The interaction for Training × Context is the difference between these two effects:

\[\text{Interaction (difference of differences)} = (\bar{X}_{\text{control, high}} - \bar{X}_{\text{cbt, high}}) - (\bar{X}_{\text{control, low}} - \bar{X}_{\text{cbt, low}}).\]

If that difference of differences is zero (within random error), we say there is no interaction. If it is clearly non-zero, we have an interaction: the effect of training changes across contexts.
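The difference of differences is simple arithmetic on the four cell means. The sketch below applies the formulas above to two sets of means: the illustrative ones from this chapter's sample output, and a made-up additive pattern in which the Training effect is the same in both contexts. The function names are hypothetical.

```python
def training_effect(means, context):
    """Control minus cbt within one context (positive = cbt has lower stress)."""
    return means[("control", context)] - means[("cbt", context)]


def interaction_contrast(means):
    """Training effect at high stress minus Training effect at low stress."""
    return training_effect(means, "high_stress") - training_effect(means, "low_stress")


# Illustrative cell means from the chapter's sample output.
sample_means = {
    ("control", "low_stress"): 17.9,
    ("control", "high_stress"): 23.4,
    ("cbt", "low_stress"): 16.8,
    ("cbt", "high_stress"): 18.9,
}

# Made-up additive pattern: cbt is 2 points lower in both contexts.
additive_means = {
    ("control", "low_stress"): 18.0,
    ("control", "high_stress"): 22.0,
    ("cbt", "low_stress"): 16.0,
    ("cbt", "high_stress"): 20.0,
}

sample_contrast = interaction_contrast(sample_means)
additive_contrast = interaction_contrast(additive_means)
print(f"sample:   {sample_contrast:.1f}")
print(f"additive: {additive_contrast:.1f}")
```

For the sample means the Training effect is about 1.1 at low stress and 4.5 at high stress, so the contrast is about 3.4. For the additive pattern it is exactly 0. Whether a non-zero contrast is larger than random error is what the F-test for the interaction evaluates.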

Graphical View: Non-Parallel Lines

Interactions are often easiest to see in a line graph:

Put Context on the x-axis (low vs high).
Plot mean stress for each Training condition as a separate line.

Then:

If the lines are parallel, the Training effect is similar at low and high stress → no interaction (or only a trivial one).
If the lines spread apart, converge, or cross, the Training effect changes with Context → interaction.

Two common patterns:

Spreading interaction

Training helps under high stress but has little effect under low stress. The lines diverge as you move from low to high stress.
Crossover interaction

Control performs better in low stress, but CBT performs better in high stress (or vice versa). The lines literally cross.
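A plot like this takes only a few lines of matplotlib. The sketch below is one possible way to draw it, not the plotting code from the PyStatsV1 script. The function name plot_interaction and both sets of cell means are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_interaction(cell_means,
                     contexts=("low_stress", "high_stress"),
                     trainings=("control", "cbt")):
    """Draw one line per Training condition, with Context on the x-axis."""
    fig, ax = plt.subplots()
    x = list(range(len(contexts)))
    for training in trainings:
        y = [cell_means[(training, context)] for context in contexts]
        ax.plot(x, y, marker="o", label=training)
    ax.set_xticks(x)
    ax.set_xticklabels(contexts)
    ax.set_xlabel("Context")
    ax.set_ylabel("Mean stress_score")
    ax.legend(title="Training")
    return fig, ax


# Made-up means: the lines diverge from low to high stress (spreading).
spreading = {
    ("control", "low_stress"): 18.0, ("control", "high_stress"): 24.0,
    ("cbt", "low_stress"): 17.5, ("cbt", "high_stress"): 19.0,
}
# Made-up means: control is lower at low stress, cbt is lower at high stress (crossover).
crossover = {
    ("control", "low_stress"): 16.0, ("control", "high_stress"): 24.0,
    ("cbt", "low_stress"): 18.0, ("cbt", "high_stress"): 20.0,
}

fig, ax = plot_interaction(spreading)
fig_cross, ax_cross = plot_interaction(crossover)
```

Calling fig.savefig with a file name of your choice writes the figure to disk.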

Simple Main Effects

When an interaction is present, main effects can be hard to interpret on their own.

For example, suppose CBT reduces stress only in the high-stress context. The overall (marginal) means for Training might still show a “modest” effect, even though the real story is:

CBT ≈ control under low stress,
CBT < control under high stress.

To unpack this, researchers examine simple main effects:

the effect of one factor at a single level of the other factor.

Examples:

Simple effect of Training within low_stress: compare control vs cbt using only low-stress participants.
Simple effect of Training within high_stress: compare control vs cbt using only high-stress participants.
Simple effect of Context within control: compare low vs high stress among control participants only.

In practice, simple main effects are often tested with t-tests or one-way ANOVAs conducted within a subset of the data, sometimes combined with Bonferroni or other corrections for multiple tests.
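As a rough sketch of that approach, the code below runs an independent-samples t-test for Training within each Context and applies a Bonferroni correction for the two tests. It uses scipy.stats.ttest_ind on simulated data. The seed, population means, and standard deviation are arbitrary assumptions, and the function name is hypothetical, so this is not necessarily how the PyStatsV1 script does it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)  # arbitrary seed
n_per_cell = 25

# Hypothetical population means, with a common SD of 4.0 (an assumption).
pop_means = {
    ("control", "low_stress"): 18.0,
    ("control", "high_stress"): 24.0,
    ("cbt", "low_stress"): 17.0,
    ("cbt", "high_stress"): 19.0,
}
scores = {cell: rng.normal(mu, 4.0, n_per_cell) for cell, mu in pop_means.items()}


def simple_effect_of_training(scores, context, n_tests):
    """Compare control vs cbt using only the participants in one context."""
    control = scores[("control", context)]
    cbt = scores[("cbt", context)]
    result = stats.ttest_ind(control, cbt)  # pooled-variance t-test by default
    df = len(control) + len(cbt) - 2
    p_bonferroni = min(1.0, result.pvalue * n_tests)  # Bonferroni: multiply p, cap at 1
    return float(result.statistic), df, float(result.pvalue), float(p_bonferroni)


contexts = ("low_stress", "high_stress")
results = {
    context: simple_effect_of_training(scores, context, n_tests=len(contexts))
    for context in contexts
}
for context, (t, df, p, p_adj) in results.items():
    print(f"Training within {context}: t({df}) = {t:.2f}, "
          f"p = {p:.3f}, Bonferroni p = {p_adj:.3f}")
```

Each test uses only the 50 participants in one context, so it has 48 degrees of freedom. Some textbooks instead reuse \(MS_{\text{Within}}\) from the full ANOVA as the error term, which this sketch does not do.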

In the PyStatsV1 lab for this chapter, the two-way ANOVA is the primary analysis. For pedagogical purposes, the script also shows how to compute a few simple main effects (for example, Training within each Context) using independent-samples t-tests when the interaction is statistically significant.

The Two-Way ANOVA: Partitioning Variance

Factorial ANOVA extends the one-way ANOVA logic from Chapter 12. We still partition the total variability in the outcome into meaningful components.

Let:

Factor A = Training (2 levels),
Factor B = Context (2 levels),
\(Y_{ijk}\) be the score for person \(k\) in cell \((i, j)\), where \(i\) indexes levels of A and \(j\) indexes levels of B,
\(n_{ij}\) be the number of participants in cell \((i, j)\),
\(\bar{Y}_{ij}\) be the cell mean for cell \((i, j)\),
\(\bar{Y}_{i\cdot}\) be the marginal mean for level \(i\) of A,
\(\bar{Y}_{\cdot j}\) be the marginal mean for level \(j\) of B,
\(\bar{Y}_{\cdot\cdot}\) be the grand mean across all participants.

Then the total sum of squares can be written as:

\[SS_{\text{Total}} = \sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{\cdot\cdot})^2.\]

For a balanced 2 × 2 design (equal \(n_{ij}\) in each cell), we can decompose this into:

\[SS_{\text{Total}} = SS_{A} + SS_{B} + SS_{AB} + SS_{\text{Within}},\]

where:

\(SS_A\) captures the main effect of Training,
\(SS_B\) captures the main effect of Context,
\(SS_{AB}\) captures the interaction,
\(SS_{\text{Within}}\) is the within-cell (error) variability.

Note: For these components to be additive (A + B + AB), the design must be balanced.
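For reference, the four components can be written out explicitly for a balanced design. The symbols \(a\), \(b\), and \(n\) are introduced here for convenience: \(a\) and \(b\) are the numbers of levels of A and B (both 2 in our example), and \(n\) is the common cell size \(n_{ij}\). These are the standard textbook formulas, not a transcription of the PyStatsV1 code:

\[SS_A = n\,b \sum_{i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2, \qquad SS_B = n\,a \sum_{j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2,\]

\[SS_{AB} = n \sum_{i}\sum_{j} (\bar{Y}_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot})^2,\]

\[SS_{\text{Within}} = \sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{ij})^2.\]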

Each component has associated degrees of freedom (df) and a mean square (MS) obtained by dividing SS by its df. The F-tests for each effect are:

\[F_A = \frac{MS_A}{MS_{\text{Within}}}, \quad F_B = \frac{MS_B}{MS_{\text{Within}}}, \quad F_{AB} = \frac{MS_{AB}}{MS_{\text{Within}}}.\]
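The degrees of freedom follow the usual pattern for a balanced design. Writing \(a\) and \(b\) for the numbers of levels of A and B and \(n\) for the common cell size (symbols introduced here for convenience):

\[df_A = a - 1, \quad df_B = b - 1, \quad df_{AB} = (a - 1)(b - 1), \quad df_{\text{Within}} = ab(n - 1), \quad df_{\text{Total}} = abn - 1.\]

With \(a = b = 2\) and \(n = 25\), this gives 1, 1, 1, 96, and 99, and the first four add up to the total: \(1 + 1 + 1 + 96 = 99\).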

Effect sizes (for example, \(\eta^2\) for each effect) can be computed as the proportion of total variability associated with each component, \(\eta^2_{\text{effect}} = SS_{\text{effect}} / SS_{\text{Total}}\). As with one-way ANOVA, these sample-based measures tend to slightly overestimate the population effect sizes, but they are helpful descriptive summaries.
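To make the partition concrete, here is a minimal sketch of a balanced two-way ANOVA written with NumPy and SciPy. It is an illustration under stated assumptions, not the PyStatsV1 implementation: the function name, the array layout (levels of A × levels of B × participants per cell), the seed, and the population means and SD are all hypothetical.

```python
import numpy as np
from scipy import stats


def two_way_anova_balanced(y):
    """Balanced two-way ANOVA. y has shape (a, b, n): A levels x B levels x scores per cell."""
    a, b, n = y.shape
    grand = y.mean()
    cell = y.mean(axis=2)          # cell means, shape (a, b)
    mean_a = y.mean(axis=(1, 2))   # marginal means of A, shape (a,)
    mean_b = y.mean(axis=(0, 2))   # marginal means of B, shape (b,)

    ss = {
        "A": n * b * np.sum((mean_a - grand) ** 2),
        "B": n * a * np.sum((mean_b - grand) ** 2),
        "AB": n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2),
        "Within": np.sum((y - cell[:, :, None]) ** 2),
    }
    ss_total = np.sum((y - grand) ** 2)
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Within": a * b * (n - 1)}
    ms_within = ss["Within"] / df["Within"]

    table = {}
    for effect in ("A", "B", "AB"):
        ms = ss[effect] / df[effect]
        f_stat = ms / ms_within
        table[effect] = {
            "SS": float(ss[effect]), "df": df[effect], "MS": float(ms),
            "F": float(f_stat),
            "p": float(stats.f.sf(f_stat, df[effect], df["Within"])),  # upper-tail p
            "eta_sq": float(ss[effect] / ss_total),
        }
    table["Within"] = {"SS": float(ss["Within"]), "df": df["Within"], "MS": float(ms_within)}
    table["Total"] = {"SS": float(ss_total), "df": a * b * n - 1}
    return table


# Simulated balanced data: rows = (control, cbt), columns = (low_stress, high_stress).
rng = np.random.default_rng(13)                        # arbitrary seed
pop_means = np.array([[18.0, 24.0], [17.0, 19.0]])     # hypothetical population means
y = rng.normal(loc=pop_means[:, :, None], scale=4.0, size=(2, 2, 25))

table = two_way_anova_balanced(y)
for name, row in table.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

Because the design is balanced, the four SS components in the printed table add up to the total SS, up to floating-point rounding.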

Assumptions in the Two-Way ANOVA

The classical two-way ANOVA relies on similar assumptions to the one-way case:

Independence

Observations are independent within and across cells (for example, each participant appears in only one cell).
Normality

The outcome scores within each cell are approximately Normally distributed.
Equal variances

The population variances are roughly equal across cells (homogeneity of variance).
Balanced design (for our manual calculations)

In this chapter, we assume equal sample sizes in each cell (for example, 25 participants per Training × Context combination). This greatly simplifies the sums of squares and matches the PyStatsV1 implementation.
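The normality and equal-variance assumptions can be screened with standard SciPy tests. The sketch below uses Shapiro-Wilk within each cell and Levene's test across the four cells. The data are simulated with an arbitrary seed, mean, and SD, and these checks are an illustration, not something the chapter's script is known to run.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)  # arbitrary seed
cells = [
    ("control", "low_stress"), ("control", "high_stress"),
    ("cbt", "low_stress"), ("cbt", "high_stress"),
]
# Hypothetical data: 25 scores per cell, all drawn with the same mean and SD.
scores = {cell: rng.normal(20.0, 4.0, 25) for cell in cells}

# Normality: Shapiro-Wilk test within each cell.
normality_p = {cell: float(stats.shapiro(x).pvalue) for cell, x in scores.items()}

# Equal variances: Levene's test across all cells (median-centred version).
levene_result = stats.levene(*scores.values(), center="median")
levene_p = float(levene_result.pvalue)

for cell, p in normality_p.items():
    print(f"Shapiro-Wilk for {cell}: p = {p:.3f}")
print(f"Levene across cells: p = {levene_p:.3f}")
```

Small p-values point to a possible violation. With 25 scores per cell these tests have limited power, so it is worth also looking at histograms and the cell standard deviations.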

Warning

Balanced vs. Unbalanced Designs

The manual calculations and PyStatsV1 helpers in this chapter assume a balanced design with equal sample sizes in every cell.

If sample sizes differ (unbalanced), the factors become correlated, the sums of squares no longer add up to \(SS_{\text{Total}}\) in a unique way, and they must be computed using more advanced methods (for example, Type III sums of squares). Professional software (such as SPSS, SAS, or R packages) can compute these, although packages differ in which type of sums of squares they report by default. Our hand-calculation formulas and simple code do not handle this case.

For this reason, when you experiment with the Chapter 13 script, keep the cell sizes equal. If you need to analyze an unbalanced design in real research, use dedicated statistical software and pay close attention to how it defines and reports sums of squares.
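A quick way to guard against accidental imbalance is to count the rows in each cell before running the analysis. The sketch below assumes the data are a list of dictionaries with training and context keys. The helpers cell_sizes and is_balanced are hypothetical names, not PyStatsV1 functions.

```python
from collections import Counter
from itertools import product


def cell_sizes(rows):
    """Count participants in each (training, context) cell."""
    return Counter((row["training"], row["context"]) for row in rows)


def is_balanced(rows, expected_cells):
    """True when every expected cell is non-empty and all cells have the same n."""
    counts = cell_sizes(rows)
    sizes = {counts.get(cell, 0) for cell in expected_cells}
    return len(sizes) == 1 and 0 not in sizes


expected_cells = list(product(("control", "cbt"), ("low_stress", "high_stress")))

# Toy data: 3 participants per cell; the stress scores do not matter for this check.
balanced_rows = [
    {"training": training, "context": context, "stress_score": 20.0}
    for training, context in expected_cells
    for _ in range(3)
]
unbalanced_rows = balanced_rows[:-1]  # drop one participant from the last cell

print(is_balanced(balanced_rows, expected_cells))
print(is_balanced(unbalanced_rows, expected_cells))
```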

PyStatsV1 Lab: Two-Way ANOVA on Stress Scores

In this lab, you will analyze a simulated 2 × 2 factorial experiment with:

Factor A: training (control vs cbt),
Factor B: context (low_stress vs high_stress),
Dependent variable: stress_score.

You will:

simulate a balanced dataset with the same \(n\) in each cell,
compute:
- cell means and sample sizes,
- marginal means for each level of Training and Context,
- sums of squares \(SS_A\), \(SS_B\), \(SS_{AB}\), \(SS_{\text{Within}}\), \(SS_{\text{Total}}\),
- corresponding degrees of freedom and mean squares,
- F-statistics and p-values for each main effect and the interaction,
- eta-squared style effect sizes for each effect,
visualize the interaction with a simple line plot of cell means,
optionally compute a small set of simple main effects (for example, Training within each Context) when the interaction is statistically significant.

All code for this lab lives in:

scripts/psych_ch13_two_way_anova.py

and the script can optionally write outputs to:

data/synthetic/psych_ch13_two_way_stress.csv
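To give a feel for the first step, here is a minimal sketch of simulating a balanced dataset with the three columns named above. It is not the code in the lab script: the function name, seed, population means, and SD are assumptions, and the CSV is written to an in-memory buffer, not to the path shown above.

```python
import csv
import io

import numpy as np


def simulate_balanced(pop_means, sd, n_per_cell, seed):
    """Draw n_per_cell Normal scores for every (training, context) cell."""
    rng = np.random.default_rng(seed)
    rows = []
    for (training, context), mu in pop_means.items():
        for score in rng.normal(mu, sd, n_per_cell):
            rows.append({
                "training": training,
                "context": context,
                "stress_score": round(float(score), 2),
            })
    return rows


# Hypothetical population means for the four cells.
pop_means = {
    ("control", "low_stress"): 18.0,
    ("control", "high_stress"): 24.0,
    ("cbt", "low_stress"): 17.0,
    ("cbt", "high_stress"): 19.0,
}
rows = simulate_balanced(pop_means, sd=4.0, n_per_cell=25, seed=13)

# Write the rows as CSV text (in memory here; a real script would open a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["training", "context", "stress_score"])
writer.writeheader()
writer.writerows(rows)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines[0])
print(len(rows), "rows")
```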

Running the Lab Script

From the project root, you can run:

python -m scripts.psych_ch13_two_way_anova

If your Makefile defines a convenience target, you can instead run:

make psych-ch13

This will:

simulate a balanced 2 × 2 Training × Context dataset,
print the cell means and sample sizes,
compute the two-way ANOVA table with F-tests for:
- main effect of Training,
- main effect of Context,
- Training × Context interaction,
report eta-squared style effect sizes for each effect,
draw (or save) a simple interaction plot of mean stress by Context, with separate lines for each Training condition,
optionally compute and print simple main effects (for example, Training within low_stress and within high_stress) when the interaction is statistically significant.

Expected Console Output

Your exact numbers will vary if you change the seed or parameters, but with the default settings you might see output like:

Two-way ANOVA on stress scores (Training × Context)
---------------------------------------------------
Cell means (n per cell = 25):
  control, low_stress    mean = 17.9
  control, high_stress   mean = 23.4
  cbt,     low_stress    mean = 16.8
  cbt,     high_stress   mean = 18.9

ANOVA table:
  SS_A  (Training)       =  95.21, df_A  = 1, MS_A  = 95.21,  F_A  =  8.41, p_A  = 0.005
  SS_B  (Context)        = 640.37, df_B  = 1, MS_B  = 640.37, F_B  = 56.53, p_B  < 0.001
  SS_AB (Interaction)    = 118.94, df_AB = 1, MS_AB = 118.94, F_AB = 10.50, p_AB = 0.002
  SS_within              = 1087.42, df_within = 96, MS_within = 11.33
  SS_total               = 1941.94, df_total  = 99

Effect sizes (eta-squared style):
  eta^2_Training    = 0.049
  eta^2_Context     = 0.330
  eta^2_Interaction = 0.061

Simple main effects (because interaction is significant):
  Training within low_stress:  t(48) = 0.82,  p = 0.416
  Training within high_stress: t(48) = 2.78,  p = 0.008

Interaction plot saved to: outputs/track_b/ch13_training_by_context.png

Focus on:

Cell means and lines in the interaction plot: are the Training lines parallel, spreading, or crossing?
Main effects: Are there overall differences between Training conditions, or between Contexts, when averaging across the other factor?
Interaction: Does the Training effect depend on Context? Are the differences between control and cbt larger in one context than the other?
Simple main effects: If the interaction is significant, do follow-up tests show that Training matters only under high stress, or in both contexts?
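When reading a table like this, it helps to confirm that the rows fit together. The sketch below takes the SS values from the sample output above and checks that the components sum to the total, recomputes the within-cell mean square, and recomputes the eta-squared values as SS divided by total SS. The variable names are illustrative only.

```python
# Values copied from the sample ANOVA table above.
ss = {"Training": 95.21, "Context": 640.37, "Interaction": 118.94, "Within": 1087.42}
ss_total = 1941.94
df_within = 96

# 1. The four components should add up to the total SS (balanced design).
components_sum = sum(ss.values())

# 2. The within-cell mean square is SS_within divided by its degrees of freedom.
ms_within = ss["Within"] / df_within

# 3. Eta-squared for each effect is its SS as a proportion of the total SS.
eta_sq = {name: ss[name] / ss_total for name in ("Training", "Context", "Interaction")}

print(f"sum of components = {components_sum:.2f} (total = {ss_total:.2f})")
print(f"MS_within = {ms_within:.2f}")
for name, value in eta_sq.items():
    print(f"eta^2_{name} = {value:.3f}")
```

The same habit works on your own output: if the components do not sum to the total, check first whether the cell sizes are equal.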

Your Turn: Practice Scenarios

As in earlier chapters, you can experiment by editing parameters in psych_ch13_two_way_anova.py. Some ideas:

Create a pure main-effect scenario

Make CBT slightly better than control in both contexts by the same amount. What happens to the Training main effect and the interaction?
Create a spreading interaction

Make Training have little effect under low_stress but a strong effect under high_stress. How does this change the interaction plot and the F for the interaction?
Create a crossover interaction

Make control slightly better under low_stress but cbt clearly better under high_stress. Can the overall Training main effect be small or even misleading, while the interaction is large?
Change the within-cell variability

Increase the standard deviation of the simulated scores. Watch how \(MS_{\text{Within}}\) grows and the F-statistics shrink even if the cell means stay the same.
(Do not break the balance!)

You can change the shared n_per_cell parameter (for example, 20 instead of 25), but resist the temptation to give different cells different sample sizes. Our manual formulas and PyStatsV1 helpers assume equal sample sizes in each cell. For unbalanced designs, you will need more advanced tools (for example, Type III sums of squares in specialized software).
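Before editing the script, you can preview what a scenario should do with a little arithmetic on planned cell means. The sketch below computes the between-cell sums of squares for a made-up crossover pattern. It then shows roughly how the interaction F would shrink as the within-cell standard deviation grows, under the simplifying assumption that \(MS_{\text{Within}}\) equals the squared SD exactly (in a real sample it only estimates it). The function name and all numbers are hypothetical.

```python
import numpy as np


def ss_from_cell_means(cell_means, n_per_cell):
    """SS for A, B, and AB from a table of cell means (rows = Training, columns = Context)."""
    m = np.asarray(cell_means, dtype=float)
    a, b = m.shape
    grand = m.mean()
    mean_a = m.mean(axis=1)   # Training marginal means
    mean_b = m.mean(axis=0)   # Context marginal means
    ss_a = n_per_cell * b * np.sum((mean_a - grand) ** 2)
    ss_b = n_per_cell * a * np.sum((mean_b - grand) ** 2)
    ss_ab = n_per_cell * np.sum((m - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    return {"A": float(ss_a), "B": float(ss_b), "AB": float(ss_ab)}


# Made-up crossover pattern: control is lower at low stress, cbt is lower at high stress.
crossover = [[16.0, 24.0],   # control: low_stress, high_stress
             [18.0, 20.0]]   # cbt:     low_stress, high_stress
ss = ss_from_cell_means(crossover, n_per_cell=25)
print(ss)

# In a 2 x 2 design df_AB = 1, so MS_AB equals SS_AB.
f_by_sd = {sd: ss["AB"] / sd ** 2 for sd in (2.0, 4.0, 8.0)}
for sd, f_ab in f_by_sd.items():
    print(f"within-cell SD = {sd}: approximate F_AB = {f_ab:.2f}")
```

In this pattern the Training sum of squares is small next to the interaction sum of squares, which is the situation the crossover scenario asks you to think about.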

Summary

In this chapter you learned:

why psychologists often use factorial designs with more than one independent variable,
how to interpret main effects as overall differences for each factor,
how to interpret interactions as “it depends” or difference-of-differences effects, often revealed by non-parallel lines in an interaction plot,
why simple main effects are useful follow-ups when interactions are present,
how the two-way ANOVA partitions variance into main effects, interaction, and error for a balanced design,
how to implement a two-way ANOVA and basic simple main-effects analyses using PyStatsV1.

In the bigger arc:

Chapter 10 introduced independent-samples t-tests for two groups.
Chapter 11 introduced paired-samples t-tests for within-subjects designs.
Chapter 12 extended the between-subjects logic to three or more groups using one-way ANOVA.
Chapter 13 generalizes the ANOVA framework to factorial designs, where more than one independent variable is manipulated at the same time.

Factorial designs are powerful tools. They let you ask richer questions about how psychological processes behave across different contexts, and they prepare you for even more complex models (mixed designs, ANCOVA, and beyond) in later chapters.

For the full Python implementation, see scripts/psych_ch13_two_way_anova.py in the PyStatsV1 GitHub repository.