Chapter 13 – Factorial Designs and the Two-Way ANOVA 
===================================================

In Chapters 10–12 you learned how to compare two or more groups on **one**
independent variable (IV):

* independent-samples *t*-tests for two groups (between-subjects),
* paired-samples *t*-tests for repeated measures (within-subjects),
* one-way ANOVA for three or more groups on a single factor.

In real psychological research, however, we rarely care about just one factor
at a time. We ask questions like:

* Does a training program help **more** under high stress than low stress?
* Does a therapy work **better for some age groups than others**?
* Does feedback style interact with **time pressure** to affect performance?

These questions involve **more than one independent variable**. The standard
tool is the **factorial design**, analyzed with a **two-way ANOVA** (for two
factors).

Our goals in this chapter are to help you:

* understand the logic of factorial designs and the 2 × 2 notation,
* distinguish **main effects** from **interactions**,
* interpret interactions as **"differences of differences"**,
* see examples of spreading vs. crossover interactions,
* appreciate the idea of **simple main effects** as follow-up tests, and
* run and interpret a two-way ANOVA using PyStatsV1 on a balanced design.

Design Logic: Two Factors at Once
---------------------------------

We will use a simple 2 × 2 example throughout.

Suppose a lab studies a stress-management training program. Participants are
randomly assigned to:

* **Training** factor (Factor A)
  - ``control`` – no training
  - ``cbt`` – brief cognitive–behavioral training

and then complete a challenging task either under:

* **Context** factor (Factor B)
  - ``low_stress`` – quiet room, no time pressure
  - ``high_stress`` – noisy room, strict time pressure

The dependent variable is a continuous ``stress_score`` (higher = more stress).

This design has **two factors** (Training and Context), each with **two
levels**, so we call it a **2 × 2 factorial design**.

Notation for Factorial Designs
------------------------------

A shorthand like **2 × 2** tells you:

* **the number of levels of each factor**, and
* how many experimental **cells** there are.

Examples:

* 2 × 2

  * Factor A has 2 levels, Factor B has 2 levels.
  * Total cells: :math:`2 \times 2 = 4` (control/low, control/high, cbt/low,
    cbt/high).

* 2 × 3

  * Factor A has 2 levels, Factor B has 3 levels.
  * Total cells: :math:`2 \times 3 = 6`.

* 2 × 2 × 2

  * Three factors, each with 2 levels.
  * Total cells: :math:`2 \times 2 \times 2 = 8`.

In this chapter we focus on the **two-way** case (two factors). The ideas
generalize to more complex designs, but two-way ANOVA is the workhorse for
most undergraduate research projects.

Main Effects
------------

A **main effect** is the overall effect of **one factor, averaging over the
levels of the other factor**.

* The **main effect of Training** asks:

  > On average across both Context conditions, do participants in ``cbt``
  > differ from those in ``control``?

* The **main effect of Context** asks:

  > On average across both Training conditions, do participants in
  > ``high_stress`` differ from those in ``low_stress``?

We compute **marginal means** to answer these questions. For example:

* Mean stress in ``control``:
  average of (control, low_stress) and (control, high_stress).
* Mean stress in ``cbt``:
  average of (cbt, low_stress) and (cbt, high_stress).

If those marginal means differ, there is evidence of a main effect for that
factor.

Interactions: The “It Depends” Effect
-------------------------------------

The most important idea in factorial designs is the **interaction**.

An **interaction** occurs when **the effect of one factor depends on the level
of the other factor**.

In our example, we ask:

*Does the benefit of CBT training depend on whether the context is
low-stress or high-stress?*

Mathematically, we can think of an interaction as a **difference of
differences**.

Let:

* :math:`\bar{X}_{\text{control, low}}` be the mean stress for control/low,
* :math:`\bar{X}_{\text{control, high}}` for control/high,
* :math:`\bar{X}_{\text{cbt, low}}` for cbt/low,
* :math:`\bar{X}_{\text{cbt, high}}` for cbt/high.

Compute the Training effect within each Context:

.. math::

   \text{Training effect at low stress}
   = \bar{X}_{\text{control, low}} - \bar{X}_{\text{cbt, low}},

.. math::

   \text{Training effect at high stress}
   = \bar{X}_{\text{control, high}} - \bar{X}_{\text{cbt, high}}.

The **interaction** for Training × Context is the difference between these
two effects:

.. math::

   \text{Interaction (difference of differences)}
   = (\bar{X}_{\text{control, high}} - \bar{X}_{\text{cbt, high}})
     - (\bar{X}_{\text{control, low}} - \bar{X}_{\text{cbt, low}}).

If that difference of differences is zero (within random error), we say there
is **no interaction**. If it is clearly non-zero, we have an interaction: the
effect of training changes across contexts.

Graphical View: Non-Parallel Lines
----------------------------------

Interactions are often easiest to see in a **line graph**:

* Put Context on the x-axis (low vs high).
* Plot mean stress for each Training condition as a separate line.

Then:

* If the lines are **parallel**, the Training effect is similar at low and
  high stress → no interaction (or only a trivial one).
* If the lines **spread apart**, **converge**, or **cross**, the Training
  effect changes with Context → interaction.

Two common patterns:

* **Spreading interaction**

  Training helps under high stress but has little effect under low stress.
  The lines diverge as you move from low to high stress.

* **Crossover interaction**

  Control performs better in low stress, but CBT performs better in high
  stress (or vice versa). The lines literally cross.

Simple Main Effects
-------------------

When an interaction is present, main effects can be hard to interpret on their
own.

For example, suppose CBT reduces stress **only** in the high-stress context.
The overall (marginal) means for Training might still show a "modest" effect,
even though the real story is:

* CBT ≈ control under low stress,
* CBT < control under high stress.

To unpack this, researchers examine **simple main effects**:

* the effect of one factor **at a single level of the other factor**.

Examples:

* Simple effect of Training **within low_stress**:
  compare control vs cbt using only low-stress participants.
* Simple effect of Training **within high_stress**:
  compare control vs cbt using only high-stress participants.
* Simple effect of Context **within control**:
  compare low vs high stress among control participants only.

In practice, simple main effects are often tested with *t*-tests or
one-way ANOVAs conducted **within** a subset of the data, sometimes combined
with Bonferroni or other corrections for multiple tests.

In the PyStatsV1 lab for this chapter, the two-way ANOVA is the primary
analysis. For pedagogical purposes, the script also shows how to compute a few
simple main effects (for example, Training within each Context) using
independent-samples *t*-tests when the interaction is statistically
significant.

The Two-Way ANOVA: Partitioning Variance
----------------------------------------

Factorial ANOVA extends the one-way ANOVA logic from Chapter 12. We still
partition the total variability in the outcome into meaningful components.

Let:

* Factor A = Training (2 levels),
* Factor B = Context (2 levels),
* :math:`Y_{ijk}` be the score for person :math:`k` in cell :math:`(i, j)`,
  where :math:`i` indexes levels of A and :math:`j` indexes levels of B,
* :math:`n_{ij}` be the number of participants in cell :math:`(i, j)`,
* :math:`\bar{Y}_{ij}` be the **cell mean** for cell :math:`(i, j)`,
* :math:`\bar{Y}_{i\cdot}` be the marginal mean for level :math:`i` of A,
* :math:`\bar{Y}_{\cdot j}` be the marginal mean for level :math:`j` of B,
* :math:`\bar{Y}_{\cdot\cdot}` be the **grand mean** across all participants.

Then the total sum of squares can be written as:

.. math::

   SS_{\text{Total}}
   = \sum_{i}\sum_{j}\sum_{k}
     (Y_{ijk} - \bar{Y}_{\cdot\cdot})^2.

For a **balanced** 2 × 2 design (equal :math:`n_{ij}` in each cell), we can
decompose this into:

.. math::

   SS_{\text{Total}} =
   SS_{A} + SS_{B} + SS_{AB} + SS_{\text{Within}},

where:

* :math:`SS_A` captures the main effect of Training,
* :math:`SS_B` captures the main effect of Context,
* :math:`SS_{AB}` captures the interaction,
* :math:`SS_{\text{Within}}` is the within-cell (error) variability.

*Note: For these components to be additive (A + B + AB), the design must be balanced.*

Each component has associated degrees of freedom (df) and a mean square (MS)
obtained by dividing SS by its df. The *F*-tests for each effect are:

.. math::

   F_A = \frac{MS_A}{MS_{\text{Within}}}, \quad
   F_B = \frac{MS_B}{MS_{\text{Within}}}, \quad
   F_{AB} = \frac{MS_{AB}}{MS_{\text{Within}}}.

Effect sizes (for example, :math:`\eta^2` for each effect) can be computed as
the proportion of total variance associated with each SS component. As with
one-way ANOVA, these sample-based measures tend to slightly overestimate the
population effect sizes, but they are helpful descriptive summaries.

Assumptions in the Two-Way ANOVA
--------------------------------

The classical two-way ANOVA relies on similar assumptions to the one-way case:

* **Independence**

  Observations are independent within and across cells (for example, each
  participant appears in only one cell).

* **Normality**

  The outcome scores within each cell are approximately Normally distributed.

* **Equal variances**

  The population variances are roughly equal across cells (homogeneity of
  variance).

* **Balanced design (for our manual calculations)**

  In this chapter, we assume **equal sample sizes in each cell** (for example,
  25 participants per Training × Context combination). This greatly simplifies
  the sums of squares and matches the PyStatsV1 implementation.

.. warning:: Balanced vs. Unbalanced Designs

   The manual calculations and PyStatsV1 helpers in this chapter assume a
   **balanced design** with equal sample sizes in every cell.

   If sample sizes differ (unbalanced), the factors become correlated and
   sums of squares must be computed using more advanced methods (for example,
   Type III sums of squares). Professional software (such as SPSS, SAS, or
   R packages) handles this automatically, but our hand-calculation formulas
   and simple code do **not**.

   For this reason, when you experiment with the Chapter 13 script, keep the
   cell sizes equal. If you need to analyze an unbalanced design in real
   research, use dedicated statistical software and pay close attention to
   how it defines and reports sums of squares.

PyStatsV1 Lab: Two-Way ANOVA on Stress Scores
---------------------------------------------

In this lab, you will analyze a simulated 2 × 2 factorial experiment with:

* Factor A: ``training`` (``control`` vs ``cbt``),
* Factor B: ``context`` (``low_stress`` vs ``high_stress``),
* Dependent variable: ``stress_score``.

You will:

* simulate a balanced dataset with the same :math:`n` in each cell,
* compute:

  - cell means and sample sizes,
  - marginal means for each level of Training and Context,
  - sums of squares :math:`SS_A`, :math:`SS_B`, :math:`SS_{AB}`,
    :math:`SS_{\text{Within}}`, :math:`SS_{\text{Total}}`,
  - corresponding degrees of freedom and mean squares,
  - *F*-statistics and *p*-values for each main effect and the interaction,
  - eta-squared style effect sizes for each effect,

* visualize the interaction with a simple line plot of cell means,
* optionally compute a small set of **simple main effects** (for example,
  Training within each Context) when the interaction is statistically
  significant.

All code for this lab lives in:

* ``scripts/psych_ch13_two_way_anova.py``

and the script can optionally write outputs to:

* ``data/synthetic/psych_ch13_two_way_stress.csv``

Running the Lab Script
----------------------

From the project root, you can run:

.. code-block:: bash

   python -m scripts.psych_ch13_two_way_anova

If your Makefile defines a convenience target, you can instead run:

.. code-block:: bash

   make psych-ch13

This will:

* simulate a balanced 2 × 2 Training × Context dataset,
* print the cell means and sample sizes,
* compute the two-way ANOVA table with *F*-tests for:

  - main effect of Training,
  - main effect of Context,
  - Training × Context interaction,

* report eta-squared style effect sizes for each effect,
* draw (or save) a simple interaction plot of mean stress by Context, with
  separate lines for each Training condition,
* optionally compute and print simple main effects (for example, Training
  within low_stress and within high_stress) when the interaction is
  statistically significant.

Expected Console Output
-----------------------

Your exact numbers will vary if you change the seed or parameters, but with the
default settings you might see output like:

.. code-block:: text

   Two-way ANOVA on stress scores (Training × Context)
   ---------------------------------------------------
   Cell means (n per cell = 25):
     control, low_stress    mean = 17.9
     control, high_stress   mean = 23.4
     cbt,     low_stress    mean = 16.8
     cbt,     high_stress   mean = 18.9

   ANOVA table:
     SS_A  (Training)       =  95.21, df_A  = 1, MS_A  = 95.21,  F_A  =  4.10, p_A  = 0.046
     SS_B  (Context)        = 640.37, df_B  = 1, MS_B  = 640.37, F_B  = 27.56, p_B  < 0.001
     SS_AB (Interaction)    = 118.94, df_AB = 1, MS_AB = 118.94, F_AB =  5.11, p_AB = 0.026
     SS_within              = 1087.42, df_within = 96, MS_within = 11.33
     SS_total               = 1941.94, df_total  = 99

   Effect sizes (eta-squared style):
     eta^2_Training    = 0.049
     eta^2_Context     = 0.330
     eta^2_Interaction = 0.061

   Simple main effects (because interaction is significant):
     Training within low_stress:  t(48) = 0.82,  p = 0.416
     Training within high_stress: t(48) = 2.78,  p = 0.008

   Interaction plot saved to: outputs/track_b/ch13_training_by_context.png

Focus on:

* **Cell means and lines** in the interaction plot:
  are the Training lines parallel, spreading, or crossing?
* **Main effects**: Are there overall differences between Training
  conditions, or between Contexts, when averaging across the other factor?
* **Interaction**: Does the Training effect depend on Context? Are the
  differences between control and cbt larger in one context than the other?
* **Simple main effects**: If the interaction is significant, do follow-up
  tests show that Training matters only under high stress, or in both contexts?

Your Turn: Practice Scenarios
-----------------------------

As in earlier chapters, you can experiment by editing parameters in
``psych_ch13_two_way_anova.py``. Some ideas:

* **Create a pure main-effect scenario**

  Make CBT slightly better than control in **both** contexts by the same
  amount. What happens to the Training main effect and the interaction?

* **Create a spreading interaction**

  Make Training have little effect under low_stress but a strong effect under
  high_stress. How does this change the interaction plot and the *F* for the
  interaction?

* **Create a crossover interaction**

  Make control slightly better under low_stress but cbt clearly better under
  high_stress. Can the overall Training main effect be small or even
  misleading, while the interaction is large?

* **Change the within-cell variability**

  Increase the standard deviation of the simulated scores. Watch how
  :math:`MS_{\text{Within}}` grows and the *F*-statistics shrink even if the
  cell means stay the same.

* **(Do not break the balance!)**

  You can change the **shared** ``n_per_cell`` parameter (for example, 20
  instead of 25), but resist the temptation to give different cells different
  sample sizes. Our manual formulas and PyStatsV1 helpers assume equal sample
  sizes in each cell. For unbalanced designs, you will need more advanced
  tools (for example, Type III sums of squares in specialized software).

Summary
-------

In this chapter you learned:

* why psychologists often use **factorial designs** with more than one
  independent variable,
* how to interpret **main effects** as overall differences for each factor,
* how to interpret **interactions** as "it depends" or **difference of
  differences** effects, often revealed by non-parallel lines in an
  interaction plot,
* why simple main effects are useful follow-ups when interactions are present,
* how the two-way ANOVA partitions variance into main effects, interaction,
  and error for a balanced design,
* how to implement a two-way ANOVA and basic simple main-effects analyses
  using PyStatsV1.

In the bigger arc:

* Chapter 10 introduced independent-samples *t*-tests for two groups.
* Chapter 11 introduced paired-samples *t*-tests for within-subjects designs.
* Chapter 12 extended the between-subjects logic to **three or more groups**
  using one-way ANOVA.
* Chapter 13 generalizes the ANOVA framework to **factorial designs**, where
  more than one independent variable is manipulated at the same time.

Factorial designs are powerful tools. They let you ask richer questions about
how psychological processes behave across different contexts, and they prepare
you for even more complex models (mixed designs, ANCOVA, and beyond) in later
chapters.

For the full Python implementation, see
``scripts/psych_ch13_two_way_anova.py`` in the PyStatsV1 GitHub repository.