Chapter 17 – Mixed-Model Designs
================================

Learning goals
--------------

In this chapter you will learn how to:

* describe the logic of *mixed-model* (split-plot) designs,
* distinguish between **between-subjects** and **within-subjects** factors,
* understand why mixed designs have different error terms for different effects,
* interpret a mixed ANOVA table (Group, Time, and Group × Time),
* and use :mod:`pystatsv1` and :mod:`pingouin` to analyze a simple treatment study.

By the end of the chapter, you should be able to read a mixed ANOVA output and
explain, in plain language, what each line means for a psychology research question.

17.1 The hybrid design: between-subjects + within-subjects
----------------------------------------------------------

So far in Track B you have seen:

* **Between-subjects** designs (Chapters 10, 12, 13), where each participant
  belongs to one condition only; and
* **Within-subjects** designs (Chapters 11 and 14), where the *same* participants
  are measured repeatedly (e.g., Pre, Post, Follow-up).

A **mixed-model design** combines these two ideas. A classic example is a
treatment study where:

* people are randomly assigned to a **Group** (Treatment vs Control), and
* everyone is measured at multiple **Time** points (Pre, Post, Follow-up).

.. note::

   In psychology, mixed designs are extremely common. Any longitudinal study that
   compares two or more groups over time is likely to be a mixed model.

Terminology
~~~~~~~~~~~

**Between-subjects factor**
    A factor where different participants belong to different levels.
    In this chapter, ``group`` (``"control"`` vs ``"treatment"``) is a
    between-subjects factor.

**Within-subjects factor**
    A factor where each participant is measured at *each* level.
    In this chapter, ``time`` (``"pre"``, ``"post"``, ``"followup"``) is a
    within-subjects factor.

**Mixed design**
    A design that includes at least one between-subjects factor and at least one
    within-subjects factor. Sometimes called a *split-plot* design.

Why use a mixed design?
~~~~~~~~~~~~~~~~~~~~~~~

Mixed designs give you the best of both worlds:

* You can look at **group differences** (Treatment vs Control).
* You can look at **change over time** (Pre vs Post vs Follow-up).
* You can test whether the **pattern of change over time is different for each
  group** (the Group × Time *interaction*).

This is usually the main scientific question:

*Did the treatment group improve more over time than the control group?*

17.2 The split-plot logic and error terms
-----------------------------------------

In a pure between-subjects ANOVA (Chapter 13), all error comes from
**differences between participants** within each condition.

In a pure repeated-measures ANOVA (Chapter 14), a lot of that individual
difference error is removed because each person acts as their own control.

In a **mixed** design, we have both types of variation:

* differences **between participants** (some students are generally more anxious
  than others), and
* differences **within participants over time** (everyone may change from Pre to
  Post to Follow-up).

A mixed ANOVA therefore has different error terms for different effects:

* The **Group** effect (Treatment vs Control) uses an error term based on
  *between-subject* variability.
* The **Time** effect and the **Group × Time interaction** use an error term
  based on *within-subject* variability.

You do **not** have to compute these error terms by hand in this chapter.
Instead, we focus on:

* understanding the design,
* structuring the data correctly,
* and learning how to read the output from a trusted library (here,
  :mod:`pingouin`).

17.3 Example: Treatment vs control across three time points
-----------------------------------------------------------

We will work with a simple, synthetic example that mimics a common clinical
psychology design.

Scenario
~~~~~~~~

A clinical psychologist wants to test whether a new cognitive-behavioural program
reduces anxiety compared to a waitlist control group.

* Participants are randomly assigned to **Treatment** or **Control**.
* Everyone completes an anxiety scale at three time points:

  * ``pre`` – before treatment starts,
  * ``post`` – immediately after treatment, and
  * ``followup`` – three months later.

Hypotheses
~~~~~~~~~~

* **Group main effect**:

  *H0*: There is no overall difference in anxiety between Treatment and Control.

  *H1*: One group has higher average anxiety than the other.

* **Time main effect**:

  *H0*: Average anxiety is the same at Pre, Post, and Follow-up.

  *H1*: Average anxiety changes over time (e.g., decreases after treatment).

* **Group × Time interaction** (the most important):

  *H0*: The pattern of change over time is the same for both groups.

  *H1*: The pattern of change over time is *different* (e.g., Treatment improves
  more from Pre to Post and maintains gains at Follow-up).

Data structure
~~~~~~~~~~~~~~

As in Chapter 14, we will work with both **wide** and **long** formats.

*Wide format* (one row per participant)::

    subject      group       anxiety_pre   anxiety_post   anxiety_followup
    -----------------------------------------------------------------------
    control_01   control     52.3          50.1           48.7
    control_02   control     47.8          49.2           48.9
    treat_01     treatment   51.9          40.6           38.2
    ...

*Long format* (one row per person *per time point*)::

    subject      group       time       anxiety
    --------------------------------------------
    control_01   control     pre        52.3
    control_01   control     post       50.1
    control_01   control     followup   48.7
    treat_01     treatment   pre        51.9
    treat_01     treatment   post       40.6
    treat_01     treatment   followup   38.2
    ...

Most mixed ANOVA functions (including :func:`pingouin.mixed_anova`) expect the
**long** format, with columns that identify:

* the **subject** (``subject``),
* the **within-subjects factor** (``time``),
* the **between-subjects factor** (``group``), and
* the **dependent variable** (``anxiety``).

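
The wide-to-long reshape described above can be sketched with :mod:`pandas`.
This is a minimal illustration, not the lab script's actual code: it assumes
:mod:`pandas` is installed, uses only the three example rows shown in this
section, and labels subjects in the long-format style (``control_02`` is a
hypothetical label for the second control participant)::

    import pandas as pd

    # Wide format: one row per participant (values taken from the example table).
    wide_df = pd.DataFrame(
        {
            "subject": ["control_01", "control_02", "treat_01"],
            "group": ["control", "control", "treatment"],
            "anxiety_pre": [52.3, 47.8, 51.9],
            "anxiety_post": [50.1, 49.2, 40.6],
            "anxiety_followup": [48.7, 48.9, 38.2],
        }
    )

    # Long format: one row per participant per time point.
    long_df = wide_df.melt(
        id_vars=["subject", "group"],
        value_vars=["anxiety_pre", "anxiety_post", "anxiety_followup"],
        var_name="time",
        value_name="anxiety",
    )

    # Turn "anxiety_pre" into "pre", and so on, then order the time points
    # chronologically rather than alphabetically.
    long_df["time"] = long_df["time"].str.replace("anxiety_", "", regex=False)
    long_df["time"] = pd.Categorical(
        long_df["time"], categories=["pre", "post", "followup"], ordered=True
    )
    long_df = long_df.sort_values(["subject", "time"]).reset_index(drop=True)

    print(long_df)

Making ``time`` an ordered categorical is a design choice for readability: it
keeps tables and plots in Pre, Post, Follow-up order instead of sorting the
labels alphabetically.
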
17.4 PyStatsV1 lab – structuring and analyzing a mixed design
-------------------------------------------------------------

The Chapter 17 lab is implemented in the script
:mod:`scripts.psych_ch17_mixed_models`. It has three main responsibilities:

1. **Simulate a mixed design dataset** where Treatment improves more than Control
   over time.
2. **Reshape the data** into both wide and long formats.
3. **Run a mixed ANOVA** with :mod:`pingouin` and save clean outputs for students
   to inspect.

Simulating the data
~~~~~~~~~~~~~~~~~~~

The function :func:`simulate_mixed_design_dataset` constructs a dataset with:

* two groups (``"control"`` and ``"treatment"``),
* three time points (``"pre"``, ``"post"``, ``"followup"``),
* and an anxiety outcome designed so that:

  * both groups start with similar anxiety at ``pre``,
  * the treatment group shows a strong drop from ``pre`` to ``post``, and
  * the control group changes very little.

To keep the lab deterministic and testable, the simulation uses a fixed random
seed by default.

Running the mixed ANOVA
~~~~~~~~~~~~~~~~~~~~~~~

The core analysis uses :func:`pingouin.mixed_anova` applied to the long-format
data::

    import pingouin as pg

    # long_df: long-format DataFrame, one row per subject per time point
    aov = pg.mixed_anova(
        data=long_df,
        dv="anxiety",        # dependent variable
        within="time",       # within-subjects factor
        between="group",     # between-subjects factor
        subject="subject",   # participant identifier
    )

The resulting table contains separate rows for:

* the **Group** main effect,
* the **Time** main effect, and
* the **Group × Time** interaction.

For each effect, you get:

* degrees of freedom (``DF1``, ``DF2``),
* F-statistic (``F``),
* p-value (``p-unc``),
* and effect size metrics such as partial eta-squared (``np2``).

When the simulation is working correctly, you should see:

* a small to modest **Group** main effect,
* a clear **Time** main effect (participants change over time), and
* a strong **Group × Time** interaction (Treatment improves more than Control).

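
To make the two error terms from Section 17.2 concrete, the following sketch
partitions the total sum of squares for a tiny balanced design by hand, using
only the standard library. The scores are hypothetical values chosen for
illustration (they are not output from the lab script), and the formulas assume
equal group sizes; in practice you would let :mod:`pingouin` do this work::

    from statistics import fmean

    # Hypothetical scores: scores[group] is a list of subjects,
    # and each subject is [pre, post, followup].
    scores = {
        "control": [[52, 50, 49], [48, 49, 49], [50, 51, 50]],
        "treatment": [[52, 41, 38], [49, 40, 39], [51, 42, 40]],
    }

    g = len(scores)                 # number of groups
    n = len(scores["control"])      # subjects per group (balanced design)
    k = len(scores["control"][0])   # number of time points
    N = g * n                       # total number of subjects

    all_values = [x for subs in scores.values() for row in subs for x in row]
    grand = fmean(all_values)
    group_mean = {
        grp: fmean(x for row in subs for x in row) for grp, subs in scores.items()
    }
    time_mean = [
        fmean(row[t] for subs in scores.values() for row in subs) for t in range(k)
    ]
    cell_mean = {
        grp: [fmean(row[t] for row in subs) for t in range(k)]
        for grp, subs in scores.items()
    }

    # Effects
    ss_group = n * k * sum((m - grand) ** 2 for m in group_mean.values())
    ss_time = N * sum((m - grand) ** 2 for m in time_mean)
    ss_inter = n * sum(
        (cell_mean[grp][t] - group_mean[grp] - time_mean[t] + grand) ** 2
        for grp in scores
        for t in range(k)
    )

    # Between-subject error: subjects differ from their own group mean.
    ss_subj = k * sum(
        (fmean(row) - group_mean[grp]) ** 2
        for grp, subs in scores.items()
        for row in subs
    )

    # Within-subject error: what is left after removing cell and subject effects.
    ss_within = sum(
        (row[t] - cell_mean[grp][t] - fmean(row) + group_mean[grp]) ** 2
        for grp, subs in scores.items()
        for row in subs
        for t in range(k)
    )

    ss_total = sum((x - grand) ** 2 for x in all_values)

    df_group, df_subj = g - 1, N - g
    df_time, df_inter, df_within = k - 1, (g - 1) * (k - 1), (N - g) * (k - 1)

    # Group is tested against between-subject error;
    # Time and Group x Time are tested against within-subject error.
    f_group = (ss_group / df_group) / (ss_subj / df_subj)
    f_time = (ss_time / df_time) / (ss_within / df_within)
    f_inter = (ss_inter / df_inter) / (ss_within / df_within)

    print(f"F_group={f_group:.2f}  F_time={f_time:.2f}  F_inter={f_inter:.2f}")

With 6 subjects, 2 groups, and 3 time points, the Group effect is tested against
4 error degrees of freedom, while Time and Group × Time are tested against 8.
These two values are what you should expect to find in the ``DF2`` column of a
mixed ANOVA table for a design of this size.
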
Saved outputs
~~~~~~~~~~~~~

When you run the lab via the Makefile target::

    make psych-ch17

the script will:

* print key results to the console,
* save the simulated long-format data to::

      data/synthetic/psych_ch17_mixed_design_long.csv

* save the wide-format data to::

      data/synthetic/psych_ch17_mixed_design_wide.csv

* save the mixed ANOVA table to::

      outputs/track_b/ch17_mixed_anova.csv

* and create an interaction plot (group means over time) at::

      outputs/track_b/ch17_group_by_time_means.png

Instructors can use these files for in-class demonstrations, and students can use
them for homework or project work without having to re-run the simulation.

Connection to future chapters
-----------------------------

Mixed-model designs sit at the intersection of several ideas you have already
seen:

* **Factorial logic** from Chapter 13 (main effects and interactions),
* **Repeated-measures logic** from Chapter 14 (within-subject error terms),
* and **Regression logic** from Chapters 15–16 (predicting outcomes from multiple
  sources of information).

The next chapters extend these ideas further:

* Chapter 18 shows how to statistically control for a *covariate* using ANCOVA.
* Chapter 19 introduces non-parametric alternatives for situations where standard
  ANOVA assumptions are not met.
* Chapter 20 brings everything together in a full PyStatsV1 project.

For now, the goal is not to master every technical detail of mixed-model
mathematics, but to develop a solid *conceptual* understanding and a reliable,
reproducible workflow for analyzing the kinds of treatment-over-time studies that
are central to modern psychological science.