Chapter 16b – Regression Diagnostics with Pingouin
Track B: Psychological Science & Statistics – Appendix to Chapter 16
Overview
In Chapter 16, we introduced linear regression as a tool for prediction and interpretation. We focused on
the line of best fit (\(Y' = bX + a\)),
the least squares criterion,
the standard error of the estimate, and
multiple regression (predicting behavior from multiple variables).
However, a good PyStatsV1 workflow does not stop after fitting a model. We must engineer our results by checking whether the model and the data behave as the assumptions require.
This appendix shows how to:
use pingouin to fit a multiple regression model,
compute standard regression diagnostics (residuals, leverage, Cook’s distance),
identify potentially influential observations, and
illustrate the dangers of relying only on summary statistics using Anscombe’s Quartet.
The goal is to give students a reproducible, testable set of tools they can reuse in their own projects.
Learning goals
After working through this appendix, you should be able to:
Explain the difference between good fit (e.g., high \(R^2\)) and good model (reasonable assumptions).
Interpret standard regression diagnostics:
residuals and standardized residuals,
leverage (hat values),
Cook’s distance.
Use pingouin’s regression tools together with NumPy and pandas to compute these diagnostics.
Explain how Anscombe’s Quartet shows that
identical means, variances, and correlations can hide very different data patterns,
visualization and diagnostics are crucial in a PyStatsV1 workflow.
Files for this appendix
This appendix uses the following PyStatsV1 files:
Script: scripts/psych_ch16b_pingouin_regression_diagnostics.py
simulates a psychology regression dataset (reusing the Chapter 16 data generator),
fits a multiple regression model with pingouin.linear_regression(),
computes regression diagnostics (residuals, leverage, Cook’s distance),
identifies the most influential observations,
generates diagnostic plots,
constructs and analyzes Anscombe’s Quartet to demonstrate why diagnostics and visualization matter.
Tests: tests/test_psych_ch16b_pingouin_regression_diagnostics.py
verify that diagnostics have the expected shape and properties,
check that leverage behaves as theory predicts,
ensure that model \(R^2\) is in a reasonable range,
run the full end-to-end pipeline in a temporary directory,
verify that CSV and PNG outputs are written correctly,
check that the Anscombe datasets have nearly identical summary statistics while having different shapes.
Makefile targets (added in a separate CI branch):
make psych-ch16b – run the diagnostics demo (including Anscombe’s Quartet),
make test-psych-ch16b – run tests for this appendix only.
Note
As with previous chapters, the script is written in a way that makes it easy to import its functions into other projects or Jupyter notebooks. The tests treat regression diagnostics as software objects that can be checked, versioned, and reused.
Section 1 – Regression diagnostics in practice
Recall that a linear regression model makes several assumptions:
Linearity – the relationship between predictors and outcome is approximately linear.
Homoscedasticity – the spread (variance) of residuals is roughly constant across the range of fitted values.
Independence – residuals are not systematically related to each other (e.g., no strong time trends).
Normality of residuals – residuals are approximately normally distributed.
The pingouin function pingouin.linear_regression() focuses
primarily on estimation and inference:
regression coefficients and standard errors,
\(t\)-tests and \(p\)-values,
\(R^2\) and adjusted \(R^2\).
To check assumptions, we need additional diagnostics. In
psych_ch16b_pingouin_regression_diagnostics.py we therefore:
Simulate a dataset that extends the Chapter 16 example, with variables such as:
stress
sleep_hours
study_hours
motivation
exam_score (outcome)
Fit a multiple regression model predicting exam_score from several predictors.
Compute diagnostics using NumPy and pandas:
fitted values – model predictions \(\hat{y}\),
residuals – observed minus fitted (\(y - \hat{y}\)),
standardized residuals – residuals scaled by their estimated standard deviation,
leverage – hat values on the diagonal of the hat matrix, \(H = X (X'X)^{-1} X'\),
Cook’s distance – a measure of how much the regression coefficients would change if we removed a given observation.
Save the diagnostics to CSV and plot simple diagnostics:
Residuals vs Fitted plot – to check linearity and homoscedasticity.
Leverage vs Cook’s distance plot – to identify high-leverage, influential observations.
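The diagnostics listed above can be sketched with NumPy and pandas alone. This is a sketch under stated assumptions, not the script's implementation: the function name `ols_diagnostics` is hypothetical, and "standardized" is taken here to mean internally studentized residuals (each residual divided by its own estimated standard deviation); the script may scale residuals differently.

```python
import numpy as np
import pandas as pd


def ols_diagnostics(X, y):
    """Return fitted values, residuals, leverage, and Cook's distance.

    X is an (n, k) array of predictors without an intercept column;
    y is a length-n outcome vector. (Hypothetical helper for illustration.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # add the intercept column
    p = design.shape[1]                        # parameters incl. intercept

    # Least-squares coefficients, fitted values, and raw residuals.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    resid = y - fitted

    # Leverage: diagonal of H = X (X'X)^{-1} X', without forming all of H.
    xtx_inv = np.linalg.inv(design.T @ design)
    leverage = np.einsum("ij,jk,ik->i", design, xtx_inv, design)

    # Residual variance estimate and internally studentized residuals.
    mse = resid @ resid / (n - p)
    std_resid = resid / np.sqrt(mse * (1.0 - leverage))

    # Cook's distance combines residual size and leverage.
    cooks = std_resid**2 * leverage / (p * (1.0 - leverage))

    return pd.DataFrame({
        "fitted": fitted,
        "residual": resid,
        "std_residual": std_resid,
        "leverage": leverage,
        "cooks_distance": cooks,
    })


# Small synthetic example (placeholder data, not the psychology dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=50)
diag = ols_diagnostics(X, y)
print(diag.sort_values("cooks_distance", ascending=False).head(3))
```

Sorting by `cooks_distance` gives a simple ranking of the observations that deserve a closer look.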
Interpreting diagnostics (high level)
Residuals should be roughly centered around zero. A clear curve or pattern in residuals vs fitted values suggests non-linearity.
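Leverage measures how unusual an observation’s predictor values are, not how badly it is fit. Values near 0 indicate an observation close to the center of the predictor space, with little potential to pull the fit; values closer to 1 indicate observations that are far from the center and can pull the fitted surface toward themselves.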
Cook’s distance combines residual size and leverage. Points with unusually large Cook’s distance are candidates for closer inspection. They are not automatically “bad” data points, but they may be influential.
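How Cook’s distance combines the two ingredients can be written out explicitly. In one standard formulation (the symbols are introduced here for illustration and are not taken from the script), let \(e_i\) be the residual, \(h_{ii}\) the leverage, \(p\) the number of parameters including the intercept, and \(s^2 = \sum_i e_i^2 / (n - p)\) the residual variance estimate:

```latex
\[
D_i = \frac{e_i^2}{p\, s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
    = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}},
\qquad
r_i = \frac{e_i}{s \sqrt{1 - h_{ii}}}.
\]
```

A large residual with negligible leverage, or high leverage with a near-zero residual, both give a small \(D_i\); the distance is large only when both factors are sizable.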
Section 2 – Anscombe’s Quartet
To see why diagnostics and visualization are essential, this appendix includes a second dataset: Anscombe’s Quartet.
Anscombe (1973) constructed four small datasets (I–IV) with the following surprising property:
Each dataset has nearly identical:
mean of \(x\),
mean of \(y\),
variance of \(x\),
variance of \(y\),
correlation \(r\) between \(x\) and \(y\),
regression line \(Y' = bX + a\).
But when you plot them, the shapes are completely different:
One looks like a typical linear relationship.
One is clearly non-linear.
One is linear except for a single outlier.
One has all but one point stacked at a single \(x\) value, so a single extreme point determines the regression line.
In other words, summary statistics alone can mislead us. Two datasets can share the same correlation and regression line but tell completely different stories once we visualize them.
How we use Anscombe’s Quartet in PyStatsV1
The script psych_ch16b_pingouin_regression_diagnostics.py includes:
a helper that constructs a tidy version of Anscombe’s Quartet with columns
x
y
dataset (I, II, III, IV)
a function that computes, for each dataset:
\(\bar{x}\), \(\bar{y}\),
\(s_x^2\), \(s_y^2\),
correlation \(r\),
simple regression line (\(a\) and \(b\)).
a 2x2 grid of scatterplots with
the same axis limits,
the fitted regression line overlaid,
one panel per dataset (I–IV).
The corresponding tests check that:
all four datasets have nearly identical summary statistics, and
the code produces the expected summary table and plot file.
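The tidy construction and per-dataset summary described above might look like the following sketch. The data values are Anscombe’s published ones; the names `quartet`, `tidy`, and `summarize` are hypothetical and need not match the helpers in the script.

```python
import numpy as np
import pandas as pd

# Anscombe (1973) data. Datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Tidy layout: one row per observation, with columns x, y, dataset.
tidy = pd.concat(
    [pd.DataFrame({"x": x, "y": y, "dataset": name})
     for name, (x, y) in quartet.items()],
    ignore_index=True,
)


def summarize(group):
    """Summary statistics and simple regression line for one dataset."""
    slope, intercept = np.polyfit(group["x"], group["y"], deg=1)
    return pd.Series({
        "mean_x": group["x"].mean(),
        "mean_y": group["y"].mean(),
        "var_x": group["x"].var(),   # sample variance (ddof=1)
        "var_y": group["y"].var(),
        "r": group["x"].corr(group["y"]),
        "slope": slope,
        "intercept": intercept,
    })


# One row per dataset, one column per statistic.
summary = pd.DataFrame(
    {name: summarize(group) for name, group in tidy.groupby("dataset")}
).T
print(summary.round(2))
```

A test in the spirit of the ones described above would compare each column of `summary` across the four rows with a small tolerance rather than for exact equality.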
Worked example (conceptual)
Run the diagnostics script (once your Makefile targets are wired):
make psych-ch16b
The script first runs the psychology regression diagnostics example (as described in Section 1).
Then the script constructs Anscombe’s Quartet, computes summary statistics by dataset, and prints something like:
Anscombe summary (per dataset):
  dataset  mean_x  mean_y  var_x  var_y     r  slope  intercept
0       I    9.00    7.50  11.00   4.13  0.82   0.50       3.00
1      II    9.00    7.50  11.00   4.13  0.82   0.50       3.00
2     III    9.00    7.50  11.00   4.13  0.82   0.50       3.00
3      IV    9.00    7.50  11.00   4.13  0.82   0.50       3.00
The exact numbers may differ slightly due to floating point rounding, but the key idea is that the four datasets have almost identical summary statistics.
Finally, the script creates a 2x2 scatterplot figure and writes it to:
outputs/track_b/ch16b_anscombe_quartet.png
When you inspect this image, you will see four very different patterns, despite having “the same” regression summary.
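A 2x2 panel figure of this kind can be sketched with matplotlib as follows. The function name `plot_panels`, the output file name, and the randomly generated placeholder data are assumptions made so the sketch runs on its own; the script’s plotting helper and its real input (the Anscombe data) may differ.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to file without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_panels(tidy, path):
    """Draw one scatter panel per dataset with a fitted line and shared axes."""
    fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
    # Use the overall x range so every fitted line spans the same interval.
    grid = np.array([tidy["x"].min(), tidy["x"].max()])
    for ax, (name, group) in zip(axes.ravel(), tidy.groupby("dataset")):
        slope, intercept = np.polyfit(group["x"], group["y"], deg=1)
        ax.scatter(group["x"], group["y"])
        ax.plot(grid, intercept + slope * grid, color="black")
        ax.set_title(f"Dataset {name}")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Placeholder tidy data (columns x, y, dataset); not Anscombe's values.
rng = np.random.default_rng(3)
frames = []
for name in ["I", "II", "III", "IV"]:
    x = rng.uniform(4, 14, 11)
    y = 3 + 0.5 * x + rng.normal(0, 1, 11)
    frames.append(pd.DataFrame({"x": x, "y": y, "dataset": name}))
tidy = pd.concat(frames, ignore_index=True)

out_path = os.path.join(tempfile.mkdtemp(), "quartet_sketch.png")
plot_panels(tidy, out_path)
```

Sharing both axes across panels is what makes the contrast between datasets visible: differences in shape cannot be hidden by per-panel rescaling.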
Takeaway for students and instructors
Anscombe’s Quartet makes two core points that align with the PyStatsV1 philosophy:
Do not stop at statistics.
A single number like \(r\) or \(R^2\) can hide very different data stories.
Always pair numerical output with plots and diagnostics.
Treat models as software artifacts.
In PyStatsV1, every substantial analysis step is backed by functions, tests, and CI checks.
Adding a new diagnostic (e.g., Cook’s distance, Anscombe analysis) means adding new code and new tests.
Section 3 – The code: overview of key functions
You do not need to memorize the exact implementation details, but it is useful to know what the main functions do.
In scripts/psych_ch16b_pingouin_regression_diagnostics.py:
compute_regression_diagnostics(df, predictors, outcome)
Fits a multiple regression model,
\[exam\_score \sim study\_hours + sleep\_hours + stress + motivation,\]
returns a diagnostics DataFrame with fitted, residual, std_residual, leverage, cooks_distance,
and a pingouin regression summary table for cross-checking.
run_ch16b_demo(n, random_state)
Simulates the psychology regression dataset,
calls compute_regression_diagnostics(),
saves diagnostics and top influential points to CSV,
generates residuals vs fitted and leverage vs Cook’s distance plots,
constructs and analyzes Anscombe’s Quartet,
saves Anscombe summary statistics and plots,
prints a concise narrative summary to the console.
Anscombe helpers (internal names may differ slightly):
a function to construct the tidy Anscombe dataset,
a function to compute summary statistics by dataset,
a plotting function to generate the 2x2 Anscombe scatterplot figure with regression lines.
In tests/test_psych_ch16b_pingouin_regression_diagnostics.py:
One test verifies that diagnostics have the expected columns and that leverage behaves as theory predicts (e.g., the average leverage is approximately \(p / n\), where \(p\) is the number of parameters including the intercept).
Another test runs run_ch16b_demo() in a temporary directory and verifies that all expected CSV and PNG files exist and are non-empty.
A third test checks that Anscombe’s Quartet is implemented correctly:
there are four datasets with the expected number of rows,
group-level summary statistics are nearly identical across datasets,
the code produces an Anscombe summary CSV and plot image.
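The leverage property checked by the first test follows from the fact that the trace of the hat matrix equals the number of parameters. A small numerical check, using placeholder data and hypothetical variable names rather than the project’s test code, could look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 120, 4                        # observations, predictors
X = rng.normal(size=(n, k))
design = np.column_stack([np.ones(n), X])
p = design.shape[1]                  # parameters, including the intercept

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverage values.
H = design @ np.linalg.inv(design.T @ design) @ design.T
leverage = np.diag(H)

# trace(H) = p, so the mean leverage is p / n whatever the data are.
assert np.isclose(leverage.sum(), p)
assert np.isclose(leverage.mean(), p / n)

# With an intercept in the model, each leverage lies between 1/n and 1.
assert np.all(leverage >= 1 / n - 1e-12)
assert np.all(leverage <= 1 + 1e-12)
```

Because the identity holds for any full-rank design matrix, such a check guards against implementation errors (for example, a missing intercept column) rather than against unusual data.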
How this Appendix fits into the Track B narrative
Chapters 15 and 15a introduced correlation and partial correlation, using pingouin as a high-level toolbox.
Chapter 16 developed the core ideas of linear regression: prediction, least squares, standard error of the estimate, and multiple regression.
Appendix 16a expanded regression with additional estimation examples.
Appendix 16b (this appendix) emphasizes that
even a beautifully written model can be misleading if we ignore diagnostics,
the shape of the data always matters,
simple, testable diagnostics can be integrated into every analysis pipeline.
By the time students reach Chapter 17 (Mixed-Model Designs), they will have seen that a PyStatsV1-style analysis is not just about “getting significant results.” It is about building robust, transparent, and reproducible statistical workflows that can be trusted.
Next steps
After completing this appendix, you are ready to move into
Chapter 17 – Mixed-Model Designs, where we combine between-subjects and within-subjects factors, and
later, Chapter 18 – ANCOVA, where we explicitly control for covariates in more complex models.
In both chapters, the habits you practiced here—checking assumptions, visualizing patterns, and treating models as software artifacts—will remain central.