Chapter 15 Appendix – Pingouin for Correlation and Partial Correlation
In Chapter 15 – Correlation, you learned how to:
define and interpret Pearson’s correlation coefficient \(r\),
visualize relationships with scatterplots and heatmaps,
compute correlations using both NumPy and pingouin, and
run a single partial correlation (exam score ~ study hours | stress).
In that main chapter, the focus was on the concepts of correlation and
partial correlation. This appendix shifts the emphasis to the
pingouin library itself. We will treat Pingouin as a compact
“stats workbench” that can:
compute all pairwise correlations (and effect sizes) for a whole set of variables at once,
correct p-values for multiple comparisons,
and estimate partial correlations while adjusting for one or more covariates.
As before, all examples are reproducible with PyStatsV1 and use synthetic datasets so that you can freely experiment without privacy concerns.
Why this appendix?
In introductory courses it is common to present correlation as a single number between two variables. In real research, we rarely look at just one pair. A typical psychology study might collect ten or more measures (stress, sleep, anxiety, mood, study hours, exam score, etc.). The interesting questions are then:
which variables are most strongly related,
whether those relationships survive correction for multiple testing,
and whether an association remains after we statistically control for one or more third variables.
Doing all of this by hand, or with low-level functions, is tedious and
error-prone. pingouin provides higher-level helpers that match how
researchers actually work. In this appendix we highlight two of them:
pingouin.pairwise_corr() – compute all pairwise correlations in one shot (with effect sizes, confidence intervals, p-values, and optional p-value correction); and
pingouin.partial_corr() – compute partial correlations while adjusting for one or more covariates.
All examples below assume that Pingouin is installed and that you have completed the Chapter 15 lab at least once.
A quick reminder: installing Pingouin
If you are working on your own machine (rather than the course server), you can install or update Pingouin as follows:
pip install --upgrade pingouin
or, if you use Conda:
conda install -c conda-forge pingouin
For details, see the official documentation at https://pingouin-stats.org.
Pairwise correlations with pingouin.pairwise_corr()
In the Chapter 15 lab (scripts.psych_ch15_correlation) we created a
synthetic dataset with several variables:
stress
sleep_hours
anxiety
study_hours
exam_score
To compute all pairwise Pearson correlations among these variables using Pingouin, we can write:
import pingouin as pg
from scripts.psych_ch15_correlation import simulate_psych_correlation_dataset
df = simulate_psych_correlation_dataset(n=200, random_state=456)
pairwise = pg.pairwise_corr(
    data=df,
    columns=df.columns,
    method="pearson",
    padjust="none",  # or "fdr_bh", "bonf", ...
)
print(pairwise.head())
The resulting pandas.DataFrame has one row per unique variable
pair and includes:
X and Y – the variable names,
r – Pearson’s correlation coefficient,
CI95% – a 95% confidence interval for \(r\),
p-unc – the uncorrected p-value,
BF10 – an optional Bayes Factor,
and several other useful columns.
Because each pair appears only once (e.g., stress–exam_score but
not also exam_score–stress), the number of rows is:
\[\frac{k(k-1)}{2}\]
where \(k\) is the number of variables. For five variables, that gives 10 rows.
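As a quick sanity check on the expected number of rows, the unique pairs can be enumerated with the standard library. This is a minimal sketch that assumes the dataset contains exactly the five variables listed above; it does not call Pingouin.

```python
from itertools import combinations

# Assumed column names: the five variables listed for the Chapter 15 dataset.
variables = ["stress", "sleep_hours", "anxiety", "study_hours", "exam_score"]

# Each unordered pair appears exactly once, mirroring one row per pair.
pairs = list(combinations(variables, 2))

# Closed-form count of unordered pairs among k variables.
k = len(variables)
expected_rows = k * (k - 1) // 2

print(len(pairs), expected_rows)
```

If the two printed numbers ever disagree with the number of rows in your own pairwise table, a likely cause is that the DataFrame contains extra (or non-numeric) columns.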
Correcting for multiple comparisons
When you compute many correlations at once, some may look “significant” purely by chance. Pingouin helps you control the family-wise error rate or the false discovery rate by adjusting p-values.
For example, to apply the Benjamini–Hochberg false discovery rate (FDR)
correction, use padjust="fdr_bh":
pairwise_fdr = pg.pairwise_corr(
    data=df,
    columns=df.columns,
    method="pearson",
    padjust="fdr_bh",
)
The output now includes a p-corr column with the corrected p-values; the p-adjust column records which correction method was used.
In this course we mostly treat the corrected p-values as “advanced tools”
for research projects, but it is important for students to see that the
option exists and is easy to use.
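To make the idea behind the Benjamini–Hochberg adjustment concrete, here is a small plain-Python sketch of the textbook procedure. The function name and the p-values are made up for illustration, and Pingouin's own implementation may differ in details such as missing-value handling.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up uncorrected p-values, for illustration only.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)

for p_raw, p_adj in zip(raw, adj):
    print(f"{p_raw:.3f} -> {p_adj:.5f}")
```

In this made-up example the third and fourth p-values fall below .05 before correction but sit just above it afterwards, which is exactly the kind of result the correction is designed to flag.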
Spearman correlations
Sometimes you may not want to assume a strictly linear relationship or you might worry about outliers. In those cases, Spearman’s rank correlation can be more robust. Switching methods is as simple as:
pairwise_spearman = pg.pairwise_corr(
    data=df,
    columns=df.columns,
    method="spearman",
    padjust="fdr_bh",
)
You can then compare Pearson and Spearman estimates for the same pair of variables to see whether outliers or non-linearity are having a large impact.
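Spearman's coefficient is Pearson's correlation computed on ranks, which is why a single extreme value affects it less. The sketch below illustrates this in plain Python with made-up data; the helper names are hypothetical and the code is not taken from Pingouin.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: perfectly monotone, but the last y value is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 7, 9, 40]

r = pearson(x, y)
rho = spearman(x, y)
print(round(r, 3), round(rho, 3))
```

Here the relationship is perfectly monotone, so the rank-based coefficient is at its maximum while the Pearson estimate is pulled down by the outlier.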
Partial correlations with pingouin.partial_corr()
In the main chapter we computed a single partial correlation between
study_hours and exam_score while controlling for stress.
Pingouin makes it easy to extend this idea to multiple covariates.
The basic usage is:
import pingouin as pg
from scripts.psych_ch15_correlation import simulate_psych_correlation_dataset
df = simulate_psych_correlation_dataset(n=200, random_state=456)
partial = pg.partial_corr(
    data=df,
    x="study_hours",
    y="exam_score",
    covar=["stress"],  # one or more covariates
    method="pearson",
)
print(partial)
The result is again a pandas.DataFrame, this time with a single row and the columns:
r – the partial correlation,
CI95% – a confidence interval for \(r\),
p-val – the p-value,
plus the sample size n.
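With a single covariate, the partial correlation can be written in terms of the three zero-order correlations. The following sketch applies that textbook formula to made-up values; the function name is hypothetical, and this is meant to illustrate the idea rather than to describe how Pingouin computes the result internally.

```python
from math import sqrt


def first_order_partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of x and y, controlling for one covariate z."""
    numerator = r_xy - r_xz * r_yz
    denominator = sqrt((1 - r_xz**2) * (1 - r_yz**2))
    return numerator / denominator


# Made-up zero-order correlations, for illustration only.
r_study_exam = 0.50     # study_hours with exam_score
r_study_stress = -0.40  # study_hours with stress
r_exam_stress = -0.50   # exam_score with stress

r_partial = first_order_partial_r(r_study_exam, r_study_stress, r_exam_stress)
print(round(r_partial, 3))
```

With these illustrative numbers the association shrinks once stress is controlled for, but it stays positive.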
Controlling for multiple covariates is just as easy:
partial_two = pg.partial_corr(
    data=df,
    x="study_hours",
    y="exam_score",
    covar=["stress", "anxiety"],
    method="pearson",
)
In a research context, partial correlations are especially useful when trying to decide whether a relationship is likely to be “direct” or whether it can be explained away by a third (or fourth) variable.
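For readers who want to see what controlling for two covariates means algebraically, one standard textbook way to write it is recursive. Writing \(x\) for study_hours, \(y\) for exam_score, \(z_1\) for stress, and \(z_2\) for anxiety:

\[
r_{xy \cdot z_1 z_2} = \frac{r_{xy \cdot z_1} - r_{x z_2 \cdot z_1}\, r_{y z_2 \cdot z_1}}{\sqrt{\left(1 - r_{x z_2 \cdot z_1}^2\right)\left(1 - r_{y z_2 \cdot z_1}^2\right)}}
\]

Each term on the right-hand side is itself a partial correlation that controls for \(z_1\) only. This notation is offered as a conceptual aid; Pingouin is not guaranteed to compute the value this way internally.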
PyStatsV1 demo scripts for Chapter 15a
To keep the main Chapter 15 lab focused, the PyStatsV1 repository includes two small helper scripts that accompany this appendix:
scripts.psych_ch15a_pingouin_pairwise_demo
Shows how to:
generate the Chapter 15 synthetic dataset,
compute all pairwise Pearson and Spearman correlations with pingouin.pairwise_corr(),
apply FDR correction to the p-values,
and save the resulting tables to outputs/track_b.
scripts.psych_ch15a_pingouin_partial_demo
Shows how to:
compare a zero-order correlation with one or more partial correlations,
control for multiple covariates at once,
and summarize the results in a compact table.
You can run these scripts from the command line (inside your PyStatsV1 virtual environment) using:
python -m scripts.psych_ch15a_pingouin_pairwise_demo
python -m scripts.psych_ch15a_pingouin_partial_demo
or, if you prefer the Makefile shortcuts (once they have been added):
make psych-ch15a
Unit tests for Chapter 15a
To make sure that the demos behave as expected, we include two small test files:
tests.test_psych_ch15a_pingouin_pairwise_demo
tests.test_psych_ch15a_pingouin_partial_demo
The tests do not check every value. Instead, they verify structural and conceptual properties such as:
pingouin.pairwise_corr() returns the expected number of pairs,
the sign and approximate strength of the correlation between stress and exam_score match the design of the synthetic dataset,
and partial correlations shrink (but do not reverse) the positive association between study_hours and exam_score when we control for stress.
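A structural check of this kind might look like the sketch below. The helper is hypothetical and is not copied from the repository's test files; it works on plain (X, Y) tuples, which you could build from a pairwise table with something like list(zip(pairwise["X"], pairwise["Y"])).

```python
def check_unique_pairs(pairs, variables):
    """Assert that `pairs` holds every unordered pair of `variables` once."""
    k = len(variables)
    assert len(pairs) == k * (k - 1) // 2, "unexpected number of pairs"
    seen = set()
    for x, y in pairs:
        assert x != y, "a variable is paired with itself"
        assert x in variables and y in variables, "unknown variable name"
        key = frozenset((x, y))  # order-insensitive identity of the pair
        assert key not in seen, "pair appears twice"
        seen.add(key)
    return True


# Small hand-written example using three of the Chapter 15 variable names.
variables = ["stress", "study_hours", "exam_score"]
pairs = [
    ("stress", "study_hours"),
    ("stress", "exam_score"),
    ("study_hours", "exam_score"),
]

ok = check_unique_pairs(pairs, variables)
print(ok)
```

Checking structure rather than exact values keeps such tests stable when the random seed or the simulation details change.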
You can run just these tests with:
pytest tests/test_psych_ch15a_pingouin_pairwise_demo.py
pytest tests/test_psych_ch15a_pingouin_partial_demo.py
or run the full Track B test suite with:
pytest
Suggested student exercises
Add a new variable to the Chapter 15 synthetic dataset (for example, social_support) that is negatively related to stress and positively related to sleep_hours and exam_score. Re-run the pairwise correlation demo and interpret the changes in the correlation matrix.
Use pingouin.pairwise_corr() with method="spearman" and compare the results to the Pearson correlations. Are any pairs sensitive to outliers or non-linearity?
Choose a pair of variables where you suspect a third variable might explain part of the relationship (for example, stress and exam_score with sleep_hours as a covariate). Compute zero-order and partial correlations and compare the results.
For your own research project, design a small correlation study with at least five variables. Use PyStatsV1 and Pingouin to:
compute all pairwise correlations,
adjust for multiple comparisons,
and report at least one partial correlation in APA style.
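For the reporting step, a small formatting helper can reduce copy-and-paste errors. The sketch below follows one common APA-style convention (no leading zero for r and p, degrees of freedom of n − 2 minus the number of covariates); the function name and the example values are hypothetical, so check the format against your instructor's or journal's requirements.

```python
def format_apa_r(r, n, p, n_covariates=0):
    """Format a (partial) correlation as an APA-style string."""
    # Degrees of freedom: n - 2, minus one per covariate controlled for.
    df = n - 2 - n_covariates
    # Drop the leading zero, since r and p cannot exceed 1 in absolute value.
    r_text = f"{r:.2f}".replace("0.", ".", 1)
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p:.3f}".replace("0.", ".", 1)
    return f"r({df}) = {r_text}, {p_text}"


# Made-up results, for illustration only.
print(format_apa_r(0.38, n=200, p=0.0004, n_covariates=1))
print(format_apa_r(-0.21, n=200, p=0.003))
```

In a written report, the symbols r and p would normally be set in italics.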
This appendix is meant as a bridge between the introductory correlation chapter and more advanced courses in multivariate statistics. The goal is not to memorize every option of Pingouin, but to develop a habit of using well-tested tools to explore relationships among multiple psychological variables in a principled way.