Chapter 15 Appendix – Pingouin for Correlation and Partial Correlation
======================================================================

In :doc:`psych_ch15_correlation`, you learned how to:

* define and interpret Pearson's correlation coefficient :math:`r`,
* visualize relationships with scatterplots and heatmaps,
* compute correlations using both NumPy and :mod:`pingouin`, and
* run a single partial correlation (exam score ~ study hours | stress).

In that main chapter, the focus was on the *concepts* of correlation and
partial correlation. This appendix shifts the emphasis to the :mod:`pingouin`
library itself. We will treat Pingouin as a compact “stats workbench” that can:

* compute all pairwise correlations (and effect sizes) for a whole set of
  variables at once,
* correct p-values for multiple comparisons,
* and estimate partial correlations while adjusting for one or more covariates.

As before, all examples are reproducible with PyStatsV1 and use synthetic
datasets so that you can freely experiment without privacy concerns.

Why this appendix?
------------------

In introductory courses it is common to present correlation as a single number
between two variables. In real research, we rarely look at just one pair. A
typical psychology study might collect ten or more measures (stress, sleep,
anxiety, mood, study hours, exam score, etc.). The interesting questions are
then:

* which variables are most strongly related,
* whether those relationships survive correction for multiple testing,
* and whether an association remains after we statistically control for one or
  more third variables.

Doing all of this by hand, or with low-level functions, is tedious and
error-prone. :mod:`pingouin` provides higher-level helpers that match how
researchers actually work.
In this appendix we highlight two of them:

* :func:`pingouin.pairwise_corr` – compute all pairwise correlations in one
  shot (with effect sizes, confidence intervals, p-values, and optional
  p-value correction); and
* :func:`pingouin.partial_corr` – compute partial correlations while adjusting
  for one or more covariates.

All examples below assume that Pingouin is installed and that you have
completed the Chapter 15 lab at least once.

A quick reminder: installing Pingouin
-------------------------------------

If you are working on your own machine (rather than the course server), you can
install or update Pingouin as follows::

    pip install --upgrade pingouin

or, if you use Conda::

    conda install -c conda-forge pingouin

For details, see the official documentation at https://pingouin-stats.org.

Pairwise correlations with :func:`pingouin.pairwise_corr`
---------------------------------------------------------

In the Chapter 15 lab (:mod:`scripts.psych_ch15_correlation`) we created a
synthetic dataset with several variables:

* ``stress``
* ``sleep_hours``
* ``anxiety``
* ``study_hours``
* ``exam_score``

To compute *all* pairwise Pearson correlations among these variables using
Pingouin, we can write:

.. code-block:: python

    import pingouin as pg

    from scripts.psych_ch15_correlation import simulate_psych_correlation_dataset

    df = simulate_psych_correlation_dataset(n=200, random_state=456)

    pairwise = pg.pairwise_corr(
        data=df,
        columns=df.columns,
        method="pearson",
        padjust="none",  # or "fdr_bh", "bonf", ...
    )

    print(pairwise.head())

The resulting :class:`pandas.DataFrame` has one row per *unique* variable pair
and includes:

* ``X`` and ``Y`` – the variable names,
* ``r`` – Pearson's correlation coefficient,
* ``CI95%`` – a 95% confidence interval for :math:`r`,
* ``p-unc`` – the uncorrected p-value,
* ``BF10`` – an optional Bayes Factor,
* and several other useful columns.

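
To see concretely what “one row per unique variable pair” means, the sketch
below enumerates the pairs for the five column names listed above using only
the standard library. It assumes the dataset contains exactly those five
columns, and it illustrates the row layout only; it is not a reproduction of
Pingouin's internals. The names ``columns`` and ``pairs`` are ours.

.. code-block:: python

    from itertools import combinations

    # Column names of the Chapter 15 synthetic dataset (assumed to be exactly
    # the five variables listed above).
    columns = ["stress", "sleep_hours", "anxiety", "study_hours", "exam_score"]

    # Each unordered pair appears exactly once: (stress, exam_score) is listed,
    # (exam_score, stress) is not.
    pairs = list(combinations(columns, 2))

    for x, y in pairs:
        print(f"{x:>12} ~ {y}")

    k = len(columns)
    print(f"{k} variables -> {len(pairs)} unique pairs")

Comparing ``len(pairs)`` with the number of rows returned by
:func:`pingouin.pairwise_corr` is a quick sanity check that no pair was
dropped or duplicated.
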

Because each pair appears only once (e.g., ``stress``–``exam_score`` but not
also ``exam_score``–``stress``), the number of rows is:

.. math::

    n_{\text{pairs}} = \frac{k(k-1)}{2},

where :math:`k` is the number of variables.

Correcting for multiple comparisons
-----------------------------------

When you compute many correlations at once, some may look “significant” purely
by chance. Pingouin helps you control the family-wise error rate or the false
discovery rate by adjusting p-values.

For example, to apply the Benjamini–Hochberg false discovery rate (FDR)
correction, use ``padjust="fdr_bh"``:

.. code-block:: python

    pairwise_fdr = pg.pairwise_corr(
        data=df,
        columns=df.columns,
        method="pearson",
        padjust="fdr_bh",
    )

The output now includes a ``p-corr`` column with the corrected p-values and a
``p-adjust`` column that records which correction was applied. In this course
we mostly treat the corrected p-values as “advanced tools” for research
projects, but it is important for students to see that the option exists and
is easy to use.

Spearman correlations
---------------------

Sometimes you may not want to assume a strictly linear relationship or you
might worry about outliers. In those cases, Spearman's rank correlation can be
more robust. Switching methods is as simple as:

.. code-block:: python

    pairwise_spearman = pg.pairwise_corr(
        data=df,
        columns=df.columns,
        method="spearman",
        padjust="fdr_bh",
    )

You can then compare Pearson and Spearman estimates for the same pair of
variables to see whether outliers or non-linearity are having a large impact.

Partial correlations with :func:`pingouin.partial_corr`
-------------------------------------------------------

In the main chapter we computed a single partial correlation between
``study_hours`` and ``exam_score`` while controlling for ``stress``. Pingouin
makes it easy to extend this idea to multiple covariates.

The basic usage is:

.. code-block:: python

    import pingouin as pg

    from scripts.psych_ch15_correlation import simulate_psych_correlation_dataset

    df = simulate_psych_correlation_dataset(n=200, random_state=456)

    partial = pg.partial_corr(
        data=df,
        x="study_hours",
        y="exam_score",
        covar=["stress"],  # one or more covariates
        method="pearson",
    )

    print(partial)

The result is again a one-row :class:`pandas.DataFrame` with columns:

* ``r`` – the partial correlation,
* ``CI95%`` – a confidence interval for :math:`r`,
* ``p-val`` – the p-value,
* plus the sample size ``n``.

Controlling for *multiple* covariates is just as easy:

.. code-block:: python

    partial_two = pg.partial_corr(
        data=df,
        x="study_hours",
        y="exam_score",
        covar=["stress", "anxiety"],
        method="pearson",
    )

In a research context, partial correlations are especially useful when trying
to decide whether a relationship is likely to be “direct” or whether it can be
explained away by a third (or fourth) variable.

PyStatsV1 demo scripts for Chapter 15a
--------------------------------------

To keep the main Chapter 15 lab focused, the PyStatsV1 repository includes two
small helper scripts that accompany this appendix:

* :mod:`scripts.psych_ch15a_pingouin_pairwise_demo`

  Shows how to:

  * generate the Chapter 15 synthetic dataset,
  * compute all pairwise Pearson and Spearman correlations with
    :func:`pingouin.pairwise_corr`,
  * apply FDR correction to the p-values,
  * and save the resulting tables to ``outputs/track_b``.

* :mod:`scripts.psych_ch15a_pingouin_partial_demo`

  Shows how to:

  * compare a zero-order correlation with one or more partial correlations,
  * control for multiple covariates at once,
  * and summarize the results in a compact table.

You can run these scripts from the command line (inside your PyStatsV1 virtual
environment) using:

.. code-block:: bash

    python -m scripts.psych_ch15a_pingouin_pairwise_demo
    python -m scripts.psych_ch15a_pingouin_partial_demo

or, if you prefer the Makefile shortcuts (once they have been added):

.. code-block:: bash

    make psych-ch15a

Unit tests for Chapter 15a
--------------------------

To make sure that the demos behave as expected, we include two small test
files:

* :mod:`tests.test_psych_ch15a_pingouin_pairwise_demo`
* :mod:`tests.test_psych_ch15a_pingouin_partial_demo`

The tests do not check every value. Instead, they verify structural and
conceptual properties such as:

* :func:`pingouin.pairwise_corr` returns the expected number of pairs,
* the sign and approximate strength of the correlation between ``stress`` and
  ``exam_score`` match the design of the synthetic dataset,
* partial correlations shrink (but do not reverse) the positive association
  between ``study_hours`` and ``exam_score`` when we control for ``stress``.

You can run just these tests with:

.. code-block:: bash

    pytest tests/test_psych_ch15a_pingouin_pairwise_demo.py
    pytest tests/test_psych_ch15a_pingouin_partial_demo.py

or run the full Track B test suite with:

.. code-block:: bash

    pytest

Suggested student exercises
---------------------------

1. Add a new variable to the Chapter 15 synthetic dataset (for example,
   ``social_support``) that is negatively related to ``stress`` and positively
   related to ``sleep_hours`` and ``exam_score``. Re-run the pairwise
   correlation demo and interpret the changes in the correlation matrix.

2. Use :func:`pingouin.pairwise_corr` with ``method="spearman"`` and compare
   the results to the Pearson correlations. Are any pairs sensitive to outliers
   or non-linearity?

3. Choose a pair of variables where you suspect a third variable might explain
   part of the relationship (for example, ``stress`` and ``exam_score`` with
   ``sleep_hours`` as a covariate). Compute zero-order and partial correlations
   and compare the results.

4. For your own research project, design a small correlation study with at
   least five variables. Use PyStatsV1 and Pingouin to:

   * compute all pairwise correlations,
   * adjust for multiple comparisons,
   * and report at least one partial correlation in APA style.

This appendix is meant as a bridge between the introductory correlation chapter
and more advanced courses in multivariate statistics. The goal is not to
memorize every option of Pingouin, but to develop a habit of using well-tested
tools to explore relationships among multiple psychological variables in a
principled way.
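
As a companion to exercise 3, the following sketch computes a first-order
partial correlation from scratch in two equivalent ways: the textbook formula
:math:`r_{xy \cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}`,
and the correlation between the residuals of :math:`x` and :math:`y` after each
has been regressed on :math:`z`. It uses only the standard library and a small
made-up dataset. The helper names (``pearson_r``, ``residuals``) and the
data-generating numbers are illustrative assumptions and are not part of
PyStatsV1, so the sketch shows the idea behind a partial correlation rather
than Pingouin's implementation.

.. code-block:: python

    import math
    import random


    def pearson_r(a, b):
        """Pearson correlation of two equal-length sequences."""
        n = len(a)
        mean_a, mean_b = sum(a) / n, sum(b) / n
        cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
        var_a = sum((ai - mean_a) ** 2 for ai in a)
        var_b = sum((bi - mean_b) ** 2 for bi in b)
        return cov / math.sqrt(var_a * var_b)


    def residuals(y, z):
        """Residuals of y after a simple least-squares regression on z."""
        n = len(y)
        mean_y, mean_z = sum(y) / n, sum(z) / n
        slope = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
                 / sum((zi - mean_z) ** 2 for zi in z))
        intercept = mean_y - slope * mean_z
        return [yi - (intercept + slope * zi) for yi, zi in zip(y, z)]


    # Made-up data (hypothetical): stress lowers both study hours and exam scores.
    rng = random.Random(456)
    stress = [rng.gauss(0, 1) for _ in range(200)]
    study = [-0.5 * s + rng.gauss(0, 1) for s in stress]
    exam = [0.6 * h - 0.4 * s + rng.gauss(0, 1) for h, s in zip(study, stress)]

    r_xy = pearson_r(study, exam)
    r_xz = pearson_r(study, stress)
    r_yz = pearson_r(exam, stress)

    # Way 1: first-order partial correlation formula.
    partial_formula = (r_xy - r_xz * r_yz) / math.sqrt(
        (1 - r_xz**2) * (1 - r_yz**2)
    )

    # Way 2: correlate what is left of x and y after removing z from each.
    partial_resid = pearson_r(residuals(study, stress), residuals(exam, stress))

    print(f"zero-order r          = {r_xy:.3f}")
    print(f"partial r (formula)   = {partial_formula:.3f}")
    print(f"partial r (residuals) = {partial_resid:.3f}")

The two partial estimates agree up to floating-point rounding, which is a
useful check when you later compare your hand computation against the ``r``
column returned by :func:`pingouin.partial_corr` on the same data.
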