============================== Chapter 19 – Non-Parametric Statistics ============================== Learning goals ============== By the end of this chapter you will be able to: * Explain when non-parametric tests are preferred over traditional (parametric) procedures such as the *t*-test or ANOVA. * Describe the logic of the chi-square family of tests. * Distinguish between chi-square tests of **goodness of fit** and **independence**. * Recognize rank-based alternatives to *t*-tests and one-way ANOVA (Mann–Whitney U, Wilcoxon signed-rank, Kruskal–Wallis, Friedman). * Use :mod:`PyStatsV1` and :mod:`pingouin` to analyze survey-style data with chi-square tests on categorical variables. 19.1 When parametric assumptions break down =========================================== In earlier chapters, we focused on *parametric* procedures: * *t*-tests (Chapters 9–11) * One-way and factorial ANOVA (Chapters 12–14) * Regression and ANCOVA (Chapters 16–18) These procedures make several assumptions about the data: * **Quantitative scale** – variables are interval or ratio, not purely nominal. * **Normality** – scores within each group are (approximately) normally distributed. * **Homogeneity of variance** – population variances are equal across groups. * **Linearity** – for correlation and regression, the relationship between variables is approximately linear. When these assumptions are badly violated, parametric tests can give misleading *p*-values and confidence intervals. In those cases, we often turn to **non-parametric** methods. Non-parametric tests typically: * Work with **ranks** or **counts** rather than raw numeric values. * Make **fewer distributional assumptions**. * Are often slightly **less powerful** when parametric assumptions *are* met, but **more robust** when those assumptions fail. In psychological research, non-parametric tests are especially useful when: * The outcome is **ordinal** (e.g., Likert scales: “Strongly disagree” to “Strongly agree”). * The data are **severely skewed** or have heavy **outliers** that cannot be reasonably transformed. * The variable is **categorical** (e.g., therapy preference, diagnostic category, treatment response yes/no). 19.2 Chi-square tests for categorical data ========================================== The most common non-parametric tests in introductory psychology involve **frequency counts** in categories. The basic question is: *Do the observed counts differ from what we would expect by chance?* The chi-square family addresses this question in two main situations. 19.2.1 Goodness of fit ---------------------- A **chi-square goodness-of-fit test** compares observed category counts to a theoretical or expected distribution. For example: * A survey asks which coping strategy students use most often: *Exercise*, *Therapy*, *Mindfulness*, or *Social support*. * If there was **no preference**, we would expect roughly equal counts in each category (25% each). * The chi-square goodness-of-fit test asks whether the observed distribution differs significantly from this uniform expectation. Statistically, we compute:: χ² = Σ (Observed - Expected)² / Expected with degrees of freedom :math:`df = k - 1`, where *k* is the number of categories. If χ² is large relative to its degrees of freedom, the *p*-value will be small and we reject the null hypothesis that the observed frequencies match the expected distribution. 19.2.2 Test of independence --------------------------- A **chi-square test of independence** asks whether two categorical variables are related. For example: * Variable 1: Type of therapy received (*Control*, *CBT*, *Mindfulness*). * Variable 2: Treatment outcome (*Improved* vs. *Did not improve*). We arrange the counts in a **contingency table** and again compute a chi-square statistic. Here, the null hypothesis states that the variables are **independent** – knowing a person’s therapy type tells you nothing about their likelihood of improvement. We also report an **effect size** such as **Cramér’s V**, which is based on the chi-square value but scaled to lie between 0 and 1: * ~0.10: small association * ~0.30: medium * ~0.50 or higher: large 19.3 Rank-based tests ===================== Not all non-parametric tests are based on counts. Many are based on **ranks** of the outcome variable. Instead of analyzing raw scores, we: 1. Combine all scores across groups. 2. Rank them from lowest to highest. 3. Analyze the ranks using an appropriate test statistic. Some common rank-based tests and their parametric counterparts: * **Mann–Whitney U**: alternative to an independent-samples *t*-test. * **Wilcoxon signed-rank**: alternative to a paired-samples *t*-test. * **Kruskal–Wallis H**: alternative to a one-way ANOVA with independent groups. * **Friedman test**: alternative to a repeated-measures one-way ANOVA. These tests are especially helpful when: * The outcome is ordinal (e.g., 1–7 rating scales). * The data are heavily skewed or contain extreme outliers. * Sample sizes are small, making normality assumptions doubtful. 19.4 When to choose non-parametric methods ========================================== There is no single “magic rule,” but some practical guidelines: * **Use chi-square tests** when both your predictor and outcome are **categorical** (nominal) and you are working with **counts**, not percentages. * **Use rank-based tests** when: * The outcome variable is **ordinal**, or * You have strong violations of normality or homogeneity that cannot be fixed by transformations, and * You are more concerned about **validity** than about squeezing out every bit of statistical **power**. When in doubt, you can often: * Run the **parametric** test (e.g., *t*-test or ANOVA). * Run the **non-parametric** alternative. * Compare conclusions – if they agree, your result is probably robust. 19.5 PyStatsV1 Lab: Chi-square analysis of survey data ====================================================== In this chapter’s lab you will use :mod:`PyStatsV1` to analyze **simulated survey data** using chi-square tests. The code lives in: * :mod:`scripts.psych_ch19_nonparametrics` * :mod:`tests.test_psych_ch19_nonparametrics` Running the lab ---------------- From the project root (with your virtual environment activated), run:: make psych-ch19 make test-psych-ch19 The first command will: 1. Simulate a **coping strategies** survey with four categories (e.g., Exercise, Therapy, Mindfulness, Social support). 2. Run a **chi-square goodness-of-fit** test to check whether the observed distribution differs from a uniform (no-preference) null. 3. Save the raw data and a summary table to: * ``data/synthetic/psych_ch19_survey_gof.csv`` * ``outputs/track_b/ch19_gof_table.csv`` 4. Generate a bar chart comparing **observed** versus **expected** counts: * ``outputs/track_b/ch19_gof_barplot.png`` The second dataset in the script simulates a **therapy × improvement** contingency table: 1. Students are randomly assigned to *Control*, *CBT*, or *Mindfulness* conditions. 2. Each person is classified as *Improved* or *No change*. 3. The script uses: * :func:`scipy.stats.chi2_contingency` for a traditional chi-square test. * :func:`pingouin.chi2_independence` to obtain effect sizes (e.g., Cramér’s V) and power estimates. 4. The script saves: * ``data/synthetic/psych_ch19_survey_independence.csv`` – individual-level data. * ``outputs/track_b/ch19_independence_table.csv`` – full chi-square summary. * ``outputs/track_b/ch19_stacked_bar.png`` – a stacked bar plot showing the proportion improved within each therapy type. Interpreting the output ------------------------ After running ``make psych-ch19``, inspect the console output and figures: * For the **goodness-of-fit** example, ask: * Does the chi-square test detect that some coping strategies are preferred over others? * Which categories contribute most to the chi-square statistic (largest observed – expected differences)? * For the **independence** example, ask: * Is there evidence that treatment type and improvement are associated? * How large is the association (Cramér’s V)? * Do the stacked bar plots reveal a pattern that matches the numerical results? Connection to earlier chapters ------------------------------ This chapter ties together several themes from earlier in the book: * Just as in Chapter 7, we rely on **sampling distributions** to interpret chi-square statistics. * As in Chapters 9–12, we balance **Type I error** (false positives) against **power** (true positives). * In Chapters 16–18, we extended ANOVA to regression and ANCOVA. Here, we extend the logic of hypothesis testing to **categorical outcomes** and **ordinal data**. Non-parametric methods are not a separate universe – they are another set of tools in your scientific toolbox. When used thoughtfully, they allow you to test important psychological questions even when real-world data refuse to behave “nicely.”