Chapter 19 – Non-Parametric Statistics

Learning goals

By the end of this chapter you will be able to:

  • Explain when non-parametric tests are preferred over traditional (parametric) procedures such as the t-test or ANOVA.

  • Describe the logic of the chi-square family of tests.

  • Distinguish between chi-square tests of goodness of fit and independence.

  • Recognize rank-based alternatives to t-tests and one-way ANOVA (Mann–Whitney U, Wilcoxon signed-rank, Kruskal–Wallis, Friedman).

  • Use PyStatsV1 and pingouin to analyze survey-style data with chi-square tests on categorical variables.

19.1 When parametric assumptions break down

In earlier chapters, we focused on parametric procedures:

  • t-tests (Chapters 9–11)

  • One-way and factorial ANOVA (Chapters 12–14)

  • Regression and ANCOVA (Chapters 16–18)

These procedures make several assumptions about the data:

  • Quantitative scale – variables are interval or ratio, not purely nominal.

  • Normality – scores within each group are (approximately) normally distributed.

  • Homogeneity of variance – population variances are equal across groups.

  • Linearity – for correlation and regression, the relationship between variables is approximately linear.

When these assumptions are badly violated, parametric tests can give misleading p-values and confidence intervals. In those cases, we often turn to non-parametric methods.

Non-parametric tests typically:

  • Work with ranks or counts rather than raw numeric values.

  • Make fewer distributional assumptions.

  • Are often slightly less powerful when parametric assumptions are met, but more robust when those assumptions fail.

In psychological research, non-parametric tests are especially useful when:

  • The outcome is ordinal (e.g., Likert scales: “Strongly disagree” to “Strongly agree”).

  • The data are severely skewed or have heavy outliers that cannot be reasonably transformed.

  • The variable is categorical (e.g., therapy preference, diagnostic category, treatment response yes/no).

19.2 Chi-square tests for categorical data

The most common non-parametric tests in introductory psychology involve frequency counts in categories. The basic question is:

Do the observed counts differ from what we would expect by chance?

The chi-square family addresses this question in two main situations.

19.2.1 Goodness of fit

A chi-square goodness-of-fit test compares observed category counts to a theoretical or expected distribution. For example:

  • A survey asks which coping strategy students use most often: Exercise, Therapy, Mindfulness, or Social support.

  • If there were no preference, we would expect roughly equal counts in each category (25% each).

  • The chi-square goodness-of-fit test asks whether the observed distribution differs significantly from this uniform expectation.

Statistically, we compute:

χ² = Σ (Observed - Expected)² / Expected

with degrees of freedom df = k - 1, where k is the number of categories.

If χ² is large relative to its degrees of freedom, the p-value will be small and we reject the null hypothesis that the observed frequencies match the expected distribution.
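The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the lab script: the counts are hypothetical, and the variable names are chosen only for this example. It computes χ² by hand and then checks the result against scipy.stats.chisquare, which assumes equal expected frequencies when none are supplied.

```python
import numpy as np
from scipy import stats

# Hypothetical counts (illustration only), in the order:
# Exercise, Therapy, Mindfulness, Social support.
observed = np.array([38, 22, 25, 15])
n = observed.sum()
k = observed.size

# Under the no-preference null, every category expects n / k responses.
expected = np.full(k, n / k)

# Chi-square by hand: sum of (Observed - Expected)^2 / Expected.
chi2_manual = ((observed - expected) ** 2 / expected).sum()
df = k - 1
p_manual = stats.chi2.sf(chi2_manual, df)  # upper-tail probability

# The same test via SciPy (uniform expected frequencies by default).
chi2_scipy, p_scipy = stats.chisquare(observed)

print(f"by hand: chi2 = {chi2_manual:.2f}, df = {df}, p = {p_manual:.4f}")
print(f"SciPy:   chi2 = {chi2_scipy:.2f}, p = {p_scipy:.4f}")
```

With these made-up counts the statistic is 278 / 25 = 11.12 on 3 degrees of freedom, and the two approaches agree.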

19.2.2 Test of independence

A chi-square test of independence asks whether two categorical variables are related. For example:

  • Variable 1: Type of therapy received (Control, CBT, Mindfulness).

  • Variable 2: Treatment outcome (Improved vs. Did not improve).

We arrange the counts in a contingency table and again compute a chi-square statistic. Here, the null hypothesis states that the variables are independent – knowing a person’s therapy type tells you nothing about their likelihood of improvement.

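We also report an effect size such as Cramér’s V, which is based on the chi-square value but scaled to lie between 0 and 1. Cohen’s conventional benchmarks, which apply when the smaller dimension of the table has two categories (as in the 3 × 2 example above), are: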

  • ~0.10: small association

  • ~0.30: medium

  • ~0.50 or higher: large
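As a rough sketch of the test of independence, the example below runs scipy.stats.chi2_contingency on a hypothetical 3 × 2 table and then computes Cramér’s V by hand as sqrt(χ² / (N × (min(rows, columns) - 1))). The counts are invented for illustration and are not taken from the lab data.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table (illustration only).
# Rows: Control, CBT, Mindfulness. Columns: Improved, Did not improve.
table = np.array([
    [12, 28],
    [25, 15],
    [20, 20],
])

# Yates' continuity correction only affects 2 x 2 tables; it is switched
# off here to make clear that the plain Pearson statistic is reported.
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)

# Cramér's V rescales chi-square by the sample size and table dimensions.
n = table.sum()
min_dim = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * min_dim))

print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print(f"Cramér's V = {cramers_v:.2f}")
print("Expected counts under independence:")
print(expected)
```

Degrees of freedom for a contingency table are (rows - 1) × (columns - 1), which gives 2 for this table.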

19.3 Rank-based tests

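Not all non-parametric tests are based on counts. Many are based on ranks of the outcome variable. (Paired and repeated-measures versions rank the differences, or rank within each participant, rather than pooling all scores.) For independent groups, instead of analyzing raw scores, we: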

  1. Combine all scores across groups.

  2. Rank them from lowest to highest.

  3. Analyze the ranks using an appropriate test statistic.
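The ranking steps for two independent groups can be sketched as follows. The scores are hypothetical; scipy.stats.rankdata assigns tied scores the average of the ranks they would otherwise occupy.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical scores for two independent groups (illustration only).
group_a = np.array([3, 5, 8, 12])
group_b = np.array([5, 9, 14, 40])  # 40 is an extreme score

# Step 1: combine all scores across groups.
combined = np.concatenate([group_a, group_b])

# Step 2: rank from lowest to highest (ties share the average rank).
ranks = rankdata(combined)

# Step 3: split the ranks back by group; rank sums feed the test statistic.
ranks_a = ranks[:len(group_a)]
ranks_b = ranks[len(group_a):]

print("ranks A:", ranks_a, "sum =", ranks_a.sum())
print("ranks B:", ranks_b, "sum =", ranks_b.sum())
```

Note that the extreme score of 40 simply receives the highest rank (8). Its distance from the other scores no longer matters, which is why rank-based tests are less sensitive to outliers.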

Some common rank-based tests and their parametric counterparts:

  • Mann–Whitney U: alternative to an independent-samples t-test.

  • Wilcoxon signed-rank: alternative to a paired-samples t-test.

  • Kruskal–Wallis H: alternative to a one-way ANOVA with independent groups.

  • Friedman test: alternative to a repeated-measures one-way ANOVA.
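Each of the four tests listed above is available in scipy.stats. The sketch below calls them on simulated, skewed data; the group names, sample sizes, and distributions are assumptions made for this illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(19)  # fixed seed so the example is reproducible

# Two independent groups with skewed scores -> Mann-Whitney U.
control = rng.exponential(scale=2.0, size=25)
treated = rng.exponential(scale=3.0, size=25)
u_stat, p_u = stats.mannwhitneyu(control, treated, alternative="two-sided")

# The same participants measured twice -> Wilcoxon signed-rank.
pre = rng.exponential(scale=3.0, size=25)
post = pre + rng.normal(loc=-0.5, scale=1.0, size=25)
w_stat, p_w = stats.wilcoxon(pre, post)

# Three independent groups -> Kruskal-Wallis H.
third_group = rng.exponential(scale=4.0, size=25)
h_stat, p_h = stats.kruskal(control, treated, third_group)

# The same participants measured three times -> Friedman test.
followup = post + rng.normal(loc=-0.5, scale=1.0, size=25)
fr_stat, p_fr = stats.friedmanchisquare(pre, post, followup)

print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_u:.4f}")
print(f"Wilcoxon W     = {w_stat:.1f}, p = {p_w:.4f}")
print(f"Kruskal-Wallis = {h_stat:.2f}, p = {p_h:.4f}")
print(f"Friedman       = {fr_stat:.2f}, p = {p_fr:.4f}")
```

The choice among the four follows the same design logic as their parametric counterparts: independent versus related samples, and two versus more than two conditions.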

These tests are especially helpful when:

  • The outcome is ordinal (e.g., 1–7 rating scales).

  • The data are heavily skewed or contain extreme outliers.

  • Sample sizes are small, so normality is hard to verify and the central limit theorem offers little protection.

19.4 When to choose non-parametric methods

There is no single “magic rule,” but some practical guidelines:

  • Use chi-square tests when both your predictor and outcome are categorical (nominal) and you are working with counts, not percentages.

  • Use rank-based tests when:

    • The outcome variable is ordinal, or

    • You have strong violations of normality or homogeneity that cannot be fixed by transformations, and

    • You are more concerned about validity than about squeezing out every bit of statistical power.

When in doubt, you can often:

  • Run the parametric test (e.g., t-test or ANOVA).

  • Run the non-parametric alternative.

  • Compare conclusions – if they agree, your result is probably robust.
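The comparison described above might look like the following sketch, which runs Welch’s t-test and the Mann–Whitney U test on the same simulated, skewed data. The data, the seed, and the alpha level of .05 are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical, positively skewed scores for two independent groups.
group_a = rng.lognormal(mean=0.0, sigma=1.0, size=30)
group_b = rng.lognormal(mean=0.5, sigma=1.0, size=30)

# Parametric test (Welch's version, which does not assume equal variances).
t_stat, p_t = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric alternative.
u_stat, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Do both tests lead to the same decision at the chosen alpha?
alpha = 0.05
agree = bool((p_t < alpha) == (p_u < alpha))

print(f"t-test:       t = {t_stat:.2f}, p = {p_t:.4f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {p_u:.4f}")
print("Same decision at alpha = .05:", agree)
```

If the two tests disagree, that is a signal to look more closely at the distribution of scores rather than a license to report whichever p-value is smaller.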

19.5 PyStatsV1 Lab: Chi-square analysis of survey data

In this chapter’s lab you will use PyStatsV1 to analyze simulated survey data using chi-square tests. The code lives in:

  • scripts.psych_ch19_nonparametrics

  • tests.test_psych_ch19_nonparametrics

Running the lab

From the project root (with your virtual environment activated), run:

make psych-ch19
make test-psych-ch19

The first command will:

  1. Simulate a coping strategies survey with four categories (e.g., Exercise, Therapy, Mindfulness, Social support).

  2. Run a chi-square goodness-of-fit test to check whether the observed distribution differs from a uniform (no-preference) null.

  3. Save the raw data and a summary table to:

    • data/synthetic/psych_ch19_survey_gof.csv

    • outputs/track_b/ch19_gof_table.csv

  4. Generate a bar chart comparing observed versus expected counts:

    • outputs/track_b/ch19_gof_barplot.png

The second dataset in the script simulates a therapy × improvement contingency table:

  1. Students are randomly assigned to Control, CBT, or Mindfulness conditions.

  2. Each person is classified as Improved or No change.

  3. The script uses:

    • scipy.stats.chi2_contingency() for a traditional chi-square test.

    • pingouin.chi2_independence() to obtain effect sizes (e.g., Cramér’s V) and power estimates.

  4. The script saves:

    • data/synthetic/psych_ch19_survey_independence.csv – individual-level data.

    • outputs/track_b/ch19_independence_table.csv – full chi-square summary.

    • outputs/track_b/ch19_stacked_bar.png – a stacked bar plot showing the proportion improved within each therapy type.
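To show how individual-level records become a contingency table, here is a minimal sketch using pandas and SciPy. It is not the lab script: the column names (therapy, outcome), the group size, and the improvement probabilities are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(2024)

# Hypothetical design: 40 people per condition, with assumed
# improvement probabilities chosen only for this illustration.
n_per_group = 40
improve_prob = {"Control": 0.30, "CBT": 0.60, "Mindfulness": 0.50}

therapy = np.repeat(list(improve_prob), n_per_group)
probs = np.repeat(list(improve_prob.values()), n_per_group)
improved = rng.random(therapy.size) < probs

# One row per person, as in an individual-level data file.
survey = pd.DataFrame({
    "therapy": therapy,
    "outcome": np.where(improved, "Improved", "No change"),
})

# Cross-tabulate into a therapy x outcome table of counts.
table = pd.crosstab(survey["therapy"], survey["outcome"])
chi2, p, dof, expected = stats.chi2_contingency(table)

# Row proportions: the quantities a stacked bar plot would display.
row_props = pd.crosstab(survey["therapy"], survey["outcome"], normalize="index")

print(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print(row_props.round(2))
```

The test is run on the counts, while the row proportions are used only for display and interpretation.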

Interpreting the output

After running make psych-ch19, inspect the console output and figures:

  • For the goodness-of-fit example, ask:

    • Does the chi-square test detect that some coping strategies are preferred over others?

    • Which categories contribute most to the chi-square statistic (largest (Observed - Expected)² / Expected terms)?

  • For the independence example, ask:

    • Is there evidence that treatment type and improvement are associated?

    • How large is the association (Cramér’s V)?

    • Do the stacked bar plots reveal a pattern that matches the numerical results?
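One way to answer the question about which categories contribute most is to break the goodness-of-fit statistic into its per-category terms. The sketch below uses hypothetical counts, not the lab output, and also reports Pearson residuals, (Observed - Expected) / sqrt(Expected), whose sign shows the direction of each departure.

```python
import numpy as np

# Hypothetical counts (illustration only).
categories = ["Exercise", "Therapy", "Mindfulness", "Social support"]
observed = np.array([38, 22, 25, 15])
expected = np.full(observed.size, observed.sum() / observed.size)

# Each category's share of the chi-square statistic.
contributions = (observed - expected) ** 2 / expected

# Pearson residuals: positive = more responses than expected.
residuals = (observed - expected) / np.sqrt(expected)

# List categories from largest to smallest contribution.
order = np.argsort(contributions)[::-1]
for i in order:
    print(f"{categories[i]:<15} contribution = {contributions[i]:.2f}, "
          f"residual = {residuals[i]:+.2f}")

largest = categories[order[0]]
print("Largest contributor:", largest)
```

In this made-up example, Exercise is chosen more often than expected and Social support less often, and together they account for nearly all of the statistic.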

Connection to earlier chapters

This chapter ties together several themes from earlier in the book:

  • Just as in Chapter 7, we rely on sampling distributions to interpret chi-square statistics.

  • As in Chapters 9–12, we balance Type I error (false positives) against power (true positives).

  • In Chapters 16–18, we extended ANOVA to regression and ANCOVA. Here, we extend the logic of hypothesis testing to categorical outcomes and ordinal data.

Non-parametric methods are not a separate universe – they are another set of tools in your scientific toolbox. When used thoughtfully, they allow you to test important psychological questions even when real-world data refuse to behave “nicely.”