Psychological Science & Statistics – Chapter 6

The Normal Distribution and z-Scores

In Chapters 4 and 5 you learned how to describe a distribution using graphs (histograms, frequency polygons) and summary statistics (mean, median, variance, standard deviation). These tools help us understand what our data look like.

In this chapter we take an important next step: we introduce the normal distribution, a mathematical model used throughout psychology and statistical inference. The normal distribution helps us:

compare individual scores to a reference population,
identify “unusual” observations,
standardize variables measured on different scales, and
compute probabilities, percentiles, and standardized effect sizes.

We will use the same synthetic sleep-study dataset introduced earlier (psych_sleep_regen_note) and learn how to:

Fit a normal model to a single variable
Convert raw scores to z-scores
Interpret where an individual score falls in a distribution
Visualize empirical data alongside a theoretical normal curve
Compute simple probabilities under the normal model

A Statistical Clarification: Raw Data vs. Sampling Distributions

It is important to distinguish between:

The distribution of raw scores, which may or may not be normal (e.g., reaction times are usually positively skewed), and
The distribution of sample means, which is approximately normal under broad conditions, by the Central Limit Theorem (CLT).

Even when raw observations are skewed, the sample mean has an approximately normal distribution when the sample size is large enough. If the parent population is Normal then for small samples, the standardized (using the sample standard deviation) sample mean follows a Student’s t distribution, not a normal distribution.

In this chapter, we focus on z-scores for individual observations, a tool commonly used in psychological measurement. Later chapters (especially Chapter 7 and Chapter 9) introduce sampling distributions, the CLT, and the t distribution in more depth.

The Normal Distribution: A Workhorse in Applied Statistics

The normal distribution (also called the Gaussian or bell curve) is one of the most important models in psychological science and applied statistics. Many aggregated or psychometrically smoothed variables are approximately (“close enough”) normal or can be modeled as normal in appropriate scenarios. For example:

IQ scores (by design)
Composite cognitive performance scores
Height and weight
Many Likert-scale sum scores
Measurement error under many conditions

A normal distribution is defined by only two parameters:

The mean (center)
The standard deviation (spread)

Once these two numbers are known, we can describe the entire distribution mathematically.

Standardizing: From Raw Scores to z-Scores

A z-score expresses a raw score in standard deviation units:

\[z = \frac{X - \mu}{\sigma}\]

When population parameters are unknown (as is typical), we substitute sample estimates.

Interpreting z-scores (where average = mean):

z = 0 → Exactly average
z = +1 → One standard deviation above average
z = -2 → Two standard deviations below average
z = +2 → Approximately the 97.5th percentile (under the normal model)

In psychology, z-scores are useful because they make variables comparable even when measured on different scales.

Examples:

sleep_hours_z = -1.2 The participant slept less than average (mean) by 1.2 SDs.
reaction_time_z = +0.8 The participant was slower than the mean (higher RTs → slower responses).

In this chapter’s PyStatsV1 lab, you will compute z-scores using empirical summary statistics (sample mean and sample standard deviation). This is a common practice when working with real psychological data.

PyStatsV1 Lab: Normal Distribution and z-Scores

In this lab, you will:

Load the sleep-study dataset
Compute z-scores for one continuous variable (e.g., sleep_hours)
Generate a histogram overlaid with a normal density curve
Inspect the distribution of z-scores
Compute simple probabilities using a normal model

Remember: the normal curve is a model. The empirical data do not need to be perfectly normal in order for z-scores to be meaningful or useful.

All code for this chapter lives in:

scripts/psych_ch6_normal_zscores.py

and the dataset lives in:

data/synthetic/psych_sleep_study.csv

See psych_sleep_regen_note if you need to regenerate the dataset.

Running the Lab Script

From the project root, run:

python -m scripts.psych_ch6_normal_zscores

Or, if using the Makefile target:

make psych-ch06

This will:

Load the sleep-study dataset
Compute the sample mean, sample standard deviation, and corresponding z-scores
Generate a histogram overlaid with a fitted normal curve
Print summary information to the console
Save a PNG of the plot (depending on your local settings)

Expected Console Output

Numbers may vary slightly depending on random seed and sample size:

Loaded dataset with 200 participants
sleep_hours mean = 6.98
sleep_hours SD   = 1.02
First five z-scores:
    Participant 0:  0.13
    Participant 1: -0.87
    Participant 2:  1.10
    Participant 3: -0.05
    Participant 4:  0.44

Interpreting the Plot

The figure shows a histogram of sleep_hours and a fitted normal curve.

Questions to consider:

Does the empirical distribution appear approximately normal?
Are there signs of skewness or kurtosis?
Would the normal model be a reasonable approximation?

Remember that many psychological variables are only approximately normal, and some (like reaction times) are typically skewed.

Your Turn: Practice Interpreting z-Scores

Identify two typical sleepers. Look for participants with z-scores near 0. What does this tell you?
Identify one unusually high value. Find a participant with z > 2. What percentile does this correspond to?
Identify one unusually low value. Find a participant with z < -2. What might explain such low sleep values?
Estimate simple probabilities. Using the normal model, estimate:
- P(Z > 1)
- P(-1 < Z < 1)
- P(Z < -2)
How close are these to the 68–95–99.7 rule?

Optional Extension: Reaction Times

Try computing and visualizing z-scores for:

reaction_time_ms

Questions:

Are reaction times normally distributed?
Are they positively skewed?
What happens if you apply a log-transform before standardizing?

Summary

In this chapter you learned how to:

Model a psychological variable using a normal distribution
Compute and interpret z-scores
Overlay a theoretical normal curve on empirical data
Estimate probabilities and percentiles under the normal model
Distinguish between raw-score distributions and sampling distributions

These tools prepare you for:

Sampling distributions (Chapter 7)
Hypothesis testing and the t distribution (Chapter 8)
The one-sample t-test (Chapter 9)

Next Steps

In Chapter 7, you will learn how probability and sampling work together to form the basis of statistical inference, including why the sample mean becomes approximately normal even when the raw data are not.