Psychological Science & Statistics – Chapter 8
Hypothesis Testing and the One-Sample t-Test
In Chapters 6 and 7 you learned:
how to model a variable with the normal distribution and interpret z-scores (Chapter 6), and
how repeated sampling leads to a sampling distribution of the mean (Chapter 7).
In this chapter we put those pieces together to introduce null hypothesis significance testing (NHST) for a single mean using the one-sample t-test.
Our goals are to help you:
understand the logic of NHST as a decision procedure,
interpret the one-sample t-statistic,
see how sampling variability drives the p-value, and
connect the theoretical t-test to a simulation-based view.
The Logic of NHST for a Single Mean
Suppose we have a quantitative variable such as stress_score and we want
to test whether the mean in a population is equal to some reference value
\(\mu_0\) (for example, a published norm or a policy target).
The one-sample t-test follows these steps:
State the hypotheses.
\[ \begin{aligned}H_0 &: \mu = \mu_0 \qquad\text{(null hypothesis)}\\H_1 &: \mu \ne \mu_0 \qquad\text{(two-sided alternative)}\end{aligned} \]
Collect a sample of size \(n\) from the population and compute:
the sample mean \(\bar{x}\),
the sample standard deviation \(s\).
Compute the test statistic.
Because the population standard deviation \(\sigma\) is unknown, we use the t-statistic:
\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]
This tells us how many standard errors the sample mean is from the null value \(\mu_0\).
Ask: how surprising is this t if \(H_0\) were true?
Under \(H_0\) and certain assumptions (independent observations, approximately normal population), the t-statistic follows a t distribution with \(n - 1\) degrees of freedom.
The p-value is the probability of obtaining a t-statistic as extreme as (or more extreme than) the one we observed if \(H_0\) is true.
Make a decision.
If the p-value is small (for example, \(p < 0.05\)), we say our observed t is unlikely under the null, and we reject \(H_0\).
If the p-value is not small, we fail to reject \(H_0\). This does not prove that \(H_0\) is true; it only says the data do not provide strong evidence against it.
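The first three steps can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the lab script's code: the function name `one_sample_t` and the ten stress scores are hypothetical, invented for the example. The p-value step is left out because it needs the t distribution's tail probabilities, which the standard library does not provide (a statistics library such as SciPy does).

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: mu = mu0, estimating sigma with the sample SD."""
    n = len(sample)
    x_bar = statistics.fmean(sample)        # sample mean
    s = statistics.stdev(sample)            # sample SD (divides by n - 1)
    standard_error = s / math.sqrt(n)       # estimated SD of the sample mean
    t = (x_bar - mu0) / standard_error      # distance from mu0 in standard errors
    return t, n - 1


# Hypothetical stress scores, invented for illustration only.
scores = [18, 25, 22, 30, 17, 21, 26, 19, 24, 28]
t_stat, df = one_sample_t(scores, mu0=20)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
# prints: t = 2.18 on 9 degrees of freedom
```

Here the sample mean (23.0) sits a little more than two standard errors above the null value of 20.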
Connecting to Chapter 7: Sampling Distributions
In Chapter 7 you simulated the sampling distribution of the mean for a stress scale. You saw that:
the distribution of sample means is centered near the true population mean,
its spread shrinks as \(n\) increases (this spread is the standard error of the mean), and
most sample means lie close to the population mean, with a few in the tails.
The one-sample t-statistic builds directly on that idea, but with an extra wrinkle: we do not know the population standard deviation \(\sigma\), so we estimate it with the sample standard deviation \(s\). Because \(s\) varies from sample to sample, the distribution of t has heavier tails than the normal distribution.
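A small simulation makes the heavier tails visible. The sketch below assumes a normal population with illustrative parameters (mean 20, SD 10) and a deliberately small sample size of 5 so the effect is easy to see; none of these values come from the lab script. For each sample it computes a z-statistic with the known \(\sigma\) and a t-statistic with the estimated \(s\), then counts how often each lands beyond the normal cutoff of 1.96.

```python
import math
import random
import statistics

random.seed(8)                      # arbitrary seed so the run is repeatable
mu, sigma, n = 20.0, 10.0, 5        # illustrative values; small n exaggerates the effect
n_sims = 10_000
cutoff = 1.96                       # two-sided 5% cutoff for a standard normal

z_extreme = 0
t_extreme = 0
for _ in range(n_sims):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    z = (x_bar - mu) / (sigma / math.sqrt(n))   # uses the known sigma
    t = (x_bar - mu) / (s / math.sqrt(n))       # uses the estimate s
    z_extreme += abs(z) > cutoff
    t_extreme += abs(t) > cutoff

prop_z = z_extreme / n_sims
prop_t = t_extreme / n_sims
print(f"share of |z| > 1.96: {prop_z:.3f}")
print(f"share of |t| > 1.96: {prop_t:.3f}")
```

The share for z should come out near 0.05, while the share for t is noticeably larger. That gap is why the t-test compares against a t distribution rather than the normal.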
Simulation-Based View of a One-Sample Test
In Chapter 7, you simulated the sampling distribution of the mean. You tracked where sample means ended up when repeatedly sampling from a fixed population.
In this chapter’s lab, we use a similar idea to approximate a p-value, but with a crucial twist: instead of just tracking means, we track t-statistics.
Generate a large synthetic population of stress scores.
Specify a null hypothesis about the population mean:
\[H_0 : \mu = \mu_0\]
Draw a single random sample from this population and compute:
the sample mean \(\bar{x}\),
the sample standard deviation \(s\),
the observed t-statistic \(t_\text{obs}\).
Construct a world where \(H_0\) is exactly true by recentering the population so its mean is \(\mu_0\). The shape and spread stay the same; only the center changes.
Draw many random samples from this recentered population. For each simulated sample, compute its own mean, its own standard deviation, and its own t-statistic \(t_\text{sim}\).
Approximate the two-sided p-value as:
\[\hat{p} = \frac{\text{number of simulations with } |t_\text{sim}| \ge |t_\text{obs}|} {\text{number of simulations}}\]
This is a simulation-based analogue of the theoretical p-value. By checking how extreme our t-statistic is compared to the distribution of simulated t-statistics, we correctly account for the uncertainty in estimating the standard deviation.
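The whole procedure fits in a short standard-library sketch. It is not the code in the lab script: the helper name `t_statistic`, the seed, and the choice of a normal population with mean 20 and SD 10 are assumptions made for illustration, while the population size (50000), sample size (25), null value (20), and number of simulations (4000) mirror the example values used in this chapter.

```python
import math
import random
import statistics


def t_statistic(sample, mu0):
    """One-sample t: (mean - mu0) divided by the estimated standard error."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


random.seed(2024)                   # arbitrary seed so the run is repeatable
mu0, n, n_sims = 20.0, 25, 4_000

# Synthetic population (the normal shape and its parameters are assumptions).
population = [random.gauss(20.0, 10.0) for _ in range(50_000)]

# One observed sample and its t-statistic.
observed = random.sample(population, n)
t_obs = t_statistic(observed, mu0)

# Recenter so that H0 is exactly true: same shape and spread, mean equal to mu0.
shift = mu0 - statistics.fmean(population)
null_population = [x + shift for x in population]

# Many samples from the null world, each with its own mean, SD, and t.
t_sims = [t_statistic(random.sample(null_population, n), mu0) for _ in range(n_sims)]

# Two-sided p-value: share of simulated |t| at least as large as the observed |t|.
n_extreme = sum(abs(t) >= abs(t_obs) for t in t_sims)
p_hat = n_extreme / n_sims
print(f"t_obs = {t_obs:.2f}, approximate two-sided p-value = {p_hat:.3f}")
```

Each simulated sample recomputes its own standard deviation, so the simulated t-values carry the same extra variability as the observed one.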
PyStatsV1 Lab: A One-Sample Test on Stress Scores
In this lab, you will:
Generate a large synthetic population of stress scores.
State a null hypothesis about the population mean.
Draw one random sample of size \(n\) and compute the observed \(t\)-statistic.
Use simulation to generate a null distribution of t-values.
Approximate the two-sided p-value by locating your observed \(t\) in that distribution.
Make a decision about whether to reject the null hypothesis at \(\alpha = 0.05\).
All code for this lab lives in:
scripts/psych_ch8_one_sample_test.py
and it will write outputs to:
data/synthetic/psych_ch8_population_stress.csv (population),
data/synthetic/psych_ch8_null_t_values.csv (simulated t-values), and, optionally,
outputs/track_b/ch08_null_t_distribution.png (plot).
Running the Lab Script
From the project root, run:
python -m scripts.psych_ch8_one_sample_test
If your Makefile defines a convenience target, you can instead run:
make psych-ch08
This will:
Generate a synthetic stress_score population.
Specify a null value \(\mu_0\) (for example, 20).
Draw a sample of size \(n\) (for example, 25).
Compute the observed t-statistic for \(H_0 : \mu = \mu_0\).
Simulate a null distribution by recentering the population, resampling, and computing \(t\) for every resample.
Estimate a two-sided p-value as a long-run relative frequency.
Print a verbal conclusion (reject vs fail to reject at \(\alpha = 0.05\)).
Optionally, save a plot of the simulated t-distribution with the observed t-statistic marked.
Expected Console Output
Your exact numbers will vary, but the output will look similar to:
Generated population with 50000 individuals
Population mean stress_score = 19.98
Population SD stress_score = 9.95
Null hypothesis: mu = 20.00
Observed sample size n = 25
Observed sample mean = 22.13
Observed sample SD = 10.31
t statistic = 1.03
Using 4000 simulations under H0...
Approximate two-sided p-value = 0.31
Decision at alpha = 0.05: fail to reject H0
Interpreting the Output
Focus on the following pieces:
The observed t-statistic: how many standard errors the sample mean is from the null value.
The p-value: the probability of obtaining a t-statistic this extreme (or more) if the null hypothesis were true.
The decision: the binary result based on your alpha threshold. Remember that “fail to reject” is not the same as “prove the null true.”
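As an arithmetic check, the summary values printed in the example output can be plugged into the t formula by hand (intermediate values rounded):

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{22.13 - 20.00}{10.31 / \sqrt{25}} = \frac{2.13}{2.062} \approx 1.03\]

This matches the printed t statistic: the sample mean is about one standard error above the null value. The approximate p-value of 0.31 then says that roughly 31% of the t-values simulated under \(H_0\) were at least that far from 0, which is why the decision at \(\alpha = 0.05\) is to fail to reject.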
Your Turn: Practice with Different Scenarios
Change the null hypothesis
Modify the null value \(\mu_0\) in the script. How does this change the t-statistic and the resulting p-value?
Change the sample size
Increase the sample size \(n\) (for example, from 25 to 100) and notice how the t-statistic changes. Because the standard error \(s / \sqrt{n}\) shrinks as \(n\) grows, the same difference \(\bar{x} - \mu_0\) produces a larger \(|t|\), so the test becomes more sensitive to small departures from \(\mu_0\).
Replicate the experiment
Run the script multiple times. Do you always reach the same decision? If the true mean is close to \(\mu_0\), you may see the decision flip back and forth from run to run; this is sampling variability at work.
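The sample-size and replication exercises can be previewed together with a rough sketch. Everything here is an assumption made for illustration: a normal population whose true mean (22) sits slightly above \(\mu_0 = 20\), an SD of 10, and 1000 replications per sample size. To keep it fast, the decision compares \(|t|\) with the two-sided 5% critical value from a standard t table (2.064 for 24 degrees of freedom, 1.984 for 99) instead of simulating a p-value as the lab does.

```python
import math
import random
import statistics

random.seed(100)                          # arbitrary seed so the run is repeatable
mu_true, sigma, mu0 = 22.0, 10.0, 20.0    # hypothetical: true mean a little above mu0
# Two-sided 5% critical values from a standard t table, keyed by sample size.
t_crit = {25: 2.064, 100: 1.984}
n_reps = 1_000

rejection_rate = {}
for n, crit in t_crit.items():
    rejections = 0
    for _ in range(n_reps):
        sample = [random.gauss(mu_true, sigma) for _ in range(n)]
        standard_error = statistics.stdev(sample) / math.sqrt(n)
        t = (statistics.fmean(sample) - mu0) / standard_error
        rejections += abs(t) > crit       # reject H0 when |t| exceeds the cutoff
    rejection_rate[n] = rejections / n_reps
    print(f"n = {n:3d}: rejected H0 in {rejection_rate[n]:.1%} of replications")
```

With the smaller sample most replications fail to reject even though the null is false in this setup, and the larger sample rejects far more often. Neither rate is 0% or 100%, so individual runs can disagree.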
Optional Plot: Null t-Distribution
If enabled in the script, a plot file is saved to:
outputs/track_b/ch08_null_t_distribution.png
The figure shows:
a histogram of simulated t-statistics under \(H_0\), and
a vertical line marking your observed t-statistic.
Questions to consider:
Does the histogram look roughly bell-shaped and centered at 0?
Is your observed line in the main bulk of the distribution (a common result) or out in the thin tails (a rare result)?
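If the plot is not enabled, a text histogram gives a rough stand-in for answering these questions. The sketch below is not the script's plotting code. It assumes a normal null population (mean 20, SD 10), samples of size 25, and 4000 simulations, and it marks the bin containing the observed t of 1.03 from the example output. It samples directly from the normal distribution instead of recentering a finite population, which is a simplification.

```python
import math
import random
import statistics

random.seed(109)                                  # arbitrary seed so the run is repeatable
mu0, sigma, n, n_sims = 20.0, 10.0, 25, 4_000     # illustrative values
t_obs = 1.03                                      # observed t from the example output


def null_t():
    """Draw one sample from a world where H0 is true and return its t-statistic."""
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    return (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


t_values = [null_t() for _ in range(n_sims)]

# Count t-values in unit-wide bins from -4 to 4; values beyond that range
# are clamped into the two end bins.
edges = list(range(-4, 4))                        # left edges: -4, -3, ..., 3
counts = {left: 0 for left in edges}
for t in t_values:
    left = min(max(math.floor(t), -4), 3)
    counts[left] += 1

for left in edges:
    bar = "#" * (counts[left] // 40)              # one '#' per 40 simulated t-values
    marker = "  <- observed t" if left <= t_obs < left + 1 else ""
    print(f"[{left:+d}, {left + 1:+d}) {bar}{marker}")
```

The bars should be tallest in the two bins next to 0 and thin out toward the ends, and the observed t falls in the bin from +1 to +2, inside the main bulk.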
Summary
In this chapter you learned how to:
frame a research question as a null hypothesis about a mean,
compute and interpret the one-sample t-statistic,
approximate a p-value using a simulated null distribution of t-statistics, and
make a decision to reject or fail to reject \(H_0\).
These ideas form the backbone of classical inference and set you up for the next steps:
confidence intervals for a mean,
comparisons of two means (independent and paired-samples t-tests), and
more complex models such as ANOVA and regression.