.. _psych_ch8_hypothesis_testing:

Psychological Science & Statistics – Chapter 8
==============================================

Hypothesis Testing and the One-Sample t-Test
--------------------------------------------

In Chapters 6 and 7 you learned:

* how to model a variable with the **normal distribution** and interpret
  **z-scores** (Chapter 6), and
* how repeated sampling leads to a **sampling distribution of the mean**
  (Chapter 7).

In this chapter we put those pieces together to introduce **null hypothesis
significance testing (NHST)** for a **single mean** using the
**one-sample t-test**.

Our goals are to help you:

* understand the logic of NHST as a decision procedure,
* interpret the **one-sample t-statistic**,
* see how sampling variability drives the **p-value**, and
* connect the theoretical t-test to a **simulation-based view**.

The Logic of NHST for a Single Mean
-----------------------------------

Suppose we have a quantitative variable such as ``stress_score`` and we want
to test whether the mean in a population is equal to some reference value
:math:`\mu_0` (for example, a published norm or a policy target).

The one-sample t-test follows these steps:

1. **State the hypotheses.**

   .. math::

      H_0 : \mu = \mu_0
      \qquad\text{(null hypothesis)}

      H_1 : \mu \ne \mu_0
      \qquad\text{(two-sided alternative)}

2. **Collect a sample** of size :math:`n` from the population and compute:

   * the sample mean :math:`\bar{x}`,
   * the sample standard deviation :math:`s`.

3. **Compute the test statistic.**

   Because the population standard deviation :math:`\sigma` is unknown, we
   use the **t-statistic**:

   .. math::

      t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

   This tells us how many **standard errors** the sample mean is from the
   null value :math:`\mu_0`.

4. **Ask: how surprising is this t if** :math:`H_0` **were true?**

   Under :math:`H_0` and certain assumptions (independent observations,
   approximately normal population), the t-statistic follows a **t
   distribution** with :math:`n - 1` degrees of freedom.

   The **p-value** is the probability, computed assuming :math:`H_0` is true,
   of obtaining a t-statistic at least as extreme as the one we observed. For
   the two-sided alternative, "extreme" means far from 0 in either direction.

5. **Make a decision.**

   * If the p-value falls below a significance level :math:`\alpha` chosen in
     advance (commonly :math:`\alpha = 0.05`), we say our observed t is
     **unlikely under the null**, and we **reject** :math:`H_0`.
   * If the p-value is not small, we **fail to reject** :math:`H_0`. This
     does *not* prove that :math:`H_0` is true; it only says the data are not
     very inconsistent with it.
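
The steps above can be sketched in a few lines of Python. This is a minimal
illustration, not the lab script: the function name ``one_sample_t`` and the
eight stress scores are invented for the example, and the critical value
2.365 is the two-sided 5% cutoff for 7 degrees of freedom read from a standard
t table, used here in place of an exact p-value.

.. code-block:: python

   import math
   import statistics


   def one_sample_t(sample, mu0):
       """Return (t, df) for a one-sample t-test of H0: mu = mu0."""
       n = len(sample)
       x_bar = statistics.mean(sample)
       s = statistics.stdev(sample)        # sample SD (divides by n - 1)
       standard_error = s / math.sqrt(n)
       return (x_bar - mu0) / standard_error, n - 1


   # Hypothetical stress scores, invented for this example.
   scores = [22, 18, 25, 30, 15, 21, 27, 19]
   t_obs, df = one_sample_t(scores, mu0=20)

   # Two-sided critical value for alpha = 0.05 and df = 7 (standard t table).
   T_CRIT = 2.365
   decision = "reject H0" if abs(t_obs) > T_CRIT else "fail to reject H0"
   print(f"t = {t_obs:.2f}, df = {df}, decision: {decision}")

Comparing :math:`|t|` with a critical value leads to the same decision as
comparing the p-value with :math:`\alpha`; the lab later in this chapter
estimates the p-value directly by simulation.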

Connecting to Chapter 7: Sampling Distributions
-----------------------------------------------

In Chapter 7 you simulated the **sampling distribution of the mean** for a
stress scale. You saw that:

* the distribution of sample means is centered near the true population mean,
* its spread shrinks as :math:`n` increases (standard error idea), and
* most sample means lie close to the population mean, with a few in the tails.

The one-sample t-statistic builds directly on that idea, but with an extra
wrinkle: we do not know the population standard deviation :math:`\sigma`, so
we estimate it with the sample standard deviation :math:`s`. Because :math:`s`
varies from sample to sample, the distribution of t has **heavier tails**
than the normal distribution.
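
A quick simulation can make the heavier tails visible. The sketch below
assumes a normal population with mean 20 and SD 10 (illustrative values, not
taken from the lab script) and a deliberately small sample size of 5. It
counts how often :math:`|t|` exceeds 1.96, the cutoff that a standard normal
variable exceeds about 5% of the time.

.. code-block:: python

   import math
   import random
   import statistics

   rng = random.Random(0)      # fixed seed so the run is repeatable
   n = 5                       # small n makes the heavy tails easy to see
   n_sims = 20_000
   mu, sigma = 20.0, 10.0      # assumed population values for illustration

   exceed = 0
   for _ in range(n_sims):
       sample = [rng.gauss(mu, sigma) for _ in range(n)]
       se = statistics.stdev(sample) / math.sqrt(n)
       t = (statistics.mean(sample) - mu) / se
       if abs(t) > 1.96:
           exceed += 1

   frac_beyond = exceed / n_sims
   print(f"Share of |t| > 1.96 with n = {n}: {frac_beyond:.3f}")
   print("Share expected for a standard normal: 0.050")

With such a small sample the share should come out well above 5%. Repeating
the run with a larger ``n`` should bring it closer to the normal value.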

Simulation-Based View of a One-Sample Test
------------------------------------------

In Chapter 7, you simulated the sampling distribution of the mean. You
tracked where sample means ended up when repeatedly sampling from a fixed
population.

In this chapter's lab, we use a similar idea to approximate a p-value, but
with a crucial twist: instead of just tracking means, we track
**t-statistics**.

1. Generate a large synthetic population of **stress scores**.

2. Specify a null hypothesis about the population mean:

   .. math::

      H_0 : \mu = \mu_0

3. Draw a single random sample from this population and compute:

   * the sample mean :math:`\bar{x}`,
   * the sample standard deviation :math:`s`,
   * the observed t-statistic :math:`t_\text{obs}`.

4. Construct a world where :math:`H_0` is exactly true by **recentering** the
   population so its mean is :math:`\mu_0`. The shape and spread stay the
   same; only the center changes.

5. Draw many random samples from this recentered population. For **each**
   simulated sample, compute its own mean, its own standard deviation, and
   its own t-statistic :math:`t_\text{sim}`.

6. Approximate the two-sided p-value as:

   .. math::

      \hat{p} =
      \frac{\text{number of simulations with }
      |t_\text{sim}| \ge |t_\text{obs}|}
      {\text{number of simulations}}

   This is a simulation-based analogue of the theoretical p-value. By checking
   how extreme our *t-statistic* is compared to the distribution of *simulated
   t-statistics*, we correctly account for the uncertainty in estimating the
   standard deviation.
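
The six steps can be sketched with the Python standard library alone. This is
a rough illustration under stated assumptions rather than the code in the lab
script: the helper ``t_stat``, the normal population with mean 22 and SD 10,
the seed, and the number of simulations are all choices made for this example.

.. code-block:: python

   import math
   import random
   import statistics


   def t_stat(sample, mu0):
       """One-sample t-statistic for H0: mu = mu0."""
       se = statistics.stdev(sample) / math.sqrt(len(sample))
       return (statistics.mean(sample) - mu0) / se


   rng = random.Random(42)                 # fixed seed for repeatability
   mu0, n, n_sims, alpha = 20.0, 25, 2000, 0.05

   # Step 1: synthetic population (the normal shape is an assumption here).
   population = [rng.gauss(22.0, 10.0) for _ in range(50_000)]

   # Step 3: one observed sample and its t-statistic.
   t_obs = t_stat(rng.sample(population, n), mu0)

   # Step 4: recenter so the population mean equals mu0.
   shift = mu0 - statistics.fmean(population)
   null_population = [x + shift for x in population]

   # Step 5: many samples from the null world, one t-statistic each.
   t_sims = [t_stat(rng.sample(null_population, n), mu0)
             for _ in range(n_sims)]

   # Step 6: share of simulated t-values at least as extreme as t_obs.
   p_hat = sum(abs(t) >= abs(t_obs) for t in t_sims) / n_sims
   decision = "reject H0" if p_hat < alpha else "fail to reject H0"
   print(f"t_obs = {t_obs:.2f}, p_hat = {p_hat:.3f}, decision: {decision}")

Note that the observed sample comes from the original population, while the
simulated samples come from the recentered one. Only the second group lives
in a world where :math:`H_0` is true.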

PyStatsV1 Lab: A One-Sample Test on Stress Scores
-------------------------------------------------

In this lab, you will:

1. Generate a large synthetic population of stress scores.
2. State a null hypothesis about the population mean.
3. Draw one random sample of size :math:`n` and compute the observed
   :math:`t`-statistic.
4. Use simulation to generate a **null distribution of t-values**.
5. Approximate the two-sided p-value by locating your observed :math:`t` in
   that distribution.
6. Make a decision about whether to reject the null hypothesis at
   :math:`\alpha = 0.05`.

All code for this lab lives in:

* ``scripts/psych_ch8_one_sample_test.py``

and it will write outputs to:

* ``data/synthetic/psych_ch8_population_stress.csv`` (population),
* ``data/synthetic/psych_ch8_null_t_values.csv`` (simulated t-values),
* optionally ``outputs/track_b/ch08_null_t_distribution.png`` (plot).

Running the Lab Script
~~~~~~~~~~~~~~~~~~~~~~

From the project root, run:

.. code-block:: bash

   python -m scripts.psych_ch8_one_sample_test

If your Makefile defines a convenience target, you can instead run:

.. code-block:: bash

   make psych-ch08

This will:

* Generate a synthetic ``stress_score`` population.
* Specify a null value :math:`\mu_0` (for example, 20).
* Draw a sample of size :math:`n` (for example, 25).
* Compute the observed t-statistic for :math:`H_0 : \mu = \mu_0`.
* Simulate a null distribution by recentering the population, resampling, and
  computing :math:`t` for every resample.
* Estimate a two-sided p-value as the proportion of simulated t-values at
  least as extreme as the observed one.
* Print a verbal conclusion (reject vs fail to reject at
  :math:`\alpha = 0.05`).
* Optionally, save a plot of the simulated t-distribution with the observed
  t-statistic marked.

Expected Console Output
~~~~~~~~~~~~~~~~~~~~~~~

Your exact numbers will vary, but the output will look similar to:

::

   Generated population with 50000 individuals
   Population mean stress_score = 19.98
   Population SD   stress_score = 9.95

   Null hypothesis: mu = 20.00
   Observed sample size n = 25
   Observed sample mean   = 22.13
   Observed sample SD     = 10.31
   t statistic            = 1.03

   Using 4000 simulations under H0...
   Approximate two-sided p-value = 0.31
   Decision at alpha = 0.05: fail to reject H0

Interpreting the Output
~~~~~~~~~~~~~~~~~~~~~~~

Focus on the following pieces:

* The **observed t-statistic**: how many standard errors the sample mean is
  from the null value.
* The **p-value**: the probability of obtaining a t-statistic this extreme
  (or more) if the null hypothesis were true.
* The **decision**: the binary result based on your alpha threshold.
  Remember that "fail to reject" is not the same as "prove the null true."
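
As a quick check, the t-statistic in the sample console output can be
recomputed by hand from the printed summary values. The short sketch below
does only that arithmetic; the variable names are chosen for this example.

.. code-block:: python

   import math

   # Values copied from the sample console output above.
   mu0 = 20.00
   n = 25
   x_bar = 22.13
   s = 10.31

   standard_error = s / math.sqrt(n)
   t = (x_bar - mu0) / standard_error
   print(f"SE = {standard_error:.3f}, t = {t:.2f}")

The sample mean sits about 2.13 points above the null value, and one standard
error is about 2.06 points, so the sample mean is roughly one standard error
from :math:`\mu_0`, in line with the printed ``t statistic = 1.03``.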

Your Turn: Practice with Different Scenarios
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Change the null hypothesis**

   Modify the null value :math:`\mu_0` in the script. How does this change
   the t-statistic and the resulting p-value?

2. **Change the sample size**

   Increase the sample size :math:`n` (for example, from 25 to 100) and
   notice how the t-statistic changes. Because the standard error
   :math:`s / \sqrt{n}` shrinks as :math:`n` grows, the same departure from
   :math:`\mu_0` produces a larger :math:`|t|`, so the test becomes more
   sensitive to small departures.

3. **Replicate the experiment**

   Run the script multiple times. Do you always reach the same decision?
   If the true mean is close to :math:`\mu_0`, you may see the decision flip
   back and forth. This is the nature of sampling variability.
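
For the sample-size exercise, it can help to see the arithmetic in isolation
first. The sketch below holds a hypothetical sample mean (22) and sample SD
(10) fixed and changes only :math:`n`. In a real rerun of the script both
summaries would also vary from sample to sample.

.. code-block:: python

   import math

   mu0 = 20.0          # null value
   x_bar = 22.0        # hypothetical sample mean, held fixed
   s = 10.0            # hypothetical sample SD, held fixed

   t_by_n = {}
   for n in (25, 100, 400):
       standard_error = s / math.sqrt(n)
       t_by_n[n] = (x_bar - mu0) / standard_error
       print(f"n = {n:3d}  SE = {standard_error:.2f}  t = {t_by_n[n]:.2f}")

Quadrupling :math:`n` halves the standard error and doubles :math:`t`, so the
same 2-point difference gives :math:`t = 1`, :math:`2`, and :math:`4` at
:math:`n = 25`, :math:`100`, and :math:`400`.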

Optional Plot: Null t-Distribution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If enabled in the script, a plot file is saved to:

``outputs/track_b/ch08_null_t_distribution.png``

The figure shows:

* a histogram of simulated **t-statistics** under :math:`H_0`, and
* a vertical line marking your **observed t-statistic**.

Questions to consider:

* Does the histogram look roughly bell-shaped and centered at 0?
* Is your observed line in the main bulk of the distribution (a common
  result) or out in the thin tails (a rare result)?

Summary
-------

In this chapter you learned how to:

* frame a research question as a **null hypothesis** about a mean,
* compute and interpret the **one-sample t-statistic**,
* approximate a **p-value** using a simulated null distribution of
  t-statistics, and
* make a decision to reject or fail to reject :math:`H_0`.

These ideas form the backbone of classical inference and set you up for the
next steps:

* confidence intervals for a mean,
* comparisons of two means (independent and paired-samples t-tests), and
* more complex models such as ANOVA and regression.