Psychological Science & Statistics – Chapter 10

The Independent-Samples t-Test

In Chapter 8, you learned the logic of hypothesis testing using a one-sample t-test and a simulated null distribution of t-statistics.

In Chapter 9, you computed an analytic one-sample t-test and confidence interval for a single mean using the theoretical \(t\) distribution.

In this chapter, we extend those ideas to comparing two independent groups. This is the standard “between-subjects” design in experimental psychology: participants are randomly assigned to one of two conditions, and we compare the means.

Typical examples include:

  • Control vs. Treatment

  • Placebo vs. Drug

  • No-training vs. Training

Our running example will again use a stress_score variable.

When to Use the Independent-Samples t-Test

Use an independent-samples t-test when:

  • You have two groups that are independent of each other. (No person appears in both groups.)

  • Your dependent variable (DV) is approximately continuous and approximately Normal within each group.

  • You are interested in whether the population means differ:

    \[\begin{split}H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \ne \mu_2\end{split}\]

Here, \(\mu_1\) is the population mean for group 1 and \(\mu_2\) is the population mean for group 2.

The Logic of the Independent-Samples t-Test

The basic logic mirrors the one-sample case:

  1. State the hypotheses

    \[\begin{split}H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \ne \mu_2\end{split}\]
  2. Compute the observed difference in sample means

    \[\bar{x}_1 - \bar{x}_2\]
  3. Estimate the standard error of the difference (assuming equal variances)

    When we assume the two populations have equal variances, we first compute a pooled variance (the square of the pooled standard deviation \(s_p\)):

    \[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]

    where

    • \(n_1, n_2\) are the group sample sizes

    • \(s_1, s_2\) are the sample standard deviations

    Then the standard error of the difference in means is

    \[\mathrm{SE}_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.\]
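As a quick numerical check of the two formulas above, the following minimal Python sketch computes the pooled standard deviation and the standard error from summary statistics. The function name `pooled_se` and the example values (SDs of 8.74 and 9.87 with 25 participants per group) are illustrative assumptions, not part of the PyStatsV1 script.

```python
import math


def pooled_se(s1, n1, s2, n2):
    """Return (pooled SD, SE of the mean difference) under equal variances."""
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    s_p = math.sqrt(pooled_var)
    # Standard error of the difference between the two sample means.
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    return s_p, se


# Illustrative summary statistics (assumed values, not real data).
s_p, se = pooled_se(s1=8.74, n1=25, s2=9.87, n2=25)
print(f"pooled SD = {s_p:.2f}, SE = {se:.2f}")
```

With equal group sizes the pooled variance is simply the average of the two sample variances, which is an easy way to sanity-check the result by hand.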

Modern Best Practice: Welch’s t-test vs. Pooled t-test

The pooled-variance t-test above is the classical Student’s t-test taught in most introductory textbooks. It assumes that the two populations have equal variances.

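In real research, however, variances are often unequal. The problem is most serious when unequal variances are combined with unequal group sizes: the pooled test’s Type I error rate can then drift away from the nominal \(\alpha\). It is inflated (too many false positives) when the smaller group has the larger variance, and overly conservative when the larger group has the larger variance.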

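Many statisticians therefore recommend Welch’s t-test as the default, and some software (for example, R’s t.test) uses it unless told otherwise. Welch’s test does not assume equal variances and adjusts the degrees of freedom using the Welch–Satterthwaite equation. Other tools, including SciPy’s ttest_ind, still default to the pooled test and run Welch’s version only when equal_var=False is passed.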
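For reference, the standard textbook form of Welch’s statistic and of the Welch–Satterthwaite degrees of freedom, written in this chapter’s notation (\(s_1, s_2\) are the sample standard deviations and \(n_1, n_2\) the group sizes), is:

\[t_{\text{Welch}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \mathrm{df}_{\text{Welch}} \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.\]

The only differences from the pooled test are the standard error, which uses each group’s own variance instead of \(s_p^2\), and the degrees of freedom, which are generally not a whole number.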

Classical Pooled t-test vs. Welch’s t-test

| Method | Variance assumption | Degrees of freedom | Typical use |
|---|---|---|---|
| Pooled t-test | Assumes \(\sigma_1^2 = \sigma_2^2\) | \(n_1 + n_2 - 2\) | Teaching; balanced designs with similar spreads |
| Welch’s t-test | Does not assume equal variances | Approximate df (Welch–Satterthwaite) | Modern default; safer when variances or \(n\) differ |

In this chapter we focus on the pooled test to explain the logic of comparing two means. In applied work, however, it is good practice to report Welch’s t-test as well, especially when the variances or sample sizes are noticeably different.

The PyStatsV1 Chapter 10 script prints both pooled and Welch results so you can see how they compare on the same data.
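To make the contrast concrete, here is a small sketch of Welch’s t statistic and the Welch–Satterthwaite degrees of freedom computed from summary statistics. It uses only the standard library; the function name `welch_from_summary` and the unbalanced example values are hypothetical and are not taken from the PyStatsV1 script.

```python
import math


def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Return (t, df) for Welch's t-test computed from summary statistics."""
    v1 = s1**2 / n1  # squared standard error of the first mean
    v2 = s2**2 / n2  # squared standard error of the second mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical unbalanced design: the smaller group has the larger spread.
t_welch, df_welch = welch_from_summary(m1=20.0, s1=6.0, n1=30,
                                       m2=17.0, s2=12.0, n2=12)
df_pooled = 30 + 12 - 2
print(f"Welch: t = {t_welch:.2f}, df = {df_welch:.1f} (pooled df would be {df_pooled})")
```

In this made-up example the Welch degrees of freedom fall far below the pooled value of 40, which is how the test guards against overconfidence when the noisier group is also the smaller one. When the two groups have the same size, the Welch and pooled t statistics are algebraically identical and only the degrees of freedom differ.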

  4. Compute the t-statistic (pooled version)

    \[t_{\text{pooled}} = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}_{\bar{x}_1 - \bar{x}_2}}\]

    with degrees of freedom

    \[\mathrm{df}_{\text{pooled}} = n_1 + n_2 - 2.\]
  5. Find the p-value and make a decision (pooled version)

    Under \(H_0\), the statistic \(t_{\text{pooled}}\) follows a \(t\) distribution with \(\mathrm{df}_{\text{pooled}}\) degrees of freedom. For a two-sided test, the p-value is

    \[p_{\text{pooled}} = 2 \cdot P\bigl(T_{\mathrm{df}_{\text{pooled}}} \ge |t_{\text{pooled}}|\bigr).\]

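    If \(p_{\text{pooled}} < \alpha\) (typically \(0.05\)), we reject \(H_0\) and conclude that the population means differ. Otherwise we fail to reject \(H_0\); this does not show that the means are equal, only that the data do not provide enough evidence of a difference.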
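The last two steps can be sketched in a few lines of Python. In practice the tail probability would normally come from a statistics library (for example `scipy.stats.t.sf`); the version below instead integrates the \(t\) density numerically with Simpson’s rule so that it runs with the standard library alone. The helper names `t_pdf` and `two_sided_p`, and the input values (mean difference 1.66, SE 2.64, df 48), are assumptions chosen for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_obs, df, steps=2000):
    """Two-sided p-value, 2 * P(T >= |t_obs|), using Simpson's rule."""
    a = abs(t_obs)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t_obs|)
    return max(0.0, 1.0 - 2.0 * central_half)


# Illustrative inputs (assumed values).
mean_diff, se, df = 1.66, 2.64, 48
t_pooled = mean_diff / se
p_pooled = two_sided_p(t_pooled, df)
alpha = 0.05
decision = "reject H0" if p_pooled < alpha else "fail to reject H0"
print(f"t({df}) = {t_pooled:.2f}, p = {p_pooled:.2f} -> {decision}")
```

Because the \(t\) distribution is symmetric around zero, the two-sided p-value equals one minus twice the area between 0 and \(|t_{\text{pooled}}|\), which is what the function computes.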

Effect Size: Cohen’s d

Statistical significance does not tell us how large the effect is. For independent groups with a pooled standard deviation, a common effect size is Cohen’s \(d\):

\[d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}.\]

Rough guidelines (Cohen, 1988):

  • \(d \approx 0.2\) – small effect

  • \(d \approx 0.5\) – medium effect

  • \(d \approx 0.8\) – large effect
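A minimal sketch of this calculation from raw scores, using only the standard library, might look as follows. The function name `cohens_d` and the two tiny score lists are made up for illustration; they are not output from the lab script.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


# Made-up stress scores for illustration only.
control = [22, 18, 25, 15, 20, 24]
treatment = [17, 14, 21, 12, 16, 19]
d = cohens_d(control, treatment)
print(f"Cohen's d = {d:.2f}")
```

The sign of \(d\) follows the order of subtraction: swapping the two groups flips the sign but leaves the magnitude unchanged.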

Confidence Interval for the Mean Difference (Pooled Version)

We can also construct a confidence interval for the difference in means:

\[(\bar{x}_1 - \bar{x}_2) \pm t_{\mathrm{crit}} \cdot \mathrm{SE}_{\bar{x}_1 - \bar{x}_2},\]

where \(t_{\mathrm{crit}}\) is the critical value of the \(t\) distribution with \(\mathrm{df}_{\text{pooled}} = n_1 + n_2 - 2\) degrees of freedom that leaves \(\alpha/2\) in each tail. For a 95% CI, \(\alpha = 0.05\), so \(t_{\mathrm{crit}}\) is the 97.5th percentile.
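The interval can be computed directly from summary statistics once the critical value is known. In the sketch below, `pooled_ci` is a hypothetical helper and the critical value 2.0106 is the two-sided 95% value for 48 degrees of freedom as read from a \(t\) table; the means, SDs, and group sizes are illustrative assumptions.

```python
import math


def pooled_ci(m1, s1, n1, m2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled standard error."""
    df = n1 + n2 - 2
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    diff = m1 - m2
    margin = t_crit * se  # half-width of the interval
    return diff - margin, diff + margin


# t_crit = 2.0106 is the table value for a 95% CI with df = 48 (assumed input).
lower, upper = pooled_ci(m1=18.48, s1=8.74, n1=25,
                         m2=16.82, s2=9.87, n2=25, t_crit=2.0106)
print(f"95% CI for the mean difference: [{lower:.2f}, {upper:.2f}]")
```

An interval that contains zero corresponds to a two-sided test that fails to reject \(H_0\) at the same \(\alpha\), so the CI and the p-value always lead to the same decision.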

PyStatsV1 Lab: Independent-Samples t-Test on Stress Scores

In this lab you will:

  1. Generate a synthetic dataset of stress scores for two independent groups:

    • control

    • treatment

  2. Compute sample means, standard deviations, and group sizes.

  3. Compute the pooled standard deviation and standard error.

  4. Compute the independent-samples t-statistic (pooled version) and its two-sided p-value.

  5. Compute Cohen’s \(d\) as an effect size.

  6. Construct a 95% confidence interval for the difference in means.

  7. Compute Welch’s t-test as a modern, variance-robust comparison.

  8. Optionally, visualize the group means with error bars.

All code for this lab lives in:

  • scripts/psych_ch10_independent_t.py

and the script will optionally write outputs to:

  • data/synthetic/psych_ch10_independent_groups.csv

  • outputs/track_b/ch10_group_means_with_ci.png

Running the Lab Script

From the project root, run:

python -m scripts.psych_ch10_independent_t

If your Makefile defines a convenience target, you can instead run:

make psych-ch10

This will:

  • Generate a synthetic dataset with two groups (e.g., 25 participants per group).

  • Compute the independent-samples t-test comparing control vs. treatment using the pooled version.

  • Compute Welch’s t-test on the same data as a safety check.

  • Compute Cohen’s \(d\) and a 95% confidence interval for the mean difference.

  • Print a short APA-style summary line.

  • Optionally, save a bar plot of the group means with error bars.
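To give a feel for the first of these steps, here is a minimal standard-library sketch of how two independent groups might be simulated and summarized. It is not the PyStatsV1 script: the function name `simulate_groups`, the population means (20 and 17), the common SD (9), and the seed are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, stdev


def simulate_groups(n_per_group=25, mu_control=20.0, mu_treatment=17.0,
                    sigma=9.0, seed=123):
    """Draw Normal stress scores for two independent groups (assumed parameters)."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    control = [rng.gauss(mu_control, sigma) for _ in range(n_per_group)]
    treatment = [rng.gauss(mu_treatment, sigma) for _ in range(n_per_group)]
    return {"control": control, "treatment": treatment}


groups = simulate_groups()
for name, scores in groups.items():
    print(f"Group: {name:<10} mean = {mean(scores):5.2f}  "
          f"SD = {stdev(scores):5.2f}  n = {len(scores)}")
```

Fixing the seed makes the simulated data reproducible, which is useful when comparing results with classmates; changing it shows how much the sample means move from one random sample to the next.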

Expected Console Output

Your exact numbers will vary, but the output will look similar to:

Generated independent groups with n = 25 per condition
Group: control    mean = 18.48  SD =  8.74  n = 25
Group: treatment  mean = 16.82  SD =  9.87  n = 25

--- Pooled-variance independent-samples t-test (classic Student's t) ---
Mean difference (control - treatment) = 1.66
Pooled SD = 9.32
SE of difference = 2.64
df (pooled) = 48
t (pooled) = 0.63
Two-sided p-value (pooled) = 0.53
95% CI (pooled) for mean difference: [-3.65, 6.97]
Cohen's d (pooled) = 0.18

--- Welch's t-test (modern default, equal_var = False) ---
df (Welch) ≈ 47.3
t (Welch) = 0.63
Two-sided p-value (Welch) = 0.53

Wrote data to: data/synthetic/psych_ch10_independent_groups.csv
Wrote plot to: outputs/track_b/ch10_group_means_with_ci.png

Interpreting the Output

Focus on the following pieces:

  • Mean difference: How far apart are the sample means?

  • t statistic and p-value (pooled vs. Welch): Do the methods agree about whether the difference is statistically significant?

  • Confidence interval: Does the 95% CI for \(\mu_1 - \mu_2\) include zero?

  • Cohen’s \(d\): How large is the effect in standardized units?

Your Turn: Practice Scenarios

  1. Change the group means

    In psych_ch10_independent_t.py, try changing the assumed population means for the two groups. How does this affect the mean difference, t, and Cohen’s \(d\)?

  2. Change the sample size

    Increase \(n\) per group (e.g., from 25 to 100). Notice how the standard error shrinks and the test becomes more sensitive to small differences.

  3. Make the variances very different

    Use very different standard deviations for the two groups. Compare the pooled and Welch results. How do the degrees of freedom and p-values differ?

  4. Practice APA-style reporting

    Using the script output, practice writing a short APA-style sentence, e.g.:

    “Participants in the treatment condition did not differ significantly from those in the control condition on stress scores, \(t(48) = 0.63\), \(p = .53\), \(d = 0.18\) (pooled).”
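If you want to generate the statistics portion of such a sentence from numbers, a small formatting helper is enough. The sketch below is one possible approach; `format_p` and `apa_t_line` are hypothetical names, and the rounding rules (two decimals for t and d, no leading zero for p, "p < .001" for very small p-values) follow common APA conventions.

```python
def format_p(p):
    """Format a p-value APA style: no leading zero, with a floor of '< .001'."""
    if p < 0.001:
        return "p < .001"
    digits = 2 if p >= 0.01 else 3
    return "p = " + f"{p:.{digits}f}".lstrip("0")


def apa_t_line(t, df, p, d):
    """Build the statistics portion of an APA-style t-test report."""
    # Integer df (pooled test) is printed as-is; fractional df (Welch) gets one decimal.
    df_text = str(df) if isinstance(df, int) else f"{df:.1f}"
    return f"t({df_text}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"


print(apa_t_line(t=0.63, df=48, p=0.53, d=0.18))
```

The leading zero is dropped for p because a probability cannot exceed 1, while it is kept for \(d\), which can.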