Psychological Science & Statistics – Chapter 10
===============================================

The Independent-Samples t-Test
------------------------------

In Chapter 8, you learned the **logic of hypothesis testing** using a one-sample t-test and a simulated null distribution of t-statistics. In Chapter 9, you computed an **analytic one-sample t-test and confidence interval** for a single mean using the theoretical :math:`t` distribution.

In this chapter, we extend those ideas to **comparing two independent groups**. This is the standard "between-subjects" design in experimental psychology: participants are randomly assigned to one of two conditions, and we compare the group means. Typical examples include:

* Control vs. Treatment
* Placebo vs. Drug
* No-training vs. Training

Our running example will again use a **stress_score** variable.

When to Use the Independent-Samples t-Test
------------------------------------------

Use an independent-samples t-test when:

* You have **two groups** that are **independent** of each other (no person appears in both groups).
* Your dependent variable (DV) is **approximately continuous** and **approximately Normal** within each group.
* You are interested in whether the **population means differ**:

  .. math::

     H_0: \mu_1 = \mu_2 \\
     H_1: \mu_1 \ne \mu_2

Here, :math:`\mu_1` is the population mean for group 1 and :math:`\mu_2` is the population mean for group 2.

The Logic of the Independent-Samples t-Test
-------------------------------------------

The basic logic mirrors the one-sample case:

1. **State the hypotheses**

   .. math::

      H_0: \mu_1 = \mu_2 \\
      H_1: \mu_1 \ne \mu_2

2. **Compute the observed difference in sample means**

   .. math::

      \bar{x}_1 - \bar{x}_2

3. **Estimate the standard error of the difference (assuming equal variances)**

   When we assume the two populations have **equal variances**, we first compute the **pooled variance** :math:`s_p^2`. Its square root is the **pooled standard deviation** :math:`s_p`:

   .. math::

      s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}

   where

   * :math:`n_1, n_2` are the group sample sizes
   * :math:`s_1, s_2` are the sample standard deviations
   * :math:`s_p = \sqrt{s_p^2}` is the pooled standard deviation

   The **standard error of the difference in means** is then

   .. math::

      \mathrm{SE}_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.

   .. admonition:: Modern Best Practice: Welch’s t-test vs. Pooled t-test

      The pooled-variance t-test above is the **classical Student’s t-test** taught in most introductory textbooks. It assumes that the two populations have **equal variances**.

      In real research, however, population variances are often **not** equal. The pooled test becomes unreliable when unequal variances are combined with unequal group sizes. In particular, its Type I error rate is inflated (too many false positives) when the smaller group has the larger variance.

      Many statistical packages therefore default to **Welch’s t-test** (for example, R's ``t.test``). Welch’s test does **not** assume equal variances and adjusts the degrees of freedom using the Welch–Satterthwaite equation. SciPy's ``scipy.stats.ttest_ind`` behaves differently: it runs the pooled test unless you pass ``equal_var=False``.

      .. list-table:: Classical Pooled t-test vs. Welch’s t-test
         :header-rows: 1

         * - Method
           - Variance assumption
           - Degrees of freedom
           - Typical use
         * - Pooled t-test
           - Assumes :math:`\sigma_1^2 = \sigma_2^2`
           - :math:`n_1 + n_2 - 2`
           - Teaching; balanced designs with similar spreads
         * - Welch’s t-test
           - Does **not** assume equal variances
           - Approximate df (Welch–Satterthwaite)
           - Modern default; safer when variances or :math:`n` differ

      In this chapter we focus on the **pooled** test to explain the logic of comparing two means. In applied work, however, it is good practice to **report Welch’s t-test as well**, especially when the variances or sample sizes are noticeably different. The PyStatsV1 Chapter 10 script prints **both** pooled and Welch results so you can see how they compare on the same data.

4. **Compute the t-statistic (pooled version)**

   .. math::

      t_{\text{pooled}} = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}_{\bar{x}_1 - \bar{x}_2}}

   with degrees of freedom

   .. math::

      \mathrm{df}_{\text{pooled}} = n_1 + n_2 - 2.

5. **Find the p-value and make a decision (pooled version)**

   Under :math:`H_0`, the statistic :math:`t_{\text{pooled}}` follows a :math:`t` distribution with :math:`\mathrm{df}_{\text{pooled}}` degrees of freedom. For a two-sided test, the p-value is

   .. math::

      p_{\text{pooled}} = 2 \cdot P\bigl(T_{\mathrm{df}_{\text{pooled}}} \ge |t_{\text{obs}}|\bigr),

   where :math:`t_{\text{obs}}` is the value of :math:`t_{\text{pooled}}` computed from your sample. :math:`T_{\mathrm{df}_{\text{pooled}}}` is a random variable with that :math:`t` distribution.

   If :math:`p_{\text{pooled}} < \alpha` (typically :math:`\alpha = 0.05`), we **reject** :math:`H_0` and conclude that the population means differ.

Effect Size: Cohen's d
----------------------

Statistical significance does not tell us **how large** the effect is. For independent groups with a pooled standard deviation, a common effect size is **Cohen's** :math:`d`:

.. math::

   d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}.

Rough guidelines (Cohen, 1988):

* :math:`d \approx 0.2` – small effect
* :math:`d \approx 0.5` – medium effect
* :math:`d \approx 0.8` – large effect

Confidence Interval for the Mean Difference (Pooled Version)
------------------------------------------------------------

We can also construct a **confidence interval** for the difference in population means:

.. math::

   (\bar{x}_1 - \bar{x}_2) \pm t_{\mathrm{crit}} \cdot \mathrm{SE}_{\bar{x}_1 - \bar{x}_2},

where :math:`t_{\mathrm{crit}}` is the critical value that cuts off the upper :math:`\alpha/2` of the :math:`t` distribution with :math:`\mathrm{df}_{\text{pooled}} = n_1 + n_2 - 2`. For a 95% CI, :math:`\alpha = 0.05`, so :math:`t_{\mathrm{crit}}` is the 0.975 quantile.

PyStatsV1 Lab: Independent-Samples t-Test on Stress Scores
----------------------------------------------------------

In this lab you will:

1. Generate a synthetic dataset of **stress scores** for two independent groups:

   * **control**
   * **treatment**

2. Compute sample means, standard deviations, and group sizes.
3. Compute the **pooled standard deviation** and **standard error**.
4. Compute the **independent-samples t-statistic (pooled version)** and its **two-sided p-value**.
5. Compute **Cohen's** :math:`d` as an effect size.
6. Construct a **95% confidence interval** for the difference in means.
7. Compute **Welch’s t-test** as a modern, variance-robust comparison.
8. Optionally, visualize the group means with error bars.

All code for this lab lives in:

* ``scripts/psych_ch10_independent_t.py``

The script can also write outputs to:

* ``data/synthetic/psych_ch10_independent_groups.csv``
* ``outputs/track_b/ch10_group_means_with_ci.png``

Running the Lab Script
~~~~~~~~~~~~~~~~~~~~~~

From the project root, run:

.. code-block:: bash

   python -m scripts.psych_ch10_independent_t

If your Makefile defines a convenience target, you can instead run:

.. code-block:: bash

   make psych-ch10

This will:

* Generate a synthetic dataset with two groups (e.g., 25 participants per group).
* Compute the independent-samples t-test comparing **control** vs. **treatment** using the **pooled** version.
* Compute **Welch’s t-test** on the same data as a **safety check**.
* Compute Cohen's :math:`d` and a 95% confidence interval for the mean difference.
* Print a short APA-style summary line.
* Optionally, save a bar plot of the group means with error bars.
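The computations listed above can be sketched in a few lines of Python. This is a minimal sketch, not the code in ``scripts/psych_ch10_independent_t.py``. It assumes NumPy and SciPy are installed. The helper name ``independent_t_summary`` is hypothetical, and so are the population means, SDs, and random seed. The sketch applies the chapter's formulas directly and then cross-checks them against ``scipy.stats.ttest_ind``.

.. code-block:: python

   import numpy as np
   from scipy import stats


   def independent_t_summary(x1, x2, alpha=0.05):
       """Hypothetical helper: pooled and Welch t-tests using the chapter's formulas."""
       x1 = np.asarray(x1, dtype=float)
       x2 = np.asarray(x2, dtype=float)
       n1, n2 = x1.size, x2.size
       v1, v2 = x1.var(ddof=1), x2.var(ddof=1)  # sample variances (n - 1 denominator)
       diff = x1.mean() - x2.mean()

       # Pooled (Student) version
       df_pooled = n1 + n2 - 2
       sp = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled)  # pooled SD
       se = sp * np.sqrt(1 / n1 + 1 / n2)
       t_pooled = diff / se
       p_pooled = 2 * stats.t.sf(abs(t_pooled), df_pooled)  # two-sided p-value
       t_crit = stats.t.ppf(1 - alpha / 2, df_pooled)       # 0.975 quantile when alpha = 0.05
       ci = (diff - t_crit * se, diff + t_crit * se)
       d = diff / sp                                         # Cohen's d with the pooled SD

       # Welch version: separate variances, Welch-Satterthwaite df
       a1, a2 = v1 / n1, v2 / n2
       t_welch = diff / np.sqrt(a1 + a2)
       df_welch = (a1 + a2) ** 2 / (a1 ** 2 / (n1 - 1) + a2 ** 2 / (n2 - 1))
       p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

       return {"diff": diff, "sp": sp, "se": se, "df_pooled": df_pooled,
               "t_pooled": t_pooled, "p_pooled": p_pooled, "ci": ci, "d": d,
               "df_welch": df_welch, "t_welch": t_welch, "p_welch": p_welch}


   # Hypothetical synthetic data (means, SDs and seed are illustrative assumptions).
   rng = np.random.default_rng(10)
   control = rng.normal(loc=18, scale=9, size=25)
   treatment = rng.normal(loc=16, scale=9, size=25)

   res = independent_t_summary(control, treatment)
   print(f"pooled: t({res['df_pooled']}) = {res['t_pooled']:.2f}, "
         f"p = {res['p_pooled']:.3f}, d = {res['d']:.2f}")
   print(f"95% CI for mean difference: [{res['ci'][0]:.2f}, {res['ci'][1]:.2f}]")
   print(f"Welch:  t({res['df_welch']:.1f}) = {res['t_welch']:.2f}, p = {res['p_welch']:.3f}")

   # Cross-check against SciPy (ttest_ind uses the pooled test unless equal_var=False).
   student = stats.ttest_ind(control, treatment)
   welch = stats.ttest_ind(control, treatment, equal_var=False)
   print(np.isclose(res["t_pooled"], student.statistic), np.isclose(res["p_welch"], welch.pvalue))

Both simulated groups have :math:`n = 25`, so the pooled and Welch t-statistics agree exactly here. Only ``df_welch`` and ``p_welch`` differ from their pooled counterparts.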
Expected Console Output
~~~~~~~~~~~~~~~~~~~~~~~

Your exact numbers will vary, but the output will look similar to:

::

   Generated independent groups with n = 25 per condition
   Group: control mean = 18.48 SD = 8.74 n = 25
   Group: treatment mean = 16.82 SD = 9.87 n = 25

   --- Pooled-variance independent-samples t-test (classic Student's t) ---
   Mean difference (control - treatment) = 1.66
   Pooled SD = 9.32
   SE of difference = 2.64
   df (pooled) = 48
   t (pooled) = 0.63
   Two-sided p-value (pooled) = 0.53
   95% CI (pooled) for mean difference: [-3.65, 6.97]
   Cohen's d (pooled) = 0.18

   --- Welch's t-test (modern default, equal_var = False) ---
   df (Welch) ≈ 47.3
   t (Welch) = 0.63
   Two-sided p-value (Welch) = 0.53

   Wrote data to: data/synthetic/psych_ch10_independent_groups.csv
   Wrote plot to: outputs/track_b/ch10_group_means_with_ci.png

Interpreting the Output
~~~~~~~~~~~~~~~~~~~~~~~

Focus on the following pieces:

* **Mean difference**: How far apart are the sample means?
* **t statistic and p-value (pooled vs. Welch)**: Do the two methods agree about whether the difference is statistically significant? When the group sizes are equal, as in this example, the pooled and Welch t-statistics are identical. Only the degrees of freedom, and therefore the p-values, can differ.
* **Confidence interval**: Does the 95% CI for :math:`\mu_1 - \mu_2` include zero? If it does, the two-sided pooled test at :math:`\alpha = 0.05` does not reject :math:`H_0`.
* **Cohen's** :math:`d`: How large is the effect in standardized units?

Your Turn: Practice Scenarios
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Change the group means**

   In ``psych_ch10_independent_t.py``, try changing the assumed population means for the two groups. How does this affect the mean difference, :math:`t`, and Cohen's :math:`d`?

2. **Change the sample size**

   Increase :math:`n` per group (e.g., from 25 to 100). Notice that the standard error shrinks: quadrupling :math:`n` roughly halves it. As a result, the test becomes more sensitive to small differences.

3. **Make the variances very different**

   Use very different standard deviations for the two groups. For a stronger contrast, also use unequal group sizes. Compare the pooled and Welch results. How do the degrees of freedom and p-values differ?

4. **Practice APA-style reporting**

   Using the script output, practice writing a short APA-style sentence. For example:

   "Participants in the treatment condition did not differ significantly from those in the control condition on stress scores, :math:`t(48) = 0.63`, :math:`p = .53`, :math:`d = 0.18` (pooled)."
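For Practice Scenario 3, a small simulation makes the warning in the Welch admonition concrete. The sketch uses assumed, hypothetical settings:

* group sizes of 10 and 40
* standard deviations of 20 and 5
* equal population means
* seed 2024

:math:`H_0` is true in every simulated study, so the proportion of p-values below :math:`\alpha` estimates each test's Type I error rate. The sketch assumes NumPy and SciPy are installed.

.. code-block:: python

   import numpy as np
   from scipy import stats

   # Hypothetical settings: H0 is true (equal population means), but the
   # smaller group has the much larger standard deviation.
   rng = np.random.default_rng(2024)
   n_sims, alpha = 5000, 0.05
   n_small, sd_small = 10, 20.0
   n_large, sd_large = 40, 5.0

   # Each row is one simulated study.
   small = rng.normal(loc=15, scale=sd_small, size=(n_sims, n_small))
   large = rng.normal(loc=15, scale=sd_large, size=(n_sims, n_large))

   # Run every study at once by testing along axis=1 (one p-value per row).
   p_pooled = stats.ttest_ind(small, large, axis=1).pvalue
   p_welch = stats.ttest_ind(small, large, axis=1, equal_var=False).pvalue

   # Every rejection is a false positive because the population means are equal.
   rate_pooled = np.mean(p_pooled < alpha)
   rate_welch = np.mean(p_welch < alpha)
   print(f"Type I error rate, pooled: {rate_pooled:.3f}")
   print(f"Type I error rate, Welch:  {rate_welch:.3f}")

With these settings the pooled test should reject far more often than the nominal 5%, while Welch's rate should stay close to 5%. Try swapping the standard deviations so that the larger group has the larger variance. The pooled test then tends to become conservative instead.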