Intro Stats 4 - Confidence intervals ==================================== This is Part 4 of the **Intro Stats case study pack**. A **confidence interval (CI)** is a range of values that is meant to capture plausible values for a population parameter. Here the parameter is: *the difference in mean score between treatment and control.* In this part you will: 1. learn what a confidence interval is (and what it is *not*), 2. compute two 95% CIs for the mean difference, 3. compare a formula-based method vs a simulation-based method, and 4. practice interpreting results in plain language. Big picture ----------- So far, you have a sample difference in means: * :math:`\bar{x}_{treat} - \bar{x}_{control}` But samples vary. If you repeated the study with new students, the difference would not be identical. A **confidence interval** answers: **“Given the data we observed, what range of mean differences is plausible?”** Run --- From inside your workbook folder: .. code-block:: bash pystatsv1 workbook run intro_stats_04_confidence_intervals Or directly: .. code-block:: bash python scripts/intro_stats_04_confidence_intervals.py What gets created ----------------- Outputs go to: * ``outputs/case_studies/intro_stats/`` You should see: * ``ci_mean_diff_welch_95.csv`` - Welch CI endpoints (formula-based) * ``ci_mean_diff_bootstrap_95.csv`` - bootstrap CI endpoints (simulation-based) Inspect ------- Step 1: open both CSVs ~~~~~~~~~~~~~~~~~~~~~~ Open both CI tables and compare: * Are both intervals mostly above 0? * Are they similar width? * If they differ, which one is wider and why might that be? Step 2: connect to the story ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Remember the research question: * Do students in the **treatment** group score higher than students in the **control** group? Now interpret your CI(s): * If the entire CI is **above 0**, that supports “treatment tends to score higher.” * If the CI includes **0**, the data are consistent with “no difference” (at least at this sample size). Concepts (plain language) ------------------------- What a 95% CI means (the repeated-sampling idea) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A 95% CI is often explained like this: If you repeated the entire study many times and built a 95% CI each time, then about 95% of those intervals would include the true population mean difference. This is a repeated-sampling idea about the *method*, not a probability statement about one interval. What a 95% CI does *not* mean ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ It is **not** correct to say: * “There is a 95% chance the true mean difference is in this interval.” That sentence sounds natural, but it is not the classical interpretation. What you *can* say safely (for this course) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For this Workbook, use plain language that stays accurate: * “A reasonable range of mean differences, given the data, is from A to B.” * “This range is mostly above 0, which supports a positive effect.” * “This range includes 0, so the data do not rule out no difference.” Welch CI vs bootstrap CI ------------------------ These are two different ways to get uncertainty around the mean difference. 1) Welch t-based CI (formula-based) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Uses a classic formula. * Good default for comparing means when variances may differ. * Often taught early in intro stats because it is fast and widely used. Why “Welch” matters: * It does **not** assume equal variances between groups. * That makes it safer in many real datasets. 2) Bootstrap percentile CI (simulation-based) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * Uses resampling (simulation). * Intuition: “What mean differences would we see if we repeatedly resampled from the observed data?” * Great for beginners because you can *see* uncertainty as repetition. A percentile CI: * builds a distribution of simulated mean differences * takes the 2.5th and 97.5th percentiles as the endpoints When both agree, that is reassuring. When they differ, you’ve learned something about sample size, skew, or variability. Worked problems --------------- Worked problem A: interpreting a CI in words ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Suppose a CI table shows: .. code-block:: text mean_diff, ci_low, ci_high 7.2, 2.1, 12.4 A strong plain-language interpretation: * “We estimate the treatment group scored about 7 points higher than control.” * “A reasonable range for the true mean difference is about 2 to 12 points.” * “Because the interval is above 0, the data support higher scores for treatment.” Worked problem B: what if the CI includes 0? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Suppose you see: .. code-block:: text mean_diff, ci_low, ci_high 1.3, -2.5, 5.1 Interpretation: * “The best estimate is about 1.3 points higher for treatment.” * “But values from about -2.5 to 5.1 are plausible.” * “Because 0 is inside the interval, the data are consistent with no difference.” That does *not* mean “no effect.” It means the data are not precise enough to rule out zero. Reproducibility checkpoint -------------------------- Try rerunning the CI script: .. code-block:: bash pystatsv1 workbook run intro_stats_04_confidence_intervals pystatsv1 workbook run intro_stats_04_confidence_intervals You should get the same files. **Note:** If a method uses randomness (bootstrap), it should still be deterministic in this Workbook because the script sets a seed. Using your own data (student workflow) -------------------------------------- To use these scripts on your own dataset, you need: * two groups (control/treatment), and * a numeric outcome (score). Option 1: Replace rows in the dataset (fastest) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This reuses the exact same scripts without editing code. 1) Back up the original dataset: .. code-block:: bash cp data/intro_stats_scores.csv data/intro_stats_scores_backup.csv 2) Edit the CSV in Notepad (or any text editor): .. code-block:: bash notepad data/intro_stats_scores.csv 3) Keep the header exactly: .. code-block:: text id,group,score and paste your rows underneath. 4) Run the CI script: .. code-block:: bash pystatsv1 workbook run intro_stats_04_confidence_intervals 5) Inspect outputs: * ``outputs/case_studies/intro_stats/ci_mean_diff_welch_95.csv`` * ``outputs/case_studies/intro_stats/ci_mean_diff_bootstrap_95.csv`` 6) Restore the original when finished: .. code-block:: bash mv data/intro_stats_scores_backup.csv data/intro_stats_scores.csv Option 2: Use the general “My Own Data” scaffold ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If your columns do not match ``id,group,score``, use: .. code-block:: bash pystatsv1 workbook run my_data_01_explore pystatsv1 workbook check my_data That workflow helps clean types, missingness, and column naming before doing inference. Common pitfalls (quick fixes) ----------------------------- * If your CI is “weirdly huge,” check for outliers (Part 3). * If your CI is “weirdly tight,” confirm units and data-entry. * If the bootstrap CI changes run-to-run, confirm the script sets a random seed. Next ---- Go to :doc:`intro_stats_05_hypothesis_testing` to run a simulation-based hypothesis test and compute an effect size.