Chapter 20 – The Responsible Researcher (Conclusion)

Where this chapter fits in the story

Chapters 1–19 walked you through the tools of quantitative psychology:

From z-scores, \(t\) tests, and ANOVA,
Through correlation, regression, and mixed-model designs,
All the way to ANCOVA and non-parametric alternatives.

Along the way, the PyStatsV1 labs treated each analysis as production software:

Deterministic simulators with fixed random seeds.
Re-usable helper functions and command-line entry points.
Tests that automatically verify key properties of the results.

This final chapter zooms out from individual techniques to the broader question:

What does it mean to do responsible, cumulative, and reproducible research?

We will highlight three pillars:

Power analysis – planning sample sizes before you collect data.
Meta-analysis – combining evidence across studies.
Clear communication – writing honest, transparent summaries of what your data can (and cannot) support.

The chapter closes with a final PyStatsV1 project that guides you from raw data to an APA-style report, using the tools you developed in earlier labs.

20.1 Power analysis: Planning samples before you collect data

Why power matters

Every statistical test juggles four quantities:

Effect size (how big the effect really is).
Sample size \(N\) (how much data you collect).
Significance level \(\alpha\) (your Type I error rate).
Power (the probability of detecting the effect if it is real).

Once you fix any three of these, the fourth is determined. Power analysis is about solving that equation on purpose instead of hoping that “\(N = 30\) per group” will magically be enough.
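To make the "fix three, solve for the fourth" idea concrete, here is a minimal sketch that fixes the effect size, the per-group sample size, and \(\alpha\), then solves for power. It assumes a two-sided, independent-samples comparison and uses a normal approximation instead of the exact \(t\)-based calculation, so the numbers are approximate. The function name approx_power_two_sample is hypothetical and not part of PyStatsV1.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # critical value for the chosen alpha
    noncentrality = d * sqrt(n_per_group / 2)  # expected size of the test statistic
    # Probability of landing beyond either critical value when the effect is real.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


for n in (30, 64, 100):
    power = approx_power_two_sample(0.5, n)
    print(f"d = 0.5, n = {n:>3} per group -> approximate power {power:.2f}")
```

Under this approximation, a medium effect (\(d = 0.5\)) with 30 participants per group gives power of only about 0.49, which is why "30 per group" should not be treated as a default.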

A study with very low power is problematic because:

True effects are often missed (high Type II error).
Effects that are detected tend to be over-estimated (“winner’s curse”).
Resources (time, money, participant goodwill) can be wasted on studies that were never likely to succeed.

A priori vs. post hoc power

A priori power analysis happens before you collect data.
- You specify a meaningful effect size (e.g., \(d = 0.5\) for a medium standardized mean difference).
- Choose \(\alpha\) (often 0.05) and desired power (often 0.80).
- Solve for \(N\) per group.
Post hoc (“observed”) power is calculated after the fact, using the effect size observed in your sample.
- For a given test, this value is a direct function of the observed \(p\) value, so it adds little new information and is rarely useful.
- In PyStatsV1 we emphasize a priori planning instead.

In the Chapter 20 lab script, we use pingouin to compute sample sizes for simple scenarios (e.g., independent-samples \(t\) tests), and we write the results to a small power grid for inspection.
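The a priori calculation and the power grid can be sketched as follows. The lab itself uses pingouin; this sketch deliberately uses only the standard library and a normal approximation, so its sample sizes can be slightly smaller than an exact \(t\)-based answer. The function names, the column names, and the chosen grid values are assumptions for illustration, not the lab's actual layout.

```python
import csv
import io
from math import ceil
from statistics import NormalDist


def approx_n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-group N for a two-sided, independent-samples comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # guards the Type I error rate
    z_power = z.inv_cdf(power)          # guards the Type II error rate
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def build_power_grid(effect_sizes=(0.2, 0.5, 0.8), powers=(0.80, 0.90), alpha=0.05):
    """One row per (effect size, power) combination."""
    return [
        {"d": d, "power": p, "alpha": alpha, "n_per_group": approx_n_per_group(d, p, alpha)}
        for d in effect_sizes
        for p in powers
    ]


grid = build_power_grid()

# Write the grid as CSV text; swap the buffer for an open file to save it to disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(grid[0]))
writer.writeheader()
writer.writerows(grid)
print(buffer.getvalue())
```

Scanning the rows shows how quickly the required sample grows as the expected effect shrinks.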

Practical considerations

Some practical rules-of-thumb you will see in the wild:

If you expect a large effect (\(d \approx 0.8\)), smaller samples might be OK (but replication still matters).
If you expect a small effect (\(d \approx 0.2\)), you may need hundreds of participants per group to achieve good power.
In within-subjects designs, power is usually higher because each participant acts as their own control, which removes stable between-person differences from the error variance.
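The within-subjects advantage can be illustrated with a small sketch. It assumes equal variances in both conditions and a correlation \(r\) between the repeated measures, in which case the standardized effect for the difference scores is \(d_z = d / \sqrt{2(1 - r)}\). As above, it uses a normal approximation, which is rough for very small samples. The function name approx_n_pairs is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def approx_n_pairs(d: float, r: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate number of participants for a two-sided paired comparison."""
    z = NormalDist()
    d_z = d / sqrt(2 * (1 - r))  # effect size on the difference scores
    return ceil(((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d_z) ** 2)


for r in (0.0, 0.5, 0.8):
    print(f"d = 0.5, r = {r:.1f} -> about {approx_n_pairs(0.5, r)} participants")
```

The higher the correlation between a participant's two measurements, the fewer participants are needed for the same power.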

Power analysis is ultimately ethical as well as technical. You are deciding how many people to involve, how much time to spend, and how likely your study is to make a cumulative contribution.

20.2 Meta-analysis: The study of studies

Individual experiments are noisy. Even well-planned studies will occasionally miss true effects or overstate them. Meta-analysis is a set of tools for combining evidence across multiple studies.

At a high level:

Each study contributes an effect size (e.g., Cohen’s \(d\), correlation \(r\)) and an estimate of its precision (e.g., a standard error or variance).
More precise studies (usually those with larger \(N\)) receive more weight.
A combined or pooled effect size is computed, along with confidence intervals, measures of heterogeneity, and often tests for moderation (do effects differ by method, sample, or context?).
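A minimal sketch of how precision turns into weight, assuming Cohen's \(d\) as the effect size and the common large-sample approximation for its sampling variance. The three studies are made-up numbers for illustration only, and the function name d_variance is hypothetical.

```python
def d_variance(d: float, n1: int, n2: int) -> float:
    """Large-sample approximation to the sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))


# (label, Cohen's d, n in group 1, n in group 2); illustrative values, not real studies
studies = [("A", 0.60, 20, 20), ("B", 0.35, 80, 80), ("C", 0.45, 200, 200)]

# Inverse-variance weights: more precise studies count for more.
weights = {label: 1 / d_variance(d, n1, n2) for label, d, n1, n2 in studies}
total_weight = sum(weights.values())
shares = {label: w / total_weight for label, w in weights.items()}

for label, share in shares.items():
    print(f"Study {label}: {share:.1%} of the total weight")
```

The largest study dominates the weighting, which is the intended behavior: its estimate is the least noisy.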

Fixed-effect vs. random-effects models

In a fixed-effect meta-analysis, we assume all studies are estimating the same underlying true effect.
- Differences among studies are attributed only to sampling error.
- The pooled estimate answers the question: “What is the best estimate of the common effect size in this set of studies?”
In a random-effects meta-analysis, we allow the true effect to vary from study to study (e.g., different labs, populations, or protocols).
- We estimate both the typical effect and the heterogeneity among effects.
- The pooled estimate answers: “What is the average effect across a distribution of study contexts?”
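For readers curious how a random-effects estimate differs in practice, here is a sketch using the DerSimonian-Laird method, one common way to estimate the between-study variance \(\tau^2\). The text above does not say which estimator, if any, the lab uses, so treat this purely as an illustration. The input numbers are made up, and the function name dersimonian_laird is hypothetical.

```python
from math import sqrt


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at zero
    w_star = [1 / (v + tau2) for v in variances]  # weights now include tau^2
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return {
        "fixed": fixed,
        "fixed_se": sqrt(1 / sum_w),
        "tau2": tau2,
        "pooled": pooled,
        "pooled_se": sqrt(1 / sum(w_star)),
    }


result = dersimonian_laird(effects=[0.10, 0.50, 0.90], variances=[0.02, 0.02, 0.02])
print(result)
```

When the studies disagree more than sampling error predicts, \(\tau^2\) is positive and the random-effects standard error is larger than the fixed-effect one, reflecting the extra uncertainty about the average effect.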

In Chapter 20, we keep things simple with a fixed-effect illustration:

We simulate several “published” effect sizes with different sample sizes.
We compute a weighted mean effect size and its confidence interval.
We calculate two basic heterogeneity statistics (\(Q\) and \(I^2\)) to flag when the effects are more variable than chance alone would predict.
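The three steps above can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: the effect sizes and variances are hard-coded, made-up values rather than simulated ones, the confidence interval uses a normal approximation, and the function name fixed_effect_meta is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_meta(effects, variances, confidence=0.95):
    """Inverse-variance pooled effect, confidence interval, Q, and I^2."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se = sqrt(1 / total)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of the variability beyond what chance predicts (0 to 100).
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - z * se,
        "ci_high": pooled + z * se,
        "q": q,
        "df": df,
        "i2": i2,
    }


effects = [0.60, 0.35, 0.45, 0.10]       # illustrative Cohen's d values
variances = [0.105, 0.025, 0.010, 0.050]  # illustrative sampling variances
summary = fixed_effect_meta(effects, variances)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Under the fixed-effect assumption, \(Q\) is compared against a chi-square distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of studies.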

How this connects back to earlier chapters

Earlier chapters focused on within-study inference (what can we conclude from this experiment?). Meta-analysis steps back and asks:

How consistent are the effects across many experiments?
When results disagree, is it due to chance, small samples, or genuine differences in context or methods?
How can we make decisions (in policy, clinical practice, or scientific theory) that respect the whole body of evidence?

20.3 Communicating results responsibly

Statistical tools are only as useful as the stories we tell with them. Responsible communication means:

Being transparent about your methods (design, sampling, analytic choices).
Reporting effect sizes and confidence intervals, not just \(p\) values.
Discussing limitations and alternative explanations.
Being honest about the uncertainty that remains.
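As one way to put "effect sizes and confidence intervals" into practice, the sketch below computes Cohen's \(d\) for two independent groups and formats it with an approximate 95% interval. The data are simulated with a fixed seed, the interval relies on a large-sample normal approximation, and the function name cohens_d_with_ci is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d_with_ci(group_a, group_b, confidence=0.95):
    """Cohen's d (pooled SD) with an approximate normal-theory interval."""
    n1, n2 = len(group_a), len(group_b)
    pooled_sd = sqrt(
        ((n1 - 1) * stdev(group_a) ** 2 + (n2 - 1) * stdev(group_b) ** 2)
        / (n1 + n2 - 2)
    )
    d = (mean(group_a) - mean(group_b)) / pooled_sd
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return d, d - z * se, d + z * se


rng = random.Random(20)  # fixed seed so the example is reproducible
control = [rng.gauss(50, 10) for _ in range(40)]    # simulated stress scores
treatment = [rng.gauss(45, 10) for _ in range(40)]  # simulated stress scores

d, low, high = cohens_d_with_ci(control, treatment)
report_line = f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]"
print(report_line)
```

Reporting the interval alongside the point estimate lets readers see how much uncertainty remains, which a lone \(p\) value hides.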

Writing the Discussion section

A good Discussion section typically:

Restates the research questions in plain language.
- What did we want to know?
- How does this connect to theory or prior research?
Summarizes the key findings without overstating them.
- Focus on patterns of results, not just individual \(p\) values.
- Tie back to effect sizes and confidence intervals.
Integrates with prior work.
- Do your results replicate or challenge previous findings?
- How might they fit into a broader meta-analytic picture?
Acknowledges limitations.
- Sample characteristics (e.g., only undergraduates from one university).
- Measurement issues (e.g., self-report scales, ceiling effects).
- Design constraints (e.g., no true random assignment).
Outlines future directions.
- What follow-up studies could clarify the story?
- How might improved design, larger samples, or different populations alter the conclusions?

The aim is not to sell your results, but to situate them – as one piece of a collaborative, cumulative effort.

20.4 PyStatsV1 Lab: A final project from raw data to APA report

The last PyStatsV1 lab is different from earlier chapters. Instead of a single, tightly scripted analysis, you will:

Choose a research question.
Select or import a dataset.
Design an analysis pipeline using the tools you have already implemented.
Generate a short APA-style report with transparent, reproducible code.

The Chapter 20 lab script provides a lightweight scaffold for this process.

What the Chapter 20 lab script does

The module scripts.psych_ch20_responsible_researcher includes three main components:

Power planning helper
- Uses pingouin.power_ttest() to compute required per-group sample sizes for different effect sizes and power levels.
- Writes a small CSV grid, outputs/track_b/ch20_power_grid.csv, that you can inspect or modify as you plan your own study.
Toy meta-analysis simulator
- Simulates several “studies” with varying sample sizes and effect sizes.
- Computes a fixed-effect pooled effect, confidence interval, and basic heterogeneity statistics \(Q\) and \(I^2\).
- Saves both the per-study table and a one-row summary to:
  - outputs/track_b/ch20_meta_studies.csv
  - outputs/track_b/ch20_meta_summary.csv
This is not meant to replace real meta-analysis software, but to demystify the core ideas using familiar PyStatsV1 tools.
Final project report template
- Creates a Markdown template at outputs/track_b/ch20_final_project_template.md.
- The template contains section headings and bullet prompts for:
  - Introduction & research questions
  - Methods (design, participants, measures, procedure)
  - Results (with placeholders for tables and figures generated by your PyStatsV1 scripts)
  - Discussion (including limitations and future directions)
  - Reproducibility notes (Git commit hash, random seeds, and CLI commands used)
You can open this file in any text editor or import it into a reference manager / writing tool.

Suggested project workflow

Here is a possible end-to-end workflow for your final project:

Pick a question
- Example: “Does a brief mindfulness exercise reduce stress scores relative to a control condition?”
- Example: “Is there an association between sleep quality and exam performance?”
Choose a dataset
- Start with one of the PyStatsV1 synthetic datasets (e.g., the sleep study or exam performance data), or
- Import a small real dataset of your own – but keep it simple enough to analyze reproducibly in a single notebook or script.
Plan your analysis
- Identify the appropriate model (t-test, ANOVA, mixed-model, regression, non-parametric alternative, etc.).
- Use the power helper in Chapter 20 to think about how many participants would be needed to replicate or extend your findings.
Run the analysis with PyStatsV1 tools
- Reuse simulation and analysis helpers from earlier chapters.
- Save any intermediate tables or figures in the project's standard data and outputs directories, so they are easy to find and regenerate.
Write the report using the template
- Copy ch20_final_project_template.md to a new location (or new filename) and gradually fill in each section.
- Whenever you report a result, note which script or function produced it.
Record reproducibility details
- Save your final notebook or script under version control.
- Record the Git commit hash and any command-line calls (e.g., make psych-ch16) that reproduce your figures and tables.
- Share both the code and narrative with collaborators or instructors.
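The reproducibility details in the last step can be collected by a small helper like the one below. This is a sketch under stated assumptions: the function names and the seed value are hypothetical, and the helper falls back to a placeholder when it is not run inside a Git checkout.

```python
import subprocess
import sys


def current_commit() -> str:
    """Return the current Git commit hash, or a placeholder if unavailable."""
    try:
        done = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return done.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown (not a Git checkout)"


def reproducibility_notes(seed: int, commands: list) -> str:
    """Build a short plain-text block to paste into the report."""
    lines = [
        "Reproducibility notes",
        f"- Git commit: {current_commit()}",
        f"- Python version: {sys.version.split()[0]}",
        f"- Random seed: {seed}",
        "- Commands:",
    ]
    lines.extend(f"  - {command}" for command in commands)
    return "\n".join(lines)


notes = reproducibility_notes(seed=2024, commands=["make psych-ch20"])
print(notes)
```

Pasting this block into the final section of the report gives a reader everything needed to rerun the analysis from the same code state.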

Running the Chapter 20 lab

To run the Chapter 20 lab script from the project root:

make psych-ch20

This target runs:

python -m scripts.psych_ch20_responsible_researcher

To run only the tests for this chapter:

make test-psych-ch20

which wraps:

pytest tests/test_psych_ch20_responsible_researcher.py
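The contents of that test file are not shown here. As an illustration of what "tests that verify key properties of the results" can look like, the sketch below defines a tiny seeded simulator and two pytest-style checks. Every name in it is hypothetical, and it is not the project's actual test suite.

```python
import random


def simulate_effects(seed: int, k: int = 5, true_d: float = 0.4):
    """Simulate k (effect, variance) pairs around a common true effect."""
    rng = random.Random(seed)  # a local generator keeps the run deterministic
    studies = []
    for _ in range(k):
        n = rng.choice([20, 40, 80, 160])          # per-group sample size
        variance = 2 / n + true_d**2 / (4 * n)     # variance of d with equal groups
        studies.append((rng.gauss(true_d, variance**0.5), variance))
    return studies


def pooled_effect(studies) -> float:
    """Fixed-effect (inverse-variance) weighted mean."""
    weights = [1 / variance for _, variance in studies]
    return sum(w * d for w, (d, _) in zip(weights, studies)) / sum(weights)


def test_same_seed_gives_same_studies():
    assert simulate_effects(123) == simulate_effects(123)


def test_pooled_effect_lies_within_observed_range():
    studies = simulate_effects(123)
    observed = [d for d, _ in studies]
    assert min(observed) - 1e-12 <= pooled_effect(studies) <= max(observed) + 1e-12
```

Checks like these do not pin down exact numbers; they assert properties that must hold for any correct implementation, which keeps the tests stable when the simulation details change.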

Conceptual summary

Responsible research begins before data collection with thoughtful design and power analysis.
Meta-analysis helps synthesize evidence across studies, revealing both typical effects and meaningful differences across contexts.
Clear, honest communication – especially around uncertainty and limitations – is as important as any statistical computation.
The PyStatsV1 ecosystem encourages you to treat your analyses like production software:
- Deterministic, version-controlled, and reproducible.
- Easy to rerun, extend, and audit.
- Ready to support cumulative, collaborative science.

As you move on to more advanced courses or independent research, you can treat this mini-book (and its code) as a launch pad. The goal is not to memorize every formula, but to internalize a way of working:

Don’t just calculate your results — engineer them. We treat statistical analysis like production software. — PyStatsV1 Motto