Chapter 20 – The Responsible Researcher (Conclusion)
====================================================

Where this chapter fits in the story
------------------------------------

Chapters 1–19 walked you through the **tools** of quantitative psychology:

* From z-scores, :math:`t` tests, and ANOVA,
* Through correlation, regression, and mixed-model designs,
* All the way to ANCOVA and non-parametric alternatives.

Along the way, the PyStatsV1 labs treated each analysis as **production
software**:

* Deterministic simulators with fixed random seeds.
* Re-usable helper functions and command-line entry points.
* Tests that automatically verify key properties of the results.

This final chapter zooms out from individual techniques to the broader
question:

.. epigraph::

   *What does it mean to do responsible, cumulative, and reproducible
   research?*

We will highlight three pillars:

1. **Power analysis** – planning sample sizes before you collect data.
2. **Meta-analysis** – combining evidence across studies.
3. **Clear communication** – writing honest, transparent summaries of
   what your data can (and cannot) support.

The chapter closes with a **final PyStatsV1 project** that guides you
from raw data to an APA-style report, using the tools you developed in
earlier labs.

20.1 Power analysis: Planning samples before you collect data
-------------------------------------------------------------

Why power matters
~~~~~~~~~~~~~~~~~

Every statistical test juggles four quantities:

* **Effect size** (how big the effect really is).
* **Sample size :math:`N`** (how much data you collect).
* **Significance level :math:`\alpha`** (your Type I error rate).
* **Power** (the probability of detecting the effect if it is real).

Once you fix any three of these, the fourth is determined. Power
analysis is about *solving that equation on purpose* instead of hoping
that “:math:`N = 30` per group” will magically be enough.

A study with very low power is problematic because:

* True effects are often *missed* (high Type II error).
* Effects that *are* detected tend to be over-estimated (“winner’s
  curse”).
* Resources (time, money, participant goodwill) can be wasted on studies
  that were never likely to succeed.

A priori vs. post hoc power
~~~~~~~~~~~~~~~~~~~~~~~~~~~

* **A priori power analysis** happens *before* you collect data.

  - You specify a meaningful effect size (e.g., :math:`d = 0.5` for a
    medium standardized mean difference).
  - Choose :math:`\alpha` (often 0.05) and desired power (often 0.80).
  - Solve for :math:`N` per group.

* **Post hoc (“observed”) power** is calculated *after* the fact, using
  the effect size observed in your sample.

  - This value is mostly a complicated re-expression of :math:`p` and
    is rarely informative.
  - In PyStatsV1 we emphasize **a priori** planning instead.

In the Chapter 20 lab script, we use :mod:`pingouin` to compute sample
sizes for simple scenarios (e.g., independent-samples :math:`t` tests),
and we write the results to a small **power grid** for inspection.

Practical considerations
~~~~~~~~~~~~~~~~~~~~~~~~

Some practical rules-of-thumb you will see in the wild:

* If you expect a *large* effect (:math:`d \approx 0.8`), smaller
  samples might be OK (but replication still matters).
* If you expect a *small* effect (:math:`d \approx 0.2`), you may need
  hundreds of participants per group to achieve good power.
* In within-subjects designs, power is boosted by lower error variance
  (participants act as their own control).

Power analysis is ultimately **ethical** as well as technical. You are
deciding how many people to involve, how much time to spend, and how
likely your study is to make a cumulative contribution.

20.2 Meta-analysis: The study of studies
----------------------------------------

Individual experiments are noisy. Even well-planned studies will
occasionally miss true effects or overstate them. **Meta-analysis** is a
set of tools for combining evidence across multiple studies.

At a high level:

* Each study contributes an **effect size** (e.g., Cohen’s :math:`d`,
  correlation :math:`r`) and an **estimate of its precision**
  (e.g., a standard error or variance).
* More precise studies (usually those with larger :math:`N`) receive
  **more weight**.
* A combined or **pooled effect size** is computed, along with
  confidence intervals, measures of heterogeneity, and often tests for
  moderation (do effects differ by method, sample, or context?).

Fixed-effect vs. random-effects models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* In a **fixed-effect** meta-analysis, we assume all studies are
  estimating the *same* underlying true effect.

  - Differences among studies are attributed only to sampling error.
  - The pooled estimate answers the question: “What is the best
    estimate of the common effect size in this set of studies?”

* In a **random-effects** meta-analysis, we allow the true effect to
  *vary* from study to study (e.g., different labs, populations, or
  protocols).

  - We estimate both the *typical* effect and the *heterogeneity* among
    effects.
  - The pooled estimate answers: “What is the average effect across a
    distribution of study contexts?”

In Chapter 20, we keep things simple with a **fixed-effect illustration**:

* We simulate several “published” effect sizes with different sample
  sizes.
* We compute a weighted mean effect size and its confidence interval.
* We calculate a basic heterogeneity statistic (:math:`Q` and :math:`I^2`)
  to flag when the effects are more variable than chance alone would
  predict.

How this connects back to earlier chapters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Earlier chapters focused on **within-study** inference (what can we
conclude from *this* experiment?). Meta-analysis steps back and asks:

* How consistent are the effects across **many** experiments?
* When results disagree, is it due to chance, small samples, or genuine
  differences in context or methods?
* How can we make decisions (in policy, clinical practice, or
  scientific theory) that respect the *whole* body of evidence?

20.3 Communicating results responsibly
--------------------------------------

Statistical tools are only as useful as the **stories we tell** with
them. Responsible communication means:

* Being **transparent** about your methods (design, sampling, analytic
  choices).
* Reporting **effect sizes** and **confidence intervals**, not just
  :math:`p` values.
* Discussing **limitations** and **alternative explanations**.
* Being honest about the **uncertainty** that remains.

Writing the Discussion section
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A good Discussion section typically:

1. **Restates the research questions** in plain language.

   - What did we want to know?
   - How does this connect to theory or prior research?

2. **Summarizes the key findings** without overstating them.

   - Focus on patterns of results, not just individual :math:`p` values.
   - Tie back to effect sizes and confidence intervals.

3. **Integrates with prior work**.

   - Do your results replicate or challenge previous findings?
   - How might they fit into a broader meta-analytic picture?

4. **Acknowledges limitations**.

   - Sample characteristics (e.g., only undergraduates from one
     university).
   - Measurement issues (e.g., self-report scales, ceiling effects).
   - Design constraints (e.g., no true random assignment).

5. **Outlines future directions**.

   - What follow-up studies could clarify the story?
   - How might improved design, larger samples, or different populations
     alter the conclusions?

The aim is not to **sell** your results, but to **situate** them – as
one piece of a collaborative, cumulative effort.

20.4 PyStatsV1 Lab: A final project from raw data to APA report
---------------------------------------------------------------

The last PyStatsV1 lab is different from earlier chapters. Instead of a
single, tightly scripted analysis, you will:

1. Choose a **research question**.
2. Select or import a **dataset**.
3. Design an analysis pipeline using the tools you have already
   implemented.
4. Generate a short **APA-style report** with transparent, reproducible
   code.

The Chapter 20 lab script provides a lightweight scaffold for this
process.

What the Chapter 20 lab script does
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The module :mod:`scripts.psych_ch20_responsible_researcher` includes
three main components:

1. **Power planning helper**

   * Uses :func:`pingouin.power_ttest` to compute required
     per-group sample sizes for different effect sizes and power levels.
   * Writes a small CSV grid,
     :file:`outputs/track_b/ch20_power_grid.csv`, that you can inspect
     or modify as you plan your own study.

2. **Toy meta-analysis simulator**

   * Simulates several “studies” with varying sample sizes and effect
     sizes.
   * Computes a **fixed-effect pooled effect**, confidence interval, and
     basic heterogeneity statistics :math:`Q` and :math:`I^2`.
   * Saves both the per-study table and a one-row summary to:

     - :file:`outputs/track_b/ch20_meta_studies.csv`
     - :file:`outputs/track_b/ch20_meta_summary.csv`

   This is **not** meant to replace real meta-analysis software, but to
   demystify the core ideas using familiar PyStatsV1 tools.

3. **Final project report template**

   * Creates a Markdown template at
     :file:`outputs/track_b/ch20_final_project_template.md`.
   * The template contains section headings and bullet prompts for:

     - Introduction & research questions
     - Methods (design, participants, measures, procedure)
     - Results (with placeholders for tables and figures generated by
       your PyStatsV1 scripts)
     - Discussion (including limitations and future directions)
     - Reproducibility notes (Git commit hash, random seeds, and CLI
       commands used)

   You can open this file in any text editor or import it into a
   reference manager / writing tool.

Suggested project workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is a possible end-to-end workflow for your final project:

1. **Pick a question**

   * Example: “Does a brief mindfulness exercise reduce stress scores
     relative to a control condition?”
   * Example: “Is there an association between sleep quality and exam
     performance?”

2. **Choose a dataset**

   * Start with one of the PyStatsV1 synthetic datasets (e.g., the
     sleep study or exam performance data), or
   * Import a small real dataset of your own – but keep it simple
     enough to analyze reproducibly in a single notebook or script.

3. **Plan your analysis**

   * Identify the appropriate model (t-test, ANOVA, mixed-model,
     regression, non-parametric alternative, etc.).
   * Use the power helper in Chapter 20 to think about how many
     participants would be needed to replicate or extend your findings.

4. **Run the analysis with PyStatsV1 tools**

   * Reuse simulation and analysis helpers from earlier chapters.
   * Save any intermediate tables or figures in the consistent
     :mod:`data` and :mod:`outputs` directories.

5. **Write the report using the template**

   * Copy :file:`ch20_final_project_template.md` to a new location
     (or new filename) and gradually fill in each section.
   * Whenever you report a result, note which script or function
     produced it.

6. **Record reproducibility details**

   * Save your final notebook or script under version control.
   * Record the Git commit hash and any command-line calls (e.g.,
     ``make psych-ch16``) that reproduce your figures and tables.
   * Share both the **code** and **narrative** with collaborators or
     instructors.

Running the Chapter 20 lab
--------------------------

To run the Chapter 20 lab script from the project root:

.. code-block:: bash

   make psych-ch20

This target runs:

.. code-block:: bash

   python -m scripts.psych_ch20_responsible_researcher

To run only the tests for this chapter:

.. code-block:: bash

   make test-psych-ch20

which wraps:

.. code-block:: bash

   pytest tests/test_psych_ch20_responsible_researcher.py

Conceptual summary
------------------

* Responsible research begins **before** data collection with thoughtful
  design and power analysis.
* Meta-analysis helps synthesize evidence across studies, revealing
  both typical effects and meaningful differences across contexts.
* Clear, honest communication – especially around uncertainty and
  limitations – is as important as any statistical computation.
* The PyStatsV1 ecosystem encourages you to treat your analyses like
  **production software**:

  - Deterministic, version-controlled, and reproducible.
  - Easy to rerun, extend, and audit.
  - Ready to support cumulative, collaborative science.

As you move on to more advanced courses or independent research, you can
treat this mini-book (and its code) as a **launch pad**. The goal is not
to memorize every formula, but to internalize a way of working:

.. epigraph::

   *Don’t just calculate your results — engineer them. We treat statistical analysis like production software. — PyStatsV1 Motto*