Windows 11 setup (students)
===========================

This guide is for students who want to run the **Workbook** on Windows 11 using **Git Bash** (a terminal) and **PyPI** (no cloning the repo required).

If you already have Python installed and can run ``python --version`` in Git Bash, you can skip to :doc:`quickstart`.

What you need
-------------

Required:

* **Python 3.10+**
* **Git Bash** (installed via Git for Windows)

Optional:

* **make** (only needed for some developer workflows; the Workbook does **not** require it)

1) Install Python (3.10+)
-------------------------

Install Python using the official Windows installer.

During installation, make sure you enable:

* **Add python.exe to PATH** (this makes ``python`` work in Git Bash)
* **Disable path length limit** (recommended when Windows offers it)

After installing, open **Git Bash** and confirm:

.. code-block:: bash

   python --version
   python -m pip --version

If ``python`` is not found, try the Windows Python launcher:

.. code-block:: bash

   py -V
   py -m pip --version

Then use ``py`` anywhere this Workbook documentation uses ``python``.

2) Install Git Bash
-------------------

Install **Git for Windows**, which includes **Git Bash**.

Confirm Git Bash is working:

.. code-block:: bash

   git --version

3) Optional: Install make
-------------------------

You can skip this step if you are only doing the Workbook.

If you already have ``make`` installed:

.. code-block:: bash

   make --version

If your course/instructor asks you to install it, one common approach on Windows is to install it via a Windows package manager.

If you already have **Chocolatey** installed, you can run this in PowerShell (**Run as Administrator**):

.. code-block:: powershell

   choco install make -y

Then verify in Git Bash:

.. code-block:: bash

   make --version

4) Create a workbook folder + virtual environment
-------------------------------------------------

Pick a folder you want to work in. In Git Bash:

.. code-block:: bash

   mkdir -p ~/pystatsv1-workbook
   cd ~/pystatsv1-workbook

Create and activate a virtual environment (recommended):

.. code-block:: bash

   python -m venv .venv
   # Windows (Git Bash)
   source .venv/Scripts/activate

Upgrade pip and install the Workbook extra from PyPI:

.. code-block:: bash

   python -m pip install --upgrade pip
   python -m pip install "pystatsv1[workbook]"

5) Initialize and run the Workbook
----------------------------------

Create the Workbook starter folder:

.. code-block:: bash

   pystatsv1 workbook init --dest my_workbook
   cd my_workbook

List chapters (sanity check):

.. code-block:: bash

   pystatsv1 workbook list

Run a chapter and check your work:

.. code-block:: bash

   pystatsv1 workbook run ch10
   pystatsv1 workbook check ch10

Keeping PyStatsV1 up to date
----------------------------

To update to the latest version on PyPI:

.. code-block:: bash

   python -m pip install --upgrade "pystatsv1[workbook]"

If something goes wrong
-----------------------

Common fixes (PATH, venv activation, missing pytest, etc.) are covered here:

* :doc:`troubleshooting`

Understanding the setup commands (Windows + Git Bash)
-----------------------------------------------------

When you see commands like these::

   python -m venv pystatsv1-env
   source pystatsv1-env/Scripts/activate
   python -m pip install -U pip
   python -m pip install "pystatsv1[workbook]"
   pystatsv1 doctor
   pystatsv1 workbook init

here is what they mean.

1) Create a virtual environment
-------------------------------

``python -m venv pystatsv1-env`` creates an isolated Python environment (a folder named ``pystatsv1-env``). This keeps Workbook packages separate from your system Python and other projects.

2) Activate the virtual environment
-----------------------------------

``source pystatsv1-env/Scripts/activate`` turns the environment *on*.

You can tell it worked when your prompt changes to include::

   (pystatsv1-env)

From now on, ``python`` and ``pip`` refer to the environment's Python and installer.
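If the prompt prefix is hard to spot, you can also ask Python itself whether it is running from a virtual environment. The snippet below is a minimal sketch, not part of PyStatsV1: the helper name ``in_virtualenv`` is made up for illustration, and it relies only on the standard-library fact that ``sys.prefix`` differs from ``sys.base_prefix`` inside an environment created by ``venv``.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter in use:", sys.executable)
```

Save it under any file name you like and run it with ``python`` from Git Bash; when the environment is active, the interpreter path should point inside the environment folder.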
Tip: If you close Git Bash, you must activate again next time.

3) Upgrade pip (inside the environment)
---------------------------------------

``python -m pip install -U pip`` updates the package installer to a newer version.

If you see messages like "Requirement already satisfied" followed by an upgrade, that's normal.

4) Install PyStatsV1 + Workbook dependencies
--------------------------------------------

``python -m pip install "pystatsv1[workbook]"`` installs PyStatsV1 plus the extra libraries needed for the Workbook (for example: numpy, pandas, scipy, statsmodels, matplotlib, pingouin, etc.).

The ``[workbook]`` part is important: it means "include the workbook extras".

5) Check your environment
-------------------------

``pystatsv1 doctor`` runs a quick health check. If you see::

   [OK] Environment looks good.

then your install is working.

6) Create your workbook starter folder
--------------------------------------

``pystatsv1 workbook init`` creates a new folder named ``pystatsv1_workbook`` in your current directory. That folder contains the starter files used in the workbook chapters.

Next steps usually look like:

* ``cd pystatsv1_workbook``
* ``pystatsv1 workbook run ch10``
* ``pystatsv1 workbook check ch10``

Chapter 11 note (Windows + Git Bash): missing helper file + PYTHONPATH
----------------------------------------------------------------------

Most students can run Chapter 11 the same way as Chapter 10::

   pystatsv1 workbook run ch11
   pystatsv1 workbook check ch11

However, on some Windows setups you may see an error like::

   ModuleNotFoundError: No module named 'scripts.psych_ch11_paired_t'

This means your workbook starter folder is missing a small helper file (``scripts/psych_ch11_paired_t.py``). You can fix it in under a minute.
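For the curious, the wording of the error tells you which of two things went wrong: ``No module named 'scripts.psych_ch11_paired_t'`` means Python found the ``scripts`` package but not the file inside it, while ``No module named 'scripts'`` means Python could not find the package at all. The sketch below reproduces both situations using only the standard library. It is an illustration under stated assumptions, not how PyStatsV1 works internally: the function names and the throwaway package ``demo_scripts_pkg`` are hypothetical.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def diagnose(module_name):
    """Classify why a dotted module name can or cannot be found."""
    importlib.invalidate_caches()  # notice files created since the last lookup
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package itself cannot be imported.
        return "package not found"
    return "ok" if spec is not None else "module file missing"


def demo():
    """Reproduce both failure modes with a throwaway package (hypothetical names)."""
    pkg, mod = "demo_scripts_pkg", "demo_scripts_pkg.helper"
    results = []
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = Path(root) / pkg
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        results.append(diagnose(mod))  # folder is not on sys.path yet
        sys.path.insert(0, root)  # roughly what a PYTHONPATH entry does
        try:
            results.append(diagnose(mod))  # package found, helper.py absent
            (pkg_dir / "helper.py").write_text("VALUE = 1\n")
            results.append(diagnose(mod))  # helper file now present
        finally:
            sys.path.remove(root)
            sys.modules.pop(pkg, None)
    return results


if __name__ == "__main__":
    print(demo())
```

The three results correspond to the package being invisible to Python, the package being visible with the helper file missing, and everything being in place.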
Step A — Confirm the file is missing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From inside your workbook folder (``pystatsv1_workbook``), run::

   ls scripts/psych_ch11_paired_t.py

If you see "No such file or directory", continue to Step B.

Step B — Download the missing helper file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Still inside ``pystatsv1_workbook``, run::

   curl -L -o scripts/psych_ch11_paired_t.py \
     https://raw.githubusercontent.com/pystatsv1/PyStatsV1/main/scripts/psych_ch11_paired_t.py

Step C — Re-run the Chapter 11 checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now this should pass::

   pystatsv1 workbook check ch11

Step D — Run Chapter 11 (use PYTHONPATH on Windows Git Bash)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Windows + Git Bash, Chapter 11 may require ``PYTHONPATH=.`` so Python can find the ``scripts`` package.

Run::

   PYTHONPATH=. pystatsv1 workbook run ch11

Tip: If you ever see "No module named 'scripts'" on Windows Git Bash, retry the command with ``PYTHONPATH=.`` in front.

Where are my outputs?
~~~~~~~~~~~~~~~~~~~~~

Chapter 11 writes results to the ``outputs`` folder. To open it quickly::

   explorer outputs

How to do the Study Habits Case Study Pack (TA instructions)
------------------------------------------------------------

This case study is designed to feel like a mini-course: one dataset, one story, and a short sequence of runs that produce outputs you can inspect and (optionally) submit.

Before you start
~~~~~~~~~~~~~~~~

1) Make sure your virtual environment is active (you should see something like ``(.venv)`` or ``(pystatsv1-env)`` in your prompt).
2) Make sure you are inside your workbook folder. If you are not, ``cd`` into it (for example, ``cd /c/Users//.../pystatsv1_workbook``).
Step 0 — Confirm the case study files are present
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These commands should succeed (no "No such file" errors)::

   ls data/study_habits.csv
   ls scripts/study_habits_01_explore.py
   ls scripts/study_habits_02_anova.py

Step 1 — Run Part 1 (Explore)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run the explore script::

   pystatsv1 workbook run study_habits_01_explore

What to do next:

* Open the outputs folder and look at the generated files (tables/plots)::

     explorer outputs/case_studies/study_habits

* Read the dataset columns (group, pretest/posttest/retention, hours, sleep, etc.) and write 2–3 sentences describing the study design (groups + repeated tests).

Step 2 — Run Part 2 (One-way ANOVA)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run the ANOVA script::

   pystatsv1 workbook run study_habits_02_anova

What to record:

* Your ANOVA result (F, df, p-value).
* The direction of the effect (which group is highest/lowest on posttest_score).
* A short plain-English conclusion linked to the story.

Step 3 — Check your work
~~~~~~~~~~~~~~~~~~~~~~~~

Run the case study check::

   pystatsv1 workbook check study_habits

If it passes, you have the correct dataset and the expected effect pattern.

If it fails:

* Re-run Part 1 and Part 2.
* Confirm you are running commands from the workbook root folder.
* Ask the TA to look at the first error message shown by ``check`` (it usually points to the exact missing file or output).
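When a check fails, the most common cause is running from the wrong folder, so that the Step 0 files cannot be found. The sketch below repeats the three ``ls`` checks from Step 0 in one go. It is a convenience script written for this guide, not a PyStatsV1 command; the function name ``missing_case_study_files`` is hypothetical, and the file paths are copied from Step 0.

```python
from pathlib import Path

# Paths copied from Step 0; they are relative to the workbook root folder.
REQUIRED_FILES = [
    "data/study_habits.csv",
    "scripts/study_habits_01_explore.py",
    "scripts/study_habits_02_anova.py",
]


def missing_case_study_files(workbook_root="."):
    """Return the Step 0 files that do not exist under workbook_root."""
    root = Path(workbook_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_case_study_files()
    if missing:
        print("Missing (are you in the workbook root folder?):")
        for rel in missing:
            print("  -", rel)
    else:
        print("All case study files are present.")
```

If every file is reported missing, you are most likely not in the workbook root folder; if only one is missing, show the TA that file name.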
Optional — Save your work for submission
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If your instructor requests a submission, a common approach is to zip the case study outputs folder::

   cd outputs/case_studies
   # (zip using File Explorer: right-click study_habits -> Send to -> Compressed folder)

At minimum, keep:

* the generated plots/tables in ``outputs/case_studies/study_habits/``
* your written answers (or worksheet) for the story + ANOVA interpretation

Where this goes next (mini-course path)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once this case study is working, you can reuse the same dataset later:

* Ch10 (ANOVA): compare groups on posttest_score
* Ch18 (ANCOVA): add pretest_score as a covariate
* Ch19 (Regression): predict posttest_score from study_hours_per_week, sleep, etc.
* Ch20 (Nonparametric): compare groups using rank-based alternatives

Quick checklist (use this for every chapter)
--------------------------------------------

For each chapter or case study, follow this loop:

1) Run it (append the chapter or case study key, for example ``ch10``)::

      pystatsv1 workbook run

2) Inspect the outputs folder (plots/tables/CSVs)::

      explorer outputs

3) Write your short interpretation (what question, what result, what conclusion).

4) Check your work (append the same key you used in step 1)::

      pystatsv1 workbook check

Only move on when ``check`` passes.

Where did my outputs go?
------------------------

Most workbook tasks write files under an ``outputs/`` folder. Common locations you may see:

* Workbook folder outputs (most common)::

     /outputs/...

* Environment outputs (rare, but possible on Windows if a script uses package paths)::

     .../.venv/Lib/outputs/...

If you ran a script and ``explorer outputs`` looks empty, use the path printed by the command output (copy/paste it into File Explorer), or search for the results file name::

   # Example: find all CSV outputs under the workbook folder
   find . -name "*.csv" | head

Tip: In Git Bash, ``explorer .`` opens the current folder in File Explorer.
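If you prefer Python to ``find``, the same search can be sketched with the standard library. This is an illustrative helper, not part of PyStatsV1; the name ``find_csv_outputs`` and its ``limit`` parameter are assumptions made for this example. Unlike ``find``, it returns the paths in sorted order.

```python
from pathlib import Path


def find_csv_outputs(root=".", limit=10):
    """Return up to `limit` CSV paths under root, relative to root, sorted."""
    root = Path(root)
    # rglob walks every subfolder, much like `find . -name "*.csv"`.
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))
    return found[:limit]


if __name__ == "__main__":
    for path in find_csv_outputs():
        print(path)
```

Run it from the workbook folder; the default limit of 10 mirrors what ``head`` prints.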
Study Habits case study: what to write down
-------------------------------------------

After running:

* ``pystatsv1 workbook run study_habits_01_explore``
* ``pystatsv1 workbook run study_habits_02_anova``

Record these items in your notes or worksheet:

1) Design (1 sentence)
~~~~~~~~~~~~~~~~~~~~~~

Example:

* "We compare three study strategies (control, flashcards, spaced) on posttest performance."

2) Key descriptive result (1 sentence)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``group_summary.csv`` to report which group has the highest mean posttest score.

3) Inferential result (1 sentence)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From the ANOVA table:

* Report F(df1, df2), p-value, and effect size (``np2``).

4) Plain-English conclusion (1 sentence)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Example:

* "Posttest scores differ by study strategy, with spaced repetition performing best in this dataset."

Files to look at:

* ``outputs/case_studies/study_habits/group_summary.csv``
* ``outputs/case_studies/study_habits/anova_posttest_by_group.csv``

"My Own Data" mini-guide (TA instructions)
------------------------------------------

This section helps you run the same **Run → Inspect → Check** workflow on a dataset you choose.

The goal is not perfection — the goal is to practice getting your data into a clean CSV and generating sensible summaries and plots.

Before you start
~~~~~~~~~~~~~~~~

1) Make sure your virtual environment is active (you should see ``(.venv)`` or ``(pystatsv1-env)`` in your prompt).
2) Make sure you are inside your workbook folder (for example, ``cd /c/Users//.../pystatsv1_workbook``).

Step 0 — Make a backup copy of the template (recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before you edit the template, make a copy so you can always revert::

   cp data/my_data.csv data/my_data_backup.csv

Step 1 — Put your data into the template
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open ``data/my_data.csv`` in Excel (or another spreadsheet editor) and replace the example rows with your own data.

Rules of thumb for "clean data":

* One row = one observation (one person, one trial, one day, etc.)
* One column = one variable (group, score, hours, temperature, ...)
* Use simple column names (letters, numbers, underscores)
* Keep column types consistent (numbers stay numeric; text stays text)
* Use empty cells (or ``NA``) for missing values and be consistent

Save as **CSV** (not XLSX) when you are done.

Step 2 — Run the scaffold explore script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From your workbook folder, run::

   pystatsv1 workbook run my_data_01_explore

If your CSV is stored somewhere else, you can run the script directly::

   python scripts/my_data_01_explore.py --csv path/to/your.csv --outdir outputs/my_data

Step 3 — Inspect the outputs (what to look at first)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open the output folder in File Explorer::

   explorer outputs/my_data

Start with these tables:

* ``outputs/my_data/tables/missingness.csv`` (which columns have missing values)
* ``outputs/my_data/tables/numeric_summary.csv`` (means, SDs, min/max, etc.)

Also check the plots folder:

* ``outputs/my_data/plots/`` (histograms/boxplots depending on your data)

Step 4 — Check your work (smoke test)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run the matching check::

   pystatsv1 workbook check my_data

If it passes, your CSV is readable and the script produced the expected outputs.
If it fails, read the first error message carefully — it usually points to:

* a missing file
* a column name mismatch
* a column that looks numeric but is actually text

Step 5 — Customize the script for your column names
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open ``scripts/my_data_01_explore.py`` and find::

   # === Student edits start here ===
   ID_COL = "id"
   GROUP_COL = "group"
   OUTCOME_COL = "outcome"

If your CSV uses different column names, change the values to match exactly.

Then re-run::

   pystatsv1 workbook run my_data_01_explore
   pystatsv1 workbook check my_data

Common problems (quick fixes)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* "Column not found": check spelling and capitalization in your CSV header row.
* Numbers treated as text: remove commas (``1,200`` -> ``1200``) and avoid mixing words with numbers in the same column.
* Missing values: use empty cells or ``NA`` consistently.
* If you get stuck: run the template CSV first (unchanged) to confirm your setup works.

Try "My Own Data" with Notepad (copy/paste example + expected results)
----------------------------------------------------------------------

If you don’t want to use Excel yet, you can edit the CSV directly in **Notepad**.

Step 1 — Open the template CSV in Notepad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From inside your workbook folder (``pystatsv1_workbook``), run::

   notepad data/my_data.csv

Replace the entire file contents with the example below (keep the header row). Then **save** and close Notepad.

.. code-block:: text

   id,group,outcome,x1,x2
   1,control,73,2.1,10
   2,control,69,1.8,11
   3,control,75,2.4,9
   4,control,70,2.0,10
   5,control,68,1.7,
   6,control,74,2.3,9
   7,treatment,82,3.0,12
   8,treatment,79,2.7,13
   9,treatment,85,3.2,12
   10,treatment,81,2.9,14
   11,treatment,88,3.4,13
   12,treatment,,3.1,12

Notes:

* Row 5 has a **missing x2** value (empty after the last comma).
* Row 12 has a **missing outcome** value (empty after the second comma).
* Missing values are allowed — they help you practice checking missingness.

Step 2 — Run the scaffold script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run::

   pystatsv1 workbook run my_data_01_explore

You should see a quick report similar to this (formatting may vary slightly):

.. code-block:: text

   My Own Data — quick report
   ========================
   rows: 12
   cols: 5
   group: 2 group(s)
     - 'control': 6
     - 'treatment': 6
   numeric columns:
     - id
     - outcome
     - x1
     - x2
   Wrote outputs to: .../outputs/my_data

Step 3 — Inspect the outputs you just created
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open the folder::

   explorer outputs/my_data

Start with these tables:

* ``outputs/my_data/tables/missingness.csv``

  Expected highlights for this example dataset:

  * ``outcome`` has **1** missing value (about **8.33%**)
  * ``x2`` has **1** missing value (about **8.33%**)

* ``outputs/my_data/tables/group_means.csv``

  Expected highlights for this example dataset:

  * control outcome mean ≈ **71.5**
  * treatment outcome mean ≈ **83.0**

Also check the plot:

* ``outputs/my_data/plots/numeric_histograms.png``

Step 4 — Check (smoke test)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run::

   pystatsv1 workbook check my_data

If it passes, your CSV is readable and the script is producing outputs correctly.

Now you can replace the example rows with your own data and repeat: Run → Inspect → Check.
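To see where the expected highlights come from, you can recompute them by hand. The sketch below does so for the example rows using only the Python standard library. It is an independent cross-check written for this guide, not the code that ``my_data_01_explore`` runs; the function name ``summarize`` is hypothetical, and treating empty cells and ``NA`` as missing is an assumption based on the clean-data rules in this guide.

```python
import csv
import io
from statistics import mean

# The example rows from the Notepad step, copied verbatim.
EXAMPLE_CSV = """\
id,group,outcome,x1,x2
1,control,73,2.1,10
2,control,69,1.8,11
3,control,75,2.4,9
4,control,70,2.0,10
5,control,68,1.7,
6,control,74,2.3,9
7,treatment,82,3.0,12
8,treatment,79,2.7,13
9,treatment,85,3.2,12
10,treatment,81,2.9,14
11,treatment,88,3.4,13
12,treatment,,3.1,12
"""

MISSING = ("", "NA")  # assumption: empty cells and NA both count as missing


def summarize(csv_text, group_col="group", outcome_col="outcome"):
    """Return (percent missing per column, mean outcome per group)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n_rows = len(rows)
    missing_pct = {
        col: 100 * sum(1 for r in rows if r[col].strip() in MISSING) / n_rows
        for col in rows[0]
    }
    by_group = {}
    for r in rows:
        value = r[outcome_col].strip()
        if value not in MISSING:  # skip rows with no outcome
            by_group.setdefault(r[group_col], []).append(float(value))
    group_means = {g: mean(values) for g, values in by_group.items()}
    return missing_pct, group_means


if __name__ == "__main__":
    pct, means = summarize(EXAMPLE_CSV)
    for col, value in pct.items():
        print(f"{col}: {value:.2f}% missing")
    for group, value in means.items():
        print(f"{group}: mean outcome = {value:.1f}")
```

One missing value out of 12 rows gives 1/12 ≈ 8.33%, and the treatment mean is computed from the 5 rows that have an outcome, which is why it comes out to 83.0.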