PyStatsV1 documentation
Welcome to the documentation for PyStatsV1 – chapter-based applied statistics examples in plain Python, mirroring classical R textbook analyses.
Students (recommended): install from PyPI and start the Workbook
If you’re a student (not a developer), the easiest path is to install the Workbook bundle from PyPI and let the CLI create a local copy of the labs for you.
python -m pip install "pystatsv1[workbook]"
pystatsv1 workbook init --dest my_workbook
Then run the built-in checks as you work:
cd my_workbook
pystatsv1 workbook check
If you’re on Windows 11 and this is your first time installing Python, start here: Windows 11 setup (students).
Lightweight install (no Workbook checks)
If you only want the core helper package (without the Workbook checks bundle), you can install the base package directly from PyPI:
python -m pip install pystatsv1
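After either install, one quick way to confirm that the package is visible to the interpreter you are actually using is to query the installed distribution metadata. The following is a minimal sketch using only the standard library; the distribution name pystatsv1 is taken from the pip commands above, and the helper name installed_version is hypothetical (it is not part of PyStatsV1).

```python
from importlib import metadata


def installed_version(dist_name: str = "pystatsv1"):
    """Return the installed version string, or None if the distribution is absent.

    `installed_version` is an illustrative helper, not a PyStatsV1 function.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # pip has not installed this distribution into the current environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("pystatsv1 is not installed in this environment")
    else:
        print(f"pystatsv1 {version} is installed")
```

If this reports that the package is missing right after a successful pip install, the likely cause is that pip and python point at different environments; running pip as python -m pip, as in the commands above, avoids that mismatch.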
Developers / Contributing
If you want the full chapter-based labs (simulators, scripts, Makefile targets, and tests), clone the GitHub repository and install in editable mode:
git clone https://github.com/pystatsv1/PyStatsV1.git
cd PyStatsV1
python -m pip install -e .
See Getting started and Contributing for the contributor workflow.
Workbook – Student Labs (recommended starting point)
Track A – Applied Statistics with Python (Regression)
- Getting started
- Introduction: How to study applied statistics with Python and R
- Chapter 2 – Getting started with R (for Python-first learners)
- Applied Statistics with Python – Chapter 3
- Data and Programming (Python-first view)
- 3.1 Data Types
- 3.2 Data Structures: R vs Python mental map
- 3.2.1 One-dimensional containers: lists, ranges, and NumPy arrays
- 3.2.1.1 Subsetting and slicing
- 3.2.2 Vectorization in Python
- 3.2.3 Logical operators
- 3.2.4 Matrices and linear algebra (NumPy)
- 3.2.5 Heterogeneous containers: lists and dicts
- 3.2.6 Tabular data: pandas DataFrames
- 3.3 Programming Basics in Python
- 3.3.1 Control flow
- 3.3.2 Defining functions
- 3.4 What you should take away
- Applied Statistics with Python – Chapter 4
- Applied Statistics with Python – Chapter 5
- 5.1 Probability in Python (and R)
- 5.2 Hypothesis tests in Python
- 5.3 Simulation in Python
- 5.4 What you should take away
- Applied Statistics with Python – Chapter 6
- 6.1 Beginner tutorials and references
- 6.2 Intermediate references
- 6.3 Advanced references
- 6.4 Cross-language comparisons
- 6.5 IDEs, notebooks, and literate programming
- 6.6 How PyStatsV1 fits into this ecosystem
- Applied Statistics with Python – Chapter 7
- Simple linear regression in Python and R
- 7.1 From scatterplots to models
- 7.2 The simple linear regression model
- 7.3 Least squares: estimating the line
- 7.4 Residuals, variance, and \(R^2\)
- 7.5 Using statsmodels: Python’s version of lm()
- 7.6 Simulation: seeing SLR in action
- 7.7 What you should take away
- Applied Statistics with Python – Chapter 8
- Inference for simple linear regression
- 8.1 Recap: least squares and notation
- 8.2 Gauss–Markov in plain language (why least squares is “good”)
- 8.3 Sampling distributions of \(\hat\beta_0\) and \(\hat\beta_1\)
- 8.4 Standard errors and \(t\) statistics
- 8.5 Confidence intervals for slope and intercept
- 8.6 Hypothesis tests for slope and intercept
- 8.7 The cars example in Python
- 8.8 Confidence intervals for mean response
- 8.9 Prediction intervals for new observations
- 8.10 Confidence and prediction bands
- 8.11 F-test and ANOVA: another view of “significance of regression”
- 8.12 What you should take away
- Applied Statistics with Python – Chapter 9
- Multiple linear regression
- 9.1 From simple to multiple regression
- 9.2 Auto MPG example
- 9.3 Fitting a multiple regression model
- 9.4 Matrix formulation of regression
- 9.5 Sampling distribution of \(\hat\beta\)
- 9.6 Testing individual coefficients
- 9.7 Confidence intervals for coefficients and mean response
- 9.8 Prediction intervals
- 9.9 Significance of regression: global F-test
- 9.10 Nested model comparisons
- 9.11 Simulation: checking the sampling distribution
- 9.12 What you should take away
- Applied Statistics with Python – Chapter 10
- Applied Statistics with Python – Chapter 11
- Categorical predictors and interactions
- 11.1 Dummy variables (indicator variables)
- 11.2 Interactions: when slopes depend on context
- 11.3 Factor variables and automatic dummies
- 11.4 Different parameterizations, same model
- 11.5 Building larger models with interactions
- 11.6 How this connects to PyStatsV1
- 11.7 What you should take away
- Applied Statistics with Python – Chapter 12
- Analysis of variance (ANOVA) and experiments
- Terminology for experiments
- Mathematical model
- R vs Python
- 12.3.1 Model and intuition
- Sums of squares (conceptually)
- 12.3.2 One-way ANOVA in Python
- 12.3.3 Factor variables and categorical dtype
- 12.3.4 Simulating the F distribution in Python
- 12.3.5 Power via simulation
- Naive approach
- Bonferroni adjustment
- Tukey’s HSD
- Model with interaction
- Additive model (no interaction)
- Model hierarchy and testing strategy
- Two-way ANOVA in Python
- Interaction plots
- Applied Statistics with Python – Chapter 13
- Applied Statistics with Python – Chapter 14
- Applied Statistics with Python – Chapter 15
- Applied Statistics with Python – Chapter 16
- Variable selection and model building
- 16.1 Quality criteria: balancing fit and complexity
- 16.2 Search procedures: which models to consider?
- 16.3 Example: seat position (AIC vs BIC vs LOOCV)
- 16.4 Higher-order terms: Auto MPG example
- 16.5 Explanation versus prediction
- 16.6 How this connects to PyStatsV1
- 16.7 What you should take away
- Applied Statistics with Python – Chapter 17
- Applied Statistics with Python – Chapter 18
- Beyond: where to go after this mini-book
- 18.1 Where you can go next
- 18.2 Python ecosystem: beyond the basics
- 18.3 R + Python “dual citizenship”
- 18.4 Tidy data and data workflows
- 18.5 Visualization: telling the story
- 18.6 Reproducible reports and small web apps
- 18.7 Experimental design and causal questions
- 18.8 Machine learning and predictive modeling
- 18.9 Time series and dependent data
- 18.10 Bayesian statistics and probabilistic programming
- 18.11 High-performance and large-scale computing
- 18.12 How this connects to PyStatsV1
- 18.13 Final thoughts
- Chapters overview
- Teaching guide
- Contributing
Track B – Psychological Science & Statistics (Psych track)
- Psychological Science & Statistics – From Inquiry to Insight
- Psychological Science & Statistics – Chapter 1
- Psychological Science & Statistics – Chapter 2
- Psychological Science & Statistics – Chapter 3
- Psychological Science & Statistics – Chapter 4
- Psychological Science & Statistics – Chapter 5
- Psychological Science & Statistics – Chapter 6
- Psychological Science & Statistics – Chapter 7
- Psychological Science & Statistics – Chapter 8
- Psychological Science & Statistics – Chapter 9
- Psychological Science & Statistics – Chapter 10
- Chapter 11 – Within-Subjects Designs and the Paired-Samples t-Test
- Chapter 12 – One-Way Analysis of Variance (ANOVA)
- Chapter 13 – Factorial Designs and the Two-Way ANOVA
- Chapter 14 – Repeated-Measures ANOVA
- Chapter 14 Appendix – Pingouin for Repeated-Measures and Mixed ANOVA
- Chapter 15 – Correlation
- Chapter 15 Appendix – Pingouin for Correlation and Partial Correlation
- Chapter 16 – Linear Regression
- Chapter 16a Appendix: Linear Regression with Pingouin
- Chapter 16b – Regression Diagnostics with Pingouin
- Chapter 17 – Mixed-Model Designs
- Chapter 18 – Analysis of Covariance (ANCOVA)
- Chapter 19 – Non-Parametric Statistics
- Chapter 19a – Rank-Based Non-Parametric Alternatives
- Chapter 20 – The Responsible Researcher (Conclusion)
Track C – Problem Sets & Worked Solutions (Psych track)
- Track C – Problem Sets & Worked Solutions
- Chapter 10 Problem Set – Independent-Samples \(t\) Test
- Chapter 11 Problem Set – Paired-Samples t Test
- Chapter 12 Problem Set – One-Way ANOVA
- Track C – Chapter 13: Factorial Designs (Two-Way ANOVA)
- Track C — Chapter 14 Problem Set (Repeated-Measures ANOVA)
- Track C — Chapter 15 Problem Set (Correlation)
- Track C — Chapter 16 Problem Set: Linear Regression
- Track C — Chapter 17 Problem Set (Mixed-Model Designs)
- Track C – Chapter 18 Problem Set (ANCOVA)
- Track C – Chapter 19 Problem Set (Non-Parametric Statistics)
- Track C – Chapter 20 Problem Set (Responsible Researcher)
Track D – Business Statistics & Forecasting for Accountants
- Track D – Business Statistics & Forecasting for Accountants
- Ch 01 — Accounting as a measurement system
- Ch 02 — Double-entry and the general ledger as a database
- Ch 03 — Financial statements as summary statistics
- Business Chapter 4: Assets — Inventory and Fixed Assets
- Business Chapter 5: Liabilities — Payroll, Taxes, Debt, and Equity
- Business Chapter 6: Reconciliations as Quality Control
- Business Chapter 7: Preparing accounting data for analysis
- Chapter 8 – Descriptive Statistics for Financial Performance
- Appendix 8A: Chapter 8 milestone and the big picture (Ch01–Ch08)
- Track D — Chapter 9: Visualization and reporting that doesn’t mislead
- Track D — Chapter 10: Probability and risk in business terms
- Business Chapter 11 — Sampling and Estimation (Audit and Controls Lens)
- Chapter 12 — Hypothesis Testing for Decisions
- Chapter 13 — Correlation, Causation, and Controlled Comparisons
- Track D — Chapter 14
- Appendix 14A: Chapter 14 milestone — Track D, the NSO system, and our synthetic datasets
- Appendix 14B: NSO v1 data dictionary cheat sheet (table → grain → keys → joins → checks)
- Appendix 14C: Chapter 14 artifact dictionary (what each output is for)
- Appendix 14D: Artifact QA checklist (big picture — what and why before you share results)
- Appendix 14E: Applying Track D through Chapter 14 to your own real-world data
- Track D — Chapter 15
- Track D — Chapter 16
- Track D — Chapter 17
- Track D — Chapter 18
- Track D — Chapter 19
- Track D — Chapter 20
- Business Statistics & Forecasting for Accountants (Track D)
- Business Statistics & Forecasting for Accountants (Track D)
- Chapter 23 — Communicating results: decision memos, dashboards, and governance
- Capstone — North Shore Outfitters: Close → Clean → Explain → Forecast → Decide
- Capstone templates
- Capstone rubric (100 points)
- Appendix — Accounting refresher map (from the PDF)
- Appendix — Track D authoring rules