Applied Statistics with Python – Chapter 6
==========================================

Resources for Python, R, and reproducible workflows
---------------------------------------------------

The earlier chapters moved quickly through Python, R, and basic statistical
ideas. This chapter is not required for the rest of PyStatsV1, but it collects
resources you can use to go deeper:

- tutorials for learning Python and R,
- books that go beyond this mini-textbook,
- cross-language “cheatsheets”,
- and tooling for reproducible work (RStudio, Jupyter, Quarto, etc.).

Use this as a menu: you do **not** need to read everything here. Pick one or
two resources that match your current level and goals.


6.1 Beginner tutorials and references
=====================================

If you’re just getting comfortable with code, you want **short, hands-on**
introductions where you can type and run examples immediately.

Python-focused
--------------

- **Official Python Tutorial**

  The tutorial in the Python documentation walks through basic syntax,
  control flow, functions, and modules.

  https://docs.python.org/3/tutorial/

- **Scientific Python “quick start” guides**

  Short introductions to NumPy, pandas, Matplotlib, and the scientific
  Python ecosystem. Good for bridging from “I know basic Python” to
  “I can do data analysis.”

  https://numpy.org/doc/stable/user/quickstart.html  
  https://pandas.pydata.org/docs/getting_started/index.html  
  https://matplotlib.org/stable/tutorials/index.html

- **JupyterLab / notebooks**

  Interactive environment for mixing code, narrative, and plots. Very
  helpful for experimentation and teaching.

  https://jupyter.org/

R-focused
---------

These mirror the resources listed in the original R chapter, but with brief
comments about how they complement PyStatsV1.

- **Try R (interactive tutorial)**

  Short, browser-based introduction to R syntax and objects. Nice if you
  want to see R once without installing anything.

- **Quick-R (Kabacoff)**

  Web-based reference for common R tasks: importing data, basic plots,
  regression, etc. Handy if you already know the statistics and just want
  to remember “how do I do this in R?”.

- **R Tutorial (Chi Yau)**

  A mix of tutorial and reference that covers core language features plus
  common data analysis tasks.

- **R Programming for Data Science (Roger Peng)**

  Free online book that builds R from the ground up, with an emphasis on
  good programming habits.

  https://bookdown.org/rdpeng/rprogdatascience/


6.2 Intermediate references
===========================

Once you’re comfortable running code and reading error messages, the next step
is to learn **data analysis workflows** and **programming patterns**.

Python-focused
--------------

- **Python for Data Analysis (Wes McKinney)**

  The pandas “founder’s book”. Great for learning how to manipulate data
  frames, handle time series, and write reusable analysis code.

- **Think Stats / Think Bayes (Allen Downey)**

  Statistics books that use Python from the start, with an emphasis on
  simulation and computation rather than hand algebra.

- **Statistical Thinking in Python (various tutorials)**

  Many online courses and notebooks walk through hypothesis testing,
  regression, and visualization in Python. PyStatsV1 chapters can be a
  complementary “worked examples” resource.

R-focused
---------

- **R for Data Science (Wickham & Grolemund)**

  A modern introduction to data wrangling, visualization, and modeling
  in the **tidyverse** ecosystem. Pairs nicely with PyStatsV1 if you want
  to see the same ideas in both R and Python.

  https://r4ds.had.co.nz/

- **The Art of R Programming (Norman Matloff)**

  Gentle but thorough introduction to R as a programming language (control
  flow, functions, object types), as opposed to just a statistics tool.


6.3 Advanced references
=======================

These are for when R or Python has become part of your regular toolkit and you
want to think about performance, internals, or large projects.

R-focused
---------

- **Advanced R (Hadley Wickham)**

  Deep dive into R’s object system, environments, functional programming,
  and metaprogramming. Helps explain *why* some R code behaves in surprising
  ways.

- **The R Inferno (Patrick Burns)**

  A humorous but very technical guide to R’s “gotchas”. Useful if you write
  a lot of complex R or maintain other people’s R code.

- **Efficient R Programming (Gillespie & Lovelace)**

  Focuses on writing R code that is fast and scalable, and on using R tools
  efficiently day to day.

Python-focused
--------------

- **Scientific Python ecosystem docs**

  NumPy, SciPy, pandas, and Matplotlib all have detailed documentation that
  covers vectorization, broadcasting, and performance tips.

- **Probabilistic programming / Bayesian methods**

  Libraries like PyMC, Stan (via CmdStanPy), or NumPyro provide powerful
  tools for Bayesian modeling. These are beyond the scope of PyStatsV1,
  but worth exploring if you continue into advanced applied statistics.


6.4 Cross-language comparisons
==============================

If you already know another language, sometimes the fastest way to learn is
via a **“Rosetta stone”** that shows equivalent idioms side by side.

Examples mentioned in the original R chapter include:

- **Numerical computing comparison**

  Cheat sheets comparing MATLAB, NumPy, Julia, and R for common numerical
  and matrix operations.

- **R vs Stata vs SAS**

  Short documents that show how the same data analysis is written in each
  language.

For Python + R specifically, useful patterns to practice are:

- indexing and slicing in NumPy vs R,
- data frame operations in pandas vs ``dplyr``,
- plotting with Matplotlib/Seaborn vs ``ggplot2``.

In PyStatsV1 we deliberately write code in a way that makes these parallels
easier to see: plain, explicit scripts in both languages rather than opaque
one-liners.


6.5 IDEs, notebooks, and literate programming
=============================================

A big part of modern applied statistics is organising code, text, and plots in
a single **reproducible document**.

R side
------

- **RStudio**

  Widely used IDE for R (and now other languages). Integrates console,
  editor, plots, and package management in one window.

- **RMarkdown**

  Framework for combining narrative text, R code, and output into a single
  report (HTML, PDF, slides, etc.). The original notes for this book are
  written with RMarkdown.

Python side
-----------

- **Jupyter notebooks / JupyterLab**

  Interactive notebooks where you can mix Python code, Markdown, LaTeX
  equations, and plots. Excellent for exploratory analysis and teaching.

- **VS Code, PyCharm, and other editors**

  For larger projects, an editor or IDE with good Python support
  (linting, debugging, Git integration) can make a big difference.

Bridging tools
--------------

- **Quarto**

  A modern, language-agnostic framework for literate programming that
  supports both Python and R in the same ecosystem. If you like RMarkdown
  + RStudio, Quarto + VS Code/Jupyter gives a similar experience for Python.

No single tool is “the right one”. The best setup is whatever makes it easy for
you to:

1. write clear code,
2. keep data and outputs organised,
3. rerun everything later and get the same results.


6.6 How PyStatsV1 fits into this ecosystem
==========================================

PyStatsV1 is **not** trying to replace full textbooks or advanced courses.
Instead, it aims to be:

- a **bridge** between R-first teaching materials and Python,
- a repository of **worked examples** that you can run, edit, and extend,
- and a place where you can practice **reproducible workflows** without a
  heavy framework.

You might use this chapter as follows:

- If you’re brand new to coding, pair the PyStatsV1 chapters with one of the
  beginner Python or R tutorials.
- If you already know R well, skim the Python resources to see how the
  same ideas look in NumPy/pandas.
- If you’re comfortable in Python, use the R resources when you need to read
  or adapt R-heavy applied work.

Most importantly: **don’t feel obligated to read everything.** Choose one or
two resources that look approachable, and treat PyStatsV1 as your sandbox for
trying ideas out in real code.