My Own Data: a mini-guide

Once you’ve run a chapter or the Study Habits case study, the next step is to try your own dataset.

PyStatsV1 includes:

a tiny CSV template: data/my_data.csv
a beginner-friendly scaffold script: scripts/my_data_01_explore.py
a test you can run any time: tests/test_my_data.py

The goal is simple: Run → Inspect → Check, on your data.

1) Put your data in the template

Open data/my_data.csv and replace the example rows with your own data.

Rules of thumb (“clean data”):

One row = one observation (one person, one trial, one day, etc.).
One column = one variable (group, score, hours, temperature, …).
Use clear column names (letters, numbers, underscores).
Avoid mixed types in one column (don’t mix numbers and text).
Use empty cells for missing values (or NA). Try to be consistent.

2) Run the scaffold script

From your workbook folder:

pystatsv1 workbook run my_data_01_explore

If your CSV is somewhere else:

python scripts/my_data_01_explore.py --csv path/to/your.csv --outdir outputs/my_data

3) Inspect what was created

The script writes outputs under:

outputs/my_data/tables/
outputs/my_data/plots/

Start with:

outputs/my_data/tables/missingness.csv
outputs/my_data/tables/numeric_summary.csv

4) Check (tests)

Run the matching smoke test:

pystatsv1 workbook check my_data

5) Customize for your dataset

Open scripts/my_data_01_explore.py and look for:

# === Student edits start here ===
ID_COL = "id"
GROUP_COL = "group"
OUTCOME_COL = "outcome"

Change these names to match your CSV columns (or leave them as-is).

Tip: If your numeric columns are being treated like text, fix the CSV first. For example, remove commas in numbers and avoid mixing text with numbers.