My Own Data: a mini-guide
Once you’ve run a chapter or the Study Habits case study, the next step is to try your own dataset.
PyStatsV1 includes:
a tiny CSV template:
data/my_data.csv
a beginner-friendly scaffold script:
scripts/my_data_01_explore.py
a test you can run any time:
tests/test_my_data.py
The goal is simple: Run → Inspect → Check, on your data.
1) Put your data in the template
Open data/my_data.csv and replace the example rows with your own data.
Rules of thumb (“clean data”):
One row = one observation (one person, one trial, one day, etc.).
One column = one variable (group, score, hours, temperature, …).
Use clear column names (letters, numbers, underscores).
Avoid mixed types in one column (don’t mix numbers and text).
Use empty cells for missing values (or NA). Try to be consistent.
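As a rough illustration of these rules, the sketch below scans CSV text for column names that contain characters other than letters, numbers, and underscores, and for columns that mix numbers with text. It is not part of PyStatsV1: the helper names (check_clean, is_number) and the sample data are made up for this example, and it assumes that empty cells and NA are the only missing-value markers.

```python
import csv
import io
import re

MISSING = {"", "NA"}  # assumed missing-value markers (empty cell or NA)
NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")  # letters, numbers, underscores


def is_number(text):
    """Return True if the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_clean(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    problems = []
    for name in reader.fieldnames or []:
        if not NAME_RE.match(name):
            problems.append(f"unclear column name: {name!r}")
        values = [(row.get(name) or "").strip() for row in rows]
        present = [v for v in values if v not in MISSING]
        numeric = sum(is_number(v) for v in present)
        # Some, but not all, non-missing cells are numeric: a mixed column.
        if 0 < numeric < len(present):
            problems.append(f"mixed numbers and text in column: {name!r}")
    return problems


# Hypothetical sample: one bad column name, one mixed-type column.
sample = "id,group,hours studied,score\n1,A,2.5,80\n2,B,NA,seventy\n3,A,4,\n"
for problem in check_clean(sample):
    print(problem)
# unclear column name: 'hours studied'
# mixed numbers and text in column: 'score'
```

To try it on a real file, read the file's text first (for example with pathlib.Path("data/my_data.csv").read_text()) and pass that to check_clean.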
2) Run the scaffold script
From your workbook folder:
pystatsv1 workbook run my_data_01_explore
If your CSV is somewhere else:
python scripts/my_data_01_explore.py --csv path/to/your.csv --outdir outputs/my_data
3) Inspect what was created
The script writes outputs under:
outputs/my_data/tables/
outputs/my_data/plots/
Start with:
outputs/my_data/tables/missingness.csv
outputs/my_data/tables/numeric_summary.csv
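If you would rather look at these tables from Python than open them in a spreadsheet, a small sketch like the one below prints the first few rows of each. The preview_csv helper is hypothetical (not part of PyStatsV1), and the code makes no assumption about which columns the tables contain; it only assumes they are ordinary CSV files at the default output path.

```python
import csv
from pathlib import Path


def preview_csv(path, limit=5):
    """Return the header row plus up to `limit` data rows from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    return rows[: limit + 1]


# Default output location used in this guide; adjust if you passed --outdir.
tables = Path("outputs/my_data/tables")
for name in ("missingness.csv", "numeric_summary.csv"):
    path = tables / name
    if path.exists():
        print(f"--- {name} ---")
        for row in preview_csv(path):
            print(", ".join(row))
    else:
        print(f"{name} not found; run the scaffold script first")
```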
4) Check (tests)
Run the matching smoke test:
pystatsv1 workbook check my_data
5) Customize for your dataset
Open scripts/my_data_01_explore.py and look for:
# === Student edits start here ===
ID_COL = "id"
GROUP_COL = "group"
OUTCOME_COL = "outcome"
Change these names to match your CSV columns (or leave them as-is).
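Before rerunning the script, it can help to confirm that the names you set actually appear in your CSV header. The sketch below is one way to do that; missing_columns is a hypothetical helper, and the inline CSV text is a made-up sample in which the outcome column is called score.

```python
import csv
import io

# Same constants as in the scaffold script's student-edit section.
ID_COL = "id"
GROUP_COL = "group"
OUTCOME_COL = "outcome"


def missing_columns(header, wanted):
    """Return the wanted column names that are absent from the header."""
    return [name for name in wanted if name not in header]


# Made-up sample CSV whose outcome column is named "score".
sample = "id,group,score\n1,A,80\n"
header = next(csv.reader(io.StringIO(sample)))
print(missing_columns(header, [ID_COL, GROUP_COL, OUTCOME_COL]))
# ['outcome']
```

Here the fix would be to set OUTCOME_COL = "score", since that is the name used in the sample file.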
Tip: If your numeric columns are being treated like text, fix the CSV first. For example, remove commas in numbers and avoid mixing text with numbers.
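To illustrate the kind of cleanup this tip describes, here is a minimal sketch that strips commas from a cell before converting it to a number. The clean_number helper is hypothetical, and it assumes commas are thousands separators (as in 1,234.5), not decimal marks, and that empty cells and NA mean missing.

```python
def clean_number(text):
    """Convert a cell like '1,234.5' to a float; return None if it is missing."""
    cleaned = text.strip().replace(",", "")
    if cleaned in ("", "NA"):
        return None
    # float() raises ValueError if text is still mixed in (e.g. "12 kg").
    return float(cleaned)


print(clean_number("1,234.5"))  # 1234.5
print(clean_number("NA"))       # None
```

A ValueError from this helper is a useful signal: it points to a cell that still mixes text with numbers and needs fixing in the CSV itself.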