My Own Data: a mini-guide

Once you’ve run a chapter or the Study Habits case study, the next step is to try your own dataset.

PyStatsV1 includes:

  • a tiny CSV template: data/my_data.csv

  • a beginner-friendly scaffold script: scripts/my_data_01_explore.py

  • a test you can run any time: tests/test_my_data.py

The goal is simple: Run → Inspect → Check, this time on your own data.

1) Put your data in the template

Open data/my_data.csv and replace the example rows with your own data.

Rules of thumb (“clean data”):

  • One row = one observation (one person, one trial, one day, etc.).

  • One column = one variable (group, score, hours, temperature, …).

  • Use clear column names made only of letters, numbers, and underscores (no spaces).

  • Avoid mixed types in one column (don’t mix numbers and text).

  • Mark missing values with empty cells (or NA), and use the same marker throughout the file.
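The rules above can be checked automatically before you run anything. The sketch below is a hypothetical helper (check_clean_csv is not part of PyStatsV1) that uses only the Python standard library. It assumes empty cells and NA are the only missing-value markers, and it flags column names with other characters, columns that mix numbers and text, and columns that mix both missing-value markers.

```python
import csv
import io
import re

# Hypothetical helper for illustration; not part of PyStatsV1.
NAME_OK = re.compile(r"[A-Za-z0-9_]+")  # letters, digits, underscores only
MISSING = {"", "NA"}  # assumed missing-value markers


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_clean_csv(text: str) -> list[str]:
    """Return a list of problems found in CSV text (an empty list means none found)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    if not rows:
        return ["no data rows"]
    problems = []
    for col in reader.fieldnames:
        if not NAME_OK.fullmatch(col):
            problems.append(f"column name {col!r}: use only letters, digits, underscores")
        # Short rows give None for missing trailing cells; treat them as empty.
        values = [(row[col] or "").strip() for row in rows]
        present = [v for v in values if v not in MISSING]
        n_numeric = sum(_is_number(v) for v in present)
        if 0 < n_numeric < len(present):
            problems.append(f"column {col!r} mixes numbers and text")
        if len({v for v in values if v in MISSING}) > 1:
            problems.append(f"column {col!r} uses both empty cells and NA for missing values")
    return problems
```

Pass it the text of your CSV file; an empty list means none of these particular rules were broken.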

2) Run the scaffold script

From your workbook folder, run:

pystatsv1 workbook run my_data_01_explore

If your CSV is somewhere else, call the script directly and pass the CSV path and an output folder:

python scripts/my_data_01_explore.py --csv path/to/your.csv --outdir outputs/my_data
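The two flags on the command above suggest a command-line interface along these lines. This is a minimal argparse sketch, not the script's actual source; the defaults (the template CSV and outputs/my_data) are assumptions based on the paths named in this guide.

```python
import argparse
from pathlib import Path

# Assumed defaults, taken from the paths mentioned in this guide.
# The real script may define its options differently.
DEFAULT_CSV = Path("data/my_data.csv")
DEFAULT_OUTDIR = Path("outputs/my_data")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse --csv and --outdir; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Explore a CSV dataset.")
    parser.add_argument("--csv", type=Path, default=DEFAULT_CSV,
                        help="input CSV file")
    parser.add_argument("--outdir", type=Path, default=DEFAULT_OUTDIR,
                        help="folder that will hold tables/ and plots/")
    return parser.parse_args(argv)
```

With no arguments, parse_args([]) falls back to the template paths; passing --csv path/to/your.csv overrides only the input file.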

3) Inspect what was created

The script writes outputs under:

  • outputs/my_data/tables/

  • outputs/my_data/plots/
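Both folders sit under one output directory, so a single setting decides where everything goes. A minimal pathlib sketch of how such folders can be created (make_output_dirs is a hypothetical name, not a PyStatsV1 function):

```python
import tempfile
from pathlib import Path


def make_output_dirs(outdir: Path) -> tuple[Path, Path]:
    """Create outdir/tables and outdir/plots if needed and return both paths."""
    tables = outdir / "tables"
    plots = outdir / "plots"
    for folder in (tables, plots):
        # parents=True also creates outdir itself; exist_ok=True makes re-runs safe.
        folder.mkdir(parents=True, exist_ok=True)
    return tables, plots


if __name__ == "__main__":
    # Demo in a throwaway folder so nothing is written into your workbook.
    with tempfile.TemporaryDirectory() as tmp:
        tables, plots = make_output_dirs(Path(tmp) / "outputs" / "my_data")
        print(tables.is_dir(), plots.is_dir())
```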

Start with:

  • outputs/my_data/tables/missingness.csv

  • outputs/my_data/tables/numeric_summary.csv
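To get a feel for what these two tables contain, here is a standard-library sketch that computes similar numbers. The column layout (n_missing, pct_missing, count, mean, sd, min, max) is an assumption for illustration; the files written by the real script may be organized differently. Missing values are assumed to be empty cells or NA, as in step 1.

```python
import csv
import io
import statistics

MISSING = {"", "NA"}  # assumed missing-value markers


def _to_float(value: str):
    try:
        return float(value)
    except ValueError:
        return None


def missingness(rows, columns):
    """One record per column: how many cells are missing, and what percentage."""
    n = len(rows)
    out = []
    for col in columns:
        n_missing = sum(1 for r in rows if (r[col] or "").strip() in MISSING)
        out.append({"column": col, "n_missing": n_missing,
                    "pct_missing": 100 * n_missing / n if n else 0.0})
    return out


def numeric_summary(rows, columns):
    """count/mean/sd/min/max for columns whose non-missing cells are all numbers."""
    out = []
    for col in columns:
        cells = [(r[col] or "").strip() for r in rows]
        nums = [_to_float(v) for v in cells if v not in MISSING]
        if not nums or any(x is None for x in nums):
            continue  # skip columns that are empty or contain text
        out.append({
            "column": col,
            "count": len(nums),
            "mean": statistics.mean(nums),
            "sd": statistics.stdev(nums) if len(nums) > 1 else float("nan"),
            "min": min(nums),
            "max": max(nums),
        })
    return out


if __name__ == "__main__":
    text = "id,group,outcome\n1,A,3.0\n2,B,\n3,A,5.0\n"
    rows = list(csv.DictReader(io.StringIO(text)))
    print(missingness(rows, ["id", "group", "outcome"]))
    print(numeric_summary(rows, ["id", "group", "outcome"]))
```

Each function returns a list of dicts, so saving either one takes a single csv.DictWriter. A column with a high pct_missing, or one you expected to be numeric that is absent from the numeric summary, is the first thing to investigate.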

4) Check (tests)

Run the matching smoke test:

pystatsv1 workbook check my_data
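A smoke test only checks that nothing is badly broken. The sketch below shows the idea in pytest style; it is a hypothetical example, not the contents of tests/test_my_data.py, and it assumes the default column names from step 5. It uses an inline sample so it runs anywhere; a real test would read data/my_data.csv instead.

```python
import csv
import io

REQUIRED = ("id", "group", "outcome")  # assumption: the default names from step 5
SAMPLE = "id,group,outcome\n1,A,3.5\n2,B,4.0\n"  # made-up rows, for illustration only


def check_dataset(text: str, required=REQUIRED) -> None:
    """Raise AssertionError if a required column is absent or there are no data rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    assert not missing, f"missing columns: {missing}"
    assert rows, "CSV has a header but no data rows"


def test_my_data_smoke():
    # pytest collects functions named test_*; a real test would load the file here.
    check_dataset(SAMPLE)
```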

5) Customize for your dataset

Open scripts/my_data_01_explore.py and look for:

# === Student edits start here ===
ID_COL = "id"
GROUP_COL = "group"
OUTCOME_COL = "outcome"

Change these names to match the column names in your CSV, or leave them as-is if your columns are already called id, group, and outcome.
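If one of these names does not match the CSV header, the script is likely to fail later with a less helpful error. A check like the following fails early and names the setting to fix; check_columns is a hypothetical helper, not code taken from the script.

```python
# Same three settings as in the script (defaults shown).
ID_COL = "id"
GROUP_COL = "group"
OUTCOME_COL = "outcome"


def check_columns(header: list[str]) -> None:
    """Raise ValueError naming every setting whose column is not in the CSV header."""
    wanted = {"ID_COL": ID_COL, "GROUP_COL": GROUP_COL, "OUTCOME_COL": OUTCOME_COL}
    missing = {name: col for name, col in wanted.items() if col not in header}
    if missing:
        raise ValueError(f"column setting(s) not found in CSV header {header}: {missing}")
```

Extra columns in the CSV are fine; only the three configured names have to be present.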

Tip: If a numeric column is being treated as text, fix the CSV rather than the script. Common causes are thousands separators (write 1234, not 1,234) and stray text in a numeric column (for example n/a instead of an empty cell or NA).
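The tip above can be applied mechanically: scan a column for values that do not parse as numbers. This hypothetical helper assumes missing values are empty cells or NA, and it recognizes one common culprit, commas used as thousands separators. Note that a value like 1,234 only stays in one cell if it is quoted in the CSV ("1,234"); unquoted, the comma splits it into two fields.

```python
import re

MISSING = {"", "NA"}  # assumed missing-value markers
# Numbers written with thousands separators, e.g. "1,234" or "-12,345.6".
THOUSANDS = re.compile(r"-?\d{1,3}(,\d{3})+(\.\d+)?")


def find_non_numeric(values: list[str]) -> list[tuple[int, str, str]]:
    """Return (position, value, hint) for each non-missing value that is not a plain number."""
    problems = []
    for i, raw in enumerate(values):
        value = raw.strip()
        if value in MISSING:
            continue
        try:
            float(value)
        except ValueError:
            if THOUSANDS.fullmatch(value):
                hint = "remove the commas: " + value.replace(",", "")
            else:
                hint = "text in a numeric column"
            problems.append((i, value, hint))
    return problems
```

Positions are 0-based within the list of values you pass in (not counting the header row).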