My Own Data: a mini-guide
=========================

Once you’ve run a chapter or the Study Habits case study, the next step is to
try **your own dataset**.

PyStatsV1 includes:

* a tiny CSV template: ``data/my_data.csv``
* a beginner-friendly scaffold script: ``scripts/my_data_01_explore.py``
* a test you can run any time: ``tests/test_my_data.py``

The goal is simple: **Run → Inspect → Check**, on *your* data.

1) Put your data in the template
--------------------------------

Open ``data/my_data.csv`` and replace the example rows with your own data.

Rules of thumb ("clean data"):

* **One row = one observation** (one person, one trial, one day, etc.).
* **One column = one variable** (group, score, hours, temperature, ...).
* Use **clear column names** (letters, numbers, underscores).
* Avoid mixed types in one column (don’t mix numbers and text).
* Use empty cells for missing values (or ``NA``). Try to be consistent.

2) Run the scaffold script
--------------------------

From your workbook folder:

.. code-block:: bash

   pystatsv1 workbook run my_data_01_explore

If your CSV is somewhere else:

.. code-block:: bash

   python scripts/my_data_01_explore.py --csv path/to/your.csv --outdir outputs/my_data

3) Inspect what was created
---------------------------

The script writes outputs under:

* ``outputs/my_data/tables/``
* ``outputs/my_data/plots/``

Start with:

* ``outputs/my_data/tables/missingness.csv``
* ``outputs/my_data/tables/numeric_summary.csv``

4) Check (tests)
----------------

Run the matching smoke test:

.. code-block:: bash

   pystatsv1 workbook check my_data

5) Customize for your dataset
-----------------------------

Open ``scripts/my_data_01_explore.py`` and look for:

.. code-block:: python

   # === Student edits start here ===
   ID_COL = "id"
   GROUP_COL = "group"
   OUTCOME_COL = "outcome"

Change these names to match your CSV columns (or leave them as-is).

Tip: If your numeric columns are being treated like text, fix the CSV first.
For example, remove commas in numbers and avoid mixing text with numbers.
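
If you are not sure which cells are causing the problem, a small standalone
check can point you to them. The sketch below is not part of PyStatsV1: the
helper names (``is_number``, ``columns_with_text_in_numbers``) and the inline
sample data are made up for illustration. It uses only the Python standard
library and assumes missing values are empty cells or ``NA``, as suggested in
step 1. It reports columns that are mostly numeric but contain a few cells
that do not parse as numbers.

.. code-block:: python

   import csv
   import io

   # Assumption: missing values are written as empty cells or "NA".
   MISSING = {"", "NA"}


   def is_number(text):
       """Return True if the cell text parses as a float."""
       try:
           float(text)
           return True
       except ValueError:
           return False


   def columns_with_text_in_numbers(csv_file, threshold=0.5):
       """Map each mostly-numeric column to its cells that are not numbers.

       A column is reported when at least ``threshold`` of its non-missing
       cells parse as numbers and at least one cell does not.
       """
       reader = csv.DictReader(csv_file)
       fieldnames = reader.fieldnames or []
       cells = {name: [] for name in fieldnames}
       for row in reader:
           for name in fieldnames:
               value = (row[name] or "").strip()
               if value not in MISSING:
                   cells[name].append(value)

       report = {}
       for name, values in cells.items():
           bad = [v for v in values if not is_number(v)]
           good_count = len(values) - len(bad)
           if bad and good_count >= threshold * len(values):
               report[name] = bad
       return report


   # Hypothetical sample: "outcome" has a comma in a number and a stray "n/a".
   sample = io.StringIO(
       'id,group,outcome\n'
       '1,a,"1,200"\n'
       '2,b,950\n'
       '3,a,12.5\n'
       '4,b,n/a\n'
   )
   report = columns_with_text_in_numbers(sample)
   print(report)

To check your own file, pass an open file instead of the sample, for example
``open("data/my_data.csv", newline="")``. Columns that are text on purpose,
such as ``group``, are not reported because none of their cells parse as
numbers.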