Track D Lab 0/1 (PyPI-only)
TA Notes + Script for Explaining the Lab and the Outputs
This handout is for a TA running a lab section where students install PyStatsV1 from PyPI, initialize the Track D workbook, and run:
d00_peek_data (tour the datasets)
d01 (Chapter 1: accounting checks + key metrics)
business_smoke (a short automated check suite)
It includes what to say, what students should see, and how to explain the output.
Recommended pre-reading (TA)
If you have 5 minutes before lab, skim these pages:
Track D Student Edition (Workbook Landing) — the “book-style” Track D entry point
Track D Dataset Map — the table mental model + intentional QC “warts”
Track D Outputs Guide — what each generated output means and how to interpret it
Track D: Apply what you learned to your data — a practical “bring your own data” bridge (30-minute recipe)
Track D assignments: labs + rubric (TA) — classroom-ready labs + rubric (gradeable artifacts + short answers)
How to use the assignments page
If you want Track D to be turnkey as a lab, point students at the Track D assignments: labs + rubric (TA) page.
That page is designed to be self-contained:
Lab 1: run Track D and interpret outputs
Lab 2: reconcile a teachable QC issue (duplicate bank transaction id)
Lab 3: write a short executive summary memo from the outputs
The rubric is intended to reward analysis habits (contracts, checks, reconciliation, and clear writing), not just code execution.
1. Learning goals
By the end of this lab, students should be able to:
Set up a clean Python environment (virtualenv) for reproducible analysis.
Install a “batteries included” workbook from PyPI (no cloning repos).
Initialize a Track D project folder that contains:
a Track D workbook template
pre-installed synthetic datasets (seed=123)
Run a “data tour” to see what files exist and what they look like.
Run an “accounting data sanity check” and interpret the outputs:
Are entries balanced?
Does the accounting equation hold?
What are the basic business metrics?
Run a lightweight test suite (business_smoke) as a professional habit.
TA framing line
“Today isn’t about memorizing accounting terms. It’s about learning the analyst’s workflow: install → initialize → inspect → validate → summarize → repeat.”
2. Lab structure
Total time: ~40–60 minutes
Setup (10–15 min): Create venv, upgrade pip, install pystatsv1[workbook].
Initialize workbook (5 min): Create a Track D workbook folder with datasets pre-installed.
Explore data (10–15 min): Run d00_peek_data, interpret what’s in LedgerLab + NSO.
Run first analysis/checks (10 min): Run d01, interpret checks + key metrics.
Confidence check (5 min): Run business_smoke and explain what “13 passed” means.
3. Environment setup talk track
3.1 Why virtual environments matter (30 seconds)
“A virtual environment is a sealed sandbox. Everyone in this class can run the same commands and get the same results. It prevents dependency conflicts and makes troubleshooting easier.”
3.2 Commands
python -m venv .venv
# Windows (Git Bash):
source .venv/Scripts/activate
# macOS/Linux:
# source .venv/bin/activate
python -m pip install -U pip
pip install "pystatsv1[workbook]"
What students should notice
pip upgrades successfully.
The install pulls scientific stack packages (NumPy/Pandas/SciPy/Statsmodels/Matplotlib…).
The workbook extra includes pytest, which powers workbook check.
TA note: If installs are slow, reassure them it’s normal (large compiled wheels).
4. Initialize the Track D workbook
4.1 What init does
“workbook init creates a new project folder. It copies a starter template and unpacks the datasets into a predictable location. You now have a ready-to-run lab workspace.”
pystatsv1 workbook init --track d --dest track_d_workbook
cd track_d_workbook
Students should see a message like:
“[OK] Track D workbook starter created at …”
“Datasets are pre-installed under data/synthetic/, seed=123.”
4.2 Why seed=123 matters
“Seed=123 means the synthetic datasets are deterministic. If you and I run the same scripts, we get the same numbers. That’s key for teaching, grading, and reproducibility.”
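If students ask what "deterministic" means in practice, a small sketch like the one below can help. It uses Python's standard random module and a hypothetical make_sales helper; it only illustrates the general idea of seeding and is not how PyStatsV1 builds its datasets.

```python
import random


def make_sales(seed, n=5):
    """Generate n synthetic sale amounts from a seeded generator (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [round(rng.uniform(10, 500), 2) for _ in range(n)]


first_run = make_sales(123)
second_run = make_sales(123)
other_seed = make_sales(124)

print(first_run == second_run)  # same seed, same numbers
print(first_run == other_seed)  # a different seed gives a different sample
```

The point for grading: two people who use the same seed can compare numbers directly.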
5. List the available Track D runs
pystatsv1 workbook list --track d
Explain the list
Each Dxx corresponds to a chapter or checkpoint.
d00_peek_data is the dataset tour.
d01 is the first content chapter runner.
Later chapters (d02–d23) provide a consistent “run menu” over the course.
TA line: “You can think of this as a menu of mini-programs: run, inspect outputs, then modify and extend.”
6. Run d00_peek_data (data tour)
pystatsv1 workbook run d00_peek_data
6.1 What d00_peek_data is doing
Explain it as three steps:
Locate datasets under data/synthetic/…
Read each CSV and print:
file name
number of rows/columns
column names
a small preview
Write a Markdown summary file:
outputs/track_d/d00_peek_data_summary.md
TA line: “Before statistics, confirm what data exists and what shape it’s in.”
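The three steps above can be sketched in plain Python. This is an illustrative approximation, not the actual d00_peek_data source: the peek_csv and write_summary helpers, the example.csv file, and its column names are all made up for the demo, which runs in a throwaway temporary folder.

```python
import csv
import tempfile
from pathlib import Path


def peek_csv(path, n_preview=2):
    """Return basic shape facts for one CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    header = rows[0] if rows else []
    body = rows[1:]
    return {
        "file": Path(path).name,
        "n_rows": len(body),
        "n_cols": len(header),
        "columns": header,
        "preview": body[:n_preview],
    }


def write_summary(infos, out_path):
    """Write a small Markdown inventory, one section per file."""
    lines = ["# Data summary", ""]
    for info in infos:
        lines.append(f"## {info['file']}")
        lines.append(f"- rows: {info['n_rows']}, columns: {info['n_cols']}")
        lines.append(f"- column names: {', '.join(info['columns'])}")
        lines.append("")
    Path(out_path).write_text("\n".join(lines), encoding="utf-8")


# Demo on a temporary folder holding one tiny hypothetical CSV.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "example.csv").write_text(
        "txn_id,account,debit,credit\n1,Cash,100,0\n1,Sales,0,100\n",
        encoding="utf-8",
    )
    infos = [peek_csv(p) for p in sorted(data_dir.rglob("*.csv"))]  # step 1 + 2
    write_summary(infos, data_dir / "summary.md")                    # step 3
    summary_text = (data_dir / "summary.md").read_text(encoding="utf-8")

print(infos[0])
```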
6.2 Two datasets: LedgerLab vs NSO
LedgerLab (Ch01)
LedgerLab is a small “training wheels” business dataset you can trace end-to-end:
chart_of_accounts.csv (account dictionary)
gl_journal.csv (debit/credit lines by transaction)
trial_balance_monthly.csv (monthly balances)
statements_is_monthly.csv (income statement)
statements_bs_monthly.csv (balance sheet)
statements_cf_monthly.csv (cash flow)
TA point: “LedgerLab helps you trace journal → trial balance → statements.”
NSO v1 running case
NSO is the “bigger business system” with multiple subledgers and derived outputs:
bank_statement.csv (includes a deliberately duplicated ID)
ar_events.csv / ap_events.csv
inventory_movements.csv
payroll_events.csv
sales_tax_events.csv
fixed_assets.csv + depreciation_schedule.csv
debt_schedule.csv
plus statement/trial balance outputs
TA point: “NSO is designed to feel like real company data: multiple sources and common quality issues.”
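For students curious how a duplicated ID would be caught, here is a minimal sketch using only the standard library. The rows and the bank_txn_id column name are hypothetical; the real bank_statement.csv may name its ID column differently.

```python
from collections import Counter

# Hypothetical bank statement rows; the real column names may differ.
bank_rows = [
    {"bank_txn_id": "B001", "amount": 250.00},
    {"bank_txn_id": "B002", "amount": -75.50},
    {"bank_txn_id": "B002", "amount": -75.50},  # the duplicated ID
    {"bank_txn_id": "B003", "amount": 40.00},
]

# Count how often each ID appears, then keep the ones seen more than once.
id_counts = Counter(row["bank_txn_id"] for row in bank_rows)
duplicate_ids = sorted(txn_id for txn_id, n in id_counts.items() if n > 1)

print(duplicate_ids)  # ['B002']
```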
6.3 Key columns to explain
chart_of_accounts.csv
account_type: Asset, Liability, Equity, Revenue, Expense, Contra Asset
normal_side:
Assets/Expenses normally Debit
Liabilities/Equity/Revenue normally Credit
TA line: “Normal side is about sign conventions in the system.”
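One way to make the sign convention concrete is to compute a balance "on the normal side". The sketch below is illustrative: the natural_balance helper and the amounts are hypothetical, and the Contra Asset entry reflects the standard convention that contra assets carry credit balances, which the list above does not spell out.

```python
# Normal side by account type. The first five follow the list above;
# Contra Asset = Credit is standard accounting convention (an addition here).
NORMAL_SIDE = {
    "Asset": "Debit",
    "Expense": "Debit",
    "Liability": "Credit",
    "Equity": "Credit",
    "Revenue": "Credit",
    "Contra Asset": "Credit",
}


def natural_balance(account_type, total_debits, total_credits):
    """Balance expressed on the account's normal side (positive means 'normal')."""
    if NORMAL_SIDE[account_type] == "Debit":
        return total_debits - total_credits
    return total_credits - total_debits


cash = natural_balance("Asset", total_debits=1200.0, total_credits=300.0)
loan = natural_balance("Liability", total_debits=100.0, total_credits=900.0)
print(cash, loan)  # 900.0 800.0
```

A negative result from natural_balance is a useful flag: the account is sitting on its "abnormal" side and may deserve a second look.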
gl_journal.csv
txn_id groups multiple lines into one transaction.
Each transaction should balance: sum(debits) = sum(credits).
TA line: “A transaction is a mini-equation: where value came from and where it went.”
Statements and trial balance
Trial balance is database-style output.
Statements are human-facing summaries.
TA line: “Trial balance is the structured ledger; statements are the story.”
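The journal → trial balance → statements trace can be shown on a toy example. Everything below is hypothetical (account names, amounts, and the debit/credit column names); it sketches the idea rather than reproducing how the LedgerLab files were built.

```python
from collections import defaultdict

# Hypothetical account dictionary and journal lines (column names assumed).
account_type = {
    "Cash": "Asset",
    "Sales Revenue": "Revenue",
    "Rent Expense": "Expense",
}
journal = [
    {"txn_id": 1, "account": "Cash", "debit": 900.0, "credit": 0.0},
    {"txn_id": 1, "account": "Sales Revenue", "debit": 0.0, "credit": 900.0},
    {"txn_id": 2, "account": "Rent Expense", "debit": 400.0, "credit": 0.0},
    {"txn_id": 2, "account": "Cash", "debit": 0.0, "credit": 400.0},
]

# Trial balance: one row per account with total debits and total credits.
trial_balance = defaultdict(lambda: {"debit": 0.0, "credit": 0.0})
for line in journal:
    trial_balance[line["account"]]["debit"] += line["debit"]
    trial_balance[line["account"]]["credit"] += line["credit"]

# Income statement: revenue (credit-normal) minus expenses (debit-normal).
revenue = sum(
    tb["credit"] - tb["debit"]
    for acct, tb in trial_balance.items()
    if account_type[acct] == "Revenue"
)
expenses = sum(
    tb["debit"] - tb["credit"]
    for acct, tb in trial_balance.items()
    if account_type[acct] == "Expense"
)
net_income = revenue - expenses

print(dict(trial_balance))
print(revenue, expenses, net_income)  # 900.0 400.0 500.0
```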
7. Run d01 (Chapter 1 checks + key metrics)
pystatsv1 workbook run d01
It prints Checks and Key metrics, then writes artifacts under outputs/track_d.
If students ask “what is this file?”, point them to Track D Outputs Guide.
7.1 Checks (what they mean)
transactions_balanced: True
Every transaction’s debits equal credits.
TA line: “If this fails, you fix the data pipeline before analysis.”
n_transactions
Count of transaction groups in the LedgerLab data used for d01.
n_unbalanced and max_abs_diff
n_unbalanced: number of transactions with debits ≠ credits
max_abs_diff: largest absolute imbalance amount
TA line: “If max_abs_diff is nonzero, we have an integrity error.”
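The four transaction checks can be reproduced on a toy journal. This is a sketch under assumptions, not the d01 implementation: the check_transactions helper, the debit and credit column names, and the small floating-point tolerance are all choices made for illustration.

```python
from collections import defaultdict

# Hypothetical journal lines; the `debit` and `credit` column names are assumed.
journal = [
    {"txn_id": 1, "account": "Cash", "debit": 500.0, "credit": 0.0},
    {"txn_id": 1, "account": "Sales Revenue", "debit": 0.0, "credit": 500.0},
    {"txn_id": 2, "account": "Inventory", "debit": 200.0, "credit": 0.0},
    {"txn_id": 2, "account": "Accounts Payable", "debit": 0.0, "credit": 200.0},
]


def check_transactions(lines, tol=1e-9):
    """Compute per-transaction debit/credit differences and summarize them."""
    diff_by_txn = defaultdict(float)
    for line in lines:
        diff_by_txn[line["txn_id"]] += line["debit"] - line["credit"]
    abs_diffs = [abs(d) for d in diff_by_txn.values()]
    n_unbalanced = sum(d > tol for d in abs_diffs)
    return {
        "transactions_balanced": n_unbalanced == 0,
        "n_transactions": len(diff_by_txn),
        "n_unbalanced": n_unbalanced,
        "max_abs_diff": max(abs_diffs, default=0.0),
    }


checks = check_transactions(journal)
print(checks)

# Add a one-sided line to see the checks fail.
broken = journal + [{"txn_id": 3, "account": "Cash", "debit": 10.0, "credit": 0.0}]
broken_checks = check_transactions(broken)
print(broken_checks)
```

In the broken case, n_unbalanced is 1 and max_abs_diff is 10.0, which tells you both how many transactions to investigate and how large the worst error is.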
accounting_equation_balances: True
Total assets equal total liabilities plus equity (system-wide sanity check).
TA line: “Even if each transaction balances, you still want the big equation to hold.”
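A minimal sketch of the equation check, with hypothetical totals and a hypothetical helper name. The one-cent tolerance is an assumption to absorb rounding; the actual check in d01 may compare differently.

```python
import math

# Hypothetical ending totals, each stated on its normal side.
totals = {"assets": 15000.0, "liabilities": 6000.0, "equity": 9000.0}


def equation_holds(t, tol=0.01):
    """True when assets equal liabilities plus equity within a small tolerance."""
    return math.isclose(t["assets"], t["liabilities"] + t["equity"], abs_tol=tol)


print(equation_holds(totals))  # True

# One extra dollar of assets with no matching claim breaks the equation.
off_by_one = dict(totals, assets=15001.0)
print(equation_holds(off_by_one))  # False
```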
7.2 Key metrics (how to interpret)
Revenue and sales behavior
sales_total: total sales
n_sales: number of sales events
avg_sale: average sale size
pct_sales_on_account: fraction of sales made on credit
TA line: “A/R exists because not all sales are paid immediately. This hints at liquidity risk.”
Cost and margin
cogs_total: cost of goods sold
gross_profit = sales − cogs
gross_margin_pct = gross_profit / sales
TA line: “Gross margin is a core health metric and often a driver variable later.”
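A worked example with made-up numbers shows how the sales and margin metrics relate. Two assumptions are worth flagging: pct_sales_on_account is computed here by dollar amount (the real metric might instead use the count of sales), and gross_margin_pct is left as a fraction, matching the formula above.

```python
# Hypothetical sales events: (amount, sold_on_account)
sales = [(120.0, True), (80.0, False), (200.0, True), (100.0, False)]
cogs_total = 275.0  # hypothetical cost of goods sold

sales_total = sum(amount for amount, _ in sales)
n_sales = len(sales)
avg_sale = sales_total / n_sales

# Assumption: share of sales dollars made on credit (not share of sale counts).
on_account_total = sum(amount for amount, on_account in sales if on_account)
pct_sales_on_account = on_account_total / sales_total

gross_profit = sales_total - cogs_total
gross_margin_pct = gross_profit / sales_total

print(sales_total, n_sales, avg_sale)      # 500.0 4 125.0
print(round(pct_sales_on_account, 2))      # 0.64
print(gross_profit, round(gross_margin_pct, 2))  # 225.0 0.45
```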
Net income and cash
net_income may be negative
ending_cash may still be positive
Teaching moment
“Profit and cash are not the same thing. You can lose money but still have cash (owner contributions, timing). You can also earn profit and run out of cash.”
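The profit-versus-cash point is easy to demonstrate with arithmetic. The figures below are invented for illustration and are not taken from the LedgerLab data.

```python
# Hypothetical first month for a small business.
owner_contribution = 10000.0  # cash in, but recorded as equity, not revenue
cash_sales = 3000.0
credit_sales = 2000.0         # revenue now, cash collected later
expenses_paid = 6500.0        # all expensed and all paid in cash this month

revenue = cash_sales + credit_sales
net_income = revenue - expenses_paid
ending_cash = owner_contribution + cash_sales - expenses_paid

print(net_income)   # -1500.0
print(ending_cash)  # 6500.0
```

The business lost 1,500 on paper yet ends the month with 6,500 in the bank, because the owner's contribution added cash without adding revenue and the credit sales added revenue without adding cash.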
8. Outputs (what to open)
Outputs are written under:
outputs/track_d/
Students should open:
outputs/track_d/d00_peek_data_summary.md (readable dataset inventory)
Any CSV artifacts written by the runs (trial balance, statements, etc.)
TA line: “In real work, reproducible artifacts matter more than console output.”
9. Run the smoke tests (business_smoke)
pystatsv1 workbook check business_smoke
Students should see something like:
“13 passed …”
Explain plainly
“These are automated checks that verify the workbook behaves as promised: commands run, outputs appear, and key invariants stay true. Passing tests means your lab environment is healthy.”
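If students have never seen a test, the sketch below shows the general shape of pytest-style checks: small functions whose names start with test_ and that use plain assert. These two tests and the run_step stand-in are hypothetical and are not part of the real business_smoke suite.

```python
import tempfile
from pathlib import Path


def run_step(out_dir):
    """Hypothetical stand-in for a workbook run: writes one output file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "summary.md").write_text("# Summary\n", encoding="utf-8")


def test_outputs_appear():
    # "Outputs appear": after the step runs, the expected file exists.
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp) / "outputs"
        run_step(out_dir)
        assert (out_dir / "summary.md").exists()


def test_invariant_holds():
    # "Key invariants stay true": debits equal credits for a toy transaction.
    debits, credits = [100.0, 50.0], [150.0]
    assert sum(debits) == sum(credits)


# pytest would discover and run these automatically; here they are called directly.
test_outputs_appear()
test_invariant_holds()
print("2 passed")
```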
10. Common issues and quick fixes
Command not found
If pystatsv1 isn’t recognized, use module form:
python -m pystatsv1 workbook --help
Wrong folder
If outputs or data can’t be found, confirm students are inside the workbook folder:
pwd
ls
Reset everything
pystatsv1 workbook run d00_setup_data --force
pystatsv1 workbook run d00_peek_data
Confusion about negative income
Teaching moment: owner contributions are cash inflows but not revenue.
Show them the contribution entry in gl_journal.csv and compare to sales lines.
11. Discussion prompts (if time)
Why is pct_sales_on_account not zero? What do credit sales imply about cash planning?
Gross margin around 45%: what types of businesses might fit?
Net income negative but cash positive: what events create that pattern?
NSO includes a deliberate duplicate bank transaction ID: why include intentional errors?
12. Closing script (30 seconds)
“Today you proved you can set up a reproducible environment, inspect accounting-style datasets, validate integrity constraints, and generate a first business summary. That’s the workflow: make data trustworthy before analyzing it. Next labs build on this foundation toward statistical reasoning and decision support.”
Tip
If students are new to the Track D case, have them read Track D Student Edition (Workbook Landing) first.
Appendix A: Command block (TA slide)
# Setup (once)
python -m venv .venv
source .venv/Scripts/activate
python -m pip install -U pip
pip install "pystatsv1[workbook]"
# Start Track D
pystatsv1 workbook init --track d --dest track_d_workbook
cd track_d_workbook
# Tour + first checks
pystatsv1 workbook list --track d
pystatsv1 workbook run d00_peek_data
pystatsv1 workbook run d01
# Confidence check
pystatsv1 workbook check business_smoke