Business Chapter 11 — Sampling and Estimation (Audit and Controls Lens)

PyPI workbook run (Track D)

From inside your Track D workbook folder (created by pystatsv1 workbook init --track d --dest ...), run:

pystatsv1 workbook run |trackd_run|

Outputs are written under outputs/track_d/ by default. If you’re unsure what a file is for, start with Track D Outputs Guide.

To see the full chapter-by-chapter run map (D00–D23), see Track D chapter index (PyPI).

Optional: write to a custom output folder:

pystatsv1 workbook run |trackd_run| --outdir outputs/track_d_custom

Interpretation prompts (quick self-check):

What is the accounting or business measurement goal in this chapter?
Which invariant/check would catch a “numbers look fine but are wrong” mistake here?

Accountants and controllers often face a simple constraint: you cannot review every transaction. Sampling is a cost-effective control — but only if it is designed and communicated clearly.

This chapter translates sampling and confidence intervals into audit/control language:

Population vs Sample: what you’re trying to control vs what you actually reviewed.
Random vs Stratified Sampling: everyone has an equal chance vs risk-based groups.
Confidence Intervals: turning “95% confidence” into a plain-English range and a pass/fail control decision.

Learning objectives

After this chapter, you can:

Design a risk-based sampling plan (review 100% of material items, sample the long tail).
Compute a defensible error-rate confidence interval and interpret it in business language.
Draft a short memo that uses the vocabulary auditors expect: population, sample size, materiality, tolerance, confidence.

Data inputs (NSO v1)

We reuse the synthetic dataset from sim_business_nso_v1 and treat A/P invoices as the “pile” to audit:

ap_events.csv — invoice events and payments (we sample invoice rows)

Repro commands

make business-nso-sim
make business-ch11

Or run directly:

python -m scripts.business_ch11_sampling_estimation_audit_controls \
  --datadir data/synthetic/nso_v1 \
  --outdir outputs/track_d \
  --seed 123

Outputs (audit-friendly artifacts)

The chapter writes deterministic artifacts to outputs/track_d:

ch11_sampling_plan.json — explicit parameters + selected invoice IDs
ch11_sampling_summary.json — CI, tolerance decision, and a worked example
ch11_audit_memo.md — short justification memo (plain language)
ch11_figures_manifest.csv — figure metadata for auditability
figures/: * ch11_strata_sampling_bar.png — population vs sample by stratum * ch11_error_rate_ci.png — observed error rate with 95% CI

End-of-chapter problems (implemented concepts)

Design a sampling plan (risk-based). Review 100% of transactions over a materiality threshold (e.g., $1,000), and random-sample a small percentage of immaterial items (e.g., 5% under $50).
Confidence interval calculation (controls lens). Given a sample size and number of errors, compute a 95% CI for the true error rate. If the upper bound exceeds management’s tolerance (e.g., 2%), the control fails.
The audit memo. Justify the approach using proper terms: population, sample size, materiality, stratification, tolerance, confidence.