Chapter 8 – Descriptive Statistics for Financial Performance

PyPI workbook run (Track D)

From inside your Track D workbook folder (created by pystatsv1 workbook init --track d --dest ...), run:

pystatsv1 workbook run |trackd_run|

Outputs are written under outputs/track_d/ by default. If you’re unsure what a file is for, start with the Track D Outputs Guide.

For the full chapter-by-chapter run map (D00–D23), see the Track D chapter index (PyPI).

Optionally, write to a custom output folder:

pystatsv1 workbook run |trackd_run| --outdir outputs/track_d_custom

Interpretation prompts (quick self-check):

  • What is the accounting or business measurement goal in this chapter?

  • Which invariant/check would catch a “numbers look fine but are wrong” mistake here?

By Chapter 7 we have an analysis-ready General Ledger:

  • gl_tidy.csv – line-level tidy GL (one row per journal line)

  • gl_monthly_summary.csv – monthly rollup by account

Chapter 8 answers the next practical question:

“Now that the accounting data is tidy, how do we summarize performance and variability in a way that helps business decisions?”

This chapter focuses on descriptive statistics that accountants use every day:

  • level (mean / median)

  • spread (variance / standard deviation, coefficient of variation)

  • tails and skew (quantiles; why “average” can be misleading)

  • simple stability checks (rolling mean / rolling std; z-score style flags)

It also includes an A/R-focused section because receivables are a common source of cash-flow surprises in small businesses.
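The stability checks listed above (rolling mean, rolling std, z-score style flags) might be sketched as follows. This is a minimal illustration, not the chapter script: the revenue values, the 3-month window, the threshold of 3.0, and the choice to build the baseline from prior months only are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical monthly revenue; the values are illustrative, not NSO v1 data.
revenue = pd.Series(
    [100.0, 102.0, 98.0, 101.0, 99.0, 150.0, 100.0, 103.0],
    index=pd.period_range("2025-01", periods=8, freq="M"),
    name="revenue",
)

WINDOW = 3         # assumption: 3-month rolling window
Z_THRESHOLD = 3.0  # assumption: flag anything beyond 3 standard deviations

# Build the baseline from the PRIOR months only (shift(1)), so an unusual
# month does not inflate the mean and std it is compared against.
prior = revenue.shift(1)
baseline_mean = prior.rolling(WINDOW).mean()
baseline_std = prior.rolling(WINDOW).std()  # pandas default is sample std (ddof=1)

z_score = (revenue - baseline_mean) / baseline_std

# Comparisons against NaN are False, so months without a full baseline
# are never flagged.
is_flagged = z_score.abs() > Z_THRESHOLD

flagged_months = [str(p) for p in revenue[is_flagged].index]
print(flagged_months)
```

With these made-up numbers only the June spike is flagged. A flag of this kind is a prompt to look at the underlying entries, not a conclusion on its own.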

Learning goals

Accounting goals

  • Turn monthly financial statements into KPIs (gross margin %, net margin %, etc.).

  • Use variability measures to reason about operational stability.

  • Connect A/R behavior to cash-flow risk using practical metrics:

    • credit sales vs collections

    • A/R ending balance and approximate Days Sales Outstanding (DSO)

    • a simple FIFO application of collections to invoices to estimate a distribution of “days outstanding”.

Python/data goals

  • Convert long-form statement tables to wide data for analysis (pivot_table).

  • Compute rolling statistics with Series.rolling.

  • Build “analysis artifacts” as CSV + a JSON summary/data dictionary.

  • Keep scripts deterministic and testable.
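As a rough sketch of the long-to-wide step, the example below pivots a tiny long-form income statement and derives two margin ratios. The column names (month, line, amount) and the line labels are assumptions for illustration; the actual layout of statements_is_monthly.csv may differ.

```python
import pandas as pd

# Hypothetical long-form income statement: one row per (month, line).
is_long = pd.DataFrame(
    {
        "month": ["2025-01"] * 3 + ["2025-02"] * 3,
        "line": ["revenue", "cogs", "net_income"] * 2,
        "amount": [1000.0, 600.0, 150.0, 1200.0, 780.0, 120.0],
    }
)

# Long -> wide: one row per month, one column per statement line.
wide = is_long.pivot_table(
    index="month", columns="line", values="amount", aggfunc="sum"
).sort_index()  # explicit sort keeps the output order deterministic

# Ratios as defined in the data dictionary for gl_kpi_monthly.csv.
wide["gross_profit"] = wide["revenue"] - wide["cogs"]
wide["gross_margin_pct"] = wide["gross_profit"] / wide["revenue"]
wide["net_margin_pct"] = wide["net_income"] / wide["revenue"]

print(wide[["gross_margin_pct", "net_margin_pct"]])
```

Using aggfunc="sum" means duplicate (month, line) rows are added together rather than silently averaged, which is usually the safer default for accounting amounts.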

Inputs

Chapter 8 reads from the NSO v1 synthetic dataset:

  • chart_of_accounts.csv

  • gl_journal.csv

  • statements_is_monthly.csv

  • statements_bs_monthly.csv

  • ar_events.csv (added in Chapter 6)

Outputs

This chapter writes the following files to outputs/track_d:

gl_kpi_monthly.csv

A compact monthly “performance dashboard” built from Income Statement + Balance Sheet lines. Includes ratios and rolling statistics.

ar_monthly_metrics.csv

Monthly receivables metrics:

  • credit sales vs collections (from GL)

  • A/R beginning, ending, average (from B/S)

  • A/R turnover and approximate DSO

ar_payment_slices.csv

Optional-but-recommended detail table. Each row represents a slice of a collection applied to an invoice under a FIFO assumption. This produces a realistic “days outstanding” distribution even when cash receipts do not explicitly reference invoice numbers.
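The FIFO idea can be illustrated with a short, self-contained sketch. It is not the chapter's implementation: the function name fifo_slices, the tuple-based inputs, and the use of integer cents are hypothetical choices made to keep the example small and exact. Cash left over after all invoices are settled is simply ignored here.

```python
from collections import deque
from datetime import date


def fifo_slices(invoices, collections):
    """Apply each collection to the oldest open invoices first.

    invoices:    list of (invoice_date, amount_cents), sorted by date
    collections: list of (payment_date, amount_cents), sorted by date
    Returns one dict per slice of a collection applied to an invoice.
    """
    open_invoices = deque([inv_date, amount] for inv_date, amount in invoices)
    slices = []
    for pay_date, pay_amount in collections:
        remaining = pay_amount
        while remaining > 0 and open_invoices:
            inv_date, inv_open = open_invoices[0]
            applied = min(remaining, inv_open)
            slices.append(
                {
                    "invoice_date": inv_date,
                    "payment_date": pay_date,
                    "amount": applied,
                    "days_outstanding": (pay_date - inv_date).days,
                }
            )
            remaining -= applied
            open_invoices[0][1] -= applied
            if open_invoices[0][1] == 0:
                open_invoices.popleft()  # invoice fully settled
    return slices


# Hypothetical data: two invoices, two cash receipts (amounts in cents).
invoices = [(date(2025, 1, 5), 10_000), (date(2025, 1, 20), 5_000)]
collections = [(date(2025, 2, 4), 12_000), (date(2025, 3, 1), 3_000)]

slices = fifo_slices(invoices, collections)
for s in slices:
    print(s["invoice_date"], s["payment_date"], s["amount"], s["days_outstanding"])
```

The first receipt is split into two slices (it settles the older invoice and part of the newer one), which is why one cash event can contribute more than one row.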

ar_days_stats.csv

Descriptive statistics for “days outstanding” overall and by customer.

ch08_summary.json

Summary metrics, checks, and a data dictionary.
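To make the idea of a "check" concrete, here is one possible invariant, sketched under assumptions: if credit sales are the A/R debits and collections are the A/R credits, then beginning A/R plus credit sales minus collections should equal ending A/R (assuming no other A/R activity such as write-offs). The function name, the row layout, and the JSON keys below are hypothetical and are not the actual structure of ch08_summary.json.

```python
import json


def ar_rollforward_check(rows, tol=0.01):
    """Check ar_begin + credit_sales - collections == ar_end for each month."""
    failed = []
    for r in rows:
        expected_end = r["ar_begin"] + r["credit_sales"] - r["collections"]
        if abs(expected_end - r["ar_end"]) > tol:
            failed.append(r["month"])
    return {"name": "ar_rollforward", "passed": not failed, "failed_months": failed}


# Hypothetical monthly A/R rows (illustrative values only).
rows = [
    {"month": "2025-01", "ar_begin": 0.0, "credit_sales": 500.0,
     "collections": 200.0, "ar_end": 300.0},
    {"month": "2025-02", "ar_begin": 300.0, "credit_sales": 400.0,
     "collections": 450.0, "ar_end": 250.0},
]

summary = {
    "checks": [ar_rollforward_check(rows)],
    "data_dictionary": {
        "dso_approx": "avg_ar / credit_sales * days_in_month",
    },
}

# sort_keys=True keeps the serialized text stable across runs, which helps
# when comparing outputs in tests or in version control.
summary_text = json.dumps(summary, indent=2, sort_keys=True)
print(summary_text)
```

A check like this targets the "numbers look fine but are wrong" case: each column can look plausible on its own while the roll-forward fails to tie out.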

How to run

From the repository root:

# 1) (Re)generate the NSO v1 synthetic dataset
make business-nso-sim

# 2) Run Chapter 8
make business-ch08

# 3) Inspect outputs
ls outputs/track_d | grep -E "gl_kpi_monthly|ar_monthly_metrics|ar_days_stats|ch08_summary"

How to interpret the results

KPIs: level vs variability

Two businesses can have the same average gross margin but very different risk profiles.

  • If gross_margin_pct_std_w3 is high, margin is unstable: pricing, input costs, or product mix may be swinging month to month.

  • gross_margin_pct_cv (coefficient of variation) normalizes volatility by the mean and is useful when comparing series measured on different scales.
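A small numeric illustration of "same average, different risk" follows. The two margin series are invented for the example, and the use of the sample standard deviation is an assumption; the chapter script may compute its CV differently.

```python
import statistics

# Hypothetical monthly gross margin % for two businesses (illustrative values).
steady = [0.40, 0.41, 0.39, 0.40, 0.41, 0.39]
volatile = [0.40, 0.55, 0.25, 0.50, 0.30, 0.40]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


mean_steady = statistics.mean(steady)
mean_volatile = statistics.mean(volatile)
cv_steady = coefficient_of_variation(steady)
cv_volatile = coefficient_of_variation(volatile)

print(f"steady:   mean={mean_steady:.3f}  cv={cv_steady:.3f}")
print(f"volatile: mean={mean_volatile:.3f}  cv={cv_volatile:.3f}")
```

Both series average 40%, but the second one's CV is more than ten times larger. One caveat worth keeping in mind: CV becomes unstable when the mean is close to zero, so it is less informative for metrics such as a net margin that hovers around break-even.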

A/R: why “average DSO” can hide tails

The mean days outstanding can be pulled upward by a few very late payments. The median is often a better measure of the “typical” payment time.

In ar_days_stats.csv look for:

  • p90_days / p95_days – tail risk (customers who pay very late)

  • differences between mean and median – skewness
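The effect of a late-paying tail is easy to see on a toy sample. The values below are invented, the statistics are unweighted, and the percentile method (linear interpolation, "inclusive") is an assumption; the chapter's own percentile convention is not stated here.

```python
import statistics

# Hypothetical days-outstanding observations: most customers pay in about
# a month, two pay very late.
days = [28, 30, 30, 31, 32, 33, 35, 36, 95, 120]

mean_days = statistics.mean(days)
median_days = statistics.median(days)

# quantiles(n=10) returns the 9 decile cut points; the last one is the
# 90th percentile.
deciles = statistics.quantiles(days, n=10, method="inclusive")
p90_days = deciles[-1]

print(f"mean={mean_days}  median={median_days}  p90={p90_days}")
```

Here the mean (47 days) sits well above the median (32.5 days) because of two late payers, and the 90th percentile makes the size of that tail explicit.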

The table ar_monthly_metrics.csv is your month-by-month monitoring view. Large DSO spikes (or big gaps between credit sales and collections) are often early warnings for cash-flow pressure.

Data dictionary highlights

gl_kpi_monthly.csv

  • gross_margin_pct = gross_profit / revenue

  • net_margin_pct = net_income / revenue

  • *_mean_w3 and *_std_w3 are 3-month rolling statistics

ar_monthly_metrics.csv

  • credit_sales = increase in A/R from invoices (GL A/R debits)

  • collections = decrease in A/R from cash collections (GL A/R credits)

  • dso_approx = avg_ar / credit_sales * days_in_month
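A worked instance of the dso_approx formula may help. In this sketch the average A/R is taken as the simple mean of the beginning and ending balances, which is an assumption, and the function name and the zero-sales guard are hypothetical additions for illustration.

```python
import calendar


def dso_approx(ar_begin, ar_end, credit_sales, year, month):
    """Approximate DSO: avg_ar / credit_sales * days_in_month."""
    avg_ar = (ar_begin + ar_end) / 2  # assumption: two-point average
    if credit_sales == 0:
        return None  # undefined when there were no credit sales in the month
    days_in_month = calendar.monthrange(year, month)[1]
    return avg_ar / credit_sales * days_in_month


# Hypothetical April 2025 figures (30 days in the month).
dso_april = dso_approx(
    ar_begin=40_000.0, ar_end=50_000.0, credit_sales=90_000.0, year=2025, month=4
)
print(dso_april)
```

With an average balance of 45,000 against 90,000 of credit sales in a 30-day month, the result is 15 days: on average, half a month of credit sales is sitting in receivables.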

ar_payment_slices.csv

  • days_outstanding is computed as payment_date - invoice_date

  • rows are amount-weighted payment slices created by FIFO application
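One reading of "amount-weighted" is that each slice carries the amount applied, which can then serve as a weight when summarizing days_outstanding. The sketch below compares a simple mean with an amount-weighted mean on three made-up slices. Whether ar_days_stats.csv weights its statistics this way is not stated here, so treat this as an illustration of the idea only.

```python
# Hypothetical payment slices (amounts in cents, illustrative values only).
slices = [
    {"amount": 10_000, "days_outstanding": 30},
    {"amount": 2_000, "days_outstanding": 15},
    {"amount": 3_000, "days_outstanding": 40},
]

total_amount = sum(s["amount"] for s in slices)

# Each slice counts once, regardless of size.
simple_mean_days = sum(s["days_outstanding"] for s in slices) / len(slices)

# Each slice counts in proportion to the cash it represents.
weighted_mean_days = (
    sum(s["amount"] * s["days_outstanding"] for s in slices) / total_amount
)

print(f"simple={simple_mean_days:.2f}  weighted={weighted_mean_days:.2f}")
```

The weighted figure answers "how long does a typical dollar stay outstanding?", while the simple figure answers "how long does a typical slice stay outstanding?". The two can diverge when large and small invoices are paid on different schedules.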

Appendix

See Appendix 8A: Chapter 8 milestone and the big picture (Ch01–Ch08) for a recap of Chapters 1–8 and a roadmap beyond Chapter 8.

Next chapter

Chapter 9 focuses on visualization and reporting that doesn’t mislead. Using the KPIs and A/R artifacts from Chapter 8, we standardize how figures are labeled and how axes are handled (to avoid “chart crimes”), and we produce a compact executive memo that tells a coherent story from a small chart pack.