Track D — Chapter 15

PyPI workbook run (Track D)

From inside your Track D workbook folder (created by pystatsv1 workbook init --track d --dest ...), run:

pystatsv1 workbook run |trackd_run|

Outputs are written under outputs/track_d/ by default. If you’re unsure what a file is for, start with the Track D Outputs Guide.

For the full chapter-by-chapter run map (D00–D23), see the Track D chapter index (PyPI).

Optional: write to a custom output folder:

pystatsv1 workbook run |trackd_run| --outdir outputs/track_d_custom

Interpretation prompts (quick self-check):

What is the accounting or business measurement goal in this chapter?
Which invariant/check would catch a “numbers look fine but are wrong” mistake here?

Forecasting Foundations and Forecast Hygiene (NSO running case)

In Chapter 14 you built an explainable driver model (COGS explained by operational activity). In Chapter 15 you switch gears: you treat the accounting output as a time series and learn how to produce a defensible baseline forecast.

This chapter is deliberately “low-tech”:

You will compare three simple baseline forecasts.
You will backtest them on a 12-month holdout window.
You will pick a method using error metrics, not vibes.
You will create an assumptions log template so your forecast is auditable.

The goal is not to build the best forecast possible. The goal is to build a forecast process that an accountant/analyst can explain, reproduce, and improve.

What is “forecast hygiene”?

Forecast hygiene is the set of practices that keeps forecasting useful and honest:

Define the question (what are we forecasting, for whom, and for what decision?).
Define the grain (monthly vs weekly vs daily; consolidated vs by product/location).
Document assumptions (what is expected to change and why).
Measure error with backtesting (how wrong have we been, using history?).
Version the artifacts (inputs, method choice, metrics, memo, and figures).
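One lightweight way to make “which inputs produced this forecast?” answerable is to record a content hash of each input file next to the outputs. This is offered as an assumption, not as something the chapter script is known to do; the helper name `fingerprint` is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; in practice you would hash the real input CSVs
# and store the digests alongside the memo and metrics.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "statements_is_monthly.csv"
    sample.write_bytes(b"month,line,amount\n")
    sample_digest = fingerprint(sample)

print(sample_digest)
```

If the digest recorded with last month’s forecast differs from today’s, the inputs changed, even if the file name did not.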

In accounting, this matters because forecasts often feed budgeting, cash planning, staffing, and performance conversations. A forecast that can’t be explained or reproduced becomes a source of risk.

How this ties to earlier chapters

Chapter 15 builds directly on concepts you have already practiced:

Chapters 7–9 (data prep + reporting discipline): consistent month keys, clean joins, and reliable output artifacts.
Chapter 13 (controlled comparisons): “compare like with like” is the mindset behind backtesting (train vs holdout).
Chapter 14 (driver lens): like the driver model, the forecast is a planning tool, not a claim of causation or a guarantee.

What you will build

A clean monthly time series (from the NSO income statement)

From statements_is_monthly.csv you will build a wide monthly table:

month (YYYY-MM)
revenue (Sales Revenue)
cogs (Cost of Goods Sold)
gross_profit
operating_expenses
net_income
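To make the reshaping step concrete, here is a minimal standard-library sketch. It assumes each row of statements_is_monthly.csv carries a month, a line label, and an amount; the column names `month`, `line`, and `amount`, the helper `build_wide_monthly`, and the labels “Operating Expenses” and “Net Income” are hypothetical (only Sales Revenue and Cost of Goods Sold come from this chapter), so check them against your file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical label-to-column map. "Sales Revenue" and "Cost of Goods Sold"
# come from the chapter text; the other two labels are assumptions.
LINE_TO_COLUMN = {
    "Sales Revenue": "revenue",
    "Cost of Goods Sold": "cogs",
    "Operating Expenses": "operating_expenses",
    "Net Income": "net_income",
}


def build_wide_monthly(rows):
    """Pivot long income-statement rows into one record per YYYY-MM month."""
    wide = defaultdict(dict)
    for row in rows:
        column = LINE_TO_COLUMN.get(row["line"])
        if column is None:
            continue  # skip lines the forecast does not use
        month = row["month"][:7]  # normalize "2024-01-31" or "2024-01" to "2024-01"
        if column in wide[month]:
            # A duplicate usually means a bad join or a double-counted line.
            raise ValueError(f"duplicate {column} for {month}")
        wide[month][column] = float(row["amount"])

    table = []
    for month in sorted(wide):  # YYYY-MM strings sort chronologically
        record = {"month": month, **wide[month]}
        missing = {"revenue", "cogs"} - set(record)
        if missing:
            raise ValueError(f"{month} is missing {sorted(missing)}")
        record["gross_profit"] = record["revenue"] - record["cogs"]
        table.append(record)
    return table


# Made-up rows for illustration only (not NSO data).
SAMPLE = """month,line,amount
2024-02,Sales Revenue,1200
2024-01,Sales Revenue,1000
2024-01,Cost of Goods Sold,600
2024-02,Cost of Goods Sold,700
"""

wide_table = build_wide_monthly(csv.DictReader(io.StringIO(SAMPLE)))
```

The duplicate and missing-line checks are the kind of invariant that catches a “numbers look fine but are wrong” join before it reaches the forecast.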

Three baseline forecasts (for revenue)

You will compare three baseline methods:

naive_last — next month equals the last observed month
moving_avg_3 — next month equals the average of the last 3 months
linear_trend — fit a straight line through time and extrapolate
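A minimal sketch of the three baselines in plain Python follows. It assumes a fixed forecast origin: each method sees only the history up to the last training month and produces all `horizon` future values at once, so naive_last and moving_avg_3 give flat lines. Whether the chapter script works this way or re-forecasts one month at a time is not stated here, and these function signatures are illustrative.

```python
def naive_last(history, horizon):
    """Every future month equals the last observed value."""
    return [history[-1]] * horizon


def moving_avg_3(history, horizon):
    """Every future month equals the mean of the last 3 values (fewer if history is shorter)."""
    window = history[-3:]
    return [sum(window) / len(window)] * horizon


def linear_trend(history, horizon):
    """Fit y = a + b*t by ordinary least squares (t = 0, 1, ...) and extrapolate."""
    n = len(history)
    if n < 2:
        raise ValueError("linear_trend needs at least 2 observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Future time indexes continue at n, n + 1, ...
    return [intercept + slope * (n + h) for h in range(horizon)]


history = [10.0, 12.0, 14.0, 16.0]  # made-up values
for method in (naive_last, moving_avg_3, linear_trend):
    print(method.__name__, method(history, 2))
```

On this made-up history, a two-month forecast gives [16, 16], [14, 14], and [18, 20] respectively, which shows how differently the three methods react to a steady upward trend.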

A 12-month backtest + error metrics

You will hold out the last 12 months, forecast them using the first 12 months, and compute:

MAE (mean absolute error): typical size of the miss (in currency units)
MAPE (mean absolute percentage error): typical miss as a percent of actual
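The holdout split and both metrics can be sketched with the standard library alone. The function names are illustrative, not the chapter script’s API; the zero guard in `mape` reflects the fact that a percentage error is undefined for a month whose actual value is zero.

```python
def split_holdout(series, holdout=12):
    """Split a chronological series into (train, test), holding out the last `holdout` values."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must leave at least one training value")
    return series[:-holdout], series[-holdout:]


def mae(actual, predicted):
    """Mean absolute error, in the same currency units as the series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent of actual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up 24-month series: 12 training months, 12 holdout months.
series = [100.0 + 5 * t for t in range(24)]
train, test = split_holdout(series, holdout=12)
print(len(train), len(test))
print(mae([100.0, 200.0], [110.0, 180.0]), mape([100.0, 200.0], [110.0, 180.0]))
```

For actuals of 100 and 200 predicted as 110 and 180, MAE is 15 (currency units) while MAPE is 10%, because both misses are 10% of their actuals.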

Then you will select the baseline method with the lowest MAPE (tie-break on MAE).
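The selection rule fits in one line once the metrics are in hand. This sketch assumes the metrics are held as a dict of method name to {"mae", "mape"}; that structure and the name `select_method` are hypothetical.

```python
def select_method(metrics):
    """Pick the method with the lowest MAPE, breaking ties on MAE.

    `metrics` maps method name -> {"mae": float, "mape": float}.
    """
    return min(metrics, key=lambda name: (metrics[name]["mape"], metrics[name]["mae"]))


# Made-up metrics for illustration (not NSO results).
example_metrics = {
    "naive_last": {"mae": 120.0, "mape": 6.0},
    "moving_avg_3": {"mae": 110.0, "mape": 6.0},
    "linear_trend": {"mae": 150.0, "mape": 7.5},
}
print(select_method(example_metrics))
```

Here naive_last and moving_avg_3 tie on MAPE, so the lower MAE picks moving_avg_3. If two methods tie on both metrics, `min` keeps whichever appears first in the dict.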

A forecast memo + auditable assumptions log

You will produce a short memo and an assumptions log CSV template so that:

the forecast is shareable with stakeholders,
the method selection is documented,
the “why” behind adjustments is captured in a durable, versioned form.
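As a sketch of what a durable assumptions log can look like, the snippet below writes a CSV header plus one example row. The column names are hypothetical and chosen only to capture what, when, why, and who; the real ch15_assumptions_log_template.csv may use different headers.

```python
import csv
import io

# Hypothetical column set; the chapter's template may differ.
ASSUMPTION_COLUMNS = [
    "assumption_id",
    "logged_on",
    "series",
    "months_affected",
    "adjustment",
    "rationale",
    "owner",
    "evidence",
]


def write_assumptions_template(handle, example_row=None):
    """Write the header (and optionally one example row) of an assumptions log."""
    writer = csv.DictWriter(handle, fieldnames=ASSUMPTION_COLUMNS)
    writer.writeheader()
    if example_row is not None:
        # Missing columns are written as empty strings; keys outside the
        # header raise ValueError (DictWriter's default extrasaction="raise").
        writer.writerow(example_row)


buffer = io.StringIO()
write_assumptions_template(
    buffer,
    {  # made-up example entry
        "assumption_id": "A1",
        "series": "revenue",
        "adjustment": "+3% from 2025-04",
        "rationale": "hypothetical price increase",
        "owner": "FP&A",
    },
)
print(buffer.getvalue())
```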

How to run Chapter 15

Prerequisite: generate the NSO dataset (once)

If you already ran Chapter 14, you likely already have the NSO dataset at:

data/synthetic/nso_v1

If not:

make business-nso-sim
make business-validate

Run the Chapter 15 analysis

make business-ch15

By default this runs:

python -m scripts.business_ch15_forecasting_foundations \
  --datadir data/synthetic/nso_v1 \
  --outdir outputs/track_d \
  --seed 123

Outputs

All artifacts are written under:

outputs/track_d/track_d

Open this first (recommended order):

ch15_backtest_metrics.csv (which baseline wins?)
figures/ch15_fig_backtest_overlay.png (does the chosen method track reality?)
ch15_forecast_memo.md (shareable summary)
ch15_forecast_next12.csv (numbers you plug into planning)

Core tables (CSV)

ch15_series_monthly.csv: Clean monthly time series used for forecasting (revenue + key income statement lines).
ch15_backtest_predictions.csv: Month-by-month backtest predictions for each method (actual vs predicted + errors).
ch15_backtest_metrics.csv: Summary metrics (MAE and MAPE) by method.
ch15_forecast_next12.csv: Forecast for the next 12 months from the selected baseline method, including a simple range.
ch15_assumptions_log_template.csv: Template you fill in when business context requires adjustments.
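This chapter does not spell out how the “simple range” is computed. One easily explained choice, shown here purely as an assumption, is to widen each point forecast by the selected method’s backtest MAE. The helper names `next_months` and `add_simple_range` are hypothetical.

```python
def next_months(last_month, count):
    """Return the `count` YYYY-MM keys that follow `last_month`."""
    year, month = map(int, last_month.split("-"))
    keys = []
    for _ in range(count):
        month += 1
        if month == 13:
            year, month = year + 1, 1
        keys.append(f"{year:04d}-{month:02d}")
    return keys


def add_simple_range(months, point_forecast, backtest_mae):
    """Attach low/high bounds of point forecast +/- backtest MAE.

    Assumed range rule for illustration; the chapter's own rule may differ.
    """
    if len(months) != len(point_forecast):
        raise ValueError("need one month key per forecast value")
    return [
        {
            "month": month,
            "forecast": value,
            "low": max(0.0, value - backtest_mae),  # revenue or cost cannot go below zero
            "high": value + backtest_mae,
        }
        for month, value in zip(months, point_forecast)
    ]


# Made-up inputs: a flat 12-month forecast and a backtest MAE of 15.
rows = add_simple_range(next_months("2024-12", 12), [160.0] * 12, 15.0)
print(rows[0])
```

Tying the band to backtest error keeps it explainable: “historically, this method has missed by about this much per month.”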

Design + narrative (JSON/MD)

ch15_forecast_design.json: Machine-readable “cover sheet” listing the series, train/test windows, methods compared, selection rule, chosen method, and forecast months.
ch15_forecast_memo.md: Human-readable memo with a small metrics table and the forecast table.
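The cover-sheet idea can be sketched as a plain dict serialized with the standard `json` module. The key names and the helper `build_design` are hypothetical, not the script’s actual schema; the point is that every field listed for ch15_forecast_design.json is recorded in one machine-readable place.

```python
import json


def build_design(series, train_months, test_months, methods, chosen, forecast_months):
    """Assemble a machine-readable cover sheet for one forecast run.

    All key names here are illustrative; they are not the script's schema.
    """
    if chosen not in methods:
        raise ValueError(f"chosen method {chosen!r} was not among those compared")
    return {
        "series": series,
        "train_window": [train_months[0], train_months[-1]],
        "test_window": [test_months[0], test_months[-1]],
        "methods_compared": list(methods),
        "selection_rule": "lowest MAPE, tie-break on MAE",
        "chosen_method": chosen,
        "forecast_months": list(forecast_months),
    }


# Made-up 24-month calendar: 12 training months, 12 holdout months.
months = [f"{year}-{m:02d}" for year in (2023, 2024) for m in range(1, 13)]
design = build_design(
    series="revenue",
    train_months=months[:12],
    test_months=months[12:],
    methods=["naive_last", "moving_avg_3", "linear_trend"],
    chosen="moving_avg_3",  # made-up choice
    forecast_months=[f"2025-{m:02d}" for m in range(1, 13)],
)
print(json.dumps(design, indent=2))
```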

Figures (PNG) + manifest

Figures are written under outputs/track_d/track_d/figures and listed in:

ch15_figures_manifest.csv

Troubleshooting

“Expected statements_is_monthly.csv … but not found.”

You likely pointed --datadir at the wrong folder.

Correct: --datadir data/synthetic/nso_v1
Also confirm you ran: make business-nso-sim
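A quick pre-flight check can turn this error into a clearer message before any analysis runs. This sketch assumes statements_is_monthly.csv sits directly inside --datadir; the helper `check_datadir` is hypothetical.

```python
import tempfile
from pathlib import Path


def check_datadir(datadir):
    """Fail early, with a helpful message, if the income statement CSV is absent."""
    path = Path(datadir)
    if not path.is_dir():
        raise FileNotFoundError(f"{path} is not a folder; did you run make business-nso-sim?")
    target = path / "statements_is_monthly.csv"
    if not target.is_file():
        raise FileNotFoundError(f"expected {target} but it was not found; check --datadir")
    return target


# Demo against a throwaway folder instead of the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "statements_is_monthly.csv").write_text("month\n")
    found = check_datadir(tmp).name

print(found)
```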

Outputs are missing or in a surprising folder

This chapter writes into a track_d subfolder of --outdir (to match Track D conventions), so the default --outdir outputs/track_d produces outputs/track_d/track_d.
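For the make-based run described above, a short check can list which of the “open this first” artifacts are missing from the folder the chapter actually writes to. The helper name `missing_outputs` is hypothetical; the file names are the ones listed in this chapter.

```python
from pathlib import Path

# The four "open this first" artifacts named in this chapter.
EXPECTED = [
    "ch15_backtest_metrics.csv",
    "figures/ch15_fig_backtest_overlay.png",
    "ch15_forecast_memo.md",
    "ch15_forecast_next12.csv",
]


def missing_outputs(outdir):
    """List expected Chapter 15 artifacts that are absent from <outdir>/track_d."""
    artifact_root = Path(outdir) / "track_d"  # the chapter appends /track_d to --outdir
    return [name for name in EXPECTED if not (artifact_root / name).is_file()]


print(missing_outputs("outputs/track_d"))
```

An empty list means the run landed where expected; a full list usually means you are looking one folder too high or too low.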

What’s next (Chapter 16+)

Chapter 16: seasonality and seasonal baselines
Chapter 17: rolling forecasts + scenario planning
Chapter 18: forecasting with drivers (combine operational drivers + time series)

End-of-chapter exercises

Re-run Chapter 15 but forecast COGS instead of revenue.
Change the holdout window to 6 months. Does the best baseline method change?
Use the assumptions log template to document a hypothetical pricing change next quarter.