Track D — Chapter 15
====================

.. |trackd_run| replace:: d15
.. include:: _includes/track_d_run_strip.rst


Forecasting Foundations and Forecast Hygiene (NSO running case)
---------------------------------------------------------------

In Chapter 14 you built an **explainable driver model** (COGS explained by operational activity).
In Chapter 15 you switch gears: you treat the accounting output as a **time series** and learn how
to produce a **defensible baseline forecast**.

This chapter is deliberately “low-tech” on purpose:

- You will compare **three simple baseline forecasts**.
- You will **backtest** them on a 12-month holdout window.
- You will pick a method using **error metrics**, not vibes.
- You will create an **assumptions log template** so your forecast is auditable.

The goal is not to build the best forecast possible. The goal is to build a forecast process
that an accountant/analyst can explain, reproduce, and improve.

What is “forecast hygiene”?
---------------------------

Forecast hygiene is the set of practices that keeps forecasting useful and honest:

- **Define the question** (what are we forecasting, for whom, and for what decision?).
- **Define the grain** (monthly vs weekly vs daily; consolidated vs by product/location).
- **Document assumptions** (what is expected to change and why).
- **Measure error** with backtesting (how wrong have we been, using history?).
- **Version the artifacts** (inputs, method choice, metrics, memo, and figures).

In accounting, this matters because forecasts often feed budgeting, cash planning, staffing,
and performance conversations. A forecast that can’t be explained or reproduced becomes
a source of risk.

How this ties to earlier chapters
---------------------------------

Chapter 15 builds directly on concepts you have already practiced:

- **Chapter 7–9 (data prep + reporting discipline):** consistent month keys, clean joins, and
  reliable output artifacts.
- **Chapter 13 (controlled comparisons):** “compare like with like” is the mindset behind
  backtesting (train vs holdout).
- **Chapter 14 (driver lens):** the forecast is still a *driver lens* (planning tool), not a claim
  of causation or a guarantee.

What you will build
-------------------

A clean monthly time series (from the NSO income statement)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From ``statements_is_monthly.csv`` you will build a wide monthly table:

- ``month`` (YYYY-MM)
- ``revenue`` (Sales Revenue)
- ``cogs`` (Cost of Goods Sold)
- ``gross_profit``
- ``operating_expenses``
- ``net_income``

Three baseline forecasts (for revenue)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You will compare three baseline methods:

1. **naive_last** — next month equals the last observed month
2. **moving_avg_3** — next month equals the average of the last 3 months
3. **linear_trend** — fit a straight line through time and extrapolate

A 12-month backtest + error metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You will hold out the last 12 months, forecast them using the first 12 months, and compute:

- **MAE** (mean absolute error): typical size of the miss (in currency units)
- **MAPE** (mean absolute percentage error): typical miss as a percent of actual

Then you will select the baseline method with the lowest MAPE (tie-break on MAE).

A forecast memo + auditable assumptions log
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You will produce a short memo and an ``assumptions log`` CSV template so that:

- the forecast is shareable with stakeholders,
- the method selection is documented,
- the “why” behind adjustments is captured in a durable, versioned form.

How to run Chapter 15
---------------------

Prerequisite: generate the NSO dataset (once)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you already ran Chapter 14, you likely already have the NSO dataset at:

``data/synthetic/nso_v1``

If not:

.. code-block:: bash

   make business-nso-sim
   make business-validate

Run the Chapter 15 analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   make business-ch15

By default this runs:

.. code-block:: bash

   python -m scripts.business_ch15_forecasting_foundations \
     --datadir data/synthetic/nso_v1 \
     --outdir outputs/track_d \
     --seed 123

Outputs
-------

All artifacts are written under:

``outputs/track_d/track_d``

**Open this first (recommended order):**

1. ``ch15_backtest_metrics.csv`` (which baseline wins?)
2. ``figures/ch15_fig_backtest_overlay.png`` (does the chosen method track reality?)
3. ``ch15_forecast_memo.md`` (shareable summary)
4. ``ch15_forecast_next12.csv`` (numbers you plug into planning)


Core tables (CSV)
^^^^^^^^^^^^^^^^^

- ``ch15_series_monthly.csv``  
  Clean monthly time series used for forecasting (revenue + key IS lines).

- ``ch15_backtest_predictions.csv``  
  Month-by-month backtest predictions for each method (actual vs predicted + errors).

- ``ch15_backtest_metrics.csv``  
  Summary metrics (MAE and MAPE) by method.

- ``ch15_forecast_next12.csv``  
  Selected baseline method forecast for the next 12 months, including a simple range.

- ``ch15_assumptions_log_template.csv``  
  Template you fill in when business context requires adjustments.

Design + narrative (JSON/MD)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``ch15_forecast_design.json``  
  Machine-readable “cover sheet”: series, train/test windows, methods compared,
  selection rule, chosen method, and forecast months.

- ``ch15_forecast_memo.md``  
  Human-readable memo with a small metrics table and the forecast table.

Figures (PNG) + manifest
^^^^^^^^^^^^^^^^^^^^^^^^

Figures are written under ``outputs/track_d/track_d/figures`` and listed in:

- ``ch15_figures_manifest.csv``

Troubleshooting
---------------

“Expected statements_is_monthly.csv … but not found.”
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You likely pointed ``--datadir`` at the wrong folder.

- Correct: ``--datadir data/synthetic/nso_v1``
- Also confirm you ran: ``make business-nso-sim``

Outputs are missing or in a surprising folder
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This chapter writes into ``--outdir`` plus ``/track_d`` (to match Track D conventions).

What’s next (Chapter 16+)
-------------------------

- Chapter 16: seasonality and seasonal baselines
- Chapter 17: rolling forecasts + scenario planning
- Chapter 18: forecasting with drivers (combine operational drivers + time series)

End-of-chapter exercises
------------------------

1. Re-run Chapter 15 but forecast **COGS** instead of revenue.
2. Change the holdout window to 6 months. Does the best baseline method change?
3. Use the assumptions log template to document a hypothetical pricing change next quarter.

See also
--------
- Appendix 14D (Artifact QA checklist): use the same pre-share mindset before circulating forecasts.
- Appendix 14E (Apply to real world): adapting the workflow to your own chart of accounts and datasets.