Appendix 14D: Artifact QA checklist (big picture — what and why before you share results)
========================================================================================

Chapter 14 produces an “analysis pack” of outputs: a driver table, a short memo, model summaries,
and figures. These artifacts are designed to be reproducible and explainable.

But “explainable” does not automatically mean “safe to share.”

This appendix is a **professional QA checklist** for accountants and analysts — the “what and why”
you should verify before you:

- send the memo to leadership,
- paste coefficients into a forecast model,
- brief operations on “what drives COGS,” or
- publish results in a report.

The goal is not perfection. The goal is **trustworthy communication**:
your results should be accurate, traceable, and framed with the right guardrails.

What QA protects you from
-------------------------

In real accounting/FP&A work, the biggest failure modes are rarely “math errors.”
They tend to be:

- **Measurement errors** (wrong sign, wrong month, wrong definition)
- **Broken lineage** (numbers cannot be traced back to a credible source)
- **Over-interpretation** (treating a driver lens as causation)
- **Model fragility** (one month or one event dominates the fit)
- **Reporting drift** (figures and memo say different things)
- **Reproducibility gaps** (nobody can recreate what you did next week)

Chapter 14 exists to teach a regression workflow that behaves like accounting work:
disciplined, auditable, and fit for planning conversations.

This QA checklist is the “controls layer” on top of the analysis pack.

Inputs and outputs in scope
---------------------------

This checklist assumes the standard Chapter 14 artifacts:

- ``ch14_driver_table.csv``
- ``ch14_regression_memo.md``
- ``ch14_regression_summary.json``
- ``ch14_regression_design.json``
- ``ch14_figures_manifest.csv``
- ``figures/`` (PNG charts referenced by the manifest)

and that the dataset (NSO v1) exists locally under:

- ``data/synthetic/nso_v1/``

Quick start: the 5-minute QA pass
---------------------------------

If you only have five minutes, do these checks in this order:

1) Open ``ch14_driver_table.csv`` and confirm months are complete and sorted.

2) Confirm sign conventions:

   - ``units_sold`` should be positive (for the Chapter 14 driver table definition).

3) Read ``ch14_regression_memo.md`` and check the “story” matches your intuition:

   - baseline vs rate interpretation
   - driver lens, not causation

4) Open the figures listed in ``ch14_figures_manifest.csv``:

   - does the relationship look reasonably linear?
   - any obvious outlier month dominating?

5) Confirm all expected artifacts exist (no missing files) and rerun:

   .. code-block:: bash

      make business-ch14

If anything looks off in the 5-minute pass, stop and do the deeper pass below.

Deeper QA: the 30-minute professional pass
------------------------------------------

This section is organized as a set of gates. You do not need to “check everything forever.”
You are trying to reach a reasonable professional standard before sharing results.

Gate 1 — Provenance and reproducibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Why this matters:** if you can’t recreate the results, you can’t defend them.

Checks:

- Record (or confirm) the dataset and run command used.
  In this project, the most common baseline run is:

  .. code-block:: bash

     make business-nso-sim
     make business-validate
     make business-ch14

- Confirm that artifacts are generated (not manually edited).
  The outputs live under:

  - ``outputs/track_d/track_d/``

- Confirm you can regenerate cleanly and get the same results (same dataset + seed).

Red flags:

- You cannot identify which dataset folder was used.
- A teammate cannot reproduce your numbers on the same commit.
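
One way to make "regenerate and get the same results" concrete is to fingerprint the outputs folder before and after a rerun. The sketch below is only an illustration: the helper names ``fingerprint_outputs`` and ``diff_fingerprints`` are hypothetical and not part of the Track D code. It also assumes the artifacts are byte-stable across runs, which may not hold for PNG figures or for files that embed timestamps.

.. code-block:: python

   import hashlib
   from pathlib import Path


   def fingerprint_outputs(out_dir):
       """Return {relative_path: SHA-256 hex digest} for every file under out_dir."""
       root = Path(out_dir)
       digests = {}
       for path in sorted(root.rglob("*")):
           if path.is_file():
               key = path.relative_to(root).as_posix()
               digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
       return digests


   def diff_fingerprints(before, after):
       """List files that were added, removed, or changed between two runs."""
       names = sorted(set(before) | set(after))
       return [name for name in names if before.get(name) != after.get(name)]

A possible workflow: call ``fingerprint_outputs`` on the outputs folder, rerun ``make business-ch14``, fingerprint again, and review whatever ``diff_fingerprints`` returns. An empty list suggests the rerun reproduced the same files; a non-empty list is a prompt to look closer, not proof of an error.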

Gate 2 — Driver table integrity (the foundation)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Why this matters:** regression can look “precise” even when the inputs are wrong.
The driver table is the “data contract” of Chapter 14.

Open: ``ch14_driver_table.csv``

Checks:

- **Month coverage**

  - Are months continuous (no unexpected gaps)?
  - Is the time window what you expect (e.g., last 12/24 months)?

- **Sorting and duplicates**

  - One row per month.
  - No duplicated months.
  - Months are in chronological order.

- **Sign conventions**

  - ``units_sold`` is positive (per Chapter 14 definition).
  - Revenue and COGS have the expected sign convention for your statements.

- **Magnitude sanity**

  - Units, revenue, and COGS are in plausible ranges.
  - No single month is wildly out of scale unless the business story explains it.

- **Lineage plausibility**

  - Units sold came from inventory movements.
  - Invoice count came from A/R events.
  - Revenue and COGS came from monthly income statement lines.

Common errors this catches:

- Units sold computed with the wrong sign
- Mismatched month keys (e.g., revenue in one month, units in another)
- Missing months due to filtering
- “Accidental duplicates” caused by grouping logic
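
The month-coverage, duplicate, ordering, and sign checks above can be scripted. This is a minimal sketch, not the chapter's own validation code. It assumes the driver table has a month column named ``month`` holding ``YYYY-MM`` (or ``YYYY-MM-DD``) strings; that column name and format are assumptions, and only ``units_sold`` is a name taken from this appendix.

.. code-block:: python

   import csv
   import io


   def check_driver_table(rows, month_col="month", units_col="units_sold"):
       """Return a list of human-readable problems found in the driver table rows."""
       problems = []
       months = [row[month_col] for row in rows]

       # One row per month: report any month that appears more than once.
       dupes = sorted({m for m in months if months.count(m) > 1})
       if dupes:
           problems.append(f"duplicate months: {dupes}")

       # ISO-style month strings sort chronologically as plain text.
       if months != sorted(months):
           problems.append("months are not in chronological order")

       # Continuity: every month between the first and last must be present.
       def to_index(month_text):
           year, month = month_text.split("-")[:2]
           return int(year) * 12 + int(month) - 1

       present = {to_index(m) for m in months}
       if present:
           gaps = [i for i in range(min(present), max(present) + 1) if i not in present]
           if gaps:
               labels = [f"{i // 12:04d}-{i % 12 + 1:02d}" for i in gaps]
               problems.append(f"missing months: {labels}")

       # Sign convention: units_sold should be positive.
       bad = [row[month_col] for row in rows if float(row[units_col]) <= 0]
       if bad:
           problems.append(f"non-positive {units_col} in: {bad}")
       return problems


   # Made-up sample for illustration; a real run would read ch14_driver_table.csv.
   sample = "month,units_sold\n2024-01,100\n2024-02,-5\n2024-04,120\n"
   for problem in check_driver_table(list(csv.DictReader(io.StringIO(sample)))):
       print(problem)

The function returns findings instead of raising, so a reviewer sees every problem in one pass. Magnitude and lineage checks still need human judgment and are deliberately left out.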

Gate 3 — Model sanity (baseline vs rate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Why this matters:** accountants often use regression to separate “fixed vs variable”
or to build planning rates. If slopes/intercepts are not plausible, your plan will be wrong.

Open: ``ch14_regression_summary.json`` and/or read ``ch14_regression_memo.md``

Checks:

- **Direction makes sense**

  - Revenue slope vs units should be positive.
  - COGS slope vs units should be positive.
  - If a slope is negative, treat it as a stop sign until explained.

- **Intercept (baseline) plausibility**

  - For revenue: a large positive intercept may indicate timing or non-unit revenue.
  - For COGS: a positive intercept might reflect baseline costs, but it should be explainable.

- **Rate plausibility**

  - Revenue-per-unit lens: does it roughly match a blended selling price you’d believe?
  - COGS-per-unit lens: does it roughly match a blended unit cost you’d believe?

- **R² interpretation discipline**

  - R² is not “truth”; it’s “how much of the variance the model explains in-sample.”
  - High R² may also reflect seasonality/timing; it does not prove causation.

Common errors this catches:

- A driver computed incorrectly yields implausible slopes
- A single outlier month inflates or distorts fit
- A mismatch between “what the memo claims” and the actual coefficients
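
To cross-check the memo against the data, you can refit a single-driver line independently and compare the slope (rate), intercept (baseline), and R² with the reported values. The sketch below is plain ordinary least squares written from the textbook formulas. It is not necessarily how the Chapter 14 models are estimated, and the function name ``fit_line`` and the sample numbers are made up for illustration.

.. code-block:: python

   def fit_line(x, y):
       """Ordinary least squares for y = intercept + slope * x.

       Returns (slope, intercept, r2). Assumes x has at least two distinct values
       and y is not constant.
       """
       n = len(x)
       mean_x = sum(x) / n
       mean_y = sum(y) / n
       sxx = sum((xi - mean_x) ** 2 for xi in x)
       sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
       slope = sxy / sxx
       intercept = mean_y - slope * mean_x
       # R^2 = 1 - (unexplained variation / total variation), in-sample only.
       ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
       ss_tot = sum((yi - mean_y) ** 2 for yi in y)
       return slope, intercept, 1 - ss_res / ss_tot


   # Illustrative values only (not taken from NSO v1).
   units = [100, 120, 90, 150, 130]
   cogs = [820, 930, 760, 1110, 990]
   slope, intercept, r2 = fit_line(units, cogs)
   if slope <= 0:
       print("Stop: a non-positive slope needs an explanation before sharing")
   print(f"rate per unit ~ {slope:.2f}, baseline ~ {intercept:.0f}, R^2 ~ {r2:.3f}")

If the independently computed rate differs materially from the memo or the JSON summary, treat that as a Gate 6 consistency issue as well.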

Gate 4 — Outliers and dominance (is one month driving everything?)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Why this matters:** a fit can look good but be driven by one extreme point.
That’s dangerous for planning.

Checks:

- Open the figures referenced by ``ch14_figures_manifest.csv`` and look for:

  - a single month far from the rest,
  - a fitted line that “aims” mainly at one point,
  - clusters that suggest regime changes (early months behave differently than later months).

- Optional: do a “remove-one-month” thought experiment:

  - If one month is extreme, ask: would the slope change drastically if that month disappeared?

Interpretation outcome:

- If one point dominates, you can still share the analysis — but you must say:
  “Results are sensitive to month X; treat slopes as tentative.”
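
The remove-one-month thought experiment can also be run literally: refit the slope with each month dropped in turn and see which omission moves it most. This is a minimal sketch under stated assumptions. The helper names are hypothetical, the inputs are plain Python lists, and no threshold for "too sensitive" comes from the chapter; that judgment stays with the analyst.

.. code-block:: python

   def ols_slope(x, y):
       """Slope of the least-squares line through the points (x, y)."""
       mean_x = sum(x) / len(x)
       mean_y = sum(y) / len(y)
       sxx = sum((xi - mean_x) ** 2 for xi in x)
       sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
       return sxy / sxx


   def leave_one_out_slopes(months, x, y):
       """Refit the slope with each month removed in turn.

       Returns (full_slope, results), where each result is
       (month, slope_without_month, relative_change_vs_full).
       """
       full = ols_slope(x, y)
       results = []
       for k, month in enumerate(months):
           x_rest = x[:k] + x[k + 1:]
           y_rest = y[:k] + y[k + 1:]
           reduced = ols_slope(x_rest, y_rest)
           results.append((month, reduced, (reduced - full) / full))
       return full, results

Sorting the results by the absolute relative change puts the most influential month first, which gives you the "month X" to name in the sensitivity caveat.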

Gate 5 — Residual pattern check (is the driver lens stable?)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Why this matters:** patterned residuals often indicate missing drivers, mix shifts, or nonlinearity.

Checks:

- Look for systematic patterns in the “actual vs predicted” style figure(s):

  - consecutive months consistently above/below prediction,
  - seasonal shape not captured by the model,
  - “break point” behavior (model works then stops working).

How to respond professionally:

- If residuals show no visible pattern, your driver lens is likely stable enough for planning.
- If residuals show patterns, share results with explicit caveats:
  “This suggests mix shifts/seasonality; consider segmentation or seasonality indicators.”
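
A rough numeric companion to the visual check is to measure the longest streak of months that sit on the same side of the prediction. The sketch below is a screening aid, not a formal statistical test. The function name is hypothetical, and how long a streak is "too long" depends on how many months you have.

.. code-block:: python

   def longest_same_sign_run(actual, predicted):
       """Length of the longest streak of consecutive residuals with the same sign.

       Residual = actual - predicted. Exact zeros break a streak.
       """
       longest = 0
       current = 0
       previous_sign = 0
       for a, p in zip(actual, predicted):
           residual = a - p
           sign = (residual > 0) - (residual < 0)  # +1, -1, or 0
           if sign != 0 and sign == previous_sign:
               current += 1
           else:
               current = 1 if sign != 0 else 0
           previous_sign = sign
           longest = max(longest, current)
       return longest

A long streak relative to the number of months is a hint to look for seasonality, a mix shift, or a break point, and to add the corresponding caveat to the memo.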

Gate 6 — Artifact completeness and internal consistency
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Why this matters:** leadership will ask “where did that number come from?”
You want your artifacts to agree with each other.

Checks:

- All expected files exist in the outputs folder.
- ``ch14_figures_manifest.csv`` lists figure files that actually exist under ``figures/``.
- The memo’s described slopes/baseline story matches the JSON summary.
- The “design contract” matches the chapter narrative (models m1/m2/m3 as documented).

This is where you catch subtle errors like:

- The code changed but the memo language wasn’t updated
- Figures were generated from an older run
- A refactor changed output paths
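
The file-existence and manifest checks can be automated with a few lines. This is a sketch under assumptions: the expected file names come from the list earlier in this appendix, but the manifest column holding the figure file name is assumed to be called ``filename``, which may not match the real manifest. Adjust ``filename_col`` to the actual header.

.. code-block:: python

   import csv
   from pathlib import Path

   EXPECTED_FILES = [
       "ch14_driver_table.csv",
       "ch14_regression_memo.md",
       "ch14_regression_summary.json",
       "ch14_regression_design.json",
       "ch14_figures_manifest.csv",
   ]


   def check_artifacts(out_dir, filename_col="filename"):
       """Return a list of missing artifacts and manifest entries without a file."""
       root = Path(out_dir)
       problems = [
           f"missing artifact: {name}"
           for name in EXPECTED_FILES
           if not (root / name).is_file()
       ]
       manifest = root / "ch14_figures_manifest.csv"
       if manifest.is_file():
           with manifest.open(newline="") as handle:
               for row in csv.DictReader(handle):
                   # Assumption: the column holds a file name or a path ending in one.
                   figure_name = Path(row[filename_col]).name
                   if not (root / "figures" / figure_name).is_file():
                       problems.append(f"manifest lists missing figure: {figure_name}")
       return problems

An empty result only shows that the files exist. Whether the memo's slopes agree with the JSON summary still needs a read-through, because that comparison depends on the wording of the memo.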

Gate 7 — Communication guardrails (what you are allowed to claim)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Why this matters:** the most common professional failure is claiming causation or certainty.

Before you share the memo, confirm it clearly communicates:

- Regression here is a **driver lens**, not causation proof.
- Slopes are **planning rates**, not immutable laws.
- Structural changes (pricing, mix, capacity, policy) can break the relationship.
- Residuals represent “unexplained movement” that may require investigation.

A safe, professional phrasing pattern is:

- “Given the recent history in this dataset, the implied rate is…”
- “If units move by X, the model implies outcome moves by about Y, subject to residual variation.”
- “This month is an outlier relative to the driver story; investigate…”
- “This is useful for planning; not evidence of causality.”

Sharing checklist (go / no-go)
------------------------------

**Go** when:

- Driver table looks correct and complete.
- Slopes and intercepts are plausible and interpretable.
- Figures do not show a single point dominating the fit.
- Residuals do not show severe patterns (or patterns are clearly caveated).
- Memo matches JSON results and includes guardrails.

**No-go (pause and fix)** when:

- Units sold has the wrong sign or months are missing/duplicated.
- Slopes have implausible direction or magnitude without explanation.
- A single month dominates and the memo does not warn about sensitivity.
- Artifacts are missing or inconsistent (manifest vs files, memo vs JSON).

Closing note: QA is part of the method, not a separate chore
------------------------------------------------------------

In Track D, the method is not just “run regression.”
The method is:

- measurement discipline,
- reproducible artifacts,
- explainable coefficients,
- and professional communication guardrails.

Appendix 14C tells you what each artifact is.
Appendix 14D tells you how to **trust and communicate** those artifacts responsibly.